[PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

public inbox for gcc-patches@gcc.gnu.org

From: Jakub Jelinek @ 2017-01-13 18:11 UTC
  To: Alexander Monakov, Thomas Schwinge, Cesar Philippidis,
	Chung-Lin Tang, Jeff Law
  Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 30985 bytes --]

Hi!

This is something that has been discussed already during the last Cauldron.
Especially for distributions it is undesirable to need to have proprietary
CUDA libraries and headers installed when building GCC.

These two patches allow building GCC without CUDA around in a way that can
later offload to PTX if libcuda.so.1 is installed (and the NVidia kernel
driver is installed; I haven't tried whether it works with nouveau, nor
tried any free CUDA replacements).  This is important because the former
step can be done when building the distribution packages, while the latter
is a decision of the user.  If the nvptx libgomp plugin is installed, but
libcuda.so.1 can't be found, then the plugin behaves as if there are no PTX
devices available.  In order to configure gcc to load libcuda.so.1
dynamically, one has to either configure it with --without-cuda-driver, or
configure it without any of the
--with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
options when cuda.h and -lcuda aren't found in the default locations.
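
To illustrate the run-time behavior described above, here is a minimal
sketch (not the code from the patch).  The helper name
count_devices_via_dlopen is hypothetical, and it assumes that the CUresult
return type is ABI-compatible with int and that CUDA_SUCCESS is 0.  A
missing library or missing symbol is simply reported as "no devices":

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: count the devices visible
   through a CUDA driver library that is opened at run time.  Returns 0
   if the library or a required symbol cannot be found.  */
int
count_devices_via_dlopen (const char *libname)
{
  assert (libname != NULL);

  void *h = dlopen (libname, RTLD_LAZY);
  if (h == NULL)
    return 0;			/* No driver library: behave as "no devices".  */

  /* Assumption: the real functions return CUresult, an enum that is
     treated as int here; 0 stands for CUDA_SUCCESS.  */
  int (*init_fn) (unsigned) = (int (*) (unsigned)) dlsym (h, "cuInit");
  int (*count_fn) (int *) = (int (*) (int *)) dlsym (h, "cuDeviceGetCount");

  int n = 0;
  if (init_fn == NULL || count_fn == NULL
      || init_fn (0) != 0 || count_fn (&n) != 0)
    n = 0;

  /* The handle is deliberately left open for the life of the process,
     since function pointers into the library may still be in use.  */
  return n;
}
```

Calling count_devices_via_dlopen ("libcuda.so.1") on a machine without the
NVidia driver installed yields 0 rather than an error.  On older glibc
versions the program has to be linked with -ldl.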

I've talked to our lawyers and they said that the cuda.h header included
in this patch doesn't infringe anyone's copyright or is otherwise fair
use.  It has been created by gathering all the cu*/CU* symbols from the
current and older nvptx plugin and some oacc tests, then stubbing the
pointer-ish typedefs, grabbing most enum values and function prototypes from
https://raw.githubusercontent.com/shinpei0208/gdev/master/cuda/driver/cuda.h
and verifying that the assembly produced with that header matches the
assembly produced when compiling against NVidia's cuda.h.

The nvptx-tools change to the nvptx-none-as binary is an important part of
this, although it is not a change to gcc itself.  The problem is that by
default nvptx-none-as was calling the ptxas program to verify that the
assembly is correct, which of course doesn't work very well when the
proprietary ptxas is not available.  So the patch makes it always invoke
ptxas if a new --verify option is used.  If --no-verify is used, then as
before ptxas is not invoked.  Without either of these options, ptxas is
invoked if it is found in $PATH; if not, nvptx-none-as does only minimal
verification, good enough for gcc/configure purposes (it turned out to be
sufficient to error out if the .version directive is not the first
non-comment token; ptxas errors on that too).
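
The minimal verification mentioned above could look roughly like the
following sketch.  This is not the code from the nvptx-tools patch; the
function name version_directive_comes_first is hypothetical, and it assumes
that only // line comments and /* */ block comments need to be skipped:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not taken from nvptx-as.c: return true if the
   first token after whitespace and comments is the .version directive.  */
bool
version_directive_comes_first (const char *ptx)
{
  assert (ptx != NULL);

  const char *p = ptx;
  for (;;)
    {
      while (isspace ((unsigned char) *p))
	p++;
      if (p[0] == '/' && p[1] == '/')
	{
	  /* Line comment: skip to the end of the line.  */
	  while (*p != '\0' && *p != '\n')
	    p++;
	}
      else if (p[0] == '/' && p[1] == '*')
	{
	  /* Block comment: skip past the closing delimiter.  */
	  const char *end = strstr (p + 2, "*/");
	  if (end == NULL)
	    return false;	/* Unterminated comment.  */
	  p = end + 2;
	}
      else
	break;
    }

  /* The directive name must be followed by whitespace.  */
  return strncmp (p, ".version", 8) == 0 && isspace ((unsigned char) p[8]);
}
```

An assembler using such a check would report an error and exit with a
non-zero status whenever the function returns false, which is all that
gcc/configure needs in order to probe the assembler.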

Tested on x86_64-linux, with CUDA around
(--with-cuda-driver=/usr/local/cuda-8.0) as well as without, and tested
in that case also both with libcuda.so.1 available and without.

Can the OpenACC hackers as well as Alex (or his colleagues) please also test
it?  Do you have any problems with the GCC patch (if not, I'd commit it
next week before stage3 closes)?  Is the nvptx-tools patch ok (and if so,
can you commit it; I guess I could create a github pull request for this
if needed)?

P.S.: I'm not sure what the cuInit call in nvptx_init is good for; doesn't
libgomp always call nvptx_get_num_devices first and thus call cuInit already
there?  (I've kept it in the patch, though.)

2017-01-13  Jakub Jelinek  <jakub@redhat.com>

	* plugin/configfrag.ac: For --without-cuda-driver don't initialize
	CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB.  If both
	CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB are empty and linking small
	cuda program fails, define PLUGIN_NVPTX_DYNAMIC to 1 and use
	plugin/cuda as include dir and -ldl instead of -lcuda as
	library to link ptx plugin against.
	* plugin/plugin-nvptx.c: Include dlfcn.h if PLUGIN_NVPTX_DYNAMIC.
	(CUDA_CALLS): Define.
	(cuda_lib, cuda_lib_inited): New variables.
	(init_cuda_lib): New function.
	(CUDA_CALL_PREFIX): Define.
	(CUDA_CALL_ERET, CUDA_CALL_ASSERT): Use CUDA_CALL_PREFIX.
	(CUDA_CALL): Use FN instead of (FN).
	(CUDA_CALL_NOCHECK): Define.
	(cuda_error, fini_streams_for_device, select_stream_for_async,
	nvptx_attach_host_thread_to_device, nvptx_open_device, link_ptx,
	event_gc, nvptx_exec, nvptx_async_test, nvptx_async_test_all,
	nvptx_wait_all, nvptx_set_clocktick, GOMP_OFFLOAD_unload_image,
	nvptx_stacks_alloc, nvptx_stacks_free, GOMP_OFFLOAD_run): Use
	CUDA_CALL_NOCHECK.
	(nvptx_init): Call init_cuda_lib, if it fails, return false.  Use
	CUDA_CALL_NOCHECK.
	(nvptx_get_num_devices): Call init_cuda_lib, if it fails, return 0.
	Use CUDA_CALL_NOCHECK.
	* plugin/cuda/cuda.h: New file.
	* config.h.in: Regenerated.
	* configure: Regenerated.
	* Makefile.in: Regenerated.

--- libgomp/plugin/configfrag.ac.jj	2017-01-13 12:07:56.000000000 +0100
+++ libgomp/plugin/configfrag.ac	2017-01-13 17:33:26.608240936 +0100
@@ -58,10 +58,12 @@ AC_ARG_WITH(cuda-driver-include,
 AC_ARG_WITH(cuda-driver-lib,
 	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
 		[specify directory for the installed CUDA driver library])])
-if test "x$with_cuda_driver" != x; then
-  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
-  CUDA_DRIVER_LIB=$with_cuda_driver/lib
-fi
+case "x$with_cuda_driver" in
+  x | xno) ;;
+  *) CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+     CUDA_DRIVER_LIB=$with_cuda_driver/lib
+     ;;
+esac
 if test "x$with_cuda_driver_include" != x; then
   CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
 fi
@@ -79,6 +81,7 @@ PLUGIN_NVPTX=0
 PLUGIN_NVPTX_CPPFLAGS=
 PLUGIN_NVPTX_LDFLAGS=
 PLUGIN_NVPTX_LIBS=
+PLUGIN_NVPTX_DYNAMIC=0
 AC_SUBST(PLUGIN_NVPTX)
 AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
@@ -167,9 +170,17 @@ if test x"$enable_offload_targets" != x;
 	LIBS=$PLUGIN_NVPTX_save_LIBS
 	case $PLUGIN_NVPTX in
 	  nvptx*)
-	    PLUGIN_NVPTX=0
-	    AC_MSG_ERROR([CUDA driver package required for nvptx support])
-	    ;;
+	    if test "x$CUDA_DRIVER_INCLUDE" = x \
+	       && test "x$CUDA_DRIVER_LIB" = x; then
+	      PLUGIN_NVPTX=1
+	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+	      PLUGIN_NVPTX_LIBS='-ldl'
+	      PLUGIN_NVPTX_DYNAMIC=1
+	    else
+	      PLUGIN_NVPTX=0
+	      AC_MSG_ERROR([CUDA driver package required for nvptx support])
+	    fi
+	  ;;
 	esac
 	;;
       hsa*)
@@ -241,6 +252,8 @@ AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$of
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
+AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
+  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
 AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
   [Define to 1 if the HSA plugin is built, 0 if not.])
--- libgomp/plugin/plugin-nvptx.c.jj	2017-01-13 12:07:56.000000000 +0100
+++ libgomp/plugin/plugin-nvptx.c	2017-01-13 18:00:39.693284346 +0100
@@ -48,30 +48,104 @@
 #include <assert.h>
 #include <errno.h>
 
-static const char *
-cuda_error (CUresult r)
-{
-#if CUDA_VERSION < 7000
-  /* Specified in documentation and present in library from at least
-     5.5.  Not declared in header file prior to 7.0.  */
-  extern CUresult cuGetErrorString (CUresult, const char **);
-#endif
-  const char *desc;
+#if PLUGIN_NVPTX_DYNAMIC
+# include <dlfcn.h>
 
-  r = cuGetErrorString (r, &desc);
-  if (r != CUDA_SUCCESS)
-    desc = "unknown cuda error";
+# define CUDA_CALLS \
+CUDA_ONE_CALL (cuCtxCreate)		\
+CUDA_ONE_CALL (cuCtxDestroy)		\
+CUDA_ONE_CALL (cuCtxGetCurrent)		\
+CUDA_ONE_CALL (cuCtxGetDevice)		\
+CUDA_ONE_CALL (cuCtxPopCurrent)		\
+CUDA_ONE_CALL (cuCtxPushCurrent)	\
+CUDA_ONE_CALL (cuCtxSynchronize)	\
+CUDA_ONE_CALL (cuDeviceGet)		\
+CUDA_ONE_CALL (cuDeviceGetAttribute)	\
+CUDA_ONE_CALL (cuDeviceGetCount)	\
+CUDA_ONE_CALL (cuEventCreate)		\
+CUDA_ONE_CALL (cuEventDestroy)		\
+CUDA_ONE_CALL (cuEventElapsedTime)	\
+CUDA_ONE_CALL (cuEventQuery)		\
+CUDA_ONE_CALL (cuEventRecord)		\
+CUDA_ONE_CALL (cuEventSynchronize)	\
+CUDA_ONE_CALL (cuFuncGetAttribute)	\
+CUDA_ONE_CALL (cuGetErrorString)	\
+CUDA_ONE_CALL (cuInit)			\
+CUDA_ONE_CALL (cuLaunchKernel)		\
+CUDA_ONE_CALL (cuLinkAddData)		\
+CUDA_ONE_CALL (cuLinkComplete)		\
+CUDA_ONE_CALL (cuLinkCreate)		\
+CUDA_ONE_CALL (cuLinkDestroy)		\
+CUDA_ONE_CALL (cuMemAlloc)		\
+CUDA_ONE_CALL (cuMemAllocHost)		\
+CUDA_ONE_CALL (cuMemcpy)		\
+CUDA_ONE_CALL (cuMemcpyDtoDAsync)	\
+CUDA_ONE_CALL (cuMemcpyDtoH)		\
+CUDA_ONE_CALL (cuMemcpyDtoHAsync)	\
+CUDA_ONE_CALL (cuMemcpyHtoD)		\
+CUDA_ONE_CALL (cuMemcpyHtoDAsync)	\
+CUDA_ONE_CALL (cuMemFree)		\
+CUDA_ONE_CALL (cuMemFreeHost)		\
+CUDA_ONE_CALL (cuMemGetAddressRange)	\
+CUDA_ONE_CALL (cuMemHostGetDevicePointer)\
+CUDA_ONE_CALL (cuModuleGetFunction)	\
+CUDA_ONE_CALL (cuModuleGetGlobal)	\
+CUDA_ONE_CALL (cuModuleLoad)		\
+CUDA_ONE_CALL (cuModuleLoadData)	\
+CUDA_ONE_CALL (cuModuleUnload)		\
+CUDA_ONE_CALL (cuStreamCreate)		\
+CUDA_ONE_CALL (cuStreamDestroy)		\
+CUDA_ONE_CALL (cuStreamQuery)		\
+CUDA_ONE_CALL (cuStreamSynchronize)	\
+CUDA_ONE_CALL (cuStreamWaitEvent)
+# define CUDA_ONE_CALL(call) \
+  __typeof (call) *call;
+struct cuda_lib_s {
+  CUDA_CALLS
+} cuda_lib;
+
+/* -1 if init_cuda_lib has not been called yet, false
+   if it has been and failed, true if it has been and succeeded.  */
+static signed char cuda_lib_inited = -1;
 
-  return desc;
+/* Dynamically load the CUDA runtime library and initialize function
+   pointers, return false if unsuccessful, true if successful.  */
+static bool
+init_cuda_lib (void)
+{
+  if (cuda_lib_inited != -1)
+    return cuda_lib_inited;
+  const char *cuda_runtime_lib = "libcuda.so.1";
+  void *h = dlopen (cuda_runtime_lib, RTLD_LAZY);
+  cuda_lib_inited = false;
+  if (h == NULL)
+    return false;
+# undef CUDA_ONE_CALL
+# define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
+# define CUDA_ONE_CALL_1(call) \
+  cuda_lib.call = dlsym (h, #call);	\
+  if (cuda_lib.call == NULL)		\
+    return false;
+  CUDA_CALLS
+  cuda_lib_inited = true;
+  return true;
 }
+# undef CUDA_ONE_CALL
+# undef CUDA_ONE_CALL_1
+# define CUDA_CALL_PREFIX cuda_lib.
+#else
+# define CUDA_CALL_PREFIX
+# define init_cuda_lib() true
+#endif
 
 /* Convenience macros for the frequently used CUDA library call and
-   error handling sequence.  This does not capture all the cases we
-   use in this file, but is common enough.  */
+   error handling sequence as well as CUDA library calls that
+   do the error checking themselves or don't do it at all.  */
 
 #define CUDA_CALL_ERET(ERET, FN, ...)		\
   do {						\
-    unsigned __r = FN (__VA_ARGS__);		\
+    unsigned __r				\
+      = CUDA_CALL_PREFIX FN (__VA_ARGS__);	\
     if (__r != CUDA_SUCCESS)			\
       {						\
 	GOMP_PLUGIN_error (#FN " error: %s",	\
@@ -81,11 +155,12 @@ cuda_error (CUresult r)
   } while (0)
 
 #define CUDA_CALL(FN, ...)			\
-  CUDA_CALL_ERET (false, (FN), __VA_ARGS__)
+  CUDA_CALL_ERET (false, FN, __VA_ARGS__)
 
 #define CUDA_CALL_ASSERT(FN, ...)		\
   do {						\
-    unsigned __r = FN (__VA_ARGS__);		\
+    unsigned __r				\
+      = CUDA_CALL_PREFIX FN (__VA_ARGS__);	\
     if (__r != CUDA_SUCCESS)			\
       {						\
 	GOMP_PLUGIN_fatal (#FN " error: %s",	\
@@ -93,6 +168,26 @@ cuda_error (CUresult r)
       }						\
   } while (0)
 
+#define CUDA_CALL_NOCHECK(FN, ...)		\
+  CUDA_CALL_PREFIX FN (__VA_ARGS__)
+
+static const char *
+cuda_error (CUresult r)
+{
+#if CUDA_VERSION < 7000
+  /* Specified in documentation and present in library from at least
+     5.5.  Not declared in header file prior to 7.0.  */
+  extern CUresult cuGetErrorString (CUresult, const char **);
+#endif
+  const char *desc;
+
+  r = CUDA_CALL_NOCHECK (cuGetErrorString, r, &desc);
+  if (r != CUDA_SUCCESS)
+    desc = "unknown cuda error";
+
+  return desc;
+}
+
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -401,7 +496,7 @@ fini_streams_for_device (struct ptx_devi
 
       ret &= map_fini (s);
 
-      CUresult r = cuStreamDestroy (s->stream);
+      CUresult r = CUDA_CALL_NOCHECK (cuStreamDestroy, s->stream);
       if (r != CUDA_SUCCESS)
 	{
 	  GOMP_PLUGIN_error ("cuStreamDestroy error: %s", cuda_error (r));
@@ -484,7 +579,8 @@ select_stream_for_async (int async, pthr
 	    s->stream = existing;
 	  else
 	    {
-	      r = cuStreamCreate (&s->stream, CU_STREAM_DEFAULT);
+	      r = CUDA_CALL_NOCHECK (cuStreamCreate, &s->stream,
+				     CU_STREAM_DEFAULT);
 	      if (r != CUDA_SUCCESS)
 		{
 		  pthread_mutex_unlock (&ptx_dev->stream_lock);
@@ -554,10 +650,14 @@ nvptx_init (void)
   if (instantiated_devices != 0)
     return true;
 
-  CUDA_CALL (cuInit, 0);
   ptx_events = NULL;
   pthread_mutex_init (&ptx_event_lock, NULL);
 
+  if (!init_cuda_lib ())
+    return false;
+
+  CUDA_CALL (cuInit, 0);
+
   CUDA_CALL (cuDeviceGetCount, &ndevs);
   ptx_devices = GOMP_PLUGIN_malloc_cleared (sizeof (struct ptx_device *)
 					    * ndevs);
@@ -575,7 +675,7 @@ nvptx_attach_host_thread_to_device (int
   struct ptx_device *ptx_dev;
   CUcontext thd_ctx;
 
-  r = cuCtxGetDevice (&dev);
+  r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &dev);
   if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
     {
       GOMP_PLUGIN_error ("cuCtxGetDevice error: %s", cuda_error (r));
@@ -623,7 +723,7 @@ nvptx_open_device (int n)
   ptx_dev->dev = dev;
   ptx_dev->ctx_shared = false;
 
-  r = cuCtxGetDevice (&ctx_dev);
+  r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &ctx_dev);
   if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
     {
       GOMP_PLUGIN_error ("cuCtxGetDevice error: %s", cuda_error (r));
@@ -669,7 +769,7 @@ nvptx_open_device (int n)
 		  &pi, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
   ptx_dev->clock_khz = pi;
 
-  CUDA_CALL_ERET (NULL,  cuDeviceGetAttribute,
+  CUDA_CALL_ERET (NULL, cuDeviceGetAttribute,
 		  &pi, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, dev);
   ptx_dev->num_sms = pi;
 
@@ -679,7 +779,7 @@ nvptx_open_device (int n)
 
   /* CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82 is defined only
      in CUDA 6.0 and newer.  */
-  r = cuDeviceGetAttribute (&pi, 82, dev);
+  r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi, 82, dev);
   /* Fallback: use limit of registers per block, which is usually equal.  */
   if (r == CUDA_ERROR_INVALID_VALUE)
     pi = ptx_dev->regs_per_block;
@@ -698,8 +798,8 @@ nvptx_open_device (int n)
       return NULL;
     }
 
-  r = cuDeviceGetAttribute (&async_engines,
-			    CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
+  r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &async_engines,
+			 CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
   if (r != CUDA_SUCCESS)
     async_engines = 1;
 
@@ -746,7 +846,9 @@ nvptx_get_num_devices (void)
      further initialization).  */
   if (instantiated_devices == 0)
     {
-      CUresult r = cuInit (0);
+      if (!init_cuda_lib ())
+	return 0;
+      CUresult r = CUDA_CALL_NOCHECK (cuInit, 0);
       /* This is not an error: e.g. we may have CUDA libraries installed but
          no devices available.  */
       if (r != CUDA_SUCCESS)
@@ -797,8 +899,9 @@ link_ptx (CUmodule *module, const struct
       /* cuLinkAddData's 'data' argument erroneously omits the const
 	 qualifier.  */
       GOMP_PLUGIN_debug (0, "Loading:\n---\n%s\n---\n", ptx_objs->code);
-      r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, (char*)ptx_objs->code,
-			 ptx_objs->size, 0, 0, 0, 0);
+      r = CUDA_CALL_NOCHECK (cuLinkAddData, linkstate, CU_JIT_INPUT_PTX,
+			     (char *) ptx_objs->code, ptx_objs->size,
+			     0, 0, 0, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
@@ -809,7 +912,7 @@ link_ptx (CUmodule *module, const struct
     }
 
   GOMP_PLUGIN_debug (0, "Linking\n");
-  r = cuLinkComplete (linkstate, &linkout, &linkoutsize);
+  r = CUDA_CALL_NOCHECK (cuLinkComplete, linkstate, &linkout, &linkoutsize);
 
   GOMP_PLUGIN_debug (0, "Link complete: %fms\n", elapsed);
   GOMP_PLUGIN_debug (0, "Link log %s\n", &ilog[0]);
@@ -844,7 +947,7 @@ event_gc (bool memmap_lockable)
       if (e->ord != nvthd->ptx_dev->ord)
 	continue;
 
-      r = cuEventQuery (*e->evt);
+      r = CUDA_CALL_NOCHECK (cuEventQuery, *e->evt);
       if (r == CUDA_SUCCESS)
 	{
 	  bool append_async = false;
@@ -877,7 +980,7 @@ event_gc (bool memmap_lockable)
 	      break;
 	    }
 
-	  cuEventDestroy (*te);
+	  CUDA_CALL_NOCHECK (cuEventDestroy, *te);
 	  free ((void *)te);
 
 	  /* Unlink 'e' from ptx_events list.  */
@@ -1015,10 +1118,14 @@ nvptx_exec (void (*fn), size_t mapnum, v
 	  cu_mpc = CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT;
 	  cu_tpm  = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR;
 
-	  if (cuDeviceGetAttribute (&block_size, cu_tpb, dev) == CUDA_SUCCESS
-	      && cuDeviceGetAttribute (&warp_size, cu_ws, dev) == CUDA_SUCCESS
-	      && cuDeviceGetAttribute (&dev_size, cu_mpc, dev) == CUDA_SUCCESS
-	      && cuDeviceGetAttribute (&cpu_size, cu_tpm, dev)  == CUDA_SUCCESS)
+	  if (CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &block_size, cu_tpb,
+				 dev) == CUDA_SUCCESS
+	      && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &warp_size, cu_ws,
+				    dev) == CUDA_SUCCESS
+	      && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &dev_size, cu_mpc,
+				    dev) == CUDA_SUCCESS
+	      && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &cpu_size, cu_tpm,
+				    dev) == CUDA_SUCCESS)
 	    {
 	      GOMP_PLUGIN_debug (0, " warp_size=%d, block_size=%d,"
 				 " dev_size=%d, cpu_size=%d\n",
@@ -1090,7 +1197,7 @@ nvptx_exec (void (*fn), size_t mapnum, v
 #ifndef DISABLE_ASYNC
   if (async < acc_async_noval)
     {
-      r = cuStreamSynchronize (dev_str->stream);
+      r = CUDA_CALL_NOCHECK (cuStreamSynchronize, dev_str->stream);
       if (r == CUDA_ERROR_LAUNCH_FAILED)
 	GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s %s\n", cuda_error (r),
 			   maybe_abort_msg);
@@ -1103,7 +1210,7 @@ nvptx_exec (void (*fn), size_t mapnum, v
 
       e = (CUevent *)GOMP_PLUGIN_malloc (sizeof (CUevent));
 
-      r = cuEventCreate (e, CU_EVENT_DISABLE_TIMING);
+      r = CUDA_CALL_NOCHECK (cuEventCreate, e, CU_EVENT_DISABLE_TIMING);
       if (r == CUDA_ERROR_LAUNCH_FAILED)
 	GOMP_PLUGIN_fatal ("cuEventCreate error: %s %s\n", cuda_error (r),
 			   maybe_abort_msg);
@@ -1117,7 +1224,7 @@ nvptx_exec (void (*fn), size_t mapnum, v
       event_add (PTX_EVT_KNL, e, (void *)dev_str, 0);
     }
 #else
-  r = cuCtxSynchronize ();
+  r = CUDA_CALL_NOCHECK (cuCtxSynchronize, );
   if (r == CUDA_ERROR_LAUNCH_FAILED)
     GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s %s\n", cuda_error (r),
 		       maybe_abort_msg);
@@ -1294,7 +1401,7 @@ nvptx_async_test (int async)
   if (!s)
     GOMP_PLUGIN_fatal ("unknown async %d", async);
 
-  r = cuStreamQuery (s->stream);
+  r = CUDA_CALL_NOCHECK (cuStreamQuery, s->stream);
   if (r == CUDA_SUCCESS)
     {
       /* The oacc-parallel.c:goacc_wait function calls this hook to determine
@@ -1325,7 +1432,8 @@ nvptx_async_test_all (void)
   for (s = nvthd->ptx_dev->active_streams; s != NULL; s = s->next)
     {
       if ((s->multithreaded || pthread_equal (s->host_thread, self))
-	  && cuStreamQuery (s->stream) == CUDA_ERROR_NOT_READY)
+	  && CUDA_CALL_NOCHECK (cuStreamQuery,
+				s->stream) == CUDA_ERROR_NOT_READY)
 	{
 	  pthread_mutex_unlock (&nvthd->ptx_dev->stream_lock);
 	  return 0;
@@ -1400,7 +1508,7 @@ nvptx_wait_all (void)
     {
       if (s->multithreaded || pthread_equal (s->host_thread, self))
 	{
-	  r = cuStreamQuery (s->stream);
+	  r = CUDA_CALL_NOCHECK (cuStreamQuery, s->stream);
 	  if (r == CUDA_SUCCESS)
 	    continue;
 	  else if (r != CUDA_ERROR_NOT_READY)
@@ -1632,13 +1740,15 @@ static void
 nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 {
   CUdeviceptr dptr;
-  CUresult r = cuModuleGetGlobal (&dptr, NULL, module, "__nvptx_clocktick");
+  CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &dptr, NULL,
+				  module, "__nvptx_clocktick");
   if (r == CUDA_ERROR_NOT_FOUND)
     return;
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
   double __nvptx_clocktick = 1e-3 / dev->clock_khz;
-  r = cuMemcpyHtoD (dptr, &__nvptx_clocktick, sizeof (__nvptx_clocktick));
+  r = CUDA_CALL_NOCHECK (cuMemcpyHtoD, dptr, &__nvptx_clocktick,
+			 sizeof (__nvptx_clocktick));
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
@@ -1761,7 +1871,7 @@ GOMP_OFFLOAD_unload_image (int ord, unsi
     if (image->target_data == target_data)
       {
 	*prev_p = image->next;
-	if (cuModuleUnload (image->module) != CUDA_SUCCESS)
+	if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
 	  ret = false;
 	free (image->fns);
 	free (image);
@@ -1974,7 +2084,7 @@ static void *
 nvptx_stacks_alloc (size_t size, int num)
 {
   CUdeviceptr stacks;
-  CUresult r = cuMemAlloc (&stacks, size * num);
+  CUresult r = CUDA_CALL_NOCHECK (cuMemAlloc, &stacks, size * num);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuMemAlloc error: %s", cuda_error (r));
   return (void *) stacks;
@@ -1985,7 +2095,7 @@ nvptx_stacks_alloc (size_t size, int num
 static void
 nvptx_stacks_free (void *p, int num)
 {
-  CUresult r = cuMemFree ((CUdeviceptr) p);
+  CUresult r = CUDA_CALL_NOCHECK (cuMemFree, (CUdeviceptr) p);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuMemFree error: %s", cuda_error (r));
 }
@@ -2028,14 +2138,13 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn,
     CU_LAUNCH_PARAM_BUFFER_SIZE, &fn_args_size,
     CU_LAUNCH_PARAM_END
   };
-  r = cuLaunchKernel (function,
-		      teams, 1, 1,
-		      32, threads, 1,
-		      0, ptx_dev->null_stream->stream, NULL, config);
+  r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
+			 32, threads, 1, 0, ptx_dev->null_stream->stream,
+			 NULL, config);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
 
-  r = cuCtxSynchronize ();
+  r = CUDA_CALL_NOCHECK (cuCtxSynchronize, );
   if (r == CUDA_ERROR_LAUNCH_FAILED)
     GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s %s\n", cuda_error (r),
 		       maybe_abort_msg);
--- libgomp/plugin/cuda/cuda.h.jj	2017-01-13 15:58:00.966544147 +0100
+++ libgomp/plugin/cuda/cuda.h	2017-01-13 17:02:47.355817896 +0100
@@ -0,0 +1,174 @@
+/* CUDA API description.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+This header provides the minimum amount of typedefs, enums and function
+declarations to be able to compile plugin-nvptx.c if cuda.h and
+libcuda.so.1 are not available.  */
+
+#ifndef GCC_CUDA_H
+#define GCC_CUDA_H
+
+#include <stdlib.h>
+
+#define CUDA_VERSION 8000
+
+typedef void *CUcontext;
+typedef int CUdevice;
+#ifdef __LP64__
+typedef unsigned long long CUdeviceptr;
+#else
+typedef unsigned CUdeviceptr;
+#endif
+typedef void *CUevent;
+typedef void *CUfunction;
+typedef void *CUlinkState;
+typedef void *CUmodule;
+typedef void *CUstream;
+
+typedef enum {
+  CUDA_SUCCESS = 0,
+  CUDA_ERROR_INVALID_VALUE = 1,
+  CUDA_ERROR_OUT_OF_MEMORY = 2,
+  CUDA_ERROR_INVALID_CONTEXT = 201,
+  CUDA_ERROR_NOT_FOUND = 500,
+  CUDA_ERROR_NOT_READY = 600,
+  CUDA_ERROR_LAUNCH_FAILED = 719
+} CUresult;
+
+typedef enum {
+  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 1,
+  CU_DEVICE_ATTRIBUTE_WARP_SIZE = 10,
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK = 12,
+  CU_DEVICE_ATTRIBUTE_CLOCK_RATE = 13,
+  CU_DEVICE_ATTRIBUTE_GPU_OVERLAP = 15,
+  CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16,
+  CU_DEVICE_ATTRIBUTE_INTEGRATED = 18,
+  CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY = 19,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_MODE = 20,
+  CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31,
+  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
+  CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+} CUdevice_attribute;
+
+enum {
+  CU_EVENT_DEFAULT = 0,
+  CU_EVENT_DISABLE_TIMING = 2
+};
+
+typedef enum {
+  CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 0,
+  CU_FUNC_ATTRIBUTE_NUM_REGS = 4
+} CUfunction_attribute;
+
+typedef enum {
+  CU_JIT_WALL_TIME = 2,
+  CU_JIT_INFO_LOG_BUFFER = 3,
+  CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
+  CU_JIT_ERROR_LOG_BUFFER = 5,
+  CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
+  CU_JIT_LOG_VERBOSE = 12
+} CUjit_option;
+
+typedef enum {
+  CU_JIT_INPUT_PTX = 1
+} CUjitInputType;
+
+enum {
+  CU_CTX_SCHED_AUTO = 0
+};
+
+#define CU_LAUNCH_PARAM_END ((void *) 0)
+#define CU_LAUNCH_PARAM_BUFFER_POINTER ((void *) 1)
+#define CU_LAUNCH_PARAM_BUFFER_SIZE ((void *) 2)
+
+enum {
+  CU_STREAM_DEFAULT = 0,
+  CU_STREAM_NON_BLOCKING = 1
+};
+
+#define cuCtxCreate cuCtxCreate_v2
+CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice);
+#define cuCtxDestroy cuCtxDestroy_v2
+CUresult cuCtxDestroy (CUcontext);
+CUresult cuCtxGetCurrent (CUcontext *);
+CUresult cuCtxGetDevice (CUdevice *);
+#define cuCtxPopCurrent cuCtxPopCurrent_v2
+CUresult cuCtxPopCurrent (CUcontext *);
+#define cuCtxPushCurrent cuCtxPushCurrent_v2
+CUresult cuCtxPushCurrent (CUcontext);
+CUresult cuCtxSynchronize (void);
+CUresult cuDeviceGet (CUdevice *, int);
+CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
+CUresult cuDeviceGetCount (int *);
+CUresult cuEventCreate (CUevent *, unsigned);
+#define cuEventDestroy cuEventDestroy_v2
+CUresult cuEventDestroy (CUevent);
+CUresult cuEventElapsedTime (float *, CUevent, CUevent);
+CUresult cuEventQuery (CUevent);
+CUresult cuEventRecord (CUevent, CUstream);
+CUresult cuEventSynchronize (CUevent);
+CUresult cuFuncGetAttribute (int *, CUfunction_attribute, CUfunction);
+CUresult cuGetErrorString (CUresult, const char **);
+CUresult cuInit (unsigned);
+CUresult cuLaunchKernel (CUfunction, unsigned, unsigned, unsigned, unsigned,
+			 unsigned, unsigned, unsigned, CUstream, void **, void **);
+#define cuLinkAddData cuLinkAddData_v2
+CUresult cuLinkAddData (CUlinkState, CUjitInputType, void *, size_t, const char *,
+			unsigned, CUjit_option *, void **);
+CUresult cuLinkComplete (CUlinkState, void **, size_t *);
+#define cuLinkCreate cuLinkCreate_v2
+CUresult cuLinkCreate (unsigned, CUjit_option *, void **, CUlinkState *);
+CUresult cuLinkDestroy (CUlinkState);
+#define cuMemAlloc cuMemAlloc_v2
+CUresult cuMemAlloc (CUdeviceptr *, size_t);
+#define cuMemAllocHost cuMemAllocHost_v2
+CUresult cuMemAllocHost (void **, size_t);
+CUresult cuMemcpy (CUdeviceptr, CUdeviceptr, size_t);
+#define cuMemcpyDtoDAsync cuMemcpyDtoDAsync_v2
+CUresult cuMemcpyDtoDAsync (CUdeviceptr, CUdeviceptr, size_t, CUstream);
+#define cuMemcpyDtoH cuMemcpyDtoH_v2
+CUresult cuMemcpyDtoH (void *, CUdeviceptr, size_t);
+#define cuMemcpyDtoHAsync cuMemcpyDtoHAsync_v2
+CUresult cuMemcpyDtoHAsync (void *, CUdeviceptr, size_t, CUstream);
+#define cuMemcpyHtoD cuMemcpyHtoD_v2
+CUresult cuMemcpyHtoD (CUdeviceptr, const void *, size_t);
+#define cuMemcpyHtoDAsync cuMemcpyHtoDAsync_v2
+CUresult cuMemcpyHtoDAsync (CUdeviceptr, const void *, size_t, CUstream);
+#define cuMemFree cuMemFree_v2
+CUresult cuMemFree (CUdeviceptr);
+CUresult cuMemFreeHost (void *);
+#define cuMemGetAddressRange cuMemGetAddressRange_v2
+CUresult cuMemGetAddressRange (CUdeviceptr *, size_t *, CUdeviceptr);
+#define cuMemHostGetDevicePointer cuMemHostGetDevicePointer_v2
+CUresult cuMemHostGetDevicePointer (CUdeviceptr *, void *, unsigned);
+CUresult cuModuleGetFunction (CUfunction *, CUmodule, const char *);
+#define cuModuleGetGlobal cuModuleGetGlobal_v2
+CUresult cuModuleGetGlobal (CUdeviceptr *, size_t *, CUmodule, const char *);
+CUresult cuModuleLoad (CUmodule *, const char *);
+CUresult cuModuleLoadData (CUmodule *, const void *);
+CUresult cuModuleUnload (CUmodule);
+CUresult cuStreamCreate (CUstream *, unsigned);
+#define cuStreamDestroy cuStreamDestroy_v2
+CUresult cuStreamDestroy (CUstream);
+CUresult cuStreamQuery (CUstream);
+CUresult cuStreamSynchronize (CUstream);
+CUresult cuStreamWaitEvent (CUstream, CUevent, unsigned);
+
+#endif /* GCC_CUDA_H */
--- libgomp/config.h.in.jj	2017-01-13 12:07:55.000000000 +0100
+++ libgomp/config.h.in	2017-01-13 16:46:37.000000000 +0100
@@ -155,6 +155,10 @@
 /* Define to 1 if the NVIDIA plugin is built, 0 if not. */
 #undef PLUGIN_NVPTX
 
+/* Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should
+   be linked against it. */
+#undef PLUGIN_NVPTX_DYNAMIC
+
 /* Define if all infrastructure, needed for plugins, is supported. */
 #undef PLUGIN_SUPPORT
 
--- libgomp/configure.jj	2017-01-13 12:07:56.000000000 +0100
+++ libgomp/configure	2017-01-13 17:34:02.384782324 +0100
@@ -15135,7 +15135,7 @@ fi
 
 # Plugins for offload execution, configure.ac fragment.  -*- mode: autoconf -*-
 #
-# Copyright (C) 2014-2016 Free Software Foundation, Inc.
+# Copyright (C) 2014-2017 Free Software Foundation, Inc.
 #
 # Contributed by Mentor Embedded.
 #
@@ -15295,10 +15295,12 @@ if test "${with_cuda_driver_lib+set}" =
   withval=$with_cuda_driver_lib;
 fi
 
-if test "x$with_cuda_driver" != x; then
-  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
-  CUDA_DRIVER_LIB=$with_cuda_driver/lib
-fi
+case "x$with_cuda_driver" in
+  x | xno) ;;
+  *) CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+     CUDA_DRIVER_LIB=$with_cuda_driver/lib
+     ;;
+esac
 if test "x$with_cuda_driver_include" != x; then
   CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
 fi
@@ -15316,6 +15318,7 @@ PLUGIN_NVPTX=0
 PLUGIN_NVPTX_CPPFLAGS=
 PLUGIN_NVPTX_LDFLAGS=
 PLUGIN_NVPTX_LIBS=
+PLUGIN_NVPTX_DYNAMIC=0
 
 
 
@@ -15422,9 +15425,17 @@ rm -f core conftest.err conftest.$ac_obj
 	LIBS=$PLUGIN_NVPTX_save_LIBS
 	case $PLUGIN_NVPTX in
 	  nvptx*)
-	    PLUGIN_NVPTX=0
-	    as_fn_error "CUDA driver package required for nvptx support" "$LINENO" 5
-	    ;;
+	    if test "x$CUDA_DRIVER_INCLUDE" = x \
+	       && test "x$CUDA_DRIVER_LIB" = x; then
+	      PLUGIN_NVPTX=1
+	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+	      PLUGIN_NVPTX_LIBS='-ldl'
+	      PLUGIN_NVPTX_DYNAMIC=1
+	    else
+	      PLUGIN_NVPTX=0
+	      as_fn_error "CUDA driver package required for nvptx support" "$LINENO" 5
+	    fi
+	  ;;
 	esac
 	;;
       hsa*)
@@ -15509,6 +15520,11 @@ cat >>confdefs.h <<_ACEOF
 #define PLUGIN_NVPTX $PLUGIN_NVPTX
 _ACEOF
 
+
+cat >>confdefs.h <<_ACEOF
+#define PLUGIN_NVPTX_DYNAMIC $PLUGIN_NVPTX_DYNAMIC
+_ACEOF
+
  if test $PLUGIN_HSA = 1; then
   PLUGIN_HSA_TRUE=
   PLUGIN_HSA_FALSE='#'
--- libgomp/Makefile.in.jj	2017-01-13 12:07:55.000000000 +0100
+++ libgomp/Makefile.in	2017-01-13 16:46:53.769033874 +0100
@@ -17,7 +17,7 @@
 
 # Plugins for offload execution, Makefile.am fragment.
 #
-# Copyright (C) 2014-2016 Free Software Foundation, Inc.
+# Copyright (C) 2014-2017 Free Software Foundation, Inc.
 #
 # Contributed by Mentor Embedded.
 #

	Jakub

[-- Attachment #2: nvptx-tools.patch --]
[-- Type: text/plain, Size: 28878 bytes --]

diff --git a/configure.ac b/configure.ac
index ecc02c4..7bf8a3f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -51,6 +51,7 @@ LIBS="$LIBS -lcuda"
 AC_CHECK_FUNCS([[cuGetErrorName] [cuGetErrorString]])
 AC_CHECK_DECLS([[cuGetErrorName], [cuGetErrorString]],
   [], [], [[#include <cuda.h>]])
+AC_CHECK_HEADERS(unistd.h sys/stat.h)
 
 AC_MSG_CHECKING([for extra programs to build requiring -lcuda])
 NVPTX_RUN=
diff --git a/include/libiberty.h b/include/libiberty.h
index cacde80..29ceafe 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -390,6 +390,17 @@ extern void hex_init (void);
 /* Save files used for communication between processes.  */
 #define PEX_SAVE_TEMPS		0x4
 
+/* Max number of alloca bytes per call before we must switch to malloc.
+
+   ?? Swiped from gnulib's regex_internal.h header.  Is this actually
+   the case?  This number seems arbitrary, though sane.
+
+   The OS usually guarantees only one guard page at the bottom of the stack,
+   and a page size can be as small as 4096 bytes.  So we cannot safely
+   allocate anything larger than 4096 bytes.  Also care for the possibility
+   of a few compiler-allocated temporary stack slots.  */
+#define MAX_ALLOCA_SIZE	4032
+
 /* Prepare to execute one or more programs, with standard output of
    each program fed to standard input of the next.
    FLAGS	As above.
diff --git a/nvptx-as.c b/nvptx-as.c
index 53331af..1ad6699 100644
--- a/nvptx-as.c
+++ b/nvptx-as.c
@@ -30,6 +30,9 @@
 #include <string.h>
 #include <wait.h>
 #include <unistd.h>
+#ifdef HAVE_SYS_STAT_H
+#include <sys/stat.h>
+#endif
 #include <errno.h>
 #define obstack_chunk_alloc malloc
 #define obstack_chunk_free free
@@ -42,6 +45,38 @@
 
 #include "version.h"
 
+#ifndef R_OK
+#define R_OK 4
+#define W_OK 2
+#define X_OK 1
+#endif
+
+#ifndef DIR_SEPARATOR
+#  define DIR_SEPARATOR '/'
+#endif
+
+#if defined (_WIN32) || defined (__MSDOS__) \
+    || defined (__DJGPP__) || defined (__OS2__)
+#  define HAVE_DOS_BASED_FILE_SYSTEM
+#  define HAVE_HOST_EXECUTABLE_SUFFIX
+#  define HOST_EXECUTABLE_SUFFIX ".exe"
+#  ifndef DIR_SEPARATOR_2 
+#    define DIR_SEPARATOR_2 '\\'
+#  endif
+#  define PATH_SEPARATOR ';'
+#else
+#  define PATH_SEPARATOR ':'
+#endif
+
+#ifndef DIR_SEPARATOR_2
+#  define IS_DIR_SEPARATOR(ch) ((ch) == DIR_SEPARATOR)
+#else
+#  define IS_DIR_SEPARATOR(ch) \
+	(((ch) == DIR_SEPARATOR) || ((ch) == DIR_SEPARATOR_2))
+#endif
+
+#define DIR_UP ".."
+
 static const char *outname = NULL;
 
 static void __attribute__ ((format (printf, 1, 2)))
@@ -816,7 +851,7 @@ traverse (void **slot, void *data)
 }
 
 static void
-process (FILE *in, FILE *out)
+process (FILE *in, FILE *out, int verify, const char *outname)
 {
   symbol_table = htab_create (500, hash_string_hash, hash_string_eq,
                               NULL);
@@ -824,6 +859,18 @@ process (FILE *in, FILE *out)
   const char *input = read_file (in);
   Token *tok = tokenize (input);
 
+  /* By default, when ptxas is not in PATH, do minimalistic verification,
+     just require that the first non-comment directive is .version.  */
+  if (verify < 0)
+    {
+      size_t i;
+      for (i = 0; tok[i].kind == K_comment; i++)
+	;
+      if (tok[i].kind != K_dotted || !is_keyword (&tok[i], "version"))
+	fatal_error ("missing .version directive at start of file '%s'",
+		     outname);
+    }
+
   do
     tok = parse_file (tok);
   while (tok->kind);
@@ -897,9 +944,83 @@ fork_execute (const char *prog, char *const *argv)
   do_wait (prog, pex);
 }
 
+/* Determine if progname is available in PATH.  */
+static bool
+program_available (const char *progname)
+{
+  char *temp = getenv ("PATH");
+  if (temp)
+    {
+      char *startp, *endp, *nstore, *alloc_ptr = NULL;
+      size_t prefixlen = strlen (temp) + 1;
+      size_t len;
+      if (prefixlen < 2)
+	prefixlen = 2;
+
+      len = prefixlen + strlen (progname) + 1;
+#ifdef HAVE_HOST_EXECUTABLE_SUFFIX
+      len += strlen (HOST_EXECUTABLE_SUFFIX);
+#endif
+      if (len < MAX_ALLOCA_SIZE)
+	nstore = (char *) alloca (len);
+      else
+	alloc_ptr = nstore = (char *) malloc (len);
+
+      startp = endp = temp;
+      while (1)
+	{
+	  if (*endp == PATH_SEPARATOR || *endp == 0)
+	    {
+	      if (endp == startp)
+		{
+		  nstore[0] = '.';
+		  nstore[1] = DIR_SEPARATOR;
+		  nstore[2] = '\0';
+		}
+	      else
+		{
+		  memcpy (nstore, startp, endp - startp);
+		  if (! IS_DIR_SEPARATOR (endp[-1]))
+		    {
+		      nstore[endp - startp] = DIR_SEPARATOR;
+		      nstore[endp - startp + 1] = 0;
+		    }
+		  else
+		    nstore[endp - startp] = 0;
+		}
+	      strcat (nstore, progname);
+	      if (! access (nstore, X_OK)
+#ifdef HAVE_HOST_EXECUTABLE_SUFFIX
+		  || ! access (strcat (nstore, HOST_EXECUTABLE_SUFFIX), X_OK)
+#endif
+		 )
+		{
+#if defined (HAVE_SYS_STAT_H) && defined (S_ISREG)
+		  struct stat st;
+		  if (stat (nstore, &st) >= 0 && S_ISREG (st.st_mode))
+#endif
+		    {
+		      free (alloc_ptr);
+		      return true;
+		    }
+		}
+
+	      if (*endp == 0)
+		break;
+	      endp = startp = endp + 1;
+	    }
+	  else
+	    endp++;
+	}
+      free (alloc_ptr);
+    }
+  return false;
+}
+
 static struct option long_options[] = {
   {"traditional-format",     no_argument, 0,  0 },
   {"save-temps",  no_argument,       0,  0 },
+  {"verify",  no_argument,       0,  0 },
   {"no-verify",  no_argument,       0,  0 },
   {"help", no_argument, 0, 'h' },
   {"version", no_argument, 0, 'V' },
@@ -912,7 +1033,7 @@ main (int argc, char **argv)
   FILE *in = stdin;
   FILE *out = stdout;
   bool verbose __attribute__((unused)) = false;
-  bool verify = true;
+  int verify = -1;
   const char *smver = "sm_30";
 
   int o;
@@ -923,7 +1044,9 @@ main (int argc, char **argv)
 	{
 	case 0:
 	  if (option_index == 2)
-	    verify = false;
+	    verify = 1;
+	  else if (option_index == 3)
+	    verify = 0;
 	  break;
 	case 'v':
 	  verbose = true;
@@ -948,7 +1071,8 @@ Usage: nvptx-none-as [option...] [asmfile]\n\
 Options:\n\
   -o FILE               Write output to FILE\n\
   -v                    Be verbose\n\
+  --verify              Do verify output is acceptable to ptxas\n\
   --no-verify           Do not verify output is acceptable to ptxas\n\
   --help                Print this help and exit\n\
   --version             Print version number and exit\n\
 \n\
@@ -983,11 +1107,17 @@ This program has absolutely no warranty.\n",
   if (!in)
     fatal_error ("cannot open input ptx file");
 
-  process (in, out);
-  if  (outname)
+  if (outname == NULL)
+    verify = 0;
+  else if (verify == -1)
+    if (program_available ("ptxas"))
+      verify = 1;
+
+  process (in, out, verify, outname);
+  if (outname)
     fclose (out);
 
-  if (verify && outname)
+  if (verify > 0)
     {
       struct obstack argv_obstack;
       obstack_init (&argv_obstack);
diff --git a/configure b/configure
index 9a0794a..4289569 100755
--- a/configure
+++ b/configure
@@ -168,7 +168,8 @@ test x\$exitcode = x0 || exit 1"
   as_suggested="  as_lineno_1=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_1a=\$LINENO
   as_lineno_2=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_2a=\$LINENO
   eval 'test \"x\$as_lineno_1'\$as_run'\" != \"x\$as_lineno_2'\$as_run'\" &&
-  test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1"
+  test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1
+test \$(( 1 + 1 )) = 2 || exit 1"
   if (eval "$as_required") 2>/dev/null; then :
   as_have_required=yes
 else
@@ -552,11 +553,50 @@ PACKAGE_URL=
 
 ac_unique_file="nvptx-tools"
 ac_unique_file="nvptx-as.c"
+# Factoring default headers for most tests.
+ac_includes_default="\
+#include <stdio.h>
+#ifdef HAVE_SYS_TYPES_H
+# include <sys/types.h>
+#endif
+#ifdef HAVE_SYS_STAT_H
+# include <sys/stat.h>
+#endif
+#ifdef STDC_HEADERS
+# include <stdlib.h>
+# include <stddef.h>
+#else
+# ifdef HAVE_STDLIB_H
+#  include <stdlib.h>
+# endif
+#endif
+#ifdef HAVE_STRING_H
+# if !defined STDC_HEADERS && defined HAVE_MEMORY_H
+#  include <memory.h>
+# endif
+# include <string.h>
+#endif
+#ifdef HAVE_STRINGS_H
+# include <strings.h>
+#endif
+#ifdef HAVE_INTTYPES_H
+# include <inttypes.h>
+#endif
+#ifdef HAVE_STDINT_H
+# include <stdint.h>
+#endif
+#ifdef HAVE_UNISTD_H
+# include <unistd.h>
+#endif"
+
 enable_option_checking=no
 ac_subst_vars='LTLIBOBJS
 LIBOBJS
 subdirs
 NVPTX_RUN
+EGREP
+GREP
+CPP
 CUDA_DRIVER_LDFLAGS
 CUDA_DRIVER_CPPFLAGS
 AR
@@ -635,7 +675,8 @@ LIBS
 CPPFLAGS
 CXX
 CXXFLAGS
-CCC'
+CCC
+CPP'
 ac_subdirs_all='libiberty'
 
 # Initialize some variables set by options.
@@ -1267,6 +1308,7 @@ Some influential environment variables:
               you have headers in a nonstandard directory <include dir>
   CXX         C++ compiler command
   CXXFLAGS    C++ compiler flags
+  CPP         C preprocessor
 
 Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
@@ -1575,6 +1617,203 @@ $as_echo "$ac_res" >&6; }
   eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
 
 } # ac_fn_c_check_decl
+
+# ac_fn_c_try_cpp LINENO
+# ----------------------
+# Try to preprocess conftest.$ac_ext, and return whether this succeeded.
+ac_fn_c_try_cpp ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { { ac_try="$ac_cpp conftest.$ac_ext"
+case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_cpp conftest.$ac_ext") 2>conftest.err
+  ac_status=$?
+  if test -s conftest.err; then
+    grep -v '^ *+' conftest.err >conftest.er1
+    cat conftest.er1 >&5
+    mv -f conftest.er1 conftest.err
+  fi
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; } >/dev/null && {
+	 test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" ||
+	 test ! -s conftest.err
+       }; then :
+  ac_retval=0
+else
+  $as_echo "$as_me: failed program was:" >&5
+sed 's/^/| /' conftest.$ac_ext >&5
+
+    ac_retval=1
+fi
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+  return $ac_retval
+
+} # ac_fn_c_try_cpp
+
+# ac_fn_c_check_header_mongrel LINENO HEADER VAR INCLUDES
+# -------------------------------------------------------
+# Tests whether HEADER exists, giving a warning if it cannot be compiled using
+# the include files in INCLUDES and setting the cache variable VAR
+# accordingly.
+ac_fn_c_check_header_mongrel ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+fi
+eval ac_res=\$$3
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+else
+  # Is the header compilable?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 usability" >&5
+$as_echo_n "checking $2 usability... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$4
+#include <$2>
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_header_compiler=yes
+else
+  ac_header_compiler=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_compiler" >&5
+$as_echo "$ac_header_compiler" >&6; }
+
+# Is the header present?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 presence" >&5
+$as_echo_n "checking $2 presence... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <$2>
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+  ac_header_preproc=yes
+else
+  ac_header_preproc=no
+fi
+rm -f conftest.err conftest.$ac_ext
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_preproc" >&5
+$as_echo "$ac_header_preproc" >&6; }
+
+# So?  What about this header?
+case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in #((
+  yes:no: )
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&5
+$as_echo "$as_me: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5
+$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;}
+    ;;
+  no:yes:* )
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: present but cannot be compiled" >&5
+$as_echo "$as_me: WARNING: $2: present but cannot be compiled" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2:     check for missing prerequisite headers?" >&5
+$as_echo "$as_me: WARNING: $2:     check for missing prerequisite headers?" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: see the Autoconf documentation" >&5
+$as_echo "$as_me: WARNING: $2: see the Autoconf documentation" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2:     section \"Present But Cannot Be Compiled\"" >&5
+$as_echo "$as_me: WARNING: $2:     section \"Present But Cannot Be Compiled\"" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5
+$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;}
+    ;;
+esac
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+else
+  eval "$3=\$ac_header_compiler"
+fi
+eval ac_res=\$$3
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+fi
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+
+} # ac_fn_c_check_header_mongrel
+
+# ac_fn_c_try_run LINENO
+# ----------------------
+# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes
+# that executables *can* be run.
+ac_fn_c_try_run ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { { ac_try="$ac_link"
+case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_link") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; } && { ac_try='./conftest$ac_exeext'
+  { { case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_try") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then :
+  ac_retval=0
+else
+  $as_echo "$as_me: program exited with status $ac_status" >&5
+       $as_echo "$as_me: failed program was:" >&5
+sed 's/^/| /' conftest.$ac_ext >&5
+
+       ac_retval=$ac_status
+fi
+  rm -rf conftest.dSYM conftest_ipa8_conftest.oo
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+  return $ac_retval
+
+} # ac_fn_c_try_run
+
+# ac_fn_c_check_header_compile LINENO HEADER VAR INCLUDES
+# -------------------------------------------------------
+# Tests whether HEADER exists and can be compiled using the include files in
+# INCLUDES, setting the cache variable VAR accordingly.
+ac_fn_c_check_header_compile ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$4
+#include <$2>
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  eval "$3=yes"
+else
+  eval "$3=no"
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+eval ac_res=\$$3
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+
+} # ac_fn_c_check_header_compile
 cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
@@ -3284,6 +3523,418 @@ cat >>confdefs.h <<_ACEOF
 #define HAVE_DECL_CUGETERRORSTRING $ac_have_decl
 _ACEOF
 
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to run the C preprocessor" >&5
+$as_echo_n "checking how to run the C preprocessor... " >&6; }
+# On Suns, sometimes $CPP names a directory.
+if test -n "$CPP" && test -d "$CPP"; then
+  CPP=
+fi
+if test -z "$CPP"; then
+  if test "${ac_cv_prog_CPP+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+      # Double quotes because CPP needs to be expanded
+    for CPP in "$CC -E" "$CC -E -traditional-cpp" "/lib/cpp"
+    do
+      ac_preproc_ok=false
+for ac_c_preproc_warn_flag in '' yes
+do
+  # Use a header file that comes with gcc, so configuring glibc
+  # with a fresh cross-compiler works.
+  # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since
+  # <limits.h> exists even on freestanding compilers.
+  # On the NeXT, cc -E runs the code through the compiler's parser,
+  # not just through cpp. "Syntax error" is here to catch this case.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifdef __STDC__
+# include <limits.h>
+#else
+# include <assert.h>
+#endif
+		     Syntax error
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+
+else
+  # Broken: fails on valid input.
+continue
+fi
+rm -f conftest.err conftest.$ac_ext
+
+  # OK, works on sane cases.  Now check whether nonexistent headers
+  # can be detected and how.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <ac_nonexistent.h>
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+  # Broken: success on invalid input.
+continue
+else
+  # Passes both tests.
+ac_preproc_ok=:
+break
+fi
+rm -f conftest.err conftest.$ac_ext
+
+done
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+rm -f conftest.err conftest.$ac_ext
+if $ac_preproc_ok; then :
+  break
+fi
+
+    done
+    ac_cv_prog_CPP=$CPP
+
+fi
+  CPP=$ac_cv_prog_CPP
+else
+  ac_cv_prog_CPP=$CPP
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $CPP" >&5
+$as_echo "$CPP" >&6; }
+ac_preproc_ok=false
+for ac_c_preproc_warn_flag in '' yes
+do
+  # Use a header file that comes with gcc, so configuring glibc
+  # with a fresh cross-compiler works.
+  # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since
+  # <limits.h> exists even on freestanding compilers.
+  # On the NeXT, cc -E runs the code through the compiler's parser,
+  # not just through cpp. "Syntax error" is here to catch this case.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifdef __STDC__
+# include <limits.h>
+#else
+# include <assert.h>
+#endif
+		     Syntax error
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+
+else
+  # Broken: fails on valid input.
+continue
+fi
+rm -f conftest.err conftest.$ac_ext
+
+  # OK, works on sane cases.  Now check whether nonexistent headers
+  # can be detected and how.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <ac_nonexistent.h>
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+  # Broken: success on invalid input.
+continue
+else
+  # Passes both tests.
+ac_preproc_ok=:
+break
+fi
+rm -f conftest.err conftest.$ac_ext
+
+done
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+rm -f conftest.err conftest.$ac_ext
+if $ac_preproc_ok; then :
+
+else
+  { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error "C preprocessor \"$CPP\" fails sanity check
+See \`config.log' for more details." "$LINENO" 5; }
+fi
+
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5
+$as_echo_n "checking for grep that handles long lines and -e... " >&6; }
+if test "${ac_cv_path_GREP+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if test -z "$GREP"; then
+  ac_path_GREP_found=false
+  # Loop through the user's path and test for each of PROGNAME-LIST
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_prog in grep ggrep; do
+    for ac_exec_ext in '' $ac_executable_extensions; do
+      ac_path_GREP="$as_dir/$ac_prog$ac_exec_ext"
+      { test -f "$ac_path_GREP" && $as_test_x "$ac_path_GREP"; } || continue
+# Check for GNU ac_path_GREP and select it if it is found.
+  # Check for GNU $ac_path_GREP
+case `"$ac_path_GREP" --version 2>&1` in
+*GNU*)
+  ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;;
+*)
+  ac_count=0
+  $as_echo_n 0123456789 >"conftest.in"
+  while :
+  do
+    cat "conftest.in" "conftest.in" >"conftest.tmp"
+    mv "conftest.tmp" "conftest.in"
+    cp "conftest.in" "conftest.nl"
+    $as_echo 'GREP' >> "conftest.nl"
+    "$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break
+    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
+    as_fn_arith $ac_count + 1 && ac_count=$as_val
+    if test $ac_count -gt ${ac_path_GREP_max-0}; then
+      # Best one so far, save it but keep looking for a better one
+      ac_cv_path_GREP="$ac_path_GREP"
+      ac_path_GREP_max=$ac_count
+    fi
+    # 10*(2^10) chars as input seems more than enough
+    test $ac_count -gt 10 && break
+  done
+  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
+esac
+
+      $ac_path_GREP_found && break 3
+    done
+  done
+  done
+IFS=$as_save_IFS
+  if test -z "$ac_cv_path_GREP"; then
+    as_fn_error "no acceptable grep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5
+  fi
+else
+  ac_cv_path_GREP=$GREP
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5
+$as_echo "$ac_cv_path_GREP" >&6; }
+ GREP="$ac_cv_path_GREP"
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for egrep" >&5
+$as_echo_n "checking for egrep... " >&6; }
+if test "${ac_cv_path_EGREP+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if echo a | $GREP -E '(a|b)' >/dev/null 2>&1
+   then ac_cv_path_EGREP="$GREP -E"
+   else
+     if test -z "$EGREP"; then
+  ac_path_EGREP_found=false
+  # Loop through the user's path and test for each of PROGNAME-LIST
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_prog in egrep; do
+    for ac_exec_ext in '' $ac_executable_extensions; do
+      ac_path_EGREP="$as_dir/$ac_prog$ac_exec_ext"
+      { test -f "$ac_path_EGREP" && $as_test_x "$ac_path_EGREP"; } || continue
+# Check for GNU ac_path_EGREP and select it if it is found.
+  # Check for GNU $ac_path_EGREP
+case `"$ac_path_EGREP" --version 2>&1` in
+*GNU*)
+  ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;;
+*)
+  ac_count=0
+  $as_echo_n 0123456789 >"conftest.in"
+  while :
+  do
+    cat "conftest.in" "conftest.in" >"conftest.tmp"
+    mv "conftest.tmp" "conftest.in"
+    cp "conftest.in" "conftest.nl"
+    $as_echo 'EGREP' >> "conftest.nl"
+    "$ac_path_EGREP" 'EGREP$' < "conftest.nl" >"conftest.out" 2>/dev/null || break
+    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
+    as_fn_arith $ac_count + 1 && ac_count=$as_val
+    if test $ac_count -gt ${ac_path_EGREP_max-0}; then
+      # Best one so far, save it but keep looking for a better one
+      ac_cv_path_EGREP="$ac_path_EGREP"
+      ac_path_EGREP_max=$ac_count
+    fi
+    # 10*(2^10) chars as input seems more than enough
+    test $ac_count -gt 10 && break
+  done
+  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
+esac
+
+      $ac_path_EGREP_found && break 3
+    done
+  done
+  done
+IFS=$as_save_IFS
+  if test -z "$ac_cv_path_EGREP"; then
+    as_fn_error "no acceptable egrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5
+  fi
+else
+  ac_cv_path_EGREP=$EGREP
+fi
+
+   fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5
+$as_echo "$ac_cv_path_EGREP" >&6; }
+ EGREP="$ac_cv_path_EGREP"
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ANSI C header files" >&5
+$as_echo_n "checking for ANSI C header files... " >&6; }
+if test "${ac_cv_header_stdc+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <float.h>
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_cv_header_stdc=yes
+else
+  ac_cv_header_stdc=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
+if test $ac_cv_header_stdc = yes; then
+  # SunOS 4.x string.h does not declare mem*, contrary to ANSI.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <string.h>
+
+_ACEOF
+if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
+  $EGREP "memchr" >/dev/null 2>&1; then :
+
+else
+  ac_cv_header_stdc=no
+fi
+rm -f conftest*
+
+fi
+
+if test $ac_cv_header_stdc = yes; then
+  # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdlib.h>
+
+_ACEOF
+if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
+  $EGREP "free" >/dev/null 2>&1; then :
+
+else
+  ac_cv_header_stdc=no
+fi
+rm -f conftest*
+
+fi
+
+if test $ac_cv_header_stdc = yes; then
+  # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi.
+  if test "$cross_compiling" = yes; then :
+  :
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <ctype.h>
+#include <stdlib.h>
+#if ((' ' & 0x0FF) == 0x020)
+# define ISLOWER(c) ('a' <= (c) && (c) <= 'z')
+# define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c))
+#else
+# define ISLOWER(c) \
+		   (('a' <= (c) && (c) <= 'i') \
+		     || ('j' <= (c) && (c) <= 'r') \
+		     || ('s' <= (c) && (c) <= 'z'))
+# define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c))
+#endif
+
+#define XOR(e, f) (((e) && !(f)) || (!(e) && (f)))
+int
+main ()
+{
+  int i;
+  for (i = 0; i < 256; i++)
+    if (XOR (islower (i), ISLOWER (i))
+	|| toupper (i) != TOUPPER (i))
+      return 2;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_run "$LINENO"; then :
+
+else
+  ac_cv_header_stdc=no
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
+fi
+
+fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_stdc" >&5
+$as_echo "$ac_cv_header_stdc" >&6; }
+if test $ac_cv_header_stdc = yes; then
+
+$as_echo "#define STDC_HEADERS 1" >>confdefs.h
+
+fi
+
+# On IRIX 5.3, sys/types and inttypes.h are conflicting.
+for ac_header in sys/types.h sys/stat.h stdlib.h string.h memory.h strings.h \
+		  inttypes.h stdint.h unistd.h
+do :
+  as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
+ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default
+"
+eval as_val=\$$as_ac_Header
+   if test "x$as_val" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1
+_ACEOF
+
+fi
+
+done
+
+
+for ac_header in unistd.h sys/stat.h
+do :
+  as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
+ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
+eval as_val=\$$as_ac_Header
+   if test "x$as_val" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1
+_ACEOF
+
+fi
+
+done
+
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for extra programs to build requiring -lcuda" >&5
 $as_echo_n "checking for extra programs to build requiring -lcuda... " >&6; }
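
As a reading aid for the nvptx-as.c hunks, the new tri-state handling of
verify can be summarized by the sketch below.  resolve_verify and the
VERIFY_* names are made up for illustration; the patch keeps this logic
inline in main () with plain -1/0/1, and the ptxas_in_path argument stands
in for the result of program_available ("ptxas").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three states of the patch's int verify.  */
enum { VERIFY_AUTO = -1, VERIFY_OFF = 0, VERIFY_ON = 1 };

/* Resolve the flag the way the patched main () does:
   - no -o FILE: there is no output file for ptxas to read, so never run it;
   - default (neither --verify nor --no-verify): run ptxas only if it was
     found in PATH, otherwise stay at VERIFY_AUTO;
   - explicit --verify / --no-verify: keep the user's choice.  */
static int
resolve_verify (int verify, const char *outname, bool ptxas_in_path)
{
  if (outname == NULL)
    return VERIFY_OFF;
  if (verify == VERIFY_AUTO && ptxas_in_path)
    return VERIFY_ON;
  return verify;
}
```

With -o out.o and no explicit option, resolve_verify (VERIFY_AUTO, "out.o",
false) stays at -1, which is the case where process () only checks that the
first non-comment token is .version.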


* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
@ 2017-01-13 18:19 ` Joseph Myers
  2017-01-13 18:28   ` Jakub Jelinek
  2017-01-18 20:27 ` Alexander Monakov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Joseph Myers @ 2017-01-13 18:19 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Alexander Monakov, Thomas Schwinge, Cesar Philippidis,
	Chung-Lin Tang, Jeff Law, gcc-patches

On Fri, 13 Jan 2017, Jakub Jelinek wrote:

> --- libgomp/plugin/cuda/cuda.h.jj	2017-01-13 15:58:00.966544147 +0100
> +++ libgomp/plugin/cuda/cuda.h	2017-01-13 17:02:47.355817896 +0100
> @@ -0,0 +1,174 @@
> +/* CUDA API description.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.

The new file should presumably have the runtime license exception.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-13 18:19 ` Joseph Myers
@ 2017-01-13 18:28   ` Jakub Jelinek
  2017-01-16 19:23     ` Jeff Law
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2017-01-13 18:28 UTC (permalink / raw)
  To: Joseph Myers, Martin Jambor
  Cc: Alexander Monakov, Thomas Schwinge, Cesar Philippidis,
	Chung-Lin Tang, Jeff Law, gcc-patches

On Fri, Jan 13, 2017 at 06:19:02PM +0000, Joseph Myers wrote:
> > --- libgomp/plugin/cuda/cuda.h.jj	2017-01-13 15:58:00.966544147 +0100
> > +++ libgomp/plugin/cuda/cuda.h	2017-01-13 17:02:47.355817896 +0100
> > @@ -0,0 +1,174 @@
> > +/* CUDA API description.
> > +   Copyright (C) 2017 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify
> > +it under the terms of the GNU General Public License as published by
> > +the Free Software Foundation; either version 3, or (at your option)
> > +any later version.
> > +
> > +GCC is distributed in the hope that it will be useful,
> > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +GNU General Public License for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.
> 
> The new file should presumably have the runtime license exception.

Agreed (though, most likely the file isn't copyrightable anyway, but
we use copyright boilerplate for various files that might not be
copyrightable).  But we should use it not just for cuda.h, but also
for hsa.h and hsa_ext_finalize.h (CCing Martin who has added those).

2017-01-13  Jakub Jelinek  <jakub@redhat.com>

	* plugin/cuda/cuda.h: Add GCC runtime library exception.
	* plugin/hsa.h: Likewise.
	* plugin/hsa_ext_finalize.h: Likewise.

--- libgomp/plugin/cuda/cuda.h.jj	2017-01-13 17:02:47.000000000 +0100
+++ libgomp/plugin/cuda/cuda.h	2017-01-13 19:21:06.307547518 +0100
@@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.
 
 This header provides the minimum amount of typedefs, enums and function
--- libgomp/plugin/hsa.h.jj	2017-01-13 12:07:56.000000000 +0100
+++ libgomp/plugin/hsa.h	2017-01-13 19:21:37.230153569 +0100
@@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.
 
 The contents of the file was created by extracting data structures, enum,
--- libgomp/plugin/hsa_ext_finalize.h.jj	2017-01-13 12:07:56.000000000 +0100
+++ libgomp/plugin/hsa_ext_finalize.h	2017-01-13 19:22:05.388794833 +0100
@@ -13,8 +13,13 @@ but WITHOUT ANY WARRANTY; without even t
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.
 
 The contents of the file was created by extracting data structures, enum,


	Jakub

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-13 18:28   ` Jakub Jelinek
@ 2017-01-16 19:23     ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2017-01-16 19:23 UTC (permalink / raw)
  To: Jakub Jelinek, Joseph Myers, Martin Jambor
  Cc: Alexander Monakov, Thomas Schwinge, Cesar Philippidis,
	Chung-Lin Tang, gcc-patches

On 01/13/2017 11:28 AM, Jakub Jelinek wrote:
> On Fri, Jan 13, 2017 at 06:19:02PM +0000, Joseph Myers wrote:
>>> --- libgomp/plugin/cuda/cuda.h.jj	2017-01-13 15:58:00.966544147 +0100
>>> +++ libgomp/plugin/cuda/cuda.h	2017-01-13 17:02:47.355817896 +0100
>>> @@ -0,0 +1,174 @@
>>> +/* CUDA API description.
>>> +   Copyright (C) 2017 Free Software Foundation, Inc.
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify
>>> +it under the terms of the GNU General Public License as published by
>>> +the Free Software Foundation; either version 3, or (at your option)
>>> +any later version.
>>> +
>>> +GCC is distributed in the hope that it will be useful,
>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +GNU General Public License for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>.
>>
>> The new file should presumably have the runtime license exception.
>
> Agreed (though, most likely the file isn't copyrightable anyway, but
> we use copyright boilerplate for various files that might not be
> copyrightable).  But we should use it not just for cuda.h, but also
> for hsa.h and hsa_ext_finalize.h (CCing Martin who has added those).
>
> 2017-01-13  Jakub Jelinek  <jakub@redhat.com>
>
> 	* plugin/cuda/cuda.h: Add GCC runtime library exception.
> 	* plugin/hsa.h: Likewise.
> 	* plugin/hsa_ext_finalize.h: Likewise.
Yea, seems like an oversight.  Certainly the intention is that using 
cuda & hsa in and of itself doesn't require the resulting executable to 
be GPL licensed.

jeff

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
  2017-01-13 18:19 ` Joseph Myers
@ 2017-01-18 20:27 ` Alexander Monakov
  2017-01-18 22:18   ` Jakub Jelinek
  2017-01-21 15:28 ` Thomas Schwinge
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2017-01-18 20:27 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Cesar Philippidis, Chung-Lin Tang, Jeff Law,
	gcc-patches

Hello Jakub,

Sorry for not noticing this earlier, but ...

> +#ifdef __LP64__
> +typedef unsigned long long CUdeviceptr;
> +#else
> +typedef unsigned CUdeviceptr;
> +#endif

I think this #ifdef doesn't do the right thing on MinGW.
Would it be fine to simplify it?  In my code I have

  typedef uintptr_t CUdeviceptr;

Thanks.
Alexander

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-18 20:27 ` Alexander Monakov
@ 2017-01-18 22:18   ` Jakub Jelinek
  2017-01-19 15:10     ` Alexander Monakov
  0 siblings, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2017-01-18 22:18 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Thomas Schwinge, Cesar Philippidis, Chung-Lin Tang, Jeff Law,
	gcc-patches

On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> Sorry for not noticing this earlier, but ...
> 
> > +#ifdef __LP64__
> > +typedef unsigned long long CUdeviceptr;
> > +#else
> > +typedef unsigned CUdeviceptr;
> > +#endif
> 
> I think this #ifdef doesn't do the right thing on MinGW.
> Would it be fine to simplify it?  In my code I have
> 
>   typedef uintptr_t CUdeviceptr;

I think it depends on whether we want to use CUdeviceptr-typed variables
in printf-like format strings or in C++ overloading.  If so, the exact
type is significant, and we should probably go for

-#ifdef __LP64__
+#if defined(__LP64__) || defined(_WIN64)

(that is the right define for 64-bit MinGW, right?).

Otherwise, I think using uintptr_t is a problem, because we'd need to
#include <stdint.h> (the header only includes <stdlib.h>).

Another option is
typedef __UINTPTR_TYPE__ CUdeviceptr;

	Jakub

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-18 22:18   ` Jakub Jelinek
@ 2017-01-19 15:10     ` Alexander Monakov
  2017-01-19 15:55       ` Jakub Jelinek
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2017-01-19 15:10 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Cesar Philippidis, Chung-Lin Tang, Jeff Law,
	gcc-patches

On Wed, 18 Jan 2017, Jakub Jelinek wrote:
> On Wed, Jan 18, 2017 at 10:52:32PM +0300, Alexander Monakov wrote:
> > Sorry for not noticing this earlier, but ...
> > 
> > > +#ifdef __LP64__
> > > +typedef unsigned long long CUdeviceptr;
> > > +#else
> > > +typedef unsigned CUdeviceptr;
> > > +#endif
> > 
> > I think this #ifdef doesn't do the right thing on MinGW.
> > Would it be fine to simplify it?  In my code I have
> > 
> >   typedef uintptr_t CUdeviceptr;
> 
> I think it depends on if we want to use CUdeviceptr typed variables
> in printf like format strings, or C++ overloading (then the exact
> type is significant and we should go for probably
> 
> -#ifdef __LP64__
> +#if defined(__LP64__) || defined(_WIN64)
> 
> (that is the right define for 64-bit MinGW, right?).

Yes, _WIN64; libsanitizer has a similar test.  Alternatively, I guess,

  #if __SIZEOF_POINTER__ == 8

> Otherwise, I think using uintptr_t is a problem, because we'd need to
> #include <stdint.h> (the header only includes <stdlib.h>).

Note that plugin-nvptx.c already includes <stdint.h>.  But, anyway, I agree that
there's value in defining the exact type via the #if.
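
To illustrate why the exact type matters, here is a minimal sketch, not the code from the patch. The names example_devptr and example_devptr_is_ull are made up for this illustration. It uses the same #if as the committed header and then asks the compiler, via C11 _Generic, whether the typedef really is unsigned long long. On 64-bit targets that keeps "%llx"-style format strings correct, which uintptr_t would not guarantee (on LP64 GNU/Linux it is usually unsigned long).

```c
#include <stddef.h>

/* Hypothetical stand-in for CUdeviceptr, selected the same way as in
   the thread: unsigned long long on LP64 and on 64-bit Windows,
   plain unsigned otherwise.  */
#if defined(__LP64__) || defined(_WIN64)
typedef unsigned long long example_devptr;
#else
typedef unsigned example_devptr;
#endif

/* Return 1 if example_devptr is exactly unsigned long long, else 0.
   _Generic selects on the type itself, not merely on its size.  */
static int
example_devptr_is_ull (void)
{
  example_devptr p = 0;
  return _Generic (p, unsigned long long: 1, default: 0);
}

/* Return 1 if the typedef is at least as wide as a host pointer.  */
static int
example_devptr_fits_pointer (void)
{
  return sizeof (example_devptr) >= sizeof (void *);
}
```

On a 64-bit host example_devptr_is_ull () is expected to return 1, on a 32-bit host 0; example_devptr_fits_pointer () should return 1 on both.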

Alexander

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-19 15:10     ` Alexander Monakov
@ 2017-01-19 15:55       ` Jakub Jelinek
  0 siblings, 0 replies; 24+ messages in thread
From: Jakub Jelinek @ 2017-01-19 15:55 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Thomas Schwinge, Cesar Philippidis, Chung-Lin Tang, Jeff Law,
	gcc-patches

On Thu, Jan 19, 2017 at 06:09:35PM +0300, Alexander Monakov wrote:
> > -#ifdef __LP64__
> > +#if defined(__LP64__) || defined(_WIN64)
> > 
> > (that is the right define for 64-bit MinGW, right?).
> 
> Yes, _WIN64; libsanitizer has a similar test.  Alternatively, I guess,
> 
>   #if __SIZEOF_POINTER__ == 8
> 
> > Otherwise, I think using uintptr_t is a problem, because we'd need to
> > #include <stdint.h> (the header only includes <stdlib.h>).
> 
> Note that plugin-nvptx.c already includes <stdint.h>.  But, anyway, I agree that
> there's value in defining the exact type via the #if.

I've committed this, then.

2017-01-19  Jakub Jelinek  <jakub@redhat.com>

	* plugin/cuda/cuda.h (CUdeviceptr): Typedef to unsigned long long even
	for _WIN64.

--- libgomp/plugin/cuda/cuda.h	(revision 244570)
+++ libgomp/plugin/cuda/cuda.h	(working copy)
@@ -35,7 +35,7 @@ libcuda.so.1 are not available.  */
 
 typedef void *CUcontext;
 typedef int CUdevice;
-#ifdef __LP64__
+#if defined(__LP64__) || defined(_WIN64)
 typedef unsigned long long CUdeviceptr;
 #else
 typedef unsigned CUdeviceptr;


	Jakub

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
  2017-01-13 18:19 ` Joseph Myers
  2017-01-18 20:27 ` Alexander Monakov
@ 2017-01-21 15:28 ` Thomas Schwinge
  2017-01-21 19:13   ` Jakub Jelinek
  2017-05-03  9:08   ` Jakub Jelinek
  2022-04-06 12:39 ` Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)) Thomas Schwinge
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Thomas Schwinge @ 2017-01-21 15:28 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: gcc-patches, Alexander Monakov, Cesar Philippidis,
	Chung-Lin Tang, Jeff Law

Hi!

On Fri, 13 Jan 2017 19:11:23 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> This is something that has been discussed already during the last Cauldron.
> Especially for distributions it is undesirable to need to have proprietary
> CUDA libraries and headers installed when building GCC.

ACK.

> These two patches allow building GCC without CUDA around in a way that later
> on can offload to PTX if libcuda.so.1 is installed

Thanks!

I'd like to have some additional changes done; see the attached patch,
and also some further comments below.

> In order to configure gcc to load libcuda.so.1 dynamically,
> one has to either configure it --without-cuda-driver, or without
> --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> options if cuda.h and -lcuda aren't found in the default locations.

Would be good to have that documented ;-) -- done.

> The nvptx-tools change

(I'll get to that later.)

> --- libgomp/plugin/configfrag.ac.jj	2017-01-13 12:07:56.000000000 +0100
> +++ libgomp/plugin/configfrag.ac	2017-01-13 17:33:26.608240936 +0100
> @@ -58,10 +58,12 @@ AC_ARG_WITH(cuda-driver-include,
>  AC_ARG_WITH(cuda-driver-lib,
>  	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
>  		[specify directory for the installed CUDA driver library])])
> -if test "x$with_cuda_driver" != x; then
> -  CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> -  CUDA_DRIVER_LIB=$with_cuda_driver/lib
> -fi
> +case "x$with_cuda_driver" in
> +  x | xno) ;;
> +  *) CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> +     CUDA_DRIVER_LIB=$with_cuda_driver/lib
> +     ;;
> +esac

I (obviously) agree with your intended (?) "--without-cuda-driver"
semantics, but I think a "--with-cuda-driver" option should actually mean
that the system's/installed CUDA driver package *must* be used (and
similar for other "--with-cuda-driver*" options); and I also added
"--with-cuda-driver=check" to allow overriding earlier such options (that
is, restore the default "check" behavior).

I say 'intended (?) "--without-cuda-driver" semantics', because with your
current patch/code, if I got that right, if one specifies
"--without-cuda-driver" but actually does have a CUDA driver system
installation available, then the nvptx libgomp plugin will still link
against that one, instead of "dlopen"ing it.  So I changed that
accordingly.
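
The option semantics described above can be modelled roughly as follows. This is only a sketch of the decision table, written in C for illustration; the real logic lives in configfrag.ac as shell code, and the enum and function names here are hypothetical. It also ignores the separate "--with-cuda-driver-include"/"--with-cuda-driver-lib" options.

```c
/* How the user asked for the CUDA driver package.  */
enum example_mode
{
  EXAMPLE_WITHOUT,	/* --without-cuda-driver */
  EXAMPLE_CHECK,	/* --with-cuda-driver=check (the default) */
  EXAMPLE_WITH		/* --with-cuda-driver[=PATH] */
};

/* What the nvptx plugin ends up doing.  */
enum example_result
{
  EXAMPLE_DLOPEN,	/* dlopen libcuda.so.1 at run time */
  EXAMPLE_LINK,		/* link against the installed library */
  EXAMPLE_ERROR		/* configure stops with an error */
};

/* Map the requested mode and the outcome of the link test
   (system_cuda_usable) to the resulting behavior.  */
static enum example_result
example_resolve (enum example_mode mode, int system_cuda_usable)
{
  switch (mode)
    {
    case EXAMPLE_WITHOUT:
      /* Never link, even if an installation is available.  */
      return EXAMPLE_DLOPEN;
    case EXAMPLE_CHECK:
      /* Link if possible, otherwise fall back to dlopen.  */
      return system_cuda_usable ? EXAMPLE_LINK : EXAMPLE_DLOPEN;
    case EXAMPLE_WITH:
      /* The installed package must be used; failing that is fatal.  */
      return system_cuda_usable ? EXAMPLE_LINK : EXAMPLE_ERROR;
    }
  return EXAMPLE_ERROR;
}
```

For example, example_resolve (EXAMPLE_WITHOUT, 1) yields EXAMPLE_DLOPEN, which is the case the original patch handled differently.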

> +PLUGIN_NVPTX_DYNAMIC=0

I find the name "PLUGIN_NVPTX_DYNAMIC" a bit misleading, as this isn't
about the nvptx plugin being "dynamic" but rather it's about its usage of
the CUDA driver library.  Thus renamed to "CUDA_DRIVER_DYNAMIC".

> @@ -167,9 +170,17 @@ if test x"$enable_offload_targets" != x;
>  	LIBS=$PLUGIN_NVPTX_save_LIBS
>  	case $PLUGIN_NVPTX in
>  	  nvptx*)
> -	    PLUGIN_NVPTX=0
> -	    AC_MSG_ERROR([CUDA driver package required for nvptx support])
> -	    ;;
> +	    if test "x$CUDA_DRIVER_INCLUDE" = x \
> +	       && test "x$CUDA_DRIVER_LIB" = x; then
> +	      PLUGIN_NVPTX=1
> +	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
> +	      PLUGIN_NVPTX_LIBS='-ldl'
> +	      PLUGIN_NVPTX_DYNAMIC=1
> +	    else
> +	      PLUGIN_NVPTX=0
> +	      AC_MSG_ERROR([CUDA driver package required for nvptx support])
> +	    fi
> +	  ;;
>  	esac

I reworked that logic to accommodate the additional
"--with-cuda-driver" usage.

> --- libgomp/plugin/plugin-nvptx.c.jj	2017-01-13 12:07:56.000000000 +0100
> +++ libgomp/plugin/plugin-nvptx.c	2017-01-13 18:00:39.693284346 +0100

> +/* -1 if init_cuda_lib has not been called yet, false
> +   if it has been and failed, true if it has been and succeeded.  */
> +static char cuda_lib_inited = -1;

Don't we actually have to worry here about multiple threads running into
this in parallel -- thus need locking (or atomic accesses?) when
accessing "cuda_lib_inited"?
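
If it turns out that no lock is held by the callers, one possible shape for an atomic variant is sketched below. This is an assumption-laden illustration, not a proposed patch: all example_* names are hypothetical, example_load stands in for the dlopen/dlsym work, and the extra "busy" state plus the spin-wait are additions that the current three-state flag does not have.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same three states as cuda_lib_inited, plus a "busy" state that marks
   an initialization in progress.  */
enum { EXAMPLE_UNINIT = -1, EXAMPLE_FAILED = 0, EXAMPLE_OK = 1,
       EXAMPLE_BUSY = 2 };

static atomic_int example_state = EXAMPLE_UNINIT;
static int example_load_calls;

/* Hypothetical stand-in for the real work (dlopen plus dlsym lookups).
   Only ever called by the single thread that wins the race below.  */
static bool
example_load (void)
{
  example_load_calls++;
  return true;
}

static bool
example_init (void)
{
  int expected = EXAMPLE_UNINIT;
  if (atomic_compare_exchange_strong (&example_state, &expected,
				      EXAMPLE_BUSY))
    {
      /* This thread won the race and performs the load exactly once.  */
      int result = example_load () ? EXAMPLE_OK : EXAMPLE_FAILED;
      atomic_store (&example_state, result);
      return result == EXAMPLE_OK;
    }

  /* Another thread is, or was, initializing: wait for its verdict.  */
  int seen;
  while ((seen = atomic_load (&example_state)) == EXAMPLE_BUSY)
    ;
  return seen == EXAMPLE_OK;
}
```

Repeated calls to example_init () return the cached result without calling example_load again.  A mutex or pthread_once would avoid the busy-wait at the cost of a dependency on the threading library.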

> +/* Dynamically load the CUDA runtime library and initialize function

Not "CUDA runtime" but actually "CUDA driver" -- changed.

> +   pointers, return false if unsuccessful, true if successful.  */
> +static bool
> +init_cuda_lib (void)
> +{
> +  if (cuda_lib_inited != -1)
> +    return cuda_lib_inited;
> +  const char *cuda_runtime_lib = "libcuda.so.1";
> +  void *h = dlopen (cuda_runtime_lib, RTLD_LAZY);
> +  cuda_lib_inited = false;
> +  if (h == NULL)
> +    return false;

I'd like some GOMP_PLUGIN_debug output for this and the following "return
false" cases -- added.

> +# undef CUDA_ONE_CALL
> +# define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
> +# define CUDA_ONE_CALL_1(call) \
> +  cuda_lib.call = dlsym (h, #call);	\
> +  if (cuda_lib.call == NULL)		\
> +    return false;
> +  CUDA_CALLS
> +  cuda_lib_inited = true;
> +  return true;
>  }
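
For readers unfamiliar with the CUDA_CALLS idiom quoted above, here is a reduced sketch of the same X-macro technique. It is not the plugin code: the example_* names are invented, every entry has the same function type, and a local lookup function stands in for dlsym so that the sketch has no dependency on libdl or libcuda.so.1.

```c
#include <stddef.h>
#include <string.h>

typedef int (*example_fn) (int);

static int example_inc (int x) { return x + 1; }
static int example_dbl (int x) { return x * 2; }

/* Hypothetical stand-in for dlsym: resolve a name to a function
   pointer, or return NULL if the name is unknown.  */
static example_fn
example_lookup (const char *name)
{
  if (strcmp (name, "example_inc") == 0)
    return example_inc;
  if (strcmp (name, "example_dbl") == 0)
    return example_dbl;
  return NULL;
}

/* The single list of entry points; everything else is generated
   from it.  */
#define EXAMPLE_CALLS \
  EXAMPLE_ONE_CALL (example_inc) \
  EXAMPLE_ONE_CALL (example_dbl)

/* First expansion: one function-pointer member per listed call.  */
#define EXAMPLE_ONE_CALL(call) example_fn call;
static struct example_lib_s
{
  EXAMPLE_CALLS
} example_lib;
#undef EXAMPLE_ONE_CALL

/* Second expansion: resolve each member by name and give up on the
   first miss.  Returns 1 on success, 0 on failure.  */
static int
example_init_lib (void)
{
#define EXAMPLE_ONE_CALL(call) \
  example_lib.call = example_lookup (#call); \
  if (example_lib.call == NULL) \
    return 0;
  EXAMPLE_CALLS
#undef EXAMPLE_ONE_CALL
  return 1;
}
```

After a successful example_init_lib (), callers go through the table, e.g. example_lib.example_inc (41).  The plugin additionally routes the second expansion through CUDA_ONE_CALL_1, presumably so that names which cuda.h defines as macros (such as cuLinkCreate, defined as cuLinkCreate_v2) are expanded before being stringified.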

> --- libgomp/plugin/cuda/cuda.h.jj	2017-01-13 15:58:00.966544147 +0100
> +++ libgomp/plugin/cuda/cuda.h	2017-01-13 17:02:47.355817896 +0100

> +#define CUDA_VERSION 8000

Does that make it compatible with CUDA 8.0 (and later) only?  (Not yet
checked.)

(Have not reviewed this new file any further.)


Currently testing the following patch; OK for trunk?

commit 4ef19c27a9567df03f82282b8ae6608c5d88472d
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Sat Jan 21 15:25:44 2017 +0100

    libgomp: Additional "--with-cuda-driver" changes
    
            gcc/
            * doc/install.texi: Document "--with-cuda-driver" and related
            options.
            libgomp/
            * plugin/plugin-nvptx.c (init_cuda_lib): Add GOMP_PLUGIN_debug
            calls.
            * plugin/configfrag.ac: Document "--with-cuda-driver" and related
            options.  Handle "--with-cuda-driver", "--with-cuda-driver=check",
            and "--without-cuda-driver" options.
            (PLUGIN_NVPTX_DYNAMIC): Rename to...
            (CUDA_DRIVER_DYNAMIC): ... this.  Adjust all users.
            * config.h.in: Regenerate.
            * configure: Likewise.
---
 gcc/doc/install.texi          |  23 +++++++
 libgomp/config.h.in           |   8 +--
 libgomp/configure             | 146 ++++++++++++++++++++++++++++++------------
 libgomp/plugin/configfrag.ac  | 139 +++++++++++++++++++++++++++-------------
 libgomp/plugin/plugin-nvptx.c |  32 +++++----
 5 files changed, 248 insertions(+), 100 deletions(-)

diff --git gcc/doc/install.texi gcc/doc/install.texi
index cccf812..769bdc5 100644
--- gcc/doc/install.texi
+++ gcc/doc/install.texi
@@ -2061,6 +2061,29 @@ If @samp{hsa} is specified as one of the targets, the compiler will be
 built with support for HSA GPU accelerators.  Because the same
 compiler will emit the accelerator code, no path should be specified.
 
+@item --without-cuda-driver
+@itemx --with-cuda-driver=check
+@itemx --with-cuda-driver
+@itemx --with-cuda-driver=@var{pathname}
+@itemx --with-cuda-driver-include=@var{pathname}
+@itemx --with-cuda-driver-lib=@var{pathname}
+
+If you configure GCC for nvptx offloading, @code{libgomp}'s nvptx
+plugin requires to use the CUDA driver package.  The default is to
+link against the system's installation, if usable.  If that is not
+available, or if @option{--without-cuda-driver} has been specified,
+the plugin will instead @code{dlopen} the CUDA driver library at
+run-time.  With the exception of @option{--with-cuda-driver=check}
+which overrides any earlier options and restores the default behavior,
+all other usage of @option{--with-cuda-driver=@var{pathname}},
+@option{--with-cuda-driver-include=@var{pathname}}, or
+@option{--with-cuda-driver-lib=@var{pathname}} sets the include and
+library paths accordingly, and causes the build to stop if the CUDA
+driver package in these locations is not usable.  The
+@option{--with-cuda-driver=@var{pathname}} option is a shorthand for
+@option{--with-cuda-driver-lib=@var{pathname}/lib} and
+@option{--with-cuda-driver-include=@var{pathname}/include}.
+
 @item --with-hsa-runtime=@var{pathname}
 @itemx --with-hsa-runtime-include=@var{pathname}
 @itemx --with-hsa-runtime-lib=@var{pathname}
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index c4a9279..c18e118 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -41,6 +41,7 @@ AC_CHECK_FUNCS_ONCE(secure_getenv __secure_getenv getuid geteuid getgid getegid)
 
 
 # Look for the CUDA driver package.
+CUDA_DRIVER_DYNAMIC=invalid
 CUDA_DRIVER_INCLUDE=
 CUDA_DRIVER_LIB=
 AC_SUBST(CUDA_DRIVER_INCLUDE)
@@ -48,26 +49,44 @@ AC_SUBST(CUDA_DRIVER_LIB)
 CUDA_DRIVER_CPPFLAGS=
 CUDA_DRIVER_LDFLAGS=
 AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--without-cuda-driver],
+		[do not use the system's CUDA driver package])])
+AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--with-cuda-driver=check],
+		[use the system's CUDA driver package, if usable [default]])])
+AC_ARG_WITH(cuda-driver,
+	[AS_HELP_STRING([--with-cuda-driver],
+		[use the system's CUDA driver package])])
+AC_ARG_WITH(cuda-driver,
 	[AS_HELP_STRING([--with-cuda-driver=PATH],
-		[specify prefix directory for installed CUDA driver package.
-		 Equivalent to --with-cuda-driver-include=PATH/include
-		 plus --with-cuda-driver-lib=PATH/lib])])
+		[use installed CUDA driver package, and specify prefix
+		directory.  Equivalent to
+		--with-cuda-driver-include=PATH/include plus
+		--with-cuda-driver-lib=PATH/lib])],
+	[],
+	[with_cuda_driver=check])
 AC_ARG_WITH(cuda-driver-include,
 	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
-		[specify directory for installed CUDA driver include files])])
+		[use installed CUDA driver package, and specify directory for
+		include files])])
 AC_ARG_WITH(cuda-driver-lib,
 	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
-		[specify directory for the installed CUDA driver library])])
+		[use installed CUDA driver package, and specify directory for
+		libraries])])
 case "x$with_cuda_driver" in
-  x | xno) ;;
-  *) CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
-     CUDA_DRIVER_LIB=$with_cuda_driver/lib
-     ;;
+  xcheck | xno | xyes)
+    ;;
+  *)
+    CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
+    CUDA_DRIVER_LIB=$with_cuda_driver/lib
+    ;;
 esac
 if test "x$with_cuda_driver_include" != x; then
+  CUDA_DRIVER_DYNAMIC=0
   CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
 fi
 if test "x$with_cuda_driver_lib" != x; then
+  CUDA_DRIVER_DYNAMIC=0
   CUDA_DRIVER_LIB=$with_cuda_driver_lib
 fi
 if test "x$CUDA_DRIVER_INCLUDE" != x; then
@@ -76,12 +95,22 @@ fi
 if test "x$CUDA_DRIVER_LIB" != x; then
   CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
 fi
+case "x$with_cuda_driver" in
+  xcheck)
+    CUDA_DRIVER_DYNAMIC=check
+    ;;
+  xno)
+    CUDA_DRIVER_DYNAMIC=1
+    ;;
+  xyes | *)
+    CUDA_DRIVER_DYNAMIC=0
+    ;;
+esac
 
 PLUGIN_NVPTX=0
 PLUGIN_NVPTX_CPPFLAGS=
 PLUGIN_NVPTX_LDFLAGS=
 PLUGIN_NVPTX_LIBS=
-PLUGIN_NVPTX_DYNAMIC=0
 AC_SUBST(PLUGIN_NVPTX)
 AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
@@ -149,39 +178,63 @@ if test x"$enable_offload_targets" != x; then
 	;;
       nvptx*)
         tgt_name=nvptx
-	PLUGIN_NVPTX=$tgt
-	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	PLUGIN_NVPTX_LIBS='-lcuda'
 
-	PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	PLUGIN_NVPTX_save_LIBS=$LIBS
-	LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	AC_LINK_IFELSE(
-	  [AC_LANG_PROGRAM(
-	    [#include "cuda.h"],
-	      [CUresult r = cuCtxPushCurrent (NULL);])],
-	  [PLUGIN_NVPTX=1])
-	CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	LIBS=$PLUGIN_NVPTX_save_LIBS
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if test "x$CUDA_DRIVER_INCLUDE" = x \
-	       && test "x$CUDA_DRIVER_LIB" = x; then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      AC_MSG_ERROR([CUDA driver package required for nvptx support])
-	    fi
-	  ;;
+	case $CUDA_DRIVER_DYNAMIC in
+	  1)
+	    PLUGIN_NVPTX=1
+	    ;;
+	  check | 0)
+	    # Determine whether the system's CUDA driver package is usable.
+	    PLUGIN_NVPTX=0
+
+	    # Tentatively point to the system's CUDA driver package.
+	    PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+	    PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+	    PLUGIN_NVPTX_LIBS=-lcuda
+
+	    PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+	    CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+	    PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+	    LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+	    PLUGIN_NVPTX_save_LIBS=$LIBS
+	    LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+	    AC_LINK_IFELSE(
+	      [AC_LANG_PROGRAM(
+		[#include "cuda.h"],
+		  [CUresult r = cuCtxPushCurrent (NULL);])],
+	      [PLUGIN_NVPTX=1])
+	    CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+	    LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+	    LIBS=$PLUGIN_NVPTX_save_LIBS
+	    ;;
+	  *)
+	    AC_MSG_ERROR([internal error])
+	    ;;
+	esac
+
+	case $CUDA_DRIVER_DYNAMIC:$PLUGIN_NVPTX in
+	  check:0)
+	    CUDA_DRIVER_DYNAMIC=1
+	    PLUGIN_NVPTX=1
+	    ;;
+	  check:1)
+	    CUDA_DRIVER_DYNAMIC=0
+	    ;;
+	  0:1 | 1:1)
+	    ;;
+	  0:0)
+	    AC_MSG_ERROR([CUDA driver package not usable])
+	    ;;
+	  *)
+	    AC_MSG_ERROR([internal error])
+	    ;;
 	esac
+	if test $CUDA_DRIVER_DYNAMIC = 1; then
+	  # Point to the "dynamic" files.
+	  PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+	  PLUGIN_NVPTX_LDFLAGS=
+	  PLUGIN_NVPTX_LIBS=-ldl
+	fi
 	;;
       hsa*)
 	case "${target}" in
@@ -252,8 +305,8 @@ AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
-AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
-  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
+AC_DEFINE_UNQUOTED([CUDA_DRIVER_DYNAMIC], [$CUDA_DRIVER_DYNAMIC],
+  [Define to 1 to dlopen the CUDA driver library, to 0 if linking against it.])
 AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
   [Define to 1 if the HSA plugin is built, 0 if not.])
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 4144218..e236af6 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -48,7 +48,7 @@
 #include <assert.h>
 #include <errno.h>
 
-#if PLUGIN_NVPTX_DYNAMIC
+#if CUDA_DRIVER_DYNAMIC
 # include <dlfcn.h>
 
 # define CUDA_CALLS \
@@ -103,40 +103,48 @@ CUDA_ONE_CALL (cuStreamWaitEvent)
 struct cuda_lib_s {
   CUDA_CALLS
 } cuda_lib;
+# undef CUDA_ONE_CALL
 
 /* -1 if init_cuda_lib has not been called yet, false
    if it has been and failed, true if it has been and succeeded.  */
 static char cuda_lib_inited = -1;
 
-/* Dynamically load the CUDA runtime library and initialize function
+/* Dynamically load the CUDA driver library and initialize function
    pointers, return false if unsuccessful, true if successful.  */
 static bool
 init_cuda_lib (void)
 {
+  GOMP_PLUGIN_debug (0, "%s; initially: cuda_lib_inited=%hhd\n",
+		     __FUNCTION__, cuda_lib_inited);
+
   if (cuda_lib_inited != -1)
     return cuda_lib_inited;
-  const char *cuda_runtime_lib = "libcuda.so.1";
-  void *h = dlopen (cuda_runtime_lib, RTLD_LAZY);
+  const char *cuda_driver_lib = "libcuda.so.1";
+  void *h = dlopen (cuda_driver_lib, RTLD_LAZY);
   cuda_lib_inited = false;
   if (h == NULL)
-    return false;
-# undef CUDA_ONE_CALL
+    goto dl_fail;
 # define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
 # define CUDA_ONE_CALL_1(call) \
   cuda_lib.call = dlsym (h, #call);	\
   if (cuda_lib.call == NULL)		\
-    return false;
+    goto dl_fail;
   CUDA_CALLS
-  cuda_lib_inited = true;
-  return true;
-}
 # undef CUDA_ONE_CALL
 # undef CUDA_ONE_CALL_1
+  cuda_lib_inited = true;
+  return true;
+
+ dl_fail:
+  GOMP_PLUGIN_debug (0, "  while loading %s: %s\n",
+		     cuda_driver_lib, dlerror ());
+  return false;
+}
 # define CUDA_CALL_PREFIX cuda_lib.
-#else
+#else /* CUDA_DRIVER_DYNAMIC */
 # define CUDA_CALL_PREFIX
 # define init_cuda_lib() true
-#endif
+#endif /* CUDA_DRIVER_DYNAMIC */
 
 /* Convenience macros for the frequently used CUDA library call and
    error handling sequence as well as CUDA library calls that


Grüße
 Thomas

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-21 15:28 ` Thomas Schwinge
@ 2017-01-21 19:13   ` Jakub Jelinek
  2017-05-03  9:08   ` Jakub Jelinek
  1 sibling, 0 replies; 24+ messages in thread
From: Jakub Jelinek @ 2017-01-21 19:13 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: gcc-patches, Alexander Monakov, Cesar Philippidis,
	Chung-Lin Tang, Jeff Law

On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > In order to configure gcc to load libcuda.so.1 dynamically,
> > one has to either configure it --without-cuda-driver, or without
> > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > options if cuda.h and -lcuda aren't found in the default locations.
> 
> Would be good to have that documented ;-) -- done.

Thanks.

> I (obviously) agree with your intended (?) "--without-cuda-driver"
> semantics, but I think a "--with-cuda-driver" option should actually mean
> that the system's/installed CUDA driver package *must* be used (and
> similar for other "--with-cuda-driver*" options); and I also added
> "--with-cuda-driver=check" to allow overriding earlier such options (that
> is, restore the default "check" behavior).
> 
> I say 'intended (?) "--without-cuda-driver" semantics', because with your
> current patch/code, if I got that right, if one specifies
> "--without-cuda-driver" but actually does have a CUDA driver system
> installation available, then the nvptx libgomp plugin will still link
> against that one, instead of "dlopen"ing it.  So I changed that
> accordingly.

Agreed.

> > +PLUGIN_NVPTX_DYNAMIC=0
> 
> I find the name "PLUGIN_NVPTX_DYNAMIC" a bit misleading, as this isn't
> about the nvptx plugin being "dynamic" but rather it's about its usage of
> the CUDA driver library.  Thus renamed to "CUDA_DRIVER_DYNAMIC".

Ack.

> > --- libgomp/plugin/plugin-nvptx.c.jj	2017-01-13 12:07:56.000000000 +0100
> > +++ libgomp/plugin/plugin-nvptx.c	2017-01-13 18:00:39.693284346 +0100
> 
> > +/* -1 if init_cuda_lib has not been called yet, false
> > +   if it has been and failed, true if it has been and succeeded.  */
> > +static char cuda_lib_inited = -1;
> 
> Don't we actually have to worry here about multiple threads running into
> this in parallel -- thus need locking (or atomic accesses?) when
> accessing "cuda_lib_inited"?

I thought it was only accessed while a lock is held, but I could be wrong.
Also, please see my question about why we ever call cuInit in nvptx_init
(whether nvptx_get_num_devices doesn't have to be called first).

> > +/* Dynamically load the CUDA runtime library and initialize function
> 
> Not "CUDA runtime" but actually "CUDA driver" -- changed.

Ok.

> I'd like some GOMP_PLUGIN_debug output for this and the following "return
> false" cases -- added.

Ok.

> > --- libgomp/plugin/cuda/cuda.h.jj	2017-01-13 15:58:00.966544147 +0100
> > +++ libgomp/plugin/cuda/cuda.h	2017-01-13 17:02:47.355817896 +0100
> 
> > +#define CUDA_VERSION 8000
> 
> Does that make it compatible with CUDA 8.0 (and later) only?  (Not yet
> checked.)

The only reason for that is
#if CUDA_VERSION < 7000
  /* Specified in documentation and present in library from at least
     5.5.  Not declared in header file prior to 7.0.  */
  extern CUresult cuGetErrorString (CUresult, const char **);
#endif
I wanted to make it clear that cuGetErrorString prototype is provided.

I must say I don't know enough about ABI and API incompatibilities between
different CUDA versions.  I presume functions with defines like:
#define cuLinkCreate cuLinkCreate_v2
at some point weren't using the _v2 suffixes, but I have no idea if they had
different arguments or what.  Perhaps that would be supportable by having
some fallback for those if dlsym fails, or something like that.
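
To make the fallback idea concrete, a rough sketch follows.  It is only safe
if the unsuffixed entry point has the same arguments as the _v2 one, which is
exactly the part I don't know.  The helper takes the resolver as a parameter
with the same shape as dlsym, so dlsym can be passed directly; the helper
name and the toy old_library_lookup resolver are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as dlsym, so dlsym itself can be passed in.  */
typedef void *(*lookup_fn) (void *handle, const char *name);

/* Look up PREFERRED first; if that is missing and FALLBACK is non-NULL,
   try FALLBACK.  Returns NULL when neither symbol is found.  */
static void *
lookup_with_fallback (lookup_fn lookup, void *handle,
		      const char *preferred, const char *fallback)
{
  assert (lookup != NULL && preferred != NULL);
  void *sym = lookup (handle, preferred);
  if (sym == NULL && fallback != NULL)
    sym = lookup (handle, fallback);
  return sym;
}

/* Toy resolver for illustration: pretends to be an old library that
   only exports the unsuffixed name.  */
static int old_entry_point;

static void *
old_library_lookup (void *handle, const char *name)
{
  (void) handle;
  return strcmp (name, "cuLinkCreate") == 0 ? &old_entry_point : NULL;
}
```

In the plugin this would be called as
lookup_with_fallback (dlsym, h, "cuLinkCreate_v2", "cuLinkCreate"), with a
NULL fallback for symbols that never had an unsuffixed variant.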

> @@ -48,26 +49,44 @@ AC_SUBST(CUDA_DRIVER_LIB)
>  CUDA_DRIVER_CPPFLAGS=
>  CUDA_DRIVER_LDFLAGS=
>  AC_ARG_WITH(cuda-driver,
> +	[AS_HELP_STRING([--without-cuda-driver],
> +		[do not use the system's CUDA driver package])])
> +AC_ARG_WITH(cuda-driver,
> +	[AS_HELP_STRING([--with-cuda-driver=check],
> +		[use the system's CUDA driver package, if usable [default]])])
> +AC_ARG_WITH(cuda-driver,
> +	[AS_HELP_STRING([--with-cuda-driver],
> +		[use the system's CUDA driver package])])
> +AC_ARG_WITH(cuda-driver,
>  	[AS_HELP_STRING([--with-cuda-driver=PATH],
> -		[specify prefix directory for installed CUDA driver package.
> -		 Equivalent to --with-cuda-driver-include=PATH/include
> -		 plus --with-cuda-driver-lib=PATH/lib])])
> +		[use installed CUDA driver package, and specify prefix
> +		directory.  Equivalent to
> +		--with-cuda-driver-include=PATH/include plus
> +		--with-cuda-driver-lib=PATH/lib])],
> +	[],
> +	[with_cuda_driver=check])

I admit my autoconf knowledge is limited, but it certainly looks strange
to have several AC_ARG_WITH for the same option.  Shouldn't we use
one AC_ARG_WITH(cuda-driver,
with multiple AS_HELP_STRING inside of its second argument?

>  AC_ARG_WITH(cuda-driver-include,
>  	[AS_HELP_STRING([--with-cuda-driver-include=PATH],
> -		[specify directory for installed CUDA driver include files])])
> +		[use installed CUDA driver package, and specify directory for
> +		include files])])
>  AC_ARG_WITH(cuda-driver-lib,
>  	[AS_HELP_STRING([--with-cuda-driver-lib=PATH],
> -		[specify directory for the installed CUDA driver library])])
> +		[use installed CUDA driver package, and specify directory for
> +		libraries])])
>  case "x$with_cuda_driver" in
> -  x | xno) ;;
> -  *) CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> -     CUDA_DRIVER_LIB=$with_cuda_driver/lib
> -     ;;
> +  xcheck | xno | xyes)
> +    ;;
> +  *)
> +    CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> +    CUDA_DRIVER_LIB=$with_cuda_driver/lib
> +    ;;
>  esac
>  if test "x$with_cuda_driver_include" != x; then
> +  CUDA_DRIVER_DYNAMIC=0
>    CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
>  fi
>  if test "x$with_cuda_driver_lib" != x; then
> +  CUDA_DRIVER_DYNAMIC=0
>    CUDA_DRIVER_LIB=$with_cuda_driver_lib
>  fi
>  if test "x$CUDA_DRIVER_INCLUDE" != x; then
> @@ -76,12 +95,22 @@ fi
>  if test "x$CUDA_DRIVER_LIB" != x; then
>    CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
>  fi
> +case "x$with_cuda_driver" in
> +  xcheck)
> +    CUDA_DRIVER_DYNAMIC=check
> +    ;;
> +  xno)
> +    CUDA_DRIVER_DYNAMIC=1
> +    ;;
> +  xyes | *)
> +    CUDA_DRIVER_DYNAMIC=0
> +    ;;
> +esac

Why two separate case constructs?  Can't you do what you do in the second
in the first instead of that
> +  xcheck | xno | xyes)
> +    ;;
and just add CUDA_DRIVER_DYNAMIC=0 also to the *) case?
> +	case $CUDA_DRIVER_DYNAMIC in
> +	  1)
> +	    PLUGIN_NVPTX=1
> +	    ;;
> +	  check | 0)

Wouldn't it be far simpler to just use
	PLUGIN_NVPTX=1
	if test $CUDA_DRIVER_DYNAMIC != 1; then

> +	    # Determine whether the system's CUDA driver package is usable.
> +	    PLUGIN_NVPTX=0
> +
> +	    # Tentatively point to the system's CUDA driver package.
> +	    PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
> +	    PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
> +	    PLUGIN_NVPTX_LIBS=-lcuda
> +
> +	    PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
> +	    CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
> +	    PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
> +	    LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
> +	    PLUGIN_NVPTX_save_LIBS=$LIBS
> +	    LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
> +	    AC_LINK_IFELSE(
> +	      [AC_LANG_PROGRAM(
> +		[#include "cuda.h"],
> +		  [CUresult r = cuCtxPushCurrent (NULL);])],
> +	      [PLUGIN_NVPTX=1])
> +	    CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
> +	    LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
> +	    LIBS=$PLUGIN_NVPTX_save_LIBS

	fi

> +	    ;;
> +	  *)
> +	    AC_MSG_ERROR([internal error])
> +	    ;;
> +	esac

and drop the above?

> +	case $CUDA_DRIVER_DYNAMIC:$PLUGIN_NVPTX in
> +	  check:0)
> +	    CUDA_DRIVER_DYNAMIC=1
> +	    PLUGIN_NVPTX=1
> +	    ;;
> +	  check:1)
> +	    CUDA_DRIVER_DYNAMIC=0
> +	    ;;
> +	  0:1 | 1:1)
> +	    ;;
> +	  0:0)
> +	    AC_MSG_ERROR([CUDA driver package not usable])
> +	    ;;
> +	  *)
> +	    AC_MSG_ERROR([internal error])
> +	    ;;
>  	esac

This is fine.

> +	if test $CUDA_DRIVER_DYNAMIC = 1; then

Here you use a very similar test rather than case (just an argument
why IMHO the case is unnecessary).  But not a big deal for me.

> +  const char *cuda_driver_lib = "libcuda.so.1";
> +  void *h = dlopen (cuda_driver_lib, RTLD_LAZY);

Note the HSAIL plugin uses secure_getenv and allows an env var to override
the location of the runtime library.  Dunno if we don't want to do that too,
but in any case, it can be done incrementally.
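
Roughly what I have in mind for the override, as a sketch only.  The
environment variable name below is made up and is not an existing libgomp
knob, and I use plain getenv to keep the sketch portable; the real thing
would want secure_getenv, as in the HSAIL plugin, so that set-user-ID
programs ignore the variable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, not an existing libgomp knob.  */
#define CUDA_DRIVER_LIB_ENV "GOMP_NVPTX_CUDA_DRIVER_LIB"

/* Return the name to pass to dlopen: the override from the environment
   if it is set and non-empty, otherwise the default SONAME.  */
static const char *
cuda_driver_lib_name (void)
{
  /* A setuid-safe build would use secure_getenv here instead.  */
  const char *override = getenv (CUDA_DRIVER_LIB_ENV);
  const char *name = (override != NULL && override[0] != '\0')
		     ? override : "libcuda.so.1";
  assert (name[0] != '\0');
  return name;
}
```

The result would then replace the hard-coded string in the
dlopen (cuda_driver_lib, RTLD_LAZY) call quoted above.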

Otherwise LGTM (and thanks for testing it and for the patch).

	Jakub

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-01-21 15:28 ` Thomas Schwinge
  2017-01-21 19:13   ` Jakub Jelinek
@ 2017-05-03  9:08   ` Jakub Jelinek
  2017-05-04 17:26     ` Thomas Schwinge
  1 sibling, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2017-05-03  9:08 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: gcc-patches, Alexander Monakov, Cesar Philippidis,
	Chung-Lin Tang, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1431 bytes --]

On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > In order to configure gcc to load libcuda.so.1 dynamically,
> > one has to either configure it --without-cuda-driver, or without
> > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > options if cuda.h and -lcuda aren't found in the default locations.
> 
> Would be good to have that documented ;-) -- done.
> 
> > The nvptx-tools change
> 
> (I'll get to that later.)

I'd like to ping the nvptx-tools change.  Shall I make a github pull request
for that?

I also have the following two further patches: the first one just shuts
up a -Wformat-security warning, the other one, discovered today, fixes the
build against glibc trunk.  They have changed the getopt related includes
there and we get:
In file included from /usr/include/bits/getopt_posix.h:27:0,
                 from /usr/include/unistd.h:872,
                 from ../nvptx-ld.c:23:
/usr/include/bits/getopt_core.h:91:12: error: declaration of 'int getopt(int, char* const*, const char*) throw ()' has a different exception specifier
 extern int getopt (int ___argc, char *const *___argv, const char *__shortopts)
            ^~~~~~
In file included from ../nvptx-ld.c:22:0:
../include/getopt.h:113:12: note: from previous declaration 'int getopt(int, char* const*, const char*)'
 extern int getopt (int argc, char *const *argv, const char *shortopts);
            ^~~~~~

	Jakub

[-- Attachment #2: nvptx-tools-no-ptxas.patch --]
[-- Type: text/plain, Size: 28602 bytes --]

--- nvptx-tools/configure.ac
+++ nvptx-tools/configure.ac
@@ -51,6 +51,7 @@ LIBS="$LIBS -lcuda"
 AC_CHECK_FUNCS([[cuGetErrorName] [cuGetErrorString]])
 AC_CHECK_DECLS([[cuGetErrorName], [cuGetErrorString]],
   [], [], [[#include <cuda.h>]])
+AC_CHECK_HEADERS(unistd.h sys/stat.h)
 
 AC_MSG_CHECKING([for extra programs to build requiring -lcuda])
 NVPTX_RUN=
--- nvptx-tools/include/libiberty.h
+++ nvptx-tools/include/libiberty.h
@@ -390,6 +390,17 @@ extern void hex_init (void);
 /* Save files used for communication between processes.  */
 #define PEX_SAVE_TEMPS		0x4
 
+/* Max number of alloca bytes per call before we must switch to malloc.
+
+   ?? Swiped from gnulib's regex_internal.h header.  Is this actually
+   the case?  This number seems arbitrary, though sane.
+
+   The OS usually guarantees only one guard page at the bottom of the stack,
+   and a page size can be as small as 4096 bytes.  So we cannot safely
+   allocate anything larger than 4096 bytes.  Also care for the possibility
+   of a few compiler-allocated temporary stack slots.  */
+#define MAX_ALLOCA_SIZE	4032
+
 /* Prepare to execute one or more programs, with standard output of
    each program fed to standard input of the next.
    FLAGS	As above.
--- nvptx-tools/nvptx-as.c
+++ nvptx-tools/nvptx-as.c
@@ -30,6 +30,9 @@
 #include <string.h>
 #include <wait.h>
 #include <unistd.h>
+#ifdef HAVE_SYS_STAT_H
+#include <sys/stat.h>
+#endif
 #include <errno.h>
 #define obstack_chunk_alloc malloc
 #define obstack_chunk_free free
@@ -42,6 +45,38 @@
 
 #include "version.h"
 
+#ifndef R_OK
+#define R_OK 4
+#define W_OK 2
+#define X_OK 1
+#endif
+
+#ifndef DIR_SEPARATOR
+#  define DIR_SEPARATOR '/'
+#endif
+
+#if defined (_WIN32) || defined (__MSDOS__) \
+    || defined (__DJGPP__) || defined (__OS2__)
+#  define HAVE_DOS_BASED_FILE_SYSTEM
+#  define HAVE_HOST_EXECUTABLE_SUFFIX
+#  define HOST_EXECUTABLE_SUFFIX ".exe"
+#  ifndef DIR_SEPARATOR_2 
+#    define DIR_SEPARATOR_2 '\\'
+#  endif
+#  define PATH_SEPARATOR ';'
+#else
+#  define PATH_SEPARATOR ':'
+#endif
+
+#ifndef DIR_SEPARATOR_2
+#  define IS_DIR_SEPARATOR(ch) ((ch) == DIR_SEPARATOR)
+#else
+#  define IS_DIR_SEPARATOR(ch) \
+	(((ch) == DIR_SEPARATOR) || ((ch) == DIR_SEPARATOR_2))
+#endif
+
+#define DIR_UP ".."
+
 static const char *outname = NULL;
 
 static void __attribute__ ((format (printf, 1, 2)))
@@ -816,7 +851,7 @@ traverse (void **slot, void *data)
 }
 
 static void
-process (FILE *in, FILE *out)
+process (FILE *in, FILE *out, int verify, const char *outname)
 {
   symbol_table = htab_create (500, hash_string_hash, hash_string_eq,
                               NULL);
@@ -824,6 +859,18 @@ process (FILE *in, FILE *out)
   const char *input = read_file (in);
   Token *tok = tokenize (input);
 
+  /* By default, when ptxas is not in PATH, do minimalistic verification,
+     just require that the first non-comment directive is .version.  */
+  if (verify < 0)
+    {
+      size_t i;
+      for (i = 0; tok[i].kind == K_comment; i++)
+	;
+      if (tok[i].kind != K_dotted || !is_keyword (&tok[i], "version"))
+	fatal_error ("missing .version directive at start of file '%s'",
+		     outname);
+    }
+
   do
     tok = parse_file (tok);
   while (tok->kind);
@@ -897,9 +944,83 @@ fork_execute (const char *prog, char *const *argv)
   do_wait (prog, pex);
 }
 
+/* Determine if progname is available in PATH.  */
+static bool
+program_available (const char *progname)
+{
+  char *temp = getenv ("PATH");
+  if (temp)
+    {
+      char *startp, *endp, *nstore, *alloc_ptr = NULL;
+      size_t prefixlen = strlen (temp) + 1;
+      size_t len;
+      if (prefixlen < 2)
+	prefixlen = 2;
+
+      len = prefixlen + strlen (progname) + 1;
+#ifdef HAVE_HOST_EXECUTABLE_SUFFIX
+      len += strlen (HOST_EXECUTABLE_SUFFIX);
+#endif
+      if (len < MAX_ALLOCA_SIZE)
+	nstore = (char *) alloca (len);
+      else
+	alloc_ptr = nstore = (char *) malloc (len);
+
+      startp = endp = temp;
+      while (1)
+	{
+	  if (*endp == PATH_SEPARATOR || *endp == 0)
+	    {
+	      if (endp == startp)
+		{
+		  nstore[0] = '.';
+		  nstore[1] = DIR_SEPARATOR;
+		  nstore[2] = '\0';
+		}
+	      else
+		{
+		  memcpy (nstore, startp, endp - startp);
+		  if (! IS_DIR_SEPARATOR (endp[-1]))
+		    {
+		      nstore[endp - startp] = DIR_SEPARATOR;
+		      nstore[endp - startp + 1] = 0;
+		    }
+		  else
+		    nstore[endp - startp] = 0;
+		}
+	      strcat (nstore, progname);
+	      if (! access (nstore, X_OK)
+#ifdef HAVE_HOST_EXECUTABLE_SUFFIX
+		  || ! access (strcat (nstore, HOST_EXECUTABLE_SUFFIX), X_OK)
+#endif
+		 )
+		{
+#if defined (HAVE_SYS_STAT_H) && defined (S_ISREG)
+		  struct stat st;
+		  if (stat (nstore, &st) >= 0 && S_ISREG (st.st_mode))
+#endif
+		    {
+		      free (alloc_ptr);
+		      return true;
+		    }
+		}
+
+	      if (*endp == 0)
+		break;
+	      endp = startp = endp + 1;
+	    }
+	  else
+	    endp++;
+	}
+      free (alloc_ptr);
+    }
+  return false;
+}
+
 static struct option long_options[] = {
   {"traditional-format",     no_argument, 0,  0 },
   {"save-temps",  no_argument,       0,  0 },
+  {"verify",  no_argument,       0,  0 },
   {"no-verify",  no_argument,       0,  0 },
   {"help", no_argument, 0, 'h' },
   {"version", no_argument, 0, 'V' },
@@ -912,7 +1033,7 @@ main (int argc, char **argv)
   FILE *in = stdin;
   FILE *out = stdout;
   bool verbose __attribute__((unused)) = false;
-  bool verify = true;
+  int verify = -1;
   const char *smver = "sm_30";
 
   int o;
@@ -923,7 +1044,9 @@ main (int argc, char **argv)
 	{
 	case 0:
 	  if (option_index == 2)
-	    verify = false;
+	    verify = 1;
+	  else if (option_index == 3)
+	    verify = 0;
 	  break;
 	case 'v':
 	  verbose = true;
@@ -948,7 +1071,8 @@ Usage: nvptx-none-as [option...] [asmfile]\n\
 Options:\n\
   -o FILE               Write output to FILE\n\
   -v                    Be verbose\n\
+  --verify              Do verify output is acceptable to ptxas\n\
   --no-verify           Do not verify output is acceptable to ptxas\n\
   --help                Print this help and exit\n\
   --version             Print version number and exit\n\
 \n\
@@ -983,11 +1108,17 @@ This program has absolutely no warranty.\n",
   if (!in)
     fatal_error ("cannot open input ptx file");
 
-  process (in, out);
-  if  (outname)
+  if (outname == NULL)
+    verify = 0;
+  else if (verify == -1)
+    if (program_available ("ptxas"))
+      verify = 1;
+
+  process (in, out, verify, outname);
+  if (outname)
     fclose (out);
 
-  if (verify && outname)
+  if (verify > 0)
     {
       struct obstack argv_obstack;
       obstack_init (&argv_obstack);
--- nvptx-tools/configure
+++ nvptx-tools/configure
@@ -168,7 +168,8 @@ test x\$exitcode = x0 || exit 1"
   as_suggested="  as_lineno_1=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_1a=\$LINENO
   as_lineno_2=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_2a=\$LINENO
   eval 'test \"x\$as_lineno_1'\$as_run'\" != \"x\$as_lineno_2'\$as_run'\" &&
-  test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1"
+  test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1
+test \$(( 1 + 1 )) = 2 || exit 1"
   if (eval "$as_required") 2>/dev/null; then :
   as_have_required=yes
 else
@@ -552,11 +553,50 @@ PACKAGE_URL=
 
 ac_unique_file="nvptx-tools"
 ac_unique_file="nvptx-as.c"
+# Factoring default headers for most tests.
+ac_includes_default="\
+#include <stdio.h>
+#ifdef HAVE_SYS_TYPES_H
+# include <sys/types.h>
+#endif
+#ifdef HAVE_SYS_STAT_H
+# include <sys/stat.h>
+#endif
+#ifdef STDC_HEADERS
+# include <stdlib.h>
+# include <stddef.h>
+#else
+# ifdef HAVE_STDLIB_H
+#  include <stdlib.h>
+# endif
+#endif
+#ifdef HAVE_STRING_H
+# if !defined STDC_HEADERS && defined HAVE_MEMORY_H
+#  include <memory.h>
+# endif
+# include <string.h>
+#endif
+#ifdef HAVE_STRINGS_H
+# include <strings.h>
+#endif
+#ifdef HAVE_INTTYPES_H
+# include <inttypes.h>
+#endif
+#ifdef HAVE_STDINT_H
+# include <stdint.h>
+#endif
+#ifdef HAVE_UNISTD_H
+# include <unistd.h>
+#endif"
+
 enable_option_checking=no
 ac_subst_vars='LTLIBOBJS
 LIBOBJS
 subdirs
 NVPTX_RUN
+EGREP
+GREP
+CPP
 CUDA_DRIVER_LDFLAGS
 CUDA_DRIVER_CPPFLAGS
 AR
@@ -635,7 +675,8 @@ LIBS
 CPPFLAGS
 CXX
 CXXFLAGS
-CCC'
+CCC
+CPP'
 ac_subdirs_all='libiberty'
 
 # Initialize some variables set by options.
@@ -1267,6 +1308,7 @@ Some influential environment variables:
               you have headers in a nonstandard directory <include dir>
   CXX         C++ compiler command
   CXXFLAGS    C++ compiler flags
+  CPP         C preprocessor
 
 Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
@@ -1575,6 +1617,203 @@ $as_echo "$ac_res" >&6; }
   eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
 
 } # ac_fn_c_check_decl
+
+# ac_fn_c_try_cpp LINENO
+# ----------------------
+# Try to preprocess conftest.$ac_ext, and return whether this succeeded.
+ac_fn_c_try_cpp ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { { ac_try="$ac_cpp conftest.$ac_ext"
+case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_cpp conftest.$ac_ext") 2>conftest.err
+  ac_status=$?
+  if test -s conftest.err; then
+    grep -v '^ *+' conftest.err >conftest.er1
+    cat conftest.er1 >&5
+    mv -f conftest.er1 conftest.err
+  fi
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; } >/dev/null && {
+	 test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" ||
+	 test ! -s conftest.err
+       }; then :
+  ac_retval=0
+else
+  $as_echo "$as_me: failed program was:" >&5
+sed 's/^/| /' conftest.$ac_ext >&5
+
+    ac_retval=1
+fi
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+  return $ac_retval
+
+} # ac_fn_c_try_cpp
+
+# ac_fn_c_check_header_mongrel LINENO HEADER VAR INCLUDES
+# -------------------------------------------------------
+# Tests whether HEADER exists, giving a warning if it cannot be compiled using
+# the include files in INCLUDES and setting the cache variable VAR
+# accordingly.
+ac_fn_c_check_header_mongrel ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+fi
+eval ac_res=\$$3
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+else
+  # Is the header compilable?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 usability" >&5
+$as_echo_n "checking $2 usability... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$4
+#include <$2>
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_header_compiler=yes
+else
+  ac_header_compiler=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_compiler" >&5
+$as_echo "$ac_header_compiler" >&6; }
+
+# Is the header present?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 presence" >&5
+$as_echo_n "checking $2 presence... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <$2>
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+  ac_header_preproc=yes
+else
+  ac_header_preproc=no
+fi
+rm -f conftest.err conftest.$ac_ext
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_preproc" >&5
+$as_echo "$ac_header_preproc" >&6; }
+
+# So?  What about this header?
+case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in #((
+  yes:no: )
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&5
+$as_echo "$as_me: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5
+$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;}
+    ;;
+  no:yes:* )
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: present but cannot be compiled" >&5
+$as_echo "$as_me: WARNING: $2: present but cannot be compiled" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2:     check for missing prerequisite headers?" >&5
+$as_echo "$as_me: WARNING: $2:     check for missing prerequisite headers?" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: see the Autoconf documentation" >&5
+$as_echo "$as_me: WARNING: $2: see the Autoconf documentation" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2:     section \"Present But Cannot Be Compiled\"" >&5
+$as_echo "$as_me: WARNING: $2:     section \"Present But Cannot Be Compiled\"" >&2;}
+    { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5
+$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;}
+    ;;
+esac
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+else
+  eval "$3=\$ac_header_compiler"
+fi
+eval ac_res=\$$3
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+fi
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+
+} # ac_fn_c_check_header_mongrel
+
+# ac_fn_c_try_run LINENO
+# ----------------------
+# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes
+# that executables *can* be run.
+ac_fn_c_try_run ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { { ac_try="$ac_link"
+case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_link") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; } && { ac_try='./conftest$ac_exeext'
+  { { case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_try") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then :
+  ac_retval=0
+else
+  $as_echo "$as_me: program exited with status $ac_status" >&5
+       $as_echo "$as_me: failed program was:" >&5
+sed 's/^/| /' conftest.$ac_ext >&5
+
+       ac_retval=$ac_status
+fi
+  rm -rf conftest.dSYM conftest_ipa8_conftest.oo
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+  return $ac_retval
+
+} # ac_fn_c_try_run
+
+# ac_fn_c_check_header_compile LINENO HEADER VAR INCLUDES
+# -------------------------------------------------------
+# Tests whether HEADER exists and can be compiled using the include files in
+# INCLUDES, setting the cache variable VAR accordingly.
+ac_fn_c_check_header_compile ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$4
+#include <$2>
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  eval "$3=yes"
+else
+  eval "$3=no"
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+eval ac_res=\$$3
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+
+} # ac_fn_c_check_header_compile
 cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
@@ -3284,6 +3523,418 @@ cat >>confdefs.h <<_ACEOF
 #define HAVE_DECL_CUGETERRORSTRING $ac_have_decl
 _ACEOF
 
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to run the C preprocessor" >&5
+$as_echo_n "checking how to run the C preprocessor... " >&6; }
+# On Suns, sometimes $CPP names a directory.
+if test -n "$CPP" && test -d "$CPP"; then
+  CPP=
+fi
+if test -z "$CPP"; then
+  if test "${ac_cv_prog_CPP+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+      # Double quotes because CPP needs to be expanded
+    for CPP in "$CC -E" "$CC -E -traditional-cpp" "/lib/cpp"
+    do
+      ac_preproc_ok=false
+for ac_c_preproc_warn_flag in '' yes
+do
+  # Use a header file that comes with gcc, so configuring glibc
+  # with a fresh cross-compiler works.
+  # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since
+  # <limits.h> exists even on freestanding compilers.
+  # On the NeXT, cc -E runs the code through the compiler's parser,
+  # not just through cpp. "Syntax error" is here to catch this case.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifdef __STDC__
+# include <limits.h>
+#else
+# include <assert.h>
+#endif
+		     Syntax error
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+
+else
+  # Broken: fails on valid input.
+continue
+fi
+rm -f conftest.err conftest.$ac_ext
+
+  # OK, works on sane cases.  Now check whether nonexistent headers
+  # can be detected and how.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <ac_nonexistent.h>
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+  # Broken: success on invalid input.
+continue
+else
+  # Passes both tests.
+ac_preproc_ok=:
+break
+fi
+rm -f conftest.err conftest.$ac_ext
+
+done
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+rm -f conftest.err conftest.$ac_ext
+if $ac_preproc_ok; then :
+  break
+fi
+
+    done
+    ac_cv_prog_CPP=$CPP
+
+fi
+  CPP=$ac_cv_prog_CPP
+else
+  ac_cv_prog_CPP=$CPP
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $CPP" >&5
+$as_echo "$CPP" >&6; }
+ac_preproc_ok=false
+for ac_c_preproc_warn_flag in '' yes
+do
+  # Use a header file that comes with gcc, so configuring glibc
+  # with a fresh cross-compiler works.
+  # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since
+  # <limits.h> exists even on freestanding compilers.
+  # On the NeXT, cc -E runs the code through the compiler's parser,
+  # not just through cpp. "Syntax error" is here to catch this case.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifdef __STDC__
+# include <limits.h>
+#else
+# include <assert.h>
+#endif
+		     Syntax error
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+
+else
+  # Broken: fails on valid input.
+continue
+fi
+rm -f conftest.err conftest.$ac_ext
+
+  # OK, works on sane cases.  Now check whether nonexistent headers
+  # can be detected and how.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <ac_nonexistent.h>
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+  # Broken: success on invalid input.
+continue
+else
+  # Passes both tests.
+ac_preproc_ok=:
+break
+fi
+rm -f conftest.err conftest.$ac_ext
+
+done
+# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
+rm -f conftest.err conftest.$ac_ext
+if $ac_preproc_ok; then :
+
+else
+  { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error "C preprocessor \"$CPP\" fails sanity check
+See \`config.log' for more details." "$LINENO" 5; }
+fi
+
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5
+$as_echo_n "checking for grep that handles long lines and -e... " >&6; }
+if test "${ac_cv_path_GREP+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if test -z "$GREP"; then
+  ac_path_GREP_found=false
+  # Loop through the user's path and test for each of PROGNAME-LIST
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_prog in grep ggrep; do
+    for ac_exec_ext in '' $ac_executable_extensions; do
+      ac_path_GREP="$as_dir/$ac_prog$ac_exec_ext"
+      { test -f "$ac_path_GREP" && $as_test_x "$ac_path_GREP"; } || continue
+# Check for GNU ac_path_GREP and select it if it is found.
+  # Check for GNU $ac_path_GREP
+case `"$ac_path_GREP" --version 2>&1` in
+*GNU*)
+  ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;;
+*)
+  ac_count=0
+  $as_echo_n 0123456789 >"conftest.in"
+  while :
+  do
+    cat "conftest.in" "conftest.in" >"conftest.tmp"
+    mv "conftest.tmp" "conftest.in"
+    cp "conftest.in" "conftest.nl"
+    $as_echo 'GREP' >> "conftest.nl"
+    "$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break
+    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
+    as_fn_arith $ac_count + 1 && ac_count=$as_val
+    if test $ac_count -gt ${ac_path_GREP_max-0}; then
+      # Best one so far, save it but keep looking for a better one
+      ac_cv_path_GREP="$ac_path_GREP"
+      ac_path_GREP_max=$ac_count
+    fi
+    # 10*(2^10) chars as input seems more than enough
+    test $ac_count -gt 10 && break
+  done
+  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
+esac
+
+      $ac_path_GREP_found && break 3
+    done
+  done
+  done
+IFS=$as_save_IFS
+  if test -z "$ac_cv_path_GREP"; then
+    as_fn_error "no acceptable grep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5
+  fi
+else
+  ac_cv_path_GREP=$GREP
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5
+$as_echo "$ac_cv_path_GREP" >&6; }
+ GREP="$ac_cv_path_GREP"
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for egrep" >&5
+$as_echo_n "checking for egrep... " >&6; }
+if test "${ac_cv_path_EGREP+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if echo a | $GREP -E '(a|b)' >/dev/null 2>&1
+   then ac_cv_path_EGREP="$GREP -E"
+   else
+     if test -z "$EGREP"; then
+  ac_path_EGREP_found=false
+  # Loop through the user's path and test for each of PROGNAME-LIST
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_prog in egrep; do
+    for ac_exec_ext in '' $ac_executable_extensions; do
+      ac_path_EGREP="$as_dir/$ac_prog$ac_exec_ext"
+      { test -f "$ac_path_EGREP" && $as_test_x "$ac_path_EGREP"; } || continue
+# Check for GNU ac_path_EGREP and select it if it is found.
+  # Check for GNU $ac_path_EGREP
+case `"$ac_path_EGREP" --version 2>&1` in
+*GNU*)
+  ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;;
+*)
+  ac_count=0
+  $as_echo_n 0123456789 >"conftest.in"
+  while :
+  do
+    cat "conftest.in" "conftest.in" >"conftest.tmp"
+    mv "conftest.tmp" "conftest.in"
+    cp "conftest.in" "conftest.nl"
+    $as_echo 'EGREP' >> "conftest.nl"
+    "$ac_path_EGREP" 'EGREP$' < "conftest.nl" >"conftest.out" 2>/dev/null || break
+    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
+    as_fn_arith $ac_count + 1 && ac_count=$as_val
+    if test $ac_count -gt ${ac_path_EGREP_max-0}; then
+      # Best one so far, save it but keep looking for a better one
+      ac_cv_path_EGREP="$ac_path_EGREP"
+      ac_path_EGREP_max=$ac_count
+    fi
+    # 10*(2^10) chars as input seems more than enough
+    test $ac_count -gt 10 && break
+  done
+  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
+esac
+
+      $ac_path_EGREP_found && break 3
+    done
+  done
+  done
+IFS=$as_save_IFS
+  if test -z "$ac_cv_path_EGREP"; then
+    as_fn_error "no acceptable egrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5
+  fi
+else
+  ac_cv_path_EGREP=$EGREP
+fi
+
+   fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5
+$as_echo "$ac_cv_path_EGREP" >&6; }
+ EGREP="$ac_cv_path_EGREP"
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ANSI C header files" >&5
+$as_echo_n "checking for ANSI C header files... " >&6; }
+if test "${ac_cv_header_stdc+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <float.h>
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_cv_header_stdc=yes
+else
+  ac_cv_header_stdc=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
+if test $ac_cv_header_stdc = yes; then
+  # SunOS 4.x string.h does not declare mem*, contrary to ANSI.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <string.h>
+
+_ACEOF
+if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
+  $EGREP "memchr" >/dev/null 2>&1; then :
+
+else
+  ac_cv_header_stdc=no
+fi
+rm -f conftest*
+
+fi
+
+if test $ac_cv_header_stdc = yes; then
+  # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI.
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdlib.h>
+
+_ACEOF
+if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
+  $EGREP "free" >/dev/null 2>&1; then :
+
+else
+  ac_cv_header_stdc=no
+fi
+rm -f conftest*
+
+fi
+
+if test $ac_cv_header_stdc = yes; then
+  # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi.
+  if test "$cross_compiling" = yes; then :
+  :
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <ctype.h>
+#include <stdlib.h>
+#if ((' ' & 0x0FF) == 0x020)
+# define ISLOWER(c) ('a' <= (c) && (c) <= 'z')
+# define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c))
+#else
+# define ISLOWER(c) \
+		   (('a' <= (c) && (c) <= 'i') \
+		     || ('j' <= (c) && (c) <= 'r') \
+		     || ('s' <= (c) && (c) <= 'z'))
+# define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c))
+#endif
+
+#define XOR(e, f) (((e) && !(f)) || (!(e) && (f)))
+int
+main ()
+{
+  int i;
+  for (i = 0; i < 256; i++)
+    if (XOR (islower (i), ISLOWER (i))
+	|| toupper (i) != TOUPPER (i))
+      return 2;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_run "$LINENO"; then :
+
+else
+  ac_cv_header_stdc=no
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
+fi
+
+fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_stdc" >&5
+$as_echo "$ac_cv_header_stdc" >&6; }
+if test $ac_cv_header_stdc = yes; then
+
+$as_echo "#define STDC_HEADERS 1" >>confdefs.h
+
+fi
+
+# On IRIX 5.3, sys/types and inttypes.h are conflicting.
+for ac_header in sys/types.h sys/stat.h stdlib.h string.h memory.h strings.h \
+		  inttypes.h stdint.h unistd.h
+do :
+  as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
+ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default
+"
+eval as_val=\$$as_ac_Header
+   if test "x$as_val" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1
+_ACEOF
+
+fi
+
+done
+
+
+for ac_header in unistd.h sys/stat.h
+do :
+  as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
+ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
+eval as_val=\$$as_ac_Header
+   if test "x$as_val" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1
+_ACEOF
+
+fi
+
+done
+
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for extra programs to build requiring -lcuda" >&5
 $as_echo_n "checking for extra programs to build requiring -lcuda... " >&6; }

[-- Attachment #3: nvptx-tools-build.patch --]
[-- Type: text/plain, Size: 330 bytes --]

--- nvptx-tools/nvptx-as.c.jj	2017-01-20 12:40:18.000000000 +0100
+++ nvptx-tools/nvptx-as.c	2017-01-20 12:43:53.864271442 +0100
@@ -939,7 +939,7 @@ fork_execute (const char *prog, char *co
 	  fatal_error ("%s: %m", errmsg);
 	}
       else
-	fatal_error (errmsg);
+	fatal_error ("%s", errmsg);
     }
   do_wait (prog, pex);
 }

[-- Attachment #4: nvptx-tools-glibc.patch --]
[-- Type: text/plain, Size: 844 bytes --]

--- nvptx-tools/configure.ac.jj	2017-01-13 12:48:31.000000000 +0100
+++ nvptx-tools/configure.ac	2017-05-03 10:26:57.076092259 +0200
@@ -66,6 +66,8 @@ CPPFLAGS=$save_CPPFLAGS
 LDFLAGS=$save_LDFLAGS
 LIBS=$save_LIBS
 
+AC_CHECK_DECLS(getopt)
+
 AC_CONFIG_SUBDIRS([libiberty])
 AC_CONFIG_FILES([Makefile dejagnu.exp])
 AC_OUTPUT
--- nvptx-tools/configure.jj	2017-01-13 12:48:54.000000000 +0100
+++ nvptx-tools/configure	2017-05-03 10:27:13.503876809 +0200
@@ -3963,6 +3963,18 @@ CPPFLAGS=$save_CPPFLAGS
 LDFLAGS=$save_LDFLAGS
 LIBS=$save_LIBS
 
+ac_fn_c_check_decl "$LINENO" "getopt" "ac_cv_have_decl_getopt" "$ac_includes_default"
+if test "x$ac_cv_have_decl_getopt" = x""yes; then :
+  ac_have_decl=1
+else
+  ac_have_decl=0
+fi
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_DECL_GETOPT $ac_have_decl
+_ACEOF
+
+
 
 
 subdirs="$subdirs libiberty"

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)
  2017-05-03  9:08   ` Jakub Jelinek
@ 2017-05-04 17:26     ` Thomas Schwinge
  0 siblings, 0 replies; 24+ messages in thread
From: Thomas Schwinge @ 2017-05-04 17:26 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: gcc-patches, Alexander Monakov, Cesar Philippidis,
	Chung-Lin Tang, Jeff Law

Hi!

On Wed, 3 May 2017 11:00:14 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > > In order to configure gcc to load libcuda.so.1 dynamically,
> > > one has to either configure it --without-cuda-driver, or without
> > > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > > options if cuda.h and -lcuda aren't found in the default locations.

(I still have to follow up with my additional GCC changes...)


> > > The nvptx-tools change
> > 
> > (I'll get to that later.)
> 
> I'd like to ping the nvptx-tools change.  Shall I make a github pull request
> for that?

In the future, yes please.

This time, I've handled it in
<https://github.com/MentorEmbedded/nvptx-tools/pull/18>.

> I additionally have the following two patches, the first one just to shut
> up a -Wformat-security warning

Tom had already submitted
<https://github.com/MentorEmbedded/nvptx-tools/pull/16> including the
same fix, which I've merged earlier today.
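
For readers unfamiliar with the warning: -Wformat-security fires when a
non-literal string is passed as the format argument with no further
arguments, since any '%' in that string would be read as a conversion.
A minimal sketch of the safe pattern follows; format_message is a
hypothetical helper used only for illustration and is not part of
nvptx-tools.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy MSG into BUF through a literal "%s" format,
   so that any '%' inside MSG is treated as data, never as a conversion.
   Returns what snprintf returns (the length of the formatted text).  */
static int
format_message (char *buf, size_t size, const char *msg)
{
  /* Unsafe variant would be: snprintf (buf, size, msg);  */
  return snprintf (buf, size, "%s", msg);
}
```

The same reasoning applies to the fatal_error ("%s", errmsg) change: the
message text is no longer interpreted as a format.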

> the other one discovered today to fix build
> against glibc trunk - they have changed getopt related includes there

I handled that one in
<https://github.com/MentorEmbedded/nvptx-tools/pull/17>.

Thanks!


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
                   ` (2 preceding siblings ...)
  2017-01-21 15:28 ` Thomas Schwinge
@ 2022-04-06 12:39 ` Thomas Schwinge
  2022-04-06 12:41   ` Jakub Jelinek
  2022-04-07 22:27 ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' " Thomas Schwinge
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Thomas Schwinge @ 2022-04-06 12:39 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches; +Cc: Alexander Monakov, Jeff Law, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> Especially for distributions it is undesirable to need to have proprietary
> CUDA libraries and headers installed when building GCC.

> I've talked to our lawyers and they said that the cuda.h header included
> in this patch doesn't infringe anyone's copyright or is otherwise a fair
> use, it has been created by gathering all the cu*/CU* symbols from the
> current and older nvptx plugin and some oacc tests, then stubbing the
> pointer-ish typedefs, grabbing most enum values and function prototypes from
> https://raw.githubusercontent.com/shinpei0208/gdev/master/cuda/driver/cuda.h
> and verifying assembly with that header against assembly when compiled
> against NVidia's cuda.h.

..., and it was later slightly extended, as necessary to use further CUDA
features in libgomp's nvptx plugin.

> --- libgomp/plugin/cuda/cuda.h.jj     2017-01-13 15:58:00.966544147 +0100
> +++ libgomp/plugin/cuda/cuda.h        2017-01-13 17:02:47.355817896 +0100
> @@ -0,0 +1,174 @@
> +/* CUDA API description.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.
> +
> +This header provides the minimum amount of typedefs, enums and function
> +declarations to be able to compile plugin-nvptx.c if cuda.h and
> +libcuda.so.1 are not available.  */
> +
> +#ifndef GCC_CUDA_H
> +#define GCC_CUDA_H
> +[...]
> +#endif /* GCC_CUDA_H */

OK to push the attached
"Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h'", so that I'm
also able to use that file in the nvptx-tools, which inherit GCC's
'include' directory?


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955

[-- Attachment #2: 0001-Move-libgomp-plugin-cuda-cuda.h-to-include-cuda-cuda.patch --]
[-- Type: text/x-diff, Size: 3237 bytes --]

From a6f9d53277ff8408cdbd7b89f3e7595e40333d48 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 14:12:29 +0200
Subject: [PATCH] Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h'

... so that it may be used by other projects that inherit GCC's 'include'
directory.

	include/
	* cuda/cuda.h: New file.
	libgomp/
	* plugin/cuda/cuda.h: Remove file.
	* plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
	"cuda/cuda.h" instead of <cuda.h>.
	* plugin/configfrag.ac <PLUGIN_NVPTX_DYNAMIC>: Don't set
	'PLUGIN_NVPTX_CPPFLAGS'.
	* configure: Regenerate.
---
 {libgomp/plugin => include}/cuda/cuda.h | 7 +++----
 libgomp/configure                       | 1 -
 libgomp/plugin/configfrag.ac            | 1 -
 libgomp/plugin/plugin-nvptx.c           | 6 +++++-
 4 files changed, 8 insertions(+), 7 deletions(-)
 rename {libgomp/plugin => include}/cuda/cuda.h (97%)

diff --git a/libgomp/plugin/cuda/cuda.h b/include/cuda/cuda.h
similarity index 97%
rename from libgomp/plugin/cuda/cuda.h
rename to include/cuda/cuda.h
index 5c679c1767a..5c813ad2cf8 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -1,4 +1,4 @@
-/* CUDA API description.
+/* CUDA Driver API description.
    Copyright (C) 2017-2022 Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -22,9 +22,8 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.
 
-This header provides the minimum amount of typedefs, enums and function
-declarations to be able to compile plugin-nvptx.c if cuda.h and
-libcuda.so.1 are not available.  */
+This header provides parts of the CUDA Driver API, without having to rely on
+the proprietary CUDA toolkit.  */
 
 #ifndef GCC_CUDA_H
 #define GCC_CUDA_H
diff --git a/libgomp/configure b/libgomp/configure
index b1b620cabc3..f863aa2ead4 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15297,7 +15297,6 @@ rm -f core conftest.err conftest.$ac_objext \
 		       && (test "x$CUDA_DRIVER_LIB" = x \
 			   || test "x$CUDA_DRIVER_LIB" = xno); then
 		      PLUGIN_NVPTX=1
-		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
 		      PLUGIN_NVPTX_LIBS='-ldl'
 		      PLUGIN_NVPTX_DYNAMIC=1
 		    else
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index fc298391d4c..54d4b675c4e 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -156,7 +156,6 @@ if test x"$enable_offload_targets" != x; then
 		       && (test "x$CUDA_DRIVER_LIB" = x \
 			   || test "x$CUDA_DRIVER_LIB" = xno); then
 		      PLUGIN_NVPTX=1
-		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
 		      PLUGIN_NVPTX_LIBS='-ldl'
 		      PLUGIN_NVPTX_DYNAMIC=1
 		    else
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b4f0a84d77a..b28dfca00fa 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -41,7 +41,11 @@
 #include "oacc-int.h"
 
 #include <pthread.h>
-#include <cuda.h>
+#if PLUGIN_NVPTX_DYNAMIC
+# include "cuda/cuda.h"
+#else
+# include <cuda.h>
+#endif
 #include <stdbool.h>
 #include <limits.h>
 #include <string.h>
-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2022-04-06 12:39 ` Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)) Thomas Schwinge
@ 2022-04-06 12:41   ` Jakub Jelinek
  0 siblings, 0 replies; 24+ messages in thread
From: Jakub Jelinek @ 2022-04-06 12:41 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Alexander Monakov, Jeff Law, Tom de Vries

On Wed, Apr 06, 2022 at 02:39:18PM +0200, Thomas Schwinge wrote:
> ... so that it may be used by other projects that inherit GCC's 'include'
> directory.
> 
> 	include/
> 	* cuda/cuda.h: New file.
> 	libgomp/
> 	* plugin/cuda/cuda.h: Remove file.
> 	* plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
> 	"cuda/cuda.h" instead of <cuda.h>.
> 	* plugin/configfrag.ac <PLUGIN_NVPTX_DYNAMIC>: Don't set
> 	'PLUGIN_NVPTX_CPPFLAGS'.
> 	* configure: Regenerate.

Ok.

	Jakub


^ permalink raw reply	[flat|nested] 24+ messages in thread

* libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
                   ` (3 preceding siblings ...)
  2022-04-06 12:39 ` Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)) Thomas Schwinge
@ 2022-04-07 22:27 ` Thomas Schwinge
  2022-04-08  7:35   ` Tom de Vries
  2022-05-18 10:08 ` 'include/cuda/cuda.h': For C++, wrap in 'extern "C"' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)) Thomas Schwinge
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Thomas Schwinge @ 2022-04-07 22:27 UTC (permalink / raw)
  To: Jakub Jelinek, Tom de Vries, gcc-patches; +Cc: Alexander Monakov, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1268 bytes --]

Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> Especially for distributions it is undesirable to need to have proprietary
> CUDA libraries and headers installed when building GCC.

> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.000000000 +0100
> +++ libgomp/plugin/configfrag.ac      2017-01-13 17:33:26.608240936 +0100

> +           PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
> +           PLUGIN_NVPTX_LIBS='-ldl'
> +           PLUGIN_NVPTX_DYNAMIC=1

> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
> +  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])

Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
control two orthogonal aspects; OK to disentangle that with the attached
"libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
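
To illustrate the point about the two aspects being orthogonal, here is a
small sketch in which each macro is tested independently, using the same
'#ifndef' polarity as the attached patch.  The two helper functions are
hypothetical and exist only to make the selection observable.

```c
/* Concern 1: which header declares the CUDA Driver API.  */
#ifndef PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
# define USES_SYSTEM_CUDA_H 0	/* GCC-shipped "cuda/cuda.h".  */
#else
# define USES_SYSTEM_CUDA_H 1	/* System <cuda.h>.  */
#endif

/* Concern 2: how the CUDA Driver library is reached.  */
#ifndef PLUGIN_NVPTX_LINK_LIBCUDA
# define LINKS_LIBCUDA 0	/* 'dlopen' at run time.  */
#else
# define LINKS_LIBCUDA 1	/* Linked at build time.  */
#endif

/* Hypothetical accessors, for illustration only.  */
static int
uses_system_cuda_h (void)
{
  return USES_SYSTEM_CUDA_H;
}

static int
links_libcuda (void)
{
  return LINKS_LIBCUDA;
}
```

With neither macro defined, both helpers return 0, which corresponds to
the 'PLUGIN_NVPTX_DYNAMIC=1' configuration; defining just one of the two
macros selects a combination that a single flag could not express.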


Grüße
 Thomas



[-- Attachment #2: 0001-libgomp-nvptx-plugin-Split-PLUGIN_NVPTX_DYNAMIC-into.patch --]
[-- Type: text/x-diff, Size: 9643 bytes --]

From c455522ac5d8ab41e5d11f8997678e042ff48e87 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 7 Apr 2022 23:10:16 +0200
Subject: [PATCH] libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'

Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h> and
'dlopen'ing the CUDA Driver library vs. linking it are separate concerns.

	libgomp/
	* plugin/Makefrag.am: Handle 'PLUGIN_NVPTX_DYNAMIC'.
	* plugin/configfrag.ac (PLUGIN_NVPTX_DYNAMIC): Change
	'AC_DEFINE_UNQUOTED' into 'AM_CONDITIONAL'.
	* plugin/plugin-nvptx.c: Split 'PLUGIN_NVPTX_DYNAMIC' into
	'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and
	'PLUGIN_NVPTX_LINK_LIBCUDA'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
---
 libgomp/Makefile.in           | 26 +++++++++++++++++++-------
 libgomp/config.h.in           |  4 ----
 libgomp/configure             | 21 +++++++++++++++------
 libgomp/plugin/Makefrag.am    | 16 +++++++++++++++-
 libgomp/plugin/configfrag.ac  |  3 +--
 libgomp/plugin/plugin-nvptx.c |  4 ++--
 6 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 22cb2136a08..d43c584a32d 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -119,8 +119,16 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 @PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
-@PLUGIN_GCN_TRUE@am__append_2 = libgomp-plugin-gcn.la
-@USE_FORTRAN_TRUE@am__append_3 = openacc.f90
+
+# Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h>.
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_2 = -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H \
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@	-DPLUGIN_NVPTX_LINK_LIBCUDA
+
+# 'dlopen'ing the CUDA Driver library vs. linking it.
+@PLUGIN_NVPTX_DYNAMIC_TRUE@@PLUGIN_NVPTX_TRUE@am__append_3 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_4 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_GCN_TRUE@am__append_5 = libgomp-plugin-gcn.la
+@USE_FORTRAN_TRUE@am__append_6 = openacc.f90
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@@ -197,8 +205,10 @@ libgomp_plugin_gcn_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \
 	$(libgomp_plugin_gcn_la_LDFLAGS) $(LDFLAGS) -o $@
 @PLUGIN_GCN_TRUE@am_libgomp_plugin_gcn_la_rpath = -rpath \
 @PLUGIN_GCN_TRUE@	$(toolexeclibdir)
+@PLUGIN_NVPTX_DYNAMIC_TRUE@@PLUGIN_NVPTX_TRUE@am__DEPENDENCIES_2 = $(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__DEPENDENCIES_3 = $(am__DEPENDENCIES_1)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES = libgomp.la \
-@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_2) $(am__DEPENDENCIES_3)
 @PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS =  \
 @PLUGIN_NVPTX_TRUE@	libgomp_plugin_nvptx_la-plugin-nvptx.lo
 libgomp_plugin_nvptx_la_OBJECTS =  \
@@ -533,7 +543,7 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
 AM_CPPFLAGS = $(addprefix -I, $(search_path))
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
-toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_2)
+toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_5)
 nodist_toolexeclib_HEADERS = libgomp.spec
 
 # -Wc is only a libtool option.
@@ -559,16 +569,18 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 	oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
 	oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c $(am__append_3)
+	oacc-target.c $(am__append_6)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_SOURCES = plugin/plugin-nvptx.c
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) \
+@PLUGIN_NVPTX_TRUE@	$(PLUGIN_NVPTX_CPPFLAGS) $(am__append_2)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LDFLAGS =  \
 @PLUGIN_NVPTX_TRUE@	$(libgomp_plugin_nvptx_version_info) \
 @PLUGIN_NVPTX_TRUE@	$(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la \
+@PLUGIN_NVPTX_TRUE@	$(am__append_3) $(am__append_4)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
 
 # AMD GCN plugin
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index e702625ab6e..8d703ec7226 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -176,10 +176,6 @@
 /* Define to 1 if the NVIDIA plugin is built, 0 if not. */
 #undef PLUGIN_NVPTX
 
-/* Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should
-   be linked against it. */
-#undef PLUGIN_NVPTX_DYNAMIC
-
 /* Define if all infrastructure, needed for plugins, is supported. */
 #undef PLUGIN_SUPPORT
 
diff --git a/libgomp/configure b/libgomp/configure
index 5ef071ea046..447d1734930 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -667,6 +667,8 @@ OPT_LDFLAGS
 SECTION_LDFLAGS
 PLUGIN_GCN_FALSE
 PLUGIN_GCN_TRUE
+PLUGIN_NVPTX_DYNAMIC_FALSE
+PLUGIN_NVPTX_DYNAMIC_TRUE
 PLUGIN_NVPTX_FALSE
 PLUGIN_NVPTX_TRUE
 offload_additional_lib_paths
@@ -11431,7 +11433,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11434 "configure"
+#line 11436 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11537,7 +11539,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11540 "configure"
+#line 11542 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15444,10 +15446,13 @@ cat >>confdefs.h <<_ACEOF
 #define PLUGIN_NVPTX $PLUGIN_NVPTX
 _ACEOF
 
-
-cat >>confdefs.h <<_ACEOF
-#define PLUGIN_NVPTX_DYNAMIC $PLUGIN_NVPTX_DYNAMIC
-_ACEOF
+ if test $PLUGIN_NVPTX_DYNAMIC = 1; then
+  PLUGIN_NVPTX_DYNAMIC_TRUE=
+  PLUGIN_NVPTX_DYNAMIC_FALSE='#'
+else
+  PLUGIN_NVPTX_DYNAMIC_TRUE='#'
+  PLUGIN_NVPTX_DYNAMIC_FALSE=
+fi
 
  if test $PLUGIN_GCN = 1; then
   PLUGIN_GCN_TRUE=
@@ -17221,6 +17226,10 @@ if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
   as_fn_error $? "conditional \"PLUGIN_NVPTX\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${PLUGIN_NVPTX_DYNAMIC_TRUE}" && test -z "${PLUGIN_NVPTX_DYNAMIC_FALSE}"; then
+  as_fn_error $? "conditional \"PLUGIN_NVPTX_DYNAMIC\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 if test -z "${PLUGIN_GCN_TRUE}" && test -z "${PLUGIN_GCN_FALSE}"; then
   as_fn_error $? "conditional \"PLUGIN_GCN\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
index 3fe50b61cfd..d33df85b83e 100644
--- a/libgomp/plugin/Makefrag.am
+++ b/libgomp/plugin/Makefrag.am
@@ -35,8 +35,22 @@ libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
 libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
 	$(lt_host_flags)
 libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
-libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBADD = libgomp.la
 libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+
+# Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h>.
+if PLUGIN_NVPTX_DYNAMIC
+else
+libgomp_plugin_nvptx_la_CPPFLAGS += -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
+endif
+
+# 'dlopen'ing the CUDA Driver library vs. linking it.
+if PLUGIN_NVPTX_DYNAMIC
+libgomp_plugin_nvptx_la_LIBADD += $(PLUGIN_NVPTX_LIBS)
+else
+libgomp_plugin_nvptx_la_CPPFLAGS += -DPLUGIN_NVPTX_LINK_LIBCUDA
+libgomp_plugin_nvptx_la_LIBADD += $(PLUGIN_NVPTX_LIBS)
+endif
 endif
 
 if PLUGIN_GCN
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 09f87f48bc7..6f5a1ba2b51 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -266,8 +266,7 @@ AC_DEFINE_UNQUOTED(OFFLOAD_PLUGINS, "$offload_plugins",
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
-AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
-  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
+AM_CONDITIONAL([PLUGIN_NVPTX_DYNAMIC], [test $PLUGIN_NVPTX_DYNAMIC = 1])
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
   [Define to 1 if the GCN plugin is built, 0 if not.])
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b28dfca00fa..387bcbbc52a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -41,7 +41,7 @@
 #include "oacc-int.h"
 
 #include <pthread.h>
-#if PLUGIN_NVPTX_DYNAMIC
+#ifndef PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
 # include "cuda/cuda.h"
 #else
 # include <cuda.h>
@@ -85,7 +85,7 @@ CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
 
 #define DO_PRAGMA(x) _Pragma (#x)
 
-#if PLUGIN_NVPTX_DYNAMIC
+#ifndef PLUGIN_NVPTX_LINK_LIBCUDA
 # include <dlfcn.h>
 
 struct cuda_lib_s {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2022-04-07 22:27 ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' " Thomas Schwinge
@ 2022-04-08  7:35   ` Tom de Vries
  2022-04-28 13:45     ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' Thomas Schwinge
  0 siblings, 1 reply; 24+ messages in thread
From: Tom de Vries @ 2022-04-08  7:35 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek, gcc-patches; +Cc: Alexander Monakov, Jeff Law

On 4/8/22 00:27, Thomas Schwinge wrote:
> Hi!
> 
> On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
>> Especially for distributions it is undesirable to need to have proprietary
>> CUDA libraries and headers installed when building GCC.
> 
>> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.000000000 +0100
>> +++ libgomp/plugin/configfrag.ac      2017-01-13 17:33:26.608240936 +0100
> 
>> +           PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
>> +           PLUGIN_NVPTX_LIBS='-ldl'
>> +           PLUGIN_NVPTX_DYNAMIC=1
> 
>> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
>> +  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
> 
> Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
> control two orthogonal aspects; OK to disentangle that with the attached
> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
> 

Hi Thomas,

we discussed dropping --with-cuda, so do I understand it correctly that 
you now propose to drop --with-cuda and --with-cuda-driver-lib but 
intend to keep --with-cuda-driver-include ?

Can you explain what user or maintainer scenario is served by this?  Is 
there a problem with using gcc's cuda.h?

Thanks,
- Tom

> 
> Grüße
>   Thomas
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'
  2022-04-08  7:35   ` Tom de Vries
@ 2022-04-28 13:45     ` Thomas Schwinge
  2022-05-05 19:18       ` [PING] " Thomas Schwinge
  2022-05-12 12:09       ` Tom de Vries
  0 siblings, 2 replies; 24+ messages in thread
From: Thomas Schwinge @ 2022-04-28 13:45 UTC (permalink / raw)
  To: Tom de Vries, gcc-patches; +Cc: Alexander Monakov, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2878 bytes --]

Hi Tom!

On 2022-04-08T09:35:44+0200, Tom de Vries <tdevries@suse.de> wrote:
> On 4/8/22 00:27, Thomas Schwinge wrote:
>> On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
>>> Especially for distributions it is undesirable to need to have proprietary
>>> CUDA libraries and headers installed when building GCC.
>>
>>> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.000000000 +0100
>>> +++ libgomp/plugin/configfrag.ac      2017-01-13 17:33:26.608240936 +0100
>>
>>> +           PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
>>> +           PLUGIN_NVPTX_LIBS='-ldl'
>>> +           PLUGIN_NVPTX_DYNAMIC=1
>>
>>> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
>>> +  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
>>
>> Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
>> control two orthogonal aspects; OK to disentangle that with the attached
>> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
>> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?

> we discussed dropping --with-cuda, so do I understand it correctly that
> you now propose to drop --with-cuda and --with-cuda-driver-lib but
> intend to keep --with-cuda-driver-include ?

No, I think you're reading too much into this first patch.  ;-)

The goal with this patch is just to help disentangle two orthogonal
concepts (as described in the commit log), and then...

> Can you explain what user or maintainer scenario is served by this?

... in a next step, we may indeed remove the current user-visible
'--with-cuda-driver' etc., but keep the underlying functionality
available for developers.  That addresses the point you made in the
"Proposal to remove '--with-cuda-driver'" thread, that it still
"could be useful for debugging / comparison purposes".  In my opinion
it is especially useful for development: if you develop CUDA API-level
changes in the libgomp nvptx plugin, it's likely easier to just use the
full CUDA toolkit 'cuda.h' and link directly against libcuda (so that
all symbols etc. are available), and to update GCC's
'include/cuda/cuda.h' and 'libgomp/plugin/cuda-lib.def' only once you
know exactly what you need.
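
To illustrate why both of those files matter in the 'dlopen' case, here
is a minimal sketch (not the plugin's actual code) of a list of driver
functions driving 'dlsym' lookups.  The names 'CUDA_CALLS',
'cuda_lib_sketch', 'init_cuda_lib_sketch' and 'count_ptx_devices' are
made up for illustration, and the two function typedefs are stand-ins
for what a header such as 'cuda.h' would declare; 'cuInit' and
'cuDeviceGetCount' are real CUDA Driver API entry points.  Under these
assumptions, every function to be called must appear in the list (cf.
'cuda-lib.def') and have a declaration (cf. 'cuda.h'); when linking
libcuda directly with the full toolkit header, neither is a constraint.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins for declarations a 'cuda.h' would provide (assumption: the
   result type is ABI-compatible with 'int', with 0 meaning success).  */
typedef int CUresult_sketch;
typedef CUresult_sketch cuInit_fn (unsigned int flags);
typedef CUresult_sketch cuDeviceGetCount_fn (int *count);

/* Hypothetical X-macro list, playing the role of a '.def' file: one
   entry per driver function that is looked up at run time.  */
#define CUDA_CALLS(X) \
  X (cuInit)          \
  X (cuDeviceGetCount)

/* One function pointer per list entry.  */
struct cuda_lib_sketch
{
#define DECLARE_POINTER(call) call##_fn *call;
  CUDA_CALLS (DECLARE_POINTER)
#undef DECLARE_POINTER
};

/* dlopen the driver library and resolve every listed symbol.  Returns 1
   on success, 0 if the library or any symbol is missing.  On success
   the handle is deliberately kept open for the process lifetime.  */
static int
init_cuda_lib_sketch (struct cuda_lib_sketch *lib)
{
  assert (lib != NULL);
  void *handle = dlopen ("libcuda.so.1", RTLD_LAZY);
  if (handle == NULL)
    return 0;
#define RESOLVE(call)                              \
  lib->call = (call##_fn *) dlsym (handle, #call); \
  if (lib->call == NULL)                           \
    {                                              \
      dlclose (handle);                            \
      return 0;                                    \
    }
  CUDA_CALLS (RESOLVE)
#undef RESOLVE
  return 1;
}

/* Number of devices, or 0 if the driver library is unavailable: a
   missing libcuda.so.1 then looks like "no PTX devices".  */
int
count_ptx_devices (void)
{
  struct cuda_lib_sketch lib;
  int count = 0;
  if (!init_cuda_lib_sketch (&lib))
    return 0;
  if (lib.cuInit (0) != 0 || lib.cuDeviceGetCount (&count) != 0)
    return 0;
  return count;
}
```

Calling 'count_ptx_devices ()' from a 'main' and building with something
like 'cc sketch.c -ldl' should be enough to try this ('-ldl' may be
unnecessary with newer C libraries).  Adding a new driver call in this
scheme means touching both the list and the declarations, which is the
extra step that direct linking avoids during development.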

With that hopefully clarified, OK to push the re-attached
"libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?

> Is
> there a problem with using gcc's cuda.h?

No, all good.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-nvptx-plugin-Split-PLUGIN_NVPTX_DYNAMIC-into.patch --]
[-- Type: text/x-diff, Size: 9643 bytes --]

From c455522ac5d8ab41e5d11f8997678e042ff48e87 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 7 Apr 2022 23:10:16 +0200
Subject: [PATCH] libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'

Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h> and
'dlopen'ing the CUDA Driver library vs. linking it are separate concerns.

	libgomp/
	* plugin/Makefrag.am: Handle 'PLUGIN_NVPTX_DYNAMIC'.
	* plugin/configfrag.ac (PLUGIN_NVPTX_DYNAMIC): Change
	'AC_DEFINE_UNQUOTED' into 'AM_CONDITIONAL'.
	* plugin/plugin-nvptx.c: Split 'PLUGIN_NVPTX_DYNAMIC' into
	'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and
	'PLUGIN_NVPTX_LINK_LIBCUDA'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
---
 libgomp/Makefile.in           | 26 +++++++++++++++++++-------
 libgomp/config.h.in           |  4 ----
 libgomp/configure             | 21 +++++++++++++++------
 libgomp/plugin/Makefrag.am    | 16 +++++++++++++++-
 libgomp/plugin/configfrag.ac  |  3 +--
 libgomp/plugin/plugin-nvptx.c |  4 ++--
 6 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 22cb2136a08..d43c584a32d 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -119,8 +119,16 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 @PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
-@PLUGIN_GCN_TRUE@am__append_2 = libgomp-plugin-gcn.la
-@USE_FORTRAN_TRUE@am__append_3 = openacc.f90
+
+# Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h>.
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_2 = -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H \
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@	-DPLUGIN_NVPTX_LINK_LIBCUDA
+
+# 'dlopen'ing the CUDA Driver library vs. linking it.
+@PLUGIN_NVPTX_DYNAMIC_TRUE@@PLUGIN_NVPTX_TRUE@am__append_3 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_4 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_GCN_TRUE@am__append_5 = libgomp-plugin-gcn.la
+@USE_FORTRAN_TRUE@am__append_6 = openacc.f90
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@@ -197,8 +205,10 @@ libgomp_plugin_gcn_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \
 	$(libgomp_plugin_gcn_la_LDFLAGS) $(LDFLAGS) -o $@
 @PLUGIN_GCN_TRUE@am_libgomp_plugin_gcn_la_rpath = -rpath \
 @PLUGIN_GCN_TRUE@	$(toolexeclibdir)
+@PLUGIN_NVPTX_DYNAMIC_TRUE@@PLUGIN_NVPTX_TRUE@am__DEPENDENCIES_2 = $(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__DEPENDENCIES_3 = $(am__DEPENDENCIES_1)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES = libgomp.la \
-@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_2) $(am__DEPENDENCIES_3)
 @PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS =  \
 @PLUGIN_NVPTX_TRUE@	libgomp_plugin_nvptx_la-plugin-nvptx.lo
 libgomp_plugin_nvptx_la_OBJECTS =  \
@@ -533,7 +543,7 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
 AM_CPPFLAGS = $(addprefix -I, $(search_path))
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
-toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_2)
+toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_5)
 nodist_toolexeclib_HEADERS = libgomp.spec
 
 # -Wc is only a libtool option.
@@ -559,16 +569,18 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 	oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
 	oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c $(am__append_3)
+	oacc-target.c $(am__append_6)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_SOURCES = plugin/plugin-nvptx.c
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) \
+@PLUGIN_NVPTX_TRUE@	$(PLUGIN_NVPTX_CPPFLAGS) $(am__append_2)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LDFLAGS =  \
 @PLUGIN_NVPTX_TRUE@	$(libgomp_plugin_nvptx_version_info) \
 @PLUGIN_NVPTX_TRUE@	$(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la \
+@PLUGIN_NVPTX_TRUE@	$(am__append_3) $(am__append_4)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
 
 # AMD GCN plugin
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index e702625ab6e..8d703ec7226 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -176,10 +176,6 @@
 /* Define to 1 if the NVIDIA plugin is built, 0 if not. */
 #undef PLUGIN_NVPTX
 
-/* Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should
-   be linked against it. */
-#undef PLUGIN_NVPTX_DYNAMIC
-
 /* Define if all infrastructure, needed for plugins, is supported. */
 #undef PLUGIN_SUPPORT
 
diff --git a/libgomp/configure b/libgomp/configure
index 5ef071ea046..447d1734930 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -667,6 +667,8 @@ OPT_LDFLAGS
 SECTION_LDFLAGS
 PLUGIN_GCN_FALSE
 PLUGIN_GCN_TRUE
+PLUGIN_NVPTX_DYNAMIC_FALSE
+PLUGIN_NVPTX_DYNAMIC_TRUE
 PLUGIN_NVPTX_FALSE
 PLUGIN_NVPTX_TRUE
 offload_additional_lib_paths
@@ -11431,7 +11433,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11434 "configure"
+#line 11436 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11537,7 +11539,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11540 "configure"
+#line 11542 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15444,10 +15446,13 @@ cat >>confdefs.h <<_ACEOF
 #define PLUGIN_NVPTX $PLUGIN_NVPTX
 _ACEOF
 
-
-cat >>confdefs.h <<_ACEOF
-#define PLUGIN_NVPTX_DYNAMIC $PLUGIN_NVPTX_DYNAMIC
-_ACEOF
+ if test $PLUGIN_NVPTX_DYNAMIC = 1; then
+  PLUGIN_NVPTX_DYNAMIC_TRUE=
+  PLUGIN_NVPTX_DYNAMIC_FALSE='#'
+else
+  PLUGIN_NVPTX_DYNAMIC_TRUE='#'
+  PLUGIN_NVPTX_DYNAMIC_FALSE=
+fi
 
  if test $PLUGIN_GCN = 1; then
   PLUGIN_GCN_TRUE=
@@ -17221,6 +17226,10 @@ if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
   as_fn_error $? "conditional \"PLUGIN_NVPTX\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${PLUGIN_NVPTX_DYNAMIC_TRUE}" && test -z "${PLUGIN_NVPTX_DYNAMIC_FALSE}"; then
+  as_fn_error $? "conditional \"PLUGIN_NVPTX_DYNAMIC\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 if test -z "${PLUGIN_GCN_TRUE}" && test -z "${PLUGIN_GCN_FALSE}"; then
   as_fn_error $? "conditional \"PLUGIN_GCN\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
index 3fe50b61cfd..d33df85b83e 100644
--- a/libgomp/plugin/Makefrag.am
+++ b/libgomp/plugin/Makefrag.am
@@ -35,8 +35,22 @@ libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
 libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
 	$(lt_host_flags)
 libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
-libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBADD = libgomp.la
 libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+
+# Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h>.
+if PLUGIN_NVPTX_DYNAMIC
+else
+libgomp_plugin_nvptx_la_CPPFLAGS += -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
+endif
+
+# 'dlopen'ing the CUDA Driver library vs. linking it.
+if PLUGIN_NVPTX_DYNAMIC
+libgomp_plugin_nvptx_la_LIBADD += $(PLUGIN_NVPTX_LIBS)
+else
+libgomp_plugin_nvptx_la_CPPFLAGS += -DPLUGIN_NVPTX_LINK_LIBCUDA
+libgomp_plugin_nvptx_la_LIBADD += $(PLUGIN_NVPTX_LIBS)
+endif
 endif
 
 if PLUGIN_GCN
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 09f87f48bc7..6f5a1ba2b51 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -266,8 +266,7 @@ AC_DEFINE_UNQUOTED(OFFLOAD_PLUGINS, "$offload_plugins",
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
-AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
-  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
+AM_CONDITIONAL([PLUGIN_NVPTX_DYNAMIC], [test $PLUGIN_NVPTX_DYNAMIC = 1])
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
   [Define to 1 if the GCN plugin is built, 0 if not.])
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b28dfca00fa..387bcbbc52a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -41,7 +41,7 @@
 #include "oacc-int.h"
 
 #include <pthread.h>
-#if PLUGIN_NVPTX_DYNAMIC
+#ifndef PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
 # include "cuda/cuda.h"
 #else
 # include <cuda.h>
@@ -85,7 +85,7 @@ CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
 
 #define DO_PRAGMA(x) _Pragma (#x)
 
-#if PLUGIN_NVPTX_DYNAMIC
+#ifndef PLUGIN_NVPTX_LINK_LIBCUDA
 # include <dlfcn.h>
 
 struct cuda_lib_s {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PING] libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'
  2022-04-28 13:45     ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' Thomas Schwinge
@ 2022-05-05 19:18       ` Thomas Schwinge
  2022-05-12 11:57         ` [PING^2] " Thomas Schwinge
  2022-05-12 12:09       ` Tom de Vries
  1 sibling, 1 reply; 24+ messages in thread
From: Thomas Schwinge @ 2022-05-05 19:18 UTC (permalink / raw)
  To: Tom de Vries, gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 3037 bytes --]

Hi!

Ping.


Grüße
 Thomas


On 2022-04-28T15:45:20+0200, I wrote:
> Hi Tom!
>
> On 2022-04-08T09:35:44+0200, Tom de Vries <tdevries@suse.de> wrote:
>> On 4/8/22 00:27, Thomas Schwinge wrote:
>>> On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> Especially for distributions it is undesirable to need to have proprietary
>>>> CUDA libraries and headers installed when building GCC.
>>>
>>>> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.000000000 +0100
>>>> +++ libgomp/plugin/configfrag.ac      2017-01-13 17:33:26.608240936 +0100
>>>
>>>> +           PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
>>>> +           PLUGIN_NVPTX_LIBS='-ldl'
>>>> +           PLUGIN_NVPTX_DYNAMIC=1
>>>
>>>> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
>>>> +  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
>>>
>>> Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
>>> control two orthogonal aspects; OK to disentangle that with the attached
>>> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
>>> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
>
>> we discussed dropping --with-cuda, so do I understand it correctly that
>> you now propose to drop --with-cuda and --with-cuda-driver-lib but
>> intend to keep --with-cuda-driver-include ?
>
> No, I think you're reading too much into this first patch.  ;-)
>
> The goal with this patch is just to help disentangle two orthogonal
> concepts (as described in the commit log), and then...
>
>> Can you explain what user or maintainer scenario is served by this?
>
> ... in a next step, we may indeed remove the current user-visible
> '--with-cuda-driver' etc., but keep the underlying functionality
> available for developers.  That addresses the point you made in the
> "Proposal to remove '--with-cuda-driver'" thread, that it still
> "could be useful for debugging / comparison purposes".  In my opinion
> it is especially useful for development: if you develop CUDA API-level
> changes in the libgomp nvptx plugin, it's likely easier to just use the
> full CUDA toolkit 'cuda.h' and link directly against libcuda (so that
> all symbols etc. are available), and to update GCC's
> 'include/cuda/cuda.h' and 'libgomp/plugin/cuda-lib.def' only once you
> know exactly what you need.
>
> With that hopefully clarified, OK to push the re-attached
> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
>
>> Is
>> there a problem with using gcc's cuda.h?
>
> No, all good.
>
>
> Grüße
>  Thomas




^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PING^2] libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'
  2022-05-05 19:18       ` [PING] " Thomas Schwinge
@ 2022-05-12 11:57         ` Thomas Schwinge
  0 siblings, 0 replies; 24+ messages in thread
From: Thomas Schwinge @ 2022-05-12 11:57 UTC (permalink / raw)
  To: Jakub Jelinek, Tom de Vries, gcc-patches; +Cc: Alexander Monakov, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 3315 bytes --]

Hi!

Another ping -- Jakub maybe?  This is a simple refactor; no change in
behavior.  I'll soon post further patches depending on this.
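
To make the "no change in behavior" claim concrete, here is a small
sketch (hypothetical helper names, not code from the patch) of how the
single old setting maps onto the two new macros according to the
'Makefrag.am' hunk: with 'PLUGIN_NVPTX_DYNAMIC' true neither macro is
defined, otherwise both are, so both preprocessor conditionals in
'plugin-nvptx.c' select the same branch as before.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the build configuration after the patch.  */
struct nvptx_config_sketch
{
  bool include_system_cuda_h; /* -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H  */
  bool link_libcuda;          /* -DPLUGIN_NVPTX_LINK_LIBCUDA  */
};

/* Mirrors the 'if PLUGIN_NVPTX_DYNAMIC ... else ... endif' blocks: in
   the non-dynamic case both macros get defined, otherwise neither.  */
struct nvptx_config_sketch
config_from_dynamic (bool plugin_nvptx_dynamic)
{
  struct nvptx_config_sketch c;
  c.include_system_cuda_h = !plugin_nvptx_dynamic;
  c.link_libcuda = !plugin_nvptx_dynamic;
  return c;
}

/* Old: '#if PLUGIN_NVPTX_DYNAMIC' chose the GCC-shipped "cuda/cuda.h".
   New: '#ifndef PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' does.  */
bool
uses_gcc_cuda_h (struct nvptx_config_sketch c)
{
  return !c.include_system_cuda_h;
}

/* Old: '#if PLUGIN_NVPTX_DYNAMIC' enabled the dlopen code.
   New: '#ifndef PLUGIN_NVPTX_LINK_LIBCUDA' does.  */
bool
uses_dlopen (struct nvptx_config_sketch c)
{
  return !c.link_libcuda;
}

/* Returns true if old and new conditions agree for both flag values.  */
bool
refactor_preserves_behavior (void)
{
  for (int dynamic = 0; dynamic <= 1; dynamic++)
    {
      struct nvptx_config_sketch c = config_from_dynamic (dynamic != 0);
      if (uses_gcc_cuda_h (c) != (dynamic != 0)
          || uses_dlopen (c) != (dynamic != 0))
        return false;
    }
  return true;
}
```

The two mixed combinations (only one of the macros defined) are not
reachable through the current configure logic; the split merely makes
them expressible for later patches.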


Grüße
 Thomas


On 2022-05-05T21:18:47+0200, I wrote:
> Hi!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2022-04-28T15:45:20+0200, I wrote:
>> Hi Tom!
>>
>> On 2022-04-08T09:35:44+0200, Tom de Vries <tdevries@suse.de> wrote:
>>> On 4/8/22 00:27, Thomas Schwinge wrote:
>>>> On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>> Especially for distributions it is undesirable to need to have proprietary
>>>>> CUDA libraries and headers installed when building GCC.
>>>>
>>>>> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.000000000 +0100
>>>>> +++ libgomp/plugin/configfrag.ac      2017-01-13 17:33:26.608240936 +0100
>>>>
>>>>> +           PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
>>>>> +           PLUGIN_NVPTX_LIBS='-ldl'
>>>>> +           PLUGIN_NVPTX_DYNAMIC=1
>>>>
>>>>> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
>>>>> +  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
>>>>
>>>> Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
>>>> control two orthogonal aspects; OK to disentangle that with the attached
>>>> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
>>>> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
>>
>>> we discussed dropping --with-cuda, so do I understand it correctly that
>>> you now propose to drop --with-cuda and --with-cuda-driver-lib but
>>> intend to keep --with-cuda-driver-include ?
>>
>> No, I think you're reading too much into this first patch.  ;-)
>>
>> The goal with this patch is just to help disentangle two orthogonal
>> concepts (as described in the commit log), and then...
>>
>>> Can you explain what user or maintainer scenario is served by this?
>>
>> ... in a next step, we may indeed remove the current user-visible
>> '--with-cuda-driver' etc., but keep the underlying functionality
>> available for developers.  That addresses the point you made in the
>> "Proposal to remove '--with-cuda-driver'" thread, that it still
>> "could be useful for debugging / comparison purposes".  In my opinion
>> it is especially useful for development: if you develop CUDA API-level
>> changes in the libgomp nvptx plugin, it's likely easier to just use the
>> full CUDA toolkit 'cuda.h' and link directly against libcuda (so that
>> all symbols etc. are available), and to update GCC's
>> 'include/cuda/cuda.h' and 'libgomp/plugin/cuda-lib.def' only once you
>> know exactly what you need.
>>
>> With that hopefully clarified, OK to push the re-attached
>> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
>> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
>>
>>> Is
>>> there a problem with using gcc's cuda.h?
>>
>> No, all good.
>>
>>
>> Grüße
>>  Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-nvptx-plugin-Split-PLUGIN_NVPTX_DYNAMIC-into.patch --]
[-- Type: text/x-diff, Size: 9435 bytes --]

From 72e5a2271348fe167713dd3b2afbcd988274bf7c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 7 Apr 2022 23:10:16 +0200
Subject: [PATCH] libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'

Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h> and
'dlopen'ing the CUDA Driver library vs. linking it are separate concerns.

	libgomp/
	* plugin/Makefrag.am: Handle 'PLUGIN_NVPTX_DYNAMIC'.
	* plugin/configfrag.ac (PLUGIN_NVPTX_DYNAMIC): Change
	'AC_DEFINE_UNQUOTED' into 'AM_CONDITIONAL'.
	* plugin/plugin-nvptx.c: Split 'PLUGIN_NVPTX_DYNAMIC' into
	'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and
	'PLUGIN_NVPTX_LINK_LIBCUDA'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
---
 libgomp/Makefile.in           | 26 +++++++++++++++++++-------
 libgomp/config.h.in           |  4 ----
 libgomp/configure             | 21 +++++++++++++++------
 libgomp/plugin/Makefrag.am    | 16 +++++++++++++++-
 libgomp/plugin/configfrag.ac  |  3 +--
 libgomp/plugin/plugin-nvptx.c |  4 ++--
 6 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 1d55f4b65e2..51252b89462 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -119,8 +119,16 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 @PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
-@PLUGIN_GCN_TRUE@am__append_2 = libgomp-plugin-gcn.la
-@USE_FORTRAN_TRUE@am__append_3 = openacc.f90
+
+# Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h>.
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_2 = -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H \
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@	-DPLUGIN_NVPTX_LINK_LIBCUDA
+
+# 'dlopen'ing the CUDA Driver library vs. linking it.
+@PLUGIN_NVPTX_DYNAMIC_TRUE@@PLUGIN_NVPTX_TRUE@am__append_3 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__append_4 = $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_GCN_TRUE@am__append_5 = libgomp-plugin-gcn.la
+@USE_FORTRAN_TRUE@am__append_6 = openacc.f90
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@@ -197,8 +205,10 @@ libgomp_plugin_gcn_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \
 	$(libgomp_plugin_gcn_la_LDFLAGS) $(LDFLAGS) -o $@
 @PLUGIN_GCN_TRUE@am_libgomp_plugin_gcn_la_rpath = -rpath \
 @PLUGIN_GCN_TRUE@	$(toolexeclibdir)
+@PLUGIN_NVPTX_DYNAMIC_TRUE@@PLUGIN_NVPTX_TRUE@am__DEPENDENCIES_2 = $(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_DYNAMIC_FALSE@@PLUGIN_NVPTX_TRUE@am__DEPENDENCIES_3 = $(am__DEPENDENCIES_1)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES = libgomp.la \
-@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_1)
+@PLUGIN_NVPTX_TRUE@	$(am__DEPENDENCIES_2) $(am__DEPENDENCIES_3)
 @PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS =  \
 @PLUGIN_NVPTX_TRUE@	libgomp_plugin_nvptx_la-plugin-nvptx.lo
 libgomp_plugin_nvptx_la_OBJECTS =  \
@@ -527,7 +537,7 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
 AM_CPPFLAGS = $(addprefix -I, $(search_path))
 AM_CFLAGS = $(XCFLAGS)
 AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
-toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_2)
+toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_5)
 nodist_toolexeclib_HEADERS = libgomp.spec
 
 # -Wc is only a libtool option.
@@ -553,16 +563,18 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 	oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
 	oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c $(am__append_3)
+	oacc-target.c $(am__append_6)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_SOURCES = plugin/plugin-nvptx.c
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) \
+@PLUGIN_NVPTX_TRUE@	$(PLUGIN_NVPTX_CPPFLAGS) $(am__append_2)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LDFLAGS =  \
 @PLUGIN_NVPTX_TRUE@	$(libgomp_plugin_nvptx_version_info) \
 @PLUGIN_NVPTX_TRUE@	$(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
-@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
+@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la \
+@PLUGIN_NVPTX_TRUE@	$(am__append_3) $(am__append_4)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
 
 # AMD GCN plugin
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 5611ed925ad..d971ea38c46 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -170,10 +170,6 @@
 /* Define to the version of this package. */
 #undef PACKAGE_VERSION
 
-/* Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should
-   be linked against it. */
-#undef PLUGIN_NVPTX_DYNAMIC
-
 /* Define if all infrastructure, needed for plugins, is supported. */
 #undef PLUGIN_SUPPORT
 
diff --git a/libgomp/configure b/libgomp/configure
index be675a6b8ab..0df47cf96e3 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -667,6 +667,8 @@ OPT_LDFLAGS
 SECTION_LDFLAGS
 PLUGIN_GCN_FALSE
 PLUGIN_GCN_TRUE
+PLUGIN_NVPTX_DYNAMIC_FALSE
+PLUGIN_NVPTX_DYNAMIC_TRUE
 PLUGIN_NVPTX_FALSE
 PLUGIN_NVPTX_TRUE
 offload_additional_lib_paths
@@ -11412,7 +11414,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11415 "configure"
+#line 11417 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11518,7 +11520,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11521 "configure"
+#line 11523 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15368,10 +15370,13 @@ else
   PLUGIN_NVPTX_FALSE=
 fi
 
-
-cat >>confdefs.h <<_ACEOF
-#define PLUGIN_NVPTX_DYNAMIC $PLUGIN_NVPTX_DYNAMIC
-_ACEOF
+ if test $PLUGIN_NVPTX_DYNAMIC = 1; then
+  PLUGIN_NVPTX_DYNAMIC_TRUE=
+  PLUGIN_NVPTX_DYNAMIC_FALSE='#'
+else
+  PLUGIN_NVPTX_DYNAMIC_TRUE='#'
+  PLUGIN_NVPTX_DYNAMIC_FALSE=
+fi
 
  if test $PLUGIN_GCN = 1; then
   PLUGIN_GCN_TRUE=
@@ -17140,6 +17145,10 @@ if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
   as_fn_error $? "conditional \"PLUGIN_NVPTX\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
 fi
+if test -z "${PLUGIN_NVPTX_DYNAMIC_TRUE}" && test -z "${PLUGIN_NVPTX_DYNAMIC_FALSE}"; then
+  as_fn_error $? "conditional \"PLUGIN_NVPTX_DYNAMIC\" was never defined.
+Usually this means the macro was only invoked conditionally." "$LINENO" 5
+fi
 if test -z "${PLUGIN_GCN_TRUE}" && test -z "${PLUGIN_GCN_FALSE}"; then
   as_fn_error $? "conditional \"PLUGIN_GCN\" was never defined.
 Usually this means the macro was only invoked conditionally." "$LINENO" 5
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
index 11929d4ff29..3eeb3419f9c 100644
--- a/libgomp/plugin/Makefrag.am
+++ b/libgomp/plugin/Makefrag.am
@@ -35,8 +35,22 @@ libgomp_plugin_nvptx_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_NVPTX_CPPFLAGS)
 libgomp_plugin_nvptx_la_LDFLAGS = $(libgomp_plugin_nvptx_version_info) \
 	$(lt_host_flags)
 libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
-libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
+libgomp_plugin_nvptx_la_LIBADD = libgomp.la
 libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
+
+# Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h>.
+if PLUGIN_NVPTX_DYNAMIC
+else
+libgomp_plugin_nvptx_la_CPPFLAGS += -DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
+endif
+
+# 'dlopen'ing the CUDA Driver library vs. linking it.
+if PLUGIN_NVPTX_DYNAMIC
+libgomp_plugin_nvptx_la_LIBADD += $(PLUGIN_NVPTX_LIBS)
+else
+libgomp_plugin_nvptx_la_CPPFLAGS += -DPLUGIN_NVPTX_LINK_LIBCUDA
+libgomp_plugin_nvptx_la_LIBADD += $(PLUGIN_NVPTX_LIBS)
+endif
 endif
 
 if PLUGIN_GCN
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 1a61db94381..c16224104e3 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -219,6 +219,5 @@ fi
 AC_DEFINE_UNQUOTED(OFFLOAD_PLUGINS, "$offload_plugins",
   [Define to offload plugins, separated by commas.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
-AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
-  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
+AM_CONDITIONAL([PLUGIN_NVPTX_DYNAMIC], [test $PLUGIN_NVPTX_DYNAMIC = 1])
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b28dfca00fa..387bcbbc52a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -41,7 +41,7 @@
 #include "oacc-int.h"
 
 #include <pthread.h>
-#if PLUGIN_NVPTX_DYNAMIC
+#ifndef PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
 # include "cuda/cuda.h"
 #else
 # include <cuda.h>
@@ -85,7 +85,7 @@ CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
 
 #define DO_PRAGMA(x) _Pragma (#x)
 
-#if PLUGIN_NVPTX_DYNAMIC
+#ifndef PLUGIN_NVPTX_LINK_LIBCUDA
 # include <dlfcn.h>
 
 struct cuda_lib_s {
-- 
2.35.1



* Re: libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'
  2022-04-28 13:45     ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' Thomas Schwinge
  2022-05-05 19:18       ` [PING] " Thomas Schwinge
@ 2022-05-12 12:09       ` Tom de Vries
  1 sibling, 0 replies; 24+ messages in thread
From: Tom de Vries @ 2022-05-12 12:09 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches; +Cc: Alexander Monakov, Jeff Law, Jakub Jelinek

On 4/28/22 15:45, Thomas Schwinge wrote:
> Hi Tom!
> 
> On 2022-04-08T09:35:44+0200, Tom de Vries <tdevries@suse.de> wrote:
>> On 4/8/22 00:27, Thomas Schwinge wrote:
>>> On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> Especially for distributions it is undesirable to need to have proprietary
>>>> CUDA libraries and headers installed when building GCC.
>>>
>>>> --- libgomp/plugin/configfrag.ac.jj   2017-01-13 12:07:56.000000000 +0100
>>>> +++ libgomp/plugin/configfrag.ac      2017-01-13 17:33:26.608240936 +0100
>>>
>>>> +           PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
>>>> +           PLUGIN_NVPTX_LIBS='-ldl'
>>>> +           PLUGIN_NVPTX_DYNAMIC=1
>>>
>>>> +AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
>>>> +  [Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
>>>
>>> Actually, the conditionals leading to 'PLUGIN_NVPTX_DYNAMIC=1' here do
>>> control two orthogonal aspects; OK to disentangle that with the attached
>>> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
>>> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
> 
>> we discussed dropping --with-cuda, so do I understand it correctly that
>> you now propose to drop --with-cuda and --with-cuda-driver-lib but
>> intend to keep --with-cuda-driver-include ?
> 
> No, I think you're reading too much into this first patch.  ;-)
> 
> The goal with this patch is just to help disentangle two orthogonal
> concepts (as described in the commit log), and then...
> 
>> Can you explain what user or maintainer scenario is served by this?
> 
> ... in a next step, we may indeed remove the current user-visible
> '--with-cuda-driver' etc., but keep the underlying functionality
> available for the developers.  That's to address the point you'd made in
> the "Proposal to remove '--with-cuda-driver'" thread: that it still
> "could be useful for debugging / comparison purposes" -- and especially
> for development purposes, in my opinion: if you develop CUDA API-level
> changes in the libgomp nvptx plugin, it's likely to be easier to just use
> the full CUDA toolkit 'cuda.h' and directly link against libcuda (so that
> you've got all symbols etc. available), and only once you know what
> exactly you need, update GCC's 'include/cuda/cuda.h' and
> 'libgomp/plugin/cuda-lib.def'.
> 
> With that hopefully clarified, OK to push the re-attached
> "libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into
> 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA'"?
> 

Ack, understood, thanks for the detailed explanation.

LGTM.

Thanks,
- Tom

>> Is
>> there a problem with using gcc's cuda.h?
> 
> No, all good.
> 
> 
> Regards
>   Thomas
> 
> 
> -----------------
> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Registration court: München, HRB 106955
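
A minimal sketch of the split discussed in this message, assuming nothing beyond the two macro names taken from the patch: one macro selects which 'cuda.h' is used, the other selects how the CUDA Driver library is bound. The 'DEMO_*' macros and 'demo_*' functions are hypothetical and exist only for illustration; they are not part of libgomp.

```c
#include <stdio.h>

/* Which 'cuda.h' is used?  Only the macro name comes from the patch;
   the strings and helpers below are illustrative.  */
#ifndef PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H
# define DEMO_CUDA_HEADER "GCC-shipped include/cuda/cuda.h"
#else
# define DEMO_CUDA_HEADER "system <cuda.h>"
#endif

/* How is the CUDA Driver library bound?  */
#ifndef PLUGIN_NVPTX_LINK_LIBCUDA
# define DEMO_CUDA_BINDING "dlopen libcuda.so.1 at run time"
#else
# define DEMO_CUDA_BINDING "link against libcuda at build time"
#endif

/* Report the header choice made at compile time.  */
const char *
demo_cuda_header (void)
{
  return DEMO_CUDA_HEADER;
}

/* Report the binding choice made at compile time.  */
const char *
demo_cuda_binding (void)
{
  return DEMO_CUDA_BINDING;
}

/* Print both choices; they are decided independently of each other.  */
void
demo_print_config (void)
{
  printf ("header:  %s\nbinding: %s\n",
	  demo_cuda_header (), demo_cuda_binding ());
}
```

Compiling this with only '-DPLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' would select the system header while still taking the dlopen path, a combination that the single 'PLUGIN_NVPTX_DYNAMIC' macro could not express. In the patch itself, 'Makefrag.am' still sets both macros from the one Automake conditional.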


* 'include/cuda/cuda.h': For C++, wrap in 'extern "C"' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
                   ` (4 preceding siblings ...)
  2022-04-07 22:27 ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' " Thomas Schwinge
@ 2022-05-18 10:08 ` Thomas Schwinge
  2022-05-18 10:11 ` 'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run' " Thomas Schwinge
  2024-03-07 11:53 ` GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' " Thomas Schwinge
  7 siblings, 0 replies; 24+ messages in thread
From: Thomas Schwinge @ 2022-05-18 10:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> cuda.h header included
> in this patch

To make this '#include'able in C++ code, I've pushed to master branch
commit bdd1dc1bfbe1492edf3ce5e4288cfbc55be329ab
"'include/cuda/cuda.h': For C++, wrap in 'extern "C"'", see attached.


Regards
 Thomas



[-- Attachment #2: 0001-include-cuda-cuda.h-For-C-wrap-in-extern-C.patch --]
[-- Type: text/x-diff, Size: 949 bytes --]

From bdd1dc1bfbe1492edf3ce5e4288cfbc55be329ab Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 29 Apr 2022 10:33:15 +0200
Subject: [PATCH] 'include/cuda/cuda.h': For C++, wrap in 'extern "C"'

	include/
	* cuda/cuda.h: For C++, wrap in 'extern "C"'.
---
 include/cuda/cuda.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 5c813ad2cf8..d7105fb331e 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -32,6 +32,10 @@ the proprietary CUDA toolkit.  */
 
 #define CUDA_VERSION 8000
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 typedef void *CUcontext;
 typedef int CUdevice;
 #if defined(__LP64__) || defined(_WIN64)
@@ -191,4 +195,8 @@ CUresult cuStreamQuery (CUstream);
 CUresult cuStreamSynchronize (CUstream);
 CUresult cuStreamWaitEvent (CUstream, CUevent, unsigned);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* GCC_CUDA_H */
-- 
2.35.1



* 'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
                   ` (5 preceding siblings ...)
  2022-05-18 10:08 ` 'include/cuda/cuda.h': For C++, wrap in 'extern "C"' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)) Thomas Schwinge
@ 2022-05-18 10:11 ` Thomas Schwinge
  2024-03-07 11:53 ` GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' " Thomas Schwinge
  7 siblings, 0 replies; 24+ messages in thread
From: Thomas Schwinge @ 2022-05-18 10:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 674 bytes --]

Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> cuda.h header included
> in this patch

In order to be able to use that file without changes for
nvptx-tools 'nvptx-run', I've pushed to GCC master branch
commit 86f64400a5692499856d41462461327b93f82b8d
"'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run'",
see attached.


Regards
 Thomas



[-- Attachment #2: 0001-include-cuda-cuda.h-Add-parts-necessary-for-nvptx-to.patch --]
[-- Type: text/x-diff, Size: 2102 bytes --]

From 86f64400a5692499856d41462461327b93f82b8d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 29 Apr 2022 10:44:12 +0200
Subject: [PATCH] 'include/cuda/cuda.h': Add parts necessary for nvptx-tools
 'nvptx-run'

	include/
	* cuda/cuda.h (enum CUjit_option): Add
	'CU_JIT_GENERATE_DEBUG_INFO', 'CU_JIT_GENERATE_LINE_INFO'.
	(enum CUlimit): Add 'CU_LIMIT_STACK_SIZE',
	'CU_LIMIT_MALLOC_HEAP_SIZE'.
	(cuCtxSetLimit, cuGetErrorName): Add.
---
 include/cuda/cuda.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index d7105fb331e..3938d05d150 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -97,7 +97,9 @@ typedef enum {
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
   CU_JIT_OPTIMIZATION_LEVEL = 7,
-  CU_JIT_LOG_VERBOSE = 12
+  CU_JIT_GENERATE_DEBUG_INFO = 11,
+  CU_JIT_LOG_VERBOSE = 12,
+  CU_JIT_GENERATE_LINE_INFO = 13,
 } CUjit_option;
 
 typedef enum {
@@ -117,6 +119,11 @@ enum {
   CU_STREAM_NON_BLOCKING = 1
 };
 
+typedef enum {
+  CU_LIMIT_STACK_SIZE = 0x00,
+  CU_LIMIT_MALLOC_HEAP_SIZE = 0x02,
+} CUlimit;
+
 #define cuCtxCreate cuCtxCreate_v2
 CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice);
 #define cuCtxDestroy cuCtxDestroy_v2
@@ -128,6 +135,7 @@ CUresult cuCtxPopCurrent (CUcontext *);
 #define cuCtxPushCurrent cuCtxPushCurrent_v2
 CUresult cuCtxPushCurrent (CUcontext);
 CUresult cuCtxSynchronize (void);
+CUresult cuCtxSetLimit (CUlimit, size_t);
 CUresult cuDeviceGet (CUdevice *, int);
 #define cuDeviceTotalMem cuDeviceTotalMem_v2
 CUresult cuDeviceTotalMem (size_t *, CUdevice);
@@ -143,6 +151,7 @@ CUresult cuEventRecord (CUevent, CUstream);
 CUresult cuEventSynchronize (CUevent);
 CUresult cuFuncGetAttribute (int *, CUfunction_attribute, CUfunction);
 CUresult cuGetErrorString (CUresult, const char **);
+CUresult cuGetErrorName (CUresult, const char **);
 CUresult cuInit (unsigned);
 CUresult cuDriverGetVersion (int *);
 CUresult cuLaunchKernel (CUfunction, unsigned, unsigned, unsigned, unsigned,
-- 
2.35.1



* GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
                   ` (6 preceding siblings ...)
  2022-05-18 10:11 ` 'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run' " Thomas Schwinge
@ 2024-03-07 11:53 ` Thomas Schwinge
  2024-03-07 11:55   ` Jakub Jelinek
  7 siblings, 1 reply; 24+ messages in thread
From: Thomas Schwinge @ 2024-03-07 11:53 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches; +Cc: Tobias Burnus, Andrew Stubbs

[-- Attachment #1: Type: text/plain, Size: 1608 bytes --]

Hi!

On 2017-01-13T19:11:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> [...] If the nvptx libgomp plugin is installed, but libcuda.so.1
> can't be found, then the plugin behaves as if there are no PTX devices
> available.  [...]

ACK.

> --- libgomp/plugin/plugin-nvptx.c.jj	2017-01-13 12:07:56.000000000 +0100
> +++ libgomp/plugin/plugin-nvptx.c	2017-01-13 18:00:39.693284346 +0100

> +/* -1 if init_cuda_lib has not been called yet, false
> +   if it has been and failed, true if it has been and succeeded.  */
> +static char cuda_lib_inited = -1;
>  
> -  return desc;
> +/* Dynamically load the CUDA runtime library and initialize function
> +   pointers, return false if unsuccessful, true if successful.  */
> +static bool
> +init_cuda_lib (void)
> +{
> +  if (cuda_lib_inited != -1)
> +    return cuda_lib_inited;
> +  const char *cuda_runtime_lib = "libcuda.so.1";
> +  void *h = dlopen (cuda_runtime_lib, RTLD_LAZY);
> +  cuda_lib_inited = false;
> +  if (h == NULL)
> +    return false;

..., so this has to stay.

> +# undef CUDA_ONE_CALL
> +# define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
> +# define CUDA_ONE_CALL_1(call) \
> +  cuda_lib.call = dlsym (h, #call);	\
> +  if (cuda_lib.call == NULL)		\
> +    return false;

However, this (missing symbol) I'd like to make a fatal error, instead of
silently disabling the plugin/device.  OK to push the attached
"GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1'"?

> +  [...]
> +  cuda_lib_inited = true;
> +  return true;
>  }


Regards
 Thomas



[-- Attachment #2: 0001-GCN-nvptx-Fatal-error-for-missing-symbols-in-libhsa-.patch --]
[-- Type: text/x-diff, Size: 2013 bytes --]

From 6a6520e01f7e7118b556683c2934f2c64c6dbc81 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Thu, 7 Mar 2024 12:31:52 +0100
Subject: [PATCH] GCN, nvptx: Fatal error for missing symbols in
 'libhsa-runtime64.so.1', 'libcuda.so.1'

If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding
libgomp plugin/device gets disabled, as before.  But if they are available,
report any inconsistencies such as missing symbols, similar to how we fail in
presence of other issues during device initialization.

	libgomp/
	* plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error
	for missing symbols.
	* plugin/plugin-nvptx.c (init_cuda_lib): Likewise.
---
 libgomp/plugin/plugin-gcn.c   | 3 ++-
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 464164afb03..338225db6f4 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1382,9 +1382,10 @@ init_hsa_runtime_functions (void)
 #define DLSYM_FN(function) \
   hsa_fns.function##_fn = dlsym (handle, #function); \
   if (hsa_fns.function##_fn == NULL) \
-    return false;
+    GOMP_PLUGIN_fatal ("'%s' is missing '%s'", hsa_runtime_lib, #function);
 #define DLSYM_OPT_FN(function) \
   hsa_fns.function##_fn = dlsym (handle, #function);
+
   void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
   if (handle == NULL)
     return false;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3fd6cd42fa6..ffb1db67d20 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -127,7 +127,7 @@ init_cuda_lib (void)
 # define CUDA_ONE_CALL_1(call, allow_null)		\
   cuda_lib.call = dlsym (h, #call);	\
   if (!allow_null && cuda_lib.call == NULL)		\
-    return false;
+    GOMP_PLUGIN_fatal ("'%s' is missing '%s'", cuda_runtime_lib, #call);
 #include "cuda-lib.def"
 # undef CUDA_ONE_CALL
 # undef CUDA_ONE_CALL_1
-- 
2.34.1



* Re: GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches))
  2024-03-07 11:53 ` GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' " Thomas Schwinge
@ 2024-03-07 11:55   ` Jakub Jelinek
  0 siblings, 0 replies; 24+ messages in thread
From: Jakub Jelinek @ 2024-03-07 11:55 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Tobias Burnus, Andrew Stubbs

On Thu, Mar 07, 2024 at 12:53:31PM +0100, Thomas Schwinge wrote:
> From 6a6520e01f7e7118b556683c2934f2c64c6dbc81 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge <tschwinge@baylibre.com>
> Date: Thu, 7 Mar 2024 12:31:52 +0100
> Subject: [PATCH] GCN, nvptx: Fatal error for missing symbols in
>  'libhsa-runtime64.so.1', 'libcuda.so.1'
> 
> If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding
> libgomp plugin/device gets disabled, as before.  But if they are available,
> report any inconsistencies such as missing symbols, similar to how we fail in
> presence of other issues during device initialization.
> 
> 	libgomp/
> 	* plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error
> 	for missing symbols.
> 	* plugin/plugin-nvptx.c (init_cuda_lib): Likewise.

Ok.

	Jakub



Thread overview: 24+ messages
2017-01-13 18:11 [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches) Jakub Jelinek
2017-01-13 18:19 ` Joseph Myers
2017-01-13 18:28   ` Jakub Jelinek
2017-01-16 19:23     ` Jeff Law
2017-01-18 20:27 ` Alexander Monakov
2017-01-18 22:18   ` Jakub Jelinek
2017-01-19 15:10     ` Alexander Monakov
2017-01-19 15:55       ` Jakub Jelinek
2017-01-21 15:28 ` Thomas Schwinge
2017-01-21 19:13   ` Jakub Jelinek
2017-05-03  9:08   ` Jakub Jelinek
2017-05-04 17:26     ` Thomas Schwinge
2022-04-06 12:39 ` Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)) Thomas Schwinge
2022-04-06 12:41   ` Jakub Jelinek
2022-04-07 22:27 ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' " Thomas Schwinge
2022-04-08  7:35   ` Tom de Vries
2022-04-28 13:45     ` libgomp nvptx plugin: Split 'PLUGIN_NVPTX_DYNAMIC' into 'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and 'PLUGIN_NVPTX_LINK_LIBCUDA' Thomas Schwinge
2022-05-05 19:18       ` [PING] " Thomas Schwinge
2022-05-12 11:57         ` [PING^2] " Thomas Schwinge
2022-05-12 12:09       ` Tom de Vries
2022-05-18 10:08 ` 'include/cuda/cuda.h': For C++, wrap in 'extern "C"' (was: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)) Thomas Schwinge
2022-05-18 10:11 ` 'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run' " Thomas Schwinge
2024-03-07 11:53 ` GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', 'libcuda.so.1' " Thomas Schwinge
2024-03-07 11:55   ` Jakub Jelinek
