public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* nvptx: Support global constructors/destructors via 'collect2'
       [not found] <878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com>
@ 2022-12-02 13:35 ` Thomas Schwinge
  2022-12-20  8:03   ` [PING] " Thomas Schwinge
                     ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Thomas Schwinge @ 2022-12-02 13:35 UTC (permalink / raw)
  To: gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

Hi!

On 2022-12-01T22:13:38+0100, I wrote:
> I'm working on support for global constructors/destructors with
> GCC/nvptx

See "nvptx: Support global constructors/destructors via 'collect2'"
attached; OK to push?  (... with 'gcc/doc/install.texi' accordingly
updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
"'nm'" and newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'" have been merged; any
comments to those?)

Per my quick scanning of 'gcc/config.gcc' history, for more than two
decades, there was a clear trend to remove 'use_collect2=yes'
configurations; now finally a new one is being added -- making sure we're
not slowly dispensing with the need for the early 1990s piece of work
that 'gcc/collect2*' is...  ;'-P


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 10745 bytes --]

From ba5f6471d39e684fb740523651138a90a1b63cf9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".

This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".

	gcc/
	* collect2.cc (write_c_file_glob): Allow for
	'COLLECT2_MAIN_REFERENCE' override.
	* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
	* config/nvptx/nvptx.h: Adjust.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
	'crtend.o' to 'extra_parts'.
	* config/nvptx/crt0.c: Invoke '__do_global_ctors',
	'__do_global_dtors'.
	* config/nvptx/crtstuff.c: New.
	* config/nvptx/t-nvptx: Adjust.
---
 gcc/collect2.cc                               |  4 ++
 gcc/config.gcc                                |  1 +
 gcc/config/nvptx/nvptx.h                      | 35 ++++++++++-
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp         |  3 +-
 libgcc/config.host                            |  2 +-
 libgcc/config/nvptx/crt0.c                    |  5 ++
 libgcc/config/nvptx/crtstuff.c                | 58 +++++++++++++++++++
 libgcc/config/nvptx/t-nvptx                   | 15 ++++-
 9 files changed, 118 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/nvptx/crtstuff.c

diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16..945a9ff86dd 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
     fprintf (stream, "\tdereg_frame,\n");
   fprintf (stream, "\t0\n};\n\n");
 
+# ifdef COLLECT2_MAIN_REFERENCE
+  fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
   fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
   fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
 }
 #endif /* ! LD_INIT_SWITCH */
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index b5eda046033..9b27efd5f51 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2783,6 +2783,7 @@ nvptx-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
 	tmake_file="nvptx/t-nvptx"
+	use_collect2=yes
 	if test x$enable_as_accelerator = xyes; then
 		extra_programs="${extra_programs} mkoffload\$(exeext)"
 		tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index b1cbe5d417b..bc1021a8031 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
    '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
 #define ASM_SPEC "%{v}"
 
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+  STARTFILE_SPEC_MMAINKERNEL \
+  " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+  ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+   'collect2' and the following helpers.  */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+   (see 'libgcc/config/nvptx/crt0.c:__main').  */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+   function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+   evidently evaluate to NULL in JIT compilation.  Avoiding the use of
+   assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+   function next to the arrays apparently does work around this issue...
+
+   The default '__main_reference' synthesized by 'collect2' refers to our
+   'crt0.o' '__main' function with incompatible signature:
+
+       error   : Function '__main' not declared __global__ in all source files
+
+   Address both these issues via 'COLLECT2_MAIN_REFERENCE'.  */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 
@@ -354,7 +386,6 @@ struct GTY(()) machine_function
 #define MOVE_MAX 8
 #define MOVE_RATIO(SPEED) 4
 #define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
 
 /* The C++ front end insists to link against libstdc++ -- which we don't build.
    Tell it to instead link against the innocuous libgcc.  */
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a68479..5b4101cf596 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
   return foo ();
 }
 
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
 /* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d2de761adb5..ba9b045ebd4 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -907,8 +907,7 @@ proc check_effective_target_nonlocal_goto {} {
 # Return 1 if global constructors are supported, 0 otherwise.
 
 proc check_effective_target_global_constructor {} {
-    if { [istarget nvptx-*-*]
-	 || [istarget bpf-*-*] } {
+    if { [istarget bpf-*-*] } {
 	return 0
     }
     return 1
diff --git a/libgcc/config.host b/libgcc/config.host
index eb23abe89f5..25072f41860 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1499,7 +1499,7 @@ m32c-*-elf*)
  	;;
 nvptx-*)
 	tmake_file="$tmake_file nvptx/t-nvptx"
-	extra_parts="crt0.o"
+	extra_parts="crt0.o crtbegin.o crtend.o"
 	;;
 *)
 	echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae..6a49790f479 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,8 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include "gbl-ctors.h"
+
 int *__exitval_ptr;
 
 extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +49,8 @@ __main (int *rval_ptr, int argc, void **argv)
   __nvptx_stacks[0] = stack + sizeof stack;
   __nvptx_uni[0] = 0;
 
+  __do_global_ctors ();
+  atexit (__do_global_dtors);
+
   exit (main (argc, argv));
 }
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 00000000000..0823fc49901
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+   own.
+
+   There's no technical reason in this configuration here to split the two
+   functions '__do_global_ctors' and '__do_global_ctors' into two separate
+   files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+   do so anyway, for symmetry with other configurations.  */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+  DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+  /* In this configuration here, there's no way that "this routine is run more
+     than once [...] when exit is called recursively": for nvptx target, the
+     call to '__do_global_dtors' is registered via 'atexit', which doesn't
+     re-enter a function already run.
+     Therefore, we do *not* "arrange to remember where in the list we left off
+     processing".  */
+  func_ptr *p;
+  for (p = __DTOR_LIST__ + 1; *p; )
+    (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87d..9a0454c3a4d 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/atomic.c
 
 LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
 
 crt0.o: $(srcdir)/config/nvptx/crt0.c
 	$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
 # support it, and it may cause the build to fail, because of alloca usage, for
 # example.
 INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PING] nvptx: Support global constructors/destructors via 'collect2'
  2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
@ 2022-12-20  8:03   ` Thomas Schwinge
  2023-01-11 11:48     ` [PING^2] " Thomas Schwinge
  2023-01-24  9:01     ` Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
  2022-12-23 13:35   ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Thomas Schwinge @ 2022-12-20  8:03 UTC (permalink / raw)
  To: gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]

Hi!

Ping.


Minor change in the attached
"nvptx: Support global constructors/destructors via 'collect2'": for
'atexit', add '#include <stdlib.h>' to 'libgcc/config/nvptx/crt0.c'.


Grüße
 Thomas


On 2022-12-02T14:35:35+0100, I wrote:
> Hi!
>
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> attached; OK to push?  (... with 'gcc/doc/install.texi' accordingly
> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
> "'nm'" and newlib
> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
> comments to those?)
>
> Per my quick scanning of 'gcc/config.gcc' history, for more than two
> decades, there was a clear trend to remove 'use_collect2=yes'
> configurations; now finally a new one is being added -- making sure we're
> not slowly dispensing with the need for the early 1990s piece of work
> that 'gcc/collect2*' is...  ;'-P
>
>
> Grüße
>  Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 10784 bytes --]

From 0e7cf5a9f83c3a82eafa126886e5d92651bfbb30 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".

This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".

	gcc/
	* collect2.cc (write_c_file_glob): Allow for
	'COLLECT2_MAIN_REFERENCE' override.
	* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
	* config/nvptx/nvptx.h: Adjust.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
	'crtend.o' to 'extra_parts'.
	* config/nvptx/crt0.c: Invoke '__do_global_ctors',
	'__do_global_dtors'.
	* config/nvptx/crtstuff.c: New.
	* config/nvptx/t-nvptx: Adjust.
---
 gcc/collect2.cc                               |  4 ++
 gcc/config.gcc                                |  1 +
 gcc/config/nvptx/nvptx.h                      | 35 ++++++++++-
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp         |  3 +-
 libgcc/config.host                            |  2 +-
 libgcc/config/nvptx/crt0.c                    |  6 ++
 libgcc/config/nvptx/crtstuff.c                | 58 +++++++++++++++++++
 libgcc/config/nvptx/t-nvptx                   | 15 ++++-
 9 files changed, 119 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/nvptx/crtstuff.c

diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16a..945a9ff86dda 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
     fprintf (stream, "\tdereg_frame,\n");
   fprintf (stream, "\t0\n};\n\n");
 
+# ifdef COLLECT2_MAIN_REFERENCE
+  fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
   fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
   fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
 }
 #endif /* ! LD_INIT_SWITCH */
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 951902338205..fec67d7b6e40 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2784,6 +2784,7 @@ nvptx-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
 	tmake_file="nvptx/t-nvptx"
+	use_collect2=yes
 	if test x$enable_as_accelerator = xyes; then
 		extra_programs="${extra_programs} mkoffload\$(exeext)"
 		tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index dc676dcb5fc5..235c1e4d99d5 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
    '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
 #define ASM_SPEC "%{v}"
 
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+  STARTFILE_SPEC_MMAINKERNEL \
+  " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+  ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+   'collect2' and the following helpers.  */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+   (see 'libgcc/config/nvptx/crt0.c:__main').  */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+   function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+   evidently evaluate to NULL in JIT compilation.  Avoiding the use of
+   assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+   function next to the arrays apparently does work around this issue...
+
+   The default '__main_reference' synthesized by 'collect2' refers to our
+   'crt0.o' '__main' function with incompatible signature:
+
+       error   : Function '__main' not declared __global__ in all source files
+
+   Address both these issues via 'COLLECT2_MAIN_REFERENCE'.  */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 
@@ -348,7 +380,6 @@ struct GTY(()) machine_function
 #define MOVE_MAX 8
 #define MOVE_RATIO(SPEED) 4
 #define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
 
 /* The C++ front end insists to link against libstdc++ -- which we don't build.
    Tell it to instead link against the innocuous libgcc.  */
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a684791..5b4101cf596d 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
   return foo ();
 }
 
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
 /* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ea06e21c3a14..b1b1c5b36bc2 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -907,8 +907,7 @@ proc check_effective_target_nonlocal_goto {} {
 # Return 1 if global constructors are supported, 0 otherwise.
 
 proc check_effective_target_global_constructor {} {
-    if { [istarget nvptx-*-*]
-	 || [istarget bpf-*-*] } {
+    if { [istarget bpf-*-*] } {
 	return 0
     }
     return 1
diff --git a/libgcc/config.host b/libgcc/config.host
index eb23abe89f5e..25072f41860c 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1499,7 +1499,7 @@ m32c-*-elf*)
  	;;
 nvptx-*)
 	tmake_file="$tmake_file nvptx/t-nvptx"
-	extra_parts="crt0.o"
+	extra_parts="crt0.o crtbegin.o crtend.o"
 	;;
 *)
 	echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae7..860e2bfacadd 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,9 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <stdlib.h>
+#include "gbl-ctors.h"
+
 int *__exitval_ptr;
 
 extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
   __nvptx_stacks[0] = stack + sizeof stack;
   __nvptx_uni[0] = 0;
 
+  __do_global_ctors ();
+  atexit (__do_global_dtors);
+
   exit (main (argc, argv));
 }
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 000000000000..0823fc499019
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+   own.
+
+   There's no technical reason in this configuration here to split the two
+   functions '__do_global_ctors' and '__do_global_ctors' into two separate
+   files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+   do so anyway, for symmetry with other configurations.  */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+  DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+  /* In this configuration here, there's no way that "this routine is run more
+     than once [...] when exit is called recursively": for nvptx target, the
+     call to '__do_global_dtors' is registered via 'atexit', which doesn't
+     re-enter a function already run.
+     Therefore, we do *not* "arrange to remember where in the list we left off
+     processing".  */
+  func_ptr *p;
+  for (p = __DTOR_LIST__ + 1; *p; )
+    (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87dd..9a0454c3a4d0 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/atomic.c
 
 LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
 
 crt0.o: $(srcdir)/config/nvptx/crt0.c
 	$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
 # support it, and it may cause the build to fail, because of alloca usage, for
 # example.
 INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
-- 
2.35.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
  2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
  2022-12-20  8:03   ` [PING] " Thomas Schwinge
@ 2022-12-23 13:35   ` Thomas Schwinge
  2022-12-23 13:37     ` Thomas Schwinge
                       ` (2 more replies)
  2023-01-20 20:41   ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
  2024-05-31 13:15   ` nvptx target: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
  3 siblings, 3 replies; 19+ messages in thread
From: Thomas Schwinge @ 2022-12-23 13:35 UTC (permalink / raw)
  To: gcc-patches, Tom de Vries

Hi!

On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> [posted before]

Building on that, attached is now the additional "for offloading" piece:
"nvptx: Support global constructors/destructors via 'collect2' for offloading".
OK to push?

I did manually test this (by putting a few constructors/destructors into
'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
and also in my WIP development tree with standard libgfortran
constructors (with 'LIBGFOR_MINIMAL' disabled).


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 19+ messages in thread

* nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
  2022-12-23 13:35   ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
@ 2022-12-23 13:37     ` Thomas Schwinge
  2023-01-11 11:49       ` [PING] " Thomas Schwinge
  2023-01-20 20:46     ` [og12] " Thomas Schwinge
  2024-06-06 12:02     ` nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2' for offloading) Thomas Schwinge
  2 siblings, 1 reply; 19+ messages in thread
From: Thomas Schwinge @ 2022-12-23 13:37 UTC (permalink / raw)
  To: gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]
>
> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for offloading".
> OK to push?

Now really attached.

> I did manually test this (by putting a few constructors/destructors into
> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
> and also in my WIP development tree with standard libgfortran
> constructors (with 'LIBGFOR_MINIMAL' disabled).


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 8131 bytes --]

From fb67006eeca0c8e2bfdf86576ed3109dacaf6868 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
 for offloading

This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.

	libgcc/
	* config/nvptx/crtstuff.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/crtstuff.c |  64 ++++++++++++++++++-
 libgomp/plugin/plugin-nvptx.c  | 113 ++++++++++++++++++++++++++++++++-
 2 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
    files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
    do so anyway, for symmetry with other configurations.  */
 
+
+/* See 'crt0.c', 'mgomp.c'.  */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
 #ifdef CRT_BEGIN
 
 void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
   DO_GLOBAL_CTORS_BODY;
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __do_global_ctors ();
+}
+
+# endif
+
 #elif defined(CRT_END) /* ! CRT_BEGIN */
 
 void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
   /* In this configuration here, there's no way that "this routine is run more
      than once [...] when exit is called recursively": for nvptx target, the
      call to '__do_global_dtors' is registered via 'atexit', which doesn't
-     re-enter a function already run.
+     re-enter a function already run, and neither does nvptx offload target.
      Therefore, we do *not* "arrange to remember where in the list we left off
      processing".  */
   func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
     (*p++) ();
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __do_global_dtors ();
+}
+
+# endif
+
 #else /* ! CRT_BEGIN && ! CRT_END */
 #error "One of CRT_BEGIN or CRT_END must be defined."
 #endif
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index fcc97c6e0d5..395639537e8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -338,6 +338,11 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+				    const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -557,6 +562,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
     return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+       image != NULL;
+       image = image->next)
+    {
+      if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+				   "__do_global_dtors__entry"))
+	ret = false;
+    }
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
     {
       struct ptx_free_block *b_next = b->next;
@@ -577,7 +593,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
     CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1280,6 +1297,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
     GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
 
+/* Invoke MODULE's global constructors/destructors.  */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+			const char *funcname)
+{
+  bool ret = true;
+  char *funcname_mgomp = NULL;
+  CUresult r;
+  CUfunction funcptr;
+  r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			 &funcptr, module, funcname);
+  GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+		     funcname, cuda_error (r));
+  if (r == CUDA_ERROR_NOT_FOUND)
+    {
+      /* Try '[funcname]__mgomp'.  */
+
+      size_t funcname_len = strlen (funcname);
+      const char *mgomp_suffix = "__mgomp";
+      size_t mgomp_suffix_len = strlen (mgomp_suffix);
+      funcname_mgomp
+	= GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+      memcpy (funcname_mgomp, funcname, funcname_len);
+      memcpy (funcname_mgomp + funcname_len,
+	      mgomp_suffix, mgomp_suffix_len + 1);
+      funcname = funcname_mgomp;
+
+      r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			     &funcptr, module, funcname);
+      GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+			 funcname, cuda_error (r));
+    }
+  if (r == CUDA_ERROR_NOT_FOUND)
+    ;
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+			 funcname, cuda_error (r));
+      ret = false;
+    }
+  else
+    {
+      /* If necessary, set up soft stack.  */
+      void *nvptx_stacks_0;
+      void *kargs[1];
+      if (funcname_mgomp)
+	{
+	  size_t stack_size = nvptx_stacks_size ();
+	  pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+	  nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+	  nvptx_stacks_0 += stack_size;
+	  kargs[0] = &nvptx_stacks_0;
+	}
+      r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+			     funcptr,
+			     1, 1, 1, 1, 1, 1,
+			     /* sharedMemBytes */ 0,
+			     /* hStream */ NULL,
+			     /* kernelParams */ funcname_mgomp ? kargs : NULL,
+			     /* extra */ NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+			     NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      if (funcname_mgomp)
+	pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+    }
+
+  if (funcname_mgomp)
+    free (funcname_mgomp);
+
+  return ret;
+}
+
 /* Load the (partial) program described by TARGET_DATA to device
    number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
    will contain the on-device addresses of the functions for reverse offload.
@@ -1452,6 +1556,9 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   nvptx_set_clocktick (module, dev);
 
+  if (!nvptx_do_global_cdtors (module, dev, "__do_global_ctors__entry"))
+    return -1;
+
   return fn_entries + var_entries + other_entries;
 }
 
@@ -1477,6 +1584,10 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
     if (image->target_data == target_data)
       {
+	if (!nvptx_do_global_cdtors (image->module, dev,
+				     "__do_global_dtors__entry"))
+	  ret = false;
+
 	*prev_p = image->next;
 	if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
 	  ret = false;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PING^2] nvptx: Support global constructors/destructors via 'collect2'
  2022-12-20  8:03   ` [PING] " Thomas Schwinge
@ 2023-01-11 11:48     ` Thomas Schwinge
  2023-01-24  9:01     ` Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Schwinge @ 2023-01-11 11:48 UTC (permalink / raw)
  To: gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1558 bytes --]

Hi!

Ping.


Grüße
 Thomas


On 2022-12-20T09:03:51+0100, I wrote:
> Hi!
>
> Ping.
>
>
> Minor change in the attached
> "nvptx: Support global constructors/destructors via 'collect2'": for
> 'atexit', add '#include <stdlib.h>' to 'libgcc/config/nvptx/crt0.c'.
>
>
> Grüße
>  Thomas
>
>
> On 2022-12-02T14:35:35+0100, I wrote:
>> Hi!
>>
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> attached; OK to push?  (... with 'gcc/doc/install.texi' accordingly
>> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
>> "'nm'" and newlib
>> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
>> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
>> comments to those?)
>>
>> Per my quick scanning of 'gcc/config.gcc' history, for more than two
>> decades, there was a clear trend to remove 'use_collect2=yes'
>> configurations; now finally a new one is being added -- making sure we're
>> not slowly dispensing with the need for the early 1990s piece of work
>> that 'gcc/collect2*' is...  ;'-P
>>
>>
>> Grüße
>>  Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 10784 bytes --]

From 0e7cf5a9f83c3a82eafa126886e5d92651bfbb30 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".

This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".

	gcc/
	* collect2.cc (write_c_file_glob): Allow for
	'COLLECT2_MAIN_REFERENCE' override.
	* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
	* config/nvptx/nvptx.h: Adjust.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
	'crtend.o' to 'extra_parts'.
	* config/nvptx/crt0.c: Invoke '__do_global_ctors',
	'__do_global_dtors'.
	* config/nvptx/crtstuff.c: New.
	* config/nvptx/t-nvptx: Adjust.
---
 gcc/collect2.cc                               |  4 ++
 gcc/config.gcc                                |  1 +
 gcc/config/nvptx/nvptx.h                      | 35 ++++++++++-
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp         |  3 +-
 libgcc/config.host                            |  2 +-
 libgcc/config/nvptx/crt0.c                    |  6 ++
 libgcc/config/nvptx/crtstuff.c                | 58 +++++++++++++++++++
 libgcc/config/nvptx/t-nvptx                   | 15 ++++-
 9 files changed, 119 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/nvptx/crtstuff.c

diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16a..945a9ff86dda 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
     fprintf (stream, "\tdereg_frame,\n");
   fprintf (stream, "\t0\n};\n\n");
 
+# ifdef COLLECT2_MAIN_REFERENCE
+  fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
   fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
   fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
 }
 #endif /* ! LD_INIT_SWITCH */
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 951902338205..fec67d7b6e40 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2784,6 +2784,7 @@ nvptx-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
 	tmake_file="nvptx/t-nvptx"
+	use_collect2=yes
 	if test x$enable_as_accelerator = xyes; then
 		extra_programs="${extra_programs} mkoffload\$(exeext)"
 		tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index dc676dcb5fc5..235c1e4d99d5 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
    '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
 #define ASM_SPEC "%{v}"
 
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+  STARTFILE_SPEC_MMAINKERNEL \
+  " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+  ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+   'collect2' and the following helpers.  */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+   (see 'libgcc/config/nvptx/crt0.c:__main').  */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+   function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+   evidently evaluate to NULL in JIT compilation.  Avoiding the use of
+   assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+   function next to the arrays apparently does work around this issue...
+
+   The default '__main_reference' synthesized by 'collect2' refers to our
+   'crt0.o' '__main' function with incompatible signature:
+
+       error   : Function '__main' not declared __global__ in all source files
+
+   Address both these issues via 'COLLECT2_MAIN_REFERENCE'.  */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 
@@ -348,7 +380,6 @@ struct GTY(()) machine_function
 #define MOVE_MAX 8
 #define MOVE_RATIO(SPEED) 4
 #define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
 
 /* The C++ front end insists to link against libstdc++ -- which we don't build.
    Tell it to instead link against the innocuous libgcc.  */
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a684791..5b4101cf596d 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
   return foo ();
 }
 
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
 /* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ea06e21c3a14..b1b1c5b36bc2 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -907,8 +907,7 @@ proc check_effective_target_nonlocal_goto {} {
 # Return 1 if global constructors are supported, 0 otherwise.
 
 proc check_effective_target_global_constructor {} {
-    if { [istarget nvptx-*-*]
-	 || [istarget bpf-*-*] } {
+    if { [istarget bpf-*-*] } {
 	return 0
     }
     return 1
diff --git a/libgcc/config.host b/libgcc/config.host
index eb23abe89f5e..25072f41860c 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1499,7 +1499,7 @@ m32c-*-elf*)
  	;;
 nvptx-*)
 	tmake_file="$tmake_file nvptx/t-nvptx"
-	extra_parts="crt0.o"
+	extra_parts="crt0.o crtbegin.o crtend.o"
 	;;
 *)
 	echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae7..860e2bfacadd 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,9 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <stdlib.h>
+#include "gbl-ctors.h"
+
 int *__exitval_ptr;
 
 extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
   __nvptx_stacks[0] = stack + sizeof stack;
   __nvptx_uni[0] = 0;
 
+  __do_global_ctors ();
+  atexit (__do_global_dtors);
+
   exit (main (argc, argv));
 }
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 000000000000..0823fc499019
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+   own.
+
+   There's no technical reason in this configuration here to split the two
+   functions '__do_global_ctors' and '__do_global_ctors' into two separate
+   files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+   do so anyway, for symmetry with other configurations.  */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+  DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+  /* In this configuration here, there's no way that "this routine is run more
+     than once [...] when exit is called recursively": for nvptx target, the
+     call to '__do_global_dtors' is registered via 'atexit', which doesn't
+     re-enter a function already run.
+     Therefore, we do *not* "arrange to remember where in the list we left off
+     processing".  */
+  func_ptr *p;
+  for (p = __DTOR_LIST__ + 1; *p; )
+    (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87dd..9a0454c3a4d0 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/atomic.c
 
 LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
 
 crt0.o: $(srcdir)/config/nvptx/crt0.c
 	$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
 # support it, and it may cause the build to fail, because of alloca usage, for
 # example.
 INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
-- 
2.35.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PING] nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
  2022-12-23 13:37     ` Thomas Schwinge
@ 2023-01-11 11:49       ` Thomas Schwinge
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Schwinge @ 2023-01-11 11:49 UTC (permalink / raw)
  To: gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]

Hi!

Ping.


Grüße
 Thomas


On 2022-12-23T14:37:47+0100, I wrote:
> Hi!
>
> On 2022-12-23T14:35:16+0100, I wrote:
>> On 2022-12-02T14:35:35+0100, I wrote:
>>> On 2022-12-01T22:13:38+0100, I wrote:
>>>> I'm working on support for global constructors/destructors with
>>>> GCC/nvptx
>>>
>>> See "nvptx: Support global constructors/destructors via 'collect2'"
>>> [posted before]
>>
>> Building on that, attached is now the additional "for offloading" piece:
>> "nvptx: Support global constructors/destructors via 'collect2' for offloading".
>> OK to push?
>
> Now really attached.
>
>> I did manually test this (by putting a few constructors/destructors into
>> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
>> and also in my WIP development tree with standard libgfortran
>> constructors (with 'LIBGFOR_MINIMAL' disabled).
>
>
> Grüße
>  Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 8131 bytes --]

From fb67006eeca0c8e2bfdf86576ed3109dacaf6868 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
 for offloading

This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.

	libgcc/
	* config/nvptx/crtstuff.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/crtstuff.c |  64 ++++++++++++++++++-
 libgomp/plugin/plugin-nvptx.c  | 113 ++++++++++++++++++++++++++++++++-
 2 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
    files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
    do so anyway, for symmetry with other configurations.  */
 
+
+/* See 'crt0.c', 'mgomp.c'.  */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
 #ifdef CRT_BEGIN
 
 void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
   DO_GLOBAL_CTORS_BODY;
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __do_global_ctors ();
+}
+
+# endif
+
 #elif defined(CRT_END) /* ! CRT_BEGIN */
 
 void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
   /* In this configuration here, there's no way that "this routine is run more
      than once [...] when exit is called recursively": for nvptx target, the
      call to '__do_global_dtors' is registered via 'atexit', which doesn't
-     re-enter a function already run.
+     re-enter a function already run, and neither does nvptx offload target.
      Therefore, we do *not* "arrange to remember where in the list we left off
      processing".  */
   func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
     (*p++) ();
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __do_global_dtors ();
+}
+
+# endif
+
 #else /* ! CRT_BEGIN && ! CRT_END */
 #error "One of CRT_BEGIN or CRT_END must be defined."
 #endif
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index fcc97c6e0d5..395639537e8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -338,6 +338,11 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+				    const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -557,6 +562,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
     return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+       image != NULL;
+       image = image->next)
+    {
+      if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+				   "__do_global_dtors__entry"))
+	ret = false;
+    }
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
     {
       struct ptx_free_block *b_next = b->next;
@@ -577,7 +593,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
     CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1280,6 +1297,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
     GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
 
+/* Invoke MODULE's global constructors/destructors.  */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+			const char *funcname)
+{
+  bool ret = true;
+  char *funcname_mgomp = NULL;
+  CUresult r;
+  CUfunction funcptr;
+  r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			 &funcptr, module, funcname);
+  GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+		     funcname, cuda_error (r));
+  if (r == CUDA_ERROR_NOT_FOUND)
+    {
+      /* Try '[funcname]__mgomp'.  */
+
+      size_t funcname_len = strlen (funcname);
+      const char *mgomp_suffix = "__mgomp";
+      size_t mgomp_suffix_len = strlen (mgomp_suffix);
+      funcname_mgomp
+	= GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+      memcpy (funcname_mgomp, funcname, funcname_len);
+      memcpy (funcname_mgomp + funcname_len,
+	      mgomp_suffix, mgomp_suffix_len + 1);
+      funcname = funcname_mgomp;
+
+      r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			     &funcptr, module, funcname);
+      GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+			 funcname, cuda_error (r));
+    }
+  if (r == CUDA_ERROR_NOT_FOUND)
+    ;
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+			 funcname, cuda_error (r));
+      ret = false;
+    }
+  else
+    {
+      /* If necessary, set up soft stack.  */
+      void *nvptx_stacks_0;
+      void *kargs[1];
+      if (funcname_mgomp)
+	{
+	  size_t stack_size = nvptx_stacks_size ();
+	  pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+	  nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+	  nvptx_stacks_0 += stack_size;
+	  kargs[0] = &nvptx_stacks_0;
+	}
+      r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+			     funcptr,
+			     1, 1, 1, 1, 1, 1,
+			     /* sharedMemBytes */ 0,
+			     /* hStream */ NULL,
+			     /* kernelParams */ funcname_mgomp ? kargs : NULL,
+			     /* extra */ NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+			     NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      if (funcname_mgomp)
+	pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+    }
+
+  if (funcname_mgomp)
+    free (funcname_mgomp);
+
+  return ret;
+}
+
 /* Load the (partial) program described by TARGET_DATA to device
    number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
    will contain the on-device addresses of the functions for reverse offload.
@@ -1452,6 +1556,9 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   nvptx_set_clocktick (module, dev);
 
+  if (!nvptx_do_global_cdtors (module, dev, "__do_global_ctors__entry"))
+    return -1;
+
   return fn_entries + var_entries + other_entries;
 }
 
@@ -1477,6 +1584,10 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
     if (image->target_data == target_data)
       {
+	if (!nvptx_do_global_cdtors (image->module, dev,
+				     "__do_global_dtors__entry"))
+	  ret = false;
+
 	*prev_p = image->next;
 	if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
 	  ret = false;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [og12] nvptx: Support global constructors/destructors via 'collect2'
  2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
  2022-12-20  8:03   ` [PING] " Thomas Schwinge
  2022-12-23 13:35   ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
@ 2023-01-20 20:41   ` Thomas Schwinge
  2023-01-20 20:45     ` Thomas Schwinge
  2024-05-31 13:15   ` nvptx target: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
  3 siblings, 1 reply; 19+ messages in thread
From: Thomas Schwinge @ 2023-01-20 20:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries

Hi!

On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> attached; OK to push?  (... with 'gcc/doc/install.texi' accordingly
> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
> "'nm'" and newlib
> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
> comments to those?)

For now pushed to devel/omp/gcc-12 branch in
commit fe07b0003bb2092bc34d4bed504be1868b88782d
"nvptx: Support global constructors/destructors via 'collect2'",
see attached.

> Per my quick scanning of 'gcc/config.gcc' history, for more than two
> decades, there was a clear trend to remove 'use_collect2=yes'
> configurations; now finally a new one is being added -- making sure we're
> not slowly dispensing with the need for the early 1990s piece of work
> that 'gcc/collect2*' is...  ;'-P

(I still find that "notable" and "funny" in a certain way.)  ;-*


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [og12] nvptx: Support global constructors/destructors via 'collect2'
  2023-01-20 20:41   ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
@ 2023-01-20 20:45     ` Thomas Schwinge
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Schwinge @ 2023-01-20 20:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

Hi!

On 2023-01-20T21:41:26+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> attached; OK to push?  (... with 'gcc/doc/install.texi' accordingly
>> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
>> "'nm'" and newlib
>> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
>> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
>> comments to those?)
>
> For now pushed to devel/omp/gcc-12 branch in
> commit fe07b0003bb2092bc34d4bed504be1868b88782d
> "nvptx: Support global constructors/destructors via 'collect2'",
> see attached.

Now really attached.

>> Per my quick scanning of 'gcc/config.gcc' history, for more than two
>> decades, there was a clear trend to remove 'use_collect2=yes'
>> configurations; now finally a new one is being added -- making sure we're
>> not slowly dispensing with the need for the early 1990s piece of work
>> that 'gcc/collect2*' is...  ;'-P
>
> (I still find that "notable" and "funny" in a certain way.)  ;-*


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 12741 bytes --]

From fe07b0003bb2092bc34d4bed504be1868b88782d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".

This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".

	gcc/
	* collect2.cc (write_c_file_glob): Allow for
	'COLLECT2_MAIN_REFERENCE' override.
	* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
	* config/nvptx/nvptx.h: Adjust.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
	'crtend.o' to 'extra_parts'.
	* config/nvptx/crt0.c: Invoke '__do_global_ctors',
	'__do_global_dtors'.
	* config/nvptx/crtstuff.c: New.
	* config/nvptx/t-nvptx: Adjust.
---
 gcc/ChangeLog.omp                             |  5 ++
 gcc/collect2.cc                               |  4 ++
 gcc/config.gcc                                |  1 +
 gcc/config/nvptx/nvptx.h                      | 35 ++++++++++-
 gcc/testsuite/ChangeLog.omp                   |  6 ++
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp         |  3 +-
 libgcc/ChangeLog.omp                          |  9 +++
 libgcc/config.host                            |  2 +-
 libgcc/config/nvptx/crt0.c                    |  6 ++
 libgcc/config/nvptx/crtstuff.c                | 58 +++++++++++++++++++
 libgcc/config/nvptx/t-nvptx                   | 15 ++++-
 12 files changed, 139 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/nvptx/crtstuff.c

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 127b450644b..ca00bfb48f9 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,5 +1,10 @@
 2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* collect2.cc (write_c_file_glob): Allow for
+	'COLLECT2_MAIN_REFERENCE' override.
+	* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
+	* config/nvptx/nvptx.h: Adjust.
+
 	* config/nvptx/nvptx.cc (nvptx_assemble_undefined_decl): Notice
 	'__nvptx_stacks', '__nvptx_uni' declarations.
 	(nvptx_file_end): Don't emit duplicate declarations for those.
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16..945a9ff86dd 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
     fprintf (stream, "\tdereg_frame,\n");
   fprintf (stream, "\t0\n};\n\n");
 
+# ifdef COLLECT2_MAIN_REFERENCE
+  fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
   fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
   fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
 }
 #endif /* ! LD_INIT_SWITCH */
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index e6b9c864b0d..9c9365886cf 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2835,6 +2835,7 @@ nvptx-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
 	tmake_file="nvptx/t-nvptx"
+	use_collect2=yes
 	if test x$enable_as_accelerator = xyes; then
 		extra_programs="${extra_programs} mkoffload\$(exeext)"
 		tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index b81f9f42cd3..d815081147e 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
    '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
 #define ASM_SPEC "%{v}"
 
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+  STARTFILE_SPEC_MMAINKERNEL \
+  " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+  ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+   'collect2' and the following helpers.  */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+   (see 'libgcc/config/nvptx/crt0.c:__main').  */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+   function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+   evidently evaluate to NULL in JIT compilation.  Avoiding the use of
+   assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+   function next to the arrays apparently does work around this issue...
+
+   The default '__main_reference' synthesized by 'collect2' refers to our
+   'crt0.o' '__main' function with incompatible signature:
+
+       error   : Function '__main' not declared __global__ in all source files
+
+   Address both these issues via 'COLLECT2_MAIN_REFERENCE'.  */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 
@@ -348,7 +380,6 @@ struct GTY(()) machine_function
 #define MOVE_MAX 8
 #define MOVE_RATIO(SPEED) 4
 #define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
 
 /* The C++ front end insists to link against libstdc++ -- which we don't build.
    Tell it to instead link against the innocuous libgcc.  */
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index c942c34dc70..f35568d83a9 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,5 +1,11 @@
 2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
+	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
+	in identifiers.
+	* lib/target-supports.exp
+	(check_effective_target_global_constructor): Enable for nvptx.
+
 	* gcc.target/nvptx/softstack-decl-1.c: Make 'dg-do assemble',
 	adjust.
 	* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a68479..5b4101cf596 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
   return foo ();
 }
 
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
 /* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e911b298f4c..efd02a61a6b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -850,8 +850,7 @@ proc check_effective_target_nonlocal_goto {} {
 # Return 1 if global constructors are supported, 0 otherwise.
 
 proc check_effective_target_global_constructor {} {
-    if { [istarget nvptx-*-*]
-	 || [istarget bpf-*-*] } {
+    if { [istarget bpf-*-*] } {
 	return 0
     }
     return 1
diff --git a/libgcc/ChangeLog.omp b/libgcc/ChangeLog.omp
index f41bbbb339a..68a99cbe427 100644
--- a/libgcc/ChangeLog.omp
+++ b/libgcc/ChangeLog.omp
@@ -1,3 +1,12 @@
+2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
+	'crtend.o' to 'extra_parts'.
+	* config/nvptx/crt0.c: Invoke '__do_global_ctors',
+	'__do_global_dtors'.
+	* config/nvptx/crtstuff.c: New.
+	* config/nvptx/t-nvptx: Adjust.
+
 2022-11-07  Kwok Cheung Yeung  <kcy@codesourcery.com>
 
 	* config/gcn/simd-math/amdgcnmach.h (VECTOR_RETURN): Store value of
diff --git a/libgcc/config.host b/libgcc/config.host
index 072dd26a276..29b5a2bd7cf 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1535,7 +1535,7 @@ m32c-*-elf*|m32c-*-rtems*)
  	;;
 nvptx-*)
 	tmake_file="$tmake_file nvptx/t-nvptx"
-	extra_parts="crt0.o"
+	extra_parts="crt0.o crtbegin.o crtend.o"
 	;;
 *)
 	echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae..860e2bfacad 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,9 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <stdlib.h>
+#include "gbl-ctors.h"
+
 int *__exitval_ptr;
 
 extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
   __nvptx_stacks[0] = stack + sizeof stack;
   __nvptx_uni[0] = 0;
 
+  __do_global_ctors ();
+  atexit (__do_global_dtors);
+
   exit (main (argc, argv));
 }
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 00000000000..0823fc49901
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+   own.
+
+   There's no technical reason in this configuration here to split the two
+   functions '__do_global_ctors' and '__do_global_ctors' into two separate
+   files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+   do so anyway, for symmetry with other configurations.  */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+  DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+  /* In this configuration here, there's no way that "this routine is run more
+     than once [...] when exit is called recursively": for nvptx target, the
+     call to '__do_global_dtors' is registered via 'atexit', which doesn't
+     re-enter a function already run.
+     Therefore, we do *not* "arrange to remember where in the list we left off
+     processing".  */
+  func_ptr *p;
+  for (p = __DTOR_LIST__ + 1; *p; )
+    (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87d..9a0454c3a4d 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/atomic.c
 
 LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
 
 crt0.o: $(srcdir)/config/nvptx/crt0.c
 	$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
 # support it, and it may cause the build to fail, because of alloca usage, for
 # example.
 INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [og12] nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
  2022-12-23 13:35   ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
  2022-12-23 13:37     ` Thomas Schwinge
@ 2023-01-20 20:46     ` Thomas Schwinge
  2024-06-06 12:02     ` nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2' for offloading) Thomas Schwinge
  2 siblings, 0 replies; 19+ messages in thread
From: Thomas Schwinge @ 2023-01-20 20:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]

Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]
>
> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for offloading".
> OK to push?

For now pushed to devel/omp/gcc-12 branch in
commit 689a5340c7e4286b451f1bc600342550c7c94da2
"nvptx: Support global constructors/destructors via 'collect2' for offloading",
see attached.

> I did manually test this (by putting a few constructors/destructors into
> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
> and also in my WIP development tree with standard libgfortran
> constructors (with 'LIBGFOR_MINIMAL' disabled).


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 9283 bytes --]

From 689a5340c7e4286b451f1bc600342550c7c94da2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
 for offloading

This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.

	libgcc/
	* config/nvptx/crtstuff.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/ChangeLog.omp           |   6 ++
 libgcc/config/nvptx/crtstuff.c |  64 ++++++++++++++++++-
 libgomp/ChangeLog.omp          |   4 ++
 libgomp/plugin/plugin-nvptx.c  | 113 ++++++++++++++++++++++++++++++++-
 4 files changed, 185 insertions(+), 2 deletions(-)

diff --git a/libgcc/ChangeLog.omp b/libgcc/ChangeLog.omp
index 68a99cbe427..2e7bf5cc029 100644
--- a/libgcc/ChangeLog.omp
+++ b/libgcc/ChangeLog.omp
@@ -1,5 +1,11 @@
 2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* config/nvptx/crtstuff.c ["mgomp"]
+	(__do_global_ctors__entry__mgomp)
+	(__do_global_dtors__entry__mgomp): New.
+	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
+	New.
+
 	* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
 	'crtend.o' to 'extra_parts'.
 	* config/nvptx/crt0.c: Invoke '__do_global_ctors',
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
    files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
    do so anyway, for symmetry with other configurations.  */
 
+
+/* See 'crt0.c', 'mgomp.c'.  */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
 #ifdef CRT_BEGIN
 
 void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
   DO_GLOBAL_CTORS_BODY;
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __do_global_ctors ();
+}
+
+# endif
+
 #elif defined(CRT_END) /* ! CRT_BEGIN */
 
 void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
   /* In this configuration here, there's no way that "this routine is run more
      than once [...] when exit is called recursively": for nvptx target, the
      call to '__do_global_dtors' is registered via 'atexit', which doesn't
-     re-enter a function already run.
+     re-enter a function already run, and neither does nvptx offload target.
      Therefore, we do *not* "arrange to remember where in the list we left off
      processing".  */
   func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
     (*p++) ();
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __do_global_dtors ();
+}
+
+# endif
+
 #else /* ! CRT_BEGIN && ! CRT_END */
 #error "One of CRT_BEGIN or CRT_END must be defined."
 #endif
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 4447b74a2ab..32aa9705296 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,9 @@
 2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
+	(nvptx_close_device, GOMP_OFFLOAD_load_image)
+	(GOMP_OFFLOAD_unload_image): Call it.
+
 	* plugin/plugin-nvptx.c (nvptx_exec): Assert what we know about
 	'blockDimX'.
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b2fabc61cc8..8e7b63bd637 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -344,6 +344,11 @@ static struct ptx_device **ptx_devices;
    default is set here.  */
 static unsigned lowlat_pool_size = 8*1024;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+				    const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -571,6 +576,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
     return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+       image != NULL;
+       image = image->next)
+    {
+      if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+				   "__do_global_dtors__entry"))
+	ret = false;
+    }
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
     {
       struct ptx_free_block *b_next = b->next;
@@ -591,7 +607,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
     CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1313,6 +1330,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
     GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
 
+/* Invoke MODULE's global constructors/destructors.  */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+			const char *funcname)
+{
+  bool ret = true;
+  char *funcname_mgomp = NULL;
+  CUresult r;
+  CUfunction funcptr;
+  r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			 &funcptr, module, funcname);
+  GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+		     funcname, cuda_error (r));
+  if (r == CUDA_ERROR_NOT_FOUND)
+    {
+      /* Try '[funcname]__mgomp'.  */
+
+      size_t funcname_len = strlen (funcname);
+      const char *mgomp_suffix = "__mgomp";
+      size_t mgomp_suffix_len = strlen (mgomp_suffix);
+      funcname_mgomp
+	= GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+      memcpy (funcname_mgomp, funcname, funcname_len);
+      memcpy (funcname_mgomp + funcname_len,
+	      mgomp_suffix, mgomp_suffix_len + 1);
+      funcname = funcname_mgomp;
+
+      r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			     &funcptr, module, funcname);
+      GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+			 funcname, cuda_error (r));
+    }
+  if (r == CUDA_ERROR_NOT_FOUND)
+    ;
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+			 funcname, cuda_error (r));
+      ret = false;
+    }
+  else
+    {
+      /* If necessary, set up soft stack.  */
+      void *nvptx_stacks_0;
+      void *kargs[1];
+      if (funcname_mgomp)
+	{
+	  size_t stack_size = nvptx_stacks_size ();
+	  pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+	  nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+	  nvptx_stacks_0 += stack_size;
+	  kargs[0] = &nvptx_stacks_0;
+	}
+      r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+			     funcptr,
+			     1, 1, 1, 1, 1, 1,
+			     /* sharedMemBytes */ 0,
+			     /* hStream */ NULL,
+			     /* kernelParams */ funcname_mgomp ? kargs : NULL,
+			     /* extra */ NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+			     NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      if (funcname_mgomp)
+	pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+    }
+
+  if (funcname_mgomp)
+    free (funcname_mgomp);
+
+  return ret;
+}
+
 /* Load the (partial) program described by TARGET_DATA to device
    number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
    will contain the on-device addresses of the functions for reverse offload.
@@ -1485,6 +1589,9 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   nvptx_set_clocktick (module, dev);
 
+  if (!nvptx_do_global_cdtors (module, dev, "__do_global_ctors__entry"))
+    return -1;
+
   return fn_entries + var_entries + other_entries;
 }
 
@@ -1510,6 +1617,10 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
     if (image->target_data == target_data)
       {
+	if (!nvptx_do_global_cdtors (image->module, dev,
+				     "__do_global_dtors__entry"))
+	  ret = false;
+
 	*prev_p = image->next;
 	if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
 	  ret = false;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2')
  2022-12-20  8:03   ` [PING] " Thomas Schwinge
  2023-01-11 11:48     ` [PING^2] " Thomas Schwinge
@ 2023-01-24  9:01     ` Thomas Schwinge
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Schwinge @ 2023-01-24  9:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1408 bytes --]

Hi!

On 2022-12-20T09:03:51+0100, I wrote:
> Minor change in the attached
> "nvptx: Support global constructors/destructors via 'collect2'": for
> 'atexit', add '#include <stdlib.h>' to 'libgcc/config/nvptx/crt0.c'.

Turns out, it's not that easy.  ;-) Pushed to devel/omp/gcc-12 branch
commit d90a8a5685c8bd3657892feac01739fe87a457a5
"Make 'libgcc/config/nvptx/crt0.c' build '--without-headers'", see
attached.  Please consider that one 'fixup'ed into the GCC master branch
submission.


Grüße
 Thomas


> --- a/libgcc/config/nvptx/crt0.c
> +++ b/libgcc/config/nvptx/crt0.c
> @@ -19,6 +19,9 @@
>     see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>     <http://www.gnu.org/licenses/>.  */
>
> +#include <stdlib.h>
> +#include "gbl-ctors.h"
> +
>  int *__exitval_ptr;
>
>  extern void __attribute__((noreturn)) exit (int status);
> @@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
>    __nvptx_stacks[0] = stack + sizeof stack;
>    __nvptx_uni[0] = 0;
>
> +  __do_global_ctors ();
> +  atexit (__do_global_dtors);
> +
>    exit (main (argc, argv));
>  }


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Make-libgcc-config-nvptx-crt0.c-build-without-header.patch --]
[-- Type: text/x-diff, Size: 1767 bytes --]

From d90a8a5685c8bd3657892feac01739fe87a457a5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 24 Jan 2023 09:49:34 +0100
Subject: [PATCH] Make 'libgcc/config/nvptx/crt0.c' build '--without-headers'

..., where it currently fails:

    [...]/libgcc/config/nvptx/crt0.c:22:10: fatal error: stdlib.h: No such file or directory
       22 | #include <stdlib.h>
          |          ^~~~~~~~~~

Fix-up for "nvptx: Support global constructors/destructors via 'collect2'".

	libgcc/
	* config/nvptx/crt0.c [!HAVE_STDLIB_H]: Don't '#include <stdlib.h>'.
	(atexit): Prototype.
---
 libgcc/ChangeLog.omp       | 5 +++++
 libgcc/config/nvptx/crt0.c | 7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/libgcc/ChangeLog.omp b/libgcc/ChangeLog.omp
index c46f49bf5b7..cf509a70d61 100644
--- a/libgcc/ChangeLog.omp
+++ b/libgcc/ChangeLog.omp
@@ -1,3 +1,8 @@
+2023-01-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* config/nvptx/crt0.c [!HAVE_STDLIB_H]: Don't '#include <stdlib.h>'.
+	(atexit): Prototype.
+
 2023-01-20  Thomas Schwinge  <thomas@codesourcery.com>
 	    Andrew Stubbs  <ams@codesourcery.com>
 
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index 860e2bfacad..02648bef84b 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,11 +19,16 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <stdlib.h>
+#include "auto-target.h"
+
+#ifdef HAVE_STDLIB_H
+# include <stdlib.h>
+#endif
 #include "gbl-ctors.h"
 
 int *__exitval_ptr;
 
+extern int atexit (void (*function) (void));
 extern void __attribute__((noreturn)) exit (int status);
 extern int main (int, void **);
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* nvptx target: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2')
  2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
                     ` (2 preceding siblings ...)
  2023-01-20 20:41   ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
@ 2024-05-31 13:15   ` Thomas Schwinge
  2024-06-03  7:28     ` [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld') Tobias Burnus
  3 siblings, 1 reply; 19+ messages in thread
From: Thomas Schwinge @ 2024-05-31 13:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

Hi!

On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> attached; [...]
>
> Per my quick scanning of 'gcc/config.gcc' history, for more than two
> decades, there was a clear trend to remove 'use_collect2=yes'
> configurations; now finally a new one is being added -- making sure we're
> not slowly dispensing with the need for the early 1990s piece of work
> that 'gcc/collect2*' is...  ;'-P

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is
<https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
"ld: Global constructor/destructor support".

Thus, this:

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2783,6 +2783,7 @@ nvptx-*)
>  	tm_file="${tm_file} newlib-stdint.h"
>  	use_gcc_stdint=wrap
>  	tmake_file="nvptx/t-nvptx"
> +	use_collect2=yes
>  	if test x$enable_as_accelerator = xyes; then
>  		extra_programs="${extra_programs} mkoffload\$(exeext)"
>  		tm_file="${tm_file} nvptx/offload.h"

... now is gone again.  ;'-)

Pushed to trunk branch commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'",
see attached.

(Support for nvptx offloading, enablement of full libgfortran for nvptx,
and corresponding documentation updates, etc. are to follow as separate
commits.)


Compared to the 2022 'collect2' version, this 'ld' version also does
happen to avoid one class of FAILs:

    [-FAIL:-]{+PASS:+} gfortran.dg/implicit_class_1.f90   -O0  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} gfortran.dg/implicit_class_1.f90   -O0  [-compilation failed to produce executable-]{+execution test+}
    [...]

That was due to:

    Executing on host: [gfortran] [...] [...]/gfortran.dg/implicit_class_1.f90 [...] -fdump-fortran-original [...]
    [...]
    cc1: error: unrecognized command-line option '-fdump-fortran-original'; did you mean '-fdump-tree-original'?
    collect2: fatal error: gcc returned 1 exit status
    compilation terminated.
    compiler exited with status 1
    FAIL: gfortran.dg/implicit_class_1.f90   -O0  (test for excess errors)

That is, the 'gcc' invocation by 'collect2' is passed
'-fdump-fortran-original', but doesn't know what to do with that.  (Maybe
using '-Wno-complain-wrong-lang' in 'collect2' would help?)  (I'm not
going to look into that any further.)


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-target-Global-constructor-destructor-support-v.patch --]
[-- Type: text/x-diff, Size: 8255 bytes --]

From d9c90c82d900fdae95df4499bf5f0a4ecb903b53 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Tue, 28 May 2024 23:20:29 +0200
Subject: [PATCH] nvptx target: Global constructor, destructor support, via
 nvptx-tools 'ld'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more
"sorry, unimplemented: global constructors not supported on this target".

For proper execution test results, this depends on
<https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
"ld: Global constructor/destructor support".

	gcc/
	* config/nvptx/nvptx.h: Configure global constructor, destructor
	support.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config/nvptx/crt0.c (__gbl_ctors): New weak function.
	(__main): Invoke it.
	* config/nvptx/gbl-ctors.c: New.
	* config/nvptx/t-nvptx: Configure global constructor, destructor
	support.
---
 gcc/config/nvptx/nvptx.h                      | 14 +++-
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp         |  3 +-
 libgcc/config/nvptx/crt0.c                    | 12 +++
 libgcc/config/nvptx/gbl-ctors.c               | 74 +++++++++++++++++++
 libgcc/config/nvptx/t-nvptx                   |  9 ++-
 6 files changed, 109 insertions(+), 5 deletions(-)
 create mode 100644 libgcc/config/nvptx/gbl-ctors.c

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index e282aad1b73..74f4a68924c 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -356,7 +356,19 @@ struct GTY(()) machine_function
 #define MOVE_MAX 8
 #define MOVE_RATIO(SPEED) 4
 #define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
+
+/* Implement global constructor, destructor support in a conceptually simpler
+   way than using 'collect2' (the program): implement the respective
+   functionality in the nvptx-tools 'ld'.  This however still requires the
+   compiler-side effects corresponding to 'USE_COLLECT2': the global
+   constructor, destructor support functions need to have external linkage, and
+   therefore names that are "unique across the whole link".  Use
+   '!targetm.have_ctors_dtors' to achieve this (..., and thus don't need to
+   provide 'targetm.asm_out.constructor', 'targetm.asm_out.destructor').  */
+#define TARGET_HAVE_CTORS_DTORS false
+
+/* See 'libgcc/config/nvptx/crt0.c' for wrapping of 'main'.  */
+#define HAS_INIT_SECTION
 
 /* The C++ front end insists to link against libstdc++ -- which we don't build.
    Tell it to instead link against the innocuous libgcc.  */
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a68479..5b4101cf596 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
   return foo ();
 }
 
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
 /* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f0f6da52275..a3992faab5e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -942,8 +942,7 @@ proc check_effective_target_nonlocal_goto {} {
 # Return 1 if global constructors are supported, 0 otherwise.
 
 proc check_effective_target_global_constructor {} {
-    if { [istarget nvptx-*-*]
-	 || [istarget bpf-*-*] } {
+    if { [istarget bpf-*-*] } {
 	return 0
     }
     return 1
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index e37a6fb40d3..47e8ec44c19 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -32,6 +32,16 @@ void *__nvptx_stacks[32] __attribute__((shared,nocommon));
 /* Likewise for -muniform-simt.  */
 unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
 
+/* Global constructor/destructor support.  Dummy; if necessary, overridden via
+   'gbl-ctors.c'.  */
+
+extern void __gbl_ctors (void);
+
+void __attribute__((weak))
+__gbl_ctors (void)
+{
+}
+
 extern void __main (int *, int, void **) __attribute__((kernel));
 
 void
@@ -47,5 +57,7 @@ __main (int *rval_ptr, int argc, void **argv)
   __nvptx_stacks[0] = stack + sizeof stack;
   __nvptx_uni[0] = 0;
 
+  __gbl_ctors ();
+
   exit (main (argc, argv));
 }
diff --git a/libgcc/config/nvptx/gbl-ctors.c b/libgcc/config/nvptx/gbl-ctors.c
new file mode 100644
index 00000000000..a2ca053e5e3
--- /dev/null
+++ b/libgcc/config/nvptx/gbl-ctors.c
@@ -0,0 +1,74 @@
+/* Global constructor/destructor support
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "auto-target.h"
+
+#ifdef HAVE_STDLIB_H
+# include <stdlib.h>
+#endif
+#include "gbl-ctors.h"
+
+extern int atexit (void (*function) (void));
+
+
+/* Handler functions ('static', in contrast to the 'gbl-ctors.h'
+   prototypes).  */
+
+static void __static_do_global_ctors (void);
+
+static void
+__static_do_global_ctors (void)
+{
+  __SIZE_TYPE__ nptrs = (__SIZE_TYPE__) __CTOR_LIST__[0];
+  for (__SIZE_TYPE__ i = nptrs; i >= 1; --i)
+    __CTOR_LIST__[i] ();
+}
+
+static void __static_do_global_dtors (void);
+
+static void
+__static_do_global_dtors (void)
+{
+  func_ptr *p = __DTOR_LIST__;
+  ++p;
+  for (; *p; ++p)
+    (*p) ();
+}
+
+
+/* For nvptx target configurations, override the 'crt0.c' dummy.  */
+
+extern void __gbl_ctors (void);
+
+void
+__gbl_ctors (void)
+{
+  __static_do_global_ctors ();
+  atexit (__static_do_global_dtors);
+}
+
+
+/* The following symbol just provides a means for the nvptx-tools 'ld' to
+   trigger linking in this file.  */
+
+int __trigger_gbl_ctors;
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index 49fdb557b56..260ed6334db 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -6,8 +6,10 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 LIB2ADD += $(srcdir)/c++-minimal/guard.c
 
 LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
 
+# Wrapping of 'main'.
+LIB2FUNCS_EXCLUDE += __main
 crt0.o: $(srcdir)/config/nvptx/crt0.c
 	$(crt_compile) -c $<
 
@@ -15,3 +17,8 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
 # support it, and it may cause the build to fail, because of alloca usage, for
 # example.
 INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via the
+# nvptx-tools 'ld' and the following helpers.
+LIB2ADD += $(srcdir)/config/nvptx/gbl-ctors.c
+LIB2FUNCS_EXCLUDE += _ctors
-- 
2.34.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld')
  2024-05-31 13:15   ` nvptx target: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
@ 2024-06-03  7:28     ` Tobias Burnus
  2024-06-03  8:23       ` Richard Biener
  0 siblings, 1 reply; 19+ messages in thread
From: Tobias Burnus @ 2024-06-03  7:28 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches; +Cc: Tom de Vries, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 814 bytes --]

Thomas Schwinge wrote:
> In the following, I have then reconsidered that stance; we may actually
> "Implement global constructor, destructor support in a conceptually
> simpler way than using 'collect2' (the program): implement the respective
> functionality in the nvptx-tools 'ld'".  The latter is
> <https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
> "ld: Global constructor/destructor support".

The attached patch makes clearer which version should be
installed by recommending this patch (= latest nvptx-tools)
in install.texi.

OK? Comments, remarks?

Tobias

PS: If the https://github.com/SourceryTools/nvptx-tools/pull/47
(nvptx-ld.cc: Improve C++11 compatibility with older compilers)
proofs worthwhile and gets merged, we should point to that commit
instead.

[-- Attachment #2: nvptx-install.diff --]
[-- Type: text/x-patch, Size: 703 bytes --]

install.texi (nvptx): Recommend nvptx-tools 2024-05-30

gcc/
	* doc/install.texi (nvptx): Recommend nvptx-tools 2024-05-30 or newer.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..4859f6743ab 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4698,7 +4698,8 @@ Andes NDS32 target in big endian mode.
 Nvidia PTX target.
 
 Instead of GNU binutils, you will need to install
-@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}.
+@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}
+(recommended: 96f8fc5 of 2024-05-30 -- or newer).
 Tell GCC where to find it:
 @option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld')
  2024-06-03  7:28     ` [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld') Tobias Burnus
@ 2024-06-03  8:23       ` Richard Biener
  2024-06-03  8:37         ` [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 Tobias Burnus
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Biener @ 2024-06-03  8:23 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Thomas Schwinge, gcc-patches, Tom de Vries

On Mon, 3 Jun 2024, Tobias Burnus wrote:

> Thomas Schwinge wrote:
> > In the following, I have then reconsidered that stance; we may actually
> > "Implement global constructor, destructor support in a conceptually
> > simpler way than using 'collect2' (the program): implement the respective
> > functionality in the nvptx-tools 'ld'".  The latter is
> > <https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
> > "ld: Global constructor/destructor support".
> 
> The attached patch makes clearer which version should be
> installed by recommending this patch (= latest nvptx-tools)
> in install.texi.
> 
> OK? Comments, remarks?

Can we simply say "newerst" where I guess refering to a github repo
already implies this?

> Tobias
> 
> PS: If the https://github.com/SourceryTools/nvptx-tools/pull/47
> (nvptx-ld.cc: Improve C++11 compatibility with older compilers)
> proofs worthwhile and gets merged, we should point to that commit
> instead.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30
  2024-06-03  8:23       ` Richard Biener
@ 2024-06-03  8:37         ` Tobias Burnus
  2024-06-03  9:09           ` Richard Biener
  0 siblings, 1 reply; 19+ messages in thread
From: Tobias Burnus @ 2024-06-03  8:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, gcc-patches, Tom de Vries

Richard Biener wrote:
> On Mon, 3 Jun 2024, Tobias Burnus wrote:
>> Thomas Schwinge wrote:
>>> In the following, I have then reconsidered that stance; we may actually
>>> "Implement global constructor, destructor support in a conceptually
>>> simpler way than using 'collect2' (the program): implement the respective
>>> functionality in the nvptx-tools 'ld'".  The latter is
>>> <https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
>>> "ld: Global constructor/destructor support".
>> The attached patch makes clearer which version should be
>> installed by recommending this patch (= latest nvptx-tools)
>> in install.texi.
> Can we simply say "newerst" where I guess refering to a github repo
> already implies this?

Good question. The problem I see with just referring to a repository 
(even with newest) often means: yes, that software I have (whatever 
version). While if some reference goes to a 2024 version, I might not 
know what version I have but likely an older version → I will update.

Admittedly, as people tend to *not* read the documentation, this 
approach might fail as well. But, maybe, it is sufficient to update GCC 
15's release notes?*

It won't help those not reading with the release notes before building 
and the wording* had to be changed a bit as install.texi no longer 
states what version should be used, but it would be an alternative

(*) https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html

Tobias


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30
  2024-06-03  8:37         ` [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 Tobias Burnus
@ 2024-06-03  9:09           ` Richard Biener
  2024-06-03 10:26             ` Tobias Burnus
  2024-06-08 23:01             ` Gerald Pfeifer
  0 siblings, 2 replies; 19+ messages in thread
From: Richard Biener @ 2024-06-03  9:09 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Thomas Schwinge, gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1963 bytes --]

On Mon, 3 Jun 2024, Tobias Burnus wrote:

> Richard Biener wrote:
> > On Mon, 3 Jun 2024, Tobias Burnus wrote:
> >> Thomas Schwinge wrote:
> >>> In the following, I have then reconsidered that stance; we may actually
> >>> "Implement global constructor, destructor support in a conceptually
> >>> simpler way than using 'collect2' (the program): implement the respective
> >>> functionality in the nvptx-tools 'ld'".  The latter is
> >>> <https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
> >>> "ld: Global constructor/destructor support".
> >> The attached patch makes clearer which version should be
> >> installed by recommending this patch (= latest nvptx-tools)
> >> in install.texi.
> > Can we simply say "newerst" where I guess refering to a github repo
> > already implies this?
> 
> Good question. The problem I see with just referring to a repository (even
> with newest) often means: yes, that software I have (whatever version). While
> if some reference goes to a 2024 version, I might not know what version I have
> but likely an older version → I will update.
> 
> Admittedly, as people tend to *not* read the documentation, this approach
> might fail as well. But, maybe, it is sufficient to update GCC 15's release
> notes?*
> 
> It won't help those not reading with the release notes before building and the
> wording* had to be changed a bit as install.texi no longer states what version
> should be used, but it would be an alternative

install.texi also has the issue that it's not pre-packaged in a
easy to discover and readable file in the release tarballs and that
the online version is only for trunk.

> (*) https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html
> 
> Tobias
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30
  2024-06-03  9:09           ` Richard Biener
@ 2024-06-03 10:26             ` Tobias Burnus
  2024-06-03 11:16               ` Richard Biener
  2024-06-08 23:01             ` Gerald Pfeifer
  1 sibling, 1 reply; 19+ messages in thread
From: Tobias Burnus @ 2024-06-03 10:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 369 bytes --]

Richard Biener wrote:
> install.texi also has the issue that it's not pre-packaged in a
> easy to discover and readable file in the release tarballs and that
> the online version is only for trunk.

I always wondered why it is not included at 
https://gcc.gnu.org/onlinedocs/ — it would then also be linked from, 
e.g., https://gcc.gnu.org/gcc-14/index.html

Tobias


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30
  2024-06-03 10:26             ` Tobias Burnus
@ 2024-06-03 11:16               ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2024-06-03 11:16 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Thomas Schwinge, gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 863 bytes --]

On Mon, 3 Jun 2024, Tobias Burnus wrote:

> Richard Biener wrote:
> > install.texi also has the issue that it's not pre-packaged in a
> > easy to discover and readable file in the release tarballs and that
> > the online version is only for trunk.
> 
> I always wondered why it is not included at https://gcc.gnu.org/onlinedocs/ —
> it would then also be linked from, e.g., https://gcc.gnu.org/gcc-14/index.html

I'm quite sure it's because nobody bothered to update
maintainer-scripts/update_web_docs_git.  The install docs are
generated into INSTALL/ in the release tarballs it seems but
it's html rather than a plain text file there.

Richard.

> Tobias
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2' for offloading)
  2022-12-23 13:35   ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
  2022-12-23 13:37     ` Thomas Schwinge
  2023-01-20 20:46     ` [og12] " Thomas Schwinge
@ 2024-06-06 12:02     ` Thomas Schwinge
  2 siblings, 0 replies; 19+ messages in thread
From: Thomas Schwinge @ 2024-06-06 12:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 868 bytes --]

Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]

..., which I then recently revised; see
commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'".

> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for offloading".

Similarly revised, I've now pushed to trunk branch
commit 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96
"nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld'",
see attached.


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-offloading-Global-constructor-destructor-suppo.patch --]
[-- Type: text/x-diff, Size: 7460 bytes --]

From 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Wed, 5 Jun 2024 12:40:50 +0200
Subject: [PATCH] nvptx offloading: Global constructor, destructor support, via
 nvptx-tools 'ld'

This extends commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
for offloading.

	libgcc/
	* config/nvptx/gbl-ctors.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/gbl-ctors.c |  55 +++++++++++++++
 libgomp/plugin/plugin-nvptx.c   | 117 +++++++++++++++++++++++++++++++-
 2 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/nvptx/gbl-ctors.c b/libgcc/config/nvptx/gbl-ctors.c
index a2ca053e5e3..a56d64f8ef8 100644
--- a/libgcc/config/nvptx/gbl-ctors.c
+++ b/libgcc/config/nvptx/gbl-ctors.c
@@ -68,6 +68,61 @@ __gbl_ctors (void)
 }
 
 
+/* For nvptx offloading configurations, need '.entry' wrappers.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+/* OpenMP */
+
+/* See 'crt0.c', 'mgomp.c'.  */
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_dtors ();
+}
+
+# else
+
+/* OpenACC */
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __static_do_global_dtors ();
+}
+
+# endif
+
+
 /* The following symbol just provides a means for the nvptx-tools 'ld' to
    trigger linking in this file.  */
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 4cedc5390a3..0f3a3be1898 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -346,6 +346,11 @@ static struct ptx_device **ptx_devices;
    default is set here.  */
 static unsigned lowlat_pool_size = 8 * 1024;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+				    const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -565,6 +570,18 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
     return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+       image != NULL;
+       image = image->next)
+    {
+      if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+				   "__do_global_dtors__entry"
+				   /* or "__do_global_dtors__entry__mgomp" */))
+	ret = false;
+    }
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
     {
       struct ptx_free_block *b_next = b->next;
@@ -585,7 +602,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
     CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1317,6 +1335,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
     GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
 
+/* Invoke MODULE's global constructors/destructors.  */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+			const char *funcname)
+{
+  bool ret = true;
+  char *funcname_mgomp = NULL;
+  CUresult r;
+  CUfunction funcptr;
+  r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			 &funcptr, module, funcname);
+  GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+		     funcname, cuda_error (r));
+  if (r == CUDA_ERROR_NOT_FOUND)
+    {
+      /* Try '[funcname]__mgomp'.  */
+
+      size_t funcname_len = strlen (funcname);
+      const char *mgomp_suffix = "__mgomp";
+      size_t mgomp_suffix_len = strlen (mgomp_suffix);
+      funcname_mgomp
+	= GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+      memcpy (funcname_mgomp, funcname, funcname_len);
+      memcpy (funcname_mgomp + funcname_len,
+	      mgomp_suffix, mgomp_suffix_len + 1);
+      funcname = funcname_mgomp;
+
+      r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+			     &funcptr, module, funcname);
+      GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+			 funcname, cuda_error (r));
+    }
+  if (r == CUDA_ERROR_NOT_FOUND)
+    ;
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+			 funcname, cuda_error (r));
+      ret = false;
+    }
+  else
+    {
+      /* If necessary, set up soft stack.  */
+      void *nvptx_stacks_0;
+      void *kargs[1];
+      if (funcname_mgomp)
+	{
+	  size_t stack_size = nvptx_stacks_size ();
+	  pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+	  nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+	  nvptx_stacks_0 += stack_size;
+	  kargs[0] = &nvptx_stacks_0;
+	}
+      r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+			     funcptr,
+			     1, 1, 1, 1, 1, 1,
+			     /* sharedMemBytes */ 0,
+			     /* hStream */ NULL,
+			     /* kernelParams */ funcname_mgomp ? kargs : NULL,
+			     /* extra */ NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+			     NULL);
+      if (r != CUDA_SUCCESS)
+	{
+	  GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+			     funcname, cuda_error (r));
+	  ret = false;
+	}
+
+      if (funcname_mgomp)
+	pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+    }
+
+  if (funcname_mgomp)
+    free (funcname_mgomp);
+
+  return ret;
+}
+
 /* Load the (partial) program described by TARGET_DATA to device
    number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
    will contain the on-device addresses of the functions for reverse offload.
@@ -1546,6 +1651,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   nvptx_set_clocktick (module, dev);
 
+  if (!nvptx_do_global_cdtors (module, dev,
+			       "__do_global_ctors__entry"
+			       /* or "__do_global_ctors__entry__mgomp" */))
+    return -1;
+
   return fn_entries + var_entries + other_entries;
 }
 
@@ -1571,6 +1681,11 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
     if (image->target_data == target_data)
       {
+	if (!nvptx_do_global_cdtors (image->module, dev,
+				     "__do_global_dtors__entry"
+				     /* or "__do_global_dtors__entry__mgomp" */))
+	  ret = false;
+
 	*prev_p = image->next;
 	if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
 	  ret = false;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30
  2024-06-03  9:09           ` Richard Biener
  2024-06-03 10:26             ` Tobias Burnus
@ 2024-06-08 23:01             ` Gerald Pfeifer
  1 sibling, 0 replies; 19+ messages in thread
From: Gerald Pfeifer @ 2024-06-08 23:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tobias Burnus, Thomas Schwinge, gcc-patches, Tom de Vries

On Mon, 3 Jun 2024, Richard Biener wrote:
> install.texi also has the issue that it's not pre-packaged in a
> easy to discover and readable file in the release tarballs and that
> the online version is only for trunk.

The latter is only partially true: we generally try to keep it applicable 
more broadly - to a fault at times, if you look at some of the recent 
pruning I had to do.

Gerald

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-06-08 23:01 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com>
2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2022-12-20  8:03   ` [PING] " Thomas Schwinge
2023-01-11 11:48     ` [PING^2] " Thomas Schwinge
2023-01-24  9:01     ` Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
2022-12-23 13:35   ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
2022-12-23 13:37     ` Thomas Schwinge
2023-01-11 11:49       ` [PING] " Thomas Schwinge
2023-01-20 20:46     ` [og12] " Thomas Schwinge
2024-06-06 12:02     ` nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2' for offloading) Thomas Schwinge
2023-01-20 20:41   ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2023-01-20 20:45     ` Thomas Schwinge
2024-05-31 13:15   ` nvptx target: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
2024-06-03  7:28     ` [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld') Tobias Burnus
2024-06-03  8:23       ` Richard Biener
2024-06-03  8:37         ` [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 Tobias Burnus
2024-06-03  9:09           ` Richard Biener
2024-06-03 10:26             ` Tobias Burnus
2024-06-03 11:16               ` Richard Biener
2024-06-08 23:01             ` Gerald Pfeifer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).