* nvptx: Support global constructors/destructors via 'collect2'
[not found] <878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com>
@ 2022-12-02 13:35 ` Thomas Schwinge
2022-12-20 8:03 ` [PING] " Thomas Schwinge
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Thomas Schwinge @ 2022-12-02 13:35 UTC (permalink / raw)
To: gcc-patches, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]
Hi!
On 2022-12-01T22:13:38+0100, I wrote:
> I'm working on support for global constructors/destructors with
> GCC/nvptx
See "nvptx: Support global constructors/destructors via 'collect2'"
attached; OK to push? (... with 'gcc/doc/install.texi' accordingly
updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
"'nm'" and newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'" have been merged; any
comments to those?)
Per my quick scanning of 'gcc/config.gcc' history, for more than two
decades, there was a clear trend to remove 'use_collect2=yes'
configurations; now finally a new one is being added -- making sure we're
not slowly dispensing with the need for the early 1990s piece of work
that 'gcc/collect2*' is... ;'-P
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 10745 bytes --]
From ba5f6471d39e684fb740523651138a90a1b63cf9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".
gcc/
* collect2.cc (write_c_file_glob): Allow for
'COLLECT2_MAIN_REFERENCE' override.
* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
* config/nvptx/nvptx.h: Adjust.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
'__do_global_dtors'.
* config/nvptx/crtstuff.c: New.
* config/nvptx/t-nvptx: Adjust.
---
gcc/collect2.cc | 4 ++
gcc/config.gcc | 1 +
gcc/config/nvptx/nvptx.h | 35 ++++++++++-
.../no_profile_instrument_function-attr-1.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 3 +-
libgcc/config.host | 2 +-
libgcc/config/nvptx/crt0.c | 5 ++
libgcc/config/nvptx/crtstuff.c | 58 +++++++++++++++++++
libgcc/config/nvptx/t-nvptx | 15 ++++-
9 files changed, 118 insertions(+), 7 deletions(-)
create mode 100644 libgcc/config/nvptx/crtstuff.c
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16..945a9ff86dd 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
fprintf (stream, "\tdereg_frame,\n");
fprintf (stream, "\t0\n};\n\n");
+# ifdef COLLECT2_MAIN_REFERENCE
+ fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
}
#endif /* ! LD_INIT_SWITCH */
diff --git a/gcc/config.gcc b/gcc/config.gcc
index b5eda046033..9b27efd5f51 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2783,6 +2783,7 @@ nvptx-*)
tm_file="${tm_file} newlib-stdint.h"
use_gcc_stdint=wrap
tmake_file="nvptx/t-nvptx"
+ use_collect2=yes
if test x$enable_as_accelerator = xyes; then
extra_programs="${extra_programs} mkoffload\$(exeext)"
tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index b1cbe5d417b..bc1021a8031 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
'../../gcc.cc:asm_options', 'HAVE_GNU_AS'. */
#define ASM_SPEC "%{v}"
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+ STARTFILE_SPEC_MMAINKERNEL \
+ " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+ ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+ 'collect2' and the following helpers. */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+ (see 'libgcc/config/nvptx/crt0.c:__main'). */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+ function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+ evidently evaluate to NULL in JIT compilation. Avoiding the use of
+ assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+ function next to the arrays apparently does work around this issue...
+
+ The default '__main_reference' synthesized by 'collect2' refers to our
+ 'crt0.o' '__main' function with incompatible signature:
+
+ error : Function '__main' not declared __global__ in all source files
+
+ Address both these issues via 'COLLECT2_MAIN_REFERENCE'. */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
#define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
@@ -354,7 +386,6 @@ struct GTY(()) machine_function
#define MOVE_MAX 8
#define MOVE_RATIO(SPEED) 4
#define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
/* The C++ front end insists to link against libstdc++ -- which we don't build.
Tell it to instead link against the innocuous libgcc. */
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a68479..5b4101cf596 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
return foo ();
}
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
/* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d2de761adb5..ba9b045ebd4 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -907,8 +907,7 @@ proc check_effective_target_nonlocal_goto {} {
# Return 1 if global constructors are supported, 0 otherwise.
proc check_effective_target_global_constructor {} {
- if { [istarget nvptx-*-*]
- || [istarget bpf-*-*] } {
+ if { [istarget bpf-*-*] } {
return 0
}
return 1
diff --git a/libgcc/config.host b/libgcc/config.host
index eb23abe89f5..25072f41860 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1499,7 +1499,7 @@ m32c-*-elf*)
;;
nvptx-*)
tmake_file="$tmake_file nvptx/t-nvptx"
- extra_parts="crt0.o"
+ extra_parts="crt0.o crtbegin.o crtend.o"
;;
*)
echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae..6a49790f479 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,8 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#include "gbl-ctors.h"
+
int *__exitval_ptr;
extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +49,8 @@ __main (int *rval_ptr, int argc, void **argv)
__nvptx_stacks[0] = stack + sizeof stack;
__nvptx_uni[0] = 0;
+ __do_global_ctors ();
+ atexit (__do_global_dtors);
+
exit (main (argc, argv));
}
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 00000000000..0823fc49901
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+ own.
+
+ There's no technical reason in this configuration here to split the two
+ functions '__do_global_ctors' and '__do_global_ctors' into two separate
+ files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+ do so anyway, for symmetry with other configurations. */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+ DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+ /* In this configuration here, there's no way that "this routine is run more
+ than once [...] when exit is called recursively": for nvptx target, the
+ call to '__do_global_dtors' is registered via 'atexit', which doesn't
+ re-enter a function already run.
+ Therefore, we do *not* "arrange to remember where in the list we left off
+ processing". */
+ func_ptr *p;
+ for (p = __DTOR_LIST__ + 1; *p; )
+ (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87d..9a0454c3a4d 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
$(srcdir)/config/nvptx/atomic.c
LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
crt0.o: $(srcdir)/config/nvptx/crt0.c
$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
# support it, and it may cause the build to fail, because of alloca usage, for
# example.
INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PING] nvptx: Support global constructors/destructors via 'collect2'
2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
@ 2022-12-20 8:03 ` Thomas Schwinge
2023-01-11 11:48 ` [PING^2] " Thomas Schwinge
2023-01-24 9:01 ` Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
2022-12-23 13:35 ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
2023-01-20 20:41 ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2 siblings, 2 replies; 10+ messages in thread
From: Thomas Schwinge @ 2022-12-20 8:03 UTC (permalink / raw)
To: gcc-patches, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]
Hi!
Ping.
Minor change in the attached
"nvptx: Support global constructors/destructors via 'collect2'": for
'atexit', add '#include <stdlib.h>' to 'libgcc/config/nvptx/crt0.c'.
Grüße
Thomas
On 2022-12-02T14:35:35+0100, I wrote:
> Hi!
>
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> attached; OK to push? (... with 'gcc/doc/install.texi' accordingly
> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
> "'nm'" and newlib
> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
> comments to those?)
>
> Per my quick scanning of 'gcc/config.gcc' history, for more than two
> decades, there was a clear trend to remove 'use_collect2=yes'
> configurations; now finally a new one is being added -- making sure we're
> not slowly dispensing with the need for the early 1990s piece of work
> that 'gcc/collect2*' is... ;'-P
>
>
> Grüße
> Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 10784 bytes --]
From 0e7cf5a9f83c3a82eafa126886e5d92651bfbb30 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".
gcc/
* collect2.cc (write_c_file_glob): Allow for
'COLLECT2_MAIN_REFERENCE' override.
* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
* config/nvptx/nvptx.h: Adjust.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
'__do_global_dtors'.
* config/nvptx/crtstuff.c: New.
* config/nvptx/t-nvptx: Adjust.
---
gcc/collect2.cc | 4 ++
gcc/config.gcc | 1 +
gcc/config/nvptx/nvptx.h | 35 ++++++++++-
.../no_profile_instrument_function-attr-1.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 3 +-
libgcc/config.host | 2 +-
libgcc/config/nvptx/crt0.c | 6 ++
libgcc/config/nvptx/crtstuff.c | 58 +++++++++++++++++++
libgcc/config/nvptx/t-nvptx | 15 ++++-
9 files changed, 119 insertions(+), 7 deletions(-)
create mode 100644 libgcc/config/nvptx/crtstuff.c
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16a..945a9ff86dda 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
fprintf (stream, "\tdereg_frame,\n");
fprintf (stream, "\t0\n};\n\n");
+# ifdef COLLECT2_MAIN_REFERENCE
+ fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
}
#endif /* ! LD_INIT_SWITCH */
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 951902338205..fec67d7b6e40 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2784,6 +2784,7 @@ nvptx-*)
tm_file="${tm_file} newlib-stdint.h"
use_gcc_stdint=wrap
tmake_file="nvptx/t-nvptx"
+ use_collect2=yes
if test x$enable_as_accelerator = xyes; then
extra_programs="${extra_programs} mkoffload\$(exeext)"
tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index dc676dcb5fc5..235c1e4d99d5 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
'../../gcc.cc:asm_options', 'HAVE_GNU_AS'. */
#define ASM_SPEC "%{v}"
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+ STARTFILE_SPEC_MMAINKERNEL \
+ " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+ ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+ 'collect2' and the following helpers. */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+ (see 'libgcc/config/nvptx/crt0.c:__main'). */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+ function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+ evidently evaluate to NULL in JIT compilation. Avoiding the use of
+ assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+ function next to the arrays apparently does work around this issue...
+
+ The default '__main_reference' synthesized by 'collect2' refers to our
+ 'crt0.o' '__main' function with incompatible signature:
+
+ error : Function '__main' not declared __global__ in all source files
+
+ Address both these issues via 'COLLECT2_MAIN_REFERENCE'. */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
#define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
@@ -348,7 +380,6 @@ struct GTY(()) machine_function
#define MOVE_MAX 8
#define MOVE_RATIO(SPEED) 4
#define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
/* The C++ front end insists to link against libstdc++ -- which we don't build.
Tell it to instead link against the innocuous libgcc. */
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a684791..5b4101cf596d 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
return foo ();
}
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
/* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ea06e21c3a14..b1b1c5b36bc2 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -907,8 +907,7 @@ proc check_effective_target_nonlocal_goto {} {
# Return 1 if global constructors are supported, 0 otherwise.
proc check_effective_target_global_constructor {} {
- if { [istarget nvptx-*-*]
- || [istarget bpf-*-*] } {
+ if { [istarget bpf-*-*] } {
return 0
}
return 1
diff --git a/libgcc/config.host b/libgcc/config.host
index eb23abe89f5e..25072f41860c 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1499,7 +1499,7 @@ m32c-*-elf*)
;;
nvptx-*)
tmake_file="$tmake_file nvptx/t-nvptx"
- extra_parts="crt0.o"
+ extra_parts="crt0.o crtbegin.o crtend.o"
;;
*)
echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae7..860e2bfacadd 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,9 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#include <stdlib.h>
+#include "gbl-ctors.h"
+
int *__exitval_ptr;
extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
__nvptx_stacks[0] = stack + sizeof stack;
__nvptx_uni[0] = 0;
+ __do_global_ctors ();
+ atexit (__do_global_dtors);
+
exit (main (argc, argv));
}
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 000000000000..0823fc499019
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+ own.
+
+ There's no technical reason in this configuration here to split the two
+ functions '__do_global_ctors' and '__do_global_ctors' into two separate
+ files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+ do so anyway, for symmetry with other configurations. */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+ DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+ /* In this configuration here, there's no way that "this routine is run more
+ than once [...] when exit is called recursively": for nvptx target, the
+ call to '__do_global_dtors' is registered via 'atexit', which doesn't
+ re-enter a function already run.
+ Therefore, we do *not* "arrange to remember where in the list we left off
+ processing". */
+ func_ptr *p;
+ for (p = __DTOR_LIST__ + 1; *p; )
+ (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87dd..9a0454c3a4d0 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
$(srcdir)/config/nvptx/atomic.c
LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
crt0.o: $(srcdir)/config/nvptx/crt0.c
$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
# support it, and it may cause the build to fail, because of alloca usage, for
# example.
INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
--
2.35.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2022-12-20 8:03 ` [PING] " Thomas Schwinge
@ 2022-12-23 13:35 ` Thomas Schwinge
2022-12-23 13:37 ` Thomas Schwinge
2023-01-20 20:46 ` [og12] " Thomas Schwinge
2023-01-20 20:41 ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2 siblings, 2 replies; 10+ messages in thread
From: Thomas Schwinge @ 2022-12-23 13:35 UTC (permalink / raw)
To: gcc-patches, Tom de Vries
Hi!
On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> [posted before]
Building on that, attached is now the additional "for offloading" piece:
"nvptx: Support global constructors/destructors via 'collect2' for offloading".
OK to push?
I did manually test this (by putting a few constructors/destructors into
'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
and also in my WIP development tree with standard libgfortran
constructors (with 'LIBGFOR_MINIMAL' disabled).
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
^ permalink raw reply [flat|nested] 10+ messages in thread
* nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
2022-12-23 13:35 ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
@ 2022-12-23 13:37 ` Thomas Schwinge
2023-01-11 11:49 ` [PING] " Thomas Schwinge
2023-01-20 20:46 ` [og12] " Thomas Schwinge
1 sibling, 1 reply; 10+ messages in thread
From: Thomas Schwinge @ 2022-12-23 13:37 UTC (permalink / raw)
To: gcc-patches, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]
Hi!
On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]
>
> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for offloading".
> OK to push?
Now really attached.
> I did manually test this (by putting a few constructors/destructors into
> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
> and also in my WIP development tree with standard libgfortran
> constructors (with 'LIBGFOR_MINIMAL' disabled).
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 8131 bytes --]
From fb67006eeca0c8e2bfdf86576ed3109dacaf6868 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
for offloading
This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.
libgcc/
* config/nvptx/crtstuff.c ["mgomp"]
(__do_global_ctors__entry__mgomp)
(__do_global_dtors__entry__mgomp): New.
[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
(nvptx_close_device, GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image): Call it.
---
libgcc/config/nvptx/crtstuff.c | 64 ++++++++++++++++++-
libgomp/plugin/plugin-nvptx.c | 113 ++++++++++++++++++++++++++++++++-
2 files changed, 175 insertions(+), 2 deletions(-)
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
do so anyway, for symmetry with other configurations. */
+
+/* See 'crt0.c', 'mgomp.c'. */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
#ifdef CRT_BEGIN
void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
DO_GLOBAL_CTORS_BODY;
}
+/* Need '.entry' wrapper for offloading. */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+ __nvptx_stacks[0] = nvptx_stacks_0;
+ __nvptx_uni[0] = 0;
+
+ __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+ __do_global_ctors ();
+}
+
+# endif
+
#elif defined(CRT_END) /* ! CRT_BEGIN */
void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
/* In this configuration here, there's no way that "this routine is run more
than once [...] when exit is called recursively": for nvptx target, the
call to '__do_global_dtors' is registered via 'atexit', which doesn't
- re-enter a function already run.
+ re-enter a function already run, and neither does nvptx offload target.
Therefore, we do *not* "arrange to remember where in the list we left off
processing". */
func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
(*p++) ();
}
+/* Need '.entry' wrapper for offloading. */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+ __nvptx_stacks[0] = nvptx_stacks_0;
+ __nvptx_uni[0] = 0;
+
+ __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+ __do_global_dtors ();
+}
+
+# endif
+
#else /* ! CRT_BEGIN && ! CRT_END */
#error "One of CRT_BEGIN or CRT_END must be defined."
#endif
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index fcc97c6e0d5..395639537e8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -338,6 +338,11 @@ struct ptx_device
static struct ptx_device **ptx_devices;
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+ const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
static inline struct nvptx_thread *
nvptx_thread (void)
{
@@ -557,6 +562,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
if (!ptx_dev)
return true;
+ bool ret = true;
+
+ for (struct ptx_image_data *image = ptx_dev->images;
+ image != NULL;
+ image = image->next)
+ {
+ if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+ "__do_global_dtors__entry"))
+ ret = false;
+ }
+
for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
{
struct ptx_free_block *b_next = b->next;
@@ -577,7 +593,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
free (ptx_dev);
- return true;
+
+ return ret;
}
static int
@@ -1280,6 +1297,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
}
+/* Invoke MODULE's global constructors/destructors. */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+ const char *funcname)
+{
+ bool ret = true;
+ char *funcname_mgomp = NULL;
+ CUresult r;
+ CUfunction funcptr;
+ r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+ &funcptr, module, funcname);
+ GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+ funcname, cuda_error (r));
+ if (r == CUDA_ERROR_NOT_FOUND)
+ {
+ /* Try '[funcname]__mgomp'. */
+
+ size_t funcname_len = strlen (funcname);
+ const char *mgomp_suffix = "__mgomp";
+ size_t mgomp_suffix_len = strlen (mgomp_suffix);
+ funcname_mgomp
+ = GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+ memcpy (funcname_mgomp, funcname, funcname_len);
+ memcpy (funcname_mgomp + funcname_len,
+ mgomp_suffix, mgomp_suffix_len + 1);
+ funcname = funcname_mgomp;
+
+ r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+ &funcptr, module, funcname);
+ GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+ funcname, cuda_error (r));
+ }
+ if (r == CUDA_ERROR_NOT_FOUND)
+ ;
+ else if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+ else
+ {
+ /* If necessary, set up soft stack. */
+ void *nvptx_stacks_0;
+ void *kargs[1];
+ if (funcname_mgomp)
+ {
+ size_t stack_size = nvptx_stacks_size ();
+ pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+ nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+ nvptx_stacks_0 += stack_size;
+ kargs[0] = &nvptx_stacks_0;
+ }
+ r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+ funcptr,
+ 1, 1, 1, 1, 1, 1,
+ /* sharedMemBytes */ 0,
+ /* hStream */ NULL,
+ /* kernelParams */ funcname_mgomp ? kargs : NULL,
+ /* extra */ NULL);
+ if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+
+ r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+ NULL);
+ if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+
+ if (funcname_mgomp)
+ pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+ }
+
+ if (funcname_mgomp)
+ free (funcname_mgomp);
+
+ return ret;
+}
+
/* Load the (partial) program described by TARGET_DATA to device
number ORD. Allocate and return TARGET_TABLE. If not NULL, REV_FN_TABLE
will contain the on-device addresses of the functions for reverse offload.
@@ -1452,6 +1556,9 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
nvptx_set_clocktick (module, dev);
+ if (!nvptx_do_global_cdtors (module, dev, "__do_global_ctors__entry"))
+ return -1;
+
return fn_entries + var_entries + other_entries;
}
@@ -1477,6 +1584,10 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
if (image->target_data == target_data)
{
+ if (!nvptx_do_global_cdtors (image->module, dev,
+ "__do_global_dtors__entry"))
+ ret = false;
+
*prev_p = image->next;
if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
ret = false;
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PING^2] nvptx: Support global constructors/destructors via 'collect2'
2022-12-20 8:03 ` [PING] " Thomas Schwinge
@ 2023-01-11 11:48 ` Thomas Schwinge
2023-01-24 9:01 ` Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
1 sibling, 0 replies; 10+ messages in thread
From: Thomas Schwinge @ 2023-01-11 11:48 UTC (permalink / raw)
To: gcc-patches, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1558 bytes --]
Hi!
Ping.
Grüße
Thomas
On 2022-12-20T09:03:51+0100, I wrote:
> Hi!
>
> Ping.
>
>
> Minor change in the attached
> "nvptx: Support global constructors/destructors via 'collect2'": for
> 'atexit', add '#include <stdlib.h>' to 'libgcc/config/nvptx/crt0.c'.
>
>
> Grüße
> Thomas
>
>
> On 2022-12-02T14:35:35+0100, I wrote:
>> Hi!
>>
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> attached; OK to push? (... with 'gcc/doc/install.texi' accordingly
>> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
>> "'nm'" and newlib
>> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
>> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
>> comments to those?)
>>
>> Per my quick scanning of 'gcc/config.gcc' history, for more than two
>> decades, there was a clear trend to remove 'use_collect2=yes'
>> configurations; now finally a new one is being added -- making sure we're
>> not slowly dispensing with the need for the early 1990s piece of work
>> that 'gcc/collect2*' is... ;'-P
>>
>>
>> Grüße
>> Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 10784 bytes --]
From 0e7cf5a9f83c3a82eafa126886e5d92651bfbb30 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".
gcc/
* collect2.cc (write_c_file_glob): Allow for
'COLLECT2_MAIN_REFERENCE' override.
* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
* config/nvptx/nvptx.h: Adjust.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
'__do_global_dtors'.
* config/nvptx/crtstuff.c: New.
* config/nvptx/t-nvptx: Adjust.
---
gcc/collect2.cc | 4 ++
gcc/config.gcc | 1 +
gcc/config/nvptx/nvptx.h | 35 ++++++++++-
.../no_profile_instrument_function-attr-1.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 3 +-
libgcc/config.host | 2 +-
libgcc/config/nvptx/crt0.c | 6 ++
libgcc/config/nvptx/crtstuff.c | 58 +++++++++++++++++++
libgcc/config/nvptx/t-nvptx | 15 ++++-
9 files changed, 119 insertions(+), 7 deletions(-)
create mode 100644 libgcc/config/nvptx/crtstuff.c
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16a..945a9ff86dda 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
fprintf (stream, "\tdereg_frame,\n");
fprintf (stream, "\t0\n};\n\n");
+# ifdef COLLECT2_MAIN_REFERENCE
+ fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
}
#endif /* ! LD_INIT_SWITCH */
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 951902338205..fec67d7b6e40 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2784,6 +2784,7 @@ nvptx-*)
tm_file="${tm_file} newlib-stdint.h"
use_gcc_stdint=wrap
tmake_file="nvptx/t-nvptx"
+ use_collect2=yes
if test x$enable_as_accelerator = xyes; then
extra_programs="${extra_programs} mkoffload\$(exeext)"
tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index dc676dcb5fc5..235c1e4d99d5 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
'../../gcc.cc:asm_options', 'HAVE_GNU_AS'. */
#define ASM_SPEC "%{v}"
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+ STARTFILE_SPEC_MMAINKERNEL \
+ " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+ ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+ 'collect2' and the following helpers. */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+ (see 'libgcc/config/nvptx/crt0.c:__main'). */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+ function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+ evidently evaluate to NULL in JIT compilation. Avoiding the use of
+ assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+ function next to the arrays apparently does work around this issue...
+
+ The default '__main_reference' synthesized by 'collect2' refers to our
+ 'crt0.o' '__main' function with incompatible signature:
+
+ error : Function '__main' not declared __global__ in all source files
+
+ Address both these issues via 'COLLECT2_MAIN_REFERENCE'. */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
#define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
@@ -348,7 +380,6 @@ struct GTY(()) machine_function
#define MOVE_MAX 8
#define MOVE_RATIO(SPEED) 4
#define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
/* The C++ front end insists to link against libstdc++ -- which we don't build.
Tell it to instead link against the innocuous libgcc. */
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a684791..5b4101cf596d 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
return foo ();
}
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
/* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ea06e21c3a14..b1b1c5b36bc2 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -907,8 +907,7 @@ proc check_effective_target_nonlocal_goto {} {
# Return 1 if global constructors are supported, 0 otherwise.
proc check_effective_target_global_constructor {} {
- if { [istarget nvptx-*-*]
- || [istarget bpf-*-*] } {
+ if { [istarget bpf-*-*] } {
return 0
}
return 1
diff --git a/libgcc/config.host b/libgcc/config.host
index eb23abe89f5e..25072f41860c 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1499,7 +1499,7 @@ m32c-*-elf*)
;;
nvptx-*)
tmake_file="$tmake_file nvptx/t-nvptx"
- extra_parts="crt0.o"
+ extra_parts="crt0.o crtbegin.o crtend.o"
;;
*)
echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae7..860e2bfacadd 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,9 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#include <stdlib.h>
+#include "gbl-ctors.h"
+
int *__exitval_ptr;
extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
__nvptx_stacks[0] = stack + sizeof stack;
__nvptx_uni[0] = 0;
+ __do_global_ctors ();
+ atexit (__do_global_dtors);
+
exit (main (argc, argv));
}
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 000000000000..0823fc499019
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+ own.
+
+ There's no technical reason in this configuration here to split the two
+ functions '__do_global_ctors' and '__do_global_ctors' into two separate
+ files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+ do so anyway, for symmetry with other configurations. */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+ DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+ /* In this configuration here, there's no way that "this routine is run more
+ than once [...] when exit is called recursively": for nvptx target, the
+ call to '__do_global_dtors' is registered via 'atexit', which doesn't
+ re-enter a function already run.
+ Therefore, we do *not* "arrange to remember where in the list we left off
+ processing". */
+ func_ptr *p;
+ for (p = __DTOR_LIST__ + 1; *p; )
+ (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87dd..9a0454c3a4d0 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
$(srcdir)/config/nvptx/atomic.c
LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
crt0.o: $(srcdir)/config/nvptx/crt0.c
$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
# support it, and it may cause the build to fail, because of alloca usage, for
# example.
INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
--
2.35.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PING] nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
2022-12-23 13:37 ` Thomas Schwinge
@ 2023-01-11 11:49 ` Thomas Schwinge
0 siblings, 0 replies; 10+ messages in thread
From: Thomas Schwinge @ 2023-01-11 11:49 UTC (permalink / raw)
To: gcc-patches, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]
Hi!
Ping.
Grüße
Thomas
On 2022-12-23T14:37:47+0100, I wrote:
> Hi!
>
> On 2022-12-23T14:35:16+0100, I wrote:
>> On 2022-12-02T14:35:35+0100, I wrote:
>>> On 2022-12-01T22:13:38+0100, I wrote:
>>>> I'm working on support for global constructors/destructors with
>>>> GCC/nvptx
>>>
>>> See "nvptx: Support global constructors/destructors via 'collect2'"
>>> [posted before]
>>
>> Building on that, attached is now the additional "for offloading" piece:
>> "nvptx: Support global constructors/destructors via 'collect2' for offloading".
>> OK to push?
>
> Now really attached.
>
>> I did manually test this (by putting a few constructors/destructors into
>> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
>> and also in my WIP development tree with standard libgfortran
>> constructors (with 'LIBGFOR_MINIMAL' disabled).
>
>
> Grüße
> Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 8131 bytes --]
From fb67006eeca0c8e2bfdf86576ed3109dacaf6868 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
for offloading
This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.
libgcc/
* config/nvptx/crtstuff.c ["mgomp"]
(__do_global_ctors__entry__mgomp)
(__do_global_dtors__entry__mgomp): New.
[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
(nvptx_close_device, GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image): Call it.
---
libgcc/config/nvptx/crtstuff.c | 64 ++++++++++++++++++-
libgomp/plugin/plugin-nvptx.c | 113 ++++++++++++++++++++++++++++++++-
2 files changed, 175 insertions(+), 2 deletions(-)
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
do so anyway, for symmetry with other configurations. */
+
+/* See 'crt0.c', 'mgomp.c'. */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
#ifdef CRT_BEGIN
void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
DO_GLOBAL_CTORS_BODY;
}
+/* Need '.entry' wrapper for offloading. */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+ __nvptx_stacks[0] = nvptx_stacks_0;
+ __nvptx_uni[0] = 0;
+
+ __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+ __do_global_ctors ();
+}
+
+# endif
+
#elif defined(CRT_END) /* ! CRT_BEGIN */
void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
/* In this configuration here, there's no way that "this routine is run more
than once [...] when exit is called recursively": for nvptx target, the
call to '__do_global_dtors' is registered via 'atexit', which doesn't
- re-enter a function already run.
+ re-enter a function already run, and neither does nvptx offload target.
Therefore, we do *not* "arrange to remember where in the list we left off
processing". */
func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
(*p++) ();
}
+/* Need '.entry' wrapper for offloading. */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+ __nvptx_stacks[0] = nvptx_stacks_0;
+ __nvptx_uni[0] = 0;
+
+ __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+ __do_global_dtors ();
+}
+
+# endif
+
#else /* ! CRT_BEGIN && ! CRT_END */
#error "One of CRT_BEGIN or CRT_END must be defined."
#endif
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index fcc97c6e0d5..395639537e8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -338,6 +338,11 @@ struct ptx_device
static struct ptx_device **ptx_devices;
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+ const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
static inline struct nvptx_thread *
nvptx_thread (void)
{
@@ -557,6 +562,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
if (!ptx_dev)
return true;
+ bool ret = true;
+
+ for (struct ptx_image_data *image = ptx_dev->images;
+ image != NULL;
+ image = image->next)
+ {
+ if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+ "__do_global_dtors__entry"))
+ ret = false;
+ }
+
for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
{
struct ptx_free_block *b_next = b->next;
@@ -577,7 +593,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
free (ptx_dev);
- return true;
+
+ return ret;
}
static int
@@ -1280,6 +1297,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
}
+/* Invoke MODULE's global constructors/destructors. */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+ const char *funcname)
+{
+ bool ret = true;
+ char *funcname_mgomp = NULL;
+ CUresult r;
+ CUfunction funcptr;
+ r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+ &funcptr, module, funcname);
+ GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+ funcname, cuda_error (r));
+ if (r == CUDA_ERROR_NOT_FOUND)
+ {
+ /* Try '[funcname]__mgomp'. */
+
+ size_t funcname_len = strlen (funcname);
+ const char *mgomp_suffix = "__mgomp";
+ size_t mgomp_suffix_len = strlen (mgomp_suffix);
+ funcname_mgomp
+ = GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+ memcpy (funcname_mgomp, funcname, funcname_len);
+ memcpy (funcname_mgomp + funcname_len,
+ mgomp_suffix, mgomp_suffix_len + 1);
+ funcname = funcname_mgomp;
+
+ r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+ &funcptr, module, funcname);
+ GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+ funcname, cuda_error (r));
+ }
+ if (r == CUDA_ERROR_NOT_FOUND)
+ ;
+ else if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+ else
+ {
+ /* If necessary, set up soft stack. */
+ void *nvptx_stacks_0;
+ void *kargs[1];
+ if (funcname_mgomp)
+ {
+ size_t stack_size = nvptx_stacks_size ();
+ pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+ nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+ nvptx_stacks_0 += stack_size;
+ kargs[0] = &nvptx_stacks_0;
+ }
+ r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+ funcptr,
+ 1, 1, 1, 1, 1, 1,
+ /* sharedMemBytes */ 0,
+ /* hStream */ NULL,
+ /* kernelParams */ funcname_mgomp ? kargs : NULL,
+ /* extra */ NULL);
+ if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+
+ r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+ NULL);
+ if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+
+ if (funcname_mgomp)
+ pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+ }
+
+ if (funcname_mgomp)
+ free (funcname_mgomp);
+
+ return ret;
+}
+
/* Load the (partial) program described by TARGET_DATA to device
number ORD. Allocate and return TARGET_TABLE. If not NULL, REV_FN_TABLE
will contain the on-device addresses of the functions for reverse offload.
@@ -1452,6 +1556,9 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
nvptx_set_clocktick (module, dev);
+ if (!nvptx_do_global_cdtors (module, dev, "__do_global_ctors__entry"))
+ return -1;
+
return fn_entries + var_entries + other_entries;
}
@@ -1477,6 +1584,10 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
if (image->target_data == target_data)
{
+ if (!nvptx_do_global_cdtors (image->module, dev,
+ "__do_global_dtors__entry"))
+ ret = false;
+
*prev_p = image->next;
if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
ret = false;
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [og12] nvptx: Support global constructors/destructors via 'collect2'
2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2022-12-20 8:03 ` [PING] " Thomas Schwinge
2022-12-23 13:35 ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
@ 2023-01-20 20:41 ` Thomas Schwinge
2023-01-20 20:45 ` Thomas Schwinge
2 siblings, 1 reply; 10+ messages in thread
From: Thomas Schwinge @ 2023-01-20 20:41 UTC (permalink / raw)
To: gcc-patches; +Cc: Tom de Vries
Hi!
On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> attached; OK to push? (... with 'gcc/doc/install.texi' accordingly
> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
> "'nm'" and newlib
> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
> comments to those?)
For now pushed to devel/omp/gcc-12 branch in
commit fe07b0003bb2092bc34d4bed504be1868b88782d
"nvptx: Support global constructors/destructors via 'collect2'",
see attached.
> Per my quick scanning of 'gcc/config.gcc' history, for more than two
> decades, there was a clear trend to remove 'use_collect2=yes'
> configurations; now finally a new one is being added -- making sure we're
> not slowly dispensing with the need for the early 1990s piece of work
> that 'gcc/collect2*' is... ;'-P
(I still find that "notable" and "funny" in a certain way.) ;-*
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [og12] nvptx: Support global constructors/destructors via 'collect2'
2023-01-20 20:41 ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
@ 2023-01-20 20:45 ` Thomas Schwinge
0 siblings, 0 replies; 10+ messages in thread
From: Thomas Schwinge @ 2023-01-20 20:45 UTC (permalink / raw)
To: gcc-patches; +Cc: Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]
Hi!
On 2023-01-20T21:41:26+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> attached; OK to push? (... with 'gcc/doc/install.texi' accordingly
>> updated once <https://github.com/MentorEmbedded/nvptx-tools/pull/40>
>> "'nm'" and newlib
>> <https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
>> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
>> comments to those?)
>
> For now pushed to devel/omp/gcc-12 branch in
> commit fe07b0003bb2092bc34d4bed504be1868b88782d
> "nvptx: Support global constructors/destructors via 'collect2'",
> see attached.
Now really attached.
>> Per my quick scanning of 'gcc/config.gcc' history, for more than two
>> decades, there was a clear trend to remove 'use_collect2=yes'
>> configurations; now finally a new one is being added -- making sure we're
>> not slowly dispensing with the need for the early 1990s piece of work
>> that 'gcc/collect2*' is... ;'-P
>
> (I still find that "notable" and "funny" in a certain way.) ;-*
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 12741 bytes --]
From fe07b0003bb2092bc34d4bed504be1868b88782d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".
gcc/
* collect2.cc (write_c_file_glob): Allow for
'COLLECT2_MAIN_REFERENCE' override.
* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
* config/nvptx/nvptx.h: Adjust.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
'__do_global_dtors'.
* config/nvptx/crtstuff.c: New.
* config/nvptx/t-nvptx: Adjust.
---
gcc/ChangeLog.omp | 5 ++
gcc/collect2.cc | 4 ++
gcc/config.gcc | 1 +
gcc/config/nvptx/nvptx.h | 35 ++++++++++-
gcc/testsuite/ChangeLog.omp | 6 ++
.../no_profile_instrument_function-attr-1.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 3 +-
libgcc/ChangeLog.omp | 9 +++
libgcc/config.host | 2 +-
libgcc/config/nvptx/crt0.c | 6 ++
libgcc/config/nvptx/crtstuff.c | 58 +++++++++++++++++++
libgcc/config/nvptx/t-nvptx | 15 ++++-
12 files changed, 139 insertions(+), 7 deletions(-)
create mode 100644 libgcc/config/nvptx/crtstuff.c
diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 127b450644b..ca00bfb48f9 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,5 +1,10 @@
2023-01-20 Thomas Schwinge <thomas@codesourcery.com>
+ * collect2.cc (write_c_file_glob): Allow for
+ 'COLLECT2_MAIN_REFERENCE' override.
+ * config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
+ * config/nvptx/nvptx.h: Adjust.
+
* config/nvptx/nvptx.cc (nvptx_assemble_undefined_decl): Notice
'__nvptx_stacks', '__nvptx_uni' declarations.
(nvptx_file_end): Don't emit duplicate declarations for those.
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16..945a9ff86dd 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
fprintf (stream, "\tdereg_frame,\n");
fprintf (stream, "\t0\n};\n\n");
+# ifdef COLLECT2_MAIN_REFERENCE
+ fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
}
#endif /* ! LD_INIT_SWITCH */
diff --git a/gcc/config.gcc b/gcc/config.gcc
index e6b9c864b0d..9c9365886cf 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2835,6 +2835,7 @@ nvptx-*)
tm_file="${tm_file} newlib-stdint.h"
use_gcc_stdint=wrap
tmake_file="nvptx/t-nvptx"
+ use_collect2=yes
if test x$enable_as_accelerator = xyes; then
extra_programs="${extra_programs} mkoffload\$(exeext)"
tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index b81f9f42cd3..d815081147e 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
'../../gcc.cc:asm_options', 'HAVE_GNU_AS'. */
#define ASM_SPEC "%{v}"
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+ STARTFILE_SPEC_MMAINKERNEL \
+ " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+ ENDFILE_SPEC_CDTOR
+
+#define STARTFILE_SPEC_MMAINKERNEL "%{mmainkernel:crt0.o%s}"
+
+/* Support for global constructors/destructors is implemented via
+ 'collect2' and the following helpers. */
+
+#define STARTFILE_SPEC_CDTOR "crtbegin.o%s"
+
+#define ENDFILE_SPEC_CDTOR "crtend.o%s"
+
+/* nvptx does its own wrapping of 'main'
+ (see 'libgcc/config/nvptx/crt0.c:__main'). */
+#define HAS_INIT_SECTION
+
+/* For example with old Nvidia Tesla K20c, Driver Version: 361.93.02, the
+ function pointers stored in the '__CTOR_LIST__', '__DTOR_LIST__' arrays
+ evidently evaluate to NULL in JIT compilation. Avoiding the use of
+ assembler names ('write_list_with_asm') doesn't help, but defining a dummy
+ function next to the arrays apparently does work around this issue...
+
+ The default '__main_reference' synthesized by 'collect2' refers to our
+ 'crt0.o' '__main' function with incompatible signature:
+
+ error : Function '__main' not declared __global__ in all source files
+
+ Address both these issues via 'COLLECT2_MAIN_REFERENCE'. */
+#define COLLECT2_MAIN_REFERENCE "__attribute__((unused)) static void dummy () {}"
#define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
@@ -348,7 +380,6 @@ struct GTY(()) machine_function
#define MOVE_MAX 8
#define MOVE_RATIO(SPEED) 4
#define FUNCTION_MODE QImode
-#define HAS_INIT_SECTION 1
/* The C++ front end insists to link against libstdc++ -- which we don't build.
Tell it to instead link against the innocuous libgcc. */
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index c942c34dc70..f35568d83a9 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,5 +1,11 @@
2023-01-20 Thomas Schwinge <thomas@codesourcery.com>
+ * gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
+ 'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
+ in identifiers.
+ * lib/target-supports.exp
+ (check_effective_target_global_constructor): Enable for nvptx.
+
* gcc.target/nvptx/softstack-decl-1.c: Make 'dg-do assemble',
adjust.
* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index 909f8a68479..5b4101cf596 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -18,7 +18,7 @@ int main ()
return foo ();
}
-/* { dg-final { scan-tree-dump-times "__gcov0\[._\]main.* = PROF_edge_counter" 1 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "__gcov0\[$._\]main.* = PROF_edge_counter" 1 "optimized"} } */
/* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v" 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
/* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e911b298f4c..efd02a61a6b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -850,8 +850,7 @@ proc check_effective_target_nonlocal_goto {} {
# Return 1 if global constructors are supported, 0 otherwise.
proc check_effective_target_global_constructor {} {
- if { [istarget nvptx-*-*]
- || [istarget bpf-*-*] } {
+ if { [istarget bpf-*-*] } {
return 0
}
return 1
diff --git a/libgcc/ChangeLog.omp b/libgcc/ChangeLog.omp
index f41bbbb339a..68a99cbe427 100644
--- a/libgcc/ChangeLog.omp
+++ b/libgcc/ChangeLog.omp
@@ -1,3 +1,12 @@
+2023-01-20 Thomas Schwinge <thomas@codesourcery.com>
+
+ * config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
+ 'crtend.o' to 'extra_parts'.
+ * config/nvptx/crt0.c: Invoke '__do_global_ctors',
+ '__do_global_dtors'.
+ * config/nvptx/crtstuff.c: New.
+ * config/nvptx/t-nvptx: Adjust.
+
2022-11-07 Kwok Cheung Yeung <kcy@codesourcery.com>
* config/gcn/simd-math/amdgcnmach.h (VECTOR_RETURN): Store value of
diff --git a/libgcc/config.host b/libgcc/config.host
index 072dd26a276..29b5a2bd7cf 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1535,7 +1535,7 @@ m32c-*-elf*|m32c-*-rtems*)
;;
nvptx-*)
tmake_file="$tmake_file nvptx/t-nvptx"
- extra_parts="crt0.o"
+ extra_parts="crt0.o crtbegin.o crtend.o"
;;
*)
echo "*** Configuration ${host} not supported" 1>&2
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index abf047327ae..860e2bfacad 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,6 +19,9 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
+#include <stdlib.h>
+#include "gbl-ctors.h"
+
int *__exitval_ptr;
extern void __attribute__((noreturn)) exit (int status);
@@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
__nvptx_stacks[0] = stack + sizeof stack;
__nvptx_uni[0] = 0;
+ __do_global_ctors ();
+ atexit (__do_global_dtors);
+
exit (main (argc, argv));
}
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
new file mode 100644
index 00000000000..0823fc49901
--- /dev/null
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+ This file is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 3, or (at your option) any
+ later version.
+
+ This file is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "gbl-ctors.h"
+
+/* The common 'crtstuff.c' doesn't quite provide what we need, so we roll our
+ own.
+
+ There's no technical reason in this configuration here to split the two
+ functions '__do_global_ctors' and '__do_global_ctors' into two separate
+ files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
+ do so anyway, for symmetry with other configurations. */
+
+#ifdef CRT_BEGIN
+
+void
+__do_global_ctors (void)
+{
+ DO_GLOBAL_CTORS_BODY;
+}
+
+#elif defined(CRT_END) /* ! CRT_BEGIN */
+
+void
+__do_global_dtors (void)
+{
+ /* In this configuration here, there's no way that "this routine is run more
+ than once [...] when exit is called recursively": for nvptx target, the
+ call to '__do_global_dtors' is registered via 'atexit', which doesn't
+ re-enter a function already run.
+ Therefore, we do *not* "arrange to remember where in the list we left off
+ processing". */
+ func_ptr *p;
+ for (p = __DTOR_LIST__ + 1; *p; )
+ (*p++) ();
+}
+
+#else /* ! CRT_BEGIN && ! CRT_END */
+#error "One of CRT_BEGIN or CRT_END must be defined."
+#endif
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index ede0bf0f87d..9a0454c3a4d 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -3,7 +3,7 @@ LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
$(srcdir)/config/nvptx/atomic.c
LIB2ADDEH=
-LIB2FUNCS_EXCLUDE=__main
+LIB2FUNCS_EXCLUDE=
crt0.o: $(srcdir)/config/nvptx/crt0.c
$(crt_compile) -c $<
@@ -12,3 +12,16 @@ crt0.o: $(srcdir)/config/nvptx/crt0.c
# support it, and it may cause the build to fail, because of alloca usage, for
# example.
INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
+
+# Support for global constructors/destructors is implemented via
+# 'collect2' and the following helpers.
+
+LIB2FUNCS_EXCLUDE += __main
+
+CUSTOM_CRTSTUFF = yes
+
+crtbegin.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
+
+crtend.o: $(srcdir)/config/nvptx/crtstuff.c
+ $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [og12] nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
2022-12-23 13:35 ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
2022-12-23 13:37 ` Thomas Schwinge
@ 2023-01-20 20:46 ` Thomas Schwinge
1 sibling, 0 replies; 10+ messages in thread
From: Thomas Schwinge @ 2023-01-20 20:46 UTC (permalink / raw)
To: gcc-patches; +Cc: Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]
Hi!
On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]
>
> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for offloading".
> OK to push?
For now pushed to devel/omp/gcc-12 branch in
commit 689a5340c7e4286b451f1bc600342550c7c94da2
"nvptx: Support global constructors/destructors via 'collect2' for offloading",
see attached.
> I did manually test this (by putting a few constructors/destructors into
> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
> and also in my WIP development tree with standard libgfortran
> constructors (with 'LIBGFOR_MINIMAL' disabled).
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Support-global-constructors-destructors-via-co.patch --]
[-- Type: text/x-diff, Size: 9283 bytes --]
From 689a5340c7e4286b451f1bc600342550c7c94da2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
for offloading
This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.
libgcc/
* config/nvptx/crtstuff.c ["mgomp"]
(__do_global_ctors__entry__mgomp)
(__do_global_dtors__entry__mgomp): New.
[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
(nvptx_close_device, GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image): Call it.
---
libgcc/ChangeLog.omp | 6 ++
libgcc/config/nvptx/crtstuff.c | 64 ++++++++++++++++++-
libgomp/ChangeLog.omp | 4 ++
libgomp/plugin/plugin-nvptx.c | 113 ++++++++++++++++++++++++++++++++-
4 files changed, 185 insertions(+), 2 deletions(-)
diff --git a/libgcc/ChangeLog.omp b/libgcc/ChangeLog.omp
index 68a99cbe427..2e7bf5cc029 100644
--- a/libgcc/ChangeLog.omp
+++ b/libgcc/ChangeLog.omp
@@ -1,5 +1,11 @@
2023-01-20 Thomas Schwinge <thomas@codesourcery.com>
+ * config/nvptx/crtstuff.c ["mgomp"]
+ (__do_global_ctors__entry__mgomp)
+ (__do_global_dtors__entry__mgomp): New.
+ [!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
+ New.
+
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
do so anyway, for symmetry with other configurations. */
+
+/* See 'crt0.c', 'mgomp.c'. */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
#ifdef CRT_BEGIN
void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
DO_GLOBAL_CTORS_BODY;
}
+/* Need '.entry' wrapper for offloading. */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+ __nvptx_stacks[0] = nvptx_stacks_0;
+ __nvptx_uni[0] = 0;
+
+ __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+ __do_global_ctors ();
+}
+
+# endif
+
#elif defined(CRT_END) /* ! CRT_BEGIN */
void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
/* In this configuration here, there's no way that "this routine is run more
than once [...] when exit is called recursively": for nvptx target, the
call to '__do_global_dtors' is registered via 'atexit', which doesn't
- re-enter a function already run.
+ re-enter a function already run, and neither does nvptx offload target.
Therefore, we do *not* "arrange to remember where in the list we left off
processing". */
func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
(*p++) ();
}
+/* Need '.entry' wrapper for offloading. */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+ __nvptx_stacks[0] = nvptx_stacks_0;
+ __nvptx_uni[0] = 0;
+
+ __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+ __do_global_dtors ();
+}
+
+# endif
+
#else /* ! CRT_BEGIN && ! CRT_END */
#error "One of CRT_BEGIN or CRT_END must be defined."
#endif
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 4447b74a2ab..32aa9705296 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,9 @@
2023-01-20 Thomas Schwinge <thomas@codesourcery.com>
+ * plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
+ (nvptx_close_device, GOMP_OFFLOAD_load_image)
+ (GOMP_OFFLOAD_unload_image): Call it.
+
* plugin/plugin-nvptx.c (nvptx_exec): Assert what we know about
'blockDimX'.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b2fabc61cc8..8e7b63bd637 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -344,6 +344,11 @@ static struct ptx_device **ptx_devices;
default is set here. */
static unsigned lowlat_pool_size = 8*1024;
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+ const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
static inline struct nvptx_thread *
nvptx_thread (void)
{
@@ -571,6 +576,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
if (!ptx_dev)
return true;
+ bool ret = true;
+
+ for (struct ptx_image_data *image = ptx_dev->images;
+ image != NULL;
+ image = image->next)
+ {
+ if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+ "__do_global_dtors__entry"))
+ ret = false;
+ }
+
for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
{
struct ptx_free_block *b_next = b->next;
@@ -591,7 +607,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
free (ptx_dev);
- return true;
+
+ return ret;
}
static int
@@ -1313,6 +1330,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
}
+/* Invoke MODULE's global constructors/destructors. */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+ const char *funcname)
+{
+ bool ret = true;
+ char *funcname_mgomp = NULL;
+ CUresult r;
+ CUfunction funcptr;
+ r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+ &funcptr, module, funcname);
+ GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+ funcname, cuda_error (r));
+ if (r == CUDA_ERROR_NOT_FOUND)
+ {
+ /* Try '[funcname]__mgomp'. */
+
+ size_t funcname_len = strlen (funcname);
+ const char *mgomp_suffix = "__mgomp";
+ size_t mgomp_suffix_len = strlen (mgomp_suffix);
+ funcname_mgomp
+ = GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+ memcpy (funcname_mgomp, funcname, funcname_len);
+ memcpy (funcname_mgomp + funcname_len,
+ mgomp_suffix, mgomp_suffix_len + 1);
+ funcname = funcname_mgomp;
+
+ r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+ &funcptr, module, funcname);
+ GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+ funcname, cuda_error (r));
+ }
+ if (r == CUDA_ERROR_NOT_FOUND)
+ ;
+ else if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+ else
+ {
+ /* If necessary, set up soft stack. */
+ void *nvptx_stacks_0;
+ void *kargs[1];
+ if (funcname_mgomp)
+ {
+ size_t stack_size = nvptx_stacks_size ();
+ pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+ nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+ nvptx_stacks_0 += stack_size;
+ kargs[0] = &nvptx_stacks_0;
+ }
+ r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+ funcptr,
+ 1, 1, 1, 1, 1, 1,
+ /* sharedMemBytes */ 0,
+ /* hStream */ NULL,
+ /* kernelParams */ funcname_mgomp ? kargs : NULL,
+ /* extra */ NULL);
+ if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+
+ r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+ NULL);
+ if (r != CUDA_SUCCESS)
+ {
+ GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+ funcname, cuda_error (r));
+ ret = false;
+ }
+
+ if (funcname_mgomp)
+ pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+ }
+
+ if (funcname_mgomp)
+ free (funcname_mgomp);
+
+ return ret;
+}
+
/* Load the (partial) program described by TARGET_DATA to device
number ORD. Allocate and return TARGET_TABLE. If not NULL, REV_FN_TABLE
will contain the on-device addresses of the functions for reverse offload.
@@ -1485,6 +1589,9 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
nvptx_set_clocktick (module, dev);
+ if (!nvptx_do_global_cdtors (module, dev, "__do_global_ctors__entry"))
+ return -1;
+
return fn_entries + var_entries + other_entries;
}
@@ -1510,6 +1617,10 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
if (image->target_data == target_data)
{
+ if (!nvptx_do_global_cdtors (image->module, dev,
+ "__do_global_dtors__entry"))
+ ret = false;
+
*prev_p = image->next;
if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
ret = false;
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2')
2022-12-20 8:03 ` [PING] " Thomas Schwinge
2023-01-11 11:48 ` [PING^2] " Thomas Schwinge
@ 2023-01-24 9:01 ` Thomas Schwinge
1 sibling, 0 replies; 10+ messages in thread
From: Thomas Schwinge @ 2023-01-24 9:01 UTC (permalink / raw)
To: gcc-patches; +Cc: Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 1408 bytes --]
Hi!
On 2022-12-20T09:03:51+0100, I wrote:
> Minor change in the attached
> "nvptx: Support global constructors/destructors via 'collect2'": for
> 'atexit', add '#include <stdlib.h>' to 'libgcc/config/nvptx/crt0.c'.
Turns out, it's not that easy. ;-) Pushed to devel/omp/gcc-12 branch
commit d90a8a5685c8bd3657892feac01739fe87a457a5
"Make 'libgcc/config/nvptx/crt0.c' build '--without-headers'", see
attached. Please consider that one 'fixup'ed into the GCC master branch
submission.
Grüße
Thomas
> --- a/libgcc/config/nvptx/crt0.c
> +++ b/libgcc/config/nvptx/crt0.c
> @@ -19,6 +19,9 @@
> see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> <http://www.gnu.org/licenses/>. */
>
> +#include <stdlib.h>
> +#include "gbl-ctors.h"
> +
> int *__exitval_ptr;
>
> extern void __attribute__((noreturn)) exit (int status);
> @@ -47,5 +50,8 @@ __main (int *rval_ptr, int argc, void **argv)
> __nvptx_stacks[0] = stack + sizeof stack;
> __nvptx_uni[0] = 0;
>
> + __do_global_ctors ();
> + atexit (__do_global_dtors);
> +
> exit (main (argc, argv));
> }
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Make-libgcc-config-nvptx-crt0.c-build-without-header.patch --]
[-- Type: text/x-diff, Size: 1767 bytes --]
From d90a8a5685c8bd3657892feac01739fe87a457a5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 24 Jan 2023 09:49:34 +0100
Subject: [PATCH] Make 'libgcc/config/nvptx/crt0.c' build '--without-headers'
..., where it currently fails:
[...]/libgcc/config/nvptx/crt0.c:22:10: fatal error: stdlib.h: No such file or directory
22 | #include <stdlib.h>
| ^~~~~~~~~~
Fix-up for "nvptx: Support global constructors/destructors via 'collect2'".
libgcc/
* config/nvptx/crt0.c [!HAVE_STDLIB_H]: Don't '#include <stdlib.h>'.
(atexit): Prototype.
---
libgcc/ChangeLog.omp | 5 +++++
libgcc/config/nvptx/crt0.c | 7 ++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/libgcc/ChangeLog.omp b/libgcc/ChangeLog.omp
index c46f49bf5b7..cf509a70d61 100644
--- a/libgcc/ChangeLog.omp
+++ b/libgcc/ChangeLog.omp
@@ -1,3 +1,8 @@
+2023-01-24 Thomas Schwinge <thomas@codesourcery.com>
+
+ * config/nvptx/crt0.c [!HAVE_STDLIB_H]: Don't '#include <stdlib.h>'.
+ (atexit): Prototype.
+
2023-01-20 Thomas Schwinge <thomas@codesourcery.com>
Andrew Stubbs <ams@codesourcery.com>
diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index 860e2bfacad..02648bef84b 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -19,11 +19,16 @@
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
-#include <stdlib.h>
+#include "auto-target.h"
+
+#ifdef HAVE_STDLIB_H
+# include <stdlib.h>
+#endif
#include "gbl-ctors.h"
int *__exitval_ptr;
+extern int atexit (void (*function) (void));
extern void __attribute__((noreturn)) exit (int status);
extern int main (int, void **);
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-01-24 9:01 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com>
2022-12-02 13:35 ` nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2022-12-20 8:03 ` [PING] " Thomas Schwinge
2023-01-11 11:48 ` [PING^2] " Thomas Schwinge
2023-01-24 9:01 ` Make 'libgcc/config/nvptx/crt0.c' build '--without-headers' (was: [PING] nvptx: Support global constructors/destructors via 'collect2') Thomas Schwinge
2022-12-23 13:35 ` nvptx: Support global constructors/destructors via 'collect2' for offloading (was: " Thomas Schwinge
2022-12-23 13:37 ` Thomas Schwinge
2023-01-11 11:49 ` [PING] " Thomas Schwinge
2023-01-20 20:46 ` [og12] " Thomas Schwinge
2023-01-20 20:41 ` [og12] nvptx: Support global constructors/destructors via 'collect2' Thomas Schwinge
2023-01-20 20:45 ` Thomas Schwinge
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).