public inbox for gcc-patches@gcc.gnu.org
* [PATCH] core: Support heap-based trampolines
@ 2023-07-16 10:38 FX Coudert
  2023-07-17  6:31 ` Richard Biener
From: FX Coudert @ 2023-07-16 10:38 UTC
  To: gcc-patches; +Cc: Iain Sandoe, maxim.blinov, ebotcazou, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1542 bytes --]

Hi,

This is a reworked version (following review) of the patch by Maxim Blinov and Iain Sandoe enabling heap-based trampolines. What has changed since the last version:

- wording changes, preferring to use “heap-based trampolines” rather than “off-stack trampolines”
- the option triggering generation of these new trampolines is now a binary choice: -ftrampoline-impl=[stack|heap]
- some adjustments due to changes in the macOS build flags in GCC since last year
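
For reference, here is a minimal sketch of the kind of code this option affects, assuming GNU C nested functions. The names for_each and sum_with_nested are made up for illustration, and the #else branch is only a stand-in for compilers without the extension. Taking the address of add is what requires a trampoline; with -ftrampoline-impl=heap that trampoline would be obtained from libgcc rather than written to the stack:

```c
#include <assert.h>
#include <stddef.h>

/* Applies fn to every element.  fn is a plain function pointer, so it has
   no room to carry a pointer to the caller's frame.  */
void for_each(const int *v, size_t n, void (*fn)(int))
{
  assert(fn != NULL);
  for (size_t i = 0; i < n; i++)
    fn(v[i]);
}

#if defined(__GNUC__) && !defined(__clang__)
int sum_with_nested(const int *v, size_t n)
{
  int total = 0;

  /* GNU C nested function: it updates `total` in the enclosing frame.  */
  void add(int x) { total += x; }

  /* Passing `add` by address makes GCC build a trampoline that loads the
     static chain before jumping to the real code.  */
  for_each(v, n, add);
  return total;
}
#else
/* Stand-in (an assumption for portability only): no nested functions,
   so the state lives in a file-scope variable instead.  */
static int total_fallback;
static void add_fallback(int x) { total_fallback += x; }

int sum_with_nested(const int *v, size_t n)
{
  total_fallback = 0;
  for_each(v, n, add_fallback);
  return total_fallback;
}
#endif
```

With a compiler that includes this patch, building such a file with -ftrampoline-impl=heap should select the heap implementation, provided libgcc was configured with --enable-heap-trampolines.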

Regarding testing, this patch has had excellent exposure on darwin (both x86_64 and aarch64) because it was part of Iain’s branch, distributed by many macOS distros/vendors (including Homebrew) for more than a year, and there was no bug report against the feature or implementation. On x86_64-linux, I have regression-tested it in three different configurations:
- a default build
- a build with --enable-heap-trampolines
- a build with --enable-heap-trampolines and heap trampolines forced by default (forcing HEAP_TRAMPOLINES_INIT = 1)

There are no regressions in any of these settings (apart from an expected missing warning in gcc.dg/Wtrampolines.c).

----------

From the original review, one question (asked by Jeff Law) was whether the two Linux implementations should be dropped, along with the configure-time selectability. Regardless of the answer to the first question, I think we probably want to retain the latter, even if only for darwin, as we want to implement this only on recent darwin versions.


OK to commit?

FX



[-- Attachment #2: 0001-core-Support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 37849 bytes --]

From d52627ab9aad754d872874401ec8de623ca775f1 Mon Sep 17 00:00:00 2001
From: Maxim Blinov <maxim.blinov@embecosm.com>
Date: Sat, 13 Nov 2021 04:39:53 +0000
Subject: [PATCH] core: Support heap-based trampolines

1. Generate heap-based nested function trampolines

Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.

The target-specific routines for allocating and writing trampolines are
to be provided in libgcc, and are by default _not_ compiled in unless
the target specifically requires it, or you manually provide
--enable-heap-trampolines when configuring gcc/libgcc.

The gcc flag -ftrampoline-impl controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: If libgcc is for any reason missing those
symbols, you will get a link failure.

This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.
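
The longjmp restriction can be illustrated with a simplified model of the
bookkeeping. This is only a sketch: the slots are plain counters rather than
executable memory, SLOTS_PER_PAGE is an assumed tiny value, and the model_*
and demo_leak names are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_PAGE 4  /* assumed tiny "page" to keep the model small */

struct ctrl {
  struct ctrl *prev;   /* previously filled page, or NULL */
  int free_slots;      /* slots still unused on this page */
};

static struct ctrl *curr = NULL;
static jmp_buf env;

static struct ctrl *new_ctrl(struct ctrl *parent)
{
  struct ctrl *p = malloc(sizeof *p);
  if (p == NULL)
    abort();
  p->prev = parent;
  p->free_slots = SLOTS_PER_PAGE;
  return p;
}

/* Models the "created" hook: hands out the next slot on the current page.  */
int model_created(void)
{
  if (curr == NULL)
    curr = new_ctrl(NULL);
  if (curr->free_slots == 0)
    curr = new_ctrl(curr);
  int slot = SLOTS_PER_PAGE - curr->free_slots;
  curr->free_slots -= 1;
  return slot;
}

/* Models the "deleted" hook: releases the most recently created slot.  */
void model_deleted(void)
{
  assert(curr != NULL);
  curr->free_slots += 1;
  if (curr->free_slots == SLOTS_PER_PAGE && curr->prev != NULL)
    {
      struct ctrl *prev = curr->prev;
      free(curr);
      curr = prev;
    }
}

/* Number of slots currently in use across all pages.  */
int model_live(void)
{
  int live = 0;
  for (struct ctrl *p = curr; p != NULL; p = p->prev)
    live += SLOTS_PER_PAGE - p->free_slots;
  return live;
}

static void make_and_jump(void)
{
  model_created();
  longjmp(env, 1);  /* the matching model_deleted () is never reached */
}

/* Returns the number of live slots after the jump.  */
int demo_leak(void)
{
  if (setjmp(env) == 0)
    make_and_jump();
  return model_live();
}
```

Because release is purely a counter, nothing ties a slot to the frame that
created it; a skipped release leaves the slot occupied until the counter
happens to be rebalanced.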

2. Add x86_64-linux support for heap-based trampolines

Implement the __builtin_nested_func_ptr_{created,deleted} functions
for the x86_64-linux platform. This serves to exercise the
infrastructure added in libgcc (--enable-heap-trampolines) and
gcc (-ftrampoline-impl=heap) in supporting heap-based trampoline
generation, and is intended primarily for demonstration and debugging
purposes.
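
As an illustration of what the x86_64 implementation writes into each slot,
the sketch below fills a 23-byte buffer using the same instruction template
as the patch. The helper names (put_le64, fill_trampoline, encoded_byte) are
hypothetical, and the immediates are written byte by byte in little-endian
order so the result is the same on any host. It only builds the bytes; it
does not map or execute anything.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TRAMP_SIZE = 23, FUNC_OFF = 2, CHAIN_OFF = 12 };

static const uint8_t tramp_template[TRAMP_SIZE] = {
  0x49, 0xbb, 0, 0, 0, 0, 0, 0, 0, 0,   /* movabs $<func>,%r11  */
  0x49, 0xba, 0, 0, 0, 0, 0, 0, 0, 0,   /* movabs $<chain>,%r10 */
  0x41, 0xff, 0xe3                      /* jmpq *%r11           */
};

/* Stores v at p as a 64-bit little-endian immediate.  */
static void put_le64(uint8_t *p, uint64_t v)
{
  for (int i = 0; i < 8; i++)
    p[i] = (uint8_t) (v >> (8 * i));
}

/* Copies the template and patches in the target and static chain.  */
void fill_trampoline(uint8_t out[TRAMP_SIZE], uint64_t func, uint64_t chain)
{
  memcpy(out, tramp_template, TRAMP_SIZE);
  put_le64(out + FUNC_OFF, func);
  put_le64(out + CHAIN_OFF, chain);
}

/* Convenience accessor: byte `index` of the encoded trampoline.  */
uint8_t encoded_byte(uint64_t func, uint64_t chain, int index)
{
  uint8_t buf[TRAMP_SIZE];
  assert(index >= 0 && index < TRAMP_SIZE);
  fill_trampoline(buf, func, chain);
  return buf[index];
}
```

The first immediate ends up in %r11, which is the jump target, and the
second in %r10, the static chain register on x86_64.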

3. Add aarch64-linux support for heap-based trampolines

Implement the __builtin_nested_func_ptr_{created,deleted} functions
for the aarch64-linux platform. This serves to exercise the
infrastructure added in libgcc (--enable-heap-trampolines) and
gcc (-ftrampoline-impl=heap) in supporting heap-based trampoline
generation, and is intended primarily for demonstration and debugging
purposes.

4. Darwin, aarch64, x86_64: Support heap trampolines.

Implement the __builtin_nested_func_ptr_{created,deleted} functions for
x86_64 and aarch64 Darwin.

For aarch64 --enable-heap-trampolines is enabled by default, and
-ftrampoline-impl=heap is enabled by default if we are on host macOS
version 11.x or greater.

For x86_64 this is configure-time opt-in (and can be applied from 10.10
onwards).

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* builtins.def (BUILT_IN_NESTED_PTR_CREATED): Define.
	(BUILT_IN_NESTED_PTR_DELETED): Ditto.
	* common.opt (ftrampoline-impl): Add option to control
	generation of trampoline instantiation (heap or stack).
	* config.gcc: Default to heap trampolines on macOS 11 and above.
	* config.in: Regenerate.
	* config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.
	* coretypes.h: Define enum trampoline_impl.
	* tree-nested.cc (convert_tramp_reference_op): Don't bother calling
	__builtin_adjust_trampoline for heap trampolines.
	(finalize_nesting_tree_1): Emit calls to
	__builtin_nested_...{created,deleted} if we're generating with
	-ftrampoline-impl=heap.
	* tree.cc (build_common_builtin_nodes): Build
	__builtin_nested_...{created,deleted}.
	* doc/invoke.texi (-ftrampoline-impl): Document.

libgcc/ChangeLog:

	* configure.ac: Add configure parameter
	--enable-heap-trampolines, and do error checking if we're
	trying to enable heap-based trampolines for a platform that doesn't
	provide any such implementation.
	* libgcc-std.ver.in: Ditto.
	* libgcc2.h (__builtin_nested_func_ptr_created): Declare.
	(__builtin_nested_func_ptr_deleted): Ditto.
	* config/aarch64/heap-trampoline.c: New file: Implement heap-based
	trampolines for aarch64.
	* config/aarch64/t-heap-trampoline: Add rule to build
	config/aarch64/heap-trampoline.c
	* config/i386/heap-trampoline.c: New file: Implement heap-based
	trampolines for x86_64.
	* config/i386/t-heap-trampoline: Add rule to build
	config/i386/heap-trampoline.c
	* config.host: Handle --enable-heap-trampolines on
	x86_64-*-linux*, aarch64*-*-linux*, x86_64-*-darwin*.
	* configure: Regenerate.
---
 gcc/builtins.def                        |   2 +
 gcc/common.opt                          |  17 ++-
 gcc/config.gcc                          |  11 ++
 gcc/config.in                           |   3 +-
 gcc/config/i386/darwin.h                |   6 +
 gcc/config/i386/i386.cc                 |   2 +-
 gcc/config/i386/i386.h                  |   6 +
 gcc/coretypes.h                         |   6 +
 gcc/doc/invoke.texi                     |  17 ++-
 gcc/tree-nested.cc                      | 121 ++++++++++++++---
 gcc/tree.cc                             |  17 +++
 libgcc/config.host                      |   9 ++
 libgcc/config/aarch64/heap-trampoline.c | 172 ++++++++++++++++++++++++
 libgcc/config/aarch64/t-heap-trampoline |  19 +++
 libgcc/config/i386/heap-trampoline.c    | 172 ++++++++++++++++++++++++
 libgcc/config/i386/t-heap-trampoline    |  19 +++
 libgcc/configure                        |  38 ++++++
 libgcc/configure.ac                     |  29 ++++
 libgcc/libgcc-std.ver.in                |   3 +
 libgcc/libgcc2.h                        |   3 +
 20 files changed, 651 insertions(+), 21 deletions(-)
 create mode 100644 libgcc/config/aarch64/heap-trampoline.c
 create mode 100644 libgcc/config/aarch64/t-heap-trampoline
 create mode 100644 libgcc/config/i386/heap-trampoline.c
 create mode 100644 libgcc/config/i386/t-heap-trampoline

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 76e7200e772..918389d863d 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1073,6 +1073,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, "__builtin_nested_func_ptr_created")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, "__builtin_nested_func_ptr_deleted")
 
 /* Implementing __builtin_setjmp.  */
 DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
diff --git a/gcc/common.opt b/gcc/common.opt
index 25f650e2dae..4511930fe58 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2884,10 +2884,25 @@ Common Var(flag_tracer) Optimization
 Perform superblock formation via tail duplication.
 
 ftrampolines
-Common Var(flag_trampolines) Init(0)
+Common Var(flag_trampolines) Init(HEAP_TRAMPOLINES_INIT)
 For targets that normally need trampolines for nested functions, always
 generate them instead of using descriptors.
 
+ftrampoline-impl=
+Common Joined RejectNegative Enum(trampoline_impl) Var(flag_trampoline_impl) Init(HEAP_TRAMPOLINES_INIT ? TRAMPOLINE_IMPL_HEAP : TRAMPOLINE_IMPL_STACK)
+Whether trampolines are generated in executable memory rather than
+executable stack.
+
+Enum
+Name(trampoline_impl) Type(enum trampoline_impl) UnknownError(unknown trampoline implementation %qs)
+
+EnumValue
+Enum(trampoline_impl) String(stack) Value(TRAMPOLINE_IMPL_STACK)
+
+EnumValue
+Enum(trampoline_impl) String(heap) Value(TRAMPOLINE_IMPL_HEAP)
+
+
 ; Zero means that floating-point math operations cannot generate a
 ; (user-visible) trap.  This is the case, for example, in nonstop
 ; IEEE 754 arithmetic.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1446eb2b3ca..a94d86f85e7 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1125,6 +1125,17 @@ case ${target} in
   ;;
 esac
 
+# Figure out if we need to enable -ftrampoline-impl=heap by default
+case ${target} in
+*-*-darwin2*)
+  # Currently, we do this for macOS 11 and above.
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=1"
+  ;;
+*)
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=0"
+  ;;
+esac
+
 case ${target} in
 aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
 	tm_file="${tm_file} elfos.h newlib-stdint.h"
diff --git a/gcc/config.in b/gcc/config.in
index 0e62b9fbfc9..4cad077bfbe 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -2239,7 +2239,8 @@
 #endif
 
 
-/* Define to the sub-directory where libtool stores uninstalled libraries. */
+/* Define to the sub-directory in which libtool stores uninstalled libraries.
+   */
 #ifndef USED_FOR_TARGET
 #undef LT_OBJDIR
 #endif
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index 588bd669bdd..036eefbbb95 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -308,3 +308,9 @@ along with GCC; see the file COPYING3.  If not see
 #define CLEAR_INSN_CACHE(beg, end)				\
   extern void sys_icache_invalidate(void *start, size_t len);	\
   sys_icache_invalidate ((beg), (size_t)((end)-(beg)))
+
+/* Disable custom function descriptors for Darwin when we have off-stack
+   trampolines.  */
+#undef X86_CUSTOM_FUNCTION_TEST
+#define X86_CUSTOM_FUNCTION_TEST \
+  ((flag_trampolines && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP) ? 0 : 1)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f0d6167e667..ec80c71200c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25565,7 +25565,7 @@ ix86_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_SCRATCH_OK ix86_hard_regno_scratch_ok
 
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS X86_CUSTOM_FUNCTION_TEST
 
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index aea3209d5a3..19b535edf05 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -755,6 +755,12 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /* Minimum allocation boundary for the code of a function.  */
 #define FUNCTION_BOUNDARY 8
 
+/* We will AND with this value to test if a custom function descriptor needs
+   a static chain.  The function boundary must be adjusted so that the bit
+   this represents is no longer part of the address.  0 disables the custom
+   function descriptors.  */
+#define X86_CUSTOM_FUNCTION_TEST 1
+
 /* C++ stores the virtual bit in the lowest bit of function pointers.  */
 #define TARGET_PTRMEMFUNC_VBIT_LOCATION ptrmemfunc_vbit_in_pfn
 
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..7e022a427c4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -199,6 +199,12 @@ enum tls_model {
   TLS_MODEL_LOCAL_EXEC
 };
 
+/* Types of trampoline implementation.  */
+enum trampoline_impl {
+  TRAMPOLINE_IMPL_STACK,
+  TRAMPOLINE_IMPL_HEAP
+};
+
 /* Types of ABI for an offload compiler.  */
 enum offload_abi {
   OFFLOAD_ABI_UNSET,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cbc1282c274..6cb3b24221b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -710,7 +710,8 @@ Objective-C and Objective-C++ Dialects}.
 -fverbose-asm  -fpack-struct[=@var{n}]
 -fleading-underscore  -ftls-model=@var{model}
 -fstack-reuse=@var{reuse_level}
--ftrampolines  -ftrapv  -fwrapv
+-ftrampolines -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+-ftrapv  -fwrapv
 -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 -fstrict-volatile-bitfields  -fsync-libcalls}
 
@@ -18801,6 +18802,20 @@ For languages other than Ada, the @code{-ftrampolines} and
 trampolines are always generated on platforms that need them
 for nested functions.
 
+@opindex ftrampoline-impl
+@item -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+By default, trampolines are generated on the stack.  However, certain platforms
+(such as the Apple M1) do not permit an executable stack.  Compiling with
+@option{-ftrampoline-impl=heap} generates calls to @code{__builtin_nested_func_ptr_created}
+and @code{__builtin_nested_func_ptr_deleted} in order to allocate and
+deallocate trampoline space on the executable heap.  Please note that
+these functions are implemented in libgcc, and will not be compiled in
+unless you provide @option{--enable-heap-trampolines} when
+building gcc.  @emph{PLEASE NOTE}: Heap trampolines are @emph{not}
+guaranteed to be correctly deallocated if you @code{setjmp},
+instantiate nested functions, and then @code{longjmp} back to a state
+prior to having allocated those nested functions.
+
 @opindex fvisibility
 @item -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 Set the default ELF image symbol visibility to the specified option---all
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index ae7d1f1f6a8..84ee9962485 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -611,6 +611,14 @@ get_trampoline_type (struct nesting_info *info)
   if (trampoline_type)
     return trampoline_type;
 
+  /* When trampolines are created off-stack then the only thing we need in the
+     local frame is a single pointer.  */
+  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+    {
+      trampoline_type = build_pointer_type (void_type_node);
+      return trampoline_type;
+    }
+
   align = TRAMPOLINE_ALIGNMENT;
   size = TRAMPOLINE_SIZE;
 
@@ -2788,17 +2796,27 @@ convert_tramp_reference_op (tree *tp, int *walk_subtrees, void *data)
 
       /* Compute the address of the field holding the trampoline.  */
       x = get_frame_field (info, target_context, x, &wi->gsi);
-      x = build_addr (x);
-      x = gsi_gimplify_val (info, x, &wi->gsi);
 
-      /* Do machine-specific ugliness.  Normally this will involve
-	 computing extra alignment, but it can really be anything.  */
-      if (descr)
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+      /* APB: We don't need to do the adjustment calls when using off-stack
+	 trampolines, any such adjustment will be done when the off-stack
+	 trampoline is created.  */
+      if (!descr && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	x = gsi_gimplify_val (info, x, &wi->gsi);
       else
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
-      call = gimple_build_call (builtin, 1, x);
-      x = init_tmp_var_with_call (info, &wi->gsi, call);
+	{
+	  x = build_addr (x);
+
+	  x = gsi_gimplify_val (info, x, &wi->gsi);
+
+	  /* Do machine-specific ugliness.  Normally this will involve
+	     computing extra alignment, but it can really be anything.  */
+	  if (descr)
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+	  else
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
+	  call = gimple_build_call (builtin, 1, x);
+	  x = init_tmp_var_with_call (info, &wi->gsi, call);
+	}
 
       /* Cast back to the proper function type.  */
       x = build1 (NOP_EXPR, TREE_TYPE (t), x);
@@ -3377,6 +3395,7 @@ build_init_call_stmt (struct nesting_info *info, tree decl, tree field,
 static void
 finalize_nesting_tree_1 (struct nesting_info *root)
 {
+  gimple_seq cleanup_list = NULL;
   gimple_seq stmt_list = NULL;
   gimple *stmt;
   tree context = root->context;
@@ -3508,9 +3527,48 @@ finalize_nesting_tree_1 (struct nesting_info *root)
 	  if (!field)
 	    continue;
 
-	  x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
-	  stmt = build_init_call_stmt (root, i->context, field, x);
-	  gimple_seq_add_stmt (&stmt_list, stmt);
+	  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	    {
+	      /* We pass a whole bunch of arguments to the builtin function that
+		 creates the off-stack trampoline, these are
+		 1. The nested function chain value (that must be passed to the
+		 nested function so it can find the function arguments).
+		 2. A pointer to the nested function implementation,
+		 3. The address in the local stack frame where we should write
+		 the address of the trampoline.
+
+		 When this code was originally written I just kind of threw
+		 everything at the builtin, figuring I'd work out what was
+		 actually needed later, I think, the stack pointer could
+		 certainly be dropped, arguments #2 and #4 are based off the
+		 stack pointer anyway, so #1 doesn't seem to add much value.  */
+	      tree arg1, arg2, arg3;
+
+	      gcc_assert (DECL_STATIC_CHAIN (i->context));
+	      arg1 = build_addr (root->frame_decl);
+	      arg2 = build_addr (i->context);
+
+	      x = build3 (COMPONENT_REF, TREE_TYPE (field),
+			  root->frame_decl, field, NULL_TREE);
+	      arg3 = build_addr (x);
+
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_CREATED);
+	      stmt = gimple_build_call (x, 3, arg1, arg2, arg3);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+
+	      /* This call to delete the nested function trampoline is added to
+		 the cleanup list, and called when we exit the current scope.  */
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_DELETED);
+	      stmt = gimple_build_call (x, 0);
+	      gimple_seq_add_stmt (&cleanup_list, stmt);
+	    }
+	  else
+	    {
+	      /* Original code to initialise the on stack trampoline.  */
+	      x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
+	      stmt = build_init_call_stmt (root, i->context, field, x);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+	    }
 	}
     }
 
@@ -3535,11 +3593,40 @@ finalize_nesting_tree_1 (struct nesting_info *root)
   /* If we created initialization statements, insert them.  */
   if (stmt_list)
     {
-      gbind *bind;
-      annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
-      bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
-      gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
-      gimple_bind_set_body (bind, stmt_list);
+      if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	{
+	  /* Handle off-stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  annotate_all_with_location (cleanup_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+
+	  gimple_seq xxx_list = NULL;
+
+	  if (cleanup_list != NULL)
+	    {
+	      /* Maybe we shouldn't be creating this try/finally if -fno-exceptions is
+		 in use.  If this is the case, then maybe we should, instead, be
+		 inserting the cleanup code onto every path out of this function?  Not
+		 yet figured out how we would do this.  */
+	      gtry *t = gimple_build_try (stmt_list, cleanup_list, GIMPLE_TRY_FINALLY);
+	      gimple_seq_add_stmt (&xxx_list, t);
+	    }
+	  else
+	    xxx_list = stmt_list;
+
+	  gimple_bind_set_body (bind, xxx_list);
+	}
+      else
+	{
+	  /* The traditional, on stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+	  gimple_bind_set_body (bind, stmt_list);
+	}
     }
 
   /* If a chain_decl was created, then it needs to be registered with
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 420857b110c..3e7beba8744 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9870,6 +9870,23 @@ build_common_builtin_nodes (void)
 			"__builtin_nonlocal_goto",
 			ECF_NORETURN | ECF_NOTHROW);
 
+  tree ptr_ptr_type_node = build_pointer_type (ptr_type_node);
+
+  ftype = build_function_type_list (void_type_node,
+				    ptr_type_node, // void *chain
+				    ptr_type_node, // void *func
+				    ptr_ptr_type_node, // void **dst
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_created", ftype,
+			BUILT_IN_NESTED_PTR_CREATED,
+			"__builtin_nested_func_ptr_created", ECF_NOTHROW);
+
+  ftype = build_function_type_list (void_type_node,
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_deleted", ftype,
+			BUILT_IN_NESTED_PTR_DELETED,
+			"__builtin_nested_func_ptr_deleted", ECF_NOTHROW);
+
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node, NULL_TREE);
   local_define_builtin ("__builtin_setjmp_setup", ftype,
diff --git a/libgcc/config.host b/libgcc/config.host
index 9d7212028d0..e3e311b75a4 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -423,6 +423,9 @@ aarch64*-*-linux*)
 	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
 	tmake_file="${tmake_file} t-dfprules"
+	if test x$heap_trampolines = xyes; then
+	    tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
+	fi
 	;;
 aarch64*-*-vxworks7*)
 	extra_parts="$extra_parts crtfastmath.o"
@@ -697,6 +700,9 @@ x86_64-*-darwin*)
 	tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
 	tm_file="$tm_file i386/darwin-lib.h"
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
+	if test x$heap_trampolines = xyes; then
+	    tmake_file="${tmake_file} i386/t-heap-trampoline"
+	fi
 	;;
 i[34567]86-*-elfiamcu)
 	tmake_file="$tmake_file i386/t-crtstuff t-softfp-sfdftf i386/32/t-softfp i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules"
@@ -763,6 +769,9 @@ x86_64-*-linux*)
 	tmake_file="${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules"
 	tm_file="${tm_file} i386/elf-lib.h"
 	md_unwind_header=i386/linux-unwind.h
+	if test x$heap_trampolines = xyes; then
+	    tmake_file="${tmake_file} i386/t-heap-trampoline"
+	fi
 	;;
 x86_64-*-kfreebsd*-gnu)
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
new file mode 100644
index 00000000000..c8b83681ed7
--- /dev/null
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+#if defined(__gnu_linux__)
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d2, /* ldr     x18, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#elif __APPLE__
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d0, /* ldr     x16, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#else
+#error "Unsupported AArch64 platform for heap trampolines"
+#endif
+
+struct aarch64_trampoline {
+  uint32_t insns[6];
+  void *func_ptr;
+  void *chain_ptr;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  struct aarch64_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(struct aarch64_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  struct aarch64_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, aarch64_trampoline_insns,
+	  sizeof(aarch64_trampoline_insns));
+  trampoline->func_ptr = func;
+  trampoline->chain_ptr = chain;
+
+#if __APPLE__
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/aarch64/t-heap-trampoline b/libgcc/config/aarch64/t-heap-trampoline
new file mode 100644
index 00000000000..b22480800b2
--- /dev/null
+++ b/libgcc/config/aarch64/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/aarch64/heap-trampoline.c
diff --git a/libgcc/config/i386/heap-trampoline.c b/libgcc/config/i386/heap-trampoline.c
new file mode 100644
index 00000000000..96e13bf828e
--- /dev/null
+++ b/libgcc/config/i386/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+static const uint8_t trampoline_insns[] = {
+  /* movabs $<func>,%r11  */
+  0x49, 0xbb,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* movabs $<chain>,%r10  */
+  0x49, 0xba,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* rex.B jmpq *%r11  */
+  0x41, 0xff, 0xe3
+};
+
+union ix86_trampoline {
+  uint8_t insns[sizeof(trampoline_insns)];
+
+  struct __attribute__((packed)) fields {
+    uint8_t insn_0[2];
+    void *func_ptr;
+    uint8_t insn_1[2];
+    void *chain_ptr;
+    uint8_t insn_2[3];
+  } fields;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  union ix86_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(union ix86_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+# if  __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+# else
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+# endif
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  union ix86_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, trampoline_insns,
+	  sizeof(trampoline_insns));
+  trampoline->fields.func_ptr = func;
+  trampoline->fields.chain_ptr = chain;
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/i386/t-heap-trampoline b/libgcc/config/i386/t-heap-trampoline
new file mode 100644
index 00000000000..613f635b1f6
--- /dev/null
+++ b/libgcc/config/i386/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/i386/heap-trampoline.c
diff --git a/libgcc/configure b/libgcc/configure
index be5d45f1755..f607f592a90 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -654,6 +654,7 @@ build_cpu
 build
 with_aix_soname
 enable_vtable_verify
+heap_trampolines
 enable_shared
 libgcc_topdir
 target_alias
@@ -701,6 +702,7 @@ with_target_subdir
 with_cross_host
 with_ld
 enable_shared
+enable_heap_trampolines
 enable_vtable_verify
 with_aix_soname
 enable_version_specific_runtime_libs
@@ -1342,6 +1344,9 @@ Optional Features:
   --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)
   --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
   --disable-shared        don't provide a shared libgcc
+  --enable-heap-trampolines
+                  Specify whether to support generating heap trampolines
+
   --enable-vtable-verify    Enable vtable verification feature
   --enable-version-specific-runtime-libs    Specify that runtime libraries should be installed in a compiler-specific directory
   --enable-maintainer-mode
@@ -2252,6 +2257,39 @@ fi
 
 
 
+# Check whether --enable-heap-trampolines was given.
+if test "${enable_heap_trampolines+set}" = set; then :
+  enableval=$enable_heap_trampolines;
+case "$target" in
+  x86_64-*-linux* | x86_64-*-darwin1[4-9]* | x86_64-*-darwin2*)
+    heap_trampolines=$enableval
+    ;;
+  aarch64*-*-linux* )
+    heap_trampolines=$enableval
+    ;;
+  aarch64*-*darwin* )
+    heap_trampolines=$enableval
+    ;;
+  *)
+    as_fn_error $? "Configure option --enable-heap-trampolines is not supported \
+for this platform" "$LINENO" 5
+    heap_trampolines=no
+    ;;
+esac
+else
+
+case "$target" in
+  *-*-darwin2*)
+    heap_trampolines=yes
+    ;;
+  *)
+    heap_trampolines=no
+    ;;
+esac
+fi
+
+
+
 # Check whether --enable-vtable-verify was given.
 if test "${enable_vtable_verify+set}" = set; then :
   enableval=$enable_vtable_verify; case "$enableval" in
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index 2fc9d5d7c93..459657838e0 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -68,6 +68,35 @@ AC_ARG_ENABLE(shared,
 ], [enable_shared=yes])
 AC_SUBST(enable_shared)
 
+AC_ARG_ENABLE([heap-trampolines],
+  [AS_HELP_STRING([--enable-heap-trampolines]
+                  [Specify whether to support generating heap trampolines])],[
+case "$target" in
+  x86_64-*-linux* | x86_64-*-darwin1[[4-9]]* | x86_64-*-darwin2*)
+    heap_trampolines=$enableval
+    ;;
+  aarch64*-*-linux* )
+    heap_trampolines=$enableval
+    ;;
+  aarch64*-*darwin* )
+    heap_trampolines=$enableval
+    ;;
+  *)
+    AC_MSG_ERROR([Configure option --enable-heap-trampolines is not supported \
+for this platform])
+    heap_trampolines=no
+    ;;
+esac],[
+case "$target" in
+  *-*-darwin2*)
+    heap_trampolines=yes
+    ;;
+  *)
+    heap_trampolines=no
+    ;;
+esac])
+AC_SUBST(heap_trampolines)
+
 AC_ARG_ENABLE(vtable-verify,
 [  --enable-vtable-verify    Enable vtable verification feature ],
 [case "$enableval" in
diff --git a/libgcc/libgcc-std.ver.in b/libgcc/libgcc-std.ver.in
index c4f87a50e70..a48f4899eb6 100644
--- a/libgcc/libgcc-std.ver.in
+++ b/libgcc/libgcc-std.ver.in
@@ -1943,4 +1943,7 @@ GCC_4.8.0 {
 GCC_7.0.0 {
   __PFX__divmoddi4
   __PFX__divmodti4
+
+  __builtin_nested_func_ptr_created
+  __builtin_nested_func_ptr_deleted
 }
diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
index 3ec9bbd8164..ac7eaab4f01 100644
--- a/libgcc/libgcc2.h
+++ b/libgcc/libgcc2.h
@@ -29,6 +29,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #pragma GCC visibility push(default)
 #endif
 
+extern void __builtin_nested_func_ptr_created (void *, void *, void **);
+extern void __builtin_nested_func_ptr_deleted (void);
+
 extern int __gcc_bcmp (const unsigned char *, const unsigned char *, size_t);
 extern void __clear_cache (void *, void *);
 extern void __eprintf (const char *, const char *, unsigned int, const char *)
-- 
2.34.1
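
For readers who want to check the x86_64 byte layout used by heap-trampoline.c above, here is a minimal sketch. It only fills a buffer and executes nothing. The names fill_trampoline, FUNC_OFFSET, CHAIN_OFFSET and TRAMP_SIZE are made up for this illustration and are not part of the patch. Going by the field order of union ix86_trampoline, the first immediate (loaded into %r11, the jump target) is the function address and the second (loaded into %r10) is the static chain.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same 23 bytes as trampoline_insns in the patch, immediates zeroed.  */
static const uint8_t template_insns[] = {
  0x49, 0xbb, 0, 0, 0, 0, 0, 0, 0, 0,   /* movabs $imm64,%r11 */
  0x49, 0xba, 0, 0, 0, 0, 0, 0, 0, 0,   /* movabs $imm64,%r10 */
  0x41, 0xff, 0xe3                      /* jmpq *%r11 */
};

/* Byte offsets of the two immediates (hypothetical names).  */
enum { FUNC_OFFSET = 2, CHAIN_OFFSET = 12, TRAMP_SIZE = 23 };

/* Hypothetical helper: write a trampoline for FUNC and CHAIN into BUF,
   which must hold at least TRAMP_SIZE bytes.  Nothing is executed.  */
static void
fill_trampoline (uint8_t *buf, uint64_t func, uint64_t chain)
{
  memcpy (buf, template_insns, sizeof template_insns);
  memcpy (buf + FUNC_OFFSET, &func, sizeof func);     /* ends up in %r11 */
  memcpy (buf + CHAIN_OFFSET, &chain, sizeof chain);  /* ends up in %r10 */
}
```

The packed struct in the patch achieves the same thing by assigning to fields.func_ptr and fields.chain_ptr; the explicit offsets here are just another way of looking at it.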


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] core: Support heap-based trampolines
  2023-07-16 10:38 [PATCH] core: Support heap-based trampolines FX Coudert
@ 2023-07-17  6:31 ` Richard Biener
  2023-07-17  6:43   ` FX Coudert
  2023-08-05 14:20   ` FX Coudert
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Biener @ 2023-07-17  6:31 UTC (permalink / raw)
  To: FX Coudert; +Cc: gcc-patches, Iain Sandoe, maxim.blinov, ebotcazou, Jeff Law

On Sun, Jul 16, 2023 at 12:39 PM FX Coudert via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> This is a reworked version (following review) of the patch by Maxim Blinov and Iain Sandoe enabling heap-based trampolines. What has changed since the last version:
>
> - wording changes, preferring to use “heap-based trampolines” rather than “off-stack trampolines”
> - the option triggering generation of these new trampolines is now a binary choice: -ftrampoline-impl=[stack|heap]
> - some adjustments due to changes in the macOS build flags in GCC since last year
>
> Regarding testing, this patch has had excellent exposure on darwin (both x86_64 and aarch64) because it was part of Iain’s branch, distributed by many macOS distros/vendors (including Homebrew) for more than a year, and there was no bug report against the feature or implementation. On x86_64-linux, I have regression-tested it in three different configurations:
> - a default build
> - a build with --enable-heap-trampolines
> - a build with --enable-heap-trampolines and heap trampolines forced by default (forcing HEAP_TRAMPOLINES_INIT = 1)
>
> There are no regressions in any of these settings (apart from an expected missing warning in gcc.dg/Wtrampolines.c).
>
> ----------
>
> From the original review, one question asked (by Jeff Law) was: whether the two linux implementations should be dropped, and the configure time
> selectability as well. Regardless of the answer to the first question, I think we probably want to retain the later, even if only for darwin, as we want to implement this only on recent darwin versions.

Since this affects the ABI of libgcc I think we don't want that part
to be user configurable but rather determined by
some static list of targets that opt-in to this config.

You mention setjmp/longjmp - on darwin and other platforms requiring
non-stack based trampolines
does the system runtime provide means to deal with this issue like an
alternate allocation method
or a way to register cleanup?

Was there ever an attempt to provide a "generic" trampoline driven by
a more complex descriptor?
(well, it could be a bytecode interpreter and the trampoline being
bytecode on the stack?!)

Otherwise I suggest to split the patch into libgcc, generic and target parts.

Thanks,
Richard.

>
> OK to commit?
>
> FX
>
>


* Re: [PATCH] core: Support heap-based trampolines
  2023-07-17  6:31 ` Richard Biener
@ 2023-07-17  6:43   ` FX Coudert
  2023-07-17  6:58     ` Iain Sandoe
  2023-08-05 14:20   ` FX Coudert
  1 sibling, 1 reply; 15+ messages in thread
From: FX Coudert @ 2023-07-17  6:43 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Iain Sandoe, maxim.blinov, ebotcazou, Jeff Law

Hi,

> Since this affects the ABI of libgcc I think we don't want that part
> to be user configurable but rather determined by
> some static list of targets that opt-in to this config.

If I do that, do the Linux maintainers want Linux in or out?


> You mention setjmp/longjmp - on darwin and other platforms requiring
> non-stack based trampolines
> does the system runtime provide means to deal with this issue like an
> alternate allocation method
> or a way to register cleanup?

There is an alternate mechanism relying on system libraries that is possible on darwin specifically (I don’t know for other targets) but it will only work for signed binaries, and would require us to codesign everything produced by gcc. During development, it was deemed too big an ask and the current strategy was chosen (Iain can surely add more background on that if needed).


> Was there ever an attempt to provide a "generic" trampoline driven by
> a more complex descriptor?
> (well, it could be a bytecode interpreter and the trampoline being
> bytecode on the stack?!)

My own opinion is that executable stack should go away on all targets at some point, so a truly generic solution to the problem would be great. Having something that works reliably across all targets, like you suggest, is a much bigger project than this patch, and I am not aware of any previous attempt at it.


> Otherwise I suggest to split the patch into libgcc, generic and target parts.




* Re: [PATCH] core: Support heap-based trampolines
  2023-07-17  6:43   ` FX Coudert
@ 2023-07-17  6:58     ` Iain Sandoe
  2023-07-17  7:16       ` Iain Sandoe
  0 siblings, 1 reply; 15+ messages in thread
From: Iain Sandoe @ 2023-07-17  6:58 UTC (permalink / raw)
  To: Richard Biener
  Cc: GCC Patches, Maxim Blinov, FX Coudert, Eric Botcazou, Jeff Law, aburgess



> On 17 Jul 2023, at 07:43, FX Coudert <fxcoudert@gmail.com> wrote:
> 
> Hi,
> 
>> Since this affects the ABI of libgcc I think we don't want that part
>> to be user configurable but rather determined by
>> some static list of targets that opt-in to this config.
> 
> If I do that, do the Linux maintainers want Linux in or out?

Presumably that can be a target define, and can be opted in/out in libgcc/config.host (given that the target maintainer also needs to provide the builtins).

>> You mention setjmp/longjmp - on darwin and other platforms requiring
>> non-stack based trampolines
>> does the system runtime provide means to deal with this issue like an
>> alternate allocation method
>> or a way to register cleanup?
> 
> There is an alternate mechanism relying on system libraries that is possible on darwin specifically (I don’t know for other targets) but it will only work for signed binaries, and would require us to codesign everything produced by gcc. During development, it was deemed too big an ask and the current strategy was chosen (Iain can surely add more background on that if needed).

I do not think that this solves the setjmp/longjmp issue - since there’s still a notional allocation that takes place (it’s just that the mechanism for determining permissions is different).

It is also a big barrier for the general user - and prevents normal folks from distributing GCC - since codesigning requires an external certificate (i.e. I would really rather avoid it).

>> Was there ever an attempt to provide a "generic" trampoline driven by
>> a more complex descriptor?

We did look at the “unused address bits” mechanism that Ada has used - but that is not really available to a non-private ABI (unless the system vendor agrees to change the ABI to leave a bit spare): for the base arch, either the bits are not there (e.g. x86) or they are reserved (e.g. AArch64).

Andrew Burgess did the original work; he might have comments on the alternatives we tried.

>> (well, it could be a bytecode interpreter and the trampoline being
>> bytecode on the stack?!)
> 
> My own opinion is that executable stack should go away on all targets at some point, so a truly generic solution to the problem would be great.

indeed it would.

> Having something that works reliably across all targets, like you suggest, is a much bigger project that this patch, and I am not aware of any previous attempt at it.

The bytecode interpreter idea is neat, but (a) I wonder about performance and (b) it is, as FX says, a bigger project - certainly bigger than the voluntary Darwin time available :(

Iain

> 
> 
>> Otherwise I suggest to split the patch into libgcc, generic and target parts.
> 
> 



* Re: [PATCH] core: Support heap-based trampolines
  2023-07-17  6:58     ` Iain Sandoe
@ 2023-07-17  7:16       ` Iain Sandoe
  2023-07-19  9:04         ` Martin Uecker
  0 siblings, 1 reply; 15+ messages in thread
From: Iain Sandoe @ 2023-07-17  7:16 UTC (permalink / raw)
  To: Richard Biener
  Cc: GCC Patches, Maxim Blinov, FX Coudert, Eric Botcazou, Jeff Law, aburgess



> On 17 Jul 2023, at 07:58, Iain Sandoe <iain@sandoe.co.uk> wrote
> 
>> On 17 Jul 2023, at 07:43, FX Coudert <fxcoudert@gmail.com> wrote:
>> 

>> 
>> There is an alternate mechanism relying on system libraries that is possible on darwin specifically (I don’t know for other targets) but it will only work for signed binaries, and would require us to codesign everything produced by gcc. During development, it was deemed too big an ask and the current strategy was chosen (Iain can surely add more background on that if needed).
> 
> I do not think that this solves the setjump/longjump issue - since there’s still a notional allocation that takes place (it’s just that the mechanism for determining permissions is different).
> 
> It is also a big barrier for the general user - and prevents normal folks from distributing GCC - since codesigning requires an external certificate (i.e. I would really rather avoid it).
> 
>>> Was there ever an attempt to provide a "generic" trampoline driven by
>>> a more complex descriptor?
> 
> We did look at the “unused address bits” mechanism that Ada has used - but that is not really available to a non-private ABI (unless the system vendor agrees to change ABI to leave a bit spare) for the base arch either the bits are not there (e.g. X86) or reserved (e.g. AArch64).
> 
> Andrew Burgess did the original work he might have comments on alternatives we tried

Although I will comment that the main barrier to data / descriptor based schemes is that we allow recursive use of nested functions and that means that each nest level needs a distinct target address to branch to / call.  [that might also make the bytecode scheme hard(er)]

Iain



* Re: [PATCH] core: Support heap-based trampolines
  2023-07-17  7:16       ` Iain Sandoe
@ 2023-07-19  9:04         ` Martin Uecker
  2023-07-19  9:29           ` Iain Sandoe
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Uecker @ 2023-07-19  9:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: iain



> 
> > On 17 Jul 2023, 
> 

> >> You mention setjmp/longjmp - on darwin and other platforms
> requiring
> >> non-stack based trampolines
> >> does the system runtime provide means to deal with this issue like
> an
> >> alternate allocation method
> >> or a way to register cleanup?
> > 
> > There is an alternate mechanism relying on system libraries that is
> possible on darwin specifically (I don’t know for other targets) but
> it will only work for signed binaries, and would require us to
> codesign everything produced by gcc. During development, it was
> deemed too big an ask and the current strategy was chosen (Iain can
> surely add more background on that if needed).
> 
> I do not think that this solves the setjump/longjump issue - since
> there’s still a notional allocation that takes place (it’s just that
> the mechanism for determining permissions is different).
> 
> It is also a big barrier for the general user - and prevents normal
> folks from distributing GCC - since codesigning requires an external
> certificate (i.e. I would really rather avoid it).
> 
> >> Was there ever an attempt to provide a "generic" trampoline driven
> by
> >> a more complex descriptor?
> 
> We did look at the “unused address bits” mechanism that Ada has used
> - but that is not really available to a non-private ABI (unless the
> system vendor agrees to change ABI to leave a bit spare) for the base
> arch either the bits are not there (e.g. X86) or reserved (e.g.
> AArch64).
> 
> Andrew Burgess did the original work he might have comments on
> alternatives we tried
> 

For reference, I proposed a patch for this in 2018. It was not
accepted because minimum alignment for functions would increase
for some archs:

https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html



> >> (well, it could be a bytecode interpreter and the trampoline being
> >> bytecode on the stack?!)
> > 
> > My own opinion is that executable stack should go away on all
> targets at some point, so a truly generic solution to the problem
> would be great.
> 
> indeed it would.
> 

I think we need a solution sooner rather than later on all archs.

Martin

> > Having something that works reliably across all targets, like you
> suggest, is a much bigger project that this patch, and I am not aware
> of any previous attempt at it.
> 
> The bytecode interpreter idea is neat;  (a) I wonder about
> performance and (b) it is, as FX says, a bigger project - certainly
> bigger than the voluntary Darwin time available :(
> 
> Iain
> 
> > 
> > 
> >> Otherwise I suggest to split the patch into libgcc, generic and
> target parts.
> > 
> > 
> 




* Re: [PATCH] core: Support heap-based trampolines
  2023-07-19  9:04         ` Martin Uecker
@ 2023-07-19  9:29           ` Iain Sandoe
  2023-07-19 10:43             ` Martin Uecker
  0 siblings, 1 reply; 15+ messages in thread
From: Iain Sandoe @ 2023-07-19  9:29 UTC (permalink / raw)
  To: Martin Uecker
  Cc: GCC Patches, FX Coudert, Richard Biener, Maxim Blinov,
	Eric Botcazou, Jeff Law, aburgess

Hi Martin,

> On 19 Jul 2023, at 10:04, Martin Uecker <ma.uecker@gmail.com> wrote:

>>> On 17 Jul 2023, 
>> 
> 
>>>> You mention setjmp/longjmp - on darwin and other platforms
>> requiring
>>>> non-stack based trampolines
>>>> does the system runtime provide means to deal with this issue like
>> an
>>>> alternate allocation method
>>>> or a way to register cleanup?
>>> 
>>> There is an alternate mechanism relying on system libraries that is
>> possible on darwin specifically (I don’t know for other targets) but
>> it will only work for signed binaries, and would require us to
>> codesign everything produced by gcc. During development, it was
>> deemed too big an ask and the current strategy was chosen (Iain can
>> surely add more background on that if needed).
>> 
>> I do not think that this solves the setjump/longjump issue - since
>> there’s still a notional allocation that takes place (it’s just that
>> the mechanism for determining permissions is different).
>> 
>> It is also a big barrier for the general user - and prevents normal
>> folks from distributing GCC - since codesigning requires an external
>> certificate (i.e. I would really rather avoid it).
>> 
>>>> Was there ever an attempt to provide a "generic" trampoline driven
>> by
>>>> a more complex descriptor?
>> 
>> We did look at the “unused address bits” mechanism that Ada has used
>> - but that is not really available to a non-private ABI (unless the
>> system vendor agrees to change ABI to leave a bit spare) for the base
>> arch either the bits are not there (e.g. X86) or reserved (e.g.
>> AArch64).
>> 
>> Andrew Burgess did the original work he might have comments on
>> alternatives we tried
>> 
> 
> For reference, I proposed a patch for this in 2018. It was not
> accepted because minimum alignment for functions would increase
> for some archs:
> 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html

Right - that was the one we originally looked at and has the issue that it
breaks ABI - and thus would need vendor buy-in to alter as you say.

>>>> (well, it could be a bytecode interpreter and the trampoline being
>>>> bytecode on the stack?!)
>>> 
>>> My own opinion is that executable stack should go away on all
>> targets at some point, so a truly generic solution to the problem
>> would be great.
>> 
>> indeed it would.

> I think we need a solution rather sooner than later on all archs.

AFAICS the heap-based trampolines can work for any arch**; this issue is about
system security policy rather than arch specifically?

It seems to me that for any system security policy that permits JIT (but not
executable stack), the heap-based trampolines are viable.

This seems to be a useful step forward; and we can add some other mechanism
to the flag’s supported list if someone develops one?

Iain

** modulo the target maintainers implementing the builtins.





* Re: [PATCH] core: Support heap-based trampolines
  2023-07-19  9:29           ` Iain Sandoe
@ 2023-07-19 10:43             ` Martin Uecker
  2023-07-19 14:23               ` Iain Sandoe
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Uecker @ 2023-07-19 10:43 UTC (permalink / raw)
  To: Iain Sandoe
  Cc: GCC Patches, FX Coudert, Richard Biener, Maxim Blinov,
	Eric Botcazou, Jeff Law, aburgess

Am Mittwoch, dem 19.07.2023 um 10:29 +0100 schrieb Iain Sandoe:
> Hi Martin,
> 
> > On 19 Jul 2023, at 10:04, Martin Uecker <ma.uecker@gmail.com>
> > wrote:
> 
> > > > On 17 Jul 2023, 
> > > 
> > 
> > > > > You mention setjmp/longjmp - on darwin and other platforms
> > > requiring
> > > > > non-stack based trampolines
> > > > > does the system runtime provide means to deal with this issue
> > > > > like
> > > an
> > > > > alternate allocation method
> > > > > or a way to register cleanup?
> > > > 
> > > > There is an alternate mechanism relying on system libraries
> > > > that is
> > > possible on darwin specifically (I don’t know for other targets)
> > > but
> > > it will only work for signed binaries, and would require us to
> > > codesign everything produced by gcc. During development, it was
> > > deemed too big an ask and the current strategy was chosen (Iain
> > > can
> > > surely add more background on that if needed).
> > > 
> > > I do not think that this solves the setjump/longjump issue -
> > > since
> > > there’s still a notional allocation that takes place (it’s just
> > > that
> > > the mechanism for determining permissions is different).
> > > 
> > > It is also a big barrier for the general user - and prevents
> > > normal
> > > folks from distributing GCC - since codesigning requires an
> > > external
> > > certificate (i.e. I would really rather avoid it).
> > > 
> > > > > Was there ever an attempt to provide a "generic" trampoline
> > > > > driven
> > > by
> > > > > a more complex descriptor?
> > > 
> > > We did look at the “unused address bits” mechanism that Ada has
> > > used
> > > - but that is not really available to a non-private ABI (unless
> > > the
> > > system vendor agrees to change ABI to leave a bit spare) for the
> > > base
> > > arch either the bits are not there (e.g. X86) or reserved (e.g.
> > > AArch64).
> > > 
> > > Andrew Burgess did the original work he might have comments on
> > > alternatives we tried
> > > 
> > 
> > For reference, I proposed a patch for this in 2018. It was not
> > accepted because minimum alignment for functions would increase
> > for some archs:
> > 
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html
> 
> Right - that was the one we originally looked at and has the issue
> that it 
> breaks ABI - and thus would need vendor by-in to alter as you say.
> 
> > > > > (well, it could be a bytecode interpreter and the trampoline
> > > > > being
> > > > > bytecode on the stack?!)
> > > > 
> > > > My own opinion is that executable stack should go away on all
> > > targets at some point, so a truly generic solution to the problem
> > > would be great.
> > > 
> > > indeed it would.
> 
> > I think we need a solution rather sooner than later on all archs.
> 
> AFAICS the  heap-based trampolines can work for any arch**, this
> issue is about
> system security policy, rather than arch, specifically?
> 
> It seems to me that for any system security policy that permits JIT,
> (but not
> executable stack) the heap-based trampolines are viable.

I agree. 

BTW, one option we discussed before was to map a page with
pre-allocated trampolines, each of which looks up the address
of a callee and the static chain in a table based on its own
address. Then no code generation is involved.
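
To make the table idea concrete, here is a rough sketch in portable C. It is only an approximation: real pre-allocated trampolines would find their table slot from their own address, whereas here each pre-compiled thunk simply has its slot index baked in by a macro. All names (slot_table, bind_thunk, release_thunk, DEFINE_THUNK) are invented for the illustration, and the nested function is modelled as an ordinary function taking its static chain as an explicit first argument.

```c
#include <assert.h>
#include <stddef.h>

/* The "nested" callee, with its static chain made explicit.  */
typedef int (*nested_fn) (void *chain, int arg);
/* What a caller sees: an ordinary function pointer.  */
typedef int (*plain_fn) (int arg);

struct slot { nested_fn func; void *chain; };

#define NSLOTS 4
static struct slot slot_table[NSLOTS];

/* Each thunk is fixed code that reads its own slot; no code is
   generated at run time.  */
#define DEFINE_THUNK(n)                                       \
  static int thunk_##n (int arg)                              \
  { return slot_table[n].func (slot_table[n].chain, arg); }

DEFINE_THUNK (0)
DEFINE_THUNK (1)
DEFINE_THUNK (2)
DEFINE_THUNK (3)

static const plain_fn thunks[NSLOTS] = { thunk_0, thunk_1, thunk_2, thunk_3 };

/* Claim the first free slot and return the matching thunk,
   or NULL when every slot is in use.  */
static plain_fn
bind_thunk (nested_fn func, void *chain)
{
  for (size_t i = 0; i < NSLOTS; i++)
    if (slot_table[i].func == NULL)
      {
        slot_table[i].func = func;
        slot_table[i].chain = chain;
        return thunks[i];
      }
  return NULL;
}

/* Give the slot behind FN back to the pool.  */
static void
release_thunk (plain_fn fn)
{
  for (size_t i = 0; i < NSLOTS; i++)
    if (thunks[i] == fn)
      {
        slot_table[i].func = NULL;
        slot_table[i].chain = NULL;
      }
}

/* Example callee: adds the integer found through the chain.  */
static int
add_base (void *chain, int arg)
{
  return *(int *) chain + arg;
}
```

The fixed NSLOTS makes the limitation visible: the pool has to be sized (or grown) somehow, and every slot that is bound must eventually be released.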

The difficult part is avoiding leaks with longjmp / setjmp.
One idea was to have a shadow stack consisting of the
pre-allocated trampolines, but this probably causes other
issues...
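
The leak can be shown with a small simulation. This sketch assumes that the release call sits on the normal exit path of the function that created the trampoline; fake_created and fake_deleted are stand-ins invented here, not the real __builtin_nested_func_ptr_created / __builtin_nested_func_ptr_deleted, and they only count.

```c
#include <assert.h>
#include <setjmp.h>

static int live_trampolines;   /* trampolines currently allocated */
static jmp_buf escape_env;

/* Stand-ins for the create/delete hooks; they only count.  */
static void fake_created (void) { live_trampolines++; }
static void fake_deleted (void) { live_trampolines--; }

/* Models a function that takes the address of a nested function.  */
static void
use_nested_callback (int escape)
{
  fake_created ();              /* trampoline allocated */
  if (escape)
    longjmp (escape_env, 1);    /* leaves without reaching the release */
  fake_deleted ();              /* normal exit: trampoline released */
}

/* Returns how many trampolines stay allocated after one call.  */
static int
leaked_after_call (int escape)
{
  int before = live_trampolines;
  if (setjmp (escape_env) == 0)
    use_nested_callback (escape);
  return live_trampolines - before;
}
```

With escape set to 0 the count returns to its starting value; with escape set to 1 one trampoline stays allocated, because nothing runs the release on the longjmp path.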

I wonder how difficult it is to have longjmp / setjmp walk
the stack in C? This would also be useful for C++
interoperability and to free heap-allocated VLAs.


As a user of nested functions, from my side it would also be
ok to simply add a wide function pointer type that contains
address + static chain.  This would require changing code, 
but would also work with Clang's blocks and solve other 
language interoperability problems, while avoiding all 
existing ABI issues.
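
As a sketch of what such a wide pointer could look like when written by hand today (no such type exists in C; struct wide_fn, call_wide and the other names are purely illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "wide" function pointer: code address plus static chain.  */
struct wide_fn
{
  int (*code) (void *chain, int arg);
  void *chain;
};

/* Calling through a wide pointer passes the chain explicitly.  */
static int
call_wide (struct wide_fn f, int arg)
{
  return f.code (f.chain, arg);
}

/* State that a nested function would normally reach via its parent frame.  */
struct counter_frame { int step; int total; };

static int
add_step (void *chain, int arg)
{
  struct counter_frame *frame = chain;
  frame->total += frame->step * arg;
  return frame->total;
}

/* A callee that accepts a wide pointer instead of a plain one.  */
static int
apply_n (struct wide_fn f, int n)
{
  int last = 0;
  for (int i = 1; i <= n; i++)
    last = call_wide (f, i);
  return last;
}
```

No trampoline is needed because the chain travels with the pointer. The cost is that every interface taking the callback has to accept the wider type (or a separate context argument); an existing interface that takes a plain function pointer cannot carry the chain.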

> 
> This seems to be a useful step forward; and we can add some other
> mechanism to the flag’s supported list if someone develops one?

I think it is a useful step forward.

Martin


> 
> Iain
> 
> ** modulo the target maintainers implementing the builtins.
> 
> 
> 



* Re: [PATCH] core: Support heap-based trampolines
  2023-07-19 10:43             ` Martin Uecker
@ 2023-07-19 14:23               ` Iain Sandoe
  2023-07-19 15:18                 ` Martin Uecker
  0 siblings, 1 reply; 15+ messages in thread
From: Iain Sandoe @ 2023-07-19 14:23 UTC (permalink / raw)
  To: Martin Uecker
  Cc: GCC Patches, FX Coudert, Richard Biener, Maxim Blinov,
	Eric Botcazou, Jeff Law, aburgess

Hi Martin,

> On 19 Jul 2023, at 11:43, Martin Uecker via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Am Mittwoch, dem 19.07.2023 um 10:29 +0100 schrieb Iain Sandoe:

>>> On 19 Jul 2023, at 10:04, Martin Uecker <ma.uecker@gmail.com>
>>> wrote:
>> 
>>>>> On 17 Jul 2023, 
>>>> 
>>> 
>>>>>> You mention setjmp/longjmp - on darwin and other platforms
>>>> requiring
>>>>>> non-stack based trampolines
>>>>>> does the system runtime provide means to deal with this issue
>>>>>> like
>>>> an
>>>>>> alternate allocation method
>>>>>> or a way to register cleanup?
>>>>> 
>>>>> There is an alternate mechanism relying on system libraries
>>>>> that is
>>>> possible on darwin specifically (I don’t know for other targets)
>>>> but
>>>> it will only work for signed binaries, and would require us to
>>>> codesign everything produced by gcc. During development, it was
>>>> deemed too big an ask and the current strategy was chosen (Iain
>>>> can
>>>> surely add more background on that if needed).
>>>> 
>>>> I do not think that this solves the setjump/longjump issue -
>>>> since
>>>> there’s still a notional allocation that takes place (it’s just
>>>> that
>>>> the mechanism for determining permissions is different).
>>>> 
>>>> It is also a big barrier for the general user - and prevents
>>>> normal
>>>> folks from distributing GCC - since codesigning requires an
>>>> external
>>>> certificate (i.e. I would really rather avoid it).
>>>> 
>>>>>> Was there ever an attempt to provide a "generic" trampoline
>>>>>> driven
>>>> by
>>>>>> a more complex descriptor?

>>>>> My own opinion is that executable stack should go away on all
>>>> targets at some point, so a truly generic solution to the problem
>>>> would be great.
>>>> 
>>>> indeed it would.
>> 
>>> I think we need a solution rather sooner than later on all archs.
>> 
>> AFAICS the  heap-based trampolines can work for any arch**, this
>> issue is about
>> system security policy, rather than arch, specifically?
>> 
>> It seems to me that for any system security policy that permits JIT,
>> (but not
>> executable stack) the heap-based trampolines are viable.
> 
> I agree. 
> 
> BTW; One option we discussed before, was to map a page with 
> pre-allocated trampolines, which look up the address of
> a callee and the static chain in a table based on its own
> address. Then no code generation is involved.

That reads similar to the scheme Apple have implemented for libobjc and libffi.
In order to be extensible (i.e to allow the table to grow at runtime), it means
having some loadable executable object; if that is implemented in a way shared
between users (delivered as part of the implementation) then, for Darwin at
least, it must be codesigned - which is somewhere I really want to avoid going
with GCC.  

> The difficult part is avoiding leaks with longjmp / setjmp.
> One idea was to have a shadow stack consisting of the
> pre-allocated trampolines, but this probably causes other
> issues...

With a per-thread table, I *think* that, for most targets, we discussed in the team
maintaining a ‘tide mark’ of the stack as part of the saved data in the
trampoline (not used as part of the execution, but only as part of the allocation
management)… but ..

> I wonder how difficult it is to have longjmp / setjmp walk 
> the stack in C?   This would also be useful for C++
> interoperability and to free  heap-allocated VLAs.

… this would be a better solution (as we can see trampolines are a small
leak c.f. the general uses)?
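
For what it is worth, one possible reading of the ‘tide mark’ idea mentioned above can be sketched as follows. This is an assumption-laden illustration, not what was implemented: it assumes a downward-growing stack, keeps only the recorded stack addresses (no real trampolines), and takes the addresses as plain integers so that it can be exercised without touching the real stack. The names tide_marks, reclaim_dead and record_trampoline are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIVE 8

/* Stack address of the creating frame, one entry per live trampoline,
   outermost frame first.  */
static uintptr_t tide_marks[MAX_LIVE];
static size_t n_live;

/* Assuming the stack grows downwards, any entry recorded at an address
   below CURRENT_SP belongs to a frame that no longer exists.  Drop such
   entries from the top and return how many were reclaimed.  */
static size_t
reclaim_dead (uintptr_t current_sp)
{
  size_t reclaimed = 0;
  while (n_live > 0 && tide_marks[n_live - 1] < current_sp)
    {
      n_live--;
      reclaimed++;
    }
  return reclaimed;
}

/* Record a trampoline created by the frame at FRAME_SP, reclaiming
   entries left behind by frames that were jumped over.
   Returns 0 on success, -1 when the table is full.  */
static int
record_trampoline (uintptr_t frame_sp)
{
  reclaim_dead (frame_sp);
  if (n_live == MAX_LIVE)
    return -1;
  tide_marks[n_live++] = frame_sp;
  return 0;
}
```

In this reading, entries abandoned by a longjmp are only recovered lazily, the next time a trampoline is created on the same thread.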

> As a user of nested functions, from my side it would also 
> ok to simply add a wide function pointer type that contains
> address + static chain.  This would require changing code, 
> but would also work with Clang's blocks and solve other 
> language interoperability problems, while avoiding all 
> existing ABI issues.

How does that work when passing a callback to libc (e.g. qsort?)

(Implementing Clang’s blocks is also on my TODO, but a different discussion ;))

>> This seems to be a useful step forward; and we can add some other
>> mechanism to the flag’s supported list if someone develops one?
> 
> I think it is a useful step forward.

Assembled maintainers, do you think this is OK for trunk given the various
discussions above?

thanks
Iain



* Re: [PATCH] core: Support heap-based trampolines
  2023-07-19 14:23               ` Iain Sandoe
@ 2023-07-19 15:18                 ` Martin Uecker
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Uecker @ 2023-07-19 15:18 UTC (permalink / raw)
  To: Iain Sandoe
  Cc: GCC Patches, FX Coudert, Richard Biener, Maxim Blinov,
	Eric Botcazou, Jeff Law, aburgess

Am Mittwoch, dem 19.07.2023 um 15:23 +0100 schrieb Iain Sandoe:
> Hi Martin,
> 
> > On 19 Jul 2023, at 11:43, Martin Uecker via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > 
> > Am Mittwoch, dem 19.07.2023 um 10:29 +0100 schrieb Iain Sandoe:
> 
> > > > On 19 Jul 2023, at 10:04, Martin Uecker <ma.uecker@gmail.com>
> > > > wrote:
> > > 
> > > > > > On 17 Jul 2023, 
> > > > > 
> > > > 
> > > > > > > You mention setjmp/longjmp - on darwin and other platforms
> > > > > requiring
> > > > > > > non-stack based trampolines
> > > > > > > does the system runtime provide means to deal with this issue
> > > > > > > like
> > > > > an
> > > > > > > alternate allocation method
> > > > > > > or a way to register cleanup?
> > > > > > 
> > > > > > There is an alternate mechanism relying on system libraries
> > > > > > that is
> > > > > possible on darwin specifically (I don’t know for other targets)
> > > > > but
> > > > > it will only work for signed binaries, and would require us to
> > > > > codesign everything produced by gcc. During development, it was
> > > > > deemed too big an ask and the current strategy was chosen (Iain
> > > > > can
> > > > > surely add more background on that if needed).
> > > > > 
> > > > > > I do not think that this solves the setjmp/longjmp issue -
> > > > > since
> > > > > there’s still a notional allocation that takes place (it’s just
> > > > > that
> > > > > the mechanism for determining permissions is different).
> > > > > 
> > > > > It is also a big barrier for the general user - and prevents
> > > > > normal
> > > > > folks from distributing GCC - since codesigning requires an
> > > > > external
> > > > > certificate (i.e. I would really rather avoid it).
> > > > > 
> > > > > > > Was there ever an attempt to provide a "generic" trampoline
> > > > > > > driven
> > > > > by
> > > > > > > a more complex descriptor?
> 
> > > > > > My own opinion is that executable stack should go away on all
> > > > > targets at some point, so a truly generic solution to the problem
> > > > > would be great.
> > > > > 
> > > > > indeed it would.
> > > 
> > > > I think we need a solution rather sooner than later on all archs.
> > > 
> > > AFAICS the  heap-based trampolines can work for any arch**, this
> > > issue is about
> > > system security policy, rather than arch, specifically?
> > > 
> > > It seems to me that for any system security policy that permits JIT,
> > > (but not
> > > executable stack) the heap-based trampolines are viable.
> > 
> > I agree. 
> > 
> > BTW; One option we discussed before, was to map a page with 
> > pre-allocated trampolines, which look up the address of
> > a callee and the static chain in a table based on its own
> > address. Then no code generation is involved.
> 
> That reads similar to the scheme Apple have implemented for libobjc and libffi.
> In order to be extensible (i.e. to allow the table to grow at runtime), it means
> having some loadable executable object; if that is implemented in a way shared
> between users (delivered as part of the implementation) then, for Darwin at
> least, it must be codesigned - which is somewhere I really want to avoid going
> with GCC.  
> 
> > The difficult part is avoiding leaks with longjmp / setjmp.
> > One idea was to have a shadow stack consisting of the
> > pre-allocated trampolines, but this probably causes other
> > issues...
> 
> With a per-thread table, I *think* for most targets, we discussed in the team
> maintaining a ’tide mark’ of the stack as part of the saved data in the
> trampoline (not used as part of the execution, but only as part of the allocation
> management)… but ..
> 
> > I wonder how difficult it is to have longjmp / setjmp walk 
> > the stack in C?   This would also be useful for C++
> > interoperability and to free  heap-allocated VLAs.
> 
> … this would be a better solution (as we can see trampolines are a small
> leak c.f. the general uses)?
> 
> > As a user of nested functions, from my side it would also 
> > be ok to simply add a wide function pointer type that contains
> > address + static chain.  This would require changing code, 
> > but would also work with Clang's blocks and solve other 
> > language interoperability problems, while avoiding all 
> > existing ABI issues.
> 
> How does that work when passing a callback to libc (e.g. qsort?)

This would not work because it would be an ABI change. But because
it solves a general problem and would plug a major language
interoperability issue between C and all languages that have
callable objects or nested functions, I think one could make a
good case that such an extension should go into the C standard,
together with a set of enhanced interfaces.

I have an initial proposal here:

http://www2.open-std.org/JTC1/SC22/WG14/www/docs/n2787.pdf

One could combine this with other solutions in which a user-created
trampoline, with explicit allocation and deallocation, calls such
a wide pointer; i.e. one has a library function that takes a wide
pointer and returns a regular function pointer to an allocated
trampoline, which the user has to free explicitly.
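
To illustrate what I mean by a wide pointer and an enhanced interface, here
is a small C sketch. It is only an approximation under my own assumptions:
the names (struct wide_cmp, sort_ints_wide, wide_pointer_demo) are invented
for this mail and are not the syntax or interfaces proposed in N2787.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "wide" function pointer: code address plus the context
   (static chain) it needs.  */
struct wide_cmp {
  int (*fn) (void *chain, const void *a, const void *b);
  void *chain;
};

/* A sort interface that accepts the wide pointer directly, so no
   trampoline is needed.  Plain insertion sort over ints for brevity.  */
static void
sort_ints_wide (int *v, size_t n, struct wide_cmp cmp)
{
  assert (cmp.fn != NULL);
  for (size_t i = 1; i < n; i++)
    for (size_t j = i; j > 0 && cmp.fn (cmp.chain, &v[j - 1], &v[j]) > 0; j--)
      {
	int t = v[j - 1];
	v[j - 1] = v[j];
	v[j] = t;
      }
}

/* Stand-in for the enclosing function's frame.  */
struct frame { int descending; };

/* What a nested function does implicitly: reach the enclosing frame's
   variables through the chain.  */
static int
cmp_with_frame (void *chain, const void *a, const void *b)
{
  const struct frame *f = chain;
  int x = *(const int *) a, y = *(const int *) b;
  int r = (x > y) - (x < y);
  return f->descending ? -r : r;
}

/* Sorts a small array in descending order; returns 1 on success.  */
int
wide_pointer_demo (void)
{
  struct frame f = { 1 };
  int v[] = { 3, 1, 2 };
  struct wide_cmp cmp = { cmp_with_frame, &f };
  sort_ints_wide (v, 3, cmp);
  return v[0] == 3 && v[1] == 2 && v[2] == 1;
}
```

An existing interface such as qsort cannot take struct wide_cmp, which is
where the explicitly allocated trampoline described above would come in.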

BTW: having a GCC builtin that returns the static chain
for a nested function so that it can later be used with
__builtin_call_with_static_chain would also be helpful.

> (Implementing Clang’s blocks is also on my TODO, but a different discussion ;))

It would be nice to make this compatible.

Martin

> > > This seems to be a useful step forward; and we can add some other
> > > mechanism to the flag’s supported list if someone develops one?
> > 
> > I think it is a useful step forward.
> 
> Assembled maintainers, do you think this is OK for trunk given the various
> discussions above?
> 
> thanks
> Iain
> 



* Re: [PATCH] core: Support heap-based trampolines
  2023-07-17  6:31 ` Richard Biener
  2023-07-17  6:43   ` FX Coudert
@ 2023-08-05 14:20   ` FX Coudert
  2023-08-20  9:43     ` FX Coudert
  2023-09-06 15:44     ` FX Coudert
  1 sibling, 2 replies; 15+ messages in thread
From: FX Coudert @ 2023-08-05 14:20 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Iain Sandoe, maxim.blinov, ebotcazou, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 495 bytes --]

Hi Richard,

Thanks for your feedback. Here is an amended version of the patch, taking into consideration your requests and the following discussion. There is no configure option for the libgcc part, and the documentation is amended. The patch is split into three commits for core, target and libgcc.

Currently regtesting on x86_64 linux and darwin (it was fine before I split up into three commits, so I’m re-testing to make sure I didn’t screw anything up).

OK to commit?
FX



[-- Attachment #2: 0001-core-Support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 14264 bytes --]

From bfb1e356e7e6848736218608eca953569361cf18 Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:54:11 +0200
Subject: [PATCH 1/3] core: Support heap-based trampolines

Generate heap-based nested function trampolines

Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.

The target-specific routines for allocating and writing trampolines are
to be provided in libgcc.

The gcc flag -ftrampoline-impl controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: If libgcc is for any reason missing those
symbols, you will get a link failure.

This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.
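
As a rough illustration of that leak, the following portable C sketch
simulates the bookkeeping. It is not the libgcc implementation: sim_created
and sim_deleted are hypothetical stand-ins for
__builtin_nested_func_ptr_created and __builtin_nested_func_ptr_deleted, and
a plain counter replaces the real trampoline pages.

```c
#include <assert.h>
#include <setjmp.h>

/* Stand-in for the libgcc bookkeeping: counts live trampolines.  */
static int live_trampolines;

static void
sim_created (void)
{
  live_trampolines++;
}

static void
sim_deleted (void)
{
  assert (live_trampolines > 0);
  live_trampolines--;
}

static jmp_buf env;

/* Models a function containing a nested function: the trampoline is
   created on entry and deleted by a cleanup on normal exit.  */
static void
frame_with_trampoline (int escape)
{
  sim_created ();
  if (escape)
    longjmp (env, 1);	/* Skips the cleanup below.  */
  sim_deleted ();
}

/* Returns the number of trampolines still allocated afterwards.  */
int
leak_demo (int escape)
{
  live_trampolines = 0;
  if (setjmp (env) == 0)
    frame_with_trampoline (escape);
  return live_trampolines;
}
```

Calling leak_demo (0) leaves no trampoline allocated, whereas leak_demo (1)
leaves one behind because the longjmp bypasses the matching delete.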

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* builtins.def (BUILT_IN_NESTED_PTR_CREATED): Define.
	(BUILT_IN_NESTED_PTR_DELETED): Ditto.
	* common.opt (ftrampoline-impl): Add option to control
	generation of trampoline instantiation (heap or stack).
	* coretypes.h: Define enum trampoline_impl.
	* tree-nested.cc (convert_tramp_reference_op): Don't bother calling
	__builtin_adjust_trampoline for heap trampolines.
	(finalize_nesting_tree_1): Emit calls to
	__builtin_nested_...{created,deleted} if we're generating with
	-ftrampoline-impl=heap.
	* tree.cc (build_common_builtin_nodes): Build
	__builtin_nested_...{created,deleted}.
	* doc/invoke.texi (-ftrampoline-impl): Document.
---
 gcc/builtins.def    |   2 +
 gcc/common.opt      |  17 ++++++-
 gcc/coretypes.h     |   6 +++
 gcc/doc/invoke.texi |  17 ++++++-
 gcc/tree-nested.cc  | 121 +++++++++++++++++++++++++++++++++++++-------
 gcc/tree.cc         |  17 +++++++
 6 files changed, 161 insertions(+), 19 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 5953266acba..7a7987100d1 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1074,6 +1074,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, "__builtin_nested_func_ptr_created")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, "__builtin_nested_func_ptr_deleted")
 
 /* Implementing __builtin_setjmp.  */
 DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
diff --git a/gcc/common.opt b/gcc/common.opt
index 0888c15b88f..949307a4414 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2884,10 +2884,25 @@ Common Var(flag_tracer) Optimization
 Perform superblock formation via tail duplication.
 
 ftrampolines
-Common Var(flag_trampolines) Init(0)
+Common Var(flag_trampolines) Init(HEAP_TRAMPOLINES_INIT)
 For targets that normally need trampolines for nested functions, always
 generate them instead of using descriptors.
 
+ftrampoline-impl=
+Common Joined RejectNegative Enum(trampoline_impl) Var(flag_trampoline_impl) Init(HEAP_TRAMPOLINES_INIT ? TRAMPOLINE_IMPL_HEAP : TRAMPOLINE_IMPL_STACK)
+Whether trampolines are generated in executable memory rather than
+executable stack.
+
+Enum
+Name(trampoline_impl) Type(enum trampoline_impl) UnknownError(unknown trampoline implementation %qs)
+
+EnumValue
+Enum(trampoline_impl) String(stack) Value(TRAMPOLINE_IMPL_STACK)
+
+EnumValue
+Enum(trampoline_impl) String(heap) Value(TRAMPOLINE_IMPL_HEAP)
+
+
 ; Zero means that floating-point math operations cannot generate a
 ; (user-visible) trap.  This is the case, for example, in nonstop
 ; IEEE 754 arithmetic.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..7e022a427c4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -199,6 +199,12 @@ enum tls_model {
   TLS_MODEL_LOCAL_EXEC
 };
 
+/* Types of trampoline implementation.  */
+enum trampoline_impl {
+  TRAMPOLINE_IMPL_STACK,
+  TRAMPOLINE_IMPL_HEAP
+};
+
 /* Types of ABI for an offload compiler.  */
 enum offload_abi {
   OFFLOAD_ABI_UNSET,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 674f956f4b8..13e13728621 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -711,7 +711,8 @@ Objective-C and Objective-C++ Dialects}.
 -fverbose-asm  -fpack-struct[=@var{n}]
 -fleading-underscore  -ftls-model=@var{model}
 -fstack-reuse=@var{reuse_level}
--ftrampolines  -ftrapv  -fwrapv
+-ftrampolines -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+-ftrapv  -fwrapv
 -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 -fstrict-volatile-bitfields  -fsync-libcalls}
 
@@ -18834,6 +18835,20 @@ For languages other than Ada, the @code{-ftrampolines} and
 trampolines are always generated on platforms that need them
 for nested functions.
 
+@opindex ftrampoline-impl
+@item -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+By default, trampolines are generated on the stack.  However, certain platforms
+(such as the Apple M1) do not permit an executable stack.  Compiling with
+@option{-ftrampoline-impl=heap} generates calls to
+@code{__builtin_nested_func_ptr_created} and
+@code{__builtin_nested_func_ptr_deleted} in order to allocate and
+deallocate trampoline space on the executable heap.  These functions are
+implemented in libgcc, and will only be provided on specific targets:
+x86_64 Darwin, x86_64 and aarch64 Linux.  @emph{PLEASE NOTE}: Heap
+trampolines are @emph{not} guaranteed to be correctly deallocated if you
+@code{setjmp}, instantiate nested functions, and then @code{longjmp} back
+to a state prior to having allocated those nested functions.
+
 @opindex fvisibility
 @item -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 Set the default ELF image symbol visibility to the specified option---all
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index ae7d1f1f6a8..84ee9962485 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -611,6 +611,14 @@ get_trampoline_type (struct nesting_info *info)
   if (trampoline_type)
     return trampoline_type;
 
+  /* When trampolines are created off-stack then the only thing we need in the
+     local frame is a single pointer.  */
+  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+    {
+      trampoline_type = build_pointer_type (void_type_node);
+      return trampoline_type;
+    }
+
   align = TRAMPOLINE_ALIGNMENT;
   size = TRAMPOLINE_SIZE;
 
@@ -2788,17 +2796,27 @@ convert_tramp_reference_op (tree *tp, int *walk_subtrees, void *data)
 
       /* Compute the address of the field holding the trampoline.  */
       x = get_frame_field (info, target_context, x, &wi->gsi);
-      x = build_addr (x);
-      x = gsi_gimplify_val (info, x, &wi->gsi);
 
-      /* Do machine-specific ugliness.  Normally this will involve
-	 computing extra alignment, but it can really be anything.  */
-      if (descr)
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+      /* APB: We don't need to do the adjustment calls when using off-stack
+	 trampolines, any such adjustment will be done when the off-stack
+	 trampoline is created.  */
+      if (!descr && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	x = gsi_gimplify_val (info, x, &wi->gsi);
       else
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
-      call = gimple_build_call (builtin, 1, x);
-      x = init_tmp_var_with_call (info, &wi->gsi, call);
+	{
+	  x = build_addr (x);
+
+	  x = gsi_gimplify_val (info, x, &wi->gsi);
+
+	  /* Do machine-specific ugliness.  Normally this will involve
+	     computing extra alignment, but it can really be anything.  */
+	  if (descr)
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+	  else
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
+	  call = gimple_build_call (builtin, 1, x);
+	  x = init_tmp_var_with_call (info, &wi->gsi, call);
+	}
 
       /* Cast back to the proper function type.  */
       x = build1 (NOP_EXPR, TREE_TYPE (t), x);
@@ -3377,6 +3395,7 @@ build_init_call_stmt (struct nesting_info *info, tree decl, tree field,
 static void
 finalize_nesting_tree_1 (struct nesting_info *root)
 {
+  gimple_seq cleanup_list = NULL;
   gimple_seq stmt_list = NULL;
   gimple *stmt;
   tree context = root->context;
@@ -3508,9 +3527,48 @@ finalize_nesting_tree_1 (struct nesting_info *root)
 	  if (!field)
 	    continue;
 
-	  x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
-	  stmt = build_init_call_stmt (root, i->context, field, x);
-	  gimple_seq_add_stmt (&stmt_list, stmt);
+	  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	    {
+	      /* We pass a whole bunch of arguments to the builtin function that
+		 creates the off-stack trampoline, these are
+		 1. The nested function chain value (that must be passed to the
+		 nested function so it can find the function arguments).
+		 2. A pointer to the nested function implementation,
+		 3. The address in the local stack frame where we should write
+		 the address of the trampoline.
+
+		 When this code was originally written I just kind of threw
+		 everything at the builtin, figuring I'd work out what was
+		 actually needed later, I think, the stack pointer could
+		 certainly be dropped, arguments #2 and #4 are based off the
+		 stack pointer anyway, so #1 doesn't seem to add much value.  */
+	      tree arg1, arg2, arg3;
+
+	      gcc_assert (DECL_STATIC_CHAIN (i->context));
+	      arg1 = build_addr (root->frame_decl);
+	      arg2 = build_addr (i->context);
+
+	      x = build3 (COMPONENT_REF, TREE_TYPE (field),
+			  root->frame_decl, field, NULL_TREE);
+	      arg3 = build_addr (x);
+
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_CREATED);
+	      stmt = gimple_build_call (x, 3, arg1, arg2, arg3);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+
+	      /* This call to delete the nested function trampoline is added to
+		 the cleanup list, and called when we exit the current scope.  */
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_DELETED);
+	      stmt = gimple_build_call (x, 0);
+	      gimple_seq_add_stmt (&cleanup_list, stmt);
+	    }
+	  else
+	    {
+	      /* Original code to initialise the on stack trampoline.  */
+	      x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
+	      stmt = build_init_call_stmt (root, i->context, field, x);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+	    }
 	}
     }
 
@@ -3535,11 +3593,40 @@ finalize_nesting_tree_1 (struct nesting_info *root)
   /* If we created initialization statements, insert them.  */
   if (stmt_list)
     {
-      gbind *bind;
-      annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
-      bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
-      gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
-      gimple_bind_set_body (bind, stmt_list);
+      if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	{
+	  /* Handle off-stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  annotate_all_with_location (cleanup_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+
+	  gimple_seq xxx_list = NULL;
+
+	  if (cleanup_list != NULL)
+	    {
+	      /* Maybe we shouldn't be creating this try/finally if -fno-exceptions is
+		 in use.  If this is the case, then maybe we should, instead, be
+		 inserting the cleanup code onto every path out of this function?  Not
+		 yet figured out how we would do this.  */
+	      gtry *t = gimple_build_try (stmt_list, cleanup_list, GIMPLE_TRY_FINALLY);
+	      gimple_seq_add_stmt (&xxx_list, t);
+	    }
+	  else
+	    xxx_list = stmt_list;
+
+	  gimple_bind_set_body (bind, xxx_list);
+	}
+      else
+	{
+	  /* The traditional, on stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+	  gimple_bind_set_body (bind, stmt_list);
+	}
     }
 
   /* If a chain_decl was created, then it needs to be registered with
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 420857b110c..3e7beba8744 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9870,6 +9870,23 @@ build_common_builtin_nodes (void)
 			"__builtin_nonlocal_goto",
 			ECF_NORETURN | ECF_NOTHROW);
 
+  tree ptr_ptr_type_node = build_pointer_type (ptr_type_node);
+
+  ftype = build_function_type_list (void_type_node,
+				    ptr_type_node, // void *chain
+				    ptr_type_node, // void *func
+				    ptr_ptr_type_node, // void **dst
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_created", ftype,
+			BUILT_IN_NESTED_PTR_CREATED,
+			"__builtin_nested_func_ptr_created", ECF_NOTHROW);
+
+  ftype = build_function_type_list (void_type_node,
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_deleted", ftype,
+			BUILT_IN_NESTED_PTR_DELETED,
+			"__builtin_nested_func_ptr_deleted", ECF_NOTHROW);
+
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node, NULL_TREE);
   local_define_builtin ("__builtin_setjmp_setup", ftype,
-- 
2.39.2 (Apple Git-143)


[-- Attachment #3: 0002-target-Support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 3571 bytes --]

From a7c7415110feb085620497852776fdad7edf9116 Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:56:31 +0200
Subject: [PATCH 2/3] target: Support heap-based trampolines

Enable -ftrampoline-impl=heap by default if we are on macOS 11
or later.

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config.gcc: Default to heap trampolines on macOS 11 and above.
	* config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.
---
 gcc/config.gcc           | 11 +++++++++++
 gcc/config/i386/darwin.h |  6 ++++++
 gcc/config/i386/i386.cc  |  2 +-
 gcc/config/i386/i386.h   |  6 ++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 415e0e1ebc5..5d70b9b4daf 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1127,6 +1127,17 @@ case ${target} in
   ;;
 esac
 
+# Figure out if we need to enable heap trampolines by default
+case ${target} in
+*-*-darwin2*)
+  # Currently, we do this for macOS 11 and above.
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=1"
+  ;;
+*)
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=0"
+  ;;
+esac
+
 case ${target} in
 aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
 	tm_file="${tm_file} elfos.h newlib-stdint.h"
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index 588bd669bdd..036eefbbb95 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -308,3 +308,9 @@ along with GCC; see the file COPYING3.  If not see
 #define CLEAR_INSN_CACHE(beg, end)				\
   extern void sys_icache_invalidate(void *start, size_t len);	\
   sys_icache_invalidate ((beg), (size_t)((end)-(beg)))
+
+/* Disable custom function descriptors for Darwin when we have off-stack
+   trampolines.  */
+#undef X86_CUSTOM_FUNCTION_TEST
+#define X86_CUSTOM_FUNCTION_TEST \
+  (flag_trampolines && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP) ? 0 : 1
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8cd26eb54fa..d7fe8f75c4f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25705,7 +25705,7 @@ ix86_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_SCRATCH_OK ix86_hard_regno_scratch_ok
 
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS X86_CUSTOM_FUNCTION_TEST
 
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..e1495e98c42 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -755,6 +755,12 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /* Minimum allocation boundary for the code of a function.  */
 #define FUNCTION_BOUNDARY 8
 
+/* We will AND with this value to test if a custom function descriptor needs
+   a static chain.  The function boundary must be adjusted so that the bit
+   this represents is no longer part of the address.  0 disables the custom
+   function descriptors.  */
+#define X86_CUSTOM_FUNCTION_TEST 1
+
 /* C++ stores the virtual bit in the lowest bit of function pointers.  */
 #define TARGET_PTRMEMFUNC_VBIT_LOCATION ptrmemfunc_vbit_in_pfn
 
-- 
2.39.2 (Apple Git-143)


[-- Attachment #4: 0003-libgcc-support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 15597 bytes --]

From e875cd959ea6d674530280ead2a2323bd6c2ad3a Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:31:06 +0200
Subject: [PATCH 3/3] libgcc: support heap-based trampolines

Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted functions for these targets.

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

libgcc/ChangeLog:

	* libgcc2.h (__builtin_nested_func_ptr_created): Declare.
	(__builtin_nested_func_ptr_deleted): Declare.
	* libgcc-std.ver.in: Add the new symbols.
	* config/aarch64/heap-trampoline.c: Implement heap-based
	trampolines for aarch64.
	* config/aarch64/t-heap-trampoline: Add rule to build
	config/aarch64/heap-trampoline.c
	* config/i386/heap-trampoline.c: Implement heap-based
	trampolines for x86_64.
	* config/i386/t-heap-trampoline: Add rule to build
	config/i386/heap-trampoline.c
	* config.host: Add t-heap-trampoline to tmake_file on
	x86_64-*-linux*, aarch64-*-linux*, x86_64-*-darwin*.
---
 libgcc/config.host                      |   3 +
 libgcc/config/aarch64/heap-trampoline.c | 172 ++++++++++++++++++++++++
 libgcc/config/aarch64/t-heap-trampoline |  19 +++
 libgcc/config/i386/heap-trampoline.c    | 172 ++++++++++++++++++++++++
 libgcc/config/i386/t-heap-trampoline    |  19 +++
 libgcc/libgcc-std.ver.in                |   3 +
 libgcc/libgcc2.h                        |   3 +
 7 files changed, 391 insertions(+)
 create mode 100644 libgcc/config/aarch64/heap-trampoline.c
 create mode 100644 libgcc/config/aarch64/t-heap-trampoline
 create mode 100644 libgcc/config/i386/heap-trampoline.c
 create mode 100644 libgcc/config/i386/t-heap-trampoline

diff --git a/libgcc/config.host b/libgcc/config.host
index c94d69d84b7..d96b02ce87f 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -423,6 +423,7 @@ aarch64*-*-linux*)
 	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
 	tmake_file="${tmake_file} t-dfprules"
+	tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
 	;;
 aarch64*-*-vxworks7*)
 	extra_parts="$extra_parts crtfastmath.o"
@@ -697,6 +698,7 @@ x86_64-*-darwin*)
 	tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
 	tm_file="$tm_file i386/darwin-lib.h"
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
+	tmake_file="${tmake_file} i386/t-heap-trampoline"
 	;;
 i[34567]86-*-elfiamcu)
 	tmake_file="$tmake_file i386/t-crtstuff t-softfp-sfdftf i386/32/t-softfp i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules"
@@ -763,6 +765,7 @@ x86_64-*-linux*)
 	tmake_file="${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules"
 	tm_file="${tm_file} i386/elf-lib.h"
 	md_unwind_header=i386/linux-unwind.h
+	tmake_file="${tmake_file} i386/t-heap-trampoline"
 	;;
 x86_64-*-kfreebsd*-gnu)
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
new file mode 100644
index 00000000000..c8b83681ed7
--- /dev/null
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+#if defined(__gnu_linux__)
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d2, /* ldr     x18, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#elif __APPLE__
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d0, /* ldr     x16, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#else
+#error "Unsupported AArch64 platform for heap trampolines"
+#endif
+
+struct aarch64_trampoline {
+  uint32_t insns[6];
+  void *func_ptr;
+  void *chain_ptr;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  struct aarch64_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(struct aarch64_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  struct aarch64_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, aarch64_trampoline_insns,
+	  sizeof(aarch64_trampoline_insns));
+  trampoline->func_ptr = func;
+  trampoline->chain_ptr = chain;
+
+#if __APPLE__
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/aarch64/t-heap-trampoline b/libgcc/config/aarch64/t-heap-trampoline
new file mode 100644
index 00000000000..b22480800b2
--- /dev/null
+++ b/libgcc/config/aarch64/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/aarch64/heap-trampoline.c
diff --git a/libgcc/config/i386/heap-trampoline.c b/libgcc/config/i386/heap-trampoline.c
new file mode 100644
index 00000000000..96e13bf828e
--- /dev/null
+++ b/libgcc/config/i386/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+static const uint8_t trampoline_insns[] = {
+  /* movabs $<func>,%r11  */
+  0x49, 0xbb,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* movabs $<chain>,%r10  */
+  0x49, 0xba,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* rex.WB jmpq *%r11  */
+  0x41, 0xff, 0xe3
+};
+
+union ix86_trampoline {
+  uint8_t insns[sizeof(trampoline_insns)];
+
+  struct __attribute__((packed)) fields {
+    uint8_t insn_0[2];
+    void *func_ptr;
+    uint8_t insn_1[2];
+    void *chain_ptr;
+    uint8_t insn_2[3];
+  } fields;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  union ix86_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(union ix86_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+# if  __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+# else
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+# endif
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  union ix86_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, trampoline_insns,
+	  sizeof(trampoline_insns));
+  trampoline->fields.func_ptr = func;
+  trampoline->fields.chain_ptr = chain;
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/i386/t-heap-trampoline b/libgcc/config/i386/t-heap-trampoline
new file mode 100644
index 00000000000..613f635b1f6
--- /dev/null
+++ b/libgcc/config/i386/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/i386/heap-trampoline.c
diff --git a/libgcc/libgcc-std.ver.in b/libgcc/libgcc-std.ver.in
index c4f87a50e70..a48f4899eb6 100644
--- a/libgcc/libgcc-std.ver.in
+++ b/libgcc/libgcc-std.ver.in
@@ -1943,4 +1943,7 @@ GCC_4.8.0 {
 GCC_7.0.0 {
   __PFX__divmoddi4
   __PFX__divmodti4
+
+  __builtin_nested_func_ptr_created
+  __builtin_nested_func_ptr_deleted
 }
diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
index 3ec9bbd8164..ac7eaab4f01 100644
--- a/libgcc/libgcc2.h
+++ b/libgcc/libgcc2.h
@@ -29,6 +29,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #pragma GCC visibility push(default)
 #endif
 
+extern void __builtin_nested_func_ptr_created (void *, void *, void **);
+extern void __builtin_nested_func_ptr_deleted (void);
+
 extern int __gcc_bcmp (const unsigned char *, const unsigned char *, size_t);
 extern void __clear_cache (void *, void *);
 extern void __eprintf (const char *, const char *, unsigned int, const char *)
-- 
2.39.2 (Apple Git-143)



* Re: [PATCH] core: Support heap-based trampolines
  2023-08-05 14:20   ` FX Coudert
@ 2023-08-20  9:43     ` FX Coudert
  2023-09-06 15:44     ` FX Coudert
  1 sibling, 0 replies; 15+ messages in thread
From: FX Coudert @ 2023-08-20  9:43 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Iain Sandoe, maxim.blinov, ebotcazou, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 691 bytes --]

Hi,

A gentle ping on the revised patch, for Richard or another global reviewer.

Thanks,
FX



> On 5 August 2023 at 16:20, FX Coudert <fxcoudert@gmail.com> wrote:
> 
> Hi Richard,
> 
> Thanks for your feedback. Here is an amended version of the patch, taking into consideration your requests and the following discussion. There is no configure option for the libgcc part, and the documentation is amended. The patch is split into three commits for core, target and libgcc.
> 
> Currently regtesting on x86_64 linux and darwin (it was fine before I split up into three commits, so I’m re-testing to make sure I didn’t screw anything up).
> 
> OK to commit?
> FX


[-- Attachment #2: 0001-core-Support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 14264 bytes --]

From bfb1e356e7e6848736218608eca953569361cf18 Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:54:11 +0200
Subject: [PATCH 1/3] core: Support heap-based trampolines

Generate heap-based nested function trampolines

Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.

The target-specific routines for allocating and writing trampolines are
to be provided in libgcc.

The gcc flag -ftrampoline-impl controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: if libgcc is for any reason missing those
symbols, you will get a link failure.

This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.
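
The leak described above can be illustrated with a minimal model of the
bookkeeping only. This is a sketch under stated assumptions, not the
libgcc code: page_model, model_created, model_deleted, model_free_slots
and SLOTS_PER_PAGE are hypothetical names, and no executable memory is
mapped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for getpagesize () / sizeof (trampoline).  */
enum { SLOTS_PER_PAGE = 4 };

/* Models struct tramp_ctrl_data without the executable page.  */
struct page_model
{
  struct page_model *prev;
  int free_slots;
};

static struct page_model *curr;

/* Models __builtin_nested_func_ptr_created; returns the slot index used.  */
static int
model_created (void)
{
  if (curr == NULL || curr->free_slots == 0)
    {
      struct page_model *p = malloc (sizeof *p);
      if (p == NULL)
	abort ();
      p->prev = curr;
      p->free_slots = SLOTS_PER_PAGE;
      curr = p;
    }
  int slot = SLOTS_PER_PAGE - curr->free_slots;
  curr->free_slots -= 1;
  return slot;
}

/* Models __builtin_nested_func_ptr_deleted.  */
static void
model_deleted (void)
{
  assert (curr != NULL);
  curr->free_slots += 1;
  if (curr->free_slots == SLOTS_PER_PAGE && curr->prev != NULL)
    {
      struct page_model *prev = curr->prev;
      free (curr);
      curr = prev;
    }
}

/* Number of free slots on the current page, 0 if no page exists yet.  */
static int
model_free_slots (void)
{
  return curr != NULL ? curr->free_slots : 0;
}
```

Calling model_created twice and model_deleted once, as happens when a
longjmp skips one cleanup, leaves the free count one short of
SLOTS_PER_PAGE. In this model the count never recovers, so the page can
no longer be recognised as empty.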

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* builtins.def (BUILT_IN_NESTED_PTR_CREATED): Define.
	(BUILT_IN_NESTED_PTR_DELETED): Ditto.
	* common.opt (ftrampoline-impl): Add option to control
	generation of trampoline instantiation (heap or stack).
	* coretypes.h: Define enum trampoline_impl.
	* tree-nested.cc (convert_tramp_reference_op): Don't bother calling
	__builtin_adjust_trampoline for heap trampolines.
	(finalize_nesting_tree_1): Emit calls to
	__builtin_nested_...{created,deleted} if we're generating with
	-ftrampoline-impl=heap.
	* tree.cc (build_common_builtin_nodes): Build
	__builtin_nested_...{created,deleted}.
	* doc/invoke.texi (-ftrampoline-impl): Document.
---
 gcc/builtins.def    |   2 +
 gcc/common.opt      |  17 ++++++-
 gcc/coretypes.h     |   6 +++
 gcc/doc/invoke.texi |  17 ++++++-
 gcc/tree-nested.cc  | 121 +++++++++++++++++++++++++++++++++++++-------
 gcc/tree.cc         |  17 +++++++
 6 files changed, 161 insertions(+), 19 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 5953266acba..7a7987100d1 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1074,6 +1074,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, "__builtin_nested_func_ptr_created")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, "__builtin_nested_func_ptr_deleted")
 
 /* Implementing __builtin_setjmp.  */
 DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
diff --git a/gcc/common.opt b/gcc/common.opt
index 0888c15b88f..949307a4414 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2884,10 +2884,25 @@ Common Var(flag_tracer) Optimization
 Perform superblock formation via tail duplication.
 
 ftrampolines
-Common Var(flag_trampolines) Init(0)
+Common Var(flag_trampolines) Init(HEAP_TRAMPOLINES_INIT)
 For targets that normally need trampolines for nested functions, always
 generate them instead of using descriptors.
 
+ftrampoline-impl=
+Common Joined RejectNegative Enum(trampoline_impl) Var(flag_trampoline_impl) Init(HEAP_TRAMPOLINES_INIT ? TRAMPOLINE_IMPL_HEAP : TRAMPOLINE_IMPL_STACK)
+Whether trampolines are generated in executable memory rather than
+executable stack.
+
+Enum
+Name(trampoline_impl) Type(enum trampoline_impl) UnknownError(unknown trampoline implementation %qs)
+
+EnumValue
+Enum(trampoline_impl) String(stack) Value(TRAMPOLINE_IMPL_STACK)
+
+EnumValue
+Enum(trampoline_impl) String(heap) Value(TRAMPOLINE_IMPL_HEAP)
+
+
 ; Zero means that floating-point math operations cannot generate a
 ; (user-visible) trap.  This is the case, for example, in nonstop
 ; IEEE 754 arithmetic.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..7e022a427c4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -199,6 +199,12 @@ enum tls_model {
   TLS_MODEL_LOCAL_EXEC
 };
 
+/* Types of trampoline implementation.  */
+enum trampoline_impl {
+  TRAMPOLINE_IMPL_STACK,
+  TRAMPOLINE_IMPL_HEAP
+};
+
 /* Types of ABI for an offload compiler.  */
 enum offload_abi {
   OFFLOAD_ABI_UNSET,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 674f956f4b8..13e13728621 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -711,7 +711,8 @@ Objective-C and Objective-C++ Dialects}.
 -fverbose-asm  -fpack-struct[=@var{n}]
 -fleading-underscore  -ftls-model=@var{model}
 -fstack-reuse=@var{reuse_level}
--ftrampolines  -ftrapv  -fwrapv
+-ftrampolines -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+-ftrapv  -fwrapv
 -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 -fstrict-volatile-bitfields  -fsync-libcalls}
 
@@ -18834,6 +18835,20 @@ For languages other than Ada, the @code{-ftrampolines} and
 trampolines are always generated on platforms that need them
 for nested functions.
 
+@opindex ftrampoline-impl
+@item -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+By default, trampolines are generated on the stack.  However, certain platforms
+(such as the Apple M1) do not permit an executable stack.  Compiling with
+@option{-ftrampoline-impl=heap} generates calls to
+@code{__builtin_nested_func_ptr_created} and
+@code{__builtin_nested_func_ptr_deleted} in order to allocate and
+deallocate trampoline space on the executable heap.  These functions are
+implemented in libgcc, and will only be provided on specific targets:
+x86_64 Darwin, x86_64 and aarch64 Linux.  @emph{PLEASE NOTE}: Heap
+trampolines are @emph{not} guaranteed to be correctly deallocated if you
+@code{setjmp}, instantiate nested functions, and then @code{longjmp} back
+to a state prior to having allocated those nested functions.
+
 @opindex fvisibility
 @item -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 Set the default ELF image symbol visibility to the specified option---all
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index ae7d1f1f6a8..84ee9962485 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -611,6 +611,14 @@ get_trampoline_type (struct nesting_info *info)
   if (trampoline_type)
     return trampoline_type;
 
+  /* When trampolines are created off-stack then the only thing we need in the
+     local frame is a single pointer.  */
+  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+    {
+      trampoline_type = build_pointer_type (void_type_node);
+      return trampoline_type;
+    }
+
   align = TRAMPOLINE_ALIGNMENT;
   size = TRAMPOLINE_SIZE;
 
@@ -2788,17 +2796,27 @@ convert_tramp_reference_op (tree *tp, int *walk_subtrees, void *data)
 
       /* Compute the address of the field holding the trampoline.  */
       x = get_frame_field (info, target_context, x, &wi->gsi);
-      x = build_addr (x);
-      x = gsi_gimplify_val (info, x, &wi->gsi);
 
-      /* Do machine-specific ugliness.  Normally this will involve
-	 computing extra alignment, but it can really be anything.  */
-      if (descr)
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+      /* APB: We don't need to do the adjustment calls when using off-stack
+	 trampolines, any such adjustment will be done when the off-stack
+	 trampoline is created.  */
+      if (!descr && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	x = gsi_gimplify_val (info, x, &wi->gsi);
       else
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
-      call = gimple_build_call (builtin, 1, x);
-      x = init_tmp_var_with_call (info, &wi->gsi, call);
+	{
+	  x = build_addr (x);
+
+	  x = gsi_gimplify_val (info, x, &wi->gsi);
+
+	  /* Do machine-specific ugliness.  Normally this will involve
+	     computing extra alignment, but it can really be anything.  */
+	  if (descr)
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+	  else
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
+	  call = gimple_build_call (builtin, 1, x);
+	  x = init_tmp_var_with_call (info, &wi->gsi, call);
+	}
 
       /* Cast back to the proper function type.  */
       x = build1 (NOP_EXPR, TREE_TYPE (t), x);
@@ -3377,6 +3395,7 @@ build_init_call_stmt (struct nesting_info *info, tree decl, tree field,
 static void
 finalize_nesting_tree_1 (struct nesting_info *root)
 {
+  gimple_seq cleanup_list = NULL;
   gimple_seq stmt_list = NULL;
   gimple *stmt;
   tree context = root->context;
@@ -3508,9 +3527,48 @@ finalize_nesting_tree_1 (struct nesting_info *root)
 	  if (!field)
 	    continue;
 
-	  x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
-	  stmt = build_init_call_stmt (root, i->context, field, x);
-	  gimple_seq_add_stmt (&stmt_list, stmt);
+	  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	    {
+	      /* We pass a whole bunch of arguments to the builtin function that
+		 creates the off-stack trampoline, these are
+		 1. The nested function chain value (that must be passed to the
+		 nested function so it can find the function arguments).
+		 2. A pointer to the nested function implementation,
+		 3. The address in the local stack frame where we should write
+		 the address of the trampoline.
+
+		 When this code was originally written I just kind of threw
+		 everything at the builtin, figuring I'd work out what was
+		 actually needed later, I think, the stack pointer could
+		 certainly be dropped, arguments #2 and #4 are based off the
+		 stack pointer anyway, so #1 doesn't seem to add much value.  */
+	      tree arg1, arg2, arg3;
+
+	      gcc_assert (DECL_STATIC_CHAIN (i->context));
+	      arg1 = build_addr (root->frame_decl);
+	      arg2 = build_addr (i->context);
+
+	      x = build3 (COMPONENT_REF, TREE_TYPE (field),
+			  root->frame_decl, field, NULL_TREE);
+	      arg3 = build_addr (x);
+
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_CREATED);
+	      stmt = gimple_build_call (x, 3, arg1, arg2, arg3);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+
+	      /* This call to delete the nested function trampoline is added to
+		 the cleanup list, and called when we exit the current scope.  */
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_DELETED);
+	      stmt = gimple_build_call (x, 0);
+	      gimple_seq_add_stmt (&cleanup_list, stmt);
+	    }
+	  else
+	    {
+	      /* Original code to initialise the on stack trampoline.  */
+	      x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
+	      stmt = build_init_call_stmt (root, i->context, field, x);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+	    }
 	}
     }
 
@@ -3535,11 +3593,40 @@ finalize_nesting_tree_1 (struct nesting_info *root)
   /* If we created initialization statements, insert them.  */
   if (stmt_list)
     {
-      gbind *bind;
-      annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
-      bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
-      gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
-      gimple_bind_set_body (bind, stmt_list);
+      if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	{
+	  /* Handle off-stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  annotate_all_with_location (cleanup_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+
+	  gimple_seq xxx_list = NULL;
+
+	  if (cleanup_list != NULL)
+	    {
+	      /* Maybe we shouldn't be creating this try/finally if -fno-exceptions is
+		 in use.  If this is the case, then maybe we should, instead, be
+		 inserting the cleanup code onto every path out of this function?  Not
+		 yet figured out how we would do this.  */
+	      gtry *t = gimple_build_try (stmt_list, cleanup_list, GIMPLE_TRY_FINALLY);
+	      gimple_seq_add_stmt (&xxx_list, t);
+	    }
+	  else
+	    xxx_list = stmt_list;
+
+	  gimple_bind_set_body (bind, xxx_list);
+	}
+      else
+	{
+	  /* The traditional, on stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+	  gimple_bind_set_body (bind, stmt_list);
+	}
     }
 
   /* If a chain_decl was created, then it needs to be registered with
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 420857b110c..3e7beba8744 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9870,6 +9870,23 @@ build_common_builtin_nodes (void)
 			"__builtin_nonlocal_goto",
 			ECF_NORETURN | ECF_NOTHROW);
 
+  tree ptr_ptr_type_node = build_pointer_type (ptr_type_node);
+
+  ftype = build_function_type_list (void_type_node,
+				    ptr_type_node, // void *chain
+				    ptr_type_node, // void *func
+				    ptr_ptr_type_node, // void **dst
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_created", ftype,
+			BUILT_IN_NESTED_PTR_CREATED,
+			"__builtin_nested_func_ptr_created", ECF_NOTHROW);
+
+  ftype = build_function_type_list (void_type_node,
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_deleted", ftype,
+			BUILT_IN_NESTED_PTR_DELETED,
+			"__builtin_nested_func_ptr_deleted", ECF_NOTHROW);
+
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node, NULL_TREE);
   local_define_builtin ("__builtin_setjmp_setup", ftype,
-- 
2.39.2 (Apple Git-143)


[-- Attachment #3: 0002-target-Support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 3571 bytes --]

From a7c7415110feb085620497852776fdad7edf9116 Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:56:31 +0200
Subject: [PATCH 2/3] target: Support heap-based trampolines

Enable -ftrampoline-impl=heap by default if we are on macOS 11
or later.

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config.gcc: Default to heap trampolines on macOS 11 and above.
	* config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.
---
 gcc/config.gcc           | 11 +++++++++++
 gcc/config/i386/darwin.h |  6 ++++++
 gcc/config/i386/i386.cc  |  2 +-
 gcc/config/i386/i386.h   |  6 ++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 415e0e1ebc5..5d70b9b4daf 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1127,6 +1127,17 @@ case ${target} in
   ;;
 esac
 
+# Figure out if we need to enable heap trampolines by default
+case ${target} in
+*-*-darwin2*)
+  # Currently, we do this for macOS 11 and above.
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=1"
+  ;;
+*)
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=0"
+  ;;
+esac
+
 case ${target} in
 aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
 	tm_file="${tm_file} elfos.h newlib-stdint.h"
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index 588bd669bdd..036eefbbb95 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -308,3 +308,9 @@ along with GCC; see the file COPYING3.  If not see
 #define CLEAR_INSN_CACHE(beg, end)				\
   extern void sys_icache_invalidate(void *start, size_t len);	\
   sys_icache_invalidate ((beg), (size_t)((end)-(beg)))
+
+/* Disable custom function descriptors for Darwin when we have off-stack
+   trampolines.  */
+#undef X86_CUSTOM_FUNCTION_TEST
+#define X86_CUSTOM_FUNCTION_TEST \
+  ((flag_trampolines && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP) ? 0 : 1)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8cd26eb54fa..d7fe8f75c4f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25705,7 +25705,7 @@ ix86_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_SCRATCH_OK ix86_hard_regno_scratch_ok
 
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS X86_CUSTOM_FUNCTION_TEST
 
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..e1495e98c42 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -755,6 +755,12 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /* Minimum allocation boundary for the code of a function.  */
 #define FUNCTION_BOUNDARY 8
 
+/* We will AND with this value to test if a custom function descriptor needs
+   a static chain.  The function boundary must be adjusted so that the bit
+   this represents is no longer part of the address.  0 disables the custom
+   function descriptors.  */
+#define X86_CUSTOM_FUNCTION_TEST 1
+
 /* C++ stores the virtual bit in the lowest bit of function pointers.  */
 #define TARGET_PTRMEMFUNC_VBIT_LOCATION ptrmemfunc_vbit_in_pfn
 
-- 
2.39.2 (Apple Git-143)


[-- Attachment #4: 0003-libgcc-support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 15597 bytes --]

From e875cd959ea6d674530280ead2a2323bd6c2ad3a Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:31:06 +0200
Subject: [PATCH 3/3] libgcc: support heap-based trampolines

Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted functions for these targets.
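
As an illustration of what the x86_64 implementation writes into each
slot, here is a minimal sketch that fills a trampoline template in an
ordinary buffer. The names fill_trampoline, tramp_template, TRAMP_SIZE,
FUNC_OFFSET and CHAIN_OFFSET are hypothetical; the sketch uses memcpy at
fixed offsets where the patch uses a packed struct, and it does not map
or execute anything.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets of the two 64-bit immediates inside the template.  */
enum { TRAMP_SIZE = 23, FUNC_OFFSET = 2, CHAIN_OFFSET = 12 };

static const uint8_t tramp_template[TRAMP_SIZE] = {
  /* movabs $<func>,%r11  */
  0x49, 0xbb,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

  /* movabs $<chain>,%r10  */
  0x49, 0xba,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

  /* rex.WB jmpq *%r11  */
  0x41, 0xff, 0xe3
};

/* Copy the template into OUT, then patch in the nested function address
   and the static chain value.  */
static void
fill_trampoline (uint8_t out[TRAMP_SIZE], uint64_t func, uint64_t chain)
{
  memcpy (out, tramp_template, TRAMP_SIZE);
  memcpy (out + FUNC_OFFSET, &func, sizeof func);
  memcpy (out + CHAIN_OFFSET, &chain, sizeof chain);
}
```

The function address lands in %r11, which is the jump target, and the
static chain lands in %r10, which matches the order of func_ptr and
chain_ptr in the packed struct of the patch.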

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

libgcc/ChangeLog:

	* libgcc2.h (__builtin_nested_func_ptr_created): Declare.
	(__builtin_nested_func_ptr_deleted): Declare.
	* libgcc-std.ver.in: Add the new symbols.
	* config/aarch64/heap-trampoline.c: Implement heap-based
	trampolines for aarch64.
	* config/aarch64/t-heap-trampoline: Add rule to build
	config/aarch64/heap-trampoline.c.
	* config/i386/heap-trampoline.c: Implement heap-based
	trampolines for x86_64.
	* config/i386/t-heap-trampoline: Add rule to build
	config/i386/heap-trampoline.c.
	* config.host: Add t-heap-trampoline to tmake_file on
	x86_64-*-linux*, aarch64-*-linux*, x86_64-*-darwin*.
---
 libgcc/config.host                      |   3 +
 libgcc/config/aarch64/heap-trampoline.c | 172 ++++++++++++++++++++++++
 libgcc/config/aarch64/t-heap-trampoline |  19 +++
 libgcc/config/i386/heap-trampoline.c    | 172 ++++++++++++++++++++++++
 libgcc/config/i386/t-heap-trampoline    |  19 +++
 libgcc/libgcc-std.ver.in                |   3 +
 libgcc/libgcc2.h                        |   3 +
 7 files changed, 391 insertions(+)
 create mode 100644 libgcc/config/aarch64/heap-trampoline.c
 create mode 100644 libgcc/config/aarch64/t-heap-trampoline
 create mode 100644 libgcc/config/i386/heap-trampoline.c
 create mode 100644 libgcc/config/i386/t-heap-trampoline

diff --git a/libgcc/config.host b/libgcc/config.host
index c94d69d84b7..d96b02ce87f 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -423,6 +423,7 @@ aarch64*-*-linux*)
 	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
 	tmake_file="${tmake_file} t-dfprules"
+	tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
 	;;
 aarch64*-*-vxworks7*)
 	extra_parts="$extra_parts crtfastmath.o"
@@ -697,6 +698,7 @@ x86_64-*-darwin*)
 	tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
 	tm_file="$tm_file i386/darwin-lib.h"
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
+	tmake_file="${tmake_file} i386/t-heap-trampoline"
 	;;
 i[34567]86-*-elfiamcu)
 	tmake_file="$tmake_file i386/t-crtstuff t-softfp-sfdftf i386/32/t-softfp i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules"
@@ -763,6 +765,7 @@ x86_64-*-linux*)
 	tmake_file="${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules"
 	tm_file="${tm_file} i386/elf-lib.h"
 	md_unwind_header=i386/linux-unwind.h
+	tmake_file="${tmake_file} i386/t-heap-trampoline"
 	;;
 x86_64-*-kfreebsd*-gnu)
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
new file mode 100644
index 00000000000..c8b83681ed7
--- /dev/null
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+#if defined(__gnu_linux__)
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d2, /* ldr     x18, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#elif __APPLE__
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d0, /* ldr     x16, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#else
+#error "Unsupported AArch64 platform for heap trampolines"
+#endif
+
+struct aarch64_trampoline {
+  uint32_t insns[6];
+  void *func_ptr;
+  void *chain_ptr;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  struct aarch64_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(struct aarch64_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  struct aarch64_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, aarch64_trampoline_insns,
+	  sizeof(aarch64_trampoline_insns));
+  trampoline->func_ptr = func;
+  trampoline->chain_ptr = chain;
+
+#if __APPLE__
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/aarch64/t-heap-trampoline b/libgcc/config/aarch64/t-heap-trampoline
new file mode 100644
index 00000000000..b22480800b2
--- /dev/null
+++ b/libgcc/config/aarch64/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/aarch64/heap-trampoline.c
diff --git a/libgcc/config/i386/heap-trampoline.c b/libgcc/config/i386/heap-trampoline.c
new file mode 100644
index 00000000000..96e13bf828e
--- /dev/null
+++ b/libgcc/config/i386/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+static const uint8_t trampoline_insns[] = {
+  /* movabs $<func>,%r11  */
+  0x49, 0xbb,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* movabs $<chain>,%r10  */
+  0x49, 0xba,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* rex.B jmpq *%r11  */
+  0x41, 0xff, 0xe3
+};
+
+union ix86_trampoline {
+  uint8_t insns[sizeof(trampoline_insns)];
+
+  struct __attribute__((packed)) fields {
+    uint8_t insn_0[2];
+    void *func_ptr;
+    uint8_t insn_1[2];
+    void *chain_ptr;
+    uint8_t insn_2[3];
+  } fields;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  union ix86_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(union ix86_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+# if  __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+# else
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+# endif
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  union ix86_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, trampoline_insns,
+	  sizeof(trampoline_insns));
+  trampoline->fields.func_ptr = func;
+  trampoline->fields.chain_ptr = chain;
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/i386/t-heap-trampoline b/libgcc/config/i386/t-heap-trampoline
new file mode 100644
index 00000000000..613f635b1f6
--- /dev/null
+++ b/libgcc/config/i386/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/i386/heap-trampoline.c
diff --git a/libgcc/libgcc-std.ver.in b/libgcc/libgcc-std.ver.in
index c4f87a50e70..a48f4899eb6 100644
--- a/libgcc/libgcc-std.ver.in
+++ b/libgcc/libgcc-std.ver.in
@@ -1943,4 +1943,7 @@ GCC_4.8.0 {
 GCC_7.0.0 {
   __PFX__divmoddi4
   __PFX__divmodti4
+
+  __builtin_nested_func_ptr_created
+  __builtin_nested_func_ptr_deleted
 }
diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
index 3ec9bbd8164..ac7eaab4f01 100644
--- a/libgcc/libgcc2.h
+++ b/libgcc/libgcc2.h
@@ -29,6 +29,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #pragma GCC visibility push(default)
 #endif
 
+extern void __builtin_nested_func_ptr_created (void *, void *, void **);
+extern void __builtin_nested_func_ptr_deleted (void);
+
 extern int __gcc_bcmp (const unsigned char *, const unsigned char *, size_t);
 extern void __clear_cache (void *, void *);
 extern void __eprintf (const char *, const char *, unsigned int, const char *)
-- 
2.39.2 (Apple Git-143)



* Re: [PATCH] core: Support heap-based trampolines
  2023-08-05 14:20   ` FX Coudert
  2023-08-20  9:43     ` FX Coudert
@ 2023-09-06 15:44     ` FX Coudert
  2023-09-14 10:18       ` Richard Biener
  1 sibling, 1 reply; 15+ messages in thread
From: FX Coudert @ 2023-09-06 15:44 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, GCC Patches
  Cc: Iain Sandoe, maxim.blinov, ebotcazou

[-- Attachment #1: Type: text/plain, Size: 861 bytes --]

Hi,

ping**2 on the revised patch, for Richard or another global reviewer. So far all review feedback is that it’s a step forward, and it’s been widely used for both aarch64-darwin and x86_64-darwin distributions for almost three years now.

OK to commit?
FX



> On 5 Aug 2023, at 16:20, FX Coudert <fxcoudert@gmail.com> wrote:
> 
> Hi Richard,
> 
> Thanks for your feedback. Here is an amended version of the patch, taking into consideration your requests and the following discussion. There is no configure option for the libgcc part, and the documentation is amended. The patch is split into three commits for core, target and libgcc.
> 
> Currently regtesting on x86_64 linux and darwin (it was fine before I split up into three commits, so I’m re-testing to make sure I didn’t screw anything up).
> 
> OK to commit?
> FX


[-- Attachment #2: 0001-core-Support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 14264 bytes --]

From bfb1e356e7e6848736218608eca953569361cf18 Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:54:11 +0200
Subject: [PATCH 1/3] core: Support heap-based trampolines

Generate heap-based nested function trampolines

Add support for allocating nested function trampolines on an
executable heap rather than on the stack. This is motivated by targets
such as AArch64 Darwin, which globally prohibit executing code on the
stack.

The target-specific routines for allocating and writing trampolines are
to be provided in libgcc.

The gcc flag -ftrampoline-impl controls whether to generate code
that instantiates trampolines on the stack, or to emit calls to
__builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted. Note that this flag is completely
independent of libgcc: If libgcc is for any reason missing those
symbols, you will get a link failure.
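
As a rough illustration of where those two calls sit, the portable C
sketch below lowers a nested function by hand. It is not the code GCC
generates: model_created and model_deleted are hypothetical stand-ins
for __builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted (whose real dst parameter is a
void **), struct model_tramp stands in for the executable trampoline,
and the static chain is passed as an explicit argument.

```c
#include <stddef.h>

/* Hypothetical stand-in for a heap trampoline.  The real one is machine
   code that loads CHAIN into the static chain register and jumps to FUNC.  */
struct model_tramp
{
  int (*func) (void *chain, int arg);
  void *chain;
};

static struct model_tramp pool[4];
static int live;

/* Model of the "created" call: fill the next slot and store its address
   through DST, which points into the caller's frame.  */
static void
model_created (void *chain, int (*func) (void *, int),
               struct model_tramp **dst)
{
  pool[live].func = func;
  pool[live].chain = chain;
  *dst = &pool[live++];
}

/* Model of the "deleted" call: release the most recently used slot.  */
static void
model_deleted (void)
{
  live--;
}

/* Frame of the enclosing function: one local variable plus a single
   pointer-sized slot that receives the trampoline address.  */
struct frame
{
  int total;
  struct model_tramp *add_tramp;
};

/* The nested function, with its static chain made explicit.  */
static int
add (void *chain, int i)
{
  struct frame *f = chain;
  f->total += i;
  return f->total;
}

/* Returns 0 + 1 + ... + (n - 1), calling ADD through the slot.  */
int
sum_below (int n)
{
  struct frame f = { 0, NULL };

  model_created (&f, add, &f.add_tramp);        /* on function entry */
  for (int i = 0; i < n; i++)
    f.add_tramp->func (f.add_tramp->chain, i);  /* indirect call */
  model_deleted ();                             /* cleanup on exit */
  return f.total;
}

/* Number of slots currently in use.  */
int
model_live (void)
{
  return live;
}
```

In this model sum_below (5) returns 10 and model_live () is back to 0
afterwards, which mirrors the created/deleted pairing on a normal
return.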

This implementation imposes some implicit restrictions as compared to
stack trampolines. longjmp'ing back to a state before a trampoline was
created will cause us to skip over the corresponding
__builtin_nested_func_ptr_deleted, which will leak trampolines
starting from the beginning of the linked list of allocated
trampolines. There may be scope for instrumenting longjmp/setjmp to
trigger cleanups of trampolines.
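
The longjmp restriction can be reproduced with a small model. The
sketch below only counts slots; model_created and model_deleted are
hypothetical stand-ins for the two libgcc entry points, and no
executable memory is involved. It assumes nothing beyond standard
setjmp/longjmp.

```c
#include <setjmp.h>

static int live_trampolines;    /* model: slots currently in use */
static jmp_buf env;

static void model_created (void) { live_trampolines++; }
static void model_deleted (void) { live_trampolines--; }

/* Models a function that contains a nested function: a slot is taken on
   entry and released on exit.  If ESCAPE is nonzero, the function is left
   through longjmp instead of returning.  */
static void
outer (int escape)
{
  model_created ();
  if (escape)
    longjmp (env, 1);           /* skips the model_deleted call below */
  model_deleted ();
}

/* Returns the number of slots still marked in use after running OUTER.  */
int
leaked_after (int escape)
{
  live_trampolines = 0;
  if (setjmp (env) == 0)
    outer (escape);
  return live_trampolines;
}
```

Here leaked_after (0) returns 0, while leaked_after (1) returns 1: the
slot taken before the longjmp is never handed back.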

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* builtins.def (BUILT_IN_NESTED_PTR_CREATED): Define.
	(BUILT_IN_NESTED_PTR_DELETED): Ditto.
	* common.opt (ftrampoline-impl): Add option to control
	generation of trampoline instantiation (heap or stack).
	* coretypes.h: Define enum trampoline_impl.
	* tree-nested.cc (convert_tramp_reference_op): Don't bother calling
	__builtin_adjust_trampoline for heap trampolines.
	(finalize_nesting_tree_1): Emit calls to
	__builtin_nested_...{created,deleted} if we're generating with
	-ftrampoline-impl=heap.
	* tree.cc (build_common_builtin_nodes): Build
	__builtin_nested_...{created,deleted}.
	* doc/invoke.texi (-ftrampoline-impl): Document.
---
 gcc/builtins.def    |   2 +
 gcc/common.opt      |  17 ++++++-
 gcc/coretypes.h     |   6 +++
 gcc/doc/invoke.texi |  17 ++++++-
 gcc/tree-nested.cc  | 121 +++++++++++++++++++++++++++++++++++++-------
 gcc/tree.cc         |  17 +++++++
 6 files changed, 161 insertions(+), 19 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 5953266acba..7a7987100d1 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1074,6 +1074,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, "__builtin_nested_func_ptr_created")
+DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, "__builtin_nested_func_ptr_deleted")
 
 /* Implementing __builtin_setjmp.  */
 DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
diff --git a/gcc/common.opt b/gcc/common.opt
index 0888c15b88f..949307a4414 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2884,10 +2884,25 @@ Common Var(flag_tracer) Optimization
 Perform superblock formation via tail duplication.
 
 ftrampolines
-Common Var(flag_trampolines) Init(0)
+Common Var(flag_trampolines) Init(HEAP_TRAMPOLINES_INIT)
 For targets that normally need trampolines for nested functions, always
 generate them instead of using descriptors.
 
+ftrampoline-impl=
+Common Joined RejectNegative Enum(trampoline_impl) Var(flag_trampoline_impl) Init(HEAP_TRAMPOLINES_INIT ? TRAMPOLINE_IMPL_HEAP : TRAMPOLINE_IMPL_STACK)
+Select whether trampolines are generated in executable heap memory rather
+than on the executable stack.
+
+Enum
+Name(trampoline_impl) Type(enum trampoline_impl) UnknownError(unknown trampoline implementation %qs)
+
+EnumValue
+Enum(trampoline_impl) String(stack) Value(TRAMPOLINE_IMPL_STACK)
+
+EnumValue
+Enum(trampoline_impl) String(heap) Value(TRAMPOLINE_IMPL_HEAP)
+
+
 ; Zero means that floating-point math operations cannot generate a
 ; (user-visible) trap.  This is the case, for example, in nonstop
 ; IEEE 754 arithmetic.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..7e022a427c4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -199,6 +199,12 @@ enum tls_model {
   TLS_MODEL_LOCAL_EXEC
 };
 
+/* Types of trampoline implementation.  */
+enum trampoline_impl {
+  TRAMPOLINE_IMPL_STACK,
+  TRAMPOLINE_IMPL_HEAP
+};
+
 /* Types of ABI for an offload compiler.  */
 enum offload_abi {
   OFFLOAD_ABI_UNSET,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 674f956f4b8..13e13728621 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -711,7 +711,8 @@ Objective-C and Objective-C++ Dialects}.
 -fverbose-asm  -fpack-struct[=@var{n}]
 -fleading-underscore  -ftls-model=@var{model}
 -fstack-reuse=@var{reuse_level}
--ftrampolines  -ftrapv  -fwrapv
+-ftrampolines -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+-ftrapv  -fwrapv
 -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 -fstrict-volatile-bitfields  -fsync-libcalls}
 
@@ -18834,6 +18835,20 @@ For languages other than Ada, the @code{-ftrampolines} and
 trampolines are always generated on platforms that need them
 for nested functions.
 
+@opindex ftrampoline-impl
+@item -ftrampoline-impl=@r{[}stack@r{|}heap@r{]}
+By default, trampolines are generated on the stack.  However, certain platforms
+(such as the Apple M1) do not permit an executable stack.  Compiling with
+@option{-ftrampoline-impl=heap} generates calls to
+@code{__builtin_nested_func_ptr_created} and
+@code{__builtin_nested_func_ptr_deleted} in order to allocate and
+deallocate trampoline space on the executable heap.  These functions are
+implemented in libgcc, and will only be provided on specific targets:
+x86_64 Darwin, x86_64 and aarch64 Linux.  @emph{PLEASE NOTE}: Heap
+trampolines are @emph{not} guaranteed to be correctly deallocated if you
+@code{setjmp}, instantiate nested functions, and then @code{longjmp} back
+to a state prior to having allocated those nested functions.
+
 @opindex fvisibility
 @item -fvisibility=@r{[}default@r{|}internal@r{|}hidden@r{|}protected@r{]}
 Set the default ELF image symbol visibility to the specified option---all
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index ae7d1f1f6a8..84ee9962485 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -611,6 +611,14 @@ get_trampoline_type (struct nesting_info *info)
   if (trampoline_type)
     return trampoline_type;
 
+  /* When trampolines are created off-stack then the only thing we need in the
+     local frame is a single pointer.  */
+  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+    {
+      trampoline_type = build_pointer_type (void_type_node);
+      return trampoline_type;
+    }
+
   align = TRAMPOLINE_ALIGNMENT;
   size = TRAMPOLINE_SIZE;
 
@@ -2788,17 +2796,27 @@ convert_tramp_reference_op (tree *tp, int *walk_subtrees, void *data)
 
       /* Compute the address of the field holding the trampoline.  */
       x = get_frame_field (info, target_context, x, &wi->gsi);
-      x = build_addr (x);
-      x = gsi_gimplify_val (info, x, &wi->gsi);
 
-      /* Do machine-specific ugliness.  Normally this will involve
-	 computing extra alignment, but it can really be anything.  */
-      if (descr)
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+      /* APB: We don't need to do the adjustment calls when using off-stack
+	 trampolines, any such adjustment will be done when the off-stack
+	 trampoline is created.  */
+      if (!descr && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	x = gsi_gimplify_val (info, x, &wi->gsi);
       else
-	builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
-      call = gimple_build_call (builtin, 1, x);
-      x = init_tmp_var_with_call (info, &wi->gsi, call);
+	{
+	  x = build_addr (x);
+
+	  x = gsi_gimplify_val (info, x, &wi->gsi);
+
+	  /* Do machine-specific ugliness.  Normally this will involve
+	     computing extra alignment, but it can really be anything.  */
+	  if (descr)
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_DESCRIPTOR);
+	  else
+	    builtin = builtin_decl_implicit (BUILT_IN_ADJUST_TRAMPOLINE);
+	  call = gimple_build_call (builtin, 1, x);
+	  x = init_tmp_var_with_call (info, &wi->gsi, call);
+	}
 
       /* Cast back to the proper function type.  */
       x = build1 (NOP_EXPR, TREE_TYPE (t), x);
@@ -3377,6 +3395,7 @@ build_init_call_stmt (struct nesting_info *info, tree decl, tree field,
 static void
 finalize_nesting_tree_1 (struct nesting_info *root)
 {
+  gimple_seq cleanup_list = NULL;
   gimple_seq stmt_list = NULL;
   gimple *stmt;
   tree context = root->context;
@@ -3508,9 +3527,48 @@ finalize_nesting_tree_1 (struct nesting_info *root)
 	  if (!field)
 	    continue;
 
-	  x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
-	  stmt = build_init_call_stmt (root, i->context, field, x);
-	  gimple_seq_add_stmt (&stmt_list, stmt);
+	  if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	    {
+	      /* We pass a whole bunch of arguments to the builtin function that
+		 creates the off-stack trampoline, these are
+		 1. The nested function chain value (that must be passed to the
+		 nested function so it can find the function arguments).
+		 2. A pointer to the nested function implementation,
+		 3. The address in the local stack frame where we should write
+		 the address of the trampoline.
+
+		 The stack pointer itself is not passed.  Arguments #1 and #3
+		 are both addresses within the current stack frame, so the
+		 builtin could already derive anything frame-relative from
+		 them, and a separate stack pointer argument would not add
+		 much value.  */
+	      tree arg1, arg2, arg3;
+
+	      gcc_assert (DECL_STATIC_CHAIN (i->context));
+	      arg1 = build_addr (root->frame_decl);
+	      arg2 = build_addr (i->context);
+
+	      x = build3 (COMPONENT_REF, TREE_TYPE (field),
+			  root->frame_decl, field, NULL_TREE);
+	      arg3 = build_addr (x);
+
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_CREATED);
+	      stmt = gimple_build_call (x, 3, arg1, arg2, arg3);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+
+	      /* This call to delete the nested function trampoline is added to
+		 the cleanup list, and called when we exit the current scope.  */
+	      x = builtin_decl_implicit (BUILT_IN_NESTED_PTR_DELETED);
+	      stmt = gimple_build_call (x, 0);
+	      gimple_seq_add_stmt (&cleanup_list, stmt);
+	    }
+	  else
+	    {
+	      /* Original code to initialise the on stack trampoline.  */
+	      x = builtin_decl_implicit (BUILT_IN_INIT_TRAMPOLINE);
+	      stmt = build_init_call_stmt (root, i->context, field, x);
+	      gimple_seq_add_stmt (&stmt_list, stmt);
+	    }
 	}
     }
 
@@ -3535,11 +3593,40 @@ finalize_nesting_tree_1 (struct nesting_info *root)
   /* If we created initialization statements, insert them.  */
   if (stmt_list)
     {
-      gbind *bind;
-      annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
-      bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
-      gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
-      gimple_bind_set_body (bind, stmt_list);
+      if (flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP)
+	{
+	  /* Handle off-stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  annotate_all_with_location (cleanup_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+
+	  gimple_seq xxx_list = NULL;
+
+	  if (cleanup_list != NULL)
+	    {
+	      /* Maybe we shouldn't be creating this try/finally if -fno-exceptions is
+		 in use.  If this is the case, then maybe we should, instead, be
+		 inserting the cleanup code onto every path out of this function?  Not
+		 yet figured out how we would do this.  */
+	      gtry *t = gimple_build_try (stmt_list, cleanup_list, GIMPLE_TRY_FINALLY);
+	      gimple_seq_add_stmt (&xxx_list, t);
+	    }
+	  else
+	    xxx_list = stmt_list;
+
+	  gimple_bind_set_body (bind, xxx_list);
+	}
+      else
+	{
+	  /* The traditional, on stack trampolines.  */
+	  gbind *bind;
+	  annotate_all_with_location (stmt_list, DECL_SOURCE_LOCATION (context));
+	  bind = gimple_seq_first_stmt_as_a_bind (gimple_body (context));
+	  gimple_seq_add_seq (&stmt_list, gimple_bind_body (bind));
+	  gimple_bind_set_body (bind, stmt_list);
+	}
     }
 
   /* If a chain_decl was created, then it needs to be registered with
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 420857b110c..3e7beba8744 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9870,6 +9870,23 @@ build_common_builtin_nodes (void)
 			"__builtin_nonlocal_goto",
 			ECF_NORETURN | ECF_NOTHROW);
 
+  tree ptr_ptr_type_node = build_pointer_type (ptr_type_node);
+
+  ftype = build_function_type_list (void_type_node,
+				    ptr_type_node, // void *chain
+				    ptr_type_node, // void *func
+				    ptr_ptr_type_node, // void **dst
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_created", ftype,
+			BUILT_IN_NESTED_PTR_CREATED,
+			"__builtin_nested_func_ptr_created", ECF_NOTHROW);
+
+  ftype = build_function_type_list (void_type_node,
+				    NULL_TREE);
+  local_define_builtin ("__builtin_nested_func_ptr_deleted", ftype,
+			BUILT_IN_NESTED_PTR_DELETED,
+			"__builtin_nested_func_ptr_deleted", ECF_NOTHROW);
+
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node, NULL_TREE);
   local_define_builtin ("__builtin_setjmp_setup", ftype,
-- 
2.39.2 (Apple Git-143)


[-- Attachment #3: 0002-target-Support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 3571 bytes --]

From a7c7415110feb085620497852776fdad7edf9116 Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:56:31 +0200
Subject: [PATCH 2/3] target: Support heap-based trampolines

Enable -ftrampoline-impl=heap by default if we are on macOS 11
or later.

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config.gcc: Default to heap trampolines on macOS 11 and above.
	* config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST.
	* config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.
---
 gcc/config.gcc           | 11 +++++++++++
 gcc/config/i386/darwin.h |  6 ++++++
 gcc/config/i386/i386.cc  |  2 +-
 gcc/config/i386/i386.h   |  6 ++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 415e0e1ebc5..5d70b9b4daf 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1127,6 +1127,17 @@ case ${target} in
   ;;
 esac
 
+# Figure out if we need to enable heap trampolines by default
+case ${target} in
+*-*-darwin2*)
+  # Currently, we do this for macOS 11 and above.
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=1"
+  ;;
+*)
+  tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=0"
+  ;;
+esac
+
 case ${target} in
 aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
 	tm_file="${tm_file} elfos.h newlib-stdint.h"
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index 588bd669bdd..036eefbbb95 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -308,3 +308,9 @@ along with GCC; see the file COPYING3.  If not see
 #define CLEAR_INSN_CACHE(beg, end)				\
   extern void sys_icache_invalidate(void *start, size_t len);	\
   sys_icache_invalidate ((beg), (size_t)((end)-(beg)))
+
+/* Disable custom function descriptors for Darwin when we have off-stack
+   trampolines.  */
+#undef X86_CUSTOM_FUNCTION_TEST
+#define X86_CUSTOM_FUNCTION_TEST \
+  ((flag_trampolines && flag_trampoline_impl == TRAMPOLINE_IMPL_HEAP) ? 0 : 1)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8cd26eb54fa..d7fe8f75c4f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25705,7 +25705,7 @@ ix86_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_SCRATCH_OK ix86_hard_regno_scratch_ok
 
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS X86_CUSTOM_FUNCTION_TEST
 
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..e1495e98c42 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -755,6 +755,12 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 /* Minimum allocation boundary for the code of a function.  */
 #define FUNCTION_BOUNDARY 8
 
+/* We AND with this value to test whether a custom function descriptor needs
+   a static chain.  The function boundary must be adjusted so that the bit
+   this represents is no longer part of the address.  0 disables the custom
+   function descriptors.  */
+#define X86_CUSTOM_FUNCTION_TEST 1
+
 /* C++ stores the virtual bit in the lowest bit of function pointers.  */
 #define TARGET_PTRMEMFUNC_VBIT_LOCATION ptrmemfunc_vbit_in_pfn
 
-- 
2.39.2 (Apple Git-143)


[-- Attachment #4: 0003-libgcc-support-heap-based-trampolines.patch --]
[-- Type: application/octet-stream, Size: 15597 bytes --]

From e875cd959ea6d674530280ead2a2323bd6c2ad3a Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Date: Sat, 5 Aug 2023 14:31:06 +0200
Subject: [PATCH 3/3] libgcc: support heap-based trampolines

Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted functions for these targets.

Co-Authored-By: Andrew Burgess <andrew.burgess@embecosm.com>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>

libgcc/ChangeLog:

	* libgcc2.h (__builtin_nested_func_ptr_created): Declare.
	(__builtin_nested_func_ptr_deleted): Declare.
	* libgcc-std.ver.in: Add the new symbols.
	* config/aarch64/heap-trampoline.c: Implement heap-based
	trampolines for aarch64.
	* config/aarch64/t-heap-trampoline: Add rule to build
	config/aarch64/heap-trampoline.c.
	* config/i386/heap-trampoline.c: Implement heap-based
	trampolines for x86_64.
	* config/i386/t-heap-trampoline: Add rule to build
	config/i386/heap-trampoline.c.
	* config.host: Add t-heap-trampoline to tmake_file on
	x86_64-*-linux*, aarch64-*-linux*, x86_64-*-darwin*.
---
 libgcc/config.host                      |   3 +
 libgcc/config/aarch64/heap-trampoline.c | 172 ++++++++++++++++++++++++
 libgcc/config/aarch64/t-heap-trampoline |  19 +++
 libgcc/config/i386/heap-trampoline.c    | 172 ++++++++++++++++++++++++
 libgcc/config/i386/t-heap-trampoline    |  19 +++
 libgcc/libgcc-std.ver.in                |   3 +
 libgcc/libgcc2.h                        |   3 +
 7 files changed, 391 insertions(+)
 create mode 100644 libgcc/config/aarch64/heap-trampoline.c
 create mode 100644 libgcc/config/aarch64/t-heap-trampoline
 create mode 100644 libgcc/config/i386/heap-trampoline.c
 create mode 100644 libgcc/config/i386/t-heap-trampoline

diff --git a/libgcc/config.host b/libgcc/config.host
index c94d69d84b7..d96b02ce87f 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -423,6 +423,7 @@ aarch64*-*-linux*)
 	tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
 	tmake_file="${tmake_file} t-dfprules"
+	tmake_file="${tmake_file} ${cpu_type}/t-heap-trampoline"
 	;;
 aarch64*-*-vxworks7*)
 	extra_parts="$extra_parts crtfastmath.o"
@@ -697,6 +698,7 @@ x86_64-*-darwin*)
 	tmake_file="$tmake_file i386/t-crtpc t-crtfm i386/t-msabi"
 	tm_file="$tm_file i386/darwin-lib.h"
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
+	tmake_file="${tmake_file} i386/t-heap-trampoline"
 	;;
 i[34567]86-*-elfiamcu)
 	tmake_file="$tmake_file i386/t-crtstuff t-softfp-sfdftf i386/32/t-softfp i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules"
@@ -763,6 +765,7 @@ x86_64-*-linux*)
 	tmake_file="${tmake_file} i386/t-crtpc t-crtfm i386/t-crtstuff t-dfprules"
 	tm_file="${tm_file} i386/elf-lib.h"
 	md_unwind_header=i386/linux-unwind.h
+	tmake_file="${tmake_file} i386/t-heap-trampoline"
 	;;
 x86_64-*-kfreebsd*-gnu)
 	extra_parts="$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o"
diff --git a/libgcc/config/aarch64/heap-trampoline.c b/libgcc/config/aarch64/heap-trampoline.c
new file mode 100644
index 00000000000..c8b83681ed7
--- /dev/null
+++ b/libgcc/config/aarch64/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+#if defined(__gnu_linux__)
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d2, /* ldr     x18, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#elif __APPLE__
+static const uint32_t aarch64_trampoline_insns[] = {
+  0xd503245f, /* hint    34 */
+  0x580000b1, /* ldr     x17, .+20 */
+  0x580000d0, /* ldr     x16, .+24 */
+  0xd61f0220, /* br      x17 */
+  0xd5033f9f, /* dsb     sy */
+  0xd5033fdf /* isb */
+};
+
+#else
+#error "Unsupported AArch64 platform for heap trampolines"
+#endif
+
+struct aarch64_trampoline {
+  uint32_t insns[6];
+  void *func_ptr;
+  void *chain_ptr;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  struct aarch64_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(struct aarch64_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  struct aarch64_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, aarch64_trampoline_insns,
+	  sizeof(aarch64_trampoline_insns));
+  trampoline->func_ptr = func;
+  trampoline->chain_ptr = chain;
+
+#if __APPLE__
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/aarch64/t-heap-trampoline b/libgcc/config/aarch64/t-heap-trampoline
new file mode 100644
index 00000000000..b22480800b2
--- /dev/null
+++ b/libgcc/config/aarch64/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/aarch64/heap-trampoline.c
diff --git a/libgcc/config/i386/heap-trampoline.c b/libgcc/config/i386/heap-trampoline.c
new file mode 100644
index 00000000000..96e13bf828e
--- /dev/null
+++ b/libgcc/config/i386/heap-trampoline.c
@@ -0,0 +1,172 @@
+/* Copyright The GNU Toolchain Authors. */
+
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+/* For pthread_jit_write_protect_np */
+#include <pthread.h>
+#endif
+
+void *allocate_trampoline_page (void);
+int get_trampolines_per_page (void);
+struct tramp_ctrl_data *allocate_tramp_ctrl (struct tramp_ctrl_data *parent);
+void *allocate_trampoline_page (void);
+
+void __builtin_nested_func_ptr_created (void *chain, void *func, void **dst);
+void __builtin_nested_func_ptr_deleted (void);
+
+static const uint8_t trampoline_insns[] = {
+  /* movabs $<func>,%r11  */
+  0x49, 0xbb,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* movabs $<chain>,%r10  */
+  0x49, 0xba,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+  /* jmpq *%r11  */
+  0x41, 0xff, 0xe3
+};
+
+union ix86_trampoline {
+  uint8_t insns[sizeof(trampoline_insns)];
+
+  struct __attribute__((packed)) fields {
+    uint8_t insn_0[2];
+    void *func_ptr;
+    uint8_t insn_1[2];
+    void *chain_ptr;
+    uint8_t insn_2[3];
+  } fields;
+};
+
+struct tramp_ctrl_data
+{
+  struct tramp_ctrl_data *prev;
+
+  int free_trampolines;
+
+  /* This will be pointing to an executable mmap'ed page.  */
+  union ix86_trampoline *trampolines;
+};
+
+int
+get_trampolines_per_page (void)
+{
+  return getpagesize() / sizeof(union ix86_trampoline);
+}
+
+static _Thread_local struct tramp_ctrl_data *tramp_ctrl_curr = NULL;
+
+void *
+allocate_trampoline_page (void)
+{
+  void *page;
+
+#if defined(__gnu_linux__)
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+#elif __APPLE__
+# if  __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE | MAP_JIT, 0, 0);
+# else
+  page = mmap (0, getpagesize (), PROT_WRITE | PROT_EXEC,
+	       MAP_ANON | MAP_PRIVATE, 0, 0);
+# endif
+#else
+  page = MAP_FAILED;
+#endif
+
+  return page;
+}
+
+struct tramp_ctrl_data *
+allocate_tramp_ctrl (struct tramp_ctrl_data *parent)
+{
+  struct tramp_ctrl_data *p = malloc (sizeof (struct tramp_ctrl_data));
+  if (p == NULL)
+    return NULL;
+
+  p->trampolines = allocate_trampoline_page ();
+
+  if (p->trampolines == MAP_FAILED)
+    return NULL;
+
+  p->prev = parent;
+  p->free_trampolines = get_trampolines_per_page();
+
+  return p;
+}
+
+void
+__builtin_nested_func_ptr_created (void *chain, void *func, void **dst)
+{
+  if (tramp_ctrl_curr == NULL)
+    {
+      tramp_ctrl_curr = allocate_tramp_ctrl (NULL);
+      if (tramp_ctrl_curr == NULL)
+	abort ();
+    }
+
+  if (tramp_ctrl_curr->free_trampolines == 0)
+    {
+      void *tramp_ctrl = allocate_tramp_ctrl (tramp_ctrl_curr);
+      if (!tramp_ctrl)
+	abort ();
+
+      tramp_ctrl_curr = tramp_ctrl;
+    }
+
+  union ix86_trampoline *trampoline
+    = &tramp_ctrl_curr->trampolines[get_trampolines_per_page ()
+				    - tramp_ctrl_curr->free_trampolines];
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Disable write protection for the MAP_JIT regions in this thread (see
+     https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon) */
+  pthread_jit_write_protect_np (0);
+#endif
+
+  memcpy (trampoline->insns, trampoline_insns,
+	  sizeof(trampoline_insns));
+  trampoline->fields.func_ptr = func;
+  trampoline->fields.chain_ptr = chain;
+
+#if __APPLE__ && __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101400
+  /* Re-enable write protection.  */
+  pthread_jit_write_protect_np (1);
+#endif
+
+  tramp_ctrl_curr->free_trampolines -= 1;
+
+  __builtin___clear_cache ((void *)trampoline->insns,
+			   ((void *)trampoline->insns + sizeof(trampoline->insns)));
+
+  *dst = &trampoline->insns;
+}
+
+void
+__builtin_nested_func_ptr_deleted (void)
+{
+  if (tramp_ctrl_curr == NULL)
+    abort ();
+
+  tramp_ctrl_curr->free_trampolines += 1;
+
+  if (tramp_ctrl_curr->free_trampolines == get_trampolines_per_page ())
+    {
+      if (tramp_ctrl_curr->prev == NULL)
+	return;
+
+      munmap (tramp_ctrl_curr->trampolines, getpagesize());
+      struct tramp_ctrl_data *prev = tramp_ctrl_curr->prev;
+      free (tramp_ctrl_curr);
+      tramp_ctrl_curr = prev;
+    }
+}
diff --git a/libgcc/config/i386/t-heap-trampoline b/libgcc/config/i386/t-heap-trampoline
new file mode 100644
index 00000000000..613f635b1f6
--- /dev/null
+++ b/libgcc/config/i386/t-heap-trampoline
@@ -0,0 +1,19 @@
+# Copyright The GNU Toolchain Authors.
+
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+LIB2ADD += $(srcdir)/config/i386/heap-trampoline.c
diff --git a/libgcc/libgcc-std.ver.in b/libgcc/libgcc-std.ver.in
index c4f87a50e70..a48f4899eb6 100644
--- a/libgcc/libgcc-std.ver.in
+++ b/libgcc/libgcc-std.ver.in
@@ -1943,4 +1943,7 @@ GCC_4.8.0 {
 GCC_7.0.0 {
   __PFX__divmoddi4
   __PFX__divmodti4
+
+  __builtin_nested_func_ptr_created
+  __builtin_nested_func_ptr_deleted
 }
diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
index 3ec9bbd8164..ac7eaab4f01 100644
--- a/libgcc/libgcc2.h
+++ b/libgcc/libgcc2.h
@@ -29,6 +29,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #pragma GCC visibility push(default)
 #endif
 
+extern void __builtin_nested_func_ptr_created (void *, void *, void **);
+extern void __builtin_nested_func_ptr_deleted (void);
+
 extern int __gcc_bcmp (const unsigned char *, const unsigned char *, size_t);
 extern void __clear_cache (void *, void *);
 extern void __eprintf (const char *, const char *, unsigned int, const char *)
-- 
2.39.2 (Apple Git-143)
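
To make the bookkeeping in the two heap-trampoline.c files easier to follow, here is a simplified, portable sketch of the same LIFO scheme. It is a model under stated assumptions, not the patch's code: malloc stands in for the executable mmap'ed page, the names SLOTS_PER_PAGE, slot_created, slot_deleted and pages_in_use are hypothetical, and the state is a plain static rather than _Thread_local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the number of trampolines that fit in a page.  */
#define SLOTS_PER_PAGE 4

struct page_ctrl
{
  struct page_ctrl *prev;      /* Previously filled page, or NULL.  */
  int free_slots;              /* Unused slots left in this page.  */
  int slots[SLOTS_PER_PAGE];   /* Models the trampoline storage.  */
};

static struct page_ctrl *curr = NULL;
static int live_pages = 0;

/* Hand out the next slot, pushing a new page when the current one is full.  */
int *
slot_created (void)
{
  if (curr == NULL || curr->free_slots == 0)
    {
      struct page_ctrl *p = malloc (sizeof *p);
      if (p == NULL)
        abort ();
      p->prev = curr;
      p->free_slots = SLOTS_PER_PAGE;
      curr = p;
      live_pages++;
    }

  int *slot = &curr->slots[SLOTS_PER_PAGE - curr->free_slots];
  curr->free_slots--;
  return slot;
}

/* Release the most recently created slot.  An empty page is popped,
   except for the first one, which is kept for reuse.  */
void
slot_deleted (void)
{
  assert (curr != NULL);
  curr->free_slots++;

  if (curr->free_slots == SLOTS_PER_PAGE && curr->prev != NULL)
    {
      struct page_ctrl *prev = curr->prev;
      free (curr);
      curr = prev;
      live_pages--;
    }
}

/* Number of pages currently allocated.  */
int
pages_in_use (void)
{
  return live_pages;
}
```

With SLOTS_PER_PAGE set to 4, creating five slots pushes a second page and deleting one slot pops it again, while the first page stays allocated even when it is empty. Creation and deletion are assumed to be strictly nested, which is what makes a single counter per page sufficient.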



* Re: [PATCH] core: Support heap-based trampolines
  2023-09-06 15:44     ` FX Coudert
@ 2023-09-14 10:18       ` Richard Biener
  2023-09-16 19:10         ` Iain Sandoe
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2023-09-14 10:18 UTC (permalink / raw)
  To: FX Coudert; +Cc: Jeff Law, GCC Patches, Iain Sandoe, maxim.blinov, ebotcazou

On Wed, Sep 6, 2023 at 5:44 PM FX Coudert <fxcoudert@gmail.com> wrote:
>
> Hi,
>
> ping**2 on the revised patch, for Richard or another global reviewer. So far all review feedback is that it’s a step forward, and it’s been widely used for both aarch64-darwin and x86_64-darwin distributions for almost three years now.
>
> OK to commit?

I just noticed that -ftrampoline-impl isn't marked Optimize, so it's
not streamed with LTO.  How does mixing different -ftrampoline-impl
values for different LTO TUs behave?  How does specifying a different
-ftrampoline-impl at LTO link time than at compile time behave?  Is
the state fully reflected during pre-IPA compilation, so that the flag
is not needed after that?  It appears so, but did you check?

OK if that's a non-issue.

Thanks,
Richard.

> FX
>
>
>
> > Le 5 août 2023 à 16:20, FX Coudert <fxcoudert@gmail.com> a écrit :
> >
> > Hi Richard,
> >
> > Thanks for your feedback. Here is an amended version of the patch, taking into consideration your requests and the following discussion. There is no configure option for the libgcc part, and the documentation is amended. The patch is split into three commits for core, target and libgcc.
> >
> > Currently regtesting on x86_64 linux and darwin (it was fine before I split up into three commits, so I’m re-testing to make sure I didn’t screw anything up).
> >
> > OK to commit?
> > FX
>


* Re: [PATCH] core: Support heap-based trampolines
  2023-09-14 10:18       ` Richard Biener
@ 2023-09-16 19:10         ` Iain Sandoe
  0 siblings, 0 replies; 15+ messages in thread
From: Iain Sandoe @ 2023-09-16 19:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: FX Coudert, Jeff Law, GCC Patches, Maxim Blinov, Eric Botcazou

Hi Richard,

> On 14 Sep 2023, at 11:18, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Sep 6, 2023 at 5:44 PM FX Coudert <fxcoudert@gmail.com> wrote:
>> 

>> ping**2 on the revised patch, for Richard or another global reviewer. So far all review feedback is that it’s a step forward, and it’s been widely used for both aarch64-darwin and x86_64-darwin distributions for almost three years now.
>> 
>> OK to commit?
> 
> I just noticed that ftrampoline-impl isn't Optimize, thus it's not
> streamed with LTO.

I think this is fine: the nested pass runs before LTO streaming and lowers to the relevant built-ins for the chosen implementation.
The built-ins are distinct and can coexist in the linked executable.
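
For illustration, a minimal GNU C sketch of the kind of code this lowering applies to. The function names are hypothetical and not taken from the patch or its testsuite, and nested functions are a GCC extension, so this is not expected to build with other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Calls FN on every element.  FN is an ordinary function pointer, so a
   nested function passed here has to be reachable through a trampoline
   (unless the optimizer removes the indirection).  */
static int
fold (const int *v, size_t n, int (*fn) (int))
{
  int acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += fn (v[i]);
  return acc;
}

int
sum_with_offset (const int *v, size_t n, int offset)
{
  assert (v != NULL || n == 0);

  /* GNU C nested function: it reads OFFSET from the enclosing frame,
     so a call through a pointer needs the static chain.  */
  int add_offset (int x)
  {
    return x + offset;
  }

  return fold (v, n, add_offset);
}
```

As I understand the thread, building this with -ftrampoline-impl=heap on a compiler that carries the patch would make sum_with_offset call __builtin_nested_func_ptr_created and __builtin_nested_func_ptr_deleted, while the default stack implementation gives the same result through an on-stack trampoline. Either way the choice is made when this TU is compiled, which is why the flag does not need to be streamed.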

>  How does mixing different -ftrampoline-impl for different LTO TUs behave?

Assuming that a target can support multiple implementations, then each is applied local to a single TU.  The nested functions are scoped within their parent and thus should not be candidates for merging by LTO.

For a target that cannot support both, one or more of the TUs should be rejected before we even get to LTO.

>  How does mis-specifying -ftrampoline-impl at LTO link time compared to compile-time behave?

The flag should be a no-op at LTO link time (but I do not think we want to reject it; that would probably create other issues).

>  Is the state fully reflected during pre-IPA compilation and the flag not needed after that?  

Yes, that is my understanding; the nested pass runs very early.

> It appears so, but did you check?

I actually checked on x86_64-darwin (which does support both), using two TUs with nested functions and a third containing main():

$ nm -mapv ./nn.ltrans0.ltrans.o

As expected, there are two instances of the nested “bar”:

00000000000001a8 (__TEXT,__cstring) non-external lC0
000000000000001f (__TEXT,__text) non-external _bar.0.lto_priv.0 
00000000000001d0 (__TEXT,__cstring) non-external lC1
00000000000000ec (__TEXT,__text) non-external _bar.0.lto_priv.1
000000000000007c (__TEXT,__text) external _foo_1
0000000000000149 (__TEXT,__text) external _foo_2
0000000000000000 (__TEXT,__text) external _main

>>> these for heap-based:
                 (undefined) external ___builtin_nested_func_ptr_created 
                 (undefined) external ___builtin_nested_func_ptr_deleted

>>> this for stack-based.
                 (undefined) external ___enable_execute_stack

(and the code executes as expected).

> OK if that's a non-issue.

thanks, we'll wait a day or two in case of any follow-on comments,
Iain

P.S. I was investigating some unrelated unwinder issues a couple of weeks ago, and that highlighted a possible way to avoid the leaks from longjmp by hooking into the forced_unwind() machinery (a TODO, though, and not part of this initial patch).
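
The leak mentioned here can be modelled in portable C. The sketch assumes, based on my reading of the thread rather than on anything stated in this message, that the "deleted" call is emitted on the normal exit path of the parent function, so a longjmp out of it skips the release. All names below are hypothetical stand-ins, and the counter replaces the real allocator state.

```c
#include <assert.h>
#include <setjmp.h>

static int live_trampolines = 0;   /* Stand-in for the allocator state.  */
static jmp_buf escape;

static void
fake_created (void)
{
  live_trampolines++;
}

static void
fake_deleted (void)
{
  assert (live_trampolines > 0);
  live_trampolines--;
}

/* Models a function containing a nested function whose address is
   taken: one "created" call on entry, one "deleted" call on normal
   exit.  */
static void
parent (int bail_out)
{
  fake_created ();
  if (bail_out)
    longjmp (escape, 1);   /* Leaves without reaching fake_deleted.  */
  fake_deleted ();
}

/* Returns how many trampolines are still marked live after PARENT
   has been left, either normally or via longjmp.  */
int
leaked_after (int bail_out)
{
  live_trampolines = 0;
  if (setjmp (escape) == 0)
    parent (bail_out);
  return live_trampolines;
}
```

In this model leaked_after (0) reports nothing outstanding, while leaked_after (1) leaves one entry behind. Running the release from an unwinder hook, as suggested above, is one way such an entry could be reclaimed.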





Thread overview: 15+ messages
2023-07-16 10:38 [PATCH] core: Support heap-based trampolines FX Coudert
2023-07-17  6:31 ` Richard Biener
2023-07-17  6:43   ` FX Coudert
2023-07-17  6:58     ` Iain Sandoe
2023-07-17  7:16       ` Iain Sandoe
2023-07-19  9:04         ` Martin Uecker
2023-07-19  9:29           ` Iain Sandoe
2023-07-19 10:43             ` Martin Uecker
2023-07-19 14:23               ` Iain Sandoe
2023-07-19 15:18                 ` Martin Uecker
2023-08-05 14:20   ` FX Coudert
2023-08-20  9:43     ` FX Coudert
2023-09-06 15:44     ` FX Coudert
2023-09-14 10:18       ` Richard Biener
2023-09-16 19:10         ` Iain Sandoe
