I've committed this patch to change the way stacks are initialized on amdgcn. The patch only touches GCN files, or the GCN-only portions of libgomp files, so I'm allowing it despite stage 4, both because I want the ABI change done for GCC 13 and because it enables Tobias's reverse-offload patch, which has already been approved, I think.

The stacks used to be placed in the "private segment" provided for the purpose by the GPU drivers, but those addresses are not accessible from the host, not even via the HSA API, which was a problem for reverse offload. The new scheme allocates stack space the same way we allocate heap space, except that each kernel gets its own instance. We were already doing that for the "team arena" ephemeral heap, so I have unified the two implementations.

While the change does not alter the procedure call standard, it does alter the kernel entry ABI, so any code using the compiler builtins for kernel properties must be rebuilt. A recent version of Newlib is required (version 4.3.0.20230120 has the necessary changes).

Benchmarking shows no significant change in performance.

The __builtin_apply tests fail because they attempt to access memory in parent stack frames (I think), which causes a memory fault when those frames don't exist (stack underflow); if I modify the testcase to include extra call depth it passes fine. In any case, the behaviour of __builtin_apply has not changed; only the device has become less forgiving.

I will back-port this to OG12 shortly.

Andrew