* [RFC] [PR 68191] s390: Add -fsplit-stack support.
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC
To: gcc-patches

Here's my attempt at adding -fsplit-stack support for s390 targets (bug 68191). Patches 1 and 2 fix s390-specific issues affecting split stack code, and can be pushed independently of the main course. Patches 3 and 4 attempt to fix target-independent issues involving unconditional jumps with side effects (see below). I'm not exactly sure I'm doing the right thing in these, and I'd really welcome some feedback about them and the general approach taken. Patch 5 is split stack support proper.

This patch should be used along with the matching glibc and gold patches (I'll soon link them all in the bugzilla entry).

The generic approach is identical to x86: I add a new __private_ss field to the TCB in glibc, add a target-specific __morestack function and friends, emit a split-stack prologue, teach va_start to deal with a dedicated vararg pointer, and teach gold to recognize the split-stack prologue and handle non-split-stack calls by bumping the requested frame size.

The differences start in the __morestack calling convention. Basically, since pushing things on the stack is unwieldy and there's only one free register (%r0 could be used for static chain, %r2-%r6 contain arguments, %r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata or .text section, and pass the address of the parameter block in %r1. The parameter block also contains a (position-relative) address that __morestack should jump to (x86 just mangles the return address from __morestack to compute that).

On zSeries CPUs, the parameter block is stuffed somewhere in .rodata, its address loaded to %r1 by larl instruction, and __morestack is sibling-called by jg instruction.
On older CPUs, lacking long jump and PC-relative load-address instructions, I use the following sequence instead:

# load .L1 to %r1
    basr %r1, 0
.L1:
# Load __morestack to %r1
    a %r1, .L2-.L1(%r1)
# Jump to __morestack and stuff return address (aka param block address)
# to %r1.
    basr %r1, %r1
# param block comes here
.L3:
    .long <frame_size>
    .long <args_size>
    .long .L4-.L3
# relative __morestack address here
.L2:
    .long __morestack-.L1
.L4:
# __morestack jumps here

As on other targets, the call to __morestack is conditional, based on comparing the stack pointer with a field in TCB. For zSeries, I just make the jump to __morestack a conditional one, while for older CPUs I emit a jump over the sequence.

Also, for vararg functions, I need to stuff the vararg pointer in some register. Since %r1 is again the only one guaranteed to be free, it's the one used. If __morestack is called, it'll leave the correct pointer in %r1. Otherwise, I emit a simple load-address instruction. Since I only need that instruction in the not-called branch (as opposed to x86 that emits it on both branches), I get terser code.

Now, here come the problems. To keep optimization passes from destroying the above sequence (as well as the simpler ones with larl), I emit a pseudo-insn (split_stack_call_*) that is expanded to the above in machine-dependent reorg phase, just like normal const pools. The instruction is considered to be an unconditional jump to the .L4 label (since __morestack will jump to an arbitrary address selected by param block anyway, that's what it effectively is). For a zSeries CPU with a conditional call, I represent the sequence as a conditional jump instead.

So overall the sequences, as emitted by s390_expand_split_stack_prologue, look as follows:

# (1) Old CPU, unconditional
<call __morestack using basr as above, jump to .L4>
.L4:
# Normal prologue starts here.

# (2) zSeries CPU, unconditional
<call __morestack using larl+jg, jump to .L4>
.L4:
# Normal prologue starts here.
# Which will expand to:
    larl %r1, .L3
    jg __morestack
    .section .rodata
.L3:
# Or .long for 31-bit target.
    .quad <frame_size>
    .quad <args_size>
    .quad .L4-.L3
    .text

# (3) Old CPU, conditional
<load and compare the guard against stack pointer - nothing interesting>
    jhe .L5
<call __morestack using basr, jump to .L4>
.L5:
# Compute vararg pointer (vararg functions only)
    la %r1, 96(%r15)
.L4:
# Normal prologue starts here.

# (4) zSeries CPU, conditional
<load and compare the guard against stack pointer>
<conditionally call __morestack using larl+jgl, if called jump to .L4>
# Compute vararg pointer (vararg functions only)
    la %r1, 160(%r15)
.L4:
# Normal prologue starts here.
# Expands as above, except with jgl instead of jg.

Case (4) is the least problematic: conditional jumps with side effects appear to work quite well. However, the other variants involve an unconditional jump with side effects, which causes two problems:

- If the jump is to the immediately following label (which will always happen in cases (1) and (2), and for non-vararg functions in (3)), rtl_tidy_fallthru_edge mistakenly marks it as a fallthru edge, even though it correctly figures the jump cannot be removed due to the side effects. This causes a verification failure later.
- In case (3), since the call to __morestack is considered to be unlikely, the basic block with the call pseudo-insn will be moved to the end of the function if we're optimizing. Since it already ends with an unconditional jump, no new jump will be inserted (as opposed to x86). Soon afterwards, reposition_prologue_and_epilogue_notes will move NOTE_INSN_PROLOGUE_END after the last prologue instruction, which is now our pseudo-jump. Unfortunately, it doesn't consider the possibility of it being an unconditional jump, and stuffs the note right between the jump and the following barrier, again causing a verification failure.

Patches 3 and 4 of the patchset attempt to fix the above problems.
For the first one, I just skip the edge if it involves an unconditional jump with side effects. For the second, I carefully extract the note from its basic block and put it after the barrier. I'm not sure any of it is the right approach, and would welcome any feedback.

I've also found a target-independent issue with -fsplit-stack: suppose we're compiling with -fsplit-stack and -fprofile-use or some other option that will partition the code into hot and cold sections. Further suppose that the code that ends up in .text.unlikely involves a function call aiming at a function compiled without -fsplit-stack. In that case, the linker should obviously perform the necessary transforms on the function prologue to bump its frame-size. However, since the code in .text.unlikely doesn't really belong to function foo according to the symbol table, one of the following happens instead:

- x86: since foo.cold.0 is not a function (STT_NOTYPE), it's not scanned for calls to -fno-split-stack functions, and may easily result in a stack overflow at runtime.
- s390: since foo.cold.0 *is* a function (STT_FUNC), it's scanned for such calls, and the linker tries to modify foo.cold.0's split-stack prologue. This fails with a linker error, since it obviously doesn't have one.

I have no idea what to do about that. Since mixing split-stack code with -fno-split-stack is horribly broken in many ways, I'm tempted to just ignore the problem.

A few other non-obvious problems and notes:

- For old CPUs, in case (3), optimization will move the call to the end of the function... but since branches on s390 reach only 4 KiB in either direction, s390_split_branches may attempt to split the branch to that block, which would fail horribly since it's before proper prologue and we cannot clobber %r14. I detect this case and move the basic block back to its original location instead.
- Likewise, s390_split_branches needed to be taught not to look at the __morestack call pseudo-insn (which is considered a jump).
  It'd only get confused.
- s390_chunkify_start is responsible for reloading the const pool register when branches are made between portions of a function using different const pools. In case (3), we likewise cannot do that, since %r13 cannot be clobbered yet. I just disable emitting the const pool reload in this case.
- The (ordinary) prologue needs a temp register for its own use. As per the above rationale, it also tends to pick %r1, which collides with us using it for the vararg pointer. There already was a condition that picks %r14 instead, if possible. I amended it to pick %r12 if %r1 would be picked in a vararg split-stack function, and modified s390_register_info to consider it clobbered in this case.
- For leaf functions, there's a possibility that frame_size will be 0. In this case, there's no point in doing the __morestack dance. However, we need some way to tell a split-stack function apart in the linker and perhaps at runtime as well, if non-split function-pointer calls are ever implemented. We may be able to get away without that, but just in case, I emit a funny nop (nopr %r15) instead of split-stack prologue in such functions to mark them (both x86 and ppc always emit a split-stack prologue and I'd feel uneasy if I didn't include one).
- I use a conditional __morestack call if frame_size fits in an add immediate instruction (16-bit signed if the CPU doesn't have extended immediate instructions, 32-bit if it does), unconditional otherwise (__morestack will check anyway, but there's not much chance of already having such a big frame).
- gold will try bumping the immediate field in the above add instruction if it's present and the frame size still fits, and will nop out the comparison and convert to an unconditional call otherwise. It'll always bump the frame size in the parameter block. Thanks to that, we don't need a separate __morestack_nonsplit function like x86.
- If -pg is used together with -fsplit-stack, the call to _mcount will be emitted before the split-stack prologue (as opposed to x86, which emits it after the prologue). This is not a big problem, but gold needs to account for that and recognize the _mcount call before the split-stack prologue.

I have run the testsuite on a z13 machine. In addition to running it with -fsplit-stack, I've also run it with s390_expand_split_stack_prologue modified to always emit unconditional calls (to exercise more paths in __morestack). There are a few new failures, but they can all be explained:

- the testcases for __builtin_return_address and friends hit __morestack's stack frame instead of whatever they were hoping to find.
- guality tests all break since gdb looks at __morestack's frame instead of the one that called it. Marking guality_check with __attribute__ ((no_split_stack)) made them go away, though a better fix would be to make gdb skip __morestack frames somehow...
- some guality tests try printing function arguments after an alloca or VLA allocation with optimization. These no longer work, since the arguments are in caller-saved registers, and a call to __morestack_allocate_stack_space will destroy them.
- the .text.unlikely issue mentioned above.
* [PATCH 2/5] s390: Fix missing .size directives.
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC
To: gcc-patches; +Cc: Marcin Kościelnicki

It seems at some point the .size hook was hijacked to emit some machine-specific directives, and the actual .size directive was forgotten. This caused problems for split-stack support, since linker couldn't scan the function body for non-split-stack calls.

gcc/ChangeLog:

        * config/s390/s390.c (s390_asm_declare_function_size): Add code
        to actually emit the .size directive.
---
 gcc/ChangeLog          | 5 +++++
 gcc/config/s390/s390.c | 4 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2c572a7..6aef3f9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
 
+        * config/s390/s390.c (s390_asm_declare_function_size): Add code
+        to actually emit the .size directive.
+
+2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
+
         * config/s390/s390.md (pool_section_start): Use switch_to_section
         to select proper read-only data section instead of hardcoding .rodata.
         (pool_section_end): Use switch_to_section to match the above.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 16045f0..9dc8d1e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6834,8 +6834,10 @@ s390_asm_output_function_prefix (FILE *asm_out_file,
 
 void
 s390_asm_declare_function_size (FILE *asm_out_file,
-                                const char *fnname ATTRIBUTE_UNUSED, tree decl)
+                                const char *fnname, tree decl)
 {
+  if (!flag_inhibit_size_directive)
+    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
   if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
     return;
   fprintf (asm_out_file, "\t.machine pop\n");
-- 
2.6.4
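As a rough illustration of what the restored directive looks like in the assembly output, the sketch below formats a measured .size line for a given symbol. The helper format_size_directive is hypothetical and is not GCC code; it assumes the usual ELF form in which the size is the current location minus the function's start symbol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: write the measured .size
   directive for FNNAME into BUF.  Returns 0 on success, -1 if BUF is
   too small.  The output shape is an assumption based on the usual
   ELF convention.  */
static int
format_size_directive (char *buf, size_t buflen, const char *fnname)
{
  assert (buf != NULL && fnname != NULL);

  /* "\t.size\t" + name + ", .-" + name + "\n" */
  size_t needed = 7 + strlen (fnname) + 4 + strlen (fnname) + 1;
  if (needed + 1 > buflen)
    return -1;

  snprintf (buf, buflen, "\t.size\t%s, .-%s\n", fnname, fnname);
  return 0;
}
```

A linker that wants to scan a function body for calls needs a non-zero symbol size to know where the function ends, which is presumably why the missing directive mattered for the split-stack handling in gold.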
* Re: [PATCH 2/5] s390: Fix missing .size directives.
From: Andreas Krebbel @ 2016-01-20 13:16 UTC
To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> It seems at some point the .size hook was hijacked to emit some
> machine-specific directives, and the actual .size directive was
> forgotten. This caused problems for split-stack support, since
> linker couldn't scan the function body for non-split-stack calls.
>
> gcc/ChangeLog:
>
>         * config/s390/s390.c (s390_asm_declare_function_size): Add code
>         to actually emit the .size directive.

...

>  s390_asm_declare_function_size (FILE *asm_out_file,
> -                                const char *fnname ATTRIBUTE_UNUSED, tree decl)
> +                                const char *fnname, tree decl)
>  {
> +  if (!flag_inhibit_size_directive)
> +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
>    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
>      return;
>    fprintf (asm_out_file, "\t.machine pop\n");

It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here. This probably would require to change its name in s390.h first and then use it from s390_asm_declare_function_size. Not really beautiful but at least changes to the original macro would not require adjusting our backend.

-Andreas-
* Re: [PATCH 2/5] s390: Fix missing .size directives.
From: Dominik Vogt @ 2016-01-20 14:01 UTC
To: gcc-patches

On Wed, Jan 20, 2016 at 02:16:23PM +0100, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> >  s390_asm_declare_function_size (FILE *asm_out_file,
> > -                                const char *fnname ATTRIBUTE_UNUSED, tree decl)
> > +                                const char *fnname, tree decl)
> >  {
> > +  if (!flag_inhibit_size_directive)
> > +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
> >    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
> >      return;
> >    fprintf (asm_out_file, "\t.machine pop\n");
>
> It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here. This
> probably would require to change its name in s390.h first and then use it from
> s390_asm_declare_function_size. Not really beautiful but at least changes to the original macro
> would not require adjusting our backend.

Maybe it's better not to invent yet another solution to deal with this and just do it like proposed in the patch. So if the default implementation is ever changed, the same search pattern will find all identical copies of the code.

Ciao

Dominik ^_^ ^_^

-- 
Dominik Vogt
IBM Germany
* Re: [PATCH 2/5] s390: Fix missing .size directives.
From: Andreas Krebbel @ 2016-01-21 9:59 UTC
To: Marcin Kościelnicki, gcc-patches

On 01/20/2016 02:16 PM, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> It seems at some point the .size hook was hijacked to emit some
>> machine-specific directives, and the actual .size directive was
>> forgotten. This caused problems for split-stack support, since
>> linker couldn't scan the function body for non-split-stack calls.
>>
>> gcc/ChangeLog:
>>
>>         * config/s390/s390.c (s390_asm_declare_function_size): Add code
>>         to actually emit the .size directive.
>
> ...
>
>>  s390_asm_declare_function_size (FILE *asm_out_file,
>> -                                const char *fnname ATTRIBUTE_UNUSED, tree decl)
>> +                                const char *fnname, tree decl)
>>  {
>> +  if (!flag_inhibit_size_directive)
>> +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
>>    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
>>      return;
>>    fprintf (asm_out_file, "\t.machine pop\n");
>
> It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here. This
> probably would require to change its name in s390.h first and then use it from
> s390_asm_declare_function_size. Not really beautiful but at least changes to the original macro
> would not require adjusting our backend.

I've looked into how the other archs are doing this and didn't find anything better than just including the code from the original macro. The real fix probably would be to turn this into a target hook instead.

I've committed the patch now since it fixes a real problem not only with split-stack.

Thanks!

-Andreas-
* Re: [PATCH 2/5] s390: Fix missing .size directives.
From: Marcin Kościelnicki @ 2016-01-21 10:10 UTC
To: Andreas Krebbel, gcc-patches

On 21/01/16 10:58, Andreas Krebbel wrote:
> On 01/20/2016 02:16 PM, Andreas Krebbel wrote:
>> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>>> It seems at some point the .size hook was hijacked to emit some
>>> machine-specific directives, and the actual .size directive was
>>> forgotten. This caused problems for split-stack support, since
>>> linker couldn't scan the function body for non-split-stack calls.
>>>
>>> gcc/ChangeLog:
>>>
>>>         * config/s390/s390.c (s390_asm_declare_function_size): Add code
>>>         to actually emit the .size directive.
>>
>> ...
>>
>>>  s390_asm_declare_function_size (FILE *asm_out_file,
>>> -                                const char *fnname ATTRIBUTE_UNUSED, tree decl)
>>> +                                const char *fnname, tree decl)
>>>  {
>>> +  if (!flag_inhibit_size_directive)
>>> +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
>>>    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
>>>      return;
>>>    fprintf (asm_out_file, "\t.machine pop\n");
>>
>> It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here. This
>> probably would require to change its name in s390.h first and then use it from
>> s390_asm_declare_function_size. Not really beautiful but at least changes to the original macro
>> would not require adjusting our backend.
>
> I've looked into how the other archs are doing this and didn't find anything better than just
> including the code from the original macro. The real fix probably would be to turn this into a
> target hook instead.
>
> I've committed the patch now since it fixes a real problem not only with split-stack.
>
> Thanks!
>
> -Andreas-
>

I did a version that reincludes elfos.h, but it didn't finish testing (it made it through bootstrap) before you applied the patch:

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 21a5687..c56b909 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6832,10 +6832,17 @@ s390_asm_output_function_prefix (FILE *asm_out_file,
 
 /* Write an extra function footer after the very end of the function. */
 
+/* Get elfos.h's original ASM_DECLARE_FUNCTION_SIZE, so that we can delegate
+   to it below. */
+
+#undef ASM_DECLARE_FUNCTION_SIZE
+#include "../elfos.h"
+
 void
 s390_asm_declare_function_size (FILE *asm_out_file,
-                                const char *fnname ATTRIBUTE_UNUSED, tree decl)
+                                const char *fnname, tree decl)
 {
+  ASM_DECLARE_FUNCTION_SIZE(asm_out_file, fnname, decl);
   if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
     return;
   fprintf (asm_out_file, "\t.machine pop\n");

But, this is much uglier, and the macro is very unlikely to change in the first place. I guess we should stay with the applied patch.

Thanks,
Marcin
* [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC
To: gcc-patches; +Cc: Marcin Kościelnicki

When an unconditional jump with side effects targets an immediately following label, rtl_tidy_fallthru_edge is called. Since it has side effects, it doesn't remove the jump, but the label is still marked as fallthru. This later causes a verification error. Do nothing in this case instead.

gcc/ChangeLog:

        * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
        with side effects.
---
 gcc/ChangeLog | 5 +++++
 gcc/cfgrtl.c  | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 56e31f6..4c7046f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
 
+        * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
+        with side effects.
+
+2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
+
         * function.c (reposition_prologue_and_epilogue_notes): Avoid
         verification error if the last insn of prologue is an unconditional
         jump.
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index fbfc7cd..dc4c2b1 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1762,6 +1762,8 @@ rtl_tidy_fallthru_edge (edge e)
      If block B consisted only of this single jump, turn it into a deleted
      note. */
   q = BB_END (b);
+  if (JUMP_P (q) && !onlyjump_p (q))
+    return;
   if (JUMP_P (q)
       && onlyjump_p (q)
       && (any_uncondjump_p (q)
-- 
2.6.4
* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
From: Andreas Krebbel @ 2016-01-21 10:05 UTC
To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> When an unconditional jump with side effects targets an immediately
> following label, rtl_tidy_fallthru_edge is called. Since it has side
> effects, it doesn't remove the jump, but the label is still marked
> as fallthru. This later causes a verification error. Do nothing in this
> case instead.
>
> gcc/ChangeLog:
>
>         * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>         with side effects.

The change looks ok to me (although I'm not able to approve it). Could you please run regressions tests on x86_64 with that change?

Perhaps a short comment in the code would be good.

-Andreas-
* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
From: Marcin Kościelnicki @ 2016-01-21 10:10 UTC
To: Andreas Krebbel, gcc-patches

On 21/01/16 11:05, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> When an unconditional jump with side effects targets an immediately
>> following label, rtl_tidy_fallthru_edge is called. Since it has side
>> effects, it doesn't remove the jump, but the label is still marked
>> as fallthru. This later causes a verification error. Do nothing in this
>> case instead.
>>
>> gcc/ChangeLog:
>>
>>         * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>         with side effects.
>
> The change looks ok to me (although I'm not able to approve it). Could you please run regressions
> tests on x86_64 with that change?
>
> Perhaps a short comment in the code would be good.
>
> -Andreas-
>

OK, I'll run the testsuite and add a comment.
* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
From: Jeff Law @ 2016-01-21 23:10 UTC
To: Andreas Krebbel, Marcin Kościelnicki, gcc-patches

On 01/21/2016 03:05 AM, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> When an unconditional jump with side effects targets an immediately
>> following label, rtl_tidy_fallthru_edge is called. Since it has side
>> effects, it doesn't remove the jump, but the label is still marked
>> as fallthru. This later causes a verification error. Do nothing in this
>> case instead.
>>
>> gcc/ChangeLog:
>>
>>         * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>         with side effects.
>
> The change looks ok to me (although I'm not able to approve it). Could you please run regressions
> tests on x86_64 with that change?
>
> Perhaps a short comment in the code would be good.

I think the patch is technically fine, the question is does it fix a visible bug? I read the series as new feature enablement so I put this patch into my gcc7 queue.

jeff
* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
From: Andreas Krebbel @ 2016-01-22 7:44 UTC
To: Jeff Law, Marcin Kościelnicki, gcc-patches

On 01/22/2016 12:10 AM, Jeff Law wrote:
> On 01/21/2016 03:05 AM, Andreas Krebbel wrote:
>> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>>> When an unconditional jump with side effects targets an immediately
>>> following label, rtl_tidy_fallthru_edge is called. Since it has side
>>> effects, it doesn't remove the jump, but the label is still marked
>>> as fallthru. This later causes a verification error. Do nothing in this
>>> case instead.
>>>
>>> gcc/ChangeLog:
>>>
>>>         * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>>         with side effects.
>>
>> The change looks ok to me (although I'm not able to approve it). Could you please run regressions
>> tests on x86_64 with that change?
>>
>> Perhaps a short comment in the code would be good.
> I think the patch is technically fine, the question is does it fix a
> visible bug? I read the series as new feature enablement so I put this
> patch into my gcc7 queue.

We need the patch for the S/390 split-stack implementation which we would like to see in GCC 6. I'm aware that this isn't stage 3 material but people seem to have reasons to really want split stack on S/390 asap and we would have to backport this feature anyway. Therefore I would prefer to have it in the official release already. That's the only common code change we would need for that.

I've started a bootstrap and regression test for the patch also on Power.

Do you see a chance we can get this into GCC 6?

Bye,

-Andreas-
* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
From: Marcin Kościelnicki @ 2016-01-22 16:39 UTC
To: Andreas Krebbel, Jeff Law, gcc-patches

On 22/01/16 08:44, Andreas Krebbel wrote:
> On 01/22/2016 12:10 AM, Jeff Law wrote:
>> On 01/21/2016 03:05 AM, Andreas Krebbel wrote:
>>> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>>>> When an unconditional jump with side effects targets an immediately
>>>> following label, rtl_tidy_fallthru_edge is called. Since it has side
>>>> effects, it doesn't remove the jump, but the label is still marked
>>>> as fallthru. This later causes a verification error. Do nothing in this
>>>> case instead.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>         * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>>>         with side effects.
>>>
>>> The change looks ok to me (although I'm not able to approve it). Could you please run regressions
>>> tests on x86_64 with that change?
>>>
>>> Perhaps a short comment in the code would be good.
>> I think the patch is technically fine, the question is does it fix a
>> visible bug? I read the series as new feature enablement so I put this
>> patch into my gcc7 queue.
>
> We need the patch for the S/390 split-stack implementation which we would like to see in GCC 6. I'm
> aware that this isn't stage 3 material but people seem to have reasons to really want split stack on
> S/390 asap and we would have to backport this feature anyway. Therefore I would prefer to have it in
> the official release already. That's the only common code change we would need for that.
>
> I've started a bootstrap and regression test for the patch also on Power.
>
> Do you see a chance we can get this into GCC 6?
>
> Bye,
>
> -Andreas-
>

I've tested the patch on x86_64, no regressions.

I'm not entirely sure if the patch needs to go in for the current version of split-stack support. This patch fixed a showstopper bug on g5 CPUs when the patch still supported them. I haven't seen this bug with the z900 sequences (which are now the only ones left), but since we're still using unconditional jumps with side effects, I left it in just to be safe. The testsuite passes on s390x -fsplit-stack both with the patch and without it.

So, I don't know. It seems to work now, probably because no optimization pass has a reason to touch that jump, but it may start to fail if someone adds a new optimization that tries to be smart with our prologue.
* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU. 2016-01-22 7:44 ` Andreas Krebbel 2016-01-22 16:39 ` Marcin Kościelnicki @ 2016-01-27 7:11 ` Jeff Law 1 sibling, 0 replies; 55+ messages in thread From: Jeff Law @ 2016-01-27 7:11 UTC (permalink / raw) To: Andreas Krebbel, Marcin Kościelnicki, gcc-patches On 01/22/2016 12:44 AM, Andreas Krebbel wrote: > On 01/22/2016 12:10 AM, Jeff Law wrote: >> On 01/21/2016 03:05 AM, Andreas Krebbel wrote: >>> On 01/02/2016 08:16 PM, Marcin KoÅcielnicki wrote: >>>> When an unconditional jump with side effects targets an immediately >>>> following label, rtl_tidy_fallthru_edge is called. Since it has side >>>> effects, it doesn't remove the jump, but the label is still marked >>>> as fallthru. This later causes a verification error. Do nothing in this >>>> case instead. >>>> >>>> gcc/ChangeLog: >>>> >>>> * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps >>>> with side effects. >>> >>> The change looks ok to me (although I'm not able to approve it). Could you please run regressions >>> tests on x86_64 with that change? >>> >>> Perhaps a short comment in the code would be good. >> I think the patch is technically fine, the question is does it fix a >> visible bug? I read the series as new feature enablement so I put this >> patch into my gcc7 queue. > > We need the patch for the S/390 split-stack implementation which we would like to see in GCC 6. I'm > aware that this isn't stage 3 material but people seem to have reasons to really want split stack on > S/390 asap and we would have to backport this feature anyway. Therefore I would prefer to have it in > the official release already. That's the only common code change we would need for that. > > I've started a bootstrap and regression test for the patch also on Power. > > Do you see a chance we can get this into GCC 6? 
So I think it'd largely depend on what you do with the s390 specific bits -- if you decide to drop those in (ISTM that's your call), then I think adding the cfgrtl patch is probably the wise thing to do. So consider it approved for gcc-6 if/when you decide to go forward with the s390 specific bits.

FWIW, the PA might run afoul of the code you're fixing as well. It's got add[i]b,tr and mov[i]b,tr which are unconditional jumps with other side effects. We never really used them all that much and once the PA8000 series came out, they were actually a performance loss, so they were disabled on the "modern" PA machines.

Jeff
* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
From: Jeff Law @ 2016-04-17 21:24 UTC
To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 12:16 PM, Marcin Kościelnicki wrote:
> When an unconditional jump with side effects targets an immediately
> following label, rtl_tidy_fallthru_edge is called. Since it has side
> effects, it doesn't remove the jump, but the label is still marked
> as fallthru. This later causes a verification error. Do nothing in this
> case instead.
>
> gcc/ChangeLog:
>
> * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
> with side effects.

OK for the trunk (gcc-7)

It may not matter in practice, but you could try ripping out the other side effects into individual insns and recognizing them. And if that works, then you can proceed to eliminate the jump, marking the fallthru label, etc. I think combine has some code to do similar things.

jeff
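For readers skimming the thread, the behaviour that PATCH 4/5 asks for can be pictured with a small toy model. This is only a sketch under stated assumptions: struct toy_jump, enum tidy_action and toy_tidy_fallthru are hypothetical names made up for illustration and are not GCC data structures or the actual code in cfgrtl.c. The idea is that a jump to the immediately following label may be deleted and the edge marked fallthru only when the jump does nothing else; a jump with side effects is left completely alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a jump insn.  Hypothetical fields, not GCC's RTL.  */
struct toy_jump
{
  bool has_side_effects;   /* The jump also sets a register, etc.  */
  bool targets_next_insn;  /* Its label immediately follows the jump.  */
};

enum tidy_action
{
  TIDY_NOTHING,            /* Keep the jump, leave edge flags untouched.  */
  TIDY_DELETE_AND_MARK     /* Delete the jump, mark the edge fallthru.  */
};

/* Decide what a fallthru-tidying step may do with JUMP.  */
static enum tidy_action
toy_tidy_fallthru (const struct toy_jump *jump)
{
  assert (jump != NULL);

  /* Nothing to tidy unless the target directly follows the jump.  */
  if (!jump->targets_next_insn)
    return TIDY_NOTHING;

  /* A jump with side effects cannot be removed, so the edge must not
     be marked fallthru either; bail out without changing anything.  */
  if (jump->has_side_effects)
    return TIDY_NOTHING;

  return TIDY_DELETE_AND_MARK;
}
```

In this model the split-stack pseudo-jump corresponds to a toy_jump with both fields set, which is the case the patch makes a no-op.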
* [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump.
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC
To: gcc-patches; +Cc: Marcin Kościelnicki

With the new s390 split-stack support, when optimization is enabled, the cold path of calling __morestack is likely to be moved to the end of the function. This will result in the function ending in split_stack_call_esa, which is an unconditional jump instruction and part of the function prologue. reposition_prologue_and_epilogue_notes will insert NOTE_INSN_PROLOGUE_END right after it (and before the following barrier), causing a verification error. Insert it after the barrier instead (and outside of basic block).

gcc/ChangeLog:

 * function.c (reposition_prologue_and_epilogue_notes): Avoid
 verification error if the last insn of prologue is an unconditional
 jump.
---
 gcc/ChangeLog  | 6 ++++++
 gcc/function.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6aef3f9..56e31f6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
 
+ * function.c (reposition_prologue_and_epilogue_notes): Avoid
+ verification error if the last insn of prologue is an unconditional
+ jump.
+
+2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
+
 * config/s390/s390.c (s390_asm_declare_function_size): Add code to
 actually emit the .size directive.
diff --git a/gcc/function.c b/gcc/function.c
index 035a49e..921945f 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -6348,6 +6348,12 @@ reposition_prologue_and_epilogue_notes (void)
           /* Avoid placing note between CODE_LABEL and BASIC_BLOCK note. */
           if (LABEL_P (last))
             last = NEXT_INSN (last);
+          if (BARRIER_P (last) && BLOCK_FOR_INSN (note))
+            {
+              if (BB_END (BLOCK_FOR_INSN (note)) == note)
+                BB_END (BLOCK_FOR_INSN (note)) = PREV_INSN (note);
+              BLOCK_FOR_INSN (note) = 0;
+            }
           reorder_insns (note, note, last);
         }
     }
-- 
2.6.4
* Re: [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump.
From: Jeff Law @ 2016-04-17 21:25 UTC
To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 12:16 PM, Marcin Kościelnicki wrote:
> With the new s390 split-stack support, when optimization is enabled,
> the cold path of calling __morestack is likely to be moved to the
> end of the function. This will result in the function ending in
> split_stack_call_esa, which is an unconditional jump instruction and
> part of the function prologue. reposition_prologue_and_epilogue_notes
> will insert NOTE_INSN_PROLOGUE_END right after it (and before the
> following barrier), causing a verification error. Insert it after
> the barrier instead (and outside of basic block).
>
> gcc/ChangeLog:
>
> * function.c (reposition_prologue_and_epilogue_notes): Avoid
> verification error if the last insn of prologue is an unconditional
> jump.
> ---
> gcc/ChangeLog | 6 ++++++
> gcc/function.c | 6 ++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 6aef3f9..56e31f6 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,11 @@
> 2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
>
> + * function.c (reposition_prologue_and_epilogue_notes): Avoid
> + verification error if the last insn of prologue is an unconditional
> + jump.

I'm guessing the BARRIER is actually in the hash table of prologue insns? Oh how I wish we didn't express barriers in RTL.

Can this leave NOTEs with no associated basic block in the chain? reorder_blocks only fixes the block boundaries, it doesn't fix BLOCK_FOR_INSN.

Jeff
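The placement rule described in the PATCH 3/5 commit message can be illustrated with a toy model. This is a sketch only, assuming a simplified insn stream: enum toy_kind, struct toy_note_place and toy_place_prologue_end are hypothetical names invented here, and the code follows the wording of the commit message rather than the exact control flow of reposition_prologue_and_epilogue_notes. When the last prologue insn is an unconditional jump followed by a barrier, the note goes after the barrier and belongs to no basic block; otherwise it stays right after the last prologue insn, inside its block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy insn kinds.  Hypothetical, not GCC's rtx codes.  */
enum toy_kind
{
  TOY_INSN,
  TOY_UNCOND_JUMP,
  TOY_BARRIER,
  TOY_LABEL
};

/* Where the prologue-end note ends up in the toy model.  */
struct toy_note_place
{
  size_t after;    /* Index of the insn the note is placed after.  */
  bool in_block;   /* Whether the note still belongs to a basic block.  */
};

/* LAST is the index of the last prologue insn in INSNS (length N).  */
static struct toy_note_place
toy_place_prologue_end (const enum toy_kind *insns, size_t n, size_t last)
{
  struct toy_note_place place;

  assert (insns != NULL && last < n);

  /* Default: the note directly follows the last prologue insn.  */
  place.after = last;
  place.in_block = true;

  /* An unconditional jump ends its basic block.  If a barrier follows,
     put the note after the barrier and detach it from any block.  */
  if (insns[last] == TOY_UNCOND_JUMP
      && last + 1 < n
      && insns[last + 1] == TOY_BARRIER)
    {
      place.after = last + 1;
      place.in_block = false;
    }

  return place;
}
```

Jeff's question above maps onto the in_block == false case: the model makes explicit that such a note has no owning block, which is exactly the situation he asks about.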
* [PATCH 5/5] s390: Add -fsplit-stack support 2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki ` (2 preceding siblings ...) 2016-01-02 19:16 ` [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump Marcin Kościelnicki @ 2016-01-02 19:17 ` Marcin Kościelnicki 2016-01-15 18:39 ` Andreas Krebbel 2016-01-02 19:17 ` [PATCH 1/5] s390: Use proper read-only data section for literals Marcin Kościelnicki 2016-01-03 3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor 5 siblings, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-01-02 19:17 UTC (permalink / raw) To: gcc-patches; +Cc: Marcin Kościelnicki libgcc/ChangeLog: * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. * config/s390/morestack.S: New file. * config/s390/t-stack-s390: New file. * generic-morestack.c (__splitstack_find): Add s390-specific code. gcc/ChangeLog: * common/config/s390/s390-common.c (s390_supports_split_stack): New function. (TARGET_SUPPORTS_SPLIT_STACK): New macro. * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. * config/s390/s390.c (struct machine_function): New field split_stack_varargs_pointer. (s390_split_branches): Don't split split-stack pseudo-insns, rewire split-stack prologue conditional jump instead of splitting it. (s390_chunkify_start): Don't reload const pool register on split-stack prologue conditional jumps. (s390_register_info): Mark r12 as clobbered if it'll be used as temp in s390_emit_prologue. (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack vararg pointer. (morestack_ref): New global. (SPLIT_STACK_AVAILABLE): New macro. (s390_expand_split_stack_prologue): New function. (s390_expand_split_stack_call_esa): New function. (s390_expand_split_stack_call_zarch): New function. (s390_live_on_entry): New function. (s390_va_start): Use split-stack vararg pointer if appropriate. (s390_reorg): Lower the split-stack pseudo-insns. 
(s390_asm_file_end): Emit the split-stack note sections. (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec. (UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec. (UNSPECV_SPLIT_STACK_CALL_ESA): New unspec. (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. (UNSPECV_SPLIT_STACK_MARKER): New unspec. (split_stack_prologue): New expand. (split_stack_call_esa): New insn. (split_stack_call_zarch_*): New insn. (split_stack_cond_call_zarch_*): New insn. (split_stack_space_check): New expand. (split_stack_sibcall_basr): New insn. (split_stack_sibcall_*): New insn. (split_stack_cond_sibcall_*): New insn. (split_stack_marker): New insn. --- gcc/ChangeLog | 41 ++ gcc/common/config/s390/s390-common.c | 14 + gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c | 538 +++++++++++++++++++++++++- gcc/config/s390/s390.md | 133 +++++++ libgcc/ChangeLog | 7 + libgcc/config.host | 4 +- libgcc/config/s390/morestack.S | 718 +++++++++++++++++++++++++++++++++++ libgcc/config/s390/t-stack-s390 | 2 + libgcc/generic-morestack.c | 4 + 10 files changed, 1454 insertions(+), 8 deletions(-) create mode 100644 libgcc/config/s390/morestack.S create mode 100644 libgcc/config/s390/t-stack-s390 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4c7046f..a4f4dff 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,46 @@ 2016-01-02 Marcin Kościelnicki <koriakin@0x04.net> + * common/config/s390/s390-common.c (s390_supports_split_stack): + New function. + (TARGET_SUPPORTS_SPLIT_STACK): New macro. + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. + * config/s390/s390.c (struct machine_function): New field + split_stack_varargs_pointer. + (s390_split_branches): Don't split split-stack pseudo-insns, rewire + split-stack prologue conditional jump instead of splitting it. + (s390_chunkify_start): Don't reload const pool register on split-stack + prologue conditional jumps. 
+ (s390_register_info): Mark r12 as clobbered if it'll be used as temp + in s390_emit_prologue. + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack + vararg pointer. + (morestack_ref): New global. + (SPLIT_STACK_AVAILABLE): New macro. + (s390_expand_split_stack_prologue): New function. + (s390_expand_split_stack_call_esa): New function. + (s390_expand_split_stack_call_zarch): New function. + (s390_live_on_entry): New function. + (s390_va_start): Use split-stack vararg pointer if appropriate. + (s390_reorg): Lower the split-stack pseudo-insns. + (s390_asm_file_end): Emit the split-stack note sections. + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. + * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec. + (UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec. + (UNSPECV_SPLIT_STACK_CALL_ESA): New unspec. + (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. + (UNSPECV_SPLIT_STACK_MARKER): New unspec. + (split_stack_prologue): New expand. + (split_stack_call_esa): New insn. + (split_stack_call_zarch_*): New insn. + (split_stack_cond_call_zarch_*): New insn. + (split_stack_space_check): New expand. + (split_stack_sibcall_basr): New insn. + (split_stack_sibcall_*): New insn. + (split_stack_cond_sibcall_*): New insn. + (split_stack_marker): New insn. + +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net> + * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps with side effects. diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 4cf0df7..0c468bf 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, } } +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. + We don't verify it, since earlier versions just have padding at + its place, which works just as well. 
*/ + +static bool +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, + struct gcc_options *opts ATTRIBUTE_UNUSED) +{ + return true; +} + #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, #undef TARGET_OPTION_INIT_STRUCT #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct +#undef TARGET_SUPPORTS_SPLIT_STACK +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack + struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 962abb1..936e267 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); extern void s390_emit_prologue (void); extern void s390_emit_epilogue (bool); +extern void s390_expand_split_stack_prologue (void); extern bool s390_can_use_simple_return_insn (void); extern bool s390_can_use_return_insn (void); extern void s390_function_profiler (FILE *, int); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 9dc8d1e..0255eec 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -426,6 +426,13 @@ struct GTY(()) machine_function /* True if the current function may contain a tbegin clobbering FPRs. */ bool tbegin_p; + + /* For -fsplit-stack support: A stack local which holds a pointer to + the stack arguments for a function with a variable number of + arguments. This is set at the start of the function and is used + to initialize the overflow_arg_area field of the va_list + structure. */ + rtx split_stack_varargs_pointer; }; /* Few accessor macros for struct cfun->machine->s390_frame_layout. 
*/ @@ -7669,7 +7676,17 @@ s390_split_branches (void) pat = PATTERN (insn); if (GET_CODE (pat) == PARALLEL) - pat = XVECEXP (pat, 0, 0); + { + /* Split stack call pseudo-jump doesn't need splitting. */ + if (GET_CODE (XVECEXP (pat, 0, 1)) == SET + && GET_CODE (XEXP (XVECEXP (pat, 0, 1), 1)) == UNSPEC_VOLATILE + && (XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1) + == UNSPECV_SPLIT_STACK_CALL_ESA + || XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1) + == UNSPECV_SPLIT_STACK_CALL_ZARCH)) + continue; + pat = XVECEXP (pat, 0, 0); + } if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx) continue; @@ -7692,6 +7709,49 @@ s390_split_branches (void) if (get_attr_length (insn) <= 4) continue; + if (prologue_epilogue_contains (insn)) + { + /* A jump in prologue/epilogue must come from the split-stack + prologue. It cannot be split - there are no scratch regs + available at that point. Rewire it instead. */ + + rtx_insn *code_label = (rtx_insn *)XEXP (*label, 0); + gcc_assert (LABEL_P (code_label)); + rtx_insn *note = NEXT_INSN (code_label); + gcc_assert (NOTE_P (note)); + rtx_insn *jump_ss = NEXT_INSN (note); + gcc_assert (JUMP_P (jump_ss)); + rtx_insn *barrier = NEXT_INSN (jump_ss); + gcc_assert (BARRIER_P (barrier)); + gcc_assert (GET_CODE (SET_SRC (pat)) == IF_THEN_ELSE); + gcc_assert (GET_CODE (XEXP (SET_SRC (pat), 0)) == LT); + + /* step 1 - insert new label after */ + rtx new_label = gen_label_rtx (); + emit_label_after (new_label, insn); + + /* step 2 - reorder */ + reorder_insns_nobb (code_label, barrier, insn); + + /* step 3 - retarget jump */ + rtx new_target = gen_rtx_LABEL_REF (VOIDmode, new_label); + ret = validate_change (insn, label, new_target, 0); + gcc_assert (ret); + LABEL_NUSES (new_label)++; + LABEL_NUSES (code_label)--; + JUMP_LABEL (insn) = new_label; + + /* step 4 - invert jump cc */ + rtx *pcond = &XEXP (SET_SRC (pat), 0); + rtx new_cond = gen_rtx_fmt_ee (GE, VOIDmode, + XEXP (*pcond, 0), + XEXP (*pcond, 1)); + ret = validate_change (insn, pcond, new_cond, 0); + 
gcc_assert (ret); + + continue; + } + /* We are going to use the return register as scratch register, make sure it will be saved/restored by the prologue/epilogue. */ cfun_frame_layout.save_return_addr_p = 1; @@ -8736,7 +8796,7 @@ s390_chunkify_start (void) } /* If we have a direct jump (conditional or unconditional), check all potential targets. */ - else if (JUMP_P (insn)) + else if (JUMP_P (insn) && !prologue_epilogue_contains (insn)) { rtx pat = PATTERN (insn); @@ -9316,9 +9376,13 @@ s390_register_info () cfun_frame_layout.high_fprs++; } - if (flag_pic) - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); + /* Register 12 is used for GOT address, but also as temp in prologue + for split-stack stdarg functions (unless r14 is available). */ + clobbered_regs[12] + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) + || (flag_split_stack && cfun->stdarg + && (crtl->is_leaf || TARGET_TPF_PROFILING + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); clobbered_regs[BASE_REGNUM] |= (cfun->machine->base_reg @@ -10446,6 +10510,8 @@ s390_emit_prologue (void) && !crtl->is_leaf && !TARGET_TPF_PROFILING) temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); + else if (flag_split_stack && cfun->stdarg) + temp_reg = gen_rtx_REG (Pmode, 12); else temp_reg = gen_rtx_REG (Pmode, 1); @@ -10939,6 +11005,386 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); } +/* -fsplit-stack support. */ + +/* A SYMBOL_REF for __morestack. */ +static GTY(()) rtx morestack_ref; + +/* When using -fsplit-stack, the allocation routines set a field in + the TCB to the bottom of the stack plus this much space, measured + in bytes. */ + +#define SPLIT_STACK_AVAILABLE 1024 + +/* Emit -fsplit-stack prologue, which goes before the regular function + prologue. 
*/ + +void +s390_expand_split_stack_prologue (void) +{ + rtx r1, guard, cc; + rtx_insn *insn; + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + /* Pointer size in bytes. */ + /* Frame size and argument size - the two parameters to __morestack. */ + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; + /* Align argument size to 8 bytes - simplifies __morestack code. */ + HOST_WIDE_INT args_size = crtl->args.size >= 0 + ? ((crtl->args.size + 7) & ~7) + : 0; + /* Label to jump to when no __morestack call is necessary. */ + rtx_code_label *enough = NULL; + /* Label to be called by __morestack. */ + rtx_code_label *call_done = NULL; + /* 1 if __morestack called conditionally, 0 if always. */ + int conditional = 0; + + gcc_assert (flag_split_stack && reload_completed); + + r1 = gen_rtx_REG (Pmode, 1); + + /* If no stack frame will be allocated, don't do anything. */ + if (!frame_size) + { + /* But emit a marker that will let linker and indirect function + calls recognise this function as split-stack aware. */ + emit_insn(gen_split_stack_marker()); + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, just use r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + return; + } + + if (morestack_ref == NULL_RTX) + { + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL + | SYMBOL_FLAG_FUNCTION); + } + + if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu)) + { + /* If frame_size will fit in an add instruction, do a stack space + check, and only call __morestack if there's not enough space. */ + conditional = 1; + + /* Get thread pointer. r1 is the only register we can always destroy - r0 + could contain a static chain (and cannot be used to address memory + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. 
*/ + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); + /* Aim at __private_ss. */ + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); + + /* If less than 1kiB used, skip addition and compare directly with + __private_ss. */ + if (frame_size > SPLIT_STACK_AVAILABLE) + { + emit_move_insn (r1, guard); + if (TARGET_64BIT) + emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size))); + else + emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size))); + guard = r1; + } + + if (TARGET_CPU_ZARCH) + { + rtx tmp; + + /* Compare the (maybe adjusted) guard with the stack pointer. */ + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); + + call_done = gen_label_rtx (); + + if (TARGET_64BIT) + tmp = gen_split_stack_cond_call_zarch_di (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size), + cc); + else + tmp = gen_split_stack_cond_call_zarch_si (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size), + cc); + + + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + + /* Mark the jump as very unlikely to be taken. */ + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); + } + else + { + /* Compare the (maybe adjusted) guard with the stack pointer. */ + cc = s390_emit_compare (GE, stack_pointer_rtx, guard); + + enough = gen_label_rtx (); + insn = s390_emit_jump (enough, cc); + JUMP_LABEL (insn) = enough; + + /* Mark the jump as very likely to be taken. */ + add_int_reg_note (insn, REG_BR_PROB, + REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100); + } + } + + if (call_done == NULL) + { + rtx tmp; + call_done = gen_label_rtx (); + + /* Now, we need to call __morestack. It has very special calling + conventions: it preserves param/return/static chain registers for + calling main function body, and looks for its own parameters + at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode). 
*/ + if (TARGET_64BIT) + tmp = gen_split_stack_call_zarch_di (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size)); + else if (TARGET_CPU_ZARCH) + tmp = gen_split_stack_call_zarch_si (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size)); + else + tmp = gen_split_stack_call_esa (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size)); + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + emit_barrier (); + } + + /* __morestack will call us here. */ + + if (enough != NULL) + { + emit_label (enough); + LABEL_NUSES (enough) = 1; + } + + if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, and __morestack was not called, just use r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + + emit_label (call_done); + LABEL_NUSES (call_done) = 1; +} + +/* Generates split-stack call sequence for esa mode, along with its parameter + block. */ + +static void +s390_expand_split_stack_call_esa (rtx_insn *orig_insn, + rtx call_done, + rtx function, + rtx frame_size, + rtx args_size) +{ + int psize = GET_MODE_SIZE (Pmode); + /* Labels for literal base, literal __morestack, param base. */ + rtx litbase = gen_label_rtx(); + rtx litms = gen_label_rtx(); + rtx parmbase = gen_label_rtx(); + rtx r1 = gen_rtx_REG (Pmode, 1); + rtx_insn *insn = orig_insn; + rtx tmp, tmp2; + + /* No brasl, we have to make do using basr and a literal pool. */ + + /* %r1 = litbase. 
*/ + insn = emit_insn_after (gen_main_base_31_small (r1, litbase), insn); + insn = emit_label_after (litbase, insn); + + /* a %r1, .Llitms-.Llitbase(%r1) */ + tmp = gen_rtx_LABEL_REF (Pmode, litbase); + tmp2 = gen_rtx_LABEL_REF (Pmode, litms); + tmp = gen_rtx_UNSPEC (Pmode, + gen_rtvec (2, tmp2, tmp), + UNSPEC_POOL_OFFSET); + tmp = gen_rtx_CONST (Pmode, tmp); + tmp = gen_rtx_MEM (Pmode, gen_rtx_PLUS (Pmode, r1, tmp)); + insn = emit_insn_after (gen_addsi3 (r1, r1, tmp), insn); + add_reg_note (insn, REG_LABEL_OPERAND, litbase); + add_reg_note (insn, REG_LABEL_OPERAND, litms); + LABEL_NUSES (litbase)++; + LABEL_NUSES (litms)++; + + /* basr %r1, %r1 */ + tmp = gen_split_stack_sibcall_basr (r1, call_done); + insn = emit_jump_insn_after (tmp, insn); + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* __morestack will mangle its return register to get our parameters. */ + + /* Now, we'll emit parameters to __morestack. First, align to pointer size + (this mirrors the alignment done in __morestack - don't touch it). */ + insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn); + + insn = emit_label_after (parmbase, insn); + + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, frame_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Second parameter is size of the arguments passed on stack that + __morestack has to copy to the new stack (does not include varargs). */ + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, args_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Third parameter is offset between start of the parameter block + and function body to be called by __morestack. 
*/ + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); + tmp = gen_rtx_CONST (Pmode, + gen_rtx_MINUS (Pmode, tmp2, tmp)); + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, tmp), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* We take advantage of the already-existing literal pool here to stuff + the __morestack address for use in the call above. */ + + insn = emit_label_after (litms, insn); + + /* We actually emit __morestack - litbase to support PIC. Since it + works just as well for non-PIC, we use it in all cases. */ + + tmp = gen_rtx_LABEL_REF (Pmode, litbase); + tmp = gen_rtx_CONST (Pmode, + gen_rtx_MINUS (Pmode, function, tmp)); + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, tmp), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + add_reg_note (insn, REG_LABEL_OPERAND, litbase); + LABEL_NUSES (litbase)++; + + delete_insn (orig_insn); +} + +/* Generates split-stack call sequence for zarch mode, along with its parameter + block. */ + +static void +s390_expand_split_stack_call_zarch (rtx_insn *orig_insn, + rtx call_done, + rtx function, + rtx frame_size, + rtx args_size, + rtx cond) +{ + int psize = GET_MODE_SIZE (Pmode); + rtx_insn *insn = orig_insn; + rtx parmbase = gen_label_rtx(); + rtx r1 = gen_rtx_REG (Pmode, 1); + rtx tmp, tmp2; + + /* %r1 = litbase. */ + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* jg<cond> __morestack. 
*/ + if (cond == NULL) + { + if (TARGET_64BIT) + tmp = gen_split_stack_sibcall_di (function, call_done); + else + tmp = gen_split_stack_sibcall_si (function, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + else + { + if (!s390_comparison (cond, VOIDmode)) + internal_error ("bad split_stack_call_zarch cond"); + if (TARGET_64BIT) + tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done); + else + tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* Go to .rodata. */ + insn = emit_insn_after (gen_pool_section_start (), insn); + + /* Now, we'll emit parameters to __morestack. First, align to pointer size + (this mirrors the alignment done in __morestack - don't touch it). */ + insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn); + + insn = emit_label_after (parmbase, insn); + + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, frame_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Second parameter is size of the arguments passed on stack that + __morestack has to copy to the new stack (does not include varargs). */ + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, args_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Third parameter is offset between start of the parameter block + and function body to be called by __morestack. */ + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); + tmp = gen_rtx_CONST (Pmode, + gen_rtx_MINUS (Pmode, tmp2, tmp)); + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, tmp), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* Return from .rodata. 
*/ + insn = emit_insn_after (gen_pool_section_end (), insn); + + delete_insn (orig_insn); +} + +/* We may have to tell the dataflow pass that the split stack prologue + is initializing a register. */ + +static void +s390_live_on_entry (bitmap regs) +{ + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + gcc_assert (flag_split_stack); + bitmap_set_bit (regs, 1); + } +} + /* Return true if the function can use simple_return to return outside of a shrink-wrapped region. At present shrink-wrapping is supported in all cases. */ @@ -11541,6 +11987,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } + if (flag_split_stack + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) + == NULL) + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) + { + rtx reg; + rtx_insn *seq; + + reg = gen_reg_rtx (Pmode); + cfun->machine->split_stack_varargs_pointer = reg; + + start_sequence (); + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); + seq = get_insns (); + end_sequence (); + + push_topmost_sequence (); + emit_insn_after (seq, entry_of_function ()); + pop_topmost_sequence (); + } + /* Find the overflow area. FIXME: This currently is too pessimistic when the vector ABI is enabled. In that case we *always* set up the overflow area @@ -11549,7 +12016,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG || TARGET_VX_ABI) { - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) + t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer); + else + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); off = INTVAL (crtl->args.arg_offset_rtx); off = off < 0 ? 
0 : off; @@ -13158,6 +13628,56 @@ s390_reorg (void) } } + if (flag_split_stack) + { + rtx_insn *insn; + + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) + { + /* Look for the split-stack fake jump instructions. */ + if (!JUMP_P(insn)) + continue; + if (GET_CODE (PATTERN (insn)) != PARALLEL + || XVECLEN (PATTERN (insn), 0) != 2) + continue; + rtx set = XVECEXP (PATTERN (insn), 0, 1); + if (GET_CODE (set) != SET) + continue; + rtx unspec = XEXP(set, 1); + if (GET_CODE (unspec) != UNSPEC_VOLATILE) + continue; + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ESA + && XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ZARCH) + continue; + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); + rtx function = XVECEXP (unspec, 0, 0); + rtx frame_size = XVECEXP (unspec, 0, 1); + rtx args_size = XVECEXP (unspec, 0, 2); + rtx pc_src = XEXP (set_pc, 1); + rtx call_done, cond = NULL_RTX; + if (GET_CODE (pc_src) == IF_THEN_ELSE) + { + cond = XEXP (pc_src, 0); + call_done = XEXP (XEXP (pc_src, 1), 0); + } + else + call_done = XEXP (pc_src, 0); + if (XINT (unspec, 1) == UNSPECV_SPLIT_STACK_CALL_ESA) + s390_expand_split_stack_call_esa (insn, + call_done, + function, + frame_size, + args_size); + else + s390_expand_split_stack_call_zarch (insn, + call_done, + function, + frame_size, + args_size, + cond); + } + } + /* Try to optimize prologue and epilogue further. */ s390_optimize_prologue (); @@ -14469,6 +14989,9 @@ s390_asm_file_end (void) s390_vector_abi); #endif file_end_indicate_exec_stack (); + + if (flag_split_stack) + file_end_indicate_split_stack (); } /* Return true if TYPE is a vector bool type. 
*/ @@ -14724,6 +15247,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry + #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ s390_use_by_pieces_infrastructure_p diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 0ebefd6..15c6eed 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -114,6 +114,9 @@ UNSPEC_SP_SET UNSPEC_SP_TEST + ; Split stack support + UNSPEC_STACK_CHECK + ; Test Data Class (TDC) UNSPEC_TDC_INSN @@ -276,6 +279,12 @@ ; Set and get floating point control register UNSPECV_SFPC UNSPECV_EFPC + + ; Split stack support + UNSPECV_SPLIT_STACK_CALL_ZARCH + UNSPECV_SPLIT_STACK_CALL_ESA + UNSPECV_SPLIT_STACK_SIBCALL + UNSPECV_SPLIT_STACK_MARKER ]) ;; @@ -10909,3 +10918,127 @@ "TARGET_Z13" "lcbb\t%0,%1,%b2" [(set_attr "op_type" "VRX")]) + +; Handle -fsplit-stack. 
+ +(define_expand "split_stack_prologue" + [(const_int 0)] + "" +{ + s390_expand_split_stack_prologue (); + DONE; +}) + +(define_insn "split_stack_call_esa" + [(set (pc) (label_ref (match_operand 0 "" ""))) + (set (reg:SI 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL_ESA))] + "!TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "32")]) + +(define_insn "split_stack_call_zarch_<mode>" + [(set (pc) (label_ref (match_operand 0 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL_ZARCH))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +(define_insn "split_stack_cond_call_zarch_<mode>" + [(set (pc) + (if_then_else + (match_operand 4 "" "") + (label_ref (match_operand 0 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL_ZARCH))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +;; If there are operand 0 bytes available on the stack, jump to +;; operand 1. + +(define_expand "split_stack_space_check" + [(set (pc) (if_then_else + (ltu (minus (reg 15) + (match_operand 0 "register_operand")) + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) + (label_ref (match_operand 1)) + (pc)))] + "" +{ + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 
0x38 : 0x20; + rtx tp = s390_get_thread_pointer (); + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); + rtx reg = gen_reg_rtx (Pmode); + rtx cc; + if (TARGET_64BIT) + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); + else + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); + cc = s390_emit_compare (GT, reg, guard); + s390_emit_jump (operands[1], cc); + + DONE; +}) + +;; A basr for use in split stack prologue. + +(define_insn "split_stack_sibcall_basr" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:SI 1) (unspec_volatile [(match_operand 0 "register_operand" "a")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "!TARGET_CPU_ZARCH" + "basr\t%%r1, %0" + [(set_attr "op_type" "RR") + (set_attr "type" "jsr")]) + +;; A jg with minimal fuss for use in split stack prologue. + +(define_insn "split_stack_sibcall_<mode>" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; Also a conditional one. + +(define_insn "split_stack_cond_sibcall_<mode>" + [(set (pc) + (if_then_else + (match_operand 1 "" "") + (label_ref (match_operand 2 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg%C1\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; An unusual nop instruction used to mark functions with no stack frames +;; as split-stack aware. 
+ +(define_insn "split_stack_marker" + [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)] + "" + "nopr\t%%r15" + [(set_attr "op_type" "RR")]) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index f66646c..ff60571 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net> + + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. + * config/s390/morestack.S: New file. + * config/s390/t-stack-s390: New file. + * generic-morestack.c (__splitstack_find): Add s390-specific code. + 2015-12-18 Andris Pavenis <andris.pavenis@iki.fi> * config.host: Add *-*-msdosdjgpp to lists of i[34567]86-*-* diff --git a/libgcc/config.host b/libgcc/config.host index 0a3b879..ce6d259 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1105,11 +1105,11 @@ rx-*-elf) tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" ;; s390-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" md_unwind_header=s390/linux-unwind.h ;; s390x-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" if test "${host_address}" = 32; then tmake_file="${tmake_file} s390/32/t-floattodi" fi diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S new file mode 100644 index 0000000..8e26c66 --- /dev/null +++ b/libgcc/config/s390/morestack.S @@ -0,0 +1,718 @@ +# s390 support for -fsplit-stack. +# Copyright (C) 2015 Free Software Foundation, Inc. +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version.
+ +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. + +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation. + +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +# <http://www.gnu.org/licenses/>. + +# Excess space needed to call ld.so resolver for lazy plt +# resolution. Go uses sigaltstack so this doesn't need to +# also cover signal frame size. +#define BACKOFF 0x1000 + +# The __morestack function. + + .global __morestack + .hidden __morestack + + .type __morestack,@function + +__morestack: +.LFB1: + .cfi_startproc + + +#ifndef __s390x__ + + +# The 31-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0,__gcc_personality_v0 + .cfi_lsda 0,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x48 + .cfi_offset %r7, -0x44 + .cfi_offset %r8, -0x40 + .cfi_offset %r9, -0x3c + .cfi_offset %r10, -0x38 + .cfi_offset %r11, -0x34 + .cfi_offset %r12, -0x30 + .cfi_offset %r13, -0x2c + .cfi_offset %r14, -0x28 + .cfi_offset %r15, -0x24 + lr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + ahi %r15, -0x60 # 0x60 for standard frame. + st %r11, 0(%r15) # Save back chain. + lr %r8, %r0 # Save %r0 (static chain). 
+ + basr %r13, 0 # .Lmsl to %r13 +.Lmsl: + + # %r1 may point directly to the parameter area (zarch), or right after + # the basr instruction that called us (esa). In the first case, + # the pointer is already aligned. In the second case, we may need to + # align it up to 4 bytes to get to the parameters. + la %r10, 3(%r1) + lhi %r7, -4 + nr %r10, %r7 # %r10 = (%r1 + 3) & ~3 + + l %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 # Extract thread pointer. + l %r1, 0x20(%r1) # Get stack bounduary + ar %r1, %r7 # Stack bounduary + frame size + a %r1, 4(%r10) # + stack param size + clr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + l %r1, .Lmslbs-.Lmsl(%r13) # __morestack_block_signals +#ifdef __PIC__ + bas %r14, 0(%r1, %r13) +#else + basr %r14, %r1 +#endif + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + ahi %r7, BACKOFF # Bump requested size a bit. + st %r7, 0x40(%r11) # Stuff frame size on stack. + la %r2, 0x40(%r11) # Pass its address as parameter. + la %r3, 0x60(%r11) # Caller's stack parameters. + l %r4, 4(%r10) # Size of stack paremeters. + + l %r1, .Lmslgms-.Lmsl(%r13) # __generic_morestack +#ifdef __PIC__ + bas %r14, 0(%r1, %r13) +#else + basr %r14, %r1 +#endif + + lr %r15, %r2 # Switch to the new stack. + ahi %r15, -0x60 # Make a stack frame on it. + st %r11, 0(%r15) # Save back chain. + + s %r2, 0x40(%r11) # The end of stack space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHB0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + l %r1, .Lmslubs-.Lmsl(%r13) # __morestack_unblock_signals +#ifdef __PIC__ + bas %r14, 0(%r1, %r13) +#else + basr %r14, %r1 +#endif + + lr %r0, %r8 # Static chain. + lm %r2, %r6, 0x8(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. 
+ a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12: Indeterminate. + # %r13: Literal pool address. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stm %r2, %r3, 0x8(%r11) # Save return registers. + + l %r1, .Lmslbs-.Lmsl(%r13) # __morestack_block_signals +#ifdef __PIC__ + bas %r14, 0(%r1, %r13) +#else + basr %r14, %r1 +#endif + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0x60 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x40(%r11) + l %r1, .Lmslgrs-.Lmsl(%r13) # __generic_releasestack +#ifdef __PIC__ + bas %r14, 0(%r1, %r13) +#else + basr %r14, %r1 +#endif + + s %r2, 0x40(%r11) # Subtract available space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHE0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0x60 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. + lr %r15, %r11 + ahi %r15, -0x60 + + l %r1, .Lmslubs-.Lmsl(%r13) # __morestack_unblock_signals +#ifdef __PIC__ + bas %r14, 0(%r1, %r13) +#else + basr %r14, %r1 +#endif + + lm %r2, %r15, 0x8(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. 
+ +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + l %r9, 0x4(%r10) # Load stack parameter size. + ltr %r9, %r9 # And check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sr %r15, %r9 # Make space on the stack. + la %r8, 0x60(%r15) # Destination. + la %r12, 0x60(%r11) # Source. + lr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lr %r2, %r11 # Stack pointer after resume. + l %r1, .Lmslgfs-.Lmsl(%r13) # __generic_findstack +#ifdef __PIC__ + bas %r14, 0(%r1, %r13) +#else + basr %r14, %r1 +#endif + lr %r3, %r11 # Get the stack pointer. + sr %r3, %r2 # Subtract available space. + ahi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. + st %r3, 0x20(%r1) # Save the new stack boundary. + + lr %r2, %r6 # Exception header. 
+#ifdef __PIC__ + l %r12, .Lmslgot-.Lmsl(%r13) + ar %r12, %r13 + l %r1, .Lmslunw-.Lmsl(%r13) + bas %r14, 0(%r1, %r12) +#else + l %r1, .Lmslunw-.Lmsl(%r13) + basr %r14, %r1 +#endif + +# Literal pool. + +.align 4 +#ifdef __PIC__ +.Lmslbs: + .long __morestack_block_signals-.Lmsl +.Lmslubs: + .long __morestack_unblock_signals-.Lmsl +.Lmslgms: + .long __generic_morestack-.Lmsl +.Lmslgrs: + .long __generic_releasestack-.Lmsl +.Lmslgfs: + .long __generic_findstack-.Lmsl +.Lmslunw: + .long _Unwind_Resume@PLTOFF +.Lmslgot: + .long _GLOBAL_OFFSET_TABLE_-.Lmsl +#else +.Lmslbs: + .long __morestack_block_signals +.Lmslubs: + .long __morestack_unblock_signals +.Lmslgms: + .long __generic_morestack +.Lmslgrs: + .long __generic_releasestack +.Lmslgfs: + .long __generic_findstack +.Lmslunw: + .long _Unwind_Resume +#endif + +#else /* defined(__s390x__) */ + + +# The 64-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0x3,__gcc_personality_v0 + .cfi_lsda 0x3,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x70 + .cfi_offset %r7, -0x68 + .cfi_offset %r8, -0x60 + .cfi_offset %r9, -0x58 + .cfi_offset %r10, -0x50 + .cfi_offset %r11, -0x48 + .cfi_offset %r12, -0x40 + .cfi_offset %r13, -0x38 + .cfi_offset %r14, -0x30 + .cfi_offset %r15, -0x28 + lgr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + aghi %r15, -0xa0 # 0xa0 for standard frame. + stg %r11, 0(%r15) # Save back chain. + lgr %r8, %r0 # Save %r0 (static chain). + lgr %r10, %r1 # Save %r1 (address of parameter block). + + lg %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. 
+ lg %r1, 0x38(%r1) # Get stack bounduary + agr %r1, %r7 # Stack bounduary + frame size + ag %r1, 8(%r10) # + stack param size + clgr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + aghi %r7, BACKOFF # Bump requested size a bit. + stg %r7, 0x80(%r11) # Stuff frame size on stack. + la %r2, 0x80(%r11) # Pass its address as parameter. + la %r3, 0xa0(%r11) # Caller's stack parameters. + lg %r4, 8(%r10) # Size of stack paremeters. + brasl %r14, __generic_morestack + + lgr %r15, %r2 # Switch to the new stack. + aghi %r15, -0xa0 # Make a stack frame on it. + stg %r11, 0(%r15) # Save back chain. + + sg %r2, 0x80(%r11) # The end of stack space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHB0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lgr %r0, %r8 # Static chain. + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stg %r2, 0x10(%r11) # Save return register. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0xa0 bytes, or stack parameters would + # be passed wrong. 
Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x80(%r11) + brasl %r14, __generic_releasestack + + sg %r2, 0x80(%r11) # Subtract available space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHE0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0xa0 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. + lgr %r15, %r11 + aghi %r15, -0xa0 + + brasl %r14, __morestack_unblock_signals + + lmg %r2, %r15, 0x10(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + lg %r9, 0x8(%r10) # Load stack parameter size. + ltgr %r9, %r9 # Check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sgr %r15, %r9 # Make space on the stack. + la %r8, 0xa0(%r15) # Destination. + la %r12, 0xa0(%r11) # Source. + lgr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. 
+ .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lgr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lgr %r3, %r11 # Get the stack pointer. + sgr %r3, %r2 # Subtract available space. + aghi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + stg %r3, 0x38(%r1) # Save the new stack boundary. + + lgr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#endif /* defined(__s390x__) */ + + .cfi_endproc + .size __morestack, . - __morestack + + +# The exception table. This tells the personality routine to execute +# the exception handler. + + .section .gcc_except_table,"a",@progbits + .align 4 +.LLSDA1: + .byte 0xff # @LPStart format (omit) + .byte 0xff # @TType format (omit) + .byte 0x1 # call-site format (uleb128) + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length +.LLSDACSB1: + .uleb128 .LEHB0-.LFB1 # region 0 start + .uleb128 .LEHE0-.LEHB0 # length + .uleb128 .L1-.LFB1 # landing pad + .uleb128 0 # action +.LLSDACSE1: + + + .global __gcc_personality_v0 +#ifdef __PIC__ + # Build a position independent reference to the basic + # personality function. 
+ .hidden DW.ref.__gcc_personality_v0 + .weak DW.ref.__gcc_personality_v0 + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat + .type DW.ref.__gcc_personality_v0, @object +DW.ref.__gcc_personality_v0: +#ifndef __LP64__ + .align 4 + .size DW.ref.__gcc_personality_v0, 4 + .long __gcc_personality_v0 +#else + .align 8 + .size DW.ref.__gcc_personality_v0, 8 + .quad __gcc_personality_v0 +#endif +#endif + + + +# Initialize the stack test value when the program starts or when a +# new thread starts. We don't know how large the main stack is, so we +# guess conservatively. We might be able to use getrlimit here. + + .text + .global __stack_split_initialize + .hidden __stack_split_initialize + + .type __stack_split_initialize, @function + +__stack_split_initialize: + +#ifndef __s390x__ + + ear %r1, %a0 + lr %r0, %r15 + ahi %r0, -0x4000 # We should have at least 16K. + st %r0, 0x20(%r1) + + lr %r2, %r15 + lhi %r3, 0x4000 +#ifdef __PIC__ + # Cannot do a tail call - we'll go through PLT, so we need GOT address + # in %r12, which is callee-saved. + stm %r12, %r15, 0x30(%r15) + basr %r13, 0 +.Lssi0: + ahi %r15, -0x60 + l %r12, .Lssi2-.Lssi0(%r13) + ar %r12, %r13 + l %r1, .Lssi1-.Lssi0(%r13) + bas %r14, 0(%r1, %r12) + lm %r12, %r15, 0x90(%r15) + br %r14 + +.align 4 +.Lssi1: + .long __generic_morestack_set_initial_sp@PLTOFF +.Lssi2: + .long _GLOBAL_OFFSET_TABLE_-.Lssi0 + +#else + basr %r1, 0 +.Lssi0: + l %r1, .Lssi1-.Lssi0(%r1) + br %r1 # Tail call + +.align 4 +.Lssi1: + .long __generic_morestack_set_initial_sp +#endif + +#else /* defined(__s390x__) */ + + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lgr %r0, %r15 + aghi %r0, -0x4000 # We should have at least 16K. + stg %r0, 0x38(%r1) + + lgr %r2, %r15 + lghi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#endif /* defined(__s390x__) */ + + .size __stack_split_initialize, . 
- __stack_split_initialize + +# Routines to get and set the guard, for __splitstack_getcontext, +# __splitstack_setcontext, and __splitstack_makecontext. + +# void *__morestack_get_guard (void) returns the current stack guard. + .text + .global __morestack_get_guard + .hidden __morestack_get_guard + + .type __morestack_get_guard,@function + +__morestack_get_guard: + +#ifndef __s390x__ + ear %r1, %a0 + l %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_get_guard, . - __morestack_get_guard + +# void __morestack_set_guard (void *) sets the stack guard. + .global __morestack_set_guard + .hidden __morestack_set_guard + + .type __morestack_set_guard,@function + +__morestack_set_guard: + +#ifndef __s390x__ + ear %r1, %a0 + st %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + stg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_set_guard, . - __morestack_set_guard + +# void *__morestack_make_guard (void *, size_t) returns the stack +# guard value for a stack. + .global __morestack_make_guard + .hidden __morestack_make_guard + + .type __morestack_make_guard,@function + +__morestack_make_guard: + +#ifndef __s390x__ + sr %r2, %r3 + ahi %r2, BACKOFF +#else + sgr %r2, %r3 + aghi %r2, BACKOFF +#endif + br %r14 + + .size __morestack_make_guard, . - __morestack_make_guard + +# Make __stack_split_initialize a high priority constructor. 
+ + .section .ctors.65535,"aw",@progbits + +#ifndef __LP64__ + .align 4 + .long __stack_split_initialize + .long __morestack_load_mmap +#else + .align 8 + .quad __stack_split_initialize + .quad __morestack_load_mmap +#endif + + .section .note.GNU-stack,"",@progbits + .section .note.GNU-split-stack,"",@progbits + .section .note.GNU-no-split-stack,"",@progbits diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 new file mode 100644 index 0000000..4c959b0 --- /dev/null +++ b/libgcc/config/s390/t-stack-s390 @@ -0,0 +1,2 @@ +# Makefile fragment to support -fsplit-stack for s390. +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c index a10559b..8109c1a 100644 --- a/libgcc/generic-morestack.c +++ b/libgcc/generic-morestack.c @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, #elif defined (__i386__) nsp -= 6 * sizeof (void *); #elif defined __powerpc64__ +#elif defined __s390x__ + nsp -= 2 * 160; +#elif defined __s390__ + nsp -= 2 * 96; #else #error "unrecognized target" #endif -- 2.6.4 ^ permalink raw reply [flat|nested] 55+ messages in thread
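For readers following the guard arithmetic in morestack.S above, here is a minimal C sketch of the two calculations involved: the comparison at the top of __morestack that decides whether a new segment is needed, and the value computed by __morestack_make_guard. This is only an illustrative model and not part of the patch. The function names make_guard and needs_new_segment and the use of uintptr_t are assumptions made for the example; BACKOFF matches the value defined in morestack.S.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as the BACKOFF define in morestack.S. */
#define BACKOFF 0x1000

/* Hypothetical model of __morestack_make_guard (void *, size_t):
   first argument minus second argument, plus BACKOFF
   (sr/ahi on 31-bit, sgr/aghi on 64-bit).  */
static uintptr_t
make_guard (uintptr_t stack, size_t size)
{
  return stack - size + BACKOFF;
}

/* Hypothetical model of the check at the top of __morestack:
   guard + frame size + stack parameter size is compared with the
   stack pointer (clr/clgr), and jle branches to .Lnoalloc, so a new
   segment is requested only when the sum is strictly greater.  */
static int
needs_new_segment (uintptr_t sp, uintptr_t guard,
                   size_t frame_size, size_t args_size)
{
  return guard + frame_size + args_size > sp;
}
```

With a stack address of 0x100000 and a size of 0x10000, make_guard yields 0xf1000. With that guard and a stack pointer of 0xf2000, a frame of 0x1000 bytes and no stack parameters still fits (the sum equals the stack pointer, which takes the .Lnoalloc path), while a frame of 0x1800 bytes does not.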
* Re: [PATCH 5/5] s390: Add -fsplit-stack support
  2016-01-02 19:17 ` [PATCH 5/5] s390: Add -fsplit-stack support Marcin Kościelnicki
@ 2016-01-15 18:39   ` Andreas Krebbel
  2016-01-15 21:08     ` Marcin Kościelnicki
  2016-01-16 13:46     ` [PATCH] " Marcin Kościelnicki
  0 siblings, 2 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-15 18:39 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

Marcin,

your implementation looks very good to me. Thanks!

But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from the back-end with the next GCC version. So I would prefer if you could remove all the !TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with -march g5/g6. It currently makes the implementation more complicated and would have to be removed anyway in the future. Thanks!

https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html

Bye,

-Andreas-

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
>
> * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> * config/s390/morestack.S: New file.
> * config/s390/t-stack-s390: New file.
> * generic-morestack.c (__splitstack_find): Add s390-specific code.
>
> gcc/ChangeLog:
>
> * common/config/s390/s390-common.c (s390_supports_split_stack):
> New function.
> (TARGET_SUPPORTS_SPLIT_STACK): New macro.
> * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> * config/s390/s390.c (struct machine_function): New field
> split_stack_varargs_pointer.
> (s390_split_branches): Don't split split-stack pseudo-insns, rewire
> split-stack prologue conditional jump instead of splitting it.
> (s390_chunkify_start): Don't reload const pool register on split-stack
> prologue conditional jumps.
> (s390_register_info): Mark r12 as clobbered if it'll be used as temp
> in s390_emit_prologue.
> (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> vararg pointer.
> (morestack_ref): New global.
> (SPLIT_STACK_AVAILABLE): New macro.
> (s390_expand_split_stack_prologue): New function.
> (s390_expand_split_stack_call_esa): New function.
> (s390_expand_split_stack_call_zarch): New function.
> (s390_live_on_entry): New function.
> (s390_va_start): Use split-stack vararg pointer if appropriate.
> (s390_reorg): Lower the split-stack pseudo-insns.
> (s390_asm_file_end): Emit the split-stack note sections.
> (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
> (UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec.
> (UNSPECV_SPLIT_STACK_CALL_ESA): New unspec.
> (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> (UNSPECV_SPLIT_STACK_MARKER): New unspec.
> (split_stack_prologue): New expand.
> (split_stack_call_esa): New insn.
> (split_stack_call_zarch_*): New insn.
> (split_stack_cond_call_zarch_*): New insn.
> (split_stack_space_check): New expand.
> (split_stack_sibcall_basr): New insn.
> (split_stack_sibcall_*): New insn.
> (split_stack_cond_sibcall_*): New insn.
> (split_stack_marker): New insn.
> ---
> gcc/ChangeLog                        |  41 ++
> gcc/common/config/s390/s390-common.c |  14 +
> gcc/config/s390/s390-protos.h        |   1 +
> gcc/config/s390/s390.c               | 538 +++++++++++++++++++++++++-
> gcc/config/s390/s390.md              | 133 +++++++
> libgcc/ChangeLog                     |   7 +
> libgcc/config.host                   |   4 +-
> libgcc/config/s390/morestack.S       | 718 +++++++++++++++++++++++++++++++++++
> libgcc/config/s390/t-stack-s390      |   2 +
> libgcc/generic-morestack.c           |   4 +
> 10 files changed, 1454 insertions(+), 8 deletions(-)
> create mode 100644 libgcc/config/s390/morestack.S
> create mode 100644 libgcc/config/s390/t-stack-s390
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 4c7046f..a4f4dff 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,46 @@
> 2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
>
> + * common/config/s390/s390-common.c (s390_supports_split_stack):
> + New function.
> + (TARGET_SUPPORTS_SPLIT_STACK): New macro.
> + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> + * config/s390/s390.c (struct machine_function): New field
> + split_stack_varargs_pointer.
> + (s390_split_branches): Don't split split-stack pseudo-insns, rewire
> + split-stack prologue conditional jump instead of splitting it.
> + (s390_chunkify_start): Don't reload const pool register on split-stack
> + prologue conditional jumps.
> + (s390_register_info): Mark r12 as clobbered if it'll be used as temp
> + in s390_emit_prologue.
> + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> + vararg pointer.
> + (morestack_ref): New global.
> + (SPLIT_STACK_AVAILABLE): New macro.
> + (s390_expand_split_stack_prologue): New function.
> + (s390_expand_split_stack_call_esa): New function.
> + (s390_expand_split_stack_call_zarch): New function.
> + (s390_live_on_entry): New function.
> + (s390_va_start): Use split-stack vararg pointer if appropriate.
> + (s390_reorg): Lower the split-stack pseudo-insns.
> + (s390_asm_file_end): Emit the split-stack note sections.
> + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> + * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
> + (UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec.
> + (UNSPECV_SPLIT_STACK_CALL_ESA): New unspec.
> + (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> + (UNSPECV_SPLIT_STACK_MARKER): New unspec.
> + (split_stack_prologue): New expand.
> + (split_stack_call_esa): New insn.
> + (split_stack_call_zarch_*): New insn.
> + (split_stack_cond_call_zarch_*): New insn.
> + (split_stack_space_check): New expand.
> + (split_stack_sibcall_basr): New insn.
> + (split_stack_sibcall_*): New insn.
> + (split_stack_cond_sibcall_*): New insn.
> + (split_stack_marker): New insn.
> +
> +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
> +
> * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
> with side effects.
> > diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c > index 4cf0df7..0c468bf 100644 > --- a/gcc/common/config/s390/s390-common.c > +++ b/gcc/common/config/s390/s390-common.c > @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > } > } > > +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. > + We don't verify it, since earlier versions just have padding at > + its place, which works just as well. */ > + > +static bool > +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, > + struct gcc_options *opts ATTRIBUTE_UNUSED) > +{ > + return true; > +} > + > #undef TARGET_DEFAULT_TARGET_FLAGS > #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) > > @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > #undef TARGET_OPTION_INIT_STRUCT > #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct > > +#undef TARGET_SUPPORTS_SPLIT_STACK > +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack > + > struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h > index 962abb1..936e267 100644 > --- a/gcc/config/s390/s390-protos.h > +++ b/gcc/config/s390/s390-protos.h > @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); > extern void s390_emit_prologue (void); > extern void s390_emit_epilogue (bool); > +extern void s390_expand_split_stack_prologue (void); > extern bool s390_can_use_simple_return_insn (void); > extern bool s390_can_use_return_insn (void); > extern void s390_function_profiler (FILE *, int); > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c > index 9dc8d1e..0255eec 100644 > --- a/gcc/config/s390/s390.c > +++ b/gcc/config/s390/s390.c > @@ -426,6 +426,13 @@ struct GTY(()) machine_function > /* True if the current function may 
contain a tbegin clobbering > FPRs. */ > bool tbegin_p; > + > + /* For -fsplit-stack support: A stack local which holds a pointer to > + the stack arguments for a function with a variable number of > + arguments. This is set at the start of the function and is used > + to initialize the overflow_arg_area field of the va_list > + structure. */ > + rtx split_stack_varargs_pointer; > }; > > /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ > @@ -7669,7 +7676,17 @@ s390_split_branches (void) > > pat = PATTERN (insn); > if (GET_CODE (pat) == PARALLEL) > - pat = XVECEXP (pat, 0, 0); > + { > + /* Split stack call pseudo-jump doesn't need splitting. */ > + if (GET_CODE (XVECEXP (pat, 0, 1)) == SET > + && GET_CODE (XEXP (XVECEXP (pat, 0, 1), 1)) == UNSPEC_VOLATILE > + && (XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1) > + == UNSPECV_SPLIT_STACK_CALL_ESA > + || XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1) > + == UNSPECV_SPLIT_STACK_CALL_ZARCH)) > + continue; > + pat = XVECEXP (pat, 0, 0); > + } > if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx) > continue; > > @@ -7692,6 +7709,49 @@ s390_split_branches (void) > if (get_attr_length (insn) <= 4) > continue; > > + if (prologue_epilogue_contains (insn)) > + { > + /* A jump in prologue/epilogue must come from the split-stack > + prologue. It cannot be split - there are no scratch regs > + available at that point. Rewire it instead. 
*/ > + > + rtx_insn *code_label = (rtx_insn *)XEXP (*label, 0); > + gcc_assert (LABEL_P (code_label)); > + rtx_insn *note = NEXT_INSN (code_label); > + gcc_assert (NOTE_P (note)); > + rtx_insn *jump_ss = NEXT_INSN (note); > + gcc_assert (JUMP_P (jump_ss)); > + rtx_insn *barrier = NEXT_INSN (jump_ss); > + gcc_assert (BARRIER_P (barrier)); > + gcc_assert (GET_CODE (SET_SRC (pat)) == IF_THEN_ELSE); > + gcc_assert (GET_CODE (XEXP (SET_SRC (pat), 0)) == LT); > + > + /* step 1 - insert new label after */ > + rtx new_label = gen_label_rtx (); > + emit_label_after (new_label, insn); > + > + /* step 2 - reorder */ > + reorder_insns_nobb (code_label, barrier, insn); > + > + /* step 3 - retarget jump */ > + rtx new_target = gen_rtx_LABEL_REF (VOIDmode, new_label); > + ret = validate_change (insn, label, new_target, 0); > + gcc_assert (ret); > + LABEL_NUSES (new_label)++; > + LABEL_NUSES (code_label)--; > + JUMP_LABEL (insn) = new_label; > + > + /* step 4 - invert jump cc */ > + rtx *pcond = &XEXP (SET_SRC (pat), 0); > + rtx new_cond = gen_rtx_fmt_ee (GE, VOIDmode, > + XEXP (*pcond, 0), > + XEXP (*pcond, 1)); > + ret = validate_change (insn, pcond, new_cond, 0); > + gcc_assert (ret); > + > + continue; > + } > + > /* We are going to use the return register as scratch register, > make sure it will be saved/restored by the prologue/epilogue. */ > cfun_frame_layout.save_return_addr_p = 1; > @@ -8736,7 +8796,7 @@ s390_chunkify_start (void) > } > /* If we have a direct jump (conditional or unconditional), > check all potential targets. 
*/ > - else if (JUMP_P (insn)) > + else if (JUMP_P (insn) && !prologue_epilogue_contains (insn)) > { > rtx pat = PATTERN (insn); > > @@ -9316,9 +9376,13 @@ s390_register_info () > cfun_frame_layout.high_fprs++; > } > > - if (flag_pic) > - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] > - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); > + /* Register 12 is used for GOT address, but also as temp in prologue > + for split-stack stdarg functions (unless r14 is available). */ > + clobbered_regs[12] > + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) > + || (flag_split_stack && cfun->stdarg > + && (crtl->is_leaf || TARGET_TPF_PROFILING > + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); > > clobbered_regs[BASE_REGNUM] > |= (cfun->machine->base_reg > @@ -10446,6 +10510,8 @@ s390_emit_prologue (void) > && !crtl->is_leaf > && !TARGET_TPF_PROFILING) > temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); > + else if (flag_split_stack && cfun->stdarg) > + temp_reg = gen_rtx_REG (Pmode, 12); > else > temp_reg = gen_rtx_REG (Pmode, 1); > > @@ -10939,6 +11005,386 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) > SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); > } > > +/* -fsplit-stack support. */ > + > +/* A SYMBOL_REF for __morestack. */ > +static GTY(()) rtx morestack_ref; > + > +/* When using -fsplit-stack, the allocation routines set a field in > + the TCB to the bottom of the stack plus this much space, measured > + in bytes. */ > + > +#define SPLIT_STACK_AVAILABLE 1024 > + > +/* Emit -fsplit-stack prologue, which goes before the regular function > + prologue. */ > + > +void > +s390_expand_split_stack_prologue (void) > +{ > + rtx r1, guard, cc; > + rtx_insn *insn; > + /* Offset from thread pointer to __private_ss. */ > + int psso = TARGET_64BIT ? 0x38 : 0x20; > + /* Pointer size in bytes. */ > + /* Frame size and argument size - the two parameters to __morestack. 
*/ > + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; > + /* Align argument size to 8 bytes - simplifies __morestack code. */ > + HOST_WIDE_INT args_size = crtl->args.size >= 0 > + ? ((crtl->args.size + 7) & ~7) > + : 0; > + /* Label to jump to when no __morestack call is necessary. */ > + rtx_code_label *enough = NULL; > + /* Label to be called by __morestack. */ > + rtx_code_label *call_done = NULL; > + /* 1 if __morestack called conditionally, 0 if always. */ > + int conditional = 0; > + > + gcc_assert (flag_split_stack && reload_completed); > + > + r1 = gen_rtx_REG (Pmode, 1); > + > + /* If no stack frame will be allocated, don't do anything. */ > + if (!frame_size) > + { > + /* But emit a marker that will let linker and indirect function > + calls recognise this function as split-stack aware. */ > + emit_insn(gen_split_stack_marker()); > + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + /* If va_start is used, just use r15. */ > + emit_move_insn (r1, > + gen_rtx_PLUS (Pmode, stack_pointer_rtx, > + GEN_INT (STACK_POINTER_OFFSET))); > + } > + return; > + } > + > + if (morestack_ref == NULL_RTX) > + { > + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); > + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL > + | SYMBOL_FLAG_FUNCTION); > + } > + > + if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu)) > + { > + /* If frame_size will fit in an add instruction, do a stack space > + check, and only call __morestack if there's not enough space. */ > + conditional = 1; > + > + /* Get thread pointer. r1 is the only register we can always destroy - r0 > + could contain a static chain (and cannot be used to address memory > + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ > + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); > + /* Aim at __private_ss. 
*/
> +      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
> +
> +      /* If less than 1 KiB is used, skip the addition and compare directly
> +         with __private_ss. */
> +      if (frame_size > SPLIT_STACK_AVAILABLE)
> +        {
> +          emit_move_insn (r1, guard);
> +          if (TARGET_64BIT)
> +            emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size)));
> +          else
> +            emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size)));
> +          guard = r1;
> +        }
> +
> +      if (TARGET_CPU_ZARCH)
> +        {
> +          rtx tmp;
> +
> +          /* Compare the (maybe adjusted) guard with the stack pointer. */
> +          cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
> +
> +          call_done = gen_label_rtx ();
> +
> +          if (TARGET_64BIT)
> +            tmp = gen_split_stack_cond_call_zarch_di (call_done,
> +                                                      morestack_ref,
> +                                                      GEN_INT (frame_size),
> +                                                      GEN_INT (args_size),
> +                                                      cc);
> +          else
> +            tmp = gen_split_stack_cond_call_zarch_si (call_done,
> +                                                      morestack_ref,
> +                                                      GEN_INT (frame_size),
> +                                                      GEN_INT (args_size),
> +                                                      cc);
> +
> +
> +          insn = emit_jump_insn (tmp);
> +          JUMP_LABEL (insn) = call_done;
> +
> +          /* Mark the jump as very unlikely to be taken. */
> +          add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
> +        }
> +      else
> +        {
> +          /* Compare the (maybe adjusted) guard with the stack pointer. */
> +          cc = s390_emit_compare (GE, stack_pointer_rtx, guard);
> +
> +          enough = gen_label_rtx ();
> +          insn = s390_emit_jump (enough, cc);
> +          JUMP_LABEL (insn) = enough;
> +
> +          /* Mark the jump as very likely to be taken. */
> +          add_int_reg_note (insn, REG_BR_PROB,
> +                            REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);
> +        }
> +    }
> +
> +  if (call_done == NULL)
> +    {
> +      rtx tmp;
> +      call_done = gen_label_rtx ();
> +
> +      /* Now, we need to call __morestack. It has very special calling
> +         conventions: it preserves param/return/static chain registers for
> +         calling main function body, and looks for its own parameters
> +         at %r1 (after aligning it up to a 4-byte boundary for 31-bit mode).
*/ > + if (TARGET_64BIT) > + tmp = gen_split_stack_call_zarch_di (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size)); > + else if (TARGET_CPU_ZARCH) > + tmp = gen_split_stack_call_zarch_si (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size)); > + else > + tmp = gen_split_stack_call_esa (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size)); > + insn = emit_jump_insn (tmp); > + JUMP_LABEL (insn) = call_done; > + emit_barrier (); > + } > + > + /* __morestack will call us here. */ > + > + if (enough != NULL) > + { > + emit_label (enough); > + LABEL_NUSES (enough) = 1; > + } > + > + if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + /* If va_start is used, and __morestack was not called, just use r15. */ > + emit_move_insn (r1, > + gen_rtx_PLUS (Pmode, stack_pointer_rtx, > + GEN_INT (STACK_POINTER_OFFSET))); > + } > + > + emit_label (call_done); > + LABEL_NUSES (call_done) = 1; > +} > + > +/* Generates split-stack call sequence for esa mode, along with its parameter > + block. */ > + > +static void > +s390_expand_split_stack_call_esa (rtx_insn *orig_insn, > + rtx call_done, > + rtx function, > + rtx frame_size, > + rtx args_size) > +{ > + int psize = GET_MODE_SIZE (Pmode); > + /* Labels for literal base, literal __morestack, param base. */ > + rtx litbase = gen_label_rtx(); > + rtx litms = gen_label_rtx(); > + rtx parmbase = gen_label_rtx(); > + rtx r1 = gen_rtx_REG (Pmode, 1); > + rtx_insn *insn = orig_insn; > + rtx tmp, tmp2; > + > + /* No brasl, we have to make do using basr and a literal pool. */ > + > + /* %r1 = litbase. 
*/ > + insn = emit_insn_after (gen_main_base_31_small (r1, litbase), insn); > + insn = emit_label_after (litbase, insn); > + > + /* a %r1, .Llitms-.Llitbase(%r1) */ > + tmp = gen_rtx_LABEL_REF (Pmode, litbase); > + tmp2 = gen_rtx_LABEL_REF (Pmode, litms); > + tmp = gen_rtx_UNSPEC (Pmode, > + gen_rtvec (2, tmp2, tmp), > + UNSPEC_POOL_OFFSET); > + tmp = gen_rtx_CONST (Pmode, tmp); > + tmp = gen_rtx_MEM (Pmode, gen_rtx_PLUS (Pmode, r1, tmp)); > + insn = emit_insn_after (gen_addsi3 (r1, r1, tmp), insn); > + add_reg_note (insn, REG_LABEL_OPERAND, litbase); > + add_reg_note (insn, REG_LABEL_OPERAND, litms); > + LABEL_NUSES (litbase)++; > + LABEL_NUSES (litms)++; > + > + /* basr %r1, %r1 */ > + tmp = gen_split_stack_sibcall_basr (r1, call_done); > + insn = emit_jump_insn_after (tmp, insn); > + JUMP_LABEL (insn) = call_done; > + LABEL_NUSES (call_done)++; > + > + /* __morestack will mangle its return register to get our parameters. */ > + > + /* Now, we'll emit parameters to __morestack. First, align to pointer size > + (this mirrors the alignment done in __morestack - don't touch it). */ > + insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn); > + > + insn = emit_label_after (parmbase, insn); > + > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, frame_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Second parameter is size of the arguments passed on stack that > + __morestack has to copy to the new stack (does not include varargs). */ > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, args_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Third parameter is offset between start of the parameter block > + and function body to be called by __morestack. 
*/ > + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); > + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); > + tmp = gen_rtx_CONST (Pmode, > + gen_rtx_MINUS (Pmode, tmp2, tmp)); > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, tmp), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + add_reg_note (insn, REG_LABEL_OPERAND, call_done); > + LABEL_NUSES (call_done)++; > + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); > + LABEL_NUSES (parmbase)++; > + > + /* We take advantage of the already-existing literal pool here to stuff > + the __morestack address for use in the call above. */ > + > + insn = emit_label_after (litms, insn); > + > + /* We actually emit __morestack - litbase to support PIC. Since it > + works just as well for non-PIC, we use it in all cases. */ > + > + tmp = gen_rtx_LABEL_REF (Pmode, litbase); > + tmp = gen_rtx_CONST (Pmode, > + gen_rtx_MINUS (Pmode, function, tmp)); > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, tmp), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + add_reg_note (insn, REG_LABEL_OPERAND, litbase); > + LABEL_NUSES (litbase)++; > + > + delete_insn (orig_insn); > +} > + > +/* Generates split-stack call sequence for zarch mode, along with its parameter > + block. */ > + > +static void > +s390_expand_split_stack_call_zarch (rtx_insn *orig_insn, > + rtx call_done, > + rtx function, > + rtx frame_size, > + rtx args_size, > + rtx cond) > +{ > + int psize = GET_MODE_SIZE (Pmode); > + rtx_insn *insn = orig_insn; > + rtx parmbase = gen_label_rtx(); > + rtx r1 = gen_rtx_REG (Pmode, 1); > + rtx tmp, tmp2; > + > + /* %r1 = litbase. */ > + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); > + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); > + LABEL_NUSES (parmbase)++; > + > + /* jg<cond> __morestack. 
*/ > + if (cond == NULL) > + { > + if (TARGET_64BIT) > + tmp = gen_split_stack_sibcall_di (function, call_done); > + else > + tmp = gen_split_stack_sibcall_si (function, call_done); > + insn = emit_jump_insn_after (tmp, insn); > + } > + else > + { > + if (!s390_comparison (cond, VOIDmode)) > + internal_error ("bad split_stack_call_zarch cond"); > + if (TARGET_64BIT) > + tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done); > + else > + tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done); > + insn = emit_jump_insn_after (tmp, insn); > + } > + JUMP_LABEL (insn) = call_done; > + LABEL_NUSES (call_done)++; > + > + /* Go to .rodata. */ > + insn = emit_insn_after (gen_pool_section_start (), insn); > + > + /* Now, we'll emit parameters to __morestack. First, align to pointer size > + (this mirrors the alignment done in __morestack - don't touch it). */ > + insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn); > + > + insn = emit_label_after (parmbase, insn); > + > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, frame_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Second parameter is size of the arguments passed on stack that > + __morestack has to copy to the new stack (does not include varargs). */ > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, args_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Third parameter is offset between start of the parameter block > + and function body to be called by __morestack. 
*/ > + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); > + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); > + tmp = gen_rtx_CONST (Pmode, > + gen_rtx_MINUS (Pmode, tmp2, tmp)); > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, tmp), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + add_reg_note (insn, REG_LABEL_OPERAND, call_done); > + LABEL_NUSES (call_done)++; > + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); > + LABEL_NUSES (parmbase)++; > + > + /* Return from .rodata. */ > + insn = emit_insn_after (gen_pool_section_end (), insn); > + > + delete_insn (orig_insn); > +} > + > +/* We may have to tell the dataflow pass that the split stack prologue > + is initializing a register. */ > + > +static void > +s390_live_on_entry (bitmap regs) > +{ > + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + gcc_assert (flag_split_stack); > + bitmap_set_bit (regs, 1); > + } > +} > + > /* Return true if the function can use simple_return to return outside > of a shrink-wrapped region. At present shrink-wrapping is supported > in all cases. */ > @@ -11541,6 +11987,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) > expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); > } > > + if (flag_split_stack > + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) > + == NULL) > + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) > + { > + rtx reg; > + rtx_insn *seq; > + > + reg = gen_reg_rtx (Pmode); > + cfun->machine->split_stack_varargs_pointer = reg; > + > + start_sequence (); > + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); > + seq = get_insns (); > + end_sequence (); > + > + push_topmost_sequence (); > + emit_insn_after (seq, entry_of_function ()); > + pop_topmost_sequence (); > + } > + > /* Find the overflow area. > FIXME: This currently is too pessimistic when the vector ABI is > enabled. 
In that case we *always* set up the overflow area > @@ -11549,7 +12016,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) > || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG > || TARGET_VX_ABI) > { > - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); > + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) > + t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer); > + else > + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); > > off = INTVAL (crtl->args.arg_offset_rtx); > off = off < 0 ? 0 : off; > @@ -13158,6 +13628,56 @@ s390_reorg (void) > } > } > > + if (flag_split_stack) > + { > + rtx_insn *insn; > + > + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) > + { > + /* Look for the split-stack fake jump instructions. */ > + if (!JUMP_P(insn)) > + continue; > + if (GET_CODE (PATTERN (insn)) != PARALLEL > + || XVECLEN (PATTERN (insn), 0) != 2) > + continue; > + rtx set = XVECEXP (PATTERN (insn), 0, 1); > + if (GET_CODE (set) != SET) > + continue; > + rtx unspec = XEXP(set, 1); > + if (GET_CODE (unspec) != UNSPEC_VOLATILE) > + continue; > + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ESA > + && XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ZARCH) > + continue; > + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); > + rtx function = XVECEXP (unspec, 0, 0); > + rtx frame_size = XVECEXP (unspec, 0, 1); > + rtx args_size = XVECEXP (unspec, 0, 2); > + rtx pc_src = XEXP (set_pc, 1); > + rtx call_done, cond = NULL_RTX; > + if (GET_CODE (pc_src) == IF_THEN_ELSE) > + { > + cond = XEXP (pc_src, 0); > + call_done = XEXP (XEXP (pc_src, 1), 0); > + } > + else > + call_done = XEXP (pc_src, 0); > + if (XINT (unspec, 1) == UNSPECV_SPLIT_STACK_CALL_ESA) > + s390_expand_split_stack_call_esa (insn, > + call_done, > + function, > + frame_size, > + args_size); > + else > + s390_expand_split_stack_call_zarch (insn, > + call_done, > + function, > + frame_size, > + args_size, > + cond); > + } > + } > + > /* Try to 
optimize prologue and epilogue further. */ > s390_optimize_prologue (); > > @@ -14469,6 +14989,9 @@ s390_asm_file_end (void) > s390_vector_abi); > #endif > file_end_indicate_exec_stack (); > + > + if (flag_split_stack) > + file_end_indicate_split_stack (); > } > > /* Return true if TYPE is a vector bool type. */ > @@ -14724,6 +15247,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty > #undef TARGET_SET_UP_BY_PROLOGUE > #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue > > +#undef TARGET_EXTRA_LIVE_ON_ENTRY > +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry > + > #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P > #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ > s390_use_by_pieces_infrastructure_p > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > index 0ebefd6..15c6eed 100644 > --- a/gcc/config/s390/s390.md > +++ b/gcc/config/s390/s390.md > @@ -114,6 +114,9 @@ > UNSPEC_SP_SET > UNSPEC_SP_TEST > > + ; Split stack support > + UNSPEC_STACK_CHECK > + > ; Test Data Class (TDC) > UNSPEC_TDC_INSN > > @@ -276,6 +279,12 @@ > ; Set and get floating point control register > UNSPECV_SFPC > UNSPECV_EFPC > + > + ; Split stack support > + UNSPECV_SPLIT_STACK_CALL_ZARCH > + UNSPECV_SPLIT_STACK_CALL_ESA > + UNSPECV_SPLIT_STACK_SIBCALL > + UNSPECV_SPLIT_STACK_MARKER > ]) > > ;; > @@ -10909,3 +10918,127 @@ > "TARGET_Z13" > "lcbb\t%0,%1,%b2" > [(set_attr "op_type" "VRX")]) > + > +; Handle -fsplit-stack. 
> + > +(define_expand "split_stack_prologue" > + [(const_int 0)] > + "" > +{ > + s390_expand_split_stack_prologue (); > + DONE; > +}) > + > +(define_insn "split_stack_call_esa" > + [(set (pc) (label_ref (match_operand 0 "" ""))) > + (set (reg:SI 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + UNSPECV_SPLIT_STACK_CALL_ESA))] > + "!TARGET_CPU_ZARCH" > +{ > + gcc_unreachable (); > +} > + [(set_attr "length" "32")]) > + > +(define_insn "split_stack_call_zarch_<mode>" > + [(set (pc) (label_ref (match_operand 0 "" ""))) > + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + UNSPECV_SPLIT_STACK_CALL_ZARCH))] > + "TARGET_CPU_ZARCH" > +{ > + gcc_unreachable (); > +} > + [(set_attr "length" "12")]) > + > +(define_insn "split_stack_cond_call_zarch_<mode>" > + [(set (pc) > + (if_then_else > + (match_operand 4 "" "") > + (label_ref (match_operand 0 "" "")) > + (pc))) > + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + UNSPECV_SPLIT_STACK_CALL_ZARCH))] > + "TARGET_CPU_ZARCH" > +{ > + gcc_unreachable (); > +} > + [(set_attr "length" "12")]) > + > +;; If there are operand 0 bytes available on the stack, jump to > +;; operand 1. > + > +(define_expand "split_stack_space_check" > + [(set (pc) (if_then_else > + (ltu (minus (reg 15) > + (match_operand 0 "register_operand")) > + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) > + (label_ref (match_operand 1)) > + (pc)))] > + "" > +{ > + /* Offset from thread pointer to __private_ss. */ > + int psso = TARGET_64BIT ? 
0x38 : 0x20; > + rtx tp = s390_get_thread_pointer (); > + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); > + rtx reg = gen_reg_rtx (Pmode); > + rtx cc; > + if (TARGET_64BIT) > + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); > + else > + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); > + cc = s390_emit_compare (GT, reg, guard); > + s390_emit_jump (operands[1], cc); > + > + DONE; > +}) > + > +;; A basr for use in split stack prologue. > + > +(define_insn "split_stack_sibcall_basr" > + [(set (pc) (label_ref (match_operand 1 "" ""))) > + (set (reg:SI 1) (unspec_volatile [(match_operand 0 "register_operand" "a")] > + UNSPECV_SPLIT_STACK_SIBCALL))] > + "!TARGET_CPU_ZARCH" > + "basr\t%%r1, %0" > + [(set_attr "op_type" "RR") > + (set_attr "type" "jsr")]) > + > +;; A jg with minimal fuss for use in split stack prologue. > + > +(define_insn "split_stack_sibcall_<mode>" > + [(set (pc) (label_ref (match_operand 1 "" ""))) > + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] > + UNSPECV_SPLIT_STACK_SIBCALL))] > + "TARGET_CPU_ZARCH" > + "jg\t%0" > + [(set_attr "op_type" "RIL") > + (set_attr "type" "branch")]) > + > +;; Also a conditional one. > + > +(define_insn "split_stack_cond_sibcall_<mode>" > + [(set (pc) > + (if_then_else > + (match_operand 1 "" "") > + (label_ref (match_operand 2 "" "")) > + (pc))) > + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] > + UNSPECV_SPLIT_STACK_SIBCALL))] > + "TARGET_CPU_ZARCH" > + "jg%C1\t%0" > + [(set_attr "op_type" "RIL") > + (set_attr "type" "branch")]) > + > +;; An unusual nop instruction used to mark functions with no stack frames > +;; as split-stack aware. 
> +
> +(define_insn "split_stack_marker"
> +  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
> +  ""
> +  "nopr\t%%r15"
> +  [(set_attr "op_type" "RR")])
> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
> index f66646c..ff60571 100644
> --- a/libgcc/ChangeLog
> +++ b/libgcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
> +
> +	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> +	* config/s390/morestack.S: New file.
> +	* config/s390/t-stack-s390: New file.
> +	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> +
>  2015-12-18 Andris Pavenis <andris.pavenis@iki.fi>
>
>  	* config.host: Add *-*-msdosdjgpp to lists of i[34567]86-*-*
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 0a3b879..ce6d259 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1105,11 +1105,11 @@ rx-*-elf)
>  	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
>  	;;
>  s390-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
>  	md_unwind_header=s390/linux-unwind.h
>  	;;
>  s390x-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
>  	if test "${host_address}" = 32; then
>  	  tmake_file="${tmake_file} s390/32/t-floattodi"
>  	fi
> diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
> new file mode 100644
> index 0000000..8e26c66
> --- /dev/null
> +++ b/libgcc/config/s390/morestack.S
> @@ -0,0 +1,718 @@
> +# s390 support for -fsplit-stack.
> +# Copyright (C) 2015 Free Software Foundation, Inc.
> +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +# for more details.
> +
> +# Under Section 7 of GPL version 3, you are granted additional
> +# permissions described in the GCC Runtime Library Exception, version
> +# 3.1, as published by the Free Software Foundation.
> +
> +# You should have received a copy of the GNU General Public License and
> +# a copy of the GCC Runtime Library Exception along with this program;
> +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Excess space needed to call ld.so resolver for lazy plt
> +# resolution. Go uses sigaltstack so this doesn't need to
> +# also cover signal frame size.
> +#define BACKOFF 0x1000
> +
> +# The __morestack function.
> +
> +	.global __morestack
> +	.hidden __morestack
> +
> +	.type __morestack,@function
> +
> +__morestack:
> +.LFB1:
> +	.cfi_startproc
> +
> +
> +#ifndef __s390x__
> +
> +
> +# The 31-bit __morestack function.
> +
> +	# We use a cleanup to restore the stack guard if an exception
> +	# is thrown through this code.
> +#ifndef __PIC__
> +	.cfi_personality 0,__gcc_personality_v0
> +	.cfi_lsda 0,.LLSDA1
> +#else
> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +	.cfi_lsda 0x1b,.LLSDA1
> +#endif
> +
> +	stm %r2, %r15, 0x8(%r15) # Save %r2-%r15.
> +	.cfi_offset %r6, -0x48
> +	.cfi_offset %r7, -0x44
> +	.cfi_offset %r8, -0x40
> +	.cfi_offset %r9, -0x3c
> +	.cfi_offset %r10, -0x38
> +	.cfi_offset %r11, -0x34
> +	.cfi_offset %r12, -0x30
> +	.cfi_offset %r13, -0x2c
> +	.cfi_offset %r14, -0x28
> +	.cfi_offset %r15, -0x24
> +	lr %r11, %r15 # Make frame pointer for vararg.
> +	.cfi_def_cfa_register %r11
> +	ahi %r15, -0x60 # 0x60 for standard frame.
> +	st %r11, 0(%r15) # Save back chain.
> +	lr %r8, %r0 # Save %r0 (static chain).
> +
> +	basr %r13, 0 # .Lmsl to %r13
> +.Lmsl:
> +
> +	# %r1 may point directly to the parameter area (zarch), or right after
> +	# the basr instruction that called us (esa). In the first case,
> +	# the pointer is already aligned. In the second case, we may need to
> +	# align it up to 4 bytes to get to the parameters.
> +	la %r10, 3(%r1)
> +	lhi %r7, -4
> +	nr %r10, %r7 # %r10 = (%r1 + 3) & ~3
> +
> +	l %r7, 0(%r10) # Required frame size to %r7
> +	ear %r1, %a0 # Extract thread pointer.
> +	l %r1, 0x20(%r1) # Get stack boundary
> +	ar %r1, %r7 # Stack boundary + frame size
> +	a %r1, 4(%r10) # + stack param size
> +	clr %r1, %r15 # Compare with current stack pointer
> +	jle .Lnoalloc # guard > sp - frame-size: need alloc
> +
> +	l %r1, .Lmslbs-.Lmsl(%r13) # __morestack_block_signals
> +#ifdef __PIC__
> +	bas %r14, 0(%r1, %r13)
> +#else
> +	basr %r14, %r1
> +#endif
> +
> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
> +	# as a local variable. Not needed here, but done to be consistent with
> +	# the below use.
> +	ahi %r7, BACKOFF # Bump requested size a bit.
> +	st %r7, 0x40(%r11) # Stuff frame size on stack.
> +	la %r2, 0x40(%r11) # Pass its address as parameter.
> +	la %r3, 0x60(%r11) # Caller's stack parameters.
> +	l %r4, 4(%r10) # Size of stack parameters.
> +
> +	l %r1, .Lmslgms-.Lmsl(%r13) # __generic_morestack
> +#ifdef __PIC__
> +	bas %r14, 0(%r1, %r13)
> +#else
> +	basr %r14, %r1
> +#endif
> +
> +	lr %r15, %r2 # Switch to the new stack.
> +	ahi %r15, -0x60 # Make a stack frame on it.
> +	st %r11, 0(%r15) # Save back chain.
> +
> +	s %r2, 0x40(%r11) # The end of stack space.
> +	ahi %r2, BACKOFF # Back off a bit.
> +	ear %r1, %a0 # Extract thread pointer.
> +.LEHB0:
> +	st %r2, 0x20(%r1) # Save the new stack boundary.
> +
> +	l %r1, .Lmslubs-.Lmsl(%r13) # __morestack_unblock_signals
> +#ifdef __PIC__
> +	bas %r14, 0(%r1, %r13)
> +#else
> +	basr %r14, %r1
> +#endif
> +
> +	lr %r0, %r8 # Static chain.
> +	lm %r2, %r6, 0x8(%r11) # Parameter registers.
> +
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a %r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la %r1, 0x60(%r11)
> +
> +	# State of registers:
> +	# %r0: Static chain from entry.
> +	# %r1: Vararg pointer.
> +	# %r2-%r6: Parameters from entry.
> +	# %r7-%r10: Indeterminate.
> +	# %r11: Frame pointer (%r15 from entry).
> +	# %r12: Indeterminate.
> +	# %r13: Literal pool address.
> +	# %r14: Return address.
> +	# %r15: Stack pointer.
> +	basr %r14, %r10 # Call our caller.
> +
> +	stm %r2, %r3, 0x8(%r11) # Save return registers.
> +
> +	l %r1, .Lmslbs-.Lmsl(%r13) # __morestack_block_signals
> +#ifdef __PIC__
> +	bas %r14, 0(%r1, %r13)
> +#else
> +	basr %r14, %r1
> +#endif
> +
> +	# We need a stack slot now, but have no good way to get it - the frame
> +	# on new stack had to be exactly 0x60 bytes, or stack parameters would
> +	# be passed wrong. Abuse fpr save area in caller's frame (we don't
> +	# save actual fprs).
> +	la %r2, 0x40(%r11)
> +	l %r1, .Lmslgrs-.Lmsl(%r13) # __generic_releasestack
> +#ifdef __PIC__
> +	bas %r14, 0(%r1, %r13)
> +#else
> +	basr %r14, %r1
> +#endif
> +
> +	s %r2, 0x40(%r11) # Subtract available space.
> +	ahi %r2, BACKOFF # Back off a bit.
> +	ear %r1, %a0 # Extract thread pointer.
> +.LEHE0:
> +	st %r2, 0x20(%r1) # Save the new stack boundary.
> +
> +	# We need to restore the old stack pointer before unblocking signals.
> +	# We also need 0x60 bytes for a stack frame. Since we had a stack
> +	# frame at this place before the stack switch, there's no need to
> +	# write the back chain again.
> +	lr %r15, %r11
> +	ahi %r15, -0x60
> +
> +	l %r1, .Lmslubs-.Lmsl(%r13) # __morestack_unblock_signals
> +#ifdef __PIC__
> +	bas %r14, 0(%r1, %r13)
> +#else
> +	basr %r14, %r1
> +#endif
> +
> +	lm %r2, %r15, 0x8(%r11) # Restore all registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br %r14 # Return to caller's caller.
> +
> +# Executed if no new stack allocation is needed.
> +
> +.Lnoalloc:
> +	.cfi_restore_state
> +	# We may need to copy stack parameters.
> +	l %r9, 0x4(%r10) # Load stack parameter size.
> +	ltr %r9, %r9 # And check if it's 0.
> +	je .Lnostackparm # Skip the copy if not needed.
> +	sr %r15, %r9 # Make space on the stack.
> +	la %r8, 0x60(%r15) # Destination.
> +	la %r12, 0x60(%r11) # Source.
> +	lr %r13, %r9 # Source size.
> +.Lcopy:
> +	mvcle %r8, %r12, 0 # Copy.
> +	jo .Lcopy
> +
> +.Lnostackparm:
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a %r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la %r1, 0x60(%r11)
> +
> +	# OK, no stack allocation needed. We still follow the protocol and
> +	# call our caller - it doesn't cost much and makes sure vararg works.
> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
> +	basr %r14, %r10 # Call our caller.
> +
> +	lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers.
> + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# This is the cleanup code called by the stack unwinder when unwinding > +# through the code between .LEHB0 and .LEHE0 above. > + > +.L1: > + .cfi_restore_state > + lr %r2, %r11 # Stack pointer after resume. > + l %r1, .Lmslgfs-.Lmsl(%r13) # __generic_findstack > +#ifdef __PIC__ > + bas %r14, 0(%r1, %r13) > +#else > + basr %r14, %r1 > +#endif > + lr %r3, %r11 # Get the stack pointer. > + sr %r3, %r2 # Subtract available space. > + ahi %r3, BACKOFF # Back off a bit. > + ear %r1, %a0 # Extract thread pointer. > + st %r3, 0x20(%r1) # Save the new stack boundary. > + > + lr %r2, %r6 # Exception header. > +#ifdef __PIC__ > + l %r12, .Lmslgot-.Lmsl(%r13) > + ar %r12, %r13 > + l %r1, .Lmslunw-.Lmsl(%r13) > + bas %r14, 0(%r1, %r12) > +#else > + l %r1, .Lmslunw-.Lmsl(%r13) > + basr %r14, %r1 > +#endif > + > +# Literal pool. > + > +.align 4 > +#ifdef __PIC__ > +.Lmslbs: > + .long __morestack_block_signals-.Lmsl > +.Lmslubs: > + .long __morestack_unblock_signals-.Lmsl > +.Lmslgms: > + .long __generic_morestack-.Lmsl > +.Lmslgrs: > + .long __generic_releasestack-.Lmsl > +.Lmslgfs: > + .long __generic_findstack-.Lmsl > +.Lmslunw: > + .long _Unwind_Resume@PLTOFF > +.Lmslgot: > + .long _GLOBAL_OFFSET_TABLE_-.Lmsl > +#else > +.Lmslbs: > + .long __morestack_block_signals > +.Lmslubs: > + .long __morestack_unblock_signals > +.Lmslgms: > + .long __generic_morestack > +.Lmslgrs: > + .long __generic_releasestack > +.Lmslgfs: > + .long __generic_findstack > +.Lmslunw: > + .long _Unwind_Resume > +#endif > + > +#else /* defined(__s390x__) */ > + > + > +# The 64-bit __morestack function. 
> + > + # We use a cleanup to restore the stack guard if an exception > + # is thrown through this code. > +#ifndef __PIC__ > + .cfi_personality 0x3,__gcc_personality_v0 > + .cfi_lsda 0x3,.LLSDA1 > +#else > + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 > + .cfi_lsda 0x1b,.LLSDA1 > +#endif > + > + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. > + .cfi_offset %r6, -0x70 > + .cfi_offset %r7, -0x68 > + .cfi_offset %r8, -0x60 > + .cfi_offset %r9, -0x58 > + .cfi_offset %r10, -0x50 > + .cfi_offset %r11, -0x48 > + .cfi_offset %r12, -0x40 > + .cfi_offset %r13, -0x38 > + .cfi_offset %r14, -0x30 > + .cfi_offset %r15, -0x28 > + lgr %r11, %r15 # Make frame pointer for vararg. > + .cfi_def_cfa_register %r11 > + aghi %r15, -0xa0 # 0xa0 for standard frame. > + stg %r11, 0(%r15) # Save back chain. > + lgr %r8, %r0 # Save %r0 (static chain). > + lgr %r10, %r1 # Save %r1 (address of parameter block). > + > + lg %r7, 0(%r10) # Required frame size to %r7 > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > + lg %r1, 0x38(%r1) # Get stack bounduary > + agr %r1, %r7 # Stack bounduary + frame size > + ag %r1, 8(%r10) # + stack param size > + clgr %r1, %r15 # Compare with current stack pointer > + jle .Lnoalloc # guard > sp - frame-size: need alloc > + > + brasl %r14, __morestack_block_signals > + > + # We abuse one of caller's fpr save slots (which we don't use for fprs) > + # as a local variable. Not needed here, but done to be consistent with > + # the below use. > + aghi %r7, BACKOFF # Bump requested size a bit. > + stg %r7, 0x80(%r11) # Stuff frame size on stack. > + la %r2, 0x80(%r11) # Pass its address as parameter. > + la %r3, 0xa0(%r11) # Caller's stack parameters. > + lg %r4, 8(%r10) # Size of stack paremeters. > + brasl %r14, __generic_morestack > + > + lgr %r15, %r2 # Switch to the new stack. > + aghi %r15, -0xa0 # Make a stack frame on it. > + stg %r11, 0(%r15) # Save back chain. > + > + sg %r2, 0x80(%r11) # The end of stack space. 
> + aghi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > +.LEHB0: > + stg %r2, 0x38(%r1) # Save the new stack boundary. > + > + brasl %r14, __morestack_unblock_signals > + > + lgr %r0, %r8 # Static chain. > + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. > + > + # Third parameter is address of function meat - address of parameter > + # block. > + ag %r10, 0x10(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0xa0(%r11) > + > + # State of registers: > + # %r0: Static chain from entry. > + # %r1: Vararg pointer. > + # %r2-%r6: Parameters from entry. > + # %r7-%r10: Indeterminate. > + # %r11: Frame pointer (%r15 from entry). > + # %r12-%r13: Indeterminate. > + # %r14: Return address. > + # %r15: Stack pointer. > + basr %r14, %r10 # Call our caller. > + > + stg %r2, 0x10(%r11) # Save return register. > + > + brasl %r14, __morestack_block_signals > + > + # We need a stack slot now, but have no good way to get it - the frame > + # on new stack had to be exactly 0xa0 bytes, or stack parameters would > + # be passed wrong. Abuse fpr save area in caller's frame (we don't > + # save actual fprs). > + la %r2, 0x80(%r11) > + brasl %r14, __generic_releasestack > + > + sg %r2, 0x80(%r11) # Subtract available space. > + aghi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > +.LEHE0: > + stg %r2, 0x38(%r1) # Save the new stack boundary. > + > + # We need to restore the old stack pointer before unblocking signals. > + # We also need 0xa0 bytes for a stack frame. Since we had a stack > + # frame at this place before the stack switch, there's no need to > + # write the back chain again. > + lgr %r15, %r11 > + aghi %r15, -0xa0 > + > + brasl %r14, __morestack_unblock_signals > + > + lmg %r2, %r15, 0x10(%r11) # Restore all registers. 
> + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# Executed if no new stack allocation is needed. > + > +.Lnoalloc: > + .cfi_restore_state > + # We may need to copy stack parameters. > + lg %r9, 0x8(%r10) # Load stack parameter size. > + ltgr %r9, %r9 # Check if it's 0. > + je .Lnostackparm # Skip the copy if not needed. > + sgr %r15, %r9 # Make space on the stack. > + la %r8, 0xa0(%r15) # Destination. > + la %r12, 0xa0(%r11) # Source. > + lgr %r13, %r9 # Source size. > +.Lcopy: > + mvcle %r8, %r12, 0 # Copy. > + jo .Lcopy > + > +.Lnostackparm: > + # Third parameter is address of function meat - address of parameter > + # block. > + ag %r10, 0x10(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0xa0(%r11) > + > + # OK, no stack allocation needed. We still follow the protocol and > + # call our caller - it doesn't cost much and makes sure vararg works. > + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. > + basr %r14, %r10 # Call our caller. > + > + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# This is the cleanup code called by the stack unwinder when unwinding > +# through the code between .LEHB0 and .LEHE0 above. > + > +.L1: > + .cfi_restore_state > + lgr %r2, %r11 # Stack pointer after resume. > + brasl %r14, __generic_findstack > + lgr %r3, %r11 # Get the stack pointer. 
> + sgr %r3, %r2 # Subtract available space. > + aghi %r3, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > + stg %r3, 0x38(%r1) # Save the new stack boundary. > + > + lgr %r2, %r6 # Exception header. > +#ifdef __PIC__ > + brasl %r14, _Unwind_Resume@PLT > +#else > + brasl %r14, _Unwind_Resume > +#endif > + > +#endif /* defined(__s390x__) */ > + > + .cfi_endproc > + .size __morestack, . - __morestack > + > + > +# The exception table. This tells the personality routine to execute > +# the exception handler. > + > + .section .gcc_except_table,"a",@progbits > + .align 4 > +.LLSDA1: > + .byte 0xff # @LPStart format (omit) > + .byte 0xff # @TType format (omit) > + .byte 0x1 # call-site format (uleb128) > + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length > +.LLSDACSB1: > + .uleb128 .LEHB0-.LFB1 # region 0 start > + .uleb128 .LEHE0-.LEHB0 # length > + .uleb128 .L1-.LFB1 # landing pad > + .uleb128 0 # action > +.LLSDACSE1: > + > + > + .global __gcc_personality_v0 > +#ifdef __PIC__ > + # Build a position independent reference to the basic > + # personality function. > + .hidden DW.ref.__gcc_personality_v0 > + .weak DW.ref.__gcc_personality_v0 > + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat > + .type DW.ref.__gcc_personality_v0, @object > +DW.ref.__gcc_personality_v0: > +#ifndef __LP64__ > + .align 4 > + .size DW.ref.__gcc_personality_v0, 4 > + .long __gcc_personality_v0 > +#else > + .align 8 > + .size DW.ref.__gcc_personality_v0, 8 > + .quad __gcc_personality_v0 > +#endif > +#endif > + > + > + > +# Initialize the stack test value when the program starts or when a > +# new thread starts. We don't know how large the main stack is, so we > +# guess conservatively. We might be able to use getrlimit here. 
> + > + .text > + .global __stack_split_initialize > + .hidden __stack_split_initialize > + > + .type __stack_split_initialize, @function > + > +__stack_split_initialize: > + > +#ifndef __s390x__ > + > + ear %r1, %a0 > + lr %r0, %r15 > + ahi %r0, -0x4000 # We should have at least 16K. > + st %r0, 0x20(%r1) > + > + lr %r2, %r15 > + lhi %r3, 0x4000 > +#ifdef __PIC__ > + # Cannot do a tail call - we'll go through PLT, so we need GOT address > + # in %r12, which is callee-saved. > + stm %r12, %r15, 0x30(%r15) > + basr %r13, 0 > +.Lssi0: > + ahi %r15, -0x60 > + l %r12, .Lssi2-.Lssi0(%r13) > + ar %r12, %r13 > + l %r1, .Lssi1-.Lssi0(%r13) > + bas %r14, 0(%r1, %r12) > + lm %r12, %r15, 0x90(%r15) > + br %r14 > + > +.align 4 > +.Lssi1: > + .long __generic_morestack_set_initial_sp@PLTOFF > +.Lssi2: > + .long _GLOBAL_OFFSET_TABLE_-.Lssi0 > + > +#else > + basr %r1, 0 > +.Lssi0: > + l %r1, .Lssi1-.Lssi0(%r1) > + br %r1 # Tail call > + > +.align 4 > +.Lssi1: > + .long __generic_morestack_set_initial_sp > +#endif > + > +#else /* defined(__s390x__) */ > + > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + lgr %r0, %r15 > + aghi %r0, -0x4000 # We should have at least 16K. > + stg %r0, 0x38(%r1) > + > + lgr %r2, %r15 > + lghi %r3, 0x4000 > +#ifdef __PIC__ > + jg __generic_morestack_set_initial_sp@PLT # Tail call > +#else > + jg __generic_morestack_set_initial_sp # Tail call > +#endif > + > +#endif /* defined(__s390x__) */ > + > + .size __stack_split_initialize, . - __stack_split_initialize > + > +# Routines to get and set the guard, for __splitstack_getcontext, > +# __splitstack_setcontext, and __splitstack_makecontext. > + > +# void *__morestack_get_guard (void) returns the current stack guard. 
> + .text > + .global __morestack_get_guard > + .hidden __morestack_get_guard > + > + .type __morestack_get_guard,@function > + > +__morestack_get_guard: > + > +#ifndef __s390x__ > + ear %r1, %a0 > + l %r2, 0x20(%r1) > +#else > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + lg %r2, 0x38(%r1) > +#endif > + br %r14 > + > + .size __morestack_get_guard, . - __morestack_get_guard > + > +# void __morestack_set_guard (void *) sets the stack guard. > + .global __morestack_set_guard > + .hidden __morestack_set_guard > + > + .type __morestack_set_guard,@function > + > +__morestack_set_guard: > + > +#ifndef __s390x__ > + ear %r1, %a0 > + st %r2, 0x20(%r1) > +#else > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + stg %r2, 0x38(%r1) > +#endif > + br %r14 > + > + .size __morestack_set_guard, . - __morestack_set_guard > + > +# void *__morestack_make_guard (void *, size_t) returns the stack > +# guard value for a stack. > + .global __morestack_make_guard > + .hidden __morestack_make_guard > + > + .type __morestack_make_guard,@function > + > +__morestack_make_guard: > + > +#ifndef __s390x__ > + sr %r2, %r3 > + ahi %r2, BACKOFF > +#else > + sgr %r2, %r3 > + aghi %r2, BACKOFF > +#endif > + br %r14 > + > + .size __morestack_make_guard, . - __morestack_make_guard > + > +# Make __stack_split_initialize a high priority constructor. 
> + > + .section .ctors.65535,"aw",@progbits > + > +#ifndef __LP64__ > + .align 4 > + .long __stack_split_initialize > + .long __morestack_load_mmap > +#else > + .align 8 > + .quad __stack_split_initialize > + .quad __morestack_load_mmap > +#endif > + > + .section .note.GNU-stack,"",@progbits > + .section .note.GNU-split-stack,"",@progbits > + .section .note.GNU-no-split-stack,"",@progbits > diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 > new file mode 100644 > index 0000000..4c959b0 > --- /dev/null > +++ b/libgcc/config/s390/t-stack-s390 > @@ -0,0 +1,2 @@ > +# Makefile fragment to support -fsplit-stack for s390. > +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S > diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c > index a10559b..8109c1a 100644 > --- a/libgcc/generic-morestack.c > +++ b/libgcc/generic-morestack.c > @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, > #elif defined (__i386__) > nsp -= 6 * sizeof (void *); > #elif defined __powerpc64__ > +#elif defined __s390x__ > + nsp -= 2 * 160; > +#elif defined __s390__ > + nsp -= 2 * 96; > #else > #error "unrecognized target" > #endif > ^ permalink raw reply [flat|nested] 55+ messages in thread
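The guard arithmetic in the quoted morestack.S can be modeled in a few lines of C. This is only a sketch: the function names are hypothetical, and the BACKOFF value is assumed for illustration, since its definition is not part of the quoted text. It mirrors __morestack_make_guard (stack minus size plus BACKOFF) and the unsigned comparison at the top of the 64-bit __morestack (guard + frame size + stack parameter size against %r15).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, for illustration only; the real BACKOFF is defined
   elsewhere in morestack.S and is not shown in the quoted patch.  */
#define BACKOFF 0x1000u

/* Model of __morestack_make_guard: "sgr %r2, %r3; aghi %r2, BACKOFF".
   The first argument is the stack address, the second the stack size.  */
static uintptr_t
make_guard (uintptr_t stack, uintptr_t size)
{
  assert (size <= stack);	/* Sketch-only sanity check.  */
  return stack - size + BACKOFF;
}

/* Model of the check at the top of __morestack: the "jle .Lnoalloc"
   branch is taken when guard + frame_size + args_size <= sp (unsigned
   compare), so a new segment is needed only in the opposite case.  */
static int
needs_alloc (uintptr_t guard, uintptr_t frame_size,
	     uintptr_t args_size, uintptr_t sp)
{
  return guard + frame_size + args_size > sp;
}
```

With a guard of 0x1000, a 0x200-byte frame and 0x10 bytes of stack parameters, a stack pointer of 0x1100 would take the allocation path in this model, while a stack pointer of 0x2000 would fall through to .Lnoalloc.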
* Re: [PATCH 5/5] s390: Add -fsplit-stack support 2016-01-15 18:39 ` Andreas Krebbel @ 2016-01-15 21:08 ` Marcin Kościelnicki 2016-01-21 10:12 ` Andreas Krebbel 2016-01-16 13:46 ` [PATCH] " Marcin Kościelnicki 1 sibling, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-01-15 21:08 UTC (permalink / raw) To: Andreas Krebbel, gcc-patches

On 15/01/16 19:38, Andreas Krebbel wrote:
> Marcin,
>
> your implementation looks very good to me. Thanks!
>
> But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from
> the back-end with the next GCC version. So I would prefer if you could remove all the
> !TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with
> -march g5/g6. It currently makes the implementation more complicated and would have to be removed
> anyway in the future.
>
> Thanks!
>
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html
>
>
> Bye,
>
> -Andreas-
>
>

Very well, I'll do that.

Btw, as for dropping support for g5/g6: I've noticed
s390_function_profiler could also use larl+brasl for -m31 given
TARGET_CPU_ZARCH. Should I submit a patch for that? I'm asking because
gold with -fsplit-stack needs to know the exact sequence used, so if
it's going to change after g5/g6 removal, I'd better add it to gold now
(and make gcc always emit it for non-g5/g6, so that gold won't need to
look at the old one).

What about the other patches? #1 and #2 should be ready to go. I'm not
sure how I should go about getting #3 and #4 reviewed. We don't need #3
anymore once g5/g6 support is removed, but #4 might still be necessary -
we still have that unconditional jump.

Marcin Kościelnicki

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 5/5] s390: Add -fsplit-stack support 2016-01-15 21:08 ` Marcin Kościelnicki @ 2016-01-21 10:12 ` Andreas Krebbel 2016-01-21 13:04 ` Marcin Kościelnicki 0 siblings, 1 reply; 55+ messages in thread From: Andreas Krebbel @ 2016-01-21 10:12 UTC (permalink / raw) To: Marcin Kościelnicki, gcc-patches

On 01/15/2016 10:08 PM, Marcin Kościelnicki wrote:
> On 15/01/16 19:38, Andreas Krebbel wrote:
>> Marcin,
>>
>> your implementation looks very good to me. Thanks!
>>
>> But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from
>> the back-end with the next GCC version. So I would prefer if you could remove all the
>> !TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with
>> -march g5/g6. It currently makes the implementation more complicated and would have to be removed
>> anyway in the future.
>>
>> Thanks!
>>
>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html
>>
>>
>> Bye,
>>
>> -Andreas-
>>
>>
>
> Very well, I'll do that.
>
> Btw, as for dropping support for g5/g6: I've noticed
> s390_function_profiler could also use larl+brasl for -m31 given
> TARGET_CPU_ZARCH. Should I submit a patch for that? I'm asking because
> gold with -fsplit-stack needs to know the exact sequence used, so if
> it's going to change after g5/g6 removal, I'd better add it to gold now
> (and make gcc always emit it for non-g5/g6, so that gold won't need to
> look at the old one).

Yes please, that would be great. Good catch!

Thanks!

-Andreas-

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 5/5] s390: Add -fsplit-stack support 2016-01-21 10:12 ` Andreas Krebbel @ 2016-01-21 13:04 ` Marcin Kościelnicki 0 siblings, 0 replies; 55+ messages in thread From: Marcin Kościelnicki @ 2016-01-21 13:04 UTC (permalink / raw) To: Andreas Krebbel, gcc-patches

On 21/01/16 11:12, Andreas Krebbel wrote:
> On 01/15/2016 10:08 PM, Marcin Kościelnicki wrote:
>> On 15/01/16 19:38, Andreas Krebbel wrote:
>>> Marcin,
>>>
>>> your implementation looks very good to me. Thanks!
>>>
>>> But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from
>>> the back-end with the next GCC version. So I would prefer if you could remove all the
>>> !TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with
>>> -march g5/g6. It currently makes the implementation more complicated and would have to be removed
>>> anyway in the future.
>>>
>>> Thanks!
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html
>>>
>>>
>>> Bye,
>>>
>>> -Andreas-
>>>
>>>
>>
>> Very well, I'll do that.
>>
>> Btw, as for dropping support for g5/g6: I've noticed
>> s390_function_profiler could also use larl+brasl for -m31 given
>> TARGET_CPU_ZARCH. Should I submit a patch for that? I'm asking because
>> gold with -fsplit-stack needs to know the exact sequence used, so if
>> it's going to change after g5/g6 removal, I'd better add it to gold now
>> (and make gcc always emit it for non-g5/g6, so that gold won't need to
>> look at the old one).
>
> Yes please, that would be great. Good catch!
>
> Thanks!
>
> -Andreas-
>

I've submitted the gcc patch, and will soon update the gold patch.

Marcin Kościelnicki

^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH] s390: Add -fsplit-stack support 2016-01-15 18:39 ` Andreas Krebbel 2016-01-15 21:08 ` Marcin Kościelnicki @ 2016-01-16 13:46 ` Marcin Kościelnicki 2016-01-29 13:33 ` Andreas Krebbel 1 sibling, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-01-16 13:46 UTC (permalink / raw) To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki libgcc/ChangeLog: * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. * config/s390/morestack.S: New file. * config/s390/t-stack-s390: New file. * generic-morestack.c (__splitstack_find): Add s390-specific code. gcc/ChangeLog: * common/config/s390/s390-common.c (s390_supports_split_stack): New function. (TARGET_SUPPORTS_SPLIT_STACK): New macro. * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. * config/s390/s390.c (struct machine_function): New field split_stack_varargs_pointer. (s390_register_info): Mark r12 as clobbered if it'll be used as temp in s390_emit_prologue. (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack vararg pointer. (morestack_ref): New global. (SPLIT_STACK_AVAILABLE): New macro. (s390_expand_split_stack_prologue): New function. (s390_expand_split_stack_call): New function. (s390_live_on_entry): New function. (s390_va_start): Use split-stack vararg pointer if appropriate. (s390_reorg): Lower the split-stack pseudo-insns. (s390_asm_file_end): Emit the split-stack note sections. (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec. (UNSPECV_SPLIT_STACK_CALL): New unspec. (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. (UNSPECV_SPLIT_STACK_MARKER): New unspec. (split_stack_prologue): New expand. (split_stack_call_*): New insn. (split_stack_cond_call_*): New insn. (split_stack_space_check): New expand. (split_stack_sibcall_*): New insn. (split_stack_cond_sibcall_*): New insn. (split_stack_marker): New insn. --- Support for !TARGET_CPU_ZARCH removed and sorried. 
I've also cleaned up the 31-bit versions of morestack.S routines to more closely mirror their 64-bit counterparts, since I can now use the newer opcodes. I'm also submitting a new version of the gold patch, which has support for old CPUs likewise removed. gcc/ChangeLog | 33 ++ gcc/common/config/s390/s390-common.c | 14 + gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c | 371 ++++++++++++++++++++- gcc/config/s390/s390.md | 109 +++++++ libgcc/ChangeLog | 7 + libgcc/config.host | 4 +- libgcc/config/s390/morestack.S | 609 +++++++++++++++++++++++++++++++++++ libgcc/config/s390/t-stack-s390 | 2 + libgcc/generic-morestack.c | 4 + 10 files changed, 1148 insertions(+), 6 deletions(-) create mode 100644 libgcc/config/s390/morestack.S create mode 100644 libgcc/config/s390/t-stack-s390 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index c881d52..71f6f38 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,38 @@ 2016-01-16 Marcin Kościelnicki <koriakin@0x04.net> + * common/config/s390/s390-common.c (s390_supports_split_stack): + New function. + (TARGET_SUPPORTS_SPLIT_STACK): New macro. + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. + * config/s390/s390.c (struct machine_function): New field + split_stack_varargs_pointer. + (s390_register_info): Mark r12 as clobbered if it'll be used as temp + in s390_emit_prologue. + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack + vararg pointer. + (morestack_ref): New global. + (SPLIT_STACK_AVAILABLE): New macro. + (s390_expand_split_stack_prologue): New function. + (s390_expand_split_stack_call): New function. + (s390_live_on_entry): New function. + (s390_va_start): Use split-stack vararg pointer if appropriate. + (s390_reorg): Lower the split-stack pseudo-insns. + (s390_asm_file_end): Emit the split-stack note sections. + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. + * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec. + (UNSPECV_SPLIT_STACK_CALL): New unspec. 
+ (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. + (UNSPECV_SPLIT_STACK_MARKER): New unspec. + (split_stack_prologue): New expand. + (split_stack_call_*): New insn. + (split_stack_cond_call_*): New insn. + (split_stack_space_check): New expand. + (split_stack_sibcall_*): New insn. + (split_stack_cond_sibcall_*): New insn. + (split_stack_marker): New insn. + +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net> + * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps with side effects. diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 4519c21..1e497e6 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, } } +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. + We don't verify it, since earlier versions just have padding at + its place, which works just as well. */ + +static bool +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, + struct gcc_options *opts ATTRIBUTE_UNUSED) +{ + return true; +} + #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, #undef TARGET_OPTION_INIT_STRUCT #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct +#undef TARGET_SUPPORTS_SPLIT_STACK +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack + struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 633bc1e..09032c9 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); extern void s390_emit_prologue (void); extern void s390_emit_epilogue (bool); +extern void s390_expand_split_stack_prologue 
(void); extern bool s390_can_use_simple_return_insn (void); extern bool s390_can_use_return_insn (void); extern void s390_function_profiler (FILE *, int); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 3be64de..6afce7c 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -426,6 +426,13 @@ struct GTY(()) machine_function /* True if the current function may contain a tbegin clobbering FPRs. */ bool tbegin_p; + + /* For -fsplit-stack support: A stack local which holds a pointer to + the stack arguments for a function with a variable number of + arguments. This is set at the start of the function and is used + to initialize the overflow_arg_area field of the va_list + structure. */ + rtx split_stack_varargs_pointer; }; /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ @@ -9316,9 +9323,13 @@ s390_register_info () cfun_frame_layout.high_fprs++; } - if (flag_pic) - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); + /* Register 12 is used for GOT address, but also as temp in prologue + for split-stack stdarg functions (unless r14 is available). */ + clobbered_regs[12] + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) + || (flag_split_stack && cfun->stdarg + && (crtl->is_leaf || TARGET_TPF_PROFILING + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); clobbered_regs[BASE_REGNUM] |= (cfun->machine->base_reg @@ -10446,6 +10457,8 @@ s390_emit_prologue (void) && !crtl->is_leaf && !TARGET_TPF_PROFILING) temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); + else if (flag_split_stack && cfun->stdarg) + temp_reg = gen_rtx_REG (Pmode, 12); else temp_reg = gen_rtx_REG (Pmode, 1); @@ -10939,6 +10952,284 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); } +/* -fsplit-stack support. */ + +/* A SYMBOL_REF for __morestack. 
*/ +static GTY(()) rtx morestack_ref; + +/* When using -fsplit-stack, the allocation routines set a field in + the TCB to the bottom of the stack plus this much space, measured + in bytes. */ + +#define SPLIT_STACK_AVAILABLE 1024 + +/* Emit -fsplit-stack prologue, which goes before the regular function + prologue. */ + +void +s390_expand_split_stack_prologue (void) +{ + rtx r1, guard, cc; + rtx_insn *insn; + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + /* Pointer size in bytes. */ + /* Frame size and argument size - the two parameters to __morestack. */ + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; + /* Align argument size to 8 bytes - simplifies __morestack code. */ + HOST_WIDE_INT args_size = crtl->args.size >= 0 + ? ((crtl->args.size + 7) & ~7) + : 0; + /* Label to jump to when no __morestack call is necessary. */ + rtx_code_label *enough = NULL; + /* Label to be called by __morestack. */ + rtx_code_label *call_done = NULL; + /* 1 if __morestack called conditionally, 0 if always. */ + int conditional = 0; + + gcc_assert (flag_split_stack && reload_completed); + if (!TARGET_CPU_ZARCH) + { + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); + return; + } + + r1 = gen_rtx_REG (Pmode, 1); + + /* If no stack frame will be allocated, don't do anything. */ + if (!frame_size) + { + /* But emit a marker that will let linker and indirect function + calls recognise this function as split-stack aware. */ + emit_insn(gen_split_stack_marker()); + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, just use r15. 
*/ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + return; + } + + if (morestack_ref == NULL_RTX) + { + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL + | SYMBOL_FLAG_FUNCTION); + } + + if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu)) + { + /* If frame_size will fit in an add instruction, do a stack space + check, and only call __morestack if there's not enough space. */ + conditional = 1; + + /* Get thread pointer. r1 is the only register we can always destroy - r0 + could contain a static chain (and cannot be used to address memory + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); + /* Aim at __private_ss. */ + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); + + /* If less that 1kiB used, skip addition and compare directly with + __private_ss. */ + if (frame_size > SPLIT_STACK_AVAILABLE) + { + emit_move_insn (r1, guard); + if (TARGET_64BIT) + emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size))); + else + emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size))); + guard = r1; + } + + if (TARGET_CPU_ZARCH) + { + rtx tmp; + + /* Compare the (maybe adjusted) guard with the stack pointer. */ + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); + + call_done = gen_label_rtx (); + + if (TARGET_64BIT) + tmp = gen_split_stack_cond_call_di (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size), + cc); + else + tmp = gen_split_stack_cond_call_si (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size), + cc); + + + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + + /* Mark the jump as very unlikely to be taken. */ + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); + } + else + { + /* Compare the (maybe adjusted) guard with the stack pointer. 
*/ + cc = s390_emit_compare (GE, stack_pointer_rtx, guard); + + enough = gen_label_rtx (); + insn = s390_emit_jump (enough, cc); + JUMP_LABEL (insn) = enough; + + /* Mark the jump as very likely to be taken. */ + add_int_reg_note (insn, REG_BR_PROB, + REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100); + } + } + + if (call_done == NULL) + { + rtx tmp; + call_done = gen_label_rtx (); + + /* Now, we need to call __morestack. It has very special calling + conventions: it preserves param/return/static chain registers for + calling main function body, and looks for its own parameters + at %r1 (after aligning it up to a 4 byte bounduary for 31-bit mode). */ + if (TARGET_64BIT) + tmp = gen_split_stack_call_di (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size)); + else + tmp = gen_split_stack_call_si (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size)); + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + emit_barrier (); + } + + /* __morestack will call us here. */ + + if (enough != NULL) + { + emit_label (enough); + LABEL_NUSES (enough) = 1; + } + + if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, and __morestack was not called, just use r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + + emit_label (call_done); + LABEL_NUSES (call_done) = 1; +} + +/* Generates split-stack call sequence, along with its parameter block. */ + +static void +s390_expand_split_stack_call (rtx_insn *orig_insn, + rtx call_done, + rtx function, + rtx frame_size, + rtx args_size, + rtx cond) +{ + int psize = GET_MODE_SIZE (Pmode); + rtx_insn *insn = orig_insn; + rtx parmbase = gen_label_rtx(); + rtx r1 = gen_rtx_REG (Pmode, 1); + rtx tmp, tmp2; + + /* %r1 = litbase. 
*/ + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* jg<cond> __morestack. */ + if (cond == NULL) + { + if (TARGET_64BIT) + tmp = gen_split_stack_sibcall_di (function, call_done); + else + tmp = gen_split_stack_sibcall_si (function, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + else + { + if (!s390_comparison (cond, VOIDmode)) + internal_error ("bad split_stack_call cond"); + if (TARGET_64BIT) + tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done); + else + tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* Go to .rodata. */ + insn = emit_insn_after (gen_pool_section_start (), insn); + + /* Now, we'll emit parameters to __morestack. First, align to pointer size + (this mirrors the alignment done in __morestack - don't touch it). */ + insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn); + + insn = emit_label_after (parmbase, insn); + + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, frame_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Second parameter is size of the arguments passed on stack that + __morestack has to copy to the new stack (does not include varargs). */ + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, args_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Third parameter is offset between start of the parameter block + and function body to be called by __morestack. 
*/ + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); + tmp = gen_rtx_CONST (Pmode, + gen_rtx_MINUS (Pmode, tmp2, tmp)); + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, tmp), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* Return from .rodata. */ + insn = emit_insn_after (gen_pool_section_end (), insn); + + delete_insn (orig_insn); +} + +/* We may have to tell the dataflow pass that the split stack prologue + is initializing a register. */ + +static void +s390_live_on_entry (bitmap regs) +{ + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + gcc_assert (flag_split_stack); + bitmap_set_bit (regs, 1); + } +} + /* Return true if the function can use simple_return to return outside of a shrink-wrapped region. At present shrink-wrapping is supported in all cases. */ @@ -11541,6 +11832,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } + if (flag_split_stack + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) + == NULL) + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) + { + rtx reg; + rtx_insn *seq; + + reg = gen_reg_rtx (Pmode); + cfun->machine->split_stack_varargs_pointer = reg; + + start_sequence (); + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); + seq = get_insns (); + end_sequence (); + + push_topmost_sequence (); + emit_insn_after (seq, entry_of_function ()); + pop_topmost_sequence (); + } + /* Find the overflow area. FIXME: This currently is too pessimistic when the vector ABI is enabled. 
In that case we *always* set up the overflow area @@ -11549,7 +11861,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG || TARGET_VX_ABI) { - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) + t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer); + else + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); off = INTVAL (crtl->args.arg_offset_rtx); off = off < 0 ? 0 : off; @@ -13158,6 +13473,48 @@ s390_reorg (void) } } + if (flag_split_stack) + { + rtx_insn *insn; + + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) + { + /* Look for the split-stack fake jump instructions. */ + if (!JUMP_P(insn)) + continue; + if (GET_CODE (PATTERN (insn)) != PARALLEL + || XVECLEN (PATTERN (insn), 0) != 2) + continue; + rtx set = XVECEXP (PATTERN (insn), 0, 1); + if (GET_CODE (set) != SET) + continue; + rtx unspec = XEXP(set, 1); + if (GET_CODE (unspec) != UNSPEC_VOLATILE) + continue; + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL) + continue; + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); + rtx function = XVECEXP (unspec, 0, 0); + rtx frame_size = XVECEXP (unspec, 0, 1); + rtx args_size = XVECEXP (unspec, 0, 2); + rtx pc_src = XEXP (set_pc, 1); + rtx call_done, cond = NULL_RTX; + if (GET_CODE (pc_src) == IF_THEN_ELSE) + { + cond = XEXP (pc_src, 0); + call_done = XEXP (XEXP (pc_src, 1), 0); + } + else + call_done = XEXP (pc_src, 0); + s390_expand_split_stack_call (insn, + call_done, + function, + frame_size, + args_size, + cond); + } + } + /* Try to optimize prologue and epilogue further. */ s390_optimize_prologue (); @@ -14469,6 +14826,9 @@ s390_asm_file_end (void) s390_vector_abi); #endif file_end_indicate_exec_stack (); + + if (flag_split_stack) + file_end_indicate_split_stack (); } /* Return true if TYPE is a vector bool type. 
*/ @@ -14724,6 +15084,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry + #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ s390_use_by_pieces_infrastructure_p diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 9b869d5..21cd989 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -114,6 +114,9 @@ UNSPEC_SP_SET UNSPEC_SP_TEST + ; Split stack support + UNSPEC_STACK_CHECK + ; Test Data Class (TDC) UNSPEC_TDC_INSN @@ -276,6 +279,11 @@ ; Set and get floating point control register UNSPECV_SFPC UNSPECV_EFPC + + ; Split stack support + UNSPECV_SPLIT_STACK_CALL + UNSPECV_SPLIT_STACK_SIBCALL + UNSPECV_SPLIT_STACK_MARKER ]) ;; @@ -10907,3 +10915,104 @@ "TARGET_Z13" "lcbb\t%0,%1,%b2" [(set_attr "op_type" "VRX")]) + +; Handle -fsplit-stack. 
+ +(define_expand "split_stack_prologue" + [(const_int 0)] + "" +{ + s390_expand_split_stack_prologue (); + DONE; +}) + +(define_insn "split_stack_call_<mode>" + [(set (pc) (label_ref (match_operand 0 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +(define_insn "split_stack_cond_call_<mode>" + [(set (pc) + (if_then_else + (match_operand 4 "" "") + (label_ref (match_operand 0 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +;; If there are operand 0 bytes available on the stack, jump to +;; operand 1. + +(define_expand "split_stack_space_check" + [(set (pc) (if_then_else + (ltu (minus (reg 15) + (match_operand 0 "register_operand")) + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) + (label_ref (match_operand 1)) + (pc)))] + "" +{ + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + rtx tp = s390_get_thread_pointer (); + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); + rtx reg = gen_reg_rtx (Pmode); + rtx cc; + if (TARGET_64BIT) + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); + else + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); + cc = s390_emit_compare (GT, reg, guard); + s390_emit_jump (operands[1], cc); + + DONE; +}) + +;; A jg with minimal fuss for use in split stack prologue. 
+ +(define_insn "split_stack_sibcall_<mode>" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; Also a conditional one. + +(define_insn "split_stack_cond_sibcall_<mode>" + [(set (pc) + (if_then_else + (match_operand 1 "" "") + (label_ref (match_operand 2 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg%C1\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; An unusual nop instruction used to mark functions with no stack frames +;; as split-stack aware. + +(define_insn "split_stack_marker" + [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)] + "" + "nopr\t%%r15" + [(set_attr "op_type" "RR")]) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 4cd8f01..604b120 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2016-01-16 Marcin Kościelnicki <koriakin@0x04.net> + + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. + * config/s390/morestack.S: New file. + * config/s390/t-stack-s390: New file. + * generic-morestack.c (__splitstack_find): Add s390-specific code.
+ 2016-01-15 Nick Clifton <nickc@redhat.com> * config/msp430/t-msp430 (lib2_mul_none.o): Only use the first diff --git a/libgcc/config.host b/libgcc/config.host index f58ee45..9793155 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1105,11 +1105,11 @@ rx-*-elf) tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" ;; s390-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" md_unwind_header=s390/linux-unwind.h ;; s390x-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" if test "${host_address}" = 32; then tmake_file="${tmake_file} s390/32/t-floattodi" fi diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S new file mode 100644 index 0000000..c99f6e4 --- /dev/null +++ b/libgcc/config/s390/morestack.S @@ -0,0 +1,609 @@ +# s390 support for -fsplit-stack. +# Copyright (C) 2015 Free Software Foundation, Inc. +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version. + +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. + +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation.
+ +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +# <http://www.gnu.org/licenses/>. + +# Excess space needed to call ld.so resolver for lazy plt +# resolution. Go uses sigaltstack so this doesn't need to +# also cover signal frame size. +#define BACKOFF 0x1000 + +# The __morestack function. + + .global __morestack + .hidden __morestack + + .type __morestack,@function + +__morestack: +.LFB1: + .cfi_startproc + + +#ifndef __s390x__ + + +# The 31-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0,__gcc_personality_v0 + .cfi_lsda 0,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x48 + .cfi_offset %r7, -0x44 + .cfi_offset %r8, -0x40 + .cfi_offset %r9, -0x3c + .cfi_offset %r10, -0x38 + .cfi_offset %r11, -0x34 + .cfi_offset %r12, -0x30 + .cfi_offset %r13, -0x2c + .cfi_offset %r14, -0x28 + .cfi_offset %r15, -0x24 + lr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + ahi %r15, -0x60 # 0x60 for standard frame. + st %r11, 0(%r15) # Save back chain. + lr %r8, %r0 # Save %r0 (static chain). + lr %r10, %r1 # Save %r1 (address of parameter block). + + l %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 # Extract thread pointer. + l %r1, 0x20(%r1) # Get stack bounduary + ar %r1, %r7 # Stack bounduary + frame size + a %r1, 4(%r10) # + stack param size + clr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. 
Not needed here, but done to be consistent with + # the below use. + ahi %r7, BACKOFF # Bump requested size a bit. + st %r7, 0x40(%r11) # Stuff frame size on stack. + la %r2, 0x40(%r11) # Pass its address as parameter. + la %r3, 0x60(%r11) # Caller's stack parameters. + l %r4, 4(%r10) # Size of stack paremeters. + brasl %r14, __generic_morestack + + lr %r15, %r2 # Switch to the new stack. + ahi %r15, -0x60 # Make a stack frame on it. + st %r11, 0(%r15) # Save back chain. + + s %r2, 0x40(%r11) # The end of stack space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHB0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lr %r0, %r8 # Static chain. + lm %r2, %r6, 0x8(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stm %r2, %r3, 0x8(%r11) # Save return registers. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0x60 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x40(%r11) + brasl %r14, __generic_releasestack + + s %r2, 0x40(%r11) # Subtract available space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHE0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0x60 bytes for a stack frame. 
Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. + lr %r15, %r11 + ahi %r15, -0x60 + + brasl %r14, __morestack_unblock_signals + + lm %r2, %r15, 0x8(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + l %r9, 0x4(%r10) # Load stack parameter size. + ltr %r9, %r9 # And check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sr %r15, %r9 # Make space on the stack. + la %r8, 0x60(%r15) # Destination. + la %r12, 0x60(%r11) # Source. + lr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. 
+ +.L1: + .cfi_restore_state + lr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lr %r3, %r11 # Get the stack pointer. + sr %r3, %r2 # Subtract available space. + ahi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. + st %r3, 0x20(%r1) # Save the new stack boundary. + + lr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#else /* defined(__s390x__) */ + + +# The 64-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0x3,__gcc_personality_v0 + .cfi_lsda 0x3,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x70 + .cfi_offset %r7, -0x68 + .cfi_offset %r8, -0x60 + .cfi_offset %r9, -0x58 + .cfi_offset %r10, -0x50 + .cfi_offset %r11, -0x48 + .cfi_offset %r12, -0x40 + .cfi_offset %r13, -0x38 + .cfi_offset %r14, -0x30 + .cfi_offset %r15, -0x28 + lgr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + aghi %r15, -0xa0 # 0xa0 for standard frame. + stg %r11, 0(%r15) # Save back chain. + lgr %r8, %r0 # Save %r0 (static chain). + lgr %r10, %r1 # Save %r1 (address of parameter block). + + lg %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + lg %r1, 0x38(%r1) # Get stack bounduary + agr %r1, %r7 # Stack bounduary + frame size + ag %r1, 8(%r10) # + stack param size + clgr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + aghi %r7, BACKOFF # Bump requested size a bit. 
+ stg %r7, 0x80(%r11) # Stuff frame size on stack. + la %r2, 0x80(%r11) # Pass its address as parameter. + la %r3, 0xa0(%r11) # Caller's stack parameters. + lg %r4, 8(%r10) # Size of stack paremeters. + brasl %r14, __generic_morestack + + lgr %r15, %r2 # Switch to the new stack. + aghi %r15, -0xa0 # Make a stack frame on it. + stg %r11, 0(%r15) # Save back chain. + + sg %r2, 0x80(%r11) # The end of stack space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHB0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lgr %r0, %r8 # Static chain. + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stg %r2, 0x10(%r11) # Save return register. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0xa0 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x80(%r11) + brasl %r14, __generic_releasestack + + sg %r2, 0x80(%r11) # Subtract available space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHE0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0xa0 bytes for a stack frame. 
Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. + lgr %r15, %r11 + aghi %r15, -0xa0 + + brasl %r14, __morestack_unblock_signals + + lmg %r2, %r15, 0x10(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + lg %r9, 0x8(%r10) # Load stack parameter size. + ltgr %r9, %r9 # Check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sgr %r15, %r9 # Make space on the stack. + la %r8, 0xa0(%r15) # Destination. + la %r12, 0xa0(%r11) # Source. + lgr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. 
+ +.L1: + .cfi_restore_state + lgr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lgr %r3, %r11 # Get the stack pointer. + sgr %r3, %r2 # Subtract available space. + aghi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + stg %r3, 0x38(%r1) # Save the new stack boundary. + + lgr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#endif /* defined(__s390x__) */ + + .cfi_endproc + .size __morestack, . - __morestack + + +# The exception table. This tells the personality routine to execute +# the exception handler. + + .section .gcc_except_table,"a",@progbits + .align 4 +.LLSDA1: + .byte 0xff # @LPStart format (omit) + .byte 0xff # @TType format (omit) + .byte 0x1 # call-site format (uleb128) + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length +.LLSDACSB1: + .uleb128 .LEHB0-.LFB1 # region 0 start + .uleb128 .LEHE0-.LEHB0 # length + .uleb128 .L1-.LFB1 # landing pad + .uleb128 0 # action +.LLSDACSE1: + + + .global __gcc_personality_v0 +#ifdef __PIC__ + # Build a position independent reference to the basic + # personality function. + .hidden DW.ref.__gcc_personality_v0 + .weak DW.ref.__gcc_personality_v0 + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat + .type DW.ref.__gcc_personality_v0, @object +DW.ref.__gcc_personality_v0: +#ifndef __LP64__ + .align 4 + .size DW.ref.__gcc_personality_v0, 4 + .long __gcc_personality_v0 +#else + .align 8 + .size DW.ref.__gcc_personality_v0, 8 + .quad __gcc_personality_v0 +#endif +#endif + + + +# Initialize the stack test value when the program starts or when a +# new thread starts. We don't know how large the main stack is, so we +# guess conservatively. We might be able to use getrlimit here. 
+ + .text + .global __stack_split_initialize + .hidden __stack_split_initialize + + .type __stack_split_initialize, @function + +__stack_split_initialize: + +#ifndef __s390x__ + + ear %r1, %a0 + lr %r0, %r15 + ahi %r0, -0x4000 # We should have at least 16K. + st %r0, 0x20(%r1) + + lr %r2, %r15 + lhi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#else /* defined(__s390x__) */ + + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lgr %r0, %r15 + aghi %r0, -0x4000 # We should have at least 16K. + stg %r0, 0x38(%r1) + + lgr %r2, %r15 + lghi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#endif /* defined(__s390x__) */ + + .size __stack_split_initialize, . - __stack_split_initialize + +# Routines to get and set the guard, for __splitstack_getcontext, +# __splitstack_setcontext, and __splitstack_makecontext. + +# void *__morestack_get_guard (void) returns the current stack guard. + .text + .global __morestack_get_guard + .hidden __morestack_get_guard + + .type __morestack_get_guard,@function + +__morestack_get_guard: + +#ifndef __s390x__ + ear %r1, %a0 + l %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_get_guard, . - __morestack_get_guard + +# void __morestack_set_guard (void *) sets the stack guard. + .global __morestack_set_guard + .hidden __morestack_set_guard + + .type __morestack_set_guard,@function + +__morestack_set_guard: + +#ifndef __s390x__ + ear %r1, %a0 + st %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + stg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_set_guard, . - __morestack_set_guard + +# void *__morestack_make_guard (void *, size_t) returns the stack +# guard value for a stack. 
+ .global __morestack_make_guard + .hidden __morestack_make_guard + + .type __morestack_make_guard,@function + +__morestack_make_guard: + +#ifndef __s390x__ + sr %r2, %r3 + ahi %r2, BACKOFF +#else + sgr %r2, %r3 + aghi %r2, BACKOFF +#endif + br %r14 + + .size __morestack_make_guard, . - __morestack_make_guard + +# Make __stack_split_initialize a high priority constructor. + + .section .ctors.65535,"aw",@progbits + +#ifndef __LP64__ + .align 4 + .long __stack_split_initialize + .long __morestack_load_mmap +#else + .align 8 + .quad __stack_split_initialize + .quad __morestack_load_mmap +#endif + + .section .note.GNU-stack,"",@progbits + .section .note.GNU-split-stack,"",@progbits + .section .note.GNU-no-split-stack,"",@progbits diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 new file mode 100644 index 0000000..4c959b0 --- /dev/null +++ b/libgcc/config/s390/t-stack-s390 @@ -0,0 +1,2 @@ +# Makefile fragment to support -fsplit-stack for s390. +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c index 89765d4..b8eec4e 100644 --- a/libgcc/generic-morestack.c +++ b/libgcc/generic-morestack.c @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, #elif defined (__i386__) nsp -= 6 * sizeof (void *); #elif defined __powerpc64__ +#elif defined __s390x__ + nsp -= 2 * 160; +#elif defined __s390__ + nsp -= 2 * 96; #else #error "unrecognized target" #endif -- 2.7.0 ^ permalink raw reply [flat|nested] 55+ messages in thread
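
As a reading aid (not part of the patch), the arithmetic done by the prologue and by morestack.S can be modelled in plain C. This is only a sketch under stated assumptions: addresses are treated as unsigned 64-bit integers (the patch itself emits a signed LT compare, which agrees for addresses below 2^63), and the function names model_make_guard, model_args_size and model_calls_morestack are hypothetical, introduced purely for illustration. The constants BACKOFF and SPLIT_STACK_AVAILABLE are copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define BACKOFF 0x1000u              /* libgcc/config/s390/morestack.S */
#define SPLIT_STACK_AVAILABLE 1024u  /* gcc/config/s390/s390.c */

/* Hypothetical model of __morestack_make_guard, which is just
   "sgr %r2, %r3; aghi %r2, BACKOFF": first argument minus second
   argument, plus BACKOFF.  */
static uint64_t
model_make_guard (uint64_t stack, uint64_t size)
{
  return stack - size + BACKOFF;
}

/* Hypothetical model of the args_size rounding in
   s390_expand_split_stack_prologue: round up to a multiple of 8,
   and treat a negative size as 0.  */
static int64_t
model_args_size (int64_t raw)
{
  return raw >= 0 ? ((raw + 7) & ~(int64_t) 7) : 0;
}

/* Hypothetical model of the conditional call: frames of at most
   SPLIT_STACK_AVAILABLE bytes are compared against __private_ss
   directly; larger frames are compared against
   __private_ss + frame_size.  Returns true when __morestack would
   be called.  */
static bool
model_calls_morestack (uint64_t sp, uint64_t private_ss,
                       uint64_t frame_size)
{
  uint64_t guard = private_ss;

  if (frame_size > SPLIT_STACK_AVAILABLE)
    guard += frame_size;
  return sp < guard;
}
```

For example, with __private_ss at 0x10000, a 512-byte frame is checked against 0x10000 itself, while a 4096-byte frame is checked against 0x11000.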
* Re: [PATCH] s390: Add -fsplit-stack support
  2016-01-16 13:46 ` [PATCH] " Marcin Kościelnicki
@ 2016-01-29 13:33 ` Andreas Krebbel
  2016-01-29 15:43 ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-29 13:33 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: gcc-patches

Hi Marcin,

sorry for the late feedback.

A few comments regarding the split stack implementation:

The GNU coding style requires replacing every 8 leading blanks on a line with a tab. Many lines in your patch violate this. If you are an Emacs user, `whitespace-cleanup' will fix this for you.

Could you please add a testcase checking the different variants, i.e. with early exit, with the no-alloc path in __morestack, and with an actual allocation?

There are a few more comments inline.

Bye,

-Andreas-

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index c881d52..71f6f38 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,38 @@
> 2016-01-16 Marcin Kościelnicki <koriakin@0x04.net>
>
> + * common/config/s390/s390-common.c (s390_supports_split_stack):
> + New function.
> + (TARGET_SUPPORTS_SPLIT_STACK): New macro.
> + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> + * config/s390/s390.c (struct machine_function): New field
> + split_stack_varargs_pointer.
> + (s390_register_info): Mark r12 as clobbered if it'll be used as temp
> + in s390_emit_prologue.
> + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> + vararg pointer.
> + (morestack_ref): New global.
> + (SPLIT_STACK_AVAILABLE): New macro.
> + (s390_expand_split_stack_prologue): New function.
> + (s390_expand_split_stack_call): New function.
> + (s390_live_on_entry): New function.
> + (s390_va_start): Use split-stack vararg pointer if appropriate.
> + (s390_reorg): Lower the split-stack pseudo-insns.
> + (s390_asm_file_end): Emit the split-stack note sections.
> + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> + * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
> + (UNSPECV_SPLIT_STACK_CALL): New unspec.
> + (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> + (UNSPECV_SPLIT_STACK_MARKER): New unspec.
> + (split_stack_prologue): New expand.
> + (split_stack_call_*): New insn.
> + (split_stack_cond_call_*): New insn.
> + (split_stack_space_check): New expand.
> + (split_stack_sibcall_*): New insn.
> + (split_stack_cond_sibcall_*): New insn.
> + (split_stack_marker): New insn.
> +
> +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
> +
> * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
> with side effects.
>
> diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
> index 4519c21..1e497e6 100644
> --- a/gcc/common/config/s390/s390-common.c
> +++ b/gcc/common/config/s390/s390-common.c
> @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
> }
> }
>
> +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
> + We don't verify it, since earlier versions just have padding at
> + its place, which works just as well.
*/ > + > +static bool > +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, > + struct gcc_options *opts ATTRIBUTE_UNUSED) > +{ > + return true; > +} > + > #undef TARGET_DEFAULT_TARGET_FLAGS > #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) > > @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > #undef TARGET_OPTION_INIT_STRUCT > #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct > > +#undef TARGET_SUPPORTS_SPLIT_STACK > +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack > + > struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h > index 633bc1e..09032c9 100644 > --- a/gcc/config/s390/s390-protos.h > +++ b/gcc/config/s390/s390-protos.h > @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); > extern void s390_emit_prologue (void); > extern void s390_emit_epilogue (bool); > +extern void s390_expand_split_stack_prologue (void); > extern bool s390_can_use_simple_return_insn (void); > extern bool s390_can_use_return_insn (void); > extern void s390_function_profiler (FILE *, int); > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c > index 3be64de..6afce7c 100644 > --- a/gcc/config/s390/s390.c > +++ b/gcc/config/s390/s390.c > @@ -426,6 +426,13 @@ struct GTY(()) machine_function > /* True if the current function may contain a tbegin clobbering > FPRs. */ > bool tbegin_p; > + > + /* For -fsplit-stack support: A stack local which holds a pointer to > + the stack arguments for a function with a variable number of > + arguments. This is set at the start of the function and is used > + to initialize the overflow_arg_area field of the va_list > + structure. */ > + rtx split_stack_varargs_pointer; > }; > > /* Few accessor macros for struct cfun->machine->s390_frame_layout. 
*/ > @@ -9316,9 +9323,13 @@ s390_register_info () > cfun_frame_layout.high_fprs++; > } > > - if (flag_pic) > - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] > - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); > + /* Register 12 is used for GOT address, but also as temp in prologue > + for split-stack stdarg functions (unless r14 is available). */ > + clobbered_regs[12] > + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) > + || (flag_split_stack && cfun->stdarg > + && (crtl->is_leaf || TARGET_TPF_PROFILING > + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); > > clobbered_regs[BASE_REGNUM] > |= (cfun->machine->base_reg > @@ -10446,6 +10457,8 @@ s390_emit_prologue (void) > && !crtl->is_leaf > && !TARGET_TPF_PROFILING) > temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); > + else if (flag_split_stack && cfun->stdarg) > + temp_reg = gen_rtx_REG (Pmode, 12); TPF uses r1 hard coded in tracing prologue/epilogue. So I think we need && !TARGET_TPF_PROFILING here as well. > else > temp_reg = gen_rtx_REG (Pmode, 1); > > @@ -10939,6 +10952,284 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) > SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); > } > > +/* -fsplit-stack support. */ > + > +/* A SYMBOL_REF for __morestack. */ > +static GTY(()) rtx morestack_ref; > + > +/* When using -fsplit-stack, the allocation routines set a field in > + the TCB to the bottom of the stack plus this much space, measured > + in bytes. */ > + > +#define SPLIT_STACK_AVAILABLE 1024 > + > +/* Emit -fsplit-stack prologue, which goes before the regular function > + prologue. */ > + > +void > +s390_expand_split_stack_prologue (void) > +{ > + rtx r1, guard, cc; > + rtx_insn *insn; > + /* Offset from thread pointer to __private_ss. */ > + int psso = TARGET_64BIT ? 0x38 : 0x20; > + /* Pointer size in bytes. */ > + /* Frame size and argument size - the two parameters to __morestack. 
*/ > + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; > + /* Align argument size to 8 bytes - simplifies __morestack code. */ > + HOST_WIDE_INT args_size = crtl->args.size >= 0 > + ? ((crtl->args.size + 7) & ~7) > + : 0; > + /* Label to jump to when no __morestack call is necessary. */ > + rtx_code_label *enough = NULL; > + /* Label to be called by __morestack. */ > + rtx_code_label *call_done = NULL; > + /* 1 if __morestack called conditionally, 0 if always. */ > + int conditional = 0; > + > + gcc_assert (flag_split_stack && reload_completed); > + if (!TARGET_CPU_ZARCH) > + { > + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); > + return; > + } > + > + r1 = gen_rtx_REG (Pmode, 1); > + > + /* If no stack frame will be allocated, don't do anything. */ > + if (!frame_size) > + { > + /* But emit a marker that will let linker and indirect function > + calls recognise this function as split-stack aware. */ > + emit_insn(gen_split_stack_marker()); 2x missing blank before ( > + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + /* If va_start is used, just use r15. */ > + emit_move_insn (r1, > + gen_rtx_PLUS (Pmode, stack_pointer_rtx, > + GEN_INT (STACK_POINTER_OFFSET))); virtual_incoming_args_rtx ? > + } > + return; > + } > + > + if (morestack_ref == NULL_RTX) > + { > + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); > + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL > + | SYMBOL_FLAG_FUNCTION); > + } > + > + if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu)) The agfi immediate value is a signed 32 bit integer. So you can only add up to 2G-1. I think it would be more readable to write this as: if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Os (frame_size)) as in s390_emit_prologue. The Os check will check for TARGET_EXTIMM as well. > + { > + /* If frame_size will fit in an add instruction, do a stack space > + check, and only call __morestack if there's not enough space. 
*/ > + conditional = 1; > + > + /* Get thread pointer. r1 is the only register we can always destroy - r0 > + could contain a static chain (and cannot be used to address memory > + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ > + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); > + /* Aim at __private_ss. */ > + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); > + > + /* If less that 1kiB used, skip addition and compare directly with > + __private_ss. */ > + if (frame_size > SPLIT_STACK_AVAILABLE) > + { > + emit_move_insn (r1, guard); > + if (TARGET_64BIT) > + emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size))); > + else > + emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size))); > + guard = r1; > + } > + > + if (TARGET_CPU_ZARCH) > + { Looks like the !TARGET_CPU_ZARCH stuff hasn't been completely removed?! > + rtx tmp; > + > + /* Compare the (maybe adjusted) guard with the stack pointer. */ > + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); > + > + call_done = gen_label_rtx (); > + > + if (TARGET_64BIT) > + tmp = gen_split_stack_cond_call_di (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size), > + cc); > + else > + tmp = gen_split_stack_cond_call_si (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size), > + cc); Perhaps it would be more readable to do the TARGET_64BIT check in a separate expander. Please see "movstr" in s390.md. The same applies to all the other gen_split_stack* invocations. > + > + > + insn = emit_jump_insn (tmp); > + JUMP_LABEL (insn) = call_done; > + > + /* Mark the jump as very unlikely to be taken. */ > + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); > + } > + else > + { > + /* Compare the (maybe adjusted) guard with the stack pointer. 
*/ > + cc = s390_emit_compare (GE, stack_pointer_rtx, guard); > + > + enough = gen_label_rtx (); > + insn = s390_emit_jump (enough, cc); > + JUMP_LABEL (insn) = enough; > + > + /* Mark the jump as very likely to be taken. */ > + add_int_reg_note (insn, REG_BR_PROB, > + REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100); > + } > + } > + > + if (call_done == NULL) With the !TARGET_CPU_ZARCH path removed above this could be the else path to the frame_size check and call_done can be removed. > + { > + rtx tmp; > + call_done = gen_label_rtx (); > + > + /* Now, we need to call __morestack. It has very special calling > + conventions: it preserves param/return/static chain registers for > + calling main function body, and looks for its own parameters > + at %r1 (after aligning it up to a 4 byte bounduary for 31-bit mode). */ > + if (TARGET_64BIT) > + tmp = gen_split_stack_call_di (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size)); Indentation. > + else > + tmp = gen_split_stack_call_si (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size)); Indentation. > + insn = emit_jump_insn (tmp); > + JUMP_LABEL (insn) = call_done; > + emit_barrier (); > + } > + > + /* __morestack will call us here. */ > + > + if (enough != NULL) > + { > + emit_label (enough); > + LABEL_NUSES (enough) = 1; > + } This also was only for !TARGET_CPU_ZARCH. > + > + if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + /* If va_start is used, and __morestack was not called, just use r15. */ > + emit_move_insn (r1, > + gen_rtx_PLUS (Pmode, stack_pointer_rtx, > + GEN_INT (STACK_POINTER_OFFSET))); virtual_incoming_args_rtx? > + } > + > + emit_label (call_done); > + LABEL_NUSES (call_done) = 1; > +} > + > +/* Generates split-stack call sequence, along with its parameter block. 
*/ > + > +static void > +s390_expand_split_stack_call (rtx_insn *orig_insn, > + rtx call_done, > + rtx function, > + rtx frame_size, > + rtx args_size, > + rtx cond) > +{ > + int psize = GET_MODE_SIZE (Pmode); > + rtx_insn *insn = orig_insn; > + rtx parmbase = gen_label_rtx(); > + rtx r1 = gen_rtx_REG (Pmode, 1); > + rtx tmp, tmp2; > + > + /* %r1 = litbase. */ > + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); > + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); > + LABEL_NUSES (parmbase)++; > + > + /* jg<cond> __morestack. */ > + if (cond == NULL) > + { > + if (TARGET_64BIT) > + tmp = gen_split_stack_sibcall_di (function, call_done); > + else > + tmp = gen_split_stack_sibcall_si (function, call_done); > + insn = emit_jump_insn_after (tmp, insn); > + } > + else > + { > + if (!s390_comparison (cond, VOIDmode)) > + internal_error ("bad split_stack_call cond"); Perhaps just gcc_assert (s390_comparison (cond, VOIDmode)); ? > + if (TARGET_64BIT) > + tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done); > + else > + tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done); > + insn = emit_jump_insn_after (tmp, insn); > + } > + JUMP_LABEL (insn) = call_done; > + LABEL_NUSES (call_done)++; > + > + /* Go to .rodata. */ > + insn = emit_insn_after (gen_pool_section_start (), insn); > + > + /* Now, we'll emit parameters to __morestack. First, align to pointer size > + (this mirrors the alignment done in __morestack - don't touch it). */ > + insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn); psize -> UNITS_PER_LONG? > + > + insn = emit_label_after (parmbase, insn); > + > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, frame_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Second parameter is size of the arguments passed on stack that > + __morestack has to copy to the new stack (does not include varargs). 
*/ > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, args_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Third parameter is offset between start of the parameter block > + and function body to be called by __morestack. */ > + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); > + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); > + tmp = gen_rtx_CONST (Pmode, > + gen_rtx_MINUS (Pmode, tmp2, tmp)); > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, tmp), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + add_reg_note (insn, REG_LABEL_OPERAND, call_done); > + LABEL_NUSES (call_done)++; > + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); > + LABEL_NUSES (parmbase)++; > + > + /* Return from .rodata. */ > + insn = emit_insn_after (gen_pool_section_end (), insn); > + > + delete_insn (orig_insn); > +} > + > +/* We may have to tell the dataflow pass that the split stack prologue > + is initializing a register. */ > + > +static void > +s390_live_on_entry (bitmap regs) > +{ > + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + gcc_assert (flag_split_stack); > + bitmap_set_bit (regs, 1); > + } > +} > + > /* Return true if the function can use simple_return to return outside > of a shrink-wrapped region. At present shrink-wrapping is supported > in all cases. 
*/ > @@ -11541,6 +11832,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) > expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); > } > > + if (flag_split_stack > + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) > + == NULL) > + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) > + { > + rtx reg; > + rtx_insn *seq; > + > + reg = gen_reg_rtx (Pmode); > + cfun->machine->split_stack_varargs_pointer = reg; > + > + start_sequence (); > + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); > + seq = get_insns (); > + end_sequence (); > + > + push_topmost_sequence (); > + emit_insn_after (seq, entry_of_function ()); > + pop_topmost_sequence (); > + } > + > /* Find the overflow area. > FIXME: This currently is too pessimistic when the vector ABI is > enabled. In that case we *always* set up the overflow area > @@ -11549,7 +11861,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) > || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG > || TARGET_VX_ABI) > { > - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); > + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) > + t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer); What is the reason for changing virtual_incoming_args_rtx to crtl->args.internal_arg_pointer in the non-split-stack case? > + else > + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); > > off = INTVAL (crtl->args.arg_offset_rtx); > off = off < 0 ? 0 : off; > @@ -13158,6 +13473,48 @@ s390_reorg (void) > } > } > > + if (flag_split_stack) > + { > + rtx_insn *insn; > + > + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) > + { > + /* Look for the split-stack fake jump instructions. 
*/ > + if (!JUMP_P(insn)) > + continue; > + if (GET_CODE (PATTERN (insn)) != PARALLEL > + || XVECLEN (PATTERN (insn), 0) != 2) > + continue; > + rtx set = XVECEXP (PATTERN (insn), 0, 1); > + if (GET_CODE (set) != SET) > + continue; > + rtx unspec = XEXP(set, 1); > + if (GET_CODE (unspec) != UNSPEC_VOLATILE) > + continue; > + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL) > + continue; > + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); > + rtx function = XVECEXP (unspec, 0, 0); > + rtx frame_size = XVECEXP (unspec, 0, 1); > + rtx args_size = XVECEXP (unspec, 0, 2); > + rtx pc_src = XEXP (set_pc, 1); > + rtx call_done, cond = NULL_RTX; > + if (GET_CODE (pc_src) == IF_THEN_ELSE) > + { > + cond = XEXP (pc_src, 0); > + call_done = XEXP (XEXP (pc_src, 1), 0); > + } > + else > + call_done = XEXP (pc_src, 0); > + s390_expand_split_stack_call (insn, > + call_done, > + function, > + frame_size, > + args_size, > + cond); > + } > + } > + I'm wondering if it is really necessary to expand the call in that two-step approach?! We do the general literal pool handling in s390_reorg because we need all the insn lengths to be finalized before performing the branch/pool splitting loop. But this shouldn't be necessary in this case. Would it be possible to expand the call already in emit_prologue phase and get rid of the s390_reorg part? > /* Try to optimize prologue and epilogue further. */ > s390_optimize_prologue (); > > @@ -14469,6 +14826,9 @@ s390_asm_file_end (void) > s390_vector_abi); > #endif > file_end_indicate_exec_stack (); > + > + if (flag_split_stack) > + file_end_indicate_split_stack (); > } > > /* Return true if TYPE is a vector bool type. 
*/ > @@ -14724,6 +15084,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty > #undef TARGET_SET_UP_BY_PROLOGUE > #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue > > +#undef TARGET_EXTRA_LIVE_ON_ENTRY > +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry > + > #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P > #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ > s390_use_by_pieces_infrastructure_p > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > index 9b869d5..21cd989 100644 > --- a/gcc/config/s390/s390.md > +++ b/gcc/config/s390/s390.md > @@ -114,6 +114,9 @@ > UNSPEC_SP_SET > UNSPEC_SP_TEST > > + ; Split stack support > + UNSPEC_STACK_CHECK > + > ; Test Data Class (TDC) > UNSPEC_TDC_INSN > > @@ -276,6 +279,11 @@ > ; Set and get floating point control register > UNSPECV_SFPC > UNSPECV_EFPC > + > + ; Split stack support > + UNSPECV_SPLIT_STACK_CALL > + UNSPECV_SPLIT_STACK_SIBCALL > + UNSPECV_SPLIT_STACK_MARKER > ]) > > ;; > @@ -10907,3 +10915,104 @@ > "TARGET_Z13" > "lcbb\t%0,%1,%b2" > [(set_attr "op_type" "VRX")]) > + > +; Handle -fsplit-stack. 
> + > +(define_expand "split_stack_prologue" > + [(const_int 0)] > + "" > +{ > + s390_expand_split_stack_prologue (); > + DONE; > +}) > + > +(define_insn "split_stack_call_<mode>" > + [(set (pc) (label_ref (match_operand 0 "" ""))) > + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + UNSPECV_SPLIT_STACK_CALL))] > + "TARGET_CPU_ZARCH" > +{ > + gcc_unreachable (); > +} > + [(set_attr "length" "12")]) > + > +(define_insn "split_stack_cond_call_<mode>" > + [(set (pc) > + (if_then_else > + (match_operand 4 "" "") > + (label_ref (match_operand 0 "" "")) > + (pc))) > + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + UNSPECV_SPLIT_STACK_CALL))] > + "TARGET_CPU_ZARCH" > +{ > + gcc_unreachable (); > +} > + [(set_attr "length" "12")]) > + > +;; If there are operand 0 bytes available on the stack, jump to > +;; operand 1. > + > +(define_expand "split_stack_space_check" > + [(set (pc) (if_then_else > + (ltu (minus (reg 15) > + (match_operand 0 "register_operand")) > + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) > + (label_ref (match_operand 1)) > + (pc)))] > + "" > +{ > + /* Offset from thread pointer to __private_ss. */ > + int psso = TARGET_64BIT ? 0x38 : 0x20; > + rtx tp = s390_get_thread_pointer (); > + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); > + rtx reg = gen_reg_rtx (Pmode); > + rtx cc; > + if (TARGET_64BIT) > + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); > + else > + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); > + cc = s390_emit_compare (GT, reg, guard); > + s390_emit_jump (operands[1], cc); > + > + DONE; > +}) This expander does not seem to get called from anywhere. > + > +;; A jg with minimal fuss for use in split stack prologue. 
> + > +(define_insn "split_stack_sibcall_<mode>" > + [(set (pc) (label_ref (match_operand 1 "" ""))) > + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] > + UNSPECV_SPLIT_STACK_SIBCALL))] > + "TARGET_CPU_ZARCH" > + "jg\t%0" > + [(set_attr "op_type" "RIL") > + (set_attr "type" "branch")]) > + > +;; Also a conditional one. > + > +(define_insn "split_stack_cond_sibcall_<mode>" > + [(set (pc) > + (if_then_else > + (match_operand 1 "" "") > + (label_ref (match_operand 2 "" "")) > + (pc))) > + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] > + UNSPECV_SPLIT_STACK_SIBCALL))] > + "TARGET_CPU_ZARCH" > + "jg%C1\t%0" > + [(set_attr "op_type" "RIL") > + (set_attr "type" "branch")]) > + > +;; An unusual nop instruction used to mark functions with no stack frames > +;; as split-stack aware. > + > +(define_insn "split_stack_marker" > + [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)] > + "" > + "nopr\t%%r15" > + [(set_attr "op_type" "RR")]) > diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog > index 4cd8f01..604b120 100644 > --- a/libgcc/ChangeLog > +++ b/libgcc/ChangeLog > @@ -1,3 +1,10 @@ > +2016-01-16 Marcin Kościelnicki <koriakin@0x04.net> > + > + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. > + * config/s390/morestack.S: New file. > + * config/s390/t-stack-s390: New file. > + * generic-morestack.c (__splitstack_find): Add s390-specific code.
> + > 2016-01-15 Nick Clifton <nickc@redhat.com> > > * config/msp430/t-msp430 (lib2_mul_none.o): Only use the first > diff --git a/libgcc/config.host b/libgcc/config.host > index f58ee45..9793155 100644 > --- a/libgcc/config.host > +++ b/libgcc/config.host > @@ -1105,11 +1105,11 @@ rx-*-elf) > tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" > ;; > s390-*-linux*) > - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" > + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" > md_unwind_header=s390/linux-unwind.h > ;; > s390x-*-linux*) > - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" > + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" > if test "${host_address}" = 32; then > tmake_file="${tmake_file} s390/32/t-floattodi" > fi > diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S > new file mode 100644 > index 0000000..c99f6e4 > --- /dev/null > +++ b/libgcc/config/s390/morestack.S > @@ -0,0 +1,609 @@ > +# s390 support for -fsplit-stack. > +# Copyright (C) 2015 Free Software Foundation, Inc. > +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. > + > +# This file is part of GCC. > + > +# GCC is free software; you can redistribute it and/or modify it under > +# the terms of the GNU General Public License as published by the Free > +# Software Foundation; either version 3, or (at your option) any later > +# version. > + > +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +# WARRANTY; without even the implied warranty of MERCHANTABILITY or > +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +# for more details. > + > +# Under Section 7 of GPL version 3, you are granted additional > +# permissions described in the GCC Runtime Library Exception, version > +# 3.1, as published by the Free Software Foundation.
> + > +# You should have received a copy of the GNU General Public License and > +# a copy of the GCC Runtime Library Exception along with this program; > +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > +# <http://www.gnu.org/licenses/>. > + > +# Excess space needed to call ld.so resolver for lazy plt > +# resolution. Go uses sigaltstack so this doesn't need to > +# also cover signal frame size. > +#define BACKOFF 0x1000 > + > +# The __morestack function. > + > + .global __morestack > + .hidden __morestack > + > + .type __morestack,@function > + > +__morestack: > +.LFB1: > + .cfi_startproc > + > + > +#ifndef __s390x__ > + > + > +# The 31-bit __morestack function. > + > + # We use a cleanup to restore the stack guard if an exception > + # is thrown through this code. > +#ifndef __PIC__ > + .cfi_personality 0,__gcc_personality_v0 > + .cfi_lsda 0,.LLSDA1 > +#else > + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 > + .cfi_lsda 0x1b,.LLSDA1 > +#endif > + > + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. > + .cfi_offset %r6, -0x48 > + .cfi_offset %r7, -0x44 > + .cfi_offset %r8, -0x40 > + .cfi_offset %r9, -0x3c > + .cfi_offset %r10, -0x38 > + .cfi_offset %r11, -0x34 > + .cfi_offset %r12, -0x30 > + .cfi_offset %r13, -0x2c > + .cfi_offset %r14, -0x28 > + .cfi_offset %r15, -0x24 > + lr %r11, %r15 # Make frame pointer for vararg. > + .cfi_def_cfa_register %r11 > + ahi %r15, -0x60 # 0x60 for standard frame. > + st %r11, 0(%r15) # Save back chain. > + lr %r8, %r0 # Save %r0 (static chain). > + lr %r10, %r1 # Save %r1 (address of parameter block). > + > + l %r7, 0(%r10) # Required frame size to %r7 > + ear %r1, %a0 # Extract thread pointer. 
> + l %r1, 0x20(%r1) # Get stack bounduary > + ar %r1, %r7 # Stack bounduary + frame size > + a %r1, 4(%r10) # + stack param size > + clr %r1, %r15 # Compare with current stack pointer > + jle .Lnoalloc # guard > sp - frame-size: need alloc > + > + brasl %r14, __morestack_block_signals > + > + # We abuse one of caller's fpr save slots (which we don't use for fprs) > + # as a local variable. Not needed here, but done to be consistent with > + # the below use. > + ahi %r7, BACKOFF # Bump requested size a bit. > + st %r7, 0x40(%r11) # Stuff frame size on stack. > + la %r2, 0x40(%r11) # Pass its address as parameter. > + la %r3, 0x60(%r11) # Caller's stack parameters. > + l %r4, 4(%r10) # Size of stack paremeters. parameters > + brasl %r14, __generic_morestack > + > + lr %r15, %r2 # Switch to the new stack. > + ahi %r15, -0x60 # Make a stack frame on it. > + st %r11, 0(%r15) # Save back chain. > + > + s %r2, 0x40(%r11) # The end of stack space. > + ahi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 # Extract thread pointer. > +.LEHB0: > + st %r2, 0x20(%r1) # Save the new stack boundary. > + > + brasl %r14, __morestack_unblock_signals > + > + lr %r0, %r8 # Static chain. > + lm %r2, %r6, 0x8(%r11) # Paremeter registers. > + > + # Third parameter is address of function meat - address of parameter > + # block. > + a %r10, 0x8(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0x60(%r11) > + > + # State of registers: > + # %r0: Static chain from entry. > + # %r1: Vararg pointer. > + # %r2-%r6: Parameters from entry. > + # %r7-%r10: Indeterminate. > + # %r11: Frame pointer (%r15 from entry). > + # %r12-%r13: Indeterminate. > + # %r14: Return address. > + # %r15: Stack pointer. > + basr %r14, %r10 # Call our caller. > + > + stm %r2, %r3, 0x8(%r11) # Save return registers. 
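For what it's worth, this is how I read the allocation check at the top of the 31-bit __morestack, written as a small C sketch. The function and parameter names are made up for illustration and are not part of the patch. I am assuming sp stands for %r15 after the ahi that allocates the 0x60-byte frame, and the comparison is unsigned because the code uses clr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the l/ar/a/clr/jle sequence:
   guard      - value loaded from the __private_ss field in the TCB
   frame_size - first word of the parameter block
   args_size  - second word of the parameter block
   sp         - current stack pointer (assumed: after the 0x60 frame)  */
bool
morestack_needs_alloc (uintptr_t guard, uintptr_t frame_size,
                       uintptr_t args_size, uintptr_t sp)
{
  uintptr_t limit = guard + frame_size + args_size;

  /* jle .Lnoalloc is taken when limit <= sp, so a new stack segment
     is only requested when limit is strictly above sp.  */
  return limit > sp;
}

/* Example values only; they do not come from a real run.  */
void
morestack_needs_alloc_example (void)
{
  assert (!morestack_needs_alloc (0x1000, 0x100, 0x10, 0x2000));
  assert (morestack_needs_alloc (0x1000, 0x1000, 0x10, 0x2000));
}
```

Since jle skips the allocation when the sum equals the stack pointer, only a strictly larger sum reaches __generic_morestack.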
> + > + brasl %r14, __morestack_block_signals > + > + # We need a stack slot now, but have no good way to get it - the frame > + # on new stack had to be exactly 0x60 bytes, or stack parameters would > + # be passed wrong. Abuse fpr save area in caller's frame (we don't > + # save actual fprs). > + la %r2, 0x40(%r11) > + brasl %r14, __generic_releasestack > + > + s %r2, 0x40(%r11) # Subtract available space. > + ahi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 # Extract thread pointer. > +.LEHE0: > + st %r2, 0x20(%r1) # Save the new stack boundary. > + > + # We need to restore the old stack pointer before unblocking signals. > + # We also need 0x60 bytes for a stack frame. Since we had a stack > + # frame at this place before the stack switch, there's no need to > + # write the back chain again. > + lr %r15, %r11 > + ahi %r15, -0x60 > + > + brasl %r14, __morestack_unblock_signals > + > + lm %r2, %r15, 0x8(%r11) # Restore all registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# Executed if no new stack allocation is needed. > + > +.Lnoalloc: > + .cfi_restore_state > + # We may need to copy stack parameters. > + l %r9, 0x4(%r10) # Load stack parameter size. > + ltr %r9, %r9 # And check if it's 0. > + je .Lnostackparm # Skip the copy if not needed. > + sr %r15, %r9 # Make space on the stack. > + la %r8, 0x60(%r15) # Destination. > + la %r12, 0x60(%r11) # Source. > + lr %r13, %r9 # Source size. > +.Lcopy: > + mvcle %r8, %r12, 0 # Copy. > + jo .Lcopy > + > +.Lnostackparm: > + # Third parameter is address of function meat - address of parameter > + # block. 
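A C sketch of how I understand the parameter block and the add that follows this comment, mostly to check my own reading. The struct and function names are hypothetical. The fields are pointer-sized, which gives offsets 0/4/8 in the 31-bit code and 0/8/0x10 in the 64-bit variant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C view of the block that %r1 points to on entry.
   Field names are mine; the patch only emits three pool entries.  */
struct split_stack_parm_block
{
  uintptr_t frame_size;     /* frame size requested by the function */
  uintptr_t args_size;      /* size of stack arguments to copy */
  intptr_t resume_offset;   /* function body address minus block address */
};

/* Mirrors "a %r10, 0x8(%r10)": the third word is added to the
   address of the block itself.  Unsigned arithmetic wraps, so a
   negative offset works as well.  */
uintptr_t
split_stack_resume_address (const struct split_stack_parm_block *pb)
{
  assert (pb != 0);
  return (uintptr_t) pb + (uintptr_t) pb->resume_offset;
}
```

So the third word is a displacement relative to the start of the block rather than an absolute address, and __morestack has to add the block address before the basr.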
> + a %r10, 0x8(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0x60(%r11) > + > + # OK, no stack allocation needed. We still follow the protocol and > + # call our caller - it doesn't cost much and makes sure vararg works. > + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. > + basr %r14, %r10 # Call our caller. The comment confuses me. It somewhat sounds to me like the call wouldn't be really needed but in fact it cannot even remotely work without jumping back to the function body right?! > + > + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# This is the cleanup code called by the stack unwinder when unwinding > +# through the code between .LEHB0 and .LEHE0 above. > + > +.L1: > + .cfi_restore_state > + lr %r2, %r11 # Stack pointer after resume. > + brasl %r14, __generic_findstack > + lr %r3, %r11 # Get the stack pointer. > + sr %r3, %r2 # Subtract available space. > + ahi %r3, BACKOFF # Back off a bit. > + ear %r1, %a0 # Extract thread pointer. > + st %r3, 0x20(%r1) # Save the new stack boundary. > + > + lr %r2, %r6 # Exception header. > +#ifdef __PIC__ > + brasl %r14, _Unwind_Resume@PLT > +#else > + brasl %r14, _Unwind_Resume > +#endif > + > +#else /* defined(__s390x__) */ > + > + > +# The 64-bit __morestack function. > + > + # We use a cleanup to restore the stack guard if an exception > + # is thrown through this code. > +#ifndef __PIC__ > + .cfi_personality 0x3,__gcc_personality_v0 > + .cfi_lsda 0x3,.LLSDA1 > +#else > + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 > + .cfi_lsda 0x1b,.LLSDA1 > +#endif > + > + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. 
> + .cfi_offset %r6, -0x70 > + .cfi_offset %r7, -0x68 > + .cfi_offset %r8, -0x60 > + .cfi_offset %r9, -0x58 > + .cfi_offset %r10, -0x50 > + .cfi_offset %r11, -0x48 > + .cfi_offset %r12, -0x40 > + .cfi_offset %r13, -0x38 > + .cfi_offset %r14, -0x30 > + .cfi_offset %r15, -0x28 > + lgr %r11, %r15 # Make frame pointer for vararg. > + .cfi_def_cfa_register %r11 > + aghi %r15, -0xa0 # 0xa0 for standard frame. > + stg %r11, 0(%r15) # Save back chain. > + lgr %r8, %r0 # Save %r0 (static chain). > + lgr %r10, %r1 # Save %r1 (address of parameter block). > + > + lg %r7, 0(%r10) # Required frame size to %r7 > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > + lg %r1, 0x38(%r1) # Get stack bounduary > + agr %r1, %r7 # Stack bounduary + frame size > + ag %r1, 8(%r10) # + stack param size > + clgr %r1, %r15 # Compare with current stack pointer > + jle .Lnoalloc # guard > sp - frame-size: need alloc > + > + brasl %r14, __morestack_block_signals > + > + # We abuse one of caller's fpr save slots (which we don't use for fprs) > + # as a local variable. Not needed here, but done to be consistent with > + # the below use. > + aghi %r7, BACKOFF # Bump requested size a bit. > + stg %r7, 0x80(%r11) # Stuff frame size on stack. > + la %r2, 0x80(%r11) # Pass its address as parameter. > + la %r3, 0xa0(%r11) # Caller's stack parameters. > + lg %r4, 8(%r10) # Size of stack paremeters. > + brasl %r14, __generic_morestack > + > + lgr %r15, %r2 # Switch to the new stack. > + aghi %r15, -0xa0 # Make a stack frame on it. > + stg %r11, 0(%r15) # Save back chain. > + > + sg %r2, 0x80(%r11) # The end of stack space. > + aghi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > +.LEHB0: > + stg %r2, 0x38(%r1) # Save the new stack boundary. > + > + brasl %r14, __morestack_unblock_signals > + > + lgr %r0, %r8 # Static chain. > + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. 
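The ear/sllg/ear triple shows up several times in the 64-bit code. As I understand it, it assembles the thread pointer from the two access registers, roughly like this C sketch. The function name is mine, and a0/a1 stand for the 32-bit contents of %a0 and %a1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring
     ear  %r1, %a0       low half of r1 = a0
     sllg %r1, %r1, 32   move a0 into the high half, low half = 0
     ear  %r1, %a1       low half of r1 = a1, high half unchanged  */
uint64_t
s390x_thread_pointer (uint32_t a0, uint32_t a1)
{
  return ((uint64_t) a0 << 32) | a1;
}

/* Example values only; they do not come from a real process.  */
void
s390x_thread_pointer_example (void)
{
  assert (s390x_thread_pointer (0x000003ffu, 0xfd000000u)
          == 0x000003fffd000000ull);
}
```

The guard field is then accessed at offset 0x38 from that value, matching psso in s390_expand_split_stack_prologue.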
> + > + # Third parameter is address of function meat - address of parameter > + # block. > + ag %r10, 0x10(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0xa0(%r11) > + > + # State of registers: > + # %r0: Static chain from entry. > + # %r1: Vararg pointer. > + # %r2-%r6: Parameters from entry. > + # %r7-%r10: Indeterminate. > + # %r11: Frame pointer (%r15 from entry). > + # %r12-%r13: Indeterminate. > + # %r14: Return address. > + # %r15: Stack pointer. > + basr %r14, %r10 # Call our caller. > + > + stg %r2, 0x10(%r11) # Save return register. > + > + brasl %r14, __morestack_block_signals > + > + # We need a stack slot now, but have no good way to get it - the frame > + # on new stack had to be exactly 0xa0 bytes, or stack parameters would > + # be passed wrong. Abuse fpr save area in caller's frame (we don't > + # save actual fprs). > + la %r2, 0x80(%r11) > + brasl %r14, __generic_releasestack > + > + sg %r2, 0x80(%r11) # Subtract available space. > + aghi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > +.LEHE0: > + stg %r2, 0x38(%r1) # Save the new stack boundary. > + > + # We need to restore the old stack pointer before unblocking signals. > + # We also need 0xa0 bytes for a stack frame. Since we had a stack > + # frame at this place before the stack switch, there's no need to > + # write the back chain again. > + lgr %r15, %r11 > + aghi %r15, -0xa0 > + > + brasl %r14, __morestack_unblock_signals > + > + lmg %r2, %r15, 0x10(%r11) # Restore all registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# Executed if no new stack allocation is needed. 
> + > +.Lnoalloc: > + .cfi_restore_state > + # We may need to copy stack parameters. > + lg %r9, 0x8(%r10) # Load stack parameter size. > + ltgr %r9, %r9 # Check if it's 0. > + je .Lnostackparm # Skip the copy if not needed. > + sgr %r15, %r9 # Make space on the stack. > + la %r8, 0xa0(%r15) # Destination. > + la %r12, 0xa0(%r11) # Source. > + lgr %r13, %r9 # Source size. > +.Lcopy: > + mvcle %r8, %r12, 0 # Copy. > + jo .Lcopy > + > +.Lnostackparm: > + # Third parameter is address of function meat - address of parameter > + # block. > + ag %r10, 0x10(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0xa0(%r11) > + > + # OK, no stack allocation needed. We still follow the protocol and > + # call our caller - it doesn't cost much and makes sure vararg works. > + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. > + basr %r14, %r10 # Call our caller. > + > + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# This is the cleanup code called by the stack unwinder when unwinding > +# through the code between .LEHB0 and .LEHE0 above. > + > +.L1: > + .cfi_restore_state > + lgr %r2, %r11 # Stack pointer after resume. > + brasl %r14, __generic_findstack > + lgr %r3, %r11 # Get the stack pointer. > + sgr %r3, %r2 # Subtract available space. > + aghi %r3, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > + stg %r3, 0x38(%r1) # Save the new stack boundary. > + > + lgr %r2, %r6 # Exception header. 
> +#ifdef __PIC__ > + brasl %r14, _Unwind_Resume@PLT > +#else > + brasl %r14, _Unwind_Resume > +#endif > + > +#endif /* defined(__s390x__) */ > + > + .cfi_endproc > + .size __morestack, . - __morestack > + > + > +# The exception table. This tells the personality routine to execute > +# the exception handler. > + > + .section .gcc_except_table,"a",@progbits > + .align 4 > +.LLSDA1: > + .byte 0xff # @LPStart format (omit) > + .byte 0xff # @TType format (omit) > + .byte 0x1 # call-site format (uleb128) > + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length > +.LLSDACSB1: > + .uleb128 .LEHB0-.LFB1 # region 0 start > + .uleb128 .LEHE0-.LEHB0 # length > + .uleb128 .L1-.LFB1 # landing pad > + .uleb128 0 # action > +.LLSDACSE1: > + > + > + .global __gcc_personality_v0 > +#ifdef __PIC__ > + # Build a position independent reference to the basic > + # personality function. > + .hidden DW.ref.__gcc_personality_v0 > + .weak DW.ref.__gcc_personality_v0 > + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat > + .type DW.ref.__gcc_personality_v0, @object > +DW.ref.__gcc_personality_v0: > +#ifndef __LP64__ > + .align 4 > + .size DW.ref.__gcc_personality_v0, 4 > + .long __gcc_personality_v0 > +#else > + .align 8 > + .size DW.ref.__gcc_personality_v0, 8 > + .quad __gcc_personality_v0 > +#endif > +#endif > + > + > + > +# Initialize the stack test value when the program starts or when a > +# new thread starts. We don't know how large the main stack is, so we > +# guess conservatively. We might be able to use getrlimit here. > + > + .text > + .global __stack_split_initialize > + .hidden __stack_split_initialize > + > + .type __stack_split_initialize, @function > + > +__stack_split_initialize: > + > +#ifndef __s390x__ > + > + ear %r1, %a0 > + lr %r0, %r15 > + ahi %r0, -0x4000 # We should have at least 16K. 
> + st %r0, 0x20(%r1) > + > + lr %r2, %r15 > + lhi %r3, 0x4000 > +#ifdef __PIC__ > + jg __generic_morestack_set_initial_sp@PLT # Tail call > +#else > + jg __generic_morestack_set_initial_sp # Tail call > +#endif > + > +#else /* defined(__s390x__) */ > + > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + lgr %r0, %r15 > + aghi %r0, -0x4000 # We should have at least 16K. > + stg %r0, 0x38(%r1) > + > + lgr %r2, %r15 > + lghi %r3, 0x4000 > +#ifdef __PIC__ > + jg __generic_morestack_set_initial_sp@PLT # Tail call > +#else > + jg __generic_morestack_set_initial_sp # Tail call > +#endif > + > +#endif /* defined(__s390x__) */ > + > + .size __stack_split_initialize, . - __stack_split_initialize > + > +# Routines to get and set the guard, for __splitstack_getcontext, > +# __splitstack_setcontext, and __splitstack_makecontext. > + > +# void *__morestack_get_guard (void) returns the current stack guard. > + .text > + .global __morestack_get_guard > + .hidden __morestack_get_guard > + > + .type __morestack_get_guard,@function > + > +__morestack_get_guard: > + > +#ifndef __s390x__ > + ear %r1, %a0 > + l %r2, 0x20(%r1) > +#else > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + lg %r2, 0x38(%r1) > +#endif > + br %r14 > + > + .size __morestack_get_guard, . - __morestack_get_guard > + > +# void __morestack_set_guard (void *) sets the stack guard. > + .global __morestack_set_guard > + .hidden __morestack_set_guard > + > + .type __morestack_set_guard,@function > + > +__morestack_set_guard: > + > +#ifndef __s390x__ > + ear %r1, %a0 > + st %r2, 0x20(%r1) > +#else > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + stg %r2, 0x38(%r1) > +#endif > + br %r14 > + > + .size __morestack_set_guard, . - __morestack_set_guard > + > +# void *__morestack_make_guard (void *, size_t) returns the stack > +# guard value for a stack. 
> + .global __morestack_make_guard > + .hidden __morestack_make_guard > + > + .type __morestack_make_guard,@function > + > +__morestack_make_guard: > + > +#ifndef __s390x__ > + sr %r2, %r3 > + ahi %r2, BACKOFF > +#else > + sgr %r2, %r3 > + aghi %r2, BACKOFF > +#endif > + br %r14 > + > + .size __morestack_make_guard, . - __morestack_make_guard > + > +# Make __stack_split_initialize a high priority constructor. > + > + .section .ctors.65535,"aw",@progbits > + > +#ifndef __LP64__ > + .align 4 > + .long __stack_split_initialize > + .long __morestack_load_mmap > +#else > + .align 8 > + .quad __stack_split_initialize > + .quad __morestack_load_mmap > +#endif > + > + .section .note.GNU-stack,"",@progbits > + .section .note.GNU-split-stack,"",@progbits > + .section .note.GNU-no-split-stack,"",@progbits > diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 > new file mode 100644 > index 0000000..4c959b0 > --- /dev/null > +++ b/libgcc/config/s390/t-stack-s390 > @@ -0,0 +1,2 @@ > +# Makefile fragment to support -fsplit-stack for s390. > +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S > diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c > index 89765d4..b8eec4e 100644 > --- a/libgcc/generic-morestack.c > +++ b/libgcc/generic-morestack.c > @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, > #elif defined (__i386__) > nsp -= 6 * sizeof (void *); > #elif defined __powerpc64__ > +#elif defined __s390x__ > + nsp -= 2 * 160; > +#elif defined __s390__ > + nsp -= 2 * 96; > #else > #error "unrecognized target" > #endif > ^ permalink raw reply [flat|nested] 55+ messages in thread
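For readers who do not read s390 assembly, the guard arithmetic of __morestack_make_guard in the patch above can be sketched in C. This is only an illustration, not code from the patch: the name make_guard and the use of uintptr_t are assumptions made here, while BACKOFF mirrors the "#define BACKOFF 0x1000" in morestack.S, and the body follows the "sgr %r2, %r3" / "aghi %r2, BACKOFF" pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors "#define BACKOFF 0x1000" from morestack.S.  */
#define BACKOFF 0x1000

/* Hypothetical C rendering of __morestack_make_guard (the name and
   types are assumptions for illustration).  The assembly subtracts
   the stack size from the stack address and then adds BACKOFF, so
   the guard sits BACKOFF bytes above the lowest usable address.  */
static uintptr_t
make_guard (uintptr_t stack, size_t size)
{
  return stack - size + BACKOFF;
}
```

With a stack address of 0x7fff0000 and a size of 0x4000, this sketch yields 0x7ffed000; the prologue's stack check compares the stack pointer against a value of this kind stored in the TCB.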
* Re: [PATCH] s390: Add -fsplit-stack support 2016-01-29 13:33 ` Andreas Krebbel @ 2016-01-29 15:43 ` Marcin Kościelnicki 2016-01-29 16:17 ` Andreas Krebbel 0 siblings, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-01-29 15:43 UTC (permalink / raw) To: Andreas Krebbel; +Cc: gcc-patches On 29/01/16 14:33, Andreas Krebbel wrote: > Hi Marcin, > > sorry for the late feedback. > > A few comments regarding the split stack implementation: > > The GNU coding style requires to replace every 8 leading blanks on a > line with a tab. There are many lines in your patch violating this. > In case you are an emacs user `whitespace-cleanup' will fix this for > you. OK, will do. > > Could you please add a testcase checking the different > variants. I.e. with early exit, no-alloc in __morestack, and with an > actual allocation? The testsuite with -fsplit-stack already hits all of them, and checking them manually is rather tricky (I don't know if it could be done in a target-independent way at all), but I think it'd be reasonable to make assembly testcases calling __morestack for the last two cases, to check if the registers are being preserved, etc. > > There are a few more comments inline. > > Bye, > > -Andreas- > >> diff --git a/gcc/ChangeLog b/gcc/ChangeLog >> index c881d52..71f6f38 100644 >> --- a/gcc/ChangeLog >> +++ b/gcc/ChangeLog >> @@ -1,5 +1,38 @@ >> 2016-01-16 Marcin Kościelnicki <koriakin@0x04.net> >> >> + * common/config/s390/s390-common.c (s390_supports_split_stack): >> + New function. >> + (TARGET_SUPPORTS_SPLIT_STACK): New macro. >> + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. >> + * config/s390/s390.c (struct machine_function): New field >> + split_stack_varargs_pointer. >> + (s390_register_info): Mark r12 as clobbered if it'll be used as temp >> + in s390_emit_prologue. >> + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack >> + vararg pointer. >> + (morestack_ref): New global.
>> + (SPLIT_STACK_AVAILABLE): New macro. >> + (s390_expand_split_stack_prologue): New function. >> + (s390_expand_split_stack_call): New function. >> + (s390_live_on_entry): New function. >> + (s390_va_start): Use split-stack vararg pointer if appropriate. >> + (s390_reorg): Lower the split-stack pseudo-insns. >> + (s390_asm_file_end): Emit the split-stack note sections. >> + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. >> + * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec. >> + (UNSPECV_SPLIT_STACK_CALL): New unspec. >> + (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. >> + (UNSPECV_SPLIT_STACK_MARKER): New unspec. >> + (split_stack_prologue): New expand. >> + (split_stack_call_*): New insn. >> + (split_stack_cond_call_*): New insn. >> + (split_stack_space_check): New expand. >> + (split_stack_sibcall_*): New insn. >> + (split_stack_cond_sibcall_*): New insn. >> + (split_stack_marker): New insn. >> + >> +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net> >> + >> * cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps >> with side effects. >> >> diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c >> index 4519c21..1e497e6 100644 >> --- a/gcc/common/config/s390/s390-common.c >> +++ b/gcc/common/config/s390/s390-common.c >> @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, >> } >> } >> >> +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. >> + We don't verify it, since earlier versions just have padding at >> + its place, which works just as well.
*/ >> + >> +static bool >> +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, >> + struct gcc_options *opts ATTRIBUTE_UNUSED) >> +{ >> + return true; >> +} >> + >> #undef TARGET_DEFAULT_TARGET_FLAGS >> #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) >> >> @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, >> #undef TARGET_OPTION_INIT_STRUCT >> #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct >> >> +#undef TARGET_SUPPORTS_SPLIT_STACK >> +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack >> + >> struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; >> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h >> index 633bc1e..09032c9 100644 >> --- a/gcc/config/s390/s390-protos.h >> +++ b/gcc/config/s390/s390-protos.h >> @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, >> extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); >> extern void s390_emit_prologue (void); >> extern void s390_emit_epilogue (bool); >> +extern void s390_expand_split_stack_prologue (void); >> extern bool s390_can_use_simple_return_insn (void); >> extern bool s390_can_use_return_insn (void); >> extern void s390_function_profiler (FILE *, int); >> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c >> index 3be64de..6afce7c 100644 >> --- a/gcc/config/s390/s390.c >> +++ b/gcc/config/s390/s390.c >> @@ -426,6 +426,13 @@ struct GTY(()) machine_function >> /* True if the current function may contain a tbegin clobbering >> FPRs. */ >> bool tbegin_p; >> + >> + /* For -fsplit-stack support: A stack local which holds a pointer to >> + the stack arguments for a function with a variable number of >> + arguments. This is set at the start of the function and is used >> + to initialize the overflow_arg_area field of the va_list >> + structure. 
*/ >> + rtx split_stack_varargs_pointer; >> }; >> >> /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ >> @@ -9316,9 +9323,13 @@ s390_register_info () >> cfun_frame_layout.high_fprs++; >> } >> >> - if (flag_pic) >> - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] >> - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); >> + /* Register 12 is used for GOT address, but also as temp in prologue >> + for split-stack stdarg functions (unless r14 is available). */ >> + clobbered_regs[12] >> + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) >> + || (flag_split_stack && cfun->stdarg >> + && (crtl->is_leaf || TARGET_TPF_PROFILING >> + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); >> >> clobbered_regs[BASE_REGNUM] >> |= (cfun->machine->base_reg >> @@ -10446,6 +10457,8 @@ s390_emit_prologue (void) >> && !crtl->is_leaf >> && !TARGET_TPF_PROFILING) >> temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); >> + else if (flag_split_stack && cfun->stdarg) >> + temp_reg = gen_rtx_REG (Pmode, 12); > TPF uses r1 hard coded in tracing prologue/epilogue. So I think we > need && !TARGET_TPF_PROFILING here as well. Well, in that case, we'll need to emit a move instruction to some temp register, since __morestack will leave the pointer in %r1. I'll look into that. > >> else >> temp_reg = gen_rtx_REG (Pmode, 1); >> >> @@ -10939,6 +10952,284 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) >> SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); >> } >> >> +/* -fsplit-stack support. */ >> + >> +/* A SYMBOL_REF for __morestack. */ >> +static GTY(()) rtx morestack_ref; >> + >> +/* When using -fsplit-stack, the allocation routines set a field in >> + the TCB to the bottom of the stack plus this much space, measured >> + in bytes. */ >> + >> +#define SPLIT_STACK_AVAILABLE 1024 >> + >> +/* Emit -fsplit-stack prologue, which goes before the regular function >> + prologue. 
*/ >> + >> +void >> +s390_expand_split_stack_prologue (void) >> +{ >> + rtx r1, guard, cc; >> + rtx_insn *insn; >> + /* Offset from thread pointer to __private_ss. */ >> + int psso = TARGET_64BIT ? 0x38 : 0x20; >> + /* Pointer size in bytes. */ >> + /* Frame size and argument size - the two parameters to __morestack. */ >> + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; >> + /* Align argument size to 8 bytes - simplifies __morestack code. */ >> + HOST_WIDE_INT args_size = crtl->args.size >= 0 >> + ? ((crtl->args.size + 7) & ~7) >> + : 0; >> + /* Label to jump to when no __morestack call is necessary. */ >> + rtx_code_label *enough = NULL; >> + /* Label to be called by __morestack. */ >> + rtx_code_label *call_done = NULL; >> + /* 1 if __morestack called conditionally, 0 if always. */ >> + int conditional = 0; >> + >> + gcc_assert (flag_split_stack && reload_completed); >> + if (!TARGET_CPU_ZARCH) >> + { >> + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); >> + return; >> + } >> + >> + r1 = gen_rtx_REG (Pmode, 1); >> + >> + /* If no stack frame will be allocated, don't do anything. */ >> + if (!frame_size) >> + { >> + /* But emit a marker that will let linker and indirect function >> + calls recognise this function as split-stack aware. */ >> + emit_insn(gen_split_stack_marker()); > 2x missing blank before ( > >> + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) >> + { >> + /* If va_start is used, just use r15. */ >> + emit_move_insn (r1, >> + gen_rtx_PLUS (Pmode, stack_pointer_rtx, >> + GEN_INT (STACK_POINTER_OFFSET))); > virtual_incoming_args_rtx ? > Alright. 
>> + } >> + return; >> + } >> + >> + if (morestack_ref == NULL_RTX) >> + { >> + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); >> + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL >> + | SYMBOL_FLAG_FUNCTION); >> + } >> + >> + if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu)) > The agfi immediate value is a signed 32 bit integer. So you can only > add up to 2G-1. I think it would be more readable to write this as: We're emitting ALGFI here, which accepts an unsigned 32-bit integer. > > if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Os (frame_size)) > > as in s390_emit_prologue. The Os check will check for TARGET_EXTIMM as well. Alright. > >> + { >> + /* If frame_size will fit in an add instruction, do a stack space >> + check, and only call __morestack if there's not enough space. */ >> + conditional = 1; >> + >> + /* Get thread pointer. r1 is the only register we can always destroy - r0 >> + could contain a static chain (and cannot be used to address memory >> + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ >> + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); >> + /* Aim at __private_ss. */ >> + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); >> + >> + /* If less that 1kiB used, skip addition and compare directly with >> + __private_ss. */ >> + if (frame_size > SPLIT_STACK_AVAILABLE) >> + { >> + emit_move_insn (r1, guard); >> + if (TARGET_64BIT) >> + emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size))); >> + else >> + emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size))); >> + guard = r1; >> + } >> + >> + if (TARGET_CPU_ZARCH) >> + { > Looks like the !TARGET_CPU_ZARCH stuff hasn't been completely removed?! Oops, will remove that. > >> + rtx tmp; >> + >> + /* Compare the (maybe adjusted) guard with the stack pointer.
*/ >> + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); >> + >> + call_done = gen_label_rtx (); >> + >> + if (TARGET_64BIT) >> + tmp = gen_split_stack_cond_call_di (call_done, >> + morestack_ref, >> + GEN_INT (frame_size), >> + GEN_INT (args_size), >> + cc); >> + else >> + tmp = gen_split_stack_cond_call_si (call_done, >> + morestack_ref, >> + GEN_INT (frame_size), >> + GEN_INT (args_size), >> + cc); > Perhaps it would be more readable to do the TARGET_64BIT check in a separate > expander. Please see "movstr" in s390.md. The same applies to all the > other gen_split_stack* invocations. Alright. > >> + >> + >> + insn = emit_jump_insn (tmp); >> + JUMP_LABEL (insn) = call_done; >> + >> + /* Mark the jump as very unlikely to be taken. */ >> + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); >> + } >> + else >> + { >> + /* Compare the (maybe adjusted) guard with the stack pointer. */ >> + cc = s390_emit_compare (GE, stack_pointer_rtx, guard); >> + >> + enough = gen_label_rtx (); >> + insn = s390_emit_jump (enough, cc); >> + JUMP_LABEL (insn) = enough; >> + >> + /* Mark the jump as very likely to be taken. */ >> + add_int_reg_note (insn, REG_BR_PROB, >> + REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100); >> + } >> + } >> + >> + if (call_done == NULL) > With the !TARGET_CPU_ZARCH path removed above this could be the else > path to the frame_size check and call_done can be removed. Right. > >> + { >> + rtx tmp; >> + call_done = gen_label_rtx (); >> + >> + /* Now, we need to call __morestack. It has very special calling >> + conventions: it preserves param/return/static chain registers for >> + calling main function body, and looks for its own parameters >> + at %r1 (after aligning it up to a 4 byte bounduary for 31-bit mode). */ >> + if (TARGET_64BIT) >> + tmp = gen_split_stack_call_di (call_done, >> + morestack_ref, >> + GEN_INT (frame_size), >> + GEN_INT (args_size)); > Indentation. 
> >> + else >> + tmp = gen_split_stack_call_si (call_done, >> + morestack_ref, >> + GEN_INT (frame_size), >> + GEN_INT (args_size)); > Indentation. > >> + insn = emit_jump_insn (tmp); >> + JUMP_LABEL (insn) = call_done; >> + emit_barrier (); >> + } >> + >> + /* __morestack will call us here. */ >> + >> + if (enough != NULL) >> + { >> + emit_label (enough); >> + LABEL_NUSES (enough) = 1; >> + } > This also was only for !TARGET_CPU_ZARCH. Yes, it'll be removed. > >> + >> + if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX) >> + { >> + /* If va_start is used, and __morestack was not called, just use r15. */ >> + emit_move_insn (r1, >> + gen_rtx_PLUS (Pmode, stack_pointer_rtx, >> + GEN_INT (STACK_POINTER_OFFSET))); > virtual_incoming_args_rtx? > >> + } >> + >> + emit_label (call_done); >> + LABEL_NUSES (call_done) = 1; >> +} >> + >> +/* Generates split-stack call sequence, along with its parameter block. */ >> + >> +static void >> +s390_expand_split_stack_call (rtx_insn *orig_insn, >> + rtx call_done, >> + rtx function, >> + rtx frame_size, >> + rtx args_size, >> + rtx cond) >> +{ >> + int psize = GET_MODE_SIZE (Pmode); >> + rtx_insn *insn = orig_insn; >> + rtx parmbase = gen_label_rtx(); >> + rtx r1 = gen_rtx_REG (Pmode, 1); >> + rtx tmp, tmp2; >> + >> + /* %r1 = litbase. */ >> + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); >> + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); >> + LABEL_NUSES (parmbase)++; >> + >> + /* jg<cond> __morestack. */ >> + if (cond == NULL) >> + { >> + if (TARGET_64BIT) >> + tmp = gen_split_stack_sibcall_di (function, call_done); >> + else >> + tmp = gen_split_stack_sibcall_si (function, call_done); >> + insn = emit_jump_insn_after (tmp, insn); >> + } >> + else >> + { >> + if (!s390_comparison (cond, VOIDmode)) >> + internal_error ("bad split_stack_call cond"); > Perhaps just gcc_assert (s390_comparison (cond, VOIDmode)); ? OK. 
> >> + if (TARGET_64BIT) >> + tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done); >> + else >> + tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done); >> + insn = emit_jump_insn_after (tmp, insn); >> + } >> + JUMP_LABEL (insn) = call_done; >> + LABEL_NUSES (call_done)++; >> + >> + /* Go to .rodata. */ >> + insn = emit_insn_after (gen_pool_section_start (), insn); >> + >> + /* Now, we'll emit parameters to __morestack. First, align to pointer size >> + (this mirrors the alignment done in __morestack - don't touch it). */ >> + insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn); > psize -> UNITS_PER_LONG? > OK. >> + >> + insn = emit_label_after (parmbase, insn); >> + >> + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, >> + gen_rtvec (1, frame_size), >> + UNSPECV_POOL_ENTRY); >> + insn = emit_insn_after (tmp, insn); >> + >> + /* Second parameter is size of the arguments passed on stack that >> + __morestack has to copy to the new stack (does not include varargs). */ >> + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, >> + gen_rtvec (1, args_size), >> + UNSPECV_POOL_ENTRY); >> + insn = emit_insn_after (tmp, insn); >> + >> + /* Third parameter is offset between start of the parameter block >> + and function body to be called by __morestack. */ >> + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); >> + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); >> + tmp = gen_rtx_CONST (Pmode, >> + gen_rtx_MINUS (Pmode, tmp2, tmp)); >> + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, >> + gen_rtvec (1, tmp), >> + UNSPECV_POOL_ENTRY); >> + insn = emit_insn_after (tmp, insn); >> + add_reg_note (insn, REG_LABEL_OPERAND, call_done); >> + LABEL_NUSES (call_done)++; >> + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); >> + LABEL_NUSES (parmbase)++; >> + >> + /* Return from .rodata. 
*/ >> + insn = emit_insn_after (gen_pool_section_end (), insn); >> + >> + delete_insn (orig_insn); >> +} >> + >> +/* We may have to tell the dataflow pass that the split stack prologue >> + is initializing a register. */ >> + >> +static void >> +s390_live_on_entry (bitmap regs) >> +{ >> + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) >> + { >> + gcc_assert (flag_split_stack); >> + bitmap_set_bit (regs, 1); >> + } >> +} >> + >> /* Return true if the function can use simple_return to return outside >> of a shrink-wrapped region. At present shrink-wrapping is supported >> in all cases. */ >> @@ -11541,6 +11832,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) >> expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); >> } >> >> + if (flag_split_stack >> + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) >> + == NULL) >> + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) >> + { >> + rtx reg; >> + rtx_insn *seq; >> + >> + reg = gen_reg_rtx (Pmode); >> + cfun->machine->split_stack_varargs_pointer = reg; >> + >> + start_sequence (); >> + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); >> + seq = get_insns (); >> + end_sequence (); >> + >> + push_topmost_sequence (); >> + emit_insn_after (seq, entry_of_function ()); >> + pop_topmost_sequence (); >> + } >> + >> /* Find the overflow area. >> FIXME: This currently is too pessimistic when the vector ABI is >> enabled. In that case we *always* set up the overflow area >> @@ -11549,7 +11861,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) >> || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG >> || TARGET_VX_ABI) >> { >> - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); >> + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) >> + t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer); > What is the reason for changing virtual_incoming_args_rtx to > crtl->args.internal_arg_pointer in the non-split-stack case? 
Looks like an accident, will change it back. > >> + else >> + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); >> >> off = INTVAL (crtl->args.arg_offset_rtx); >> off = off < 0 ? 0 : off; >> @@ -13158,6 +13473,48 @@ s390_reorg (void) >> } >> } >> >> + if (flag_split_stack) >> + { >> + rtx_insn *insn; >> + >> + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) >> + { >> + /* Look for the split-stack fake jump instructions. */ >> + if (!JUMP_P(insn)) >> + continue; >> + if (GET_CODE (PATTERN (insn)) != PARALLEL >> + || XVECLEN (PATTERN (insn), 0) != 2) >> + continue; >> + rtx set = XVECEXP (PATTERN (insn), 0, 1); >> + if (GET_CODE (set) != SET) >> + continue; >> + rtx unspec = XEXP(set, 1); >> + if (GET_CODE (unspec) != UNSPEC_VOLATILE) >> + continue; >> + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL) >> + continue; >> + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); >> + rtx function = XVECEXP (unspec, 0, 0); >> + rtx frame_size = XVECEXP (unspec, 0, 1); >> + rtx args_size = XVECEXP (unspec, 0, 2); >> + rtx pc_src = XEXP (set_pc, 1); >> + rtx call_done, cond = NULL_RTX; >> + if (GET_CODE (pc_src) == IF_THEN_ELSE) >> + { >> + cond = XEXP (pc_src, 0); >> + call_done = XEXP (XEXP (pc_src, 1), 0); >> + } >> + else >> + call_done = XEXP (pc_src, 0); >> + s390_expand_split_stack_call (insn, >> + call_done, >> + function, >> + frame_size, >> + args_size, >> + cond); >> + } >> + } >> + > I'm wondering if it is really necessary to expand the call in that > two-step approach?! We do the general literal pool handling in > s390_reorg because we need all the insn lengths to be finalized before > performing the branch/pool splitting loop. But this shouldn't be necessary > in this case. Would it be possible to expand the call already in > emit_prologue phase and get rid of the s390_reorg part? There's an internal literal pool involved, which needs to be emitted as one chunk. 
Optimizations are also very likely to destroy the sequence: consider the target address that __morestack will call - the control flow change happens in the __morestack jump instruction, but the address itself is encoded in one of the pool literals. It's just not worth the risk. > >> /* Try to optimize prologue and epilogue further. */ >> s390_optimize_prologue (); >> >> @@ -14469,6 +14826,9 @@ s390_asm_file_end (void) >> s390_vector_abi); >> #endif >> file_end_indicate_exec_stack (); >> + >> + if (flag_split_stack) >> + file_end_indicate_split_stack (); >> } >> >> /* Return true if TYPE is a vector bool type. */ >> @@ -14724,6 +15084,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty >> #undef TARGET_SET_UP_BY_PROLOGUE >> #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue >> >> +#undef TARGET_EXTRA_LIVE_ON_ENTRY >> +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry >> + >> #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P >> #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ >> s390_use_by_pieces_infrastructure_p >> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md >> index 9b869d5..21cd989 100644 >> --- a/gcc/config/s390/s390.md >> +++ b/gcc/config/s390/s390.md >> @@ -114,6 +114,9 @@ >> UNSPEC_SP_SET >> UNSPEC_SP_TEST >> >> + ; Split stack support >> + UNSPEC_STACK_CHECK >> + >> ; Test Data Class (TDC) >> UNSPEC_TDC_INSN >> >> @@ -276,6 +279,11 @@ >> ; Set and get floating point control register >> UNSPECV_SFPC >> UNSPECV_EFPC >> + >> + ; Split stack support >> + UNSPECV_SPLIT_STACK_CALL >> + UNSPECV_SPLIT_STACK_SIBCALL >> + UNSPECV_SPLIT_STACK_MARKER >> ]) >> >> ;; >> @@ -10907,3 +10915,104 @@ >> "TARGET_Z13" >> "lcbb\t%0,%1,%b2" >> [(set_attr "op_type" "VRX")]) >> + >> +; Handle -fsplit-stack.
>> + >> +(define_expand "split_stack_prologue" >> + [(const_int 0)] >> + "" >> +{ >> + s390_expand_split_stack_prologue (); >> + DONE; >> +}) >> + >> +(define_insn "split_stack_call_<mode>" >> + [(set (pc) (label_ref (match_operand 0 "" ""))) >> + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") >> + (match_operand 2 "consttable_operand" "X") >> + (match_operand 3 "consttable_operand" "X")] >> + UNSPECV_SPLIT_STACK_CALL))] >> + "TARGET_CPU_ZARCH" >> +{ >> + gcc_unreachable (); >> +} >> + [(set_attr "length" "12")]) >> + >> +(define_insn "split_stack_cond_call_<mode>" >> + [(set (pc) >> + (if_then_else >> + (match_operand 4 "" "") >> + (label_ref (match_operand 0 "" "")) >> + (pc))) >> + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") >> + (match_operand 2 "consttable_operand" "X") >> + (match_operand 3 "consttable_operand" "X")] >> + UNSPECV_SPLIT_STACK_CALL))] >> + "TARGET_CPU_ZARCH" >> +{ >> + gcc_unreachable (); >> +} >> + [(set_attr "length" "12")]) >> + >> +;; If there are operand 0 bytes available on the stack, jump to >> +;; operand 1. >> + >> +(define_expand "split_stack_space_check" >> + [(set (pc) (if_then_else >> + (ltu (minus (reg 15) >> + (match_operand 0 "register_operand")) >> + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) >> + (label_ref (match_operand 1)) >> + (pc)))] >> + "" >> +{ >> + /* Offset from thread pointer to __private_ss. */ >> + int psso = TARGET_64BIT ? 0x38 : 0x20; >> + rtx tp = s390_get_thread_pointer (); >> + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); >> + rtx reg = gen_reg_rtx (Pmode); >> + rtx cc; >> + if (TARGET_64BIT) >> + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); >> + else >> + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); >> + cc = s390_emit_compare (GT, reg, guard); >> + s390_emit_jump (operands[1], cc); >> + >> + DONE; >> +}) > This expander does not seem to get called from anywhere. 
It's called from target-independent code for alloca and VLAs. > >> + >> +;; A jg with minimal fuss for use in split stack prologue. >> + >> +(define_insn "split_stack_sibcall_<mode>" >> + [(set (pc) (label_ref (match_operand 1 "" ""))) >> + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] >> + UNSPECV_SPLIT_STACK_SIBCALL))] >> + "TARGET_CPU_ZARCH" >> + "jg\t%0" >> + [(set_attr "op_type" "RIL") >> + (set_attr "type" "branch")]) >> + >> +;; Also a conditional one. >> + >> +(define_insn "split_stack_cond_sibcall_<mode>" >> + [(set (pc) >> + (if_then_else >> + (match_operand 1 "" "") >> + (label_ref (match_operand 2 "" "")) >> + (pc))) >> + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] >> + UNSPECV_SPLIT_STACK_SIBCALL))] >> + "TARGET_CPU_ZARCH" >> + "jg%C1\t%0" >> + [(set_attr "op_type" "RIL") >> + (set_attr "type" "branch")]) >> + >> +;; An unusual nop instruction used to mark functions with no stack frames >> +;; as split-stack aware. >> + >> +(define_insn "split_stack_marker" >> + [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)] >> + "" >> + "nopr\t%%r15" >> + [(set_attr "op_type" "RR")]) >> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog >> index 4cd8f01..604b120 100644 >> --- a/libgcc/ChangeLog >> +++ b/libgcc/ChangeLog >> @@ -1,3 +1,10 @@ >> +2016-01-16 Marcin Kościelnicki <koriakin@0x04.net> >> + >> + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. >> + * config/s390/morestack.S: New file. >> + * config/s390/t-stack-s390: New file. >> + * generic-morestack.c (__splitstack_find): Add s390-specific code.
>> + >> 2016-01-15 Nick Clifton <nickc@redhat.com> >> >> * config/msp430/t-msp430 (lib2_mul_none.o): Only use the first >> diff --git a/libgcc/config.host b/libgcc/config.host >> index f58ee45..9793155 100644 >> --- a/libgcc/config.host >> +++ b/libgcc/config.host >> @@ -1105,11 +1105,11 @@ rx-*-elf) >> tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" >> ;; >> s390-*-linux*) >> - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" >> + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" >> md_unwind_header=s390/linux-unwind.h >> ;; >> s390x-*-linux*) >> - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" >> + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" >> if test "${host_address}" = 32; then >> tmake_file="${tmake_file} s390/32/t-floattodi" >> fi >> diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S >> new file mode 100644 >> index 0000000..c99f6e4 >> --- /dev/null >> +++ b/libgcc/config/s390/morestack.S >> @@ -0,0 +1,609 @@ >> +# s390 support for -fsplit-stack. >> +# Copyright (C) 2015 Free Software Foundation, Inc. >> +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. >> + >> +# This file is part of GCC. >> + >> +# GCC is free software; you can redistribute it and/or modify it under >> +# the terms of the GNU General Public License as published by the Free >> +# Software Foundation; either version 3, or (at your option) any later >> +# version. >> + >> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >> +# for more details. >> + >> +# Under Section 7 of GPL version 3, you are granted additional >> +# permissions described in the GCC Runtime Library Exception, version >> +# 3.1, as published by the Free Software Foundation.
>> + >> +# You should have received a copy of the GNU General Public License and >> +# a copy of the GCC Runtime Library Exception along with this program; >> +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see >> +# <http://www.gnu.org/licenses/>. >> + >> +# Excess space needed to call ld.so resolver for lazy plt >> +# resolution. Go uses sigaltstack so this doesn't need to >> +# also cover signal frame size. >> +#define BACKOFF 0x1000 >> + >> +# The __morestack function. >> + >> + .global __morestack >> + .hidden __morestack >> + >> + .type __morestack,@function >> + >> +__morestack: >> +.LFB1: >> + .cfi_startproc >> + >> + >> +#ifndef __s390x__ >> + >> + >> +# The 31-bit __morestack function. >> + >> + # We use a cleanup to restore the stack guard if an exception >> + # is thrown through this code. >> +#ifndef __PIC__ >> + .cfi_personality 0,__gcc_personality_v0 >> + .cfi_lsda 0,.LLSDA1 >> +#else >> + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 >> + .cfi_lsda 0x1b,.LLSDA1 >> +#endif >> + >> + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. >> + .cfi_offset %r6, -0x48 >> + .cfi_offset %r7, -0x44 >> + .cfi_offset %r8, -0x40 >> + .cfi_offset %r9, -0x3c >> + .cfi_offset %r10, -0x38 >> + .cfi_offset %r11, -0x34 >> + .cfi_offset %r12, -0x30 >> + .cfi_offset %r13, -0x2c >> + .cfi_offset %r14, -0x28 >> + .cfi_offset %r15, -0x24 >> + lr %r11, %r15 # Make frame pointer for vararg. >> + .cfi_def_cfa_register %r11 >> + ahi %r15, -0x60 # 0x60 for standard frame. >> + st %r11, 0(%r15) # Save back chain. >> + lr %r8, %r0 # Save %r0 (static chain). >> + lr %r10, %r1 # Save %r1 (address of parameter block). >> + >> + l %r7, 0(%r10) # Required frame size to %r7 >> + ear %r1, %a0 # Extract thread pointer. 
>> + l %r1, 0x20(%r1) # Get stack bounduary >> + ar %r1, %r7 # Stack bounduary + frame size >> + a %r1, 4(%r10) # + stack param size >> + clr %r1, %r15 # Compare with current stack pointer >> + jle .Lnoalloc # guard > sp - frame-size: need alloc >> + >> + brasl %r14, __morestack_block_signals >> + >> + # We abuse one of caller's fpr save slots (which we don't use for fprs) >> + # as a local variable. Not needed here, but done to be consistent with >> + # the below use. >> + ahi %r7, BACKOFF # Bump requested size a bit. >> + st %r7, 0x40(%r11) # Stuff frame size on stack. >> + la %r2, 0x40(%r11) # Pass its address as parameter. >> + la %r3, 0x60(%r11) # Caller's stack parameters. >> + l %r4, 4(%r10) # Size of stack paremeters. > parameters > >> + brasl %r14, __generic_morestack >> + >> + lr %r15, %r2 # Switch to the new stack. >> + ahi %r15, -0x60 # Make a stack frame on it. >> + st %r11, 0(%r15) # Save back chain. >> + >> + s %r2, 0x40(%r11) # The end of stack space. >> + ahi %r2, BACKOFF # Back off a bit. >> + ear %r1, %a0 # Extract thread pointer. >> +.LEHB0: >> + st %r2, 0x20(%r1) # Save the new stack boundary. >> + >> + brasl %r14, __morestack_unblock_signals >> + >> + lr %r0, %r8 # Static chain. >> + lm %r2, %r6, 0x8(%r11) # Paremeter registers. >> + >> + # Third parameter is address of function meat - address of parameter >> + # block. >> + a %r10, 0x8(%r10) >> + >> + # Leave vararg pointer in %r1, in case function uses it >> + la %r1, 0x60(%r11) >> + >> + # State of registers: >> + # %r0: Static chain from entry. >> + # %r1: Vararg pointer. >> + # %r2-%r6: Parameters from entry. >> + # %r7-%r10: Indeterminate. >> + # %r11: Frame pointer (%r15 from entry). >> + # %r12-%r13: Indeterminate. >> + # %r14: Return address. >> + # %r15: Stack pointer. >> + basr %r14, %r10 # Call our caller. >> + >> + stm %r2, %r3, 0x8(%r11) # Save return registers. 
>> + >> + brasl %r14, __morestack_block_signals >> + >> + # We need a stack slot now, but have no good way to get it - the frame >> + # on new stack had to be exactly 0x60 bytes, or stack parameters would >> + # be passed wrong. Abuse fpr save area in caller's frame (we don't >> + # save actual fprs). >> + la %r2, 0x40(%r11) >> + brasl %r14, __generic_releasestack >> + >> + s %r2, 0x40(%r11) # Subtract available space. >> + ahi %r2, BACKOFF # Back off a bit. >> + ear %r1, %a0 # Extract thread pointer. >> +.LEHE0: >> + st %r2, 0x20(%r1) # Save the new stack boundary. >> + >> + # We need to restore the old stack pointer before unblocking signals. >> + # We also need 0x60 bytes for a stack frame. Since we had a stack >> + # frame at this place before the stack switch, there's no need to >> + # write the back chain again. >> + lr %r15, %r11 >> + ahi %r15, -0x60 >> + >> + brasl %r14, __morestack_unblock_signals >> + >> + lm %r2, %r15, 0x8(%r11) # Restore all registers. >> + .cfi_remember_state >> + .cfi_restore %r15 >> + .cfi_restore %r14 >> + .cfi_restore %r13 >> + .cfi_restore %r12 >> + .cfi_restore %r11 >> + .cfi_restore %r10 >> + .cfi_restore %r9 >> + .cfi_restore %r8 >> + .cfi_restore %r7 >> + .cfi_restore %r6 >> + .cfi_def_cfa_register %r15 >> + br %r14 # Return to caller's caller. >> + >> +# Executed if no new stack allocation is needed. >> + >> +.Lnoalloc: >> + .cfi_restore_state >> + # We may need to copy stack parameters. >> + l %r9, 0x4(%r10) # Load stack parameter size. >> + ltr %r9, %r9 # And check if it's 0. >> + je .Lnostackparm # Skip the copy if not needed. >> + sr %r15, %r9 # Make space on the stack. >> + la %r8, 0x60(%r15) # Destination. >> + la %r12, 0x60(%r11) # Source. >> + lr %r13, %r9 # Source size. >> +.Lcopy: >> + mvcle %r8, %r12, 0 # Copy. >> + jo .Lcopy >> + >> +.Lnostackparm: >> + # Third parameter is address of function meat - address of parameter >> + # block. 
>> + a %r10, 0x8(%r10) >> + >> + # Leave vararg pointer in %r1, in case function uses it >> + la %r1, 0x60(%r11) >> + >> + # OK, no stack allocation needed. We still follow the protocol and >> + # call our caller - it doesn't cost much and makes sure vararg works. >> + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. >> + basr %r14, %r10 # Call our caller. > The comment confuses me. It somewhat sounds to me like the call > wouldn't be really needed but in fact it cannot even remotely work > without jumping back to the function body right?! Certainly. __morestack's task is to call the given function entry point once the necessary stack space is established. In fact, in the no allocation case, a sibling-call would actually be possible, if it weren't for one annoying detail: there are no free GPRs we could use to keep the address of the entry point - %r0 may be used to keep static chain, %r1 may have to be the argument pointer, %r2-%r5 may be used to keep parameters, and %r6-%r15 are callee-saved. > >> + >> + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. >> + .cfi_remember_state >> + .cfi_restore %r15 >> + .cfi_restore %r14 >> + .cfi_restore %r13 >> + .cfi_restore %r12 >> + .cfi_restore %r11 >> + .cfi_restore %r10 >> + .cfi_restore %r9 >> + .cfi_restore %r8 >> + .cfi_restore %r7 >> + .cfi_restore %r6 >> + .cfi_def_cfa_register %r15 >> + br %r14 # Return to caller's caller. >> + >> +# This is the cleanup code called by the stack unwinder when unwinding >> +# through the code between .LEHB0 and .LEHE0 above. >> + >> +.L1: >> + .cfi_restore_state >> + lr %r2, %r11 # Stack pointer after resume. >> + brasl %r14, __generic_findstack >> + lr %r3, %r11 # Get the stack pointer. >> + sr %r3, %r2 # Subtract available space. >> + ahi %r3, BACKOFF # Back off a bit. >> + ear %r1, %a0 # Extract thread pointer. >> + st %r3, 0x20(%r1) # Save the new stack boundary. >> + >> + lr %r2, %r6 # Exception header. 
>> +#ifdef __PIC__ >> + brasl %r14, _Unwind_Resume@PLT >> +#else >> + brasl %r14, _Unwind_Resume >> +#endif >> + >> +#else /* defined(__s390x__) */ >> + >> + >> +# The 64-bit __morestack function. >> + >> + # We use a cleanup to restore the stack guard if an exception >> + # is thrown through this code. >> +#ifndef __PIC__ >> + .cfi_personality 0x3,__gcc_personality_v0 >> + .cfi_lsda 0x3,.LLSDA1 >> +#else >> + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 >> + .cfi_lsda 0x1b,.LLSDA1 >> +#endif >> + >> + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. >> + .cfi_offset %r6, -0x70 >> + .cfi_offset %r7, -0x68 >> + .cfi_offset %r8, -0x60 >> + .cfi_offset %r9, -0x58 >> + .cfi_offset %r10, -0x50 >> + .cfi_offset %r11, -0x48 >> + .cfi_offset %r12, -0x40 >> + .cfi_offset %r13, -0x38 >> + .cfi_offset %r14, -0x30 >> + .cfi_offset %r15, -0x28 >> + lgr %r11, %r15 # Make frame pointer for vararg. >> + .cfi_def_cfa_register %r11 >> + aghi %r15, -0xa0 # 0xa0 for standard frame. >> + stg %r11, 0(%r15) # Save back chain. >> + lgr %r8, %r0 # Save %r0 (static chain). >> + lgr %r10, %r1 # Save %r1 (address of parameter block). >> + >> + lg %r7, 0(%r10) # Required frame size to %r7 >> + ear %r1, %a0 >> + sllg %r1, %r1, 32 >> + ear %r1, %a1 # Extract thread pointer. >> + lg %r1, 0x38(%r1) # Get stack bounduary >> + agr %r1, %r7 # Stack bounduary + frame size >> + ag %r1, 8(%r10) # + stack param size >> + clgr %r1, %r15 # Compare with current stack pointer >> + jle .Lnoalloc # guard > sp - frame-size: need alloc >> + >> + brasl %r14, __morestack_block_signals >> + >> + # We abuse one of caller's fpr save slots (which we don't use for fprs) >> + # as a local variable. Not needed here, but done to be consistent with >> + # the below use. >> + aghi %r7, BACKOFF # Bump requested size a bit. >> + stg %r7, 0x80(%r11) # Stuff frame size on stack. >> + la %r2, 0x80(%r11) # Pass its address as parameter. >> + la %r3, 0xa0(%r11) # Caller's stack parameters. 
>> + lg %r4, 8(%r10) # Size of stack paremeters. >> + brasl %r14, __generic_morestack >> + >> + lgr %r15, %r2 # Switch to the new stack. >> + aghi %r15, -0xa0 # Make a stack frame on it. >> + stg %r11, 0(%r15) # Save back chain. >> + >> + sg %r2, 0x80(%r11) # The end of stack space. >> + aghi %r2, BACKOFF # Back off a bit. >> + ear %r1, %a0 >> + sllg %r1, %r1, 32 >> + ear %r1, %a1 # Extract thread pointer. >> +.LEHB0: >> + stg %r2, 0x38(%r1) # Save the new stack boundary. >> + >> + brasl %r14, __morestack_unblock_signals >> + >> + lgr %r0, %r8 # Static chain. >> + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. >> + >> + # Third parameter is address of function meat - address of parameter >> + # block. >> + ag %r10, 0x10(%r10) >> + >> + # Leave vararg pointer in %r1, in case function uses it >> + la %r1, 0xa0(%r11) >> + >> + # State of registers: >> + # %r0: Static chain from entry. >> + # %r1: Vararg pointer. >> + # %r2-%r6: Parameters from entry. >> + # %r7-%r10: Indeterminate. >> + # %r11: Frame pointer (%r15 from entry). >> + # %r12-%r13: Indeterminate. >> + # %r14: Return address. >> + # %r15: Stack pointer. >> + basr %r14, %r10 # Call our caller. >> + >> + stg %r2, 0x10(%r11) # Save return register. >> + >> + brasl %r14, __morestack_block_signals >> + >> + # We need a stack slot now, but have no good way to get it - the frame >> + # on new stack had to be exactly 0xa0 bytes, or stack parameters would >> + # be passed wrong. Abuse fpr save area in caller's frame (we don't >> + # save actual fprs). >> + la %r2, 0x80(%r11) >> + brasl %r14, __generic_releasestack >> + >> + sg %r2, 0x80(%r11) # Subtract available space. >> + aghi %r2, BACKOFF # Back off a bit. >> + ear %r1, %a0 >> + sllg %r1, %r1, 32 >> + ear %r1, %a1 # Extract thread pointer. >> +.LEHE0: >> + stg %r2, 0x38(%r1) # Save the new stack boundary. >> + >> + # We need to restore the old stack pointer before unblocking signals. >> + # We also need 0xa0 bytes for a stack frame. 
Since we had a stack >> + # frame at this place before the stack switch, there's no need to >> + # write the back chain again. >> + lgr %r15, %r11 >> + aghi %r15, -0xa0 >> + >> + brasl %r14, __morestack_unblock_signals >> + >> + lmg %r2, %r15, 0x10(%r11) # Restore all registers. >> + .cfi_remember_state >> + .cfi_restore %r15 >> + .cfi_restore %r14 >> + .cfi_restore %r13 >> + .cfi_restore %r12 >> + .cfi_restore %r11 >> + .cfi_restore %r10 >> + .cfi_restore %r9 >> + .cfi_restore %r8 >> + .cfi_restore %r7 >> + .cfi_restore %r6 >> + .cfi_def_cfa_register %r15 >> + br %r14 # Return to caller's caller. >> + >> +# Executed if no new stack allocation is needed. >> + >> +.Lnoalloc: >> + .cfi_restore_state >> + # We may need to copy stack parameters. >> + lg %r9, 0x8(%r10) # Load stack parameter size. >> + ltgr %r9, %r9 # Check if it's 0. >> + je .Lnostackparm # Skip the copy if not needed. >> + sgr %r15, %r9 # Make space on the stack. >> + la %r8, 0xa0(%r15) # Destination. >> + la %r12, 0xa0(%r11) # Source. >> + lgr %r13, %r9 # Source size. >> +.Lcopy: >> + mvcle %r8, %r12, 0 # Copy. >> + jo .Lcopy >> + >> +.Lnostackparm: >> + # Third parameter is address of function meat - address of parameter >> + # block. >> + ag %r10, 0x10(%r10) >> + >> + # Leave vararg pointer in %r1, in case function uses it >> + la %r1, 0xa0(%r11) >> + >> + # OK, no stack allocation needed. We still follow the protocol and >> + # call our caller - it doesn't cost much and makes sure vararg works. >> + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. >> + basr %r14, %r10 # Call our caller. >> + >> + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. 
>> + .cfi_remember_state >> + .cfi_restore %r15 >> + .cfi_restore %r14 >> + .cfi_restore %r13 >> + .cfi_restore %r12 >> + .cfi_restore %r11 >> + .cfi_restore %r10 >> + .cfi_restore %r9 >> + .cfi_restore %r8 >> + .cfi_restore %r7 >> + .cfi_restore %r6 >> + .cfi_def_cfa_register %r15 >> + br %r14 # Return to caller's caller. >> + >> +# This is the cleanup code called by the stack unwinder when unwinding >> +# through the code between .LEHB0 and .LEHE0 above. >> + >> +.L1: >> + .cfi_restore_state >> + lgr %r2, %r11 # Stack pointer after resume. >> + brasl %r14, __generic_findstack >> + lgr %r3, %r11 # Get the stack pointer. >> + sgr %r3, %r2 # Subtract available space. >> + aghi %r3, BACKOFF # Back off a bit. >> + ear %r1, %a0 >> + sllg %r1, %r1, 32 >> + ear %r1, %a1 # Extract thread pointer. >> + stg %r3, 0x38(%r1) # Save the new stack boundary. >> + >> + lgr %r2, %r6 # Exception header. >> +#ifdef __PIC__ >> + brasl %r14, _Unwind_Resume@PLT >> +#else >> + brasl %r14, _Unwind_Resume >> +#endif >> + >> +#endif /* defined(__s390x__) */ >> + >> + .cfi_endproc >> + .size __morestack, . - __morestack >> + >> + >> +# The exception table. This tells the personality routine to execute >> +# the exception handler. >> + >> + .section .gcc_except_table,"a",@progbits >> + .align 4 >> +.LLSDA1: >> + .byte 0xff # @LPStart format (omit) >> + .byte 0xff # @TType format (omit) >> + .byte 0x1 # call-site format (uleb128) >> + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length >> +.LLSDACSB1: >> + .uleb128 .LEHB0-.LFB1 # region 0 start >> + .uleb128 .LEHE0-.LEHB0 # length >> + .uleb128 .L1-.LFB1 # landing pad >> + .uleb128 0 # action >> +.LLSDACSE1: >> + >> + >> + .global __gcc_personality_v0 >> +#ifdef __PIC__ >> + # Build a position independent reference to the basic >> + # personality function. 
>> + .hidden DW.ref.__gcc_personality_v0 >> + .weak DW.ref.__gcc_personality_v0 >> + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat >> + .type DW.ref.__gcc_personality_v0, @object >> +DW.ref.__gcc_personality_v0: >> +#ifndef __LP64__ >> + .align 4 >> + .size DW.ref.__gcc_personality_v0, 4 >> + .long __gcc_personality_v0 >> +#else >> + .align 8 >> + .size DW.ref.__gcc_personality_v0, 8 >> + .quad __gcc_personality_v0 >> +#endif >> +#endif >> + >> + >> + >> +# Initialize the stack test value when the program starts or when a >> +# new thread starts. We don't know how large the main stack is, so we >> +# guess conservatively. We might be able to use getrlimit here. >> + >> + .text >> + .global __stack_split_initialize >> + .hidden __stack_split_initialize >> + >> + .type __stack_split_initialize, @function >> + >> +__stack_split_initialize: >> + >> +#ifndef __s390x__ >> + >> + ear %r1, %a0 >> + lr %r0, %r15 >> + ahi %r0, -0x4000 # We should have at least 16K. >> + st %r0, 0x20(%r1) >> + >> + lr %r2, %r15 >> + lhi %r3, 0x4000 >> +#ifdef __PIC__ >> + jg __generic_morestack_set_initial_sp@PLT # Tail call >> +#else >> + jg __generic_morestack_set_initial_sp # Tail call >> +#endif >> + >> +#else /* defined(__s390x__) */ >> + >> + ear %r1, %a0 >> + sllg %r1, %r1, 32 >> + ear %r1, %a1 >> + lgr %r0, %r15 >> + aghi %r0, -0x4000 # We should have at least 16K. >> + stg %r0, 0x38(%r1) >> + >> + lgr %r2, %r15 >> + lghi %r3, 0x4000 >> +#ifdef __PIC__ >> + jg __generic_morestack_set_initial_sp@PLT # Tail call >> +#else >> + jg __generic_morestack_set_initial_sp # Tail call >> +#endif >> + >> +#endif /* defined(__s390x__) */ >> + >> + .size __stack_split_initialize, . - __stack_split_initialize >> + >> +# Routines to get and set the guard, for __splitstack_getcontext, >> +# __splitstack_setcontext, and __splitstack_makecontext. >> + >> +# void *__morestack_get_guard (void) returns the current stack guard. 
>> + .text >> + .global __morestack_get_guard >> + .hidden __morestack_get_guard >> + >> + .type __morestack_get_guard,@function >> + >> +__morestack_get_guard: >> + >> +#ifndef __s390x__ >> + ear %r1, %a0 >> + l %r2, 0x20(%r1) >> +#else >> + ear %r1, %a0 >> + sllg %r1, %r1, 32 >> + ear %r1, %a1 >> + lg %r2, 0x38(%r1) >> +#endif >> + br %r14 >> + >> + .size __morestack_get_guard, . - __morestack_get_guard >> + >> +# void __morestack_set_guard (void *) sets the stack guard. >> + .global __morestack_set_guard >> + .hidden __morestack_set_guard >> + >> + .type __morestack_set_guard,@function >> + >> +__morestack_set_guard: >> + >> +#ifndef __s390x__ >> + ear %r1, %a0 >> + st %r2, 0x20(%r1) >> +#else >> + ear %r1, %a0 >> + sllg %r1, %r1, 32 >> + ear %r1, %a1 >> + stg %r2, 0x38(%r1) >> +#endif >> + br %r14 >> + >> + .size __morestack_set_guard, . - __morestack_set_guard >> + >> +# void *__morestack_make_guard (void *, size_t) returns the stack >> +# guard value for a stack. >> + .global __morestack_make_guard >> + .hidden __morestack_make_guard >> + >> + .type __morestack_make_guard,@function >> + >> +__morestack_make_guard: >> + >> +#ifndef __s390x__ >> + sr %r2, %r3 >> + ahi %r2, BACKOFF >> +#else >> + sgr %r2, %r3 >> + aghi %r2, BACKOFF >> +#endif >> + br %r14 >> + >> + .size __morestack_make_guard, . - __morestack_make_guard >> + >> +# Make __stack_split_initialize a high priority constructor. 
>> + >> + .section .ctors.65535,"aw",@progbits >> + >> +#ifndef __LP64__ >> + .align 4 >> + .long __stack_split_initialize >> + .long __morestack_load_mmap >> +#else >> + .align 8 >> + .quad __stack_split_initialize >> + .quad __morestack_load_mmap >> +#endif >> + >> + .section .note.GNU-stack,"",@progbits >> + .section .note.GNU-split-stack,"",@progbits >> + .section .note.GNU-no-split-stack,"",@progbits >> diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 >> new file mode 100644 >> index 0000000..4c959b0 >> --- /dev/null >> +++ b/libgcc/config/s390/t-stack-s390 >> @@ -0,0 +1,2 @@ >> +# Makefile fragment to support -fsplit-stack for s390. >> +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S >> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c >> index 89765d4..b8eec4e 100644 >> --- a/libgcc/generic-morestack.c >> +++ b/libgcc/generic-morestack.c >> @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, >> #elif defined (__i386__) >> nsp -= 6 * sizeof (void *); >> #elif defined __powerpc64__ >> +#elif defined __s390x__ >> + nsp -= 2 * 160; >> +#elif defined __s390__ >> + nsp -= 2 * 96; >> #else >> #error "unrecognized target" >> #endif >> > ^ permalink raw reply [flat|nested] 55+ messages in thread
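The quoted __morestack code above makes one decision (allocate a new segment or not) and computes one address (where to resume the function body). A rough C model of that logic may help when reading the assembly. This is only a sketch under stated assumptions: the struct and function names (ss_param_block, ss_needs_alloc, ss_entry_address, ss_make_guard) are hypothetical and do not appear in the patch. The layout follows the 64-bit loads at 0(%r10), 8(%r10) and 0x10(%r10), and addresses are modelled as plain integers rather than real pointers.

```c
#include <stdbool.h>
#include <stdint.h>

/* BACKOFF value taken from morestack.S in the quoted patch. */
#define SS_BACKOFF 0x1000u

/* Hypothetical model of the 64-bit parameter block whose address the
   split-stack prologue passes in %r1.  Field names are illustrative. */
struct ss_param_block {
    uint64_t frame_size;   /* loaded with lg 0(%r10) */
    uint64_t args_size;    /* loaded with lg 8(%r10) */
    int64_t  entry_offset; /* added to the block address, ag 0x10(%r10) */
};

/* Models the clgr/jle pair: .Lnoalloc is taken when
   guard + frame_size + args_size <= sp (unsigned compare),
   so a new segment is needed in the opposite case. */
bool ss_needs_alloc(uint64_t guard, uint64_t sp,
                    const struct ss_param_block *pb)
{
    return guard + pb->frame_size + pb->args_size > sp;
}

/* Models "ag %r10, 0x10(%r10)": the address __morestack calls back
   is the parameter block address plus the stored relative offset. */
uint64_t ss_entry_address(uint64_t block_addr,
                          const struct ss_param_block *pb)
{
    return block_addr + (uint64_t)pb->entry_offset;
}

/* Models __morestack_make_guard: first argument minus second
   argument, plus BACKOFF. */
uint64_t ss_make_guard(uint64_t stack, uint64_t size)
{
    return stack - size + SS_BACKOFF;
}
```

The model leaves out signal blocking, register saving and the stack-parameter copy; it only captures the arithmetic that decides between the allocating path and .Lnoalloc.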
* Re: [PATCH] s390: Add -fsplit-stack support
From: Andreas Krebbel @ 2016-01-29 16:17 UTC
To: Marcin Kościelnicki; +Cc: gcc-patches

On 01/29/2016 04:43 PM, Marcin Kościelnicki wrote:
> The testsuite with -fsplit-stack already hits all of them, and checking
> them manually is rather tricky (I don't know if it could be done in
> target-independent way at all), but I think it'd be reasonable to make
> assembly testcases calling __morestack for the last two cases, to check
> if the registers are being preserved, etc.

Sounds good. Thanks!

...

>>> + if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu))
>> The agfi immediate value is a signed 32 bit integer. So you can only
>> add up to 2G-1. I think it would be more readable to write this as:
>
> We're emitting ALGFI here, which accepts unsigned 32-bit integer.

Ah right. Then it would be:
if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))
instead.

>>
>> if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Os (frame_size))
>>
>> as in s390_emit_prologue. The Os check will check for TARGET_EXTIMM as well.
>
> Alright.

...

>> I'm wondering if it is really necessary to expand the call in that
>> two-step approach?! We do the general literal pool handling in
>> s390_reorg because we need all the insn lengths to be finalized before
>> performing the branch/pool splitting loop. But this shouldn't be necessary
>> in this case. Would it be possible to expand the call already in
>> emit_prologue phase and get rid of the s390_reorg part?
>
> There's an internal literal pool involved, which needs to be emitted as
> one chunk. Optimizations are also very likely to destroy the sequence:
> consider the target address that __morestack will call - the control
> flow change happens in __morestack jump instruction, but the address
> itself is encoded in one of the pool literals. Just not worth the risk.

Ok.

...

>>> + # OK, no stack allocation needed. We still follow the protocol and
>>> + # call our caller - it doesn't cost much and makes sure vararg works.
>>> + # No need to set any registers here - %r0 and %r2-%r6 weren't modified.
>>> + basr %r14, %r10 # Call our caller.
>> The comment confuses me. It somewhat sounds to me like the call
>> wouldn't be really needed but in fact it cannot even remotely work
>> without jumping back to the function body right?!
>
> Certainly. __morestack's task is to call the given function entry point
> once the necessary stack space is established. In fact, in the no
> allocation case, a sibling-call would actually be possible, if it
> weren't for one annoying detail: there are no free GPRs we could use to
> keep the address of the entry point - %r0 may be used to keep static
> chain, %r1 may have to be the argument pointer, %r2-%r5 may be used to
> keep parameters, and %r6-%r15 are callee-saved.

Ok. The comment isn't about no-call vs. call it is about sibcall vs. call - got it.

Bye,

-Andreas-
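The exchange above about which immediates the stack check can use comes down to three value ranges. The following C sketch spells them out, assuming (as the thread states) that the K check is a signed 16-bit range, that Os is the signed 32-bit range tied to the extended-immediate facility, and that Op is the unsigned 32-bit range accepted by ALGFI. The helper names (fits_K, fits_Os, fits_Op, frame_size_checkable) are hypothetical stand-ins, not the actual GCC CONST_OK_FOR_* macros, and the extimm flag stands in for TARGET_EXTIMM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed range of the K constraint: signed 16-bit immediate. */
bool fits_K(int64_t v)
{
    return v >= -32768 && v <= 32767;
}

/* Assumed range of Os: signed 32-bit immediate (up to 2G-1),
   only usable with the extended-immediate facility. */
bool fits_Os(int64_t v, bool extimm)
{
    return extimm && v >= INT32_MIN && v <= INT32_MAX;
}

/* Assumed range of Op: non-negative 32-bit immediate (up to 4G-1),
   only usable with the extended-immediate facility. */
bool fits_Op(int64_t v, bool extimm)
{
    return extimm && v >= 0 && v <= (int64_t)UINT32_MAX;
}

/* Mirrors the condition suggested in the review:
   CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size). */
bool frame_size_checkable(int64_t frame_size, bool extimm)
{
    return fits_K(frame_size) || fits_Op(frame_size, extimm);
}
```

Under these assumptions a frame of exactly 2G bytes passes the Op test but not the Os test, which is the difference between the two variants discussed in the review.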
* [PATCH] s390: Add -fsplit-stack support
From: Marcin Kościelnicki @ 2016-02-02 14:52 UTC
To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_expand_split_stack_call): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_reorg): Lower the split-stack pseudo-insns.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_call): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call): New expand.
	(split_stack_cond_call_*): New insn.
	(split_stack_space_check): New expand.
	(split_stack_sibcall): New expand.
	(split_stack_sibcall_*): New insn.
	(split_stack_cond_sibcall): New expand.
	(split_stack_cond_sibcall_*): New insn.
	(split_stack_marker): New insn.
---

I've implemented most of your requested changes, with two exceptions:

- I don't use virtual_incoming_args_rtx in s390_expand_split_stack_prologue,
  since this causes constraint error - I suppose it just cannot be used
  after reload.
- It seems to me there's no problem with TPF and r1 - the conditional
  you mention is meant to avoid modifying r14 (which we do - by aiming
  at r1 and r12 for arg pointer and temp, respectively), not to ensure use
  of r1 as the temporary. Unless there's a good reason to avoid modifying
  r12, the code seems fine to me.

As for the testcase we discussed, I'll submit it as a separate patch.

 gcc/ChangeLog                        |  37 +++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 321 +++++++++++++++++-
 gcc/config/s390/s390.md              | 177 ++++++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1170 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a2cec8..af86079 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,40 @@
+2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+ (s390_expand_split_stack_prologue): New function. + (s390_expand_split_stack_call): New function. + (s390_live_on_entry): New function. + (s390_va_start): Use split-stack vararg pointer if appropriate. + (s390_reorg): Lower the split-stack pseudo-insns. + (s390_asm_file_end): Emit the split-stack note sections. + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. + * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. + (UNSPECV_SPLIT_STACK_CALL): New unspec. + (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. + (UNSPECV_SPLIT_STACK_MARKER): New unspec. + (split_stack_prologue): New expand. + (split_stack_call): New expand. + (split_stack_call_*): New insn. + (split_stack_cond_call): New expand. + (split_stack_cond_call_*): New insn. + (split_stack_space_check): New expand. + (split_stack_sibcall): New expand. + (split_stack_sibcall_*): New insn. + (split_stack_cond_sibcall): New expand. + (split_stack_cond_sibcall_*): New insn. + (split_stack_marker): New insn. + 2016-02-02 Thomas Schwinge <thomas@codesourcery.com> * omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove. diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 4519c21..1e497e6 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, } } +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. + We don't verify it, since earlier versions just have padding at + its place, which works just as well. 
*/ + +static bool +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, + struct gcc_options *opts ATTRIBUTE_UNUSED) +{ + return true; +} + #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, #undef TARGET_OPTION_INIT_STRUCT #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct +#undef TARGET_SUPPORTS_SPLIT_STACK +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack + struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 633bc1e..09032c9 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); extern void s390_emit_prologue (void); extern void s390_emit_epilogue (bool); +extern void s390_expand_split_stack_prologue (void); extern bool s390_can_use_simple_return_insn (void); extern bool s390_can_use_return_insn (void); extern void s390_function_profiler (FILE *, int); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 3be64de..59628ba 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -426,6 +426,13 @@ struct GTY(()) machine_function /* True if the current function may contain a tbegin clobbering FPRs. */ bool tbegin_p; + + /* For -fsplit-stack support: A stack local which holds a pointer to + the stack arguments for a function with a variable number of + arguments. This is set at the start of the function and is used + to initialize the overflow_arg_area field of the va_list + structure. */ + rtx split_stack_varargs_pointer; }; /* Few accessor macros for struct cfun->machine->s390_frame_layout. 
*/ @@ -9316,9 +9323,13 @@ s390_register_info () cfun_frame_layout.high_fprs++; } - if (flag_pic) - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); + /* Register 12 is used for GOT address, but also as temp in prologue + for split-stack stdarg functions (unless r14 is available). */ + clobbered_regs[12] + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) + || (flag_split_stack && cfun->stdarg + && (crtl->is_leaf || TARGET_TPF_PROFILING + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); clobbered_regs[BASE_REGNUM] |= (cfun->machine->base_reg @@ -10446,6 +10457,8 @@ s390_emit_prologue (void) && !crtl->is_leaf && !TARGET_TPF_PROFILING) temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); + else if (flag_split_stack && cfun->stdarg) + temp_reg = gen_rtx_REG (Pmode, 12); else temp_reg = gen_rtx_REG (Pmode, 1); @@ -10939,6 +10952,234 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); } +/* -fsplit-stack support. */ + +/* A SYMBOL_REF for __morestack. */ +static GTY(()) rtx morestack_ref; + +/* When using -fsplit-stack, the allocation routines set a field in + the TCB to the bottom of the stack plus this much space, measured + in bytes. */ + +#define SPLIT_STACK_AVAILABLE 1024 + +/* Emit -fsplit-stack prologue, which goes before the regular function + prologue. */ + +void +s390_expand_split_stack_prologue (void) +{ + rtx r1, guard, cc; + rtx_insn *insn; + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + /* Pointer size in bytes. */ + /* Frame size and argument size - the two parameters to __morestack. */ + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; + /* Align argument size to 8 bytes - simplifies __morestack code. */ + HOST_WIDE_INT args_size = crtl->args.size >= 0 + ? ((crtl->args.size + 7) & ~7) + : 0; + /* Label to be called by __morestack. 
*/ + rtx_code_label *call_done = NULL; + rtx tmp; + + gcc_assert (flag_split_stack && reload_completed); + if (!TARGET_CPU_ZARCH) + { + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); + return; + } + + r1 = gen_rtx_REG (Pmode, 1); + + /* If no stack frame will be allocated, don't do anything. */ + if (!frame_size) + { + /* But emit a marker that will let linker and indirect function + calls recognise this function as split-stack aware. */ + emit_insn (gen_split_stack_marker ()); + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, just use r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + + } + return; + } + + if (morestack_ref == NULL_RTX) + { + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL + | SYMBOL_FLAG_FUNCTION); + } + + if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size)) + { + /* If frame_size will fit in an add instruction, do a stack space + check, and only call __morestack if there's not enough space. */ + + /* Get thread pointer. r1 is the only register we can always destroy - r0 + could contain a static chain (and cannot be used to address memory + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); + /* Aim at __private_ss. */ + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); + + /* If less that 1kiB used, skip addition and compare directly with + __private_ss. */ + if (frame_size > SPLIT_STACK_AVAILABLE) + { + emit_move_insn (r1, guard); + if (TARGET_64BIT) + emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size))); + else + emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size))); + guard = r1; + } + + /* Compare the (maybe adjusted) guard with the stack pointer. 
*/ + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); + + call_done = gen_label_rtx (); + + tmp = gen_split_stack_cond_call (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size), + cc); + + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + + /* Mark the jump as very unlikely to be taken. */ + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); + + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, and __morestack was not called, just use + r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + } + else + { + call_done = gen_label_rtx (); + + /* Now, we need to call __morestack. It has very special calling + conventions: it preserves param/return/static chain registers for + calling main function body, and looks for its own parameters + at %r1 (after aligning it up to a 4 byte bounduary for 31-bit mode). */ + tmp = gen_split_stack_call (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size)); + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + emit_barrier (); + } + + /* __morestack will call us here. */ + + emit_label (call_done); + LABEL_NUSES (call_done) = 1; +} + +/* Generates split-stack call sequence, along with its parameter block. */ + +static void +s390_expand_split_stack_call (rtx_insn *orig_insn, + rtx call_done, + rtx function, + rtx frame_size, + rtx args_size, + rtx cond) +{ + rtx_insn *insn = orig_insn; + rtx parmbase = gen_label_rtx (); + rtx r1 = gen_rtx_REG (Pmode, 1); + rtx tmp, tmp2; + + /* %r1 = litbase. */ + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* jg<cond> __morestack. 
*/ + if (cond == NULL) + { + tmp = gen_split_stack_sibcall (function, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + else + { + gcc_assert (s390_comparison (cond, VOIDmode)); + tmp = gen_split_stack_cond_sibcall (function, cond, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* Go to .rodata. */ + insn = emit_insn_after (gen_pool_section_start (), insn); + + /* Now, we'll emit parameters to __morestack. First, align to pointer size + (this mirrors the alignment done in __morestack - don't touch it). */ + insn = emit_insn_after (gen_pool_align (GEN_INT (UNITS_PER_LONG)), insn); + + insn = emit_label_after (parmbase, insn); + + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, frame_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Second parameter is size of the arguments passed on stack that + __morestack has to copy to the new stack (does not include varargs). */ + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, args_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Third parameter is offset between start of the parameter block + and function body to be called by __morestack. */ + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); + tmp = gen_rtx_CONST (Pmode, + gen_rtx_MINUS (Pmode, tmp2, tmp)); + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, tmp), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* Return from .rodata. */ + insn = emit_insn_after (gen_pool_section_end (), insn); + + delete_insn (orig_insn); +} + +/* We may have to tell the dataflow pass that the split stack prologue + is initializing a register. 
*/ + +static void +s390_live_on_entry (bitmap regs) +{ + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + gcc_assert (flag_split_stack); + bitmap_set_bit (regs, 1); + } +} + /* Return true if the function can use simple_return to return outside of a shrink-wrapped region. At present shrink-wrapping is supported in all cases. */ @@ -11541,6 +11782,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } + if (flag_split_stack + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) + == NULL) + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) + { + rtx reg; + rtx_insn *seq; + + reg = gen_reg_rtx (Pmode); + cfun->machine->split_stack_varargs_pointer = reg; + + start_sequence (); + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); + seq = get_insns (); + end_sequence (); + + push_topmost_sequence (); + emit_insn_after (seq, entry_of_function ()); + pop_topmost_sequence (); + } + /* Find the overflow area. FIXME: This currently is too pessimistic when the vector ABI is enabled. In that case we *always* set up the overflow area @@ -11549,7 +11811,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG || TARGET_VX_ABI) { - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) + t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + else + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); off = INTVAL (crtl->args.arg_offset_rtx); off = off < 0 ? 0 : off; @@ -13158,6 +13423,48 @@ s390_reorg (void) } } + if (flag_split_stack) + { + rtx_insn *insn; + + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) + { + /* Look for the split-stack fake jump instructions. 
*/ + if (!JUMP_P(insn)) + continue; + if (GET_CODE (PATTERN (insn)) != PARALLEL + || XVECLEN (PATTERN (insn), 0) != 2) + continue; + rtx set = XVECEXP (PATTERN (insn), 0, 1); + if (GET_CODE (set) != SET) + continue; + rtx unspec = XEXP(set, 1); + if (GET_CODE (unspec) != UNSPEC_VOLATILE) + continue; + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL) + continue; + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); + rtx function = XVECEXP (unspec, 0, 0); + rtx frame_size = XVECEXP (unspec, 0, 1); + rtx args_size = XVECEXP (unspec, 0, 2); + rtx pc_src = XEXP (set_pc, 1); + rtx call_done, cond = NULL_RTX; + if (GET_CODE (pc_src) == IF_THEN_ELSE) + { + cond = XEXP (pc_src, 0); + call_done = XEXP (XEXP (pc_src, 1), 0); + } + else + call_done = XEXP (pc_src, 0); + s390_expand_split_stack_call (insn, + call_done, + function, + frame_size, + args_size, + cond); + } + } + /* Try to optimize prologue and epilogue further. */ s390_optimize_prologue (); @@ -14469,6 +14776,9 @@ s390_asm_file_end (void) s390_vector_abi); #endif file_end_indicate_exec_stack (); + + if (flag_split_stack) + file_end_indicate_split_stack (); } /* Return true if TYPE is a vector bool type. 
*/ @@ -14724,6 +15034,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry + #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ s390_use_by_pieces_infrastructure_p diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 9b869d5..771f1cc 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -114,6 +114,9 @@ UNSPEC_SP_SET UNSPEC_SP_TEST + ; Split stack support + UNSPEC_STACK_CHECK + ; Test Data Class (TDC) UNSPEC_TDC_INSN @@ -276,6 +279,11 @@ ; Set and get floating point control register UNSPECV_SFPC UNSPECV_EFPC + + ; Split stack support + UNSPECV_SPLIT_STACK_CALL + UNSPECV_SPLIT_STACK_SIBCALL + UNSPECV_SPLIT_STACK_MARKER ]) ;; @@ -10907,3 +10915,172 @@ "TARGET_Z13" "lcbb\t%0,%1,%b2" [(set_attr "op_type" "VRX")]) + +; Handle -fsplit-stack. 
+ +(define_expand "split_stack_prologue" + [(const_int 0)] + "" +{ + s390_expand_split_stack_prologue (); + DONE; +}) + +(define_expand "split_stack_call" + [(match_operand 0 "" "") + (match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_call_di (operands[0], + operands[1], + operands[2], + operands[3])); + else + emit_jump_insn (gen_split_stack_call_si (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; +}) + +(define_insn "split_stack_call_<mode>" + [(set (pc) (label_ref (match_operand 0 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +(define_expand "split_stack_cond_call" + [(match_operand 0 "" "") + (match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X") + (match_operand 4 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_cond_call_di (operands[0], + operands[1], + operands[2], + operands[3], + operands[4])); + else + emit_jump_insn (gen_split_stack_cond_call_si (operands[0], + operands[1], + operands[2], + operands[3], + operands[4])); + DONE; +}) + +(define_insn "split_stack_cond_call_<mode>" + [(set (pc) + (if_then_else + (match_operand 4 "" "") + (label_ref (match_operand 0 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +;; If there are operand 0 bytes available on the stack, jump to +;; operand 1. 
+ +(define_expand "split_stack_space_check" + [(set (pc) (if_then_else + (ltu (minus (reg 15) + (match_operand 0 "register_operand")) + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) + (label_ref (match_operand 1)) + (pc)))] + "" +{ + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + rtx tp = s390_get_thread_pointer (); + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); + rtx reg = gen_reg_rtx (Pmode); + rtx cc; + if (TARGET_64BIT) + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); + else + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); + cc = s390_emit_compare (GT, reg, guard); + s390_emit_jump (operands[1], cc); + + DONE; +}) + +;; A jg with minimal fuss for use in split stack prologue. + +(define_expand "split_stack_sibcall" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_sibcall_di (operands[0], operands[1])); + else + emit_jump_insn (gen_split_stack_sibcall_si (operands[0], operands[1])); + DONE; +}) + +(define_insn "split_stack_sibcall_<mode>" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; Also a conditional one. 
+ +(define_expand "split_stack_cond_sibcall" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "") + (match_operand 2 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_cond_sibcall_di (operands[0], operands[1], operands[2])); + else + emit_jump_insn (gen_split_stack_cond_sibcall_si (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "split_stack_cond_sibcall_<mode>" + [(set (pc) + (if_then_else + (match_operand 1 "" "") + (label_ref (match_operand 2 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg%C1\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; An unusual nop instruction used to mark functions with no stack frames +;; as split-stack aware. + +(define_insn "split_stack_marker" + [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)] + "" + "nopr\t%%r15" + [(set_attr "op_type" "RR")]) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 49c7929..3900ab1 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2016-02-02 Marcin Kościelnicki <koriakin@0x04.net> + + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. + * config/s390/morestack.S: New file. + * config/s390/t-stack-s390: New file. + * generic-morestack.c (__splitstack_find): Add s390-specific code. 
+ 2016-01-25 Jakub Jelinek <jakub@redhat.com> PR target/69444 diff --git a/libgcc/config.host b/libgcc/config.host index d8efd82..2be5f7e 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1114,11 +1114,11 @@ rx-*-elf) tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" ;; s390-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" md_unwind_header=s390/linux-unwind.h ;; s390x-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" if test "${host_address}" = 32; then tmake_file="${tmake_file} s390/32/t-floattodi" fi diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S new file mode 100644 index 0000000..141dead --- /dev/null +++ b/libgcc/config/s390/morestack.S @@ -0,0 +1,609 @@ +# s390 support for -fsplit-stack. +# Copyright (C) 2015 Free Software Foundation, Inc. +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version. + +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. + +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation. + +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see +# <http://www.gnu.org/licenses/>. + +# Excess space needed to call ld.so resolver for lazy plt +# resolution. Go uses sigaltstack so this doesn't need to +# also cover signal frame size. +#define BACKOFF 0x1000 + +# The __morestack function. + + .global __morestack + .hidden __morestack + + .type __morestack,@function + +__morestack: +.LFB1: + .cfi_startproc + + +#ifndef __s390x__ + + +# The 31-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0,__gcc_personality_v0 + .cfi_lsda 0,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x48 + .cfi_offset %r7, -0x44 + .cfi_offset %r8, -0x40 + .cfi_offset %r9, -0x3c + .cfi_offset %r10, -0x38 + .cfi_offset %r11, -0x34 + .cfi_offset %r12, -0x30 + .cfi_offset %r13, -0x2c + .cfi_offset %r14, -0x28 + .cfi_offset %r15, -0x24 + lr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + ahi %r15, -0x60 # 0x60 for standard frame. + st %r11, 0(%r15) # Save back chain. + lr %r8, %r0 # Save %r0 (static chain). + lr %r10, %r1 # Save %r1 (address of parameter block). + + l %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 # Extract thread pointer. + l %r1, 0x20(%r1) # Get stack bounduary + ar %r1, %r7 # Stack bounduary + frame size + a %r1, 4(%r10) # + stack param size + clr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + ahi %r7, BACKOFF # Bump requested size a bit. + st %r7, 0x40(%r11) # Stuff frame size on stack. + la %r2, 0x40(%r11) # Pass its address as parameter. 
+ la %r3, 0x60(%r11) # Caller's stack parameters. + l %r4, 4(%r10) # Size of stack parameters. + brasl %r14, __generic_morestack + + lr %r15, %r2 # Switch to the new stack. + ahi %r15, -0x60 # Make a stack frame on it. + st %r11, 0(%r15) # Save back chain. + + s %r2, 0x40(%r11) # The end of stack space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHB0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lr %r0, %r8 # Static chain. + lm %r2, %r6, 0x8(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stm %r2, %r3, 0x8(%r11) # Save return registers. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0x60 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x40(%r11) + brasl %r14, __generic_releasestack + + s %r2, 0x40(%r11) # Subtract available space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHE0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0x60 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. 
+ lr %r15, %r11 + ahi %r15, -0x60 + + brasl %r14, __morestack_unblock_signals + + lm %r2, %r15, 0x8(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + l %r9, 0x4(%r10) # Load stack parameter size. + ltr %r9, %r9 # And check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sr %r15, %r9 # Make space on the stack. + la %r8, 0x60(%r15) # Destination. + la %r12, 0x60(%r11) # Source. + lr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lr %r3, %r11 # Get the stack pointer. + sr %r3, %r2 # Subtract available space. 
+ ahi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. + st %r3, 0x20(%r1) # Save the new stack boundary. + + lr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#else /* defined(__s390x__) */ + + +# The 64-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0x3,__gcc_personality_v0 + .cfi_lsda 0x3,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x70 + .cfi_offset %r7, -0x68 + .cfi_offset %r8, -0x60 + .cfi_offset %r9, -0x58 + .cfi_offset %r10, -0x50 + .cfi_offset %r11, -0x48 + .cfi_offset %r12, -0x40 + .cfi_offset %r13, -0x38 + .cfi_offset %r14, -0x30 + .cfi_offset %r15, -0x28 + lgr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + aghi %r15, -0xa0 # 0xa0 for standard frame. + stg %r11, 0(%r15) # Save back chain. + lgr %r8, %r0 # Save %r0 (static chain). + lgr %r10, %r1 # Save %r1 (address of parameter block). + + lg %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + lg %r1, 0x38(%r1) # Get stack bounduary + agr %r1, %r7 # Stack bounduary + frame size + ag %r1, 8(%r10) # + stack param size + clgr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + aghi %r7, BACKOFF # Bump requested size a bit. + stg %r7, 0x80(%r11) # Stuff frame size on stack. + la %r2, 0x80(%r11) # Pass its address as parameter. + la %r3, 0xa0(%r11) # Caller's stack parameters. + lg %r4, 8(%r10) # Size of stack parameters. 
+ brasl %r14, __generic_morestack + + lgr %r15, %r2 # Switch to the new stack. + aghi %r15, -0xa0 # Make a stack frame on it. + stg %r11, 0(%r15) # Save back chain. + + sg %r2, 0x80(%r11) # The end of stack space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHB0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lgr %r0, %r8 # Static chain. + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stg %r2, 0x10(%r11) # Save return register. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0xa0 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x80(%r11) + brasl %r14, __generic_releasestack + + sg %r2, 0x80(%r11) # Subtract available space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHE0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0xa0 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. 
+ lgr %r15, %r11 + aghi %r15, -0xa0 + + brasl %r14, __morestack_unblock_signals + + lmg %r2, %r15, 0x10(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + lg %r9, 0x8(%r10) # Load stack parameter size. + ltgr %r9, %r9 # Check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sgr %r15, %r9 # Make space on the stack. + la %r8, 0xa0(%r15) # Destination. + la %r12, 0xa0(%r11) # Source. + lgr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lgr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lgr %r3, %r11 # Get the stack pointer. 
+ sgr %r3, %r2 # Subtract available space. + aghi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + stg %r3, 0x38(%r1) # Save the new stack boundary. + + lgr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#endif /* defined(__s390x__) */ + + .cfi_endproc + .size __morestack, . - __morestack + + +# The exception table. This tells the personality routine to execute +# the exception handler. + + .section .gcc_except_table,"a",@progbits + .align 4 +.LLSDA1: + .byte 0xff # @LPStart format (omit) + .byte 0xff # @TType format (omit) + .byte 0x1 # call-site format (uleb128) + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length +.LLSDACSB1: + .uleb128 .LEHB0-.LFB1 # region 0 start + .uleb128 .LEHE0-.LEHB0 # length + .uleb128 .L1-.LFB1 # landing pad + .uleb128 0 # action +.LLSDACSE1: + + + .global __gcc_personality_v0 +#ifdef __PIC__ + # Build a position independent reference to the basic + # personality function. + .hidden DW.ref.__gcc_personality_v0 + .weak DW.ref.__gcc_personality_v0 + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat + .type DW.ref.__gcc_personality_v0, @object +DW.ref.__gcc_personality_v0: +#ifndef __LP64__ + .align 4 + .size DW.ref.__gcc_personality_v0, 4 + .long __gcc_personality_v0 +#else + .align 8 + .size DW.ref.__gcc_personality_v0, 8 + .quad __gcc_personality_v0 +#endif +#endif + + + +# Initialize the stack test value when the program starts or when a +# new thread starts. We don't know how large the main stack is, so we +# guess conservatively. We might be able to use getrlimit here. + + .text + .global __stack_split_initialize + .hidden __stack_split_initialize + + .type __stack_split_initialize, @function + +__stack_split_initialize: + +#ifndef __s390x__ + + ear %r1, %a0 + lr %r0, %r15 + ahi %r0, -0x4000 # We should have at least 16K. 
+ st %r0, 0x20(%r1) + + lr %r2, %r15 + lhi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#else /* defined(__s390x__) */ + + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lgr %r0, %r15 + aghi %r0, -0x4000 # We should have at least 16K. + stg %r0, 0x38(%r1) + + lgr %r2, %r15 + lghi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#endif /* defined(__s390x__) */ + + .size __stack_split_initialize, . - __stack_split_initialize + +# Routines to get and set the guard, for __splitstack_getcontext, +# __splitstack_setcontext, and __splitstack_makecontext. + +# void *__morestack_get_guard (void) returns the current stack guard. + .text + .global __morestack_get_guard + .hidden __morestack_get_guard + + .type __morestack_get_guard,@function + +__morestack_get_guard: + +#ifndef __s390x__ + ear %r1, %a0 + l %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_get_guard, . - __morestack_get_guard + +# void __morestack_set_guard (void *) sets the stack guard. + .global __morestack_set_guard + .hidden __morestack_set_guard + + .type __morestack_set_guard,@function + +__morestack_set_guard: + +#ifndef __s390x__ + ear %r1, %a0 + st %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + stg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_set_guard, . - __morestack_set_guard + +# void *__morestack_make_guard (void *, size_t) returns the stack +# guard value for a stack. + .global __morestack_make_guard + .hidden __morestack_make_guard + + .type __morestack_make_guard,@function + +__morestack_make_guard: + +#ifndef __s390x__ + sr %r2, %r3 + ahi %r2, BACKOFF +#else + sgr %r2, %r3 + aghi %r2, BACKOFF +#endif + br %r14 + + .size __morestack_make_guard, . 
- __morestack_make_guard + +# Make __stack_split_initialize a high priority constructor. + + .section .ctors.65535,"aw",@progbits + +#ifndef __LP64__ + .align 4 + .long __stack_split_initialize + .long __morestack_load_mmap +#else + .align 8 + .quad __stack_split_initialize + .quad __morestack_load_mmap +#endif + + .section .note.GNU-stack,"",@progbits + .section .note.GNU-split-stack,"",@progbits + .section .note.GNU-no-split-stack,"",@progbits diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 new file mode 100644 index 0000000..4c959b0 --- /dev/null +++ b/libgcc/config/s390/t-stack-s390 @@ -0,0 +1,2 @@ +# Makefile fragment to support -fsplit-stack for s390. +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c index 89765d4..b8eec4e 100644 --- a/libgcc/generic-morestack.c +++ b/libgcc/generic-morestack.c @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, #elif defined (__i386__) nsp -= 6 * sizeof (void *); #elif defined __powerpc64__ +#elif defined __s390x__ + nsp -= 2 * 160; +#elif defined __s390__ + nsp -= 2 * 96; #else #error "unrecognized target" #endif -- 2.7.0
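For readers following the guard arithmetic in morestack.S, here is a minimal C sketch of what `__morestack_make_guard` and `__stack_split_initialize` appear to compute. It is not part of the patch. The `sketch_*` function names, the `INITIAL_STACK_GUESS` constant name, the use of `uintptr_t`, and the sanity assertion are assumptions made for illustration. Only the arithmetic is taken from the assembly (subtract the size and add `BACKOFF`; assume 16K below the entry stack pointer), with `BACKOFF` taken to be the 0x1000 defined in morestack.S. The sketch also assumes the first argument of `__morestack_make_guard` is the high end of a downward-growing stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value of BACKOFF as defined in morestack.S. */
#define BACKOFF 0x1000u
/* Hypothetical name for the 16K that __stack_split_initialize assumes. */
#define INITIAL_STACK_GUESS 0x4000u

/* Mirrors the arithmetic of __morestack_make_guard:
   first argument minus size, plus BACKOFF.  */
static uintptr_t
sketch_make_guard (uintptr_t stack, size_t size)
{
  /* Sanity check added by this sketch; the assembly does not check.  */
  assert (size <= stack);
  return stack - size + BACKOFF;
}

/* Mirrors the arithmetic of __stack_split_initialize:
   the initial guard is 16K below the stack pointer at startup.  */
static uintptr_t
sketch_initial_guard (uintptr_t sp)
{
  return sp - INITIAL_STACK_GUESS;
}
```

Under these assumptions, a 32K segment whose high end is 0x100000 gets a guard of 0xF9000, which is `BACKOFF` bytes above the lowest address of the segment.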
* Re: [PATCH] s390: Add -fsplit-stack support
From: Andreas Krebbel @ 2016-02-02 15:19 UTC
To: Marcin Kościelnicki; +Cc: gcc-patches

On 02/02/2016 03:52 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
>
>         * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>         * config/s390/morestack.S: New file.
>         * config/s390/t-stack-s390: New file.
>         * generic-morestack.c (__splitstack_find): Add s390-specific code.
>
> gcc/ChangeLog:
>
>         * common/config/s390/s390-common.c (s390_supports_split_stack):
>         New function.
>         (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>         * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>         * config/s390/s390.c (struct machine_function): New field
>         split_stack_varargs_pointer.
>         (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>         in s390_emit_prologue.
>         (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>         vararg pointer.
>         (morestack_ref): New global.
>         (SPLIT_STACK_AVAILABLE): New macro.
>         (s390_expand_split_stack_prologue): New function.
>         (s390_expand_split_stack_call): New function.
>         (s390_live_on_entry): New function.
>         (s390_va_start): Use split-stack vararg pointer if appropriate.
>         (s390_reorg): Lower the split-stack pseudo-insns.
>         (s390_asm_file_end): Emit the split-stack note sections.
>         (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>         * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>         (UNSPECV_SPLIT_STACK_CALL): New unspec.
>         (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
>         (UNSPECV_SPLIT_STACK_MARKER): New unspec.
>         (split_stack_prologue): New expand.
>         (split_stack_call): New expand.
>         (split_stack_call_*): New insn.
>         (split_stack_cond_call): New expand.
>         (split_stack_cond_call_*): New insn.
>         (split_stack_space_check): New expand.
>         (split_stack_sibcall): New expand.
>         (split_stack_sibcall_*): New insn.
>         (split_stack_cond_sibcall): New expand.
>         (split_stack_cond_sibcall_*): New insn.
>         (split_stack_marker): New insn.
> ---
> I've implemented most of your requested changes, with two exceptions:
>
> - I don't use virtual_incoming_args_rtx in s390_expand_split_stack_prologue,
>   since this causes a constraint error - I suppose it just cannot be used
>   after reload.

Right. As an elimination reg it cannot be used in the code path called from
s390_reorg.

> - It seems to me there's no problem with TPF and r1 - the conditional you
>   mention is meant to avoid modifying r14 (which we do - by aiming at r1 and
>   r12 for arg pointer and temp, respectively), not to ensure use of r1 as the
>   temporary. Unless there's a good reason to avoid modifying r12, the code
>   seems fine to me.

OK. In that case the comment above this check no longer seems correct. Could
you please adjust it as well? It should read "avoid register 14" then.

  /* Choose best register to use for temp use within prologue.
     See below for why TPF must use the register 1.  */

  if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
      && !crtl->is_leaf
      && !TARGET_TPF_PROFILING)
    temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
...

-Andreas-

>
> As for the testcase we discussed, I'll submit it as a separate patch.
>
>
>  gcc/ChangeLog                        |  37 +++
>  gcc/common/config/s390/s390-common.c |  14 +
>  gcc/config/s390/s390-protos.h        |   1 +
>  gcc/config/s390/s390.c               | 321 +++++++++++++++++-
>  gcc/config/s390/s390.md              | 177 ++++++++++
>  libgcc/ChangeLog                     |   7 +
>  libgcc/config.host                   |   4 +-
>  libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
>  libgcc/config/s390/t-stack-s390      |   2 +
>  libgcc/generic-morestack.c           |   4 +
>  10 files changed, 1170 insertions(+), 6 deletions(-)
>  create mode 100644 libgcc/config/s390/morestack.S
>  create mode 100644 libgcc/config/s390/t-stack-s390
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9a2cec8..af86079 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,40 @@
> +2016-02-02 Marcin Kościelnicki <koriakin@0x04.net>
> +
> +        * common/config/s390/s390-common.c (s390_supports_split_stack):
> +        New function.
> +        (TARGET_SUPPORTS_SPLIT_STACK): New macro.
> +        * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> +        * config/s390/s390.c (struct machine_function): New field
> +        split_stack_varargs_pointer.
> +        (s390_register_info): Mark r12 as clobbered if it'll be used as temp
> +        in s390_emit_prologue.
> +        (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> +        vararg pointer.
> +        (morestack_ref): New global.
> +        (SPLIT_STACK_AVAILABLE): New macro.
> +        (s390_expand_split_stack_prologue): New function.
> +        (s390_expand_split_stack_call): New function.
> +        (s390_live_on_entry): New function.
> +        (s390_va_start): Use split-stack vararg pointer if appropriate.
> +        (s390_reorg): Lower the split-stack pseudo-insns.
> +        (s390_asm_file_end): Emit the split-stack note sections.
> +        (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> +        * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
> +        (UNSPECV_SPLIT_STACK_CALL): New unspec.
> +        (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> +        (UNSPECV_SPLIT_STACK_MARKER): New unspec.
> +        (split_stack_prologue): New expand.
> +        (split_stack_call): New expand.
> + (split_stack_call_*): New insn. > + (split_stack_cond_call): New expand. > + (split_stack_cond_call_*): New insn. > + (split_stack_space_check): New expand. > + (split_stack_sibcall): New expand. > + (split_stack_sibcall_*): New insn. > + (split_stack_cond_sibcall): New expand. > + (split_stack_cond_sibcall_*): New insn. > + (split_stack_marker): New insn. > + > 2016-02-02 Thomas Schwinge <thomas@codesourcery.com> > > * omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove. > diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c > index 4519c21..1e497e6 100644 > --- a/gcc/common/config/s390/s390-common.c > +++ b/gcc/common/config/s390/s390-common.c > @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > } > } > > +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. > + We don't verify it, since earlier versions just have padding at > + its place, which works just as well. */ > + > +static bool > +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, > + struct gcc_options *opts ATTRIBUTE_UNUSED) > +{ > + return true; > +} > + > #undef TARGET_DEFAULT_TARGET_FLAGS > #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) > > @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > #undef TARGET_OPTION_INIT_STRUCT > #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct > > +#undef TARGET_SUPPORTS_SPLIT_STACK > +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack > + > struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h > index 633bc1e..09032c9 100644 > --- a/gcc/config/s390/s390-protos.h > +++ b/gcc/config/s390/s390-protos.h > @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, > extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); > extern void s390_emit_prologue (void); > extern void 
s390_emit_epilogue (bool); > +extern void s390_expand_split_stack_prologue (void); > extern bool s390_can_use_simple_return_insn (void); > extern bool s390_can_use_return_insn (void); > extern void s390_function_profiler (FILE *, int); > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c > index 3be64de..59628ba 100644 > --- a/gcc/config/s390/s390.c > +++ b/gcc/config/s390/s390.c > @@ -426,6 +426,13 @@ struct GTY(()) machine_function > /* True if the current function may contain a tbegin clobbering > FPRs. */ > bool tbegin_p; > + > + /* For -fsplit-stack support: A stack local which holds a pointer to > + the stack arguments for a function with a variable number of > + arguments. This is set at the start of the function and is used > + to initialize the overflow_arg_area field of the va_list > + structure. */ > + rtx split_stack_varargs_pointer; > }; > > /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ > @@ -9316,9 +9323,13 @@ s390_register_info () > cfun_frame_layout.high_fprs++; > } > > - if (flag_pic) > - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] > - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); > + /* Register 12 is used for GOT address, but also as temp in prologue > + for split-stack stdarg functions (unless r14 is available). 
*/ > + clobbered_regs[12] > + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) > + || (flag_split_stack && cfun->stdarg > + && (crtl->is_leaf || TARGET_TPF_PROFILING > + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); > > clobbered_regs[BASE_REGNUM] > |= (cfun->machine->base_reg > @@ -10446,6 +10457,8 @@ s390_emit_prologue (void) > && !crtl->is_leaf > && !TARGET_TPF_PROFILING) > temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); > + else if (flag_split_stack && cfun->stdarg) > + temp_reg = gen_rtx_REG (Pmode, 12); > else > temp_reg = gen_rtx_REG (Pmode, 1); > > @@ -10939,6 +10952,234 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) > SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); > } > > +/* -fsplit-stack support. */ > + > +/* A SYMBOL_REF for __morestack. */ > +static GTY(()) rtx morestack_ref; > + > +/* When using -fsplit-stack, the allocation routines set a field in > + the TCB to the bottom of the stack plus this much space, measured > + in bytes. */ > + > +#define SPLIT_STACK_AVAILABLE 1024 > + > +/* Emit -fsplit-stack prologue, which goes before the regular function > + prologue. */ > + > +void > +s390_expand_split_stack_prologue (void) > +{ > + rtx r1, guard, cc; > + rtx_insn *insn; > + /* Offset from thread pointer to __private_ss. */ > + int psso = TARGET_64BIT ? 0x38 : 0x20; > + /* Pointer size in bytes. */ > + /* Frame size and argument size - the two parameters to __morestack. */ > + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; > + /* Align argument size to 8 bytes - simplifies __morestack code. */ > + HOST_WIDE_INT args_size = crtl->args.size >= 0 > + ? ((crtl->args.size + 7) & ~7) > + : 0; > + /* Label to be called by __morestack. 
*/ > + rtx_code_label *call_done = NULL; > + rtx tmp; > + > + gcc_assert (flag_split_stack && reload_completed); > + if (!TARGET_CPU_ZARCH) > + { > + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); > + return; > + } > + > + r1 = gen_rtx_REG (Pmode, 1); > + > + /* If no stack frame will be allocated, don't do anything. */ > + if (!frame_size) > + { > + /* But emit a marker that will let linker and indirect function > + calls recognise this function as split-stack aware. */ > + emit_insn (gen_split_stack_marker ()); > + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + /* If va_start is used, just use r15. */ > + emit_move_insn (r1, > + gen_rtx_PLUS (Pmode, stack_pointer_rtx, > + GEN_INT (STACK_POINTER_OFFSET))); > + > + } > + return; > + } > + > + if (morestack_ref == NULL_RTX) > + { > + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); > + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL > + | SYMBOL_FLAG_FUNCTION); > + } > + > + if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size)) > + { > + /* If frame_size will fit in an add instruction, do a stack space > + check, and only call __morestack if there's not enough space. */ > + > + /* Get thread pointer. r1 is the only register we can always destroy - r0 > + could contain a static chain (and cannot be used to address memory > + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ > + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); > + /* Aim at __private_ss. */ > + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); > + > + /* If less than 1kiB used, skip addition and compare directly with > + __private_ss. 
*/ > + if (frame_size > SPLIT_STACK_AVAILABLE) > + { > + emit_move_insn (r1, guard); > + if (TARGET_64BIT) > + emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size))); > + else > + emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size))); > + guard = r1; > + } > + > + /* Compare the (maybe adjusted) guard with the stack pointer. */ > + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); > + > + call_done = gen_label_rtx (); > + > + tmp = gen_split_stack_cond_call (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size), > + cc); > + > + insn = emit_jump_insn (tmp); > + JUMP_LABEL (insn) = call_done; > + > + /* Mark the jump as very unlikely to be taken. */ > + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); > + > + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + /* If va_start is used, and __morestack was not called, just use > + r15. */ > + emit_move_insn (r1, > + gen_rtx_PLUS (Pmode, stack_pointer_rtx, > + GEN_INT (STACK_POINTER_OFFSET))); > + } > + } > + else > + { > + call_done = gen_label_rtx (); > + > + /* Now, we need to call __morestack. It has very special calling > + conventions: it preserves param/return/static chain registers for > + calling main function body, and looks for its own parameters > + at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode). */ > + tmp = gen_split_stack_call (call_done, > + morestack_ref, > + GEN_INT (frame_size), > + GEN_INT (args_size)); > + insn = emit_jump_insn (tmp); > + JUMP_LABEL (insn) = call_done; > + emit_barrier (); > + } > + > + /* __morestack will call us here. */ > + > + emit_label (call_done); > + LABEL_NUSES (call_done) = 1; > +} > + > +/* Generates split-stack call sequence, along with its parameter block. 
*/ > + > +static void > +s390_expand_split_stack_call (rtx_insn *orig_insn, > + rtx call_done, > + rtx function, > + rtx frame_size, > + rtx args_size, > + rtx cond) > +{ > + rtx_insn *insn = orig_insn; > + rtx parmbase = gen_label_rtx (); > + rtx r1 = gen_rtx_REG (Pmode, 1); > + rtx tmp, tmp2; > + > + /* %r1 = litbase. */ > + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); > + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); > + LABEL_NUSES (parmbase)++; > + > + /* jg<cond> __morestack. */ > + if (cond == NULL) > + { > + tmp = gen_split_stack_sibcall (function, call_done); > + insn = emit_jump_insn_after (tmp, insn); > + } > + else > + { > + gcc_assert (s390_comparison (cond, VOIDmode)); > + tmp = gen_split_stack_cond_sibcall (function, cond, call_done); > + insn = emit_jump_insn_after (tmp, insn); > + } > + JUMP_LABEL (insn) = call_done; > + LABEL_NUSES (call_done)++; > + > + /* Go to .rodata. */ > + insn = emit_insn_after (gen_pool_section_start (), insn); > + > + /* Now, we'll emit parameters to __morestack. First, align to pointer size > + (this mirrors the alignment done in __morestack - don't touch it). */ > + insn = emit_insn_after (gen_pool_align (GEN_INT (UNITS_PER_LONG)), insn); > + > + insn = emit_label_after (parmbase, insn); > + > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, frame_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Second parameter is size of the arguments passed on stack that > + __morestack has to copy to the new stack (does not include varargs). */ > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, args_size), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + > + /* Third parameter is offset between start of the parameter block > + and function body to be called by __morestack. 
*/ > + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); > + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); > + tmp = gen_rtx_CONST (Pmode, > + gen_rtx_MINUS (Pmode, tmp2, tmp)); > + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, > + gen_rtvec (1, tmp), > + UNSPECV_POOL_ENTRY); > + insn = emit_insn_after (tmp, insn); > + add_reg_note (insn, REG_LABEL_OPERAND, call_done); > + LABEL_NUSES (call_done)++; > + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); > + LABEL_NUSES (parmbase)++; > + > + /* Return from .rodata. */ > + insn = emit_insn_after (gen_pool_section_end (), insn); > + > + delete_insn (orig_insn); > +} > + > +/* We may have to tell the dataflow pass that the split stack prologue > + is initializing a register. */ > + > +static void > +s390_live_on_entry (bitmap regs) > +{ > + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) > + { > + gcc_assert (flag_split_stack); > + bitmap_set_bit (regs, 1); > + } > +} > + > /* Return true if the function can use simple_return to return outside > of a shrink-wrapped region. At present shrink-wrapping is supported > in all cases. */ > @@ -11541,6 +11782,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) > expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); > } > > + if (flag_split_stack > + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) > + == NULL) > + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) > + { > + rtx reg; > + rtx_insn *seq; > + > + reg = gen_reg_rtx (Pmode); > + cfun->machine->split_stack_varargs_pointer = reg; > + > + start_sequence (); > + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); > + seq = get_insns (); > + end_sequence (); > + > + push_topmost_sequence (); > + emit_insn_after (seq, entry_of_function ()); > + pop_topmost_sequence (); > + } > + > /* Find the overflow area. > FIXME: This currently is too pessimistic when the vector ABI is > enabled. 
In that case we *always* set up the overflow area > @@ -11549,7 +11811,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) > || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG > || TARGET_VX_ABI) > { > - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); > + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) > + t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); > + else > + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); > > off = INTVAL (crtl->args.arg_offset_rtx); > off = off < 0 ? 0 : off; > @@ -13158,6 +13423,48 @@ s390_reorg (void) > } > } > > + if (flag_split_stack) > + { > + rtx_insn *insn; > + > + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) > + { > + /* Look for the split-stack fake jump instructions. */ > + if (!JUMP_P(insn)) > + continue; > + if (GET_CODE (PATTERN (insn)) != PARALLEL > + || XVECLEN (PATTERN (insn), 0) != 2) > + continue; > + rtx set = XVECEXP (PATTERN (insn), 0, 1); > + if (GET_CODE (set) != SET) > + continue; > + rtx unspec = XEXP(set, 1); > + if (GET_CODE (unspec) != UNSPEC_VOLATILE) > + continue; > + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL) > + continue; > + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); > + rtx function = XVECEXP (unspec, 0, 0); > + rtx frame_size = XVECEXP (unspec, 0, 1); > + rtx args_size = XVECEXP (unspec, 0, 2); > + rtx pc_src = XEXP (set_pc, 1); > + rtx call_done, cond = NULL_RTX; > + if (GET_CODE (pc_src) == IF_THEN_ELSE) > + { > + cond = XEXP (pc_src, 0); > + call_done = XEXP (XEXP (pc_src, 1), 0); > + } > + else > + call_done = XEXP (pc_src, 0); > + s390_expand_split_stack_call (insn, > + call_done, > + function, > + frame_size, > + args_size, > + cond); > + } > + } > + > /* Try to optimize prologue and epilogue further. 
*/ > s390_optimize_prologue (); > > @@ -14469,6 +14776,9 @@ s390_asm_file_end (void) > s390_vector_abi); > #endif > file_end_indicate_exec_stack (); > + > + if (flag_split_stack) > + file_end_indicate_split_stack (); > } > > /* Return true if TYPE is a vector bool type. */ > @@ -14724,6 +15034,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty > #undef TARGET_SET_UP_BY_PROLOGUE > #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue > > +#undef TARGET_EXTRA_LIVE_ON_ENTRY > +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry > + > #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P > #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ > s390_use_by_pieces_infrastructure_p > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > index 9b869d5..771f1cc 100644 > --- a/gcc/config/s390/s390.md > +++ b/gcc/config/s390/s390.md > @@ -114,6 +114,9 @@ > UNSPEC_SP_SET > UNSPEC_SP_TEST > > + ; Split stack support > + UNSPEC_STACK_CHECK > + > ; Test Data Class (TDC) > UNSPEC_TDC_INSN > > @@ -276,6 +279,11 @@ > ; Set and get floating point control register > UNSPECV_SFPC > UNSPECV_EFPC > + > + ; Split stack support > + UNSPECV_SPLIT_STACK_CALL > + UNSPECV_SPLIT_STACK_SIBCALL > + UNSPECV_SPLIT_STACK_MARKER > ]) > > ;; > @@ -10907,3 +10915,172 @@ > "TARGET_Z13" > "lcbb\t%0,%1,%b2" > [(set_attr "op_type" "VRX")]) > + > +; Handle -fsplit-stack. 
> + > +(define_expand "split_stack_prologue" > + [(const_int 0)] > + "" > +{ > + s390_expand_split_stack_prologue (); > + DONE; > +}) > + > +(define_expand "split_stack_call" > + [(match_operand 0 "" "") > + (match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + "TARGET_CPU_ZARCH" > +{ > + if (TARGET_64BIT) > + emit_jump_insn (gen_split_stack_call_di (operands[0], > + operands[1], > + operands[2], > + operands[3])); > + else > + emit_jump_insn (gen_split_stack_call_si (operands[0], > + operands[1], > + operands[2], > + operands[3])); > + DONE; > +}) > + > +(define_insn "split_stack_call_<mode>" > + [(set (pc) (label_ref (match_operand 0 "" ""))) > + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + UNSPECV_SPLIT_STACK_CALL))] > + "TARGET_CPU_ZARCH" > +{ > + gcc_unreachable (); > +} > + [(set_attr "length" "12")]) > + > +(define_expand "split_stack_cond_call" > + [(match_operand 0 "" "") > + (match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X") > + (match_operand 4 "" "")] > + "TARGET_CPU_ZARCH" > +{ > + if (TARGET_64BIT) > + emit_jump_insn (gen_split_stack_cond_call_di (operands[0], > + operands[1], > + operands[2], > + operands[3], > + operands[4])); > + else > + emit_jump_insn (gen_split_stack_cond_call_si (operands[0], > + operands[1], > + operands[2], > + operands[3], > + operands[4])); > + DONE; > +}) > + > +(define_insn "split_stack_cond_call_<mode>" > + [(set (pc) > + (if_then_else > + (match_operand 4 "" "") > + (label_ref (match_operand 0 "" "")) > + (pc))) > + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") > + (match_operand 2 "consttable_operand" "X") > + (match_operand 3 "consttable_operand" "X")] > + UNSPECV_SPLIT_STACK_CALL))] > + 
"TARGET_CPU_ZARCH" > +{ > + gcc_unreachable (); > +} > + [(set_attr "length" "12")]) > + > +;; If there are operand 0 bytes available on the stack, jump to > +;; operand 1. > + > +(define_expand "split_stack_space_check" > + [(set (pc) (if_then_else > + (ltu (minus (reg 15) > + (match_operand 0 "register_operand")) > + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) > + (label_ref (match_operand 1)) > + (pc)))] > + "" > +{ > + /* Offset from thread pointer to __private_ss. */ > + int psso = TARGET_64BIT ? 0x38 : 0x20; > + rtx tp = s390_get_thread_pointer (); > + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); > + rtx reg = gen_reg_rtx (Pmode); > + rtx cc; > + if (TARGET_64BIT) > + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); > + else > + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); > + cc = s390_emit_compare (GT, reg, guard); > + s390_emit_jump (operands[1], cc); > + > + DONE; > +}) > + > +;; A jg with minimal fuss for use in split stack prologue. > + > +(define_expand "split_stack_sibcall" > + [(match_operand 0 "bras_sym_operand" "X") > + (match_operand 1 "" "")] > + "TARGET_CPU_ZARCH" > +{ > + if (TARGET_64BIT) > + emit_jump_insn (gen_split_stack_sibcall_di (operands[0], operands[1])); > + else > + emit_jump_insn (gen_split_stack_sibcall_si (operands[0], operands[1])); > + DONE; > +}) > + > +(define_insn "split_stack_sibcall_<mode>" > + [(set (pc) (label_ref (match_operand 1 "" ""))) > + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] > + UNSPECV_SPLIT_STACK_SIBCALL))] > + "TARGET_CPU_ZARCH" > + "jg\t%0" > + [(set_attr "op_type" "RIL") > + (set_attr "type" "branch")]) > + > +;; Also a conditional one. 
> + > +(define_expand "split_stack_cond_sibcall" > + [(match_operand 0 "bras_sym_operand" "X") > + (match_operand 1 "" "") > + (match_operand 2 "" "")] > + "TARGET_CPU_ZARCH" > +{ > + if (TARGET_64BIT) > + emit_jump_insn (gen_split_stack_cond_sibcall_di (operands[0], operands[1], operands[2])); > + else > + emit_jump_insn (gen_split_stack_cond_sibcall_si (operands[0], operands[1], operands[2])); > + DONE; > +}) > + > +(define_insn "split_stack_cond_sibcall_<mode>" > + [(set (pc) > + (if_then_else > + (match_operand 1 "" "") > + (label_ref (match_operand 2 "" "")) > + (pc))) > + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] > + UNSPECV_SPLIT_STACK_SIBCALL))] > + "TARGET_CPU_ZARCH" > + "jg%C1\t%0" > + [(set_attr "op_type" "RIL") > + (set_attr "type" "branch")]) > + > +;; An unusual nop instruction used to mark functions with no stack frames > +;; as split-stack aware. > + > +(define_insn "split_stack_marker" > + [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)] > + "" > + "nopr\t%%r15" > + [(set_attr "op_type" "RR")]) > diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog > index 49c7929..3900ab1 100644 > --- a/libgcc/ChangeLog > +++ b/libgcc/ChangeLog > @@ -1,3 +1,10 @@ > +2016-02-02 Marcin Kościelnicki <koriakin@0x04.net> > + > + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. > + * config/s390/morestack.S: New file. > + * config/s390/t-stack-s390: New file. > + * generic-morestack.c (__splitstack_find): Add s390-specific code. 
> + > 2016-01-25 Jakub Jelinek <jakub@redhat.com> > > PR target/69444 > diff --git a/libgcc/config.host b/libgcc/config.host > index d8efd82..2be5f7e 100644 > --- a/libgcc/config.host > +++ b/libgcc/config.host > @@ -1114,11 +1114,11 @@ rx-*-elf) > tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" > ;; > s390-*-linux*) > - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" > + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" > md_unwind_header=s390/linux-unwind.h > ;; > s390x-*-linux*) > - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" > + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" > if test "${host_address}" = 32; then > tmake_file="${tmake_file} s390/32/t-floattodi" > fi > diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S > new file mode 100644 > index 0000000..141dead > --- /dev/null > +++ b/libgcc/config/s390/morestack.S > @@ -0,0 +1,609 @@ > +# s390 support for -fsplit-stack. > +# Copyright (C) 2015 Free Software Foundation, Inc. > +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. > + > +# This file is part of GCC. > + > +# GCC is free software; you can redistribute it and/or modify it under > +# the terms of the GNU General Public License as published by the Free > +# Software Foundation; either version 3, or (at your option) any later > +# version. > + > +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +# WARRANTY; without even the implied warranty of MERCHANTABILITY or > +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +# for more details. > + > +# Under Section 7 of GPL version 3, you are granted additional > +# permissions described in the GCC Runtime Library Exception, version > +# 3.1, as published by the Free Software Foundation. 
> + > +# You should have received a copy of the GNU General Public License and > +# a copy of the GCC Runtime Library Exception along with this program; > +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > +# <http://www.gnu.org/licenses/>. > + > +# Excess space needed to call ld.so resolver for lazy plt > +# resolution. Go uses sigaltstack so this doesn't need to > +# also cover signal frame size. > +#define BACKOFF 0x1000 > + > +# The __morestack function. > + > + .global __morestack > + .hidden __morestack > + > + .type __morestack,@function > + > +__morestack: > +.LFB1: > + .cfi_startproc > + > + > +#ifndef __s390x__ > + > + > +# The 31-bit __morestack function. > + > + # We use a cleanup to restore the stack guard if an exception > + # is thrown through this code. > +#ifndef __PIC__ > + .cfi_personality 0,__gcc_personality_v0 > + .cfi_lsda 0,.LLSDA1 > +#else > + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 > + .cfi_lsda 0x1b,.LLSDA1 > +#endif > + > + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. > + .cfi_offset %r6, -0x48 > + .cfi_offset %r7, -0x44 > + .cfi_offset %r8, -0x40 > + .cfi_offset %r9, -0x3c > + .cfi_offset %r10, -0x38 > + .cfi_offset %r11, -0x34 > + .cfi_offset %r12, -0x30 > + .cfi_offset %r13, -0x2c > + .cfi_offset %r14, -0x28 > + .cfi_offset %r15, -0x24 > + lr %r11, %r15 # Make frame pointer for vararg. > + .cfi_def_cfa_register %r11 > + ahi %r15, -0x60 # 0x60 for standard frame. > + st %r11, 0(%r15) # Save back chain. > + lr %r8, %r0 # Save %r0 (static chain). > + lr %r10, %r1 # Save %r1 (address of parameter block). > + > + l %r7, 0(%r10) # Required frame size to %r7 > + ear %r1, %a0 # Extract thread pointer. 
> + l %r1, 0x20(%r1) # Get stack boundary > + ar %r1, %r7 # Stack boundary + frame size > + a %r1, 4(%r10) # + stack param size > + clr %r1, %r15 # Compare with current stack pointer > + jle .Lnoalloc # guard > sp - frame-size: need alloc > + > + brasl %r14, __morestack_block_signals > + > + # We abuse one of caller's fpr save slots (which we don't use for fprs) > + # as a local variable. Not needed here, but done to be consistent with > + # the below use. > + ahi %r7, BACKOFF # Bump requested size a bit. > + st %r7, 0x40(%r11) # Stuff frame size on stack. > + la %r2, 0x40(%r11) # Pass its address as parameter. > + la %r3, 0x60(%r11) # Caller's stack parameters. > + l %r4, 4(%r10) # Size of stack parameters. > + brasl %r14, __generic_morestack > + > + lr %r15, %r2 # Switch to the new stack. > + ahi %r15, -0x60 # Make a stack frame on it. > + st %r11, 0(%r15) # Save back chain. > + > + s %r2, 0x40(%r11) # The end of stack space. > + ahi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 # Extract thread pointer. > +.LEHB0: > + st %r2, 0x20(%r1) # Save the new stack boundary. > + > + brasl %r14, __morestack_unblock_signals > + > + lr %r0, %r8 # Static chain. > + lm %r2, %r6, 0x8(%r11) # Parameter registers. > + > + # Third parameter is address of function meat - address of parameter > + # block. > + a %r10, 0x8(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0x60(%r11) > + > + # State of registers: > + # %r0: Static chain from entry. > + # %r1: Vararg pointer. > + # %r2-%r6: Parameters from entry. > + # %r7-%r10: Indeterminate. > + # %r11: Frame pointer (%r15 from entry). > + # %r12-%r13: Indeterminate. > + # %r14: Return address. > + # %r15: Stack pointer. > + basr %r14, %r10 # Call our caller. > + > + stm %r2, %r3, 0x8(%r11) # Save return registers. 
> + > + brasl %r14, __morestack_block_signals > + > + # We need a stack slot now, but have no good way to get it - the frame > + # on new stack had to be exactly 0x60 bytes, or stack parameters would > + # be passed wrong. Abuse fpr save area in caller's frame (we don't > + # save actual fprs). > + la %r2, 0x40(%r11) > + brasl %r14, __generic_releasestack > + > + s %r2, 0x40(%r11) # Subtract available space. > + ahi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 # Extract thread pointer. > +.LEHE0: > + st %r2, 0x20(%r1) # Save the new stack boundary. > + > + # We need to restore the old stack pointer before unblocking signals. > + # We also need 0x60 bytes for a stack frame. Since we had a stack > + # frame at this place before the stack switch, there's no need to > + # write the back chain again. > + lr %r15, %r11 > + ahi %r15, -0x60 > + > + brasl %r14, __morestack_unblock_signals > + > + lm %r2, %r15, 0x8(%r11) # Restore all registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# Executed if no new stack allocation is needed. > + > +.Lnoalloc: > + .cfi_restore_state > + # We may need to copy stack parameters. > + l %r9, 0x4(%r10) # Load stack parameter size. > + ltr %r9, %r9 # And check if it's 0. > + je .Lnostackparm # Skip the copy if not needed. > + sr %r15, %r9 # Make space on the stack. > + la %r8, 0x60(%r15) # Destination. > + la %r12, 0x60(%r11) # Source. > + lr %r13, %r9 # Source size. > +.Lcopy: > + mvcle %r8, %r12, 0 # Copy. > + jo .Lcopy > + > +.Lnostackparm: > + # Third parameter is address of function meat - address of parameter > + # block. 
> + a %r10, 0x8(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0x60(%r11) > + > + # OK, no stack allocation needed. We still follow the protocol and > + # call our caller - it doesn't cost much and makes sure vararg works. > + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. > + basr %r14, %r10 # Call our caller. > + > + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# This is the cleanup code called by the stack unwinder when unwinding > +# through the code between .LEHB0 and .LEHE0 above. > + > +.L1: > + .cfi_restore_state > + lr %r2, %r11 # Stack pointer after resume. > + brasl %r14, __generic_findstack > + lr %r3, %r11 # Get the stack pointer. > + sr %r3, %r2 # Subtract available space. > + ahi %r3, BACKOFF # Back off a bit. > + ear %r1, %a0 # Extract thread pointer. > + st %r3, 0x20(%r1) # Save the new stack boundary. > + > + lr %r2, %r6 # Exception header. > +#ifdef __PIC__ > + brasl %r14, _Unwind_Resume@PLT > +#else > + brasl %r14, _Unwind_Resume > +#endif > + > +#else /* defined(__s390x__) */ > + > + > +# The 64-bit __morestack function. > + > + # We use a cleanup to restore the stack guard if an exception > + # is thrown through this code. > +#ifndef __PIC__ > + .cfi_personality 0x3,__gcc_personality_v0 > + .cfi_lsda 0x3,.LLSDA1 > +#else > + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 > + .cfi_lsda 0x1b,.LLSDA1 > +#endif > + > + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. 
> + .cfi_offset %r6, -0x70 > + .cfi_offset %r7, -0x68 > + .cfi_offset %r8, -0x60 > + .cfi_offset %r9, -0x58 > + .cfi_offset %r10, -0x50 > + .cfi_offset %r11, -0x48 > + .cfi_offset %r12, -0x40 > + .cfi_offset %r13, -0x38 > + .cfi_offset %r14, -0x30 > + .cfi_offset %r15, -0x28 > + lgr %r11, %r15 # Make frame pointer for vararg. > + .cfi_def_cfa_register %r11 > + aghi %r15, -0xa0 # 0xa0 for standard frame. > + stg %r11, 0(%r15) # Save back chain. > + lgr %r8, %r0 # Save %r0 (static chain). > + lgr %r10, %r1 # Save %r1 (address of parameter block). > + > + lg %r7, 0(%r10) # Required frame size to %r7 > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > + lg %r1, 0x38(%r1) # Get stack boundary > + agr %r1, %r7 # Stack boundary + frame size > + ag %r1, 8(%r10) # + stack param size > + clgr %r1, %r15 # Compare with current stack pointer > + jle .Lnoalloc # guard > sp - frame-size: need alloc > + > + brasl %r14, __morestack_block_signals > + > + # We abuse one of caller's fpr save slots (which we don't use for fprs) > + # as a local variable. Not needed here, but done to be consistent with > + # the below use. > + aghi %r7, BACKOFF # Bump requested size a bit. > + stg %r7, 0x80(%r11) # Stuff frame size on stack. > + la %r2, 0x80(%r11) # Pass its address as parameter. > + la %r3, 0xa0(%r11) # Caller's stack parameters. > + lg %r4, 8(%r10) # Size of stack parameters. > + brasl %r14, __generic_morestack > + > + lgr %r15, %r2 # Switch to the new stack. > + aghi %r15, -0xa0 # Make a stack frame on it. > + stg %r11, 0(%r15) # Save back chain. > + > + sg %r2, 0x80(%r11) # The end of stack space. > + aghi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > +.LEHB0: > + stg %r2, 0x38(%r1) # Save the new stack boundary. > + > + brasl %r14, __morestack_unblock_signals > + > + lgr %r0, %r8 # Static chain. > + lmg %r2, %r6, 0x10(%r11) # Parameter registers. 
> + > + # Third parameter is address of function meat - address of parameter > + # block. > + ag %r10, 0x10(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0xa0(%r11) > + > + # State of registers: > + # %r0: Static chain from entry. > + # %r1: Vararg pointer. > + # %r2-%r6: Parameters from entry. > + # %r7-%r10: Indeterminate. > + # %r11: Frame pointer (%r15 from entry). > + # %r12-%r13: Indeterminate. > + # %r14: Return address. > + # %r15: Stack pointer. > + basr %r14, %r10 # Call our caller. > + > + stg %r2, 0x10(%r11) # Save return register. > + > + brasl %r14, __morestack_block_signals > + > + # We need a stack slot now, but have no good way to get it - the frame > + # on new stack had to be exactly 0xa0 bytes, or stack parameters would > + # be passed wrong. Abuse fpr save area in caller's frame (we don't > + # save actual fprs). > + la %r2, 0x80(%r11) > + brasl %r14, __generic_releasestack > + > + sg %r2, 0x80(%r11) # Subtract available space. > + aghi %r2, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > +.LEHE0: > + stg %r2, 0x38(%r1) # Save the new stack boundary. > + > + # We need to restore the old stack pointer before unblocking signals. > + # We also need 0xa0 bytes for a stack frame. Since we had a stack > + # frame at this place before the stack switch, there's no need to > + # write the back chain again. > + lgr %r15, %r11 > + aghi %r15, -0xa0 > + > + brasl %r14, __morestack_unblock_signals > + > + lmg %r2, %r15, 0x10(%r11) # Restore all registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# Executed if no new stack allocation is needed. 
> + > +.Lnoalloc: > + .cfi_restore_state > + # We may need to copy stack parameters. > + lg %r9, 0x8(%r10) # Load stack parameter size. > + ltgr %r9, %r9 # Check if it's 0. > + je .Lnostackparm # Skip the copy if not needed. > + sgr %r15, %r9 # Make space on the stack. > + la %r8, 0xa0(%r15) # Destination. > + la %r12, 0xa0(%r11) # Source. > + lgr %r13, %r9 # Source size. > +.Lcopy: > + mvcle %r8, %r12, 0 # Copy. > + jo .Lcopy > + > +.Lnostackparm: > + # Third parameter is address of function meat - address of parameter > + # block. > + ag %r10, 0x10(%r10) > + > + # Leave vararg pointer in %r1, in case function uses it > + la %r1, 0xa0(%r11) > + > + # OK, no stack allocation needed. We still follow the protocol and > + # call our caller - it doesn't cost much and makes sure vararg works. > + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. > + basr %r14, %r10 # Call our caller. > + > + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. > + .cfi_remember_state > + .cfi_restore %r15 > + .cfi_restore %r14 > + .cfi_restore %r13 > + .cfi_restore %r12 > + .cfi_restore %r11 > + .cfi_restore %r10 > + .cfi_restore %r9 > + .cfi_restore %r8 > + .cfi_restore %r7 > + .cfi_restore %r6 > + .cfi_def_cfa_register %r15 > + br %r14 # Return to caller's caller. > + > +# This is the cleanup code called by the stack unwinder when unwinding > +# through the code between .LEHB0 and .LEHE0 above. > + > +.L1: > + .cfi_restore_state > + lgr %r2, %r11 # Stack pointer after resume. > + brasl %r14, __generic_findstack > + lgr %r3, %r11 # Get the stack pointer. > + sgr %r3, %r2 # Subtract available space. > + aghi %r3, BACKOFF # Back off a bit. > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 # Extract thread pointer. > + stg %r3, 0x38(%r1) # Save the new stack boundary. > + > + lgr %r2, %r6 # Exception header. 
> +#ifdef __PIC__ > + brasl %r14, _Unwind_Resume@PLT > +#else > + brasl %r14, _Unwind_Resume > +#endif > + > +#endif /* defined(__s390x__) */ > + > + .cfi_endproc > + .size __morestack, . - __morestack > + > + > +# The exception table. This tells the personality routine to execute > +# the exception handler. > + > + .section .gcc_except_table,"a",@progbits > + .align 4 > +.LLSDA1: > + .byte 0xff # @LPStart format (omit) > + .byte 0xff # @TType format (omit) > + .byte 0x1 # call-site format (uleb128) > + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length > +.LLSDACSB1: > + .uleb128 .LEHB0-.LFB1 # region 0 start > + .uleb128 .LEHE0-.LEHB0 # length > + .uleb128 .L1-.LFB1 # landing pad > + .uleb128 0 # action > +.LLSDACSE1: > + > + > + .global __gcc_personality_v0 > +#ifdef __PIC__ > + # Build a position independent reference to the basic > + # personality function. > + .hidden DW.ref.__gcc_personality_v0 > + .weak DW.ref.__gcc_personality_v0 > + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat > + .type DW.ref.__gcc_personality_v0, @object > +DW.ref.__gcc_personality_v0: > +#ifndef __LP64__ > + .align 4 > + .size DW.ref.__gcc_personality_v0, 4 > + .long __gcc_personality_v0 > +#else > + .align 8 > + .size DW.ref.__gcc_personality_v0, 8 > + .quad __gcc_personality_v0 > +#endif > +#endif > + > + > + > +# Initialize the stack test value when the program starts or when a > +# new thread starts. We don't know how large the main stack is, so we > +# guess conservatively. We might be able to use getrlimit here. > + > + .text > + .global __stack_split_initialize > + .hidden __stack_split_initialize > + > + .type __stack_split_initialize, @function > + > +__stack_split_initialize: > + > +#ifndef __s390x__ > + > + ear %r1, %a0 > + lr %r0, %r15 > + ahi %r0, -0x4000 # We should have at least 16K. 
> + st %r0, 0x20(%r1) > + > + lr %r2, %r15 > + lhi %r3, 0x4000 > +#ifdef __PIC__ > + jg __generic_morestack_set_initial_sp@PLT # Tail call > +#else > + jg __generic_morestack_set_initial_sp # Tail call > +#endif > + > +#else /* defined(__s390x__) */ > + > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + lgr %r0, %r15 > + aghi %r0, -0x4000 # We should have at least 16K. > + stg %r0, 0x38(%r1) > + > + lgr %r2, %r15 > + lghi %r3, 0x4000 > +#ifdef __PIC__ > + jg __generic_morestack_set_initial_sp@PLT # Tail call > +#else > + jg __generic_morestack_set_initial_sp # Tail call > +#endif > + > +#endif /* defined(__s390x__) */ > + > + .size __stack_split_initialize, . - __stack_split_initialize > + > +# Routines to get and set the guard, for __splitstack_getcontext, > +# __splitstack_setcontext, and __splitstack_makecontext. > + > +# void *__morestack_get_guard (void) returns the current stack guard. > + .text > + .global __morestack_get_guard > + .hidden __morestack_get_guard > + > + .type __morestack_get_guard,@function > + > +__morestack_get_guard: > + > +#ifndef __s390x__ > + ear %r1, %a0 > + l %r2, 0x20(%r1) > +#else > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + lg %r2, 0x38(%r1) > +#endif > + br %r14 > + > + .size __morestack_get_guard, . - __morestack_get_guard > + > +# void __morestack_set_guard (void *) sets the stack guard. > + .global __morestack_set_guard > + .hidden __morestack_set_guard > + > + .type __morestack_set_guard,@function > + > +__morestack_set_guard: > + > +#ifndef __s390x__ > + ear %r1, %a0 > + st %r2, 0x20(%r1) > +#else > + ear %r1, %a0 > + sllg %r1, %r1, 32 > + ear %r1, %a1 > + stg %r2, 0x38(%r1) > +#endif > + br %r14 > + > + .size __morestack_set_guard, . - __morestack_set_guard > + > +# void *__morestack_make_guard (void *, size_t) returns the stack > +# guard value for a stack. 
> + .global __morestack_make_guard > + .hidden __morestack_make_guard > + > + .type __morestack_make_guard,@function > + > +__morestack_make_guard: > + > +#ifndef __s390x__ > + sr %r2, %r3 > + ahi %r2, BACKOFF > +#else > + sgr %r2, %r3 > + aghi %r2, BACKOFF > +#endif > + br %r14 > + > + .size __morestack_make_guard, . - __morestack_make_guard > + > +# Make __stack_split_initialize a high priority constructor. > + > + .section .ctors.65535,"aw",@progbits > + > +#ifndef __LP64__ > + .align 4 > + .long __stack_split_initialize > + .long __morestack_load_mmap > +#else > + .align 8 > + .quad __stack_split_initialize > + .quad __morestack_load_mmap > +#endif > + > + .section .note.GNU-stack,"",@progbits > + .section .note.GNU-split-stack,"",@progbits > + .section .note.GNU-no-split-stack,"",@progbits > diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 > new file mode 100644 > index 0000000..4c959b0 > --- /dev/null > +++ b/libgcc/config/s390/t-stack-s390 > @@ -0,0 +1,2 @@ > +# Makefile fragment to support -fsplit-stack for s390. > +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S > diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c > index 89765d4..b8eec4e 100644 > --- a/libgcc/generic-morestack.c > +++ b/libgcc/generic-morestack.c > @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, > #elif defined (__i386__) > nsp -= 6 * sizeof (void *); > #elif defined __powerpc64__ > +#elif defined __s390x__ > + nsp -= 2 * 160; > +#elif defined __s390__ > + nsp -= 2 * 96; > #else > #error "unrecognized target" > #endif > ^ permalink raw reply [flat|nested] 55+ messages in thread
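For readers following the arithmetic in the quoted morestack.S, here is a minimal C sketch of the three computations the patch relies on: the guard value produced by __morestack_make_guard, the "do we need a new segment" test at the top of __morestack, and the 8-byte rounding of args_size done in s390_expand_split_stack_prologue. The function names make_guard, needs_new_segment and align_args_size are made up for illustration and are not part of the patch, and treating the first argument of make_guard as the high end of the stack segment is an assumption based on the stack growing downwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as BACKOFF in morestack.S. */
#define BACKOFF 0x1000

/* Hypothetical model of __morestack_make_guard (sr/ahi pair): the
   guard is the low end of the segment plus the BACKOFF reserve.
   Assumes STACK is the high end of a downward-growing segment. */
static uintptr_t
make_guard (uintptr_t stack, uintptr_t size)
{
  assert (size <= stack);
  return stack - size + BACKOFF;
}

/* Hypothetical model of the clr/jle test at the top of __morestack:
   .Lnoalloc is taken when guard + frame_size + args_size <= sp, so a
   new segment is needed only when the sum is above the stack pointer. */
static bool
needs_new_segment (uintptr_t guard, uintptr_t frame_size,
                   uintptr_t args_size, uintptr_t sp)
{
  return guard + frame_size + args_size > sp;
}

/* Hypothetical model of the rounding applied to crtl->args.size
   before it is stored in the parameter block. */
static uintptr_t
align_args_size (uintptr_t args_size)
{
  return (args_size + 7) & ~(uintptr_t) 7;
}
```

With a 64 KiB segment ending at 0x20000, make_guard yields 0x11000, so a function asking for a 0x800-byte frame and no stack arguments still fits when the stack pointer is at 0x11800 but not once any stack arguments are added.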
* [PATCH] s390: Add -fsplit-stack support 2016-02-02 15:19 ` Andreas Krebbel @ 2016-02-02 15:31 ` Marcin Kościelnicki 2016-02-02 18:34 ` Ulrich Weigand 0 siblings, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-02-02 15:31 UTC (permalink / raw) To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki libgcc/ChangeLog: * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. * config/s390/morestack.S: New file. * config/s390/t-stack-s390: New file. * generic-morestack.c (__splitstack_find): Add s390-specific code. gcc/ChangeLog: * common/config/s390/s390-common.c (s390_supports_split_stack): New function. (TARGET_SUPPORTS_SPLIT_STACK): New macro. * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. * config/s390/s390.c (struct machine_function): New field split_stack_varargs_pointer. (s390_register_info): Mark r12 as clobbered if it'll be used as temp in s390_emit_prologue. (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack vararg pointer. (morestack_ref): New global. (SPLIT_STACK_AVAILABLE): New macro. (s390_expand_split_stack_prologue): New function. (s390_expand_split_stack_call): New function. (s390_live_on_entry): New function. (s390_va_start): Use split-stack vararg pointer if appropriate. (s390_reorg): Lower the split-stack pseudo-insns. (s390_asm_file_end): Emit the split-stack note sections. (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. (UNSPECV_SPLIT_STACK_CALL): New unspec. (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. (UNSPECV_SPLIT_STACK_MARKER): New unspec. (split_stack_prologue): New expand. (split_stack_call): New expand. (split_stack_call_*): New insn. (split_stack_cond_call): New expand. (split_stack_cond_call_*): New insn. (split_stack_space_check): New expand. (split_stack_sibcall): New expand. (split_stack_sibcall_*): New insn. (split_stack_cond_sibcall): New expand. (split_stack_cond_sibcall_*): New insn. (split_stack_marker): New insn. 
--- Here we go. I've also removed the "see below", since I don't really see anything below... gcc/ChangeLog | 37 +++ gcc/common/config/s390/s390-common.c | 14 + gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c | 323 ++++++++++++++++++- gcc/config/s390/s390.md | 177 ++++++++++ libgcc/ChangeLog | 7 + libgcc/config.host | 4 +- libgcc/config/s390/morestack.S | 609 +++++++++++++++++++++++++++++++++++ libgcc/config/s390/t-stack-s390 | 2 + libgcc/generic-morestack.c | 4 + 10 files changed, 1171 insertions(+), 7 deletions(-) create mode 100644 libgcc/config/s390/morestack.S create mode 100644 libgcc/config/s390/t-stack-s390 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9a2cec8..af86079 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,40 @@ +2016-02-02 Marcin Kościelnicki <koriakin@0x04.net> + + * common/config/s390/s390-common.c (s390_supports_split_stack): + New function. + (TARGET_SUPPORTS_SPLIT_STACK): New macro. + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. + * config/s390/s390.c (struct machine_function): New field + split_stack_varargs_pointer. + (s390_register_info): Mark r12 as clobbered if it'll be used as temp + in s390_emit_prologue. + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack + vararg pointer. + (morestack_ref): New global. + (SPLIT_STACK_AVAILABLE): New macro. + (s390_expand_split_stack_prologue): New function. + (s390_expand_split_stack_call): New function. + (s390_live_on_entry): New function. + (s390_va_start): Use split-stack vararg pointer if appropriate. + (s390_reorg): Lower the split-stack pseudo-insns. + (s390_asm_file_end): Emit the split-stack note sections. + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. + * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. + (UNSPECV_SPLIT_STACK_CALL): New unspec. + (UNSPECV_SPLIT_STACK_SIBCALL): New unspec. + (UNSPECV_SPLIT_STACK_MARKER): New unspec. + (split_stack_prologue): New expand. + (split_stack_call): New expand. 
+ (split_stack_call_*): New insn. + (split_stack_cond_call): New expand. + (split_stack_cond_call_*): New insn. + (split_stack_space_check): New expand. + (split_stack_sibcall): New expand. + (split_stack_sibcall_*): New insn. + (split_stack_cond_sibcall): New expand. + (split_stack_cond_sibcall_*): New insn. + (split_stack_marker): New insn. + 2016-02-02 Thomas Schwinge <thomas@codesourcery.com> * omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove. diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 4519c21..1e497e6 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, } } +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. + We don't verify it, since earlier versions just have padding at + its place, which works just as well. */ + +static bool +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, + struct gcc_options *opts ATTRIBUTE_UNUSED) +{ + return true; +} + #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, #undef TARGET_OPTION_INIT_STRUCT #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct +#undef TARGET_SUPPORTS_SPLIT_STACK +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack + struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 633bc1e..09032c9 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); extern void s390_emit_prologue (void); extern void s390_emit_epilogue (bool); +extern void s390_expand_split_stack_prologue (void); extern bool 
s390_can_use_simple_return_insn (void); extern bool s390_can_use_return_insn (void); extern void s390_function_profiler (FILE *, int); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 3be64de..6c1cb1e 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -426,6 +426,13 @@ struct GTY(()) machine_function /* True if the current function may contain a tbegin clobbering FPRs. */ bool tbegin_p; + + /* For -fsplit-stack support: A stack local which holds a pointer to + the stack arguments for a function with a variable number of + arguments. This is set at the start of the function and is used + to initialize the overflow_arg_area field of the va_list + structure. */ + rtx split_stack_varargs_pointer; }; /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ @@ -9316,9 +9323,13 @@ s390_register_info () cfun_frame_layout.high_fprs++; } - if (flag_pic) - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); + /* Register 12 is used for GOT address, but also as temp in prologue + for split-stack stdarg functions (unless r14 is available). */ + clobbered_regs[12] + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) + || (flag_split_stack && cfun->stdarg + && (crtl->is_leaf || TARGET_TPF_PROFILING + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); clobbered_regs[BASE_REGNUM] |= (cfun->machine->base_reg @@ -10440,12 +10451,14 @@ s390_emit_prologue (void) int next_fpr = 0; /* Choose best register to use for temp use within prologue. - See below for why TPF must use the register 1. */ + TPF with profiling must avoid the register 14. 
*/ if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM) && !crtl->is_leaf && !TARGET_TPF_PROFILING) temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); + else if (flag_split_stack && cfun->stdarg) + temp_reg = gen_rtx_REG (Pmode, 12); else temp_reg = gen_rtx_REG (Pmode, 1); @@ -10939,6 +10952,234 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); } +/* -fsplit-stack support. */ + +/* A SYMBOL_REF for __morestack. */ +static GTY(()) rtx morestack_ref; + +/* When using -fsplit-stack, the allocation routines set a field in + the TCB to the bottom of the stack plus this much space, measured + in bytes. */ + +#define SPLIT_STACK_AVAILABLE 1024 + +/* Emit -fsplit-stack prologue, which goes before the regular function + prologue. */ + +void +s390_expand_split_stack_prologue (void) +{ + rtx r1, guard, cc; + rtx_insn *insn; + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + /* Pointer size in bytes. */ + /* Frame size and argument size - the two parameters to __morestack. */ + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; + /* Align argument size to 8 bytes - simplifies __morestack code. */ + HOST_WIDE_INT args_size = crtl->args.size >= 0 + ? ((crtl->args.size + 7) & ~7) + : 0; + /* Label to be called by __morestack. */ + rtx_code_label *call_done = NULL; + rtx tmp; + + gcc_assert (flag_split_stack && reload_completed); + if (!TARGET_CPU_ZARCH) + { + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); + return; + } + + r1 = gen_rtx_REG (Pmode, 1); + + /* If no stack frame will be allocated, don't do anything. */ + if (!frame_size) + { + /* But emit a marker that will let linker and indirect function + calls recognise this function as split-stack aware. */ + emit_insn (gen_split_stack_marker ()); + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, just use r15. 
*/ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + + } + return; + } + + if (morestack_ref == NULL_RTX) + { + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL + | SYMBOL_FLAG_FUNCTION); + } + + if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size)) + { + /* If frame_size will fit in an add instruction, do a stack space + check, and only call __morestack if there's not enough space. */ + + /* Get thread pointer. r1 is the only register we can always destroy - r0 + could contain a static chain (and cannot be used to address memory + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); + /* Aim at __private_ss. */ + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); + + /* If less than 1kiB is used, skip addition and compare directly with + __private_ss. */ + if (frame_size > SPLIT_STACK_AVAILABLE) + { + emit_move_insn (r1, guard); + if (TARGET_64BIT) + emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size))); + else + emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size))); + guard = r1; + } + + /* Compare the (maybe adjusted) guard with the stack pointer. */ + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); + + call_done = gen_label_rtx (); + + tmp = gen_split_stack_cond_call (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size), + cc); + + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + + /* Mark the jump as very unlikely to be taken. */ + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); + + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, and __morestack was not called, just use + r15. 
*/ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + } + else + { + call_done = gen_label_rtx (); + + /* Now, we need to call __morestack. It has very special calling + conventions: it preserves param/return/static chain registers for + calling main function body, and looks for its own parameters + at %r1 (after aligning it up to a 4-byte boundary for 31-bit mode). */ + tmp = gen_split_stack_call (call_done, + morestack_ref, + GEN_INT (frame_size), + GEN_INT (args_size)); + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + emit_barrier (); + } + + /* __morestack will call us here. */ + + emit_label (call_done); + LABEL_NUSES (call_done) = 1; +} + +/* Generates split-stack call sequence, along with its parameter block. */ + +static void +s390_expand_split_stack_call (rtx_insn *orig_insn, + rtx call_done, + rtx function, + rtx frame_size, + rtx args_size, + rtx cond) +{ + rtx_insn *insn = orig_insn; + rtx parmbase = gen_label_rtx (); + rtx r1 = gen_rtx_REG (Pmode, 1); + rtx tmp, tmp2; + + /* %r1 = litbase. */ + insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn); + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* jg<cond> __morestack. */ + if (cond == NULL) + { + tmp = gen_split_stack_sibcall (function, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + else + { + gcc_assert (s390_comparison (cond, VOIDmode)); + tmp = gen_split_stack_cond_sibcall (function, cond, call_done); + insn = emit_jump_insn_after (tmp, insn); + } + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* Go to .rodata. */ + insn = emit_insn_after (gen_pool_section_start (), insn); + + /* Now, we'll emit parameters to __morestack. First, align to pointer size + (this mirrors the alignment done in __morestack - don't touch it). 
*/ + insn = emit_insn_after (gen_pool_align (GEN_INT (UNITS_PER_LONG)), insn); + + insn = emit_label_after (parmbase, insn); + + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, frame_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Second parameter is size of the arguments passed on stack that + __morestack has to copy to the new stack (does not include varargs). */ + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, args_size), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + + /* Third parameter is offset between start of the parameter block + and function body to be called by __morestack. */ + tmp = gen_rtx_LABEL_REF (Pmode, parmbase); + tmp2 = gen_rtx_LABEL_REF (Pmode, call_done); + tmp = gen_rtx_CONST (Pmode, + gen_rtx_MINUS (Pmode, tmp2, tmp)); + tmp = gen_rtx_UNSPEC_VOLATILE (Pmode, + gen_rtvec (1, tmp), + UNSPECV_POOL_ENTRY); + insn = emit_insn_after (tmp, insn); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parmbase); + LABEL_NUSES (parmbase)++; + + /* Return from .rodata. */ + insn = emit_insn_after (gen_pool_section_end (), insn); + + delete_insn (orig_insn); +} + +/* We may have to tell the dataflow pass that the split stack prologue + is initializing a register. */ + +static void +s390_live_on_entry (bitmap regs) +{ + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + gcc_assert (flag_split_stack); + bitmap_set_bit (regs, 1); + } +} + /* Return true if the function can use simple_return to return outside of a shrink-wrapped region. At present shrink-wrapping is supported in all cases. 
*/ @@ -11541,6 +11782,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } + if (flag_split_stack + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) + == NULL) + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) + { + rtx reg; + rtx_insn *seq; + + reg = gen_reg_rtx (Pmode); + cfun->machine->split_stack_varargs_pointer = reg; + + start_sequence (); + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); + seq = get_insns (); + end_sequence (); + + push_topmost_sequence (); + emit_insn_after (seq, entry_of_function ()); + pop_topmost_sequence (); + } + /* Find the overflow area. FIXME: This currently is too pessimistic when the vector ABI is enabled. In that case we *always* set up the overflow area @@ -11549,7 +11811,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG || TARGET_VX_ABI) { - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) + t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + else + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); off = INTVAL (crtl->args.arg_offset_rtx); off = off < 0 ? 0 : off; @@ -13158,6 +13423,48 @@ s390_reorg (void) } } + if (flag_split_stack) + { + rtx_insn *insn; + + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) + { + /* Look for the split-stack fake jump instructions. 
*/ + if (!JUMP_P(insn)) + continue; + if (GET_CODE (PATTERN (insn)) != PARALLEL + || XVECLEN (PATTERN (insn), 0) != 2) + continue; + rtx set = XVECEXP (PATTERN (insn), 0, 1); + if (GET_CODE (set) != SET) + continue; + rtx unspec = XEXP(set, 1); + if (GET_CODE (unspec) != UNSPEC_VOLATILE) + continue; + if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL) + continue; + rtx set_pc = XVECEXP (PATTERN (insn), 0, 0); + rtx function = XVECEXP (unspec, 0, 0); + rtx frame_size = XVECEXP (unspec, 0, 1); + rtx args_size = XVECEXP (unspec, 0, 2); + rtx pc_src = XEXP (set_pc, 1); + rtx call_done, cond = NULL_RTX; + if (GET_CODE (pc_src) == IF_THEN_ELSE) + { + cond = XEXP (pc_src, 0); + call_done = XEXP (XEXP (pc_src, 1), 0); + } + else + call_done = XEXP (pc_src, 0); + s390_expand_split_stack_call (insn, + call_done, + function, + frame_size, + args_size, + cond); + } + } + /* Try to optimize prologue and epilogue further. */ s390_optimize_prologue (); @@ -14469,6 +14776,9 @@ s390_asm_file_end (void) s390_vector_abi); #endif file_end_indicate_exec_stack (); + + if (flag_split_stack) + file_end_indicate_split_stack (); } /* Return true if TYPE is a vector bool type. 
*/ @@ -14724,6 +15034,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry + #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ s390_use_by_pieces_infrastructure_p diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 9b869d5..771f1cc 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -114,6 +114,9 @@ UNSPEC_SP_SET UNSPEC_SP_TEST + ; Split stack support + UNSPEC_STACK_CHECK + ; Test Data Class (TDC) UNSPEC_TDC_INSN @@ -276,6 +279,11 @@ ; Set and get floating point control register UNSPECV_SFPC UNSPECV_EFPC + + ; Split stack support + UNSPECV_SPLIT_STACK_CALL + UNSPECV_SPLIT_STACK_SIBCALL + UNSPECV_SPLIT_STACK_MARKER ]) ;; @@ -10907,3 +10915,172 @@ "TARGET_Z13" "lcbb\t%0,%1,%b2" [(set_attr "op_type" "VRX")]) + +; Handle -fsplit-stack. 
+ +(define_expand "split_stack_prologue" + [(const_int 0)] + "" +{ + s390_expand_split_stack_prologue (); + DONE; +}) + +(define_expand "split_stack_call" + [(match_operand 0 "" "") + (match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_call_di (operands[0], + operands[1], + operands[2], + operands[3])); + else + emit_jump_insn (gen_split_stack_call_si (operands[0], + operands[1], + operands[2], + operands[3])); + DONE; +}) + +(define_insn "split_stack_call_<mode>" + [(set (pc) (label_ref (match_operand 0 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +(define_expand "split_stack_cond_call" + [(match_operand 0 "" "") + (match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X") + (match_operand 4 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_cond_call_di (operands[0], + operands[1], + operands[2], + operands[3], + operands[4])); + else + emit_jump_insn (gen_split_stack_cond_call_si (operands[0], + operands[1], + operands[2], + operands[3], + operands[4])); + DONE; +}) + +(define_insn "split_stack_cond_call_<mode>" + [(set (pc) + (if_then_else + (match_operand 4 "" "") + (label_ref (match_operand 0 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" +{ + gcc_unreachable (); +} + [(set_attr "length" "12")]) + +;; If there are operand 0 bytes available on the stack, jump to +;; operand 1. 
+ +(define_expand "split_stack_space_check" + [(set (pc) (if_then_else + (ltu (minus (reg 15) + (match_operand 0 "register_operand")) + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) + (label_ref (match_operand 1)) + (pc)))] + "" +{ + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + rtx tp = s390_get_thread_pointer (); + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); + rtx reg = gen_reg_rtx (Pmode); + rtx cc; + if (TARGET_64BIT) + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); + else + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); + cc = s390_emit_compare (GT, reg, guard); + s390_emit_jump (operands[1], cc); + + DONE; +}) + +;; A jg with minimal fuss for use in split stack prologue. + +(define_expand "split_stack_sibcall" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_sibcall_di (operands[0], operands[1])); + else + emit_jump_insn (gen_split_stack_sibcall_si (operands[0], operands[1])); + DONE; +}) + +(define_insn "split_stack_sibcall_<mode>" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; Also a conditional one. 
+ +(define_expand "split_stack_cond_sibcall" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "") + (match_operand 2 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_cond_sibcall_di (operands[0], operands[1], operands[2])); + else + emit_jump_insn (gen_split_stack_cond_sibcall_si (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "split_stack_cond_sibcall_<mode>" + [(set (pc) + (if_then_else + (match_operand 1 "" "") + (label_ref (match_operand 2 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")] + UNSPECV_SPLIT_STACK_SIBCALL))] + "TARGET_CPU_ZARCH" + "jg%C1\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; An unusual nop instruction used to mark functions with no stack frames +;; as split-stack aware. + +(define_insn "split_stack_marker" + [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)] + "" + "nopr\t%%r15" + [(set_attr "op_type" "RR")]) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 49c7929..3900ab1 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2016-02-02 Marcin Kościelnicki <koriakin@0x04.net> + + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. + * config/s390/morestack.S: New file. + * config/s390/t-stack-s390: New file. + * generic-morestack.c (__splitstack_find): Add s390-specific code.
+ 2016-01-25 Jakub Jelinek <jakub@redhat.com> PR target/69444 diff --git a/libgcc/config.host b/libgcc/config.host index d8efd82..2be5f7e 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1114,11 +1114,11 @@ rx-*-elf) tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" ;; s390-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" md_unwind_header=s390/linux-unwind.h ;; s390x-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" if test "${host_address}" = 32; then tmake_file="${tmake_file} s390/32/t-floattodi" fi diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S new file mode 100644 index 0000000..141dead --- /dev/null +++ b/libgcc/config/s390/morestack.S @@ -0,0 +1,609 @@ +# s390 support for -fsplit-stack. +# Copyright (C) 2015 Free Software Foundation, Inc. +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version. + +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. + +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation. + +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively.
If not, see +# <http://www.gnu.org/licenses/>. + +# Excess space needed to call ld.so resolver for lazy plt +# resolution. Go uses sigaltstack so this doesn't need to +# also cover signal frame size. +#define BACKOFF 0x1000 + +# The __morestack function. + + .global __morestack + .hidden __morestack + + .type __morestack,@function + +__morestack: +.LFB1: + .cfi_startproc + + +#ifndef __s390x__ + + +# The 31-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0,__gcc_personality_v0 + .cfi_lsda 0,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x48 + .cfi_offset %r7, -0x44 + .cfi_offset %r8, -0x40 + .cfi_offset %r9, -0x3c + .cfi_offset %r10, -0x38 + .cfi_offset %r11, -0x34 + .cfi_offset %r12, -0x30 + .cfi_offset %r13, -0x2c + .cfi_offset %r14, -0x28 + .cfi_offset %r15, -0x24 + lr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + ahi %r15, -0x60 # 0x60 for standard frame. + st %r11, 0(%r15) # Save back chain. + lr %r8, %r0 # Save %r0 (static chain). + lr %r10, %r1 # Save %r1 (address of parameter block). + + l %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 # Extract thread pointer. + l %r1, 0x20(%r1) # Get stack boundary + ar %r1, %r7 # Stack boundary + frame size + a %r1, 4(%r10) # + stack param size + clr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + ahi %r7, BACKOFF # Bump requested size a bit. + st %r7, 0x40(%r11) # Stuff frame size on stack. + la %r2, 0x40(%r11) # Pass its address as parameter.
+ la %r3, 0x60(%r11) # Caller's stack parameters. + l %r4, 4(%r10) # Size of stack parameters. + brasl %r14, __generic_morestack + + lr %r15, %r2 # Switch to the new stack. + ahi %r15, -0x60 # Make a stack frame on it. + st %r11, 0(%r15) # Save back chain. + + s %r2, 0x40(%r11) # The end of stack space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHB0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lr %r0, %r8 # Static chain. + lm %r2, %r6, 0x8(%r11) # Parameter registers. + + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stm %r2, %r3, 0x8(%r11) # Save return registers. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0x60 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x40(%r11) + brasl %r14, __generic_releasestack + + s %r2, 0x40(%r11) # Subtract available space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHE0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0x60 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again.
+ lr %r15, %r11 + ahi %r15, -0x60 + + brasl %r14, __morestack_unblock_signals + + lm %r2, %r15, 0x8(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + l %r9, 0x4(%r10) # Load stack parameter size. + ltr %r9, %r9 # And check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sr %r15, %r9 # Make space on the stack. + la %r8, 0x60(%r15) # Destination. + la %r12, 0x60(%r11) # Source. + lr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lr %r3, %r11 # Get the stack pointer. + sr %r3, %r2 # Subtract available space. 
+ ahi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. + st %r3, 0x20(%r1) # Save the new stack boundary. + + lr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#else /* defined(__s390x__) */ + + +# The 64-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0x3,__gcc_personality_v0 + .cfi_lsda 0x3,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x70 + .cfi_offset %r7, -0x68 + .cfi_offset %r8, -0x60 + .cfi_offset %r9, -0x58 + .cfi_offset %r10, -0x50 + .cfi_offset %r11, -0x48 + .cfi_offset %r12, -0x40 + .cfi_offset %r13, -0x38 + .cfi_offset %r14, -0x30 + .cfi_offset %r15, -0x28 + lgr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + aghi %r15, -0xa0 # 0xa0 for standard frame. + stg %r11, 0(%r15) # Save back chain. + lgr %r8, %r0 # Save %r0 (static chain). + lgr %r10, %r1 # Save %r1 (address of parameter block). + + lg %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + lg %r1, 0x38(%r1) # Get stack boundary + agr %r1, %r7 # Stack boundary + frame size + ag %r1, 8(%r10) # + stack param size + clgr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + aghi %r7, BACKOFF # Bump requested size a bit. + stg %r7, 0x80(%r11) # Stuff frame size on stack. + la %r2, 0x80(%r11) # Pass its address as parameter. + la %r3, 0xa0(%r11) # Caller's stack parameters. + lg %r4, 8(%r10) # Size of stack parameters.
+ brasl %r14, __generic_morestack + + lgr %r15, %r2 # Switch to the new stack. + aghi %r15, -0xa0 # Make a stack frame on it. + stg %r11, 0(%r15) # Save back chain. + + sg %r2, 0x80(%r11) # The end of stack space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHB0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lgr %r0, %r8 # Static chain. + lmg %r2, %r6, 0x10(%r11) # Parameter registers. + + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stg %r2, 0x10(%r11) # Save return register. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0xa0 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x80(%r11) + brasl %r14, __generic_releasestack + + sg %r2, 0x80(%r11) # Subtract available space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHE0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0xa0 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again.
+ lgr %r15, %r11 + aghi %r15, -0xa0 + + brasl %r14, __morestack_unblock_signals + + lmg %r2, %r15, 0x10(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + lg %r9, 0x8(%r10) # Load stack parameter size. + ltgr %r9, %r9 # Check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sgr %r15, %r9 # Make space on the stack. + la %r8, 0xa0(%r15) # Destination. + la %r12, 0xa0(%r11) # Source. + lgr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lgr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lgr %r3, %r11 # Get the stack pointer. 
+ sgr %r3, %r2 # Subtract available space. + aghi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + stg %r3, 0x38(%r1) # Save the new stack boundary. + + lgr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#endif /* defined(__s390x__) */ + + .cfi_endproc + .size __morestack, . - __morestack + + +# The exception table. This tells the personality routine to execute +# the exception handler. + + .section .gcc_except_table,"a",@progbits + .align 4 +.LLSDA1: + .byte 0xff # @LPStart format (omit) + .byte 0xff # @TType format (omit) + .byte 0x1 # call-site format (uleb128) + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length +.LLSDACSB1: + .uleb128 .LEHB0-.LFB1 # region 0 start + .uleb128 .LEHE0-.LEHB0 # length + .uleb128 .L1-.LFB1 # landing pad + .uleb128 0 # action +.LLSDACSE1: + + + .global __gcc_personality_v0 +#ifdef __PIC__ + # Build a position independent reference to the basic + # personality function. + .hidden DW.ref.__gcc_personality_v0 + .weak DW.ref.__gcc_personality_v0 + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat + .type DW.ref.__gcc_personality_v0, @object +DW.ref.__gcc_personality_v0: +#ifndef __LP64__ + .align 4 + .size DW.ref.__gcc_personality_v0, 4 + .long __gcc_personality_v0 +#else + .align 8 + .size DW.ref.__gcc_personality_v0, 8 + .quad __gcc_personality_v0 +#endif +#endif + + + +# Initialize the stack test value when the program starts or when a +# new thread starts. We don't know how large the main stack is, so we +# guess conservatively. We might be able to use getrlimit here. + + .text + .global __stack_split_initialize + .hidden __stack_split_initialize + + .type __stack_split_initialize, @function + +__stack_split_initialize: + +#ifndef __s390x__ + + ear %r1, %a0 + lr %r0, %r15 + ahi %r0, -0x4000 # We should have at least 16K. 
+ st %r0, 0x20(%r1) + + lr %r2, %r15 + lhi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#else /* defined(__s390x__) */ + + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lgr %r0, %r15 + aghi %r0, -0x4000 # We should have at least 16K. + stg %r0, 0x38(%r1) + + lgr %r2, %r15 + lghi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#endif /* defined(__s390x__) */ + + .size __stack_split_initialize, . - __stack_split_initialize + +# Routines to get and set the guard, for __splitstack_getcontext, +# __splitstack_setcontext, and __splitstack_makecontext. + +# void *__morestack_get_guard (void) returns the current stack guard. + .text + .global __morestack_get_guard + .hidden __morestack_get_guard + + .type __morestack_get_guard,@function + +__morestack_get_guard: + +#ifndef __s390x__ + ear %r1, %a0 + l %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_get_guard, . - __morestack_get_guard + +# void __morestack_set_guard (void *) sets the stack guard. + .global __morestack_set_guard + .hidden __morestack_set_guard + + .type __morestack_set_guard,@function + +__morestack_set_guard: + +#ifndef __s390x__ + ear %r1, %a0 + st %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + stg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_set_guard, . - __morestack_set_guard + +# void *__morestack_make_guard (void *, size_t) returns the stack +# guard value for a stack. + .global __morestack_make_guard + .hidden __morestack_make_guard + + .type __morestack_make_guard,@function + +__morestack_make_guard: + +#ifndef __s390x__ + sr %r2, %r3 + ahi %r2, BACKOFF +#else + sgr %r2, %r3 + aghi %r2, BACKOFF +#endif + br %r14 + + .size __morestack_make_guard, . 
- __morestack_make_guard + +# Make __stack_split_initialize a high priority constructor. + + .section .ctors.65535,"aw",@progbits + +#ifndef __LP64__ + .align 4 + .long __stack_split_initialize + .long __morestack_load_mmap +#else + .align 8 + .quad __stack_split_initialize + .quad __morestack_load_mmap +#endif + + .section .note.GNU-stack,"",@progbits + .section .note.GNU-split-stack,"",@progbits + .section .note.GNU-no-split-stack,"",@progbits diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 new file mode 100644 index 0000000..4c959b0 --- /dev/null +++ b/libgcc/config/s390/t-stack-s390 @@ -0,0 +1,2 @@ +# Makefile fragment to support -fsplit-stack for s390. +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c index 89765d4..b8eec4e 100644 --- a/libgcc/generic-morestack.c +++ b/libgcc/generic-morestack.c @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, #elif defined (__i386__) nsp -= 6 * sizeof (void *); #elif defined __powerpc64__ +#elif defined __s390x__ + nsp -= 2 * 160; +#elif defined __s390__ + nsp -= 2 * 96; #else #error "unrecognized target" #endif -- 2.7.0 ^ permalink raw reply [flat|nested] 55+ messages in thread
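The guard arithmetic in morestack.S above can be sketched in C as follows. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the stack is assumed to grow downward, and the first argument of make_guard is assumed to be the high end of the stack. Only BACKOFF and the subtract/add/compare steps are taken from the assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BACKOFF from morestack.S: slack kept for the ld.so lazy PLT resolver. */
#define BACKOFF 0x1000u

/* Mirrors __morestack_make_guard ("sgr %r2, %r3; aghi %r2, BACKOFF"):
   first argument minus second argument, plus the BACKOFF slack.
   Treating the first argument as the stack's high end is an assumption. */
static uintptr_t make_guard(uintptr_t stack_high_end, uintptr_t size)
{
    assert(size >= BACKOFF);   /* sanity check for this sketch only */
    return stack_high_end - size + BACKOFF;
}

/* Mirrors the test at the top of __morestack: the code computes
   guard + frame_size + args_size and branches to .Lnoalloc when that
   value is lower than or equal to the stack pointer ("jle"), so a new
   segment is needed only when it is strictly above the stack pointer. */
static bool needs_new_segment(uintptr_t sp, uintptr_t guard,
                              uintptr_t frame_size, uintptr_t args_size)
{
    return guard + frame_size + args_size > sp;
}
```

With a high end of 0x100000 and a 16K stack, make_guard yields 0xfd000; a function needing 0x200 bytes of frame with the stack pointer at 0xfd100 would then take the allocation path.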
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-02 15:31 ` Marcin Kościelnicki @ 2016-02-02 18:34 ` Ulrich Weigand 2016-02-02 20:11 ` Marcin Kościelnicki 2016-02-03 0:20 ` Marcin Kościelnicki 0 siblings, 2 replies; 55+ messages in thread From: Ulrich Weigand @ 2016-02-02 18:34 UTC (permalink / raw) To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches, Marcin Kościelnicki Marcin Kościelnicki wrote: > Here we go. I've also removed the "see below", since I don't really > see anything below... The "see below" refers to this code (which I agree isn't really obvious): if (TARGET_TPF_PROFILING) { /* Generate a BAS instruction to serve as a function entry intercept to facilitate the use of tracing algorithms located at the branch target. */ emit_insn (gen_prologue_tpf ()); What is not explicitly called out here is that this tracing function actually refers to some hard registers, in particular r14, and assumes they still have the original contents as at function entry. That is why the prolog code avoids using r14 as temporary if the TPF tracing mechanism is in use. Now I think this doesn't apply to r12, so this part of your patch should still be fine. (In addition, TPF is not going to support split stacks --or indeed the Go language-- anyway, so it doesn't really matter all that much.) I do have two other issues; sorry for bringing those up again although they've been discussed in the past, but I still think we can find some improvements here ... The first is the question Andreas brought up, why we need the extra set of insns introduced by s390_reorg. I think this may really have been necessary for the ESA case where data elements had to be intermixed into code at a specific location. But since we no longer support ESA, we now just have a data block that can be placed anywhere.
For example, we could just have an insn (at any point in the prolog stream) that simply emits the full data block during final output, along the lines of (note: needs to be updated for SImode vs. DImode.): (define_insn "split_stack_data" [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X") (match_operand 1 "bras_sym_operand" "X") (match_operand 2 "consttable_operand" "X") (match_operand 3 "consttable_operand" "X")] UNSPECV_SPLIT_STACK_DATA)] "" { switch_to_section (targetm.asm_out.function_rodata_section (current_function_decl)); output_asm_insn (\".align 3\", operands); (*targetm.asm_out.internal_label) (asm_out_file, \"L\", CODE_LABEL_NUMBER (operands[0])); output_asm_insn (\".quad %2\", operands); output_asm_insn (\".quad %3\", operands); output_asm_insn (\".quad %1-%0\", operands); switch_to_section (current_function_section ()); return ""; } [(set_attr "length" "0")]) Or possibly even cleaner, we can simply define the data block at the tree level as if it were an initialized global variable of a certain struct type, and just leave it to common code to emit it as usual. Then we just have the code bits, but I don't really see much difference between the split_stack_call and split_stack_sibcall patterns (apart from the data block), so if code flow is OK with the former insns, it should be OK with the latter too .. [ Or else, if there *are* code flow issues, the other alternative would be to emit the full call sequence, code and data, from a single insn pattern during final output. This might have the extra benefit that the assembler sequence is fully fixed, and thus easier to detect in the linker. ] Getting rid of the extra transformation in s390_reorg would not just remove a bunch of code from the back-end (always good!), it would also speed up compile time a bit. The second issue I'm still not sure about is the magic nop marker for frameless functions.
In an earlier mail you wrote: > Both currently supported > architectures always emit split-stack code on every function. At least for rs6000 this doesn't appear to be true; in rs6000_expand_split_stack_prologue we have: if (!info->push_p) return; so it does nothing for frameless routines. Now on i386 we do indeed generate code for frameless routines; in fact, the *same* full stack check is generated as for any other routine. Now I'm wondering: is there a reason why this check would be necessary (and there's simply a bug in the rs6000 implementation)? Then we obviously should do the same on s390. On the other hand, if rs6000 works fine *without* any code in frameless routines, why wouldn't that work for s390 too? Emitting a nop (that is always executed) still looks weird to me. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain Ulrich.Weigand@de.ibm.com ^ permalink raw reply [flat|nested] 55+ messages in thread
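The data block emitted by the split_stack_data sketch above, together with the way the 64-bit __morestack reads it (frame size at offset 0, stack parameter size at offset 8, and "ag %r10, 0x10(%r10)" to form the resume address), can be pictured in C roughly as follows. The struct, field, and function names are hypothetical and chosen only for illustration; the three-quad layout and the "block address plus stored offset" computation are what the thread describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 64-bit parameter block. */
struct split_stack_params {
    uint64_t frame_size;   /* .quad %2 */
    uint64_t args_size;    /* .quad %3 */
    int64_t  resume_off;   /* .quad %1-%0: resume label minus block label */
};

/* The offsets __morestack relies on: 0, 8 and 0x10. */
static_assert(offsetof(struct split_stack_params, frame_size) == 0,
              "frame size is read from offset 0");
static_assert(offsetof(struct split_stack_params, args_size) == 8,
              "argument size is read from offset 8");
static_assert(offsetof(struct split_stack_params, resume_off) == 0x10,
              "resume offset is read from offset 0x10");

/* Mirrors "ag %r10, 0x10(%r10)": the stored offset is relative to the
   block itself, so the resume address is the block address plus it. */
static uint64_t resume_address(uint64_t block_addr,
                               const struct split_stack_params *p)
{
    return block_addr + (uint64_t) p->resume_off;
}
```

Storing a block-relative offset rather than an absolute address keeps the block position independent, which is why a negative offset (block placed after the code) works just as well as a positive one.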
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-02 18:34 ` Ulrich Weigand @ 2016-02-02 20:11 ` Marcin Kościelnicki 2016-02-03 18:40 ` Marcin Kościelnicki 2016-02-03 0:20 ` Marcin Kościelnicki 1 sibling, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-02 20:11 UTC (permalink / raw)
To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 02/02/16 19:33, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>
>> Here we go. I've also removed the "see below", since I don't really
>> see anything below...
>
> The "see below" refers to this code (which I agree isn't really obvious):
>
>   if (TARGET_TPF_PROFILING)
>     {
>       /* Generate a BAS instruction to serve as a function
>          entry intercept to facilitate the use of tracing
>          algorithms located at the branch target. */
>       emit_insn (gen_prologue_tpf ());
>
> What is not explicitly called out here is that this tracing function
> actually refers to some hard registers, in particular r14, and assumes
> they still have the original contents as at function entry.
>
> That is why the prolog code avoids using r14 as temporary if the TPF
> tracing mechanism is in use. Now I think this doesn't apply to r12,
> so this part of your patch should still be fine. (In addition, TPF
> is not going to support split stacks --or indeed the Go language--
> anyway, so it doesn't really matter all that much.)

Very well, I'll improve the comment.

>
>
> I do have two other issues; sorry for bringing those up again although
> they've been discussed in the past, but I still think we can find
> some improvements here ...
>
> The first is the question Andreas brought up, why we need the extra
> set of insns introduced by s390_reorg. I think this may really have
> been necessary for the ESA case where data elements had to be intermixed
> into code at a specific location. But since we no longer support ESA,
> we now just have a data block that can be placed anywhere. For example,
> we could just have an insn (at any point in the prolog stream) that
> simply emits the full data block during final output, along the lines of
> (note: needs to be updated for SImode vs. DImode.):
>
> (define_insn "split_stack_data"
>   [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
>                      (match_operand 1 "bras_sym_operand" "X")
>                      (match_operand 2 "consttable_operand" "X")
>                      (match_operand 3 "consttable_operand" "X")]
>                     UNSPECV_SPLIT_STACK_DATA)]
>   ""
> {
>   switch_to_section (targetm.asm_out.function_rodata_section
>                      (current_function_decl));
>
>   output_asm_insn (\".align 3\", operands);
>   (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
>                                      CODE_LABEL_NUMBER (operands[0]));
>   output_asm_insn (\".quad %2\", operands);
>   output_asm_insn (\".quad %3\", operands);
>   output_asm_insn (\".quad %1-%0\", operands);
>
>   switch_to_section (current_function_section ());
>   return "";
> }
>   [(set_attr "length" "0")])
>
> Or possibly even cleaner, we can simply define the data block at the
> tree level as if it were an initialized global variable of a certain
> struct type, and just leave it to common code to emit it as usual.
>
> Then we just have the code bits, but I don't really see much
> difference between the split_stack_call and split_stack_sibcall
> patterns (apart from the data block), so if code flow is OK with
> the former insns, it should be OK with the latter too ..
>
> [ Or else, if there *are* code flow issues, the other alternative
> would be to emit the full call sequence, code and data, from a
> single insn pattern during final output. This might have the extra
> benefit that the assembler sequence is fully fixed, and thus easier
> to detect in the linker. ]
>
> Getting rid of the extra transformation in s390_reorg would not
> just remove a bunch of code from the back-end (always good!),
> it would also speed up compile time a bit.

When I wasn't using reorg, I had problems with gcc deleting the label in
.rodata, since it wasn't used by any jump instruction. I guess having a
whole-block instruction that emits the label on its own should solve the
issue, though - let's try that.

>
>
> The second issue I'm still not sure about is the magic nop marker
> for frameless functions. In an earlier mail you wrote:
>
>> Both currently supported
>> architectures always emit split-stack code on every function.
>
> At least for rs6000 this doesn't appear to be true; in
> rs6000_expand_split_stack_prologue we have:
>
>   if (!info->push_p)
>     return;
>
> so it does nothing for frameless routines.
>
> Now on i386 we do indeed generate code for frameless routines;
> in fact, the *same* full stack check is generated as for any
> other routine. Now I'm wondering: is there a reason why
> this check would be necessary (and there's simply a bug in
> the rs6000 implementation)? Then we obviously should do the
> same on s390.

Try that on powerpc64(le):

$ cat a.c
#include <stdio.h>

void f(void) {
}

typedef void (*fptr)(void);

fptr g(void);

int main() {
    printf("%p\n", g());
}

$ cat b.c
void f(void);

typedef void (*fptr)(void);

fptr g(void) {
    return f;
}

$ gcc -O3 -fsplit-stack -c b.c
$ gcc -O3 -c a.c
$ gcc a.o b.o -fuse-ld=gold

I don't have a recent enough gcc for powerpc, but from what I've seen in
the code, this should explode with a linker error.

Of course, mixing split-stack and non-split-stack code when function
pointers are involved is sketchy anyway, so what's one more bug...

That said, for s390, we can avoid the above problem by checking the
relocation in gold now that ESA paths are gone - for direct function
calls (the only ones we care about), we should be seeing a relocation in
brasl. So I'll remove the nopmark thing and add proper recognition in
gold.

>
> On the other hand, if rs6000 works fine *without* any code
> in frameless routines, why wouldn't that work for s390 too?
>
> Emitting a nop (that is always executed) still looks weird to me.
>
>
> Bye,
> Ulrich
>

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-02 20:11 ` Marcin Kościelnicki @ 2016-02-03 18:40 ` Marcin Kościelnicki 2016-02-04 15:06 ` Ulrich Weigand 0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-03 18:40 UTC (permalink / raw)
To: Ulrich Weigand; +Cc: krebbel, gcc-patches

>>
>> The second issue I'm still not sure about is the magic nop marker
>> for frameless functions. In an earlier mail you wrote:
>>
>>> Both currently supported
>>> architectures always emit split-stack code on every function.
>>
>> At least for rs6000 this doesn't appear to be true; in
>> rs6000_expand_split_stack_prologue we have:
>>
>>   if (!info->push_p)
>>     return;
>>
>> so it does nothing for frameless routines.
>>
>> Now on i386 we do indeed generate code for frameless routines;
>> in fact, the *same* full stack check is generated as for any
>> other routine. Now I'm wondering: is there a reason why
>> this check would be necessary (and there's simply a bug in
>> the rs6000 implementation)? Then we obviously should do the
>> same on s390.
>
> Try that on powerpc64(le):
>
> $ cat a.c
> #include <stdio.h>
>
> void f(void) {
> }
>
> typedef void (*fptr)(void);
>
> fptr g(void);
>
> int main() {
>     printf("%p\n", g());
> }
>
> $ cat b.c
> void f(void);
>
> typedef void (*fptr)(void);
>
> fptr g(void) {
>     return f;
> }
>
> $ gcc -O3 -fsplit-stack -c b.c
> $ gcc -O3 -c a.c
> $ gcc a.o b.o -fuse-ld=gold
>
> I don't have a recent enough gcc for powerpc, but from what I've seen in
> the code, this should explode with a linker error.
>
> Of course, mixing split-stack and non-split-stack code when function
> pointers are involved is sketchy anyway, so what's one more bug...
>
> That said, for s390, we can avoid the above problem by checking the
> relocation in gold now that ESA paths are gone - for direct function
> calls (the only ones we care about), we should be seeing a relocation in
> brasl. So I'll remove the nopmark thing and add proper recognition in
> gold.

Ugh. I take that back. For -fPIC, the load-address sequence is:

    larl %r1,f@GOTENT
    lg %r2,0(%r1)
    br %r14

And (sibling) call sequence is:

    larl %r1,f@GOTENT
    lg %r1,0(%r1)
    br %r1

It seems there's no proper way to recognize a call vs a load address -
so we can either go with emitting the marker, or have the same problem
as on ppc.

So - how much should we care?

>
>>
>> On the other hand, if rs6000 works fine *without* any code
>> in frameless routines, why wouldn't that work for s390 too?
>>
>> Emitting a nop (that is always executed) still looks weird to me.
>>
>>
>> Bye,
>> Ulrich
>>
>

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-03 18:40 ` Marcin Kościelnicki @ 2016-02-04 15:06 ` Ulrich Weigand 2016-02-04 15:20 ` Marcin Kościelnicki 0 siblings, 1 reply; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-04 15:06 UTC (permalink / raw)
To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches

Marcin Kościelnicki wrote:

> Ugh. I take that back. For -fPIC, the load-address sequence is:
>
>     larl %r1,f@GOTENT
>     lg %r2,0(%r1)
>     br %r14

This is correct.

> And (sibling) call sequence is:
>
>     larl %r1,f@GOTENT
>     lg %r1,0(%r1)
>     br %r1

Oops. That is actually a GCC bug. The sibcall sequence really must be:

    jg f@PLT

This is a real bug since it forces non-lazy symbol resolution for f
just because the compiler chose to use a sibcall optimization;
that's not supposed to happen.

It seems this bug was accidentally introduced here:

2010-04-20 Andreas Krebbel <Andreas.Krebbel@de.ibm.com>

    PR target/43635
    * config/s390/s390.c (s390_emit_call): Turn direct into indirect
    calls for -fpic -m31 if they have been sibcall optimized.

since the patch doesn't check for TARGET_64BIT ...

Andreas, can you have a look?

> It seems there's no proper way to recognize a call vs a load address -
> so we can either go with emitting the marker, or have the same problem
> as on ppc.
>
> So - how much should we care?

I think we should fix that bug. That won't help for existing objects,
but those don't use split stack either, so that shouldn't matter.

If we fix that bug before (or at the same time as) adding split-stack
support, the linker will still be able to distinguish function pointer
loads from calls (including sibcalls) on all objects using split stack.

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
Ulrich.Weigand@de.ibm.com

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-04 15:06 ` Ulrich Weigand @ 2016-02-04 15:20 ` Marcin Kościelnicki 2016-02-04 16:27 ` Ulrich Weigand 0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-04 15:20 UTC (permalink / raw)
To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 04/02/16 16:06, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>
>> Ugh. I take that back. For -fPIC, the load-address sequence is:
>>
>>     larl %r1,f@GOTENT
>>     lg %r2,0(%r1)
>>     br %r14
>
> This is correct.
>
>> And (sibling) call sequence is:
>>
>>     larl %r1,f@GOTENT
>>     lg %r1,0(%r1)
>>     br %r1
>
> Oops. That is actually a GCC bug. The sibcall sequence really must be:
>
>     jg f@PLT
>
> This is a real bug since it forces non-lazy symbol resolution for f
> just because the compiler chose to use a sibcall optimization;
> that's not supposed to happen.
>
> It seems this bug was accidentally introduced here:
>
> 2010-04-20 Andreas Krebbel <Andreas.Krebbel@de.ibm.com>
>
>     PR target/43635
>     * config/s390/s390.c (s390_emit_call): Turn direct into indirect
>     calls for -fpic -m31 if they have been sibcall optimized.
>
> since the patch doesn't check for TARGET_64BIT ...
>
> Andreas, can you have a look?
>
>> It seems there's no proper way to recognize a call vs a load address -
>> so we can either go with emitting the marker, or have the same problem
>> as on ppc.
>>
>> So - how much should we care?
>
> I think we should fix that bug. That won't help for existing objects,
> but those don't use split stack either, so that shouldn't matter.
>
> If we fix that bug before (or at the same time as) adding split-stack
> support, the linker will still be able to distinguish function pointer
> loads from calls (including sibcalls) on all objects using split stack.
>
> Bye,
> Ulrich
>

Fair enough. Here's what I'm going to implement in gold:

- any PLT relocation: call
- PC32DBL on a larl: non-call
- PC32DBL otherwise: call
- any other relocation: non-call

Does that sound right?

Marcin Kościelnicki

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-04 15:20 ` Marcin Kościelnicki @ 2016-02-04 16:27 ` Ulrich Weigand 2016-02-05 21:13 ` Marcin Kościelnicki 0 siblings, 1 reply; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-04 16:27 UTC (permalink / raw)
To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches

Marcin Kościelnicki wrote:

> Fair enough. Here's what I'm going to implement in gold:
>
> - any PLT relocation: call
> - PC32DBL on a larl: non-call
> - PC32DBL otherwise: call
> - any other relocation: non-call
>
> Does that sound right?

Hmm, I'm wondering about the PC32DBL choices. There are now
a large number of other non-call instructions that use PC32DBL,
including lrl, strl, crl, cgrl, cgfrl, ...

However, these all access *data* at the pointed-to location,
so it is quite unlikely they would ever be used with a
function symbol. So, assuming that you also check that the
target of the relocation is a function symbol, treating only
larl as non-call might be OK.

Maybe a more conservative approach might be to make the decision
the other way round: for PC32DBL check for *branch* instructions,
and treat only those as calls. There are just a few branch
instructions using PC32DBL:

    brasl (call)
    brcl (conditional or unconditional sibcall)
    brcth (???)

where the last one is extremely unlikely (but theoretically
possible) to be used as conditional sibcall combined with
a register decrement; I don't think this can ever happen
with current compilers however.

For full completeness, there are also PC16DBL relocations that
*could* target called functions, but only when compiling with
the -msmall-exec flag to assume total executable size is less
than 64 KB. These are used by the following instructions:

    bras
    brc
    brct
    brctg
    brxh
    brxhg
    brxle
    brxlg
    crj
    cgrj
    clrj
    clgrj
    cij
    cgij
    clij
    clgij

Note that those are *all* branch instructions, so it might
make sense to add any PC16DBL targeting a function symbol
to the list of calls, just in case. (But since basically
nobody ever uses -msmall-exec, it doesn't really matter
much either.)

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
Ulrich.Weigand@de.ibm.com

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-04 16:27 ` Ulrich Weigand @ 2016-02-05 21:13 ` Marcin Kościelnicki 2016-02-05 22:02 ` Ulrich Weigand 0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-05 21:13 UTC (permalink / raw)
To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 04/02/16 17:27, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>
>> Fair enough. Here's what I'm going to implement in gold:
>>
>> - any PLT relocation: call
>> - PC32DBL on a larl: non-call
>> - PC32DBL otherwise: call
>> - any other relocation: non-call
>>
>> Does that sound right?
>
> Hmm, I'm wondering about the PC32DBL choices. There are now
> a large number of other non-call instructions that use PC32DBL,
> including lrl, strl, crl, cgrl, cgfrl, ...
>
> However, these all access *data* at the pointed-to location,
> so it is quite unlikely they would ever be used with a
> function symbol. So, assuming that you also check that the
> target of the relocation is a function symbol, treating only
> larl as non-call might be OK.

Yeah, I make sure the symbol is a STT_FUNC.

>
> Maybe a more conservative approach might be to make the decision
> the other way round: for PC32DBL check for *branch* instructions,
> and treat only those as calls. There are just a few branch
> instructions using PC32DBL:
>
>     brasl (call)
>     brcl (conditional or unconditional sibcall)
>     brcth (???)
>
> where the last one is extremely unlikely (but theoretically
> possible) to be used as conditional sibcall combined with
> a register decrement; I don't think this can ever happen
> with current compilers however.

I'll stay with checking for larl - while I can imagine someone adding a
new conditional branch instruction, I don't see a need for another
larl-like instruction. Besides, this way the failure mode for an
unknown instruction would be producing an error, instead of silently
emitting code with unfixed prologue.

>
> For full completeness, there are also PC16DBL relocations that
> *could* target called functions, but only when compiling with
> the -msmall-exec flag to assume total executable size is less
> than 64 KB. These are used by the following instructions:
>
>     bras
>     brc
>     brct
>     brctg
>     brxh
>     brxhg
>     brxle
>     brxlg
>     crj
>     cgrj
>     clrj
>     clgrj
>     cij
>     cgij
>     clij
>     clgij
>
> Note that those are *all* branch instructions, so it might
> make sense to add any PC16DBL targeting a function symbol
> to the list of calls, just in case. (But since basically
> nobody ever uses -msmall-exec, it doesn't really matter
> much either.)

Ah right, I've added PC16DBL to the "always call" list.

>
> Bye,
> Ulrich
>

I've updated and resubmitted the gold patch.

Marcin Kościelnicki

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-05 21:13 ` Marcin Kościelnicki @ 2016-02-05 22:02 ` Ulrich Weigand 0 siblings, 0 replies; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-05 22:02 UTC (permalink / raw)
To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches

Marcin Kościelnicki wrote:

> I'll stay with checking for larl - while I can imagine someone adding a
> new conditional branch instruction, I don't see a need for another
> larl-like instruction. Besides, this way the failure mode for an
> unknown instruction would be producing an error, instead of silently
> emitting code with unfixed prologue.

OK, fine with me. B.t.w. Andreas has checked in the sibcall fix,
so you no longer should be seeing larl used for sibcalls.

> I've updated and resubmitted the gold patch.

Thanks!

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
Ulrich.Weigand@de.ibm.com

^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH] s390: Add -fsplit-stack support 2016-02-02 18:34 ` Ulrich Weigand 2016-02-02 20:11 ` Marcin Kościelnicki @ 2016-02-03 0:20 ` Marcin Kościelnicki 2016-02-03 17:03 ` Ulrich Weigand 1 sibling, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-02-03 0:20 UTC (permalink / raw) To: uweigand; +Cc: krebbel, gcc-patches, Marcin Kościelnicki libgcc/ChangeLog: * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. * config/s390/morestack.S: New file. * config/s390/t-stack-s390: New file. * generic-morestack.c (__splitstack_find): Add s390-specific code. gcc/ChangeLog: * common/config/s390/s390-common.c (s390_supports_split_stack): New function. (TARGET_SUPPORTS_SPLIT_STACK): New macro. * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. * config/s390/s390.c (struct machine_function): New field split_stack_varargs_pointer. (s390_register_info): Mark r12 as clobbered if it'll be used as temp in s390_emit_prologue. (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack vararg pointer. (morestack_ref): New global. (SPLIT_STACK_AVAILABLE): New macro. (s390_expand_split_stack_prologue): New function. (s390_live_on_entry): New function. (s390_va_start): Use split-stack vararg pointer if appropriate. (s390_asm_file_end): Emit the split-stack note sections. (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. (UNSPECV_SPLIT_STACK_CALL): New unspec. (UNSPECV_SPLIT_STACK_DATA): New unspec. (split_stack_prologue): New expand. (split_stack_space_check): New expand. (split_stack_data): New insn. (split_stack_call): New expand. (split_stack_call_*): New insn. (split_stack_cond_call): New expand. (split_stack_cond_call_*): New insn. --- Comment fixed, split_stack_marker gone, reorg gone. Generated code seems sane, but testsuite still running. 
I will need to modify the gold patch to handle the "leaf function taking non-split stack function address" issue - this will likely require messing with the target independent plumbing, the hook for that doesn't seem to get enough params. gcc/ChangeLog | 30 ++ gcc/common/config/s390/s390-common.c | 14 + gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c | 214 +++++++++++- gcc/config/s390/s390.md | 138 ++++++++ libgcc/ChangeLog | 7 + libgcc/config.host | 4 +- libgcc/config/s390/morestack.S | 609 +++++++++++++++++++++++++++++++++++ libgcc/config/s390/t-stack-s390 | 2 + libgcc/generic-morestack.c | 4 + 10 files changed, 1016 insertions(+), 7 deletions(-) create mode 100644 libgcc/config/s390/morestack.S create mode 100644 libgcc/config/s390/t-stack-s390 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9a2cec8..568dff4 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,33 @@ +2016-02-02 Marcin KoÅcielnicki <koriakin@0x04.net> + + * common/config/s390/s390-common.c (s390_supports_split_stack): + New function. + (TARGET_SUPPORTS_SPLIT_STACK): New macro. + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. + * config/s390/s390.c (struct machine_function): New field + split_stack_varargs_pointer. + (s390_register_info): Mark r12 as clobbered if it'll be used as temp + in s390_emit_prologue. + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack + vararg pointer. + (morestack_ref): New global. + (SPLIT_STACK_AVAILABLE): New macro. + (s390_expand_split_stack_prologue): New function. + (s390_live_on_entry): New function. + (s390_va_start): Use split-stack vararg pointer if appropriate. + (s390_asm_file_end): Emit the split-stack note sections. + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. + * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. + (UNSPECV_SPLIT_STACK_CALL): New unspec. + (UNSPECV_SPLIT_STACK_DATA): New unspec. + (split_stack_prologue): New expand. + (split_stack_space_check): New expand. 
+ (split_stack_data): New insn. + (split_stack_call): New expand. + (split_stack_call_*): New insn. + (split_stack_cond_call): New expand. + (split_stack_cond_call_*): New insn. + 2016-02-02 Thomas Schwinge <thomas@codesourcery.com> * omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove. diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 4519c21..1e497e6 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, } } +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. + We don't verify it, since earlier versions just have padding at + its place, which works just as well. */ + +static bool +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, + struct gcc_options *opts ATTRIBUTE_UNUSED) +{ + return true; +} + #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, #undef TARGET_OPTION_INIT_STRUCT #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct +#undef TARGET_SUPPORTS_SPLIT_STACK +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack + struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 633bc1e..09032c9 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); extern void s390_emit_prologue (void); extern void s390_emit_epilogue (bool); +extern void s390_expand_split_stack_prologue (void); extern bool s390_can_use_simple_return_insn (void); extern bool s390_can_use_return_insn (void); extern void s390_function_profiler (FILE *, int); diff --git a/gcc/config/s390/s390.c 
b/gcc/config/s390/s390.c index 3be64de..aafb442 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -426,6 +426,13 @@ struct GTY(()) machine_function /* True if the current function may contain a tbegin clobbering FPRs. */ bool tbegin_p; + + /* For -fsplit-stack support: A stack local which holds a pointer to + the stack arguments for a function with a variable number of + arguments. This is set at the start of the function and is used + to initialize the overflow_arg_area field of the va_list + structure. */ + rtx split_stack_varargs_pointer; }; /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ @@ -9316,9 +9323,13 @@ s390_register_info () cfun_frame_layout.high_fprs++; } - if (flag_pic) - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); + /* Register 12 is used for GOT address, but also as temp in prologue + for split-stack stdarg functions (unless r14 is available). */ + clobbered_regs[12] + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) + || (flag_split_stack && cfun->stdarg + && (crtl->is_leaf || TARGET_TPF_PROFILING + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); clobbered_regs[BASE_REGNUM] |= (cfun->machine->base_reg @@ -10440,12 +10451,15 @@ s390_emit_prologue (void) int next_fpr = 0; /* Choose best register to use for temp use within prologue. - See below for why TPF must use the register 1. */ + TPF with profiling must avoid the register 14 - the tracing function + needs the original contents of r14 to be preserved. */ if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM) && !crtl->is_leaf && !TARGET_TPF_PROFILING) temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); + else if (flag_split_stack && cfun->stdarg) + temp_reg = gen_rtx_REG (Pmode, 12); else temp_reg = gen_rtx_REG (Pmode, 1); @@ -10939,6 +10953,166 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); } +/* -fsplit-stack support. 
*/ + +/* A SYMBOL_REF for __morestack. */ +static GTY(()) rtx morestack_ref; + +/* When using -fsplit-stack, the allocation routines set a field in + the TCB to the bottom of the stack plus this much space, measured + in bytes. */ + +#define SPLIT_STACK_AVAILABLE 1024 + +/* Emit -fsplit-stack prologue, which goes before the regular function + prologue. */ + +void +s390_expand_split_stack_prologue (void) +{ + rtx r1, guard, cc = NULL; + rtx_insn *insn; + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + /* Pointer size in bytes. */ + /* Frame size and argument size - the two parameters to __morestack. */ + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; + /* Align argument size to 8 bytes - simplifies __morestack code. */ + HOST_WIDE_INT args_size = crtl->args.size >= 0 + ? ((crtl->args.size + 7) & ~7) + : 0; + /* Label to be called by __morestack. */ + rtx_code_label *call_done = NULL; + rtx_code_label *parm_base = NULL; + rtx tmp; + + gcc_assert (flag_split_stack && reload_completed); + if (!TARGET_CPU_ZARCH) + { + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); + return; + } + + r1 = gen_rtx_REG (Pmode, 1); + + /* If no stack frame will be allocated, don't do anything. */ + if (!frame_size) + { + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, just use r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + + } + return; + } + + if (morestack_ref == NULL_RTX) + { + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL + | SYMBOL_FLAG_FUNCTION); + } + + if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size)) + { + /* If frame_size will fit in an add instruction, do a stack space + check, and only call __morestack if there's not enough space. */ + + /* Get thread pointer. 
r1 is the only register we can always destroy - r0 + could contain a static chain (and cannot be used to address memory + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); + /* Aim at __private_ss. */ + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); + + /* If less that 1kiB used, skip addition and compare directly with + __private_ss. */ + if (frame_size > SPLIT_STACK_AVAILABLE) + { + emit_move_insn (r1, guard); + if (TARGET_64BIT) + emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size))); + else + emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size))); + guard = r1; + } + + /* Compare the (maybe adjusted) guard with the stack pointer. */ + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); + } + + call_done = gen_label_rtx (); + parm_base = gen_label_rtx (); + + /* Emit the parameter block. */ + tmp = gen_split_stack_data (parm_base, call_done, + GEN_INT (frame_size), + GEN_INT (args_size)); + insn = emit_insn (tmp); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parm_base); + LABEL_NUSES (parm_base)++; + + /* %r1 = litbase. */ + insn = emit_insn (gen_main_base_64 (r1, parm_base)); + add_reg_note (insn, REG_LABEL_OPERAND, parm_base); + LABEL_NUSES (parm_base)++; + + /* Now, we need to call __morestack. It has very special calling + conventions: it preserves param/return/static chain registers for + calling main function body, and looks for its own parameters at %r1. */ + + if (cc != NULL) + { + tmp = gen_split_stack_cond_call (morestack_ref, cc, call_done); + + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* Mark the jump as very unlikely to be taken. 
*/ + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); + + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, and __morestack was not called, just use + r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + } + else + { + tmp = gen_split_stack_call (morestack_ref, call_done); + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + emit_barrier (); + } + + /* __morestack will call us here. */ + + emit_label (call_done); +} + +/* We may have to tell the dataflow pass that the split stack prologue + is initializing a register. */ + +static void +s390_live_on_entry (bitmap regs) +{ + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + gcc_assert (flag_split_stack); + bitmap_set_bit (regs, 1); + } +} + /* Return true if the function can use simple_return to return outside of a shrink-wrapped region. At present shrink-wrapping is supported in all cases. */ @@ -11541,6 +11715,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } + if (flag_split_stack + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) + == NULL) + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) + { + rtx reg; + rtx_insn *seq; + + reg = gen_reg_rtx (Pmode); + cfun->machine->split_stack_varargs_pointer = reg; + + start_sequence (); + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); + seq = get_insns (); + end_sequence (); + + push_topmost_sequence (); + emit_insn_after (seq, entry_of_function ()); + pop_topmost_sequence (); + } + /* Find the overflow area. FIXME: This currently is too pessimistic when the vector ABI is enabled. 
In that case we *always* set up the overflow area @@ -11549,7 +11744,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG || TARGET_VX_ABI) { - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) + t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + else + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); off = INTVAL (crtl->args.arg_offset_rtx); off = off < 0 ? 0 : off; @@ -14469,6 +14667,9 @@ s390_asm_file_end (void) s390_vector_abi); #endif file_end_indicate_exec_stack (); + + if (flag_split_stack) + file_end_indicate_split_stack (); } /* Return true if TYPE is a vector bool type. */ @@ -14724,6 +14925,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry + #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ s390_use_by_pieces_infrastructure_p diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 9b869d5..cc120b1 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -114,6 +114,9 @@ UNSPEC_SP_SET UNSPEC_SP_TEST + ; Split stack support + UNSPEC_STACK_CHECK + ; Test Data Class (TDC) UNSPEC_TDC_INSN @@ -276,6 +279,10 @@ ; Set and get floating point control register UNSPECV_SFPC UNSPECV_EFPC + + ; Split stack support + UNSPECV_SPLIT_STACK_CALL + UNSPECV_SPLIT_STACK_DATA ]) ;; @@ -10907,3 +10914,134 @@ "TARGET_Z13" "lcbb\t%0,%1,%b2" [(set_attr "op_type" "VRX")]) + +; Handle -fsplit-stack. + +(define_expand "split_stack_prologue" + [(const_int 0)] + "" +{ + s390_expand_split_stack_prologue (); + DONE; +}) + +;; If there are operand 0 bytes available on the stack, jump to +;; operand 1. 
+ +(define_expand "split_stack_space_check" + [(set (pc) (if_then_else + (ltu (minus (reg 15) + (match_operand 0 "register_operand")) + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) + (label_ref (match_operand 1)) + (pc)))] + "" +{ + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + rtx tp = s390_get_thread_pointer (); + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); + rtx reg = gen_reg_rtx (Pmode); + rtx cc; + if (TARGET_64BIT) + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); + else + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); + cc = s390_emit_compare (GT, reg, guard); + s390_emit_jump (operands[1], cc); + + DONE; +}) + +;; __morestack parameter block for split stack prologue. Parameters are: +;; parameter block label, label to be called by __morestack, frame size, +;; stack parameter size. + +(define_insn "split_stack_data" + [(unspec_volatile [(match_operand 0 "" "X") + (match_operand 1 "" "X") + (match_operand 2 "consttable_operand" "X") + (match_operand 3 "consttable_operand" "X")] + UNSPECV_SPLIT_STACK_DATA)] + "TARGET_CPU_ZARCH" +{ + switch_to_section (targetm.asm_out.function_rodata_section + (current_function_decl)); + + if (TARGET_64BIT) + output_asm_insn (".align\t8", operands); + else + output_asm_insn (".align\t4", operands); + (*targetm.asm_out.internal_label) (asm_out_file, "L", + CODE_LABEL_NUMBER (operands[0])); + if (TARGET_64BIT) + { + output_asm_insn (".quad\t%2", operands); + output_asm_insn (".quad\t%3", operands); + output_asm_insn (".quad\t%1-%0", operands); + } + else + { + output_asm_insn (".long\t%2", operands); + output_asm_insn (".long\t%3", operands); + output_asm_insn (".long\t%1-%0", operands); + } + + switch_to_section (current_function_section ()); + return ""; +} + [(set_attr "length" "0")]) + + +;; A jg with minimal fuss for use in split stack prologue. 
+ +(define_expand "split_stack_call" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_call_di (operands[0], operands[1])); + else + emit_jump_insn (gen_split_stack_call_si (operands[0], operands[1])); + DONE; +}) + +(define_insn "split_stack_call_<mode>" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X") + (reg:P 1)] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" + "jg\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; Also a conditional one. + +(define_expand "split_stack_cond_call" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "") + (match_operand 2 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_cond_call_di (operands[0], operands[1], operands[2])); + else + emit_jump_insn (gen_split_stack_cond_call_si (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "split_stack_cond_call_<mode>" + [(set (pc) + (if_then_else + (match_operand 1 "" "") + (label_ref (match_operand 2 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X") + (reg:P 1)] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" + "jg%C1\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 49c7929..3900ab1 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2016-02-02 Marcin KoÅcielnicki <koriakin@0x04.net> + + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. + * config/s390/morestack.S: New file. + * config/s390/t-stack-s390: New file. + * generic-morestack.c (__splitstack_find): Add s390-specific code. 
+
 2016-01-25  Jakub Jelinek  <jakub@redhat.com>

 	PR target/69444
diff --git a/libgcc/config.host b/libgcc/config.host
index d8efd82..2be5f7e 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1114,11 +1114,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..141dead
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,609 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.
If not, see +# <http://www.gnu.org/licenses/>. + +# Excess space needed to call ld.so resolver for lazy plt +# resolution. Go uses sigaltstack so this doesn't need to +# also cover signal frame size. +#define BACKOFF 0x1000 + +# The __morestack function. + + .global __morestack + .hidden __morestack + + .type __morestack,@function + +__morestack: +.LFB1: + .cfi_startproc + + +#ifndef __s390x__ + + +# The 31-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0,__gcc_personality_v0 + .cfi_lsda 0,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x48 + .cfi_offset %r7, -0x44 + .cfi_offset %r8, -0x40 + .cfi_offset %r9, -0x3c + .cfi_offset %r10, -0x38 + .cfi_offset %r11, -0x34 + .cfi_offset %r12, -0x30 + .cfi_offset %r13, -0x2c + .cfi_offset %r14, -0x28 + .cfi_offset %r15, -0x24 + lr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + ahi %r15, -0x60 # 0x60 for standard frame. + st %r11, 0(%r15) # Save back chain. + lr %r8, %r0 # Save %r0 (static chain). + lr %r10, %r1 # Save %r1 (address of parameter block). + + l %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 # Extract thread pointer. + l %r1, 0x20(%r1) # Get stack bounduary + ar %r1, %r7 # Stack bounduary + frame size + a %r1, 4(%r10) # + stack param size + clr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + ahi %r7, BACKOFF # Bump requested size a bit. + st %r7, 0x40(%r11) # Stuff frame size on stack. + la %r2, 0x40(%r11) # Pass its address as parameter. 
+ la %r3, 0x60(%r11) # Caller's stack parameters. + l %r4, 4(%r10) # Size of stack parameters. + brasl %r14, __generic_morestack + + lr %r15, %r2 # Switch to the new stack. + ahi %r15, -0x60 # Make a stack frame on it. + st %r11, 0(%r15) # Save back chain. + + s %r2, 0x40(%r11) # The end of stack space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHB0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lr %r0, %r8 # Static chain. + lm %r2, %r6, 0x8(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stm %r2, %r3, 0x8(%r11) # Save return registers. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0x60 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x40(%r11) + brasl %r14, __generic_releasestack + + s %r2, 0x40(%r11) # Subtract available space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHE0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0x60 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. 
+ lr %r15, %r11 + ahi %r15, -0x60 + + brasl %r14, __morestack_unblock_signals + + lm %r2, %r15, 0x8(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + l %r9, 0x4(%r10) # Load stack parameter size. + ltr %r9, %r9 # And check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sr %r15, %r9 # Make space on the stack. + la %r8, 0x60(%r15) # Destination. + la %r12, 0x60(%r11) # Source. + lr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lr %r3, %r11 # Get the stack pointer. + sr %r3, %r2 # Subtract available space. 
+ ahi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. + st %r3, 0x20(%r1) # Save the new stack boundary. + + lr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#else /* defined(__s390x__) */ + + +# The 64-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0x3,__gcc_personality_v0 + .cfi_lsda 0x3,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x70 + .cfi_offset %r7, -0x68 + .cfi_offset %r8, -0x60 + .cfi_offset %r9, -0x58 + .cfi_offset %r10, -0x50 + .cfi_offset %r11, -0x48 + .cfi_offset %r12, -0x40 + .cfi_offset %r13, -0x38 + .cfi_offset %r14, -0x30 + .cfi_offset %r15, -0x28 + lgr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + aghi %r15, -0xa0 # 0xa0 for standard frame. + stg %r11, 0(%r15) # Save back chain. + lgr %r8, %r0 # Save %r0 (static chain). + lgr %r10, %r1 # Save %r1 (address of parameter block). + + lg %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + lg %r1, 0x38(%r1) # Get stack bounduary + agr %r1, %r7 # Stack bounduary + frame size + ag %r1, 8(%r10) # + stack param size + clgr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + aghi %r7, BACKOFF # Bump requested size a bit. + stg %r7, 0x80(%r11) # Stuff frame size on stack. + la %r2, 0x80(%r11) # Pass its address as parameter. + la %r3, 0xa0(%r11) # Caller's stack parameters. + lg %r4, 8(%r10) # Size of stack parameters. 
+ brasl %r14, __generic_morestack + + lgr %r15, %r2 # Switch to the new stack. + aghi %r15, -0xa0 # Make a stack frame on it. + stg %r11, 0(%r15) # Save back chain. + + sg %r2, 0x80(%r11) # The end of stack space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHB0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lgr %r0, %r8 # Static chain. + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stg %r2, 0x10(%r11) # Save return register. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0xa0 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x80(%r11) + brasl %r14, __generic_releasestack + + sg %r2, 0x80(%r11) # Subtract available space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHE0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0xa0 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. 
+ lgr %r15, %r11 + aghi %r15, -0xa0 + + brasl %r14, __morestack_unblock_signals + + lmg %r2, %r15, 0x10(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + lg %r9, 0x8(%r10) # Load stack parameter size. + ltgr %r9, %r9 # Check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sgr %r15, %r9 # Make space on the stack. + la %r8, 0xa0(%r15) # Destination. + la %r12, 0xa0(%r11) # Source. + lgr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lgr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lgr %r3, %r11 # Get the stack pointer. 
+ sgr %r3, %r2 # Subtract available space. + aghi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + stg %r3, 0x38(%r1) # Save the new stack boundary. + + lgr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#endif /* defined(__s390x__) */ + + .cfi_endproc + .size __morestack, . - __morestack + + +# The exception table. This tells the personality routine to execute +# the exception handler. + + .section .gcc_except_table,"a",@progbits + .align 4 +.LLSDA1: + .byte 0xff # @LPStart format (omit) + .byte 0xff # @TType format (omit) + .byte 0x1 # call-site format (uleb128) + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length +.LLSDACSB1: + .uleb128 .LEHB0-.LFB1 # region 0 start + .uleb128 .LEHE0-.LEHB0 # length + .uleb128 .L1-.LFB1 # landing pad + .uleb128 0 # action +.LLSDACSE1: + + + .global __gcc_personality_v0 +#ifdef __PIC__ + # Build a position independent reference to the basic + # personality function. + .hidden DW.ref.__gcc_personality_v0 + .weak DW.ref.__gcc_personality_v0 + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat + .type DW.ref.__gcc_personality_v0, @object +DW.ref.__gcc_personality_v0: +#ifndef __LP64__ + .align 4 + .size DW.ref.__gcc_personality_v0, 4 + .long __gcc_personality_v0 +#else + .align 8 + .size DW.ref.__gcc_personality_v0, 8 + .quad __gcc_personality_v0 +#endif +#endif + + + +# Initialize the stack test value when the program starts or when a +# new thread starts. We don't know how large the main stack is, so we +# guess conservatively. We might be able to use getrlimit here. + + .text + .global __stack_split_initialize + .hidden __stack_split_initialize + + .type __stack_split_initialize, @function + +__stack_split_initialize: + +#ifndef __s390x__ + + ear %r1, %a0 + lr %r0, %r15 + ahi %r0, -0x4000 # We should have at least 16K. 
+ st %r0, 0x20(%r1) + + lr %r2, %r15 + lhi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#else /* defined(__s390x__) */ + + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lgr %r0, %r15 + aghi %r0, -0x4000 # We should have at least 16K. + stg %r0, 0x38(%r1) + + lgr %r2, %r15 + lghi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#endif /* defined(__s390x__) */ + + .size __stack_split_initialize, . - __stack_split_initialize + +# Routines to get and set the guard, for __splitstack_getcontext, +# __splitstack_setcontext, and __splitstack_makecontext. + +# void *__morestack_get_guard (void) returns the current stack guard. + .text + .global __morestack_get_guard + .hidden __morestack_get_guard + + .type __morestack_get_guard,@function + +__morestack_get_guard: + +#ifndef __s390x__ + ear %r1, %a0 + l %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_get_guard, . - __morestack_get_guard + +# void __morestack_set_guard (void *) sets the stack guard. + .global __morestack_set_guard + .hidden __morestack_set_guard + + .type __morestack_set_guard,@function + +__morestack_set_guard: + +#ifndef __s390x__ + ear %r1, %a0 + st %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + stg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_set_guard, . - __morestack_set_guard + +# void *__morestack_make_guard (void *, size_t) returns the stack +# guard value for a stack. + .global __morestack_make_guard + .hidden __morestack_make_guard + + .type __morestack_make_guard,@function + +__morestack_make_guard: + +#ifndef __s390x__ + sr %r2, %r3 + ahi %r2, BACKOFF +#else + sgr %r2, %r3 + aghi %r2, BACKOFF +#endif + br %r14 + + .size __morestack_make_guard, . 
- __morestack_make_guard + +# Make __stack_split_initialize a high priority constructor. + + .section .ctors.65535,"aw",@progbits + +#ifndef __LP64__ + .align 4 + .long __stack_split_initialize + .long __morestack_load_mmap +#else + .align 8 + .quad __stack_split_initialize + .quad __morestack_load_mmap +#endif + + .section .note.GNU-stack,"",@progbits + .section .note.GNU-split-stack,"",@progbits + .section .note.GNU-no-split-stack,"",@progbits diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 new file mode 100644 index 0000000..4c959b0 --- /dev/null +++ b/libgcc/config/s390/t-stack-s390 @@ -0,0 +1,2 @@ +# Makefile fragment to support -fsplit-stack for s390. +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c index 89765d4..b8eec4e 100644 --- a/libgcc/generic-morestack.c +++ b/libgcc/generic-morestack.c @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, #elif defined (__i386__) nsp -= 6 * sizeof (void *); #elif defined __powerpc64__ +#elif defined __s390x__ + nsp -= 2 * 160; +#elif defined __s390__ + nsp -= 2 * 96; #else #error "unrecognized target" #endif -- 2.7.0 ^ permalink raw reply [flat|nested] 55+ messages in thread
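As a rough illustration of the stack check at the top of `__morestack` and of the guard formula in `__morestack_make_guard` from the patch above, here is a minimal C sketch. The function names `needs_new_segment` and `make_guard` are hypothetical and do not appear in the patch; only the `BACKOFF` value, the unsigned comparison, and the `stack - size + BACKOFF` arithmetic are taken from morestack.S. It also assumes that the pointer passed to `__morestack_make_guard` is the high end of the segment, since the stack grows downward on s390.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as BACKOFF in morestack.S: slack left above the low end of
   a segment so the lazy PLT resolver still has room to run.  */
#define BACKOFF 0x1000

/* Hypothetical model of the test at the top of __morestack.  The
   assembly adds the frame size and the stack parameter size to the
   guard and branches to .Lnoalloc when the sum is <= the stack pointer
   (unsigned compare), so a new segment is needed only when the sum is
   strictly greater.  */
static bool
needs_new_segment (uintptr_t guard, uintptr_t sp,
                   size_t frame_size, size_t args_size)
{
  return guard + frame_size + args_size > sp;
}

/* Hypothetical model of __morestack_make_guard.  STACK is assumed to be
   the high end of a segment of SIZE bytes; the guard ends up BACKOFF
   bytes above the low end.  */
static uintptr_t
make_guard (uintptr_t stack, size_t size)
{
  return stack - size + BACKOFF;
}
```

Note that the equality case falls on the "no allocation" side, which matches the `jle .Lnoalloc` branch in both the 31-bit and 64-bit variants.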
* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-03  0:20 ` Marcin Kościelnicki
@ 2016-02-03 17:03   ` Ulrich Weigand
  2016-02-03 17:18     ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-03 17:03 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches, Marcin Kościelnicki

Marcin Kościelnicki wrote:

> Comment fixed, split_stack_marker gone, reorg gone.  Generated code seems sane,
> but testsuite still running.
>
> I will need to modify the gold patch to handle the "leaf function taking non-split
> stack function address" issue - this will likely require messing with the target
> independent plumbing, the hook for that doesn't seem to get enough params.

Thanks for making those changes; the patch is looking a lot nicer
(and shorter :-)) now!

Just to clarify, your original patch series had two common-code
prerequisite patches (3/5 and 4/5) -- it looks like those may still
be needed?  If so, we'll have to get approval from the appropriate
middle-end maintainers before this patch can go in as well.

As to the back-end patch, I've now only got some cosmetic issues:

> +      insn = emit_insn (gen_main_base_64 (r1, parm_base));

Now that we aren't using the literal pool infrastructure for the block
any more, I guess we shouldn't be using it to load the address either.
Just something like:

  insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base));

should do it.

> +(define_insn "split_stack_data"
> +  [(unspec_volatile [(match_operand 0 "" "X")
> +                     (match_operand 1 "" "X")
> +                     (match_operand 2 "consttable_operand" "X")
> +                     (match_operand 3 "consttable_operand" "X")]

And similarly here, just use const_int_operand.

Otherwise, this all looks very good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com
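For readers following the `split_stack_data` discussion, the 64-bit parameter block it emits (three `.quad` entries) can be pictured as the C structure below. This is only a sketch: the type `morestack_params`, its field names, and the helper `resume_address` are hypothetical and not part of the patch. The layout follows the `.quad %2`, `.quad %3`, `.quad %1-%0` sequence, and the helper mirrors the `ag %r10, 0x10(%r10)` instruction in morestack.S. Operands 2 and 3 are plain integer constants (frame size and stack parameter size), which is consistent with a `const_int_operand` predicate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 64-bit parameter block written by
   split_stack_data.  %r1 points at the start of this block when
   __morestack is entered.  */
struct morestack_params
{
  uint64_t frame_size;    /* operand 2: frame size of the function */
  uint64_t args_size;     /* operand 3: size of stack parameters */
  int64_t resume_offset;  /* operand 1 - operand 0: label to resume at,
                             relative to the block's own address */
};

/* Mirrors "ag %r10, 0x10(%r10)": the resume address is the address of
   the block plus the relative offset stored in its third slot.  */
static uintptr_t
resume_address (const struct morestack_params *p)
{
  return (uintptr_t) p + (uintptr_t) p->resume_offset;
}
```

Storing the offset relative to the block keeps the block position-independent, so it can live in a read-only section without needing a dynamic relocation.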
* [PATCH] s390: Add -fsplit-stack support
  2016-02-03 17:03 ` Ulrich Weigand
@ 2016-02-03 17:18   ` Marcin Kościelnicki
  2016-02-03 17:27     ` Ulrich Weigand
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-03 17:18 UTC (permalink / raw)
  To: uweigand; +Cc: krebbel, gcc-patches, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_DATA): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_space_check): New expand.
	(split_stack_data): New insn.
	(split_stack_call): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call): New expand.
	(split_stack_cond_call_*): New insn.
---
Changes applied.  Testsuite still running, still works on my simple tests.

As for common code prerequisites: #3 is no longer needed, and very likely
so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite
runs just fine without it now).
 gcc/ChangeLog                        |  30 ++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 214 +++++++++++-
 gcc/config/s390/s390.md              | 138 ++++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1016 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 92db764..8e3f9f7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2016-02-03  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL): New unspec.
+	(UNSPECV_SPLIT_STACK_DATA): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_space_check): New expand.
+	(split_stack_data): New insn.
+	(split_stack_call): New expand.
+	(split_stack_call_*): New insn.
+	(split_stack_cond_call): New expand.
+	(split_stack_cond_call_*): New insn.
+ 2016-02-03 Kirill Yukhin <kirill.yukhin@intel.com> PR target/69118 diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 4519c21..1e497e6 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, } } +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. + We don't verify it, since earlier versions just have padding at + its place, which works just as well. */ + +static bool +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, + struct gcc_options *opts ATTRIBUTE_UNUSED) +{ + return true; +} + #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, #undef TARGET_OPTION_INIT_STRUCT #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct +#undef TARGET_SUPPORTS_SPLIT_STACK +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack + struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 633bc1e..09032c9 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); extern void s390_emit_prologue (void); extern void s390_emit_epilogue (bool); +extern void s390_expand_split_stack_prologue (void); extern bool s390_can_use_simple_return_insn (void); extern bool s390_can_use_return_insn (void); extern void s390_function_profiler (FILE *, int); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 3be64de..9c33545 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -426,6 +426,13 @@ struct GTY(()) machine_function /* True if the current function may contain a tbegin clobbering 
FPRs. */ bool tbegin_p; + + /* For -fsplit-stack support: A stack local which holds a pointer to + the stack arguments for a function with a variable number of + arguments. This is set at the start of the function and is used + to initialize the overflow_arg_area field of the va_list + structure. */ + rtx split_stack_varargs_pointer; }; /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ @@ -9316,9 +9323,13 @@ s390_register_info () cfun_frame_layout.high_fprs++; } - if (flag_pic) - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); + /* Register 12 is used for GOT address, but also as temp in prologue + for split-stack stdarg functions (unless r14 is available). */ + clobbered_regs[12] + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) + || (flag_split_stack && cfun->stdarg + && (crtl->is_leaf || TARGET_TPF_PROFILING + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); clobbered_regs[BASE_REGNUM] |= (cfun->machine->base_reg @@ -10440,12 +10451,15 @@ s390_emit_prologue (void) int next_fpr = 0; /* Choose best register to use for temp use within prologue. - See below for why TPF must use the register 1. */ + TPF with profiling must avoid the register 14 - the tracing function + needs the original contents of r14 to be preserved. */ if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM) && !crtl->is_leaf && !TARGET_TPF_PROFILING) temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); + else if (flag_split_stack && cfun->stdarg) + temp_reg = gen_rtx_REG (Pmode, 12); else temp_reg = gen_rtx_REG (Pmode, 1); @@ -10939,6 +10953,166 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); } +/* -fsplit-stack support. */ + +/* A SYMBOL_REF for __morestack. */ +static GTY(()) rtx morestack_ref; + +/* When using -fsplit-stack, the allocation routines set a field in + the TCB to the bottom of the stack plus this much space, measured + in bytes. 
*/ + +#define SPLIT_STACK_AVAILABLE 1024 + +/* Emit -fsplit-stack prologue, which goes before the regular function + prologue. */ + +void +s390_expand_split_stack_prologue (void) +{ + rtx r1, guard, cc = NULL; + rtx_insn *insn; + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + /* Pointer size in bytes. */ + /* Frame size and argument size - the two parameters to __morestack. */ + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; + /* Align argument size to 8 bytes - simplifies __morestack code. */ + HOST_WIDE_INT args_size = crtl->args.size >= 0 + ? ((crtl->args.size + 7) & ~7) + : 0; + /* Label to be called by __morestack. */ + rtx_code_label *call_done = NULL; + rtx_code_label *parm_base = NULL; + rtx tmp; + + gcc_assert (flag_split_stack && reload_completed); + if (!TARGET_CPU_ZARCH) + { + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); + return; + } + + r1 = gen_rtx_REG (Pmode, 1); + + /* If no stack frame will be allocated, don't do anything. */ + if (!frame_size) + { + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, just use r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + + } + return; + } + + if (morestack_ref == NULL_RTX) + { + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL + | SYMBOL_FLAG_FUNCTION); + } + + if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size)) + { + /* If frame_size will fit in an add instruction, do a stack space + check, and only call __morestack if there's not enough space. */ + + /* Get thread pointer. r1 is the only register we can always destroy - r0 + could contain a static chain (and cannot be used to address memory + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); + /* Aim at __private_ss. 
*/ + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); + + /* If less that 1kiB used, skip addition and compare directly with + __private_ss. */ + if (frame_size > SPLIT_STACK_AVAILABLE) + { + emit_move_insn (r1, guard); + if (TARGET_64BIT) + emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size))); + else + emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size))); + guard = r1; + } + + /* Compare the (maybe adjusted) guard with the stack pointer. */ + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); + } + + call_done = gen_label_rtx (); + parm_base = gen_label_rtx (); + + /* Emit the parameter block. */ + tmp = gen_split_stack_data (parm_base, call_done, + GEN_INT (frame_size), + GEN_INT (args_size)); + insn = emit_insn (tmp); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parm_base); + LABEL_NUSES (parm_base)++; + + /* %r1 = litbase. */ + insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base)); + add_reg_note (insn, REG_LABEL_OPERAND, parm_base); + LABEL_NUSES (parm_base)++; + + /* Now, we need to call __morestack. It has very special calling + conventions: it preserves param/return/static chain registers for + calling main function body, and looks for its own parameters at %r1. */ + + if (cc != NULL) + { + tmp = gen_split_stack_cond_call (morestack_ref, cc, call_done); + + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* Mark the jump as very unlikely to be taken. */ + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); + + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, and __morestack was not called, just use + r15. 
*/ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + } + else + { + tmp = gen_split_stack_call (morestack_ref, call_done); + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + emit_barrier (); + } + + /* __morestack will call us here. */ + + emit_label (call_done); +} + +/* We may have to tell the dataflow pass that the split stack prologue + is initializing a register. */ + +static void +s390_live_on_entry (bitmap regs) +{ + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + gcc_assert (flag_split_stack); + bitmap_set_bit (regs, 1); + } +} + /* Return true if the function can use simple_return to return outside of a shrink-wrapped region. At present shrink-wrapping is supported in all cases. */ @@ -11541,6 +11715,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } + if (flag_split_stack + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) + == NULL) + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) + { + rtx reg; + rtx_insn *seq; + + reg = gen_reg_rtx (Pmode); + cfun->machine->split_stack_varargs_pointer = reg; + + start_sequence (); + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); + seq = get_insns (); + end_sequence (); + + push_topmost_sequence (); + emit_insn_after (seq, entry_of_function ()); + pop_topmost_sequence (); + } + /* Find the overflow area. FIXME: This currently is too pessimistic when the vector ABI is enabled. 
In that case we *always* set up the overflow area @@ -11549,7 +11744,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG || TARGET_VX_ABI) { - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) + t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + else + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); off = INTVAL (crtl->args.arg_offset_rtx); off = off < 0 ? 0 : off; @@ -14469,6 +14667,9 @@ s390_asm_file_end (void) s390_vector_abi); #endif file_end_indicate_exec_stack (); + + if (flag_split_stack) + file_end_indicate_split_stack (); } /* Return true if TYPE is a vector bool type. */ @@ -14724,6 +14925,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry + #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ s390_use_by_pieces_infrastructure_p diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 9b869d5..975ee27 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -114,6 +114,9 @@ UNSPEC_SP_SET UNSPEC_SP_TEST + ; Split stack support + UNSPEC_STACK_CHECK + ; Test Data Class (TDC) UNSPEC_TDC_INSN @@ -276,6 +279,10 @@ ; Set and get floating point control register UNSPECV_SFPC UNSPECV_EFPC + + ; Split stack support + UNSPECV_SPLIT_STACK_CALL + UNSPECV_SPLIT_STACK_DATA ]) ;; @@ -10907,3 +10914,134 @@ "TARGET_Z13" "lcbb\t%0,%1,%b2" [(set_attr "op_type" "VRX")]) + +; Handle -fsplit-stack. + +(define_expand "split_stack_prologue" + [(const_int 0)] + "" +{ + s390_expand_split_stack_prologue (); + DONE; +}) + +;; If there are operand 0 bytes available on the stack, jump to +;; operand 1. 
+ +(define_expand "split_stack_space_check" + [(set (pc) (if_then_else + (ltu (minus (reg 15) + (match_operand 0 "register_operand")) + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) + (label_ref (match_operand 1)) + (pc)))] + "" +{ + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + rtx tp = s390_get_thread_pointer (); + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); + rtx reg = gen_reg_rtx (Pmode); + rtx cc; + if (TARGET_64BIT) + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); + else + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); + cc = s390_emit_compare (GT, reg, guard); + s390_emit_jump (operands[1], cc); + + DONE; +}) + +;; __morestack parameter block for split stack prologue. Parameters are: +;; parameter block label, label to be called by __morestack, frame size, +;; stack parameter size. + +(define_insn "split_stack_data" + [(unspec_volatile [(match_operand 0 "" "X") + (match_operand 1 "" "X") + (match_operand 2 "const_int_operand" "X") + (match_operand 3 "const_int_operand" "X")] + UNSPECV_SPLIT_STACK_DATA)] + "TARGET_CPU_ZARCH" +{ + switch_to_section (targetm.asm_out.function_rodata_section + (current_function_decl)); + + if (TARGET_64BIT) + output_asm_insn (".align\t8", operands); + else + output_asm_insn (".align\t4", operands); + (*targetm.asm_out.internal_label) (asm_out_file, "L", + CODE_LABEL_NUMBER (operands[0])); + if (TARGET_64BIT) + { + output_asm_insn (".quad\t%2", operands); + output_asm_insn (".quad\t%3", operands); + output_asm_insn (".quad\t%1-%0", operands); + } + else + { + output_asm_insn (".long\t%2", operands); + output_asm_insn (".long\t%3", operands); + output_asm_insn (".long\t%1-%0", operands); + } + + switch_to_section (current_function_section ()); + return ""; +} + [(set_attr "length" "0")]) + + +;; A jg with minimal fuss for use in split stack prologue. 
+ +(define_expand "split_stack_call" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_call_di (operands[0], operands[1])); + else + emit_jump_insn (gen_split_stack_call_si (operands[0], operands[1])); + DONE; +}) + +(define_insn "split_stack_call_<mode>" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X") + (reg:P 1)] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" + "jg\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; Also a conditional one. + +(define_expand "split_stack_cond_call" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "") + (match_operand 2 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_cond_call_di (operands[0], operands[1], operands[2])); + else + emit_jump_insn (gen_split_stack_cond_call_si (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "split_stack_cond_call_<mode>" + [(set (pc) + (if_then_else + (match_operand 1 "" "") + (label_ref (match_operand 2 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X") + (reg:P 1)] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" + "jg%C1\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 49c7929..102cb3f 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2016-02-03 Marcin Kościelnicki <koriakin@0x04.net> + + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. + * config/s390/morestack.S: New file. + * config/s390/t-stack-s390: New file. + * generic-morestack.c (__splitstack_find): Add s390-specific code. 
+ 2016-01-25 Jakub Jelinek <jakub@redhat.com> PR target/69444 diff --git a/libgcc/config.host b/libgcc/config.host index d8efd82..2be5f7e 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1114,11 +1114,11 @@ rx-*-elf) tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" ;; s390-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" md_unwind_header=s390/linux-unwind.h ;; s390x-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" if test "${host_address}" = 32; then tmake_file="${tmake_file} s390/32/t-floattodi" fi diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S new file mode 100644 index 0000000..141dead --- /dev/null +++ b/libgcc/config/s390/morestack.S @@ -0,0 +1,609 @@ +# s390 support for -fsplit-stack. +# Copyright (C) 2015 Free Software Foundation, Inc. +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version. + +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. + +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation. + +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see +# <http://www.gnu.org/licenses/>. + +# Excess space needed to call ld.so resolver for lazy plt +# resolution. Go uses sigaltstack so this doesn't need to +# also cover signal frame size. +#define BACKOFF 0x1000 + +# The __morestack function. + + .global __morestack + .hidden __morestack + + .type __morestack,@function + +__morestack: +.LFB1: + .cfi_startproc + + +#ifndef __s390x__ + + +# The 31-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0,__gcc_personality_v0 + .cfi_lsda 0,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x48 + .cfi_offset %r7, -0x44 + .cfi_offset %r8, -0x40 + .cfi_offset %r9, -0x3c + .cfi_offset %r10, -0x38 + .cfi_offset %r11, -0x34 + .cfi_offset %r12, -0x30 + .cfi_offset %r13, -0x2c + .cfi_offset %r14, -0x28 + .cfi_offset %r15, -0x24 + lr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + ahi %r15, -0x60 # 0x60 for standard frame. + st %r11, 0(%r15) # Save back chain. + lr %r8, %r0 # Save %r0 (static chain). + lr %r10, %r1 # Save %r1 (address of parameter block). + + l %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 # Extract thread pointer. + l %r1, 0x20(%r1) # Get stack bounduary + ar %r1, %r7 # Stack bounduary + frame size + a %r1, 4(%r10) # + stack param size + clr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + ahi %r7, BACKOFF # Bump requested size a bit. + st %r7, 0x40(%r11) # Stuff frame size on stack. + la %r2, 0x40(%r11) # Pass its address as parameter. 
+ la %r3, 0x60(%r11) # Caller's stack parameters. + l %r4, 4(%r10) # Size of stack parameters. + brasl %r14, __generic_morestack + + lr %r15, %r2 # Switch to the new stack. + ahi %r15, -0x60 # Make a stack frame on it. + st %r11, 0(%r15) # Save back chain. + + s %r2, 0x40(%r11) # The end of stack space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHB0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lr %r0, %r8 # Static chain. + lm %r2, %r6, 0x8(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stm %r2, %r3, 0x8(%r11) # Save return registers. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0x60 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x40(%r11) + brasl %r14, __generic_releasestack + + s %r2, 0x40(%r11) # Subtract available space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHE0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0x60 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. 
+ lr %r15, %r11 + ahi %r15, -0x60 + + brasl %r14, __morestack_unblock_signals + + lm %r2, %r15, 0x8(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + l %r9, 0x4(%r10) # Load stack parameter size. + ltr %r9, %r9 # And check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sr %r15, %r9 # Make space on the stack. + la %r8, 0x60(%r15) # Destination. + la %r12, 0x60(%r11) # Source. + lr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lr %r3, %r11 # Get the stack pointer. + sr %r3, %r2 # Subtract available space. 
+ ahi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. + st %r3, 0x20(%r1) # Save the new stack boundary. + + lr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#else /* defined(__s390x__) */ + + +# The 64-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0x3,__gcc_personality_v0 + .cfi_lsda 0x3,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x70 + .cfi_offset %r7, -0x68 + .cfi_offset %r8, -0x60 + .cfi_offset %r9, -0x58 + .cfi_offset %r10, -0x50 + .cfi_offset %r11, -0x48 + .cfi_offset %r12, -0x40 + .cfi_offset %r13, -0x38 + .cfi_offset %r14, -0x30 + .cfi_offset %r15, -0x28 + lgr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + aghi %r15, -0xa0 # 0xa0 for standard frame. + stg %r11, 0(%r15) # Save back chain. + lgr %r8, %r0 # Save %r0 (static chain). + lgr %r10, %r1 # Save %r1 (address of parameter block). + + lg %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + lg %r1, 0x38(%r1) # Get stack bounduary + agr %r1, %r7 # Stack bounduary + frame size + ag %r1, 8(%r10) # + stack param size + clgr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. Not needed here, but done to be consistent with + # the below use. + aghi %r7, BACKOFF # Bump requested size a bit. + stg %r7, 0x80(%r11) # Stuff frame size on stack. + la %r2, 0x80(%r11) # Pass its address as parameter. + la %r3, 0xa0(%r11) # Caller's stack parameters. + lg %r4, 8(%r10) # Size of stack parameters. 
+ brasl %r14, __generic_morestack + + lgr %r15, %r2 # Switch to the new stack. + aghi %r15, -0xa0 # Make a stack frame on it. + stg %r11, 0(%r15) # Save back chain. + + sg %r2, 0x80(%r11) # The end of stack space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHB0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lgr %r0, %r8 # Static chain. + lmg %r2, %r6, 0x10(%r11) # Paremeter registers. + + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stg %r2, 0x10(%r11) # Save return register. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0xa0 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x80(%r11) + brasl %r14, __generic_releasestack + + sg %r2, 0x80(%r11) # Subtract available space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHE0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0xa0 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. 
+ lgr %r15, %r11 + aghi %r15, -0xa0 + + brasl %r14, __morestack_unblock_signals + + lmg %r2, %r15, 0x10(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + lg %r9, 0x8(%r10) # Load stack parameter size. + ltgr %r9, %r9 # Check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sgr %r15, %r9 # Make space on the stack. + la %r8, 0xa0(%r15) # Destination. + la %r12, 0xa0(%r11) # Source. + lgr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lgr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lgr %r3, %r11 # Get the stack pointer. 
+ sgr %r3, %r2 # Subtract available space. + aghi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + stg %r3, 0x38(%r1) # Save the new stack boundary. + + lgr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#endif /* defined(__s390x__) */ + + .cfi_endproc + .size __morestack, . - __morestack + + +# The exception table. This tells the personality routine to execute +# the exception handler. + + .section .gcc_except_table,"a",@progbits + .align 4 +.LLSDA1: + .byte 0xff # @LPStart format (omit) + .byte 0xff # @TType format (omit) + .byte 0x1 # call-site format (uleb128) + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length +.LLSDACSB1: + .uleb128 .LEHB0-.LFB1 # region 0 start + .uleb128 .LEHE0-.LEHB0 # length + .uleb128 .L1-.LFB1 # landing pad + .uleb128 0 # action +.LLSDACSE1: + + + .global __gcc_personality_v0 +#ifdef __PIC__ + # Build a position independent reference to the basic + # personality function. + .hidden DW.ref.__gcc_personality_v0 + .weak DW.ref.__gcc_personality_v0 + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat + .type DW.ref.__gcc_personality_v0, @object +DW.ref.__gcc_personality_v0: +#ifndef __LP64__ + .align 4 + .size DW.ref.__gcc_personality_v0, 4 + .long __gcc_personality_v0 +#else + .align 8 + .size DW.ref.__gcc_personality_v0, 8 + .quad __gcc_personality_v0 +#endif +#endif + + + +# Initialize the stack test value when the program starts or when a +# new thread starts. We don't know how large the main stack is, so we +# guess conservatively. We might be able to use getrlimit here. + + .text + .global __stack_split_initialize + .hidden __stack_split_initialize + + .type __stack_split_initialize, @function + +__stack_split_initialize: + +#ifndef __s390x__ + + ear %r1, %a0 + lr %r0, %r15 + ahi %r0, -0x4000 # We should have at least 16K. 
+ st %r0, 0x20(%r1) + + lr %r2, %r15 + lhi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#else /* defined(__s390x__) */ + + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lgr %r0, %r15 + aghi %r0, -0x4000 # We should have at least 16K. + stg %r0, 0x38(%r1) + + lgr %r2, %r15 + lghi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#endif /* defined(__s390x__) */ + + .size __stack_split_initialize, . - __stack_split_initialize + +# Routines to get and set the guard, for __splitstack_getcontext, +# __splitstack_setcontext, and __splitstack_makecontext. + +# void *__morestack_get_guard (void) returns the current stack guard. + .text + .global __morestack_get_guard + .hidden __morestack_get_guard + + .type __morestack_get_guard,@function + +__morestack_get_guard: + +#ifndef __s390x__ + ear %r1, %a0 + l %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_get_guard, . - __morestack_get_guard + +# void __morestack_set_guard (void *) sets the stack guard. + .global __morestack_set_guard + .hidden __morestack_set_guard + + .type __morestack_set_guard,@function + +__morestack_set_guard: + +#ifndef __s390x__ + ear %r1, %a0 + st %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + stg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_set_guard, . - __morestack_set_guard + +# void *__morestack_make_guard (void *, size_t) returns the stack +# guard value for a stack. + .global __morestack_make_guard + .hidden __morestack_make_guard + + .type __morestack_make_guard,@function + +__morestack_make_guard: + +#ifndef __s390x__ + sr %r2, %r3 + ahi %r2, BACKOFF +#else + sgr %r2, %r3 + aghi %r2, BACKOFF +#endif + br %r14 + + .size __morestack_make_guard, . 
- __morestack_make_guard + +# Make __stack_split_initialize a high priority constructor. + + .section .ctors.65535,"aw",@progbits + +#ifndef __LP64__ + .align 4 + .long __stack_split_initialize + .long __morestack_load_mmap +#else + .align 8 + .quad __stack_split_initialize + .quad __morestack_load_mmap +#endif + + .section .note.GNU-stack,"",@progbits + .section .note.GNU-split-stack,"",@progbits + .section .note.GNU-no-split-stack,"",@progbits diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 new file mode 100644 index 0000000..4c959b0 --- /dev/null +++ b/libgcc/config/s390/t-stack-s390 @@ -0,0 +1,2 @@ +# Makefile fragment to support -fsplit-stack for s390. +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c index 89765d4..b8eec4e 100644 --- a/libgcc/generic-morestack.c +++ b/libgcc/generic-morestack.c @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, #elif defined (__i386__) nsp -= 6 * sizeof (void *); #elif defined __powerpc64__ +#elif defined __s390x__ + nsp -= 2 * 160; +#elif defined __s390__ + nsp -= 2 * 96; #else #error "unrecognized target" #endif -- 2.7.0 ^ permalink raw reply [flat|nested] 55+ messages in thread
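The prologue check and the parameter-block handoff in the patch above can be sketched in C. This is only an illustration under stated assumptions: the struct name morestack_params and all helper names are hypothetical, the layout follows the three .quad entries that split_stack_data emits for 64-bit, and pointer comparisons are modelled as unsigned integer comparisons. It does not model the case where frame_size fails CONST_OK_FOR_K/CONST_OK_FOR_Op and __morestack is called unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as SPLIT_STACK_AVAILABLE in the patch. */
#define SPLIT_STACK_AVAILABLE 1024
/* Same value as BACKOFF in morestack.S. */
#define BACKOFF 0x1000

/* Hypothetical C view of the 64-bit parameter block (three .quad entries). */
struct morestack_params {
  int64_t frame_size;   /* operand 2 of split_stack_data */
  int64_t args_size;    /* operand 3, already aligned to 8 bytes */
  int64_t entry_offset; /* call_done label minus parameter block label */
};

/* Round the stack-argument size up to 8 bytes; negative sizes become 0,
   mirroring the args_size computation in the prologue expander. */
static inline int64_t align_args_size(int64_t size)
{
  assert(size <= INT64_MAX - 7); /* keep the rounding free of overflow */
  return size >= 0 ? ((size + 7) & ~(int64_t)7) : 0;
}

/* Prologue check: true when the conditional jump to __morestack is taken.
   Small frames skip the addition and compare sp with the guard directly. */
static inline bool prologue_calls_morestack(uintptr_t sp, uintptr_t guard,
                                            int64_t frame_size)
{
  if (frame_size > SPLIT_STACK_AVAILABLE)
    guard += (uintptr_t)frame_size;
  return sp < guard;
}

/* Address __morestack branches to: the block address plus the stored
   relative offset (the "ag %r10, 0x10(%r10)" step in morestack.S). */
static inline uintptr_t morestack_target(uintptr_t block_addr,
                                         const struct morestack_params *p)
{
  return block_addr + (uintptr_t)p->entry_offset;
}

/* Arithmetic of __morestack_make_guard: first argument minus second
   argument, plus BACKOFF. */
static inline uintptr_t make_guard(uintptr_t stack, uintptr_t size)
{
  return stack - size + BACKOFF;
}
```

Storing the entry point as an offset relative to the parameter block keeps the block position-independent, which is why __morestack only needs the block address in %r1 to find both its arguments and the place to resume.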
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-03 17:18 ` Marcin Kościelnicki @ 2016-02-03 17:27 ` Ulrich Weigand 2016-02-04 12:44 ` Marcin Kościelnicki 0 siblings, 1 reply; 55+ messages in thread From: Ulrich Weigand @ 2016-02-03 17:27 UTC (permalink / raw) To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches, Marcin Kościelnicki Marcin Kościelnicki wrote: > libgcc/ChangeLog: > > * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. > * config/s390/morestack.S: New file. > * config/s390/t-stack-s390: New file. > * generic-morestack.c (__splitstack_find): Add s390-specific code. > > gcc/ChangeLog: > > * common/config/s390/s390-common.c (s390_supports_split_stack): > New function. > (TARGET_SUPPORTS_SPLIT_STACK): New macro. > * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. > * config/s390/s390.c (struct machine_function): New field > split_stack_varargs_pointer. > (s390_register_info): Mark r12 as clobbered if it'll be used as temp > in s390_emit_prologue. > (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack > vararg pointer. > (morestack_ref): New global. > (SPLIT_STACK_AVAILABLE): New macro. > (s390_expand_split_stack_prologue): New function. > (s390_live_on_entry): New function. > (s390_va_start): Use split-stack vararg pointer if appropriate. > (s390_asm_file_end): Emit the split-stack note sections. > (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. > * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. > (UNSPECV_SPLIT_STACK_CALL): New unspec. > (UNSPECV_SPLIT_STACK_DATA): New unspec. > (split_stack_prologue): New expand. > (split_stack_space_check): New expand. > (split_stack_data): New insn. > (split_stack_call): New expand. > (split_stack_call_*): New insn. > (split_stack_cond_call): New expand. > (split_stack_cond_call_*): New insn. > --- > Changes applied. Testsuite still running, still works on my simple tests. 
> > As for common code prerequisites: #3 is no longer needed, and very likely > so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite > runs just fine without it now). OK, I see. The patch is OK for mainline then, assuming testing passes. Thanks again, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain Ulrich.Weigand@de.ibm.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-03 17:27 ` Ulrich Weigand @ 2016-02-04 12:44 ` Marcin Kościelnicki 2016-02-10 13:14 ` Marcin Kościelnicki 0 siblings, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-02-04 12:44 UTC (permalink / raw) To: Ulrich Weigand; +Cc: krebbel, gcc-patches On 03/02/16 18:27, Ulrich Weigand wrote: > Marcin Kościelnicki wrote: > >> libgcc/ChangeLog: >> >> * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. >> * config/s390/morestack.S: New file. >> * config/s390/t-stack-s390: New file. >> * generic-morestack.c (__splitstack_find): Add s390-specific code. >> >> gcc/ChangeLog: >> >> * common/config/s390/s390-common.c (s390_supports_split_stack): >> New function. >> (TARGET_SUPPORTS_SPLIT_STACK): New macro. >> * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. >> * config/s390/s390.c (struct machine_function): New field >> split_stack_varargs_pointer. >> (s390_register_info): Mark r12 as clobbered if it'll be used as temp >> in s390_emit_prologue. >> (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack >> vararg pointer. >> (morestack_ref): New global. >> (SPLIT_STACK_AVAILABLE): New macro. >> (s390_expand_split_stack_prologue): New function. >> (s390_live_on_entry): New function. >> (s390_va_start): Use split-stack vararg pointer if appropriate. >> (s390_asm_file_end): Emit the split-stack note sections. >> (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. >> * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. >> (UNSPECV_SPLIT_STACK_CALL): New unspec. >> (UNSPECV_SPLIT_STACK_DATA): New unspec. >> (split_stack_prologue): New expand. >> (split_stack_space_check): New expand. >> (split_stack_data): New insn. >> (split_stack_call): New expand. >> (split_stack_call_*): New insn. >> (split_stack_cond_call): New expand. >> (split_stack_cond_call_*): New insn. >> --- >> Changes applied. Testsuite still running, still works on my simple tests. 
>> >> As for common code prerequisites: #3 is no longer needed, and very likely >> so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite >> runs just fine without it now). > > OK, I see. The patch is OK for mainline then, assuming testing passes. Well, testing passes (as in, is no worse than x86 - the testsuite doesn't really agree with -fsplit-stack in a few places involving backtraces). However, there's still the libgo issue to be taken care of. For my tests, I patched it up with: diff --git a/libgo/runtime/proc.c b/libgo/runtime/proc.c index c25a217..efa6806 100644 --- a/libgo/runtime/proc.c +++ b/libgo/runtime/proc.c @@ -2016,17 +2016,19 @@ doentersyscall() m->locks++; // Leave SP around for GC and traceback. + { #ifdef USING_SPLIT_STACK - g->gcstack = __splitstack_find(nil, nil, &g->gcstack_size, - &g->gcnext_segment, &g->gcnext_sp, - &g->gcinitial_sp); + size_t size_tmp; + g->gcstack = __splitstack_find(nil, nil, &size_tmp, + &g->gcnext_segment, &g->gcnext_sp, + &g->gcinitial_sp); + g->gcstack_size = size_tmp; #else - { void *v; g->gcnext_sp = (byte *) &v; - } #endif + } g->status = Gsyscall; @@ -2064,9 +2066,13 @@ runtime_entersyscallblock(void) // Leave SP around for GC and traceback. #ifdef USING_SPLIT_STACK - g->gcstack = __splitstack_find(nil, nil, &g->gcstack_size, - &g->gcnext_segment, &g->gcnext_sp, - &g->gcinitial_sp); + { + size_t size_tmp; + g->gcstack = __splitstack_find(nil, nil, &size_tmp, + &g->gcnext_segment, &g->gcnext_sp, + &g->gcinitial_sp); + g->gcstack_size = size_tmp; + } #else g->gcnext_sp = (byte *) &p; #endif Andreas, did you have any luck with fixing this? If not, I'll try submitting the above patch to gofrontend. > > Thanks again, > Ulrich > ^ permalink raw reply [flat|nested] 55+ messages in thread
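The libgo workaround above follows a common pattern: when a callee reports a length through a size_t * out-parameter but the destination field has a different integer type, pass a local size_t and assign it afterwards. A minimal sketch of that pattern follows. It assumes (the message does not say so) that the field's type differs from size_t on some targets; struct g_sketch and find_stack_sketch are hypothetical stand-ins, not the libgo structure or the real __splitstack_find.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's per-goroutine structure.
   The field type is an assumption made for illustration. */
struct g_sketch {
  void *gcstack;
  uintptr_t gcstack_size;
};

/* Hypothetical stand-in for a lookup that reports a length through a
   size_t out-parameter, as __splitstack_find does in the patch. */
static inline void *find_stack_sketch(size_t *len)
{
  static char segment[64];
  *len = sizeof segment;
  return segment;
}

/* Pass a local size_t to the callee, then copy it into the field.
   This avoids handing the callee a pointer to a differently typed field. */
static inline uintptr_t record_stack(struct g_sketch *g)
{
  size_t size_tmp;
  g->gcstack = find_stack_sketch(&size_tmp);
  g->gcstack_size = size_tmp;
  return g->gcstack_size;
}
```

The extra temporary costs one store and keeps the call valid whatever width or underlying type the field ends up having.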
* Re: [PATCH] s390: Add -fsplit-stack support 2016-02-04 12:44 ` Marcin Kościelnicki @ 2016-02-10 13:14 ` Marcin Kościelnicki 2016-02-14 16:01 ` Marcin Kościelnicki 0 siblings, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-02-10 13:14 UTC (permalink / raw) To: Ulrich Weigand; +Cc: krebbel, gcc-patches On 04/02/16 13:44, Marcin Kościelnicki wrote: > On 03/02/16 18:27, Ulrich Weigand wrote: >> Marcin Kościelnicki wrote: >> >>> libgcc/ChangeLog: >>> >>> * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. >>> * config/s390/morestack.S: New file. >>> * config/s390/t-stack-s390: New file. >>> * generic-morestack.c (__splitstack_find): Add s390-specific code. >>> >>> gcc/ChangeLog: >>> >>> * common/config/s390/s390-common.c (s390_supports_split_stack): >>> New function. >>> (TARGET_SUPPORTS_SPLIT_STACK): New macro. >>> * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. >>> * config/s390/s390.c (struct machine_function): New field >>> split_stack_varargs_pointer. >>> (s390_register_info): Mark r12 as clobbered if it'll be used as temp >>> in s390_emit_prologue. >>> (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack >>> vararg pointer. >>> (morestack_ref): New global. >>> (SPLIT_STACK_AVAILABLE): New macro. >>> (s390_expand_split_stack_prologue): New function. >>> (s390_live_on_entry): New function. >>> (s390_va_start): Use split-stack vararg pointer if appropriate. >>> (s390_asm_file_end): Emit the split-stack note sections. >>> (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. >>> * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. >>> (UNSPECV_SPLIT_STACK_CALL): New unspec. >>> (UNSPECV_SPLIT_STACK_DATA): New unspec. >>> (split_stack_prologue): New expand. >>> (split_stack_space_check): New expand. >>> (split_stack_data): New insn. >>> (split_stack_call): New expand. >>> (split_stack_call_*): New insn. >>> (split_stack_cond_call): New expand. >>> (split_stack_cond_call_*): New insn. >>> --- >>> Changes applied. 
Testsuite still running, still works on my simple >>> tests. >>> >>> As for common code prerequisites: #3 is no longer needed, and very >>> likely >>> so is #4 (it fixes problems that I've only seen with ESA mode, and >>> testsuite >>> runs just fine without it now). >> >> OK, I see. The patch is OK for mainline then, assuming testing passes. > > Well, testing passes (as in, is no worse than x86 - the testsuite > doesn't really agree with -fsplit-stack in a few places involving > backtraces). However, there's still the libgo issue to be taken care > of. For my tests, I patched it up with: > [...] I see the libgo patch has landed today. Can we get this pushed? Marcin Kościelnicki ^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH] s390: Add -fsplit-stack support 2016-02-10 13:14 ` Marcin Kościelnicki @ 2016-02-14 16:01 ` Marcin Kościelnicki 2016-02-15 10:21 ` Andreas Krebbel 0 siblings, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-02-14 16:01 UTC (permalink / raw) To: uweigand; +Cc: gcc-patches, krebbel, Marcin Kościelnicki libgcc/ChangeLog: * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. * config/s390/morestack.S: New file. * config/s390/t-stack-s390: New file. * generic-morestack.c (__splitstack_find): Add s390-specific code. gcc/ChangeLog: * common/config/s390/s390-common.c (s390_supports_split_stack): New function. (TARGET_SUPPORTS_SPLIT_STACK): New macro. * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. * config/s390/s390.c (struct machine_function): New field split_stack_varargs_pointer. (s390_register_info): Mark r12 as clobbered if it'll be used as temp in s390_emit_prologue. (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack vararg pointer. (morestack_ref): New global. (SPLIT_STACK_AVAILABLE): New macro. (s390_expand_split_stack_prologue): New function. (s390_live_on_entry): New function. (s390_va_start): Use split-stack vararg pointer if appropriate. (s390_asm_file_end): Emit the split-stack note sections. (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. (UNSPECV_SPLIT_STACK_CALL): New unspec. (UNSPECV_SPLIT_STACK_DATA): New unspec. (split_stack_prologue): New expand. (split_stack_space_check): New expand. (split_stack_data): New insn. (split_stack_call): New expand. (split_stack_call_*): New insn. (split_stack_cond_call): New expand. (split_stack_cond_call_*): New insn. --- Whoops, I noticed a problem introduced when removing ESA bits: in the __morestack exception-handling path in 31-bit version, I neglected to stuff GOT address in %r12, which is necessary for the PLT stub to work. 
The only change in this version is the added larl %r12, _GLOBAL_OFFSET_TABLE_ line. gcc/ChangeLog | 30 ++ gcc/common/config/s390/s390-common.c | 14 + gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c | 214 +++++++++++- gcc/config/s390/s390.md | 138 ++++++++ libgcc/ChangeLog | 7 + libgcc/config.host | 4 +- libgcc/config/s390/morestack.S | 611 +++++++++++++++++++++++++++++++++++ libgcc/config/s390/t-stack-s390 | 2 + libgcc/generic-morestack.c | 4 + 10 files changed, 1018 insertions(+), 7 deletions(-) create mode 100644 libgcc/config/s390/morestack.S create mode 100644 libgcc/config/s390/t-stack-s390 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e81d1fe..60a4608 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,33 @@ +2016-02-14 Marcin Kościelnicki <koriakin@0x04.net> + + * common/config/s390/s390-common.c (s390_supports_split_stack): + New function. + (TARGET_SUPPORTS_SPLIT_STACK): New macro. + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue. + * config/s390/s390.c (struct machine_function): New field + split_stack_varargs_pointer. + (s390_register_info): Mark r12 as clobbered if it'll be used as temp + in s390_emit_prologue. + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack + vararg pointer. + (morestack_ref): New global. + (SPLIT_STACK_AVAILABLE): New macro. + (s390_expand_split_stack_prologue): New function. + (s390_live_on_entry): New function. + (s390_va_start): Use split-stack vararg pointer if appropriate. + (s390_asm_file_end): Emit the split-stack note sections. + (TARGET_EXTRA_LIVE_ON_ENTRY): New macro. + * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec. + (UNSPECV_SPLIT_STACK_CALL): New unspec. + (UNSPECV_SPLIT_STACK_DATA): New unspec. + (split_stack_prologue): New expand. + (split_stack_space_check): New expand. + (split_stack_data): New insn. + (split_stack_call): New expand. + (split_stack_call_*): New insn. + (split_stack_cond_call): New expand. + (split_stack_cond_call_*): New insn. 
+ 2016-02-14 Venkataramanan Kumar <venkataramanan.kumar@amd.com> * config/i386/znver1.md diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c index 4519c21..1e497e6 100644 --- a/gcc/common/config/s390/s390-common.c +++ b/gcc/common/config/s390/s390-common.c @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, } } +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23. + We don't verify it, since earlier versions just have padding at + its place, which works just as well. */ + +static bool +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED, + struct gcc_options *opts ATTRIBUTE_UNUSED) +{ + return true; +} + #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT) @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, #undef TARGET_OPTION_INIT_STRUCT #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct +#undef TARGET_SUPPORTS_SPLIT_STACK +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack + struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 633bc1e..09032c9 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED, extern HOST_WIDE_INT s390_initial_elimination_offset (int, int); extern void s390_emit_prologue (void); extern void s390_emit_epilogue (bool); +extern void s390_expand_split_stack_prologue (void); extern bool s390_can_use_simple_return_insn (void); extern bool s390_can_use_return_insn (void); extern void s390_function_profiler (FILE *, int); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 9facd96..aa82d1c 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -428,6 +428,13 @@ struct GTY(()) machine_function /* True if the current function may contain a 
tbegin clobbering FPRs. */ bool tbegin_p; + + /* For -fsplit-stack support: A stack local which holds a pointer to + the stack arguments for a function with a variable number of + arguments. This is set at the start of the function and is used + to initialize the overflow_arg_area field of the va_list + structure. */ + rtx split_stack_varargs_pointer; }; /* Few accessor macros for struct cfun->machine->s390_frame_layout. */ @@ -9371,9 +9378,13 @@ s390_register_info () cfun_frame_layout.high_fprs++; } - if (flag_pic) - clobbered_regs[PIC_OFFSET_TABLE_REGNUM] - |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM); + /* Register 12 is used for GOT address, but also as temp in prologue + for split-stack stdarg functions (unless r14 is available). */ + clobbered_regs[12] + |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)) + || (flag_split_stack && cfun->stdarg + && (crtl->is_leaf || TARGET_TPF_PROFILING + || has_hard_reg_initial_val (Pmode, RETURN_REGNUM)))); clobbered_regs[BASE_REGNUM] |= (cfun->machine->base_reg @@ -10473,12 +10484,15 @@ s390_emit_prologue (void) int next_fpr = 0; /* Choose best register to use for temp use within prologue. - See below for why TPF must use the register 1. */ + TPF with profiling must avoid the register 14 - the tracing function + needs the original contents of r14 to be preserved. */ if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM) && !crtl->is_leaf && !TARGET_TPF_PROFILING) temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM); + else if (flag_split_stack && cfun->stdarg) + temp_reg = gen_rtx_REG (Pmode, 12); else temp_reg = gen_rtx_REG (Pmode, 1); @@ -10972,6 +10986,166 @@ s300_set_up_by_prologue (hard_reg_set_container *regs) SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg)); } +/* -fsplit-stack support. */ + +/* A SYMBOL_REF for __morestack. 
*/ +static GTY(()) rtx morestack_ref; + +/* When using -fsplit-stack, the allocation routines set a field in + the TCB to the bottom of the stack plus this much space, measured + in bytes. */ + +#define SPLIT_STACK_AVAILABLE 1024 + +/* Emit -fsplit-stack prologue, which goes before the regular function + prologue. */ + +void +s390_expand_split_stack_prologue (void) +{ + rtx r1, guard, cc = NULL; + rtx_insn *insn; + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + /* Pointer size in bytes. */ + /* Frame size and argument size - the two parameters to __morestack. */ + HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size; + /* Align argument size to 8 bytes - simplifies __morestack code. */ + HOST_WIDE_INT args_size = crtl->args.size >= 0 + ? ((crtl->args.size + 7) & ~7) + : 0; + /* Label to be called by __morestack. */ + rtx_code_label *call_done = NULL; + rtx_code_label *parm_base = NULL; + rtx tmp; + + gcc_assert (flag_split_stack && reload_completed); + if (!TARGET_CPU_ZARCH) + { + sorry ("CPUs older than z900 are not supported for -fsplit-stack"); + return; + } + + r1 = gen_rtx_REG (Pmode, 1); + + /* If no stack frame will be allocated, don't do anything. */ + if (!frame_size) + { + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, just use r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + + } + return; + } + + if (morestack_ref == NULL_RTX) + { + morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack"); + SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL + | SYMBOL_FLAG_FUNCTION); + } + + if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size)) + { + /* If frame_size will fit in an add instruction, do a stack space + check, and only call __morestack if there's not enough space. */ + + /* Get thread pointer. 
r1 is the only register we can always destroy - r0 + could contain a static chain (and cannot be used to address memory + anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved. */ + emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM)); + /* Aim at __private_ss. */ + guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso)); + + /* If less than 1kiB is used, skip addition and compare directly with + __private_ss. */ + if (frame_size > SPLIT_STACK_AVAILABLE) + { + emit_move_insn (r1, guard); + if (TARGET_64BIT) + emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size))); + else + emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size))); + guard = r1; + } + + /* Compare the (maybe adjusted) guard with the stack pointer. */ + cc = s390_emit_compare (LT, stack_pointer_rtx, guard); + } + + call_done = gen_label_rtx (); + parm_base = gen_label_rtx (); + + /* Emit the parameter block. */ + tmp = gen_split_stack_data (parm_base, call_done, + GEN_INT (frame_size), + GEN_INT (args_size)); + insn = emit_insn (tmp); + add_reg_note (insn, REG_LABEL_OPERAND, call_done); + LABEL_NUSES (call_done)++; + add_reg_note (insn, REG_LABEL_OPERAND, parm_base); + LABEL_NUSES (parm_base)++; + + /* %r1 = litbase. */ + insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base)); + add_reg_note (insn, REG_LABEL_OPERAND, parm_base); + LABEL_NUSES (parm_base)++; + + /* Now, we need to call __morestack. It has very special calling + conventions: it preserves param/return/static chain registers for + calling main function body, and looks for its own parameters at %r1. */ + + if (cc != NULL) + { + tmp = gen_split_stack_cond_call (morestack_ref, cc, call_done); + + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + + /* Mark the jump as very unlikely to be taken. 
*/ + add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100); + + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + /* If va_start is used, and __morestack was not called, just use + r15. */ + emit_move_insn (r1, + gen_rtx_PLUS (Pmode, stack_pointer_rtx, + GEN_INT (STACK_POINTER_OFFSET))); + } + } + else + { + tmp = gen_split_stack_call (morestack_ref, call_done); + insn = emit_jump_insn (tmp); + JUMP_LABEL (insn) = call_done; + LABEL_NUSES (call_done)++; + emit_barrier (); + } + + /* __morestack will call us here. */ + + emit_label (call_done); +} + +/* We may have to tell the dataflow pass that the split stack prologue + is initializing a register. */ + +static void +s390_live_on_entry (bitmap regs) +{ + if (cfun->machine->split_stack_varargs_pointer != NULL_RTX) + { + gcc_assert (flag_split_stack); + bitmap_set_bit (regs, 1); + } +} + /* Return true if the function can use simple_return to return outside of a shrink-wrapped region. At present shrink-wrapping is supported in all cases. */ @@ -11574,6 +11748,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } + if (flag_split_stack + && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) + == NULL) + && cfun->machine->split_stack_varargs_pointer == NULL_RTX) + { + rtx reg; + rtx_insn *seq; + + reg = gen_reg_rtx (Pmode); + cfun->machine->split_stack_varargs_pointer = reg; + + start_sequence (); + emit_move_insn (reg, gen_rtx_REG (Pmode, 1)); + seq = get_insns (); + end_sequence (); + + push_topmost_sequence (); + emit_insn_after (seq, entry_of_function ()); + pop_topmost_sequence (); + } + /* Find the overflow area. FIXME: This currently is too pessimistic when the vector ABI is enabled. 
In that case we *always* set up the overflow area @@ -11582,7 +11777,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG || TARGET_VX_ABI) { - t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + if (cfun->machine->split_stack_varargs_pointer == NULL_RTX) + t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx); + else + t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer); off = INTVAL (crtl->args.arg_offset_rtx); off = off < 0 ? 0 : off; @@ -14502,6 +14700,9 @@ s390_asm_file_end (void) s390_vector_abi); #endif file_end_indicate_exec_stack (); + + if (flag_split_stack) + file_end_indicate_split_stack (); } /* Return true if TYPE is a vector bool type. */ @@ -14757,6 +14958,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry + #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ s390_use_by_pieces_infrastructure_p diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index ccedead..6f0e172 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -114,6 +114,9 @@ UNSPEC_SP_SET UNSPEC_SP_TEST + ; Split stack support + UNSPEC_STACK_CHECK + ; Test Data Class (TDC) UNSPEC_TDC_INSN @@ -276,6 +279,10 @@ ; Set and get floating point control register UNSPECV_SFPC UNSPECV_EFPC + + ; Split stack support + UNSPECV_SPLIT_STACK_CALL + UNSPECV_SPLIT_STACK_DATA ]) ;; @@ -10909,3 +10916,134 @@ "TARGET_Z13" "lcbb\t%0,%1,%b2" [(set_attr "op_type" "VRX")]) + +; Handle -fsplit-stack. + +(define_expand "split_stack_prologue" + [(const_int 0)] + "" +{ + s390_expand_split_stack_prologue (); + DONE; +}) + +;; If there are operand 0 bytes available on the stack, jump to +;; operand 1. 
+ +(define_expand "split_stack_space_check" + [(set (pc) (if_then_else + (ltu (minus (reg 15) + (match_operand 0 "register_operand")) + (unspec [(const_int 0)] UNSPEC_STACK_CHECK)) + (label_ref (match_operand 1)) + (pc)))] + "" +{ + /* Offset from thread pointer to __private_ss. */ + int psso = TARGET_64BIT ? 0x38 : 0x20; + rtx tp = s390_get_thread_pointer (); + rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso)); + rtx reg = gen_reg_rtx (Pmode); + rtx cc; + if (TARGET_64BIT) + emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0])); + else + emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0])); + cc = s390_emit_compare (GT, reg, guard); + s390_emit_jump (operands[1], cc); + + DONE; +}) + +;; __morestack parameter block for split stack prologue. Parameters are: +;; parameter block label, label to be called by __morestack, frame size, +;; stack parameter size. + +(define_insn "split_stack_data" + [(unspec_volatile [(match_operand 0 "" "X") + (match_operand 1 "" "X") + (match_operand 2 "const_int_operand" "X") + (match_operand 3 "const_int_operand" "X")] + UNSPECV_SPLIT_STACK_DATA)] + "TARGET_CPU_ZARCH" +{ + switch_to_section (targetm.asm_out.function_rodata_section + (current_function_decl)); + + if (TARGET_64BIT) + output_asm_insn (".align\t8", operands); + else + output_asm_insn (".align\t4", operands); + (*targetm.asm_out.internal_label) (asm_out_file, "L", + CODE_LABEL_NUMBER (operands[0])); + if (TARGET_64BIT) + { + output_asm_insn (".quad\t%2", operands); + output_asm_insn (".quad\t%3", operands); + output_asm_insn (".quad\t%1-%0", operands); + } + else + { + output_asm_insn (".long\t%2", operands); + output_asm_insn (".long\t%3", operands); + output_asm_insn (".long\t%1-%0", operands); + } + + switch_to_section (current_function_section ()); + return ""; +} + [(set_attr "length" "0")]) + + +;; A jg with minimal fuss for use in split stack prologue. 
+ +(define_expand "split_stack_call" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_call_di (operands[0], operands[1])); + else + emit_jump_insn (gen_split_stack_call_si (operands[0], operands[1])); + DONE; +}) + +(define_insn "split_stack_call_<mode>" + [(set (pc) (label_ref (match_operand 1 "" ""))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X") + (reg:P 1)] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" + "jg\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) + +;; Also a conditional one. + +(define_expand "split_stack_cond_call" + [(match_operand 0 "bras_sym_operand" "X") + (match_operand 1 "" "") + (match_operand 2 "" "")] + "TARGET_CPU_ZARCH" +{ + if (TARGET_64BIT) + emit_jump_insn (gen_split_stack_cond_call_di (operands[0], operands[1], operands[2])); + else + emit_jump_insn (gen_split_stack_cond_call_si (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_insn "split_stack_cond_call_<mode>" + [(set (pc) + (if_then_else + (match_operand 1 "" "") + (label_ref (match_operand 2 "" "")) + (pc))) + (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X") + (reg:P 1)] + UNSPECV_SPLIT_STACK_CALL))] + "TARGET_CPU_ZARCH" + "jg%C1\t%0" + [(set_attr "op_type" "RIL") + (set_attr "type" "branch")]) diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 63ad30e..a02e940 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2016-02-14 Marcin Kościelnicki <koriakin@0x04.net> + + * config.host: Use t-stack and t-stack-s390 for s390*-*-linux. + * config/s390/morestack.S: New file. + * config/s390/t-stack-s390: New file. + * generic-morestack.c (__splitstack_find): Add s390-specific code. 
+ 2016-02-12 Walter Lee <walt@tilera.com> * config.host (tilegx*-*-linux*): remove ti from diff --git a/libgcc/config.host b/libgcc/config.host index 06de0de..ef7dfd0 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1114,11 +1114,11 @@ rx-*-elf) tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h" ;; s390-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390" md_unwind_header=s390/linux-unwind.h ;; s390x-*-linux*) - tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux" + tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390" if test "${host_address}" = 32; then tmake_file="${tmake_file} s390/32/t-floattodi" fi diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S new file mode 100644 index 0000000..fa6951b --- /dev/null +++ b/libgcc/config/s390/morestack.S @@ -0,0 +1,611 @@ +# s390 support for -fsplit-stack. +# Copyright (C) 2015 Free Software Foundation, Inc. +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>. + +# This file is part of GCC. + +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version. + +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. + +# Under Section 7 of GPL version 3, you are granted additional +# permissions described in the GCC Runtime Library Exception, version +# 3.1, as published by the Free Software Foundation. 
+ +# You should have received a copy of the GNU General Public License and +# a copy of the GCC Runtime Library Exception along with this program; +# see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +# <http://www.gnu.org/licenses/>. + +# Excess space needed to call ld.so resolver for lazy plt +# resolution. Go uses sigaltstack so this doesn't need to +# also cover signal frame size. +#define BACKOFF 0x1000 + +# The __morestack function. + + .global __morestack + .hidden __morestack + + .type __morestack,@function + +__morestack: +.LFB1: + .cfi_startproc + + +#ifndef __s390x__ + + +# The 31-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0,__gcc_personality_v0 + .cfi_lsda 0,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stm %r2, %r15, 0x8(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x48 + .cfi_offset %r7, -0x44 + .cfi_offset %r8, -0x40 + .cfi_offset %r9, -0x3c + .cfi_offset %r10, -0x38 + .cfi_offset %r11, -0x34 + .cfi_offset %r12, -0x30 + .cfi_offset %r13, -0x2c + .cfi_offset %r14, -0x28 + .cfi_offset %r15, -0x24 + lr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + ahi %r15, -0x60 # 0x60 for standard frame. + st %r11, 0(%r15) # Save back chain. + lr %r8, %r0 # Save %r0 (static chain). + lr %r10, %r1 # Save %r1 (address of parameter block). + + l %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 # Extract thread pointer. + l %r1, 0x20(%r1) # Get stack boundary + ar %r1, %r7 # Stack boundary + frame size + a %r1, 4(%r10) # + stack param size + clr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. 
Not needed here, but done to be consistent with + # the below use. + ahi %r7, BACKOFF # Bump requested size a bit. + st %r7, 0x40(%r11) # Stuff frame size on stack. + la %r2, 0x40(%r11) # Pass its address as parameter. + la %r3, 0x60(%r11) # Caller's stack parameters. + l %r4, 4(%r10) # Size of stack parameters. + brasl %r14, __generic_morestack + + lr %r15, %r2 # Switch to the new stack. + ahi %r15, -0x60 # Make a stack frame on it. + st %r11, 0(%r15) # Save back chain. + + s %r2, 0x40(%r11) # The end of stack space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHB0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lr %r0, %r8 # Static chain. + lm %r2, %r6, 0x8(%r11) # Parameter registers. + + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stm %r2, %r3, 0x8(%r11) # Save return registers. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0x60 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x40(%r11) + brasl %r14, __generic_releasestack + + s %r2, 0x40(%r11) # Subtract available space. + ahi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. +.LEHE0: + st %r2, 0x20(%r1) # Save the new stack boundary. + + # We need to restore the old stack pointer before unblocking signals. + # We also need 0x60 bytes for a stack frame. 
Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. + lr %r15, %r11 + ahi %r15, -0x60 + + brasl %r14, __morestack_unblock_signals + + lm %r2, %r15, 0x8(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + l %r9, 0x4(%r10) # Load stack parameter size. + ltr %r9, %r9 # And check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sr %r15, %r9 # Make space on the stack. + la %r8, 0x60(%r15) # Destination. + la %r12, 0x60(%r11) # Source. + lr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + a %r10, 0x8(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0x60(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lm %r6, %r15, 0x18(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. 
+ +.L1: + .cfi_restore_state + lr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lr %r3, %r11 # Get the stack pointer. + sr %r3, %r2 # Subtract available space. + ahi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 # Extract thread pointer. + st %r3, 0x20(%r1) # Save the new stack boundary. + + # We need GOT pointer in %r12 for PLT entry. + larl %r12,_GLOBAL_OFFSET_TABLE_ + lr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#else /* defined(__s390x__) */ + + +# The 64-bit __morestack function. + + # We use a cleanup to restore the stack guard if an exception + # is thrown through this code. +#ifndef __PIC__ + .cfi_personality 0x3,__gcc_personality_v0 + .cfi_lsda 0x3,.LLSDA1 +#else + .cfi_personality 0x9b,DW.ref.__gcc_personality_v0 + .cfi_lsda 0x1b,.LLSDA1 +#endif + + stmg %r2, %r15, 0x10(%r15) # Save %r2-%r15. + .cfi_offset %r6, -0x70 + .cfi_offset %r7, -0x68 + .cfi_offset %r8, -0x60 + .cfi_offset %r9, -0x58 + .cfi_offset %r10, -0x50 + .cfi_offset %r11, -0x48 + .cfi_offset %r12, -0x40 + .cfi_offset %r13, -0x38 + .cfi_offset %r14, -0x30 + .cfi_offset %r15, -0x28 + lgr %r11, %r15 # Make frame pointer for vararg. + .cfi_def_cfa_register %r11 + aghi %r15, -0xa0 # 0xa0 for standard frame. + stg %r11, 0(%r15) # Save back chain. + lgr %r8, %r0 # Save %r0 (static chain). + lgr %r10, %r1 # Save %r1 (address of parameter block). + + lg %r7, 0(%r10) # Required frame size to %r7 + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + lg %r1, 0x38(%r1) # Get stack boundary + agr %r1, %r7 # Stack boundary + frame size + ag %r1, 8(%r10) # + stack param size + clgr %r1, %r15 # Compare with current stack pointer + jle .Lnoalloc # guard > sp - frame-size: need alloc + + brasl %r14, __morestack_block_signals + + # We abuse one of caller's fpr save slots (which we don't use for fprs) + # as a local variable. 
Not needed here, but done to be consistent with + # the below use. + aghi %r7, BACKOFF # Bump requested size a bit. + stg %r7, 0x80(%r11) # Stuff frame size on stack. + la %r2, 0x80(%r11) # Pass its address as parameter. + la %r3, 0xa0(%r11) # Caller's stack parameters. + lg %r4, 8(%r10) # Size of stack parameters. + brasl %r14, __generic_morestack + + lgr %r15, %r2 # Switch to the new stack. + aghi %r15, -0xa0 # Make a stack frame on it. + stg %r11, 0(%r15) # Save back chain. + + sg %r2, 0x80(%r11) # The end of stack space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHB0: + stg %r2, 0x38(%r1) # Save the new stack boundary. + + brasl %r14, __morestack_unblock_signals + + lgr %r0, %r8 # Static chain. + lmg %r2, %r6, 0x10(%r11) # Parameter registers. + + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # State of registers: + # %r0: Static chain from entry. + # %r1: Vararg pointer. + # %r2-%r6: Parameters from entry. + # %r7-%r10: Indeterminate. + # %r11: Frame pointer (%r15 from entry). + # %r12-%r13: Indeterminate. + # %r14: Return address. + # %r15: Stack pointer. + basr %r14, %r10 # Call our caller. + + stg %r2, 0x10(%r11) # Save return register. + + brasl %r14, __morestack_block_signals + + # We need a stack slot now, but have no good way to get it - the frame + # on new stack had to be exactly 0xa0 bytes, or stack parameters would + # be passed wrong. Abuse fpr save area in caller's frame (we don't + # save actual fprs). + la %r2, 0x80(%r11) + brasl %r14, __generic_releasestack + + sg %r2, 0x80(%r11) # Subtract available space. + aghi %r2, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. +.LEHE0: + stg %r2, 0x38(%r1) # Save the new stack boundary. 
+ + # We need to restore the old stack pointer before unblocking signals. + # We also need 0xa0 bytes for a stack frame. Since we had a stack + # frame at this place before the stack switch, there's no need to + # write the back chain again. + lgr %r15, %r11 + aghi %r15, -0xa0 + + brasl %r14, __morestack_unblock_signals + + lmg %r2, %r15, 0x10(%r11) # Restore all registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. + +# Executed if no new stack allocation is needed. + +.Lnoalloc: + .cfi_restore_state + # We may need to copy stack parameters. + lg %r9, 0x8(%r10) # Load stack parameter size. + ltgr %r9, %r9 # Check if it's 0. + je .Lnostackparm # Skip the copy if not needed. + sgr %r15, %r9 # Make space on the stack. + la %r8, 0xa0(%r15) # Destination. + la %r12, 0xa0(%r11) # Source. + lgr %r13, %r9 # Source size. +.Lcopy: + mvcle %r8, %r12, 0 # Copy. + jo .Lcopy + +.Lnostackparm: + # Third parameter is address of function meat - address of parameter + # block. + ag %r10, 0x10(%r10) + + # Leave vararg pointer in %r1, in case function uses it + la %r1, 0xa0(%r11) + + # OK, no stack allocation needed. We still follow the protocol and + # call our caller - it doesn't cost much and makes sure vararg works. + # No need to set any registers here - %r0 and %r2-%r6 weren't modified. + basr %r14, %r10 # Call our caller. + + lmg %r6, %r15, 0x30(%r11) # Restore all callee-saved registers. + .cfi_remember_state + .cfi_restore %r15 + .cfi_restore %r14 + .cfi_restore %r13 + .cfi_restore %r12 + .cfi_restore %r11 + .cfi_restore %r10 + .cfi_restore %r9 + .cfi_restore %r8 + .cfi_restore %r7 + .cfi_restore %r6 + .cfi_def_cfa_register %r15 + br %r14 # Return to caller's caller. 
+ +# This is the cleanup code called by the stack unwinder when unwinding +# through the code between .LEHB0 and .LEHE0 above. + +.L1: + .cfi_restore_state + lgr %r2, %r11 # Stack pointer after resume. + brasl %r14, __generic_findstack + lgr %r3, %r11 # Get the stack pointer. + sgr %r3, %r2 # Subtract available space. + aghi %r3, BACKOFF # Back off a bit. + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 # Extract thread pointer. + stg %r3, 0x38(%r1) # Save the new stack boundary. + + lgr %r2, %r6 # Exception header. +#ifdef __PIC__ + brasl %r14, _Unwind_Resume@PLT +#else + brasl %r14, _Unwind_Resume +#endif + +#endif /* defined(__s390x__) */ + + .cfi_endproc + .size __morestack, . - __morestack + + +# The exception table. This tells the personality routine to execute +# the exception handler. + + .section .gcc_except_table,"a",@progbits + .align 4 +.LLSDA1: + .byte 0xff # @LPStart format (omit) + .byte 0xff # @TType format (omit) + .byte 0x1 # call-site format (uleb128) + .uleb128 .LLSDACSE1-.LLSDACSB1 # Call-site table length +.LLSDACSB1: + .uleb128 .LEHB0-.LFB1 # region 0 start + .uleb128 .LEHE0-.LEHB0 # length + .uleb128 .L1-.LFB1 # landing pad + .uleb128 0 # action +.LLSDACSE1: + + + .global __gcc_personality_v0 +#ifdef __PIC__ + # Build a position independent reference to the basic + # personality function. + .hidden DW.ref.__gcc_personality_v0 + .weak DW.ref.__gcc_personality_v0 + .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat + .type DW.ref.__gcc_personality_v0, @object +DW.ref.__gcc_personality_v0: +#ifndef __LP64__ + .align 4 + .size DW.ref.__gcc_personality_v0, 4 + .long __gcc_personality_v0 +#else + .align 8 + .size DW.ref.__gcc_personality_v0, 8 + .quad __gcc_personality_v0 +#endif +#endif + + + +# Initialize the stack test value when the program starts or when a +# new thread starts. We don't know how large the main stack is, so we +# guess conservatively. We might be able to use getrlimit here. 
+ + .text + .global __stack_split_initialize + .hidden __stack_split_initialize + + .type __stack_split_initialize, @function + +__stack_split_initialize: + +#ifndef __s390x__ + + ear %r1, %a0 + lr %r0, %r15 + ahi %r0, -0x4000 # We should have at least 16K. + st %r0, 0x20(%r1) + + lr %r2, %r15 + lhi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#else /* defined(__s390x__) */ + + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lgr %r0, %r15 + aghi %r0, -0x4000 # We should have at least 16K. + stg %r0, 0x38(%r1) + + lgr %r2, %r15 + lghi %r3, 0x4000 +#ifdef __PIC__ + jg __generic_morestack_set_initial_sp@PLT # Tail call +#else + jg __generic_morestack_set_initial_sp # Tail call +#endif + +#endif /* defined(__s390x__) */ + + .size __stack_split_initialize, . - __stack_split_initialize + +# Routines to get and set the guard, for __splitstack_getcontext, +# __splitstack_setcontext, and __splitstack_makecontext. + +# void *__morestack_get_guard (void) returns the current stack guard. + .text + .global __morestack_get_guard + .hidden __morestack_get_guard + + .type __morestack_get_guard,@function + +__morestack_get_guard: + +#ifndef __s390x__ + ear %r1, %a0 + l %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + lg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_get_guard, . - __morestack_get_guard + +# void __morestack_set_guard (void *) sets the stack guard. + .global __morestack_set_guard + .hidden __morestack_set_guard + + .type __morestack_set_guard,@function + +__morestack_set_guard: + +#ifndef __s390x__ + ear %r1, %a0 + st %r2, 0x20(%r1) +#else + ear %r1, %a0 + sllg %r1, %r1, 32 + ear %r1, %a1 + stg %r2, 0x38(%r1) +#endif + br %r14 + + .size __morestack_set_guard, . - __morestack_set_guard + +# void *__morestack_make_guard (void *, size_t) returns the stack +# guard value for a stack. 
+ .global __morestack_make_guard + .hidden __morestack_make_guard + + .type __morestack_make_guard,@function + +__morestack_make_guard: + +#ifndef __s390x__ + sr %r2, %r3 + ahi %r2, BACKOFF +#else + sgr %r2, %r3 + aghi %r2, BACKOFF +#endif + br %r14 + + .size __morestack_make_guard, . - __morestack_make_guard + +# Make __stack_split_initialize a high priority constructor. + + .section .ctors.65535,"aw",@progbits + +#ifndef __LP64__ + .align 4 + .long __stack_split_initialize + .long __morestack_load_mmap +#else + .align 8 + .quad __stack_split_initialize + .quad __morestack_load_mmap +#endif + + .section .note.GNU-stack,"",@progbits + .section .note.GNU-split-stack,"",@progbits + .section .note.GNU-no-split-stack,"",@progbits diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390 new file mode 100644 index 0000000..4c959b0 --- /dev/null +++ b/libgcc/config/s390/t-stack-s390 @@ -0,0 +1,2 @@ +# Makefile fragment to support -fsplit-stack for s390. +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c index 89765d4..b8eec4e 100644 --- a/libgcc/generic-morestack.c +++ b/libgcc/generic-morestack.c @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len, #elif defined (__i386__) nsp -= 6 * sizeof (void *); #elif defined __powerpc64__ +#elif defined __s390x__ + nsp -= 2 * 160; +#elif defined __s390__ + nsp -= 2 * 96; #else #error "unrecognized target" #endif -- 2.7.0 ^ permalink raw reply [flat|nested] 55+ messages in thread
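The guard helpers in the patch above reduce to plain pointer arithmetic. The following is a minimal C sketch of that arithmetic, not the libgcc implementation: the function names `make_guard` and `initial_guard` are hypothetical, and the value of `BACKOFF_ASSUMED` is an assumption, since the definition of `BACKOFF` is not shown in this part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the real BACKOFF constant is defined earlier in morestack.S;
   0x1000 is used here only for illustration.  */
#define BACKOFF_ASSUMED 0x1000

/* Mirrors __morestack_make_guard: "sgr %r2, %r3; aghi %r2, BACKOFF".
   stack_top is the high end of the stack, size its length in bytes.  */
uintptr_t make_guard(uintptr_t stack_top, size_t size)
{
    return stack_top - size + BACKOFF_ASSUMED;
}

/* Mirrors __stack_split_initialize: the initial guard is set 16K (0x4000)
   below the stack pointer at startup.  */
uintptr_t initial_guard(uintptr_t sp)
{
    return sp - 0x4000;
}
```

The guard sits slightly above the true end of the segment, so that code running without a stack check still has some room left.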
* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-14 16:01 ` Marcin Kościelnicki
@ 2016-02-15 10:21 ` Andreas Krebbel
  2016-02-15 10:44 ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-02-15 10:21 UTC (permalink / raw)
To: Marcin Kościelnicki, uweigand; +Cc: gcc-patches

On 02/14/2016 05:01 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
>
>     * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>     * config/s390/morestack.S: New file.
>     * config/s390/t-stack-s390: New file.
>     * generic-morestack.c (__splitstack_find): Add s390-specific code.
>
> gcc/ChangeLog:
>
>     * common/config/s390/s390-common.c (s390_supports_split_stack):
>     New function.
>     (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>     * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>     * config/s390/s390.c (struct machine_function): New field
>     split_stack_varargs_pointer.
>     (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>     in s390_emit_prologue.
>     (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>     vararg pointer.
>     (morestack_ref): New global.
>     (SPLIT_STACK_AVAILABLE): New macro.
>     (s390_expand_split_stack_prologue): New function.
>     (s390_live_on_entry): New function.
>     (s390_va_start): Use split-stack vararg pointer if appropriate.
>     (s390_asm_file_end): Emit the split-stack note sections.
>     (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>     * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>     (UNSPECV_SPLIT_STACK_CALL): New unspec.
>     (UNSPECV_SPLIT_STACK_DATA): New unspec.
>     (split_stack_prologue): New expand.
>     (split_stack_space_check): New expand.
>     (split_stack_data): New insn.
>     (split_stack_call): New expand.
>     (split_stack_call_*): New insn.
>     (split_stack_cond_call): New expand.
>     (split_stack_cond_call_*): New insn.

Applied. Thanks!

-Andreas-

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-15 10:21 ` Andreas Krebbel
@ 2016-02-15 10:44 ` Marcin Kościelnicki
  0 siblings, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-15 10:44 UTC (permalink / raw)
To: Andreas Krebbel, uweigand; +Cc: gcc-patches

On 15/02/16 11:21, Andreas Krebbel wrote:
> On 02/14/2016 05:01 PM, Marcin Kościelnicki wrote:
>> libgcc/ChangeLog:
>>
>>     * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>>     * config/s390/morestack.S: New file.
>>     * config/s390/t-stack-s390: New file.
>>     * generic-morestack.c (__splitstack_find): Add s390-specific code.
>>
>> gcc/ChangeLog:
>>
>>     * common/config/s390/s390-common.c (s390_supports_split_stack):
>>     New function.
>>     (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>>     * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>>     * config/s390/s390.c (struct machine_function): New field
>>     split_stack_varargs_pointer.
>>     (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>>     in s390_emit_prologue.
>>     (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>>     vararg pointer.
>>     (morestack_ref): New global.
>>     (SPLIT_STACK_AVAILABLE): New macro.
>>     (s390_expand_split_stack_prologue): New function.
>>     (s390_live_on_entry): New function.
>>     (s390_va_start): Use split-stack vararg pointer if appropriate.
>>     (s390_asm_file_end): Emit the split-stack note sections.
>>     (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>>     * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>>     (UNSPECV_SPLIT_STACK_CALL): New unspec.
>>     (UNSPECV_SPLIT_STACK_DATA): New unspec.
>>     (split_stack_prologue): New expand.
>>     (split_stack_space_check): New expand.
>>     (split_stack_data): New insn.
>>     (split_stack_call): New expand.
>>     (split_stack_call_*): New insn.
>>     (split_stack_cond_call): New expand.
>>     (split_stack_cond_call_*): New insn.
>
> Applied. Thanks!
>
> -Andreas-
>

Thanks. And how about that testcase I submitted, does that look OK?

Marcin Kościelnicki

^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH] testsuite/s390: Add __morestack test. 2016-01-29 16:17 ` Andreas Krebbel 2016-02-02 14:52 ` Marcin Kościelnicki @ 2016-02-07 12:22 ` Marcin Kościelnicki 2016-02-19 10:21 ` Andreas Krebbel 1 sibling, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-02-07 12:22 UTC (permalink / raw) To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki gcc/testsuite/ChangeLog: * gcc.target/s390/morestack.c: New test. --- Here's the promised test. gcc/testsuite/ChangeLog | 4 + gcc/testsuite/gcc.target/s390/morestack.c | 260 ++++++++++++++++++++++++++++++ 2 files changed, 264 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/morestack.c diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 8f528b2..26d600f 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2016-02-05 Marcin Kościelnicki <koriakin@0x04.net>: + + * gcc.target/s390/morestack.c: New test. + 2016-02-04 Martin Liska <mliska@suse.cz> * g++.dg/asan/pr69276.C: New test. diff --git a/gcc/testsuite/gcc.target/s390/morestack.c b/gcc/testsuite/gcc.target/s390/morestack.c new file mode 100644 index 0000000..aa28b72 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/morestack.c @@ -0,0 +1,260 @@ +/* Checks proper behavior of __morestack function - specifically, GPR + values surviving, stack parameters being copied, and vararg + pointer being correct. */ + +/* { dg-do run } */ +/* { dg-options "" } */ + +#include <stdlib.h> + +void *orig_r15; + +/* 1. Function "test" saves registers, makes a stack frame, puts known + * values in registers, and calls __morestack, telling it to jump to + * testinner, with return address pointing to "testret". + * 2. "testinner" checks that parameter registers match what has been + * passed from "test", stack parameters were copied properly to + * the new stack, and the argument pointer matches the calling + * function's stack pointer.
It then leaves new values in volatile + * registers (including return value registers) and returns. + * 3. "testret" checks that return value registers contain the expected + * return value, callee-saved GPRs match the values from "test", + * and then returns to main. */ + +extern unsigned long testparams[3]; + +#ifdef __s390x__ + +asm( + ".global test\n" + "test:\n" + ".type test, @function\n" + /* Save registers. */ + "stmg %r6, %r15, 0x30(%r15)\n" + /* Save original sp in a global. */ + "larl %r1, orig_r15\n" + "stg %r15, 0(%r1)\n" + /* Make a stack frame. */ + "aghi %r15, -168\n" + /* A stack parameter. */ + "lghi %r1, 0x1240\n" + "stg %r1, 160(%r15)\n" + /* Registers. */ + "lghi %r0, 0x1230\n" + "lghi %r2, 0x1232\n" + "lghi %r3, 0x1233\n" + "lghi %r4, 0x1234\n" + "lghi %r5, 0x1235\n" + "lghi %r6, 0x1236\n" + "lghi %r7, 0x1237\n" + "lghi %r8, 0x1238\n" + "lghi %r9, 0x1239\n" + "lghi %r10, 0x123a\n" + "lghi %r11, 0x123b\n" + "lghi %r12, 0x123c\n" + "lghi %r13, 0x123d\n" + /* Fake return address. */ + "larl %r14, testret\n" + /* Call morestack. */ + "larl %r1, testparams\n" + "jg __morestack\n" + + /* Entry point. */ + "testinner:\n" + /* Check registers. */ + "cghi %r0, 0x1230\n" + "jne testerr\n" + "cghi %r2, 0x1232\n" + "jne testerr\n" + "cghi %r3, 0x1233\n" + "jne testerr\n" + "cghi %r4, 0x1234\n" + "jne testerr\n" + "cghi %r5, 0x1235\n" + "jne testerr\n" + "cghi %r6, 0x1236\n" + "jne testerr\n" + /* Check stack param. */ + "lg %r0, 0xa0(%r15)\n" + "cghi %r0, 0x1240\n" + "jne testerr\n" + /* Check argument pointer. */ + "aghi %r1, 8\n" + "larl %r2, orig_r15\n" + "cg %r1, 0(%r2)\n" + "jne testerr\n" + /* Modify volatile registers. */ + "lghi %r0, 0x1250\n" + "lghi %r1, 0x1251\n" + "lghi %r2, 0x1252\n" + "lghi %r3, 0x1253\n" + "lghi %r4, 0x1254\n" + "lghi %r5, 0x1255\n" + /* Return. */ + "br %r14\n" + + /* Returns here. */ + "testret:\n" + /* Check return registers. */ + "cghi %r2, 0x1252\n" + "jne testerr\n" + /* Check callee-saved registers. 
*/ + "cghi %r6, 0x1236\n" + "jne testerr\n" + "cghi %r7, 0x1237\n" + "jne testerr\n" + "cghi %r8, 0x1238\n" + "jne testerr\n" + "cghi %r9, 0x1239\n" + "jne testerr\n" + "cghi %r10, 0x123a\n" + "jne testerr\n" + "cghi %r11, 0x123b\n" + "jne testerr\n" + "cghi %r12, 0x123c\n" + "jne testerr\n" + "cghi %r13, 0x123d\n" + "jne testerr\n" + /* Return. */ + "lmg %r6, %r15, 0xd8(%r15)\n" + "br %r14\n" + + /* Parameters block. */ + ".section .data\n" + ".align 8\n" + "testparams:\n" + ".quad 160\n" + ".quad 8\n" + ".quad testinner-testparams\n" + ".text\n" +); + +#else + +asm( + ".global test\n" + "test:\n" + ".type test, @function\n" + /* Save registers. */ + "stm %r6, %r15, 0x18(%r15)\n" + /* Save original sp in a global. */ + "larl %r1, orig_r15\n" + "st %r15, 0(%r1)\n" + /* Make a stack frame. */ + "ahi %r15, -0x68\n" + /* A stack parameter. */ + "lhi %r1, 0x1240\n" + "st %r1, 0x60(%r15)\n" + "lhi %r1, 0x1241\n" + "st %r1, 0x64(%r15)\n" + /* Registers. */ + "lhi %r0, 0x1230\n" + "lhi %r2, 0x1232\n" + "lhi %r3, 0x1233\n" + "lhi %r4, 0x1234\n" + "lhi %r5, 0x1235\n" + "lhi %r6, 0x1236\n" + "lhi %r7, 0x1237\n" + "lhi %r8, 0x1238\n" + "lhi %r9, 0x1239\n" + "lhi %r10, 0x123a\n" + "lhi %r11, 0x123b\n" + "lhi %r12, 0x123c\n" + "lhi %r13, 0x123d\n" + /* Fake return address. */ + "larl %r14, testret\n" + /* Call morestack. */ + "larl %r1, testparams\n" + "jg __morestack\n" + + /* Entry point. */ + "testinner:\n" + /* Check registers. */ + "chi %r0, 0x1230\n" + "jne testerr\n" + "chi %r2, 0x1232\n" + "jne testerr\n" + "chi %r3, 0x1233\n" + "jne testerr\n" + "chi %r4, 0x1234\n" + "jne testerr\n" + "chi %r5, 0x1235\n" + "jne testerr\n" + "chi %r6, 0x1236\n" + "jne testerr\n" + /* Check stack param. */ + "l %r0, 0x60(%r15)\n" + "chi %r0, 0x1240\n" + "jne testerr\n" + "l %r0, 0x64(%r15)\n" + "chi %r0, 0x1241\n" + "jne testerr\n" + /* Check argument pointer. */ + "ahi %r1, 8\n" + "larl %r2, orig_r15\n" + "c %r1, 0(%r2)\n" + "jne testerr\n" + /* Modify volatile registers. 
*/ + "lhi %r0, 0x1250\n" + "lhi %r1, 0x1251\n" + "lhi %r2, 0x1252\n" + "lhi %r3, 0x1253\n" + "lhi %r4, 0x1254\n" + "lhi %r5, 0x1255\n" + /* Return. */ + "br %r14\n" + + /* Returns here. */ + "testret:\n" + /* Check return registers. */ + "chi %r2, 0x1252\n" + "jne testerr\n" + "chi %r3, 0x1253\n" + "jne testerr\n" + /* Check callee-saved registers. */ + "chi %r6, 0x1236\n" + "jne testerr\n" + "chi %r7, 0x1237\n" + "jne testerr\n" + "chi %r8, 0x1238\n" + "jne testerr\n" + "chi %r9, 0x1239\n" + "jne testerr\n" + "chi %r10, 0x123a\n" + "jne testerr\n" + "chi %r11, 0x123b\n" + "jne testerr\n" + "chi %r12, 0x123c\n" + "jne testerr\n" + "chi %r13, 0x123d\n" + "jne testerr\n" + /* Return. */ + "lm %r6, %r15, 0x80(%r15)\n" + "br %r14\n" + + /* Parameters block. */ + ".section .data\n" + ".align 4\n" + "testparams:\n" + ".long 96\n" + ".long 8\n" + ".long testinner-testparams\n" + ".text\n" +); + +#endif + +_Noreturn void testerr (void) { + exit(1); +} + +extern void test (void); + +int main (void) { + test(); + /* Now try again, with huge stack frame requested - to exercise + both paths in __morestack (new allocation needed or not). */ + testparams[0] = 1000000; + test(); + return 0; +} -- 2.7.0 ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH] testsuite/s390: Add __morestack test.
  2016-02-07 12:22 ` [PATCH] testsuite/s390: Add __morestack test Marcin Kościelnicki
@ 2016-02-19 10:21 ` Andreas Krebbel
  0 siblings, 0 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-02-19 10:21 UTC (permalink / raw)
To: Marcin Kościelnicki; +Cc: gcc-patches

On 02/07/2016 01:22 PM, Marcin Kościelnicki wrote:
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/s390/morestack.c: New test.

Applied. Thanks!

-Andreas-

^ permalink raw reply [flat|nested] 55+ messages in thread
* [PATCH 1/5] s390: Use proper read-only data section for literals. 2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki ` (3 preceding siblings ...) 2016-01-02 19:17 ` [PATCH 5/5] s390: Add -fsplit-stack support Marcin Kościelnicki @ 2016-01-02 19:17 ` Marcin Kościelnicki 2016-01-20 13:11 ` Andreas Krebbel 2016-01-03 3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor 5 siblings, 1 reply; 55+ messages in thread From: Marcin Kościelnicki @ 2016-01-02 19:17 UTC (permalink / raw) To: gcc-patches; +Cc: Marcin Kościelnicki Previously, .rodata was hardcoded. For C++ vague linkage functions, this resulted in needlessly duplicated literals. With the new split stack support, this resulted in link errors, due to .rodata containing relocations to the discarded text sections. gcc/ChangeLog: * config/s390/s390.md (pool_section_start): Use switch_to_section to select proper read-only data section instead of hardcoding .rodata. (pool_section_end): Use switch_to_section to match the above. --- gcc/ChangeLog | 6 ++++++ gcc/config/s390/s390.md | 11 +++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 23ce209..2c572a7 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net> + + * config/s390/s390.md (pool_section_start): Use switch_to_section + to select proper read-only data section instead of hardcoding .rodata. + (pool_section_end): Use switch_to_section to match the above. 
+ 2016-01-01 Sandra Loosemore <sandra@codesourcery.com> PR 1078 diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index a1fc96a..0ebefd6 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -10247,13 +10247,20 @@ (define_insn "pool_section_start" [(unspec_volatile [(const_int 1)] UNSPECV_POOL_SECTION)] "" - ".section\t.rodata" +{ + switch_to_section (targetm.asm_out.function_rodata_section + (current_function_decl)); + return ""; +} [(set_attr "length" "0")]) (define_insn "pool_section_end" [(unspec_volatile [(const_int 0)] UNSPECV_POOL_SECTION)] "" - ".previous" +{ + switch_to_section (current_function_section ()); + return ""; +} [(set_attr "length" "0")]) (define_insn "main_base_31_small" -- 2.6.4 ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-02 19:17 ` [PATCH 1/5] s390: Use proper read-only data section for literals Marcin Kościelnicki
@ 2016-01-20 13:11 ` Andreas Krebbel
  2016-01-21 6:56 ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-20 13:11 UTC (permalink / raw)
To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> Previously, .rodata was hardcoded. For C++ vague linkage functions,
> this resulted in needlessly duplicated literals. With the new split
> stack support, this resulted in link errors, due to .rodata containing
> relocations to the discarded text sections.
>
> gcc/ChangeLog:
>
>     * config/s390/s390.md (pool_section_start): Use switch_to_section
>     to select proper read-only data section instead of hardcoding .rodata.
>     (pool_section_end): Use switch_to_section to match the above.
> ---
> gcc/ChangeLog | 6 ++++++
> gcc/config/s390/s390.md | 11 +++++++++--
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 23ce209..2c572a7 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
> +
> +    * config/s390/s390.md (pool_section_start): Use switch_to_section
> +    to select proper read-only data section instead of hardcoding .rodata.
> +    (pool_section_end): Use switch_to_section to match the above.
> +

This is ok if bootstrap and regression tests are clean. Thanks!

-Andreas-

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-20 13:11 ` Andreas Krebbel
@ 2016-01-21 6:56 ` Marcin Kościelnicki
  2016-01-21 8:17 ` Mike Stump
  2016-01-21 9:46 ` Andreas Krebbel
  0 siblings, 2 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-21 6:56 UTC (permalink / raw)
To: Andreas Krebbel, gcc-patches

On 20/01/16 14:11, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> Previously, .rodata was hardcoded. For C++ vague linkage functions,
>> this resulted in needlessly duplicated literals. With the new split
>> stack support, this resulted in link errors, due to .rodata containing
>> relocations to the discarded text sections.
>>
>> gcc/ChangeLog:
>>
>>     * config/s390/s390.md (pool_section_start): Use switch_to_section
>>     to select proper read-only data section instead of hardcoding .rodata.
>>     (pool_section_end): Use switch_to_section to match the above.
>> ---
>> gcc/ChangeLog | 6 ++++++
>> gcc/config/s390/s390.md | 11 +++++++++--
>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 23ce209..2c572a7 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
>> +
>> +    * config/s390/s390.md (pool_section_start): Use switch_to_section
>> +    to select proper read-only data section instead of hardcoding .rodata.
>> +    (pool_section_end): Use switch_to_section to match the above.
>> +
>
> This is ok if bootstrap and regression tests are clean. Thanks!
>
> -Andreas-
>
>

The bootstrap and regression tests are indeed clean for this patch and #2. I don't have commit access to gcc repo, how do I get this pushed?

Marcin Kościelnicki

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 1/5] s390: Use proper read-only data section for literals. 2016-01-21 6:56 ` Marcin Kościelnicki @ 2016-01-21 8:17 ` Mike Stump 2016-01-21 9:46 ` Andreas Krebbel 1 sibling, 0 replies; 55+ messages in thread From: Mike Stump @ 2016-01-21 8:17 UTC (permalink / raw) To: Marcin Kościelnicki; +Cc: Andreas Krebbel, gcc-patches On Jan 20, 2016, at 10:56 PM, Marcin Kościelnicki <koriakin@0x04.net> wrote: >> This is ok if bootstrap and regression tests are clean. Thanks! > The bootstrap and regression tests are indeed clean for this patch and #2. I don't have commit access to gcc repo, how do I get this pushed? Just ask someone to apply it for you. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-21 6:56 ` Marcin Kościelnicki
  2016-01-21 8:17 ` Mike Stump
@ 2016-01-21 9:46 ` Andreas Krebbel
  1 sibling, 0 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-21 9:46 UTC (permalink / raw)
To: Marcin Kościelnicki, gcc-patches

On 01/21/2016 07:56 AM, Marcin Kościelnicki wrote:
>>> +2016-01-02 Marcin Kościelnicki <koriakin@0x04.net>
>>> +
>>> +    * config/s390/s390.md (pool_section_start): Use switch_to_section
>>> +    to select proper read-only data section instead of hardcoding .rodata.
>>> +    (pool_section_end): Use switch_to_section to match the above.
>>> +
>>
>> This is ok if bootstrap and regression tests are clean. Thanks!
>>
>> -Andreas-
>>
>>
>
> The bootstrap and regression tests are indeed clean for this patch and
> #2. I don't have commit access to gcc repo, how do I get this pushed?

Committed to mainline. Thanks!

-Andreas-

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC] [PR 68191] s390: Add -fsplit-stack support. 2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki ` (4 preceding siblings ...) 2016-01-02 19:17 ` [PATCH 1/5] s390: Use proper read-only data section for literals Marcin Kościelnicki @ 2016-01-03 3:21 ` Ian Lance Taylor 2016-01-03 10:32 ` Marcin Kościelnicki 2016-01-04 7:35 ` Marcin Kościelnicki 5 siblings, 2 replies; 55+ messages in thread From: Ian Lance Taylor @ 2016-01-03 3:21 UTC (permalink / raw) To: Marcin Kościelnicki; +Cc: gcc-patches On Sat, Jan 2, 2016 at 11:16 AM, Marcin Kościelnicki <koriakin@0x04.net> wrote: > > The differences start in the __morestack calling convention. Basically, > since pushing things on stuck is unwieldy and there's only one free > register (%r0 could be used for static chain, %r2-%r6 contain arguments, > %r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata > or .text section, and pass the address of the parameter block in %r1. > The parameter block also contains a (position-relative) address that > __morestack should jump to (x86 just mangles the return address from > __morestack to compute that). On zSeries CPUs, the parameter block > is stuffed somewhere in .rodata, its address loaded to %r1 by larl > instruction, and __morestack is sibling-called by jg instruction. Does that work in a multi-threaded program if two different threads are calling the same function at the same time and both threads need to split the stack? Ian ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC] [PR 68191] s390: Add -fsplit-stack support.
  2016-01-03 3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor
@ 2016-01-03 10:32 ` Marcin Kościelnicki
  2016-01-04 7:35 ` Marcin Kościelnicki
  1 sibling, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-03 10:32 UTC (permalink / raw)
To: Ian Lance Taylor; +Cc: gcc-patches

On 03/01/16 04:20, Ian Lance Taylor wrote:
> On Sat, Jan 2, 2016 at 11:16 AM, Marcin Kościelnicki <koriakin@0x04.net> wrote:
>>
>> The differences start in the __morestack calling convention. Basically,
>> since pushing things on stuck is unwieldy and there's only one free
>> register (%r0 could be used for static chain, %r2-%r6 contain arguments,
>> %r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata
>> or .text section, and pass the address of the parameter block in %r1.
>> The parameter block also contains a (position-relative) address that
>> __morestack should jump to (x86 just mangles the return address from
>> __morestack to compute that). On zSeries CPUs, the parameter block
>> is stuffed somewhere in .rodata, its address loaded to %r1 by larl
>> instruction, and __morestack is sibling-called by jg instruction.
>
> Does that work in a multi-threaded program if two different threads
> are calling the same function at the same time and both threads need
> to split the stack?
>
> Ian
>

Sure, why not? The parameters are link-time constants after all.

Marcin Kościelnicki

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [RFC] [PR 68191] s390: Add -fsplit-stack support.
  2016-01-03 3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor
  2016-01-03 10:32 ` Marcin Kościelnicki
@ 2016-01-04 7:35 ` Marcin Kościelnicki
  1 sibling, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-04 7:35 UTC (permalink / raw)
To: Ian Lance Taylor; +Cc: gcc-patches

On 03/01/16 04:20, Ian Lance Taylor wrote:
> On Sat, Jan 2, 2016 at 11:16 AM, Marcin Kościelnicki <koriakin@0x04.net> wrote:
>>
>> The differences start in the __morestack calling convention. Basically,
>> since pushing things on stuck is unwieldy and there's only one free
>> register (%r0 could be used for static chain, %r2-%r6 contain arguments,
>> %r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata
>> or .text section, and pass the address of the parameter block in %r1.
>> The parameter block also contains a (position-relative) address that
>> __morestack should jump to (x86 just mangles the return address from
>> __morestack to compute that). On zSeries CPUs, the parameter block
>> is stuffed somewhere in .rodata, its address loaded to %r1 by larl
>> instruction, and __morestack is sibling-called by jg instruction.
>
> Does that work in a multi-threaded program if two different threads
> are calling the same function at the same time and both threads need
> to split the stack?

For a few more details: __morestack takes three parameters:

- the function's frame size (the initial frame size, if it happens to use alloca or VLAs later)
- the size of the function's arguments on the stack (not including varargs, if any)
- a pointer to the label where execution should continue after the stack is allocated

All three are per-function constants. The first two are computed by the compiler (though the frame size can be adjusted by the linker for functions calling non-split-stack code), and the third by the linker (since it involves a relocation).

Since the parameters are known at link time, they're put in a per-function block in .rodata or .text and never change. Simultaneous access to that area is not a problem, since it's never written.

Marcin Kościelnicki

>
> Ian
>

^ permalink raw reply [flat|nested] 55+ messages in thread
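The three-field parameter block described in the message above can be sketched in C for the 64-bit case. This is an illustration only, not code from the patch: the struct and function names are hypothetical, while the field order and the position-relative third field follow the `.quad` layout of `testparams` in the test and the `ag %r10, 0x10(%r10)` instruction in morestack.S. The check mirrors the `clgr`/`jle .Lnoalloc` comparison of guard + frame size + stack parameter size against the stack pointer.

```c
#include <stdint.h>

/* Hypothetical C view of the per-function, read-only parameter block.  */
struct morestack_params {
    uint64_t frame_size;  /* offset 0x00: required frame size */
    uint64_t args_size;   /* offset 0x08: size of stack-passed arguments */
    int64_t  resume_off;  /* offset 0x10: resume label minus block address */
};

/* Address __morestack continues at: block address plus the stored offset.  */
uintptr_t resume_address(const struct morestack_params *p)
{
    return (uintptr_t)p + (uintptr_t)p->resume_off;
}

/* Nonzero when a new stack segment is needed, i.e. when
   guard + frame size + argument size lies above the stack pointer.  */
int needs_new_segment(uintptr_t guard, uintptr_t sp,
                      const struct morestack_params *p)
{
    return guard + p->frame_size + p->args_size > sp;
}
```

Because every field is fixed at link time and only ever read, any number of threads can pass the same block to `__morestack` at once without synchronization.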