public inbox for gcc-patches@gcc.gnu.org
* [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki
  2016-01-02 19:16 ` [PATCH 2/5] s390: Fix missing .size directives Marcin Kościelnicki
@ 2016-01-02 19:16 ` Marcin Kościelnicki
  2016-01-21 10:05   ` Andreas Krebbel
  2016-04-17 21:24   ` Jeff Law
  2016-01-02 19:16 ` [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump Marcin Kościelnicki
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcin Kościelnicki

When an unconditional jump with side effects targets an immediately
following label, rtl_tidy_fallthru_edge is called.  Since it has side
effects, it doesn't remove the jump, but the label is still marked
as fallthru.  This later causes a verification error.  Do nothing in this
case instead.

gcc/ChangeLog:

	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
	with side effects.
---
 gcc/ChangeLog | 5 +++++
 gcc/cfgrtl.c  | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 56e31f6..4c7046f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
 
+	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
+	with side effects.
+
+2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
 	* function.c (reposition_prologue_and_epilogue_notes): Avoid
 	verification error if the last insn of prologue is an unconditional
 	jump.
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index fbfc7cd..dc4c2b1 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1762,6 +1762,8 @@ rtl_tidy_fallthru_edge (edge e)
      If block B consisted only of this single jump, turn it into a deleted
      note.  */
   q = BB_END (b);
+  if (JUMP_P (q) && !onlyjump_p (q))
+    return;
   if (JUMP_P (q)
       && onlyjump_p (q)
       && (any_uncondjump_p (q)
-- 
2.6.4


* [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump.
  2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki
  2016-01-02 19:16 ` [PATCH 2/5] s390: Fix missing .size directives Marcin Kościelnicki
  2016-01-02 19:16 ` [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU Marcin Kościelnicki
@ 2016-01-02 19:16 ` Marcin Kościelnicki
  2016-04-17 21:25   ` Jeff Law
  2016-01-02 19:17 ` [PATCH 5/5] s390: Add -fsplit-stack support Marcin Kościelnicki
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcin Kościelnicki

With the new s390 split-stack support, when optimization is enabled,
the cold path of calling __morestack is likely to be moved to the
end of the function.  This will result in the function ending in
split_stack_call_esa, which is an unconditional jump instruction and
part of the function prologue.  reposition_prologue_and_epilogue_notes
will insert NOTE_INSN_PROLOGUE_END right after it (and before the
following barrier), causing a verification error.  Insert it after
the barrier instead (and outside of basic block).

gcc/ChangeLog:

	* function.c (reposition_prologue_and_epilogue_notes): Avoid
	verification error if the last insn of prologue is an unconditional
	jump.
---
 gcc/ChangeLog  | 6 ++++++
 gcc/function.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6aef3f9..56e31f6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
 
+	* function.c (reposition_prologue_and_epilogue_notes): Avoid
+	verification error if the last insn of prologue is an unconditional
+	jump.
+
+2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
 	* config/s390/s390.c (s390_asm_declare_function_size): Add code
 	to actually emit the .size directive.
 
diff --git a/gcc/function.c b/gcc/function.c
index 035a49e..921945f 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -6348,6 +6348,12 @@ reposition_prologue_and_epilogue_notes (void)
 	  /* Avoid placing note between CODE_LABEL and BASIC_BLOCK note.  */
 	  if (LABEL_P (last))
 	    last = NEXT_INSN (last);
+	  if (BARRIER_P (last) && BLOCK_FOR_INSN (note))
+	    {
+	      if (BB_END (BLOCK_FOR_INSN (note)) == note)
+		BB_END (BLOCK_FOR_INSN (note)) = PREV_INSN (note);
+	      BLOCK_FOR_INSN (note) = 0;
+	    }
 	  reorder_insns (note, note, last);
 	}
     }
-- 
2.6.4


* [PATCH 2/5] s390: Fix missing .size directives.
  2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki
@ 2016-01-02 19:16 ` Marcin Kościelnicki
  2016-01-20 13:16   ` Andreas Krebbel
  2016-01-02 19:16 ` [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU Marcin Kościelnicki
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcin Kościelnicki

It seems at some point the .size hook was hijacked to emit some
machine-specific directives, and the actual .size directive was
forgotten.  This caused problems for split-stack support, since the
linker couldn't scan the function body for non-split-stack calls.

gcc/ChangeLog:

	* config/s390/s390.c (s390_asm_declare_function_size): Add code
	to actually emit the .size directive.
---
 gcc/ChangeLog          | 5 +++++
 gcc/config/s390/s390.c | 4 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2c572a7..6aef3f9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
 
+	* config/s390/s390.c (s390_asm_declare_function_size): Add code
+	to actually emit the .size directive.
+
+2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
 	* config/s390/s390.md (pool_section_start): Use switch_to_section
 	to select proper read-only data section instead of hardcoding .rodata.
 	(pool_section_end): Use switch_to_section to match the above.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 16045f0..9dc8d1e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6834,8 +6834,10 @@ s390_asm_output_function_prefix (FILE *asm_out_file,
 
 void
 s390_asm_declare_function_size (FILE *asm_out_file,
-				const char *fnname ATTRIBUTE_UNUSED, tree decl)
+				const char *fnname, tree decl)
 {
+  if (!flag_inhibit_size_directive)
+    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
   if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
     return;
   fprintf (asm_out_file, "\t.machine pop\n");
-- 
2.6.4


* [RFC] [PR 68191] s390: Add -fsplit-stack support.
@ 2016-01-02 19:16 Marcin Kościelnicki
  2016-01-02 19:16 ` [PATCH 2/5] s390: Fix missing .size directives Marcin Kościelnicki
                   ` (5 more replies)
  0 siblings, 6 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-02 19:16 UTC (permalink / raw)
  To: gcc-patches

Here's my attempt at adding -fsplit-stack support for s390 targets
(bug 68191).  Patches 1 and 2 fix s390-specific issues affecting split
stack code, and can be pushed independently of the main course.  Patches
3 and 4 attempt to fix target-independent issues involving unconditional
jumps with side effects (see below).  I'm not exactly sure I'm doing
the right thing in these, and I'd really welcome some feedback about
them and the general approach taken.  Patch 5 is split stack support
proper.  This patch should be used along with the matching glibc and
gold patches (I'll soon link them all in the bugzilla entry).

The generic approach is identical to x86: I add a new __private_ss
field to the TCB in glibc, add a target-specific __morestack function
and friends, emit a split-stack prologue, teach va_start to deal with
a dedicated vararg pointer, and teach gold to recognize the split-stack
prologue and handle non-split-stack calls by bumping the requested
frame size.

The differences start in the __morestack calling convention.  Basically,
since pushing things on the stack is unwieldy and there's only one free
register (%r0 could be used for static chain, %r2-%r6 contain arguments,
%r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata
or .text section, and pass the address of the parameter block in %r1.
The parameter block also contains a (position-relative) address that
__morestack should jump to (x86 just mangles the return address from
__morestack to compute that).  On zSeries CPUs, the parameter block
is stuffed somewhere in .rodata, its address loaded to %r1 by larl
instruction, and __morestack is sibling-called by jg instruction.
On older CPUs, lacking long jump and PC-relative load-address
instructions, I use the following sequence instead:

# load .L1 to %r1
basr %r1, 0 
.L1:
# Load __morestack to %r1
a %r1, .L2-.L1(%r1)
# Jump to __morestack and stuff return address (aka param block address)
# to %r1.
basr %r1, %r1
# param block comes here
.L3:
.long <frame_size>
.long <args_size>
.long .L4-.L3
# relative __morestack address here
.L2:
.long __morestack-.L1
.L4:
# __morestack jumps here
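
For illustration only (this is not part of the patch), the 31-bit
parameter block above can be modelled in C.  The struct and function
names are hypothetical, and the layout is just the three .long words
at .L3; the sketch shows how the resume address follows from the
block address that ends up in %r1 plus the stored relative offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C view of the 31-bit parameter block at .L3:
   three 32-bit words, mirroring the .long directives above.  */
struct morestack_params {
    uint32_t frame_size;    /* <frame_size> */
    uint32_t args_size;     /* <args_size> */
    int32_t  resume_offset; /* .L4 - .L3, relative to the block itself */
};

/* Address execution would continue at: the address of the parameter
   block (what %r1 holds on entry) plus the stored relative offset.  */
static uintptr_t
resume_address(uintptr_t block_addr, const struct morestack_params *p)
{
    assert(p != 0);
    return block_addr + (uintptr_t)(intptr_t)p->resume_offset;
}
```

In the sequence above, the block is 12 bytes long and is followed by
the 4-byte relative __morestack address, so .L4-.L3 would be 16.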

As on other targets, the call to __morestack is conditional, based
on comparing the stack pointer with a field in TCB.  For zSeries,
I just make the jump to __morestack a conditional one, while for
older CPUs I emit a jump over the sequence.  Also, for vararg
functions, I need to stuff the vararg pointer in some register. Since
%r1 is again the only one guaranteed to be free, it's the one used.
If __morestack is called, it'll leave the correct pointer in %r1.
Otherwise, I emit a simple load-address instruction.  Since I only
need that instruction in the not-called branch (as opposed to x86
that emits it on both branches), I get terser code.

Now, here come the problems.  To keep optimization passes from
destroying the above sequence (as well as the simpler ones with larl),
I emit a pseudo-insn (split_stack_call_*) that is expanded to the
above in machine-dependent reorg phase, just like normal const pools.
The instruction is considered to be an unconditional jump to the .L4
label (since __morestack will jump to an arbitrary address selected
by param block anyway, that's what it effectively is).  For a zSeries
CPU with a conditional call, I represent the sequence as a conditional
jump instead.  So overall the sequences, as emitted by
s390_expand_split_stack_prologue, look as follows:

# (1) Old CPU, unconditional
<call __morestack using basr as above, jump to .L4>
.L4:
# Normal prologue starts here.

# (2) zSeries CPU, unconditional
<call __morestack using larl+jg, jump to .L4>
.L4:
# Normal prologue starts here.

# Which will expand to:
larl %r1, .L3
jg __morestack
.section .rodata
.L3:
# Or .long for 31-bit target.
.quad <frame_size>
.quad <args_size>
.quad .L4-.L3
.text

# (3) Old CPU, conditional
<load and compare the guard against stack pointer - nothing interesting>
jhe .L5
<call __morestack using basr, jump to .L4>
.L5:
# Compute vararg pointer (vararg functions only)
la %r1, 96(%r15)
.L4:
# Normal prologue starts here.

# (4) zSeries CPU, conditional
<load and compare the guard against stack pointer>
<conditionally call __morestack using larl+jgl, if called jump to .L4>
# Compute vararg pointer (vararg functions only)
la %r1, 160(%r15)
.L4:
# Normal prologue starts here.
# Expands as above, except with jgl instead of jg.

Case (4) is the least problematic: conditional jumps with side effects
appear to work quite well.  However, the other variants involve an
unconditional jump with side effects, which causes two problems:

- If the jump is to the immediately following label (which always happens
  in cases (1) and (2), and for non-vararg functions in (3)),
  rtl_tidy_fallthru_edge mistakenly marks it as a fallthru edge, even
  though it correctly figures the jump cannot be removed due to the side
  effects.  This causes a verification failure later.
- In case (3), since the call to __morestack is considered to be unlikely,
  the basic block with the call pseudo-insn will be moved to the end of
  the function if we're optimizing.  Since it already ends with
  an unconditional jump, no new jump will be inserted (as opposed to x86).
  Soon afterwards, reposition_prologue_and_epilogue_notes will move
  NOTE_INSN_PROLOGUE_END after the last prologue instruction, which is now
  our pseudo-jump.  Unfortunately, it doesn't consider the possibility of
  it being an unconditional jump, and stuffs the note right between the
  jump and the following barrier, again causing a verification failure.

Patches 3 and 4 of the patchset attempt to fix the above problems.
For the first one, I just skip the edge if it involves an unconditional
jump with side effects.  For the second, I carefully extract the note
from its basic block and put it after the barrier.  I'm not sure any
of it is the right approach, and would welcome any feedback.
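
As a rough illustration of the first fix, here is a toy C model of
the decision rtl_tidy_fallthru_edge makes about the last insn of
a block.  The struct, enum and function names are hypothetical
stand-ins rather than GCC's real types, and the model deliberately
ignores the any_uncondjump_p / single-successor part of the real
condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the last insn of block B.  */
struct last_insn {
    bool is_jump;   /* models JUMP_P (q) */
    bool only_jump; /* models onlyjump_p (q): no side effects */
};

enum tidy_action {
    TIDY_LEAVE_ALONE,          /* edge is not marked as fallthru */
    TIDY_MARK_FALLTHRU,        /* no jump to remove, just mark the edge */
    TIDY_DELETE_JUMP_AND_MARK  /* plain jump: delete it, mark the edge */
};

/* Simplified decision with the early return from patch 4 applied.  */
static enum tidy_action
tidy_fallthru(struct last_insn q)
{
    /* Jump with side effects: it cannot be deleted, so the edge must
       not become a fallthru edge either.  */
    if (q.is_jump && !q.only_jump)
        return TIDY_LEAVE_ALONE;
    if (q.is_jump)
        return TIDY_DELETE_JUMP_AND_MARK;
    return TIDY_MARK_FALLTHRU;
}
```

Without the first check, the side-effect case would fall through to
the final return and mark the edge while the jump stays in place,
which is the inconsistency that trips the verifier.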

I've also found a target-independent issue with -fsplit-stack: suppose
we're compiling with -fsplit-stack and -fprofile-use or some other option
that will partition the code into hot and cold sections.  Further suppose
that the code that ends up in .text.unlikely involves a function call
aiming at a function compiled without -fsplit-stack.  In that case, the
linker should obviously perform the necessary transforms on the function
prologue to bump its frame-size.  However, since the code in
.text.unlikely doesn't really belong to function foo according to the
symbol table, one of the following happens instead:

- x86: since foo.cold.0 is not a function (STT_NOTYPE), it's not scanned
  for calls to -fno-split-stack functions, and may easily result in
  a stack overflow at runtime.
- s390: since foo.cold.0 *is* a function (STT_FUNC), it's scanned for
  such calls, and the linker tries to modify foo.cold.0's split-stack
  prologue.  This fails with a linker error, since it obviously doesn't
  have one.

I have no idea what to do about that.  Since mixing split-stack code with
-fno-split-stack is horribly broken in many ways, I'm tempted to just
ignore the problem.

A few other non-obvious problems and notes:

- For old CPUs, in case (3), optimization will move the call to the end
  of the function... but since branches on s390 reach only 4kiB in either
  direction, s390_split_branches may attempt to split the branch to
  that block, which would fail horribly since it's before proper prologue
  and we cannot clobber %r14.  I detect this case and move the basic block
  back to its original location instead.
- Likewise, s390_split_branches needed to be taught not to look at the
  __morestack call pseudo-insn (which is considered a jump).  It'd only
  get confused.
- s390_chunkify_start is responsible for reloading the const pool register
  when branches are made between portions of a function using different
  const pools.  In case (3), we likewise cannot do that, since %r13 cannot
  be clobbered yet.  I just disable emitting the const pool reload in this
  case.
- The (ordinary) prologue needs a temp register for its own use.  As per
  the above rationale, it also tends to pick %r1, which collides with us
  using it for the vararg pointer.  There already was a condition that
  picks %r14 instead, if possible.  I amended it to pick %r12 if %r1
  would be picked in a vararg split-stack function, and modified
  s390_register_info to consider it clobbered in this case.
- For leaf functions, there's a possibility that frame_size will be 0.
  In this case, there's no point in doing the __morestack dance.  However,
  we need some way to tell a split-stack function apart in the linker
  and perhaps at runtime as well, if non-split function-pointer calls are
  ever implemented.  We may be able to get away without that, but just in
  case, I emit a funny nop (nopr %r15) instead of split-stack prologue
  in such functions to mark them (both x86 and ppc always emit
  a split-stack prologue and I'd feel uneasy if I didn't include one).
- I use a conditional __morestack call if frame_size fits in an add
  immediate instruction (16-bit signed if the CPU doesn't have extended
  immediate instructions, 32-bit if it does), unconditional otherwise
  (__morestack will check anyway, but there's not much chance of already
  having such a big frame).
- gold will try bumping the immediate field in the above add instruction
  if it's present and the frame size still fits, and will nop out the
  comparison and convert to an unconditional call otherwise.  It'll
  always bump the frame size in the parameter block.  Thanks to that, we
  don't need a separate __morestack_nonsplit function like x86.
- If -pg is used together with -fsplit-stack, the call to _mcount will
  be emitted before the split-stack prologue (as opposed to x86, which
  emits it after the prologue).  This is not a big problem, but gold
  needs to account for that and recognize the _mcount call before
  the split-stack prologue.
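
The choice between the marker nop, the conditional call and the
unconditional call described in the list above can be sketched as
follows.  This is only an illustrative model: the names are
hypothetical, and treating the 32-bit immediate as signed is an
assumption, since the text only says "32-bit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ss_prologue_kind {
    SS_MARKER_ONLY,        /* frame_size == 0: emit the marker nop */
    SS_CONDITIONAL_CALL,   /* compare against the guard, call if needed */
    SS_UNCONDITIONAL_CALL  /* always call; __morestack checks anyway */
};

/* Pick the prologue form from the frame size and from whether the
   CPU has extended-immediate instructions (has_extimm).  */
static enum ss_prologue_kind
choose_prologue_kind(int64_t frame_size, bool has_extimm)
{
    assert(frame_size >= 0);
    if (frame_size == 0)
        return SS_MARKER_ONLY;
    /* 16-bit signed immediate without extended immediates,
       32-bit (assumed signed) with them.  */
    int64_t limit = has_extimm ? INT32_MAX : INT16_MAX;
    return frame_size <= limit ? SS_CONDITIONAL_CALL
                               : SS_UNCONDITIONAL_CALL;
}
```

A linker-side check for whether a bumped frame size "still fits"
could presumably reuse the same range test.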

I have run the testsuite on a z13 machine.  In addition to running it
with -fsplit-stack, I've also run it with s390_expand_split_stack_prologue
modified to always emit unconditional calls (to exercise more paths
in __morestack).  There are a few new failures, but they can all be
explained:

- the testcases for __builtin_return_address and friends hit __morestack's
  stack frame instead of whatever they were hoping to find.
- guality tests all break since gdb looks at __morestack's frame instead
  of the one that called it.  Marking guality_check with __attribute__
  ((no_split_stack)) made them go away, though a better fix would be
  to make gdb skip __morestack frames somehow...
- some guality tests try printing function arguments after an alloca
  or VLA allocation with optimization.  These no longer work, since
  the arguments are in caller-saved registers, and a call to
  __morestack_allocate_stack_space will destroy them.
- the .text.unlikely issue mentioned above.




* [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki
                   ` (3 preceding siblings ...)
  2016-01-02 19:17 ` [PATCH 5/5] s390: Add -fsplit-stack support Marcin Kościelnicki
@ 2016-01-02 19:17 ` Marcin Kościelnicki
  2016-01-20 13:11   ` Andreas Krebbel
  2016-01-03  3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor
  5 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-02 19:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcin Kościelnicki

Previously, .rodata was hardcoded.  For C++ vague linkage functions,
this resulted in needlessly duplicated literals.  With the new split
stack support, this resulted in link errors, due to .rodata containing
relocations to the discarded text sections.

gcc/ChangeLog:

	* config/s390/s390.md (pool_section_start): Use switch_to_section
	to select proper read-only data section instead of hardcoding .rodata.
	(pool_section_end): Use switch_to_section to match the above.
---
 gcc/ChangeLog           |  6 ++++++
 gcc/config/s390/s390.md | 11 +++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 23ce209..2c572a7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config/s390/s390.md (pool_section_start): Use switch_to_section
+	to select proper read-only data section instead of hardcoding .rodata.
+	(pool_section_end): Use switch_to_section to match the above.
+
 2016-01-01  Sandra Loosemore  <sandra@codesourcery.com>
 
 	PR 1078
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index a1fc96a..0ebefd6 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -10247,13 +10247,20 @@
 (define_insn "pool_section_start"
   [(unspec_volatile [(const_int 1)] UNSPECV_POOL_SECTION)]
   ""
-  ".section\t.rodata"
+{
+  switch_to_section (targetm.asm_out.function_rodata_section
+		 (current_function_decl));
+  return "";
+}
   [(set_attr "length" "0")])
 
 (define_insn "pool_section_end"
   [(unspec_volatile [(const_int 0)] UNSPECV_POOL_SECTION)]
   ""
-  ".previous"
+{
+  switch_to_section (current_function_section ());
+  return "";
+}
   [(set_attr "length" "0")])
 
 (define_insn "main_base_31_small"
-- 
2.6.4


* [PATCH 5/5] s390: Add -fsplit-stack support
  2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki
                   ` (2 preceding siblings ...)
  2016-01-02 19:16 ` [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump Marcin Kościelnicki
@ 2016-01-02 19:17 ` Marcin Kościelnicki
  2016-01-15 18:39   ` Andreas Krebbel
  2016-01-02 19:17 ` [PATCH 1/5] s390: Use proper read-only data section for literals Marcin Kościelnicki
  2016-01-03  3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor
  5 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-02 19:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_split_branches): Don't split split-stack pseudo-insns, rewire
	split-stack prologue conditional jump instead of splitting it.
	(s390_chunkify_start): Don't reload const pool register on split-stack
	prologue conditional jumps.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_expand_split_stack_call_esa): New function.
	(s390_expand_split_stack_call_zarch): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_reorg): Lower the split-stack pseudo-insns.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec.
	(UNSPECV_SPLIT_STACK_CALL_ESA): New unspec.
	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_call_esa): New insn.
	(split_stack_call_zarch_*): New insn.
	(split_stack_cond_call_zarch_*): New insn.
	(split_stack_space_check): New expand.
	(split_stack_sibcall_basr): New insn.
	(split_stack_sibcall_*): New insn.
	(split_stack_cond_sibcall_*): New insn.
	(split_stack_marker): New insn.
---
 gcc/ChangeLog                        |  41 ++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 538 +++++++++++++++++++++++++-
 gcc/config/s390/s390.md              | 133 +++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 718 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1454 insertions(+), 8 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4c7046f..a4f4dff 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,46 @@
 2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
 
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_split_branches): Don't split split-stack pseudo-insns, rewire
+	split-stack prologue conditional jump instead of splitting it.
+	(s390_chunkify_start): Don't reload const pool register on split-stack
+	prologue conditional jumps.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_expand_split_stack_call_esa): New function.
+	(s390_expand_split_stack_call_zarch): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_reorg): Lower the split-stack pseudo-insns.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL_ESA): New unspec.
+	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
+	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_call_esa): New insn.
+	(split_stack_call_zarch_*): New insn.
+	(split_stack_cond_call_zarch_*): New insn.
+	(split_stack_space_check): New expand.
+	(split_stack_sibcall_basr): New insn.
+	(split_stack_sibcall_*): New insn.
+	(split_stack_cond_sibcall_*): New insn.
+	(split_stack_marker): New insn.
+
+2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
 	with side effects.
 
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4cf0df7..0c468bf 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
     }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			   struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
 
@@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 #undef TARGET_OPTION_INIT_STRUCT
 #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 962abb1..936e267 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
 extern void s390_emit_prologue (void);
 extern void s390_emit_epilogue (bool);
+extern void s390_expand_split_stack_prologue (void);
 extern bool s390_can_use_simple_return_insn (void);
 extern bool s390_can_use_return_insn (void);
 extern void s390_function_profiler (FILE *, int);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9dc8d1e..0255eec 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -426,6 +426,13 @@ struct GTY(()) machine_function
   /* True if the current function may contain a tbegin clobbering
      FPRs.  */
   bool tbegin_p;
+
+  /* For -fsplit-stack support: A stack local which holds a pointer to
+     the stack arguments for a function with a variable number of
+     arguments.  This is set at the start of the function and is used
+     to initialize the overflow_arg_area field of the va_list
+     structure.  */
+  rtx split_stack_varargs_pointer;
 };
 
 /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
@@ -7669,7 +7676,17 @@ s390_split_branches (void)
 
       pat = PATTERN (insn);
       if (GET_CODE (pat) == PARALLEL)
-	pat = XVECEXP (pat, 0, 0);
+	{
+	  /* Split stack call pseudo-jump doesn't need splitting.  */
+	  if (GET_CODE (XVECEXP (pat, 0, 1)) == SET
+	      && GET_CODE (XEXP (XVECEXP (pat, 0, 1), 1)) == UNSPEC_VOLATILE
+	      && (XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1)
+		  == UNSPECV_SPLIT_STACK_CALL_ESA
+	          || XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1)
+		     == UNSPECV_SPLIT_STACK_CALL_ZARCH))
+	    continue;
+	  pat = XVECEXP (pat, 0, 0);
+	}
       if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
 	continue;
 
@@ -7692,6 +7709,49 @@ s390_split_branches (void)
       if (get_attr_length (insn) <= 4)
 	continue;
 
+      if (prologue_epilogue_contains (insn))
+        {
+	  /* A jump in prologue/epilogue must come from the split-stack
+	     prologue.  It cannot be split - there are no scratch regs
+	     available at that point.  Rewire it instead.  */
+
+	  rtx_insn *code_label = (rtx_insn *)XEXP (*label, 0);
+	  gcc_assert (LABEL_P (code_label));
+	  rtx_insn *note = NEXT_INSN (code_label);
+	  gcc_assert (NOTE_P (note));
+	  rtx_insn *jump_ss = NEXT_INSN (note);
+	  gcc_assert (JUMP_P (jump_ss));
+	  rtx_insn *barrier = NEXT_INSN (jump_ss);
+	  gcc_assert (BARRIER_P (barrier));
+	  gcc_assert (GET_CODE (SET_SRC (pat)) == IF_THEN_ELSE);
+	  gcc_assert (GET_CODE (XEXP (SET_SRC (pat), 0)) == LT);
+
+	  /* step 1 - insert new label after */
+	  rtx new_label = gen_label_rtx ();
+	  emit_label_after (new_label, insn);
+
+	  /* step 2 - reorder */
+	  reorder_insns_nobb (code_label, barrier, insn);
+
+	  /* step 3 - retarget jump */
+	  rtx new_target = gen_rtx_LABEL_REF (VOIDmode, new_label);
+	  ret = validate_change (insn, label, new_target, 0);
+	  gcc_assert (ret);
+	  LABEL_NUSES (new_label)++;
+	  LABEL_NUSES (code_label)--;
+	  JUMP_LABEL (insn) = new_label;
+
+	  /* step 4 - invert jump cc */
+	  rtx *pcond = &XEXP (SET_SRC (pat), 0);
+	  rtx new_cond = gen_rtx_fmt_ee (GE, VOIDmode,
+					 XEXP (*pcond, 0),
+					 XEXP (*pcond, 1));
+	  ret = validate_change (insn, pcond, new_cond, 0);
+	  gcc_assert (ret);
+
+	  continue;
+	}
+
       /* We are going to use the return register as scratch register,
 	 make sure it will be saved/restored by the prologue/epilogue.  */
       cfun_frame_layout.save_return_addr_p = 1;
@@ -8736,7 +8796,7 @@ s390_chunkify_start (void)
 	}
       /* If we have a direct jump (conditional or unconditional),
 	 check all potential targets.  */
-      else if (JUMP_P (insn))
+      else if (JUMP_P (insn) && !prologue_epilogue_contains (insn))
 	{
 	  rtx pat = PATTERN (insn);
 
@@ -9316,9 +9376,13 @@ s390_register_info ()
 	  cfun_frame_layout.high_fprs++;
       }
 
-  if (flag_pic)
-    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
-      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
+  /* Register 12 is used for GOT address, but also as temp in prologue
+     for split-stack stdarg functions (unless r14 is available).  */
+  clobbered_regs[12]
+    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
+	|| (flag_split_stack && cfun->stdarg
+	    && (crtl->is_leaf || TARGET_TPF_PROFILING
+		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
 
   clobbered_regs[BASE_REGNUM]
     |= (cfun->machine->base_reg
@@ -10446,6 +10510,8 @@ s390_emit_prologue (void)
       && !crtl->is_leaf
       && !TARGET_TPF_PROFILING)
     temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
+  else if (flag_split_stack && cfun->stdarg)
+    temp_reg = gen_rtx_REG (Pmode, 12);
   else
     temp_reg = gen_rtx_REG (Pmode, 1);
 
@@ -10939,6 +11005,386 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
     SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* When using -fsplit-stack, the allocation routines set a field in
+   the TCB to the bottom of the stack plus this much space, measured
+   in bytes.  */
+
+#define SPLIT_STACK_AVAILABLE 1024
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+
+void
+s390_expand_split_stack_prologue (void)
+{
+  rtx r1, guard, cc;
+  rtx_insn *insn;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  /* Pointer size in bytes.  */
+  /* Frame size and argument size - the two parameters to __morestack.  */
+  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
+  /* Align argument size to 8 bytes - simplifies __morestack code.  */
+  HOST_WIDE_INT args_size = crtl->args.size >= 0
+			    ? ((crtl->args.size + 7) & ~7)
+			    : 0;
+  /* Label to jump to when no __morestack call is necessary.  */
+  rtx_code_label *enough = NULL;
+  /* Label to be called by __morestack.  */
+  rtx_code_label *call_done = NULL;
+  /* 1 if __morestack called conditionally, 0 if always.  */
+  int conditional = 0;
+
+  gcc_assert (flag_split_stack && reload_completed);
+
+  r1 = gen_rtx_REG (Pmode, 1);
+
+  /* If no stack frame will be allocated, don't do anything.  */
+  if (!frame_size)
+    {
+      /* But emit a marker that will let linker and indirect function
+	 calls recognise this function as split-stack aware.  */
+      emit_insn (gen_split_stack_marker ());
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+        {
+          /* If va_start is used, just use r15.  */
+          emit_move_insn (r1,
+		          gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+			                GEN_INT (STACK_POINTER_OFFSET)));
+        }
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu))
+    {
+      /* If frame_size will fit in an add instruction, do a stack space
+	 check, and only call __morestack if there's not enough space.  */
+      conditional = 1;
+
+      /* Get thread pointer.  r1 is the only register we can always destroy - r0
+         could contain a static chain (and cannot be used to address memory
+         anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
+      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+      /* Aim at __private_ss.  */
+      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
+
+      /* If less than 1kiB used, skip addition and compare directly with
+         __private_ss.  */
+      if (frame_size > SPLIT_STACK_AVAILABLE)
+        {
+          emit_move_insn (r1, guard);
+	  if (TARGET_64BIT)
+	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
+	  else
+	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
+	  guard = r1;
+        }
+
+      if (TARGET_CPU_ZARCH)
+        {
+	  rtx tmp;
+
+          /* Compare the (maybe adjusted) guard with the stack pointer.  */
+          cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
+
+          call_done = gen_label_rtx ();
+
+	  if (TARGET_64BIT)
+	    tmp = gen_split_stack_cond_call_zarch_di (call_done,
+						      morestack_ref,
+						      GEN_INT (frame_size),
+						      GEN_INT (args_size),
+						      cc);
+	  else
+	    tmp = gen_split_stack_cond_call_zarch_si (call_done,
+						      morestack_ref,
+						      GEN_INT (frame_size),
+						      GEN_INT (args_size),
+						      cc);
+
+
+          insn = emit_jump_insn (tmp);
+	  JUMP_LABEL (insn) = call_done;
+
+          /* Mark the jump as very unlikely to be taken.  */
+          add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
+	}
+      else
+        {
+          /* Compare the (maybe adjusted) guard with the stack pointer.  */
+          cc = s390_emit_compare (GE, stack_pointer_rtx, guard);
+
+          enough = gen_label_rtx ();
+          insn = s390_emit_jump (enough, cc);
+          JUMP_LABEL (insn) = enough;
+
+          /* Mark the jump as very likely to be taken.  */
+          add_int_reg_note (insn, REG_BR_PROB,
+			    REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);
+	}
+    }
+
+  if (call_done == NULL)
+    {
+      rtx tmp;
+      call_done = gen_label_rtx ();
+
+      /* Now, we need to call __morestack.  It has very special calling
+         conventions: it preserves param/return/static chain registers for
+         calling main function body, and looks for its own parameters
+         at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode).  */
+      if (TARGET_64BIT)
+        tmp = gen_split_stack_call_zarch_di (call_done,
+					     morestack_ref,
+					     GEN_INT (frame_size),
+					     GEN_INT (args_size));
+      else if (TARGET_CPU_ZARCH)
+        tmp = gen_split_stack_call_zarch_si (call_done,
+					     morestack_ref,
+					     GEN_INT (frame_size),
+					     GEN_INT (args_size));
+      else
+        tmp = gen_split_stack_call_esa (call_done,
+					morestack_ref,
+					GEN_INT (frame_size),
+					GEN_INT (args_size));
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      emit_barrier ();
+    }
+
+  /* __morestack will call us here.  */
+
+  if (enough != NULL)
+    {
+      emit_label (enough);
+      LABEL_NUSES (enough) = 1;
+    }
+
+  if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      /* If va_start is used, and __morestack was not called, just use r15.  */
+      emit_move_insn (r1,
+		      gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+			            GEN_INT (STACK_POINTER_OFFSET)));
+    }
+
+  emit_label (call_done);
+  LABEL_NUSES (call_done) = 1;
+}
+
+/* Generates split-stack call sequence for esa mode, along with its parameter
+   block.  */
+
+static void
+s390_expand_split_stack_call_esa (rtx_insn *orig_insn,
+				  rtx call_done,
+				  rtx function,
+				  rtx frame_size,
+				  rtx args_size)
+{
+  int psize = GET_MODE_SIZE (Pmode);
+  /* Labels for literal base, literal __morestack, param base.  */
+  rtx litbase = gen_label_rtx ();
+  rtx litms = gen_label_rtx ();
+  rtx parmbase = gen_label_rtx ();
+  rtx r1 = gen_rtx_REG (Pmode, 1);
+  rtx_insn *insn = orig_insn;
+  rtx tmp, tmp2;
+
+  /* No brasl, we have to make do using basr and a literal pool.  */
+
+  /* %r1 = litbase.  */
+  insn = emit_insn_after (gen_main_base_31_small (r1, litbase), insn);
+  insn = emit_label_after (litbase, insn);
+
+  /* a %r1, .Llitms-.Llitbase(%r1) */
+  tmp = gen_rtx_LABEL_REF (Pmode, litbase);
+  tmp2 = gen_rtx_LABEL_REF (Pmode, litms);
+  tmp = gen_rtx_UNSPEC (Pmode,
+			gen_rtvec (2, tmp2, tmp),
+			UNSPEC_POOL_OFFSET);
+  tmp = gen_rtx_CONST (Pmode, tmp);
+  tmp = gen_rtx_MEM (Pmode, gen_rtx_PLUS (Pmode, r1, tmp));
+  insn = emit_insn_after (gen_addsi3 (r1, r1, tmp), insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, litbase);
+  add_reg_note (insn, REG_LABEL_OPERAND, litms);
+  LABEL_NUSES (litbase)++;
+  LABEL_NUSES (litms)++;
+
+  /* basr %r1, %r1 */
+  tmp = gen_split_stack_sibcall_basr (r1, call_done);
+  insn = emit_jump_insn_after (tmp, insn);
+  JUMP_LABEL (insn) = call_done;
+  LABEL_NUSES (call_done)++;
+
+  /* __morestack will mangle its return register to get our parameters.  */
+
+  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
+     (this mirrors the alignment done in __morestack - don't touch it).  */
+  insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn);
+
+  insn = emit_label_after (parmbase, insn);
+
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, frame_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Second parameter is size of the arguments passed on stack that
+     __morestack has to copy to the new stack (does not include varargs).  */
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, args_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Third parameter is offset between start of the parameter block
+     and function body to be called by __morestack.  */
+  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
+  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
+  tmp = gen_rtx_CONST (Pmode,
+                       gen_rtx_MINUS (Pmode, tmp2, tmp));
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, tmp),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* We take advantage of the already-existing literal pool here to stuff
+     the __morestack address for use in the call above.  */
+
+  insn = emit_label_after (litms, insn);
+
+  /* We actually emit __morestack - litbase to support PIC.  Since it
+     works just as well for non-PIC, we use it in all cases.  */
+
+  tmp = gen_rtx_LABEL_REF (Pmode, litbase);
+  tmp = gen_rtx_CONST (Pmode,
+                       gen_rtx_MINUS (Pmode, function, tmp));
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, tmp),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, litbase);
+  LABEL_NUSES (litbase)++;
+
+  delete_insn (orig_insn);
+}
+
+/* Generates split-stack call sequence for zarch mode, along with its parameter
+   block.  */
+
+static void
+s390_expand_split_stack_call_zarch (rtx_insn *orig_insn,
+				    rtx call_done,
+				    rtx function,
+				    rtx frame_size,
+				    rtx args_size,
+				    rtx cond)
+{
+  int psize = GET_MODE_SIZE (Pmode);
+  rtx_insn *insn = orig_insn;
+  rtx parmbase = gen_label_rtx ();
+  rtx r1 = gen_rtx_REG (Pmode, 1);
+  rtx tmp, tmp2;
+
+  /* %r1 = parmbase.  */
+  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* jg<cond> __morestack.  */
+  if (cond == NULL)
+    {
+      if (TARGET_64BIT)
+        tmp = gen_split_stack_sibcall_di (function, call_done);
+      else
+        tmp = gen_split_stack_sibcall_si (function, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  else
+    {
+      if (!s390_comparison (cond, VOIDmode))
+	internal_error ("bad split_stack_call_zarch cond");
+      if (TARGET_64BIT)
+        tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done);
+      else
+        tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  JUMP_LABEL (insn) = call_done;
+  LABEL_NUSES (call_done)++;
+
+  /* Go to .rodata.  */
+  insn = emit_insn_after (gen_pool_section_start (), insn);
+
+  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
+     (this mirrors the alignment done in __morestack - don't touch it).  */
+  insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn);
+
+  insn = emit_label_after (parmbase, insn);
+
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, frame_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Second parameter is size of the arguments passed on stack that
+     __morestack has to copy to the new stack (does not include varargs).  */
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, args_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Third parameter is offset between start of the parameter block
+     and function body to be called by __morestack.  */
+  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
+  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
+  tmp = gen_rtx_CONST (Pmode,
+                       gen_rtx_MINUS (Pmode, tmp2, tmp));
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, tmp),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* Return from .rodata.  */
+  insn = emit_insn_after (gen_pool_section_end (), insn);
+
+  delete_insn (orig_insn);
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+s390_live_on_entry (bitmap regs)
+{
+  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      gcc_assert (flag_split_stack);
+      bitmap_set_bit (regs, 1);
+    }
+}
+
 /* Return true if the function can use simple_return to return outside
    of a shrink-wrapped region.  At present shrink-wrapping is supported
    in all cases.  */
@@ -11541,6 +11987,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
     }
 
+  if (flag_split_stack
+     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
+         == NULL)
+     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+    {
+      rtx reg;
+      rtx_insn *seq;
+
+      reg = gen_reg_rtx (Pmode);
+      cfun->machine->split_stack_varargs_pointer = reg;
+
+      start_sequence ();
+      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
+      seq = get_insns ();
+      end_sequence ();
+
+      push_topmost_sequence ();
+      emit_insn_after (seq, entry_of_function ());
+      pop_topmost_sequence ();
+    }
+
   /* Find the overflow area.
      FIXME: This currently is too pessimistic when the vector ABI is
      enabled.  In that case we *always* set up the overflow area
@@ -11549,7 +12016,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
       || TARGET_VX_ABI)
     {
-      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+        t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer);
+      else
+        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
 
       off = INTVAL (crtl->args.arg_offset_rtx);
       off = off < 0 ? 0 : off;
@@ -13158,6 +13628,56 @@ s390_reorg (void)
 	}
     }
 
+  if (flag_split_stack)
+    {
+      rtx_insn *insn;
+
+      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	{
+	  /* Look for the split-stack fake jump instructions.  */
+	  if (!JUMP_P (insn))
+	    continue;
+	  if (GET_CODE (PATTERN (insn)) != PARALLEL
+	      || XVECLEN (PATTERN (insn), 0) != 2)
+	    continue;
+	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
+	  if (GET_CODE (set) != SET)
+	    continue;
+	  rtx unspec = XEXP (set, 1);
+	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
+	    continue;
+	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ESA
+	      && XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ZARCH)
+	    continue;
+	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
+	  rtx function = XVECEXP (unspec, 0, 0);
+	  rtx frame_size = XVECEXP (unspec, 0, 1);
+	  rtx args_size = XVECEXP (unspec, 0, 2);
+	  rtx pc_src = XEXP (set_pc, 1);
+	  rtx call_done, cond = NULL_RTX;
+	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
+	    {
+	      cond = XEXP (pc_src, 0);
+	      call_done = XEXP (XEXP (pc_src, 1), 0);
+	    }
+	  else
+	    call_done = XEXP (pc_src, 0);
+	  if (XINT (unspec, 1) == UNSPECV_SPLIT_STACK_CALL_ESA)
+	    s390_expand_split_stack_call_esa (insn,
+					      call_done,
+					      function,
+					      frame_size,
+					      args_size);
+	  else
+	    s390_expand_split_stack_call_zarch (insn,
+					        call_done,
+					        function,
+					        frame_size,
+					        args_size,
+					        cond);
+	}
+    }
+
   /* Try to optimize prologue and epilogue further.  */
   s390_optimize_prologue ();
 
@@ -14469,6 +14989,9 @@ s390_asm_file_end (void)
 	     s390_vector_abi);
 #endif
   file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 
 /* Return true if TYPE is a vector bool type.  */
@@ -14724,6 +15247,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
+
 #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   s390_use_by_pieces_infrastructure_p
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 0ebefd6..15c6eed 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -114,6 +114,9 @@
    UNSPEC_SP_SET
    UNSPEC_SP_TEST
 
+   ; Split stack support
+   UNSPEC_STACK_CHECK
+
    ; Test Data Class (TDC)
    UNSPEC_TDC_INSN
 
@@ -276,6 +279,12 @@
    ; Set and get floating point control register
    UNSPECV_SFPC
    UNSPECV_EFPC
+
+   ; Split stack support
+   UNSPECV_SPLIT_STACK_CALL_ZARCH
+   UNSPECV_SPLIT_STACK_CALL_ESA
+   UNSPECV_SPLIT_STACK_SIBCALL
+   UNSPECV_SPLIT_STACK_MARKER
   ])
 
 ;;
@@ -10909,3 +10918,127 @@
   "TARGET_Z13"
   "lcbb\t%0,%1,%b2"
   [(set_attr "op_type" "VRX")])
+
+; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  s390_expand_split_stack_prologue ();
+  DONE;
+})
+
+(define_insn "split_stack_call_esa"
+  [(set (pc) (label_ref (match_operand 0 "" "")))
+   (set (reg:SI 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+                                     (match_operand 2 "consttable_operand" "X")
+                                     (match_operand 3 "consttable_operand" "X")]
+                                    UNSPECV_SPLIT_STACK_CALL_ESA))]
+  "!TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "32")])
+
+(define_insn "split_stack_call_zarch_<mode>"
+  [(set (pc) (label_ref (match_operand 0 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+                                    (match_operand 2 "consttable_operand" "X")
+                                    (match_operand 3 "consttable_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_CALL_ZARCH))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+(define_insn "split_stack_cond_call_zarch_<mode>"
+  [(set (pc)
+        (if_then_else
+          (match_operand 4 "" "")
+          (label_ref (match_operand 0 "" ""))
+          (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+                                    (match_operand 2 "consttable_operand" "X")
+                                    (match_operand 3 "consttable_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_CALL_ZARCH))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+
+(define_expand "split_stack_space_check"
+  [(set (pc) (if_then_else
+	      (ltu (minus (reg 15)
+			  (match_operand 0 "register_operand"))
+		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  rtx tp = s390_get_thread_pointer ();
+  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
+  rtx reg = gen_reg_rtx (Pmode);
+  rtx cc;
+  if (TARGET_64BIT)
+    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
+  else
+    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
+  cc = s390_emit_compare (GT, reg, guard);
+  s390_emit_jump (operands[1], cc);
+
+  DONE;
+})
+
+;; A basr for use in split stack prologue.
+
+(define_insn "split_stack_sibcall_basr"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:SI 1) (unspec_volatile [(match_operand 0 "register_operand" "a")]
+                                     UNSPECV_SPLIT_STACK_SIBCALL))]
+  "!TARGET_CPU_ZARCH"
+  "basr\t%%r1, %0"
+  [(set_attr "op_type" "RR")
+   (set_attr "type"  "jsr")])
+
+;; A jg with minimal fuss for use in split stack prologue.
+
+(define_insn "split_stack_sibcall_<mode>"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; Also a conditional one.
+
+(define_insn "split_stack_cond_sibcall_<mode>"
+  [(set (pc)
+        (if_then_else
+          (match_operand 1 "" "")
+          (label_ref (match_operand 2 "" ""))
+          (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg%C1\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; An unusual nop instruction used to mark functions with no stack frames
+;; as split-stack aware.
+
+(define_insn "split_stack_marker"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
+  ""
+  "nopr\t%%r15"
+  [(set_attr "op_type" "RR")])
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index f66646c..ff60571 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
+	* config/s390/morestack.S: New file.
+	* config/s390/t-stack-s390: New file.
+	* generic-morestack.c (__splitstack_find): Add s390-specific code.
+
 2015-12-18  Andris Pavenis  <andris.pavenis@iki.fi>
 
 	* config.host: Add *-*-msdosdjgpp to lists of i[34567]86-*-*
diff --git a/libgcc/config.host b/libgcc/config.host
index 0a3b879..ce6d259 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1105,11 +1105,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..8e26c66
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,718 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 0x1000
+
+# The __morestack function.
+
+	.global	__morestack
+	.hidden	__morestack
+
+	.type	__morestack,@function
+
+__morestack:
+.LFB1:
+	.cfi_startproc
+
+
+#ifndef __s390x__
+
+
+# The 31-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0,__gcc_personality_v0
+	.cfi_lsda 0,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x48
+	.cfi_offset %r7, -0x44
+	.cfi_offset %r8, -0x40
+	.cfi_offset %r9, -0x3c
+	.cfi_offset %r10, -0x38
+	.cfi_offset %r11, -0x34
+	.cfi_offset %r12, -0x30
+	.cfi_offset %r13, -0x2c
+	.cfi_offset %r14, -0x28
+	.cfi_offset %r15, -0x24
+	lr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	ahi	%r15, -0x60		# 0x60 for standard frame.
+	st	%r11, 0(%r15)		# Save back chain.
+	lr	%r8, %r0		# Save %r0 (static chain).
+
+	basr	%r13, 0			# .Lmsl to %r13
+.Lmsl:
+
+	# %r1 may point directly to the parameter area (zarch), or right after
+	# the basr instruction that called us (esa).  In the first case,
+	# the pointer is already aligned.  In the second case, we may need to
+	# align it up to 4 bytes to get to the parameters.
+	la	%r10, 3(%r1)
+	lhi	%r7, -4
+	nr	%r10, %r7		# %r10 = (%r1 + 3) & ~3
+
+	l	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0		# Extract thread pointer.
+	l	%r1, 0x20(%r1)		# Get stack boundary
+	ar	%r1, %r7		# Stack boundary + frame size
+	a	%r1, 4(%r10)		# + stack param size
+	clr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard + sizes <= sp: no alloc needed
+
+	l	%r1, .Lmslbs-.Lmsl(%r13)	# __morestack_block_signals
+#ifdef __PIC__
+	bas	%r14, 0(%r1, %r13)
+#else
+	basr	%r14, %r1
+#endif
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	ahi	%r7, BACKOFF		# Bump requested size a bit.
+	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x40(%r11)		# Pass its address as parameter.
+	la	%r3, 0x60(%r11)		# Caller's stack parameters.
+	l	%r4, 4(%r10)		# Size of stack parameters.
+
+	l	%r1, .Lmslgms-.Lmsl(%r13)	# __generic_morestack
+#ifdef __PIC__
+	bas	%r14, 0(%r1, %r13)
+#else
+	basr	%r14, %r1
+#endif
+
+	lr	%r15, %r2		# Switch to the new stack.
+	ahi	%r15, -0x60		# Make a stack frame on it.
+	st	%r11, 0(%r15)		# Save back chain.
+
+	s	%r2, 0x40(%r11)		# The end of stack space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHB0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	l	%r1, .Lmslubs-.Lmsl(%r13)	# __morestack_unblock_signals
+#ifdef __PIC__
+	bas	%r14, 0(%r1, %r13)
+#else
+	basr	%r14, %r1
+#endif
+
+	lr	%r0, %r8		# Static chain.
+	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12: Indeterminate.
+	# %r13: Literal pool address.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
+
+	l	%r1, .Lmslbs-.Lmsl(%r13)	# __morestack_block_signals
+#ifdef __PIC__
+	bas	%r14, 0(%r1, %r13)
+#else
+	basr	%r14, %r1
+#endif
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0x60 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x40(%r11)
+	l	%r1, .Lmslgrs-.Lmsl(%r13)	# __generic_releasestack
+#ifdef __PIC__
+	bas	%r14, 0(%r1, %r13)
+#else
+	basr	%r14, %r1
+#endif
+
+	s	%r2, 0x40(%r11)		# Subtract available space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHE0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0x60 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lr	%r15, %r11
+	ahi	%r15, -0x60
+
+	l	%r1, .Lmslubs-.Lmsl(%r13)	# __morestack_unblock_signals
+#ifdef __PIC__
+	bas	%r14, 0(%r1, %r13)
+#else
+	basr	%r14, %r1
+#endif
+
+	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	l	%r9, 0x4(%r10)		# Load stack parameter size.
+	ltr	%r9, %r9		# And check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0x60(%r15)		# Destination.
+	la	%r12, 0x60(%r11)	# Source.
+	lr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lr	%r2, %r11		# Stack pointer after resume.
+	l	%r1, .Lmslgfs-.Lmsl(%r13)	# __generic_findstack
+#ifdef __PIC__
+	bas	%r14, 0(%r1, %r13)
+#else
+	basr	%r14, %r1
+#endif
+	lr	%r3, %r11		# Get the stack pointer.
+	sr	%r3, %r2		# Subtract available space.
+	ahi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+	st	%r3, 0x20(%r1)	# Save the new stack boundary.
+
+	lr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	l	%r12, .Lmslgot-.Lmsl(%r13)
+	ar	%r12, %r13
+	l	%r1, .Lmslunw-.Lmsl(%r13)
+	bas	%r14, 0(%r1, %r12)
+#else
+	l	%r1, .Lmslunw-.Lmsl(%r13)
+	basr	%r14, %r1
+#endif
+
+# Literal pool.
+
+.align 4
+#ifdef __PIC__
+.Lmslbs:
+	.long __morestack_block_signals-.Lmsl
+.Lmslubs:
+	.long __morestack_unblock_signals-.Lmsl
+.Lmslgms:
+	.long __generic_morestack-.Lmsl
+.Lmslgrs:
+	.long __generic_releasestack-.Lmsl
+.Lmslgfs:
+	.long __generic_findstack-.Lmsl
+.Lmslunw:
+	.long _Unwind_Resume@PLTOFF
+.Lmslgot:
+	.long _GLOBAL_OFFSET_TABLE_-.Lmsl
+#else
+.Lmslbs:
+	.long __morestack_block_signals
+.Lmslubs:
+	.long __morestack_unblock_signals
+.Lmslgms:
+	.long __generic_morestack
+.Lmslgrs:
+	.long __generic_releasestack
+.Lmslgfs:
+	.long __generic_findstack
+.Lmslunw:
+	.long _Unwind_Resume
+#endif
+
+#else /* defined(__s390x__) */
+
+
+# The 64-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x70
+	.cfi_offset %r7, -0x68
+	.cfi_offset %r8, -0x60
+	.cfi_offset %r9, -0x58
+	.cfi_offset %r10, -0x50
+	.cfi_offset %r11, -0x48
+	.cfi_offset %r12, -0x40
+	.cfi_offset %r13, -0x38
+	.cfi_offset %r14, -0x30
+	.cfi_offset %r15, -0x28
+	lgr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	aghi	%r15, -0xa0		# 0xa0 for standard frame.
+	stg	%r11, 0(%r15)		# Save back chain.
+	lgr	%r8, %r0		# Save %r0 (static chain).
+	lgr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	lg	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	lg	%r1, 0x38(%r1)		# Get stack boundary
+	agr	%r1, %r7		# Stack boundary + frame size
+	ag	%r1, 8(%r10)		# + stack param size
+	clgr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	aghi	%r7, BACKOFF		# Bump requested size a bit.
+	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x80(%r11)		# Pass its address as parameter.
+	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
+	lg	%r4, 8(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lgr	%r15, %r2		# Switch to the new stack.
+	aghi	%r15, -0xa0		# Make a stack frame on it.
+	stg	%r11, 0(%r15)		# Save back chain.
+
+	sg	%r2, 0x80(%r11)		# The end of stack space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHB0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lgr	%r0, %r8		# Static chain.
+	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stg	%r2, 0x10(%r11)		# Save return register.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x80(%r11)
+	brasl	%r14, __generic_releasestack
+
+	sg	%r2, 0x80(%r11)		# Subtract available space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHE0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lgr	%r15, %r11
+	aghi	%r15, -0xa0
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	lg	%r9, 0x8(%r10)		# Load stack parameter size.
+	ltgr	%r9, %r9		# Check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sgr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0xa0(%r15)		# Destination.
+	la	%r12, 0xa0(%r11)	# Source.
+	lgr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lgr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lgr	%r3, %r11		# Get the stack pointer.
+	sgr	%r3, %r2		# Subtract available space.
+	aghi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
+
+	lgr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.cfi_endproc
+	.size	__morestack, . - __morestack
+
+
+# The exception table.  This tells the personality routine to execute
+# the exception handler.
+
+	.section	.gcc_except_table,"a",@progbits
+	.align	4
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 .L1-.LFB1	# landing pad
+	.uleb128 0		# action
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the basic
+        # personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type	DW.ref.__gcc_personality_v0, @object
+DW.ref.__gcc_personality_v0:
+#ifndef __LP64__
+	.align 4
+	.size	DW.ref.__gcc_personality_v0, 4
+	.long	__gcc_personality_v0
+#else
+	.align 8
+	.size	DW.ref.__gcc_personality_v0, 8
+	.quad	__gcc_personality_v0
+#endif
+#endif
+
+
+
+# Initialize the stack test value when the program starts or when a
+# new thread starts.  We don't know how large the main stack is, so we
+# guess conservatively.  We might be able to use getrlimit here.
+
+	.text
+	.global	__stack_split_initialize
+	.hidden	__stack_split_initialize
+
+	.type	__stack_split_initialize, @function
+
+__stack_split_initialize:
+
+#ifndef __s390x__
+
+	ear	%r1, %a0
+	lr	%r0, %r15
+	ahi	%r0, -0x4000	# We should have at least 16K.
+	st	%r0, 0x20(%r1)
+
+	lr	%r2, %r15
+	lhi	%r3, 0x4000
+#ifdef __PIC__
+	# Cannot do a tail call - we'll go through PLT, so we need GOT address
+	# in %r12, which is callee-saved.
+	stm	%r12, %r15, 0x30(%r15)
+	basr	%r13, 0
+.Lssi0:
+	ahi	%r15, -0x60
+	l	%r12, .Lssi2-.Lssi0(%r13)
+	ar	%r12, %r13
+	l	%r1, .Lssi1-.Lssi0(%r13)
+	bas	%r14, 0(%r1, %r12)
+	lm	%r12, %r15, 0x90(%r15)
+	br	%r14
+
+.align 4
+.Lssi1:
+	.long	__generic_morestack_set_initial_sp@PLTOFF
+.Lssi2:
+	.long	_GLOBAL_OFFSET_TABLE_-.Lssi0
+
+#else
+	basr	%r1, 0
+.Lssi0:
+	l	%r1, .Lssi1-.Lssi0(%r1)
+	br	%r1	# Tail call
+
+.align 4
+.Lssi1:
+	.long	__generic_morestack_set_initial_sp
+#endif
+
+#else /* defined(__s390x__) */
+
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lgr	%r0, %r15
+	aghi	%r0, -0x4000	# We should have at least 16K.
+	stg	%r0, 0x38(%r1)
+
+	lgr	%r2, %r15
+	lghi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.size	__stack_split_initialize, . - __stack_split_initialize
+
+# Routines to get and set the guard, for __splitstack_getcontext,
+# __splitstack_setcontext, and __splitstack_makecontext.
+
+# void *__morestack_get_guard (void) returns the current stack guard.
+	.text
+	.global	__morestack_get_guard
+	.hidden	__morestack_get_guard
+
+	.type	__morestack_get_guard,@function
+
+__morestack_get_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	l	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lg	%r2, 0x38(%r1)
+#endif
+	br %r14
+
+	.size	__morestack_get_guard, . - __morestack_get_guard
+
+# void __morestack_set_guard (void *) sets the stack guard.
+	.global	__morestack_set_guard
+	.hidden	__morestack_set_guard
+
+	.type	__morestack_set_guard,@function
+
+__morestack_set_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	st	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	stg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_set_guard, . - __morestack_set_guard
+
+# void *__morestack_make_guard (void *, size_t) returns the stack
+# guard value for a stack.
+	.global	__morestack_make_guard
+	.hidden	__morestack_make_guard
+
+	.type	__morestack_make_guard,@function
+
+__morestack_make_guard:
+
+#ifndef __s390x__
+	sr	%r2, %r3
+	ahi	%r2, BACKOFF
+#else
+	sgr	%r2, %r3
+	aghi	%r2, BACKOFF
+#endif
+	br	%r14
+
+	.size	__morestack_make_guard, . - __morestack_make_guard
+
+# Make __stack_split_initialize a high priority constructor.
+
+	.section .ctors.65535,"aw",@progbits
+
+#ifndef __LP64__
+	.align	4
+	.long	__stack_split_initialize
+	.long	__morestack_load_mmap
+#else
+	.align	8
+	.quad	__stack_split_initialize
+	.quad	__morestack_load_mmap
+#endif
+
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
new file mode 100644
index 0000000..4c959b0
--- /dev/null
+++ b/libgcc/config/s390/t-stack-s390
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for s390.
+LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index a10559b..8109c1a 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
 #elif defined __powerpc64__
+#elif defined __s390x__
+      nsp -= 2 * 160;
+#elif defined __s390__
+      nsp -= 2 * 96;
 #else
 #error "unrecognized target"
 #endif
-- 
2.6.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC] [PR 68191] s390: Add -fsplit-stack support.
  2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki
                   ` (4 preceding siblings ...)
  2016-01-02 19:17 ` [PATCH 1/5] s390: Use proper read-only data section for literals Marcin Kościelnicki
@ 2016-01-03  3:21 ` Ian Lance Taylor
  2016-01-03 10:32   ` Marcin Kościelnicki
  2016-01-04  7:35   ` Marcin Kościelnicki
  5 siblings, 2 replies; 55+ messages in thread
From: Ian Lance Taylor @ 2016-01-03  3:21 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: gcc-patches

On Sat, Jan 2, 2016 at 11:16 AM, Marcin Kościelnicki <koriakin@0x04.net> wrote:
>
> The differences start in the __morestack calling convention.  Basically,
> since pushing things on the stack is unwieldy and there's only one free
> register (%r0 could be used for static chain, %r2-%r6 contain arguments,
> %r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata
> or .text section, and pass the address of the parameter block in %r1.
> The parameter block also contains a (position-relative) address that
> __morestack should jump to (x86 just mangles the return address from
> __morestack to compute that).  On zSeries CPUs, the parameter block
> is stuffed somewhere in .rodata, its address loaded to %r1 by larl
> instruction, and __morestack is sibling-called by jg instruction.

Does that work in a multi-threaded program if two different threads
are calling the same function at the same time and both threads need
to split the stack?

Ian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC] [PR 68191] s390: Add -fsplit-stack support.
  2016-01-03  3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor
@ 2016-01-03 10:32   ` Marcin Kościelnicki
  2016-01-04  7:35   ` Marcin Kościelnicki
  1 sibling, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-03 10:32 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-patches

On 03/01/16 04:20, Ian Lance Taylor wrote:
> On Sat, Jan 2, 2016 at 11:16 AM, Marcin Kościelnicki <koriakin@0x04.net> wrote:
>>
>> The differences start in the __morestack calling convention.  Basically,
>> since pushing things on the stack is unwieldy and there's only one free
>> register (%r0 could be used for static chain, %r2-%r6 contain arguments,
>> %r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata
>> or .text section, and pass the address of the parameter block in %r1.
>> The parameter block also contains a (position-relative) address that
>> __morestack should jump to (x86 just mangles the return address from
>> __morestack to compute that).  On zSeries CPUs, the parameter block
>> is stuffed somewhere in .rodata, its address loaded to %r1 by larl
>> instruction, and __morestack is sibling-called by jg instruction.
>
> Does that work in a multi-threaded program if two different threads
> are calling the same function at the same time and both threads need
> to split the stack?
>
> Ian
>

Sure, why not?  The parameters are link-time constants after all.

Marcin Kościelnicki

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC] [PR 68191] s390: Add -fsplit-stack support.
  2016-01-03  3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor
  2016-01-03 10:32   ` Marcin Kościelnicki
@ 2016-01-04  7:35   ` Marcin Kościelnicki
  1 sibling, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-04  7:35 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-patches

On 03/01/16 04:20, Ian Lance Taylor wrote:
> On Sat, Jan 2, 2016 at 11:16 AM, Marcin Kościelnicki <koriakin@0x04.net> wrote:
>>
>> The differences start in the __morestack calling convention.  Basically,
>> since pushing things on the stack is unwieldy and there's only one free
>> register (%r0 could be used for static chain, %r2-%r6 contain arguments,
>> %r6-%r15 are callee-saved), I stuff the parameters somewhere in .rodata
>> or .text section, and pass the address of the parameter block in %r1.
>> The parameter block also contains a (position-relative) address that
>> __morestack should jump to (x86 just mangles the return address from
>> __morestack to compute that).  On zSeries CPUs, the parameter block
>> is stuffed somewhere in .rodata, its address loaded to %r1 by larl
>> instruction, and __morestack is sibling-called by jg instruction.
>
> Does that work in a multi-threaded program if two different threads
> are calling the same function at the same time and both threads need
> to split the stack?

For a few more details - __morestack takes three parameters:

- function's frame size (initial frame size if it happens to use alloca 
or VLAs later)
- size of the function's arguments on stack (not including varargs, if any)
- a pointer to the label where execution should be continued after stack 
is allocated

All three are per-function consts.  The first two are computed by the 
compiler (though frame size can be mangled by linker for functions 
calling non-split-stack code), and the third by the linker (since it 
involves relocation).  Since the parameters are known at link time, 
they're put in a per-function block in .rodata or .text and never 
change.  Simultaneous access to that area is not a problem, since it's 
never written.
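
As a rough illustration of the above, here is a C sketch of the 64-bit
parameter block and of the check __morestack performs on it.  The struct
and function names are made up for this example; only the field offsets
(0, 8, 0x10) and the comparison are taken from the 64-bit morestack.S
code.  The real thing is assembly, so treat this as a model only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the per-function parameter block (64-bit).
   Field names are invented; offsets follow the loads in morestack.S.  */
struct morestack_params {
    uint64_t frame_size;  /* 0x00: required frame size */
    uint64_t args_size;   /* 0x08: size of arguments passed on the stack */
    int64_t  resume_off;  /* 0x10: resume label minus address of this block */
};

static_assert(offsetof(struct morestack_params, args_size) == 0x08,
              "args_size must sit at offset 8");
static_assert(offsetof(struct morestack_params, resume_off) == 0x10,
              "resume_off must sit at offset 0x10");

/* Models the clgr/jle test at the top of the 64-bit __morestack:
   returns 1 when a new stack segment has to be allocated.  */
static int needs_new_segment(const struct morestack_params *p,
                             uint64_t guard, uint64_t sp)
{
    return guard + p->frame_size + p->args_size > sp;
}

/* The third parameter is position-relative: block address + offset.  */
static uintptr_t resume_address(const struct morestack_params *p)
{
    return (uintptr_t)p + (uintptr_t)p->resume_off;
}
```

Both helpers take a pointer to const and only read the block, which is
the whole point: any number of threads can go through the same block at
once.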

Marcin Kościelnicki

>
> Ian
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/5] s390: Add -fsplit-stack support
  2016-01-02 19:17 ` [PATCH 5/5] s390: Add -fsplit-stack support Marcin Kościelnicki
@ 2016-01-15 18:39   ` Andreas Krebbel
  2016-01-15 21:08     ` Marcin Kościelnicki
  2016-01-16 13:46     ` [PATCH] " Marcin Kościelnicki
  0 siblings, 2 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-15 18:39 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

Marcin,

your implementation looks very good to me. Thanks!

But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from
the back-end with the next GCC version.  So I would prefer if you could remove all the
!TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with
-march g5/g6.  It currently makes the implementation more complicated and would have to be removed
anyway in the future.

Thanks!

https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html


Bye,

-Andreas-



On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
> 
> 	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> 	* config/s390/morestack.S: New file.
> 	* config/s390/t-stack-s390: New file.
> 	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
> 	* common/config/s390/s390-common.c (s390_supports_split_stack):
> 	New function.
> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> 	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> 	* config/s390/s390.c (struct machine_function): New field
> 	split_stack_varargs_pointer.
> 	(s390_split_branches): Don't split split-stack pseudo-insns, rewire
> 	split-stack prologue conditional jump instead of splitting it.
> 	(s390_chunkify_start): Don't reload const pool register on split-stack
> 	prologue conditional jumps.
> 	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
> 	in s390_emit_prologue.
> 	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> 	vararg pointer.
> 	(morestack_ref): New global.
> 	(SPLIT_STACK_AVAILABLE): New macro.
> 	(s390_expand_split_stack_prologue): New function.
> 	(s390_expand_split_stack_call_esa): New function.
> 	(s390_expand_split_stack_call_zarch): New function.
> 	(s390_live_on_entry): New function.
> 	(s390_va_start): Use split-stack vararg pointer if appropriate.
> 	(s390_reorg): Lower the split-stack pseudo-insns.
> 	(s390_asm_file_end): Emit the split-stack note sections.
> 	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> 	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
> 	(UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec.
> 	(UNSPECV_SPLIT_STACK_CALL_ESA): New unspec.
> 	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> 	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
> 	(split_stack_prologue): New expand.
> 	(split_stack_call_esa): New insn.
> 	(split_stack_call_zarch_*): New insn.
> 	(split_stack_cond_call_zarch_*): New insn.
> 	(split_stack_space_check): New expand.
> 	(split_stack_sibcall_basr): New insn.
> 	(split_stack_sibcall_*): New insn.
> 	(split_stack_cond_sibcall_*): New insn.
> 	(split_stack_marker): New insn.
> ---
>  gcc/ChangeLog                        |  41 ++
>  gcc/common/config/s390/s390-common.c |  14 +
>  gcc/config/s390/s390-protos.h        |   1 +
>  gcc/config/s390/s390.c               | 538 +++++++++++++++++++++++++-
>  gcc/config/s390/s390.md              | 133 +++++++
>  libgcc/ChangeLog                     |   7 +
>  libgcc/config.host                   |   4 +-
>  libgcc/config/s390/morestack.S       | 718 +++++++++++++++++++++++++++++++++++
>  libgcc/config/s390/t-stack-s390      |   2 +
>  libgcc/generic-morestack.c           |   4 +
>  10 files changed, 1454 insertions(+), 8 deletions(-)
>  create mode 100644 libgcc/config/s390/morestack.S
>  create mode 100644 libgcc/config/s390/t-stack-s390
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 4c7046f..a4f4dff 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,46 @@
>  2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
> 
> +	* common/config/s390/s390-common.c (s390_supports_split_stack):
> +	New function.
> +	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> +	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> +	* config/s390/s390.c (struct machine_function): New field
> +	split_stack_varargs_pointer.
> +	(s390_split_branches): Don't split split-stack pseudo-insns, rewire
> +	split-stack prologue conditional jump instead of splitting it.
> +	(s390_chunkify_start): Don't reload const pool register on split-stack
> +	prologue conditional jumps.
> +	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
> +	in s390_emit_prologue.
> +	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> +	vararg pointer.
> +	(morestack_ref): New global.
> +	(SPLIT_STACK_AVAILABLE): New macro.
> +	(s390_expand_split_stack_prologue): New function.
> +	(s390_expand_split_stack_call_esa): New function.
> +	(s390_expand_split_stack_call_zarch): New function.
> +	(s390_live_on_entry): New function.
> +	(s390_va_start): Use split-stack vararg pointer if appropriate.
> +	(s390_reorg): Lower the split-stack pseudo-insns.
> +	(s390_asm_file_end): Emit the split-stack note sections.
> +	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> +	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
> +	(UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec.
> +	(UNSPECV_SPLIT_STACK_CALL_ESA): New unspec.
> +	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> +	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
> +	(split_stack_prologue): New expand.
> +	(split_stack_call_esa): New insn.
> +	(split_stack_call_zarch_*): New insn.
> +	(split_stack_cond_call_zarch_*): New insn.
> +	(split_stack_space_check): New expand.
> +	(split_stack_sibcall_basr): New insn.
> +	(split_stack_sibcall_*): New insn.
> +	(split_stack_cond_sibcall_*): New insn.
> +	(split_stack_marker): New insn.
> +
> +2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
> +
>  	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>  	with side effects.
> 
> diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
> index 4cf0df7..0c468bf 100644
> --- a/gcc/common/config/s390/s390-common.c
> +++ b/gcc/common/config/s390/s390-common.c
> @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>      }
>  }
> 
> +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
> +   We don't verify it, since earlier versions just have padding at
> +   its place, which works just as well.  */
> +
> +static bool
> +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
> +			   struct gcc_options *opts ATTRIBUTE_UNUSED)
> +{
> +  return true;
> +}
> +
>  #undef TARGET_DEFAULT_TARGET_FLAGS
>  #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
> 
> @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>  #undef TARGET_OPTION_INIT_STRUCT
>  #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
> 
> +#undef TARGET_SUPPORTS_SPLIT_STACK
> +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
> +
>  struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> index 962abb1..936e267 100644
> --- a/gcc/config/s390/s390-protos.h
> +++ b/gcc/config/s390/s390-protos.h
> @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>  extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
>  extern void s390_emit_prologue (void);
>  extern void s390_emit_epilogue (bool);
> +extern void s390_expand_split_stack_prologue (void);
>  extern bool s390_can_use_simple_return_insn (void);
>  extern bool s390_can_use_return_insn (void);
>  extern void s390_function_profiler (FILE *, int);
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 9dc8d1e..0255eec 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -426,6 +426,13 @@ struct GTY(()) machine_function
>    /* True if the current function may contain a tbegin clobbering
>       FPRs.  */
>    bool tbegin_p;
> +
> +  /* For -fsplit-stack support: A stack local which holds a pointer to
> +     the stack arguments for a function with a variable number of
> +     arguments.  This is set at the start of the function and is used
> +     to initialize the overflow_arg_area field of the va_list
> +     structure.  */
> +  rtx split_stack_varargs_pointer;
>  };
> 
>  /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
> @@ -7669,7 +7676,17 @@ s390_split_branches (void)
> 
>        pat = PATTERN (insn);
>        if (GET_CODE (pat) == PARALLEL)
> -	pat = XVECEXP (pat, 0, 0);
> +	{
> +	  /* Split stack call pseudo-jump doesn't need splitting.  */
> +	  if (GET_CODE (XVECEXP (pat, 0, 1)) == SET
> +	      && GET_CODE (XEXP (XVECEXP (pat, 0, 1), 1)) == UNSPEC_VOLATILE
> +	      && (XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1)
> +		  == UNSPECV_SPLIT_STACK_CALL_ESA
> +	          || XINT (XEXP (XVECEXP (pat, 0, 1), 1), 1)
> +		     == UNSPECV_SPLIT_STACK_CALL_ZARCH))
> +	    continue;
> +	  pat = XVECEXP (pat, 0, 0);
> +	}
>        if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
>  	continue;
> 
> @@ -7692,6 +7709,49 @@ s390_split_branches (void)
>        if (get_attr_length (insn) <= 4)
>  	continue;
> 
> +      if (prologue_epilogue_contains (insn))
> +        {
> +	  /* A jump in prologue/epilogue must come from the split-stack
> +	     prologue.  It cannot be split - there are no scratch regs
> +	     available at that point.  Rewire it instead.  */
> +
> +	  rtx_insn *code_label = (rtx_insn *)XEXP (*label, 0);
> +	  gcc_assert (LABEL_P (code_label));
> +	  rtx_insn *note = NEXT_INSN (code_label);
> +	  gcc_assert (NOTE_P (note));
> +	  rtx_insn *jump_ss = NEXT_INSN (note);
> +	  gcc_assert (JUMP_P (jump_ss));
> +	  rtx_insn *barrier = NEXT_INSN (jump_ss);
> +	  gcc_assert (BARRIER_P (barrier));
> +	  gcc_assert (GET_CODE (SET_SRC (pat)) == IF_THEN_ELSE);
> +	  gcc_assert (GET_CODE (XEXP (SET_SRC (pat), 0)) == LT);
> +
> +	  /* step 1 - insert new label after */
> +	  rtx new_label = gen_label_rtx ();
> +	  emit_label_after (new_label, insn);
> +
> +	  /* step 2 - reorder */
> +	  reorder_insns_nobb (code_label, barrier, insn);
> +
> +	  /* step 3 - retarget jump */
> +	  rtx new_target = gen_rtx_LABEL_REF (VOIDmode, new_label);
> +	  ret = validate_change (insn, label, new_target, 0);
> +	  gcc_assert (ret);
> +	  LABEL_NUSES (new_label)++;
> +	  LABEL_NUSES (code_label)--;
> +	  JUMP_LABEL (insn) = new_label;
> +
> +	  /* step 4 - invert jump cc */
> +	  rtx *pcond = &XEXP (SET_SRC (pat), 0);
> +	  rtx new_cond = gen_rtx_fmt_ee (GE, VOIDmode,
> +					 XEXP (*pcond, 0),
> +					 XEXP (*pcond, 1));
> +	  ret = validate_change (insn, pcond, new_cond, 0);
> +	  gcc_assert (ret);
> +
> +	  continue;
> +	}
> +
>        /* We are going to use the return register as scratch register,
>  	 make sure it will be saved/restored by the prologue/epilogue.  */
>        cfun_frame_layout.save_return_addr_p = 1;
> @@ -8736,7 +8796,7 @@ s390_chunkify_start (void)
>  	}
>        /* If we have a direct jump (conditional or unconditional),
>  	 check all potential targets.  */
> -      else if (JUMP_P (insn))
> +      else if (JUMP_P (insn) && !prologue_epilogue_contains (insn))
>  	{
>  	  rtx pat = PATTERN (insn);
> 
> @@ -9316,9 +9376,13 @@ s390_register_info ()
>  	  cfun_frame_layout.high_fprs++;
>        }
> 
> -  if (flag_pic)
> -    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
> -      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
> +  /* Register 12 is used for GOT address, but also as temp in prologue
> +     for split-stack stdarg functions (unless r14 is available).  */
> +  clobbered_regs[12]
> +    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
> +	|| (flag_split_stack && cfun->stdarg
> +	    && (crtl->is_leaf || TARGET_TPF_PROFILING
> +		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
> 
>    clobbered_regs[BASE_REGNUM]
>      |= (cfun->machine->base_reg
> @@ -10446,6 +10510,8 @@ s390_emit_prologue (void)
>        && !crtl->is_leaf
>        && !TARGET_TPF_PROFILING)
>      temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
> +  else if (flag_split_stack && cfun->stdarg)
> +    temp_reg = gen_rtx_REG (Pmode, 12);
>    else
>      temp_reg = gen_rtx_REG (Pmode, 1);
> 
> @@ -10939,6 +11005,386 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
>      SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
>  }
> 
> +/* -fsplit-stack support.  */
> +
> +/* A SYMBOL_REF for __morestack.  */
> +static GTY(()) rtx morestack_ref;
> +
> +/* When using -fsplit-stack, the allocation routines set a field in
> +   the TCB to the bottom of the stack plus this much space, measured
> +   in bytes.  */
> +
> +#define SPLIT_STACK_AVAILABLE 1024
> +
> +/* Emit -fsplit-stack prologue, which goes before the regular function
> +   prologue.  */
> +
> +void
> +s390_expand_split_stack_prologue (void)
> +{
> +  rtx r1, guard, cc;
> +  rtx_insn *insn;
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
> +  /* Pointer size in bytes.  */
> +  /* Frame size and argument size - the two parameters to __morestack.  */
> +  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
> +  /* Align argument size to 8 bytes - simplifies __morestack code.  */
> +  HOST_WIDE_INT args_size = crtl->args.size >= 0
> +			    ? ((crtl->args.size + 7) & ~7)
> +			    : 0;
> +  /* Label to jump to when no __morestack call is necessary.  */
> +  rtx_code_label *enough = NULL;
> +  /* Label to be called by __morestack.  */
> +  rtx_code_label *call_done = NULL;
> +  /* 1 if __morestack called conditionally, 0 if always.  */
> +  int conditional = 0;
> +
> +  gcc_assert (flag_split_stack && reload_completed);
> +
> +  r1 = gen_rtx_REG (Pmode, 1);
> +
> +  /* If no stack frame will be allocated, don't do anything.  */
> +  if (!frame_size)
> +    {
> +      /* But emit a marker that will let linker and indirect function
> +	 calls recognise this function as split-stack aware.  */
> +      emit_insn(gen_split_stack_marker());
> +      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +        {
> +          /* If va_start is used, just use r15.  */
> +          emit_move_insn (r1,
> +		          gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +			                GEN_INT (STACK_POINTER_OFFSET)));
> +        }
> +      return;
> +    }
> +
> +  if (morestack_ref == NULL_RTX)
> +    {
> +      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
> +      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
> +					   | SYMBOL_FLAG_FUNCTION);
> +    }
> +
> +  if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu))
> +    {
> +      /* If frame_size will fit in an add instruction, do a stack space
> +	 check, and only call __morestack if there's not enough space.  */
> +      conditional = 1;
> +
> +      /* Get thread pointer.  r1 is the only register we can always destroy - r0
> +         could contain a static chain (and cannot be used to address memory
> +         anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
> +      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
> +      /* Aim at __private_ss.  */
> +      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
> +
> +      /* If less than 1kiB used, skip addition and compare directly with
> +         __private_ss.  */
> +      if (frame_size > SPLIT_STACK_AVAILABLE)
> +        {
> +          emit_move_insn (r1, guard);
> +	  if (TARGET_64BIT)
> +	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
> +	  else
> +	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
> +	  guard = r1;
> +        }
> +
> +      if (TARGET_CPU_ZARCH)
> +        {
> +	  rtx tmp;
> +
> +          /* Compare the (maybe adjusted) guard with the stack pointer.  */
> +          cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
> +
> +          call_done = gen_label_rtx ();
> +
> +	  if (TARGET_64BIT)
> +	    tmp = gen_split_stack_cond_call_zarch_di (call_done,
> +						      morestack_ref,
> +						      GEN_INT (frame_size),
> +						      GEN_INT (args_size),
> +						      cc);
> +	  else
> +	    tmp = gen_split_stack_cond_call_zarch_si (call_done,
> +						      morestack_ref,
> +						      GEN_INT (frame_size),
> +						      GEN_INT (args_size),
> +						      cc);
> +
> +
> +          insn = emit_jump_insn (tmp);
> +	  JUMP_LABEL (insn) = call_done;
> +
> +          /* Mark the jump as very unlikely to be taken.  */
> +          add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
> +	}
> +      else
> +        {
> +          /* Compare the (maybe adjusted) guard with the stack pointer.  */
> +          cc = s390_emit_compare (GE, stack_pointer_rtx, guard);
> +
> +          enough = gen_label_rtx ();
> +          insn = s390_emit_jump (enough, cc);
> +          JUMP_LABEL (insn) = enough;
> +
> +          /* Mark the jump as very likely to be taken.  */
> +          add_int_reg_note (insn, REG_BR_PROB,
> +			    REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);
> +	}
> +    }
> +
> +  if (call_done == NULL)
> +    {
> +      rtx tmp;
> +      call_done = gen_label_rtx ();
> +
> +      /* Now, we need to call __morestack.  It has very special calling
> +         conventions: it preserves param/return/static chain registers for
> +         calling main function body, and looks for its own parameters
> +         at %r1 (after aligning it up to a 4-byte boundary for 31-bit mode).  */
> +      if (TARGET_64BIT)
> +        tmp = gen_split_stack_call_zarch_di (call_done,
> +					     morestack_ref,
> +					     GEN_INT (frame_size),
> +					     GEN_INT (args_size));
> +      else if (TARGET_CPU_ZARCH)
> +        tmp = gen_split_stack_call_zarch_si (call_done,
> +					     morestack_ref,
> +					     GEN_INT (frame_size),
> +					     GEN_INT (args_size));
> +      else
> +        tmp = gen_split_stack_call_esa (call_done,
> +					morestack_ref,
> +					GEN_INT (frame_size),
> +					GEN_INT (args_size));
> +      insn = emit_jump_insn (tmp);
> +      JUMP_LABEL (insn) = call_done;
> +      emit_barrier ();
> +    }
> +
> +  /* __morestack will call us here.  */
> +
> +  if (enough != NULL)
> +    {
> +      emit_label (enough);
> +      LABEL_NUSES (enough) = 1;
> +    }
> +
> +  if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +    {
> +      /* If va_start is used, and __morestack was not called, just use r15.  */
> +      emit_move_insn (r1,
> +		      gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +			            GEN_INT (STACK_POINTER_OFFSET)));
> +    }
> +
> +  emit_label (call_done);
> +  LABEL_NUSES (call_done) = 1;
> +}
> +
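The argument-size rounding and the frame-size test that selects the conditional path can be checked in isolation.  The following is a minimal sketch in plain C, not GCC code: align_args_size, use_conditional_call and the have_extimm flag are hypothetical stand-ins for the args_size computation and the TARGET_EXTIMM test quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round a stack-argument size up to a multiple of 8.  A negative
   (unknown) size is treated as 0, mirroring the quoted args_size
   computation.  */
static int64_t
align_args_size (int64_t size)
{
  return size >= 0 ? ((size + 7) & ~(int64_t) 7) : 0;
}

/* Return true if FRAME_SIZE fits an add-immediate: a 16-bit signed
   immediate always, or a 32-bit immediate when HAVE_EXTIMM (an
   assumed stand-in for TARGET_EXTIMM) is set.  */
static bool
use_conditional_call (int64_t frame_size, bool have_extimm)
{
  return frame_size <= 0x7fff
	 || (have_extimm && frame_size <= 0xffffffffLL);
}
```

Under these assumptions a 9-byte argument area rounds up to 16 bytes, and a 0x8000-byte frame takes the conditional path only when the extended-immediate facility is assumed to be available.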
> +/* Generates split-stack call sequence for esa mode, along with its parameter
> +   block.  */
> +
> +static void
> +s390_expand_split_stack_call_esa (rtx_insn *orig_insn,
> +				  rtx call_done,
> +				  rtx function,
> +				  rtx frame_size,
> +				  rtx args_size)
> +{
> +  int psize = GET_MODE_SIZE (Pmode);
> +  /* Labels for literal base, literal __morestack, param base.  */
> +  rtx litbase = gen_label_rtx ();
> +  rtx litms = gen_label_rtx ();
> +  rtx parmbase = gen_label_rtx ();
> +  rtx r1 = gen_rtx_REG (Pmode, 1);
> +  rtx_insn *insn = orig_insn;
> +  rtx tmp, tmp2;
> +
> +  /* No brasl, we have to make do using basr and a literal pool.  */
> +
> +  /* %r1 = litbase.  */
> +  insn = emit_insn_after (gen_main_base_31_small (r1, litbase), insn);
> +  insn = emit_label_after (litbase, insn);
> +
> +  /* a %r1, .Llitms-.Llitbase(%r1) */
> +  tmp = gen_rtx_LABEL_REF (Pmode, litbase);
> +  tmp2 = gen_rtx_LABEL_REF (Pmode, litms);
> +  tmp = gen_rtx_UNSPEC (Pmode,
> +			gen_rtvec (2, tmp2, tmp),
> +			UNSPEC_POOL_OFFSET);
> +  tmp = gen_rtx_CONST (Pmode, tmp);
> +  tmp = gen_rtx_MEM (Pmode, gen_rtx_PLUS (Pmode, r1, tmp));
> +  insn = emit_insn_after (gen_addsi3 (r1, r1, tmp), insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, litbase);
> +  add_reg_note (insn, REG_LABEL_OPERAND, litms);
> +  LABEL_NUSES (litbase)++;
> +  LABEL_NUSES (litms)++;
> +
> +  /* basr %r1, %r1 */
> +  tmp = gen_split_stack_sibcall_basr (r1, call_done);
> +  insn = emit_jump_insn_after (tmp, insn);
> +  JUMP_LABEL (insn) = call_done;
> +  LABEL_NUSES (call_done)++;
> +
> +  /* __morestack will mangle its return register to get our parameters.  */
> +
> +  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
> +     (this mirrors the alignment done in __morestack - don't touch it).  */
> +  insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn);
> +
> +  insn = emit_label_after (parmbase, insn);
> +
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, frame_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Second parameter is size of the arguments passed on stack that
> +     __morestack has to copy to the new stack (does not include varargs).  */
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, args_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Third parameter is offset between start of the parameter block
> +     and function body to be called by __morestack.  */
> +  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
> +  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
> +  tmp = gen_rtx_CONST (Pmode,
> +                       gen_rtx_MINUS (Pmode, tmp2, tmp));
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, tmp),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
> +  LABEL_NUSES (call_done)++;
> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
> +  LABEL_NUSES (parmbase)++;
> +
> +  /* We take advantage of the already-existing literal pool here to stuff
> +     the __morestack address for use in the call above.  */
> +
> +  insn = emit_label_after (litms, insn);
> +
> +  /* We actually emit __morestack - litbase to support PIC.  Since it
> +     works just as well for non-PIC, we use it in all cases.  */
> +
> +  tmp = gen_rtx_LABEL_REF (Pmode, litbase);
> +  tmp = gen_rtx_CONST (Pmode,
> +                       gen_rtx_MINUS (Pmode, function, tmp));
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, tmp),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, litbase);
> +  LABEL_NUSES (litbase)++;
> +
> +  delete_insn (orig_insn);
> +}
> +
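To make the parameter block concrete, here is a rough sketch of how the 31-bit __morestack appears to consume it, going by the comments in the patch: the return address left by basr is rounded up to 4 bytes, and the third word is an offset from the block to the function body.  The struct and function names are hypothetical, and addresses are modelled as plain 32-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the 31-bit parameter block emitted after the
   basr: three consecutive 32-bit pool entries.  */
struct morestack_params31
{
  uint32_t frame_size;   /* First parameter: required frame size.  */
  uint32_t args_size;    /* Second parameter: stack argument size.  */
  uint32_t body_offset;  /* Third parameter: call_done - parmbase.  */
};

/* Round the address in %r1 up to a 4-byte boundary, which is what
   the la / lhi / nr sequence in __morestack computes.  */
static uint32_t
param_block_address (uint32_t r1)
{
  return (r1 + 3) & ~(uint32_t) 3;
}

/* Address of the function body: block address plus stored offset.  */
static uint32_t
body_address (uint32_t parmbase, const struct morestack_params31 *p)
{
  return parmbase + p->body_offset;
}
```

Storing an offset rather than an absolute address keeps the block position-independent, which matches the "__morestack - litbase" trick used for the literal.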
> +/* Generates split-stack call sequence for zarch mode, along with its parameter
> +   block.  */
> +
> +static void
> +s390_expand_split_stack_call_zarch (rtx_insn *orig_insn,
> +				    rtx call_done,
> +				    rtx function,
> +				    rtx frame_size,
> +				    rtx args_size,
> +				    rtx cond)
> +{
> +  int psize = GET_MODE_SIZE (Pmode);
> +  rtx_insn *insn = orig_insn;
> +  rtx parmbase = gen_label_rtx ();
> +  rtx r1 = gen_rtx_REG (Pmode, 1);
> +  rtx tmp, tmp2;
> +
> +  /* %r1 = parmbase.  */
> +  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
> +  LABEL_NUSES (parmbase)++;
> +
> +  /* jg<cond> __morestack.  */
> +  if (cond == NULL)
> +    {
> +      if (TARGET_64BIT)
> +        tmp = gen_split_stack_sibcall_di (function, call_done);
> +      else
> +        tmp = gen_split_stack_sibcall_si (function, call_done);
> +      insn = emit_jump_insn_after (tmp, insn);
> +    }
> +  else
> +    {
> +      if (!s390_comparison (cond, VOIDmode))
> +	internal_error ("bad split_stack_call_zarch cond");
> +      if (TARGET_64BIT)
> +        tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done);
> +      else
> +        tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done);
> +      insn = emit_jump_insn_after (tmp, insn);
> +    }
> +  JUMP_LABEL (insn) = call_done;
> +  LABEL_NUSES (call_done)++;
> +
> +  /* Go to .rodata.  */
> +  insn = emit_insn_after (gen_pool_section_start (), insn);
> +
> +  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
> +     (this mirrors the alignment done in __morestack - don't touch it).  */
> +  insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn);
> +
> +  insn = emit_label_after (parmbase, insn);
> +
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, frame_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Second parameter is size of the arguments passed on stack that
> +     __morestack has to copy to the new stack (does not include varargs).  */
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, args_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Third parameter is offset between start of the parameter block
> +     and function body to be called by __morestack.  */
> +  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
> +  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
> +  tmp = gen_rtx_CONST (Pmode,
> +                       gen_rtx_MINUS (Pmode, tmp2, tmp));
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, tmp),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
> +  LABEL_NUSES (call_done)++;
> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
> +  LABEL_NUSES (parmbase)++;
> +
> +  /* Return from .rodata.  */
> +  insn = emit_insn_after (gen_pool_section_end (), insn);
> +
> +  delete_insn (orig_insn);
> +}
> +
> +/* We may have to tell the dataflow pass that the split stack prologue
> +   is initializing a register.  */
> +
> +static void
> +s390_live_on_entry (bitmap regs)
> +{
> +  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +    {
> +      gcc_assert (flag_split_stack);
> +      bitmap_set_bit (regs, 1);
> +    }
> +}
> +
>  /* Return true if the function can use simple_return to return outside
>     of a shrink-wrapped region.  At present shrink-wrapping is supported
>     in all cases.  */
> @@ -11541,6 +11987,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>        expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
>      }
> 
> +  if (flag_split_stack
> +     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
> +         == NULL)
> +     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
> +    {
> +      rtx reg;
> +      rtx_insn *seq;
> +
> +      reg = gen_reg_rtx (Pmode);
> +      cfun->machine->split_stack_varargs_pointer = reg;
> +
> +      start_sequence ();
> +      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
> +      seq = get_insns ();
> +      end_sequence ();
> +
> +      push_topmost_sequence ();
> +      emit_insn_after (seq, entry_of_function ());
> +      pop_topmost_sequence ();
> +    }
> +
>    /* Find the overflow area.
>       FIXME: This currently is too pessimistic when the vector ABI is
>       enabled.  In that case we *always* set up the overflow area
> @@ -11549,7 +12016,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>        || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
>        || TARGET_VX_ABI)
>      {
> -      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
> +      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
> +        t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer);
> +      else
> +        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
> 
>        off = INTVAL (crtl->args.arg_offset_rtx);
>        off = off < 0 ? 0 : off;
> @@ -13158,6 +13628,56 @@ s390_reorg (void)
>  	}
>      }
> 
> +  if (flag_split_stack)
> +    {
> +      rtx_insn *insn;
> +
> +      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +	{
> +	  /* Look for the split-stack fake jump instructions.  */
> +	  if (!JUMP_P (insn))
> +	    continue;
> +	  if (GET_CODE (PATTERN (insn)) != PARALLEL
> +	      || XVECLEN (PATTERN (insn), 0) != 2)
> +	    continue;
> +	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
> +	  if (GET_CODE (set) != SET)
> +	    continue;
> +	  rtx unspec = XEXP (set, 1);
> +	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
> +	    continue;
> +	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ESA
> +	      && XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL_ZARCH)
> +	    continue;
> +	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
> +	  rtx function = XVECEXP (unspec, 0, 0);
> +	  rtx frame_size = XVECEXP (unspec, 0, 1);
> +	  rtx args_size = XVECEXP (unspec, 0, 2);
> +	  rtx pc_src = XEXP (set_pc, 1);
> +	  rtx call_done, cond = NULL_RTX;
> +	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
> +	    {
> +	      cond = XEXP (pc_src, 0);
> +	      call_done = XEXP (XEXP (pc_src, 1), 0);
> +	    }
> +	  else
> +	    call_done = XEXP (pc_src, 0);
> +	  if (XINT (unspec, 1) == UNSPECV_SPLIT_STACK_CALL_ESA)
> +	    s390_expand_split_stack_call_esa (insn,
> +					      call_done,
> +					      function,
> +					      frame_size,
> +					      args_size);
> +	  else
> +	    s390_expand_split_stack_call_zarch (insn,
> +					        call_done,
> +					        function,
> +					        frame_size,
> +					        args_size,
> +					        cond);
> +	}
> +    }
> +
>    /* Try to optimize prologue and epilogue further.  */
>    s390_optimize_prologue ();
> 
> @@ -14469,6 +14989,9 @@ s390_asm_file_end (void)
>  	     s390_vector_abi);
>  #endif
>    file_end_indicate_exec_stack ();
> +
> +  if (flag_split_stack)
> +    file_end_indicate_split_stack ();
>  }
> 
>  /* Return true if TYPE is a vector bool type.  */
> @@ -14724,6 +15247,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
>  #undef TARGET_SET_UP_BY_PROLOGUE
>  #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
> 
> +#undef TARGET_EXTRA_LIVE_ON_ENTRY
> +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
> +
>  #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>    s390_use_by_pieces_infrastructure_p
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index 0ebefd6..15c6eed 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -114,6 +114,9 @@
>     UNSPEC_SP_SET
>     UNSPEC_SP_TEST
> 
> +   ; Split stack support
> +   UNSPEC_STACK_CHECK
> +
>     ; Test Data Class (TDC)
>     UNSPEC_TDC_INSN
> 
> @@ -276,6 +279,12 @@
>     ; Set and get floating point control register
>     UNSPECV_SFPC
>     UNSPECV_EFPC
> +
> +   ; Split stack support
> +   UNSPECV_SPLIT_STACK_CALL_ZARCH
> +   UNSPECV_SPLIT_STACK_CALL_ESA
> +   UNSPECV_SPLIT_STACK_SIBCALL
> +   UNSPECV_SPLIT_STACK_MARKER
>    ])
> 
>  ;;
> @@ -10909,3 +10918,127 @@
>    "TARGET_Z13"
>    "lcbb\t%0,%1,%b2"
>    [(set_attr "op_type" "VRX")])
> +
> +; Handle -fsplit-stack.
> +
> +(define_expand "split_stack_prologue"
> +  [(const_int 0)]
> +  ""
> +{
> +  s390_expand_split_stack_prologue ();
> +  DONE;
> +})
> +
> +(define_insn "split_stack_call_esa"
> +  [(set (pc) (label_ref (match_operand 0 "" "")))
> +   (set (reg:SI 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
> +                                     (match_operand 2 "consttable_operand" "X")
> +                                     (match_operand 3 "consttable_operand" "X")]
> +                                    UNSPECV_SPLIT_STACK_CALL_ESA))]
> +  "!TARGET_CPU_ZARCH"
> +{
> +  gcc_unreachable ();
> +}
> +  [(set_attr "length" "32")])
> +
> +(define_insn "split_stack_call_zarch_<mode>"
> +  [(set (pc) (label_ref (match_operand 0 "" "")))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
> +                                    (match_operand 2 "consttable_operand" "X")
> +                                    (match_operand 3 "consttable_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_CALL_ZARCH))]
> +  "TARGET_CPU_ZARCH"
> +{
> +  gcc_unreachable ();
> +}
> +  [(set_attr "length" "12")])
> +
> +(define_insn "split_stack_cond_call_zarch_<mode>"
> +  [(set (pc)
> +        (if_then_else
> +          (match_operand 4 "" "")
> +          (label_ref (match_operand 0 "" ""))
> +          (pc)))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
> +                                    (match_operand 2 "consttable_operand" "X")
> +                                    (match_operand 3 "consttable_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_CALL_ZARCH))]
> +  "TARGET_CPU_ZARCH"
> +{
> +  gcc_unreachable ();
> +}
> +  [(set_attr "length" "12")])
> +
> +;; If there are operand 0 bytes available on the stack, jump to
> +;; operand 1.
> +
> +(define_expand "split_stack_space_check"
> +  [(set (pc) (if_then_else
> +	      (ltu (minus (reg 15)
> +			  (match_operand 0 "register_operand"))
> +		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
> +	      (label_ref (match_operand 1))
> +	      (pc)))]
> +  ""
> +{
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
> +  rtx tp = s390_get_thread_pointer ();
> +  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
> +  rtx reg = gen_reg_rtx (Pmode);
> +  rtx cc;
> +  if (TARGET_64BIT)
> +    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
> +  else
> +    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
> +  cc = s390_emit_compare (GT, reg, guard);
> +  s390_emit_jump (operands[1], cc);
> +
> +  DONE;
> +})
> +
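As a reading aid, the check this expander emits boils down to "subtract the requested size from the stack pointer and branch if the result is still above the guard".  A minimal sketch follows; stack_space_available is a hypothetical name, and the unsigned comparison is an assumption taken from the ltu in the pattern template (the expander body itself passes GT to s390_emit_compare).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if SIZE more bytes can be taken below SP without
   reaching GUARD (the __private_ss value).  Addresses are treated as
   unsigned, which is an assumption of this sketch.  */
static bool
stack_space_available (uintptr_t sp, uintptr_t size, uintptr_t guard)
{
  return sp - size > guard;
}
```

The sketch assumes sp >= size; it does not model wrap-around when the request is larger than the stack pointer value.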
> +;; A basr for use in split stack prologue.
> +
> +(define_insn "split_stack_sibcall_basr"
> +  [(set (pc) (label_ref (match_operand 1 "" "")))
> +   (set (reg:SI 1) (unspec_volatile [(match_operand 0 "register_operand" "a")]
> +                                     UNSPECV_SPLIT_STACK_SIBCALL))]
> +  "!TARGET_CPU_ZARCH"
> +  "basr\t%%r1, %0"
> +  [(set_attr "op_type" "RR")
> +   (set_attr "type"  "jsr")])
> +
> +;; A jg with minimal fuss for use in split stack prologue.
> +
> +(define_insn "split_stack_sibcall_<mode>"
> +  [(set (pc) (label_ref (match_operand 1 "" "")))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_SIBCALL))]
> +  "TARGET_CPU_ZARCH"
> +  "jg\t%0"
> +  [(set_attr "op_type" "RIL")
> +   (set_attr "type"  "branch")])
> +
> +;; Also a conditional one.
> +
> +(define_insn "split_stack_cond_sibcall_<mode>"
> +  [(set (pc)
> +        (if_then_else
> +          (match_operand 1 "" "")
> +          (label_ref (match_operand 2 "" ""))
> +          (pc)))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_SIBCALL))]
> +  "TARGET_CPU_ZARCH"
> +  "jg%C1\t%0"
> +  [(set_attr "op_type" "RIL")
> +   (set_attr "type"  "branch")])
> +
> +;; An unusual nop instruction used to mark functions with no stack frames
> +;; as split-stack aware.
> +
> +(define_insn "split_stack_marker"
> +  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
> +  ""
> +  "nopr\t%%r15"
> +  [(set_attr "op_type" "RR")])
> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
> index f66646c..ff60571 100644
> --- a/libgcc/ChangeLog
> +++ b/libgcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
> +
> +	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> +	* config/s390/morestack.S: New file.
> +	* config/s390/t-stack-s390: New file.
> +	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> +
>  2015-12-18  Andris Pavenis  <andris.pavenis@iki.fi>
> 
>  	* config.host: Add *-*-msdosdjgpp to lists of i[34567]86-*-*
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 0a3b879..ce6d259 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1105,11 +1105,11 @@ rx-*-elf)
>  	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
>  	;;
>  s390-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
>  	md_unwind_header=s390/linux-unwind.h
>  	;;
>  s390x-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
>  	if test "${host_address}" = 32; then
>  	   tmake_file="${tmake_file} s390/32/t-floattodi"
>  	fi
> diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
> new file mode 100644
> index 0000000..8e26c66
> --- /dev/null
> +++ b/libgcc/config/s390/morestack.S
> @@ -0,0 +1,718 @@
> +# s390 support for -fsplit-stack.
> +# Copyright (C) 2015 Free Software Foundation, Inc.
> +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +
> +# Under Section 7 of GPL version 3, you are granted additional
> +# permissions described in the GCC Runtime Library Exception, version
> +# 3.1, as published by the Free Software Foundation.
> +
> +# You should have received a copy of the GNU General Public License and
> +# a copy of the GCC Runtime Library Exception along with this program;
> +# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Excess space needed to call ld.so resolver for lazy plt
> +# resolution.  Go uses sigaltstack so this doesn't need to
> +# also cover signal frame size.
> +#define BACKOFF 0x1000
> +
> +# The __morestack function.
> +
> +	.global	__morestack
> +	.hidden	__morestack
> +
> +	.type	__morestack,@function
> +
> +__morestack:
> +.LFB1:
> +	.cfi_startproc
> +
> +
> +#ifndef __s390x__
> +
> +
> +# The 31-bit __morestack function.
> +
> +	# We use a cleanup to restore the stack guard if an exception
> +	# is thrown through this code.
> +#ifndef __PIC__
> +	.cfi_personality 0,__gcc_personality_v0
> +	.cfi_lsda 0,.LLSDA1
> +#else
> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +	.cfi_lsda 0x1b,.LLSDA1
> +#endif
> +
> +	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
> +	.cfi_offset %r6, -0x48
> +	.cfi_offset %r7, -0x44
> +	.cfi_offset %r8, -0x40
> +	.cfi_offset %r9, -0x3c
> +	.cfi_offset %r10, -0x38
> +	.cfi_offset %r11, -0x34
> +	.cfi_offset %r12, -0x30
> +	.cfi_offset %r13, -0x2c
> +	.cfi_offset %r14, -0x28
> +	.cfi_offset %r15, -0x24
> +	lr	%r11, %r15		# Make frame pointer for vararg.
> +	.cfi_def_cfa_register %r11
> +	ahi	%r15, -0x60		# 0x60 for standard frame.
> +	st	%r11, 0(%r15)		# Save back chain.
> +	lr	%r8, %r0		# Save %r0 (static chain).
> +
> +	basr	%r13, 0			# .Lmsl to %r13
> +.Lmsl:
> +
> +	# %r1 may point directly to the parameter area (zarch), or right after
> +	# the basr instruction that called us (esa).  In the first case,
> +	# the pointer is already aligned.  In the second case, we may need to
> +	# align it up to 4 bytes to get to the parameters.
> +	la	%r10, 3(%r1)
> +	lhi	%r7, -4
> +	nr	%r10, %r7		# %r10 = (%r1 + 3) & ~3
> +
> +	l	%r7, 0(%r10)		# Required frame size to %r7
> +	ear	%r1, %a0		# Extract thread pointer.
> +	l	%r1, 0x20(%r1)		# Get stack boundary
> +	ar	%r1, %r7		# Stack boundary + frame size
> +	a	%r1, 4(%r10)		# + stack param size
> +	clr	%r1, %r15		# Compare with current stack pointer
> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
> +
> +	l	%r1, .Lmslbs-.Lmsl(%r13)	# __morestack_block_signals
> +#ifdef __PIC__
> +	bas	%r14, 0(%r1, %r13)
> +#else
> +	basr	%r14, %r1
> +#endif
> +
> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
> +	# as a local variable.  Not needed here, but done to be consistent with
> +	# the below use.
> +	ahi	%r7, BACKOFF		# Bump requested size a bit.
> +	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
> +	la	%r2, 0x40(%r11)		# Pass its address as parameter.
> +	la	%r3, 0x60(%r11)		# Caller's stack parameters.
> +	l	%r4, 4(%r10)		# Size of stack parameters.
> +
> +	l	%r1, .Lmslgms-.Lmsl(%r13)	# __generic_morestack
> +#ifdef __PIC__
> +	bas	%r14, 0(%r1, %r13)
> +#else
> +	basr	%r14, %r1
> +#endif
> +
> +	lr	%r15, %r2		# Switch to the new stack.
> +	ahi	%r15, -0x60		# Make a stack frame on it.
> +	st	%r11, 0(%r15)		# Save back chain.
> +
> +	s	%r2, 0x40(%r11)		# The end of stack space.
> +	ahi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +.LEHB0:
> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
> +
> +	l	%r1, .Lmslubs-.Lmsl(%r13)	# __morestack_unblock_signals
> +#ifdef __PIC__
> +	bas	%r14, 0(%r1, %r13)
> +#else
> +	basr	%r14, %r1
> +#endif
> +
> +	lr	%r0, %r8		# Static chain.
> +	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
> +
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a	%r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0x60(%r11)
> +
> +	# State of registers:
> +	# %r0: Static chain from entry.
> +	# %r1: Vararg pointer.
> +	# %r2-%r6: Parameters from entry.
> +	# %r7-%r10: Indeterminate.
> +	# %r11: Frame pointer (%r15 from entry).
> +	# %r12: Indeterminate.
> +	# %r13: Literal pool address.
> +	# %r14: Return address.
> +	# %r15: Stack pointer.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
> +
> +	l	%r1, .Lmslbs-.Lmsl(%r13)	# __morestack_block_signals
> +#ifdef __PIC__
> +	bas	%r14, 0(%r1, %r13)
> +#else
> +	basr	%r14, %r1
> +#endif
> +
> +	# We need a stack slot now, but have no good way to get it - the frame
> +	# on new stack had to be exactly 0x60 bytes, or stack parameters would
> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
> +	# save actual fprs).
> +	la	%r2, 0x40(%r11)
> +	l	%r1, .Lmslgrs-.Lmsl(%r13)	# __generic_releasestack
> +#ifdef __PIC__
> +	bas	%r14, 0(%r1, %r13)
> +#else
> +	basr	%r14, %r1
> +#endif
> +
> +	s	%r2, 0x40(%r11)		# Subtract available space.
> +	ahi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +.LEHE0:
> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
> +
> +	# We need to restore the old stack pointer before unblocking signals.
> +	# We also need 0x60 bytes for a stack frame.  Since we had a stack
> +	# frame at this place before the stack switch, there's no need to
> +	# write the back chain again.
> +	lr	%r15, %r11
> +	ahi	%r15, -0x60
> +
> +	l	%r1, .Lmslubs-.Lmsl(%r13)	# __morestack_unblock_signals
> +#ifdef __PIC__
> +	bas	%r14, 0(%r1, %r13)
> +#else
> +	basr	%r14, %r1
> +#endif
> +
> +	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# Executed if no new stack allocation is needed.
> +
> +.Lnoalloc:
> +	.cfi_restore_state
> +	# We may need to copy stack parameters.
> +	l	%r9, 0x4(%r10)		# Load stack parameter size.
> +	ltr	%r9, %r9		# And check if it's 0.
> +	je	.Lnostackparm		# Skip the copy if not needed.
> +	sr	%r15, %r9		# Make space on the stack.
> +	la	%r8, 0x60(%r15)		# Destination.
> +	la	%r12, 0x60(%r11)	# Source.
> +	lr	%r13, %r9		# Source size.
> +.Lcopy:
> +	mvcle	%r8, %r12, 0		# Copy.
> +	jo	.Lcopy
> +
> +.Lnostackparm:
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a	%r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0x60(%r11)
> +
> +	# OK, no stack allocation needed.  We still follow the protocol and
> +	# call our caller - it doesn't cost much and makes sure vararg works.
> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
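The guard arithmetic in the 31-bit path above can be summarised in C.  This is a sketch based only on the assembly comments: needs_new_segment and new_guard are hypothetical names, and it assumes __generic_morestack returns the top of the new segment and writes the available size back through its first argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BACKOFF 0x1000u  /* Same value as in morestack.S.  */

/* Entry check: a new segment is needed when guard + frame size +
   stack-argument size lies above the stack pointer (unsigned
   compare, as in clr followed by jle .Lnoalloc).  */
static bool
needs_new_segment (uint32_t guard, uint32_t frame_size,
		   uint32_t args_size, uint32_t sp)
{
  return guard + frame_size + args_size > sp;
}

/* Guard stored in the thread control block after the switch: the
   low end of the new segment, raised by BACKOFF.  */
static uint32_t
new_guard (uint32_t stack_top, uint32_t available)
{
  return stack_top - available + BACKOFF;
}
```

For example, a segment whose top is 0x20000 with 0x8000 bytes available would get a guard of 0x19000 under these assumptions.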
> +# This is the cleanup code called by the stack unwinder when unwinding
> +# through the code between .LEHB0 and .LEHE0 above.
> +
> +.L1:
> +	.cfi_restore_state
> +	lr	%r2, %r11		# Stack pointer after resume.
> +	l	%r1, .Lmslgfs-.Lmsl(%r13)	# __generic_findstack
> +#ifdef __PIC__
> +	bas	%r14, 0(%r1, %r13)
> +#else
> +	basr	%r14, %r1
> +#endif
> +	lr	%r3, %r11		# Get the stack pointer.
> +	sr	%r3, %r2		# Subtract available space.
> +	ahi	%r3, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +	st	%r3, 0x20(%r1)	# Save the new stack boundary.
> +
> +	lr	%r2, %r6		# Exception header.
> +#ifdef __PIC__
> +	l	%r12, .Lmslgot-.Lmsl(%r13)
> +	ar	%r12, %r13
> +	l	%r1, .Lmslunw-.Lmsl(%r13)
> +	bas	%r14, 0(%r1, %r12)
> +#else
> +	l	%r1, .Lmslunw-.Lmsl(%r13)
> +	basr	%r14, %r1
> +#endif
> +
> +# Literal pool.
> +
> +.align 4
> +#ifdef __PIC__
> +.Lmslbs:
> +	.long __morestack_block_signals-.Lmsl
> +.Lmslubs:
> +	.long __morestack_unblock_signals-.Lmsl
> +.Lmslgms:
> +	.long __generic_morestack-.Lmsl
> +.Lmslgrs:
> +	.long __generic_releasestack-.Lmsl
> +.Lmslgfs:
> +	.long __generic_findstack-.Lmsl
> +.Lmslunw:
> +	.long _Unwind_Resume@PLTOFF
> +.Lmslgot:
> +	.long _GLOBAL_OFFSET_TABLE_-.Lmsl
> +#else
> +.Lmslbs:
> +	.long __morestack_block_signals
> +.Lmslubs:
> +	.long __morestack_unblock_signals
> +.Lmslgms:
> +	.long __generic_morestack
> +.Lmslgrs:
> +	.long __generic_releasestack
> +.Lmslgfs:
> +	.long __generic_findstack
> +.Lmslunw:
> +	.long _Unwind_Resume
> +#endif
> +
> +#else /* defined(__s390x__) */
> +
> +
> +# The 64-bit __morestack function.
> +
> +	# We use a cleanup to restore the stack guard if an exception
> +	# is thrown through this code.
> +#ifndef __PIC__
> +	.cfi_personality 0x3,__gcc_personality_v0
> +	.cfi_lsda 0x3,.LLSDA1
> +#else
> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +	.cfi_lsda 0x1b,.LLSDA1
> +#endif
> +
> +	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
> +	.cfi_offset %r6, -0x70
> +	.cfi_offset %r7, -0x68
> +	.cfi_offset %r8, -0x60
> +	.cfi_offset %r9, -0x58
> +	.cfi_offset %r10, -0x50
> +	.cfi_offset %r11, -0x48
> +	.cfi_offset %r12, -0x40
> +	.cfi_offset %r13, -0x38
> +	.cfi_offset %r14, -0x30
> +	.cfi_offset %r15, -0x28
> +	lgr	%r11, %r15		# Make frame pointer for vararg.
> +	.cfi_def_cfa_register %r11
> +	aghi	%r15, -0xa0		# 0xa0 for standard frame.
> +	stg	%r11, 0(%r15)		# Save back chain.
> +	lgr	%r8, %r0		# Save %r0 (static chain).
> +	lgr	%r10, %r1		# Save %r1 (address of parameter block).
> +
> +	lg	%r7, 0(%r10)		# Required frame size to %r7
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +	lg	%r1, 0x38(%r1)		# Get stack boundary
> +	agr	%r1, %r7		# Stack boundary + frame size
> +	ag	%r1, 8(%r10)		# + stack param size
> +	clgr	%r1, %r15		# Compare with current stack pointer
> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
> +	# as a local variable.  Not needed here, but done to be consistent with
> +	# the below use.
> +	aghi	%r7, BACKOFF		# Bump requested size a bit.
> +	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
> +	la	%r2, 0x80(%r11)		# Pass its address as parameter.
> +	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
> +	lg	%r4, 8(%r10)		# Size of stack paremeters.
> +	brasl	%r14, __generic_morestack
> +
> +	lgr	%r15, %r2		# Switch to the new stack.
> +	aghi	%r15, -0xa0		# Make a stack frame on it.
> +	stg	%r11, 0(%r15)		# Save back chain.
> +
> +	sg	%r2, 0x80(%r11)		# The end of stack space.
> +	aghi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +.LEHB0:
> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lgr	%r0, %r8		# Static chain.
> +	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
> +
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	ag	%r10, 0x10(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0xa0(%r11)
> +
> +	# State of registers:
> +	# %r0: Static chain from entry.
> +	# %r1: Vararg pointer.
> +	# %r2-%r6: Parameters from entry.
> +	# %r7-%r10: Indeterminate.
> +	# %r11: Frame pointer (%r15 from entry).
> +	# %r12-%r13: Indeterminate.
> +	# %r14: Return address.
> +	# %r15: Stack pointer.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	stg	%r2, 0x10(%r11)		# Save return register.
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We need a stack slot now, but have no good way to get it - the frame
> +	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
> +	# save actual fprs).
> +	la	%r2, 0x80(%r11)
> +	brasl	%r14, __generic_releasestack
> +
> +	sg	%r2, 0x80(%r11)		# Subtract available space.
> +	aghi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +.LEHE0:
> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
> +
> +	# We need to restore the old stack pointer before unblocking signals.
> +	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
> +	# frame at this place before the stack switch, there's no need to
> +	# write the back chain again.
> +	lgr	%r15, %r11
> +	aghi	%r15, -0xa0
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# Executed if no new stack allocation is needed.
> +
> +.Lnoalloc:
> +	.cfi_restore_state
> +	# We may need to copy stack parameters.
> +	lg	%r9, 0x8(%r10)		# Load stack parameter size.
> +	ltgr	%r9, %r9		# Check if it's 0.
> +	je	.Lnostackparm		# Skip the copy if not needed.
> +	sgr	%r15, %r9		# Make space on the stack.
> +	la	%r8, 0xa0(%r15)		# Destination.
> +	la	%r12, 0xa0(%r11)	# Source.
> +	lgr	%r13, %r9		# Source size.
> +.Lcopy:
> +	mvcle	%r8, %r12, 0		# Copy.
> +	jo	.Lcopy
> +
> +.Lnostackparm:
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	ag	%r10, 0x10(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0xa0(%r11)
> +
> +	# OK, no stack allocation needed.  We still follow the protocol and
> +	# call our caller - it doesn't cost much and makes sure vararg works.
> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# This is the cleanup code called by the stack unwinder when unwinding
> +# through the code between .LEHB0 and .LEHE0 above.
> +
> +.L1:
> +	.cfi_restore_state
> +	lgr	%r2, %r11		# Stack pointer after resume.
> +	brasl	%r14, __generic_findstack
> +	lgr	%r3, %r11		# Get the stack pointer.
> +	sgr	%r3, %r2		# Subtract available space.
> +	aghi	%r3, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
> +
> +	lgr	%r2, %r6		# Exception header.
> +#ifdef __PIC__
> +	brasl	%r14, _Unwind_Resume@PLT
> +#else
> +	brasl	%r14, _Unwind_Resume
> +#endif
> +
> +#endif /* defined(__s390x__) */
> +
> +	.cfi_endproc
> +	.size	__morestack, . - __morestack
> +
> +
> +# The exception table.  This tells the personality routine to execute
> +# the exception handler.
> +
> +	.section	.gcc_except_table,"a",@progbits
> +	.align	4
> +.LLSDA1:
> +	.byte	0xff	# @LPStart format (omit)
> +	.byte	0xff	# @TType format (omit)
> +	.byte	0x1	# call-site format (uleb128)
> +	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
> +.LLSDACSB1:
> +	.uleb128 .LEHB0-.LFB1	# region 0 start
> +	.uleb128 .LEHE0-.LEHB0	# length
> +	.uleb128 .L1-.LFB1	# landing pad
> +	.uleb128 0		# action
> +.LLSDACSE1:
> +
> +
> +	.global __gcc_personality_v0
> +#ifdef __PIC__
> +	# Build a position independent reference to the basic
> +        # personality function.
> +	.hidden DW.ref.__gcc_personality_v0
> +	.weak   DW.ref.__gcc_personality_v0
> +	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
> +	.type	DW.ref.__gcc_personality_v0, @object
> +DW.ref.__gcc_personality_v0:
> +#ifndef __LP64__
> +	.align 4
> +	.size	DW.ref.__gcc_personality_v0, 4
> +	.long	__gcc_personality_v0
> +#else
> +	.align 8
> +	.size	DW.ref.__gcc_personality_v0, 8
> +	.quad	__gcc_personality_v0
> +#endif
> +#endif
> +
> +
> +
> +# Initialize the stack test value when the program starts or when a
> +# new thread starts.  We don't know how large the main stack is, so we
> +# guess conservatively.  We might be able to use getrlimit here.
> +
> +	.text
> +	.global	__stack_split_initialize
> +	.hidden	__stack_split_initialize
> +
> +	.type	__stack_split_initialize, @function
> +
> +__stack_split_initialize:
> +
> +#ifndef __s390x__
> +
> +	ear	%r1, %a0
> +	lr	%r0, %r15
> +	ahi	%r0, -0x4000	# We should have at least 16K.
> +	st	%r0, 0x20(%r1)
> +
> +	lr	%r2, %r15
> +	lhi	%r3, 0x4000
> +#ifdef __PIC__
> +	# Cannot do a tail call - we'll go through PLT, so we need GOT address
> +	# in %r12, which is callee-saved.
> +	stm	%r12, %r15, 0x30(%r15)
> +	basr	%r13, 0
> +.Lssi0:
> +	ahi	%r15, -0x60
> +	l	%r12, .Lssi2-.Lssi0(%r13)
> +	ar	%r12, %r13
> +	l	%r1, .Lssi1-.Lssi0(%r13)
> +	bas	%r14, 0(%r1, %r12)
> +	lm	%r12, %r15, 0x90(%r15)
> +	br	%r14
> +
> +.align 4
> +.Lssi1:
> +	.long	__generic_morestack_set_initial_sp@PLTOFF
> +.Lssi2:
> +	.long	_GLOBAL_OFFSET_TABLE_-.Lssi0
> +
> +#else
> +	basr	%r1, 0
> +.Lssi0:
> +	l	%r1, .Lssi1-.Lssi0(%r1)
> +	br	%r1	# Tail call
> +
> +.align 4
> +.Lssi1:
> +	.long	__generic_morestack_set_initial_sp
> +#endif
> +
> +#else /* defined(__s390x__) */
> +
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	lgr	%r0, %r15
> +	aghi	%r0, -0x4000	# We should have at least 16K.
> +	stg	%r0, 0x38(%r1)
> +
> +	lgr	%r2, %r15
> +	lghi	%r3, 0x4000
> +#ifdef __PIC__
> +	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
> +#else
> +	jg	__generic_morestack_set_initial_sp	# Tail call
> +#endif
> +
> +#endif /* defined(__s390x__) */
> +
> +	.size	__stack_split_initialize, . - __stack_split_initialize
> +
> +# Routines to get and set the guard, for __splitstack_getcontext,
> +# __splitstack_setcontext, and __splitstack_makecontext.
> +
> +# void *__morestack_get_guard (void) returns the current stack guard.
> +	.text
> +	.global	__morestack_get_guard
> +	.hidden	__morestack_get_guard
> +
> +	.type	__morestack_get_guard,@function
> +
> +__morestack_get_guard:
> +
> +#ifndef __s390x__
> +	ear	%r1, %a0
> +	l	%r2, 0x20(%r1)
> +#else
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	lg	%r2, 0x38(%r1)
> +#endif
> +	br %r14
> +
> +	.size	__morestack_get_guard, . - __morestack_get_guard
> +
> +# void __morestack_set_guard (void *) sets the stack guard.
> +	.global	__morestack_set_guard
> +	.hidden	__morestack_set_guard
> +
> +	.type	__morestack_set_guard,@function
> +
> +__morestack_set_guard:
> +
> +#ifndef __s390x__
> +	ear	%r1, %a0
> +	st	%r2, 0x20(%r1)
> +#else
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	stg	%r2, 0x38(%r1)
> +#endif
> +	br	%r14
> +
> +	.size	__morestack_set_guard, . - __morestack_set_guard
> +
> +# void *__morestack_make_guard (void *, size_t) returns the stack
> +# guard value for a stack.
> +	.global	__morestack_make_guard
> +	.hidden	__morestack_make_guard
> +
> +	.type	__morestack_make_guard,@function
> +
> +__morestack_make_guard:
> +
> +#ifndef __s390x__
> +	sr	%r2, %r3
> +	ahi	%r2, BACKOFF
> +#else
> +	sgr	%r2, %r3
> +	aghi	%r2, BACKOFF
> +#endif
> +	br	%r14
> +
> +	.size	__morestack_make_guard, . - __morestack_make_guard
> +
> +# Make __stack_split_initialize a high priority constructor.
> +
> +	.section .ctors.65535,"aw",@progbits
> +
> +#ifndef __LP64__
> +	.align	4
> +	.long	__stack_split_initialize
> +	.long	__morestack_load_mmap
> +#else
> +	.align	8
> +	.quad	__stack_split_initialize
> +	.quad	__morestack_load_mmap
> +#endif
> +
> +	.section	.note.GNU-stack,"",@progbits
> +	.section	.note.GNU-split-stack,"",@progbits
> +	.section	.note.GNU-no-split-stack,"",@progbits
> diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
> new file mode 100644
> index 0000000..4c959b0
> --- /dev/null
> +++ b/libgcc/config/s390/t-stack-s390
> @@ -0,0 +1,2 @@
> +# Makefile fragment to support -fsplit-stack for s390.
> +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
> index a10559b..8109c1a 100644
> --- a/libgcc/generic-morestack.c
> +++ b/libgcc/generic-morestack.c
> @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
>  #elif defined (__i386__)
>        nsp -= 6 * sizeof (void *);
>  #elif defined __powerpc64__
> +#elif defined __s390x__
> +      nsp -= 2 * 160;
> +#elif defined __s390__
> +      nsp -= 2 * 96;
>  #else
>  #error "unrecognized target"
>  #endif
> 


* Re: [PATCH 5/5] s390: Add -fsplit-stack support
  2016-01-15 18:39   ` Andreas Krebbel
@ 2016-01-15 21:08     ` Marcin Kościelnicki
  2016-01-21 10:12       ` Andreas Krebbel
  2016-01-16 13:46     ` [PATCH] " Marcin Kościelnicki
  1 sibling, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-15 21:08 UTC (permalink / raw)
  To: Andreas Krebbel, gcc-patches

On 15/01/16 19:38, Andreas Krebbel wrote:
> Marcin,
>
> your implementation looks very good to me. Thanks!
>
> But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from
> the back-end with the next GCC version.  So I would prefer if you could remove all the
> !TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with
> -march g5/g6.  It currently makes the implementation more complicated and would have to be removed
> anyway in the future.
>
> Thanks!
>
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html
>
>
> Bye,
>
> -Andreas-
>
>

Very well, I'll do that.

Btw, as for dropping support for g5/g6: I've noticed 
s390_function_profiler could also use larl+brasl for -m31 given 
TARGET_CPU_ZARCH.  Should I submit a patch for that?  I'm asking because 
gold with -fsplit-stack needs to know the exact sequence used, so if 
it's going to change after g5/g6 removal, I'd better add it to gold now 
(and make gcc always emit it for non-g5/g6, so that gold won't need to 
look at the old one).

What about the other patches?  #1 and #2 should be ready to go.  I'm not 
sure how I should go about getting #3 and #4 reviewed.  We don't need #3 
anymore once g5/g6 support is removed, but #4 might still be necessary - 
we still have that unconditional jump.

Marcin Kościelnicki


* [PATCH] s390: Add -fsplit-stack support
  2016-01-15 18:39   ` Andreas Krebbel
  2016-01-15 21:08     ` Marcin Kościelnicki
@ 2016-01-16 13:46     ` Marcin Kościelnicki
  2016-01-29 13:33       ` Andreas Krebbel
  1 sibling, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-16 13:46 UTC (permalink / raw)
  To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_expand_split_stack_call): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_reorg): Lower the split-stack pseudo-insns.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call_*): New insn.
	(split_stack_space_check): New expand.
	(split_stack_sibcall_*): New insn.
	(split_stack_cond_sibcall_*): New insn.
	(split_stack_marker): New insn.
---
Support for !TARGET_CPU_ZARCH has been removed; -fsplit-stack now reports a
sorry () for those CPUs.  I've also cleaned up the 31-bit versions of the
morestack.S routines to more closely mirror their 64-bit counterparts, since
I can now use the newer opcodes.

I'm also submitting a new version of the gold patch, which has support for
old CPUs likewise removed.
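
For readers who want the gist without going through the RTL, here is a rough
C model of the check done by the split-stack prologue and of the parameter
block that __morestack reads through %r1.  It is only a sketch: the struct and
function names are made up for illustration, unsigned integers stand in for
the registers, and only SPLIT_STACK_AVAILABLE and the 8-byte rounding of the
argument size are taken from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the patch: frames up to this size compare the stack pointer
   directly against the guard stored in the TCB.  */
#define SPLIT_STACK_AVAILABLE 1024

/* Hypothetical C view of the three pointer-sized entries emitted after
   the jump to __morestack (the layout follows the patch, the name is
   an assumption).  */
struct morestack_params
{
  uintptr_t frame_size;   /* Frame size requested by the function.  */
  uintptr_t args_size;    /* Size of stack arguments to copy.  */
  intptr_t body_offset;   /* Function body address minus block address.  */
};

/* Round the stack argument size up to 8 bytes, treating a negative
   size as "no stack arguments".  */
static uintptr_t
align_args_size (intptr_t args_size)
{
  return args_size >= 0 ? ((uintptr_t) args_size + 7) & ~(uintptr_t) 7 : 0;
}

/* Return true if the prologue would branch to __morestack.  For small
   frames the guard is used as is; for larger ones the frame size is
   added to it first.  */
static bool
needs_morestack (uintptr_t sp, uintptr_t guard, uintptr_t frame_size)
{
  if (frame_size > SPLIT_STACK_AVAILABLE)
    guard += frame_size;
  return sp < guard;
}
```

With a guard of 0x8000, for example, a 512-byte frame at sp = 0x10000 stays
on the current stack, while a 0x1000-byte frame at sp = 0x8100 would take the
__morestack path in this model.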

 gcc/ChangeLog                        |  33 ++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 371 ++++++++++++++++++++-
 gcc/config/s390/s390.md              | 109 +++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1148 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c881d52..71f6f38 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,38 @@
 2016-01-16  Marcin Kościelnicki  <koriakin@0x04.net>
 
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_expand_split_stack_call): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_reorg): Lower the split-stack pseudo-insns.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL): New unspec.
+	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
+	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_call_*): New insn.
+	(split_stack_cond_call_*): New insn.
+	(split_stack_space_check): New expand.
+	(split_stack_sibcall_*): New insn.
+	(split_stack_cond_sibcall_*): New insn.
+	(split_stack_marker): New insn.
+
+2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
 	with side effects.
 
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
     }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			   struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
 
@@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 #undef TARGET_OPTION_INIT_STRUCT
 #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 633bc1e..09032c9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
 extern void s390_emit_prologue (void);
 extern void s390_emit_epilogue (bool);
+extern void s390_expand_split_stack_prologue (void);
 extern bool s390_can_use_simple_return_insn (void);
 extern bool s390_can_use_return_insn (void);
 extern void s390_function_profiler (FILE *, int);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3be64de..6afce7c 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -426,6 +426,13 @@ struct GTY(()) machine_function
   /* True if the current function may contain a tbegin clobbering
      FPRs.  */
   bool tbegin_p;
+
+  /* For -fsplit-stack support: A stack local which holds a pointer to
+     the stack arguments for a function with a variable number of
+     arguments.  This is set at the start of the function and is used
+     to initialize the overflow_arg_area field of the va_list
+     structure.  */
+  rtx split_stack_varargs_pointer;
 };
 
 /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
@@ -9316,9 +9323,13 @@ s390_register_info ()
 	  cfun_frame_layout.high_fprs++;
       }
 
-  if (flag_pic)
-    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
-      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
+  /* Register 12 is used for GOT address, but also as temp in prologue
+     for split-stack stdarg functions (unless r14 is available).  */
+  clobbered_regs[12]
+    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
+	|| (flag_split_stack && cfun->stdarg
+	    && (crtl->is_leaf || TARGET_TPF_PROFILING
+		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
 
   clobbered_regs[BASE_REGNUM]
     |= (cfun->machine->base_reg
@@ -10446,6 +10457,8 @@ s390_emit_prologue (void)
       && !crtl->is_leaf
       && !TARGET_TPF_PROFILING)
     temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
+  else if (flag_split_stack && cfun->stdarg)
+    temp_reg = gen_rtx_REG (Pmode, 12);
   else
     temp_reg = gen_rtx_REG (Pmode, 1);
 
@@ -10939,6 +10952,284 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
     SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* When using -fsplit-stack, the allocation routines set a field in
+   the TCB to the bottom of the stack plus this much space, measured
+   in bytes.  */
+
+#define SPLIT_STACK_AVAILABLE 1024
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+
+void
+s390_expand_split_stack_prologue (void)
+{
+  rtx r1, guard, cc;
+  rtx_insn *insn;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  /* Pointer size in bytes.  */
+  /* Frame size and argument size - the two parameters to __morestack.  */
+  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
+  /* Align argument size to 8 bytes - simplifies __morestack code.  */
+  HOST_WIDE_INT args_size = crtl->args.size >= 0
+			    ? ((crtl->args.size + 7) & ~7)
+			    : 0;
+  /* Label to jump to when no __morestack call is necessary.  */
+  rtx_code_label *enough = NULL;
+  /* Label to be called by __morestack.  */
+  rtx_code_label *call_done = NULL;
+  /* 1 if __morestack called conditionally, 0 if always.  */
+  int conditional = 0;
+
+  gcc_assert (flag_split_stack && reload_completed);
+  if (!TARGET_CPU_ZARCH)
+    {
+      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
+      return;
+    }
+
+  r1 = gen_rtx_REG (Pmode, 1);
+
+  /* If no stack frame will be allocated, don't do anything.  */
+  if (!frame_size)
+    {
+      /* But emit a marker that will let linker and indirect function
+	 calls recognise this function as split-stack aware.  */
+      emit_insn(gen_split_stack_marker());
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+        {
+          /* If va_start is used, just use r15.  */
+          emit_move_insn (r1,
+		          gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+			                GEN_INT (STACK_POINTER_OFFSET)));
+        }
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu))
+    {
+      /* If frame_size will fit in an add instruction, do a stack space
+	 check, and only call __morestack if there's not enough space.  */
+      conditional = 1;
+
+      /* Get thread pointer.  r1 is the only register we can always destroy - r0
+         could contain a static chain (and cannot be used to address memory
+         anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
+      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+      /* Aim at __private_ss.  */
+      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
+
+      /* If less than 1kiB is used, skip addition and compare directly with
+         __private_ss.  */
+      if (frame_size > SPLIT_STACK_AVAILABLE)
+        {
+          emit_move_insn (r1, guard);
+	  if (TARGET_64BIT)
+	    emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size)));
+	  else
+	    emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size)));
+	  guard = r1;
+        }
+
+      if (TARGET_CPU_ZARCH)
+        {
+	  rtx tmp;
+
+          /* Compare the (maybe adjusted) guard with the stack pointer.  */
+          cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
+
+          call_done = gen_label_rtx ();
+
+	  if (TARGET_64BIT)
+	    tmp = gen_split_stack_cond_call_di (call_done,
+						morestack_ref,
+						GEN_INT (frame_size),
+						GEN_INT (args_size),
+						cc);
+	  else
+	    tmp = gen_split_stack_cond_call_si (call_done,
+						morestack_ref,
+						GEN_INT (frame_size),
+						GEN_INT (args_size),
+						cc);
+
+
+          insn = emit_jump_insn (tmp);
+	  JUMP_LABEL (insn) = call_done;
+
+          /* Mark the jump as very unlikely to be taken.  */
+          add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
+	}
+      else
+        {
+          /* Compare the (maybe adjusted) guard with the stack pointer.  */
+          cc = s390_emit_compare (GE, stack_pointer_rtx, guard);
+
+          enough = gen_label_rtx ();
+          insn = s390_emit_jump (enough, cc);
+          JUMP_LABEL (insn) = enough;
+
+          /* Mark the jump as very likely to be taken.  */
+          add_int_reg_note (insn, REG_BR_PROB,
+			    REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);
+	}
+    }
+
+  if (call_done == NULL)
+    {
+      rtx tmp;
+      call_done = gen_label_rtx ();
+
+      /* Now, we need to call __morestack.  It has very special calling
+         conventions: it preserves param/return/static chain registers for
+         calling main function body, and looks for its own parameters
+         at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode). */
+      if (TARGET_64BIT)
+        tmp = gen_split_stack_call_di (call_done,
+					     morestack_ref,
+					     GEN_INT (frame_size),
+					     GEN_INT (args_size));
+      else
+        tmp = gen_split_stack_call_si (call_done,
+					     morestack_ref,
+					     GEN_INT (frame_size),
+					     GEN_INT (args_size));
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      emit_barrier ();
+    }
+
+  /* __morestack will call us here.  */
+
+  if (enough != NULL)
+    {
+      emit_label (enough);
+      LABEL_NUSES (enough) = 1;
+    }
+
+  if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      /* If va_start is used, and __morestack was not called, just use r15.  */
+      emit_move_insn (r1,
+		      gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+			            GEN_INT (STACK_POINTER_OFFSET)));
+    }
+
+  emit_label (call_done);
+  LABEL_NUSES (call_done) = 1;
+}
+
+/* Generates split-stack call sequence, along with its parameter block.  */
+
+static void
+s390_expand_split_stack_call (rtx_insn *orig_insn,
+			      rtx call_done,
+			      rtx function,
+			      rtx frame_size,
+			      rtx args_size,
+			      rtx cond)
+{
+  int psize = GET_MODE_SIZE (Pmode);
+  rtx_insn *insn = orig_insn;
+  rtx parmbase = gen_label_rtx();
+  rtx r1 = gen_rtx_REG (Pmode, 1);
+  rtx tmp, tmp2;
+
+  /* %r1 = litbase.  */
+  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* jg<cond> __morestack.  */
+  if (cond == NULL)
+    {
+      if (TARGET_64BIT)
+        tmp = gen_split_stack_sibcall_di (function, call_done);
+      else
+        tmp = gen_split_stack_sibcall_si (function, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  else
+    {
+      if (!s390_comparison (cond, VOIDmode))
+	internal_error ("bad split_stack_call cond");
+      if (TARGET_64BIT)
+        tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done);
+      else
+        tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  JUMP_LABEL (insn) = call_done;
+  LABEL_NUSES (call_done)++;
+
+  /* Go to .rodata.  */
+  insn = emit_insn_after (gen_pool_section_start (), insn);
+
+  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
+     (this mirrors the alignment done in __morestack - don't touch it).  */
+  insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn);
+
+  insn = emit_label_after (parmbase, insn);
+
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, frame_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Second parameter is size of the arguments passed on stack that
+     __morestack has to copy to the new stack (does not include varargs).  */
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, args_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Third parameter is offset between start of the parameter block
+     and function body to be called by __morestack.  */
+  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
+  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
+  tmp = gen_rtx_CONST (Pmode,
+                       gen_rtx_MINUS (Pmode, tmp2, tmp));
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, tmp),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* Return from .rodata.  */
+  insn = emit_insn_after (gen_pool_section_end (), insn);
+
+  delete_insn (orig_insn);
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+s390_live_on_entry (bitmap regs)
+{
+  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      gcc_assert (flag_split_stack);
+      bitmap_set_bit (regs, 1);
+    }
+}
+
 /* Return true if the function can use simple_return to return outside
    of a shrink-wrapped region.  At present shrink-wrapping is supported
    in all cases.  */
@@ -11541,6 +11832,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
     }
 
+  if (flag_split_stack
+     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
+         == NULL)
+     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+    {
+      rtx reg;
+      rtx_insn *seq;
+
+      reg = gen_reg_rtx (Pmode);
+      cfun->machine->split_stack_varargs_pointer = reg;
+
+      start_sequence ();
+      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
+      seq = get_insns ();
+      end_sequence ();
+
+      push_topmost_sequence ();
+      emit_insn_after (seq, entry_of_function ());
+      pop_topmost_sequence ();
+    }
+
   /* Find the overflow area.
      FIXME: This currently is too pessimistic when the vector ABI is
      enabled.  In that case we *always* set up the overflow area
@@ -11549,7 +11861,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
       || TARGET_VX_ABI)
     {
-      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+        t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer);
+      else
+        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
 
       off = INTVAL (crtl->args.arg_offset_rtx);
       off = off < 0 ? 0 : off;
@@ -13158,6 +13473,48 @@ s390_reorg (void)
 	}
     }
 
+  if (flag_split_stack)
+    {
+      rtx_insn *insn;
+
+      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	{
+	  /* Look for the split-stack fake jump instructions.  */
+	  if (!JUMP_P (insn))
+	    continue;
+	  if (GET_CODE (PATTERN (insn)) != PARALLEL
+	      || XVECLEN (PATTERN (insn), 0) != 2)
+	    continue;
+	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
+	  if (GET_CODE (set) != SET)
+	    continue;
+	  rtx unspec = XEXP (set, 1);
+	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
+	    continue;
+	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL)
+	    continue;
+	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
+	  rtx function = XVECEXP (unspec, 0, 0);
+	  rtx frame_size = XVECEXP (unspec, 0, 1);
+	  rtx args_size = XVECEXP (unspec, 0, 2);
+	  rtx pc_src = XEXP (set_pc, 1);
+	  rtx call_done, cond = NULL_RTX;
+	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
+	    {
+	      cond = XEXP (pc_src, 0);
+	      call_done = XEXP (XEXP (pc_src, 1), 0);
+	    }
+	  else
+	    call_done = XEXP (pc_src, 0);
+	  s390_expand_split_stack_call (insn,
+					call_done,
+					function,
+					frame_size,
+					args_size,
+					cond);
+	}
+    }
+
   /* Try to optimize prologue and epilogue further.  */
   s390_optimize_prologue ();
 
@@ -14469,6 +14826,9 @@ s390_asm_file_end (void)
 	     s390_vector_abi);
 #endif
   file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 
 /* Return true if TYPE is a vector bool type.  */
@@ -14724,6 +15084,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
+
 #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   s390_use_by_pieces_infrastructure_p
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 9b869d5..21cd989 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -114,6 +114,9 @@
    UNSPEC_SP_SET
    UNSPEC_SP_TEST
 
+   ; Split stack support
+   UNSPEC_STACK_CHECK
+
    ; Test Data Class (TDC)
    UNSPEC_TDC_INSN
 
@@ -276,6 +279,11 @@
    ; Set and get floating point control register
    UNSPECV_SFPC
    UNSPECV_EFPC
+
+   ; Split stack support
+   UNSPECV_SPLIT_STACK_CALL
+   UNSPECV_SPLIT_STACK_SIBCALL
+   UNSPECV_SPLIT_STACK_MARKER
   ])
 
 ;;
@@ -10907,3 +10915,104 @@
   "TARGET_Z13"
   "lcbb\t%0,%1,%b2"
   [(set_attr "op_type" "VRX")])
+
+; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  s390_expand_split_stack_prologue ();
+  DONE;
+})
+
+(define_insn "split_stack_call_<mode>"
+  [(set (pc) (label_ref (match_operand 0 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+                                    (match_operand 2 "consttable_operand" "X")
+                                    (match_operand 3 "consttable_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+(define_insn "split_stack_cond_call_<mode>"
+  [(set (pc)
+        (if_then_else
+          (match_operand 4 "" "")
+          (label_ref (match_operand 0 "" ""))
+          (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+                                    (match_operand 2 "consttable_operand" "X")
+                                    (match_operand 3 "consttable_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+
+(define_expand "split_stack_space_check"
+  [(set (pc) (if_then_else
+	      (ltu (minus (reg 15)
+			  (match_operand 0 "register_operand"))
+		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  rtx tp = s390_get_thread_pointer ();
+  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
+  rtx reg = gen_reg_rtx (Pmode);
+  rtx cc;
+  if (TARGET_64BIT)
+    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
+  else
+    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
+  cc = s390_emit_compare (GT, reg, guard);
+  s390_emit_jump (operands[1], cc);
+
+  DONE;
+})
+
+;; A jg with minimal fuss for use in split stack prologue.
+
+(define_insn "split_stack_sibcall_<mode>"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; Also a conditional one.
+
+(define_insn "split_stack_cond_sibcall_<mode>"
+  [(set (pc)
+        (if_then_else
+          (match_operand 1 "" "")
+          (label_ref (match_operand 2 "" ""))
+          (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+                                   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg%C1\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; An unusual nop instruction used to mark functions with no stack frames
+;; as split-stack aware.
+
+(define_insn "split_stack_marker"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
+  ""
+  "nopr\t%%r15"
+  [(set_attr "op_type" "RR")])
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 4cd8f01..604b120 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-01-16  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
+	* config/s390/morestack.S: New file.
+	* config/s390/t-stack-s390: New file.
+	* generic-morestack.c (__splitstack_find): Add s390-specific code.
+
 2016-01-15  Nick Clifton  <nickc@redhat.com>
 
 	* config/msp430/t-msp430 (lib2_mul_none.o): Only use the first
diff --git a/libgcc/config.host b/libgcc/config.host
index f58ee45..9793155 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1105,11 +1105,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..c99f6e4
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,609 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 0x1000
+
+# The __morestack function.
+
+	.global	__morestack
+	.hidden	__morestack
+
+	.type	__morestack,@function
+
+__morestack:
+.LFB1:
+	.cfi_startproc
+
+
+#ifndef __s390x__
+
+
+# The 31-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0,__gcc_personality_v0
+	.cfi_lsda 0,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x48
+	.cfi_offset %r7, -0x44
+	.cfi_offset %r8, -0x40
+	.cfi_offset %r9, -0x3c
+	.cfi_offset %r10, -0x38
+	.cfi_offset %r11, -0x34
+	.cfi_offset %r12, -0x30
+	.cfi_offset %r13, -0x2c
+	.cfi_offset %r14, -0x28
+	.cfi_offset %r15, -0x24
+	lr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	ahi	%r15, -0x60		# 0x60 for standard frame.
+	st	%r11, 0(%r15)		# Save back chain.
+	lr	%r8, %r0		# Save %r0 (static chain).
+	lr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	l	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0		# Extract thread pointer.
+	l	%r1, 0x20(%r1)		# Get stack boundary
+	ar	%r1, %r7		# Stack boundary + frame size
+	a	%r1, 4(%r10)		# + stack param size
+	clr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	ahi	%r7, BACKOFF		# Bump requested size a bit.
+	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x40(%r11)		# Pass its address as parameter.
+	la	%r3, 0x60(%r11)		# Caller's stack parameters.
+	l	%r4, 4(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lr	%r15, %r2		# Switch to the new stack.
+	ahi	%r15, -0x60		# Make a stack frame on it.
+	st	%r11, 0(%r15)		# Save back chain.
+
+	s	%r2, 0x40(%r11)		# The end of stack space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHB0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lr	%r0, %r8		# Static chain.
+	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
+
+	# The third word of the parameter block holds the address of the
+	# function body minus the address of the parameter block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0x60 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x40(%r11)
+	brasl	%r14, __generic_releasestack
+
+	s	%r2, 0x40(%r11)		# Subtract available space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHE0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0x60 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lr	%r15, %r11
+	ahi	%r15, -0x60
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	l	%r9, 0x4(%r10)		# Load stack parameter size.
+	ltr	%r9, %r9		# And check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0x60(%r15)		# Destination.
+	la	%r12, 0x60(%r11)	# Source.
+	lr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# The third word of the parameter block holds the address of the
+	# function body minus the address of the parameter block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lr	%r3, %r11		# Get the stack pointer.
+	sr	%r3, %r2		# Subtract available space.
+	ahi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+	st	%r3, 0x20(%r1)	# Save the new stack boundary.
+
+	lr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#else /* defined(__s390x__) */
+
+
+# The 64-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x70
+	.cfi_offset %r7, -0x68
+	.cfi_offset %r8, -0x60
+	.cfi_offset %r9, -0x58
+	.cfi_offset %r10, -0x50
+	.cfi_offset %r11, -0x48
+	.cfi_offset %r12, -0x40
+	.cfi_offset %r13, -0x38
+	.cfi_offset %r14, -0x30
+	.cfi_offset %r15, -0x28
+	lgr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	aghi	%r15, -0xa0		# 0xa0 for standard frame.
+	stg	%r11, 0(%r15)		# Save back chain.
+	lgr	%r8, %r0		# Save %r0 (static chain).
+	lgr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	lg	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	lg	%r1, 0x38(%r1)		# Get stack boundary
+	agr	%r1, %r7		# Stack boundary + frame size
+	ag	%r1, 8(%r10)		# + stack param size
+	clgr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	aghi	%r7, BACKOFF		# Bump requested size a bit.
+	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x80(%r11)		# Pass its address as parameter.
+	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
+	lg	%r4, 8(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lgr	%r15, %r2		# Switch to the new stack.
+	aghi	%r15, -0xa0		# Make a stack frame on it.
+	stg	%r11, 0(%r15)		# Save back chain.
+
+	sg	%r2, 0x80(%r11)		# The end of stack space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHB0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lgr	%r0, %r8		# Static chain.
+	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
+
+	# The third word of the parameter block holds the address of the
+	# function body minus the address of the parameter block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stg	%r2, 0x10(%r11)		# Save return register.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x80(%r11)
+	brasl	%r14, __generic_releasestack
+
+	sg	%r2, 0x80(%r11)		# Subtract available space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHE0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lgr	%r15, %r11
+	aghi	%r15, -0xa0
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	lg	%r9, 0x8(%r10)		# Load stack parameter size.
+	ltgr	%r9, %r9		# Check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sgr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0xa0(%r15)		# Destination.
+	la	%r12, 0xa0(%r11)	# Source.
+	lgr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# The third word of the parameter block holds the address of the
+	# function body minus the address of the parameter block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lgr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lgr	%r3, %r11		# Get the stack pointer.
+	sgr	%r3, %r2		# Subtract available space.
+	aghi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
+
+	lgr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.cfi_endproc
+	.size	__morestack, . - __morestack
+
+
+# The exception table.  This tells the personality routine to execute
+# the exception handler.
+
+	.section	.gcc_except_table,"a",@progbits
+	.align	4
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 .L1-.LFB1	# landing pad
+	.uleb128 0		# action
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the basic
+	# personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type	DW.ref.__gcc_personality_v0, @object
+DW.ref.__gcc_personality_v0:
+#ifndef __LP64__
+	.align 4
+	.size	DW.ref.__gcc_personality_v0, 4
+	.long	__gcc_personality_v0
+#else
+	.align 8
+	.size	DW.ref.__gcc_personality_v0, 8
+	.quad	__gcc_personality_v0
+#endif
+#endif
+
+
+
+# Initialize the stack test value when the program starts or when a
+# new thread starts.  We don't know how large the main stack is, so we
+# guess conservatively.  We might be able to use getrlimit here.
+
+	.text
+	.global	__stack_split_initialize
+	.hidden	__stack_split_initialize
+
+	.type	__stack_split_initialize, @function
+
+__stack_split_initialize:
+
+#ifndef __s390x__
+
+	ear	%r1, %a0
+	lr	%r0, %r15
+	ahi	%r0, -0x4000	# We should have at least 16K.
+	st	%r0, 0x20(%r1)
+
+	lr	%r2, %r15
+	lhi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#else /* defined(__s390x__) */
+
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lgr	%r0, %r15
+	aghi	%r0, -0x4000	# We should have at least 16K.
+	stg	%r0, 0x38(%r1)
+
+	lgr	%r2, %r15
+	lghi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.size	__stack_split_initialize, . - __stack_split_initialize
+
+# Routines to get and set the guard, for __splitstack_getcontext,
+# __splitstack_setcontext, and __splitstack_makecontext.
+
+# void *__morestack_get_guard (void) returns the current stack guard.
+	.text
+	.global	__morestack_get_guard
+	.hidden	__morestack_get_guard
+
+	.type	__morestack_get_guard,@function
+
+__morestack_get_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	l	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lg	%r2, 0x38(%r1)
+#endif
+	br %r14
+
+	.size	__morestack_get_guard, . - __morestack_get_guard
+
+# void __morestack_set_guard (void *) sets the stack guard.
+	.global	__morestack_set_guard
+	.hidden	__morestack_set_guard
+
+	.type	__morestack_set_guard,@function
+
+__morestack_set_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	st	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	stg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_set_guard, . - __morestack_set_guard
+
+# void *__morestack_make_guard (void *, size_t) returns the stack
+# guard value for a stack.
+	.global	__morestack_make_guard
+	.hidden	__morestack_make_guard
+
+	.type	__morestack_make_guard,@function
+
+__morestack_make_guard:
+
+#ifndef __s390x__
+	sr	%r2, %r3
+	ahi	%r2, BACKOFF
+#else
+	sgr	%r2, %r3
+	aghi	%r2, BACKOFF
+#endif
+	br	%r14
+
+	.size	__morestack_make_guard, . - __morestack_make_guard
+
+# Make __stack_split_initialize a high priority constructor.
+
+	.section .ctors.65535,"aw",@progbits
+
+#ifndef __LP64__
+	.align	4
+	.long	__stack_split_initialize
+	.long	__morestack_load_mmap
+#else
+	.align	8
+	.quad	__stack_split_initialize
+	.quad	__morestack_load_mmap
+#endif
+
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
new file mode 100644
index 0000000..4c959b0
--- /dev/null
+++ b/libgcc/config/s390/t-stack-s390
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for s390.
+LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index 89765d4..b8eec4e 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
 #elif defined __powerpc64__
+#elif defined __s390x__
+      nsp -= 2 * 160;
+#elif defined __s390__
+      nsp -= 2 * 96;
 #else
 #error "unrecognized target"
 #endif
-- 
2.7.0
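
The guard arithmetic used by morestack.S above can be restated in C.  This is
only an illustrative sketch and not part of the patch: the function names
make_guard and fits_current_segment are hypothetical, and their parameters
stand in for the register values the assembly works with.  BACKOFF has the
same value as in morestack.S.

```c
#include <stddef.h>
#include <stdint.h>

#define BACKOFF 0x1000u  /* Same value as in morestack.S.  */

/* Hypothetical C restatement of __morestack_make_guard: the guard is the
   first argument minus the segment size, plus BACKOFF bytes of slack.  */
static uintptr_t
make_guard (uintptr_t stack, size_t size)
{
  return stack - size + BACKOFF;
}

/* Hypothetical C restatement of the test at the top of __morestack
   (the unsigned compare followed by "jle .Lnoalloc"): the current
   segment is enough when guard + frame size + stack parameter size
   does not exceed the stack pointer.  */
static int
fits_current_segment (uintptr_t guard, uintptr_t sp,
		      size_t frame_size, size_t param_size)
{
  return guard + frame_size + param_size <= sp;
}
```

With a guard of 0xfd000 and a stack pointer of 0xfe000, a 0x800-byte frame
fits in the current segment, while a 0x2000-byte frame would take the
allocation path.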

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-02 19:17 ` [PATCH 1/5] s390: Use proper read-only data section for literals Marcin Kościelnicki
@ 2016-01-20 13:11   ` Andreas Krebbel
  2016-01-21  6:56     ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-20 13:11 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> Previously, .rodata was hardcoded.  For C++ vague linkage functions,
> this resulted in needlessly duplicated literals.  With the new split
> stack support, this resulted in link errors, due to .rodata containing
> relocations to the discarded text sections.
> 
> gcc/ChangeLog:
> 
> 	* config/s390/s390.md (pool_section_start): Use switch_to_section
> 	to select proper read-only data section instead of hardcoding .rodata.
> 	(pool_section_end): Use switch_to_section to match the above.
> ---
>  gcc/ChangeLog           |  6 ++++++
>  gcc/config/s390/s390.md | 11 +++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 23ce209..2c572a7 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
> +
> +	* config/s390/s390.md (pool_section_start): Use switch_to_section
> +	to select proper read-only data section instead of hardcoding .rodata.
> +	(pool_section_end): Use switch_to_section to match the above.
> +

This is ok if bootstrap and regression tests are clean. Thanks!

-Andreas-


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/5] s390: Fix missing .size directives.
  2016-01-02 19:16 ` [PATCH 2/5] s390: Fix missing .size directives Marcin Kościelnicki
@ 2016-01-20 13:16   ` Andreas Krebbel
  2016-01-20 14:01     ` Dominik Vogt
  2016-01-21  9:59     ` Andreas Krebbel
  0 siblings, 2 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-20 13:16 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> It seems at some point the .size hook was hijacked to emit some
> machine-specific directives, and the actual .size directive was
> forgotten.  This caused problems for split-stack support, since
> linker couldn't scan the function body for non-split-stack calls.
> 
> gcc/ChangeLog:
> 
> 	* config/s390/s390.c (s390_asm_declare_function_size): Add code
> 	to actually emit the .size directive.

...

>  s390_asm_declare_function_size (FILE *asm_out_file,
> -				const char *fnname ATTRIBUTE_UNUSED, tree decl)
> +				const char *fnname, tree decl)
>  {
> +  if (!flag_inhibit_size_directive)
> +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
>    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
>      return;
>    fprintf (asm_out_file, "\t.machine pop\n");

It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here.  This
would probably require renaming it in s390.h first and then using it from
s390_asm_declare_function_size. Not really beautiful, but at least changes to the original macro
would not require adjusting our backend.
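
A rough, self-contained sketch of that delegation pattern, for illustration
only.  Everything here is a hypothetical stand-in rather than real GCC code:
GENERIC_DECLARE_FUNCTION_SIZE plays the role of the renamed generic macro,
backend_declare_function_size plays the role of the backend hook, and the
output goes to a character buffer instead of the assembler file.

```c
#include <stdio.h>

/* Hypothetical stand-in for the generic macro, kept under a new name so
   that the backend hook can still call it.  */
#define GENERIC_DECLARE_FUNCTION_SIZE(BUF, LEN, FNAME) \
  snprintf ((BUF), (LEN), "\t.size\t%s, .-%s\n", (FNAME), (FNAME))

/* Hypothetical backend hook: emit the generic directive first, then the
   target-specific tail.  Returns the number of characters written.  */
static int
backend_declare_function_size (char *buf, size_t len, const char *fnname,
			       int has_target_attribute)
{
  int n = GENERIC_DECLARE_FUNCTION_SIZE (buf, len, fnname);
  if (n < 0 || (size_t) n >= len || !has_target_attribute)
    return n;
  return n + snprintf (buf + n, len - n, "\t.machine pop\n");
}
```

The point of the pattern is that a later change to the generic part only has
to be made in one place.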

-Andreas-

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/5] s390: Fix missing .size directives.
  2016-01-20 13:16   ` Andreas Krebbel
@ 2016-01-20 14:01     ` Dominik Vogt
  2016-01-21  9:59     ` Andreas Krebbel
  1 sibling, 0 replies; 55+ messages in thread
From: Dominik Vogt @ 2016-01-20 14:01 UTC (permalink / raw)
  To: gcc-patches

On Wed, Jan 20, 2016 at 02:16:23PM +0100, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> >  s390_asm_declare_function_size (FILE *asm_out_file,
> > -				const char *fnname ATTRIBUTE_UNUSED, tree decl)
> > +				const char *fnname, tree decl)
> >  {
> > +  if (!flag_inhibit_size_directive)
> > +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
> >    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
> >      return;
> >    fprintf (asm_out_file, "\t.machine pop\n");
> 
> It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here.  This
> would probably require renaming it in s390.h first and then using it from
> s390_asm_declare_function_size. Not really beautiful, but at least changes to the original macro
> would not require adjusting our backend.

Maybe it's better not to invent yet another solution to deal with
this and just do it like proposed in the patch.  So if the default
implementation is ever changed, the same search pattern will find
all identical copies of the code.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-20 13:11   ` Andreas Krebbel
@ 2016-01-21  6:56     ` Marcin Kościelnicki
  2016-01-21  8:17       ` Mike Stump
  2016-01-21  9:46       ` Andreas Krebbel
  0 siblings, 2 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-21  6:56 UTC (permalink / raw)
  To: Andreas Krebbel, gcc-patches

On 20/01/16 14:11, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> Previously, .rodata was hardcoded.  For C++ vague linkage functions,
>> this resulted in needlessly duplicated literals.  With the new split
>> stack support, this resulted in link errors, due to .rodata containing
>> relocations to the discarded text sections.
>>
>> gcc/ChangeLog:
>>
>> 	* config/s390/s390.md (pool_section_start): Use switch_to_section
>> 	to select proper read-only data section instead of hardcoding .rodata.
>> 	(pool_section_end): Use switch_to_section to match the above.
>> ---
>>   gcc/ChangeLog           |  6 ++++++
>>   gcc/config/s390/s390.md | 11 +++++++++--
>>   2 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 23ce209..2c572a7 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
>> +
>> +	* config/s390/s390.md (pool_section_start): Use switch_to_section
>> +	to select proper read-only data section instead of hardcoding .rodata.
>> +	(pool_section_end): Use switch_to_section to match the above.
>> +
>
> This is ok if bootstrap and regression tests are clean. Thanks!
>
> -Andreas-
>
>

The bootstrap and regression tests are indeed clean for this patch and 
#2.  I don't have commit access to gcc repo, how do I get this pushed?

Marcin Kościelnicki

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-21  6:56     ` Marcin Kościelnicki
@ 2016-01-21  8:17       ` Mike Stump
  2016-01-21  9:46       ` Andreas Krebbel
  1 sibling, 0 replies; 55+ messages in thread
From: Mike Stump @ 2016-01-21  8:17 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: Andreas Krebbel, gcc-patches

On Jan 20, 2016, at 10:56 PM, Marcin Kościelnicki <koriakin@0x04.net> wrote:
>> This is ok if bootstrap and regression tests are clean. Thanks!

> The bootstrap and regression tests are indeed clean for this patch and #2.  I don't have commit access to gcc repo, how do I get this pushed?

Just ask someone to apply it for you.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] s390: Use proper read-only data section for literals.
  2016-01-21  6:56     ` Marcin Kościelnicki
  2016-01-21  8:17       ` Mike Stump
@ 2016-01-21  9:46       ` Andreas Krebbel
  1 sibling, 0 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-21  9:46 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/21/2016 07:56 AM, Marcin Kościelnicki wrote:
>>> +2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
>>> +
>>> +	* config/s390/s390.md (pool_section_start): Use switch_to_section
>>> +	to select proper read-only data section instead of hardcoding .rodata.
>>> +	(pool_section_end): Use switch_to_section to match the above.
>>> +
>>
>> This is ok if bootstrap and regression tests are clean. Thanks!
>>
>> -Andreas-
>>
>>
> 
> The bootstrap and regression tests are indeed clean for this patch and 
> #2.  I don't have commit access to gcc repo, how do I get this pushed?

Committed to mainline. Thanks!

-Andreas-


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/5] s390: Fix missing .size directives.
  2016-01-20 13:16   ` Andreas Krebbel
  2016-01-20 14:01     ` Dominik Vogt
@ 2016-01-21  9:59     ` Andreas Krebbel
  2016-01-21 10:10       ` Marcin Kościelnicki
  1 sibling, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-21  9:59 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/20/2016 02:16 PM, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> It seems at some point the .size hook was hijacked to emit some
>> machine-specific directives, and the actual .size directive was
>> forgotten.  This caused problems for split-stack support, since
>> linker couldn't scan the function body for non-split-stack calls.
>>
>> gcc/ChangeLog:
>>
>> 	* config/s390/s390.c (s390_asm_declare_function_size): Add code
>> 	to actually emit the .size directive.
> 
> ...
> 
>>  s390_asm_declare_function_size (FILE *asm_out_file,
>> -				const char *fnname ATTRIBUTE_UNUSED, tree decl)
>> +				const char *fnname, tree decl)
>>  {
>> +  if (!flag_inhibit_size_directive)
>> +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
>>    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
>>      return;
>>    fprintf (asm_out_file, "\t.machine pop\n");
> 
> It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here.  This
> would probably require changing its name in s390.h first and then using it from
> s390_asm_declare_function_size. Not really beautiful but at least changes to the original macro
> would not require adjusting our backend.

I've looked into how the other archs are doing this and didn't find anything better than just
including the code from the original macro. The real fix probably would be to turn this into a
target hook instead.
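
For reference, here is a minimal standalone sketch of what the hook now has to do. It is a toy model and not the committed code: toy_declare_function_size, inhibit_size and has_target_attr are hypothetical names standing in for the hook, flag_inhibit_size_directive and a non-NULL DECL_FUNCTION_SPECIFIC_TARGET, and the text goes to a buffer instead of asm_out_file. The .size line uses the usual ELF "name, .-name" form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for s390_asm_declare_function_size.  INHIBIT_SIZE
   models flag_inhibit_size_directive and HAS_TARGET_ATTR models a
   non-NULL DECL_FUNCTION_SPECIFIC_TARGET; both names are
   hypothetical.  Returns the number of characters the full output
   needs, as reported by snprintf.  */
int
toy_declare_function_size (char *buf, size_t len, const char *fnname,
			   bool inhibit_size, bool has_target_attr)
{
  int n = 0;

  assert (buf != NULL && fnname != NULL && len > 0);
  buf[0] = '\0';

  /* The part that had gone missing: ".size name, .-name".  */
  if (!inhibit_size)
    n += snprintf (buf + n, len - n, "\t.size\t%s, .-%s\n", fnname, fnname);

  /* The machine-specific footer the hook was originally added for.
     Skip it if the buffer is already full.  */
  if (has_target_attr && (size_t) n < len)
    n += snprintf (buf + n, len - n, "\t.machine pop\n");

  return n;
}
```

Calling it with fnname "foo" and both flags false yields only the .size line; setting has_target_attr appends the .machine pop footer after it.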

I've committed the patch now since it fixes a real problem not only with split-stack.

Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-02 19:16 ` [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU Marcin Kościelnicki
@ 2016-01-21 10:05   ` Andreas Krebbel
  2016-01-21 10:10     ` Marcin Kościelnicki
  2016-01-21 23:10     ` Jeff Law
  2016-04-17 21:24   ` Jeff Law
  1 sibling, 2 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-21 10:05 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> When an unconditional jump with side effects targets an immediately
> following label, rtl_tidy_fallthru_edge is called.  Since it has side
> effects, it doesn't remove the jump, but the label is still marked
> as fallthru.  This later causes a verification error.  Do nothing in this
> case instead.
> 
> gcc/ChangeLog:
> 
> 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
> 	with side effects.

The change looks ok to me (although I'm not able to approve it). Could you please run regression
tests on x86_64 with that change?

Perhaps a short comment in the code would be good.

-Andreas-

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-21 10:05   ` Andreas Krebbel
@ 2016-01-21 10:10     ` Marcin Kościelnicki
  2016-01-21 23:10     ` Jeff Law
  1 sibling, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-21 10:10 UTC (permalink / raw)
  To: Andreas Krebbel, gcc-patches

On 21/01/16 11:05, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> When an unconditional jump with side effects targets an immediately
>> following label, rtl_tidy_fallthru_edge is called.  Since it has side
>> effects, it doesn't remove the jump, but the label is still marked
>> as fallthru.  This later causes a verification error.  Do nothing in this
>> case instead.
>>
>> gcc/ChangeLog:
>>
>> 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>> 	with side effects.
>
> The change looks ok to me (although I'm not able to approve it). Could you please run regression
> tests on x86_64 with that change?
>
> Perhaps a short comment in the code would be good.
>
> -Andreas-
>

OK, I'll run the testsuite and add a comment.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/5] s390: Fix missing .size directives.
  2016-01-21  9:59     ` Andreas Krebbel
@ 2016-01-21 10:10       ` Marcin Kościelnicki
  0 siblings, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-21 10:10 UTC (permalink / raw)
  To: Andreas Krebbel, gcc-patches

On 21/01/16 10:58, Andreas Krebbel wrote:
> On 01/20/2016 02:16 PM, Andreas Krebbel wrote:
>> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>>> It seems at some point the .size hook was hijacked to emit some
>>> machine-specific directives, and the actual .size directive was
>>> forgotten.  This caused problems for split-stack support, since
>>> linker couldn't scan the function body for non-split-stack calls.
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/s390/s390.c (s390_asm_declare_function_size): Add code
>>> 	to actually emit the .size directive.
>>
>> ...
>>
>>>   s390_asm_declare_function_size (FILE *asm_out_file,
>>> -				const char *fnname ATTRIBUTE_UNUSED, tree decl)
>>> +				const char *fnname, tree decl)
>>>   {
>>> +  if (!flag_inhibit_size_directive)
>>> +    ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
>>>     if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
>>>       return;
>>>     fprintf (asm_out_file, "\t.machine pop\n");
>>
>> It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro from config/elfos.h here.  This
>> would probably require changing its name in s390.h first and then using it from
>> s390_asm_declare_function_size. Not really beautiful but at least changes to the original macro
>> would not require adjusting our backend.
>
> I've looked into how the other archs are doing this and didn't find anything better than just
> including the code from the original macro. The real fix probably would be to turn this into a
> target hook instead.
>
> I've committed the patch now since it fixes a real problem not only with split-stack.
>
> Thanks!
>
> -Andreas-
>

I did a version that reincludes elfos.h, but it didn't finish testing 
(it made it through bootstrap) before you applied the patch:

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 21a5687..c56b909 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6832,10 +6832,17 @@ s390_asm_output_function_prefix (FILE *asm_out_file,

  /* Write an extra function footer after the very end of the function.  */

+/* Get elfos.h's original ASM_DECLARE_FUNCTION_SIZE, so that we can delegate
+   to it below.  */
+
+#undef ASM_DECLARE_FUNCTION_SIZE
+#include "../elfos.h"
+
  void
  s390_asm_declare_function_size (FILE *asm_out_file,
-                               const char *fnname ATTRIBUTE_UNUSED, tree decl)
+                               const char *fnname, tree decl)
  {
+  ASM_DECLARE_FUNCTION_SIZE(asm_out_file, fnname, decl);
    if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
      return;
    fprintf (asm_out_file, "\t.machine pop\n");


But, this is much uglier, and the macro is very unlikely to change in 
the first place.  I guess we should stay with the applied patch.

Thanks,

Marcin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/5] s390: Add -fsplit-stack support
  2016-01-15 21:08     ` Marcin Kościelnicki
@ 2016-01-21 10:12       ` Andreas Krebbel
  2016-01-21 13:04         ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-21 10:12 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/15/2016 10:08 PM, Marcin Kościelnicki wrote:
> On 15/01/16 19:38, Andreas Krebbel wrote:
>> Marcin,
>>
>> your implementation looks very good to me. Thanks!
>>
>> But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from
>> the back-end with the next GCC version.  So I would prefer if you could remove all the
>> !TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with
>> -march g5/g6.  It currently makes the implementation more complicated and would have to be removed
>> anyway in the future.
>>
>> Thanks!
>>
>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html
>>
>>
>> Bye,
>>
>> -Andreas-
>>
>>
> 
> Very well, I'll do that.
> 
> Btw, as for dropping support for g5/g6: I've noticed 
> s390_function_profiler could also use larl+brasl for -m31 given 
> TARGET_CPU_ZARCH.  Should I submit a patch for that?  I'm asking because 
> gold with -fsplit-stack needs to know the exact sequence used, so if 
> it's going to change after g5/g6 removal, I'd better add it to gold now 
> (and make gcc always emit it for non-g5/g6, so that gold won't need to 
> look at the old one).

Yes please, that would be great. Good catch!

Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/5] s390: Add -fsplit-stack support
  2016-01-21 10:12       ` Andreas Krebbel
@ 2016-01-21 13:04         ` Marcin Kościelnicki
  0 siblings, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-21 13:04 UTC (permalink / raw)
  To: Andreas Krebbel, gcc-patches

On 21/01/16 11:12, Andreas Krebbel wrote:
> On 01/15/2016 10:08 PM, Marcin Kościelnicki wrote:
>> On 15/01/16 19:38, Andreas Krebbel wrote:
>>> Marcin,
>>>
>>> your implementation looks very good to me. Thanks!
>>>
>>> But please be aware that we deprecated the support of g5 and g6 and intend to remove that code from
>>> the back-end with the next GCC version.  So I would prefer if you could remove all the
>>> !TARGET_CPU_ZARCH stuff from the implementation and just error out if split-stack is enabled with
>>> -march g5/g6.  It currently makes the implementation more complicated and would have to be removed
>>> anyway in the future.
>>>
>>> Thanks!
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html
>>>
>>>
>>> Bye,
>>>
>>> -Andreas-
>>>
>>>
>>
>> Very well, I'll do that.
>>
>> Btw, as for dropping support for g5/g6: I've noticed
>> s390_function_profiler could also use larl+brasl for -m31 given
>> TARGET_CPU_ZARCH.  Should I submit a patch for that?  I'm asking because
>> gold with -fsplit-stack needs to know the exact sequence used, so if
>> it's going to change after g5/g6 removal, I'd better add it to gold now
>> (and make gcc always emit it for non-g5/g6, so that gold won't need to
>> look at the old one).
>
> Yes please, that would be great. Good catch!
>
> Thanks!
>
> -Andreas-
>

I've submitted the gcc patch, and will soon update the gold patch.

Marcin Kościelnicki

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-21 10:05   ` Andreas Krebbel
  2016-01-21 10:10     ` Marcin Kościelnicki
@ 2016-01-21 23:10     ` Jeff Law
  2016-01-22  7:44       ` Andreas Krebbel
  1 sibling, 1 reply; 55+ messages in thread
From: Jeff Law @ 2016-01-21 23:10 UTC (permalink / raw)
  To: Andreas Krebbel, Marcin Kościelnicki, gcc-patches

On 01/21/2016 03:05 AM, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>> When an unconditional jump with side effects targets an immediately
>> following label, rtl_tidy_fallthru_edge is called.  Since it has side
>> effects, it doesn't remove the jump, but the label is still marked
>> as fallthru.  This later causes a verification error.  Do nothing in this
>> case instead.
>>
>> gcc/ChangeLog:
>>
>> 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>> 	with side effects.
>
> The change looks ok to me (although I'm not able to approve it). Could you please run regression
> tests on x86_64 with that change?
>
> Perhaps a short comment in the code would be good.
I think the patch is technically fine, the question is does it fix a 
visible bug?  I read the series as new feature enablement so I put this 
patch into my gcc7 queue.

jeff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-21 23:10     ` Jeff Law
@ 2016-01-22  7:44       ` Andreas Krebbel
  2016-01-22 16:39         ` Marcin Kościelnicki
  2016-01-27  7:11         ` Jeff Law
  0 siblings, 2 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-22  7:44 UTC (permalink / raw)
  To: Jeff Law, Marcin Kościelnicki, gcc-patches

On 01/22/2016 12:10 AM, Jeff Law wrote:
> On 01/21/2016 03:05 AM, Andreas Krebbel wrote:
>> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>>> When an unconditional jump with side effects targets an immediately
>>> following label, rtl_tidy_fallthru_edge is called.  Since it has side
>>> effects, it doesn't remove the jump, but the label is still marked
>>> as fallthru.  This later causes a verification error.  Do nothing in this
>>> case instead.
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>> 	with side effects.
>>
>> The change looks ok to me (although I'm not able to approve it). Could you please run regression
>> tests on x86_64 with that change?
>>
>> Perhaps a short comment in the code would be good.
> I think the patch is technically fine, the question is does it fix a 
> visible bug?  I read the series as new feature enablement so I put this 
> patch into my gcc7 queue.

We need the patch for the S/390 split-stack implementation which we would like to see in GCC 6.  I'm
aware that this isn't stage 3 material but people seem to have reasons to really want split stack on
S/390 asap and we would have to backport this feature anyway. Therefore I would prefer to have it in
the official release already. That's the only common code change we would need for that.

I've started a bootstrap and regression test for the patch also on Power.

Do you see a chance we can get this into GCC 6?

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-22  7:44       ` Andreas Krebbel
@ 2016-01-22 16:39         ` Marcin Kościelnicki
  2016-01-27  7:11         ` Jeff Law
  1 sibling, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-22 16:39 UTC (permalink / raw)
  To: Andreas Krebbel, Jeff Law, gcc-patches

On 22/01/16 08:44, Andreas Krebbel wrote:
> On 01/22/2016 12:10 AM, Jeff Law wrote:
>> On 01/21/2016 03:05 AM, Andreas Krebbel wrote:
>>> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>>>> When an unconditional jump with side effects targets an immediately
>>>> following label, rtl_tidy_fallthru_edge is called.  Since it has side
>>>> effects, it doesn't remove the jump, but the label is still marked
>>>> as fallthru.  This later causes a verification error.  Do nothing in this
>>>> case instead.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>>> 	with side effects.
>>>
>>> The change looks ok to me (although I'm not able to approve it). Could you please run regression
>>> tests on x86_64 with that change?
>>>
>>> Perhaps a short comment in the code would be good.
>> I think the patch is technically fine, the question is does it fix a
>> visible bug?  I read the series as new feature enablement so I put this
>> patch into my gcc7 queue.
>
> We need the patch for the S/390 split-stack implementation which we would like to see in GCC 6.  I'm
> aware that this isn't stage 3 material but people seem to have reasons to really want split stack on
> S/390 asap and we would have to backport this feature anyway. Therefore I would prefer to have it in
> the official release already. That's the only common code change we would need for that.
>
> I've started a bootstrap and regression test for the patch also on Power.
>
> Do you see a chance we can get this into GCC 6?
>
> Bye,
>
> -Andreas-
>

I've tested the patch on x86_64, no regressions.

I'm not entirely sure if the patch needs to go in for the current 
version of split-stack support.

This patch fixed a showstopper bug on g5 CPUs when the patch still 
supported them.  I haven't seen this bug with the z900 sequences (which 
are now the only ones left), but since we're still using unconditional 
jumps with side effects, I left it in just to be safe.  The testsuite 
passes on s390x -fsplit-stack both with the patch and without it.

So, I don't know.  It seems to work now, probably because no 
optimization pass has a reason to touch that jump, but it may start to 
fail if someone adds a new optimization that tries to be smart with our 
prologue.
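
To make the failure mode concrete, here is a small standalone toy model of it. This is only a sketch and not GCC code: toy_block, toy_edge, toy_tidy_fallthru_edge and toy_verify are hypothetical names, and the consistency rule in toy_verify (no fallthru edge out of a block that still ends in a jump) is my simplified reading of what the verifier complains about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a basic block and its edge to the next block.  */
struct toy_block
{
  bool ends_in_jump;	/* Models JUMP_P (BB_END (b)).  */
  bool only_jump;	/* Models onlyjump_p: the jump has no side effects.  */
};

struct toy_edge
{
  bool fallthru;	/* Models the EDGE_FALLTHRU flag.  */
};

/* Toy version of the tidy step.  WITH_FIX selects whether the
   bail-out from this patch is applied.  */
void
toy_tidy_fallthru_edge (struct toy_block *b, struct toy_edge *e,
			bool with_fix)
{
  assert (b != NULL && e != NULL);

  /* The fix: leave a jump with side effects, and its edge, alone.  */
  if (with_fix && b->ends_in_jump && !b->only_jump)
    return;

  /* A plain jump to the immediately following label is redundant
     and gets deleted.  */
  if (b->ends_in_jump && b->only_jump)
    b->ends_in_jump = false;

  e->fallthru = true;
}

/* Assumed consistency rule: a fallthru edge must not leave a block
   that still ends in a jump.  */
bool
toy_verify (const struct toy_block *b, const struct toy_edge *e)
{
  return !(e->fallthru && b->ends_in_jump);
}
```

Without the fix, a block ending in a jump with side effects keeps the jump but gets a fallthru edge, so toy_verify fails; with the fix both are left untouched and the check passes.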

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-22  7:44       ` Andreas Krebbel
  2016-01-22 16:39         ` Marcin Kościelnicki
@ 2016-01-27  7:11         ` Jeff Law
  1 sibling, 0 replies; 55+ messages in thread
From: Jeff Law @ 2016-01-27  7:11 UTC (permalink / raw)
  To: Andreas Krebbel, Marcin Kościelnicki, gcc-patches

On 01/22/2016 12:44 AM, Andreas Krebbel wrote:
> On 01/22/2016 12:10 AM, Jeff Law wrote:
>> On 01/21/2016 03:05 AM, Andreas Krebbel wrote:
>>> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
>>>> When an unconditional jump with side effects targets an immediately
>>>> following label, rtl_tidy_fallthru_edge is called.  Since it has side
>>>> effects, it doesn't remove the jump, but the label is still marked
>>>> as fallthru.  This later causes a verification error.  Do nothing in this
>>>> case instead.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>>> 	with side effects.
>>>
>>> The change looks ok to me (although I'm not able to approve it). Could you please run regression
>>> tests on x86_64 with that change?
>>>
>>> Perhaps a short comment in the code would be good.
>> I think the patch is technically fine, the question is does it fix a
>> visible bug?  I read the series as new feature enablement so I put this
>> patch into my gcc7 queue.
>
> We need the patch for the S/390 split-stack implementation which we would like to see in GCC 6.  I'm
> aware that this isn't stage 3 material but people seem to have reasons to really want split stack on
> S/390 asap and we would have to backport this feature anyway. Therefore I would prefer to have it in
> the official release already. That's the only common code change we would need for that.
>
> I've started a bootstrap and regression test for the patch also on Power.
>
> Do you see a chance we can get this into GCC 6?
So I think it'd largely depend on what you do with the s390 specific 
bits -- if you decide to drop those in (ISTM that's your call), then I 
think adding the cfgrtl patch is probably the wise thing to do.  So 
consider it approved for gcc-6 if/when you decide to go forward with the 
s390 specific bits.

FWIW, the PA might run afoul of the code you're fixing as well. It's got 
add[i]b,tr and mov[i]b,tr which are unconditional jumps with other side 
effects.  We never really used them all that much and once the PA8000 
series came out, they were actually a performance loss, so they were 
disabled on the "modern" PA machines.

Jeff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-01-16 13:46     ` [PATCH] " Marcin Kościelnicki
@ 2016-01-29 13:33       ` Andreas Krebbel
  2016-01-29 15:43         ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-29 13:33 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: gcc-patches

Hi Marcin,

sorry for the late feedback.

A few comments regarding the split stack implementation:

The GNU coding style requires replacing every 8 leading blanks on a
line with a tab.  There are many lines in your patch violating this.
In case you are an emacs user, `whitespace-cleanup' will fix this for
you.
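
If you are not using emacs, the rule itself is simple enough to sketch in a few lines of C. tabify_leading below is a hypothetical helper for illustration only (it is not part of GCC or its contrib scripts), and it assumes the line starts with a run of plain spaces with no tabs mixed in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy IN to OUT, replacing every 8 leading
   blanks with one tab.  Only a leading run of spaces is handled.
   OUT must hold at least as many bytes as IN including its NUL,
   since the result is never longer than the input.  */
void
tabify_leading (const char *in, char *out)
{
  size_t blanks = 0;
  size_t i;

  assert (in != NULL && out != NULL);

  /* Count the leading blanks.  */
  while (in[blanks] == ' ')
    blanks++;

  /* One tab per full group of 8 blanks, then the remainder.  */
  for (i = 0; i < blanks / 8; i++)
    *out++ = '\t';
  for (i = 0; i < blanks % 8; i++)
    *out++ = ' ';

  /* Copy the rest of the line, including the terminating NUL.  */
  in += blanks;
  while ((*out++ = *in++) != '\0')
    ;
}
```

A line indented by 10 blanks comes out as one tab followed by 2 blanks, which is the form the existing s390.c code uses.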

Could you please add a testcase checking the different variants,
i.e. with early exit, with no allocation in __morestack, and with an
actual allocation?

There are a few more comments inline.

Bye,

-Andreas-

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index c881d52..71f6f38 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,38 @@
>  2016-01-16  Marcin Kościelnicki  <koriakin@0x04.net>
> 
> +	* common/config/s390/s390-common.c (s390_supports_split_stack):
> +	New function.
> +	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> +	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> +	* config/s390/s390.c (struct machine_function): New field
> +	split_stack_varargs_pointer.
> +	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
> +	in s390_emit_prologue.
> +	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> +	vararg pointer.
> +	(morestack_ref): New global.
> +	(SPLIT_STACK_AVAILABLE): New macro.
> +	(s390_expand_split_stack_prologue): New function.
> +	(s390_expand_split_stack_call): New function.
> +	(s390_live_on_entry): New function.
> +	(s390_va_start): Use split-stack vararg pointer if appropriate.
> +	(s390_reorg): Lower the split-stack pseudo-insns.
> +	(s390_asm_file_end): Emit the split-stack note sections.
> +	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> +	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
> +	(UNSPECV_SPLIT_STACK_CALL): New unspec.
> +	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> +	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
> +	(split_stack_prologue): New expand.
> +	(split_stack_call_*): New insn.
> +	(split_stack_cond_call_*): New insn.
> +	(split_stack_space_check): New expand.
> +	(split_stack_sibcall_*): New insn.
> +	(split_stack_cond_sibcall_*): New insn.
> +	(split_stack_marker): New insn.
> +
> +2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
> +
>  	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>  	with side effects.
> 
> diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
> index 4519c21..1e497e6 100644
> --- a/gcc/common/config/s390/s390-common.c
> +++ b/gcc/common/config/s390/s390-common.c
> @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>      }
>  }
> 
> +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
> +   We don't verify it, since earlier versions just have padding at
> +   its place, which works just as well.  */
> +
> +static bool
> +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
> +			   struct gcc_options *opts ATTRIBUTE_UNUSED)
> +{
> +  return true;
> +}
> +
>  #undef TARGET_DEFAULT_TARGET_FLAGS
>  #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
> 
> @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>  #undef TARGET_OPTION_INIT_STRUCT
>  #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
> 
> +#undef TARGET_SUPPORTS_SPLIT_STACK
> +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
> +
>  struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> index 633bc1e..09032c9 100644
> --- a/gcc/config/s390/s390-protos.h
> +++ b/gcc/config/s390/s390-protos.h
> @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>  extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
>  extern void s390_emit_prologue (void);
>  extern void s390_emit_epilogue (bool);
> +extern void s390_expand_split_stack_prologue (void);
>  extern bool s390_can_use_simple_return_insn (void);
>  extern bool s390_can_use_return_insn (void);
>  extern void s390_function_profiler (FILE *, int);
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 3be64de..6afce7c 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -426,6 +426,13 @@ struct GTY(()) machine_function
>    /* True if the current function may contain a tbegin clobbering
>       FPRs.  */
>    bool tbegin_p;
> +
> +  /* For -fsplit-stack support: A stack local which holds a pointer to
> +     the stack arguments for a function with a variable number of
> +     arguments.  This is set at the start of the function and is used
> +     to initialize the overflow_arg_area field of the va_list
> +     structure.  */
> +  rtx split_stack_varargs_pointer;
>  };
> 
>  /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
> @@ -9316,9 +9323,13 @@ s390_register_info ()
>  	  cfun_frame_layout.high_fprs++;
>        }
> 
> -  if (flag_pic)
> -    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
> -      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
> +  /* Register 12 is used for GOT address, but also as temp in prologue
> +     for split-stack stdarg functions (unless r14 is available).  */
> +  clobbered_regs[12]
> +    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
> +	|| (flag_split_stack && cfun->stdarg
> +	    && (crtl->is_leaf || TARGET_TPF_PROFILING
> +		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
> 
>    clobbered_regs[BASE_REGNUM]
>      |= (cfun->machine->base_reg
> @@ -10446,6 +10457,8 @@ s390_emit_prologue (void)
>        && !crtl->is_leaf
>        && !TARGET_TPF_PROFILING)
>      temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
> +  else if (flag_split_stack && cfun->stdarg)
> +    temp_reg = gen_rtx_REG (Pmode, 12);
TPF uses r1 hard coded in tracing prologue/epilogue.  So I think we
need && !TARGET_TPF_PROFILING here as well.

>    else
>      temp_reg = gen_rtx_REG (Pmode, 1);
> 
> @@ -10939,6 +10952,284 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
>      SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
>  }
> 
> +/* -fsplit-stack support.  */
> +
> +/* A SYMBOL_REF for __morestack.  */
> +static GTY(()) rtx morestack_ref;
> +
> +/* When using -fsplit-stack, the allocation routines set a field in
> +   the TCB to the bottom of the stack plus this much space, measured
> +   in bytes.  */
> +
> +#define SPLIT_STACK_AVAILABLE 1024
> +
> +/* Emit -fsplit-stack prologue, which goes before the regular function
> +   prologue.  */
> +
> +void
> +s390_expand_split_stack_prologue (void)
> +{
> +  rtx r1, guard, cc;
> +  rtx_insn *insn;
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
> +  /* Pointer size in bytes.  */
> +  /* Frame size and argument size - the two parameters to __morestack.  */
> +  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
> +  /* Align argument size to 8 bytes - simplifies __morestack code.  */
> +  HOST_WIDE_INT args_size = crtl->args.size >= 0
> +			    ? ((crtl->args.size + 7) & ~7)
> +			    : 0;
> +  /* Label to jump to when no __morestack call is necessary.  */
> +  rtx_code_label *enough = NULL;
> +  /* Label to be called by __morestack.  */
> +  rtx_code_label *call_done = NULL;
> +  /* 1 if __morestack called conditionally, 0 if always.  */
> +  int conditional = 0;
> +
> +  gcc_assert (flag_split_stack && reload_completed);
> +  if (!TARGET_CPU_ZARCH)
> +    {
> +      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
> +      return;
> +    }
> +
> +  r1 = gen_rtx_REG (Pmode, 1);
> +
> +  /* If no stack frame will be allocated, don't do anything.  */
> +  if (!frame_size)
> +    {
> +      /* But emit a marker that will let linker and indirect function
> +	 calls recognise this function as split-stack aware.  */
> +      emit_insn(gen_split_stack_marker());
2x missing blank before (

> +      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +        {
> +          /* If va_start is used, just use r15.  */
> +          emit_move_insn (r1,
> +		          gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +			                GEN_INT (STACK_POINTER_OFFSET)));
virtual_incoming_args_rtx ?

> +        }
> +      return;
> +    }
> +
> +  if (morestack_ref == NULL_RTX)
> +    {
> +      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
> +      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
> +					   | SYMBOL_FLAG_FUNCTION);
> +    }
> +
> +  if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu))
The agfi immediate value is a signed 32 bit integer.  So you can only
add up to 2G-1.  I think it would be more readable to write this as:

if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Os (frame_size))

as in s390_emit_prologue. The Os check will check for TARGET_EXTIMM as well.
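
To spell out the ranges involved, here is a standalone sketch. It is not the back-end code: toy_ok_for_k and toy_ok_for_os are hypothetical stand-ins for CONST_OK_FOR_K (signed 16-bit) and CONST_OK_FOR_Os (signed 32-bit, only with the extended-immediate facility), and target_extimm is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the K constraint: signed 16-bit constant.  */
bool
toy_ok_for_k (int64_t v)
{
  return v >= INT16_MIN && v <= INT16_MAX;
}

/* Stand-in for the Os constraint: signed 32-bit constant, valid
   only when the extended-immediate facility is available.  */
bool
toy_ok_for_os (int64_t v, bool target_extimm)
{
  return target_extimm && v >= INT32_MIN && v <= INT32_MAX;
}

/* True if FRAME_SIZE can be added with a single add-immediate.  */
bool
toy_frame_fits_add (int64_t frame_size, bool target_extimm)
{
  assert (frame_size >= 0);
  return toy_ok_for_k (frame_size)
	 || toy_ok_for_os (frame_size, target_extimm);
}
```

With this check a frame size of 0x80000000 or above is rejected even with extended immediates, whereas the 0xffffffffu bound in the patch would accept it.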

> +    {
> +      /* If frame_size will fit in an add instruction, do a stack space
> +	 check, and only call __morestack if there's not enough space.  */
> +      conditional = 1;
> +
> +      /* Get thread pointer.  r1 is the only register we can always destroy - r0
> +         could contain a static chain (and cannot be used to address memory
> +         anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
> +      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
> +      /* Aim at __private_ss.  */
> +      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
> +
> +      /* If less than 1kiB used, skip addition and compare directly with
> +         __private_ss.  */
> +      if (frame_size > SPLIT_STACK_AVAILABLE)
> +        {
> +          emit_move_insn (r1, guard);
> +	  if (TARGET_64BIT)
> +	    emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size)));
> +	  else
> +	    emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size)));
> +	  guard = r1;
> +        }
> +
> +      if (TARGET_CPU_ZARCH)
> +        {
Looks like the !TARGET_CPU_ZARCH stuff hasn't been completely removed?!

> +	  rtx tmp;
> +
> +          /* Compare the (maybe adjusted) guard with the stack pointer.  */
> +          cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
> +
> +          call_done = gen_label_rtx ();
> +
> +	  if (TARGET_64BIT)
> +	    tmp = gen_split_stack_cond_call_di (call_done,
> +						morestack_ref,
> +						GEN_INT (frame_size),
> +						GEN_INT (args_size),
> +						cc);
> +	  else
> +	    tmp = gen_split_stack_cond_call_si (call_done,
> +						morestack_ref,
> +						GEN_INT (frame_size),
> +						GEN_INT (args_size),
> +						cc);
Perhaps it would be more readable to do the TARGET_64BIT check in a separate
expander.  Please see "movstr" in s390.md. The same applies to all the
other gen_split_stack* invocations.

> +
> +
> +          insn = emit_jump_insn (tmp);
> +	  JUMP_LABEL (insn) = call_done;
> +
> +          /* Mark the jump as very unlikely to be taken.  */
> +          add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
> +	}
> +      else
> +        {
> +          /* Compare the (maybe adjusted) guard with the stack pointer.  */
> +          cc = s390_emit_compare (GE, stack_pointer_rtx, guard);
> +
> +          enough = gen_label_rtx ();
> +          insn = s390_emit_jump (enough, cc);
> +          JUMP_LABEL (insn) = enough;
> +
> +          /* Mark the jump as very likely to be taken.  */
> +          add_int_reg_note (insn, REG_BR_PROB,
> +			    REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);
> +	}
> +    }
> +
> +  if (call_done == NULL)
With the !TARGET_CPU_ZARCH path above removed, this could become the
else path of the frame_size check, and call_done can be removed.

> +    {
> +      rtx tmp;
> +      call_done = gen_label_rtx ();
> +
> +      /* Now, we need to call __morestack.  It has very special calling
> +         conventions: it preserves param/return/static chain registers for
> +         calling main function body, and looks for its own parameters
> +         at %r1 (after aligning it up to a 4 byte bounduary for 31-bit mode). */
> +      if (TARGET_64BIT)
> +        tmp = gen_split_stack_call_di (call_done,
> +					     morestack_ref,
> +					     GEN_INT (frame_size),
> +					     GEN_INT (args_size));
Indentation.

> +      else
> +        tmp = gen_split_stack_call_si (call_done,
> +					     morestack_ref,
> +					     GEN_INT (frame_size),
> +					     GEN_INT (args_size));
Indentation.

> +      insn = emit_jump_insn (tmp);
> +      JUMP_LABEL (insn) = call_done;
> +      emit_barrier ();
> +    }
> +
> +  /* __morestack will call us here.  */
> +
> +  if (enough != NULL)
> +    {
> +      emit_label (enough);
> +      LABEL_NUSES (enough) = 1;
> +    }
This was also needed only for !TARGET_CPU_ZARCH.

> +
> +  if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +    {
> +      /* If va_start is used, and __morestack was not called, just use r15.  */
> +      emit_move_insn (r1,
> +		      gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +			            GEN_INT (STACK_POINTER_OFFSET)));
Could this use virtual_incoming_args_rtx instead?

> +    }
> +
> +  emit_label (call_done);
> +  LABEL_NUSES (call_done) = 1;
> +}
> +
> +/* Generates split-stack call sequence, along with its parameter block.  */
> +
> +static void
> +s390_expand_split_stack_call (rtx_insn *orig_insn,
> +			      rtx call_done,
> +			      rtx function,
> +			      rtx frame_size,
> +			      rtx args_size,
> +			      rtx cond)
> +{
> +  int psize = GET_MODE_SIZE (Pmode);
> +  rtx_insn *insn = orig_insn;
> +  rtx parmbase = gen_label_rtx();
> +  rtx r1 = gen_rtx_REG (Pmode, 1);
> +  rtx tmp, tmp2;
> +
> +  /* %r1 = litbase.  */
> +  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
> +  LABEL_NUSES (parmbase)++;
> +
> +  /* jg<cond> __morestack.  */
> +  if (cond == NULL)
> +    {
> +      if (TARGET_64BIT)
> +        tmp = gen_split_stack_sibcall_di (function, call_done);
> +      else
> +        tmp = gen_split_stack_sibcall_si (function, call_done);
> +      insn = emit_jump_insn_after (tmp, insn);
> +    }
> +  else
> +    {
> +      if (!s390_comparison (cond, VOIDmode))
> +	internal_error ("bad split_stack_call cond");
Perhaps just gcc_assert (s390_comparison (cond, VOIDmode)); ?

> +      if (TARGET_64BIT)
> +        tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done);
> +      else
> +        tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done);
> +      insn = emit_jump_insn_after (tmp, insn);
> +    }
> +  JUMP_LABEL (insn) = call_done;
> +  LABEL_NUSES (call_done)++;
> +
> +  /* Go to .rodata.  */
> +  insn = emit_insn_after (gen_pool_section_start (), insn);
> +
> +  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
> +     (this mirrors the alignment done in __morestack - don't touch it).  */
> +  insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn);
psize -> UNITS_PER_LONG?

> +
> +  insn = emit_label_after (parmbase, insn);
> +
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, frame_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Second parameter is size of the arguments passed on stack that
> +     __morestack has to copy to the new stack (does not include varargs).  */
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, args_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Third parameter is offset between start of the parameter block
> +     and function body to be called by __morestack.  */
> +  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
> +  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
> +  tmp = gen_rtx_CONST (Pmode,
> +                       gen_rtx_MINUS (Pmode, tmp2, tmp));
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, tmp),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
> +  LABEL_NUSES (call_done)++;
> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
> +  LABEL_NUSES (parmbase)++;
> +
> +  /* Return from .rodata.  */
> +  insn = emit_insn_after (gen_pool_section_end (), insn);
> +
> +  delete_insn (orig_insn);
> +}
> +
> +/* We may have to tell the dataflow pass that the split stack prologue
> +   is initializing a register.  */
> +
> +static void
> +s390_live_on_entry (bitmap regs)
> +{
> +  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +    {
> +      gcc_assert (flag_split_stack);
> +      bitmap_set_bit (regs, 1);
> +    }
> +}
> +
>  /* Return true if the function can use simple_return to return outside
>     of a shrink-wrapped region.  At present shrink-wrapping is supported
>     in all cases.  */
> @@ -11541,6 +11832,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>        expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
>      }
> 
> +  if (flag_split_stack
> +     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
> +         == NULL)
> +     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
> +    {
> +      rtx reg;
> +      rtx_insn *seq;
> +
> +      reg = gen_reg_rtx (Pmode);
> +      cfun->machine->split_stack_varargs_pointer = reg;
> +
> +      start_sequence ();
> +      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
> +      seq = get_insns ();
> +      end_sequence ();
> +
> +      push_topmost_sequence ();
> +      emit_insn_after (seq, entry_of_function ());
> +      pop_topmost_sequence ();
> +    }
> +
>    /* Find the overflow area.
>       FIXME: This currently is too pessimistic when the vector ABI is
>       enabled.  In that case we *always* set up the overflow area
> @@ -11549,7 +11861,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>        || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
>        || TARGET_VX_ABI)
>      {
> -      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
> +      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
> +        t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer);
What is the reason for changing virtual_incoming_args_rtx to
crtl->args.internal_arg_pointer in the non-split-stack case?

> +      else
> +        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
> 
>        off = INTVAL (crtl->args.arg_offset_rtx);
>        off = off < 0 ? 0 : off;
> @@ -13158,6 +13473,48 @@ s390_reorg (void)
>  	}
>      }
> 
> +  if (flag_split_stack)
> +    {
> +      rtx_insn *insn;
> +
> +      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +	{
> +	  /* Look for the split-stack fake jump instructions.  */
> +	  if (!JUMP_P(insn))
> +	    continue;
> +	  if (GET_CODE (PATTERN (insn)) != PARALLEL
> +	      || XVECLEN (PATTERN (insn), 0) != 2)
> +	    continue;
> +	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
> +	  if (GET_CODE (set) != SET)
> +	    continue;
> +	  rtx unspec = XEXP(set, 1);
> +	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
> +	    continue;
> +	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL)
> +	    continue;
> +	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
> +	  rtx function = XVECEXP (unspec, 0, 0);
> +	  rtx frame_size = XVECEXP (unspec, 0, 1);
> +	  rtx args_size = XVECEXP (unspec, 0, 2);
> +	  rtx pc_src = XEXP (set_pc, 1);
> +	  rtx call_done, cond = NULL_RTX;
> +	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
> +	    {
> +	      cond = XEXP (pc_src, 0);
> +	      call_done = XEXP (XEXP (pc_src, 1), 0);
> +	    }
> +	  else
> +	    call_done = XEXP (pc_src, 0);
> +	  s390_expand_split_stack_call (insn,
> +					call_done,
> +					function,
> +					frame_size,
> +					args_size,
> +					cond);
> +	}
> +    }
> +
I'm wondering whether it is really necessary to expand the call in two
steps.  We do the general literal pool handling in s390_reorg because
we need all the insn lengths to be finalized before performing the
branch/pool splitting loop, but that shouldn't be necessary in this
case.  Would it be possible to expand the call already in the
emit_prologue phase and get rid of the s390_reorg part?

>    /* Try to optimize prologue and epilogue further.  */
>    s390_optimize_prologue ();
> 
> @@ -14469,6 +14826,9 @@ s390_asm_file_end (void)
>  	     s390_vector_abi);
>  #endif
>    file_end_indicate_exec_stack ();
> +
> +  if (flag_split_stack)
> +    file_end_indicate_split_stack ();
>  }
> 
>  /* Return true if TYPE is a vector bool type.  */
> @@ -14724,6 +15084,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
>  #undef TARGET_SET_UP_BY_PROLOGUE
>  #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
> 
> +#undef TARGET_EXTRA_LIVE_ON_ENTRY
> +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
> +
>  #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>    s390_use_by_pieces_infrastructure_p
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index 9b869d5..21cd989 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -114,6 +114,9 @@
>     UNSPEC_SP_SET
>     UNSPEC_SP_TEST
> 
> +   ; Split stack support
> +   UNSPEC_STACK_CHECK
> +
>     ; Test Data Class (TDC)
>     UNSPEC_TDC_INSN
> 
> @@ -276,6 +279,11 @@
>     ; Set and get floating point control register
>     UNSPECV_SFPC
>     UNSPECV_EFPC
> +
> +   ; Split stack support
> +   UNSPECV_SPLIT_STACK_CALL
> +   UNSPECV_SPLIT_STACK_SIBCALL
> +   UNSPECV_SPLIT_STACK_MARKER
>    ])
> 
>  ;;
> @@ -10907,3 +10915,104 @@
>    "TARGET_Z13"
>    "lcbb\t%0,%1,%b2"
>    [(set_attr "op_type" "VRX")])
> +
> +; Handle -fsplit-stack.
> +
> +(define_expand "split_stack_prologue"
> +  [(const_int 0)]
> +  ""
> +{
> +  s390_expand_split_stack_prologue ();
> +  DONE;
> +})
> +
> +(define_insn "split_stack_call_<mode>"
> +  [(set (pc) (label_ref (match_operand 0 "" "")))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
> +                                    (match_operand 2 "consttable_operand" "X")
> +                                    (match_operand 3 "consttable_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_CALL))]
> +  "TARGET_CPU_ZARCH"
> +{
> +  gcc_unreachable ();
> +}
> +  [(set_attr "length" "12")])
> +
> +(define_insn "split_stack_cond_call_<mode>"
> +  [(set (pc)
> +        (if_then_else
> +          (match_operand 4 "" "")
> +          (label_ref (match_operand 0 "" ""))
> +          (pc)))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
> +                                    (match_operand 2 "consttable_operand" "X")
> +                                    (match_operand 3 "consttable_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_CALL))]
> +  "TARGET_CPU_ZARCH"
> +{
> +  gcc_unreachable ();
> +}
> +  [(set_attr "length" "12")])
> +
> +;; If there are operand 0 bytes available on the stack, jump to
> +;; operand 1.
> +
> +(define_expand "split_stack_space_check"
> +  [(set (pc) (if_then_else
> +	      (ltu (minus (reg 15)
> +			  (match_operand 0 "register_operand"))
> +		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
> +	      (label_ref (match_operand 1))
> +	      (pc)))]
> +  ""
> +{
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
> +  rtx tp = s390_get_thread_pointer ();
> +  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
> +  rtx reg = gen_reg_rtx (Pmode);
> +  rtx cc;
> +  if (TARGET_64BIT)
> +    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
> +  else
> +    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
> +  cc = s390_emit_compare (GT, reg, guard);
> +  s390_emit_jump (operands[1], cc);
> +
> +  DONE;
> +})
This expander does not seem to get called from anywhere.

> +
> +;; A jg with minimal fuss for use in split stack prologue.
> +
> +(define_insn "split_stack_sibcall_<mode>"
> +  [(set (pc) (label_ref (match_operand 1 "" "")))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_SIBCALL))]
> +  "TARGET_CPU_ZARCH"
> +  "jg\t%0"
> +  [(set_attr "op_type" "RIL")
> +   (set_attr "type"  "branch")])
> +
> +;; Also a conditional one.
> +
> +(define_insn "split_stack_cond_sibcall_<mode>"
> +  [(set (pc)
> +        (if_then_else
> +          (match_operand 1 "" "")
> +          (label_ref (match_operand 2 "" ""))
> +          (pc)))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
> +                                   UNSPECV_SPLIT_STACK_SIBCALL))]
> +  "TARGET_CPU_ZARCH"
> +  "jg%C1\t%0"
> +  [(set_attr "op_type" "RIL")
> +   (set_attr "type"  "branch")])
> +
> +;; An unusual nop instruction used to mark functions with no stack frames
> +;; as split-stack aware.
> +
> +(define_insn "split_stack_marker"
> +  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
> +  ""
> +  "nopr\t%%r15"
> +  [(set_attr "op_type" "RR")])
> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
> index 4cd8f01..604b120 100644
> --- a/libgcc/ChangeLog
> +++ b/libgcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2016-01-16  Marcin Kościelnicki  <koriakin@0x04.net>
> +
> +	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> +	* config/s390/morestack.S: New file.
> +	* config/s390/t-stack-s390: New file.
> +	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> +
>  2016-01-15  Nick Clifton  <nickc@redhat.com>
> 
>  	* config/msp430/t-msp430 (lib2_mul_none.o): Only use the first
> diff --git a/libgcc/config.host b/libgcc/config.host
> index f58ee45..9793155 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1105,11 +1105,11 @@ rx-*-elf)
>  	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
>  	;;
>  s390-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
>  	md_unwind_header=s390/linux-unwind.h
>  	;;
>  s390x-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
>  	if test "${host_address}" = 32; then
>  	   tmake_file="${tmake_file} s390/32/t-floattodi"
>  	fi
> diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
> new file mode 100644
> index 0000000..c99f6e4
> --- /dev/null
> +++ b/libgcc/config/s390/morestack.S
> @@ -0,0 +1,609 @@
> +# s390 support for -fsplit-stack.
> +# Copyright (C) 2015 Free Software Foundation, Inc.
> +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +
> +# Under Section 7 of GPL version 3, you are granted additional
> +# permissions described in the GCC Runtime Library Exception, version
> +# 3.1, as published by the Free Software Foundation.
> +
> +# You should have received a copy of the GNU General Public License and
> +# a copy of the GCC Runtime Library Exception along with this program;
> +# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Excess space needed to call ld.so resolver for lazy plt
> +# resolution.  Go uses sigaltstack so this doesn't need to
> +# also cover signal frame size.
> +#define BACKOFF 0x1000
> +
> +# The __morestack function.
> +
> +	.global	__morestack
> +	.hidden	__morestack
> +
> +	.type	__morestack,@function
> +
> +__morestack:
> +.LFB1:
> +	.cfi_startproc
> +
> +
> +#ifndef __s390x__
> +
> +
> +# The 31-bit __morestack function.
> +
> +	# We use a cleanup to restore the stack guard if an exception
> +	# is thrown through this code.
> +#ifndef __PIC__
> +	.cfi_personality 0,__gcc_personality_v0
> +	.cfi_lsda 0,.LLSDA1
> +#else
> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +	.cfi_lsda 0x1b,.LLSDA1
> +#endif
> +
> +	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
> +	.cfi_offset %r6, -0x48
> +	.cfi_offset %r7, -0x44
> +	.cfi_offset %r8, -0x40
> +	.cfi_offset %r9, -0x3c
> +	.cfi_offset %r10, -0x38
> +	.cfi_offset %r11, -0x34
> +	.cfi_offset %r12, -0x30
> +	.cfi_offset %r13, -0x2c
> +	.cfi_offset %r14, -0x28
> +	.cfi_offset %r15, -0x24
> +	lr	%r11, %r15		# Make frame pointer for vararg.
> +	.cfi_def_cfa_register %r11
> +	ahi	%r15, -0x60		# 0x60 for standard frame.
> +	st	%r11, 0(%r15)		# Save back chain.
> +	lr	%r8, %r0		# Save %r0 (static chain).
> +	lr	%r10, %r1		# Save %r1 (address of parameter block).
> +
> +	l	%r7, 0(%r10)		# Required frame size to %r7
> +	ear	%r1, %a0		# Extract thread pointer.
> +	l	%r1, 0x20(%r1)		# Get stack bounduary
> +	ar	%r1, %r7		# Stack bounduary + frame size
> +	a	%r1, 4(%r10)		# + stack param size
> +	clr	%r1, %r15		# Compare with current stack pointer
> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
> +	# as a local variable.  Not needed here, but done to be consistent with
> +	# the below use.
> +	ahi	%r7, BACKOFF		# Bump requested size a bit.
> +	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
> +	la	%r2, 0x40(%r11)		# Pass its address as parameter.
> +	la	%r3, 0x60(%r11)		# Caller's stack parameters.
> +	l	%r4, 4(%r10)		# Size of stack paremeters.
Typo: s/paremeters/parameters/ (here and in the similar comments below).

> +	brasl	%r14, __generic_morestack
> +
> +	lr	%r15, %r2		# Switch to the new stack.
> +	ahi	%r15, -0x60		# Make a stack frame on it.
> +	st	%r11, 0(%r15)		# Save back chain.
> +
> +	s	%r2, 0x40(%r11)		# The end of stack space.
> +	ahi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +.LEHB0:
> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lr	%r0, %r8		# Static chain.
> +	lm	%r2, %r6, 0x8(%r11)	# Paremeter registers.
> +
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a	%r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0x60(%r11)
> +
> +	# State of registers:
> +	# %r0: Static chain from entry.
> +	# %r1: Vararg pointer.
> +	# %r2-%r6: Parameters from entry.
> +	# %r7-%r10: Indeterminate.
> +	# %r11: Frame pointer (%r15 from entry).
> +	# %r12-%r13: Indeterminate.
> +	# %r14: Return address.
> +	# %r15: Stack pointer.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We need a stack slot now, but have no good way to get it - the frame
> +	# on new stack had to be exactly 0x60 bytes, or stack parameters would
> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
> +	# save actual fprs).
> +	la	%r2, 0x40(%r11)
> +	brasl	%r14, __generic_releasestack
> +
> +	s	%r2, 0x40(%r11)		# Subtract available space.
> +	ahi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +.LEHE0:
> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
> +
> +	# We need to restore the old stack pointer before unblocking signals.
> +	# We also need 0x60 bytes for a stack frame.  Since we had a stack
> +	# frame at this place before the stack switch, there's no need to
> +	# write the back chain again.
> +	lr	%r15, %r11
> +	ahi	%r15, -0x60
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# Executed if no new stack allocation is needed.
> +
> +.Lnoalloc:
> +	.cfi_restore_state
> +	# We may need to copy stack parameters.
> +	l	%r9, 0x4(%r10)		# Load stack parameter size.
> +	ltr	%r9, %r9		# And check if it's 0.
> +	je	.Lnostackparm		# Skip the copy if not needed.
> +	sr	%r15, %r9		# Make space on the stack.
> +	la	%r8, 0x60(%r15)		# Destination.
> +	la	%r12, 0x60(%r11)	# Source.
> +	lr	%r13, %r9		# Source size.
> +.Lcopy:
> +	mvcle	%r8, %r12, 0		# Copy.
> +	jo	.Lcopy
> +
> +.Lnostackparm:
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a	%r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0x60(%r11)
> +
> +	# OK, no stack allocation needed.  We still follow the protocol and
> +	# call our caller - it doesn't cost much and makes sure vararg works.
> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
> +	basr	%r14, %r10		# Call our caller.
The comment confuses me.  It sounds as if the call were not really
needed, but in fact this cannot work at all without jumping back to
the function body, right?
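
As far as I read the patch, the function body is only ever reached
through this computed branch: the third entry of the parameter block is
the offset from the block to the body, and __morestack adds it to the
block address before the basr.  A rough C view of that, with a struct
and field names that are purely mine (hypothetical, not from the patch):

```c
#include <stdint.h>

/* Hypothetical C view of the parameter block emitted after the
   parmbase label; the field names are illustrative only.  */
struct morestack_params
{
  uintptr_t frame_size;   /* First entry: required frame size.  */
  uintptr_t args_size;    /* Second entry: size of stack arguments.  */
  uintptr_t body_offset;  /* Third entry: call_done - parmbase.  */
};

/* Address that __morestack branches to: the parameter block address
   plus the third entry.  */
static uintptr_t
resume_address (const struct morestack_params *p)
{
  return (uintptr_t) p + p->body_offset;
}
```

So the comment should probably say that the call is how control gets
back into the function at all, not that it is cheap enough to keep.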

> +
> +	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# This is the cleanup code called by the stack unwinder when unwinding
> +# through the code between .LEHB0 and .LEHE0 above.
> +
> +.L1:
> +	.cfi_restore_state
> +	lr	%r2, %r11		# Stack pointer after resume.
> +	brasl	%r14, __generic_findstack
> +	lr	%r3, %r11		# Get the stack pointer.
> +	sr	%r3, %r2		# Subtract available space.
> +	ahi	%r3, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +	st	%r3, 0x20(%r1)	# Save the new stack boundary.
> +
> +	lr	%r2, %r6		# Exception header.
> +#ifdef __PIC__
> +	brasl	%r14, _Unwind_Resume@PLT
> +#else
> +	brasl	%r14, _Unwind_Resume
> +#endif
> +
> +#else /* defined(__s390x__) */
> +
> +
> +# The 64-bit __morestack function.
> +
> +	# We use a cleanup to restore the stack guard if an exception
> +	# is thrown through this code.
> +#ifndef __PIC__
> +	.cfi_personality 0x3,__gcc_personality_v0
> +	.cfi_lsda 0x3,.LLSDA1
> +#else
> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +	.cfi_lsda 0x1b,.LLSDA1
> +#endif
> +
> +	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
> +	.cfi_offset %r6, -0x70
> +	.cfi_offset %r7, -0x68
> +	.cfi_offset %r8, -0x60
> +	.cfi_offset %r9, -0x58
> +	.cfi_offset %r10, -0x50
> +	.cfi_offset %r11, -0x48
> +	.cfi_offset %r12, -0x40
> +	.cfi_offset %r13, -0x38
> +	.cfi_offset %r14, -0x30
> +	.cfi_offset %r15, -0x28
> +	lgr	%r11, %r15		# Make frame pointer for vararg.
> +	.cfi_def_cfa_register %r11
> +	aghi	%r15, -0xa0		# 0xa0 for standard frame.
> +	stg	%r11, 0(%r15)		# Save back chain.
> +	lgr	%r8, %r0		# Save %r0 (static chain).
> +	lgr	%r10, %r1		# Save %r1 (address of parameter block).
> +
> +	lg	%r7, 0(%r10)		# Required frame size to %r7
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +	lg	%r1, 0x38(%r1)		# Get stack bounduary
> +	agr	%r1, %r7		# Stack bounduary + frame size
> +	ag	%r1, 8(%r10)		# + stack param size
> +	clgr	%r1, %r15		# Compare with current stack pointer
> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
> +	# as a local variable.  Not needed here, but done to be consistent with
> +	# the below use.
> +	aghi	%r7, BACKOFF		# Bump requested size a bit.
> +	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
> +	la	%r2, 0x80(%r11)		# Pass its address as parameter.
> +	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
> +	lg	%r4, 8(%r10)		# Size of stack paremeters.
> +	brasl	%r14, __generic_morestack
> +
> +	lgr	%r15, %r2		# Switch to the new stack.
> +	aghi	%r15, -0xa0		# Make a stack frame on it.
> +	stg	%r11, 0(%r15)		# Save back chain.
> +
> +	sg	%r2, 0x80(%r11)		# The end of stack space.
> +	aghi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +.LEHB0:
> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lgr	%r0, %r8		# Static chain.
> +	lmg	%r2, %r6, 0x10(%r11)	# Paremeter registers.
> +
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	ag	%r10, 0x10(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0xa0(%r11)
> +
> +	# State of registers:
> +	# %r0: Static chain from entry.
> +	# %r1: Vararg pointer.
> +	# %r2-%r6: Parameters from entry.
> +	# %r7-%r10: Indeterminate.
> +	# %r11: Frame pointer (%r15 from entry).
> +	# %r12-%r13: Indeterminate.
> +	# %r14: Return address.
> +	# %r15: Stack pointer.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	stg	%r2, 0x10(%r11)		# Save return register.
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We need a stack slot now, but have no good way to get it - the frame
> +	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
> +	# save actual fprs).
> +	la	%r2, 0x80(%r11)
> +	brasl	%r14, __generic_releasestack
> +
> +	sg	%r2, 0x80(%r11)		# Subtract available space.
> +	aghi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +.LEHE0:
> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
> +
> +	# We need to restore the old stack pointer before unblocking signals.
> +	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
> +	# frame at this place before the stack switch, there's no need to
> +	# write the back chain again.
> +	lgr	%r15, %r11
> +	aghi	%r15, -0xa0
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# Executed if no new stack allocation is needed.
> +
> +.Lnoalloc:
> +	.cfi_restore_state
> +	# We may need to copy stack parameters.
> +	lg	%r9, 0x8(%r10)		# Load stack parameter size.
> +	ltgr	%r9, %r9		# Check if it's 0.
> +	je	.Lnostackparm		# Skip the copy if not needed.
> +	sgr	%r15, %r9		# Make space on the stack.
> +	la	%r8, 0xa0(%r15)		# Destination.
> +	la	%r12, 0xa0(%r11)	# Source.
> +	lgr	%r13, %r9		# Source size.
> +.Lcopy:
> +	mvcle	%r8, %r12, 0		# Copy.
> +	jo	.Lcopy
> +
> +.Lnostackparm:
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	ag	%r10, 0x10(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0xa0(%r11)
> +
> +	# OK, no stack allocation needed.  We still follow the protocol and
> +	# call our caller - it doesn't cost much and makes sure vararg works.
> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# This is the cleanup code called by the stack unwinder when unwinding
> +# through the code between .LEHB0 and .LEHE0 above.
> +
> +.L1:
> +	.cfi_restore_state
> +	lgr	%r2, %r11		# Stack pointer after resume.
> +	brasl	%r14, __generic_findstack
> +	lgr	%r3, %r11		# Get the stack pointer.
> +	sgr	%r3, %r2		# Subtract available space.
> +	aghi	%r3, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
> +
> +	lgr	%r2, %r6		# Exception header.
> +#ifdef __PIC__
> +	brasl	%r14, _Unwind_Resume@PLT
> +#else
> +	brasl	%r14, _Unwind_Resume
> +#endif
> +
> +#endif /* defined(__s390x__) */
> +
> +	.cfi_endproc
> +	.size	__morestack, . - __morestack
> +
> +
> +# The exception table.  This tells the personality routine to execute
> +# the exception handler.
> +
> +	.section	.gcc_except_table,"a",@progbits
> +	.align	4
> +.LLSDA1:
> +	.byte	0xff	# @LPStart format (omit)
> +	.byte	0xff	# @TType format (omit)
> +	.byte	0x1	# call-site format (uleb128)
> +	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
> +.LLSDACSB1:
> +	.uleb128 .LEHB0-.LFB1	# region 0 start
> +	.uleb128 .LEHE0-.LEHB0	# length
> +	.uleb128 .L1-.LFB1	# landing pad
> +	.uleb128 0		# action
> +.LLSDACSE1:
> +
> +
> +	.global __gcc_personality_v0
> +#ifdef __PIC__
> +	# Build a position independent reference to the basic
> +        # personality function.
> +	.hidden DW.ref.__gcc_personality_v0
> +	.weak   DW.ref.__gcc_personality_v0
> +	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
> +	.type	DW.ref.__gcc_personality_v0, @object
> +DW.ref.__gcc_personality_v0:
> +#ifndef __LP64__
> +	.align 4
> +	.size	DW.ref.__gcc_personality_v0, 4
> +	.long	__gcc_personality_v0
> +#else
> +	.align 8
> +	.size	DW.ref.__gcc_personality_v0, 8
> +	.quad	__gcc_personality_v0
> +#endif
> +#endif
> +
> +
> +
> +# Initialize the stack test value when the program starts or when a
> +# new thread starts.  We don't know how large the main stack is, so we
> +# guess conservatively.  We might be able to use getrlimit here.
> +
> +	.text
> +	.global	__stack_split_initialize
> +	.hidden	__stack_split_initialize
> +
> +	.type	__stack_split_initialize, @function
> +
> +__stack_split_initialize:
> +
> +#ifndef __s390x__
> +
> +	ear	%r1, %a0
> +	lr	%r0, %r15
> +	ahi	%r0, -0x4000	# We should have at least 16K.
> +	st	%r0, 0x20(%r1)
> +
> +	lr	%r2, %r15
> +	lhi	%r3, 0x4000
> +#ifdef __PIC__
> +	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
> +#else
> +	jg	__generic_morestack_set_initial_sp	# Tail call
> +#endif
> +
> +#else /* defined(__s390x__) */
> +
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	lgr	%r0, %r15
> +	aghi	%r0, -0x4000	# We should have at least 16K.
> +	stg	%r0, 0x38(%r1)
> +
> +	lgr	%r2, %r15
> +	lghi	%r3, 0x4000
> +#ifdef __PIC__
> +	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
> +#else
> +	jg	__generic_morestack_set_initial_sp	# Tail call
> +#endif
> +
> +#endif /* defined(__s390x__) */
> +
> +	.size	__stack_split_initialize, . - __stack_split_initialize
> +
> +# Routines to get and set the guard, for __splitstack_getcontext,
> +# __splitstack_setcontext, and __splitstack_makecontext.
> +
> +# void *__morestack_get_guard (void) returns the current stack guard.
> +	.text
> +	.global	__morestack_get_guard
> +	.hidden	__morestack_get_guard
> +
> +	.type	__morestack_get_guard,@function
> +
> +__morestack_get_guard:
> +
> +#ifndef __s390x__
> +	ear	%r1, %a0
> +	l	%r2, 0x20(%r1)
> +#else
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	lg	%r2, 0x38(%r1)
> +#endif
> +	br %r14
> +
> +	.size	__morestack_get_guard, . - __morestack_get_guard
> +
> +# void __morestack_set_guard (void *) sets the stack guard.
> +	.global	__morestack_set_guard
> +	.hidden	__morestack_set_guard
> +
> +	.type	__morestack_set_guard,@function
> +
> +__morestack_set_guard:
> +
> +#ifndef __s390x__
> +	ear	%r1, %a0
> +	st	%r2, 0x20(%r1)
> +#else
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	stg	%r2, 0x38(%r1)
> +#endif
> +	br	%r14
> +
> +	.size	__morestack_set_guard, . - __morestack_set_guard
> +
> +# void *__morestack_make_guard (void *, size_t) returns the stack
> +# guard value for a stack.
> +	.global	__morestack_make_guard
> +	.hidden	__morestack_make_guard
> +
> +	.type	__morestack_make_guard,@function
> +
> +__morestack_make_guard:
> +
> +#ifndef __s390x__
> +	sr	%r2, %r3
> +	ahi	%r2, BACKOFF
> +#else
> +	sgr	%r2, %r3
> +	aghi	%r2, BACKOFF
> +#endif
> +	br	%r14
> +
> +	.size	__morestack_make_guard, . - __morestack_make_guard
> +
> +# Make __stack_split_initialize a high priority constructor.
> +
> +	.section .ctors.65535,"aw",@progbits
> +
> +#ifndef __LP64__
> +	.align	4
> +	.long	__stack_split_initialize
> +	.long	__morestack_load_mmap
> +#else
> +	.align	8
> +	.quad	__stack_split_initialize
> +	.quad	__morestack_load_mmap
> +#endif
> +
> +	.section	.note.GNU-stack,"",@progbits
> +	.section	.note.GNU-split-stack,"",@progbits
> +	.section	.note.GNU-no-split-stack,"",@progbits
> diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
> new file mode 100644
> index 0000000..4c959b0
> --- /dev/null
> +++ b/libgcc/config/s390/t-stack-s390
> @@ -0,0 +1,2 @@
> +# Makefile fragment to support -fsplit-stack for s390.
> +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
> index 89765d4..b8eec4e 100644
> --- a/libgcc/generic-morestack.c
> +++ b/libgcc/generic-morestack.c
> @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
>  #elif defined (__i386__)
>        nsp -= 6 * sizeof (void *);
>  #elif defined __powerpc64__
> +#elif defined __s390x__
> +      nsp -= 2 * 160;
> +#elif defined __s390__
> +      nsp -= 2 * 96;
>  #else
>  #error "unrecognized target"
>  #endif
> 


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-01-29 13:33       ` Andreas Krebbel
@ 2016-01-29 15:43         ` Marcin Kościelnicki
  2016-01-29 16:17           ` Andreas Krebbel
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-01-29 15:43 UTC (permalink / raw)
  To: Andreas Krebbel; +Cc: gcc-patches

On 29/01/16 14:33, Andreas Krebbel wrote:
> Hi Marcin,
>
> sorry for the late feedback.
>
> A few comments regarding the split stack implementation:
>
> The GNU coding style requires to replace every 8 leading blanks on a
> line with a tab.  There are many lines in your patch violating this.
> In case you are an emacs user `whitespace-cleanup' will fix this for
> you.

OK, will do.
>
> Could you please add a testcase checking the different
> variants. I.e. with early exit, no-alloc in __morestack, and with an
> actual allocation?

The testsuite with -fsplit-stack already hits all of them, and checking 
them manually is rather tricky (I don't know if it could be done in a 
target-independent way at all), but I think it'd be reasonable to make 
assembly testcases calling __morestack for the last two cases, to check 
that the registers are preserved, etc.

>
> There are a few more comments inline.
>
> Bye,
>
> -Andreas-
>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index c881d52..71f6f38 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,5 +1,38 @@
>>   2016-01-16  Marcin Kościelnicki  <koriakin@0x04.net>
>>
>> +	* common/config/s390/s390-common.c (s390_supports_split_stack):
>> +	New function.
>> +	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
>> +	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>> +	* config/s390/s390.c (struct machine_function): New field
>> +	split_stack_varargs_pointer.
>> +	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
>> +	in s390_emit_prologue.
>> +	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>> +	vararg pointer.
>> +	(morestack_ref): New global.
>> +	(SPLIT_STACK_AVAILABLE): New macro.
>> +	(s390_expand_split_stack_prologue): New function.
>> +	(s390_expand_split_stack_call): New function.
>> +	(s390_live_on_entry): New function.
>> +	(s390_va_start): Use split-stack vararg pointer if appropriate.
>> +	(s390_reorg): Lower the split-stack pseudo-insns.
>> +	(s390_asm_file_end): Emit the split-stack note sections.
>> +	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>> +	* config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
>> +	(UNSPECV_SPLIT_STACK_CALL): New unspec.
>> +	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
>> +	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
>> +	(split_stack_prologue): New expand.
>> +	(split_stack_call_*): New insn.
>> +	(split_stack_cond_call_*): New insn.
>> +	(split_stack_space_check): New expand.
>> +	(split_stack_sibcall_*): New insn.
>> +	(split_stack_cond_sibcall_*): New insn.
>> +	(split_stack_marker): New insn.
>> +
>> +2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
>> +
>>   	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
>>   	with side effects.
>>
>> diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
>> index 4519c21..1e497e6 100644
>> --- a/gcc/common/config/s390/s390-common.c
>> +++ b/gcc/common/config/s390/s390-common.c
>> @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>>       }
>>   }
>>
>> +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
>> +   We don't verify it, since earlier versions just have padding at
>> +   its place, which works just as well.  */
>> +
>> +static bool
>> +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
>> +			   struct gcc_options *opts ATTRIBUTE_UNUSED)
>> +{
>> +  return true;
>> +}
>> +
>>   #undef TARGET_DEFAULT_TARGET_FLAGS
>>   #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
>>
>> @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>>   #undef TARGET_OPTION_INIT_STRUCT
>>   #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
>>
>> +#undef TARGET_SUPPORTS_SPLIT_STACK
>> +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
>> +
>>   struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
>> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
>> index 633bc1e..09032c9 100644
>> --- a/gcc/config/s390/s390-protos.h
>> +++ b/gcc/config/s390/s390-protos.h
>> @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>>   extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
>>   extern void s390_emit_prologue (void);
>>   extern void s390_emit_epilogue (bool);
>> +extern void s390_expand_split_stack_prologue (void);
>>   extern bool s390_can_use_simple_return_insn (void);
>>   extern bool s390_can_use_return_insn (void);
>>   extern void s390_function_profiler (FILE *, int);
>> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
>> index 3be64de..6afce7c 100644
>> --- a/gcc/config/s390/s390.c
>> +++ b/gcc/config/s390/s390.c
>> @@ -426,6 +426,13 @@ struct GTY(()) machine_function
>>     /* True if the current function may contain a tbegin clobbering
>>        FPRs.  */
>>     bool tbegin_p;
>> +
>> +  /* For -fsplit-stack support: A stack local which holds a pointer to
>> +     the stack arguments for a function with a variable number of
>> +     arguments.  This is set at the start of the function and is used
>> +     to initialize the overflow_arg_area field of the va_list
>> +     structure.  */
>> +  rtx split_stack_varargs_pointer;
>>   };
>>
>>   /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
>> @@ -9316,9 +9323,13 @@ s390_register_info ()
>>   	  cfun_frame_layout.high_fprs++;
>>         }
>>
>> -  if (flag_pic)
>> -    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
>> -      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
>> +  /* Register 12 is used for GOT address, but also as temp in prologue
>> +     for split-stack stdarg functions (unless r14 is available).  */
>> +  clobbered_regs[12]
>> +    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
>> +	|| (flag_split_stack && cfun->stdarg
>> +	    && (crtl->is_leaf || TARGET_TPF_PROFILING
>> +		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
>>
>>     clobbered_regs[BASE_REGNUM]
>>       |= (cfun->machine->base_reg
>> @@ -10446,6 +10457,8 @@ s390_emit_prologue (void)
>>         && !crtl->is_leaf
>>         && !TARGET_TPF_PROFILING)
>>       temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
>> +  else if (flag_split_stack && cfun->stdarg)
>> +    temp_reg = gen_rtx_REG (Pmode, 12);
> TPF uses r1 hard coded in tracing prologue/epilogue.  So I think we
> need && !TARGET_TPF_PROFILING here as well.

Well, in that case, we'll need to emit a move instruction to some temp 
register, since __morestack will leave the pointer in %r1.  I'll look 
into that.
>
>>     else
>>       temp_reg = gen_rtx_REG (Pmode, 1);
>>
>> @@ -10939,6 +10952,284 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
>>       SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
>>   }
>>
>> +/* -fsplit-stack support.  */
>> +
>> +/* A SYMBOL_REF for __morestack.  */
>> +static GTY(()) rtx morestack_ref;
>> +
>> +/* When using -fsplit-stack, the allocation routines set a field in
>> +   the TCB to the bottom of the stack plus this much space, measured
>> +   in bytes.  */
>> +
>> +#define SPLIT_STACK_AVAILABLE 1024
>> +
>> +/* Emit -fsplit-stack prologue, which goes before the regular function
>> +   prologue.  */
>> +
>> +void
>> +s390_expand_split_stack_prologue (void)
>> +{
>> +  rtx r1, guard, cc;
>> +  rtx_insn *insn;
>> +  /* Offset from thread pointer to __private_ss.  */
>> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
>> +  /* Pointer size in bytes.  */
>> +  /* Frame size and argument size - the two parameters to __morestack.  */
>> +  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
>> +  /* Align argument size to 8 bytes - simplifies __morestack code.  */
>> +  HOST_WIDE_INT args_size = crtl->args.size >= 0
>> +			    ? ((crtl->args.size + 7) & ~7)
>> +			    : 0;
>> +  /* Label to jump to when no __morestack call is necessary.  */
>> +  rtx_code_label *enough = NULL;
>> +  /* Label to be called by __morestack.  */
>> +  rtx_code_label *call_done = NULL;
>> +  /* 1 if __morestack called conditionally, 0 if always.  */
>> +  int conditional = 0;
>> +
>> +  gcc_assert (flag_split_stack && reload_completed);
>> +  if (!TARGET_CPU_ZARCH)
>> +    {
>> +      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
>> +      return;
>> +    }
>> +
>> +  r1 = gen_rtx_REG (Pmode, 1);
>> +
>> +  /* If no stack frame will be allocated, don't do anything.  */
>> +  if (!frame_size)
>> +    {
>> +      /* But emit a marker that will let linker and indirect function
>> +	 calls recognise this function as split-stack aware.  */
>> +      emit_insn(gen_split_stack_marker());
> 2x missing blank before (
>
>> +      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
>> +        {
>> +          /* If va_start is used, just use r15.  */
>> +          emit_move_insn (r1,
>> +		          gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> +			                GEN_INT (STACK_POINTER_OFFSET)));
> virtual_incoming_args_rtx ?
>

Alright.

>> +        }
>> +      return;
>> +    }
>> +
>> +  if (morestack_ref == NULL_RTX)
>> +    {
>> +      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
>> +      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
>> +					   | SYMBOL_FLAG_FUNCTION);
>> +    }
>> +
>> +  if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu))
> The agfi immediate value is a signed 32 bit integer.  So you can only
> add up to 2G-1.  I think it would be more readable to write this as:

We're emitting ALGFI here, which accepts an unsigned 32-bit immediate.
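
As a rough illustration of the range in question, here is a standalone C
sketch, not GCC code.  The names fits_guard_add and has_extimm are made up
for this example; has_extimm stands in for TARGET_EXTIMM, and the bounds
are the ones from the condition in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the frame_size test in the patch.
   Without extended immediates only a signed 16-bit add immediate is
   assumed (maximum 0x7fff); with them, an unsigned 32-bit immediate,
   as accepted by ALGFI (maximum 0xffffffff).  */
static bool
fits_guard_add (uint64_t frame_size, bool has_extimm)
{
  if (frame_size <= 0x7fff)
    return true;
  return has_extimm && frame_size <= 0xffffffffu;
}
```

With a signed 32-bit immediate, as AGFI takes, the upper bound in the
second test would be 0x7fffffff instead.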
>
> if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Os (frame_size))
>
> as in s390_emit_prologue. The Os check will check for TARGET_EXTIMM as well.

Alright.
>
>> +    {
>> +      /* If frame_size will fit in an add instruction, do a stack space
>> +	 check, and only call __morestack if there's not enough space.  */
>> +      conditional = 1;
>> +
>> +      /* Get thread pointer.  r1 is the only register we can always destroy - r0
>> +         could contain a static chain (and cannot be used to address memory
>> +         anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
>> +      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
>> +      /* Aim at __private_ss.  */
>> +      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
>> +
>> +      /* If less than 1kiB used, skip addition and compare directly with
>> +         __private_ss.  */
>> +      if (frame_size > SPLIT_STACK_AVAILABLE)
>> +        {
>> +          emit_move_insn (r1, guard);
>> +	  if (TARGET_64BIT)
>> +	    emit_insn (gen_adddi3 (r1, r1, GEN_INT(frame_size)));
>> +	  else
>> +	    emit_insn (gen_addsi3 (r1, r1, GEN_INT(frame_size)));
>> +	  guard = r1;
>> +        }
>> +
>> +      if (TARGET_CPU_ZARCH)
>> +        {
> Looks like the !TARGET_CPU_ZARCH stuff hasn't been completely removed?!

Oops, will remove that.
>
>> +	  rtx tmp;
>> +
>> +          /* Compare the (maybe adjusted) guard with the stack pointer.  */
>> +          cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
>> +
>> +          call_done = gen_label_rtx ();
>> +
>> +	  if (TARGET_64BIT)
>> +	    tmp = gen_split_stack_cond_call_di (call_done,
>> +						morestack_ref,
>> +						GEN_INT (frame_size),
>> +						GEN_INT (args_size),
>> +						cc);
>> +	  else
>> +	    tmp = gen_split_stack_cond_call_si (call_done,
>> +						morestack_ref,
>> +						GEN_INT (frame_size),
>> +						GEN_INT (args_size),
>> +						cc);
> Perhaps it would be more readable to do the TARGET_64BIT check in a separate
> expander.  Please see "movstr" in s390.md. The same applies to all the
> other gen_split_stack* invocations.

Alright.
>
>> +
>> +
>> +          insn = emit_jump_insn (tmp);
>> +	  JUMP_LABEL (insn) = call_done;
>> +
>> +          /* Mark the jump as very unlikely to be taken.  */
>> +          add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
>> +	}
>> +      else
>> +        {
>> +          /* Compare the (maybe adjusted) guard with the stack pointer.  */
>> +          cc = s390_emit_compare (GE, stack_pointer_rtx, guard);
>> +
>> +          enough = gen_label_rtx ();
>> +          insn = s390_emit_jump (enough, cc);
>> +          JUMP_LABEL (insn) = enough;
>> +
>> +          /* Mark the jump as very likely to be taken.  */
>> +          add_int_reg_note (insn, REG_BR_PROB,
>> +			    REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);
>> +	}
>> +    }
>> +
>> +  if (call_done == NULL)
> With the !TARGET_CPU_ZARCH path removed above this could be the else
> path to the frame_size check and call_done can be removed.

Right.
>
>> +    {
>> +      rtx tmp;
>> +      call_done = gen_label_rtx ();
>> +
>> +      /* Now, we need to call __morestack.  It has very special calling
>> +         conventions: it preserves param/return/static chain registers for
>> +         calling main function body, and looks for its own parameters
>> +         at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode). */
>> +      if (TARGET_64BIT)
>> +        tmp = gen_split_stack_call_di (call_done,
>> +					     morestack_ref,
>> +					     GEN_INT (frame_size),
>> +					     GEN_INT (args_size));
> Indentation.
>
>> +      else
>> +        tmp = gen_split_stack_call_si (call_done,
>> +					     morestack_ref,
>> +					     GEN_INT (frame_size),
>> +					     GEN_INT (args_size));
> Indentation.
>
>> +      insn = emit_jump_insn (tmp);
>> +      JUMP_LABEL (insn) = call_done;
>> +      emit_barrier ();
>> +    }
>> +
>> +  /* __morestack will call us here.  */
>> +
>> +  if (enough != NULL)
>> +    {
>> +      emit_label (enough);
>> +      LABEL_NUSES (enough) = 1;
>> +    }
> This also was only for !TARGET_CPU_ZARCH.

Yes, it'll be removed.
>
>> +
>> +  if (conditional && cfun->machine->split_stack_varargs_pointer != NULL_RTX)
>> +    {
>> +      /* If va_start is used, and __morestack was not called, just use r15.  */
>> +      emit_move_insn (r1,
>> +		      gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> +			            GEN_INT (STACK_POINTER_OFFSET)));
> virtual_incoming_args_rtx?
>
>> +    }
>> +
>> +  emit_label (call_done);
>> +  LABEL_NUSES (call_done) = 1;
>> +}
>> +
>> +/* Generates split-stack call sequence, along with its parameter block.  */
>> +
>> +static void
>> +s390_expand_split_stack_call (rtx_insn *orig_insn,
>> +			      rtx call_done,
>> +			      rtx function,
>> +			      rtx frame_size,
>> +			      rtx args_size,
>> +			      rtx cond)
>> +{
>> +  int psize = GET_MODE_SIZE (Pmode);
>> +  rtx_insn *insn = orig_insn;
>> +  rtx parmbase = gen_label_rtx();
>> +  rtx r1 = gen_rtx_REG (Pmode, 1);
>> +  rtx tmp, tmp2;
>> +
>> +  /* %r1 = litbase.  */
>> +  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
>> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
>> +  LABEL_NUSES (parmbase)++;
>> +
>> +  /* jg<cond> __morestack.  */
>> +  if (cond == NULL)
>> +    {
>> +      if (TARGET_64BIT)
>> +        tmp = gen_split_stack_sibcall_di (function, call_done);
>> +      else
>> +        tmp = gen_split_stack_sibcall_si (function, call_done);
>> +      insn = emit_jump_insn_after (tmp, insn);
>> +    }
>> +  else
>> +    {
>> +      if (!s390_comparison (cond, VOIDmode))
>> +	internal_error ("bad split_stack_call cond");
> Perhaps just gcc_assert (s390_comparison (cond, VOIDmode)); ?

OK.
>
>> +      if (TARGET_64BIT)
>> +        tmp = gen_split_stack_cond_sibcall_di (function, cond, call_done);
>> +      else
>> +        tmp = gen_split_stack_cond_sibcall_si (function, cond, call_done);
>> +      insn = emit_jump_insn_after (tmp, insn);
>> +    }
>> +  JUMP_LABEL (insn) = call_done;
>> +  LABEL_NUSES (call_done)++;
>> +
>> +  /* Go to .rodata.  */
>> +  insn = emit_insn_after (gen_pool_section_start (), insn);
>> +
>> +  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
>> +     (this mirrors the alignment done in __morestack - don't touch it).  */
>> +  insn = emit_insn_after (gen_pool_align (GEN_INT (psize)), insn);
> psize -> UNITS_PER_LONG?
>

OK.
>> +
>> +  insn = emit_label_after (parmbase, insn);
>> +
>> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
>> +				 gen_rtvec (1, frame_size),
>> +				 UNSPECV_POOL_ENTRY);
>> +  insn = emit_insn_after (tmp, insn);
>> +
>> +  /* Second parameter is size of the arguments passed on stack that
>> +     __morestack has to copy to the new stack (does not include varargs).  */
>> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
>> +				 gen_rtvec (1, args_size),
>> +				 UNSPECV_POOL_ENTRY);
>> +  insn = emit_insn_after (tmp, insn);
>> +
>> +  /* Third parameter is offset between start of the parameter block
>> +     and function body to be called by __morestack.  */
>> +  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
>> +  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
>> +  tmp = gen_rtx_CONST (Pmode,
>> +                       gen_rtx_MINUS (Pmode, tmp2, tmp));
>> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
>> +				 gen_rtvec (1, tmp),
>> +				 UNSPECV_POOL_ENTRY);
>> +  insn = emit_insn_after (tmp, insn);
>> +  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
>> +  LABEL_NUSES (call_done)++;
>> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
>> +  LABEL_NUSES (parmbase)++;
>> +
>> +  /* Return from .rodata.  */
>> +  insn = emit_insn_after (gen_pool_section_end (), insn);
>> +
>> +  delete_insn (orig_insn);
>> +}
>> +
>> +/* We may have to tell the dataflow pass that the split stack prologue
>> +   is initializing a register.  */
>> +
>> +static void
>> +s390_live_on_entry (bitmap regs)
>> +{
>> +  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
>> +    {
>> +      gcc_assert (flag_split_stack);
>> +      bitmap_set_bit (regs, 1);
>> +    }
>> +}
>> +
>>   /* Return true if the function can use simple_return to return outside
>>      of a shrink-wrapped region.  At present shrink-wrapping is supported
>>      in all cases.  */
>> @@ -11541,6 +11832,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>>         expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
>>       }
>>
>> +  if (flag_split_stack
>> +     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
>> +         == NULL)
>> +     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
>> +    {
>> +      rtx reg;
>> +      rtx_insn *seq;
>> +
>> +      reg = gen_reg_rtx (Pmode);
>> +      cfun->machine->split_stack_varargs_pointer = reg;
>> +
>> +      start_sequence ();
>> +      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
>> +      seq = get_insns ();
>> +      end_sequence ();
>> +
>> +      push_topmost_sequence ();
>> +      emit_insn_after (seq, entry_of_function ());
>> +      pop_topmost_sequence ();
>> +    }
>> +
>>     /* Find the overflow area.
>>        FIXME: This currently is too pessimistic when the vector ABI is
>>        enabled.  In that case we *always* set up the overflow area
>> @@ -11549,7 +11861,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>>         || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
>>         || TARGET_VX_ABI)
>>       {
>> -      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
>> +      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
>> +        t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer);
> What is the reason for changing virtual_incoming_args_rtx to
> crtl->args.internal_arg_pointer in the non-split-stack case?

Looks like an accident, will change it back.
>
>> +      else
>> +        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
>>
>>         off = INTVAL (crtl->args.arg_offset_rtx);
>>         off = off < 0 ? 0 : off;
>> @@ -13158,6 +13473,48 @@ s390_reorg (void)
>>   	}
>>       }
>>
>> +  if (flag_split_stack)
>> +    {
>> +      rtx_insn *insn;
>> +
>> +      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
>> +	{
>> +	  /* Look for the split-stack fake jump instructions.  */
>> +	  if (!JUMP_P(insn))
>> +	    continue;
>> +	  if (GET_CODE (PATTERN (insn)) != PARALLEL
>> +	      || XVECLEN (PATTERN (insn), 0) != 2)
>> +	    continue;
>> +	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
>> +	  if (GET_CODE (set) != SET)
>> +	    continue;
>> +	  rtx unspec = XEXP(set, 1);
>> +	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
>> +	    continue;
>> +	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL)
>> +	    continue;
>> +	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
>> +	  rtx function = XVECEXP (unspec, 0, 0);
>> +	  rtx frame_size = XVECEXP (unspec, 0, 1);
>> +	  rtx args_size = XVECEXP (unspec, 0, 2);
>> +	  rtx pc_src = XEXP (set_pc, 1);
>> +	  rtx call_done, cond = NULL_RTX;
>> +	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
>> +	    {
>> +	      cond = XEXP (pc_src, 0);
>> +	      call_done = XEXP (XEXP (pc_src, 1), 0);
>> +	    }
>> +	  else
>> +	    call_done = XEXP (pc_src, 0);
>> +	  s390_expand_split_stack_call (insn,
>> +					call_done,
>> +					function,
>> +					frame_size,
>> +					args_size,
>> +					cond);
>> +	}
>> +    }
>> +
> I'm wondering if it is really necessary to expand the call in that
> two-step approach?! We do the general literal pool handling in
> s390_reorg because we need all the insn lengths to be finalized before
> performing the branch/pool splitting loop.  But this shouldn't be necessary
> in this case.  Would it be possible to expand the call already in
> emit_prologue phase and get rid of the s390_reorg part?

There's an internal literal pool involved, which needs to be emitted as 
one chunk.  Optimizations are also very likely to destroy the sequence: 
consider the target address that __morestack will call - the control 
flow change happens in the jump to __morestack, but the address itself 
is encoded in one of the pool literals.  Just not worth the risk.
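
To make the dependency concrete, the parameter block can be pictured
roughly as below.  This is only a sketch of the 64-bit case in C; the
struct, its field names and resume_address are invented for illustration,
while the order of the three entries follows the comments in
s390_expand_split_stack_call.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout of the parameter block that %r1 points at
   (64-bit case; every entry is pointer-sized).  Names are made up.  */
struct morestack_params
{
  uint64_t frame_size;    /* First entry: frame size.  */
  uint64_t args_size;     /* Second entry: stack argument bytes to copy.  */
  int64_t resume_offset;  /* Third entry: call_done minus parmbase.  */
};

/* Address at which the function body resumes: the start of the block
   plus the offset stored in its third entry.  */
static uintptr_t
resume_address (const struct morestack_params *p)
{
  return (uintptr_t) p + (uintptr_t) p->resume_offset;
}
```

The jump insn itself never mentions the resume address; it exists only
as the difference stored in the third entry, so the jump and the block
have to stay together.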

>
>>     /* Try to optimize prologue and epilogue further.  */
>>     s390_optimize_prologue ();
>>
>> @@ -14469,6 +14826,9 @@ s390_asm_file_end (void)
>>   	     s390_vector_abi);
>>   #endif
>>     file_end_indicate_exec_stack ();
>> +
>> +  if (flag_split_stack)
>> +    file_end_indicate_split_stack ();
>>   }
>>
>>   /* Return true if TYPE is a vector bool type.  */
>> @@ -14724,6 +15084,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
>>   #undef TARGET_SET_UP_BY_PROLOGUE
>>   #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
>>
>> +#undef TARGET_EXTRA_LIVE_ON_ENTRY
>> +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
>> +
>>   #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>>   #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>>     s390_use_by_pieces_infrastructure_p
>> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
>> index 9b869d5..21cd989 100644
>> --- a/gcc/config/s390/s390.md
>> +++ b/gcc/config/s390/s390.md
>> @@ -114,6 +114,9 @@
>>      UNSPEC_SP_SET
>>      UNSPEC_SP_TEST
>>
>> +   ; Split stack support
>> +   UNSPEC_STACK_CHECK
>> +
>>      ; Test Data Class (TDC)
>>      UNSPEC_TDC_INSN
>>
>> @@ -276,6 +279,11 @@
>>      ; Set and get floating point control register
>>      UNSPECV_SFPC
>>      UNSPECV_EFPC
>> +
>> +   ; Split stack support
>> +   UNSPECV_SPLIT_STACK_CALL
>> +   UNSPECV_SPLIT_STACK_SIBCALL
>> +   UNSPECV_SPLIT_STACK_MARKER
>>     ])
>>
>>   ;;
>> @@ -10907,3 +10915,104 @@
>>     "TARGET_Z13"
>>     "lcbb\t%0,%1,%b2"
>>     [(set_attr "op_type" "VRX")])
>> +
>> +; Handle -fsplit-stack.
>> +
>> +(define_expand "split_stack_prologue"
>> +  [(const_int 0)]
>> +  ""
>> +{
>> +  s390_expand_split_stack_prologue ();
>> +  DONE;
>> +})
>> +
>> +(define_insn "split_stack_call_<mode>"
>> +  [(set (pc) (label_ref (match_operand 0 "" "")))
>> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
>> +                                    (match_operand 2 "consttable_operand" "X")
>> +                                    (match_operand 3 "consttable_operand" "X")]
>> +                                   UNSPECV_SPLIT_STACK_CALL))]
>> +  "TARGET_CPU_ZARCH"
>> +{
>> +  gcc_unreachable ();
>> +}
>> +  [(set_attr "length" "12")])
>> +
>> +(define_insn "split_stack_cond_call_<mode>"
>> +  [(set (pc)
>> +        (if_then_else
>> +          (match_operand 4 "" "")
>> +          (label_ref (match_operand 0 "" ""))
>> +          (pc)))
>> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
>> +                                    (match_operand 2 "consttable_operand" "X")
>> +                                    (match_operand 3 "consttable_operand" "X")]
>> +                                   UNSPECV_SPLIT_STACK_CALL))]
>> +  "TARGET_CPU_ZARCH"
>> +{
>> +  gcc_unreachable ();
>> +}
>> +  [(set_attr "length" "12")])
>> +
>> +;; If there are operand 0 bytes available on the stack, jump to
>> +;; operand 1.
>> +
>> +(define_expand "split_stack_space_check"
>> +  [(set (pc) (if_then_else
>> +	      (ltu (minus (reg 15)
>> +			  (match_operand 0 "register_operand"))
>> +		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
>> +	      (label_ref (match_operand 1))
>> +	      (pc)))]
>> +  ""
>> +{
>> +  /* Offset from thread pointer to __private_ss.  */
>> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
>> +  rtx tp = s390_get_thread_pointer ();
>> +  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
>> +  rtx reg = gen_reg_rtx (Pmode);
>> +  rtx cc;
>> +  if (TARGET_64BIT)
>> +    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
>> +  else
>> +    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
>> +  cc = s390_emit_compare (GT, reg, guard);
>> +  s390_emit_jump (operands[1], cc);
>> +
>> +  DONE;
>> +})
> This expander does not seem to get called from anywhere.

It's called from target-independent code for alloca and VLAs.
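
Roughly, the test it emits looks like the following C sketch.  The
function name and the plain integer parameters are made up for
illustration; the comparison is signed here to mirror the GT compare in
the expander body.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the test made by the expander: subtract the requested
   size from the stack pointer and compare the result against the
   guard.  True means the jump to operand 1 is taken, i.e. the
   allocation fits on the current stack segment.  */
static bool
stack_space_available (intptr_t sp, intptr_t size, intptr_t guard)
{
  return sp - size > guard;
}
```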
>
>> +
>> +;; A jg with minimal fuss for use in split stack prologue.
>> +
>> +(define_insn "split_stack_sibcall_<mode>"
>> +  [(set (pc) (label_ref (match_operand 1 "" "")))
>> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
>> +                                   UNSPECV_SPLIT_STACK_SIBCALL))]
>> +  "TARGET_CPU_ZARCH"
>> +  "jg\t%0"
>> +  [(set_attr "op_type" "RIL")
>> +   (set_attr "type"  "branch")])
>> +
>> +;; Also a conditional one.
>> +
>> +(define_insn "split_stack_cond_sibcall_<mode>"
>> +  [(set (pc)
>> +        (if_then_else
>> +          (match_operand 1 "" "")
>> +          (label_ref (match_operand 2 "" ""))
>> +          (pc)))
>> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
>> +                                   UNSPECV_SPLIT_STACK_SIBCALL))]
>> +  "TARGET_CPU_ZARCH"
>> +  "jg%C1\t%0"
>> +  [(set_attr "op_type" "RIL")
>> +   (set_attr "type"  "branch")])
>> +
>> +;; An unusual nop instruction used to mark functions with no stack frames
>> +;; as split-stack aware.
>> +
>> +(define_insn "split_stack_marker"
>> +  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
>> +  ""
>> +  "nopr\t%%r15"
>> +  [(set_attr "op_type" "RR")])
>> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
>> index 4cd8f01..604b120 100644
>> --- a/libgcc/ChangeLog
>> +++ b/libgcc/ChangeLog
>> @@ -1,3 +1,10 @@
>> +2016-01-16  Marcin Kościelnicki  <koriakin@0x04.net>
>> +
>> +	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>> +	* config/s390/morestack.S: New file.
>> +	* config/s390/t-stack-s390: New file.
>> +	* generic-morestack.c (__splitstack_find): Add s390-specific code.
>> +
>>   2016-01-15  Nick Clifton  <nickc@redhat.com>
>>
>>   	* config/msp430/t-msp430 (lib2_mul_none.o): Only use the first
>> diff --git a/libgcc/config.host b/libgcc/config.host
>> index f58ee45..9793155 100644
>> --- a/libgcc/config.host
>> +++ b/libgcc/config.host
>> @@ -1105,11 +1105,11 @@ rx-*-elf)
>>   	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
>>   	;;
>>   s390-*-linux*)
>> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
>> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
>>   	md_unwind_header=s390/linux-unwind.h
>>   	;;
>>   s390x-*-linux*)
>> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
>> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
>>   	if test "${host_address}" = 32; then
>>   	   tmake_file="${tmake_file} s390/32/t-floattodi"
>>   	fi
>> diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
>> new file mode 100644
>> index 0000000..c99f6e4
>> --- /dev/null
>> +++ b/libgcc/config/s390/morestack.S
>> @@ -0,0 +1,609 @@
>> +# s390 support for -fsplit-stack.
>> +# Copyright (C) 2015 Free Software Foundation, Inc.
>> +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
>> +
>> +# This file is part of GCC.
>> +
>> +# GCC is free software; you can redistribute it and/or modify it under
>> +# the terms of the GNU General Public License as published by the Free
>> +# Software Foundation; either version 3, or (at your option) any later
>> +# version.
>> +
>> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +# for more details.
>> +
>> +# Under Section 7 of GPL version 3, you are granted additional
>> +# permissions described in the GCC Runtime Library Exception, version
>> +# 3.1, as published by the Free Software Foundation.
>> +
>> +# You should have received a copy of the GNU General Public License and
>> +# a copy of the GCC Runtime Library Exception along with this program;
>> +# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>> +# <http://www.gnu.org/licenses/>.
>> +
>> +# Excess space needed to call ld.so resolver for lazy plt
>> +# resolution.  Go uses sigaltstack so this doesn't need to
>> +# also cover signal frame size.
>> +#define BACKOFF 0x1000
>> +
>> +# The __morestack function.
>> +
>> +	.global	__morestack
>> +	.hidden	__morestack
>> +
>> +	.type	__morestack,@function
>> +
>> +__morestack:
>> +.LFB1:
>> +	.cfi_startproc
>> +
>> +
>> +#ifndef __s390x__
>> +
>> +
>> +# The 31-bit __morestack function.
>> +
>> +	# We use a cleanup to restore the stack guard if an exception
>> +	# is thrown through this code.
>> +#ifndef __PIC__
>> +	.cfi_personality 0,__gcc_personality_v0
>> +	.cfi_lsda 0,.LLSDA1
>> +#else
>> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
>> +	.cfi_lsda 0x1b,.LLSDA1
>> +#endif
>> +
>> +	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
>> +	.cfi_offset %r6, -0x48
>> +	.cfi_offset %r7, -0x44
>> +	.cfi_offset %r8, -0x40
>> +	.cfi_offset %r9, -0x3c
>> +	.cfi_offset %r10, -0x38
>> +	.cfi_offset %r11, -0x34
>> +	.cfi_offset %r12, -0x30
>> +	.cfi_offset %r13, -0x2c
>> +	.cfi_offset %r14, -0x28
>> +	.cfi_offset %r15, -0x24
>> +	lr	%r11, %r15		# Make frame pointer for vararg.
>> +	.cfi_def_cfa_register %r11
>> +	ahi	%r15, -0x60		# 0x60 for standard frame.
>> +	st	%r11, 0(%r15)		# Save back chain.
>> +	lr	%r8, %r0		# Save %r0 (static chain).
>> +	lr	%r10, %r1		# Save %r1 (address of parameter block).
>> +
>> +	l	%r7, 0(%r10)		# Required frame size to %r7
>> +	ear	%r1, %a0		# Extract thread pointer.
>> +	l	%r1, 0x20(%r1)		# Get stack boundary
>> +	ar	%r1, %r7		# Stack boundary + frame size
>> +	a	%r1, 4(%r10)		# + stack param size
>> +	clr	%r1, %r15		# Compare with current stack pointer
>> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
>> +
>> +	brasl	%r14, __morestack_block_signals
>> +
>> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
>> +	# as a local variable.  Not needed here, but done to be consistent with
>> +	# the below use.
>> +	ahi	%r7, BACKOFF		# Bump requested size a bit.
>> +	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
>> +	la	%r2, 0x40(%r11)		# Pass its address as parameter.
>> +	la	%r3, 0x60(%r11)		# Caller's stack parameters.
>> +	l	%r4, 4(%r10)		# Size of stack paremeters.
> parameters
>
>> +	brasl	%r14, __generic_morestack
>> +
>> +	lr	%r15, %r2		# Switch to the new stack.
>> +	ahi	%r15, -0x60		# Make a stack frame on it.
>> +	st	%r11, 0(%r15)		# Save back chain.
>> +
>> +	s	%r2, 0x40(%r11)		# The end of stack space.
>> +	ahi	%r2, BACKOFF		# Back off a bit.
>> +	ear	%r1, %a0		# Extract thread pointer.
>> +.LEHB0:
>> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
>> +
>> +	brasl	%r14, __morestack_unblock_signals
>> +
>> +	lr	%r0, %r8		# Static chain.
>> +	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
>> +
>> +	# Third parameter is address of function meat - address of parameter
>> +	# block.
>> +	a	%r10, 0x8(%r10)
>> +
>> +	# Leave vararg pointer in %r1, in case function uses it
>> +	la	%r1, 0x60(%r11)
>> +
>> +	# State of registers:
>> +	# %r0: Static chain from entry.
>> +	# %r1: Vararg pointer.
>> +	# %r2-%r6: Parameters from entry.
>> +	# %r7-%r10: Indeterminate.
>> +	# %r11: Frame pointer (%r15 from entry).
>> +	# %r12-%r13: Indeterminate.
>> +	# %r14: Return address.
>> +	# %r15: Stack pointer.
>> +	basr	%r14, %r10		# Call our caller.
>> +
>> +	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
>> +
>> +	brasl	%r14, __morestack_block_signals
>> +
>> +	# We need a stack slot now, but have no good way to get it - the frame
>> +	# on new stack had to be exactly 0x60 bytes, or stack parameters would
>> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
>> +	# save actual fprs).
>> +	la	%r2, 0x40(%r11)
>> +	brasl	%r14, __generic_releasestack
>> +
>> +	s	%r2, 0x40(%r11)		# Subtract available space.
>> +	ahi	%r2, BACKOFF		# Back off a bit.
>> +	ear	%r1, %a0		# Extract thread pointer.
>> +.LEHE0:
>> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
>> +
>> +	# We need to restore the old stack pointer before unblocking signals.
>> +	# We also need 0x60 bytes for a stack frame.  Since we had a stack
>> +	# frame at this place before the stack switch, there's no need to
>> +	# write the back chain again.
>> +	lr	%r15, %r11
>> +	ahi	%r15, -0x60
>> +
>> +	brasl	%r14, __morestack_unblock_signals
>> +
>> +	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
>> +	.cfi_remember_state
>> +	.cfi_restore %r15
>> +	.cfi_restore %r14
>> +	.cfi_restore %r13
>> +	.cfi_restore %r12
>> +	.cfi_restore %r11
>> +	.cfi_restore %r10
>> +	.cfi_restore %r9
>> +	.cfi_restore %r8
>> +	.cfi_restore %r7
>> +	.cfi_restore %r6
>> +	.cfi_def_cfa_register %r15
>> +	br	%r14			# Return to caller's caller.
>> +
>> +# Executed if no new stack allocation is needed.
>> +
>> +.Lnoalloc:
>> +	.cfi_restore_state
>> +	# We may need to copy stack parameters.
>> +	l	%r9, 0x4(%r10)		# Load stack parameter size.
>> +	ltr	%r9, %r9		# And check if it's 0.
>> +	je	.Lnostackparm		# Skip the copy if not needed.
>> +	sr	%r15, %r9		# Make space on the stack.
>> +	la	%r8, 0x60(%r15)		# Destination.
>> +	la	%r12, 0x60(%r11)	# Source.
>> +	lr	%r13, %r9		# Source size.
>> +.Lcopy:
>> +	mvcle	%r8, %r12, 0		# Copy.
>> +	jo	.Lcopy
>> +
>> +.Lnostackparm:
>> +	# Third parameter is address of function meat - address of parameter
>> +	# block.
>> +	a	%r10, 0x8(%r10)
>> +
>> +	# Leave vararg pointer in %r1, in case function uses it
>> +	la	%r1, 0x60(%r11)
>> +
>> +	# OK, no stack allocation needed.  We still follow the protocol and
>> +	# call our caller - it doesn't cost much and makes sure vararg works.
>> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
>> +	basr	%r14, %r10		# Call our caller.
> The comment confuses me.  It somewhat sounds to me like the call
> wouldn't be really needed but in fact it cannot even remotely work
> without jumping back to the function body right?!

Certainly.  __morestack's task is to call the given function entry point
once the necessary stack space is established.  In fact, in the
no-allocation case a sibling call would be possible, if it weren't for
one annoying detail: there are no free GPRs we could use to hold the
address of the entry point.  %r0 may hold the static chain, %r1 may have
to be the argument pointer, %r2-%r5 may hold parameters, and %r6-%r15
are callee-saved.
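
To make "necessary stack space" concrete, here is a rough C model of the
check at the top of __morestack (the clr/jle pair in the assembly quoted
above).  The function and parameter names are made up for illustration,
and overflow of the sum is ignored; it is a sketch of the comparison, not
the libgcc code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the test at the top of __morestack.
   guard       - stack boundary read from the TCB
   sp          - stack pointer on entry (%r15)
   frame_size  - first word of the parameter block
   param_size  - second word of the parameter block  */
static bool
needs_new_segment (uintptr_t guard, uintptr_t sp,
                   uintptr_t frame_size, uintptr_t param_size)
{
  /* The assembly computes guard + frame_size + param_size and takes
     the .Lnoalloc path when that sum is <= sp (unsigned compare).  */
  return guard + frame_size + param_size > sp;
}
```

When this returns false, the .Lnoalloc path runs and the function body is
still entered through a call rather than a sibling call, for the register
reasons given above.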
>
>> +
>> +	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
>> +	.cfi_remember_state
>> +	.cfi_restore %r15
>> +	.cfi_restore %r14
>> +	.cfi_restore %r13
>> +	.cfi_restore %r12
>> +	.cfi_restore %r11
>> +	.cfi_restore %r10
>> +	.cfi_restore %r9
>> +	.cfi_restore %r8
>> +	.cfi_restore %r7
>> +	.cfi_restore %r6
>> +	.cfi_def_cfa_register %r15
>> +	br	%r14			# Return to caller's caller.
>> +
>> +# This is the cleanup code called by the stack unwinder when unwinding
>> +# through the code between .LEHB0 and .LEHE0 above.
>> +
>> +.L1:
>> +	.cfi_restore_state
>> +	lr	%r2, %r11		# Stack pointer after resume.
>> +	brasl	%r14, __generic_findstack
>> +	lr	%r3, %r11		# Get the stack pointer.
>> +	sr	%r3, %r2		# Subtract available space.
>> +	ahi	%r3, BACKOFF		# Back off a bit.
>> +	ear	%r1, %a0		# Extract thread pointer.
>> +	st	%r3, 0x20(%r1)	# Save the new stack boundary.
>> +
>> +	lr	%r2, %r6		# Exception header.
>> +#ifdef __PIC__
>> +	brasl	%r14, _Unwind_Resume@PLT
>> +#else
>> +	brasl	%r14, _Unwind_Resume
>> +#endif
>> +
>> +#else /* defined(__s390x__) */
>> +
>> +
>> +# The 64-bit __morestack function.
>> +
>> +	# We use a cleanup to restore the stack guard if an exception
>> +	# is thrown through this code.
>> +#ifndef __PIC__
>> +	.cfi_personality 0x3,__gcc_personality_v0
>> +	.cfi_lsda 0x3,.LLSDA1
>> +#else
>> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
>> +	.cfi_lsda 0x1b,.LLSDA1
>> +#endif
>> +
>> +	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
>> +	.cfi_offset %r6, -0x70
>> +	.cfi_offset %r7, -0x68
>> +	.cfi_offset %r8, -0x60
>> +	.cfi_offset %r9, -0x58
>> +	.cfi_offset %r10, -0x50
>> +	.cfi_offset %r11, -0x48
>> +	.cfi_offset %r12, -0x40
>> +	.cfi_offset %r13, -0x38
>> +	.cfi_offset %r14, -0x30
>> +	.cfi_offset %r15, -0x28
>> +	lgr	%r11, %r15		# Make frame pointer for vararg.
>> +	.cfi_def_cfa_register %r11
>> +	aghi	%r15, -0xa0		# 0xa0 for standard frame.
>> +	stg	%r11, 0(%r15)		# Save back chain.
>> +	lgr	%r8, %r0		# Save %r0 (static chain).
>> +	lgr	%r10, %r1		# Save %r1 (address of parameter block).
>> +
>> +	lg	%r7, 0(%r10)		# Required frame size to %r7
>> +	ear	%r1, %a0
>> +	sllg	%r1, %r1, 32
>> +	ear	%r1, %a1		# Extract thread pointer.
>> +	lg	%r1, 0x38(%r1)		# Get stack boundary
>> +	agr	%r1, %r7		# Stack boundary + frame size
>> +	ag	%r1, 8(%r10)		# + stack param size
>> +	clgr	%r1, %r15		# Compare with current stack pointer
>> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
>> +
>> +	brasl	%r14, __morestack_block_signals
>> +
>> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
>> +	# as a local variable.  Not needed here, but done to be consistent with
>> +	# the below use.
>> +	aghi	%r7, BACKOFF		# Bump requested size a bit.
>> +	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
>> +	la	%r2, 0x80(%r11)		# Pass its address as parameter.
>> +	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
>> +	lg	%r4, 8(%r10)		# Size of stack parameters.
>> +	brasl	%r14, __generic_morestack
>> +
>> +	lgr	%r15, %r2		# Switch to the new stack.
>> +	aghi	%r15, -0xa0		# Make a stack frame on it.
>> +	stg	%r11, 0(%r15)		# Save back chain.
>> +
>> +	sg	%r2, 0x80(%r11)		# The end of stack space.
>> +	aghi	%r2, BACKOFF		# Back off a bit.
>> +	ear	%r1, %a0
>> +	sllg	%r1, %r1, 32
>> +	ear	%r1, %a1		# Extract thread pointer.
>> +.LEHB0:
>> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
>> +
>> +	brasl	%r14, __morestack_unblock_signals
>> +
>> +	lgr	%r0, %r8		# Static chain.
>> +	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
>> +
>> +	# Third parameter is address of function meat - address of parameter
>> +	# block.
>> +	ag	%r10, 0x10(%r10)
>> +
>> +	# Leave vararg pointer in %r1, in case function uses it
>> +	la	%r1, 0xa0(%r11)
>> +
>> +	# State of registers:
>> +	# %r0: Static chain from entry.
>> +	# %r1: Vararg pointer.
>> +	# %r2-%r6: Parameters from entry.
>> +	# %r7-%r10: Indeterminate.
>> +	# %r11: Frame pointer (%r15 from entry).
>> +	# %r12-%r13: Indeterminate.
>> +	# %r14: Return address.
>> +	# %r15: Stack pointer.
>> +	basr	%r14, %r10		# Call our caller.
>> +
>> +	stg	%r2, 0x10(%r11)		# Save return register.
>> +
>> +	brasl	%r14, __morestack_block_signals
>> +
>> +	# We need a stack slot now, but have no good way to get it - the frame
>> +	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
>> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
>> +	# save actual fprs).
>> +	la	%r2, 0x80(%r11)
>> +	brasl	%r14, __generic_releasestack
>> +
>> +	sg	%r2, 0x80(%r11)		# Subtract available space.
>> +	aghi	%r2, BACKOFF		# Back off a bit.
>> +	ear	%r1, %a0
>> +	sllg	%r1, %r1, 32
>> +	ear	%r1, %a1		# Extract thread pointer.
>> +.LEHE0:
>> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
>> +
>> +	# We need to restore the old stack pointer before unblocking signals.
>> +	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
>> +	# frame at this place before the stack switch, there's no need to
>> +	# write the back chain again.
>> +	lgr	%r15, %r11
>> +	aghi	%r15, -0xa0
>> +
>> +	brasl	%r14, __morestack_unblock_signals
>> +
>> +	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
>> +	.cfi_remember_state
>> +	.cfi_restore %r15
>> +	.cfi_restore %r14
>> +	.cfi_restore %r13
>> +	.cfi_restore %r12
>> +	.cfi_restore %r11
>> +	.cfi_restore %r10
>> +	.cfi_restore %r9
>> +	.cfi_restore %r8
>> +	.cfi_restore %r7
>> +	.cfi_restore %r6
>> +	.cfi_def_cfa_register %r15
>> +	br	%r14			# Return to caller's caller.
>> +
>> +# Executed if no new stack allocation is needed.
>> +
>> +.Lnoalloc:
>> +	.cfi_restore_state
>> +	# We may need to copy stack parameters.
>> +	lg	%r9, 0x8(%r10)		# Load stack parameter size.
>> +	ltgr	%r9, %r9		# Check if it's 0.
>> +	je	.Lnostackparm		# Skip the copy if not needed.
>> +	sgr	%r15, %r9		# Make space on the stack.
>> +	la	%r8, 0xa0(%r15)		# Destination.
>> +	la	%r12, 0xa0(%r11)	# Source.
>> +	lgr	%r13, %r9		# Source size.
>> +.Lcopy:
>> +	mvcle	%r8, %r12, 0		# Copy.
>> +	jo	.Lcopy
>> +
>> +.Lnostackparm:
>> +	# Third parameter is address of function meat - address of parameter
>> +	# block.
>> +	ag	%r10, 0x10(%r10)
>> +
>> +	# Leave vararg pointer in %r1, in case function uses it
>> +	la	%r1, 0xa0(%r11)
>> +
>> +	# OK, no stack allocation needed.  We still follow the protocol and
>> +	# call our caller - it doesn't cost much and makes sure vararg works.
>> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
>> +	basr	%r14, %r10		# Call our caller.
>> +
>> +	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
>> +	.cfi_remember_state
>> +	.cfi_restore %r15
>> +	.cfi_restore %r14
>> +	.cfi_restore %r13
>> +	.cfi_restore %r12
>> +	.cfi_restore %r11
>> +	.cfi_restore %r10
>> +	.cfi_restore %r9
>> +	.cfi_restore %r8
>> +	.cfi_restore %r7
>> +	.cfi_restore %r6
>> +	.cfi_def_cfa_register %r15
>> +	br	%r14			# Return to caller's caller.
>> +
>> +# This is the cleanup code called by the stack unwinder when unwinding
>> +# through the code between .LEHB0 and .LEHE0 above.
>> +
>> +.L1:
>> +	.cfi_restore_state
>> +	lgr	%r2, %r11		# Stack pointer after resume.
>> +	brasl	%r14, __generic_findstack
>> +	lgr	%r3, %r11		# Get the stack pointer.
>> +	sgr	%r3, %r2		# Subtract available space.
>> +	aghi	%r3, BACKOFF		# Back off a bit.
>> +	ear	%r1, %a0
>> +	sllg	%r1, %r1, 32
>> +	ear	%r1, %a1		# Extract thread pointer.
>> +	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
>> +
>> +	lgr	%r2, %r6		# Exception header.
>> +#ifdef __PIC__
>> +	brasl	%r14, _Unwind_Resume@PLT
>> +#else
>> +	brasl	%r14, _Unwind_Resume
>> +#endif
>> +
>> +#endif /* defined(__s390x__) */
>> +
>> +	.cfi_endproc
>> +	.size	__morestack, . - __morestack
>> +
>> +
>> +# The exception table.  This tells the personality routine to execute
>> +# the exception handler.
>> +
>> +	.section	.gcc_except_table,"a",@progbits
>> +	.align	4
>> +.LLSDA1:
>> +	.byte	0xff	# @LPStart format (omit)
>> +	.byte	0xff	# @TType format (omit)
>> +	.byte	0x1	# call-site format (uleb128)
>> +	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
>> +.LLSDACSB1:
>> +	.uleb128 .LEHB0-.LFB1	# region 0 start
>> +	.uleb128 .LEHE0-.LEHB0	# length
>> +	.uleb128 .L1-.LFB1	# landing pad
>> +	.uleb128 0		# action
>> +.LLSDACSE1:
>> +
>> +
>> +	.global __gcc_personality_v0
>> +#ifdef __PIC__
>> +	# Build a position independent reference to the basic
>> +        # personality function.
>> +	.hidden DW.ref.__gcc_personality_v0
>> +	.weak   DW.ref.__gcc_personality_v0
>> +	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
>> +	.type	DW.ref.__gcc_personality_v0, @object
>> +DW.ref.__gcc_personality_v0:
>> +#ifndef __LP64__
>> +	.align 4
>> +	.size	DW.ref.__gcc_personality_v0, 4
>> +	.long	__gcc_personality_v0
>> +#else
>> +	.align 8
>> +	.size	DW.ref.__gcc_personality_v0, 8
>> +	.quad	__gcc_personality_v0
>> +#endif
>> +#endif
>> +
>> +
>> +
>> +# Initialize the stack test value when the program starts or when a
>> +# new thread starts.  We don't know how large the main stack is, so we
>> +# guess conservatively.  We might be able to use getrlimit here.
>> +
>> +	.text
>> +	.global	__stack_split_initialize
>> +	.hidden	__stack_split_initialize
>> +
>> +	.type	__stack_split_initialize, @function
>> +
>> +__stack_split_initialize:
>> +
>> +#ifndef __s390x__
>> +
>> +	ear	%r1, %a0
>> +	lr	%r0, %r15
>> +	ahi	%r0, -0x4000	# We should have at least 16K.
>> +	st	%r0, 0x20(%r1)
>> +
>> +	lr	%r2, %r15
>> +	lhi	%r3, 0x4000
>> +#ifdef __PIC__
>> +	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
>> +#else
>> +	jg	__generic_morestack_set_initial_sp	# Tail call
>> +#endif
>> +
>> +#else /* defined(__s390x__) */
>> +
>> +	ear	%r1, %a0
>> +	sllg	%r1, %r1, 32
>> +	ear	%r1, %a1
>> +	lgr	%r0, %r15
>> +	aghi	%r0, -0x4000	# We should have at least 16K.
>> +	stg	%r0, 0x38(%r1)
>> +
>> +	lgr	%r2, %r15
>> +	lghi	%r3, 0x4000
>> +#ifdef __PIC__
>> +	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
>> +#else
>> +	jg	__generic_morestack_set_initial_sp	# Tail call
>> +#endif
>> +
>> +#endif /* defined(__s390x__) */
>> +
>> +	.size	__stack_split_initialize, . - __stack_split_initialize
>> +
>> +# Routines to get and set the guard, for __splitstack_getcontext,
>> +# __splitstack_setcontext, and __splitstack_makecontext.
>> +
>> +# void *__morestack_get_guard (void) returns the current stack guard.
>> +	.text
>> +	.global	__morestack_get_guard
>> +	.hidden	__morestack_get_guard
>> +
>> +	.type	__morestack_get_guard,@function
>> +
>> +__morestack_get_guard:
>> +
>> +#ifndef __s390x__
>> +	ear	%r1, %a0
>> +	l	%r2, 0x20(%r1)
>> +#else
>> +	ear	%r1, %a0
>> +	sllg	%r1, %r1, 32
>> +	ear	%r1, %a1
>> +	lg	%r2, 0x38(%r1)
>> +#endif
>> +	br %r14
>> +
>> +	.size	__morestack_get_guard, . - __morestack_get_guard
>> +
>> +# void __morestack_set_guard (void *) sets the stack guard.
>> +	.global	__morestack_set_guard
>> +	.hidden	__morestack_set_guard
>> +
>> +	.type	__morestack_set_guard,@function
>> +
>> +__morestack_set_guard:
>> +
>> +#ifndef __s390x__
>> +	ear	%r1, %a0
>> +	st	%r2, 0x20(%r1)
>> +#else
>> +	ear	%r1, %a0
>> +	sllg	%r1, %r1, 32
>> +	ear	%r1, %a1
>> +	stg	%r2, 0x38(%r1)
>> +#endif
>> +	br	%r14
>> +
>> +	.size	__morestack_set_guard, . - __morestack_set_guard
>> +
>> +# void *__morestack_make_guard (void *, size_t) returns the stack
>> +# guard value for a stack.
>> +	.global	__morestack_make_guard
>> +	.hidden	__morestack_make_guard
>> +
>> +	.type	__morestack_make_guard,@function
>> +
>> +__morestack_make_guard:
>> +
>> +#ifndef __s390x__
>> +	sr	%r2, %r3
>> +	ahi	%r2, BACKOFF
>> +#else
>> +	sgr	%r2, %r3
>> +	aghi	%r2, BACKOFF
>> +#endif
>> +	br	%r14
>> +
>> +	.size	__morestack_make_guard, . - __morestack_make_guard
>> +
>> +# Make __stack_split_initialize a high priority constructor.
>> +
>> +	.section .ctors.65535,"aw",@progbits
>> +
>> +#ifndef __LP64__
>> +	.align	4
>> +	.long	__stack_split_initialize
>> +	.long	__morestack_load_mmap
>> +#else
>> +	.align	8
>> +	.quad	__stack_split_initialize
>> +	.quad	__morestack_load_mmap
>> +#endif
>> +
>> +	.section	.note.GNU-stack,"",@progbits
>> +	.section	.note.GNU-split-stack,"",@progbits
>> +	.section	.note.GNU-no-split-stack,"",@progbits
>> diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
>> new file mode 100644
>> index 0000000..4c959b0
>> --- /dev/null
>> +++ b/libgcc/config/s390/t-stack-s390
>> @@ -0,0 +1,2 @@
>> +# Makefile fragment to support -fsplit-stack for s390.
>> +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
>> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
>> index 89765d4..b8eec4e 100644
>> --- a/libgcc/generic-morestack.c
>> +++ b/libgcc/generic-morestack.c
>> @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
>>   #elif defined (__i386__)
>>         nsp -= 6 * sizeof (void *);
>>   #elif defined __powerpc64__
>> +#elif defined __s390x__
>> +      nsp -= 2 * 160;
>> +#elif defined __s390__
>> +      nsp -= 2 * 96;
>>   #else
>>   #error "unrecognized target"
>>   #endif
>>
>


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-01-29 15:43         ` Marcin Kościelnicki
@ 2016-01-29 16:17           ` Andreas Krebbel
  2016-02-02 14:52             ` Marcin Kościelnicki
  2016-02-07 12:22             ` [PATCH] testsuite/s390: Add __morestack test Marcin Kościelnicki
  0 siblings, 2 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-01-29 16:17 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: gcc-patches

On 01/29/2016 04:43 PM, Marcin Kościelnicki wrote:
> The testsuite with -fsplit-stack already hits all of them, and checking
> them manually is rather tricky (I don't know if it could be done in a
> target-independent way at all), but I think it'd be reasonable to make
> assembly testcases calling __morestack for the last two cases, to check
> that the registers are being preserved, etc.
Sounds good. Thanks!

...
>>> +  if (frame_size <= 0x7fff || (TARGET_EXTIMM && frame_size <= 0xffffffffu))
>> The agfi immediate value is a signed 32 bit integer.  So you can only
>> add up to 2G-1.  I think it would be more readable to write this as:
> 
> We're emitting ALGFI here, which accepts unsigned 32-bit integer.
Ah right. Then it would be:

if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))

instead.
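
For reference, a small C sketch of the ranges being discussed, assuming
the usual reading of the s390 constraints: K is a signed 16-bit
immediate, Os a signed 32-bit extended immediate, and Op an unsigned
32-bit one.  The helper names are hypothetical stand-ins, not the GCC
macros, and the extimm flag models TARGET_EXTIMM explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range helpers; they only model the value ranges.  */
static bool fits_K (int64_t v)  { return v >= -0x8000 && v <= 0x7fff; }
static bool fits_Os (int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }
static bool fits_Op (int64_t v) { return v >= 0 && v <= (int64_t) UINT32_MAX; }

/* Can frame_size be added with a single immediate instruction?
   The unsigned 32-bit form needs the extended-immediate facility.  */
static bool
frame_size_fits_immediate (int64_t frame_size, bool extimm)
{
  return fits_K (frame_size) || (extimm && fits_Op (frame_size));
}
```

A frame size of 0x80000000 shows the difference: it is outside the Os
range but inside the Op range, which is why Op matches ALGFI here.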

>>
>> if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Os (frame_size))
>>
>> as in s390_emit_prologue. The Os check will check for TARGET_EXTIMM as well.
> 
> Alright.

...

>> I'm wondering if it is really necessary to expand the call in that
>> two-step approach?! We do the general literal pool handling in
>> s390_reorg because we need all the insn lengths to be finalized before
>> performing the branch/pool splitting loop.  But this shouldn't be necessary
>> in this case.  Would it be possible to expand the call already in
>> emit_prologue phase and get rid of the s390_reorg part?
> 
> There's an internal literal pool involved, which needs to be emitted as
> one chunk.  Optimizations are also very likely to destroy the sequence:
> consider the target address that __morestack will call - the control
> flow change happens in the __morestack jump instruction, but the address
> itself is encoded in one of the pool literals.  Just not worth the risk.
Ok.

...
>>> +	# OK, no stack allocation needed.  We still follow the protocol and
>>> +	# call our caller - it doesn't cost much and makes sure vararg works.
>>> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
>>> +	basr	%r14, %r10		# Call our caller.
>> The comment confuses me.  It somewhat sounds to me like the call
>> wouldn't be really needed but in fact it cannot even remotely work
>> without jumping back to the function body right?!
> 
> Certainly.  __morestack's task is to call the given function entry point
> once the necessary stack space is established.  In fact, in the
> no-allocation case a sibling call would be possible, if it weren't for
> one annoying detail: there are no free GPRs we could use to hold the
> address of the entry point.  %r0 may hold the static chain, %r1 may have
> to be the argument pointer, %r2-%r5 may hold parameters, and %r6-%r15
> are callee-saved.
Ok. The comment isn't about no-call vs. call; it is about sibcall vs. call. Got it.

Bye,

-Andreas-


* [PATCH] s390: Add -fsplit-stack support
  2016-01-29 16:17           ` Andreas Krebbel
@ 2016-02-02 14:52             ` Marcin Kościelnicki
  2016-02-02 15:19               ` Andreas Krebbel
  2016-02-07 12:22             ` [PATCH] testsuite/s390: Add __morestack test Marcin Kościelnicki
  1 sibling, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-02 14:52 UTC (permalink / raw)
  To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_expand_split_stack_call): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_reorg): Lower the split-stack pseudo-insns.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_call): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call): New expand.
	(split_stack_cond_call_*): New insn.
	(split_stack_space_check): New expand.
	(split_stack_sibcall): New expand.
	(split_stack_sibcall_*): New insn.
	(split_stack_cond_sibcall): New expand.
	(split_stack_cond_sibcall_*): New insn.
	(split_stack_marker): New insn.
---
I've implemented most of your requested changes, with two exceptions:

- I don't use virtual_incoming_args_rtx in s390_expand_split_stack_prologue,
  since this causes a constraint error - I suppose it just cannot be used
  after reload.
- It seems to me there's no problem with TPF and r1 - the conditional you
  mention is meant to avoid modifying r14 (which we still avoid, by using r1
  and r12 for the arg pointer and temp, respectively), not to ensure use of
  r1 as the temporary.  Unless there's a good reason to avoid modifying r12,
  the code seems fine to me.
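
To illustrate the second point, the temp register choice in
s390_emit_prologue after this patch can be modelled roughly as below.
This is a simplified sketch: the function name and the boolean
parameters are hypothetical stand-ins for the conditions in the hunk,
and only the register numbers are taken from the patch.

```c
#include <stdbool.h>

/* Rough model of the prologue temp register selection.
   r14_has_initial_val - stands in for has_hard_reg_initial_val on r14
   split_stack_stdarg  - stands in for flag_split_stack && cfun->stdarg  */
static int
prologue_temp_regno (bool r14_has_initial_val, bool is_leaf,
                     bool tpf_profiling, bool split_stack_stdarg)
{
  /* r14 is only used as temp when clobbering it is harmless.  */
  if (!r14_has_initial_val && !is_leaf && !tpf_profiling)
    return 14;
  /* r1 carries the split-stack vararg pointer, so fall back to r12.  */
  if (split_stack_stdarg)
    return 12;
  return 1;
}
```

With TPF profiling enabled the first branch is skipped, so a split-stack
stdarg function ends up with r12 and everything else keeps r1, as before.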

As for the testcase we discussed, I'll submit it as a separate patch.


 gcc/ChangeLog                        |  37 +++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 321 +++++++++++++++++-
 gcc/config/s390/s390.md              | 177 ++++++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1170 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a2cec8..af86079 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,40 @@
+2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_expand_split_stack_call): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_reorg): Lower the split-stack pseudo-insns.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL): New unspec.
+	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
+	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_call): New expand.
+	(split_stack_call_*): New insn.
+	(split_stack_cond_call): New expand.
+	(split_stack_cond_call_*): New insn.
+	(split_stack_space_check): New expand.
+	(split_stack_sibcall): New expand.
+	(split_stack_sibcall_*): New insn.
+	(split_stack_cond_sibcall): New expand.
+	(split_stack_cond_sibcall_*): New insn.
+	(split_stack_marker): New insn.
+
 2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
     }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			   struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
 
@@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 #undef TARGET_OPTION_INIT_STRUCT
 #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 633bc1e..09032c9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
 extern void s390_emit_prologue (void);
 extern void s390_emit_epilogue (bool);
+extern void s390_expand_split_stack_prologue (void);
 extern bool s390_can_use_simple_return_insn (void);
 extern bool s390_can_use_return_insn (void);
 extern void s390_function_profiler (FILE *, int);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3be64de..59628ba 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -426,6 +426,13 @@ struct GTY(()) machine_function
   /* True if the current function may contain a tbegin clobbering
      FPRs.  */
   bool tbegin_p;
+
+  /* For -fsplit-stack support: A stack local which holds a pointer to
+     the stack arguments for a function with a variable number of
+     arguments.  This is set at the start of the function and is used
+     to initialize the overflow_arg_area field of the va_list
+     structure.  */
+  rtx split_stack_varargs_pointer;
 };
 
 /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
@@ -9316,9 +9323,13 @@ s390_register_info ()
 	  cfun_frame_layout.high_fprs++;
       }
 
-  if (flag_pic)
-    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
-      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
+  /* Register 12 is used for GOT address, but also as temp in prologue
+     for split-stack stdarg functions (unless r14 is available).  */
+  clobbered_regs[12]
+    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
+	|| (flag_split_stack && cfun->stdarg
+	    && (crtl->is_leaf || TARGET_TPF_PROFILING
+		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
 
   clobbered_regs[BASE_REGNUM]
     |= (cfun->machine->base_reg
@@ -10446,6 +10457,8 @@ s390_emit_prologue (void)
       && !crtl->is_leaf
       && !TARGET_TPF_PROFILING)
     temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
+  else if (flag_split_stack && cfun->stdarg)
+    temp_reg = gen_rtx_REG (Pmode, 12);
   else
     temp_reg = gen_rtx_REG (Pmode, 1);
 
@@ -10939,6 +10952,234 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
     SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* When using -fsplit-stack, the allocation routines set a field in
+   the TCB to the bottom of the stack plus this much space, measured
+   in bytes.  */
+
+#define SPLIT_STACK_AVAILABLE 1024
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+
+void
+s390_expand_split_stack_prologue (void)
+{
+  rtx r1, guard, cc;
+  rtx_insn *insn;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  /* Pointer size in bytes.  */
+  /* Frame size and argument size - the two parameters to __morestack.  */
+  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
+  /* Align argument size to 8 bytes - simplifies __morestack code.  */
+  HOST_WIDE_INT args_size = crtl->args.size >= 0
+			    ? ((crtl->args.size + 7) & ~7)
+			    : 0;
+  /* Label to be called by __morestack.  */
+  rtx_code_label *call_done = NULL;
+  rtx tmp;
+
+  gcc_assert (flag_split_stack && reload_completed);
+  if (!TARGET_CPU_ZARCH)
+    {
+      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
+      return;
+    }
+
+  r1 = gen_rtx_REG (Pmode, 1);
+
+  /* If no stack frame will be allocated, don't do anything.  */
+  if (!frame_size)
+    {
+      /* But emit a marker that will let linker and indirect function
+	 calls recognise this function as split-stack aware.  */
+      emit_insn (gen_split_stack_marker ());
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, just use r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+
+	}
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))
+    {
+      /* If frame_size will fit in an add instruction, do a stack space
+	 check, and only call __morestack if there's not enough space.  */
+
+      /* Get thread pointer.  r1 is the only register we can always destroy - r0
+	 could contain a static chain (and cannot be used to address memory
+	 anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
+      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+      /* Aim at __private_ss.  */
+      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
+
+      /* If less than 1kiB is used, skip addition and compare directly with
+	 __private_ss.  */
+      if (frame_size > SPLIT_STACK_AVAILABLE)
+	{
+	  emit_move_insn (r1, guard);
+	  if (TARGET_64BIT)
+	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
+	  else
+	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
+	  guard = r1;
+	}
+
+      /* Compare the (maybe adjusted) guard with the stack pointer.  */
+      cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
+
+      call_done = gen_label_rtx ();
+
+      tmp = gen_split_stack_cond_call (call_done,
+				       morestack_ref,
+				       GEN_INT (frame_size),
+				       GEN_INT (args_size),
+				       cc);
+
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+
+      /* Mark the jump as very unlikely to be taken.  */
+      add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
+
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, and __morestack was not called, just use
+	     r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+	}
+    }
+  else
+    {
+      call_done = gen_label_rtx ();
+
+      /* Now, we need to call __morestack.  It has very special calling
+	 conventions: it preserves param/return/static chain registers for
+	 calling main function body, and looks for its own parameters
+	 at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode).  */
+      tmp = gen_split_stack_call (call_done,
+				  morestack_ref,
+				  GEN_INT (frame_size),
+				  GEN_INT (args_size));
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      emit_barrier ();
+    }
+
+  /* __morestack will call us here.  */
+
+  emit_label (call_done);
+  LABEL_NUSES (call_done) = 1;
+}
+
+/* Generates split-stack call sequence, along with its parameter block.  */
+
+static void
+s390_expand_split_stack_call (rtx_insn *orig_insn,
+			      rtx call_done,
+			      rtx function,
+			      rtx frame_size,
+			      rtx args_size,
+			      rtx cond)
+{
+  rtx_insn *insn = orig_insn;
+  rtx parmbase = gen_label_rtx ();
+  rtx r1 = gen_rtx_REG (Pmode, 1);
+  rtx tmp, tmp2;
+
+  /* %r1 = litbase.  */
+  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* jg<cond> __morestack.  */
+  if (cond == NULL)
+    {
+      tmp = gen_split_stack_sibcall (function, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  else
+    {
+      gcc_assert (s390_comparison (cond, VOIDmode));
+      tmp = gen_split_stack_cond_sibcall (function, cond, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  JUMP_LABEL (insn) = call_done;
+  LABEL_NUSES (call_done)++;
+
+  /* Go to .rodata.  */
+  insn = emit_insn_after (gen_pool_section_start (), insn);
+
+  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
+     (this mirrors the alignment done in __morestack - don't touch it).  */
+  insn = emit_insn_after (gen_pool_align (GEN_INT (UNITS_PER_LONG)), insn);
+
+  insn = emit_label_after (parmbase, insn);
+
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, frame_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Second parameter is size of the arguments passed on stack that
+     __morestack has to copy to the new stack (does not include varargs).  */
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, args_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Third parameter is offset between start of the parameter block
+     and function body to be called by __morestack.  */
+  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
+  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
+  tmp = gen_rtx_CONST (Pmode,
+		       gen_rtx_MINUS (Pmode, tmp2, tmp));
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, tmp),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* Return from .rodata.  */
+  insn = emit_insn_after (gen_pool_section_end (), insn);
+
+  delete_insn (orig_insn);
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+s390_live_on_entry (bitmap regs)
+{
+  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      gcc_assert (flag_split_stack);
+      bitmap_set_bit (regs, 1);
+    }
+}
+
 /* Return true if the function can use simple_return to return outside
    of a shrink-wrapped region.  At present shrink-wrapping is supported
    in all cases.  */
@@ -11541,6 +11782,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
     }
 
+  if (flag_split_stack
+     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
+         == NULL)
+     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+    {
+      rtx reg;
+      rtx_insn *seq;
+
+      reg = gen_reg_rtx (Pmode);
+      cfun->machine->split_stack_varargs_pointer = reg;
+
+      start_sequence ();
+      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
+      seq = get_insns ();
+      end_sequence ();
+
+      push_topmost_sequence ();
+      emit_insn_after (seq, entry_of_function ());
+      pop_topmost_sequence ();
+    }
+
   /* Find the overflow area.
      FIXME: This currently is too pessimistic when the vector ABI is
      enabled.  In that case we *always* set up the overflow area
@@ -11549,7 +11811,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
       || TARGET_VX_ABI)
     {
-      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+        t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      else
+        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
 
       off = INTVAL (crtl->args.arg_offset_rtx);
       off = off < 0 ? 0 : off;
@@ -13158,6 +13423,48 @@ s390_reorg (void)
 	}
     }
 
+  if (flag_split_stack)
+    {
+      rtx_insn *insn;
+
+      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	{
+	  /* Look for the split-stack fake jump instructions.  */
+	  if (!JUMP_P (insn))
+	    continue;
+	  if (GET_CODE (PATTERN (insn)) != PARALLEL
+	      || XVECLEN (PATTERN (insn), 0) != 2)
+	    continue;
+	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
+	  if (GET_CODE (set) != SET)
+	    continue;
+	  rtx unspec = XEXP (set, 1);
+	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
+	    continue;
+	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL)
+	    continue;
+	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
+	  rtx function = XVECEXP (unspec, 0, 0);
+	  rtx frame_size = XVECEXP (unspec, 0, 1);
+	  rtx args_size = XVECEXP (unspec, 0, 2);
+	  rtx pc_src = XEXP (set_pc, 1);
+	  rtx call_done, cond = NULL_RTX;
+	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
+	    {
+	      cond = XEXP (pc_src, 0);
+	      call_done = XEXP (XEXP (pc_src, 1), 0);
+	    }
+	  else
+	    call_done = XEXP (pc_src, 0);
+	  s390_expand_split_stack_call (insn,
+					call_done,
+					function,
+					frame_size,
+					args_size,
+					cond);
+	}
+    }
+
   /* Try to optimize prologue and epilogue further.  */
   s390_optimize_prologue ();
 
@@ -14469,6 +14776,9 @@ s390_asm_file_end (void)
 	     s390_vector_abi);
 #endif
   file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 
 /* Return true if TYPE is a vector bool type.  */
@@ -14724,6 +15034,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
+
 #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   s390_use_by_pieces_infrastructure_p
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 9b869d5..771f1cc 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -114,6 +114,9 @@
    UNSPEC_SP_SET
    UNSPEC_SP_TEST
 
+   ; Split stack support
+   UNSPEC_STACK_CHECK
+
    ; Test Data Class (TDC)
    UNSPEC_TDC_INSN
 
@@ -276,6 +279,11 @@
    ; Set and get floating point control register
    UNSPECV_SFPC
    UNSPECV_EFPC
+
+   ; Split stack support
+   UNSPECV_SPLIT_STACK_CALL
+   UNSPECV_SPLIT_STACK_SIBCALL
+   UNSPECV_SPLIT_STACK_MARKER
   ])
 
 ;;
@@ -10907,3 +10915,172 @@
   "TARGET_Z13"
   "lcbb\t%0,%1,%b2"
   [(set_attr "op_type" "VRX")])
+
+; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  s390_expand_split_stack_prologue ();
+  DONE;
+})
+
+(define_expand "split_stack_call"
+  [(match_operand 0 "" "")
+   (match_operand 1 "bras_sym_operand" "X")
+   (match_operand 2 "consttable_operand" "X")
+   (match_operand 3 "consttable_operand" "X")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_call_di (operands[0],
+					     operands[1],
+					     operands[2],
+					     operands[3]));
+  else
+    emit_jump_insn (gen_split_stack_call_si (operands[0],
+					     operands[1],
+					     operands[2],
+					     operands[3]));
+  DONE;
+})
+
+(define_insn "split_stack_call_<mode>"
+  [(set (pc) (label_ref (match_operand 0 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+				    (match_operand 2 "consttable_operand" "X")
+				    (match_operand 3 "consttable_operand" "X")]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+(define_expand "split_stack_cond_call"
+  [(match_operand 0 "" "")
+   (match_operand 1 "bras_sym_operand" "X")
+   (match_operand 2 "consttable_operand" "X")
+   (match_operand 3 "consttable_operand" "X")
+   (match_operand 4 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_cond_call_di (operands[0],
+						  operands[1],
+						  operands[2],
+						  operands[3],
+						  operands[4]));
+  else
+    emit_jump_insn (gen_split_stack_cond_call_si (operands[0],
+						  operands[1],
+						  operands[2],
+						  operands[3],
+						  operands[4]));
+  DONE;
+})
+
+(define_insn "split_stack_cond_call_<mode>"
+  [(set (pc)
+	(if_then_else
+	  (match_operand 4 "" "")
+	  (label_ref (match_operand 0 "" ""))
+	  (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+				    (match_operand 2 "consttable_operand" "X")
+				    (match_operand 3 "consttable_operand" "X")]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+
+(define_expand "split_stack_space_check"
+  [(set (pc) (if_then_else
+	      (ltu (minus (reg 15)
+			  (match_operand 0 "register_operand"))
+		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  rtx tp = s390_get_thread_pointer ();
+  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
+  rtx reg = gen_reg_rtx (Pmode);
+  rtx cc;
+  if (TARGET_64BIT)
+    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
+  else
+    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
+  cc = s390_emit_compare (GT, reg, guard);
+  s390_emit_jump (operands[1], cc);
+
+  DONE;
+})
+
+;; A jg with minimal fuss for use in split stack prologue.
+
+(define_expand "split_stack_sibcall"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_sibcall_di (operands[0], operands[1]));
+  else
+    emit_jump_insn (gen_split_stack_sibcall_si (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "split_stack_sibcall_<mode>"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+				   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; Also a conditional one.
+
+(define_expand "split_stack_cond_sibcall"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_cond_sibcall_di (operands[0], operands[1], operands[2]));
+  else
+    emit_jump_insn (gen_split_stack_cond_sibcall_si (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "split_stack_cond_sibcall_<mode>"
+  [(set (pc)
+	(if_then_else
+	  (match_operand 1 "" "")
+	  (label_ref (match_operand 2 "" ""))
+	  (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+				   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg%C1\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; An unusual nop instruction used to mark functions with no stack frames
+;; as split-stack aware.
+
+(define_insn "split_stack_marker"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
+  ""
+  "nopr\t%%r15"
+  [(set_attr "op_type" "RR")])
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 49c7929..3900ab1 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
+	* config/s390/morestack.S: New file.
+	* config/s390/t-stack-s390: New file.
+	* generic-morestack.c (__splitstack_find): Add s390-specific code.
+
 2016-01-25  Jakub Jelinek  <jakub@redhat.com>
 
 	PR target/69444
diff --git a/libgcc/config.host b/libgcc/config.host
index d8efd82..2be5f7e 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1114,11 +1114,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..141dead
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,609 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 0x1000
+
+# The __morestack function.
+
+	.global	__morestack
+	.hidden	__morestack
+
+	.type	__morestack,@function
+
+__morestack:
+.LFB1:
+	.cfi_startproc
+
+
+#ifndef __s390x__
+
+
+# The 31-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0,__gcc_personality_v0
+	.cfi_lsda 0,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x48
+	.cfi_offset %r7, -0x44
+	.cfi_offset %r8, -0x40
+	.cfi_offset %r9, -0x3c
+	.cfi_offset %r10, -0x38
+	.cfi_offset %r11, -0x34
+	.cfi_offset %r12, -0x30
+	.cfi_offset %r13, -0x2c
+	.cfi_offset %r14, -0x28
+	.cfi_offset %r15, -0x24
+	lr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	ahi	%r15, -0x60		# 0x60 for standard frame.
+	st	%r11, 0(%r15)		# Save back chain.
+	lr	%r8, %r0		# Save %r0 (static chain).
+	lr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	l	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0		# Extract thread pointer.
+	l	%r1, 0x20(%r1)		# Get stack boundary
+	ar	%r1, %r7		# Stack boundary + frame size
+	a	%r1, 4(%r10)		# + stack param size
+	clr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	ahi	%r7, BACKOFF		# Bump requested size a bit.
+	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x40(%r11)		# Pass its address as parameter.
+	la	%r3, 0x60(%r11)		# Caller's stack parameters.
+	l	%r4, 4(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lr	%r15, %r2		# Switch to the new stack.
+	ahi	%r15, -0x60		# Make a stack frame on it.
+	st	%r11, 0(%r15)		# Save back chain.
+
+	s	%r2, 0x40(%r11)		# The end of stack space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHB0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lr	%r0, %r8		# Static chain.
+	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0x60 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x40(%r11)
+	brasl	%r14, __generic_releasestack
+
+	s	%r2, 0x40(%r11)		# Subtract available space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHE0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0x60 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lr	%r15, %r11
+	ahi	%r15, -0x60
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	l	%r9, 0x4(%r10)		# Load stack parameter size.
+	ltr	%r9, %r9		# And check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0x60(%r15)		# Destination.
+	la	%r12, 0x60(%r11)	# Source.
+	lr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lr	%r3, %r11		# Get the stack pointer.
+	sr	%r3, %r2		# Subtract available space.
+	ahi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+	st	%r3, 0x20(%r1)	# Save the new stack boundary.
+
+	lr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#else /* defined(__s390x__) */
+
+
+# The 64-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x70
+	.cfi_offset %r7, -0x68
+	.cfi_offset %r8, -0x60
+	.cfi_offset %r9, -0x58
+	.cfi_offset %r10, -0x50
+	.cfi_offset %r11, -0x48
+	.cfi_offset %r12, -0x40
+	.cfi_offset %r13, -0x38
+	.cfi_offset %r14, -0x30
+	.cfi_offset %r15, -0x28
+	lgr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	aghi	%r15, -0xa0		# 0xa0 for standard frame.
+	stg	%r11, 0(%r15)		# Save back chain.
+	lgr	%r8, %r0		# Save %r0 (static chain).
+	lgr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	lg	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	lg	%r1, 0x38(%r1)		# Get stack boundary
+	agr	%r1, %r7		# Stack boundary + frame size
+	ag	%r1, 8(%r10)		# + stack param size
+	clgr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	aghi	%r7, BACKOFF		# Bump requested size a bit.
+	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x80(%r11)		# Pass its address as parameter.
+	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
+	lg	%r4, 8(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lgr	%r15, %r2		# Switch to the new stack.
+	aghi	%r15, -0xa0		# Make a stack frame on it.
+	stg	%r11, 0(%r15)		# Save back chain.
+
+	sg	%r2, 0x80(%r11)		# The end of stack space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHB0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lgr	%r0, %r8		# Static chain.
+	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stg	%r2, 0x10(%r11)		# Save return register.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x80(%r11)
+	brasl	%r14, __generic_releasestack
+
+	sg	%r2, 0x80(%r11)		# Subtract available space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHE0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lgr	%r15, %r11
+	aghi	%r15, -0xa0
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	lg	%r9, 0x8(%r10)		# Load stack parameter size.
+	ltgr	%r9, %r9		# Check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sgr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0xa0(%r15)		# Destination.
+	la	%r12, 0xa0(%r11)	# Source.
+	lgr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lgr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lgr	%r3, %r11		# Get the stack pointer.
+	sgr	%r3, %r2		# Subtract available space.
+	aghi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
+
+	lgr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.cfi_endproc
+	.size	__morestack, . - __morestack
+
+
+# The exception table.  This tells the personality routine to execute
+# the exception handler.
+
+	.section	.gcc_except_table,"a",@progbits
+	.align	4
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 .L1-.LFB1	# landing pad
+	.uleb128 0		# action
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the basic
+	# personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type	DW.ref.__gcc_personality_v0, @object
+DW.ref.__gcc_personality_v0:
+#ifndef __LP64__
+	.align 4
+	.size	DW.ref.__gcc_personality_v0, 4
+	.long	__gcc_personality_v0
+#else
+	.align 8
+	.size	DW.ref.__gcc_personality_v0, 8
+	.quad	__gcc_personality_v0
+#endif
+#endif
+
+
+
+# Initialize the stack test value when the program starts or when a
+# new thread starts.  We don't know how large the main stack is, so we
+# guess conservatively.  We might be able to use getrlimit here.
+
+	.text
+	.global	__stack_split_initialize
+	.hidden	__stack_split_initialize
+
+	.type	__stack_split_initialize, @function
+
+__stack_split_initialize:
+
+#ifndef __s390x__
+
+	ear	%r1, %a0
+	lr	%r0, %r15
+	ahi	%r0, -0x4000	# We should have at least 16K.
+	st	%r0, 0x20(%r1)
+
+	lr	%r2, %r15
+	lhi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#else /* defined(__s390x__) */
+
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lgr	%r0, %r15
+	aghi	%r0, -0x4000	# We should have at least 16K.
+	stg	%r0, 0x38(%r1)
+
+	lgr	%r2, %r15
+	lghi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.size	__stack_split_initialize, . - __stack_split_initialize
+
+# Routines to get and set the guard, for __splitstack_getcontext,
+# __splitstack_setcontext, and __splitstack_makecontext.
+
+# void *__morestack_get_guard (void) returns the current stack guard.
+	.text
+	.global	__morestack_get_guard
+	.hidden	__morestack_get_guard
+
+	.type	__morestack_get_guard,@function
+
+__morestack_get_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	l	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_get_guard, . - __morestack_get_guard
+
+# void __morestack_set_guard (void *) sets the stack guard.
+	.global	__morestack_set_guard
+	.hidden	__morestack_set_guard
+
+	.type	__morestack_set_guard,@function
+
+__morestack_set_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	st	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	stg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_set_guard, . - __morestack_set_guard
+
+# void *__morestack_make_guard (void *, size_t) returns the stack
+# guard value for a stack.
+	.global	__morestack_make_guard
+	.hidden	__morestack_make_guard
+
+	.type	__morestack_make_guard,@function
+
+__morestack_make_guard:
+
+#ifndef __s390x__
+	sr	%r2, %r3
+	ahi	%r2, BACKOFF
+#else
+	sgr	%r2, %r3
+	aghi	%r2, BACKOFF
+#endif
+	br	%r14
+
+	.size	__morestack_make_guard, . - __morestack_make_guard
+
+# Make __stack_split_initialize a high priority constructor.
+
+	.section .ctors.65535,"aw",@progbits
+
+#ifndef __LP64__
+	.align	4
+	.long	__stack_split_initialize
+	.long	__morestack_load_mmap
+#else
+	.align	8
+	.quad	__stack_split_initialize
+	.quad	__morestack_load_mmap
+#endif
+
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
new file mode 100644
index 0000000..4c959b0
--- /dev/null
+++ b/libgcc/config/s390/t-stack-s390
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for s390.
+LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index 89765d4..b8eec4e 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
 #elif defined __powerpc64__
+#elif defined __s390x__
+      nsp -= 2 * 160;
+#elif defined __s390__
+      nsp -= 2 * 96;
 #else
 #error "unrecognized target"
 #endif
-- 
2.7.0
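
For readers following the guard helpers at the end of morestack.S above,
here is a minimal C sketch of the arithmetic done by
__morestack_make_guard and __stack_split_initialize.  It is an
illustration only, not the libgcc implementation: the sketch_ names are
hypothetical, and SKETCH_BACKOFF is a placeholder, because the real
BACKOFF constant is defined earlier in morestack.S, outside this excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder value (assumption): the real BACKOFF is defined earlier
   in morestack.S and is not visible in this excerpt.  */
#define SKETCH_BACKOFF 0x1000

/* Models __morestack_make_guard (void *stack, size_t size).
   The assembly computes r2 = r2 - r3 + BACKOFF: the stack grows down,
   so the guard is the lowest address of the segment plus a margin.  */
static uintptr_t
sketch_make_guard (uintptr_t stack_top, size_t size)
{
  return stack_top - size + SKETCH_BACKOFF;
}

/* Models the guard stored by __stack_split_initialize: the size of the
   initial stack is unknown, so the code assumes at least 16K (0x4000)
   is available below the current stack pointer.  */
static uintptr_t
sketch_initial_guard (uintptr_t sp)
{
  return sp - 0x4000;
}
```

In the assembly, the resulting value is stored in the TCB at offset 0x38
from the thread pointer on s390x and 0x20 on 31-bit s390; the sketch
leaves that store out.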


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-02 14:52             ` Marcin Kościelnicki
@ 2016-02-02 15:19               ` Andreas Krebbel
  2016-02-02 15:31                 ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-02-02 15:19 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: gcc-patches

On 02/02/2016 03:52 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
> 
> 	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> 	* config/s390/morestack.S: New file.
> 	* config/s390/t-stack-s390: New file.
> 	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
> 	* common/config/s390/s390-common.c (s390_supports_split_stack):
> 	New function.
> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> 	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> 	* config/s390/s390.c (struct machine_function): New field
> 	split_stack_varargs_pointer.
> 	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
> 	in s390_emit_prologue.
> 	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> 	vararg pointer.
> 	(morestack_ref): New global.
> 	(SPLIT_STACK_AVAILABLE): New macro.
> 	(s390_expand_split_stack_prologue): New function.
> 	(s390_expand_split_stack_call): New function.
> 	(s390_live_on_entry): New function.
> 	(s390_va_start): Use split-stack vararg pointer if appropriate.
> 	(s390_reorg): Lower the split-stack pseudo-insns.
> 	(s390_asm_file_end): Emit the split-stack note sections.
> 	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> 	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
> 	(UNSPECV_SPLIT_STACK_CALL): New unspec.
> 	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> 	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
> 	(split_stack_prologue): New expand.
> 	(split_stack_call): New expand.
> 	(split_stack_call_*): New insn.
> 	(split_stack_cond_call): New expand.
> 	(split_stack_cond_call_*): New insn.
> 	(split_stack_space_check): New expand.
> 	(split_stack_sibcall): New expand.
> 	(split_stack_sibcall_*): New insn.
> 	(split_stack_cond_sibcall): New expand.
> 	(split_stack_cond_sibcall_*): New insn.
> 	(split_stack_marker): New insn.
> ---
> I've implemented most of your requested changes, with two exceptions:
> 
> - I don't use virtual_incoming_args_rtx in s390_expand_split_stack_prologue,
>   since this causes constraint error - I suppose it just cannot be used after
>   reload.
Right. As an elimination register, it cannot be used in the code path called from s390_reorg.

> - It seems to me there's no problem with TPF and r1 - the conditional you
>   mention is meant to avoid modifying r14 (which we do - by aiming at r1 and
>   r12 for arg pointer and temp, respectively), not to ensure use of r1 as the
>   temporary.  Unless there's a good reason to avoid modifying r12, the code
>   seems fine to me.
OK. The comment above this check then no longer seems correct. Could you please adjust it as
well? It should read "avoid register 14" then.

  /* Choose best register to use for temp use within prologue.
     See below for why TPF must use the register 1.  */

  if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
      && !crtl->is_leaf
      && !TARGET_TPF_PROFILING)
    temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
...
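
A minimal sketch of the selection order under discussion, written as a
standalone C model rather than GCC code.  It combines the check quoted
above with the r12 fallback that the patch under review adds.  The
function name and the boolean parameters are hypothetical stand-ins for
has_hard_reg_initial_val (Pmode, RETURN_REGNUM), crtl->is_leaf,
TARGET_TPF_PROFILING and (flag_split_stack && cfun->stdarg).

```c
#include <stdbool.h>

/* Returns the hard register number the prologue would use as its
   temporary.  r14 is the return address register (RETURN_REGNUM).  */
static int
sketch_prologue_temp_regno (bool r14_has_initial_val, bool is_leaf,
			    bool tpf_profiling, bool split_stack_stdarg)
{
  /* r14 may be used as the temporary only if its entry value is not
     needed later, the function makes calls, and TPF profiling is off.  */
  if (!r14_has_initial_val && !is_leaf && !tpf_profiling)
    return 14;

  /* r1 carries the split-stack vararg pointer, so fall back to r12.  */
  if (split_stack_stdarg)
    return 12;

  /* Default temporary.  */
  return 1;
}
```

Read this way, the TPF condition only keeps r14 from being modified.  It
does not force r1, which is why "avoid register 14" describes the check
more accurately.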

-Andreas-



> 
> As for the testcase we discussed, I'll submit it as a separate patch.
> 
> 
>  gcc/ChangeLog                        |  37 +++
>  gcc/common/config/s390/s390-common.c |  14 +
>  gcc/config/s390/s390-protos.h        |   1 +
>  gcc/config/s390/s390.c               | 321 +++++++++++++++++-
>  gcc/config/s390/s390.md              | 177 ++++++++++
>  libgcc/ChangeLog                     |   7 +
>  libgcc/config.host                   |   4 +-
>  libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
>  libgcc/config/s390/t-stack-s390      |   2 +
>  libgcc/generic-morestack.c           |   4 +
>  10 files changed, 1170 insertions(+), 6 deletions(-)
>  create mode 100644 libgcc/config/s390/morestack.S
>  create mode 100644 libgcc/config/s390/t-stack-s390
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9a2cec8..af86079 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,40 @@
> +2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
> +
> +	* common/config/s390/s390-common.c (s390_supports_split_stack):
> +	New function.
> +	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> +	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> +	* config/s390/s390.c (struct machine_function): New field
> +	split_stack_varargs_pointer.
> +	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
> +	in s390_emit_prologue.
> +	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> +	vararg pointer.
> +	(morestack_ref): New global.
> +	(SPLIT_STACK_AVAILABLE): New macro.
> +	(s390_expand_split_stack_prologue): New function.
> +	(s390_expand_split_stack_call): New function.
> +	(s390_live_on_entry): New function.
> +	(s390_va_start): Use split-stack vararg pointer if appropriate.
> +	(s390_reorg): Lower the split-stack pseudo-insns.
> +	(s390_asm_file_end): Emit the split-stack note sections.
> +	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> +	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
> +	(UNSPECV_SPLIT_STACK_CALL): New unspec.
> +	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
> +	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
> +	(split_stack_prologue): New expand.
> +	(split_stack_call): New expand.
> +	(split_stack_call_*): New insn.
> +	(split_stack_cond_call): New expand.
> +	(split_stack_cond_call_*): New insn.
> +	(split_stack_space_check): New expand.
> +	(split_stack_sibcall): New expand.
> +	(split_stack_sibcall_*): New insn.
> +	(split_stack_cond_sibcall): New expand.
> +	(split_stack_cond_sibcall_*): New insn.
> +	(split_stack_marker): New insn.
> +
>  2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
> 
>  	* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
> diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
> index 4519c21..1e497e6 100644
> --- a/gcc/common/config/s390/s390-common.c
> +++ b/gcc/common/config/s390/s390-common.c
> @@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>      }
>  }
> 
> +/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
> +   We don't verify it, since earlier versions just have padding at
> +   its place, which works just as well.  */
> +
> +static bool
> +s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
> +			   struct gcc_options *opts ATTRIBUTE_UNUSED)
> +{
> +  return true;
> +}
> +
>  #undef TARGET_DEFAULT_TARGET_FLAGS
>  #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
> 
> @@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>  #undef TARGET_OPTION_INIT_STRUCT
>  #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
> 
> +#undef TARGET_SUPPORTS_SPLIT_STACK
> +#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
> +
>  struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> index 633bc1e..09032c9 100644
> --- a/gcc/config/s390/s390-protos.h
> +++ b/gcc/config/s390/s390-protos.h
> @@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
>  extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
>  extern void s390_emit_prologue (void);
>  extern void s390_emit_epilogue (bool);
> +extern void s390_expand_split_stack_prologue (void);
>  extern bool s390_can_use_simple_return_insn (void);
>  extern bool s390_can_use_return_insn (void);
>  extern void s390_function_profiler (FILE *, int);
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 3be64de..59628ba 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -426,6 +426,13 @@ struct GTY(()) machine_function
>    /* True if the current function may contain a tbegin clobbering
>       FPRs.  */
>    bool tbegin_p;
> +
> +  /* For -fsplit-stack support: A stack local which holds a pointer to
> +     the stack arguments for a function with a variable number of
> +     arguments.  This is set at the start of the function and is used
> +     to initialize the overflow_arg_area field of the va_list
> +     structure.  */
> +  rtx split_stack_varargs_pointer;
>  };
> 
>  /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
> @@ -9316,9 +9323,13 @@ s390_register_info ()
>  	  cfun_frame_layout.high_fprs++;
>        }
> 
> -  if (flag_pic)
> -    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
> -      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
> +  /* Register 12 is used for GOT address, but also as temp in prologue
> +     for split-stack stdarg functions (unless r14 is available).  */
> +  clobbered_regs[12]
> +    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
> +	|| (flag_split_stack && cfun->stdarg
> +	    && (crtl->is_leaf || TARGET_TPF_PROFILING
> +		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
> 
>    clobbered_regs[BASE_REGNUM]
>      |= (cfun->machine->base_reg
> @@ -10446,6 +10457,8 @@ s390_emit_prologue (void)
>        && !crtl->is_leaf
>        && !TARGET_TPF_PROFILING)
>      temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
> +  else if (flag_split_stack && cfun->stdarg)
> +    temp_reg = gen_rtx_REG (Pmode, 12);
>    else
>      temp_reg = gen_rtx_REG (Pmode, 1);
> 
> @@ -10939,6 +10952,234 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
>      SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
>  }
> 
> +/* -fsplit-stack support.  */
> +
> +/* A SYMBOL_REF for __morestack.  */
> +static GTY(()) rtx morestack_ref;
> +
> +/* When using -fsplit-stack, the allocation routines set a field in
> +   the TCB to the bottom of the stack plus this much space, measured
> +   in bytes.  */
> +
> +#define SPLIT_STACK_AVAILABLE 1024
> +
> +/* Emit -fsplit-stack prologue, which goes before the regular function
> +   prologue.  */
> +
> +void
> +s390_expand_split_stack_prologue (void)
> +{
> +  rtx r1, guard, cc;
> +  rtx_insn *insn;
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
> +  /* Pointer size in bytes.  */
> +  /* Frame size and argument size - the two parameters to __morestack.  */
> +  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
> +  /* Align argument size to 8 bytes - simplifies __morestack code.  */
> +  HOST_WIDE_INT args_size = crtl->args.size >= 0
> +			    ? ((crtl->args.size + 7) & ~7)
> +			    : 0;
> +  /* Label to be called by __morestack.  */
> +  rtx_code_label *call_done = NULL;
> +  rtx tmp;
> +
> +  gcc_assert (flag_split_stack && reload_completed);
> +  if (!TARGET_CPU_ZARCH)
> +    {
> +      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
> +      return;
> +    }
> +
> +  r1 = gen_rtx_REG (Pmode, 1);
> +
> +  /* If no stack frame will be allocated, don't do anything.  */
> +  if (!frame_size)
> +    {
> +      /* But emit a marker that will let linker and indirect function
> +	 calls recognise this function as split-stack aware.  */
> +      emit_insn (gen_split_stack_marker ());
> +      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +	{
> +	  /* If va_start is used, just use r15.  */
> +	  emit_move_insn (r1,
> +			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +				       GEN_INT (STACK_POINTER_OFFSET)));
> +
> +	}
> +      return;
> +    }
> +
> +  if (morestack_ref == NULL_RTX)
> +    {
> +      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
> +      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
> +					   | SYMBOL_FLAG_FUNCTION);
> +    }
> +
> +  if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))
> +    {
> +      /* If frame_size will fit in an add instruction, do a stack space
> +	 check, and only call __morestack if there's not enough space.  */
> +
> +      /* Get thread pointer.  r1 is the only register we can always destroy - r0
> +	 could contain a static chain (and cannot be used to address memory
> +	 anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
> +      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
> +      /* Aim at __private_ss.  */
> +      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
> +
> +      /* If less than 1kiB used, skip addition and compare directly with
> +	 __private_ss.  */
> +      if (frame_size > SPLIT_STACK_AVAILABLE)
> +	{
> +	  emit_move_insn (r1, guard);
> +	  if (TARGET_64BIT)
> +	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
> +	  else
> +	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
> +	  guard = r1;
> +	}
> +
> +      /* Compare the (maybe adjusted) guard with the stack pointer.  */
> +      cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
> +
> +      call_done = gen_label_rtx ();
> +
> +      tmp = gen_split_stack_cond_call (call_done,
> +				       morestack_ref,
> +				       GEN_INT (frame_size),
> +				       GEN_INT (args_size),
> +				       cc);
> +
> +      insn = emit_jump_insn (tmp);
> +      JUMP_LABEL (insn) = call_done;
> +
> +      /* Mark the jump as very unlikely to be taken.  */
> +      add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
> +
> +      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +	{
> +	  /* If va_start is used, and __morestack was not called, just use
> +	     r15.  */
> +	  emit_move_insn (r1,
> +			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +				       GEN_INT (STACK_POINTER_OFFSET)));
> +	}
> +    }
> +  else
> +    {
> +      call_done = gen_label_rtx ();
> +
> +      /* Now, we need to call __morestack.  It has very special calling
> +	 conventions: it preserves param/return/static chain registers for
> +	 calling main function body, and looks for its own parameters
> +	 at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode).  */
> +      tmp = gen_split_stack_call (call_done,
> +				  morestack_ref,
> +				  GEN_INT (frame_size),
> +				  GEN_INT (args_size));
> +      insn = emit_jump_insn (tmp);
> +      JUMP_LABEL (insn) = call_done;
> +      emit_barrier ();
> +    }
> +
> +  /* __morestack will call us here.  */
> +
> +  emit_label (call_done);
> +  LABEL_NUSES (call_done) = 1;
> +}
> +
> +/* Generates split-stack call sequence, along with its parameter block.  */
> +
> +static void
> +s390_expand_split_stack_call (rtx_insn *orig_insn,
> +			      rtx call_done,
> +			      rtx function,
> +			      rtx frame_size,
> +			      rtx args_size,
> +			      rtx cond)
> +{
> +  rtx_insn *insn = orig_insn;
> +  rtx parmbase = gen_label_rtx ();
> +  rtx r1 = gen_rtx_REG (Pmode, 1);
> +  rtx tmp, tmp2;
> +
> +  /* %r1 = litbase.  */
> +  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
> +  LABEL_NUSES (parmbase)++;
> +
> +  /* jg<cond> __morestack.  */
> +  if (cond == NULL)
> +    {
> +      tmp = gen_split_stack_sibcall (function, call_done);
> +      insn = emit_jump_insn_after (tmp, insn);
> +    }
> +  else
> +    {
> +      gcc_assert (s390_comparison (cond, VOIDmode));
> +      tmp = gen_split_stack_cond_sibcall (function, cond, call_done);
> +      insn = emit_jump_insn_after (tmp, insn);
> +    }
> +  JUMP_LABEL (insn) = call_done;
> +  LABEL_NUSES (call_done)++;
> +
> +  /* Go to .rodata.  */
> +  insn = emit_insn_after (gen_pool_section_start (), insn);
> +
> +  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
> +     (this mirrors the alignment done in __morestack - don't touch it).  */
> +  insn = emit_insn_after (gen_pool_align (GEN_INT (UNITS_PER_LONG)), insn);
> +
> +  insn = emit_label_after (parmbase, insn);
> +
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, frame_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Second parameter is size of the arguments passed on stack that
> +     __morestack has to copy to the new stack (does not include varargs).  */
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, args_size),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +
> +  /* Third parameter is offset between start of the parameter block
> +     and function body to be called by __morestack.  */
> +  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
> +  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
> +  tmp = gen_rtx_CONST (Pmode,
> +		       gen_rtx_MINUS (Pmode, tmp2, tmp));
> +  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
> +				 gen_rtvec (1, tmp),
> +				 UNSPECV_POOL_ENTRY);
> +  insn = emit_insn_after (tmp, insn);
> +  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
> +  LABEL_NUSES (call_done)++;
> +  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
> +  LABEL_NUSES (parmbase)++;
> +
> +  /* Return from .rodata.  */
> +  insn = emit_insn_after (gen_pool_section_end (), insn);
> +
> +  delete_insn (orig_insn);
> +}
> +
> +/* We may have to tell the dataflow pass that the split stack prologue
> +   is initializing a register.  */
> +
> +static void
> +s390_live_on_entry (bitmap regs)
> +{
> +  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
> +    {
> +      gcc_assert (flag_split_stack);
> +      bitmap_set_bit (regs, 1);
> +    }
> +}
> +
>  /* Return true if the function can use simple_return to return outside
>     of a shrink-wrapped region.  At present shrink-wrapping is supported
>     in all cases.  */
> @@ -11541,6 +11782,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>        expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
>      }
> 
> +  if (flag_split_stack
> +     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
> +         == NULL)
> +     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
> +    {
> +      rtx reg;
> +      rtx_insn *seq;
> +
> +      reg = gen_reg_rtx (Pmode);
> +      cfun->machine->split_stack_varargs_pointer = reg;
> +
> +      start_sequence ();
> +      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
> +      seq = get_insns ();
> +      end_sequence ();
> +
> +      push_topmost_sequence ();
> +      emit_insn_after (seq, entry_of_function ());
> +      pop_topmost_sequence ();
> +    }
> +
>    /* Find the overflow area.
>       FIXME: This currently is too pessimistic when the vector ABI is
>       enabled.  In that case we *always* set up the overflow area
> @@ -11549,7 +11811,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>        || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
>        || TARGET_VX_ABI)
>      {
> -      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
> +      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
> +        t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
> +      else
> +        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
> 
>        off = INTVAL (crtl->args.arg_offset_rtx);
>        off = off < 0 ? 0 : off;
> @@ -13158,6 +13423,48 @@ s390_reorg (void)
>  	}
>      }
> 
> +  if (flag_split_stack)
> +    {
> +      rtx_insn *insn;
> +
> +      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +	{
> +	  /* Look for the split-stack fake jump instructions.  */
> +	  if (!JUMP_P (insn))
> +	    continue;
> +	  if (GET_CODE (PATTERN (insn)) != PARALLEL
> +	      || XVECLEN (PATTERN (insn), 0) != 2)
> +	    continue;
> +	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
> +	  if (GET_CODE (set) != SET)
> +	    continue;
> +	  rtx unspec = XEXP (set, 1);
> +	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
> +	    continue;
> +	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL)
> +	    continue;
> +	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
> +	  rtx function = XVECEXP (unspec, 0, 0);
> +	  rtx frame_size = XVECEXP (unspec, 0, 1);
> +	  rtx args_size = XVECEXP (unspec, 0, 2);
> +	  rtx pc_src = XEXP (set_pc, 1);
> +	  rtx call_done, cond = NULL_RTX;
> +	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
> +	    {
> +	      cond = XEXP (pc_src, 0);
> +	      call_done = XEXP (XEXP (pc_src, 1), 0);
> +	    }
> +	  else
> +	    call_done = XEXP (pc_src, 0);
> +	  s390_expand_split_stack_call (insn,
> +					call_done,
> +					function,
> +					frame_size,
> +					args_size,
> +					cond);
> +	}
> +    }
> +
>    /* Try to optimize prologue and epilogue further.  */
>    s390_optimize_prologue ();
> 
> @@ -14469,6 +14776,9 @@ s390_asm_file_end (void)
>  	     s390_vector_abi);
>  #endif
>    file_end_indicate_exec_stack ();
> +
> +  if (flag_split_stack)
> +    file_end_indicate_split_stack ();
>  }
> 
>  /* Return true if TYPE is a vector bool type.  */
> @@ -14724,6 +15034,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
>  #undef TARGET_SET_UP_BY_PROLOGUE
>  #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
> 
> +#undef TARGET_EXTRA_LIVE_ON_ENTRY
> +#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
> +
>  #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>    s390_use_by_pieces_infrastructure_p
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index 9b869d5..771f1cc 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -114,6 +114,9 @@
>     UNSPEC_SP_SET
>     UNSPEC_SP_TEST
> 
> +   ; Split stack support
> +   UNSPEC_STACK_CHECK
> +
>     ; Test Data Class (TDC)
>     UNSPEC_TDC_INSN
> 
> @@ -276,6 +279,11 @@
>     ; Set and get floating point control register
>     UNSPECV_SFPC
>     UNSPECV_EFPC
> +
> +   ; Split stack support
> +   UNSPECV_SPLIT_STACK_CALL
> +   UNSPECV_SPLIT_STACK_SIBCALL
> +   UNSPECV_SPLIT_STACK_MARKER
>    ])
> 
>  ;;
> @@ -10907,3 +10915,172 @@
>    "TARGET_Z13"
>    "lcbb\t%0,%1,%b2"
>    [(set_attr "op_type" "VRX")])
> +
> +; Handle -fsplit-stack.
> +
> +(define_expand "split_stack_prologue"
> +  [(const_int 0)]
> +  ""
> +{
> +  s390_expand_split_stack_prologue ();
> +  DONE;
> +})
> +
> +(define_expand "split_stack_call"
> +  [(match_operand 0 "" "")
> +   (match_operand 1 "bras_sym_operand" "X")
> +   (match_operand 2 "consttable_operand" "X")
> +   (match_operand 3 "consttable_operand" "X")]
> +  "TARGET_CPU_ZARCH"
> +{
> +  if (TARGET_64BIT)
> +    emit_jump_insn (gen_split_stack_call_di (operands[0],
> +					     operands[1],
> +					     operands[2],
> +					     operands[3]));
> +  else
> +    emit_jump_insn (gen_split_stack_call_si (operands[0],
> +					     operands[1],
> +					     operands[2],
> +					     operands[3]));
> +  DONE;
> +})
> +
> +(define_insn "split_stack_call_<mode>"
> +  [(set (pc) (label_ref (match_operand 0 "" "")))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
> +				    (match_operand 2 "consttable_operand" "X")
> +				    (match_operand 3 "consttable_operand" "X")]
> +				   UNSPECV_SPLIT_STACK_CALL))]
> +  "TARGET_CPU_ZARCH"
> +{
> +  gcc_unreachable ();
> +}
> +  [(set_attr "length" "12")])
> +
> +(define_expand "split_stack_cond_call"
> +  [(match_operand 0 "" "")
> +   (match_operand 1 "bras_sym_operand" "X")
> +   (match_operand 2 "consttable_operand" "X")
> +   (match_operand 3 "consttable_operand" "X")
> +   (match_operand 4 "" "")]
> +  "TARGET_CPU_ZARCH"
> +{
> +  if (TARGET_64BIT)
> +    emit_jump_insn (gen_split_stack_cond_call_di (operands[0],
> +						  operands[1],
> +						  operands[2],
> +						  operands[3],
> +						  operands[4]));
> +  else
> +    emit_jump_insn (gen_split_stack_cond_call_si (operands[0],
> +						  operands[1],
> +						  operands[2],
> +						  operands[3],
> +						  operands[4]));
> +  DONE;
> +})
> +
> +(define_insn "split_stack_cond_call_<mode>"
> +  [(set (pc)
> +	(if_then_else
> +	  (match_operand 4 "" "")
> +	  (label_ref (match_operand 0 "" ""))
> +	  (pc)))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
> +				    (match_operand 2 "consttable_operand" "X")
> +				    (match_operand 3 "consttable_operand" "X")]
> +				   UNSPECV_SPLIT_STACK_CALL))]
> +  "TARGET_CPU_ZARCH"
> +{
> +  gcc_unreachable ();
> +}
> +  [(set_attr "length" "12")])
> +
> +;; If there are operand 0 bytes available on the stack, jump to
> +;; operand 1.
> +
> +(define_expand "split_stack_space_check"
> +  [(set (pc) (if_then_else
> +	      (ltu (minus (reg 15)
> +			  (match_operand 0 "register_operand"))
> +		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
> +	      (label_ref (match_operand 1))
> +	      (pc)))]
> +  ""
> +{
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = TARGET_64BIT ? 0x38 : 0x20;
> +  rtx tp = s390_get_thread_pointer ();
> +  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
> +  rtx reg = gen_reg_rtx (Pmode);
> +  rtx cc;
> +  if (TARGET_64BIT)
> +    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
> +  else
> +    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
> +  cc = s390_emit_compare (GT, reg, guard);
> +  s390_emit_jump (operands[1], cc);
> +
> +  DONE;
> +})
> +
> +;; A jg with minimal fuss for use in split stack prologue.
> +
> +(define_expand "split_stack_sibcall"
> +  [(match_operand 0 "bras_sym_operand" "X")
> +   (match_operand 1 "" "")]
> +  "TARGET_CPU_ZARCH"
> +{
> +  if (TARGET_64BIT)
> +    emit_jump_insn (gen_split_stack_sibcall_di (operands[0], operands[1]));
> +  else
> +    emit_jump_insn (gen_split_stack_sibcall_si (operands[0], operands[1]));
> +  DONE;
> +})
> +
> +(define_insn "split_stack_sibcall_<mode>"
> +  [(set (pc) (label_ref (match_operand 1 "" "")))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
> +				   UNSPECV_SPLIT_STACK_SIBCALL))]
> +  "TARGET_CPU_ZARCH"
> +  "jg\t%0"
> +  [(set_attr "op_type" "RIL")
> +   (set_attr "type"  "branch")])
> +
> +;; Also a conditional one.
> +
> +(define_expand "split_stack_cond_sibcall"
> +  [(match_operand 0 "bras_sym_operand" "X")
> +   (match_operand 1 "" "")
> +   (match_operand 2 "" "")]
> +  "TARGET_CPU_ZARCH"
> +{
> +  if (TARGET_64BIT)
> +    emit_jump_insn (gen_split_stack_cond_sibcall_di (operands[0], operands[1], operands[2]));
> +  else
> +    emit_jump_insn (gen_split_stack_cond_sibcall_si (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_insn "split_stack_cond_sibcall_<mode>"
> +  [(set (pc)
> +	(if_then_else
> +	  (match_operand 1 "" "")
> +	  (label_ref (match_operand 2 "" ""))
> +	  (pc)))
> +   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
> +				   UNSPECV_SPLIT_STACK_SIBCALL))]
> +  "TARGET_CPU_ZARCH"
> +  "jg%C1\t%0"
> +  [(set_attr "op_type" "RIL")
> +   (set_attr "type"  "branch")])
> +
> +;; An unusual nop instruction used to mark functions with no stack frames
> +;; as split-stack aware.
> +
> +(define_insn "split_stack_marker"
> +  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
> +  ""
> +  "nopr\t%%r15"
> +  [(set_attr "op_type" "RR")])
> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
> index 49c7929..3900ab1 100644
> --- a/libgcc/ChangeLog
> +++ b/libgcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
> +
> +	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> +	* config/s390/morestack.S: New file.
> +	* config/s390/t-stack-s390: New file.
> +	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> +
>  2016-01-25  Jakub Jelinek  <jakub@redhat.com>
> 
>  	PR target/69444
> diff --git a/libgcc/config.host b/libgcc/config.host
> index d8efd82..2be5f7e 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1114,11 +1114,11 @@ rx-*-elf)
>  	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
>  	;;
>  s390-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
>  	md_unwind_header=s390/linux-unwind.h
>  	;;
>  s390x-*-linux*)
> -	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
> +	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
>  	if test "${host_address}" = 32; then
>  	   tmake_file="${tmake_file} s390/32/t-floattodi"
>  	fi
> diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
> new file mode 100644
> index 0000000..141dead
> --- /dev/null
> +++ b/libgcc/config/s390/morestack.S
> @@ -0,0 +1,609 @@
> +# s390 support for -fsplit-stack.
> +# Copyright (C) 2015 Free Software Foundation, Inc.
> +# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +
> +# Under Section 7 of GPL version 3, you are granted additional
> +# permissions described in the GCC Runtime Library Exception, version
> +# 3.1, as published by the Free Software Foundation.
> +
> +# You should have received a copy of the GNU General Public License and
> +# a copy of the GCC Runtime Library Exception along with this program;
> +# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +# Excess space needed to call ld.so resolver for lazy plt
> +# resolution.  Go uses sigaltstack so this doesn't need to
> +# also cover signal frame size.
> +#define BACKOFF 0x1000
> +
> +# The __morestack function.
> +
> +	.global	__morestack
> +	.hidden	__morestack
> +
> +	.type	__morestack,@function
> +
> +__morestack:
> +.LFB1:
> +	.cfi_startproc
> +
> +
> +#ifndef __s390x__
> +
> +
> +# The 31-bit __morestack function.
> +
> +	# We use a cleanup to restore the stack guard if an exception
> +	# is thrown through this code.
> +#ifndef __PIC__
> +	.cfi_personality 0,__gcc_personality_v0
> +	.cfi_lsda 0,.LLSDA1
> +#else
> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +	.cfi_lsda 0x1b,.LLSDA1
> +#endif
> +
> +	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
> +	.cfi_offset %r6, -0x48
> +	.cfi_offset %r7, -0x44
> +	.cfi_offset %r8, -0x40
> +	.cfi_offset %r9, -0x3c
> +	.cfi_offset %r10, -0x38
> +	.cfi_offset %r11, -0x34
> +	.cfi_offset %r12, -0x30
> +	.cfi_offset %r13, -0x2c
> +	.cfi_offset %r14, -0x28
> +	.cfi_offset %r15, -0x24
> +	lr	%r11, %r15		# Make frame pointer for vararg.
> +	.cfi_def_cfa_register %r11
> +	ahi	%r15, -0x60		# 0x60 for standard frame.
> +	st	%r11, 0(%r15)		# Save back chain.
> +	lr	%r8, %r0		# Save %r0 (static chain).
> +	lr	%r10, %r1		# Save %r1 (address of parameter block).
> +
> +	l	%r7, 0(%r10)		# Required frame size to %r7
> +	ear	%r1, %a0		# Extract thread pointer.
> +	l	%r1, 0x20(%r1)		# Get stack boundary
> +	ar	%r1, %r7		# Stack boundary + frame size
> +	a	%r1, 4(%r10)		# + stack param size
> +	clr	%r1, %r15		# Compare with current stack pointer
> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
> +	# as a local variable.  Not needed here, but done to be consistent with
> +	# the below use.
> +	ahi	%r7, BACKOFF		# Bump requested size a bit.
> +	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
> +	la	%r2, 0x40(%r11)		# Pass its address as parameter.
> +	la	%r3, 0x60(%r11)		# Caller's stack parameters.
> +	l	%r4, 4(%r10)		# Size of stack parameters.
> +	brasl	%r14, __generic_morestack
> +
> +	lr	%r15, %r2		# Switch to the new stack.
> +	ahi	%r15, -0x60		# Make a stack frame on it.
> +	st	%r11, 0(%r15)		# Save back chain.
> +
> +	s	%r2, 0x40(%r11)		# The end of stack space.
> +	ahi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +.LEHB0:
> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lr	%r0, %r8		# Static chain.
> +	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
> +
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a	%r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0x60(%r11)
> +
> +	# State of registers:
> +	# %r0: Static chain from entry.
> +	# %r1: Vararg pointer.
> +	# %r2-%r6: Parameters from entry.
> +	# %r7-%r10: Indeterminate.
> +	# %r11: Frame pointer (%r15 from entry).
> +	# %r12-%r13: Indeterminate.
> +	# %r14: Return address.
> +	# %r15: Stack pointer.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We need a stack slot now, but have no good way to get it - the frame
> +	# on new stack had to be exactly 0x60 bytes, or stack parameters would
> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
> +	# save actual fprs).
> +	la	%r2, 0x40(%r11)
> +	brasl	%r14, __generic_releasestack
> +
> +	s	%r2, 0x40(%r11)		# Subtract available space.
> +	ahi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +.LEHE0:
> +	st	%r2, 0x20(%r1)	# Save the new stack boundary.
> +
> +	# We need to restore the old stack pointer before unblocking signals.
> +	# We also need 0x60 bytes for a stack frame.  Since we had a stack
> +	# frame at this place before the stack switch, there's no need to
> +	# write the back chain again.
> +	lr	%r15, %r11
> +	ahi	%r15, -0x60
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# Executed if no new stack allocation is needed.
> +
> +.Lnoalloc:
> +	.cfi_restore_state
> +	# We may need to copy stack parameters.
> +	l	%r9, 0x4(%r10)		# Load stack parameter size.
> +	ltr	%r9, %r9		# And check if it's 0.
> +	je	.Lnostackparm		# Skip the copy if not needed.
> +	sr	%r15, %r9		# Make space on the stack.
> +	la	%r8, 0x60(%r15)		# Destination.
> +	la	%r12, 0x60(%r11)	# Source.
> +	lr	%r13, %r9		# Source size.
> +.Lcopy:
> +	mvcle	%r8, %r12, 0		# Copy.
> +	jo	.Lcopy
> +
> +.Lnostackparm:
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	a	%r10, 0x8(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0x60(%r11)
> +
> +	# OK, no stack allocation needed.  We still follow the protocol and
> +	# call our caller - it doesn't cost much and makes sure vararg works.
> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# This is the cleanup code called by the stack unwinder when unwinding
> +# through the code between .LEHB0 and .LEHE0 above.
> +
> +.L1:
> +	.cfi_restore_state
> +	lr	%r2, %r11		# Stack pointer after resume.
> +	brasl	%r14, __generic_findstack
> +	lr	%r3, %r11		# Get the stack pointer.
> +	sr	%r3, %r2		# Subtract available space.
> +	ahi	%r3, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0		# Extract thread pointer.
> +	st	%r3, 0x20(%r1)	# Save the new stack boundary.
> +
> +	lr	%r2, %r6		# Exception header.
> +#ifdef __PIC__
> +	brasl	%r14, _Unwind_Resume@PLT
> +#else
> +	brasl	%r14, _Unwind_Resume
> +#endif
> +
> +#else /* defined(__s390x__) */
> +
> +
> +# The 64-bit __morestack function.
> +
> +	# We use a cleanup to restore the stack guard if an exception
> +	# is thrown through this code.
> +#ifndef __PIC__
> +	.cfi_personality 0x3,__gcc_personality_v0
> +	.cfi_lsda 0x3,.LLSDA1
> +#else
> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +	.cfi_lsda 0x1b,.LLSDA1
> +#endif
> +
> +	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
> +	.cfi_offset %r6, -0x70
> +	.cfi_offset %r7, -0x68
> +	.cfi_offset %r8, -0x60
> +	.cfi_offset %r9, -0x58
> +	.cfi_offset %r10, -0x50
> +	.cfi_offset %r11, -0x48
> +	.cfi_offset %r12, -0x40
> +	.cfi_offset %r13, -0x38
> +	.cfi_offset %r14, -0x30
> +	.cfi_offset %r15, -0x28
> +	lgr	%r11, %r15		# Make frame pointer for vararg.
> +	.cfi_def_cfa_register %r11
> +	aghi	%r15, -0xa0		# 0xa0 for standard frame.
> +	stg	%r11, 0(%r15)		# Save back chain.
> +	lgr	%r8, %r0		# Save %r0 (static chain).
> +	lgr	%r10, %r1		# Save %r1 (address of parameter block).
> +
> +	lg	%r7, 0(%r10)		# Required frame size to %r7
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +	lg	%r1, 0x38(%r1)		# Get stack boundary
> +	agr	%r1, %r7		# Stack boundary + frame size
> +	ag	%r1, 8(%r10)		# + stack param size
> +	clgr	%r1, %r15		# Compare with current stack pointer
> +	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We abuse one of caller's fpr save slots (which we don't use for fprs)
> +	# as a local variable.  Not needed here, but done to be consistent with
> +	# the below use.
> +	aghi	%r7, BACKOFF		# Bump requested size a bit.
> +	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
> +	la	%r2, 0x80(%r11)		# Pass its address as parameter.
> +	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
> +	lg	%r4, 8(%r10)		# Size of stack parameters.
> +	brasl	%r14, __generic_morestack
> +
> +	lgr	%r15, %r2		# Switch to the new stack.
> +	aghi	%r15, -0xa0		# Make a stack frame on it.
> +	stg	%r11, 0(%r15)		# Save back chain.
> +
> +	sg	%r2, 0x80(%r11)		# The end of stack space.
> +	aghi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +.LEHB0:
> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lgr	%r0, %r8		# Static chain.
> +	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
> +
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	ag	%r10, 0x10(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0xa0(%r11)
> +
> +	# State of registers:
> +	# %r0: Static chain from entry.
> +	# %r1: Vararg pointer.
> +	# %r2-%r6: Parameters from entry.
> +	# %r7-%r10: Indeterminate.
> +	# %r11: Frame pointer (%r15 from entry).
> +	# %r12-%r13: Indeterminate.
> +	# %r14: Return address.
> +	# %r15: Stack pointer.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	stg	%r2, 0x10(%r11)		# Save return register.
> +
> +	brasl	%r14, __morestack_block_signals
> +
> +	# We need a stack slot now, but have no good way to get it - the frame
> +	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
> +	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
> +	# save actual fprs).
> +	la	%r2, 0x80(%r11)
> +	brasl	%r14, __generic_releasestack
> +
> +	sg	%r2, 0x80(%r11)		# Subtract available space.
> +	aghi	%r2, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +.LEHE0:
> +	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
> +
> +	# We need to restore the old stack pointer before unblocking signals.
> +	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
> +	# frame at this place before the stack switch, there's no need to
> +	# write the back chain again.
> +	lgr	%r15, %r11
> +	aghi	%r15, -0xa0
> +
> +	brasl	%r14, __morestack_unblock_signals
> +
> +	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# Executed if no new stack allocation is needed.
> +
> +.Lnoalloc:
> +	.cfi_restore_state
> +	# We may need to copy stack parameters.
> +	lg	%r9, 0x8(%r10)		# Load stack parameter size.
> +	ltgr	%r9, %r9		# Check if it's 0.
> +	je	.Lnostackparm		# Skip the copy if not needed.
> +	sgr	%r15, %r9		# Make space on the stack.
> +	la	%r8, 0xa0(%r15)		# Destination.
> +	la	%r12, 0xa0(%r11)	# Source.
> +	lgr	%r13, %r9		# Source size.
> +.Lcopy:
> +	mvcle	%r8, %r12, 0		# Copy.
> +	jo	.Lcopy
> +
> +.Lnostackparm:
> +	# Third parameter is address of function meat - address of parameter
> +	# block.
> +	ag	%r10, 0x10(%r10)
> +
> +	# Leave vararg pointer in %r1, in case function uses it
> +	la	%r1, 0xa0(%r11)
> +
> +	# OK, no stack allocation needed.  We still follow the protocol and
> +	# call our caller - it doesn't cost much and makes sure vararg works.
> +	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
> +	basr	%r14, %r10		# Call our caller.
> +
> +	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
> +	.cfi_remember_state
> +	.cfi_restore %r15
> +	.cfi_restore %r14
> +	.cfi_restore %r13
> +	.cfi_restore %r12
> +	.cfi_restore %r11
> +	.cfi_restore %r10
> +	.cfi_restore %r9
> +	.cfi_restore %r8
> +	.cfi_restore %r7
> +	.cfi_restore %r6
> +	.cfi_def_cfa_register %r15
> +	br	%r14			# Return to caller's caller.
> +
> +# This is the cleanup code called by the stack unwinder when unwinding
> +# through the code between .LEHB0 and .LEHE0 above.
> +
> +.L1:
> +	.cfi_restore_state
> +	lgr	%r2, %r11		# Stack pointer after resume.
> +	brasl	%r14, __generic_findstack
> +	lgr	%r3, %r11		# Get the stack pointer.
> +	sgr	%r3, %r2		# Subtract available space.
> +	aghi	%r3, BACKOFF		# Back off a bit.
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1		# Extract thread pointer.
> +	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
> +
> +	lgr	%r2, %r6		# Exception header.
> +#ifdef __PIC__
> +	brasl	%r14, _Unwind_Resume@PLT
> +#else
> +	brasl	%r14, _Unwind_Resume
> +#endif
> +
> +#endif /* defined(__s390x__) */
> +
> +	.cfi_endproc
> +	.size	__morestack, . - __morestack
> +
> +
> +# The exception table.  This tells the personality routine to execute
> +# the exception handler.
> +
> +	.section	.gcc_except_table,"a",@progbits
> +	.align	4
> +.LLSDA1:
> +	.byte	0xff	# @LPStart format (omit)
> +	.byte	0xff	# @TType format (omit)
> +	.byte	0x1	# call-site format (uleb128)
> +	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
> +.LLSDACSB1:
> +	.uleb128 .LEHB0-.LFB1	# region 0 start
> +	.uleb128 .LEHE0-.LEHB0	# length
> +	.uleb128 .L1-.LFB1	# landing pad
> +	.uleb128 0		# action
> +.LLSDACSE1:
> +
> +
> +	.global __gcc_personality_v0
> +#ifdef __PIC__
> +	# Build a position independent reference to the basic
> +	# personality function.
> +	.hidden DW.ref.__gcc_personality_v0
> +	.weak   DW.ref.__gcc_personality_v0
> +	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
> +	.type	DW.ref.__gcc_personality_v0, @object
> +DW.ref.__gcc_personality_v0:
> +#ifndef __LP64__
> +	.align 4
> +	.size	DW.ref.__gcc_personality_v0, 4
> +	.long	__gcc_personality_v0
> +#else
> +	.align 8
> +	.size	DW.ref.__gcc_personality_v0, 8
> +	.quad	__gcc_personality_v0
> +#endif
> +#endif
> +
> +
> +
> +# Initialize the stack test value when the program starts or when a
> +# new thread starts.  We don't know how large the main stack is, so we
> +# guess conservatively.  We might be able to use getrlimit here.
> +
> +	.text
> +	.global	__stack_split_initialize
> +	.hidden	__stack_split_initialize
> +
> +	.type	__stack_split_initialize, @function
> +
> +__stack_split_initialize:
> +
> +#ifndef __s390x__
> +
> +	ear	%r1, %a0
> +	lr	%r0, %r15
> +	ahi	%r0, -0x4000	# We should have at least 16K.
> +	st	%r0, 0x20(%r1)
> +
> +	lr	%r2, %r15
> +	lhi	%r3, 0x4000
> +#ifdef __PIC__
> +	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
> +#else
> +	jg	__generic_morestack_set_initial_sp	# Tail call
> +#endif
> +
> +#else /* defined(__s390x__) */
> +
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	lgr	%r0, %r15
> +	aghi	%r0, -0x4000	# We should have at least 16K.
> +	stg	%r0, 0x38(%r1)
> +
> +	lgr	%r2, %r15
> +	lghi	%r3, 0x4000
> +#ifdef __PIC__
> +	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
> +#else
> +	jg	__generic_morestack_set_initial_sp	# Tail call
> +#endif
> +
> +#endif /* defined(__s390x__) */
> +
> +	.size	__stack_split_initialize, . - __stack_split_initialize
> +
> +# Routines to get and set the guard, for __splitstack_getcontext,
> +# __splitstack_setcontext, and __splitstack_makecontext.
> +
> +# void *__morestack_get_guard (void) returns the current stack guard.
> +	.text
> +	.global	__morestack_get_guard
> +	.hidden	__morestack_get_guard
> +
> +	.type	__morestack_get_guard,@function
> +
> +__morestack_get_guard:
> +
> +#ifndef __s390x__
> +	ear	%r1, %a0
> +	l	%r2, 0x20(%r1)
> +#else
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	lg	%r2, 0x38(%r1)
> +#endif
> +	br %r14
> +
> +	.size	__morestack_get_guard, . - __morestack_get_guard
> +
> +# void __morestack_set_guard (void *) sets the stack guard.
> +	.global	__morestack_set_guard
> +	.hidden	__morestack_set_guard
> +
> +	.type	__morestack_set_guard,@function
> +
> +__morestack_set_guard:
> +
> +#ifndef __s390x__
> +	ear	%r1, %a0
> +	st	%r2, 0x20(%r1)
> +#else
> +	ear	%r1, %a0
> +	sllg	%r1, %r1, 32
> +	ear	%r1, %a1
> +	stg	%r2, 0x38(%r1)
> +#endif
> +	br	%r14
> +
> +	.size	__morestack_set_guard, . - __morestack_set_guard
> +
> +# void *__morestack_make_guard (void *, size_t) returns the stack
> +# guard value for a stack.
> +	.global	__morestack_make_guard
> +	.hidden	__morestack_make_guard
> +
> +	.type	__morestack_make_guard,@function
> +
> +__morestack_make_guard:
> +
> +#ifndef __s390x__
> +	sr	%r2, %r3
> +	ahi	%r2, BACKOFF
> +#else
> +	sgr	%r2, %r3
> +	aghi	%r2, BACKOFF
> +#endif
> +	br	%r14
> +
> +	.size	__morestack_make_guard, . - __morestack_make_guard
> +
> +# Make __stack_split_initialize a high priority constructor.
> +
> +	.section .ctors.65535,"aw",@progbits
> +
> +#ifndef __LP64__
> +	.align	4
> +	.long	__stack_split_initialize
> +	.long	__morestack_load_mmap
> +#else
> +	.align	8
> +	.quad	__stack_split_initialize
> +	.quad	__morestack_load_mmap
> +#endif
> +
> +	.section	.note.GNU-stack,"",@progbits
> +	.section	.note.GNU-split-stack,"",@progbits
> +	.section	.note.GNU-no-split-stack,"",@progbits
> diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
> new file mode 100644
> index 0000000..4c959b0
> --- /dev/null
> +++ b/libgcc/config/s390/t-stack-s390
> @@ -0,0 +1,2 @@
> +# Makefile fragment to support -fsplit-stack for s390.
> +LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
> index 89765d4..b8eec4e 100644
> --- a/libgcc/generic-morestack.c
> +++ b/libgcc/generic-morestack.c
> @@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
>  #elif defined (__i386__)
>        nsp -= 6 * sizeof (void *);
>  #elif defined __powerpc64__
> +#elif defined __s390x__
> +      nsp -= 2 * 160;
> +#elif defined __s390__
> +      nsp -= 2 * 96;
>  #else
>  #error "unrecognized target"
>  #endif
> 
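
For anyone following the assembly above, the guard arithmetic can be
modelled in C roughly as follows.  This is only a sketch under stated
assumptions: the function names make_guard, morestack_needs_alloc and
check_model are hypothetical and do not appear in the patch, and the
model uses 64-bit unsigned values for both the 31-bit and 64-bit
variants.  It mirrors the "sr/ahi" sequence in __morestack_make_guard
and the "clr/jle .Lnoalloc" check at the top of __morestack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the BACKOFF macro in morestack.S.  */
#define BACKOFF 0x1000

/* Model of __morestack_make_guard: stack - size + BACKOFF.  The same
   arithmetic is applied in __morestack after switching stacks.  */
static uint64_t
make_guard (uint64_t stack, uint64_t size)
{
  return stack - size + BACKOFF;
}

/* Model of the check at the top of __morestack.  The assembly jumps to
   .Lnoalloc when guard + frame size + stack parameter size <= sp, so a
   new segment is needed only when the sum is strictly greater.  */
static bool
morestack_needs_alloc (uint64_t sp, uint64_t guard,
		       uint64_t frame_size, uint64_t param_size)
{
  return guard + frame_size + param_size > sp;
}

/* Hypothetical self-check with made-up addresses.  */
static void
check_model (void)
{
  uint64_t guard = make_guard (0x100000, 0x10000);

  assert (guard == 0xf1000);
  /* 0xf1000 + 0x800 + 0x100 = 0xf1900 <= 0xf2000: enough room.  */
  assert (!morestack_needs_alloc (0xf2000, guard, 0x800, 0x100));
  /* 0xf1900 > 0xf1800: a new stack segment would be allocated.  */
  assert (morestack_needs_alloc (0xf1800, guard, 0x800, 0x100));
}
```

The addresses in check_model are invented for illustration; only the
relationships between them matter.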

* [PATCH] s390: Add -fsplit-stack support
  2016-02-02 15:19               ` Andreas Krebbel
@ 2016-02-02 15:31                 ` Marcin Kościelnicki
  2016-02-02 18:34                   ` Ulrich Weigand
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-02 15:31 UTC (permalink / raw)
  To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_expand_split_stack_call): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_reorg): Lower the split-stack pseudo-insns.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_call): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call): New expand.
	(split_stack_cond_call_*): New insn.
	(split_stack_space_check): New expand.
	(split_stack_sibcall): New expand.
	(split_stack_sibcall_*): New insn.
	(split_stack_cond_sibcall): New expand.
	(split_stack_cond_sibcall_*): New insn.
	(split_stack_marker): New insn.
---
Here we go.  I've also removed the "see below", since I don't really
see anything below...
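
For reviewers, the decision made by the conditional path of
s390_expand_split_stack_prologue can be summarised by the C model
below.  It is a sketch, not the implementation: align_args_size and
prologue_calls_morestack are hypothetical names, the comparison is done
on plain unsigned integers rather than RTL, and the unconditional-call
path (frame sizes that do not fit an add immediate) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Slack below the guard, same value as in the patch.  */
#define SPLIT_STACK_AVAILABLE 1024

/* Round the stack-argument size up to a multiple of 8, as the prologue
   does before passing it to __morestack.  Negative sizes become 0.  */
static int64_t
align_args_size (int64_t args_size)
{
  int64_t aligned = args_size >= 0 ? ((args_size + 7) & ~(int64_t) 7) : 0;

  assert (aligned % 8 == 0);
  return aligned;
}

/* Model of the conditional path: returns true if the emitted prologue
   would branch to __morestack.  */
static bool
prologue_calls_morestack (uint64_t sp, uint64_t guard, int64_t frame_size)
{
  /* No frame: only the nopr marker is emitted, no check at all.  */
  if (frame_size == 0)
    return false;

  /* Small frames are covered by the slack, so the guard is compared
     directly; larger frames add the frame size to the guard first.  */
  if (frame_size > SPLIT_STACK_AVAILABLE)
    guard += (uint64_t) frame_size;

  return sp < guard;
}
```

For example, with a guard of 0x8000 and a 0x2000-byte frame, a stack
pointer of 0x9000 leads to a __morestack call, while 0xa000 does not.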

 gcc/ChangeLog                        |  37 +++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 323 ++++++++++++++++++-
 gcc/config/s390/s390.md              | 177 ++++++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1171 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a2cec8..af86079 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,40 @@
+2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_expand_split_stack_call): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_reorg): Lower the split-stack pseudo-insns.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL): New unspec.
+	(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
+	(UNSPECV_SPLIT_STACK_MARKER): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_call): New expand.
+	(split_stack_call_*): New insn.
+	(split_stack_cond_call): New expand.
+	(split_stack_cond_call_*): New insn.
+	(split_stack_space_check): New expand.
+	(split_stack_sibcall): New expand.
+	(split_stack_sibcall_*): New insn.
+	(split_stack_cond_sibcall): New expand.
+	(split_stack_cond_sibcall_*): New insn.
+	(split_stack_marker): New insn.
+
 2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
     }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			   struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
 
@@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 #undef TARGET_OPTION_INIT_STRUCT
 #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 633bc1e..09032c9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
 extern void s390_emit_prologue (void);
 extern void s390_emit_epilogue (bool);
+extern void s390_expand_split_stack_prologue (void);
 extern bool s390_can_use_simple_return_insn (void);
 extern bool s390_can_use_return_insn (void);
 extern void s390_function_profiler (FILE *, int);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3be64de..6c1cb1e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -426,6 +426,13 @@ struct GTY(()) machine_function
   /* True if the current function may contain a tbegin clobbering
      FPRs.  */
   bool tbegin_p;
+
+  /* For -fsplit-stack support: A stack local which holds a pointer to
+     the stack arguments for a function with a variable number of
+     arguments.  This is set at the start of the function and is used
+     to initialize the overflow_arg_area field of the va_list
+     structure.  */
+  rtx split_stack_varargs_pointer;
 };
 
 /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
@@ -9316,9 +9323,13 @@ s390_register_info ()
 	  cfun_frame_layout.high_fprs++;
       }
 
-  if (flag_pic)
-    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
-      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
+  /* Register 12 is used for GOT address, but also as temp in prologue
+     for split-stack stdarg functions (unless r14 is available).  */
+  clobbered_regs[12]
+    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
+	|| (flag_split_stack && cfun->stdarg
+	    && (crtl->is_leaf || TARGET_TPF_PROFILING
+		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
 
   clobbered_regs[BASE_REGNUM]
     |= (cfun->machine->base_reg
@@ -10440,12 +10451,14 @@ s390_emit_prologue (void)
   int next_fpr = 0;
 
   /* Choose best register to use for temp use within prologue.
-     See below for why TPF must use the register 1.  */
+     TPF with profiling must avoid the register 14.  */
 
   if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
       && !crtl->is_leaf
       && !TARGET_TPF_PROFILING)
     temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
+  else if (flag_split_stack && cfun->stdarg)
+    temp_reg = gen_rtx_REG (Pmode, 12);
   else
     temp_reg = gen_rtx_REG (Pmode, 1);
 
@@ -10939,6 +10952,234 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
     SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* When using -fsplit-stack, the allocation routines set a field in
+   the TCB to the bottom of the stack plus this much space, measured
+   in bytes.  */
+
+#define SPLIT_STACK_AVAILABLE 1024
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+
+void
+s390_expand_split_stack_prologue (void)
+{
+  rtx r1, guard, cc;
+  rtx_insn *insn;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  /* Pointer size in bytes.  */
+  /* Frame size and argument size - the two parameters to __morestack.  */
+  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
+  /* Align argument size to 8 bytes - simplifies __morestack code.  */
+  HOST_WIDE_INT args_size = crtl->args.size >= 0
+			    ? ((crtl->args.size + 7) & ~7)
+			    : 0;
+  /* Label to be called by __morestack.  */
+  rtx_code_label *call_done = NULL;
+  rtx tmp;
+
+  gcc_assert (flag_split_stack && reload_completed);
+  if (!TARGET_CPU_ZARCH)
+    {
+      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
+      return;
+    }
+
+  r1 = gen_rtx_REG (Pmode, 1);
+
+  /* If no stack frame will be allocated, don't do anything.  */
+  if (!frame_size)
+    {
+      /* But emit a marker that will let linker and indirect function
+	 calls recognise this function as split-stack aware.  */
+      emit_insn (gen_split_stack_marker ());
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, just use r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+
+	}
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))
+    {
+      /* If frame_size will fit in an add instruction, do a stack space
+	 check, and only call __morestack if there's not enough space.  */
+
+      /* Get thread pointer.  r1 is the only register we can always destroy - r0
+	 could contain a static chain (and cannot be used to address memory
+	 anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
+      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+      /* Aim at __private_ss.  */
+      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
+
+      /* If less than 1 KiB is used, skip the addition and compare directly
+	 with __private_ss.  */
+      if (frame_size > SPLIT_STACK_AVAILABLE)
+	{
+	  emit_move_insn (r1, guard);
+	  if (TARGET_64BIT)
+	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
+	  else
+	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
+	  guard = r1;
+	}
+
+      /* Compare the (maybe adjusted) guard with the stack pointer.  */
+      cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
+
+      call_done = gen_label_rtx ();
+
+      tmp = gen_split_stack_cond_call (call_done,
+				       morestack_ref,
+				       GEN_INT (frame_size),
+				       GEN_INT (args_size),
+				       cc);
+
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+
+      /* Mark the jump as very unlikely to be taken.  */
+      add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
+
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, and __morestack was not called, just use
+	     r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+	}
+    }
+  else
+    {
+      call_done = gen_label_rtx ();
+
+      /* Now, we need to call __morestack.  It has very special calling
+	 conventions: it preserves param/return/static chain registers for
+	 calling main function body, and looks for its own parameters
+	 at %r1 (after aligning it up to a 4 byte boundary for 31-bit mode).  */
+      tmp = gen_split_stack_call (call_done,
+				  morestack_ref,
+				  GEN_INT (frame_size),
+				  GEN_INT (args_size));
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      emit_barrier ();
+    }
+
+  /* __morestack will call us here.  */
+
+  emit_label (call_done);
+  LABEL_NUSES (call_done) = 1;
+}
+
+/* Generates split-stack call sequence, along with its parameter block.  */
+
+static void
+s390_expand_split_stack_call (rtx_insn *orig_insn,
+			      rtx call_done,
+			      rtx function,
+			      rtx frame_size,
+			      rtx args_size,
+			      rtx cond)
+{
+  rtx_insn *insn = orig_insn;
+  rtx parmbase = gen_label_rtx ();
+  rtx r1 = gen_rtx_REG (Pmode, 1);
+  rtx tmp, tmp2;
+
+  /* %r1 = litbase.  */
+  insn = emit_insn_after (gen_main_base_64 (r1, parmbase), insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* jg<cond> __morestack.  */
+  if (cond == NULL)
+    {
+      tmp = gen_split_stack_sibcall (function, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  else
+    {
+      gcc_assert (s390_comparison (cond, VOIDmode));
+      tmp = gen_split_stack_cond_sibcall (function, cond, call_done);
+      insn = emit_jump_insn_after (tmp, insn);
+    }
+  JUMP_LABEL (insn) = call_done;
+  LABEL_NUSES (call_done)++;
+
+  /* Go to .rodata.  */
+  insn = emit_insn_after (gen_pool_section_start (), insn);
+
+  /* Now, we'll emit parameters to __morestack.  First, align to pointer size
+     (this mirrors the alignment done in __morestack - don't touch it).  */
+  insn = emit_insn_after (gen_pool_align (GEN_INT (UNITS_PER_LONG)), insn);
+
+  insn = emit_label_after (parmbase, insn);
+
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, frame_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Second parameter is size of the arguments passed on stack that
+     __morestack has to copy to the new stack (does not include varargs).  */
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, args_size),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+
+  /* Third parameter is offset between start of the parameter block
+     and function body to be called by __morestack.  */
+  tmp = gen_rtx_LABEL_REF (Pmode, parmbase);
+  tmp2 = gen_rtx_LABEL_REF (Pmode, call_done);
+  tmp = gen_rtx_CONST (Pmode,
+		       gen_rtx_MINUS (Pmode, tmp2, tmp));
+  tmp = gen_rtx_UNSPEC_VOLATILE (Pmode,
+				 gen_rtvec (1, tmp),
+				 UNSPECV_POOL_ENTRY);
+  insn = emit_insn_after (tmp, insn);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parmbase);
+  LABEL_NUSES (parmbase)++;
+
+  /* Return from .rodata.  */
+  insn = emit_insn_after (gen_pool_section_end (), insn);
+
+  delete_insn (orig_insn);
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+s390_live_on_entry (bitmap regs)
+{
+  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      gcc_assert (flag_split_stack);
+      bitmap_set_bit (regs, 1);
+    }
+}
+
 /* Return true if the function can use simple_return to return outside
    of a shrink-wrapped region.  At present shrink-wrapping is supported
    in all cases.  */
@@ -11541,6 +11782,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
     }
 
+  if (flag_split_stack
+     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
+         == NULL)
+     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+    {
+      rtx reg;
+      rtx_insn *seq;
+
+      reg = gen_reg_rtx (Pmode);
+      cfun->machine->split_stack_varargs_pointer = reg;
+
+      start_sequence ();
+      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
+      seq = get_insns ();
+      end_sequence ();
+
+      push_topmost_sequence ();
+      emit_insn_after (seq, entry_of_function ());
+      pop_topmost_sequence ();
+    }
+
   /* Find the overflow area.
      FIXME: This currently is too pessimistic when the vector ABI is
      enabled.  In that case we *always* set up the overflow area
@@ -11549,7 +11811,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
       || TARGET_VX_ABI)
     {
-      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+        t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      else
+        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
 
       off = INTVAL (crtl->args.arg_offset_rtx);
       off = off < 0 ? 0 : off;
@@ -13158,6 +13423,48 @@ s390_reorg (void)
 	}
     }
 
+  if (flag_split_stack)
+    {
+      rtx_insn *insn;
+
+      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	{
+	  /* Look for the split-stack fake jump instructions.  */
+	  if (!JUMP_P (insn))
+	    continue;
+	  if (GET_CODE (PATTERN (insn)) != PARALLEL
+	      || XVECLEN (PATTERN (insn), 0) != 2)
+	    continue;
+	  rtx set = XVECEXP (PATTERN (insn), 0, 1);
+	  if (GET_CODE (set) != SET)
+	    continue;
+	  rtx unspec = XEXP (set, 1);
+	  if (GET_CODE (unspec) != UNSPEC_VOLATILE)
+	    continue;
+	  if (XINT (unspec, 1) != UNSPECV_SPLIT_STACK_CALL)
+	    continue;
+	  rtx set_pc = XVECEXP (PATTERN (insn), 0, 0);
+	  rtx function = XVECEXP (unspec, 0, 0);
+	  rtx frame_size = XVECEXP (unspec, 0, 1);
+	  rtx args_size = XVECEXP (unspec, 0, 2);
+	  rtx pc_src = XEXP (set_pc, 1);
+	  rtx call_done, cond = NULL_RTX;
+	  if (GET_CODE (pc_src) == IF_THEN_ELSE)
+	    {
+	      cond = XEXP (pc_src, 0);
+	      call_done = XEXP (XEXP (pc_src, 1), 0);
+	    }
+	  else
+	    call_done = XEXP (pc_src, 0);
+	  s390_expand_split_stack_call (insn,
+					call_done,
+					function,
+					frame_size,
+					args_size,
+					cond);
+	}
+    }
+
   /* Try to optimize prologue and epilogue further.  */
   s390_optimize_prologue ();
 
@@ -14469,6 +14776,9 @@ s390_asm_file_end (void)
 	     s390_vector_abi);
 #endif
   file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 
 /* Return true if TYPE is a vector bool type.  */
@@ -14724,6 +15034,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
+
 #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   s390_use_by_pieces_infrastructure_p
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 9b869d5..771f1cc 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -114,6 +114,9 @@
    UNSPEC_SP_SET
    UNSPEC_SP_TEST
 
+   ; Split stack support
+   UNSPEC_STACK_CHECK
+
    ; Test Data Class (TDC)
    UNSPEC_TDC_INSN
 
@@ -276,6 +279,11 @@
    ; Set and get floating point control register
    UNSPECV_SFPC
    UNSPECV_EFPC
+
+   ; Split stack support
+   UNSPECV_SPLIT_STACK_CALL
+   UNSPECV_SPLIT_STACK_SIBCALL
+   UNSPECV_SPLIT_STACK_MARKER
   ])
 
 ;;
@@ -10907,3 +10915,172 @@
   "TARGET_Z13"
   "lcbb\t%0,%1,%b2"
   [(set_attr "op_type" "VRX")])
+
+; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  s390_expand_split_stack_prologue ();
+  DONE;
+})
+
+(define_expand "split_stack_call"
+  [(match_operand 0 "" "")
+   (match_operand 1 "bras_sym_operand" "X")
+   (match_operand 2 "consttable_operand" "X")
+   (match_operand 3 "consttable_operand" "X")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_call_di (operands[0],
+					     operands[1],
+					     operands[2],
+					     operands[3]));
+  else
+    emit_jump_insn (gen_split_stack_call_si (operands[0],
+					     operands[1],
+					     operands[2],
+					     operands[3]));
+  DONE;
+})
+
+(define_insn "split_stack_call_<mode>"
+  [(set (pc) (label_ref (match_operand 0 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+				    (match_operand 2 "consttable_operand" "X")
+				    (match_operand 3 "consttable_operand" "X")]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+(define_expand "split_stack_cond_call"
+  [(match_operand 0 "" "")
+   (match_operand 1 "bras_sym_operand" "X")
+   (match_operand 2 "consttable_operand" "X")
+   (match_operand 3 "consttable_operand" "X")
+   (match_operand 4 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_cond_call_di (operands[0],
+						  operands[1],
+						  operands[2],
+						  operands[3],
+						  operands[4]));
+  else
+    emit_jump_insn (gen_split_stack_cond_call_si (operands[0],
+						  operands[1],
+						  operands[2],
+						  operands[3],
+						  operands[4]));
+  DONE;
+})
+
+(define_insn "split_stack_cond_call_<mode>"
+  [(set (pc)
+	(if_then_else
+	  (match_operand 4 "" "")
+	  (label_ref (match_operand 0 "" ""))
+	  (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 1 "bras_sym_operand" "X")
+				    (match_operand 2 "consttable_operand" "X")
+				    (match_operand 3 "consttable_operand" "X")]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+{
+  gcc_unreachable ();
+}
+  [(set_attr "length" "12")])
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+
+(define_expand "split_stack_space_check"
+  [(set (pc) (if_then_else
+	      (ltu (minus (reg 15)
+			  (match_operand 0 "register_operand"))
+		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  rtx tp = s390_get_thread_pointer ();
+  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
+  rtx reg = gen_reg_rtx (Pmode);
+  rtx cc;
+  if (TARGET_64BIT)
+    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
+  else
+    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
+  cc = s390_emit_compare (GT, reg, guard);
+  s390_emit_jump (operands[1], cc);
+
+  DONE;
+})
+
+;; A jg with minimal fuss for use in split stack prologue.
+
+(define_expand "split_stack_sibcall"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_sibcall_di (operands[0], operands[1]));
+  else
+    emit_jump_insn (gen_split_stack_sibcall_si (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "split_stack_sibcall_<mode>"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+				   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; Also a conditional one.
+
+(define_expand "split_stack_cond_sibcall"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_cond_sibcall_di (operands[0], operands[1], operands[2]));
+  else
+    emit_jump_insn (gen_split_stack_cond_sibcall_si (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "split_stack_cond_sibcall_<mode>"
+  [(set (pc)
+	(if_then_else
+	  (match_operand 1 "" "")
+	  (label_ref (match_operand 2 "" ""))
+	  (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")]
+				   UNSPECV_SPLIT_STACK_SIBCALL))]
+  "TARGET_CPU_ZARCH"
+  "jg%C1\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; An unusual nop instruction used to mark functions with no stack frames
+;; as split-stack aware.
+
+(define_insn "split_stack_marker"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_MARKER)]
+  ""
+  "nopr\t%%r15"
+  [(set_attr "op_type" "RR")])
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 49c7929..3900ab1 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
+	* config/s390/morestack.S: New file.
+	* config/s390/t-stack-s390: New file.
+	* generic-morestack.c (__splitstack_find): Add s390-specific code.
+
 2016-01-25  Jakub Jelinek  <jakub@redhat.com>
 
 	PR target/69444
diff --git a/libgcc/config.host b/libgcc/config.host
index d8efd82..2be5f7e 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1114,11 +1114,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..141dead
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,609 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 0x1000
+
+# The __morestack function.
+
+	.global	__morestack
+	.hidden	__morestack
+
+	.type	__morestack,@function
+
+__morestack:
+.LFB1:
+	.cfi_startproc
+
+
+#ifndef __s390x__
+
+
+# The 31-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0,__gcc_personality_v0
+	.cfi_lsda 0,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x48
+	.cfi_offset %r7, -0x44
+	.cfi_offset %r8, -0x40
+	.cfi_offset %r9, -0x3c
+	.cfi_offset %r10, -0x38
+	.cfi_offset %r11, -0x34
+	.cfi_offset %r12, -0x30
+	.cfi_offset %r13, -0x2c
+	.cfi_offset %r14, -0x28
+	.cfi_offset %r15, -0x24
+	lr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	ahi	%r15, -0x60		# 0x60 for standard frame.
+	st	%r11, 0(%r15)		# Save back chain.
+	lr	%r8, %r0		# Save %r0 (static chain).
+	lr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	l	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0		# Extract thread pointer.
+	l	%r1, 0x20(%r1)		# Get stack boundary
+	ar	%r1, %r7		# Stack boundary + frame size
+	a	%r1, 4(%r10)		# + stack param size
+	clr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	ahi	%r7, BACKOFF		# Bump requested size a bit.
+	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x40(%r11)		# Pass its address as parameter.
+	la	%r3, 0x60(%r11)		# Caller's stack parameters.
+	l	%r4, 4(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lr	%r15, %r2		# Switch to the new stack.
+	ahi	%r15, -0x60		# Make a stack frame on it.
+	st	%r11, 0(%r15)		# Save back chain.
+
+	s	%r2, 0x40(%r11)		# The end of stack space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHB0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lr	%r0, %r8		# Static chain.
+	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0x60 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x40(%r11)
+	brasl	%r14, __generic_releasestack
+
+	s	%r2, 0x40(%r11)		# Subtract available space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHE0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0x60 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lr	%r15, %r11
+	ahi	%r15, -0x60
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	l	%r9, 0x4(%r10)		# Load stack parameter size.
+	ltr	%r9, %r9		# And check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0x60(%r15)		# Destination.
+	la	%r12, 0x60(%r11)	# Source.
+	lr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lr	%r3, %r11		# Get the stack pointer.
+	sr	%r3, %r2		# Subtract available space.
+	ahi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+	st	%r3, 0x20(%r1)	# Save the new stack boundary.
+
+	lr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#else /* defined(__s390x__) */
+
+
+# The 64-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x70
+	.cfi_offset %r7, -0x68
+	.cfi_offset %r8, -0x60
+	.cfi_offset %r9, -0x58
+	.cfi_offset %r10, -0x50
+	.cfi_offset %r11, -0x48
+	.cfi_offset %r12, -0x40
+	.cfi_offset %r13, -0x38
+	.cfi_offset %r14, -0x30
+	.cfi_offset %r15, -0x28
+	lgr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	aghi	%r15, -0xa0		# 0xa0 for standard frame.
+	stg	%r11, 0(%r15)		# Save back chain.
+	lgr	%r8, %r0		# Save %r0 (static chain).
+	lgr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	lg	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	lg	%r1, 0x38(%r1)		# Get stack boundary
+	agr	%r1, %r7		# Stack boundary + frame size
+	ag	%r1, 8(%r10)		# + stack param size
+	clgr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	aghi	%r7, BACKOFF		# Bump requested size a bit.
+	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x80(%r11)		# Pass its address as parameter.
+	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
+	lg	%r4, 8(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lgr	%r15, %r2		# Switch to the new stack.
+	aghi	%r15, -0xa0		# Make a stack frame on it.
+	stg	%r11, 0(%r15)		# Save back chain.
+
+	sg	%r2, 0x80(%r11)		# The end of stack space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHB0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lgr	%r0, %r8		# Static chain.
+	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stg	%r2, 0x10(%r11)		# Save return register.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x80(%r11)
+	brasl	%r14, __generic_releasestack
+
+	sg	%r2, 0x80(%r11)		# Subtract available space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHE0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lgr	%r15, %r11
+	aghi	%r15, -0xa0
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	lg	%r9, 0x8(%r10)		# Load stack parameter size.
+	ltgr	%r9, %r9		# Check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sgr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0xa0(%r15)		# Destination.
+	la	%r12, 0xa0(%r11)	# Source.
+	lgr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lgr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lgr	%r3, %r11		# Get the stack pointer.
+	sgr	%r3, %r2		# Subtract available space.
+	aghi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
+
+	lgr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.cfi_endproc
+	.size	__morestack, . - __morestack
+
+
+# The exception table.  This tells the personality routine to execute
+# the exception handler.
+
+	.section	.gcc_except_table,"a",@progbits
+	.align	4
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 .L1-.LFB1	# landing pad
+	.uleb128 0		# action
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the basic
+	# personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type	DW.ref.__gcc_personality_v0, @object
+DW.ref.__gcc_personality_v0:
+#ifndef __LP64__
+	.align 4
+	.size	DW.ref.__gcc_personality_v0, 4
+	.long	__gcc_personality_v0
+#else
+	.align 8
+	.size	DW.ref.__gcc_personality_v0, 8
+	.quad	__gcc_personality_v0
+#endif
+#endif
+
+
+
+# Initialize the stack test value when the program starts or when a
+# new thread starts.  We don't know how large the main stack is, so we
+# guess conservatively.  We might be able to use getrlimit here.
+
+	.text
+	.global	__stack_split_initialize
+	.hidden	__stack_split_initialize
+
+	.type	__stack_split_initialize, @function
+
+__stack_split_initialize:
+
+#ifndef __s390x__
+
+	ear	%r1, %a0
+	lr	%r0, %r15
+	ahi	%r0, -0x4000	# We should have at least 16K.
+	st	%r0, 0x20(%r1)
+
+	lr	%r2, %r15
+	lhi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#else /* defined(__s390x__) */
+
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lgr	%r0, %r15
+	aghi	%r0, -0x4000	# We should have at least 16K.
+	stg	%r0, 0x38(%r1)
+
+	lgr	%r2, %r15
+	lghi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.size	__stack_split_initialize, . - __stack_split_initialize
+
+# Routines to get and set the guard, for __splitstack_getcontext,
+# __splitstack_setcontext, and __splitstack_makecontext.
+
+# void *__morestack_get_guard (void) returns the current stack guard.
+	.text
+	.global	__morestack_get_guard
+	.hidden	__morestack_get_guard
+
+	.type	__morestack_get_guard,@function
+
+__morestack_get_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	l	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lg	%r2, 0x38(%r1)
+#endif
+	br %r14
+
+	.size	__morestack_get_guard, . - __morestack_get_guard
+
+# void __morestack_set_guard (void *) sets the stack guard.
+	.global	__morestack_set_guard
+	.hidden	__morestack_set_guard
+
+	.type	__morestack_set_guard,@function
+
+__morestack_set_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	st	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	stg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_set_guard, . - __morestack_set_guard
+
+# void *__morestack_make_guard (void *, size_t) returns the stack
+# guard value for a stack.
+	.global	__morestack_make_guard
+	.hidden	__morestack_make_guard
+
+	.type	__morestack_make_guard,@function
+
+__morestack_make_guard:
+
+#ifndef __s390x__
+	sr	%r2, %r3
+	ahi	%r2, BACKOFF
+#else
+	sgr	%r2, %r3
+	aghi	%r2, BACKOFF
+#endif
+	br	%r14
+
+	.size	__morestack_make_guard, . - __morestack_make_guard
+
+# Make __stack_split_initialize a high priority constructor.
+
+	.section .ctors.65535,"aw",@progbits
+
+#ifndef __LP64__
+	.align	4
+	.long	__stack_split_initialize
+	.long	__morestack_load_mmap
+#else
+	.align	8
+	.quad	__stack_split_initialize
+	.quad	__morestack_load_mmap
+#endif
+
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
new file mode 100644
index 0000000..4c959b0
--- /dev/null
+++ b/libgcc/config/s390/t-stack-s390
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for s390.
+LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index 89765d4..b8eec4e 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
 #elif defined __powerpc64__
+#elif defined __s390x__
+      nsp -= 2 * 160;
+#elif defined __s390__
+      nsp -= 2 * 96;
 #else
 #error "unrecognized target"
 #endif
-- 
2.7.0


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-02 15:31                 ` Marcin Kościelnicki
@ 2016-02-02 18:34                   ` Ulrich Weigand
  2016-02-02 20:11                     ` Marcin Kościelnicki
  2016-02-03  0:20                     ` Marcin Kościelnicki
  0 siblings, 2 replies; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-02 18:34 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches, Marcin Kościelnicki

Marcin Kościelnicki wrote:

> Here we go.  I've also removed the "see below", since I don't really
> see anything below...

The "see below" refers to this code (which I agree isn't really obvious):

  if (TARGET_TPF_PROFILING)
    {
      /* Generate a BAS instruction to serve as a function
         entry intercept to facilitate the use of tracing
         algorithms located at the branch target.  */
      emit_insn (gen_prologue_tpf ());

What is not explicitly called out here is that this tracing function
actually refers to some hard registers, in particular r14, and assumes
they still have the original contents as at function entry.

That is why the prolog code avoids using r14 as a temporary if the TPF
tracing mechanism is in use.  Now I think this doesn't apply to r12,
so this part of your patch should still be fine.  (In addition, TPF
is not going to support split stacks --or indeed the Go language--
anyway, so it doesn't really matter all that much.)
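
For reference, the temp-register choice in the patch under review boils
down to roughly the following.  This is a standalone C sketch, not GCC
code: the struct and function names are made up for illustration, and
each boolean stands in for the GCC predicate named in its comment.

```c
#include <stdbool.h>

/* Hard register numbers relevant to the choice.  */
enum { R1 = 1, R12 = 12, R14 = 14 };

/* Hypothetical summary of the state the prologue looks at.  */
struct prologue_ctx
{
  bool r14_has_initial_val;  /* has_hard_reg_initial_val (Pmode, RETURN_REGNUM) */
  bool is_leaf;              /* crtl->is_leaf */
  bool tpf_profiling;        /* TARGET_TPF_PROFILING */
  bool split_stack_stdarg;   /* flag_split_stack && cfun->stdarg */
};

/* Pick the scratch register for the prologue.  r14 is preferred, but it
   must keep its entry value for leaf functions, for TPF tracing, and
   when its initial value is still needed.  With split-stack varargs r1
   already holds the vararg pointer, so r12 is used instead.  */
static int
choose_temp_reg (const struct prologue_ctx *c)
{
  if (!c->r14_has_initial_val && !c->is_leaf && !c->tpf_profiling)
    return R14;
  if (c->split_stack_stdarg)
    return R12;
  return R1;
}
```

Only the middle branch is new with split stacks; the r14 and r1 cases
are the existing behaviour.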


I do have two other issues; sorry for bringing those up again although
they've been discussed in the past, but I still think we can find
some improvements here ...

The first is the question Andreas brought up, why we need the extra
set of insns introduced by s390_reorg.  I think this may really have
been necessary for the ESA case where data elements had to be intermixed
into code at a specific location.  But since we no longer support ESA,
we now just have a data block that can be placed anywhere.  For example,
we could just have an insn (at any point in the prolog stream) that
simply emits the full data block during final output, along the lines of
(note: needs to be updated for SImode vs. DImode.):

(define_insn "split_stack_data"
  [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
                     (match_operand 1 "bras_sym_operand" "X")
                     (match_operand 2 "consttable_operand" "X")
                     (match_operand 3 "consttable_operand" "X")]
                    UNSPECV_SPLIT_STACK_DATA)]
  ""
{
  switch_to_section (targetm.asm_out.function_rodata_section
                      (current_function_decl));

  output_asm_insn (".align 3", operands);
  (*targetm.asm_out.internal_label) (asm_out_file, "L",
                                     CODE_LABEL_NUMBER (operands[0]));
  output_asm_insn (".quad %2", operands);
  output_asm_insn (".quad %3", operands);
  output_asm_insn (".quad %1-%0", operands);

  switch_to_section (current_function_section ());
  return "";
}
  [(set_attr "length" "0")])

Or possibly even cleaner, we can simply define the data block at the
tree level as if it were an initialized global variable of a certain
struct type, and just leave it to common code to emit it as usual.
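
To make the layout concrete, here is a rough C view of the 64-bit block
the insn above emits.  It is only a sketch: the struct, its field names
and the helper are hypothetical, and it assumes the third word (%1-%0)
is what __morestack adds to the block address to find the label it
should continue at.

```c
#include <stdint.h>

/* Hypothetical C view of the parameter block (64-bit variant).  */
struct morestack_params
{
  uint64_t frame_size;     /* .quad %2: frame size of the function */
  uint64_t args_size;      /* .quad %3: size of the stack arguments */
  int64_t  resume_offset;  /* .quad %1-%0: target label minus block label */
};

/* Address to continue at: block address plus the stored offset.
   Storing a difference of two labels keeps the block free of
   absolute addresses.  */
static const unsigned char *
resume_address (const struct morestack_params *p)
{
  return (const unsigned char *) p + p->resume_offset;
}
```

A 31-bit variant would use three 32-bit words in the same order.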

Then we just have the code bits, but I don't really see much
difference between the split_stack_call and split_stack_sibcall
patterns (apart from the data block), so if code flow is OK with
the former insns, it should be OK with the latter too ..

[ Or else, if there *are* code flow issues, the other alternative
would be to emit the full call sequence, code and data, from a
single insn pattern during final output.  This might have the extra
benefit that the assembler sequence is fully fixed, and thus easier
to detect in the linker.  ]

Getting rid of the extra transformation in s390_reorg would not
just remove a bunch of code from the back-end (always good!),
it would also speed up compile time a bit.


The second issue I'm still not sure about is the magic nop marker
for frameless functions.  In an earlier mail you wrote:

> Both currently supported 
> architectures always emit split-stack code on every function.

At least for rs6000 this doesn't appear to be true; in
rs6000_expand_split_stack_prologue we have:

  if (!info->push_p)
    return;

so it does nothing for frameless routines.

Now on i386 we do indeed generate code for frameless routines;
in fact, the *same* full stack check is generated as for any
other routine.  Now I'm wondering: is there a reason why
this check would be necessary (and there's simply a bug in
the rs6000 implementation)?  Then we obviously should do the
same on s390.

On the other hand, if rs6000 works fine *without* any code
in frameless routines, why wouldn't that work for s390 too?

Emitting a nop (that is always executed) still looks weird to me.


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-02 18:34                   ` Ulrich Weigand
@ 2016-02-02 20:11                     ` Marcin Kościelnicki
  2016-02-03 18:40                       ` Marcin Kościelnicki
  2016-02-03  0:20                     ` Marcin Kościelnicki
  1 sibling, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-02 20:11 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 02/02/16 19:33, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>
>> Here we go.  I've also removed the "see below", since I don't really
>> see anything below...
>
> The "see below" refers to this code (which I agree isn't really obvious):
>
>    if (TARGET_TPF_PROFILING)
>      {
>        /* Generate a BAS instruction to serve as a function
>           entry intercept to facilitate the use of tracing
>           algorithms located at the branch target.  */
>        emit_insn (gen_prologue_tpf ());
>
> What is not explicitly called out here is that this tracing function
> actually refers to some hard registers, in particular r14, and assumes
> they still have the original contents as at function entry.
>
> That is why the prolog code avoids using r14 as a temporary if the TPF
> tracing mechanism is in use.  Now I think this doesn't apply to r12,
> so this part of your patch should still be fine.  (In addition, TPF
> is not going to support split stacks --or indeed the Go language--
> anyway, so it doesn't really matter all that much.)

Very well, I'll improve the comment.
>
>
> I do have two other issues; sorry for bringing those up again although
> they've been discussed in the past, but I still think we can find
> some improvements here ...
>
> The first is the question Andreas brought up, why we need the extra
> set of insns introduced by s390_reorg.  I think this may really have
> been necessary for the ESA case where data elements had to be intermixed
> into code at a specific location.  But since we no longer support ESA,
> we now just have a data block that can be placed anywhere.  For example,
> we could just have an insn (at any point in the prolog stream) that
> simply emits the full data block during final output, along the lines of
> (note: needs to be updated for SImode vs. DImode.):
>
> (define_insn "split_stack_data"
>    [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
>                       (match_operand 1 "bras_sym_operand" "X")
>                       (match_operand 2 "consttable_operand" "X")
>                       (match_operand 3 "consttable_operand" "X")]
>                      UNSPECV_SPLIT_STACK_DATA)]
>    ""
> {
>    switch_to_section (targetm.asm_out.function_rodata_section
>                        (current_function_decl));
>
>    output_asm_insn (".align 3", operands);
>    (*targetm.asm_out.internal_label) (asm_out_file, "L",
>                                       CODE_LABEL_NUMBER (operands[0]));
>    output_asm_insn (".quad %2", operands);
>    output_asm_insn (".quad %3", operands);
>    output_asm_insn (".quad %1-%0", operands);
>
>    switch_to_section (current_function_section ());
>    return "";
> }
>    [(set_attr "length" "0")])
>
> Or possibly even cleaner, we can simply define the data block at the
> tree level as if it were an initialized global variable of a certain
> struct type, and just leave it to common code to emit it as usual.
>
> Then we just have the code bits, but I don't really see much
> difference between the split_stack_call and split_stack_sibcall
> patterns (apart from the data block), so if code flow is OK with
> the former insns, it should be OK with the latter too ..
>
> [ Or else, if there *are* code flow issues, the other alternative
> would be to emit the full call sequence, code and data, from a
> single insn pattern during final output.  This might have the extra
> benefit that the assembler sequence is fully fixed, and thus easier
> to detect in the linker.  ]
>
> Getting rid of the extra transformation in s390_reorg would not
> just remove a bunch of code from the back-end (always good!),
> it would also speed up compile time a bit.

When I wasn't using reorg, I had problems with gcc deleting the label in 
.rodata, since it wasn't used by any jump instruction.  I guess having a 
whole-block instruction that emits the label on its own should solve the 
issue, though - let's try that.
>
>
> The second issue I'm still not sure about is the magic nop marker
> for frameless functions.  In an earlier mail you wrote:
>
>> Both currently supported
>> architectures always emit split-stack code on every function.
>
> At least for rs6000 this doesn't appear to be true; in
> rs6000_expand_split_stack_prologue we have:
>
>    if (!info->push_p)
>      return;
>
> so it does nothing for frameless routines.
>
> Now on i386 we do indeed generate code for frameless routines;
> in fact, the *same* full stack check is generated as for any
> other routine.  Now I'm wondering: is there a reason why
> this check would be necessary (and there's simply a bug in
> the rs6000 implementation)?  Then we obviously should do the
> same on s390.

Try that on powerpc64(le):

$ cat a.c
#include <stdio.h>

void f(void) {
}

typedef void (*fptr)(void);

fptr g(void);

int main() {
         printf("%p\n", g());
}

$ cat b.c
void f(void);

typedef void (*fptr)(void);

fptr g(void) {
         return f;
}

$ gcc -O3 -fsplit-stack -c b.c
$ gcc -O3 -c a.c
$ gcc a.o b.o -fuse-ld=gold

I don't have a recent enough gcc for powerpc, but from what I've seen in 
the code, this should explode with a linker error.

Of course, mixing split-stack and non-split-stack code when function 
pointers are involved is sketchy anyway, so what's one more bug...

That said, for s390, we can avoid the above problem by checking the 
relocation in gold now that ESA paths are gone - for direct function 
calls (the only ones we care about), we should be seeing a relocation in 
brasl.  So I'll remove the nopmark thing and add proper recognition in gold.

>
> On the other hand, if rs6000 works fine *without* any code
> in frameless routines, why wouldn't that work for s390 too?
>
> Emitting a nop (that is always executed) still looks weird to me.
>
>
> Bye,
> Ulrich
>


* [PATCH] s390: Add -fsplit-stack support
  2016-02-02 18:34                   ` Ulrich Weigand
  2016-02-02 20:11                     ` Marcin Kościelnicki
@ 2016-02-03  0:20                     ` Marcin Kościelnicki
  2016-02-03 17:03                       ` Ulrich Weigand
  1 sibling, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-03  0:20 UTC (permalink / raw)
  To: uweigand; +Cc: krebbel, gcc-patches, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_DATA): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_space_check): New expand.
	(split_stack_data): New insn.
	(split_stack_call): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call): New expand.
	(split_stack_cond_call_*): New insn.
---
Comment fixed, split_stack_marker gone, reorg gone.  Generated code seems sane,
but testsuite still running.

I will need to modify the gold patch to handle the "leaf function taking non-split
stack function address" issue - this will likely require messing with the target
independent plumbing, the hook for that doesn't seem to get enough params.
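
For reviewers, the decision the new prologue makes can be summarized by
the following standalone C sketch.  It is not the GCC code: the function
and enum names are made up, fits_add_immediate stands in for
CONST_OK_FOR_K || CONST_OK_FOR_Op, and addresses are treated as plain
unsigned integers for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slack the allocation routines leave below the guard, in bytes.  */
#define SPLIT_STACK_AVAILABLE 1024

enum split_stack_action
{
  SS_NOTHING,     /* frameless function: no check, no call */
  SS_COND_CALL,   /* compare against the guard, call only if needed */
  SS_ALWAYS_CALL  /* frame too large for an add immediate: always call */
};

/* Which kind of prologue is emitted for a given frame size.  */
static enum split_stack_action
prologue_action (int64_t frame_size, bool fits_add_immediate)
{
  if (frame_size == 0)
    return SS_NOTHING;
  return fits_add_immediate ? SS_COND_CALL : SS_ALWAYS_CALL;
}

/* The run-time test in the conditional case.  Frames no larger than the
   slack are compared directly against __private_ss; larger frames add
   the frame size to the guard first.  */
static bool
must_call_morestack (uint64_t sp, uint64_t private_ss, int64_t frame_size)
{
  uint64_t limit = private_ss;

  if (frame_size > SPLIT_STACK_AVAILABLE)
    limit += (uint64_t) frame_size;
  return sp < limit;
}
```

Skipping the addition for small frames saves an instruction on the
common path, at the cost of relying on the slack below the guard.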

 gcc/ChangeLog                        |  30 ++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 214 +++++++++++-
 gcc/config/s390/s390.md              | 138 ++++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1016 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a2cec8..568dff4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL): New unspec.
+	(UNSPECV_SPLIT_STACK_DATA): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_space_check): New expand.
+	(split_stack_data): New insn.
+	(split_stack_call): New expand.
+	(split_stack_call_*): New insn.
+	(split_stack_cond_call): New expand.
+	(split_stack_cond_call_*): New insn.
+
 2016-02-02  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
     }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			   struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
 
@@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 #undef TARGET_OPTION_INIT_STRUCT
 #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 633bc1e..09032c9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
 extern void s390_emit_prologue (void);
 extern void s390_emit_epilogue (bool);
+extern void s390_expand_split_stack_prologue (void);
 extern bool s390_can_use_simple_return_insn (void);
 extern bool s390_can_use_return_insn (void);
 extern void s390_function_profiler (FILE *, int);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3be64de..aafb442 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -426,6 +426,13 @@ struct GTY(()) machine_function
   /* True if the current function may contain a tbegin clobbering
      FPRs.  */
   bool tbegin_p;
+
+  /* For -fsplit-stack support: A stack local which holds a pointer to
+     the stack arguments for a function with a variable number of
+     arguments.  This is set at the start of the function and is used
+     to initialize the overflow_arg_area field of the va_list
+     structure.  */
+  rtx split_stack_varargs_pointer;
 };
 
 /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
@@ -9316,9 +9323,13 @@ s390_register_info ()
 	  cfun_frame_layout.high_fprs++;
       }
 
-  if (flag_pic)
-    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
-      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
+  /* Register 12 is used for GOT address, but also as temp in prologue
+     for split-stack stdarg functions (unless r14 is available).  */
+  clobbered_regs[12]
+    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
+	|| (flag_split_stack && cfun->stdarg
+	    && (crtl->is_leaf || TARGET_TPF_PROFILING
+		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
 
   clobbered_regs[BASE_REGNUM]
     |= (cfun->machine->base_reg
@@ -10440,12 +10451,15 @@ s390_emit_prologue (void)
   int next_fpr = 0;
 
   /* Choose best register to use for temp use within prologue.
-     See below for why TPF must use the register 1.  */
+     TPF with profiling must avoid the register 14 - the tracing function
+     needs the original contents of r14 to be preserved.  */
 
   if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
       && !crtl->is_leaf
       && !TARGET_TPF_PROFILING)
     temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
+  else if (flag_split_stack && cfun->stdarg)
+    temp_reg = gen_rtx_REG (Pmode, 12);
   else
     temp_reg = gen_rtx_REG (Pmode, 1);
 
@@ -10939,6 +10953,166 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
     SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* When using -fsplit-stack, the allocation routines set a field in
+   the TCB to the bottom of the stack plus this much space, measured
+   in bytes.  */
+
+#define SPLIT_STACK_AVAILABLE 1024
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+
+void
+s390_expand_split_stack_prologue (void)
+{
+  rtx r1, guard, cc = NULL;
+  rtx_insn *insn;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  /* Pointer size in bytes.  */
+  /* Frame size and argument size - the two parameters to __morestack.  */
+  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
+  /* Align argument size to 8 bytes - simplifies __morestack code.  */
+  HOST_WIDE_INT args_size = crtl->args.size >= 0
+			    ? ((crtl->args.size + 7) & ~7)
+			    : 0;
+  /* Label to be called by __morestack.  */
+  rtx_code_label *call_done = NULL;
+  rtx_code_label *parm_base = NULL;
+  rtx tmp;
+
+  gcc_assert (flag_split_stack && reload_completed);
+  if (!TARGET_CPU_ZARCH)
+    {
+      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
+      return;
+    }
+
+  r1 = gen_rtx_REG (Pmode, 1);
+
+  /* If no stack frame will be allocated, don't do anything.  */
+  if (!frame_size)
+    {
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, just use r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+
+	}
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))
+    {
+      /* If frame_size will fit in an add instruction, do a stack space
+	 check, and only call __morestack if there's not enough space.  */
+
+      /* Get thread pointer.  r1 is the only register we can always destroy - r0
+	 could contain a static chain (and cannot be used to address memory
+	 anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
+      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+      /* Aim at __private_ss.  */
+      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
+
+      /* If less than 1kiB used, skip addition and compare directly with
+	 __private_ss.  */
+      if (frame_size > SPLIT_STACK_AVAILABLE)
+	{
+	  emit_move_insn (r1, guard);
+	  if (TARGET_64BIT)
+	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
+	  else
+	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
+	  guard = r1;
+	}
+
+      /* Compare the (maybe adjusted) guard with the stack pointer.  */
+      cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
+    }
+
+  call_done = gen_label_rtx ();
+  parm_base = gen_label_rtx ();
+
+  /* Emit the parameter block.  */
+  tmp = gen_split_stack_data (parm_base, call_done,
+			      GEN_INT (frame_size),
+			      GEN_INT (args_size));
+  insn = emit_insn (tmp);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parm_base);
+  LABEL_NUSES (parm_base)++;
+
+  /* %r1 = litbase.  */
+  insn = emit_insn (gen_main_base_64 (r1, parm_base));
+  add_reg_note (insn, REG_LABEL_OPERAND, parm_base);
+  LABEL_NUSES (parm_base)++;
+
+  /* Now, we need to call __morestack.  It has very special calling
+     conventions: it preserves param/return/static chain registers for
+     calling main function body, and looks for its own parameters at %r1. */
+
+  if (cc != NULL)
+    {
+      tmp = gen_split_stack_cond_call (morestack_ref, cc, call_done);
+
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      LABEL_NUSES (call_done)++;
+
+      /* Mark the jump as very unlikely to be taken.  */
+      add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
+
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, and __morestack was not called, just use
+	     r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+	}
+    }
+  else
+    {
+      tmp = gen_split_stack_call (morestack_ref, call_done);
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      LABEL_NUSES (call_done)++;
+      emit_barrier ();
+    }
+
+  /* __morestack will call us here.  */
+
+  emit_label (call_done);
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+s390_live_on_entry (bitmap regs)
+{
+  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      gcc_assert (flag_split_stack);
+      bitmap_set_bit (regs, 1);
+    }
+}
+
 /* Return true if the function can use simple_return to return outside
    of a shrink-wrapped region.  At present shrink-wrapping is supported
    in all cases.  */
@@ -11541,6 +11715,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
     }
 
+  if (flag_split_stack
+     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
+         == NULL)
+     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+    {
+      rtx reg;
+      rtx_insn *seq;
+
+      reg = gen_reg_rtx (Pmode);
+      cfun->machine->split_stack_varargs_pointer = reg;
+
+      start_sequence ();
+      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
+      seq = get_insns ();
+      end_sequence ();
+
+      push_topmost_sequence ();
+      emit_insn_after (seq, entry_of_function ());
+      pop_topmost_sequence ();
+    }
+
   /* Find the overflow area.
      FIXME: This currently is too pessimistic when the vector ABI is
      enabled.  In that case we *always* set up the overflow area
@@ -11549,7 +11744,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
       || TARGET_VX_ABI)
     {
-      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+        t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      else
+        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
 
       off = INTVAL (crtl->args.arg_offset_rtx);
       off = off < 0 ? 0 : off;
@@ -14469,6 +14667,9 @@ s390_asm_file_end (void)
 	     s390_vector_abi);
 #endif
   file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 
 /* Return true if TYPE is a vector bool type.  */
@@ -14724,6 +14925,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
+
 #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   s390_use_by_pieces_infrastructure_p
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 9b869d5..cc120b1 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -114,6 +114,9 @@
    UNSPEC_SP_SET
    UNSPEC_SP_TEST
 
+   ; Split stack support
+   UNSPEC_STACK_CHECK
+
    ; Test Data Class (TDC)
    UNSPEC_TDC_INSN
 
@@ -276,6 +279,10 @@
    ; Set and get floating point control register
    UNSPECV_SFPC
    UNSPECV_EFPC
+
+   ; Split stack support
+   UNSPECV_SPLIT_STACK_CALL
+   UNSPECV_SPLIT_STACK_DATA
   ])
 
 ;;
@@ -10907,3 +10914,134 @@
   "TARGET_Z13"
   "lcbb\t%0,%1,%b2"
   [(set_attr "op_type" "VRX")])
+
+; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  s390_expand_split_stack_prologue ();
+  DONE;
+})
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+
+(define_expand "split_stack_space_check"
+  [(set (pc) (if_then_else
+	      (ltu (minus (reg 15)
+			  (match_operand 0 "register_operand"))
+		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  rtx tp = s390_get_thread_pointer ();
+  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
+  rtx reg = gen_reg_rtx (Pmode);
+  rtx cc;
+  if (TARGET_64BIT)
+    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
+  else
+    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
+  cc = s390_emit_compare (GT, reg, guard);
+  s390_emit_jump (operands[1], cc);
+
+  DONE;
+})
+
+;; __morestack parameter block for split stack prologue.  Parameters are:
+;; parameter block label, label to be called by __morestack, frame size,
+;; stack parameter size.
+
+(define_insn "split_stack_data"
+  [(unspec_volatile [(match_operand 0 "" "X")
+		     (match_operand 1 "" "X")
+		     (match_operand 2 "consttable_operand" "X")
+		     (match_operand 3 "consttable_operand" "X")]
+		    UNSPECV_SPLIT_STACK_DATA)]
+  "TARGET_CPU_ZARCH"
+{
+  switch_to_section (targetm.asm_out.function_rodata_section
+		 (current_function_decl));
+
+  if (TARGET_64BIT)
+    output_asm_insn (".align\t8", operands);
+  else
+    output_asm_insn (".align\t4", operands);
+  (*targetm.asm_out.internal_label) (asm_out_file, "L",
+				     CODE_LABEL_NUMBER (operands[0]));
+  if (TARGET_64BIT)
+    {
+      output_asm_insn (".quad\t%2", operands);
+      output_asm_insn (".quad\t%3", operands);
+      output_asm_insn (".quad\t%1-%0", operands);
+    }
+  else
+    {
+      output_asm_insn (".long\t%2", operands);
+      output_asm_insn (".long\t%3", operands);
+      output_asm_insn (".long\t%1-%0", operands);
+    }
+
+  switch_to_section (current_function_section ());
+  return "";
+}
+  [(set_attr "length" "0")])
+
+
+;; A jg with minimal fuss for use in split stack prologue.
+
+(define_expand "split_stack_call"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_call_di (operands[0], operands[1]));
+  else
+    emit_jump_insn (gen_split_stack_call_si (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "split_stack_call_<mode>"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
+				    (reg:P 1)]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+  "jg\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; Also a conditional one.
+
+(define_expand "split_stack_cond_call"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_cond_call_di (operands[0], operands[1], operands[2]));
+  else
+    emit_jump_insn (gen_split_stack_cond_call_si (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "split_stack_cond_call_<mode>"
+  [(set (pc)
+	(if_then_else
+	  (match_operand 1 "" "")
+	  (label_ref (match_operand 2 "" ""))
+	  (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
+				    (reg:P 1)]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+  "jg%C1\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 49c7929..3900ab1 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-02-02  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
+	* config/s390/morestack.S: New file.
+	* config/s390/t-stack-s390: New file.
+	* generic-morestack.c (__splitstack_find): Add s390-specific code.
+
 2016-01-25  Jakub Jelinek  <jakub@redhat.com>
 
 	PR target/69444
diff --git a/libgcc/config.host b/libgcc/config.host
index d8efd82..2be5f7e 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1114,11 +1114,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..141dead
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,609 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 0x1000
+
+# The __morestack function.
+
+	.global	__morestack
+	.hidden	__morestack
+
+	.type	__morestack,@function
+
+__morestack:
+.LFB1:
+	.cfi_startproc
+
+
+#ifndef __s390x__
+
+
+# The 31-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0,__gcc_personality_v0
+	.cfi_lsda 0,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x48
+	.cfi_offset %r7, -0x44
+	.cfi_offset %r8, -0x40
+	.cfi_offset %r9, -0x3c
+	.cfi_offset %r10, -0x38
+	.cfi_offset %r11, -0x34
+	.cfi_offset %r12, -0x30
+	.cfi_offset %r13, -0x2c
+	.cfi_offset %r14, -0x28
+	.cfi_offset %r15, -0x24
+	lr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	ahi	%r15, -0x60		# 0x60 for standard frame.
+	st	%r11, 0(%r15)		# Save back chain.
+	lr	%r8, %r0		# Save %r0 (static chain).
+	lr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	l	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0		# Extract thread pointer.
+	l	%r1, 0x20(%r1)		# Get stack boundary
+	ar	%r1, %r7		# Stack boundary + frame size
+	a	%r1, 4(%r10)		# + stack param size
+	clr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	ahi	%r7, BACKOFF		# Bump requested size a bit.
+	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x40(%r11)		# Pass its address as parameter.
+	la	%r3, 0x60(%r11)		# Caller's stack parameters.
+	l	%r4, 4(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lr	%r15, %r2		# Switch to the new stack.
+	ahi	%r15, -0x60		# Make a stack frame on it.
+	st	%r11, 0(%r15)		# Save back chain.
+
+	s	%r2, 0x40(%r11)		# The end of stack space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHB0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lr	%r0, %r8		# Static chain.
+	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0x60 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x40(%r11)
+	brasl	%r14, __generic_releasestack
+
+	s	%r2, 0x40(%r11)		# Subtract available space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHE0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0x60 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lr	%r15, %r11
+	ahi	%r15, -0x60
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	l	%r9, 0x4(%r10)		# Load stack parameter size.
+	ltr	%r9, %r9		# And check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0x60(%r15)		# Destination.
+	la	%r12, 0x60(%r11)	# Source.
+	lr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lr	%r3, %r11		# Get the stack pointer.
+	sr	%r3, %r2		# Subtract available space.
+	ahi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+	st	%r3, 0x20(%r1)	# Save the new stack boundary.
+
+	lr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#else /* defined(__s390x__) */
+
+
+# The 64-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x70
+	.cfi_offset %r7, -0x68
+	.cfi_offset %r8, -0x60
+	.cfi_offset %r9, -0x58
+	.cfi_offset %r10, -0x50
+	.cfi_offset %r11, -0x48
+	.cfi_offset %r12, -0x40
+	.cfi_offset %r13, -0x38
+	.cfi_offset %r14, -0x30
+	.cfi_offset %r15, -0x28
+	lgr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	aghi	%r15, -0xa0		# 0xa0 for standard frame.
+	stg	%r11, 0(%r15)		# Save back chain.
+	lgr	%r8, %r0		# Save %r0 (static chain).
+	lgr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	lg	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	lg	%r1, 0x38(%r1)		# Get stack boundary
+	agr	%r1, %r7		# Stack boundary + frame size
+	ag	%r1, 8(%r10)		# + stack param size
+	clgr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	aghi	%r7, BACKOFF		# Bump requested size a bit.
+	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x80(%r11)		# Pass its address as parameter.
+	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
+	lg	%r4, 8(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lgr	%r15, %r2		# Switch to the new stack.
+	aghi	%r15, -0xa0		# Make a stack frame on it.
+	stg	%r11, 0(%r15)		# Save back chain.
+
+	sg	%r2, 0x80(%r11)		# The end of stack space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHB0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lgr	%r0, %r8		# Static chain.
+	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stg	%r2, 0x10(%r11)		# Save return register.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x80(%r11)
+	brasl	%r14, __generic_releasestack
+
+	sg	%r2, 0x80(%r11)		# Subtract available space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHE0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lgr	%r15, %r11
+	aghi	%r15, -0xa0
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	lg	%r9, 0x8(%r10)		# Load stack parameter size.
+	ltgr	%r9, %r9		# Check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sgr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0xa0(%r15)		# Destination.
+	la	%r12, 0xa0(%r11)	# Source.
+	lgr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lgr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lgr	%r3, %r11		# Get the stack pointer.
+	sgr	%r3, %r2		# Subtract available space.
+	aghi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
+
+	lgr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.cfi_endproc
+	.size	__morestack, . - __morestack
+
+
+# The exception table.  This tells the personality routine to execute
+# the exception handler.
+
+	.section	.gcc_except_table,"a",@progbits
+	.align	4
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 .L1-.LFB1	# landing pad
+	.uleb128 0		# action
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the basic
+	# personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type	DW.ref.__gcc_personality_v0, @object
+DW.ref.__gcc_personality_v0:
+#ifndef __LP64__
+	.align 4
+	.size	DW.ref.__gcc_personality_v0, 4
+	.long	__gcc_personality_v0
+#else
+	.align 8
+	.size	DW.ref.__gcc_personality_v0, 8
+	.quad	__gcc_personality_v0
+#endif
+#endif
+
+
+
+# Initialize the stack test value when the program starts or when a
+# new thread starts.  We don't know how large the main stack is, so we
+# guess conservatively.  We might be able to use getrlimit here.
+
+	.text
+	.global	__stack_split_initialize
+	.hidden	__stack_split_initialize
+
+	.type	__stack_split_initialize, @function
+
+__stack_split_initialize:
+
+#ifndef __s390x__
+
+	ear	%r1, %a0
+	lr	%r0, %r15
+	ahi	%r0, -0x4000	# We should have at least 16K.
+	st	%r0, 0x20(%r1)
+
+	lr	%r2, %r15
+	lhi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#else /* defined(__s390x__) */
+
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lgr	%r0, %r15
+	aghi	%r0, -0x4000	# We should have at least 16K.
+	stg	%r0, 0x38(%r1)
+
+	lgr	%r2, %r15
+	lghi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.size	__stack_split_initialize, . - __stack_split_initialize
+
+# Routines to get and set the guard, for __splitstack_getcontext,
+# __splitstack_setcontext, and __splitstack_makecontext.
+
+# void *__morestack_get_guard (void) returns the current stack guard.
+	.text
+	.global	__morestack_get_guard
+	.hidden	__morestack_get_guard
+
+	.type	__morestack_get_guard,@function
+
+__morestack_get_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	l	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_get_guard, . - __morestack_get_guard
+
+# void __morestack_set_guard (void *) sets the stack guard.
+	.global	__morestack_set_guard
+	.hidden	__morestack_set_guard
+
+	.type	__morestack_set_guard,@function
+
+__morestack_set_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	st	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	stg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_set_guard, . - __morestack_set_guard
+
+# void *__morestack_make_guard (void *, size_t) returns the stack
+# guard value for a stack.
+	.global	__morestack_make_guard
+	.hidden	__morestack_make_guard
+
+	.type	__morestack_make_guard,@function
+
+__morestack_make_guard:
+
+#ifndef __s390x__
+	sr	%r2, %r3
+	ahi	%r2, BACKOFF
+#else
+	sgr	%r2, %r3
+	aghi	%r2, BACKOFF
+#endif
+	br	%r14
+
+	.size	__morestack_make_guard, . - __morestack_make_guard
+
+# Make __stack_split_initialize a high priority constructor.
+
+	.section .ctors.65535,"aw",@progbits
+
+#ifndef __LP64__
+	.align	4
+	.long	__stack_split_initialize
+	.long	__morestack_load_mmap
+#else
+	.align	8
+	.quad	__stack_split_initialize
+	.quad	__morestack_load_mmap
+#endif
+
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
new file mode 100644
index 0000000..4c959b0
--- /dev/null
+++ b/libgcc/config/s390/t-stack-s390
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for s390.
+LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index 89765d4..b8eec4e 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
 #elif defined __powerpc64__
+#elif defined __s390x__
+      nsp -= 2 * 160;
+#elif defined __s390__
+      nsp -= 2 * 96;
 #else
 #error "unrecognized target"
 #endif
-- 
2.7.0
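
For readers following the guard arithmetic in morestack.S, here is a
minimal C sketch of two computations from the assembly above: the value
returned by __morestack_make_guard, and the unsigned comparison at the top
of __morestack that decides whether a new segment is needed.  Only BACKOFF
and the arithmetic come from the patch; the C function names and the
assert are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the BACKOFF define in morestack.S.  */
#define BACKOFF 0x1000

/* Mirrors __morestack_make_guard: subtract the segment size from the
   stack address passed in, then back off by BACKOFF bytes.  The assert
   is not in the assembly; it only guards this sketch against
   wraparound.  */
static inline uintptr_t
make_guard (uintptr_t stack, size_t size)
{
  assert (size <= stack);
  return stack - size + BACKOFF;
}

/* Mirrors the check at the top of __morestack: the assembly adds the
   frame size and the stack parameter size to the guard, compares the
   sum with %r15 as unsigned values, and jumps to .Lnoalloc when the
   sum is less than or equal to the stack pointer.  */
static inline int
needs_new_segment (uintptr_t guard, uintptr_t sp,
		   size_t frame_size, size_t param_size)
{
  return guard + frame_size + param_size > sp;
}
```

With made-up addresses, a 0x10000-byte segment ending at 0x20000 gives a
guard of 0x11000, and a 0x800-byte frame at sp = 0x12000 still fits
without a new segment.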

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-03  0:20                     ` Marcin Kościelnicki
@ 2016-02-03 17:03                       ` Ulrich Weigand
  2016-02-03 17:18                         ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-03 17:03 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches, Marcin Kościelnicki

Marcin Kościelnicki wrote:

> Comment fixed, split_stack_marker gone, reorg gone.  Generated code seems sane,
> but testsuite still running.
> 
> I will need to modify the gold patch to handle the "leaf function taking non-split
> stack function address" issue - this will likely require messing with the target
> independent plumbing, the hook for that doesn't seem to get enough params.

Thanks for making those changes; the patch is looking a lot nicer (and shorter :-))
now!  Just to clarify, your original patch series had two common-code prerequisite
patches (3/5 and 4/5) -- it looks like those may still be needed?  If so, we'll
have to get approval from the appropriate middle-end maintainers before this
patch can go in as well.

As to the back-end patch, I've now only got some cosmetic issues:

> +  insn = emit_insn (gen_main_base_64 (r1, parm_base));

Now that we aren't using the literal pool infrastructure for the block any more,
I guess we shouldn't be using it to load the address either.  Just something
like:
  insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base));
should do it.

> +(define_insn "split_stack_data"
> +  [(unspec_volatile [(match_operand 0 "" "X")
> +		     (match_operand 1 "" "X")
> +		     (match_operand 2 "consttable_operand" "X")
> +		     (match_operand 3 "consttable_operand" "X")]

And similarly here, just use const_int_operand.

Otherwise, this all looks very good to me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] s390: Add -fsplit-stack support
  2016-02-03 17:03                       ` Ulrich Weigand
@ 2016-02-03 17:18                         ` Marcin Kościelnicki
  2016-02-03 17:27                           ` Ulrich Weigand
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-03 17:18 UTC (permalink / raw)
  To: uweigand; +Cc: krebbel, gcc-patches, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_DATA): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_space_check): New expand.
	(split_stack_data): New insn.
	(split_stack_call): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call): New expand.
	(split_stack_cond_call_*): New insn.
---
Changes applied.  Testsuite still running, still works on my simple tests.

As for the common-code prerequisites: #3 is no longer needed, and very likely
neither is #4 (it fixes problems that I've only seen with ESA mode, and the
testsuite runs just fine without it now).
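
As a rough illustration of the inline check that
s390_expand_split_stack_prologue emits when frame_size fits in an add
immediate, here is a small C sketch.  It only models the arithmetic (the
8-byte rounding of the argument size and the guard comparison); the
function names are hypothetical, the assert is an addition for the
sketch, and the path where no compare is emitted is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the SPLIT_STACK_AVAILABLE macro in s390.c.  */
#define SPLIT_STACK_AVAILABLE 1024

/* Round the stack argument size up to a multiple of 8 bytes, treating a
   negative size as 0, as the prologue does for crtl->args.size.  */
static inline long
align_args_size (long size)
{
  long aligned = size >= 0 ? ((size + 7) & ~7L) : 0;
  assert ((aligned & 7) == 0);
  return aligned;
}

/* Model of the emitted compare: frames no larger than
   SPLIT_STACK_AVAILABLE are checked directly against __private_ss;
   larger frames add frame_size to the guard first.  A nonzero result
   corresponds to taking the conditional jump to __morestack.  */
static inline int
prologue_calls_morestack (uintptr_t sp, uintptr_t private_ss,
			  unsigned long frame_size)
{
  uintptr_t guard = private_ss;

  if (frame_size > SPLIT_STACK_AVAILABLE)
    guard += frame_size;
  return sp < guard;
}
```

For example, with __private_ss = 0x11000 and sp = 0x12000, a 512-byte
frame skips the call, while a 0x2000-byte frame takes it.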

 gcc/ChangeLog                        |  30 ++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 214 +++++++++++-
 gcc/config/s390/s390.md              | 138 ++++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1016 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 92db764..8e3f9f7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2016-02-03  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL): New unspec.
+	(UNSPECV_SPLIT_STACK_DATA): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_space_check): New expand.
+	(split_stack_data): New insn.
+	(split_stack_call): New expand.
+	(split_stack_call_*): New insn.
+	(split_stack_cond_call): New expand.
+	(split_stack_cond_call_*): New insn.
+
 2016-02-03  Kirill Yukhin  <kirill.yukhin@intel.com>
 
 	PR target/69118
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
     }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			   struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
 
@@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 #undef TARGET_OPTION_INIT_STRUCT
 #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 633bc1e..09032c9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
 extern void s390_emit_prologue (void);
 extern void s390_emit_epilogue (bool);
+extern void s390_expand_split_stack_prologue (void);
 extern bool s390_can_use_simple_return_insn (void);
 extern bool s390_can_use_return_insn (void);
 extern void s390_function_profiler (FILE *, int);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3be64de..9c33545 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -426,6 +426,13 @@ struct GTY(()) machine_function
   /* True if the current function may contain a tbegin clobbering
      FPRs.  */
   bool tbegin_p;
+
+  /* For -fsplit-stack support: A stack local which holds a pointer to
+     the stack arguments for a function with a variable number of
+     arguments.  This is set at the start of the function and is used
+     to initialize the overflow_arg_area field of the va_list
+     structure.  */
+  rtx split_stack_varargs_pointer;
 };
 
 /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
@@ -9316,9 +9323,13 @@ s390_register_info ()
 	  cfun_frame_layout.high_fprs++;
       }
 
-  if (flag_pic)
-    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
-      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
+  /* Register 12 is used for GOT address, but also as temp in prologue
+     for split-stack stdarg functions (unless r14 is available).  */
+  clobbered_regs[12]
+    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
+	|| (flag_split_stack && cfun->stdarg
+	    && (crtl->is_leaf || TARGET_TPF_PROFILING
+		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
 
   clobbered_regs[BASE_REGNUM]
     |= (cfun->machine->base_reg
@@ -10440,12 +10451,15 @@ s390_emit_prologue (void)
   int next_fpr = 0;
 
   /* Choose best register to use for temp use within prologue.
-     See below for why TPF must use the register 1.  */
+     TPF with profiling must avoid the register 14 - the tracing function
+     needs the original contents of r14 to be preserved.  */
 
   if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
       && !crtl->is_leaf
       && !TARGET_TPF_PROFILING)
     temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
+  else if (flag_split_stack && cfun->stdarg)
+    temp_reg = gen_rtx_REG (Pmode, 12);
   else
     temp_reg = gen_rtx_REG (Pmode, 1);
 
@@ -10939,6 +10953,166 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
     SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* When using -fsplit-stack, the allocation routines set a field in
+   the TCB to the bottom of the stack plus this much space, measured
+   in bytes.  */
+
+#define SPLIT_STACK_AVAILABLE 1024
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+
+void
+s390_expand_split_stack_prologue (void)
+{
+  rtx r1, guard, cc = NULL;
+  rtx_insn *insn;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  /* Pointer size in bytes.  */
+  /* Frame size and argument size - the two parameters to __morestack.  */
+  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
+  /* Align argument size to 8 bytes - simplifies __morestack code.  */
+  HOST_WIDE_INT args_size = crtl->args.size >= 0
+			    ? ((crtl->args.size + 7) & ~7)
+			    : 0;
+  /* Label to be called by __morestack.  */
+  rtx_code_label *call_done = NULL;
+  rtx_code_label *parm_base = NULL;
+  rtx tmp;
+
+  gcc_assert (flag_split_stack && reload_completed);
+  if (!TARGET_CPU_ZARCH)
+    {
+      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
+      return;
+    }
+
+  r1 = gen_rtx_REG (Pmode, 1);
+
+  /* If no stack frame will be allocated, don't do anything.  */
+  if (!frame_size)
+    {
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, just use r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+
+	}
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))
+    {
+      /* If frame_size will fit in an add instruction, do a stack space
+	 check, and only call __morestack if there's not enough space.  */
+
+      /* Get thread pointer.  r1 is the only register we can always destroy - r0
+	 could contain a static chain (and cannot be used to address memory
+	 anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
+      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+      /* Aim at __private_ss.  */
+      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
+
+      /* If less than 1kiB is used, skip addition and compare directly with
+	 __private_ss.  */
+      if (frame_size > SPLIT_STACK_AVAILABLE)
+	{
+	  emit_move_insn (r1, guard);
+	  if (TARGET_64BIT)
+	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
+	  else
+	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
+	  guard = r1;
+	}
+
+      /* Compare the (maybe adjusted) guard with the stack pointer.  */
+      cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
+    }
+
+  call_done = gen_label_rtx ();
+  parm_base = gen_label_rtx ();
+
+  /* Emit the parameter block.  */
+  tmp = gen_split_stack_data (parm_base, call_done,
+			      GEN_INT (frame_size),
+			      GEN_INT (args_size));
+  insn = emit_insn (tmp);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parm_base);
+  LABEL_NUSES (parm_base)++;
+
+  /* %r1 = litbase.  */
+  insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base));
+  add_reg_note (insn, REG_LABEL_OPERAND, parm_base);
+  LABEL_NUSES (parm_base)++;
+
+  /* Now, we need to call __morestack.  It has very special calling
+     conventions: it preserves param/return/static chain registers for
+     calling main function body, and looks for its own parameters at %r1. */
+
+  if (cc != NULL)
+    {
+      tmp = gen_split_stack_cond_call (morestack_ref, cc, call_done);
+
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      LABEL_NUSES (call_done)++;
+
+      /* Mark the jump as very unlikely to be taken.  */
+      add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
+
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, and __morestack was not called, just use
+	     r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+	}
+    }
+  else
+    {
+      tmp = gen_split_stack_call (morestack_ref, call_done);
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      LABEL_NUSES (call_done)++;
+      emit_barrier ();
+    }
+
+  /* __morestack will call us here.  */
+
+  emit_label (call_done);
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+s390_live_on_entry (bitmap regs)
+{
+  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      gcc_assert (flag_split_stack);
+      bitmap_set_bit (regs, 1);
+    }
+}
+
 /* Return true if the function can use simple_return to return outside
    of a shrink-wrapped region.  At present shrink-wrapping is supported
    in all cases.  */
@@ -11541,6 +11715,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
     }
 
+  if (flag_split_stack
+     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
+         == NULL)
+     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+    {
+      rtx reg;
+      rtx_insn *seq;
+
+      reg = gen_reg_rtx (Pmode);
+      cfun->machine->split_stack_varargs_pointer = reg;
+
+      start_sequence ();
+      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
+      seq = get_insns ();
+      end_sequence ();
+
+      push_topmost_sequence ();
+      emit_insn_after (seq, entry_of_function ());
+      pop_topmost_sequence ();
+    }
+
   /* Find the overflow area.
      FIXME: This currently is too pessimistic when the vector ABI is
      enabled.  In that case we *always* set up the overflow area
@@ -11549,7 +11744,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
       || TARGET_VX_ABI)
     {
-      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+        t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      else
+        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
 
       off = INTVAL (crtl->args.arg_offset_rtx);
       off = off < 0 ? 0 : off;
@@ -14469,6 +14667,9 @@ s390_asm_file_end (void)
 	     s390_vector_abi);
 #endif
   file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 
 /* Return true if TYPE is a vector bool type.  */
@@ -14724,6 +14925,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
+
 #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   s390_use_by_pieces_infrastructure_p
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 9b869d5..975ee27 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -114,6 +114,9 @@
    UNSPEC_SP_SET
    UNSPEC_SP_TEST
 
+   ; Split stack support
+   UNSPEC_STACK_CHECK
+
    ; Test Data Class (TDC)
    UNSPEC_TDC_INSN
 
@@ -276,6 +279,10 @@
    ; Set and get floating point control register
    UNSPECV_SFPC
    UNSPECV_EFPC
+
+   ; Split stack support
+   UNSPECV_SPLIT_STACK_CALL
+   UNSPECV_SPLIT_STACK_DATA
   ])
 
 ;;
@@ -10907,3 +10914,134 @@
   "TARGET_Z13"
   "lcbb\t%0,%1,%b2"
   [(set_attr "op_type" "VRX")])
+
+; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  s390_expand_split_stack_prologue ();
+  DONE;
+})
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+
+(define_expand "split_stack_space_check"
+  [(set (pc) (if_then_else
+	      (ltu (minus (reg 15)
+			  (match_operand 0 "register_operand"))
+		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  rtx tp = s390_get_thread_pointer ();
+  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
+  rtx reg = gen_reg_rtx (Pmode);
+  rtx cc;
+  if (TARGET_64BIT)
+    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
+  else
+    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
+  cc = s390_emit_compare (GT, reg, guard);
+  s390_emit_jump (operands[1], cc);
+
+  DONE;
+})
+
+;; __morestack parameter block for split stack prologue.  Parameters are:
+;; parameter block label, label to be called by __morestack, frame size,
+;; stack parameter size.
+
+(define_insn "split_stack_data"
+  [(unspec_volatile [(match_operand 0 "" "X")
+		     (match_operand 1 "" "X")
+		     (match_operand 2 "const_int_operand" "X")
+		     (match_operand 3 "const_int_operand" "X")]
+		    UNSPECV_SPLIT_STACK_DATA)]
+  "TARGET_CPU_ZARCH"
+{
+  switch_to_section (targetm.asm_out.function_rodata_section
+		 (current_function_decl));
+
+  if (TARGET_64BIT)
+    output_asm_insn (".align\t8", operands);
+  else
+    output_asm_insn (".align\t4", operands);
+  (*targetm.asm_out.internal_label) (asm_out_file, "L",
+				     CODE_LABEL_NUMBER (operands[0]));
+  if (TARGET_64BIT)
+    {
+      output_asm_insn (".quad\t%2", operands);
+      output_asm_insn (".quad\t%3", operands);
+      output_asm_insn (".quad\t%1-%0", operands);
+    }
+  else
+    {
+      output_asm_insn (".long\t%2", operands);
+      output_asm_insn (".long\t%3", operands);
+      output_asm_insn (".long\t%1-%0", operands);
+    }
+
+  switch_to_section (current_function_section ());
+  return "";
+}
+  [(set_attr "length" "0")])
+
+
+;; A jg with minimal fuss for use in split stack prologue.
+
+(define_expand "split_stack_call"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_call_di (operands[0], operands[1]));
+  else
+    emit_jump_insn (gen_split_stack_call_si (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "split_stack_call_<mode>"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
+				    (reg:P 1)]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+  "jg\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; Also a conditional one.
+
+(define_expand "split_stack_cond_call"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_cond_call_di (operands[0], operands[1], operands[2]));
+  else
+    emit_jump_insn (gen_split_stack_cond_call_si (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "split_stack_cond_call_<mode>"
+  [(set (pc)
+	(if_then_else
+	  (match_operand 1 "" "")
+	  (label_ref (match_operand 2 "" ""))
+	  (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
+				    (reg:P 1)]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+  "jg%C1\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 49c7929..102cb3f 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-02-03  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
+	* config/s390/morestack.S: New file.
+	* config/s390/t-stack-s390: New file.
+	* generic-morestack.c (__splitstack_find): Add s390-specific code.
+
 2016-01-25  Jakub Jelinek  <jakub@redhat.com>
 
 	PR target/69444
diff --git a/libgcc/config.host b/libgcc/config.host
index d8efd82..2be5f7e 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1114,11 +1114,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..141dead
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,609 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 0x1000
+
+# The __morestack function.
+
+	.global	__morestack
+	.hidden	__morestack
+
+	.type	__morestack,@function
+
+__morestack:
+.LFB1:
+	.cfi_startproc
+
+
+#ifndef __s390x__
+
+
+# The 31-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0,__gcc_personality_v0
+	.cfi_lsda 0,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x48
+	.cfi_offset %r7, -0x44
+	.cfi_offset %r8, -0x40
+	.cfi_offset %r9, -0x3c
+	.cfi_offset %r10, -0x38
+	.cfi_offset %r11, -0x34
+	.cfi_offset %r12, -0x30
+	.cfi_offset %r13, -0x2c
+	.cfi_offset %r14, -0x28
+	.cfi_offset %r15, -0x24
+	lr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	ahi	%r15, -0x60		# 0x60 for standard frame.
+	st	%r11, 0(%r15)		# Save back chain.
+	lr	%r8, %r0		# Save %r0 (static chain).
+	lr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	l	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0		# Extract thread pointer.
+	l	%r1, 0x20(%r1)		# Get stack boundary
+	ar	%r1, %r7		# Stack boundary + frame size
+	a	%r1, 4(%r10)		# + stack param size
+	clr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	ahi	%r7, BACKOFF		# Bump requested size a bit.
+	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x40(%r11)		# Pass its address as parameter.
+	la	%r3, 0x60(%r11)		# Caller's stack parameters.
+	l	%r4, 4(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lr	%r15, %r2		# Switch to the new stack.
+	ahi	%r15, -0x60		# Make a stack frame on it.
+	st	%r11, 0(%r15)		# Save back chain.
+
+	s	%r2, 0x40(%r11)		# The end of stack space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHB0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lr	%r0, %r8		# Static chain.
+	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0x60 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x40(%r11)
+	brasl	%r14, __generic_releasestack
+
+	s	%r2, 0x40(%r11)		# Subtract available space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHE0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0x60 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lr	%r15, %r11
+	ahi	%r15, -0x60
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	l	%r9, 0x4(%r10)		# Load stack parameter size.
+	ltr	%r9, %r9		# And check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0x60(%r15)		# Destination.
+	la	%r12, 0x60(%r11)	# Source.
+	lr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lr	%r3, %r11		# Get the stack pointer.
+	sr	%r3, %r2		# Subtract available space.
+	ahi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+	st	%r3, 0x20(%r1)	# Save the new stack boundary.
+
+	lr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#else /* defined(__s390x__) */
+
+
+# The 64-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x70
+	.cfi_offset %r7, -0x68
+	.cfi_offset %r8, -0x60
+	.cfi_offset %r9, -0x58
+	.cfi_offset %r10, -0x50
+	.cfi_offset %r11, -0x48
+	.cfi_offset %r12, -0x40
+	.cfi_offset %r13, -0x38
+	.cfi_offset %r14, -0x30
+	.cfi_offset %r15, -0x28
+	lgr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	aghi	%r15, -0xa0		# 0xa0 for standard frame.
+	stg	%r11, 0(%r15)		# Save back chain.
+	lgr	%r8, %r0		# Save %r0 (static chain).
+	lgr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	lg	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	lg	%r1, 0x38(%r1)		# Get stack boundary
+	agr	%r1, %r7		# Stack boundary + frame size
+	ag	%r1, 8(%r10)		# + stack param size
+	clgr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	aghi	%r7, BACKOFF		# Bump requested size a bit.
+	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x80(%r11)		# Pass its address as parameter.
+	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
+	lg	%r4, 8(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lgr	%r15, %r2		# Switch to the new stack.
+	aghi	%r15, -0xa0		# Make a stack frame on it.
+	stg	%r11, 0(%r15)		# Save back chain.
+
+	sg	%r2, 0x80(%r11)		# The end of stack space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHB0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lgr	%r0, %r8		# Static chain.
+	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stg	%r2, 0x10(%r11)		# Save return register.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x80(%r11)
+	brasl	%r14, __generic_releasestack
+
+	sg	%r2, 0x80(%r11)		# Subtract available space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHE0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lgr	%r15, %r11
+	aghi	%r15, -0xa0
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	lg	%r9, 0x8(%r10)		# Load stack parameter size.
+	ltgr	%r9, %r9		# Check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sgr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0xa0(%r15)		# Destination.
+	la	%r12, 0xa0(%r11)	# Source.
+	lgr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lgr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lgr	%r3, %r11		# Get the stack pointer.
+	sgr	%r3, %r2		# Subtract available space.
+	aghi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
+
+	lgr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.cfi_endproc
+	.size	__morestack, . - __morestack
+
+
+# The exception table.  This tells the personality routine to execute
+# the exception handler.
+
+	.section	.gcc_except_table,"a",@progbits
+	.align	4
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 .L1-.LFB1	# landing pad
+	.uleb128 0		# action
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the basic
+	# personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type	DW.ref.__gcc_personality_v0, @object
+DW.ref.__gcc_personality_v0:
+#ifndef __LP64__
+	.align 4
+	.size	DW.ref.__gcc_personality_v0, 4
+	.long	__gcc_personality_v0
+#else
+	.align 8
+	.size	DW.ref.__gcc_personality_v0, 8
+	.quad	__gcc_personality_v0
+#endif
+#endif
+
+
+
+# Initialize the stack test value when the program starts or when a
+# new thread starts.  We don't know how large the main stack is, so we
+# guess conservatively.  We might be able to use getrlimit here.
+
+	.text
+	.global	__stack_split_initialize
+	.hidden	__stack_split_initialize
+
+	.type	__stack_split_initialize, @function
+
+__stack_split_initialize:
+
+#ifndef __s390x__
+
+	ear	%r1, %a0
+	lr	%r0, %r15
+	ahi	%r0, -0x4000	# We should have at least 16K.
+	st	%r0, 0x20(%r1)
+
+	lr	%r2, %r15
+	lhi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#else /* defined(__s390x__) */
+
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lgr	%r0, %r15
+	aghi	%r0, -0x4000	# We should have at least 16K.
+	stg	%r0, 0x38(%r1)
+
+	lgr	%r2, %r15
+	lghi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.size	__stack_split_initialize, . - __stack_split_initialize
+
+# Routines to get and set the guard, for __splitstack_getcontext,
+# __splitstack_setcontext, and __splitstack_makecontext.
+
+# void *__morestack_get_guard (void) returns the current stack guard.
+	.text
+	.global	__morestack_get_guard
+	.hidden	__morestack_get_guard
+
+	.type	__morestack_get_guard,@function
+
+__morestack_get_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	l	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lg	%r2, 0x38(%r1)
+#endif
+	br %r14
+
+	.size	__morestack_get_guard, . - __morestack_get_guard
+
+# void __morestack_set_guard (void *) sets the stack guard.
+	.global	__morestack_set_guard
+	.hidden	__morestack_set_guard
+
+	.type	__morestack_set_guard,@function
+
+__morestack_set_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	st	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	stg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_set_guard, . - __morestack_set_guard
+
+# void *__morestack_make_guard (void *, size_t) returns the stack
+# guard value for a stack.
+	.global	__morestack_make_guard
+	.hidden	__morestack_make_guard
+
+	.type	__morestack_make_guard,@function
+
+__morestack_make_guard:
+
+#ifndef __s390x__
+	sr	%r2, %r3
+	ahi	%r2, BACKOFF
+#else
+	sgr	%r2, %r3
+	aghi	%r2, BACKOFF
+#endif
+	br	%r14
+
+	.size	__morestack_make_guard, . - __morestack_make_guard
+
+# Make __stack_split_initialize a high priority constructor.
+
+	.section .ctors.65535,"aw",@progbits
+
+#ifndef __LP64__
+	.align	4
+	.long	__stack_split_initialize
+	.long	__morestack_load_mmap
+#else
+	.align	8
+	.quad	__stack_split_initialize
+	.quad	__morestack_load_mmap
+#endif
+
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
new file mode 100644
index 0000000..4c959b0
--- /dev/null
+++ b/libgcc/config/s390/t-stack-s390
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for s390.
+LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index 89765d4..b8eec4e 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
 #elif defined __powerpc64__
+#elif defined __s390x__
+      nsp -= 2 * 160;
+#elif defined __s390__
+      nsp -= 2 * 96;
 #else
 #error "unrecognized target"
 #endif
-- 
2.7.0
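
As a reading aid for s390_expand_split_stack_prologue above, here is a
minimal C model of the arithmetic it performs: rounding the stack argument
size up to 8 bytes, and deciding whether control is transferred to
__morestack.  This is only a sketch, not code from the patch.  The names
align_args_size, calls_morestack and struct morestack_params are made up
for illustration, SPLIT_STACK_AVAILABLE is assumed to be 1024 (going by the
"1kiB" comment), and the comparison is modelled with unsigned integers,
ignoring wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: taken from the "1kiB" comment in the prologue code.  */
#define SPLIT_STACK_AVAILABLE 1024

/* Hypothetical mirror of the 64-bit parameter block written by
   split_stack_data: three consecutive .quad values.  */
struct morestack_params
{
  int64_t frame_size;    /* .quad %2 */
  int64_t args_size;     /* .quad %3 */
  int64_t entry_offset;  /* .quad %1-%0, i.e. call_done - parm_base */
};

/* Round the stack argument size up to a multiple of 8; a negative
   size is treated as 0, as in the prologue.  */
int64_t
align_args_size (int64_t size)
{
  return size >= 0 ? ((size + 7) & ~(int64_t) 7) : 0;
}

/* Model of the prologue decision.  FITS_IN_ADD stands for the
   CONST_OK_FOR_K / CONST_OK_FOR_Op test on the frame size.  */
bool
calls_morestack (uint64_t sp, uint64_t private_ss, int64_t frame_size,
		 bool fits_in_add)
{
  uint64_t guard = private_ss;

  /* No frame: the prologue emits no check at all.  */
  if (frame_size == 0)
    return false;

  /* Frame too large for an add immediate: unconditional jump.  */
  if (!fits_in_add)
    return true;

  /* Small frames are compared against __private_ss directly.  */
  if (frame_size > SPLIT_STACK_AVAILABLE)
    guard += (uint64_t) frame_size;

  return sp < guard;
}
```

Note that in the unconditional case __morestack repeats the check itself and
takes the .Lnoalloc path when enough stack is available.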


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-03 17:18                         ` Marcin Kościelnicki
@ 2016-02-03 17:27                           ` Ulrich Weigand
  2016-02-04 12:44                             ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-03 17:27 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches, Marcin Kościelnicki

Marcin Kościelnicki wrote:

> libgcc/ChangeLog:
> 
> 	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> 	* config/s390/morestack.S: New file.
> 	* config/s390/t-stack-s390: New file.
> 	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
> 	* common/config/s390/s390-common.c (s390_supports_split_stack):
> 	New function.
> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> 	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> 	* config/s390/s390.c (struct machine_function): New field
> 	split_stack_varargs_pointer.
> 	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
> 	in s390_emit_prologue.
> 	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> 	vararg pointer.
> 	(morestack_ref): New global.
> 	(SPLIT_STACK_AVAILABLE): New macro.
> 	(s390_expand_split_stack_prologue): New function.
> 	(s390_live_on_entry): New function.
> 	(s390_va_start): Use split-stack vararg pointer if appropriate.
> 	(s390_asm_file_end): Emit the split-stack note sections.
> 	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> 	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
> 	(UNSPECV_SPLIT_STACK_CALL): New unspec.
> 	(UNSPECV_SPLIT_STACK_DATA): New unspec.
> 	(split_stack_prologue): New expand.
> 	(split_stack_space_check): New expand.
> 	(split_stack_data): New insn.
> 	(split_stack_call): New expand.
> 	(split_stack_call_*): New insn.
> 	(split_stack_cond_call): New expand.
> 	(split_stack_cond_call_*): New insn.
> ---
> Changes applied.  Testsuite still running, still works on my simple tests.
> 
> As for common code prerequisites: #3 is no longer needed, and very likely
> so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite
> runs just fine without it now).

OK, I see.  The patch is OK for mainline then, assuming testing passes.

Thanks again,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-02 20:11                     ` Marcin Kościelnicki
@ 2016-02-03 18:40                       ` Marcin Kościelnicki
  2016-02-04 15:06                         ` Ulrich Weigand
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-03 18:40 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: krebbel, gcc-patches


>>
>> The second issue I'm still not sure about is the magic nop marker
>> for frameless functions.  In an earlier mail you wrote:
>>
>>> Both currently supported
>>> architectures always emit split-stack code on every function.
>>
>> At least for rs6000 this doesn't appear to be true; in
>> rs6000_expand_split_stack_prologue we have:
>>
>>    if (!info->push_p)
>>      return;
>>
>> so it does nothing for frameless routines.
>>
>> Now on i386 we do indeed generate code for frameless routines;
>> in fact, the *same* full stack check is generated as for any
>> other routine.  Now I'm wondering: is there a reason why
>> this check would be necessary (and there's simply a bug in
>> the rs6000 implementation)?  Then we obviously should do the
>> same on s390.
>
> Try that on powerpc64(le):
>
> $ cat a.c
> #include <stdio.h>
>
> void f(void) {
> }
>
> typedef void (*fptr)(void);
>
> fptr g(void);
>
> int main() {
>          printf("%p\n", g());
> }
>
> $ cat b.c
> void f(void);
>
> typedef void (*fptr)(void);
>
> fptr g(void) {
>          return f;
> }
>
> $ gcc -O3 -fsplit-stack -c b.c
> $ gcc -O3 -c a.c
> $ gcc a.o b.o -fuse-ld=gold
>
> I don't have a recent enough gcc for powerpc, but from what I've seen in
> the code, this should explode with a linker error.
>
> Of course, mixing split-stack and non-split-stack code when function
> pointers are involved is sketchy anyway, so what's one more bug...
>
> That said, for s390, we can avoid the above problem by checking the
> relocation in gold now that ESA paths are gone - for direct function
> calls (the only ones we care about), we should be seeing a relocation in
> brasl.  So I'll remove the nopmark thing and add proper recognition in
> gold.

Ugh. I take that back.  For -fPIC, the load-address sequence is:

         larl    %r1,f@GOTENT
         lg      %r2,0(%r1)
         br      %r14

And (sibling) call sequence is:

         larl    %r1,f@GOTENT
         lg      %r1,0(%r1)
         br      %r1

It seems there's no proper way to recognize a call vs a load address - 
so we can either go with emitting the marker, or have the same problem 
as on ppc.

So - how much should we care?

>
>>
>> On the other hand, if rs6000 works fine *without* any code
>> in frameless routines, why wouldn't that work for s390 too?
>>
>> Emitting a nop (that is always executed) still looks weird to me.
>>
>>
>> Bye,
>> Ulrich
>>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-03 17:27                           ` Ulrich Weigand
@ 2016-02-04 12:44                             ` Marcin Kościelnicki
  2016-02-10 13:14                               ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-04 12:44 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 03/02/16 18:27, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>
>> libgcc/ChangeLog:
>>
>> 	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>> 	* config/s390/morestack.S: New file.
>> 	* config/s390/t-stack-s390: New file.
>> 	* generic-morestack.c (__splitstack_find): Add s390-specific code.
>>
>> gcc/ChangeLog:
>>
>> 	* common/config/s390/s390-common.c (s390_supports_split_stack):
>> 	New function.
>> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
>> 	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>> 	* config/s390/s390.c (struct machine_function): New field
>> 	split_stack_varargs_pointer.
>> 	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
>> 	in s390_emit_prologue.
>> 	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>> 	vararg pointer.
>> 	(morestack_ref): New global.
>> 	(SPLIT_STACK_AVAILABLE): New macro.
>> 	(s390_expand_split_stack_prologue): New function.
>> 	(s390_live_on_entry): New function.
>> 	(s390_va_start): Use split-stack vararg pointer if appropriate.
>> 	(s390_asm_file_end): Emit the split-stack note sections.
>> 	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>> 	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>> 	(UNSPECV_SPLIT_STACK_CALL): New unspec.
>> 	(UNSPECV_SPLIT_STACK_DATA): New unspec.
>> 	(split_stack_prologue): New expand.
>> 	(split_stack_space_check): New expand.
>> 	(split_stack_data): New insn.
>> 	(split_stack_call): New expand.
>> 	(split_stack_call_*): New insn.
>> 	(split_stack_cond_call): New expand.
>> 	(split_stack_cond_call_*): New insn.
>> ---
>> Changes applied.  Testsuite still running, still works on my simple tests.
>>
>> As for common code prerequisites: #3 is no longer needed, and very likely
>> so is #4 (it fixes problems that I've only seen with ESA mode, and testsuite
>> runs just fine without it now).
>
> OK, I see.  The patch is OK for mainline then, assuming testing passes.

Well, testing passes (as in, is no worse than x86 - the testsuite 
doesn't really agree with -fsplit-stack in a few places involving 
backtraces).  However, there's still the libgo issue to be taken care 
of.  For my tests, I patched it up with:

diff --git a/libgo/runtime/proc.c b/libgo/runtime/proc.c
index c25a217..efa6806 100644
--- a/libgo/runtime/proc.c
+++ b/libgo/runtime/proc.c
@@ -2016,17 +2016,19 @@ doentersyscall()
  	m->locks++;

  	// Leave SP around for GC and traceback.
+	{
  #ifdef USING_SPLIT_STACK
-	g->gcstack = __splitstack_find(nil, nil, &g->gcstack_size,
-				       &g->gcnext_segment, &g->gcnext_sp,
-				       &g->gcinitial_sp);
+		size_t size_tmp;
+		g->gcstack = __splitstack_find(nil, nil, &size_tmp,
+					       &g->gcnext_segment, &g->gcnext_sp,
+					       &g->gcinitial_sp);
+		g->gcstack_size = size_tmp;
  #else
-	{
  		void *v;

  		g->gcnext_sp = (byte *) &v;
-	}
  #endif
+	}

  	g->status = Gsyscall;

@@ -2064,9 +2066,13 @@ runtime_entersyscallblock(void)

  	// Leave SP around for GC and traceback.
  #ifdef USING_SPLIT_STACK
-	g->gcstack = __splitstack_find(nil, nil, &g->gcstack_size,
-				       &g->gcnext_segment, &g->gcnext_sp,
-				       &g->gcinitial_sp);
+	{
+		size_t size_tmp;
+		g->gcstack = __splitstack_find(nil, nil, &size_tmp,
+					       &g->gcnext_segment, &g->gcnext_sp,
+					       &g->gcinitial_sp);
+		g->gcstack_size = size_tmp;
+	}
  #else
  	g->gcnext_sp = (byte *) &p;
  #endif

Andreas, did you have any luck with fixing this?  If not, I'll try 
submitting the above patch to gofrontend.

>
> Thanks again,
> Ulrich
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-03 18:40                       ` Marcin Kościelnicki
@ 2016-02-04 15:06                         ` Ulrich Weigand
  2016-02-04 15:20                           ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-04 15:06 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches

Marcin Kościelnicki wrote:

> Ugh. I take that back.  For -fPIC, the load-address sequence is:
> 
>          larl    %r1,f@GOTENT
>          lg      %r2,0(%r1)
>          br      %r14

This is correct.

> And (sibling) call sequence is:
> 
>          larl    %r1,f@GOTENT
>          lg      %r1,0(%r1)
>          br      %r1

Oops.  That is actually a GCC bug.  The sibcall sequence really must be:

	jg f@PLT

This is a real bug since it forces non-lazy symbol resolution for f
just because the compiler chose to use a sibcall optimization;
that's not supposed to happen.

It seems this bug was accidentally introduced here:

2010-04-20  Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>

        PR target/43635
        * config/s390/s390.c (s390_emit_call): Turn direct into indirect
        calls for -fpic -m31 if they have been sibcall optimized.

since the patch doesn't check for TARGET_64BIT ...

Andreas, can you have a look?

> It seems there's no proper way to recognize a call vs a load address - 
> so we can either go with emitting the marker, or have the same problem 
> as on ppc.
> 
> So - how much should we care?

I think we should fix that bug.  That won't help for existing objects,
but those don't use split stack either, so that shouldn't matter.

If we fix that bug before (or at the same time as) adding split-stack
support, the linker will still be able to distinguish function pointer
loads from calls (including sibcalls) on all objects using split stack.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-04 15:06                         ` Ulrich Weigand
@ 2016-02-04 15:20                           ` Marcin Kościelnicki
  2016-02-04 16:27                             ` Ulrich Weigand
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-04 15:20 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 04/02/16 16:06, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>
>> Ugh. I take that back.  For -fPIC, the load-address sequence is:
>>
>>           larl    %r1,f@GOTENT
>>           lg      %r2,0(%r1)
>>           br      %r14
>
> This is correct.
>
>> And (sibling) call sequence is:
>>
>>           larl    %r1,f@GOTENT
>>           lg      %r1,0(%r1)
>>           br      %r1
>
> Oops.  That is actually a GCC bug.  The sibcall sequence really must be:
>
> 	jg f@PLT
>
> This is a real bug since it forces non-lazy symbol resolution for f
> just because the compiler chose to use a sibcall optimization;
> that's not supposed to happen.
>
> It seems this bug was accidentally introduced here:
>
> 2010-04-20  Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>
>
>          PR target/43635
>          * config/s390/s390.c (s390_emit_call): Turn direct into indirect
>          calls for -fpic -m31 if they have been sibcall optimized.
>
> since the patch doesn't check for TARGET_64BIT ...
>
> Andreas, can you have a look?
>
>> It seems there's no proper way to recognize a call vs a load address -
>> so we can either go with emitting the marker, or have the same problem
>> as on ppc.
>>
>> So - how much should we care?
>
> I think we should fix that bug.  That won't help for existing objects,
> but those don't use split stack either, so that shouldn't matter.
>
> If we fix that bug before (or at the same time as) adding split-stack
> support, the linker will still be able to distinguish function pointer
> loads from calls (including sibcalls) on all objects using split stack.
>
> Bye,
> Ulrich
>


Fair enough.  Here's what I'm going to implement in gold:

- any PLT relocation: call
- PC32DBL on a larl: non-call
- PC32DBL otherwise: call
- any other relocation: non-call

Does that sound right?

Marcin Kościelnicki

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-04 15:20                           ` Marcin Kościelnicki
@ 2016-02-04 16:27                             ` Ulrich Weigand
  2016-02-05 21:13                               ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-04 16:27 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches

Marcin Kościelnicki wrote:

> Fair enough.  Here's what I'm going to implement in gold:
> 
> - any PLT relocation: call
> - PC32DBL on a larl: non-call
> - PC32DBL otherwise: call
> - any other relocation: non-call
> 
> Does that sound right?

Hmm, I'm wondering about the PC32DBL choices.  There are now
a large number of other non-call instructions that use PC32DBL,
including lrl, strl, crl, cgrl, cgfrl, ...

However, these all access *data* at the pointed-to location,
so it is quite unlikely they would ever be used with a
function symbol.  So, assuming that you also check that the
target of the relocation is a function symbol, treating only
larl as non-call might be OK.

A more conservative approach might be to make the decision
the other way round: for PC32DBL check for *branch* instructions,
and treat only those as calls.  There are just a few branch
instructions using PC32DBL:

brasl  (call)
brcl   (conditional or unconditional sibcall)
brcth  (???)

where the last one is extremely unlikely (but theoretically
possible) to be used as conditional sibcall combined with
a register decrement; I don't think this can ever happen
with current compilers however.

For full completeness, there are also PC16DBL relocations that
*could* target called functions, but only when compiling with
the -msmall-exec flag to assume total executable size is less
than 64 KB.  These are used by the following instructions:

bras
brc
brct
brctg
brxh
brxhg
brxle
brxlg
crj
cgrj
clrj
clgrj
cij
cgij
clij
clgij

Note that those are *all* branch instructions, so it might
make sense to add any PC16DBL targeting a function symbol
to the list of calls, just in case.  (But since basically
nobody ever uses -msmall-exec, it doesn't really matter
much either.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-04 16:27                             ` Ulrich Weigand
@ 2016-02-05 21:13                               ` Marcin Kościelnicki
  2016-02-05 22:02                                 ` Ulrich Weigand
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-05 21:13 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 04/02/16 17:27, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>
>> Fair enough.  Here's what I'm going to implement in gold:
>>
>> - any PLT relocation: call
>> - PC32DBL on a larl: non-call
>> - PC32DBL otherwise: call
>> - any other relocation: non-call
>>
>> Does that sound right?
>
> Hmm, I'm wondering about the PC32DBL choices.  There are now
> a large number of other non-call instructions that use PC32DBL,
> including lrl, strl, crl, cgrl, cgfrl, ...
>
> However, these all access *data* at the pointed-to location,
> so it is quite unlikely they would ever be used with a
> function symbol.  So, assuming that you also check that the
> target of the relocation is a function symbol, treating only
> larl as non-call might be OK.

Yeah, I make sure the symbol is a STT_FUNC.
>
> A more conservative approach might be to make the decision
> the other way round: for PC32DBL check for *branch* instructions,
> and treat only those as calls.  There are just a few branch
> instructions using PC32DBL:
>
> brasl  (call)
> brcl   (conditional or unconditional sibcall)
> brcth  (???)
>
> where the last one is extremely unlikely (but theoretically
> possible) to be used as conditional sibcall combined with
> a register decrement; I don't think this can ever happen
> with current compilers however.

I'll stay with checking for larl - while I can imagine someone adding a 
new conditional branch instruction, I don't see a need for another 
larl-like instruction.  Besides, this way the failure mode for an 
unknown instruction would be producing an error, instead of silently 
emitting code with unfixed prologue.
>
> For full completeness, there are also PC16DBL relocations that
> *could* target called functions, but only when compiling with
> the -msmall-exec flag to assume total executable size is less
> than 64 KB.  These are used by the following instructions:
>
> bras
> brc
> brct
> brctg
> brxh
> brxhg
> brxle
> brxlg
> crj
> cgrj
> clrj
> clgrj
> cij
> cgij
> clij
> clgij
>
> Note that those are *all* branch instructions, so it might
> make sense to add any PC16DBL targeting a function symbol
> to the list of calls, just in case.  (But since basically
> nobody ever uses -msmall-exec, it doesn't really matter
> much either.)

Ah right, I've added PC16DBL to the "always call" list.
>
> Bye,
> Ulrich
>

I've updated and resubmitted the gold patch.

Marcin Kościelnicki

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-05 21:13                               ` Marcin Kościelnicki
@ 2016-02-05 22:02                                 ` Ulrich Weigand
  0 siblings, 0 replies; 55+ messages in thread
From: Ulrich Weigand @ 2016-02-05 22:02 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: krebbel, gcc-patches

Marcin Kościelnicki wrote:

> I'll stay with checking for larl - while I can imagine someone adding a 
> new conditional branch instruction, I don't see a need for another 
> larl-like instruction.  Besides, this way the failure mode for an 
> unknown instruction would be producing an error, instead of silently 
> emitting code with unfixed prologue.

OK, fine with me.  B.t.w. Andreas has checked in the sibcall fix,
so you no longer should be seeing larl used for sibcalls.

> I've updated and resubmitted the gold patch.

Thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] testsuite/s390: Add __morestack test.
  2016-01-29 16:17           ` Andreas Krebbel
  2016-02-02 14:52             ` Marcin Kościelnicki
@ 2016-02-07 12:22             ` Marcin Kościelnicki
  2016-02-19 10:21               ` Andreas Krebbel
  1 sibling, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-07 12:22 UTC (permalink / raw)
  To: krebbel; +Cc: gcc-patches, Marcin Kościelnicki

gcc/testsuite/ChangeLog:

	* gcc.target/s390/morestack.c: New test.
---
Here's the promised test.

 gcc/testsuite/ChangeLog                   |   4 +
 gcc/testsuite/gcc.target/s390/morestack.c | 260 ++++++++++++++++++++++++++++++
 2 files changed, 264 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/morestack.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8f528b2..26d600f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2016-02-05  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* gcc.target/s390/morestack.c: New test.
+
 2016-02-04  Martin Liska  <mliska@suse.cz>
 
 	* g++.dg/asan/pr69276.C: New test.
diff --git a/gcc/testsuite/gcc.target/s390/morestack.c b/gcc/testsuite/gcc.target/s390/morestack.c
new file mode 100644
index 0000000..aa28b72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/morestack.c
@@ -0,0 +1,260 @@
+/* Checks proper behavior of __morestack function - specifically, GPR
+   values surviving, stack parameters being copied, and vararg
+   pointer being correct.  */
+
+/* { dg-do run } */
+/* { dg-options "" } */
+
+#include <stdlib.h>
+
+void *orig_r15;
+
+/* 1. Function "test" saves registers, makes a stack frame, puts known
+ *    values in registers, and calls __morestack, telling it to jump to
+ *    testinner, with return address pointing to "testret".
+ * 2. "testinner" checks that parameter registers match what has been
+ *    passed from "test", stack parameters were copied properly to
+ *    the new stack, and the argument pointer matches the calling
+ *    function's stack pointer.  It then leaves new values in volatile
+ *    registers (including return value registers) and returns.
+ * 3. "testret" checks that return value registers contain the expected
+ *    return value, callee-saved GPRs match the values from "test",
+ *    and then returns to main. */
+
+extern unsigned long testparams[3];
+
+#ifdef __s390x__
+
+asm(
+  ".global test\n"
+  "test:\n"
+  ".type test, @function\n"
+  /* Save registers.  */
+  "stmg %r6, %r15, 0x30(%r15)\n"
+  /* Save original sp in a global.  */
+  "larl %r1, orig_r15\n"
+  "stg %r15, 0(%r1)\n"
+  /* Make a stack frame.  */
+  "aghi %r15, -168\n"
+  /* A stack parameter.  */
+  "lghi %r1, 0x1240\n"
+  "stg %r1, 160(%r15)\n"
+  /* Registers.  */
+  "lghi %r0, 0x1230\n"
+  "lghi %r2, 0x1232\n"
+  "lghi %r3, 0x1233\n"
+  "lghi %r4, 0x1234\n"
+  "lghi %r5, 0x1235\n"
+  "lghi %r6, 0x1236\n"
+  "lghi %r7, 0x1237\n"
+  "lghi %r8, 0x1238\n"
+  "lghi %r9, 0x1239\n"
+  "lghi %r10, 0x123a\n"
+  "lghi %r11, 0x123b\n"
+  "lghi %r12, 0x123c\n"
+  "lghi %r13, 0x123d\n"
+  /* Fake return address.  */
+  "larl %r14, testret\n"
+  /* Call morestack.  */
+  "larl %r1, testparams\n"
+  "jg __morestack\n"
+
+  /* Entry point.  */
+  "testinner:\n"
+  /* Check registers.  */
+  "cghi %r0, 0x1230\n"
+  "jne testerr\n"
+  "cghi %r2, 0x1232\n"
+  "jne testerr\n"
+  "cghi %r3, 0x1233\n"
+  "jne testerr\n"
+  "cghi %r4, 0x1234\n"
+  "jne testerr\n"
+  "cghi %r5, 0x1235\n"
+  "jne testerr\n"
+  "cghi %r6, 0x1236\n"
+  "jne testerr\n"
+  /* Check stack param.  */
+  "lg %r0, 0xa0(%r15)\n"
+  "cghi %r0, 0x1240\n"
+  "jne testerr\n"
+  /* Check argument pointer.  */
+  "aghi %r1, 8\n"
+  "larl %r2, orig_r15\n"
+  "cg %r1, 0(%r2)\n"
+  "jne testerr\n"
+  /* Modify volatile registers.  */
+  "lghi %r0, 0x1250\n"
+  "lghi %r1, 0x1251\n"
+  "lghi %r2, 0x1252\n"
+  "lghi %r3, 0x1253\n"
+  "lghi %r4, 0x1254\n"
+  "lghi %r5, 0x1255\n"
+  /* Return.  */
+  "br %r14\n"
+
+  /* Returns here.  */
+  "testret:\n"
+  /* Check return registers.  */
+  "cghi %r2, 0x1252\n"
+  "jne testerr\n"
+  /* Check callee-saved registers.  */
+  "cghi %r6, 0x1236\n"
+  "jne testerr\n"
+  "cghi %r7, 0x1237\n"
+  "jne testerr\n"
+  "cghi %r8, 0x1238\n"
+  "jne testerr\n"
+  "cghi %r9, 0x1239\n"
+  "jne testerr\n"
+  "cghi %r10, 0x123a\n"
+  "jne testerr\n"
+  "cghi %r11, 0x123b\n"
+  "jne testerr\n"
+  "cghi %r12, 0x123c\n"
+  "jne testerr\n"
+  "cghi %r13, 0x123d\n"
+  "jne testerr\n"
+  /* Return.  */
+  "lmg %r6, %r15, 0xd8(%r15)\n"
+  "br %r14\n" 
+
+  /* Parameters block.  */
+  ".section .data\n"
+  ".align 8\n"
+  "testparams:\n"
+  ".quad 160\n"
+  ".quad 8\n"
+  ".quad testinner-testparams\n"
+  ".text\n"
+);
+
+#else
+
+asm(
+  ".global test\n"
+  "test:\n"
+  ".type test, @function\n"
+  /* Save registers.  */
+  "stm %r6, %r15, 0x18(%r15)\n"
+  /* Save original sp in a global.  */
+  "larl %r1, orig_r15\n"
+  "st %r15, 0(%r1)\n"
+  /* Make a stack frame.  */
+  "ahi %r15, -0x68\n"
+  /* A stack parameter.  */
+  "lhi %r1, 0x1240\n"
+  "st %r1, 0x60(%r15)\n"
+  "lhi %r1, 0x1241\n"
+  "st %r1, 0x64(%r15)\n"
+  /* Registers.  */
+  "lhi %r0, 0x1230\n"
+  "lhi %r2, 0x1232\n"
+  "lhi %r3, 0x1233\n"
+  "lhi %r4, 0x1234\n"
+  "lhi %r5, 0x1235\n"
+  "lhi %r6, 0x1236\n"
+  "lhi %r7, 0x1237\n"
+  "lhi %r8, 0x1238\n"
+  "lhi %r9, 0x1239\n"
+  "lhi %r10, 0x123a\n"
+  "lhi %r11, 0x123b\n"
+  "lhi %r12, 0x123c\n"
+  "lhi %r13, 0x123d\n"
+  /* Fake return address.  */
+  "larl %r14, testret\n"
+  /* Call morestack.  */
+  "larl %r1, testparams\n"
+  "jg __morestack\n"
+
+  /* Entry point.  */
+  "testinner:\n"
+  /* Check registers.  */
+  "chi %r0, 0x1230\n"
+  "jne testerr\n"
+  "chi %r2, 0x1232\n"
+  "jne testerr\n"
+  "chi %r3, 0x1233\n"
+  "jne testerr\n"
+  "chi %r4, 0x1234\n"
+  "jne testerr\n"
+  "chi %r5, 0x1235\n"
+  "jne testerr\n"
+  "chi %r6, 0x1236\n"
+  "jne testerr\n"
+  /* Check stack param.  */
+  "l %r0, 0x60(%r15)\n"
+  "chi %r0, 0x1240\n"
+  "jne testerr\n"
+  "l %r0, 0x64(%r15)\n"
+  "chi %r0, 0x1241\n"
+  "jne testerr\n"
+  /* Check argument pointer.  */
+  "ahi %r1, 8\n"
+  "larl %r2, orig_r15\n"
+  "c %r1, 0(%r2)\n"
+  "jne testerr\n"
+  /* Modify volatile registers.  */
+  "lhi %r0, 0x1250\n"
+  "lhi %r1, 0x1251\n"
+  "lhi %r2, 0x1252\n"
+  "lhi %r3, 0x1253\n"
+  "lhi %r4, 0x1254\n"
+  "lhi %r5, 0x1255\n"
+  /* Return.  */
+  "br %r14\n"
+
+  /* Returns here.  */
+  "testret:\n"
+  /* Check return registers.  */
+  "chi %r2, 0x1252\n"
+  "jne testerr\n"
+  "chi %r3, 0x1253\n"
+  "jne testerr\n"
+  /* Check callee-saved registers.  */
+  "chi %r6, 0x1236\n"
+  "jne testerr\n"
+  "chi %r7, 0x1237\n"
+  "jne testerr\n"
+  "chi %r8, 0x1238\n"
+  "jne testerr\n"
+  "chi %r9, 0x1239\n"
+  "jne testerr\n"
+  "chi %r10, 0x123a\n"
+  "jne testerr\n"
+  "chi %r11, 0x123b\n"
+  "jne testerr\n"
+  "chi %r12, 0x123c\n"
+  "jne testerr\n"
+  "chi %r13, 0x123d\n"
+  "jne testerr\n"
+  /* Return.  */
+  "lm %r6, %r15, 0x80(%r15)\n"
+  "br %r14\n" 
+
+  /* Parameters block.  */
+  ".section .data\n"
+  ".align 4\n"
+  "testparams:\n"
+  ".long 96\n"
+  ".long 8\n"
+  ".long testinner-testparams\n"
+  ".text\n"
+);
+
+#endif
+
+_Noreturn void testerr (void) {
+  exit(1);
+}
+
+extern void test (void);
+
+int main (void) {
+  test();
+  /* Now try again, with huge stack frame requested - to exercise
+     both paths in __morestack (new allocation needed or not).  */
+  testparams[0] = 1000000;
+  test();
+  return 0;
+}
-- 
2.7.0

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-04 12:44                             ` Marcin Kościelnicki
@ 2016-02-10 13:14                               ` Marcin Kościelnicki
  2016-02-14 16:01                                 ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-10 13:14 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: krebbel, gcc-patches

On 04/02/16 13:44, Marcin Kościelnicki wrote:
> On 03/02/16 18:27, Ulrich Weigand wrote:
>> Marcin Kościelnicki wrote:
>>
>>> libgcc/ChangeLog:
>>>
>>>     * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>>>     * config/s390/morestack.S: New file.
>>>     * config/s390/t-stack-s390: New file.
>>>     * generic-morestack.c (__splitstack_find): Add s390-specific code.
>>>
>>> gcc/ChangeLog:
>>>
>>>     * common/config/s390/s390-common.c (s390_supports_split_stack):
>>>     New function.
>>>     (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>>>     * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>>>     * config/s390/s390.c (struct machine_function): New field
>>>     split_stack_varargs_pointer.
>>>     (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>>>     in s390_emit_prologue.
>>>     (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>>>     vararg pointer.
>>>     (morestack_ref): New global.
>>>     (SPLIT_STACK_AVAILABLE): New macro.
>>>     (s390_expand_split_stack_prologue): New function.
>>>     (s390_live_on_entry): New function.
>>>     (s390_va_start): Use split-stack vararg pointer if appropriate.
>>>     (s390_asm_file_end): Emit the split-stack note sections.
>>>     (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>>>     * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>>>     (UNSPECV_SPLIT_STACK_CALL): New unspec.
>>>     (UNSPECV_SPLIT_STACK_DATA): New unspec.
>>>     (split_stack_prologue): New expand.
>>>     (split_stack_space_check): New expand.
>>>     (split_stack_data): New insn.
>>>     (split_stack_call): New expand.
>>>     (split_stack_call_*): New insn.
>>>     (split_stack_cond_call): New expand.
>>>     (split_stack_cond_call_*): New insn.
>>> ---
>>> Changes applied.  Testsuite still running, still works on my simple
>>> tests.
>>>
>>> As for common code prerequisites: #3 is no longer needed, and very
>>> likely
>>> so is #4 (it fixes problems that I've only seen with ESA mode, and
>>> testsuite
>>> runs just fine without it now).
>>
>> OK, I see.  The patch is OK for mainline then, assuming testing passes.
>
> Well, testing passes (as in, is no worse than x86 - the testsuite
> doesn't really agree with -fsplit-stack in a few places involving
> backtraces).  However, there's still the libgo issue to be taken care
> of.  For my tests, I patched it up with:
> [...]

I see the libgo patch has landed today.  Can we get this pushed?

Marcin Kościelnicki


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] s390: Add -fsplit-stack support
  2016-02-10 13:14                               ` Marcin Kościelnicki
@ 2016-02-14 16:01                                 ` Marcin Kościelnicki
  2016-02-15 10:21                                   ` Andreas Krebbel
  0 siblings, 1 reply; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-14 16:01 UTC (permalink / raw)
  To: uweigand; +Cc: gcc-patches, krebbel, Marcin Kościelnicki

libgcc/ChangeLog:

	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
	* config/s390/morestack.S: New file.
	* config/s390/t-stack-s390: New file.
	* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

	* common/config/s390/s390-common.c (s390_supports_split_stack):
	New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
	* config/s390/s390.c (struct machine_function): New field
	split_stack_varargs_pointer.
	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
	in s390_emit_prologue.
	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
	vararg pointer.
	(morestack_ref): New global.
	(SPLIT_STACK_AVAILABLE): New macro.
	(s390_expand_split_stack_prologue): New function.
	(s390_live_on_entry): New function.
	(s390_va_start): Use split-stack vararg pointer if appropriate.
	(s390_asm_file_end): Emit the split-stack note sections.
	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_CALL): New unspec.
	(UNSPECV_SPLIT_STACK_DATA): New unspec.
	(split_stack_prologue): New expand.
	(split_stack_space_check): New expand.
	(split_stack_data): New insn.
	(split_stack_call): New expand.
	(split_stack_call_*): New insn.
	(split_stack_cond_call): New expand.
	(split_stack_cond_call_*): New insn.
---
Whoops, I noticed a problem introduced when removing ESA bits: in the
__morestack exception-handling path in 31-bit version, I neglected
to stuff GOT address in %r12, which is necessary for the PLT stub to
work.  The only change in this version is the added larl %r12,
_GLOBAL_OFFSET_TABLE_ line.

 gcc/ChangeLog                        |  30 ++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 214 +++++++++++-
 gcc/config/s390/s390.md              | 138 ++++++++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 611 +++++++++++++++++++++++++++++++++++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1018 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e81d1fe..60a4608 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2016-02-14  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* common/config/s390/s390-common.c (s390_supports_split_stack):
+	New function.
+	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
+	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+	* config/s390/s390.c (struct machine_function): New field
+	split_stack_varargs_pointer.
+	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
+	in s390_emit_prologue.
+	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+	vararg pointer.
+	(morestack_ref): New global.
+	(SPLIT_STACK_AVAILABLE): New macro.
+	(s390_expand_split_stack_prologue): New function.
+	(s390_live_on_entry): New function.
+	(s390_va_start): Use split-stack vararg pointer if appropriate.
+	(s390_asm_file_end): Emit the split-stack note sections.
+	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+	(UNSPECV_SPLIT_STACK_CALL): New unspec.
+	(UNSPECV_SPLIT_STACK_DATA): New unspec.
+	(split_stack_prologue): New expand.
+	(split_stack_space_check): New expand.
+	(split_stack_data): New insn.
+	(split_stack_call): New expand.
+	(split_stack_call_*): New insn.
+	(split_stack_cond_call): New expand.
+	(split_stack_cond_call_*): New insn.
+
 2016-02-14  Venkataramanan Kumar  <venkataramanan.kumar@amd.com>
 
 	*  config/i386/znver1.md
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
     }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			   struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT)
 
@@ -117,4 +128,7 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 #undef TARGET_OPTION_INIT_STRUCT
 #define TARGET_OPTION_INIT_STRUCT s390_option_init_struct
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK s390_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 633bc1e..09032c9 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -42,6 +42,7 @@ extern bool s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 extern HOST_WIDE_INT s390_initial_elimination_offset (int, int);
 extern void s390_emit_prologue (void);
 extern void s390_emit_epilogue (bool);
+extern void s390_expand_split_stack_prologue (void);
 extern bool s390_can_use_simple_return_insn (void);
 extern bool s390_can_use_return_insn (void);
 extern void s390_function_profiler (FILE *, int);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9facd96..aa82d1c 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -428,6 +428,13 @@ struct GTY(()) machine_function
   /* True if the current function may contain a tbegin clobbering
      FPRs.  */
   bool tbegin_p;
+
+  /* For -fsplit-stack support: A stack local which holds a pointer to
+     the stack arguments for a function with a variable number of
+     arguments.  This is set at the start of the function and is used
+     to initialize the overflow_arg_area field of the va_list
+     structure.  */
+  rtx split_stack_varargs_pointer;
 };
 
 /* Few accessor macros for struct cfun->machine->s390_frame_layout.  */
@@ -9371,9 +9378,13 @@ s390_register_info ()
 	  cfun_frame_layout.high_fprs++;
       }
 
-  if (flag_pic)
-    clobbered_regs[PIC_OFFSET_TABLE_REGNUM]
-      |= !!df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM);
+  /* Register 12 is used for GOT address, but also as temp in prologue
+     for split-stack stdarg functions (unless r14 is available).  */
+  clobbered_regs[12]
+    |= ((flag_pic && df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
+	|| (flag_split_stack && cfun->stdarg
+	    && (crtl->is_leaf || TARGET_TPF_PROFILING
+		|| has_hard_reg_initial_val (Pmode, RETURN_REGNUM))));
 
   clobbered_regs[BASE_REGNUM]
     |= (cfun->machine->base_reg
@@ -10473,12 +10484,15 @@ s390_emit_prologue (void)
   int next_fpr = 0;
 
   /* Choose best register to use for temp use within prologue.
-     See below for why TPF must use the register 1.  */
+     TPF with profiling must avoid the register 14 - the tracing function
+     needs the original contents of r14 to be preserved.  */
 
   if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
       && !crtl->is_leaf
       && !TARGET_TPF_PROFILING)
     temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
+  else if (flag_split_stack && cfun->stdarg)
+    temp_reg = gen_rtx_REG (Pmode, 12);
   else
     temp_reg = gen_rtx_REG (Pmode, 1);
 
@@ -10972,6 +10986,166 @@ s300_set_up_by_prologue (hard_reg_set_container *regs)
     SET_HARD_REG_BIT (regs->set, REGNO (cfun->machine->base_reg));
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* When using -fsplit-stack, the allocation routines set a field in
+   the TCB to the bottom of the stack plus this much space, measured
+   in bytes.  */
+
+#define SPLIT_STACK_AVAILABLE 1024
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+
+void
+s390_expand_split_stack_prologue (void)
+{
+  rtx r1, guard, cc = NULL;
+  rtx_insn *insn;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  /* Pointer size in bytes.  */
+  /* Frame size and argument size - the two parameters to __morestack.  */
+  HOST_WIDE_INT frame_size = cfun_frame_layout.frame_size;
+  /* Align argument size to 8 bytes - simplifies __morestack code.  */
+  HOST_WIDE_INT args_size = crtl->args.size >= 0
+			    ? ((crtl->args.size + 7) & ~7)
+			    : 0;
+  /* Label to be called by __morestack.  */
+  rtx_code_label *call_done = NULL;
+  rtx_code_label *parm_base = NULL;
+  rtx tmp;
+
+  gcc_assert (flag_split_stack && reload_completed);
+  if (!TARGET_CPU_ZARCH)
+    {
+      sorry ("CPUs older than z900 are not supported for -fsplit-stack");
+      return;
+    }
+
+  r1 = gen_rtx_REG (Pmode, 1);
+
+  /* If no stack frame will be allocated, don't do anything.  */
+  if (!frame_size)
+    {
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, just use r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+
+	}
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  if (CONST_OK_FOR_K (frame_size) || CONST_OK_FOR_Op (frame_size))
+    {
+      /* If frame_size will fit in an add instruction, do a stack space
+	 check, and only call __morestack if there's not enough space.  */
+
+      /* Get thread pointer.  r1 is the only register we can always destroy - r0
+	 could contain a static chain (and cannot be used to address memory
+	 anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
+      emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+      /* Aim at __private_ss.  */
+      guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
+
+      /* If less than 1kiB used, skip addition and compare directly with
+	 __private_ss.  */
+      if (frame_size > SPLIT_STACK_AVAILABLE)
+	{
+	  emit_move_insn (r1, guard);
+	  if (TARGET_64BIT)
+	    emit_insn (gen_adddi3 (r1, r1, GEN_INT (frame_size)));
+	  else
+	    emit_insn (gen_addsi3 (r1, r1, GEN_INT (frame_size)));
+	  guard = r1;
+	}
+
+      /* Compare the (maybe adjusted) guard with the stack pointer.  */
+      cc = s390_emit_compare (LT, stack_pointer_rtx, guard);
+    }
+
+  call_done = gen_label_rtx ();
+  parm_base = gen_label_rtx ();
+
+  /* Emit the parameter block.  */
+  tmp = gen_split_stack_data (parm_base, call_done,
+			      GEN_INT (frame_size),
+			      GEN_INT (args_size));
+  insn = emit_insn (tmp);
+  add_reg_note (insn, REG_LABEL_OPERAND, call_done);
+  LABEL_NUSES (call_done)++;
+  add_reg_note (insn, REG_LABEL_OPERAND, parm_base);
+  LABEL_NUSES (parm_base)++;
+
+  /* %r1 = litbase.  */
+  insn = emit_move_insn (r1, gen_rtx_LABEL_REF (VOIDmode, parm_base));
+  add_reg_note (insn, REG_LABEL_OPERAND, parm_base);
+  LABEL_NUSES (parm_base)++;
+
+  /* Now, we need to call __morestack.  It has very special calling
+     conventions: it preserves param/return/static chain registers for
+     calling main function body, and looks for its own parameters at %r1.  */
+
+  if (cc != NULL)
+    {
+      tmp = gen_split_stack_cond_call (morestack_ref, cc, call_done);
+
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      LABEL_NUSES (call_done)++;
+
+      /* Mark the jump as very unlikely to be taken.  */
+      add_int_reg_note (insn, REG_BR_PROB, REG_BR_PROB_BASE / 100);
+
+      if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+	{
+	  /* If va_start is used, and __morestack was not called, just use
+	     r15.  */
+	  emit_move_insn (r1,
+			 gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       GEN_INT (STACK_POINTER_OFFSET)));
+	}
+    }
+  else
+    {
+      tmp = gen_split_stack_call (morestack_ref, call_done);
+      insn = emit_jump_insn (tmp);
+      JUMP_LABEL (insn) = call_done;
+      LABEL_NUSES (call_done)++;
+      emit_barrier ();
+    }
+
+  /* __morestack will call us here.  */
+
+  emit_label (call_done);
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+s390_live_on_entry (bitmap regs)
+{
+  if (cfun->machine->split_stack_varargs_pointer != NULL_RTX)
+    {
+      gcc_assert (flag_split_stack);
+      bitmap_set_bit (regs, 1);
+    }
+}
+
 /* Return true if the function can use simple_return to return outside
    of a shrink-wrapped region.  At present shrink-wrapping is supported
    in all cases.  */
@@ -11574,6 +11748,27 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
     }
 
+  if (flag_split_stack
+     && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
+         == NULL)
+     && cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+    {
+      rtx reg;
+      rtx_insn *seq;
+
+      reg = gen_reg_rtx (Pmode);
+      cfun->machine->split_stack_varargs_pointer = reg;
+
+      start_sequence ();
+      emit_move_insn (reg, gen_rtx_REG (Pmode, 1));
+      seq = get_insns ();
+      end_sequence ();
+
+      push_topmost_sequence ();
+      emit_insn_after (seq, entry_of_function ());
+      pop_topmost_sequence ();
+    }
+
   /* Find the overflow area.
      FIXME: This currently is too pessimistic when the vector ABI is
      enabled.  In that case we *always* set up the overflow area
@@ -11582,7 +11777,10 @@ s390_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
       || n_fpr + cfun->va_list_fpr_size > FP_ARG_NUM_REG
       || TARGET_VX_ABI)
     {
-      t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      if (cfun->machine->split_stack_varargs_pointer == NULL_RTX)
+        t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+      else
+        t = make_tree (TREE_TYPE (ovf), cfun->machine->split_stack_varargs_pointer);
 
       off = INTVAL (crtl->args.arg_offset_rtx);
       off = off < 0 ? 0 : off;
@@ -14502,6 +14700,9 @@ s390_asm_file_end (void)
 	     s390_vector_abi);
 #endif
   file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 
 /* Return true if TYPE is a vector bool type.  */
@@ -14757,6 +14958,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE s300_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY s390_live_on_entry
+
 #undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   s390_use_by_pieces_infrastructure_p
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index ccedead..6f0e172 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -114,6 +114,9 @@
    UNSPEC_SP_SET
    UNSPEC_SP_TEST
 
+   ; Split stack support
+   UNSPEC_STACK_CHECK
+
    ; Test Data Class (TDC)
    UNSPEC_TDC_INSN
 
@@ -276,6 +279,10 @@
    ; Set and get floating point control register
    UNSPECV_SFPC
    UNSPECV_EFPC
+
+   ; Split stack support
+   UNSPECV_SPLIT_STACK_CALL
+   UNSPECV_SPLIT_STACK_DATA
   ])
 
 ;;
@@ -10909,3 +10916,134 @@
   "TARGET_Z13"
   "lcbb\t%0,%1,%b2"
   [(set_attr "op_type" "VRX")])
+
+; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  s390_expand_split_stack_prologue ();
+  DONE;
+})
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+
+(define_expand "split_stack_space_check"
+  [(set (pc) (if_then_else
+	      (ltu (minus (reg 15)
+			  (match_operand 0 "register_operand"))
+		   (unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = TARGET_64BIT ? 0x38 : 0x20;
+  rtx tp = s390_get_thread_pointer ();
+  rtx guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, tp, psso));
+  rtx reg = gen_reg_rtx (Pmode);
+  rtx cc;
+  if (TARGET_64BIT)
+    emit_insn (gen_subdi3 (reg, stack_pointer_rtx, operands[0]));
+  else
+    emit_insn (gen_subsi3 (reg, stack_pointer_rtx, operands[0]));
+  cc = s390_emit_compare (GT, reg, guard);
+  s390_emit_jump (operands[1], cc);
+
+  DONE;
+})
+
+;; __morestack parameter block for split stack prologue.  Parameters are:
+;; parameter block label, label to be called by __morestack, frame size,
+;; stack parameter size.
+
+(define_insn "split_stack_data"
+  [(unspec_volatile [(match_operand 0 "" "X")
+		     (match_operand 1 "" "X")
+		     (match_operand 2 "const_int_operand" "X")
+		     (match_operand 3 "const_int_operand" "X")]
+		    UNSPECV_SPLIT_STACK_DATA)]
+  "TARGET_CPU_ZARCH"
+{
+  switch_to_section (targetm.asm_out.function_rodata_section
+		 (current_function_decl));
+
+  if (TARGET_64BIT)
+    output_asm_insn (".align\t8", operands);
+  else
+    output_asm_insn (".align\t4", operands);
+  (*targetm.asm_out.internal_label) (asm_out_file, "L",
+				     CODE_LABEL_NUMBER (operands[0]));
+  if (TARGET_64BIT)
+    {
+      output_asm_insn (".quad\t%2", operands);
+      output_asm_insn (".quad\t%3", operands);
+      output_asm_insn (".quad\t%1-%0", operands);
+    }
+  else
+    {
+      output_asm_insn (".long\t%2", operands);
+      output_asm_insn (".long\t%3", operands);
+      output_asm_insn (".long\t%1-%0", operands);
+    }
+
+  switch_to_section (current_function_section ());
+  return "";
+}
+  [(set_attr "length" "0")])
+
+
+;; A jg with minimal fuss for use in split stack prologue.
+
+(define_expand "split_stack_call"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_call_di (operands[0], operands[1]));
+  else
+    emit_jump_insn (gen_split_stack_call_si (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "split_stack_call_<mode>"
+  [(set (pc) (label_ref (match_operand 1 "" "")))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
+				    (reg:P 1)]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+  "jg\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
+
+;; Also a conditional one.
+
+(define_expand "split_stack_cond_call"
+  [(match_operand 0 "bras_sym_operand" "X")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")]
+  "TARGET_CPU_ZARCH"
+{
+  if (TARGET_64BIT)
+    emit_jump_insn (gen_split_stack_cond_call_di (operands[0], operands[1], operands[2]));
+  else
+    emit_jump_insn (gen_split_stack_cond_call_si (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "split_stack_cond_call_<mode>"
+  [(set (pc)
+	(if_then_else
+	  (match_operand 1 "" "")
+	  (label_ref (match_operand 2 "" ""))
+	  (pc)))
+   (set (reg:P 1) (unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
+				    (reg:P 1)]
+				   UNSPECV_SPLIT_STACK_CALL))]
+  "TARGET_CPU_ZARCH"
+  "jg%C1\t%0"
+  [(set_attr "op_type" "RIL")
+   (set_attr "type"  "branch")])
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 63ad30e..a02e940 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-02-14  Marcin Kościelnicki  <koriakin@0x04.net>
+
+	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
+	* config/s390/morestack.S: New file.
+	* config/s390/t-stack-s390: New file.
+	* generic-morestack.c (__splitstack_find): Add s390-specific code.
+
 2016-02-12  Walter Lee  <walt@tilera.com>
 
 	* config.host (tilegx*-*-linux*): remove ti from
diff --git a/libgcc/config.host b/libgcc/config.host
index 06de0de..ef7dfd0 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1114,11 +1114,11 @@ rx-*-elf)
 	tm_file="$tm_file rx/rx-abi.h rx/rx-lib.h"
 	;;
 s390-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux s390/32/t-floattodi t-stack s390/t-stack-s390"
 	md_unwind_header=s390/linux-unwind.h
 	;;
 s390x-*-linux*)
-	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux"
+	tmake_file="${tmake_file} s390/t-crtstuff s390/t-linux t-stack s390/t-stack-s390"
 	if test "${host_address}" = 32; then
 	   tmake_file="${tmake_file} s390/32/t-floattodi"
 	fi
diff --git a/libgcc/config/s390/morestack.S b/libgcc/config/s390/morestack.S
new file mode 100644
index 0000000..fa6951b
--- /dev/null
+++ b/libgcc/config/s390/morestack.S
@@ -0,0 +1,611 @@
+# s390 support for -fsplit-stack.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# Contributed by Marcin Kościelnicki <koriakin@0x04.net>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 0x1000
+
+# The __morestack function.
+
+	.global	__morestack
+	.hidden	__morestack
+
+	.type	__morestack,@function
+
+__morestack:
+.LFB1:
+	.cfi_startproc
+
+
+#ifndef __s390x__
+
+
+# The 31-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0,__gcc_personality_v0
+	.cfi_lsda 0,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stm	%r2, %r15, 0x8(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x48
+	.cfi_offset %r7, -0x44
+	.cfi_offset %r8, -0x40
+	.cfi_offset %r9, -0x3c
+	.cfi_offset %r10, -0x38
+	.cfi_offset %r11, -0x34
+	.cfi_offset %r12, -0x30
+	.cfi_offset %r13, -0x2c
+	.cfi_offset %r14, -0x28
+	.cfi_offset %r15, -0x24
+	lr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	ahi	%r15, -0x60		# 0x60 for standard frame.
+	st	%r11, 0(%r15)		# Save back chain.
+	lr	%r8, %r0		# Save %r0 (static chain).
+	lr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	l	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0		# Extract thread pointer.
+	l	%r1, 0x20(%r1)		# Get stack boundary
+	ar	%r1, %r7		# Stack boundary + frame size
+	a	%r1, 4(%r10)		# + stack param size
+	clr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	ahi	%r7, BACKOFF		# Bump requested size a bit.
+	st	%r7, 0x40(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x40(%r11)		# Pass its address as parameter.
+	la	%r3, 0x60(%r11)		# Caller's stack parameters.
+	l	%r4, 4(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lr	%r15, %r2		# Switch to the new stack.
+	ahi	%r15, -0x60		# Make a stack frame on it.
+	st	%r11, 0(%r15)		# Save back chain.
+
+	s	%r2, 0x40(%r11)		# The end of stack space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHB0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lr	%r0, %r8		# Static chain.
+	lm	%r2, %r6, 0x8(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stm	%r2, %r3, 0x8(%r11)	# Save return registers.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0x60 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x40(%r11)
+	brasl	%r14, __generic_releasestack
+
+	s	%r2, 0x40(%r11)		# Subtract available space.
+	ahi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+.LEHE0:
+	st	%r2, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0x60 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lr	%r15, %r11
+	ahi	%r15, -0x60
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lm	%r2, %r15, 0x8(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	l	%r9, 0x4(%r10)		# Load stack parameter size.
+	ltr	%r9, %r9		# And check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0x60(%r15)		# Destination.
+	la	%r12, 0x60(%r11)	# Source.
+	lr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	a	%r10, 0x8(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0x60(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lm	%r6, %r15, 0x18(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lr	%r3, %r11		# Get the stack pointer.
+	sr	%r3, %r2		# Subtract available space.
+	ahi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0		# Extract thread pointer.
+	st	%r3, 0x20(%r1)	# Save the new stack boundary.
+
+	# We need GOT pointer in %r12 for PLT entry.
+	larl	%r12,_GLOBAL_OFFSET_TABLE_
+	lr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#else /* defined(__s390x__) */
+
+
+# The 64-bit __morestack function.
+
+	# We use a cleanup to restore the stack guard if an exception
+	# is thrown through this code.
+#ifndef __PIC__
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#else
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#endif
+
+	stmg	%r2, %r15, 0x10(%r15)	# Save %r2-%r15.
+	.cfi_offset %r6, -0x70
+	.cfi_offset %r7, -0x68
+	.cfi_offset %r8, -0x60
+	.cfi_offset %r9, -0x58
+	.cfi_offset %r10, -0x50
+	.cfi_offset %r11, -0x48
+	.cfi_offset %r12, -0x40
+	.cfi_offset %r13, -0x38
+	.cfi_offset %r14, -0x30
+	.cfi_offset %r15, -0x28
+	lgr	%r11, %r15		# Make frame pointer for vararg.
+	.cfi_def_cfa_register %r11
+	aghi	%r15, -0xa0		# 0xa0 for standard frame.
+	stg	%r11, 0(%r15)		# Save back chain.
+	lgr	%r8, %r0		# Save %r0 (static chain).
+	lgr	%r10, %r1		# Save %r1 (address of parameter block).
+
+	lg	%r7, 0(%r10)		# Required frame size to %r7
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	lg	%r1, 0x38(%r1)		# Get stack boundary
+	agr	%r1, %r7		# Stack boundary + frame size
+	ag	%r1, 8(%r10)		# + stack param size
+	clgr	%r1, %r15		# Compare with current stack pointer
+	jle	.Lnoalloc		# guard > sp - frame-size: need alloc
+
+	brasl	%r14, __morestack_block_signals
+
+	# We abuse one of caller's fpr save slots (which we don't use for fprs)
+	# as a local variable.  Not needed here, but done to be consistent with
+	# the below use.
+	aghi	%r7, BACKOFF		# Bump requested size a bit.
+	stg	%r7, 0x80(%r11)		# Stuff frame size on stack.
+	la	%r2, 0x80(%r11)		# Pass its address as parameter.
+	la	%r3, 0xa0(%r11)		# Caller's stack parameters.
+	lg	%r4, 8(%r10)		# Size of stack parameters.
+	brasl	%r14, __generic_morestack
+
+	lgr	%r15, %r2		# Switch to the new stack.
+	aghi	%r15, -0xa0		# Make a stack frame on it.
+	stg	%r11, 0(%r15)		# Save back chain.
+
+	sg	%r2, 0x80(%r11)		# The end of stack space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHB0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lgr	%r0, %r8		# Static chain.
+	lmg	%r2, %r6, 0x10(%r11)	# Parameter registers.
+
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# State of registers:
+	# %r0: Static chain from entry.
+	# %r1: Vararg pointer.
+	# %r2-%r6: Parameters from entry.
+	# %r7-%r10: Indeterminate.
+	# %r11: Frame pointer (%r15 from entry).
+	# %r12-%r13: Indeterminate.
+	# %r14: Return address.
+	# %r15: Stack pointer.
+	basr	%r14, %r10		# Call our caller.
+
+	stg	%r2, 0x10(%r11)		# Save return register.
+
+	brasl	%r14, __morestack_block_signals
+
+	# We need a stack slot now, but have no good way to get it - the frame
+	# on new stack had to be exactly 0xa0 bytes, or stack parameters would
+	# be passed wrong.  Abuse fpr save area in caller's frame (we don't
+	# save actual fprs).
+	la	%r2, 0x80(%r11)
+	brasl	%r14, __generic_releasestack
+
+	sg	%r2, 0x80(%r11)		# Subtract available space.
+	aghi	%r2, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+.LEHE0:
+	stg	%r2, 0x38(%r1)	# Save the new stack boundary.
+
+	# We need to restore the old stack pointer before unblocking signals.
+	# We also need 0xa0 bytes for a stack frame.  Since we had a stack
+	# frame at this place before the stack switch, there's no need to
+	# write the back chain again.
+	lgr	%r15, %r11
+	aghi	%r15, -0xa0
+
+	brasl	%r14, __morestack_unblock_signals
+
+	lmg	%r2, %r15, 0x10(%r11)	# Restore all registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# Executed if no new stack allocation is needed.
+
+.Lnoalloc:
+	.cfi_restore_state
+	# We may need to copy stack parameters.
+	lg	%r9, 0x8(%r10)		# Load stack parameter size.
+	ltgr	%r9, %r9		# Check if it's 0.
+	je	.Lnostackparm		# Skip the copy if not needed.
+	sgr	%r15, %r9		# Make space on the stack.
+	la	%r8, 0xa0(%r15)		# Destination.
+	la	%r12, 0xa0(%r11)	# Source.
+	lgr	%r13, %r9		# Source size.
+.Lcopy:
+	mvcle	%r8, %r12, 0		# Copy.
+	jo	.Lcopy
+
+.Lnostackparm:
+	# Third parameter is address of function meat - address of parameter
+	# block.
+	ag	%r10, 0x10(%r10)
+
+	# Leave vararg pointer in %r1, in case function uses it
+	la	%r1, 0xa0(%r11)
+
+	# OK, no stack allocation needed.  We still follow the protocol and
+	# call our caller - it doesn't cost much and makes sure vararg works.
+	# No need to set any registers here - %r0 and %r2-%r6 weren't modified.
+	basr	%r14, %r10		# Call our caller.
+
+	lmg	%r6, %r15, 0x30(%r11)	# Restore all callee-saved registers.
+	.cfi_remember_state
+	.cfi_restore %r15
+	.cfi_restore %r14
+	.cfi_restore %r13
+	.cfi_restore %r12
+	.cfi_restore %r11
+	.cfi_restore %r10
+	.cfi_restore %r9
+	.cfi_restore %r8
+	.cfi_restore %r7
+	.cfi_restore %r6
+	.cfi_def_cfa_register %r15
+	br	%r14			# Return to caller's caller.
+
+# This is the cleanup code called by the stack unwinder when unwinding
+# through the code between .LEHB0 and .LEHE0 above.
+
+.L1:
+	.cfi_restore_state
+	lgr	%r2, %r11		# Stack pointer after resume.
+	brasl	%r14, __generic_findstack
+	lgr	%r3, %r11		# Get the stack pointer.
+	sgr	%r3, %r2		# Subtract available space.
+	aghi	%r3, BACKOFF		# Back off a bit.
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1		# Extract thread pointer.
+	stg	%r3, 0x38(%r1)	# Save the new stack boundary.
+
+	lgr	%r2, %r6		# Exception header.
+#ifdef __PIC__
+	brasl	%r14, _Unwind_Resume@PLT
+#else
+	brasl	%r14, _Unwind_Resume
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.cfi_endproc
+	.size	__morestack, . - __morestack
+
+
+# The exception table.  This tells the personality routine to execute
+# the exception handler.
+
+	.section	.gcc_except_table,"a",@progbits
+	.align	4
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 .L1-.LFB1	# landing pad
+	.uleb128 0		# action
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the basic
+	# personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type	DW.ref.__gcc_personality_v0, @object
+DW.ref.__gcc_personality_v0:
+#ifndef __LP64__
+	.align 4
+	.size	DW.ref.__gcc_personality_v0, 4
+	.long	__gcc_personality_v0
+#else
+	.align 8
+	.size	DW.ref.__gcc_personality_v0, 8
+	.quad	__gcc_personality_v0
+#endif
+#endif
+
+
+
+# Initialize the stack test value when the program starts or when a
+# new thread starts.  We don't know how large the main stack is, so we
+# guess conservatively.  We might be able to use getrlimit here.
+
+	.text
+	.global	__stack_split_initialize
+	.hidden	__stack_split_initialize
+
+	.type	__stack_split_initialize, @function
+
+__stack_split_initialize:
+
+#ifndef __s390x__
+
+	ear	%r1, %a0
+	lr	%r0, %r15
+	ahi	%r0, -0x4000	# We should have at least 16K.
+	st	%r0, 0x20(%r1)
+
+	lr	%r2, %r15
+	lhi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#else /* defined(__s390x__) */
+
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lgr	%r0, %r15
+	aghi	%r0, -0x4000	# We should have at least 16K.
+	stg	%r0, 0x38(%r1)
+
+	lgr	%r2, %r15
+	lghi	%r3, 0x4000
+#ifdef __PIC__
+	jg	__generic_morestack_set_initial_sp@PLT	# Tail call
+#else
+	jg	__generic_morestack_set_initial_sp	# Tail call
+#endif
+
+#endif /* defined(__s390x__) */
+
+	.size	__stack_split_initialize, . - __stack_split_initialize
+
+# Routines to get and set the guard, for __splitstack_getcontext,
+# __splitstack_setcontext, and __splitstack_makecontext.
+
+# void *__morestack_get_guard (void) returns the current stack guard.
+	.text
+	.global	__morestack_get_guard
+	.hidden	__morestack_get_guard
+
+	.type	__morestack_get_guard,@function
+
+__morestack_get_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	l	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	lg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_get_guard, . - __morestack_get_guard
+
+# void __morestack_set_guard (void *) sets the stack guard.
+	.global	__morestack_set_guard
+	.hidden	__morestack_set_guard
+
+	.type	__morestack_set_guard,@function
+
+__morestack_set_guard:
+
+#ifndef __s390x__
+	ear	%r1, %a0
+	st	%r2, 0x20(%r1)
+#else
+	ear	%r1, %a0
+	sllg	%r1, %r1, 32
+	ear	%r1, %a1
+	stg	%r2, 0x38(%r1)
+#endif
+	br	%r14
+
+	.size	__morestack_set_guard, . - __morestack_set_guard
+
+# void *__morestack_make_guard (void *, size_t) returns the stack
+# guard value for a stack.
+	.global	__morestack_make_guard
+	.hidden	__morestack_make_guard
+
+	.type	__morestack_make_guard,@function
+
+__morestack_make_guard:
+
+#ifndef __s390x__
+	sr	%r2, %r3
+	ahi	%r2, BACKOFF
+#else
+	sgr	%r2, %r3
+	aghi	%r2, BACKOFF
+#endif
+	br	%r14
+
+	.size	__morestack_make_guard, . - __morestack_make_guard
+
+# Make __stack_split_initialize a high priority constructor.
+
+	.section .ctors.65535,"aw",@progbits
+
+#ifndef __LP64__
+	.align	4
+	.long	__stack_split_initialize
+	.long	__morestack_load_mmap
+#else
+	.align	8
+	.quad	__stack_split_initialize
+	.quad	__morestack_load_mmap
+#endif
+
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/s390/t-stack-s390 b/libgcc/config/s390/t-stack-s390
new file mode 100644
index 0000000..4c959b0
--- /dev/null
+++ b/libgcc/config/s390/t-stack-s390
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for s390.
+LIB2ADD_ST += $(srcdir)/config/s390/morestack.S
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index 89765d4..b8eec4e 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -939,6 +939,10 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
 #elif defined __powerpc64__
+#elif defined __s390x__
+      nsp -= 2 * 160;
+#elif defined __s390__
+      nsp -= 2 * 96;
 #else
 #error "unrecognized target"
 #endif
-- 
2.7.0
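
Editorial note on the guard arithmetic in morestack.S above: the sketch below
models in plain C what __stack_split_initialize and __morestack_make_guard
compute.  It is an illustration only, not code from the patch.  The toy_* names
are hypothetical, and TOY_BACKOFF is an assumed value (BACKOFF is defined
elsewhere in morestack.S, outside this excerpt).  The real routines store the
guard in the thread control block, at offset 0x20 on 31-bit and 0x38 on
64-bit; that store is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the real BACKOFF comes from morestack.S. */
#define TOY_BACKOFF 0x1000

/* Models "sr %r2, %r3; ahi %r2, BACKOFF" (or the sgr/aghi pair on s390x):
   guard = first argument - second argument + BACKOFF.  */
static uintptr_t
toy_make_guard (uintptr_t stack, size_t size)
{
  return stack - size + TOY_BACKOFF;
}

/* Models the startup guess in __stack_split_initialize: assume at least
   16K (0x4000 bytes) is available below the current stack pointer.  */
static uintptr_t
toy_initial_guard (uintptr_t sp)
{
  return sp - 0x4000;
}
```

With the assumed TOY_BACKOFF, toy_make_guard (0x10000, 0x2000) evaluates to
0xF000, and toy_initial_guard (0x20000) to 0x1C000.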


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-14 16:01                                 ` Marcin Kościelnicki
@ 2016-02-15 10:21                                   ` Andreas Krebbel
  2016-02-15 10:44                                     ` Marcin Kościelnicki
  0 siblings, 1 reply; 55+ messages in thread
From: Andreas Krebbel @ 2016-02-15 10:21 UTC (permalink / raw)
  To: Marcin Kościelnicki, uweigand; +Cc: gcc-patches

On 02/14/2016 05:01 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
> 
> 	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
> 	* config/s390/morestack.S: New file.
> 	* config/s390/t-stack-s390: New file.
> 	* generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
> 	* common/config/s390/s390-common.c (s390_supports_split_stack):
> 	New function.
> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> 	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> 	* config/s390/s390.c (struct machine_function): New field
> 	split_stack_varargs_pointer.
> 	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
> 	in s390_emit_prologue.
> 	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> 	vararg pointer.
> 	(morestack_ref): New global.
> 	(SPLIT_STACK_AVAILABLE): New macro.
> 	(s390_expand_split_stack_prologue): New function.
> 	(s390_live_on_entry): New function.
> 	(s390_va_start): Use split-stack vararg pointer if appropriate.
> 	(s390_asm_file_end): Emit the split-stack note sections.
> 	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
> 	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
> 	(UNSPECV_SPLIT_STACK_CALL): New unspec.
> 	(UNSPECV_SPLIT_STACK_DATA): New unspec.
> 	(split_stack_prologue): New expand.
> 	(split_stack_space_check): New expand.
> 	(split_stack_data): New insn.
> 	(split_stack_call): New expand.
> 	(split_stack_call_*): New insn.
> 	(split_stack_cond_call): New expand.
> 	(split_stack_cond_call_*): New insn.

Applied. Thanks!

-Andreas-


* Re: [PATCH] s390: Add -fsplit-stack support
  2016-02-15 10:21                                   ` Andreas Krebbel
@ 2016-02-15 10:44                                     ` Marcin Kościelnicki
  0 siblings, 0 replies; 55+ messages in thread
From: Marcin Kościelnicki @ 2016-02-15 10:44 UTC (permalink / raw)
  To: Andreas Krebbel, uweigand; +Cc: gcc-patches

On 15/02/16 11:21, Andreas Krebbel wrote:
> On 02/14/2016 05:01 PM, Marcin Kościelnicki wrote:
>> libgcc/ChangeLog:
>>
>> 	* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>> 	* config/s390/morestack.S: New file.
>> 	* config/s390/t-stack-s390: New file.
>> 	* generic-morestack.c (__splitstack_find): Add s390-specific code.
>>
>> gcc/ChangeLog:
>>
>> 	* common/config/s390/s390-common.c (s390_supports_split_stack):
>> 	New function.
>> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
>> 	* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>> 	* config/s390/s390.c (struct machine_function): New field
>> 	split_stack_varargs_pointer.
>> 	(s390_register_info): Mark r12 as clobbered if it'll be used as temp
>> 	in s390_emit_prologue.
>> 	(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>> 	vararg pointer.
>> 	(morestack_ref): New global.
>> 	(SPLIT_STACK_AVAILABLE): New macro.
>> 	(s390_expand_split_stack_prologue): New function.
>> 	(s390_live_on_entry): New function.
>> 	(s390_va_start): Use split-stack vararg pointer if appropriate.
>> 	(s390_asm_file_end): Emit the split-stack note sections.
>> 	(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>> 	* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>> 	(UNSPECV_SPLIT_STACK_CALL): New unspec.
>> 	(UNSPECV_SPLIT_STACK_DATA): New unspec.
>> 	(split_stack_prologue): New expand.
>> 	(split_stack_space_check): New expand.
>> 	(split_stack_data): New insn.
>> 	(split_stack_call): New expand.
>> 	(split_stack_call_*): New insn.
>> 	(split_stack_cond_call): New expand.
>> 	(split_stack_cond_call_*): New insn.
>
> Applied. Thanks!
>
> -Andreas-
>

Thanks.  And how about that testcase I submitted, does that look OK?

Marcin Kościelnicki


* Re: [PATCH] testsuite/s390: Add __morestack test.
  2016-02-07 12:22             ` [PATCH] testsuite/s390: Add __morestack test Marcin Kościelnicki
@ 2016-02-19 10:21               ` Andreas Krebbel
  0 siblings, 0 replies; 55+ messages in thread
From: Andreas Krebbel @ 2016-02-19 10:21 UTC (permalink / raw)
  To: Marcin Kościelnicki; +Cc: gcc-patches

On 02/07/2016 01:22 PM, Marcin Kościelnicki wrote:
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/s390/morestack.c: New test.

Applied.  Thanks!

-Andreas-



* Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.
  2016-01-02 19:16 ` [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU Marcin Kościelnicki
  2016-01-21 10:05   ` Andreas Krebbel
@ 2016-04-17 21:24   ` Jeff Law
  1 sibling, 0 replies; 55+ messages in thread
From: Jeff Law @ 2016-04-17 21:24 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 12:16 PM, Marcin Kościelnicki wrote:
> When an unconditional jump with side effects targets an immediately
> following label, rtl_tidy_fallthru_edge is called.  Since it has side
> effects, it doesn't remove the jump, but the label is still marked
> as fallthru.  This later causes a verification error.  Do nothing in this
> case instead.
>
> gcc/ChangeLog:
>
> 	* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
> 	with side effects.
OK for the trunk (gcc-7)

It may not matter in practice, but you could try ripping out the other
side effects into individual insns and recognizing them.  And if that
works, then you can proceed to eliminate the jump, marking the fallthru
label, etc.

I think combine has some code to do similar things.

jeff
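
Editorial note: the behaviour described in the quoted patch can be modelled
with a small self-contained C sketch.  This is a toy, not GCC code.  struct
toy_insn, struct toy_edge and toy_tidy_fallthru are hypothetical stand-ins for
the rtx insn, the CFG edge and rtl_tidy_fallthru_edge, and the early return is
inferred from the ChangeLog wording ("Bail for unconditional jumps with side
effects") and from the JUMP_P (q) && !onlyjump_p (q) test added by the patch.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC's data structures (assumptions, not real types).  */
struct toy_insn
{
  bool is_jump;    /* models JUMP_P (q) */
  bool only_jump;  /* models onlyjump_p (q): the jump has no side effects */
};

struct toy_edge
{
  struct toy_insn *last;  /* last insn of the source block, like BB_END (b) */
  bool fallthru;          /* models the EDGE_FALLTHRU flag */
};

/* Returns true if the edge was turned into a fallthru edge.  */
static bool
toy_tidy_fallthru (struct toy_edge *e)
{
  struct toy_insn *q = e->last;

  /* The case the patch adds: a jump with side effects cannot be deleted,
     so the edge must not be marked fallthru either.  */
  if (q->is_jump && !q->only_jump)
    return false;

  /* A plain jump to the next insn is redundant; model its deletion.  */
  if (q->is_jump)
    q->is_jump = false;

  e->fallthru = true;
  return true;
}
```

In this model, a jump with only_jump == false leaves both the insn and the
edge untouched, which is the "do nothing" outcome the patch description asks
for.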


* Re: [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump.
  2016-01-02 19:16 ` [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump Marcin Kościelnicki
@ 2016-04-17 21:25   ` Jeff Law
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff Law @ 2016-04-17 21:25 UTC (permalink / raw)
  To: Marcin Kościelnicki, gcc-patches

On 01/02/2016 12:16 PM, Marcin Kościelnicki wrote:
> With the new s390 split-stack support, when optimization is enabled,
> the cold path of calling __morestack is likely to be moved to the
> end of the function.  This will result in the function ending in
> split_stack_call_esa, which is an unconditional jump instruction and
> part of the function prologue.  reposition_prologue_and_epilogue_notes
> will insert NOTE_INSN_PROLOGUE_END right after it (and before the
> following barrier), causing a verification error.  Insert it after
> the barrier instead (and outside of basic block).
>
> gcc/ChangeLog:
>
> 	* function.c (reposition_prologue_and_epilogue_notes): Avoid
> 	verification error if the last insn of prologue is an unconditional
> 	jump.
> ---
>   gcc/ChangeLog  | 6 ++++++
>   gcc/function.c | 6 ++++++
>   2 files changed, 12 insertions(+)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 6aef3f9..56e31f6 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,11 @@
>   2016-01-02  Marcin Kościelnicki  <koriakin@0x04.net>
>
> +	* function.c (reposition_prologue_and_epilogue_notes): Avoid
> +	verification error if the last insn of prologue is an unconditional
> +	jump.
I'm guessing the BARRIER is actually in the hash table of prologue
insns?  Oh how I wish we didn't express barriers in rtl.


Can this leave NOTEs with no associated basic block in the chain? 
reorder_blocks only fixes the block boundaries, it doesn't fix 
BLOCK_FOR_INSN.

Jeff



Thread overview: 55+ messages
2016-01-02 19:16 [RFC] [PR 68191] s390: Add -fsplit-stack support Marcin Kościelnicki
2016-01-02 19:16 ` [PATCH 2/5] s390: Fix missing .size directives Marcin Kościelnicki
2016-01-20 13:16   ` Andreas Krebbel
2016-01-20 14:01     ` Dominik Vogt
2016-01-21  9:59     ` Andreas Krebbel
2016-01-21 10:10       ` Marcin Kościelnicki
2016-01-02 19:16 ` [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU Marcin Kościelnicki
2016-01-21 10:05   ` Andreas Krebbel
2016-01-21 10:10     ` Marcin Kościelnicki
2016-01-21 23:10     ` Jeff Law
2016-01-22  7:44       ` Andreas Krebbel
2016-01-22 16:39         ` Marcin Kościelnicki
2016-01-27  7:11         ` Jeff Law
2016-04-17 21:24   ` Jeff Law
2016-01-02 19:16 ` [PATCH 3/5] Fix NOTE_INSN_PROLOGUE_END after unconditional jump Marcin Kościelnicki
2016-04-17 21:25   ` Jeff Law
2016-01-02 19:17 ` [PATCH 5/5] s390: Add -fsplit-stack support Marcin Kościelnicki
2016-01-15 18:39   ` Andreas Krebbel
2016-01-15 21:08     ` Marcin Kościelnicki
2016-01-21 10:12       ` Andreas Krebbel
2016-01-21 13:04         ` Marcin Kościelnicki
2016-01-16 13:46     ` [PATCH] " Marcin Kościelnicki
2016-01-29 13:33       ` Andreas Krebbel
2016-01-29 15:43         ` Marcin Kościelnicki
2016-01-29 16:17           ` Andreas Krebbel
2016-02-02 14:52             ` Marcin Kościelnicki
2016-02-02 15:19               ` Andreas Krebbel
2016-02-02 15:31                 ` Marcin Kościelnicki
2016-02-02 18:34                   ` Ulrich Weigand
2016-02-02 20:11                     ` Marcin Kościelnicki
2016-02-03 18:40                       ` Marcin Kościelnicki
2016-02-04 15:06                         ` Ulrich Weigand
2016-02-04 15:20                           ` Marcin Kościelnicki
2016-02-04 16:27                             ` Ulrich Weigand
2016-02-05 21:13                               ` Marcin Kościelnicki
2016-02-05 22:02                                 ` Ulrich Weigand
2016-02-03  0:20                     ` Marcin Kościelnicki
2016-02-03 17:03                       ` Ulrich Weigand
2016-02-03 17:18                         ` Marcin Kościelnicki
2016-02-03 17:27                           ` Ulrich Weigand
2016-02-04 12:44                             ` Marcin Kościelnicki
2016-02-10 13:14                               ` Marcin Kościelnicki
2016-02-14 16:01                                 ` Marcin Kościelnicki
2016-02-15 10:21                                   ` Andreas Krebbel
2016-02-15 10:44                                     ` Marcin Kościelnicki
2016-02-07 12:22             ` [PATCH] testsuite/s390: Add __morestack test Marcin Kościelnicki
2016-02-19 10:21               ` Andreas Krebbel
2016-01-02 19:17 ` [PATCH 1/5] s390: Use proper read-only data section for literals Marcin Kościelnicki
2016-01-20 13:11   ` Andreas Krebbel
2016-01-21  6:56     ` Marcin Kościelnicki
2016-01-21  8:17       ` Mike Stump
2016-01-21  9:46       ` Andreas Krebbel
2016-01-03  3:21 ` [RFC] [PR 68191] s390: Add -fsplit-stack support Ian Lance Taylor
2016-01-03 10:32   ` Marcin Kościelnicki
2016-01-04  7:35   ` Marcin Kościelnicki
