public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Allocate constant size dynamic stack space in the prologue
@ 2015-11-25 13:17 Dominik Vogt
  2015-11-25 13:33 ` Bernd Schmidt
  2016-05-06  9:38 ` [PATCH v2] " Dominik Vogt
  0 siblings, 2 replies; 27+ messages in thread
From: Dominik Vogt @ 2015-11-25 13:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel, Ulrich Weigand, Andreas Arnez

[-- Attachment #1: Type: text/plain, Size: 1356 bytes --]

The attached patch fixes a warning during Linux kernel compilation
on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
variables with constant size causing cfun->calls_alloca to be set
(even if alloca is not used at all).  The patched code places
constant size runtime aligned variables in the "virtual stack
vars" area instead of creating a "virtual stack dynamic" area.

This behaviour is activated by defining

  #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1

in the backend; otherwise the old logic is used.

The kernel uses runtime alignment for the page structure (aligned
to 16 bytes), and apart from triggereing the alloca warning
(-mwarn-dynamicstack), the current Gcc also generates inefficient
code like

  aghi %r15,-160  # prologue: create stack frame
  lgr %r11,%r15   # prologue: generate frame pointer
  aghi %r15,-32   # space for dynamic stack

which could be simplified to

  aghi %r15,-192

(if later optimization passes are able to get rid of the frame
pointer).  Is there a specific reason why the patched behaviour
shouldn't be used for all platforms?

--

As the placement of runtime aligned stack variables with constant
size is done completely in the middleend, I don't see a way to fix
this in the backend.

The patch has been tested (only) on S/390[x].

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-ChangeLog --]
[-- Type: text/plain, Size: 785 bytes --]

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement
	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
	* explow.c (get_dynamic_stack_base): New function to return an address
	expression for the dynamic stack base when using
	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address, adjust_size_align): Move some code
	from allocate_dynamic_stack_space to new functions.
	* explow.h (get_dynamic_stack_base): Export.
	* doc/tm.texi (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Document.
	* config/s390/s390.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define.
	* defaults.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define by
	default.

gcc/testsuite/ChangeLog

	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.

[-- Attachment #3: 0001-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch --]
[-- Type: text/x-diff, Size: 11343 bytes --]

From f96098e0f399b2b9c932f174a4050cab38233b6c Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Mon, 23 Nov 2015 14:35:50 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                      |  21 +++++-
 gcc/config/s390/s390.h               |   4 ++
 gcc/defaults.h                       |   4 ++
 gcc/doc/tm.texi                      |   6 ++
 gcc/doc/tm.texi.in                   |   6 ++
 gcc/explow.c                         | 129 ++++++++++++++++++++++++++---------
 gcc/explow.h                         |   4 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c |   4 +-
 8 files changed, 142 insertions(+), 36 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1990e10..bb49a0e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1079,8 +1079,25 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
       /* If there were any, allocate space.  */
       if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
+	{
+	  if (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE)
+	    {
+	      HOST_WIDE_INT large_offset;
+	      HOST_WIDE_INT size;
+
+	      size = large_size + large_align / BITS_PER_UNIT;
+
+	      /* Allocate space in the prologue, at the beginning of the virtual
+		 stack vars area.  */
+	      large_offset = alloc_stack_frame_space (size, 1);
+	      large_base = get_dynamic_stack_base (GEN_INT (large_size),
+						   large_offset, large_align);
+	    }
+	  else
+	    /* Allocate space now.  */
+	    large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
+						       large_align, true);
+	}
     }
 
   for (si = 0; si < n; ++si)
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index a0faf13..073ce5c 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -594,6 +594,10 @@ extern const enum reg_class regclass_map[FIRST_PSEUDO_REGISTER];
 #define STACK_DYNAMIC_OFFSET(FUNDECL) \
   (STACK_POINTER_OFFSET + crtl->outgoing_args_size)
 
+/* Constant size dynamic stack space can be allocated through the function
+   prologue to save the extra instructions to adjust the stack pointer.  */
+#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
+
 /* Offset of first parameter from the argument pointer register value.
    We have a fake argument pointer register that points directly to
    the argument area.  */
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 0f1c713..abd6ab2 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1055,6 +1055,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define STACK_POINTER_OFFSET    0
 #endif
 
+#ifndef ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE
+#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 0
+#endif
+
 #ifndef LOCAL_REGNO
 #define LOCAL_REGNO(REGNO)  0
 #endif
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bde808b..921e98a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3015,6 +3015,12 @@ length of the outgoing arguments.  The default is correct for most
 machines.  See @file{function.c} for details.
 @end defmac
 
+@defmac ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE
+Define to non-zero to enable allocating constant dynamic stack space
+through the function prologue.  This affects only stack variables with
+run time alignment.
+@end defmac
+
 @defmac INITIAL_FRAME_ADDRESS_RTX
 A C expression whose value is RTL representing the address of the initial
 stack frame. This address is passed to @code{RETURN_ADDR_RTX} and
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 0677fc1..5512577 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2621,6 +2621,12 @@ length of the outgoing arguments.  The default is correct for most
 machines.  See @file{function.c} for details.
 @end defmac
 
+@defmac ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE
+Define to non-zero to enable allocating constant dynamic stack space
+through the function prologue.  This affects only stack variables with
+run time alignment.
+@end defmac
+
 @defmac INITIAL_FRAME_ADDRESS_RTX
 A C expression whose value is RTL representing the address of the initial
 stack frame. This address is passed to @code{RETURN_ADDR_RTX} and
diff --git a/gcc/explow.c b/gcc/explow.c
index e6a69e0..c6848d3 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1140,6 +1140,52 @@ record_new_stack_level (void)
     update_sjlj_context ();
 }
 \f
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
+
+  return target;
+}
+
+/* Common code used by allocate_dynamic_stack_space and get_dynamic_stack_base.
+   Adjust SIZE_ALIGN for SIZE, if needed, and returns the updated value.  */
+static unsigned
+adjust_size_align (rtx size, unsigned size_align)
+{
+  if (CONST_INT_P (size))
+    {
+      unsigned HOST_WIDE_INT lsb;
+
+      lsb = INTVAL (size);
+      lsb &= -lsb;
+
+      /* Watch out for overflow truncating to "unsigned".  */
+      if (lsb > UINT_MAX / BITS_PER_UNIT)
+	size_align = 1u << (HOST_BITS_PER_INT - 1);
+      else
+	size_align = (unsigned)lsb * BITS_PER_UNIT;
+    }
+  else if (size_align < BITS_PER_UNIT)
+    size_align = BITS_PER_UNIT;
+  return size_align;
+}
+
 /* Return an rtx representing the address of an area of memory dynamically
    pushed on the stack.
 
@@ -1216,22 +1262,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
-  if (CONST_INT_P (size))
-    {
-      unsigned HOST_WIDE_INT lsb;
-
-      lsb = INTVAL (size);
-      lsb &= -lsb;
-
-      /* Watch out for overflow truncating to "unsigned".  */
-      if (lsb > UINT_MAX / BITS_PER_UNIT)
-	size_align = 1u << (HOST_BITS_PER_INT - 1);
-      else
-	size_align = (unsigned)lsb * BITS_PER_UNIT;
-    }
-  else if (size_align < BITS_PER_UNIT)
-    size_align = BITS_PER_UNIT;
+  size_align = adjust_size_align (size, size_align);
 
   /* We can't attempt to minimize alignment necessary, because we don't
      know the final value of preferred_stack_boundary yet while executing
@@ -1473,23 +1504,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     }
 
   if (must_align)
-    {
-      /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-	 but we know it can't.  So add ourselves and then do
-	 TRUNC_DIV_EXPR.  */
-      target = expand_binop (Pmode, add_optab, target,
-			     gen_int_mode (required_align / BITS_PER_UNIT - 1,
-					   Pmode),
-			     NULL_RTX, 1, OPTAB_LIB_WIDEN);
-      target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			      gen_int_mode (required_align / BITS_PER_UNIT,
-					    Pmode),
-			      NULL_RTX, 1);
-      target = expand_mult (Pmode, target,
-			    gen_int_mode (required_align / BITS_PER_UNIT,
-					  Pmode),
-			    NULL_RTX, 1);
-    }
+    target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1499,6 +1514,54 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.  SIZE must be a constant
+   in VOIDmode or Pmode larger than zero.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (rtx size, HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+  unsigned size_align;
+  unsigned extra;
+
+  size_align = adjust_size_align (size, 0);
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
+  size = plus_constant (Pmode, size, extra);
+  size = force_operand (size, NULL_RTX);
+  if (extra && size_align > BITS_PER_UNIT)
+    size_align = BITS_PER_UNIT;
+
+  if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0)
+    size = round_push (size);
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 \f
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..aa42efb 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,10 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (rtx size, HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index c2527d2..7d246ec 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -17,7 +17,9 @@ int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or staically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
-- 
2.3.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Allocate constant size dynamic stack space in the prologue
  2015-11-25 13:17 [PATCH] Allocate constant size dynamic stack space in the prologue Dominik Vogt
@ 2015-11-25 13:33 ` Bernd Schmidt
  2015-11-25 14:54   ` Dominik Vogt
  2016-05-06  9:38 ` [PATCH v2] " Dominik Vogt
  1 sibling, 1 reply; 27+ messages in thread
From: Bernd Schmidt @ 2015-11-25 13:33 UTC (permalink / raw)
  To: gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

On 11/25/2015 01:56 PM, Dominik Vogt wrote:
> The attached patch fixes a warning during Linux kernel compilation
> on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> variables with constant size causing cfun->calls_alloca to be set
> (even if alloca is not used at all).  The patched code places
> constant size runtime aligned variables in the "virtual stack
> vars" area instead of creating a "virtual stack dynamic" area.
>
> This behaviour is activated by defining
>
>    #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
>
> in the backend; otherwise the old logic is used.
>
> The kernel uses runtime alignment for the page structure (aligned
> to 16 bytes),

Just curious, is that necessary or is it an optimization for statically 
allocated page structures?

> 	* cfgexpand.c (expand_stack_vars): Implement
> 	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
> 	* explow.c (get_dynamic_stack_base): New function to return an address
> 	expression for the dynamic stack base when using
> 	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
> 	(allocate_dynamic_stack_space): Use new functions.
> 	(align_dynamic_address, adjust_size_align): Move some code
> 	from allocate_dynamic_stack_space to new functions.
> 	* explow.h (get_dynamic_stack_base): Export.
> 	* doc/tm.texi (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Document.
> 	* config/s390/s390.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define.
> 	* defaults.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define by
> 	default.

I think the approach is quite reasonable. Not sure whether this is 
appropriate for stage3 - it does look slightly risky and may not be 
worth it at this point for just fixing a warning.

However, I don't think this should be a target-controlled thing, just 
make it use the new behaviour unconditionally. Also, in the future, when 
making something target-controlled, use a hook, not a macro.

> +	      /* Allocate space in the prologue, at the beginning of the virtual
> +		 stack vars area.  */

Is this really at the beginning? What if FRAME_GROWS_DOWNWARD?

> /* Common code used by allocate_dynamic_stack_space and get_dynamic_stack_base.
> +   Adjust SIZE_ALIGN for SIZE, if needed, and returns the updated value.  */

The comment is meaningless. Adjust how?

The new function get_dynamic_stack base looks like a shrunk-down version 
of allocate_dynamic_stack_space. What I'm worried about is that it makes 
various adjustments to the size, and these are not mirrored in 
expand_stack_vars. That function already has (after your patch)

+	      size = large_size + large_align / BITS_PER_UNIT;

So no further adjustment should be necessary. Right?

> +  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;

allocate_dynamic_stack_space has extra_align here instead of the first 
BITS_PER_UNIT. Why isn't this retained (or, as pointed out above, why is 
any of this code here in the first place)?


Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Allocate constant size dynamic stack space in the prologue
  2015-11-25 13:33 ` Bernd Schmidt
@ 2015-11-25 14:54   ` Dominik Vogt
  2015-11-25 15:06     ` Bernd Schmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Dominik Vogt @ 2015-11-25 14:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel, Andreas Arnez, Ulrich Weigand

On Wed, Nov 25, 2015 at 02:31:38PM +0100, Bernd Schmidt wrote:
> On 11/25/2015 01:56 PM, Dominik Vogt wrote:
> >The attached patch fixes a warning during Linux kernel compilation
> >on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> >variables with constant size causing cfun->calls_alloca to be set
> >(even if alloca is not used at all).  The patched code places
> >constant size runtime aligned variables in the "virtual stack
> >vars" area instead of creating a "virtual stack dynamic" area.
> >
> >This behaviour is activated by defining
> >
> >   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> >
> >in the backend; otherwise the old logic is used.
> >
> >The kernel uses runtime alignment for the page structure (aligned
> >to 16 bytes),
> 
> Just curious, is that necessary or is it an optimization for
> statically allocated page structures?

Without looking into the details, I believe it's an optimization
to have certain frequently used members of the struct always on
the same cache line.

> >	* cfgexpand.c (expand_stack_vars): Implement
> >	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
> >	* explow.c (get_dynamic_stack_base): New function to return an address
> >	expression for the dynamic stack base when using
> >	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
> >	(allocate_dynamic_stack_space): Use new functions.
> >	(align_dynamic_address, adjust_size_align): Move some code
> >	from allocate_dynamic_stack_space to new functions.
> >	* explow.h (get_dynamic_stack_base): Export.
> >	* doc/tm.texi (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Document.
> >	* config/s390/s390.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define.
> >	* defaults.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define by
> >	default.
> 
> I think the approach is quite reasonable. Not sure whether this is
> appropriate for stage3 - it does look slightly risky and may not be
> worth it at this point for just fixing a warning.

Our hope, was to justify this change with "does nothing except on
S/390 for now".  Most of the diff in explow.c is just putting
common parts of allocate_dynamic_stack_space and
get_dynamic_stack_base into subfunctions.  It is possible to leave
the existing function allocate_... untouched and just duplicate
some more code in get_... and clean up the code duplication after
Gcc6.

> However, I don't think this should be a target-controlled thing,
> just make it use the new behaviour unconditionally. Also, in the
> future, when making something target-controlled, use a hook, not a
> macro.

All right.

> >+	      /* Allocate space in the prologue, at the beginning of the virtual
> >+		 stack vars area.  */
> 
> Is this really at the beginning? What if FRAME_GROWS_DOWNWARD?

I thought it was when I wrote the comment, but for S/390 it's
actually placed what I'd call "after" the other stack variables
(at a lower address).  The comment is misleading.  It does not
matter whether the area ends up at the beginning or end.

> >/* Common code used by allocate_dynamic_stack_space and get_dynamic_stack_base.
> >+   Adjust SIZE_ALIGN for SIZE, if needed, and returns the updated value.  */
> 
> The comment is meaningless. Adjust how?

Yeah.  See below.

> The new function get_dynamic_stack base looks like a shrunk-down
> version of allocate_dynamic_stack_space.

It is.

> What I'm worried about is
> that it makes various adjustments to the size, and these are not
> mirrored in expand_stack_vars. That function already has (after your
> patch)
> 
> +	      size = large_size + large_align / BITS_PER_UNIT;
> 
> So no further adjustment should be necessary. Right?
> 
> >+  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
> 
> allocate_dynamic_stack_space has extra_align here instead of the
> first BITS_PER_UNIT. Why isn't this retained (or, as pointed out
> above, why is any of this code here in the first place)?

Actually this whole calculation with size, size_align and extra is
bogus code.  The calculated values are not used.  This makes
adjust_size_align superfluous (and thereby "fixes" the comment
documenting that function).

I have two more questions regarding code copied frpm
allocate_dynamic_stack_space.

1. Is this really necessary in get_dynamic_stack_base?

> +  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
> +    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;

2. What is the effect of OPTAB_LIB_WIDEN below?  I've copied it
from the existing "plus" without actually understanding it.

> +  target = expand_binop (Pmode, add_optab, target,
> +                        gen_int_mode (offset, Pmode),
> +                        NULL_RTX, 1, OPTAB_LIB_WIDEN);

I'll send an updated patch tomorrow.  Thanks for your comments and
your help.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Allocate constant size dynamic stack space in the prologue
  2015-11-25 14:54   ` Dominik Vogt
@ 2015-11-25 15:06     ` Bernd Schmidt
  2015-11-27 14:13       ` Dominik Vogt
  0 siblings, 1 reply; 27+ messages in thread
From: Bernd Schmidt @ 2015-11-25 15:06 UTC (permalink / raw)
  To: gcc-patches, Andreas Krebbel, Andreas Arnez, Ulrich Weigand

On 11/25/2015 03:52 PM, Dominik Vogt wrote:
> Without looking into the details, I believe it's an optimization
> to have certain frequently used members of the struct always on
> the same cache line.

Probably better to annotate the global vars then with an alignment, 
rather than waste space on the stack for locals by annotating the type.

> I have two more questions regarding code copied frpm
> allocate_dynamic_stack_space.
>
> 1. Is this really necessary in get_dynamic_stack_base?
>
>> +  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
>> +    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;

Good question. Maybe it's in the wrong place. I think this is used to 
get the difference between known and requested stack alignment, and 
thereby reduce the amount of extra space we have to allocate in order to 
ensure the requested alignment. (i.e. you want 16 bytes of alignment and 
your ABI guarantees 4, you allocate 12 additional bytes. That's the 
must_align stuff in allocate_dynamic_stack_space. You might want to copy 
the concept into your new code in expand_stack_vars. I'm thinking the 
code looking at STACK_POINTER_OFFSET/STACK_DYNAMIC_OFFSET is going to be 
irrelevant when you're looking at offsets from the frame pointer, so you 
just need to up the preferred stack boundary and use that.

> 2. What is the effect of OPTAB_LIB_WIDEN below?  I've copied it
> from the existing "plus" without actually understanding it.
>
>> +  target = expand_binop (Pmode, add_optab, target,
>> +                        gen_int_mode (offset, Pmode),
>> +                        NULL_RTX, 1, OPTAB_LIB_WIDEN);

/* Passed to expand_simple_binop and expand_binop to say which options
    to try to use if the requested operation can't be open-coded on the
    requisite mode.  Either OPTAB_LIB or OPTAB_LIB_WIDEN says try using
    a library call.  Either OPTAB_WIDEN or OPTAB_LIB_WIDEN says try
    using a wider mode.  OPTAB_MUST_WIDEN says try widening and don't
    try anything else.  */


Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Allocate constant size dynamic stack space in the prologue
  2015-11-25 15:06     ` Bernd Schmidt
@ 2015-11-27 14:13       ` Dominik Vogt
  2015-11-27 14:24         ` Dominik Vogt
  2015-12-02 19:05         ` Jeff Law
  0 siblings, 2 replies; 27+ messages in thread
From: Dominik Vogt @ 2015-11-27 14:13 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 893 bytes --]

New patch with the following changes:

* Fixed comment about dynamic var area placement.
* The area is now placed further away from the stack pointer than
  the non-dynamic stack variables (tested only with
  STACK_GROWS_DOWNWARD).  This is a possible performance
  improvement on S/390 (hoping that more variables will be
  addressable using a displacement).
* Moved the code that calculates the size to actually allocate
  from the size required by dynamic stack variables to a separate
  function.  Use that function from allocate_dynamic_stack_space()
  and expand_stack_vars() so the size calculations are the same
  for both.
* Use a target hook to activate the feature (for now).
  (This is just meant to make it more feasible to be included in
  Gcc6.  If it's to late for this the code may be as well be used
  for all targets.)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-ChangeLog --]
[-- Type: text/plain, Size: 1038 bytes --]

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement
	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
	* explow.c (get_dynamic_stack_base): New function to return an address
	expression for the dynamic stack base when using
	ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE.
	(get_dynamic_stack_size): New function to do the required dynamic stack
	space size calculations.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address): Move some code from
	allocate_dynamic_stack_space to new function.
	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P):
	Define documentation hook.
	* config/s390/s390.c
	(TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P):
	Provide hook.
	* defaults.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P): Define by
	default.
	* target.def (allocate_dynamic_stack_space_in_prologue_p): Define hook.

gcc/testsuite/ChangeLog

	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.

[-- Attachment #3: 0001-v2-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch --]
[-- Type: text/x-diff, Size: 18544 bytes --]

From 55b9ba6882dbd2d8deed6c337b0e7de65617d7b3 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] v2: Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                      |  26 +++-
 gcc/config/s390/s390.c               |   3 +
 gcc/config/s390/s390.h               |   4 +
 gcc/defaults.h                       |   4 +
 gcc/doc/tm.texi                      |   5 +
 gcc/doc/tm.texi.in                   |   2 +
 gcc/explow.c                         | 232 +++++++++++++++++++++++------------
 gcc/explow.h                         |   9 ++
 gcc/target.def                       |   9 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c |   4 +-
 10 files changed, 214 insertions(+), 84 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1990e10..81a7aac 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1032,7 +1032,9 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1079,8 +1081,17 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
       /* If there were any, allocate space.  */
       if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
+	{
+	  if (targetm.calls.allocate_dynamic_stack_space_in_prologue_p ())
+	    {
+	      large_allocsize = GEN_INT (large_size);
+	      get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	    }
+	  else
+	    /* Allocate space now.  */
+	    large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
+						       large_align, true);
+	}
     }
 
   for (si = 0; si < n; ++si)
@@ -1166,6 +1177,17 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  if (large_allocsize && ! large_allocation_done)
+	    {
+	      /* Allocate space the virtual stack vars area in the prologue.
+	       */
+	      HOST_WIDE_INT loffset;
+
+	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize), 1);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 40ee2f7..61793ca 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -14096,6 +14096,9 @@ s390_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1, const_tree ty
 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END s390_asm_file_end
 
+#undef TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P
+#define TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P hook_bool_void_true
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-s390.h"
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index a0faf13..073ce5c 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -594,6 +594,10 @@ extern const enum reg_class regclass_map[FIRST_PSEUDO_REGISTER];
 #define STACK_DYNAMIC_OFFSET(FUNDECL) \
   (STACK_POINTER_OFFSET + crtl->outgoing_args_size)
 
+/* Constant size dynamic stack space can be allocated through the function
+   prologue to save the extra instructions to adjust the stack pointer.  */
+#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
+
 /* Offset of first parameter from the argument pointer register value.
    We have a fake argument pointer register that points directly to
    the argument area.  */
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 0f1c713..2c4ab7d 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1055,6 +1055,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define STACK_POINTER_OFFSET    0
 #endif
 
+#ifndef ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P
+#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P 0
+#endif
+
 #ifndef LOCAL_REGNO
 #define LOCAL_REGNO(REGNO)  0
 #endif
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bde808b..4ede2d7 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11568,6 +11568,11 @@ to the stack.  Therefore, this hook should return true in general, but
 false for naked functions.  The default implementation always returns true.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P (void)
+This hook indicates whether the target supports allocating runtime aligned
+stack variable with constant size through the function prologue.
+@end deftypefn
+
 @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR
 On some architectures it can take multiple instructions to synthesize
 a constant.  If there is another constant already in a register that
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 0677fc1..8c5c68d 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8201,6 +8201,8 @@ and the associated definitions of those functions.
 
 @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
 
+@hook TARGET_ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P
+
 @hook TARGET_CONST_ANCHOR
 
 @hook TARGET_ASAN_SHADOW_OFFSET
diff --git a/gcc/explow.c b/gcc/explow.c
index e6a69e0..88d58d4 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1140,83 +1140,57 @@ record_new_stack_level (void)
     update_sjlj_context ();
 }
 \f
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.  The bool return value of this function
+   indicates whether any alignment has been done.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted 
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
 
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+bool
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
   unsigned extra_align = 0;
   bool must_align;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-        {
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1276,8 +1250,8 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       size = plus_constant (Pmode, size, extra);
       size = force_operand (size, NULL_RTX);
 
-      if (flag_stack_usage_info)
-	stack_usage_size += extra;
+      if (flag_stack_usage_info && pstack_usage_size)
+	*pstack_usage_size += extra;
 
       if (extra && size_align > extra_align)
 	size_align = extra_align;
@@ -1300,13 +1274,93 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
+	}
+    }
+
+  *psize = size;
+
+  return must_align;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.  If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+  bool must_align;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
 	}
     }
 
+  must_align = get_dynamic_stack_size (&size, size_align, required_align,
+				       &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1473,23 +1527,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     }
 
   if (must_align)
-    {
-      /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-	 but we know it can't.  So add ourselves and then do
-	 TRUNC_DIV_EXPR.  */
-      target = expand_binop (Pmode, add_optab, target,
-			     gen_int_mode (required_align / BITS_PER_UNIT - 1,
-					   Pmode),
-			     NULL_RTX, 1, OPTAB_LIB_WIDEN);
-      target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			      gen_int_mode (required_align / BITS_PER_UNIT,
-					    Pmode),
-			      NULL_RTX, 1);
-      target = expand_mult (Pmode, target,
-			    gen_int_mode (required_align / BITS_PER_UNIT,
-					  Pmode),
-			    NULL_RTX, 1);
-    }
+    target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1499,6 +1537,38 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 \f
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..6a89387 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,15 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern bool get_dynamic_stack_size (rtx *, unsigned, unsigned,
+				    HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/target.def b/gcc/target.def
index b0ad09e..2b30082 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4651,6 +4651,15 @@ false for naked functions.  The default implementation always returns true.",
  bool, (void),
  hook_bool_void_true)
 
+/* Return true if the target supports allocation of runtime aligned stack
+   variables in the prologue.  */
+DEFHOOK
+(allocate_dynamic_stack_space_in_prologue_p,
+ "This hook indicates whether the target supports allocating runtime aligned\n\
+stack variable with constant size through the function prologue.",
+ bool, (void),
+ hook_bool_void_false)
+
 /* Return an rtx for the static chain for FNDECL_OR_TYPE.  If INCOMING_P
    is true, then it should be for the callee; otherwise for the caller.  */
 DEFHOOK
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index c2527d2..7d246ec 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -17,7 +17,9 @@ int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or staically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
-- 
2.3.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Allocate constant size dynamic stack space in the prologue
  2015-11-27 14:13       ` Dominik Vogt
@ 2015-11-27 14:24         ` Dominik Vogt
  2015-12-02 19:05         ` Jeff Law
  1 sibling, 0 replies; 27+ messages in thread
From: Dominik Vogt @ 2015-11-27 14:24 UTC (permalink / raw)
  To: gcc-patches

On Fri, Nov 27, 2015 at 03:09:15PM +0100, Dominik Vogt wrote:
> +++ b/gcc/config/s390/s390.h
...
> +/* Constant size dynamic stack space can be allocated through the function
> +   prologue to save the extra instructions to adjust the stack pointer.  */
> +#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> +

This is obsolete and needs to be removed.

> +++ b/gcc/defaults.h
> +#ifndef ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P
> +#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE_P 0
> +#endif
> +

Ditto.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Allocate constant size dynamic stack space in the prologue
  2015-11-27 14:13       ` Dominik Vogt
  2015-11-27 14:24         ` Dominik Vogt
@ 2015-12-02 19:05         ` Jeff Law
  2015-12-03  0:16           ` Bernd Schmidt
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Law @ 2015-12-02 19:05 UTC (permalink / raw)
  To: gcc-patches

On 11/27/2015 07:09 AM, Dominik Vogt wrote:
> New patch with the following changes:
>
> * Fixed comment about dynamic var area placement.
> * The area is now placed further away from the stack pointer than
>    the non-dynamic stack variables (tested only with
>    STACK_GROWS_DOWNWARD).  This is a possible performance
>    improvement on S/390 (hoping that more variables will be
>    addressable using a displacement).
> * Moved the code that calculates the size to actually allocate
>    from the size required by dynamic stack variables to a separate
>    function.  Use that function from allocate_dynamic_stack_space()
>    and expand_stack_vars() so the size calculations are the same
>    for both.
> * Use a target hook to activate the feature (for now).
>    (This is just meant to make it more feasible to be included in
>    Gcc6.  If it's to late for this the code may be as well be used
>    for all targets.)
My inclination is for this to wait until gcc7.

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Allocate constant size dynamic stack space in the prologue
  2015-12-02 19:05         ` Jeff Law
@ 2015-12-03  0:16           ` Bernd Schmidt
  0 siblings, 0 replies; 27+ messages in thread
From: Bernd Schmidt @ 2015-12-03  0:16 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

On 12/02/2015 08:05 PM, Jeff Law wrote:
> On 11/27/2015 07:09 AM, Dominik Vogt wrote:
>> New patch with the following changes:
>>
>> * Fixed comment about dynamic var area placement.
>> * The area is now placed further away from the stack pointer than
>>    the non-dynamic stack variables (tested only with
>>    STACK_GROWS_DOWNWARD).  This is a possible performance
>>    improvement on S/390 (hoping that more variables will be
>>    addressable using a displacement).
>> * Moved the code that calculates the size to actually allocate
>>    from the size required by dynamic stack variables to a separate
>>    function.  Use that function from allocate_dynamic_stack_space()
>>    and expand_stack_vars() so the size calculations are the same
>>    for both.
>> * Use a target hook to activate the feature (for now).
>>    (This is just meant to make it more feasible to be included in
>>    Gcc6.  If it's to late for this the code may be as well be used
>>    for all targets.)
> My inclination is for this to wait until gcc7.

Yeah, I was going to say the same thing.


Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2015-11-25 13:17 [PATCH] Allocate constant size dynamic stack space in the prologue Dominik Vogt
  2015-11-25 13:33 ` Bernd Schmidt
@ 2016-05-06  9:38 ` Dominik Vogt
  2016-05-06  9:44   ` Dominik Vogt
  2016-06-23  4:43   ` [PATCH v2] " Jeff Law
  1 sibling, 2 replies; 27+ messages in thread
From: Dominik Vogt @ 2016-05-06  9:38 UTC (permalink / raw)
  To: gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

[-- Attachment #1: Type: text/plain, Size: 1844 bytes --]

Updated version of the patch described below.  Apart from fixing a
bug and adding a test, the new logic is now used always, for all
targets.  The discussion of the original patch starts here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html

The new patch has been bootstrapped and regression tested on s390,
s390x and x86_64, but please check the questions/comments in the
follow up message.

On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:
> The attached patch fixes a warning during Linux kernel compilation
> on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> variables with constant size causing cfun->calls_alloca to be set
> (even if alloca is not used at all).  The patched code places
> constant size runtime aligned variables in the "virtual stack
> vars" area instead of creating a "virtual stack dynamic" area.
> 
> This behaviour is activated by defining
> 
>   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> 
> in the backend; otherwise the old logic is used.
> 
> The kernel uses runtime alignment for the page structure (aligned
> to 16 bytes), and apart from triggereing the alloca warning
> (-mwarn-dynamicstack), the current Gcc also generates inefficient
> code like
> 
>   aghi %r15,-160  # prologue: create stack frame
>   lgr %r11,%r15   # prologue: generate frame pointer
>   aghi %r15,-32   # space for dynamic stack
> 
> which could be simplified to
> 
>   aghi %r15,-192
> 
> (if later optimization passes are able to get rid of the frame
> pointer).  Is there a specific reason why the patched behaviour
> shouldn't be used for all platforms?
> 
> --
> 
> As the placement of runtime aligned stack variables with constant
> size is done completely in the middleend, I don't see a way to fix
> this in the backend.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v2-ChangeLog --]
[-- Type: text/plain, Size: 708 bytes --]

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement synamic stack space
	allocation in the prologue.
	* explow.c (get_dynamic_stack_base): New function to return an address
	expression for the dynamic stack base.
	(get_dynamic_stack_size): New function to do the required dynamic stack
	space size calculations.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address): Move some code from
	allocate_dynamic_stack_space to new function.
	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

	* gcc.target/s390/warn-dynamicstack-1.c: New test.
	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
	stack-layout-dynamic-1.c: New test.

[-- Attachment #3: 0001-v2-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch --]
[-- Type: text/x-diff, Size: 16257 bytes --]

From e76a7e02f7862681d1b5344e64aca1b0a62cdc2c Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                                    |  20 +-
 gcc/explow.c                                       | 232 ++++++++++++++-------
 gcc/explow.h                                       |   9 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c      |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c               |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c          |  17 ++
 6 files changed, 212 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 21f21c9..4d48afd 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1052,7 +1052,9 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
       /* If there were any, allocate space.  */
       if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
+	{
+	  large_allocsize = GEN_INT (large_size);
+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	}
     }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  if (large_allocsize && ! large_allocation_done)
+	    {
+	      /* Allocate space the virtual stack vars area in the prologue.
+	       */
+	      HOST_WIDE_INT loffset;
+
+	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
+						 PREFERRED_STACK_BOUNDARY);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 1858597..7d13ed7 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1152,84 +1152,58 @@ record_new_stack_level (void)
     update_sjlj_context ();
 }
 \f
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.  The bool return value of this function
+   indicates whether any alignment has been done.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted 
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
 
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+bool
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
   unsigned extra_align = 0;
   unsigned extra = 0;
   bool must_align;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-        {
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1289,8 +1263,8 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       size = plus_constant (Pmode, size, extra);
       size = force_operand (size, NULL_RTX);
 
-      if (flag_stack_usage_info)
-	stack_usage_size += extra;
+      if (flag_stack_usage_info && pstack_usage_size)
+	*pstack_usage_size += extra;
 
       if (size_align > extra_align)
 	size_align = extra_align;
@@ -1313,13 +1287,93 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size, extra);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
+	}
+    }
+
+  *psize = size;
+
+  return must_align;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.  If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+  bool must_align;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
 	}
     }
 
+  must_align = get_dynamic_stack_size (&size, size_align, required_align,
+				       &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1486,23 +1540,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     }
 
   if (must_align)
-    {
-      /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-	 but we know it can't.  So add ourselves and then do
-	 TRUNC_DIV_EXPR.  */
-      target = expand_binop (Pmode, add_optab, target,
-			     gen_int_mode (required_align / BITS_PER_UNIT - 1,
-					   Pmode),
-			     NULL_RTX, 1, OPTAB_LIB_WIDEN);
-      target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			      gen_int_mode (required_align / BITS_PER_UNIT,
-					    Pmode),
-			      NULL_RTX, 1);
-      target = expand_mult (Pmode, target,
-			    gen_int_mode (required_align / BITS_PER_UNIT,
-					  Pmode),
-			    NULL_RTX, 1);
-    }
+    target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1512,6 +1550,38 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 \f
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..6a89387 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,15 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern bool get_dynamic_stack_size (rtx *, unsigned, unsigned,
+				    HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 0000000..e06a16c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@
+/* Verify that run time aligned local variables are aloocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index 86e2a65..a3dc522 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -16,7 +16,9 @@ int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or staically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
diff --git a/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
new file mode 100644
index 0000000..66913f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
@@ -0,0 +1,17 @@
+/* Check that the stack pointer is decreased only once in a funtion with
+   runtime aligned stack variables and -mwarn-dynamicstack does not generate a
+   warning.  */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -mwarn-dynamicstack" } */
+
+extern int bar (char *pl);
+
+int foo (long size)
+{
+  char __attribute__ ((aligned(16))) l = size;
+
+  return bar (&l);
+}
+
+/* { dg-final { scan-assembler-times "%r15,-" 1 } } */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-05-06  9:38 ` [PATCH v2] " Dominik Vogt
@ 2016-05-06  9:44   ` Dominik Vogt
  2016-06-20 11:09     ` [PATCH v2, PING 1] " Dominik Vogt
  2016-06-23  4:46     ` [PATCH v2] " Jeff Law
  2016-06-23  4:43   ` [PATCH v2] " Jeff Law
  1 sibling, 2 replies; 27+ messages in thread
From: Dominik Vogt @ 2016-05-06  9:44 UTC (permalink / raw)
  To: gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 21f21c9..4d48afd 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
...
> @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  
>        /* If there were any, allocate space.  */
>        if (large_size > 0)
> -	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> -						   large_align, true);
> +	{
> +	  large_allocsize = GEN_INT (large_size);
> +	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
...

See below.

> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  	  /* Large alignment is only processed in the last pass.  */
>  	  if (pred)
>  	    continue;
> +
> +	  if (large_allocsize && ! large_allocation_done)
> +	    {
> +	      /* Allocate space the virtual stack vars area in the prologue.
> +	       */
> +	      HOST_WIDE_INT loffset;
> +
> +	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
> +						 PREFERRED_STACK_BOUNDARY);

1) Should this use PREFERRED_STACK_BOUNDARY or just STACK_BOUNDARY?
2) Is this the right place for rounding up, or should 
   it be done above, maybe in get_dynamic_stack_size?

Not sure whether this is the right 

> +	      large_base = get_dynamic_stack_base (loffset, large_align);
> +	      large_allocation_done = true;
> +	    }
>  	  gcc_assert (large_base != NULL);
>  
>  	  large_alloc += alignb - 1;

> diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> new file mode 100644
> index 0000000..e06a16c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> @@ -0,0 +1,14 @@
> +/* Verify that run time aligned local variables are aloocated in the prologue
> +   in one pass together with normal local variables.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O0" } */
> +
> +extern void bar (void *, void *, void *);
> +void foo (void)
> +{
> +  int i;
> +  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
> +  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
> +  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
> +}
> +/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */

I've no idea how to test this on other targets, or how to express
the test in a target independent way.  The scan-assembler-times
does not work on x86_64.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2, PING 1] Allocate constant size dynamic stack space in the prologue
  2016-05-06  9:44   ` Dominik Vogt
@ 2016-06-20 11:09     ` Dominik Vogt
  2016-06-23  4:46     ` [PATCH v2] " Jeff Law
  1 sibling, 0 replies; 27+ messages in thread
From: Dominik Vogt @ 2016-06-20 11:09 UTC (permalink / raw)
  To: gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

On Fri, May 06, 2016 at 10:44:15AM +0100, Dominik Vogt wrote:
> > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> > index 21f21c9..4d48afd 100644
> > --- a/gcc/cfgexpand.c
> > +++ b/gcc/cfgexpand.c
> ...
> > @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> >  
> >        /* If there were any, allocate space.  */
> >        if (large_size > 0)
> > -	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> > -						   large_align, true);
> > +	{
> > +	  large_allocsize = GEN_INT (large_size);
> > +	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
> ...
> 
> See below.
> 
> > @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> >  	  /* Large alignment is only processed in the last pass.  */
> >  	  if (pred)
> >  	    continue;
> > +
> > +	  if (large_allocsize && ! large_allocation_done)
> > +	    {
> > +	      /* Allocate space the virtual stack vars area in the prologue.
> > +	       */
> > +	      HOST_WIDE_INT loffset;
> > +
> > +	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
> > +						 PREFERRED_STACK_BOUNDARY);
> 
> 1) Should this use PREFERRED_STACK_BOUNDARY or just STACK_BOUNDARY?
> 2) Is this the right place for rounding up, or should 
>    it be done above, maybe in get_dynamic_stack_size?
> 
> Not sure whether this is the right 
> 
> > +	      large_base = get_dynamic_stack_base (loffset, large_align);
> > +	      large_allocation_done = true;
> > +	    }
> >  	  gcc_assert (large_base != NULL);
> >  
> >  	  large_alloc += alignb - 1;
> 
> > diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> > new file mode 100644
> > index 0000000..e06a16c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> > @@ -0,0 +1,14 @@
> > +/* Verify that run time aligned local variables are aloocated in the prologue
> > +   in one pass together with normal local variables.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O0" } */
> > +
> > +extern void bar (void *, void *, void *);
> > +void foo (void)
> > +{
> > +  int i;
> > +  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
> > +  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
> > +  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
> > +}
> > +/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */
> 
> I've no idea how to test this on other targets, or how to express
> the test in a target independent way.  The scan-assembler-times
> does not work on x86_64.
> 
> Ciao
> 
> Dominik ^_^  ^_^
> 
> -- 
> 
> Dominik Vogt
> IBM Germany
> 
> 


Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-05-06  9:38 ` [PATCH v2] " Dominik Vogt
  2016-05-06  9:44   ` Dominik Vogt
@ 2016-06-23  4:43   ` Jeff Law
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff Law @ 2016-06-23  4:43 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

On 05/06/2016 03:37 AM, Dominik Vogt wrote:
> Updated version of the patch described below.  Apart from fixing a
> bug and adding a test, the new logic is now used always, for all
> targets.  The discussion of the original patch starts here:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html
>
> The new patch has been bootstrapped and regression tested on s390,
> s390x and x86_64, but please check the questions/comments in the
> follow up message.
>
> On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:
>> > The attached patch fixes a warning during Linux kernel compilation
>> > on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
>> > variables with constant size causing cfun->calls_alloca to be set
>> > (even if alloca is not used at all).  The patched code places
>> > constant size runtime aligned variables in the "virtual stack
>> > vars" area instead of creating a "virtual stack dynamic" area.
>> >
>> > This behaviour is activated by defining
>> >
>> >   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
Is there some reason why we don't just to this unconditionally?  ie, if 
we know the size of dynamic space, why not just always handle that in 
the prologue?  Seems like a useful optimization for a variety of reasons.

Of course if we do this unconditionally, we definitely need to find a 
way to test it better.

In reality, I don't see where/how the patch uses 
ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE anyway and it seems to be 
enabled for all targets, which is what I want :-)



>> >
>> > in the backend; otherwise the old logic is used.
>> >
>> > The kernel uses runtime alignment for the page structure (aligned
>> > to 16 bytes), and apart from triggereing the alloca warning
>> > (-mwarn-dynamicstack), the current Gcc also generates inefficient
>> > code like
>> >
>> >   aghi %r15,-160  # prologue: create stack frame
>> >   lgr %r11,%r15   # prologue: generate frame pointer
>> >   aghi %r15,-32   # space for dynamic stack
>> >
>> > which could be simplified to
>> >
>> >   aghi %r15,-192
>> >
>> > (if later optimization passes are able to get rid of the frame
>> > pointer).  Is there a specific reason why the patched behaviour
>> > shouldn't be used for all platforms?
>> >
>> > --
>> >
>> > As the placement of runtime aligned stack variables with constant
>> > size is done completely in the middleend, I don't see a way to fix
>> > this in the backend.
> Ciao
>
> Dominik ^_^  ^_^
>
> -- Dominik Vogt IBM Germany
>
>
> 0001-v2-ChangeLog
>
>
> gcc/ChangeLog
>
> 	* cfgexpand.c (expand_stack_vars): Implement synamic stack space
> 	allocation in the prologue.
> 	* explow.c (get_dynamic_stack_base): New function to return an address
> 	expression for the dynamic stack base.
> 	(get_dynamic_stack_size): New function to do the required dynamic stack
> 	space size calculations.
> 	(allocate_dynamic_stack_space): Use new functions.
> 	(align_dynamic_address): Move some code from
> 	allocate_dynamic_stack_space to new function.
> 	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
> gcc/testsuite/ChangeLog
>
> 	* gcc.target/s390/warn-dynamicstack-1.c: New test.
> 	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
> 	stack-layout-dynamic-1.c: New test.
>
>
> 0001-v2-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch
>
>
> From e76a7e02f7862681d1b5344e64aca1b0a62cdc2c Mon Sep 17 00:00:00 2001
> From: Dominik Vogt <vogt@linux.vnet.ibm.com>
> Date: Wed, 25 Nov 2015 09:31:19 +0100
> Subject: [PATCH] Allocate constant size dynamic stack space in the
>  prologue ...
>
> ... and place it in the virtual stack vars area, if the platform supports it.
> On S/390 this saves adjusting the stack pointer twice and forcing the frame
> pointer into existence.  It also removes the warning with -mwarn-dynamicstack
> that is triggered by cfun->calls_alloca == 1.
>
> This fixes a problem with the Linux kernel which aligns the page structure to
> 16 bytes at run time using inefficient code and issuing a bogus warning.

> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  	  /* Large alignment is only processed in the last pass.  */
>  	  if (pred)
>  	    continue;
> +
> +	  if (large_allocsize && ! large_allocation_done)
> +	    {
> +	      /* Allocate space the virtual stack vars area in the prologue.
> +	       */
Line wrapping nit here.  Bring "prologue" down to the next line.

I really like this.  I think the big question is how do we test it.  I 
suspect our bootstrap and regression suite probably don't exercise this 
code is any significant way.

Jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-05-06  9:44   ` Dominik Vogt
  2016-06-20 11:09     ` [PATCH v2, PING 1] " Dominik Vogt
@ 2016-06-23  4:46     ` Jeff Law
  2016-06-23 15:48       ` Dominik Vogt
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Law @ 2016-06-23  4:46 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

On 05/06/2016 03:44 AM, Dominik Vogt wrote:
>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>> index 21f21c9..4d48afd 100644
>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
> ...
>> @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>>
>>        /* If there were any, allocate space.  */
>>        if (large_size > 0)
>> -	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
>> -						   large_align, true);
>> +	{
>> +	  large_allocsize = GEN_INT (large_size);
>> +	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
> ...
>
> See below.
>
>> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>>  	  /* Large alignment is only processed in the last pass.  */
>>  	  if (pred)
>>  	    continue;
>> +
>> +	  if (large_allocsize && ! large_allocation_done)
>> +	    {
>> +	      /* Allocate space the virtual stack vars area in the prologue.
>> +	       */
>> +	      HOST_WIDE_INT loffset;
>> +
>> +	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
>> +						 PREFERRED_STACK_BOUNDARY);
>
> 1) Should this use PREFERRED_STACK_BOUNDARY or just STACK_BOUNDARY?
> 2) Is this the right place for rounding up, or should
>    it be done above, maybe in get_dynamic_stack_size?
I think PREFERRED_STACK_BOUNDARY is the correct one to use.

I think rounding in either place is fine.  We'd like to avoid multiple 
roundings, but otherwise I don't think it really matters.

jeff
>
> Not sure whether this is the right
>
>> +	      large_base = get_dynamic_stack_base (loffset, large_align);
>> +	      large_allocation_done = true;
>> +	    }
>>  	  gcc_assert (large_base != NULL);
>>
>>  	  large_alloc += alignb - 1;
>
>> diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
>> new file mode 100644
>> index 0000000..e06a16c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
>> @@ -0,0 +1,14 @@
>> +/* Verify that run time aligned local variables are aloocated in the prologue
>> +   in one pass together with normal local variables.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O0" } */
>> +
>> +extern void bar (void *, void *, void *);
>> +void foo (void)
>> +{
>> +  int i;
>> +  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
>> +  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
>> +  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
>> +}
>> +/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */
>
> I've no idea how to test this on other targets, or how to express
> the test in a target independent way.  The scan-assembler-times
> does not work on x86_64.
I wonder if you could force -fomit-frame-pointer and see if we still end 
up with a frame pointer?

jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-06-23  4:46     ` [PATCH v2] " Jeff Law
@ 2016-06-23 15:48       ` Dominik Vogt
  2016-06-24 12:40         ` Dominik Vogt
  0 siblings, 1 reply; 27+ messages in thread
From: Dominik Vogt @ 2016-06-23 15:48 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

[-- Attachment #1: Type: text/plain, Size: 3726 bytes --]

Third version of the patch.  Changes:

 * Corrected a typo in a test case comment.
 * Verify that stack variable alignment does not force the frame
   pointer into existence (with -fomit-frame-pointer)

The test should hopefully run on all targets.  Tested on s390,
s390x biarch, x86_64.  The only open question I'm aware of is the
stack-usage-2.c test.  I guess foo3() will not generate

  stack usage might be ... bytes

On any target anymore, and using alloca() with a constant size
results in "unbounded".  It's unclear to me whether that message
is ever generated, and if so, how to trigger it.

> >>diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> >>new file mode 100644
> >>index 0000000..e06a16c
> >>--- /dev/null
> >>+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> >>@@ -0,0 +1,14 @@
> >>+/* Verify that run time aligned local variables are aloocated in the prologue
> >>+   in one pass together with normal local variables.  */
> >>+/* { dg-do compile } */
> >>+/* { dg-options "-O0" } */
> >>+
> >>+extern void bar (void *, void *, void *);
> >>+void foo (void)
> >>+{
> >>+  int i;
> >>+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
> >>+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
> >>+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
> >>+}
> >>+/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */
> >
> >I've no idea how to test this on other targets, or how to express
> >the test in a target independent way.  The scan-assembler-times
> >does not work on x86_64.
> I wonder if you could force -fomit-frame-pointer and see if we still
> end up with a frame pointer?
> 
> jeff

On Fri, May 06, 2016 at 10:37:47AM +0100, Dominik Vogt wrote:
> Updated version of the patch described below.  Apart from fixing a
> bug and adding a test, the new logic is now used always, for all
> targets.  The discussion of the original patch starts here:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html
>
> The new patch has been bootstrapped and regression tested on s390,
> s390x and x86_64, but please check the questions/comments in the
> follow up message.
>
> On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:
> > The attached patch fixes a warning during Linux kernel compilation
> > on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> > variables with constant size causing cfun->calls_alloca to be set
> > (even if alloca is not used at all).  The patched code places
> > constant size runtime aligned variables in the "virtual stack
> > vars" area instead of creating a "virtual stack dynamic" area.
> >
> > This behaviour is activated by defining
> >
> >   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> >
> > in the backend; otherwise the old logic is used.
> >
> > The kernel uses runtime alignment for the page structure (aligned
> > to 16 bytes), and apart from triggereing the alloca warning
> > (-mwarn-dynamicstack), the current Gcc also generates inefficient
> > code like
> >
> >   aghi %r15,-160  # prologue: create stack frame
> >   lgr %r11,%r15   # prologue: generate frame pointer
> >   aghi %r15,-32   # space for dynamic stack
> >
> > which could be simplified to
> >
> >   aghi %r15,-192
> >
> > (if later optimization passes are able to get rid of the frame
> > pointer).  Is there a specific reason why the patched behaviour
> > shouldn't be used for all platforms?
> >
> > --
> >
> > As the placement of runtime aligned stack variables with constant
> > size is done completely in the middleend, I don't see a way to fix
> > this in the backend.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v3-ChangeLog --]
[-- Type: text/plain, Size: 708 bytes --]

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement synamic stack space
	allocation in the prologue.
	* explow.c (get_dynamic_stack_base): New function to return an address
	expression for the dynamic stack base.
	(get_dynamic_stack_size): New function to do the required dynamic stack
	space size calculations.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address): Move some code from
	allocate_dynamic_stack_space to new function.
	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

	* gcc.target/s390/warn-dynamicstack-1.c: New test.
	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
	stack-layout-dynamic-1.c: New test.

[-- Attachment #3: 0001-v3-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch --]
[-- Type: text/x-diff, Size: 15987 bytes --]

From 8e48e4a8ff063e7894a375724ed5eddb57018c03 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                                    |  20 +-
 gcc/explow.c                                       | 223 ++++++++++++++-------
 gcc/explow.h                                       |   8 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c      |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c               |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c          |  17 ++
 6 files changed, 206 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 56ef71d..8d2c687 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1052,7 +1052,9 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
       /* If there were any, allocate space.  */
       if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
+	{
+	  large_allocsize = GEN_INT (large_size);
+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	}
     }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  if (large_allocsize && ! large_allocation_done)
+	    {
+	      /* Allocate space the virtual stack vars area in the
+	         prologue.  */
+	      HOST_WIDE_INT loffset;
+
+	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
+						 PREFERRED_STACK_BOUNDARY);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 56ebccf..e22da31 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1151,82 +1151,55 @@ record_new_stack_level (void)
     update_sjlj_context ();
 }
 \f
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted 
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
 
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+void
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
   unsigned extra = 0;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-        {
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1262,8 +1235,8 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       size = plus_constant (Pmode, size, extra);
       size = force_operand (size, NULL_RTX);
 
-      if (flag_stack_usage_info)
-	stack_usage_size += extra;
+      if (flag_stack_usage_info && pstack_usage_size)
+	*pstack_usage_size += extra;
 
       if (size_align > BITS_PER_UNIT)
 	size_align = BITS_PER_UNIT;
@@ -1286,13 +1259,89 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size, extra);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
+	}
+    }
+
+  *psize = size;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.  If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
 	}
     }
 
+  get_dynamic_stack_size (&size, size_align, required_align, &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1455,19 +1504,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-     but we know it can't.  So add ourselves and then do
-     TRUNC_DIV_EXPR.  */
-  target = expand_binop (Pmode, add_optab, target,
-			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
-				       Pmode),
-			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			  gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			  NULL_RTX, 1);
-  target = expand_mult (Pmode, target,
-			gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			NULL_RTX, 1);
+  target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1477,6 +1514,38 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 \f
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..e12f90c 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,14 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 0000000..6dc17af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@
+/* Verify that run time aligned local variables are aloocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0 -fomit-frame-pointer" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-not "cfi_def_cfa_register" } } */
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index 86e2a65..1a9e7f3 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -16,7 +16,9 @@ int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or statically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
diff --git a/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
new file mode 100644
index 0000000..66913f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
@@ -0,0 +1,17 @@
+/* Check that the stack pointer is decreased only once in a funtion with
+   runtime aligned stack variables and -mwarn-dynamicstack does not generate a
+   warning.  */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -mwarn-dynamicstack" } */
+
+extern int bar (char *pl);
+
+int foo (long size)
+{
+  char __attribute__ ((aligned(16))) l = size;
+
+  return bar (&l);
+}
+
+/* { dg-final { scan-assembler-times "%r15,-" 1 } } */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-06-23 15:48       ` Dominik Vogt
@ 2016-06-24 12:40         ` Dominik Vogt
  2016-07-04 12:20           ` Dominik Vogt
  0 siblings, 1 reply; 27+ messages in thread
From: Dominik Vogt @ 2016-06-24 12:40 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

On Thu, Jun 23, 2016 at 04:48:14PM +0100, Dominik Vogt wrote:
> Third version of the patch.  Changes:
> 
>  * Corrected a typo in a test case comment.
>  * Verify that stack variable alignment does not force the frame
>    pointer into existence (with -fomit-frame-pointer)
> 
> The test should hopefully run on all targets.  Tested on s390,
> s390x biarch, x86_64.  The only open question I'm aware of is the
> stack-usage-2.c test.  I guess foo3() will not generate
> 
>   stack usage might be ... bytes
> 
> On any target anymore, and using alloca() with a constant size
> results in "unbounded".  It's unclear to me whether that message
> is ever generated, and if so, how to trigger it.

> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  	  /* Large alignment is only processed in the last pass.  */
>  	  if (pred)
>  	    continue;
> +
> +	  if (large_allocsize && ! large_allocation_done)
> +	    {
> +	      /* Allocate space the virtual stack vars area in the
> +	         prologue.  */
> +	      HOST_WIDE_INT loffset;
> +
> +	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
> +						 PREFERRED_STACK_BOUNDARY);
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^

Uh, this must be PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-06-24 12:40         ` Dominik Vogt
@ 2016-07-04 12:20           ` Dominik Vogt
  2016-07-04 14:09             ` Andreas Krebbel
  2016-07-06 12:01             ` Bernd Schmidt
  0 siblings, 2 replies; 27+ messages in thread
From: Dominik Vogt @ 2016-07-04 12:20 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

Version 4 with the following change:

 * Rebased on top of the "Minor cleanup to
   allocate_dynamic_stack_space" patch.  The "Drop excess size
   used for run time allocated stack variables." path needs an
   update because it touches the dsame code as the patch in this
   message.

Ran the testsuite on s390x biarch, s390 and x86_64.

On Fri, Jun 24, 2016 at 01:30:44PM +0100, Dominik Vogt wrote:
> > The only open question I'm aware of is the
> > stack-usage-2.c test.  I guess foo3() will not generate
> > 
> >   stack usage might be ... bytes
> > 
> > On any target anymore, and using alloca() with a constant size
> > results in "unbounded".  It's unclear to me whether that message
> > is ever generated, and if so, how to trigger it.

This point is still open.  If nobody has more comments Andreas
will commit the (afaik already approved) patch soon and we can
clean up the test case in a follow up patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v4-ChangeLog --]
[-- Type: text/plain, Size: 708 bytes --]

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement synamic stack space
	allocation in the prologue.
	* explow.c (get_dynamic_stack_base): New function to return an address
	expression for the dynamic stack base.
	(get_dynamic_stack_size): New function to do the required dynamic stack
	space size calculations.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address): Move some code from
	allocate_dynamic_stack_space to new function.
	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

	* gcc.target/s390/warn-dynamicstack-1.c: New test.
	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
	stack-layout-dynamic-1.c: New test.

[-- Attachment #3: 0001-v4-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch --]
[-- Type: text/x-diff, Size: 16023 bytes --]

From 83fafd37883e1af3deb2ff13b9fcaefc9d3b7c7e Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                                    |  21 +-
 gcc/explow.c                                       | 226 ++++++++++++++-------
 gcc/explow.h                                       |   8 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c      |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c               |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c          |  17 ++
 6 files changed, 209 insertions(+), 81 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 56ef71d..f0ef80f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1052,7 +1052,9 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
       /* If there were any, allocate space.  */
       if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
+	{
+	  large_allocsize = GEN_INT (large_size);
+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	}
     }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  if (large_allocsize && ! large_allocation_done)
+	    {
+	      /* Allocate space the virtual stack vars area in the
+	         prologue.  */
+	      HOST_WIDE_INT loffset;
+
+	      loffset = alloc_stack_frame_space
+		(INTVAL (large_allocsize),
+		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..d505e98 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1146,82 +1146,55 @@ record_new_stack_level (void)
     update_sjlj_context ();
 }
 \f
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted 
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
 
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+void
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
-  unsigned extra;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-        {
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  unsigned extra = 0;
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1255,8 +1228,9 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   size = plus_constant (Pmode, size, extra);
   size = force_operand (size, NULL_RTX);
 
-  if (flag_stack_usage_info)
-    stack_usage_size += extra;
+  /*!!!*/
+  if (flag_stack_usage_info && pstack_usage_size)
+    *pstack_usage_size += extra;
 
   if (extra && size_align > BITS_PER_UNIT)
     size_align = BITS_PER_UNIT;
@@ -1278,13 +1252,89 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
 	}
     }
 
+  *psize = size;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.  If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
+	}
+    }
+
+  get_dynamic_stack_size (&size, size_align, required_align, &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1447,19 +1497,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-     but we know it can't.  So add ourselves and then do
-     TRUNC_DIV_EXPR.  */
-  target = expand_binop (Pmode, add_optab, target,
-			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
-				       Pmode),
-			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			  gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			  NULL_RTX, 1);
-  target = expand_mult (Pmode, target,
-			gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			NULL_RTX, 1);
+  target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1469,6 +1507,38 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 \f
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..e12f90c 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,14 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 0000000..6dc17af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@
+/* Verify that run time aligned local variables are aloocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0 -fomit-frame-pointer" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-not "cfi_def_cfa_register" } } */
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index 86e2a65..1a9e7f3 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -16,7 +16,9 @@ int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or statically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
diff --git a/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
new file mode 100644
index 0000000..66913f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
@@ -0,0 +1,17 @@
+/* Check that the stack pointer is decreased only once in a funtion with
+   runtime aligned stack variables and -mwarn-dynamicstack does not generate a
+   warning.  */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -mwarn-dynamicstack" } */
+
+extern int bar (char *pl);
+
+int foo (long size)
+{
+  char __attribute__ ((aligned(16))) l = size;
+
+  return bar (&l);
+}
+
+/* { dg-final { scan-assembler-times "%r15,-" 1 } } */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-07-04 12:20           ` Dominik Vogt
@ 2016-07-04 14:09             ` Andreas Krebbel
  2016-07-06 12:01             ` Bernd Schmidt
  1 sibling, 0 replies; 27+ messages in thread
From: Andreas Krebbel @ 2016-07-04 14:09 UTC (permalink / raw)
  To: vogt, Jeff Law, gcc-patches, Ulrich Weigand, Andreas Arnez

On 07/04/2016 02:19 PM, Dominik Vogt wrote:
> Version 4 with the following change:
> 
>  * Rebased on top of the "Minor cleanup to
>    allocate_dynamic_stack_space" patch.  The "Drop excess size
>    used for run time allocated stack variables." path needs an
>    update because it touches the dsame code as the patch in this
>    message.
> 
> Ran the testsuite on s390x biarch, s390 and x86_64.
> 
> On Fri, Jun 24, 2016 at 01:30:44PM +0100, Dominik Vogt wrote:
>>> The only open question I'm aware of is the
>>> stack-usage-2.c test.  I guess foo3() will not generate
>>>
>>>   stack usage might be ... bytes
>>>
>>> On any target anymore, and using alloca() with a constant size
>>> results in "unbounded".  It's unclear to me whether that message
>>> is ever generated, and if so, how to trigger it.
> 
> This point is still open.  If nobody has more comments Andreas
> will commit the (afaik already approved) patch soon and we can
> clean up the test case in a follow up patch.

I would like to see an explicit approval before doing the commit.  I think it would also make sense
to let other target maintainers have a look whether this might cause any problems.

Bye,

-Andreas-


> 
> Ciao
> 
> Dominik ^_^  ^_^
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-07-04 12:20           ` Dominik Vogt
  2016-07-04 14:09             ` Andreas Krebbel
@ 2016-07-06 12:01             ` Bernd Schmidt
  2016-07-07 11:57               ` Dominik Vogt
  1 sibling, 1 reply; 27+ messages in thread
From: Bernd Schmidt @ 2016-07-06 12:01 UTC (permalink / raw)
  To: vogt, Jeff Law, gcc-patches, Andreas Krebbel, Ulrich Weigand,
	Andreas Arnez

There's one thing I don't quite understand and which seems to have 
changed since v1:

On 07/04/2016 02:19 PM, Dominik Vogt wrote:
> @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>
>        /* If there were any, allocate space.  */
>        if (large_size > 0)
> -	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> -						   large_align, true);
> +	{
> +	  large_allocsize = GEN_INT (large_size);
> +	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
> +	}
>      }
>
>    for (si = 0; si < n; ++si)
> @@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  	  /* Large alignment is only processed in the last pass.  */
>  	  if (pred)
>  	    continue;
> +
> +	  if (large_allocsize && ! large_allocation_done)
> +	    {
> +	      /* Allocate space the virtual stack vars area in the
> +	         prologue.  */
> +	      HOST_WIDE_INT loffset;
> +
> +	      loffset = alloc_stack_frame_space
> +		(INTVAL (large_allocsize),
> +		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
> +	      large_base = get_dynamic_stack_base (loffset, large_align);
> +	      large_allocation_done = true;
> +	    }
>  	  gcc_assert (large_base != NULL);
>

Why is this code split between the two places here? v1 seems to have 
done it all in the first piece of code where we now only set 
large_allocsize.


Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue
  2016-07-06 12:01             ` Bernd Schmidt
@ 2016-07-07 11:57               ` Dominik Vogt
  2016-07-11 11:44                 ` [PATCH v5] " Dominik Vogt
  0 siblings, 1 reply; 27+ messages in thread
From: Dominik Vogt @ 2016-07-07 11:57 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Jeff Law, gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

On Wed, Jul 06, 2016 at 02:01:06PM +0200, Bernd Schmidt wrote:
> There's one thing I don't quite understand and which seems to have
> changed since v1:
> 
> On 07/04/2016 02:19 PM, Dominik Vogt wrote:
> >@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> >
> >       /* If there were any, allocate space.  */
> >       if (large_size > 0)
> >-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> >-						   large_align, true);
> >+	{
> >+	  large_allocsize = GEN_INT (large_size);
> >+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
> >+	}
> >     }
> >
> >   for (si = 0; si < n; ++si)
> >@@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> > 	  /* Large alignment is only processed in the last pass.  */
> > 	  if (pred)
> > 	    continue;
> >+
> >+	  if (large_allocsize && ! large_allocation_done)
> >+	    {
> >+	      /* Allocate space the virtual stack vars area in the
> >+	         prologue.  */
> >+	      HOST_WIDE_INT loffset;
> >+
> >+	      loffset = alloc_stack_frame_space
> >+		(INTVAL (large_allocsize),
> >+		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
> >+	      large_base = get_dynamic_stack_base (loffset, large_align);
> >+	      large_allocation_done = true;
> >+	    }
> > 	  gcc_assert (large_base != NULL);
> >
> 
> Why is this code split between the two places here?

This is a bugfix from v1 to v2.
I think I had to move this code to the second loop so that the
space for dynamically aligned variables is allocated at the "end"
of the space for stack variables.  I cannot remember what the bug
was, but maybe it was that the variables with fixed and static
alignment ended up at the same address.

Anyway, now that the new allocation strategy is used
unconditionally, it should be possible to move the
get_dynamic_stack_size call down to the second loop and simplify
the code somewhat.  I'll look into that.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-07 11:57               ` Dominik Vogt
@ 2016-07-11 11:44                 ` Dominik Vogt
  2016-07-13 22:12                   ` Jeff Law
  0 siblings, 1 reply; 27+ messages in thread
From: Dominik Vogt @ 2016-07-11 11:44 UTC (permalink / raw)
  To: Bernd Schmidt, Jeff Law, gcc-patches, Andreas Krebbel,
	Ulrich Weigand, Andreas Arnez

On Thu, Jul 07, 2016 at 12:57:16PM +0100, Dominik Vogt wrote:
> On Wed, Jul 06, 2016 at 02:01:06PM +0200, Bernd Schmidt wrote:
> > There's one thing I don't quite understand and which seems to have
> > changed since v1:
> > 
> > On 07/04/2016 02:19 PM, Dominik Vogt wrote:
> > >@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> > >
> > >       /* If there were any, allocate space.  */
> > >       if (large_size > 0)
> > >-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> > >-						   large_align, true);
> > >+	{
> > >+	  large_allocsize = GEN_INT (large_size);
> > >+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
> > >+	}
> > >     }
> > >
> > >   for (si = 0; si < n; ++si)
> > >@@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> > > 	  /* Large alignment is only processed in the last pass.  */
> > > 	  if (pred)
> > > 	    continue;
> > >+
> > >+	  if (large_allocsize && ! large_allocation_done)
> > >+	    {
> > >+	      /* Allocate space the virtual stack vars area in the
> > >+	         prologue.  */
> > >+	      HOST_WIDE_INT loffset;
> > >+
> > >+	      loffset = alloc_stack_frame_space
> > >+		(INTVAL (large_allocsize),
> > >+		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
> > >+	      large_base = get_dynamic_stack_base (loffset, large_align);
> > >+	      large_allocation_done = true;
> > >+	    }
> > > 	  gcc_assert (large_base != NULL);
> > >
> > 
> > Why is this code split between the two places here?
> 
> This is a bugfix from v1 to v2.
> I think I had to move this code to the second loop so that the
> space for dynamically aligned variables is allocated at the "end"
> of the space for stack variables.  I cannot remember what the bug
> was, but maybe it was that the variables with fixed and static
> alignment ended up at the same address.
> 
> Anyway, now that the new allocation strategy is used
> unconditionally, it should be possible to move the
> get_dynamic_stack_size call down to the second loop and simplify
> the code somewhat.  I'll look into that.

Version 5 with some code moved from the first loop to the second.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-11 11:44                 ` [PATCH v5] " Dominik Vogt
@ 2016-07-13 22:12                   ` Jeff Law
  2016-07-14  9:11                     ` Dominik Vogt
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Law @ 2016-07-13 22:12 UTC (permalink / raw)
  To: vogt, Bernd Schmidt, gcc-patches, Andreas Krebbel,
	Ulrich Weigand, Andreas Arnez

On 07/11/2016 05:44 AM, Dominik Vogt wrote:
> On Thu, Jul 07, 2016 at 12:57:16PM +0100, Dominik Vogt wrote:
>> On Wed, Jul 06, 2016 at 02:01:06PM +0200, Bernd Schmidt wrote:
>>> There's one thing I don't quite understand and which seems to have
>>> changed since v1:
>>>
>>> On 07/04/2016 02:19 PM, Dominik Vogt wrote:
>>>> @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>>>>
>>>>       /* If there were any, allocate space.  */
>>>>       if (large_size > 0)
>>>> -	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
>>>> -						   large_align, true);
>>>> +	{
>>>> +	  large_allocsize = GEN_INT (large_size);
>>>> +	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
>>>> +	}
>>>>     }
>>>>
>>>>   for (si = 0; si < n; ++si)
>>>> @@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>>>> 	  /* Large alignment is only processed in the last pass.  */
>>>> 	  if (pred)
>>>> 	    continue;
>>>> +
>>>> +	  if (large_allocsize && ! large_allocation_done)
>>>> +	    {
>>>> +	      /* Allocate space the virtual stack vars area in the
>>>> +	         prologue.  */
>>>> +	      HOST_WIDE_INT loffset;
>>>> +
>>>> +	      loffset = alloc_stack_frame_space
>>>> +		(INTVAL (large_allocsize),
>>>> +		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
>>>> +	      large_base = get_dynamic_stack_base (loffset, large_align);
>>>> +	      large_allocation_done = true;
>>>> +	    }
>>>> 	  gcc_assert (large_base != NULL);
>>>>
>>>
>>> Why is this code split between the two places here?
>>
>> This is a bugfix from v1 to v2.
>> I think I had to move this code to the second loop so that the
>> space for dynamically aligned variables is allocated at the "end"
>> of the space for stack variables.  I cannot remember what the bug
>> was, but maybe it was that the variables with fixed and static
>> alignment ended up at the same address.
>>
>> Anyway, now that the new allocation strategy is used
>> unconditionally, it should be possible to move the
>> get_dynamic_stack_size call down to the second loop and simplify
>> the code somewhat.  I'll look into that.
>
> Version 5 with some code moved from the first loop to the second.
-ENOPATCH
jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-13 22:12                   ` Jeff Law
@ 2016-07-14  9:11                     ` Dominik Vogt
  2016-07-15 11:51                       ` Bernd Schmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Dominik Vogt @ 2016-07-14  9:11 UTC (permalink / raw)
  To: Jeff Law
  Cc: Bernd Schmidt, gcc-patches, Andreas Krebbel, Ulrich Weigand,
	Andreas Arnez

[-- Attachment #1: Type: text/plain, Size: 2441 bytes --]

On Wed, Jul 13, 2016 at 04:12:36PM -0600, Jeff Law wrote:
> On 07/11/2016 05:44 AM, Dominik Vogt wrote:
> >On Thu, Jul 07, 2016 at 12:57:16PM +0100, Dominik Vogt wrote:
> >>On Wed, Jul 06, 2016 at 02:01:06PM +0200, Bernd Schmidt wrote:
> >>>There's one thing I don't quite understand and which seems to have
> >>>changed since v1:
> >>>
> >>>On 07/04/2016 02:19 PM, Dominik Vogt wrote:
> >>>>@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> >>>>
> >>>>      /* If there were any, allocate space.  */
> >>>>      if (large_size > 0)
> >>>>-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> >>>>-						   large_align, true);
> >>>>+	{
> >>>>+	  large_allocsize = GEN_INT (large_size);
> >>>>+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
> >>>>+	}
> >>>>    }
> >>>>
> >>>>  for (si = 0; si < n; ++si)
> >>>>@@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
> >>>>	  /* Large alignment is only processed in the last pass.  */
> >>>>	  if (pred)
> >>>>	    continue;
> >>>>+
> >>>>+	  if (large_allocsize && ! large_allocation_done)
> >>>>+	    {
> >>>>+	      /* Allocate space the virtual stack vars area in the
> >>>>+	         prologue.  */
> >>>>+	      HOST_WIDE_INT loffset;
> >>>>+
> >>>>+	      loffset = alloc_stack_frame_space
> >>>>+		(INTVAL (large_allocsize),
> >>>>+		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
> >>>>+	      large_base = get_dynamic_stack_base (loffset, large_align);
> >>>>+	      large_allocation_done = true;
> >>>>+	    }
> >>>>	  gcc_assert (large_base != NULL);
> >>>>
> >>>
> >>>Why is this code split between the two places here?
> >>
> >>This is a bugfix from v1 to v2.
> >>I think I had to move this code to the second loop so that the
> >>space for dynamically aligned variables is allocated at the "end"
> >>of the space for stack variables.  I cannot remember what the bug
> >>was, but maybe it was that the variables with fixed and static
> >>alignment ended up at the same address.
> >>
> >>Anyway, now that the new allocation strategy is used
> >>unconditionally, it should be possible to move the
> >>get_dynamic_stack_size call down to the second loop and simplify
> >>the code somewhat.  I'll look into that.
> >
> >Version 5 with some code moved from the first loop to the second.
> -ENOPATCH

Oops.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v5-ChangeLog --]
[-- Type: text/plain, Size: 708 bytes --]

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement synamic stack space
	allocation in the prologue.
	* explow.c (get_dynamic_stack_base): New function to return an address
	expression for the dynamic stack base.
	(get_dynamic_stack_size): New function to do the required dynamic stack
	space size calculations.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address): Move some code from
	allocate_dynamic_stack_space to new function.
	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

	* gcc.target/s390/warn-dynamicstack-1.c: New test.
	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
	stack-layout-dynamic-1.c: New test.

[-- Attachment #3: 0001-v5-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch --]
[-- Type: text/x-diff, Size: 16069 bytes --]

From 254581388640af34bcff55105ec13555043b62e5 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                                    |  22 +-
 gcc/explow.c                                       | 226 ++++++++++++++-------
 gcc/explow.h                                       |   8 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c      |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c               |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c          |  17 ++
 6 files changed, 207 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e4ddb3a..5046d61 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1053,6 +1053,7 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1096,11 +1097,6 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  large_size &= -(HOST_WIDE_INT)alignb;
 	  large_size += stack_vars[i].size;
 	}
-
-      /* If there were any, allocate space.  */
-      if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
     }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1182,22 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  /* If there were any variables requiring "large" alignment, allocate
+	     space.  */
+	  if (large_size > 0 && ! large_allocation_done)
+	    {
+	      HOST_WIDE_INT loffset;
+	      rtx large_allocsize;
+
+	      large_allocsize = GEN_INT (large_size);
+	      get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	      loffset = alloc_stack_frame_space
+		(INTVAL (large_allocsize),
+		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..d505e98 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1146,82 +1146,55 @@ record_new_stack_level (void)
     update_sjlj_context ();
 }
 \f
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted 
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
 
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+void
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
-  unsigned extra;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-        {
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  unsigned extra = 0;
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1255,8 +1228,9 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   size = plus_constant (Pmode, size, extra);
   size = force_operand (size, NULL_RTX);
 
-  if (flag_stack_usage_info)
-    stack_usage_size += extra;
+  /*!!!*/
+  if (flag_stack_usage_info && pstack_usage_size)
+    *pstack_usage_size += extra;
 
   if (extra && size_align > BITS_PER_UNIT)
     size_align = BITS_PER_UNIT;
@@ -1278,13 +1252,89 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
 	}
     }
 
+  *psize = size;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.  If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
+	}
+    }
+
+  get_dynamic_stack_size (&size, size_align, required_align, &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1447,19 +1497,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-     but we know it can't.  So add ourselves and then do
-     TRUNC_DIV_EXPR.  */
-  target = expand_binop (Pmode, add_optab, target,
-			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
-				       Pmode),
-			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			  gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			  NULL_RTX, 1);
-  target = expand_mult (Pmode, target,
-			gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			NULL_RTX, 1);
+  target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1469,6 +1507,38 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 \f
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..e12f90c 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,14 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 0000000..6dc17af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@
+/* Verify that run time aligned local variables are aloocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0 -fomit-frame-pointer" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-not "cfi_def_cfa_register" } } */
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index 86e2a65..1a9e7f3 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -16,7 +16,9 @@ int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or statically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
diff --git a/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
new file mode 100644
index 0000000..66913f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
@@ -0,0 +1,17 @@
+/* Check that the stack pointer is decreased only once in a funtion with
+   runtime aligned stack variables and -mwarn-dynamicstack does not generate a
+   warning.  */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -mwarn-dynamicstack" } */
+
+extern int bar (char *pl);
+
+int foo (long size)
+{
+  char __attribute__ ((aligned(16))) l = size;
+
+  return bar (&l);
+}
+
+/* { dg-final { scan-assembler-times "%r15,-" 1 } } */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-14  9:11                     ` Dominik Vogt
@ 2016-07-15 11:51                       ` Bernd Schmidt
  2016-07-15 12:05                         ` Dominik Vogt
  0 siblings, 1 reply; 27+ messages in thread
From: Bernd Schmidt @ 2016-07-15 11:51 UTC (permalink / raw)
  To: vogt, Jeff Law, gcc-patches, Andreas Krebbel, Ulrich Weigand,
	Andreas Arnez

On 07/14/2016 11:10 AM, Dominik Vogt wrote:

> -  if (flag_stack_usage_info)
> -    stack_usage_size += extra;
> +  /*!!!*/
> +  if (flag_stack_usage_info && pstack_usage_size)
> +    *pstack_usage_size += extra;

What does the comment mean? Whatever it is needs to be addressed and the 
comment removed.

Other than that it looks ok, although I still feel uneasy about the 
unexplained movement of code since v1 - next time you fix a bug in a 
patch, please document what happened.


Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-15 11:51                       ` Bernd Schmidt
@ 2016-07-15 12:05                         ` Dominik Vogt
  2016-07-15 12:22                           ` Dominik Vogt
  0 siblings, 1 reply; 27+ messages in thread
From: Dominik Vogt @ 2016-07-15 12:05 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Jeff Law, gcc-patches, Andreas Krebbel, Ulrich Weigand, Andreas Arnez

On Fri, Jul 15, 2016 at 01:51:51PM +0200, Bernd Schmidt wrote:
> On 07/14/2016 11:10 AM, Dominik Vogt wrote:
> 
> >-  if (flag_stack_usage_info)
> >-    stack_usage_size += extra;
> >+  /*!!!*/
> >+  if (flag_stack_usage_info && pstack_usage_size)
> >+    *pstack_usage_size += extra;
> 
> What does the comment mean?

It means that I wanted to look at that code location for some
reason and forgot to do so.  Need to check why I've put it in
there in the first place.

> Other than that it looks ok, although I still feel uneasy about the
> unexplained movement of code since v1 - next time you fix a bug in a
> patch, please document what happened.

Yep.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-15 12:05                         ` Dominik Vogt
@ 2016-07-15 12:22                           ` Dominik Vogt
  2016-07-15 13:18                             ` Bernd Schmidt
  2016-07-18 13:11                             ` Andreas Krebbel
  0 siblings, 2 replies; 27+ messages in thread
From: Dominik Vogt @ 2016-07-15 12:22 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 222 bytes --]

Final version 6 with the stray comment removed (was a harmless
oversight).

Initial description of the patch:
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v6-ChangeLog --]
[-- Type: text/plain, Size: 708 bytes --]

gcc/ChangeLog

	* cfgexpand.c (expand_stack_vars): Implement synamic stack space
	allocation in the prologue.
	* explow.c (get_dynamic_stack_base): New function to return an address
	expression for the dynamic stack base.
	(get_dynamic_stack_size): New function to do the required dynamic stack
	space size calculations.
	(allocate_dynamic_stack_space): Use new functions.
	(align_dynamic_address): Move some code from
	allocate_dynamic_stack_space to new function.
	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

	* gcc.target/s390/warn-dynamicstack-1.c: New test.
	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
	stack-layout-dynamic-1.c: New test.

[-- Attachment #3: 0001-v6-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch --]
[-- Type: text/plain, Size: 16058 bytes --]

From 03d09e6adacf582d52f53ca36cfa3c001ed444e5 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                                    |  22 +-
 gcc/explow.c                                       | 225 ++++++++++++++-------
 gcc/explow.h                                       |   8 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c      |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c               |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c          |  17 ++
 6 files changed, 206 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e4ddb3a..5046d61 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1053,6 +1053,7 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1096,11 +1097,6 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  large_size &= -(HOST_WIDE_INT)alignb;
 	  large_size += stack_vars[i].size;
 	}
-
-      /* If there were any, allocate space.  */
-      if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
     }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1182,22 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  /* If there were any variables requiring "large" alignment, allocate
+	     space.  */
+	  if (large_size > 0 && ! large_allocation_done)
+	    {
+	      HOST_WIDE_INT loffset;
+	      rtx large_allocsize;
+
+	      large_allocsize = GEN_INT (large_size);
+	      get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	      loffset = alloc_stack_frame_space
+		(INTVAL (large_allocsize),
+		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..a345690 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1146,82 +1146,55 @@ record_new_stack_level (void)
     update_sjlj_context ();
 }
 \f
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted 
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
 
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+void
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
-  unsigned extra;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-        {
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  unsigned extra = 0;
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1255,8 +1228,8 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   size = plus_constant (Pmode, size, extra);
   size = force_operand (size, NULL_RTX);
 
-  if (flag_stack_usage_info)
-    stack_usage_size += extra;
+  if (flag_stack_usage_info && pstack_usage_size)
+    *pstack_usage_size += extra;
 
   if (extra && size_align > BITS_PER_UNIT)
     size_align = BITS_PER_UNIT;
@@ -1278,13 +1251,89 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
 	}
     }
 
+  *psize = size;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.  If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
+	}
+    }
+
+  get_dynamic_stack_size (&size, size_align, required_align, &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1447,19 +1496,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-     but we know it can't.  So add ourselves and then do
-     TRUNC_DIV_EXPR.  */
-  target = expand_binop (Pmode, add_optab, target,
-			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
-				       Pmode),
-			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			  gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			  NULL_RTX, 1);
-  target = expand_mult (Pmode, target,
-			gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
-			NULL_RTX, 1);
+  target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1469,6 +1506,38 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 \f
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..e12f90c 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,14 @@ extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 0000000..6dc17af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@
+/* Verify that run time aligned local variables are aloocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0 -fomit-frame-pointer" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-not "cfi_def_cfa_register" } } */
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index 86e2a65..1a9e7f3 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -16,7 +16,9 @@ int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or statically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
diff --git a/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
new file mode 100644
index 0000000..66913f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
@@ -0,0 +1,17 @@
+/* Check that the stack pointer is decreased only once in a funtion with
+   runtime aligned stack variables and -mwarn-dynamicstack does not generate a
+   warning.  */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -mwarn-dynamicstack" } */
+
+extern int bar (char *pl);
+
+int foo (long size)
+{
+  char __attribute__ ((aligned(16))) l = size;
+
+  return bar (&l);
+}
+
+/* { dg-final { scan-assembler-times "%r15,-" 1 } } */
-- 
2.3.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-15 12:22                           ` Dominik Vogt
@ 2016-07-15 13:18                             ` Bernd Schmidt
  2016-07-18 13:11                             ` Andreas Krebbel
  1 sibling, 0 replies; 27+ messages in thread
From: Bernd Schmidt @ 2016-07-15 13:18 UTC (permalink / raw)
  To: vogt, gcc-patches

On 07/15/2016 02:22 PM, Dominik Vogt wrote:
> Final version 6 with the stray comment removed (was a harmless
> oversight).

Ok, let's put it in.


Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue
  2016-07-15 12:22                           ` Dominik Vogt
  2016-07-15 13:18                             ` Bernd Schmidt
@ 2016-07-18 13:11                             ` Andreas Krebbel
  1 sibling, 0 replies; 27+ messages in thread
From: Andreas Krebbel @ 2016-07-18 13:11 UTC (permalink / raw)
  To: Dominik Vogt; +Cc: gcc-patches

> gcc/ChangeLog
> 
> 	* cfgexpand.c (expand_stack_vars): Implement synamic stack space
> 	allocation in the prologue.
> 	* explow.c (get_dynamic_stack_base): New function to return an address
> 	expression for the dynamic stack base.
> 	(get_dynamic_stack_size): New function to do the required dynamic stack
> 	space size calculations.
> 	(allocate_dynamic_stack_space): Use new functions.
> 	(align_dynamic_address): Move some code from
> 	allocate_dynamic_stack_space to new function.
> 	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
> gcc/testsuite/ChangeLog
> 
> 	* gcc.target/s390/warn-dynamicstack-1.c: New test.
> 	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
> 	stack-layout-dynamic-1.c: New test.

Applied.  Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-07-18 13:11 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-25 13:17 [PATCH] Allocate constant size dynamic stack space in the prologue Dominik Vogt
2015-11-25 13:33 ` Bernd Schmidt
2015-11-25 14:54   ` Dominik Vogt
2015-11-25 15:06     ` Bernd Schmidt
2015-11-27 14:13       ` Dominik Vogt
2015-11-27 14:24         ` Dominik Vogt
2015-12-02 19:05         ` Jeff Law
2015-12-03  0:16           ` Bernd Schmidt
2016-05-06  9:38 ` [PATCH v2] " Dominik Vogt
2016-05-06  9:44   ` Dominik Vogt
2016-06-20 11:09     ` [PATCH v2, PING 1] " Dominik Vogt
2016-06-23  4:46     ` [PATCH v2] " Jeff Law
2016-06-23 15:48       ` Dominik Vogt
2016-06-24 12:40         ` Dominik Vogt
2016-07-04 12:20           ` Dominik Vogt
2016-07-04 14:09             ` Andreas Krebbel
2016-07-06 12:01             ` Bernd Schmidt
2016-07-07 11:57               ` Dominik Vogt
2016-07-11 11:44                 ` [PATCH v5] " Dominik Vogt
2016-07-13 22:12                   ` Jeff Law
2016-07-14  9:11                     ` Dominik Vogt
2016-07-15 11:51                       ` Bernd Schmidt
2016-07-15 12:05                         ` Dominik Vogt
2016-07-15 12:22                           ` Dominik Vogt
2016-07-15 13:18                             ` Bernd Schmidt
2016-07-18 13:11                             ` Andreas Krebbel
2016-06-23  4:43   ` [PATCH v2] " Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).