public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Drop excess size used for run time allocated stack variables.
@ 2016-04-29 22:13 Dominik Vogt
  2016-04-29 22:15 ` Dominik Vogt
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-04-29 22:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 202 bytes --]

The attached patch removes excess stack space allocation with
alloca in some situations.  Plese check the commit message in the
patch for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-ChangeLog --]
[-- Type: text/plain, Size: 132 bytes --]

gcc/ChangeLog

	* explow.c (round_push): Use know adjustment.
	(allocate_dynamic_stack_space): Pass known adjustment to round_push.

[-- Attachment #3: 0001-Drop-excess-size-used-for-run-time-allocated-stack-v.patch --]
[-- Type: text/x-diff, Size: 4520 bytes --]

From 9ea451aef0f1f2fb0a36a7b718f910cfe285541d Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH] Drop excess size used for run time allocated stack
 variables.

The present calculation sometimes led to more stack memory being used than
necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
allocated size:

  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);

Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.

round_push() now takes an argument to inform it about what has already been
added to size.
---
 gcc/explow.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/gcc/explow.c b/gcc/explow.c
index e0ce201..a039295 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
 }
 
 /* Round the size of a block to be pushed up to the boundary required
-   by this machine.  SIZE is the desired size, which need not be constant.  */
+   by this machine.  SIZE is the desired size, which need not be constant.
+   ALREADY_ADDED is the number of units that have already been added to SIZE for
+   other alignment reasons.
+*/
 
 static rtx
-round_push (rtx size)
+round_push (rtx size, int already_added)
 {
-  rtx align_rtx, alignm1_rtx;
+  rtx align_rtx, add_rtx;
 
   if (!SUPPORTS_STACK_ALIGNMENT
       || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
     {
       int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
+      int add;
 
       if (align == 1)
 	return size;
 
+      add = (align > already_added) ? align - already_added - 1 : 0;
+
       if (CONST_INT_P (size))
 	{
-	  HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
+	  HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
 
 	  if (INTVAL (size) != new_size)
 	    size = GEN_INT (new_size);
@@ -974,7 +980,7 @@ round_push (rtx size)
 	}
 
       align_rtx = GEN_INT (align);
-      alignm1_rtx = GEN_INT (align - 1);
+      add_rtx = (add > 0) ? GEN_INT (add) : const0_rtx;
     }
   else
     {
@@ -983,15 +989,15 @@ round_push (rtx size)
 	 substituted by the right value in vregs pass and optimized
 	 during combine.  */
       align_rtx = virtual_preferred_stack_boundary_rtx;
-      alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1),
-				   NULL_RTX);
+      add_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), NULL_RTX);
     }
 
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
      but we know it can't.  So add ourselves and then do
      TRUNC_DIV_EXPR.  */
-  size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
-		       NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  if (add_rtx != const0_rtx)
+    size = expand_binop (Pmode, add_optab, size, add_rtx,
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
   size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
 			NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
@@ -1175,6 +1181,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   rtx_code_label *final_label;
   rtx final_target, target;
   unsigned extra_align = 0;
+  unsigned extra = 0;
   bool must_align;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
@@ -1275,9 +1282,9 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   extra_align = BITS_PER_UNIT;
 #endif
 
-  if (must_align)
+  if (must_align && required_align > extra_align)
     {
-      unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
+      extra = (required_align - extra_align) / BITS_PER_UNIT;
 
       size = plus_constant (Pmode, size, extra);
       size = force_operand (size, NULL_RTX);
@@ -1285,7 +1292,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       if (flag_stack_usage_info)
 	stack_usage_size += extra;
 
-      if (extra && size_align > extra_align)
+      if (size_align > extra_align)
 	size_align = extra_align;
     }
 
@@ -1304,7 +1311,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
      momentarily mis-aligning the stack.  */
   if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0)
     {
-      size = round_push (size);
+      size = round_push (size, extra);
 
       if (flag_stack_usage_info)
 	{
-- 
2.3.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-04-29 22:13 [PATCH] Drop excess size used for run time allocated stack variables Dominik Vogt
@ 2016-04-29 22:15 ` Dominik Vogt
  2016-04-30  9:44 ` Eric Botcazou
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-04-29 22:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel

P.S.:  Bootstrapped and regression tested on s390, s390x and
x86_64.

--

I'm not sure whether something has to be done about the else
branch in round_push too:

  else 
    { 
      /* If crtl->preferred_stack_boundary might still grow, use 
         virtual_preferred_stack_boundary_rtx instead.  This will be 
         substituted by the right value in vregs pass and optimized 
         during combine.  */ 
      align_rtx = virtual_preferred_stack_boundary_rtx; 
      add_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), NULL_RTX); 
    } 

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-04-29 22:13 [PATCH] Drop excess size used for run time allocated stack variables Dominik Vogt
  2016-04-29 22:15 ` Dominik Vogt
@ 2016-04-30  9:44 ` Eric Botcazou
  2016-04-30 10:14   ` Dominik Vogt
  2016-05-02 13:43 ` Dominik Vogt
  2016-05-02 15:10 ` Jeff Law
  3 siblings, 1 reply; 35+ messages in thread
From: Eric Botcazou @ 2016-04-30  9:44 UTC (permalink / raw)
  To: vogt; +Cc: gcc-patches, Andreas Krebbel

> The attached patch removes excess stack space allocation with
> alloca in some situations.  Plese check the commit message in the
> patch for details.

This might fix PR middle-end/50938.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-04-30  9:44 ` Eric Botcazou
@ 2016-04-30 10:14   ` Dominik Vogt
  0 siblings, 0 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-04-30 10:14 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Andreas Krebbel

On Sat, Apr 30, 2016 at 11:44:01AM +0200, Eric Botcazou wrote:
> > The attached patch removes excess stack space allocation with
> > alloca in some situations.  Plese check the commit message in the
> > patch for details.
> 
> This might fix PR middle-end/50938.

This certainly looks like what I was trying to fix.  Now, if
anyone could think of a target independent test case for the patch
...

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-04-29 22:13 [PATCH] Drop excess size used for run time allocated stack variables Dominik Vogt
  2016-04-29 22:15 ` Dominik Vogt
  2016-04-30  9:44 ` Eric Botcazou
@ 2016-05-02 13:43 ` Dominik Vogt
  2016-05-02 15:10 ` Jeff Law
  3 siblings, 0 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-05-02 13:43 UTC (permalink / raw)
  To: gcc-patches, Andreas Krebbel

>  static rtx
> -round_push (rtx size)
> +round_push (rtx size, int already_added)

round_push also needs to know about the required alignment in case
that is more strict than a simple stack slot alignment.

>  {
> -  rtx align_rtx, alignm1_rtx;
> +  rtx align_rtx, add_rtx;
>  
>    if (!SUPPORTS_STACK_ALIGNMENT
>        || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
>      {
>        int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;

=>

  int align = MAX (required_align,
                   crtl->preferred_stack_boundary) / BITS_PER_UNIT;

Unfortunately the testsuite didn't detect this bug; it showed up
while testing anothe patch related to stack layout.  I'm testing
the modified patch right now.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-04-29 22:13 [PATCH] Drop excess size used for run time allocated stack variables Dominik Vogt
                   ` (2 preceding siblings ...)
  2016-05-02 13:43 ` Dominik Vogt
@ 2016-05-02 15:10 ` Jeff Law
  2016-05-03 14:18   ` Dominik Vogt
  3 siblings, 1 reply; 35+ messages in thread
From: Jeff Law @ 2016-05-02 15:10 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel

On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> The attached patch removes excess stack space allocation with
> alloca in some situations.  Plese check the commit message in the
> patch for details.
>
> Ciao
>
> Dominik ^_^  ^_^
>
> -- Dominik Vogt IBM Germany
>
>
> 0001-ChangeLog
>
>
> gcc/ChangeLog
>
> 	* explow.c (round_push): Use know adjustment.
> 	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
If I understand the state of this patch correctly, you're working on 
another iteration, so I'm not going to dig into this version.

However, I would strongly recommend some tests, even if they are target 
specific.  You can always copy pr36728-1 into the s390x directory and 
look at size of the generated stack.  Simliarly for pr50938 for x86.

jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-05-02 15:10 ` Jeff Law
@ 2016-05-03 14:18   ` Dominik Vogt
  2016-05-19 23:11     ` Jeff Law
  2016-05-25 14:02     ` [PATCH 1/2][v3] " Dominik Vogt
  0 siblings, 2 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-05-03 14:18 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

Version two of the patch including a test case.

On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> >The attached patch removes excess stack space allocation with
> >alloca in some situations.  Plese check the commit message in the
> >patch for details.

> However, I would strongly recommend some tests, even if they are
> target specific.  You can always copy pr36728-1 into the s390x
> directory and look at size of the generated stack.  Simliarly for
> pr50938 for x86.

However, x86 uses the "else" branch in round_push, i.e. it uses
"virtual_preferred_stack_boundary_rtx" to calculate the number of
bytes to add for stack alignment.  That value is unknown at the
time round_push is called, so the test case fails on such targets,
and I've no idea how to fix this properly.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v2-ChangeLog --]
[-- Type: text/plain, Size: 188 bytes --]

gcc/ChangeLog

	* explow.c (round_push): Use know adjustment.
	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
gcc/testsuite/ChangeLog

	* gcc.dg/pr50938.c: New test.

[-- Attachment #3: 0001-v2-Drop-excess-size-used-for-run-time-allocated-stack-v.patch --]
[-- Type: text/x-diff, Size: 5950 bytes --]

From f4d252eb2c860450b3738591fca36f568c958232 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH] Drop excess size used for run time allocated stack
 variables.

The present calculation sometimes led to more stack memory being used than
necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
allocated size:

  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);

Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.

round_push() now takes an argument to inform it about what has already been
added to size.
---
 gcc/explow.c                   | 33 ++++++++++++++++-----------
 gcc/testsuite/gcc.dg/pr50938.c | 52 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr50938.c

diff --git a/gcc/explow.c b/gcc/explow.c
index e0ce201..1858597 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
 }
 
 /* Round the size of a block to be pushed up to the boundary required
-   by this machine.  SIZE is the desired size, which need not be constant.  */
+   by this machine.  SIZE is the desired size, which need not be constant.
+   ALREADY_ADDED is the number of units that have already been added to SIZE
+   for other alignment reasons.
+*/
 
 static rtx
-round_push (rtx size)
+round_push (rtx size, int already_added)
 {
-  rtx align_rtx, alignm1_rtx;
+  rtx align_rtx, add_rtx;
 
   if (!SUPPORTS_STACK_ALIGNMENT
       || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
     {
       int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
+      int add;
 
       if (align == 1)
 	return size;
 
+      add = (align > already_added) ? align - already_added - 1 : 0;
+
       if (CONST_INT_P (size))
 	{
-	  HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
+	  HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
 
 	  if (INTVAL (size) != new_size)
 	    size = GEN_INT (new_size);
@@ -974,7 +980,7 @@ round_push (rtx size)
 	}
 
       align_rtx = GEN_INT (align);
-      alignm1_rtx = GEN_INT (align - 1);
+      add_rtx = (add > 0) ? GEN_INT (add) : const0_rtx;
     }
   else
     {
@@ -983,15 +989,15 @@ round_push (rtx size)
 	 substituted by the right value in vregs pass and optimized
 	 during combine.  */
       align_rtx = virtual_preferred_stack_boundary_rtx;
-      alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1),
-				   NULL_RTX);
+      add_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), NULL_RTX);
     }
 
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
      but we know it can't.  So add ourselves and then do
      TRUNC_DIV_EXPR.  */
-  size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
-		       NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  if (add_rtx != const0_rtx)
+    size = expand_binop (Pmode, add_optab, size, add_rtx,
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
   size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
 			NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
@@ -1175,6 +1181,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   rtx_code_label *final_label;
   rtx final_target, target;
   unsigned extra_align = 0;
+  unsigned extra = 0;
   bool must_align;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
@@ -1275,9 +1282,9 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   extra_align = BITS_PER_UNIT;
 #endif
 
-  if (must_align)
+  if (must_align && required_align > extra_align)
     {
-      unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
+      extra = (required_align - extra_align) / BITS_PER_UNIT;
 
       size = plus_constant (Pmode, size, extra);
       size = force_operand (size, NULL_RTX);
@@ -1285,7 +1292,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       if (flag_stack_usage_info)
 	stack_usage_size += extra;
 
-      if (extra && size_align > extra_align)
+      if (size_align > extra_align)
 	size_align = extra_align;
     }
 
@@ -1304,7 +1311,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
      momentarily mis-aligning the stack.  */
   if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0)
     {
-      size = round_push (size);
+      size = round_push (size, extra);
 
       if (flag_stack_usage_info)
 	{
diff --git a/gcc/testsuite/gcc.dg/pr50938.c b/gcc/testsuite/gcc.dg/pr50938.c
new file mode 100644
index 0000000..14b7540
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr50938.c
@@ -0,0 +1,52 @@
+/* PR/50938: Check that alloca () reserves the correct amount of stack space.
+ */
+
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+/* { dg-require-effective-target alloca } */
+
+__attribute__ ((noinline))
+static void
+dummy (void)
+{
+}
+
+__attribute__ ((noinline))
+static char *
+get_frame_addr (void *p)
+{
+  void *faddr = __builtin_frame_address (0);
+  /* Call a function to make sure that a stack frame exists.  */
+  dummy ();
+  return faddr;
+}
+
+__attribute__ ((noinline))
+static void *
+stack_var (void)
+{
+  /* 1024 bytes on the stack.  */
+  char s[1024];
+  /* One stack slot.  */
+  void *p = (void *)s;
+  /* Keep stack memory alive by passing it to the function.  */
+  return get_frame_addr (p);
+}
+
+__attribute__ ((noinline))
+static void *
+alloca_var (void)
+{
+  /* 1024 bytes on the stack plus one stack slot for a.  */
+  void *a = __builtin_alloca (1024);
+  return get_frame_addr (a);
+}
+
+int main (void)
+{
+  if (stack_var () != alloca_var ())
+    /* The functions used a different amount of stack space.  */
+    __builtin_abort ();
+
+  return 0;
+}
-- 
2.3.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-05-03 14:18   ` Dominik Vogt
@ 2016-05-19 23:11     ` Jeff Law
  2016-05-20 21:24       ` [RFA] Minor cleanup to allocate_dynamic_stack_space Jeff Law
  2016-06-09 12:00       ` [PATCH] Drop excess size used for run time allocated stack variables Bernd Schmidt
  2016-05-25 14:02     ` [PATCH 1/2][v3] " Dominik Vogt
  1 sibling, 2 replies; 35+ messages in thread
From: Jeff Law @ 2016-05-19 23:11 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel

On 05/03/2016 08:17 AM, Dominik Vogt wrote:
> Version two of the patch including a test case.
>
> On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
>> On 04/29/2016 04:12 PM, Dominik Vogt wrote:
>>> The attached patch removes excess stack space allocation with
>>> alloca in some situations.  Plese check the commit message in the
>>> patch for details.
>
>> However, I would strongly recommend some tests, even if they are
>> target specific.  You can always copy pr36728-1 into the s390x
>> directory and look at size of the generated stack.  Simliarly for
>> pr50938 for x86.
>
> However, x86 uses the "else" branch in round_push, i.e. it uses
> "virtual_preferred_stack_boundary_rtx" to calculate the number of
> bytes to add for stack alignment.  That value is unknown at the
> time round_push is called, so the test case fails on such targets,
> and I've no idea how to fix this properly.
I'm not going to be able to complete a review of this today.  As I dig 
into the history of this code I came across pr34548, pr47353, & pr46894 
(and their associated discussions on the lists) that we'll need to 
verify we don't break.  This is a bit of a mess and I think the code 
needs some TLC before we start hacking it up further.

Let's start with clean up of dead code:

  /* We will need to ensure that the address we return is aligned to
      REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
      always know its final value at this point in the compilation (it
      might depend on the size of the outgoing parameter lists, for
      example), so we must align the value to be returned in that case.
      (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
      STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
      We must also do an alignment operation on the returned value if
      the stack pointer alignment is less strict than REQUIRED_ALIGN.

      If we have to align, we must leave space in SIZE for the hole
      that might result from the alignment operation.  */

   must_align = (crtl->preferred_stack_boundary < required_align);
   if (must_align)
     {
       if (required_align > PREFERRED_STACK_BOUNDARY)
         extra_align = PREFERRED_STACK_BOUNDARY;
       else if (required_align > STACK_BOUNDARY)
         extra_align = STACK_BOUNDARY;
       else
         extra_align = BITS_PER_UNIT;
     }

   /* ??? STACK_POINTER_OFFSET is always defined now.  */
#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
   must_align = true;
   extra_align = BITS_PER_UNIT;
#endif

If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So 
all the code above I think collapses to:

   must_align = true;
   extra_align = BITS_PER_UNIT

And the only other assignment to must_align assigns it the value "true". 
  There are two conditionals on must_align that looks like

if (must_align)
   {
     CODE;
   }

We should remove the conditional and pull CODE out an indentation level. 
  And remove all remnants of must_align.

I don't think that changes your patch in any way.  Hopefully it makes 
the whole function somewhat easier to grok.

Thoughts?

jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFA] Minor cleanup to allocate_dynamic_stack_space
  2016-05-19 23:11     ` Jeff Law
@ 2016-05-20 21:24       ` Jeff Law
  2016-05-20 21:44         ` Eric Botcazou
                           ` (2 more replies)
  2016-06-09 12:00       ` [PATCH] Drop excess size used for run time allocated stack variables Bernd Schmidt
  1 sibling, 3 replies; 35+ messages in thread
From: Jeff Law @ 2016-05-20 21:24 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 2450 bytes --]

On 05/19/2016 05:11 PM, Jeff Law wrote:
[ ... ]
>This is a bit of a mess and I think the code
> needs some TLC before we start hacking it up further.
>
> Let's start with clean up of dead code:
>
>  /* We will need to ensure that the address we return is aligned to
>      REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
>      always know its final value at this point in the compilation (it
>      might depend on the size of the outgoing parameter lists, for
>      example), so we must align the value to be returned in that case.
>      (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
>      STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
>      We must also do an alignment operation on the returned value if
>      the stack pointer alignment is less strict than REQUIRED_ALIGN.
>
>      If we have to align, we must leave space in SIZE for the hole
>      that might result from the alignment operation.  */
>
>   must_align = (crtl->preferred_stack_boundary < required_align);
>   if (must_align)
>     {
>       if (required_align > PREFERRED_STACK_BOUNDARY)
>         extra_align = PREFERRED_STACK_BOUNDARY;
>       else if (required_align > STACK_BOUNDARY)
>         extra_align = STACK_BOUNDARY;
>       else
>         extra_align = BITS_PER_UNIT;
>     }
>
>   /* ??? STACK_POINTER_OFFSET is always defined now.  */
> #if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
>   must_align = true;
>   extra_align = BITS_PER_UNIT;
> #endif
>
> If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So
> all the code above I think collapses to:
>
>   must_align = true;
>   extra_align = BITS_PER_UNIT
>
> And the only other assignment to must_align assigns it the value "true".
>  There are two conditionals on must_align that looks like
>
> if (must_align)
>   {
>     CODE;
>   }
>
> We should remove the conditional and pull CODE out an indentation level.
>  And remove all remnants of must_align.
>
> I don't think that changes your patch in any way.  Hopefully it makes
> the whole function somewhat easier to grok.
>
> Thoughts?
So here's that cleanup.  The diffs are larger than one might expect 
because of the reindentation that needs to happen.  So I've included a 
-b diff variant which shows how little actually changed here.

This should have no impact on any target.

Bootstrapped and regression tested on x86_64 linux.  Ok for the trunk?

Jeff


[-- Attachment #2: P1 --]
[-- Type: text/plain, Size: 4892 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b8fb96c..257e98b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-05-20  Jeff Law  <law@redhat.com>
+
+        * explow.c (allocate_dynamic_stack_space): Simplify
+        knowing that MUST_ALIGN was always true.
+
 2016-05-16  Ryan Burn  <contact@rnburn.com>
 
 	* Makefile.in (GTFILES): Add cilk.h and cilk-common.c.
diff --git a/gcc/explow.c b/gcc/explow.c
index e0ce201..51897e0 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1175,7 +1175,6 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   rtx_code_label *final_label;
   rtx final_target, target;
   unsigned extra_align = 0;
-  bool must_align;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
@@ -1245,49 +1244,18 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
     crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 
-  /* We will need to ensure that the address we return is aligned to
-     REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
-     always know its final value at this point in the compilation (it
-     might depend on the size of the outgoing parameter lists, for
-     example), so we must align the value to be returned in that case.
-     (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
-     STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
-     We must also do an alignment operation on the returned value if
-     the stack pointer alignment is less strict than REQUIRED_ALIGN.
-
-     If we have to align, we must leave space in SIZE for the hole
-     that might result from the alignment operation.  */
-
-  must_align = (crtl->preferred_stack_boundary < required_align);
-  if (must_align)
-    {
-      if (required_align > PREFERRED_STACK_BOUNDARY)
-	extra_align = PREFERRED_STACK_BOUNDARY;
-      else if (required_align > STACK_BOUNDARY)
-	extra_align = STACK_BOUNDARY;
-      else
-	extra_align = BITS_PER_UNIT;
-    }
-
-  /* ??? STACK_POINTER_OFFSET is always defined now.  */
-#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
-  must_align = true;
   extra_align = BITS_PER_UNIT;
-#endif
 
-  if (must_align)
-    {
-      unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
+  unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
 
-      size = plus_constant (Pmode, size, extra);
-      size = force_operand (size, NULL_RTX);
+  size = plus_constant (Pmode, size, extra);
+  size = force_operand (size, NULL_RTX);
 
-      if (flag_stack_usage_info)
-	stack_usage_size += extra;
+  if (flag_stack_usage_info)
+    stack_usage_size += extra;
 
-      if (extra && size_align > extra_align)
-	size_align = extra_align;
-    }
+  if (extra && size_align > extra_align)
+    size_align = extra_align;
 
   /* Round the size to a multiple of the required stack alignment.
      Since the stack if presumed to be rounded before this allocation,
@@ -1366,7 +1334,6 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 			      gen_int_mode (required_align / BITS_PER_UNIT - 1,
 					    Pmode),
 			      NULL_RTX, 1, OPTAB_LIB_WIDEN);
-	  must_align = true;
 	}
 
       func = init_one_libfunc ("__morestack_allocate_stack_space");
@@ -1478,24 +1445,21 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  if (must_align)
-    {
-      /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-	 but we know it can't.  So add ourselves and then do
-	 TRUNC_DIV_EXPR.  */
-      target = expand_binop (Pmode, add_optab, target,
-			     gen_int_mode (required_align / BITS_PER_UNIT - 1,
-					   Pmode),
-			     NULL_RTX, 1, OPTAB_LIB_WIDEN);
-      target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			      gen_int_mode (required_align / BITS_PER_UNIT,
-					    Pmode),
-			      NULL_RTX, 1);
-      target = expand_mult (Pmode, target,
-			    gen_int_mode (required_align / BITS_PER_UNIT,
-					  Pmode),
-			    NULL_RTX, 1);
-    }
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+				        Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);

[-- Attachment #3: P2 --]
[-- Type: text/plain, Size: 3668 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b8fb96c..257e98b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-05-20  Jeff Law  <law@redhat.com>
+
+        * explow.c (allocate_dynamic_stack_space): Simplify
+        knowing that MUST_ALIGN was always true.
+
 2016-05-16  Ryan Burn  <contact@rnburn.com>
 
 	* Makefile.in (GTFILES): Add cilk.h and cilk-common.c.
diff --git a/gcc/explow.c b/gcc/explow.c
index e0ce201..51897e0 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1175,7 +1175,6 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   rtx_code_label *final_label;
   rtx final_target, target;
   unsigned extra_align = 0;
-  bool must_align;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
@@ -1245,38 +1244,8 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
     crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 
-  /* We will need to ensure that the address we return is aligned to
-     REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
-     always know its final value at this point in the compilation (it
-     might depend on the size of the outgoing parameter lists, for
-     example), so we must align the value to be returned in that case.
-     (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
-     STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
-     We must also do an alignment operation on the returned value if
-     the stack pointer alignment is less strict than REQUIRED_ALIGN.
-
-     If we have to align, we must leave space in SIZE for the hole
-     that might result from the alignment operation.  */
-
-  must_align = (crtl->preferred_stack_boundary < required_align);
-  if (must_align)
-    {
-      if (required_align > PREFERRED_STACK_BOUNDARY)
-	extra_align = PREFERRED_STACK_BOUNDARY;
-      else if (required_align > STACK_BOUNDARY)
-	extra_align = STACK_BOUNDARY;
-      else
   extra_align = BITS_PER_UNIT;
-    }
-
-  /* ??? STACK_POINTER_OFFSET is always defined now.  */
-#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
-  must_align = true;
-  extra_align = BITS_PER_UNIT;
-#endif
 
-  if (must_align)
-    {
   unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
 
   size = plus_constant (Pmode, size, extra);
@@ -1287,7 +1256,6 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   if (extra && size_align > extra_align)
     size_align = extra_align;
-    }
 
   /* Round the size to a multiple of the required stack alignment.
      Since the stack if presumed to be rounded before this allocation,
@@ -1366,7 +1334,6 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 			      gen_int_mode (required_align / BITS_PER_UNIT - 1,
 					    Pmode),
 			      NULL_RTX, 1, OPTAB_LIB_WIDEN);
-	  must_align = true;
 	}
 
       func = init_one_libfunc ("__morestack_allocate_stack_space");
@@ -1478,8 +1445,6 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  if (must_align)
-    {
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
      but we know it can't.  So add ourselves and then do
      TRUNC_DIV_EXPR.  */
@@ -1495,7 +1460,6 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
 			gen_int_mode (required_align / BITS_PER_UNIT,
 				      Pmode),
 			NULL_RTX, 1);
-    }
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFA] Minor cleanup to allocate_dynamic_stack_space
  2016-05-20 21:24       ` [RFA] Minor cleanup to allocate_dynamic_stack_space Jeff Law
@ 2016-05-20 21:44         ` Eric Botcazou
  2016-05-20 21:48           ` Jeff Law
  2016-05-23 10:12         ` Dominik Vogt
  2016-05-25 12:46         ` Dominik Vogt
  2 siblings, 1 reply; 35+ messages in thread
From: Eric Botcazou @ 2016-05-20 21:44 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, vogt, Andreas Krebbel

> So here's that cleanup.  The diffs are larger than one might expect
> because of the reindentation that needs to happen.  So I've included a
> -b diff variant which shows how little actually changed here.

I'm wondering if it isn't counter-productive.  The ??? comment is explicit 
about where the problem comes from: STACK_POINTER_OFFSET used to be defined 
only when needed, now it's always defined.

So I think that we should try to restore the initial state, this will very 
likely generate fewer alignment operations, for example:

#if defined (STACK_DYNAMIC_OFFSET)
  if (1)
#else
  if (STACK_POINTER_OFFSET)
#endif
    {
      must_align = true;
      extra_align = BITS_PER_UNIT;
    }

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFA] Minor cleanup to allocate_dynamic_stack_space
  2016-05-20 21:44         ` Eric Botcazou
@ 2016-05-20 21:48           ` Jeff Law
  2016-05-23  7:53             ` Eric Botcazou
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Law @ 2016-05-20 21:48 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, vogt, Andreas Krebbel

On 05/20/2016 03:44 PM, Eric Botcazou wrote:
>> So here's that cleanup.  The diffs are larger than one might expect
>> because of the reindentation that needs to happen.  So I've included a
>> -b diff variant which shows how little actually changed here.
>
> I'm wondering if it isn't counter-productive.  The ??? comment is explicit
> about where the problem comes from: STACK_POINTER_OFFSET used to be defined
> only when needed, now it's always defined.
>
> So I think that we should try to restore the initial state, this will very
> likely generate fewer alignment operations, for example:
I pondered that as a direction, but was scared off by the overall 
fragility of this code when I looked back through the old BZs.  I 
figured cleanup preserving existing behavior was the first step.

We can go the other way if you prefer.   It just makes reasoning about 
how this code is supposed to work harder.

jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFA] Minor cleanup to allocate_dynamic_stack_space
  2016-05-20 21:48           ` Jeff Law
@ 2016-05-23  7:53             ` Eric Botcazou
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Botcazou @ 2016-05-23  7:53 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, vogt, Andreas Krebbel

> I pondered that as a direction, but was scared off by the overall
> fragility of this code when I looked back through the old BZs.  I
> figured cleanup preserving existing behavior was the first step.

Setting aside the flag_split_stack stuff, the must_align logic is clear enough 
and quite localized.

> We can go the other way if you prefer.   It just makes reasoning about
> how this code is supposed to work harder.

On second thoughts, there might be a real issue with STACK_DYNAMIC_OFFSET that 
is papered over by the STACK_POINTER_OFFSET glitch.  The big comment explains 
why, if STACK_DYNAMIC_OFFSET is defined, we need to preventively align, and 
that it's defined if ACCUMULATE_OUTGOING_ARGS is activated.  That's not true 
for STACK_DYNAMIC_OFFSET overall, but that's indeed true in function.c where 
STACK_DYNAMIC_OFFSET is forcibly defined.

Given that ACCUMULATE_OUTGOING_ARGS is activated by default on most mainstream 
platforms, it probably makes sense to always preventively align.  From there 
on, I think that what would be left to discuss is the amount of alignment we 
can start from (extra_align in the current code); maybe BITS_PER_UNIT is also 
the only sensible value in the end.

So I think that your patch is OK if you rescue the main comment, for example:

  /* We will need to ensure that the address we return is aligned to
     REQUIRED_ALIGN.  At this point in the compilation, we don't always
     know the final value of the STACK_DYNAMIC_OFFSET used in function.c
     (it might depend on the size of the outgoing parameter lists, for
     example), so we must preventively align the value.  We leave space
     in SIZE for the hole that might result from the alignment operation.  */

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFA] Minor cleanup to allocate_dynamic_stack_space
  2016-05-20 21:24       ` [RFA] Minor cleanup to allocate_dynamic_stack_space Jeff Law
  2016-05-20 21:44         ` Eric Botcazou
@ 2016-05-23 10:12         ` Dominik Vogt
  2016-05-25 12:46         ` Dominik Vogt
  2 siblings, 0 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-05-23 10:12 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Andreas Krebbel

On Fri, May 20, 2016 at 03:23:49PM -0600, Jeff Law wrote:
> On 05/19/2016 05:11 PM, Jeff Law wrote:
> [ ... ]
> >This is a bit of a mess and I think the code
> >needs some TLC before we start hacking it up further.
> >
> >Let's start with clean up of dead code:
> >
> > /* We will need to ensure that the address we return is aligned to
> >     REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
> >     always know its final value at this point in the compilation (it
> >     might depend on the size of the outgoing parameter lists, for
> >     example), so we must align the value to be returned in that case.
> >     (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
> >     STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
> >     We must also do an alignment operation on the returned value if
> >     the stack pointer alignment is less strict than REQUIRED_ALIGN.
> >
> >     If we have to align, we must leave space in SIZE for the hole
> >     that might result from the alignment operation.  */
> >
> >  must_align = (crtl->preferred_stack_boundary < required_align);
> >  if (must_align)
> >    {
> >      if (required_align > PREFERRED_STACK_BOUNDARY)
> >        extra_align = PREFERRED_STACK_BOUNDARY;
> >      else if (required_align > STACK_BOUNDARY)
> >        extra_align = STACK_BOUNDARY;
> >      else
> >        extra_align = BITS_PER_UNIT;
> >    }
> >
> >  /* ??? STACK_POINTER_OFFSET is always defined now.  */
> >#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT;
> >#endif
> >
> >If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So
> >all the code above I think collapses to:
> >
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT
> >
> >And the only other assignment to must_align assigns it the value "true".
> > There are two conditionals on must_align that looks like
> >
> >if (must_align)
> >  {
> >    CODE;
> >  }
> >
> >We should remove the conditional and pull CODE out an indentation level.
> > And remove all remnants of must_align.
> >
> >I don't think that changes your patch in any way.  Hopefully it makes
> >the whole function somewhat easier to grok.
> >
> >Thoughts?
> So here's that cleanup.  The diffs are larger than one might expect
> because of the reindentation that needs to happen.  So I've included
> a -b diff variant which shows how little actually changed here.
> 
> This should have no impact on any target.
> 
> Bootstrapped and regression tested on x86_64 linux.  Ok for the trunk?

The first part of the change (removing the dead code) is fine.
However, I suggest to leave "must_align" in the code for now
because I have another patch in the queue that assigns a
calculated value to must_align.  For that I'd have to revert this
part of your patch, so I think it's not worth the effort to remove
it in the first place.  See

  https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00445.html

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFA] Minor cleanup to allocate_dynamic_stack_space
  2016-05-20 21:24       ` [RFA] Minor cleanup to allocate_dynamic_stack_space Jeff Law
  2016-05-20 21:44         ` Eric Botcazou
  2016-05-23 10:12         ` Dominik Vogt
@ 2016-05-25 12:46         ` Dominik Vogt
  2 siblings, 0 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-05-25 12:46 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Andreas Krebbel

On Fri, May 20, 2016 at 03:23:49PM -0600, Jeff Law wrote:
> On 05/19/2016 05:11 PM, Jeff Law wrote:
> [ ... ]
> >This is a bit of a mess and I think the code
> >needs some TLC before we start hacking it up further.
> >
> >Let's start with clean up of dead code:
> >
> > /* We will need to ensure that the address we return is aligned to
> >     REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
> >     always know its final value at this point in the compilation (it
> >     might depend on the size of the outgoing parameter lists, for
> >     example), so we must align the value to be returned in that case.
> >     (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
> >     STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
> >     We must also do an alignment operation on the returned value if
> >     the stack pointer alignment is less strict than REQUIRED_ALIGN.
> >
> >     If we have to align, we must leave space in SIZE for the hole
> >     that might result from the alignment operation.  */
> >
> >  must_align = (crtl->preferred_stack_boundary < required_align);
> >  if (must_align)
> >    {
> >      if (required_align > PREFERRED_STACK_BOUNDARY)
> >        extra_align = PREFERRED_STACK_BOUNDARY;
> >      else if (required_align > STACK_BOUNDARY)
> >        extra_align = STACK_BOUNDARY;
> >      else
> >        extra_align = BITS_PER_UNIT;
> >    }
> >
> >  /* ??? STACK_POINTER_OFFSET is always defined now.  */
> >#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT;
> >#endif
> >
> >If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So
> >all the code above I think collapses to:
> >
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT
> >
> >And the only other assignment to must_align assigns it the value "true".
> > There are two conditionals on must_align that looks like
> >
> >if (must_align)
> >  {
> >    CODE;
> >  }
> >
> >We should remove the conditional and pull CODE out an indentation level.
> > And remove all remnants of must_align.
> >
> >I don't think that changes your patch in any way.  Hopefully it makes
> >the whole function somewhat easier to grok.
> >
> >Thoughts?
> So here's that cleanup.  The diffs are larger than one might expect
> because of the reindentation that needs to happen.  So I've included
> a -b diff variant which shows how little actually changed here.
> 
> This should have no impact on any target.
> 
> Bootstrapped and regression tested on x86_64 linux.  Ok for the trunk?

I've rebased my patch on yours and also removed EXTRA_ALIGN which
is also constant.  I'll send new versions of both patches later.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-03 14:18   ` Dominik Vogt
  2016-05-19 23:11     ` Jeff Law
@ 2016-05-25 14:02     ` Dominik Vogt
  2016-05-25 14:31       ` [PATCH 2/2][v3] " Dominik Vogt
                         ` (3 more replies)
  1 sibling, 4 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-05-25 14:02 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 1332 bytes --]

On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
> Version two of the patch including a test case.
> 
> On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> > >The attached patch removes excess stack space allocation with
> > >alloca in some situations.  Plese check the commit message in the
> > >patch for details.
> 
> > However, I would strongly recommend some tests, even if they are
> > target specific.  You can always copy pr36728-1 into the s390x
> > directory and look at size of the generated stack.  Simliarly for
> > pr50938 for x86.
> 
> However, x86 uses the "else" branch in round_push, i.e. it uses
> "virtual_preferred_stack_boundary_rtx" to calculate the number of
> bytes to add for stack alignment.  That value is unknown at the
> time round_push is called, so the test case fails on such targets,
> and I've no idea how to fix this properly.

Third version of the patch with the suggested cleanup in the first
patch and the functional stuff in the second one.  The first patch
is based on Jeff's draft with the change suggested by Eric and
more cleanup added by me.

Tested and bootstrapped on s390x biarch (but did not look for
performance regressions as the change should be a no-op).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-ChangeLog --]
[-- Type: text/plain, Size: 152 bytes --]

gcc/ChangeLog0

	* explow.c (allocate_dynamic_stack_space): Simplify knowing that
	MUST_ALIGN was always true and extra_align ist always BITS_PER_UNIT.

[-- Attachment #3: 0001-Minor-cleanup-to-allocate_dynamic_stack_space.patch --]
[-- Type: text/x-diff, Size: 5503 bytes --]

From 01264dac2f62c16a2c7e684a674aa9e9bc27b434 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 May 2016 10:23:57 +0100
Subject: [PATCH 1/2] Minor cleanup to allocate_dynamic_stack_space.

---
 gcc/explow.c | 96 +++++++++++++++++++-----------------------------------------
 1 file changed, 30 insertions(+), 66 deletions(-)

diff --git a/gcc/explow.c b/gcc/explow.c
index e0ce201..09a0330 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1174,8 +1174,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   HOST_WIDE_INT stack_usage_size = -1;
   rtx_code_label *final_label;
   rtx final_target, target;
-  unsigned extra_align = 0;
-  bool must_align;
+  unsigned extra;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
@@ -1246,48 +1245,21 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
     crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 
   /* We will need to ensure that the address we return is aligned to
-     REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
-     always know its final value at this point in the compilation (it
-     might depend on the size of the outgoing parameter lists, for
-     example), so we must align the value to be returned in that case.
-     (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
-     STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
-     We must also do an alignment operation on the returned value if
-     the stack pointer alignment is less strict than REQUIRED_ALIGN.
-
-     If we have to align, we must leave space in SIZE for the hole
-     that might result from the alignment operation.  */
-
-  must_align = (crtl->preferred_stack_boundary < required_align);
-  if (must_align)
-    {
-      if (required_align > PREFERRED_STACK_BOUNDARY)
-	extra_align = PREFERRED_STACK_BOUNDARY;
-      else if (required_align > STACK_BOUNDARY)
-	extra_align = STACK_BOUNDARY;
-      else
-	extra_align = BITS_PER_UNIT;
-    }
+     REQUIRED_ALIGN.  At this point in the compilation, we don't always
+     know the final value of the STACK_DYNAMIC_OFFSET used in function.c
+     (it might depend on the size of the outgoing parameter lists, for
+     example), so we must preventively align the value.  We leave space
+     in SIZE for the hole that might result from the alignment operation.  */
 
-  /* ??? STACK_POINTER_OFFSET is always defined now.  */
-#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
-  must_align = true;
-  extra_align = BITS_PER_UNIT;
-#endif
-
-  if (must_align)
-    {
-      unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
+  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
+  size = plus_constant (Pmode, size, extra);
+  size = force_operand (size, NULL_RTX);
 
-      size = plus_constant (Pmode, size, extra);
-      size = force_operand (size, NULL_RTX);
-
-      if (flag_stack_usage_info)
-	stack_usage_size += extra;
+  if (flag_stack_usage_info)
+    stack_usage_size += extra;
 
-      if (extra && size_align > extra_align)
-	size_align = extra_align;
-    }
+  if (extra && size_align > BITS_PER_UNIT)
+    size_align = BITS_PER_UNIT;
 
   /* Round the size to a multiple of the required stack alignment.
      Since the stack if presumed to be rounded before this allocation,
@@ -1361,13 +1333,10 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       if (MALLOC_ABI_ALIGNMENT >= required_align)
 	ask = size;
       else
-	{
-	  ask = expand_binop (Pmode, add_optab, size,
-			      gen_int_mode (required_align / BITS_PER_UNIT - 1,
-					    Pmode),
-			      NULL_RTX, 1, OPTAB_LIB_WIDEN);
-	  must_align = true;
-	}
+	ask = expand_binop (Pmode, add_optab, size,
+			    gen_int_mode (required_align / BITS_PER_UNIT - 1,
+					  Pmode),
+			    NULL_RTX, 1, OPTAB_LIB_WIDEN);
 
       func = init_one_libfunc ("__morestack_allocate_stack_space");
 
@@ -1478,24 +1447,19 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
       target = final_target;
     }
 
-  if (must_align)
-    {
-      /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-	 but we know it can't.  So add ourselves and then do
-	 TRUNC_DIV_EXPR.  */
-      target = expand_binop (Pmode, add_optab, target,
-			     gen_int_mode (required_align / BITS_PER_UNIT - 1,
-					   Pmode),
-			     NULL_RTX, 1, OPTAB_LIB_WIDEN);
-      target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			      gen_int_mode (required_align / BITS_PER_UNIT,
-					    Pmode),
-			      NULL_RTX, 1);
-      target = expand_mult (Pmode, target,
-			    gen_int_mode (required_align / BITS_PER_UNIT,
-					  Pmode),
-			    NULL_RTX, 1);
-    }
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT, Pmode),
+			NULL_RTX, 1);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
-- 
2.3.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-25 14:02     ` [PATCH 1/2][v3] " Dominik Vogt
@ 2016-05-25 14:31       ` Dominik Vogt
  2016-05-25 14:51         ` Dominik Vogt
                           ` (2 more replies)
  2016-06-08 11:20       ` [PATCH 1/2][v3] " Bernd Schmidt
                         ` (2 subsequent siblings)
  3 siblings, 3 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-05-25 14:31 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]

On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:
> On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
> > Version two of the patch including a test case.
> > 
> > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> > > >The attached patch removes excess stack space allocation with
> > > >alloca in some situations.  Plese check the commit message in the
> > > >patch for details.
> > 
> > > However, I would strongly recommend some tests, even if they are
> > > target specific.  You can always copy pr36728-1 into the s390x
> > > directory and look at size of the generated stack.  Simliarly for
> > > pr50938 for x86.
> > 
> > However, x86 uses the "else" branch in round_push, i.e. it uses
> > "virtual_preferred_stack_boundary_rtx" to calculate the number of
> > bytes to add for stack alignment.  That value is unknown at the
> > time round_push is called, so the test case fails on such targets,
> > and I've no idea how to fix this properly.
> 
> Third version of the patch with the suggested cleanup in the first
> patch and the functional stuff in the second one.  The first patch
> is based on Jeff's draft with the change suggested by Eric and
> more cleanup added by me.

This is the updated funtional patch.  Re-tested with limited
effort, i.e. tested and bootstrapped on s390x biarch (but did not
look for performance regressions compared to version 2 of the
patch).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0002-v3-ChangeLog --]
[-- Type: text/plain, Size: 188 bytes --]

gcc/ChangeLog

	* explow.c (round_push): Use know adjustment.
	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
gcc/testsuite/ChangeLog

	* gcc.dg/pr50938.c: New test.

[-- Attachment #3: 0002-v3-Drop-excess-size-used-for-run-time-allocated-stack-v.patch --]
[-- Type: text/x-diff, Size: 6354 bytes --]

From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
 variables.

The present calculation sometimes led to more stack memory being used than
necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
allocated size:

  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);

Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.

round_push() now takes an argument to inform it about what has already been
added to size.
---
 gcc/explow.c                   | 45 +++++++++++++++++++++---------------
 gcc/testsuite/gcc.dg/pr50938.c | 52 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr50938.c

diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..85596e2 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
 }
 
 /* Round the size of a block to be pushed up to the boundary required
-   by this machine.  SIZE is the desired size, which need not be constant.  */
+   by this machine.  SIZE is the desired size, which need not be constant.
+   ALREADY_ADDED is the number of units that have already been added to SIZE
+   for other alignment reasons.
+*/
 
 static rtx
-round_push (rtx size)
+round_push (rtx size, int already_added)
 {
-  rtx align_rtx, alignm1_rtx;
+  rtx align_rtx, add_rtx;
 
   if (!SUPPORTS_STACK_ALIGNMENT
       || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
     {
       int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
+      int add;
 
       if (align == 1)
 	return size;
 
+      add = (align > already_added) ? align - already_added - 1 : 0;
+
       if (CONST_INT_P (size))
 	{
-	  HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
+	  HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
 
 	  if (INTVAL (size) != new_size)
 	    size = GEN_INT (new_size);
@@ -974,7 +980,7 @@ round_push (rtx size)
 	}
 
       align_rtx = GEN_INT (align);
-      alignm1_rtx = GEN_INT (align - 1);
+      add_rtx = (add > 0) ? GEN_INT (add) : const0_rtx;
     }
   else
     {
@@ -983,15 +989,15 @@ round_push (rtx size)
 	 substituted by the right value in vregs pass and optimized
 	 during combine.  */
       align_rtx = virtual_preferred_stack_boundary_rtx;
-      alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1),
-				   NULL_RTX);
+      add_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), NULL_RTX);
     }
 
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
      but we know it can't.  So add ourselves and then do
      TRUNC_DIV_EXPR.  */
-  size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
-		       NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  if (add_rtx != const0_rtx)
+    size = expand_binop (Pmode, add_optab, size, add_rtx,
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
   size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
 			NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
@@ -1174,7 +1180,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   HOST_WIDE_INT stack_usage_size = -1;
   rtx_code_label *final_label;
   rtx final_target, target;
-  unsigned extra;
+  unsigned extra = 0;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
@@ -1251,15 +1257,18 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
      example), so we must preventively align the value.  We leave space
      in SIZE for the hole that might result from the alignment operation.  */
 
-  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
-  size = plus_constant (Pmode, size, extra);
-  size = force_operand (size, NULL_RTX);
+  if (required_align > BITS_PER_UNIT)
+    {
+      extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
+      size = plus_constant (Pmode, size, extra);
+      size = force_operand (size, NULL_RTX);
 
-  if (flag_stack_usage_info)
-    stack_usage_size += extra;
+      if (flag_stack_usage_info)
+	stack_usage_size += extra;
 
-  if (extra && size_align > BITS_PER_UNIT)
-    size_align = BITS_PER_UNIT;
+      if (size_align > BITS_PER_UNIT)
+	size_align = BITS_PER_UNIT;
+    }
 
   /* Round the size to a multiple of the required stack alignment.
      Since the stack if presumed to be rounded before this allocation,
@@ -1276,7 +1285,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
      momentarily mis-aligning the stack.  */
   if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0)
     {
-      size = round_push (size);
+      size = round_push (size, extra);
 
       if (flag_stack_usage_info)
 	{
diff --git a/gcc/testsuite/gcc.dg/pr50938.c b/gcc/testsuite/gcc.dg/pr50938.c
new file mode 100644
index 0000000..14b7540
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr50938.c
@@ -0,0 +1,52 @@
+/* PR/50938: Check that alloca () reserves the correct amount of stack space.
+ */
+
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+/* { dg-require-effective-target alloca } */
+
+__attribute__ ((noinline))
+static void
+dummy (void)
+{
+}
+
+__attribute__ ((noinline))
+static char *
+get_frame_addr (void *p)
+{
+  void *faddr = __builtin_frame_address (0);
+  /* Call a function to make sure that a stack frame exists.  */
+  dummy ();
+  return faddr;
+}
+
+__attribute__ ((noinline))
+static void *
+stack_var (void)
+{
+  /* 1024 bytes on the stack.  */
+  char s[1024];
+  /* One stack slot.  */
+  void *p = (void *)s;
+  /* Keep stack memory alive by passing it to the function.  */
+  return get_frame_addr (p);
+}
+
+__attribute__ ((noinline))
+static void *
+alloca_var (void)
+{
+  /* 1024 bytes on the stack plus one stack slot for a.  */
+  void *a = __builtin_alloca (1024);
+  return get_frame_addr (a);
+}
+
+int main (void)
+{
+  if (stack_var () != alloca_var ())
+    /* The functions used a different amount of stack space.  */
+    __builtin_abort ();
+
+  return 0;
+}
-- 
2.3.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-25 14:31       ` [PATCH 2/2][v3] " Dominik Vogt
@ 2016-05-25 14:51         ` Dominik Vogt
  2016-06-08 11:21         ` Bernd Schmidt
  2016-06-23  4:24         ` Jeff Law
  2 siblings, 0 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-05-25 14:51 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, Andreas Krebbel

On Wed, May 25, 2016 at 02:32:41PM +0100, Dominik Vogt wrote:
> On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:
> > On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
> > > Version two of the patch including a test case.
> > > 
> > > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> > > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> > > > >The attached patch removes excess stack space allocation with
> > > > >alloca in some situations.  Plese check the commit message in the
> > > > >patch for details.
> > > 
> > > > However, I would strongly recommend some tests, even if they are
> > > > target specific.  You can always copy pr36728-1 into the s390x
> > > > directory and look at size of the generated stack.  Simliarly for
> > > > pr50938 for x86.
> > > 
> > > However, x86 uses the "else" branch in round_push, i.e. it uses
> > > "virtual_preferred_stack_boundary_rtx" to calculate the number of
> > > bytes to add for stack alignment.  That value is unknown at the
> > > time round_push is called, so the test case fails on such targets,
> > > and I've no idea how to fix this properly.
> > 
> > Third version of the patch with the suggested cleanup in the first
> > patch and the functional stuff in the second one.  The first patch
> > is based on Jeff's draft with the change suggested by Eric and
> > more cleanup added by me.
> 
> This is the updated funtional patch.  Re-tested with limited
> effort, i.e. tested and bootstrapped on s390x biarch (but did not
> look for performance regressions compared to version 2 of the
> patch).

(I won't be able to reply to any comments until 20th of June.)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-25 14:02     ` [PATCH 1/2][v3] " Dominik Vogt
  2016-05-25 14:31       ` [PATCH 2/2][v3] " Dominik Vogt
@ 2016-06-08 11:20       ` Bernd Schmidt
  2016-06-08 12:20         ` Eric Botcazou
  2016-06-22 20:34       ` Jeff Law
  2016-07-04 14:22       ` Andreas Krebbel
  3 siblings, 1 reply; 35+ messages in thread
From: Bernd Schmidt @ 2016-06-08 11:20 UTC (permalink / raw)
  To: vogt, Jeff Law, gcc-patches, Andreas Krebbel, Eric Botcazou

On 05/25/2016 03:30 PM, Dominik Vogt wrote:
> 	* explow.c (allocate_dynamic_stack_space): Simplify knowing that
> 	MUST_ALIGN was always true and extra_align ist always BITS_PER_UNIT.

I tried to do some archaeology to find out how the code came to look the 
way it currently does. A relevant message appears to be

https://gcc.gnu.org/ml/gcc-patches/2011-01/msg00836.html

There's some discussion about how STACK_POINT_OFFSET shouldn't cause us 
to have to align, and postponing that optimization to gcc-4.7. Since 
STACK_POINTER_OFFSET should be constant, it ought to be easy enough to 
take it into account.

So, I'm undecided. Your cleanup is valid as the code stands right now, 
but I'm undecided whether we shouldn't fix the potentially unnecessary 
extra alignment instead.


Bernd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-25 14:31       ` [PATCH 2/2][v3] " Dominik Vogt
  2016-05-25 14:51         ` Dominik Vogt
@ 2016-06-08 11:21         ` Bernd Schmidt
  2016-06-20 12:20           ` Dominik Vogt
  2016-06-23  4:24         ` Jeff Law
  2 siblings, 1 reply; 35+ messages in thread
From: Bernd Schmidt @ 2016-06-08 11:21 UTC (permalink / raw)
  To: vogt, Jeff Law, gcc-patches, Andreas Krebbel

On 05/25/2016 03:32 PM, Dominik Vogt wrote:
> 	* explow.c (round_push): Use know adjustment.
> 	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
> gcc/testsuite/ChangeLog
>

I was thinking about whether it would be possible/desirable to eliminate 
the double add entirely, but I couldn't find a way to structure the code 
in a way that seems better than what you have. So, ...

>  /* Round the size of a block to be pushed up to the boundary required
> -   by this machine.  SIZE is the desired size, which need not be constant.  */
> +   by this machine.  SIZE is the desired size, which need not be constant.
> +   ALREADY_ADDED is the number of units that have already been added to SIZE
> +   for other alignment reasons.
> +*/

The */ goes on the last line of the comment.

> +/* PR/50938: Check that alloca () reserves the correct amount of stack space.
> + */

Same here really, even if it's only a test.

Ok with these fixed.


Bernd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.
  2016-06-08 11:20       ` [PATCH 1/2][v3] " Bernd Schmidt
@ 2016-06-08 12:20         ` Eric Botcazou
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Botcazou @ 2016-06-08 12:20 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, vogt, Jeff Law, Andreas Krebbel

> There's some discussion about how STACK_POINT_OFFSET shouldn't cause us
> to have to align, and postponing that optimization to gcc-4.7. Since
> STACK_POINTER_OFFSET should be constant, it ought to be easy enough to
> take it into account.

See the "Minor cleanup to allocate_dynamic_stack_space" subthread, I think 
that the real issue is STACK_DYNAMIC_OFFSET.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-05-19 23:11     ` Jeff Law
  2016-05-20 21:24       ` [RFA] Minor cleanup to allocate_dynamic_stack_space Jeff Law
@ 2016-06-09 12:00       ` Bernd Schmidt
  2016-06-21  9:35         ` Dominik Vogt
  1 sibling, 1 reply; 35+ messages in thread
From: Bernd Schmidt @ 2016-06-09 12:00 UTC (permalink / raw)
  To: Jeff Law, vogt, gcc-patches, Andreas Krebbel, Richard Henderson

On 05/20/2016 01:11 AM, Jeff Law wrote:
> Let's start with clean up of dead code:
>
>  /* We will need to ensure that the address we return is aligned to
>      REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
>      always know its final value at this point in the compilation (it
>      might depend on the size of the outgoing parameter lists, for
>      example), so we must align the value to be returned in that case.
>      (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
>      STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
>      We must also do an alignment operation on the returned value if
>      the stack pointer alignment is less strict than REQUIRED_ALIGN.
>
>      If we have to align, we must leave space in SIZE for the hole
>      that might result from the alignment operation.  */
>
>   must_align = (crtl->preferred_stack_boundary < required_align);
>   if (must_align)
>     {
>       if (required_align > PREFERRED_STACK_BOUNDARY)
>         extra_align = PREFERRED_STACK_BOUNDARY;
>       else if (required_align > STACK_BOUNDARY)
>         extra_align = STACK_BOUNDARY;
>       else
>         extra_align = BITS_PER_UNIT;
>     }
>
>   /* ??? STACK_POINTER_OFFSET is always defined now.  */
> #if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
>   must_align = true;
>   extra_align = BITS_PER_UNIT;
> #endif
>
> If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So
> all the code above I think collapses to:
>
>   must_align = true;
>   extra_align = BITS_PER_UNIT

(Cc'ing rth because portions of this seem to be his, r165240).

I kind of want to approach this from a different angle; let's look at 
extra_align. The way this is used subsequently makes it appear to be 
misnamed. It looks like it should hold the stack alignment we can assume:

   unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;

   size = plus_constant (Pmode, size, extra);
[...]
   if (extra && size_align > extra_align)
     size_align = extra_align;

(where size_align is the known alignment of the size of the block to be 
allocated). If I'm reading this right, then the first part of the 
cleanup ought to be to get the naming right.

So why BITS_PER_UNIT? Shouldn't it at least be STACK_BOUNDARY? Let's 
look at the previous block a little more closely.

 >   must_align = (crtl->preferred_stack_boundary < required_align);

[ crtl->preferred_stack_boundary is initialized to STACK_BOUNDARY in 
cfgexpand and only ever increased ]

 >   if (must_align)
 >     {

[ if must_align, then required_align > crtl->p_s_b >= STACK_BOUNDARY ]

 >       if (required_align > PREFERRED_STACK_BOUNDARY)
 >         extra_align = PREFERRED_STACK_BOUNDARY;

[ so far so good ]

 >       else if (required_align > STACK_BOUNDARY)
 >         extra_align = STACK_BOUNDARY;

[ always true, right? ]

 >       else
 >         extra_align = BITS_PER_UNIT;
 >     }

[ dead code, right? ]

So we're left with the question of why extra_align is set to 
BITS_PER_UNIT for STACK_DYNAMIC_OFFSET, and I can't really see a reason 
to do that either. AFAIK the minimum alignment of the stack is always 
STACK_BOUNDARY, and it's possible we could do better.

As far as I can tell, no definition of STACK_DYNAMIC_OFFSET differs 
substantially from the default definition in function.c. Why couldn't we 
round up the outgoing_args_size to the preferred stack boundary (or a 
new value to keep track of the required alignment for dynamic 
allocations) before instantiating dynamic_offset? We then wouldn't have 
to add extra alignment for it here.

This rounding seems to happen anyway in port's frame calculations, e.g. 
here in i386:

  if (ACCUMULATE_OUTGOING_ARGS
       && (!crtl->is_leaf || cfun->calls_alloca
           || ix86_current_function_calls_tls_descriptor))
     {
       offset += crtl->outgoing_args_size;
       frame->outgoing_arguments_size = crtl->outgoing_args_size;
     }
   else
     frame->outgoing_arguments_size = 0;

   /* Align stack boundary.  Only needed if we're calling another function
      or using alloca.  */
   if (!crtl->is_leaf || cfun->calls_alloca
       || ix86_current_function_calls_tls_descriptor)
     offset = ROUND_UP (offset, preferred_alignment);

or here in rs6000:
   info->parm_size    = RS6000_ALIGN (crtl->outgoing_args_size,
                                          TARGET_ALTIVEC ? 16 : 8);

which could probably use the default definition of STACK_DYNAMIC_OFFSET 
instead of this, if the outgoing_args_size was rounded appropriately:
#define STACK_DYNAMIC_OFFSET(FUNDECL)                                   \
   (RS6000_ALIGN (crtl->outgoing_args_size,                              \
                  (TARGET_ALTIVEC || TARGET_VSX) ? 16 : 8)               \
    + (STACK_POINTER_OFFSET))


Bernd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-06-20 12:20           ` Dominik Vogt
@ 2016-06-20 12:20             ` Bernd Schmidt
  0 siblings, 0 replies; 35+ messages in thread
From: Bernd Schmidt @ 2016-06-20 12:20 UTC (permalink / raw)
  To: vogt, Jeff Law, gcc-patches, Andreas Krebbel

On 06/20/2016 02:19 PM, Dominik Vogt wrote:

>>> +/* PR/50938: Check that alloca () reserves the correct amount of stack space.
>>> + */
>>
>> Same here really, even if it's only a test.
>
> In this case, the line gets too long with "  */" appended.

In that case we wrap before the last word.


Bernd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-06-08 11:21         ` Bernd Schmidt
@ 2016-06-20 12:20           ` Dominik Vogt
  2016-06-20 12:20             ` Bernd Schmidt
  0 siblings, 1 reply; 35+ messages in thread
From: Dominik Vogt @ 2016-06-20 12:20 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, gcc-patches, Andreas Krebbel

On Wed, Jun 08, 2016 at 01:21:09PM +0200, Bernd Schmidt wrote:
> On 05/25/2016 03:32 PM, Dominik Vogt wrote:
> >	* explow.c (round_push): Use know adjustment.
> >	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
> >gcc/testsuite/ChangeLog
> >
> 
> I was thinking about whether it would be possible/desirable to
> eliminate the double add entirely, but I couldn't find a way to
> structure the code in a way that seems better than what you have.
> So, ...
> 
> > /* Round the size of a block to be pushed up to the boundary required
> >-   by this machine.  SIZE is the desired size, which need not be constant.  */
> >+   by this machine.  SIZE is the desired size, which need not be constant.
> >+   ALREADY_ADDED is the number of units that have already been added to SIZE
> >+   for other alignment reasons.
> >+*/
> 
> The */ goes on the last line of the comment.

No problem.

> >+/* PR/50938: Check that alloca () reserves the correct amount of stack space.
> >+ */
> 
> Same here really, even if it's only a test.

In this case, the line gets too long with "  */" appended.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-06-09 12:00       ` [PATCH] Drop excess size used for run time allocated stack variables Bernd Schmidt
@ 2016-06-21  9:35         ` Dominik Vogt
  2016-06-21 22:26           ` Jeff Law
  0 siblings, 1 reply; 35+ messages in thread
From: Dominik Vogt @ 2016-06-21  9:35 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, gcc-patches, Andreas Krebbel, Richard Henderson

What do we do now with the two patches?  At the moment, the
functional patch depends on the changes in the cleanup patch, so
it cannot be applied on its own.  Options:

(with the requested cleanup in the functional patch)

 1) Apply both patches as they are now and do further cleanup on
    top of it.
 2) Rewrite the functional patch so that it applies without the
    cleanup patch and commit it now.
 3) Look into the suggested cleanup now and adapt the functional
    patch to it when its ready.

Actually I'd prefer (1) or (2) to just get the functional patch
off my desk.  I agree that the cleanup is very useful, but there's
not relation between the cleanup and the functional stuff except
that they touch the same code.  Having the functional patch
applied would simplify further work for me.

On Thu, Jun 09, 2016 at 02:00:21PM +0200, Bernd Schmidt wrote:
> On 05/20/2016 01:11 AM, Jeff Law wrote:
> >Let's start with clean up of dead code:
> >
> > /* We will need to ensure that the address we return is aligned to
> >     REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
> >     always know its final value at this point in the compilation (it
> >     might depend on the size of the outgoing parameter lists, for
> >     example), so we must align the value to be returned in that case.
> >     (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
> >     STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
> >     We must also do an alignment operation on the returned value if
> >     the stack pointer alignment is less strict than REQUIRED_ALIGN.
> >
> >     If we have to align, we must leave space in SIZE for the hole
> >     that might result from the alignment operation.  */
> >
> >  must_align = (crtl->preferred_stack_boundary < required_align);
> >  if (must_align)
> >    {
> >      if (required_align > PREFERRED_STACK_BOUNDARY)
> >        extra_align = PREFERRED_STACK_BOUNDARY;
> >      else if (required_align > STACK_BOUNDARY)
> >        extra_align = STACK_BOUNDARY;
> >      else
> >        extra_align = BITS_PER_UNIT;
> >    }
> >
> >  /* ??? STACK_POINTER_OFFSET is always defined now.  */
> >#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT;
> >#endif
> >
> >If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So
> >all the code above I think collapses to:
> >
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT
> 
> (Cc'ing rth because portions of this seem to be his, r165240).
> 
> I kind of want to approach this from a different angle; let's look
> at extra_align. The way this is used subsequently makes it appear to
> be misnamed. It looks like it should hold the stack alignment we can
> assume:
> 
>   unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
> 
>   size = plus_constant (Pmode, size, extra);
> [...]
>   if (extra && size_align > extra_align)
>     size_align = extra_align;
> 
> (where size_align is the known alignment of the size of the block to
> be allocated). If I'm reading this right, then the first part of the
> cleanup ought to be to get the naming right.
> 
> So why BITS_PER_UNIT? Shouldn't it at least be STACK_BOUNDARY? Let's
> look at the previous block a little more closely.
> 
> >   must_align = (crtl->preferred_stack_boundary < required_align);
> 
> [ crtl->preferred_stack_boundary is initialized to STACK_BOUNDARY in
> cfgexpand and only ever increased ]
> 
> >   if (must_align)
> >     {
> 
> [ if must_align, then required_align > crtl->p_s_b >= STACK_BOUNDARY ]
> 
> >       if (required_align > PREFERRED_STACK_BOUNDARY)
> >         extra_align = PREFERRED_STACK_BOUNDARY;
> 
> [ so far so good ]
> 
> >       else if (required_align > STACK_BOUNDARY)
> >         extra_align = STACK_BOUNDARY;
> 
> [ always true, right? ]
> 
> >       else
> >         extra_align = BITS_PER_UNIT;
> >     }
> 
> [ dead code, right? ]
> 
> So we're left with the question of why extra_align is set to
> BITS_PER_UNIT for STACK_DYNAMIC_OFFSET, and I can't really see a
> reason to do that either. AFAIK the minimum alignment of the stack
> is always STACK_BOUNDARY, and it's possible we could do better.
> 
> As far as I can tell, no definition of STACK_DYNAMIC_OFFSET differs
> substantially from the default definition in function.c. Why
> couldn't we round up the outgoing_args_size to the preferred stack
> boundary (or a new value to keep track of the required alignment for
> dynamic allocations) before instantiating dynamic_offset? We then
> wouldn't have to add extra alignment for it here.
> 
> This rounding seems to happen anyway in port's frame calculations,
> e.g. here in i386:
> 
>  if (ACCUMULATE_OUTGOING_ARGS
>       && (!crtl->is_leaf || cfun->calls_alloca
>           || ix86_current_function_calls_tls_descriptor))
>     {
>       offset += crtl->outgoing_args_size;
>       frame->outgoing_arguments_size = crtl->outgoing_args_size;
>     }
>   else
>     frame->outgoing_arguments_size = 0;
> 
>   /* Align stack boundary.  Only needed if we're calling another function
>      or using alloca.  */
>   if (!crtl->is_leaf || cfun->calls_alloca
>       || ix86_current_function_calls_tls_descriptor)
>     offset = ROUND_UP (offset, preferred_alignment);
> 
> or here in rs6000:
>   info->parm_size    = RS6000_ALIGN (crtl->outgoing_args_size,
>                                          TARGET_ALTIVEC ? 16 : 8);
> 
> which could probably use the default definition of
> STACK_DYNAMIC_OFFSET instead of this, if the outgoing_args_size was
> rounded appropriately:
> #define STACK_DYNAMIC_OFFSET(FUNDECL)                                   \
>   (RS6000_ALIGN (crtl->outgoing_args_size,                              \
>                  (TARGET_ALTIVEC || TARGET_VSX) ? 16 : 8)               \
>    + (STACK_POINTER_OFFSET))

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-06-21  9:35         ` Dominik Vogt
@ 2016-06-21 22:26           ` Jeff Law
  2016-06-22  8:57             ` Dominik Vogt
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Law @ 2016-06-21 22:26 UTC (permalink / raw)
  To: vogt, Bernd Schmidt, gcc-patches, Andreas Krebbel, Richard Henderson

On 06/21/2016 03:35 AM, Dominik Vogt wrote:
> What do we do now with the two patches?  At the moment, the
> functional patch depends on the changes in the cleanup patch, so
> it cannot be applied on its own.  Options:
>
> (with the requested cleanup in the functional patch)
>
>  1) Apply both patches as they are now and do further cleanup on
>     top of it.
>  2) Rewrite the functional patch so that it applies without the
>     cleanup patch and commit it now.
>  3) Look into the suggested cleanup now and adapt the functional
>     patch to it when its ready.
>
> Actually I'd prefer (1) or (2) to just get the functional patch
> off my desk.  I agree that the cleanup is very useful, but there's
> not relation between the cleanup and the functional stuff except
> that they touch the same code.  Having the functional patch
> applied would simplify further work for me.
I thought Eric had ack'd the cleanup patch with a comment fix, so that 
can move forward and presumably unblock your functional patch.  Right?

So I think the TODO here is for me to fix the comment per Eric's review 
so that you can move forward.  The trick is getting it done before I go 
on PTO at the end of this week :-)

Jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Drop excess size used for run time allocated stack variables.
  2016-06-21 22:26           ` Jeff Law
@ 2016-06-22  8:57             ` Dominik Vogt
  0 siblings, 0 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-06-22  8:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, Andreas Krebbel, Richard Henderson

On Tue, Jun 21, 2016 at 04:26:03PM -0600, Jeff Law wrote:
> On 06/21/2016 03:35 AM, Dominik Vogt wrote:
> >What do we do now with the two patches?  At the moment, the
> >functional patch depends on the changes in the cleanup patch, so
> >it cannot be applied on its own.  Options:
> >
> >(with the requested cleanup in the functional patch)
> >
> > 1) Apply both patches as they are now and do further cleanup on
> >    top of it.
> > 2) Rewrite the functional patch so that it applies without the
> >    cleanup patch and commit it now.
> > 3) Look into the suggested cleanup now and adapt the functional
> >    patch to it when its ready.
> >
> >Actually I'd prefer (1) or (2) to just get the functional patch
> >off my desk.  I agree that the cleanup is very useful, but there's
> >not relation between the cleanup and the functional stuff except
> >that they touch the same code.  Having the functional patch
> >applied would simplify further work for me.
> I thought Eric had ack'd the cleanup patch with a comment fix, so
> that can move forward and presumably unblock your functional patch.
> Right?
> 
> So I think the TODO here is for me to fix the comment per Eric's
> review so that you can move forward.  The trick is getting it done
> before I go on PTO at the end of this week :-)

The comment fix is part of the version of the cleanup patch I
posted, but I've removed some more dead code.  I can handle all of
this if I know what to do exactly.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-25 14:02     ` [PATCH 1/2][v3] " Dominik Vogt
  2016-05-25 14:31       ` [PATCH 2/2][v3] " Dominik Vogt
  2016-06-08 11:20       ` [PATCH 1/2][v3] " Bernd Schmidt
@ 2016-06-22 20:34       ` Jeff Law
  2016-07-04 14:22       ` Andreas Krebbel
  3 siblings, 0 replies; 35+ messages in thread
From: Jeff Law @ 2016-06-22 20:34 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel

On 05/25/2016 07:30 AM, Dominik Vogt wrote:
> On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
>> > Version two of the patch including a test case.
>> >
>> > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
>>> > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
>>>> > > >The attached patch removes excess stack space allocation with
>>>> > > >alloca in some situations.  Plese check the commit message in the
>>>> > > >patch for details.
>> >
>>> > > However, I would strongly recommend some tests, even if they are
>>> > > target specific.  You can always copy pr36728-1 into the s390x
>>> > > directory and look at size of the generated stack.  Simliarly for
>>> > > pr50938 for x86.
>> >
>> > However, x86 uses the "else" branch in round_push, i.e. it uses
>> > "virtual_preferred_stack_boundary_rtx" to calculate the number of
>> > bytes to add for stack alignment.  That value is unknown at the
>> > time round_push is called, so the test case fails on such targets,
>> > and I've no idea how to fix this properly.
> Third version of the patch with the suggested cleanup in the first
> patch and the functional stuff in the second one.  The first patch
> is based on Jeff's draft with the change suggested by Eric and
> more cleanup added by me.
>
> Tested and bootstrapped on s390x biarch (but did not look for
> performance regressions as the change should be a no-op).
>
> Ciao
>
> Dominik ^_^  ^_^
>
> -- Dominik Vogt IBM Germany
>
>
> 0001-ChangeLog
>
>
> gcc/ChangeLog0
>
> 	* explow.c (allocate_dynamic_stack_space): Simplify knowing that
> 	MUST_ALIGN was always true and extra_align ist always BITS_PER_UNIT.
I think this meets the spirit of Eric's request to save the comment.  So 
OK for the trunk.  I realize this is kind of self-approving since the 
original cleanup was mine, but Eric signaled he was OK with the cleanup 
as long as the comment was saved.

jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-25 14:31       ` [PATCH 2/2][v3] " Dominik Vogt
  2016-05-25 14:51         ` Dominik Vogt
  2016-06-08 11:21         ` Bernd Schmidt
@ 2016-06-23  4:24         ` Jeff Law
  2016-06-23  9:57           ` Dominik Vogt
  2 siblings, 1 reply; 35+ messages in thread
From: Jeff Law @ 2016-06-23  4:24 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel

On 05/25/2016 07:32 AM, Dominik Vogt wrote:
> On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:
>> > On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
>>> > > Version two of the patch including a test case.
>>> > >
>>> > > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
>>>> > > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
>>>>> > > > >The attached patch removes excess stack space allocation with
>>>>> > > > >alloca in some situations.  Plese check the commit message in the
>>>>> > > > >patch for details.
>>> > >
>>>> > > > However, I would strongly recommend some tests, even if they are
>>>> > > > target specific.  You can always copy pr36728-1 into the s390x
>>>> > > > directory and look at size of the generated stack.  Simliarly for
>>>> > > > pr50938 for x86.
>>> > >
>>> > > However, x86 uses the "else" branch in round_push, i.e. it uses
>>> > > "virtual_preferred_stack_boundary_rtx" to calculate the number of
>>> > > bytes to add for stack alignment.  That value is unknown at the
>>> > > time round_push is called, so the test case fails on such targets,
>>> > > and I've no idea how to fix this properly.
>> >
>> > Third version of the patch with the suggested cleanup in the first
>> > patch and the functional stuff in the second one.  The first patch
>> > is based on Jeff's draft with the change suggested by Eric and
>> > more cleanup added by me.
> This is the updated funtional patch.  Re-tested with limited
> effort, i.e. tested and bootstrapped on s390x biarch (but did not
> look for performance regressions compared to version 2 of the
> patch).
>
> Ciao
>
> Dominik ^_^  ^_^
>
> -- Dominik Vogt IBM Germany
>
>
> 0002-v3-ChangeLog
>
>
> gcc/ChangeLog
>
> 	* explow.c (round_push): Use know adjustment.
> 	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
> gcc/testsuite/ChangeLog
>
> 	* gcc.dg/pr50938.c: New test.
>
>
> 0002-v3-Drop-excess-size-used-for-run-time-allocated-stack-v.patch
>
>
> From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
> From: Dominik Vogt <vogt@linux.vnet.ibm.com>
> Date: Fri, 29 Apr 2016 08:36:59 +0100
> Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
>  variables.
>
> The present calculation sometimes led to more stack memory being used than
> necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
> allocated size:
>
>   size = plus_constant (Pmode, size, extra);
>   size = force_operand (size, NULL_RTX);
>
> Then round_push was called and added another (STACK_BOUNDARY - 1) before
> rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
> adding 14 before rounding down for "x" in the test case pr36728-1.c.
>
> round_push() now takes an argument to inform it about what has already been
> added to size.
> ---
>  gcc/explow.c                   | 45 +++++++++++++++++++++---------------
>  gcc/testsuite/gcc.dg/pr50938.c | 52 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 79 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr50938.c
>
> diff --git a/gcc/explow.c b/gcc/explow.c
> index 09a0330..85596e2 100644
> --- a/gcc/explow.c
> +++ b/gcc/explow.c
> @@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
>  }
>
>  /* Round the size of a block to be pushed up to the boundary required
> -   by this machine.  SIZE is the desired size, which need not be constant.  */
> +   by this machine.  SIZE is the desired size, which need not be constant.
> +   ALREADY_ADDED is the number of units that have already been added to SIZE
> +   for other alignment reasons.
> +*/
>
>  static rtx
> -round_push (rtx size)
> +round_push (rtx size, int already_added)
>  {
> -  rtx align_rtx, alignm1_rtx;
> +  rtx align_rtx, add_rtx;
>
>    if (!SUPPORTS_STACK_ALIGNMENT
>        || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
>      {
>        int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
> +      int add;
>
>        if (align == 1)
>  	return size;
>
> +      add = (align > already_added) ? align - already_added - 1 : 0;
> +
>        if (CONST_INT_P (size))
>  	{
> -	  HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
> +	  HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
>
>  	  if (INTVAL (size) != new_size)
>  	    size = GEN_INT (new_size);
So presumably the idea here is when the requested SIZE would require 
allocating additional space to first see if the necessary space is 
already available inside ALREADY_ADDED and use that rather than rounding 
size up to an alignment boundary.

I can see how that works in the sense of avoiding allocating extra 
space.  What I'm struggling with is how do we know the space actually 
allocated is going to have the right alignment?

What am I missing here?

jeff



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-06-23  4:24         ` Jeff Law
@ 2016-06-23  9:57           ` Dominik Vogt
  2016-07-21 20:07             ` Jeff Law
  0 siblings, 1 reply; 35+ messages in thread
From: Dominik Vogt @ 2016-06-23  9:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Andreas Krebbel

On Wed, Jun 22, 2016 at 10:24:02PM -0600, Jeff Law wrote:
> On 05/25/2016 07:32 AM, Dominik Vogt wrote:
> >On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:
> >>> On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
> >>>> > Version two of the patch including a test case.
> >>>> >
> >>>> > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> >>>>> > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> >>>>>> > > >The attached patch removes excess stack space allocation with
> >>>>>> > > >alloca in some situations.  Plese check the commit message in the
> >>>>>> > > >patch for details.
> >>>> >
> >>>>> > > However, I would strongly recommend some tests, even if they are
> >>>>> > > target specific.  You can always copy pr36728-1 into the s390x
> >>>>> > > directory and look at size of the generated stack.  Simliarly for
> >>>>> > > pr50938 for x86.
> >>>> >
> >>>> > However, x86 uses the "else" branch in round_push, i.e. it uses
> >>>> > "virtual_preferred_stack_boundary_rtx" to calculate the number of
> >>>> > bytes to add for stack alignment.  That value is unknown at the
> >>>> > time round_push is called, so the test case fails on such targets,
> >>>> > and I've no idea how to fix this properly.
> >>>
> >>> Third version of the patch with the suggested cleanup in the first
> >>> patch and the functional stuff in the second one.  The first patch
> >>> is based on Jeff's draft with the change suggested by Eric and
> >>> more cleanup added by me.
> >This is the updated funtional patch.  Re-tested with limited
> >effort, i.e. tested and bootstrapped on s390x biarch (but did not
> >look for performance regressions compared to version 2 of the
> >patch).
> >
> >Ciao
> >
> >Dominik ^_^  ^_^
> >
> >-- Dominik Vogt IBM Germany
> >
> >
> >0002-v3-ChangeLog
> >
> >
> >gcc/ChangeLog
> >
> >	* explow.c (round_push): Use know adjustment.
> >	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
> >gcc/testsuite/ChangeLog
> >
> >	* gcc.dg/pr50938.c: New test.
> >
> >
> >0002-v3-Drop-excess-size-used-for-run-time-allocated-stack-v.patch
> >
> >
> >From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
> >From: Dominik Vogt <vogt@linux.vnet.ibm.com>
> >Date: Fri, 29 Apr 2016 08:36:59 +0100
> >Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
> > variables.
> >
> >The present calculation sometimes led to more stack memory being used than
> >necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
> >allocated size:
> >
> >  size = plus_constant (Pmode, size, extra);
> >  size = force_operand (size, NULL_RTX);
> >
> >Then round_push was called and added another (STACK_BOUNDARY - 1) before
> >rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
> >adding 14 before rounding down for "x" in the test case pr36728-1.c.
> >
> >round_push() now takes an argument to inform it about what has already been
> >added to size.
> >---
> > gcc/explow.c                   | 45 +++++++++++++++++++++---------------
> > gcc/testsuite/gcc.dg/pr50938.c | 52 ++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 79 insertions(+), 18 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.dg/pr50938.c
> >
> >diff --git a/gcc/explow.c b/gcc/explow.c
> >index 09a0330..85596e2 100644
> >--- a/gcc/explow.c
> >+++ b/gcc/explow.c
> >@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
> > }
> >
> > /* Round the size of a block to be pushed up to the boundary required
> >-   by this machine.  SIZE is the desired size, which need not be constant.  */
> >+   by this machine.  SIZE is the desired size, which need not be constant.
> >+   ALREADY_ADDED is the number of units that have already been added to SIZE
> >+   for other alignment reasons.
> >+*/
> >
> > static rtx
> >-round_push (rtx size)
> >+round_push (rtx size, int already_added)
> > {
> >-  rtx align_rtx, alignm1_rtx;
> >+  rtx align_rtx, add_rtx;
> >
> >   if (!SUPPORTS_STACK_ALIGNMENT
> >       || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
> >     {
> >       int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
> >+      int add;
> >
> >       if (align == 1)
> > 	return size;
> >
> >+      add = (align > already_added) ? align - already_added - 1 : 0;
> >+
> >       if (CONST_INT_P (size))
> > 	{
> >-	  HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
> >+	  HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
> >
> > 	  if (INTVAL (size) != new_size)
> > 	    size = GEN_INT (new_size);
> So presumably the idea here is when the requested SIZE would require
> allocating additional space to first see if the necessary space is
> already available inside ALREADY_ADDED

Yes.

> and use that rather than rounding size up to an alignment boundary.

Not exactly.  Consider the unpatched code.  At the beginning we
have some amount of space to be allocated on the stack at runtime
("SSIZE"), some requested alignment for it ("SALIGN").

get_dynamic_stack_size() first calculates the space needed for run
time alignment:

  SIZE = SSIZE + SALIGN - 1

Then it calls round_push() to add *another* chunk of memory to the
allocation size to be able to align it to the required stack slot
alignment ("SLOTALIGN") at run time.

  SIZE = SIZE + SLOTALIGN - 1
       = SSIZE + (SALIGN - 1) + (SLOTALIGN - 1)

Now it has added two chunks of memory but alignment is only done
once.  With the patch it just adds the maximum of (SALIGN - 1) and
(SLOTALIGN - 1), not both.  Thinking about it, the "round_push"
stuff is a very complicated way of saying "add max(A, B)".

I'd volunteer to clean this up more, but preferrably when the two
pending patches are in.  The current code is a real brain-twister.

> I can see how that works in the sense of avoiding allocating extra
> space.  What I'm struggling with is how do we know the space
> actually allocated is going to have the right alignment?

It doesn't.  That is what the extra space is needed for, i.e. the
data is placed in the (larger) allocated space at an address with
proper alignment, at run time.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.
  2016-05-25 14:02     ` [PATCH 1/2][v3] " Dominik Vogt
                         ` (2 preceding siblings ...)
  2016-06-22 20:34       ` Jeff Law
@ 2016-07-04 14:22       ` Andreas Krebbel
  3 siblings, 0 replies; 35+ messages in thread
From: Andreas Krebbel @ 2016-07-04 14:22 UTC (permalink / raw)
  To: Dominik Vogt; +Cc: gcc-patches

> gcc/ChangeLog0
> 
> 	* explow.c (allocate_dynamic_stack_space): Simplify knowing that
> 	MUST_ALIGN was always true and extra_align ist always BITS_PER_UNIT.

Applied. Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-06-23  9:57           ` Dominik Vogt
@ 2016-07-21 20:07             ` Jeff Law
  2016-07-22 12:02               ` Dominik Vogt
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Law @ 2016-07-21 20:07 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel

On 06/23/2016 03:57 AM, Dominik Vogt wrote:
>>> 0002-v3-ChangeLog
>>>
>>>
>>> gcc/ChangeLog
>>>
>>> 	* explow.c (round_push): Use know adjustment.
>>> 	(allocate_dynamic_stack_space): Pass known adjustment to round_push.
>>> gcc/testsuite/ChangeLog
>>>
>>> 	* gcc.dg/pr50938.c: New test.
>>>
>>>
>>> 0002-v3-Drop-excess-size-used-for-run-time-allocated-stack-v.patch
>>>
>>>
>> >From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
>>> From: Dominik Vogt <vogt@linux.vnet.ibm.com>
>>> Date: Fri, 29 Apr 2016 08:36:59 +0100
>>> Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
>>> variables.
>>>
>>> The present calculation sometimes led to more stack memory being used than
>>> necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
>>> allocated size:
>>>
>>>  size = plus_constant (Pmode, size, extra);
>>>  size = force_operand (size, NULL_RTX);
>>>
>>> Then round_push was called and added another (STACK_BOUNDARY - 1) before
>>> rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
>>> adding 14 before rounding down for "x" in the test case pr36728-1.c.
>>>
>>> round_push() now takes an argument to inform it about what has already been
>>> added to size.
>>> ---
>>> gcc/explow.c                   | 45 +++++++++++++++++++++---------------
>>> gcc/testsuite/gcc.dg/pr50938.c | 52 ++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 79 insertions(+), 18 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.dg/pr50938.c
>>>
>>> diff --git a/gcc/explow.c b/gcc/explow.c
>>> index 09a0330..85596e2 100644
>>> --- a/gcc/explow.c
>>> +++ b/gcc/explow.c
>>> @@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
>>> }
>>>
>>> /* Round the size of a block to be pushed up to the boundary required
>>> -   by this machine.  SIZE is the desired size, which need not be constant.  */
>>> +   by this machine.  SIZE is the desired size, which need not be constant.
>>> +   ALREADY_ADDED is the number of units that have already been added to SIZE
>>> +   for other alignment reasons.
>>> +*/
>>>
>>> static rtx
>>> -round_push (rtx size)
>>> +round_push (rtx size, int already_added)
>>> {
>>> -  rtx align_rtx, alignm1_rtx;
>>> +  rtx align_rtx, add_rtx;
>>>
>>>   if (!SUPPORTS_STACK_ALIGNMENT
>>>       || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
>>>     {
>>>       int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
>>> +      int add;
>>>
>>>       if (align == 1)
>>> 	return size;
>>>
>>> +      add = (align > already_added) ? align - already_added - 1 : 0;
>>> +
>>>       if (CONST_INT_P (size))
>>> 	{
>>> -	  HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
>>> +	  HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
>>>
>>> 	  if (INTVAL (size) != new_size)
>>> 	    size = GEN_INT (new_size);
>> So presumably the idea here is when the requested SIZE would require
>> allocating additional space to first see if the necessary space is
>> already available inside ALREADY_ADDED
>
> Yes.
>
>> and use that rather than rounding size up to an alignment boundary.
>
> Not exactly.  Consider the unpatched code.  At the beginning we
> have some amount of space to be allocated on the stack at runtime
> ("SSIZE"), some requested alignment for it ("SALIGN").
>
> get_dynamic_stack_size() first calculates the space needed for run
> time alignment:
>
>   SIZE = SSIZE + SALIGN - 1
>
> Then it calls round_push() to add *another* chunk of memory to the
> allocation size to be able to align it to the required stack slot
> alignment ("SLOTALIGN") at run time.
>
>   SIZE = SIZE + SLOTALIGN - 1
>        = SSIZE + (SALIGN - 1) + (SLOTALIGN - 1)
>
> Now it has added two chunks of memory but alignment is only done
> once.  With the patch it just adds the maximum of (SALIGN - 1) and
> (SLOTALIGN - 1), not both.  Thinking about it, the "round_push"
> stuff is a very complicated way of saying "add max(A, B)".
Now I see it.  Thanks, that helped a ton.

>
> I'd volunteer to clean this up more, but preferrably when the two
> pending patches are in.  The current code is a real brain-twister.
I'd be all for such cleanups after we wrap up the pending patches.  It's 
certainly a rats nest of code right now.

This patch is fine for the trunk.  Thanks for your patience.

jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
  2016-07-21 20:07             ` Jeff Law
@ 2016-07-22 12:02               ` Dominik Vogt
  2016-07-26 15:53                 ` [PATCH 2/2][v4] " Dominik Vogt
  0 siblings, 1 reply; 35+ messages in thread
From: Dominik Vogt @ 2016-07-22 12:02 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Andreas Krebbel

On Thu, Jul 21, 2016 at 02:07:05PM -0600, Jeff Law wrote:
> On 06/23/2016 03:57 AM, Dominik Vogt wrote:
> >>and use that rather than rounding size up to an alignment boundary.
> >
> >Not exactly.  Consider the unpatched code.  At the beginning we
> >have some amount of space to be allocated on the stack at runtime
> >("SSIZE"), some requested alignment for it ("SALIGN").
> >
> >get_dynamic_stack_size() first calculates the space needed for run
> >time alignment:
> >
> >  SIZE = SSIZE + SALIGN - 1
> >
> >Then it calls round_push() to add *another* chunk of memory to the
> >allocation size to be able to align it to the required stack slot
> >alignment ("SLOTALIGN") at run time.
> >
> >  SIZE = SIZE + SLOTALIGN - 1
> >       = SSIZE + (SALIGN - 1) + (SLOTALIGN - 1)
> >
> >Now it has added two chunks of memory but alignment is only done
> >once.  With the patch it just adds the maximum of (SALIGN - 1) and
> >(SLOTALIGN - 1), not both.  Thinking about it, the "round_push"
> >stuff is a very complicated way of saying "add max(A, B)".
> Now I see it.  Thanks, that helped a ton.
> 
> >
> >I'd volunteer to clean this up more, but preferrably when the two
> >pending patches are in.  The current code is a real brain-twister.
> I'd be all for such cleanups after we wrap up the pending patches.
> It's certainly a rats nest of code right now.
> 
> This patch is fine for the trunk.  Thanks for your patience.

Actually I was goind to abandon the patch in its current state.
:-)  We talked about it internally and concluded that the problem
is really this:

 * get_dynamic_stack_size is passed a SIZE of a data block (which
   is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the
   alignment of the underlying memory units (e.g. 32 bytes split
   into 4 times 8 bytes = 64 bit alignment) and the
   REQUIRED_ALIGN of the data portion of the allocated memory.
 * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8
   and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9.
   This is what is needed to have two bytes 8-byte-aligned at some
   memory location without any known alignment.
 * Finally round_push is called to round up SIZE to a multiple of
   the stack slot size.

The key to understanding this is that the function assumes that
STACK_DYNMAIC_OFFSET is completely unknown at the time its called
and therefore it does not make assumptions about the alignment of
STACKPOINTER + STACK_DYNMAIC_OFFSET.  The latest patch simply
hard-codes that SP + SDO is supposed to be aligned to at least
stack slot size (and does that in a very complicated way).  Since
there is no guarantee that this is the case on all targets, the
patch is broken.  It may miscalculate a SIZE that is too small in
some cases.

However, on many targets there is some guarantee about the
alignment of SP + SDO even if the actual value of SDO is unknown.
On s390x it's always 8-byte-aligned (stack slot size).  So the
right fix should be to add knowledge about the target's guaranteed
alignment of SP + SDO to the function.  I'm right now testing a
much simpler patch that uses
REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the
alignment.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v4] Drop excess size used for run time allocated stack variables.
  2016-07-22 12:02               ` Dominik Vogt
@ 2016-07-26 15:53                 ` Dominik Vogt
  2016-08-18 16:20                   ` Jeff Law
  2016-08-23  9:23                   ` Andreas Krebbel
  0 siblings, 2 replies; 35+ messages in thread
From: Dominik Vogt @ 2016-07-26 15:53 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 2611 bytes --]

Finally a patch that works and is simple.  Bootstrapped and
regression tested on s390, s390x biarch and x86_64.  The new patch
exploits the known alignment of (stack pointer +
STACK_DYNAMIC_OFFSET) as described earlier (see below).  I think
that is the right way to get rid of the extra allocation.  It
took a long time to understand the problem.

As the patch triggers a bug in the fortran compiler, the
der_type.f90 test case may fail on some targets if this patch is
used without the fortran fix that I've posted in another thread.

(The patch also contains a fix for a typo in a comment in the
patched function.)

See ChangeLog for a full description of the new patch.

Since the patch is all new, we're not going to commit it without a
new OK.

> Actually I was goind to abandon the patch in its current state.
> :-)  We talked about it internally and concluded that the problem
> is really this:
> 
>  * get_dynamic_stack_size is passed a SIZE of a data block (which
>    is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the
>    alignment of the underlying memory units (e.g. 32 bytes split
>    into 4 times 8 bytes = 64 bit alignment) and the
>    REQUIRED_ALIGN of the data portion of the allocated memory.
>  * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8
>    and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9.
>    This is what is needed to have two bytes 8-byte-aligned at some
>    memory location without any known alignment.
>  * Finally round_push is called to round up SIZE to a multiple of
>    the stack slot size.
> 
> The key to understanding this is that the function assumes that
> STACK_DYNMAIC_OFFSET is completely unknown at the time its called
> and therefore it does not make assumptions about the alignment of
> STACKPOINTER + STACK_DYNMAIC_OFFSET.  The latest patch simply
> hard-codes that SP + SDO is supposed to be aligned to at least
> stack slot size (and does that in a very complicated way).  Since
> there is no guarantee that this is the case on all targets, the
> patch is broken.  It may miscalculate a SIZE that is too small in
> some cases.
> 
> However, on many targets there is some guarantee about the
> alignment of SP + SDO even if the actual value of SDO is unknown.
> On s390x it's always 8-byte-aligned (stack slot size).  So the
> right fix should be to add knowledge about the target's guaranteed
> alignment of SP + SDO to the function.  I'm right now testing a
> much simpler patch that uses
> REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the
> alignment.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: 0001-v4-ChangeLog --]
[-- Type: text/plain, Size: 193 bytes --]

gcc/ChangeLog

	* explow.c (get_dynamic_stack_size): Take known alignment of stack
	pointer + STACK_DYNAMIC_OFFSET into account when calculating the size
	needed.
	Correct a typo in a comment.

[-- Attachment #3: 0001-v4-Reduce-size-allocated-for-run-time-allocated-stack-v.patch --]
[-- Type: text/plain, Size: 2398 bytes --]

From 7d7a6b7fdb189759eb11b96c93ee4adfa8608a97 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH] Reduce size allocated for run time allocated stack
 variables.

The present calculation sometimes led to more stack memory being used than
necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
allocated size:

  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);

Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.

The problem was that get_dynamic_stack_size did not take into account that the
target might guarantee some alignment of (stack_pointer + STACK_DYNAMIC_OFFSET)
even if the value of STACK_DYNAMIC_OFFSET is not known yet.
---
 gcc/explow.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/explow.c b/gcc/explow.c
index a345690..f97a214 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1224,9 +1224,15 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align,
      example), so we must preventively align the value.  We leave space
      in SIZE for the hole that might result from the alignment operation.  */
 
-  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
-  size = plus_constant (Pmode, size, extra);
-  size = force_operand (size, NULL_RTX);
+  unsigned known_align = REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM);
+  if (known_align == 0)
+    known_align = BITS_PER_UNIT;
+  if (required_align > known_align)
+    {
+      extra = (required_align - known_align) / BITS_PER_UNIT;
+      size = plus_constant (Pmode, size, extra);
+      size = force_operand (size, NULL_RTX);
+    }
 
   if (flag_stack_usage_info && pstack_usage_size)
     *pstack_usage_size += extra;
@@ -1235,7 +1241,7 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align,
     size_align = BITS_PER_UNIT;
 
   /* Round the size to a multiple of the required stack alignment.
-     Since the stack if presumed to be rounded before this allocation,
+     Since the stack is presumed to be rounded before this allocation,
      this will maintain the required alignment.
 
      If the stack grows downward, we could save an insn by subtracting
-- 
2.3.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v4] Drop excess size used for run time allocated stack variables.
  2016-07-26 15:53                 ` [PATCH 2/2][v4] " Dominik Vogt
@ 2016-08-18 16:20                   ` Jeff Law
  2016-08-23  9:23                   ` Andreas Krebbel
  1 sibling, 0 replies; 35+ messages in thread
From: Jeff Law @ 2016-08-18 16:20 UTC (permalink / raw)
  To: vogt, gcc-patches, Andreas Krebbel

On 07/26/2016 09:53 AM, Dominik Vogt wrote:
> Finally a patch that works and is simple.  Bootstrapped and
> regression tested on s390, s390x biarch and x86_64.  The new patch
> exploits the known alignment of (stack pointer +
> STACK_DYNAMIC_OFFSET) as described earlier (see below).  I think
> that is the right way to get rid of the extra allocation.  It
> took a long time to understand the problem.
>
> As the patch triggers a bug in the fortran compiler, the
> der_type.f90 test case may fail on some targets if this patch is
> used without the fortran fix that I've posted in another thread.
>
> (The patch also contains a fix for a typo in a comment in the
> patched function.)
>
> See ChangeLog for a full description of the new patch.
>
> Since the patch is all new, we're not going to commit it without a
> new OK.
I like this one much better :-)

OK.

Thanks for your patience,
JEff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2][v4] Drop excess size used for run time allocated stack variables.
  2016-07-26 15:53                 ` [PATCH 2/2][v4] " Dominik Vogt
  2016-08-18 16:20                   ` Jeff Law
@ 2016-08-23  9:23                   ` Andreas Krebbel
  1 sibling, 0 replies; 35+ messages in thread
From: Andreas Krebbel @ 2016-08-23  9:23 UTC (permalink / raw)
  To: Dominik Vogt; +Cc: gcc-patches

> gcc/ChangeLog
> 
> 	* explow.c (get_dynamic_stack_size): Take known alignment of stack
> 	pointer + STACK_DYNAMIC_OFFSET into account when calculating the size
> 	needed.
> 	Correct a typo in a comment.

Applied.  Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-08-23  9:23 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-29 22:13 [PATCH] Drop excess size used for run time allocated stack variables Dominik Vogt
2016-04-29 22:15 ` Dominik Vogt
2016-04-30  9:44 ` Eric Botcazou
2016-04-30 10:14   ` Dominik Vogt
2016-05-02 13:43 ` Dominik Vogt
2016-05-02 15:10 ` Jeff Law
2016-05-03 14:18   ` Dominik Vogt
2016-05-19 23:11     ` Jeff Law
2016-05-20 21:24       ` [RFA] Minor cleanup to allocate_dynamic_stack_space Jeff Law
2016-05-20 21:44         ` Eric Botcazou
2016-05-20 21:48           ` Jeff Law
2016-05-23  7:53             ` Eric Botcazou
2016-05-23 10:12         ` Dominik Vogt
2016-05-25 12:46         ` Dominik Vogt
2016-06-09 12:00       ` [PATCH] Drop excess size used for run time allocated stack variables Bernd Schmidt
2016-06-21  9:35         ` Dominik Vogt
2016-06-21 22:26           ` Jeff Law
2016-06-22  8:57             ` Dominik Vogt
2016-05-25 14:02     ` [PATCH 1/2][v3] " Dominik Vogt
2016-05-25 14:31       ` [PATCH 2/2][v3] " Dominik Vogt
2016-05-25 14:51         ` Dominik Vogt
2016-06-08 11:21         ` Bernd Schmidt
2016-06-20 12:20           ` Dominik Vogt
2016-06-20 12:20             ` Bernd Schmidt
2016-06-23  4:24         ` Jeff Law
2016-06-23  9:57           ` Dominik Vogt
2016-07-21 20:07             ` Jeff Law
2016-07-22 12:02               ` Dominik Vogt
2016-07-26 15:53                 ` [PATCH 2/2][v4] " Dominik Vogt
2016-08-18 16:20                   ` Jeff Law
2016-08-23  9:23                   ` Andreas Krebbel
2016-06-08 11:20       ` [PATCH 1/2][v3] " Bernd Schmidt
2016-06-08 12:20         ` Eric Botcazou
2016-06-22 20:34       ` Jeff Law
2016-07-04 14:22       ` Andreas Krebbel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).