public inbox for gcc-patches@gcc.gnu.org
* [PATCH, PR50527] Don't assume alignment of vla-related allocas.
@ 2011-09-28 10:56 Tom de Vries
  2011-09-28 11:35 ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2011-09-28 10:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

[-- Attachment #1: Type: text/plain, Size: 430 bytes --]

Richard,

I got a patch for PR50527.

The patch prevents the alignment of vla-related allocas from being set to
BIGGEST_ALIGNMENT in ccp. The alignment may turn out to be smaller after the
alloca is folded.
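For illustration, here is a hedged sketch (hypothetical code, not taken from the
PR) of the pattern at issue: a VLA is lowered to an alloca call, and once the
size is a known small constant the alloca can be folded to an ordinary
fixed-size char array, whose alignment may be much smaller than
BIGGEST_ALIGNMENT:

```c
#include <stdint.h>

/* Returns the low bits of P under MASK; a zero result means P is
   aligned to (MASK + 1) bytes.  */
static int
low_bits (const char *p, uintptr_t mask)
{
  return (int) ((uintptr_t) p & mask);
}

/* A VLA such as this is lowered to p = __builtin_alloca (n).  If ccp
   assumes the result is BIGGEST_ALIGNMENT-aligned, but the alloca is
   later folded to a plain char[4] (alignment as small as 1), the
   earlier assumption can be wrong.  */
int
vla_low_bits (int n, uintptr_t mask)
{
  char vla[n];
  return low_bits (vla, mask);
}
```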

Bootstrapped and regtested on x86_64.

OK for trunk?

Thanks,
- Tom

2011-09-27  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related
	allocas.

	* gcc.dg/pr50527.c: New test.

[-- Attachment #2: pr50527.patch --]
[-- Type: text/x-patch, Size: 1660 bytes --]

Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -1632,6 +1632,8 @@ evaluate_stmt (gimple stmt)
 	      break;
 
 	    case BUILT_IN_ALLOCA:
+	      if (gimple_call_alloca_for_var_p (stmt))
+		break;
 	      val.lattice_val = CONSTANT;
 	      val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	      val.mask = shwi_to_double_int
Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline)) 
+bar (char *a)
+{
+}
+
+void __attribute__((noinline)) 
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-09-28 10:56 [PATCH, PR50527] Don't assume alignment of vla-related allocas Tom de Vries
@ 2011-09-28 11:35 ` Richard Guenther
  2011-09-29 15:11   ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2011-09-28 11:35 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Richard,
>
> I got a patch for PR50527.
>
> The patch prevents the alignment of vla-related allocas from being set to
> BIGGEST_ALIGNMENT in ccp. The alignment may turn out to be smaller after the
> alloca is folded.
>
> Bootstrapped and regtested on x86_64.
>
> OK for trunk?

Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
the vectorizer then will no longer see that the arrays are properly aligned.

I'm not sure what the best thing to do is here, other than trying to record
the alignment requirement of the VLA somewhere.

Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
has the issue that it will force stack-realignment which isn't free (and the
point was to make the decl cheaper than the alloca).  But that might
possibly be the better choice.

Any other thoughts?

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2011-09-27  Tom de Vries  <tom@codesourcery.com>
>
>        * tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related
>        allocas.
>
>        * gcc.dg/pr50527.c: New test.
>


* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-09-28 11:35 ` Richard Guenther
@ 2011-09-29 15:11   ` Tom de Vries
  2011-09-30 14:35     ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2011-09-29 15:11 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

[-- Attachment #1: Type: text/plain, Size: 1953 bytes --]

On 09/28/2011 11:53 AM, Richard Guenther wrote:
> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> Richard,
>>
>> I got a patch for PR50527.
>>
>> The patch prevents the alignment of vla-related allocas from being set to
>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out to be smaller after the
>> alloca is folded.
>>
>> Bootstrapped and regtested on x86_64.
>>
>> OK for trunk?
> 
> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
> the vectorizer then will no longer see that the arrays are properly aligned.
> 
> I'm not sure what the best thing to do is here, other than trying to record
> the alignment requirement of the VLA somewhere.
> 
> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
> has the issue that it will force stack-realignment which isn't free (and the
> point was to make the decl cheaper than the alloca).  But that might
> possibly be the better choice.
> 
> Any other thoughts?

How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
for the new array prevents stack realignment for folded vla-allocas, even for
large vlas.

This will not help in vectorizing large folded vla-allocas, but I think it's not
reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
been the case up until we started to fold). If you want to trigger vectorization
for a vla, you can still use the aligned attribute on the declaration.
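For reference, the aligned attribute can be put directly on the vla declaration
(hypothetical user code; this assumes GCC honors the attribute on VLAs by
raising DECL_ALIGN, which the patch then carries over to the replacement array):

```c
#include <stdint.h>

/* Returns nonzero iff the VLA's storage is 16-byte aligned.  The
   aligned attribute raises the alignment of the VLA's storage, so the
   vectorizer can rely on it even after the alloca is folded.  */
int
vla_is_16_aligned (int n)
{
  char vla[n] __attribute__ ((aligned (16)));
  return ((uintptr_t) vla & 15) == 0;
}
```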

Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, even without an
attribute on the decl. This patch exploits that by setting the alignment at the
end of the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective
for propagation though: although the ptr_info of the lhs is propagated via
copy_prop afterwards, it's no longer propagated via ccp.

Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
not fold during ccp3.

Thanks,
- Tom


[-- Attachment #2: pr50527.2.patch --]
[-- Type: text/x-patch, Size: 6987 bytes --]

Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h (revision 179043)
+++ gcc/tree-pass.h (working copy)
@@ -389,6 +389,7 @@ extern struct gimple_opt_pass pass_iv_op
 extern struct gimple_opt_pass pass_tree_loop_done;
 extern struct gimple_opt_pass pass_ch;
 extern struct gimple_opt_pass pass_ccp;
+extern struct gimple_opt_pass pass_ccp_last;
 extern struct gimple_opt_pass pass_phi_only_cprop;
 extern struct gimple_opt_pass pass_build_ssa;
 extern struct gimple_opt_pass pass_build_alias;
Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -167,8 +167,12 @@ typedef struct prop_value_d prop_value_t
    doing the store).  */
 static prop_value_t *const_val;
 
+/* Indicates whether we're in pass_ccp_last.  */
+static bool ccp_last;
+
 static void canonicalize_float_value (prop_value_t *);
 static bool ccp_fold_stmt (gimple_stmt_iterator *);
+static bool fold_builtin_alloca_for_var_p (gimple);
 
 /* Dump constant propagation value VAL to file OUTF prefixed by PREFIX.  */
 
@@ -813,6 +817,15 @@ ccp_finalize (void)
 	  || !POINTER_TYPE_P (TREE_TYPE (name)))
 	continue;
 
+      if (ccp_last && !SSA_NAME_IS_DEFAULT_DEF (name)
+	  && gimple_call_alloca_for_var_p (SSA_NAME_DEF_STMT (name))
+	  && !fold_builtin_alloca_for_var_p (SSA_NAME_DEF_STMT (name)))
+	{
+	  pi = get_ptr_info (name);
+	  pi->align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+	  continue;
+	}
+
       val = get_value (name);
       if (val->lattice_val != CONSTANT
 	  || TREE_CODE (val->value) != INTEGER_CST)
@@ -1492,6 +1505,7 @@ evaluate_stmt (gimple stmt)
   tree simplified = NULL_TREE;
   ccp_lattice_t likelyvalue = likely_value (stmt);
   bool is_constant = false;
+  unsigned int align;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1632,10 +1646,13 @@ evaluate_stmt (gimple stmt)
 	      break;
 
 	    case BUILT_IN_ALLOCA:
+	      align = (gimple_call_alloca_for_var_p (stmt)
+		       ? TYPE_ALIGN (TREE_TYPE (gimple_get_lhs (stmt)))
+		       : BIGGEST_ALIGNMENT);
 	      val.lattice_val = CONSTANT;
 	      val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	      val.mask = shwi_to_double_int
-		  	   (~(((HOST_WIDE_INT) BIGGEST_ALIGNMENT)
+		  	   (~(((HOST_WIDE_INT) align)
 			      / BITS_PER_UNIT - 1));
 	      break;
 
@@ -1685,27 +1702,25 @@ evaluate_stmt (gimple stmt)
   return val;
 }
 
-/* Detects a vla-related alloca with a constant argument.  Declares fixed-size
-   array and return the address, if found, otherwise returns NULL_TREE.  */
-
-static tree
-fold_builtin_alloca_for_var (gimple stmt)
+static bool
+fold_builtin_alloca_for_var_p (gimple stmt)
 {
-  unsigned HOST_WIDE_INT size, threshold, n_elem;
-  tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;
+  unsigned HOST_WIDE_INT size, threshold;
+  tree lhs, arg, block;
+
+  gcc_checking_assert (gimple_call_alloca_for_var_p (stmt));
 
   /* Get lhs.  */
   lhs = gimple_call_lhs (stmt);
   if (lhs == NULL_TREE)
-    return NULL_TREE;
+    return false;
 
   /* Detect constant argument.  */
   arg = get_constant_value (gimple_call_arg (stmt, 0));
   if (arg == NULL_TREE
       || TREE_CODE (arg) != INTEGER_CST
       || !host_integerp (arg, 1))
-    return NULL_TREE;
+    return false;
 
   size = TREE_INT_CST_LOW (arg);
 
@@ -1718,17 +1733,33 @@ fold_builtin_alloca_for_var (gimple stmt
         && TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))
     threshold /= 10;
   if (size > threshold)
-    return NULL_TREE;
+    return false;
+
+  return true;
+}
+
+/* Detects a vla-related alloca with a constant argument.  Declares a fixed-size
+   array and returns the address, if found, otherwise returns NULL_TREE.  */
+
+static tree
+fold_builtin_alloca_for_var (gimple stmt)
+{
+  unsigned HOST_WIDE_INT size, n_elem;
+  tree lhs, arg, var, elem_type, array_type;
+
+  if (!fold_builtin_alloca_for_var_p (stmt))
+    return NULL;
+
+  lhs = gimple_call_lhs (stmt);
+  arg = get_constant_value (gimple_call_arg (stmt, 0));
+  size = TREE_INT_CST_LOW (arg);
 
   /* Declare array.  */
   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
   n_elem = size * 8 / BITS_PER_UNIT;
-  align = MIN (size * 8, BIGGEST_ALIGNMENT);
-  if (align < BITS_PER_UNIT)
-    align = BITS_PER_UNIT;
   array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type, NULL);
-  DECL_ALIGN (var) = align;
+  DECL_ALIGN (var) = TYPE_ALIGN (TREE_TYPE (TREE_TYPE (SSA_NAME_VAR (lhs))));
   {
     struct ptr_info_def *pi = SSA_NAME_PTR_INFO (lhs);
     if (pi != NULL && !pi->pt.anything)
@@ -2030,6 +2061,19 @@ do_ssa_ccp (void)
     return 0;
 }
 
+/* Last invocation of SSA Conditional Constant Propagation.
+   At the end of the last invocation, we know which vla-allocas will remain
+   allocas, and we can assume BIGGEST_ALIGNMENT for those.  */
+
+static unsigned int
+do_ssa_ccp_last (void)
+{
+  unsigned int res;
+  ccp_last = true;
+  res = do_ssa_ccp ();
+  ccp_last = false;
+  return res;
+}
 
 static bool
 gate_ccp (void)
@@ -2058,6 +2102,25 @@ struct gimple_opt_pass pass_ccp =
  }
 };
 
+struct gimple_opt_pass pass_ccp_last =
+{
+ {
+  GIMPLE_PASS,
+  "ccp3",				/* name */
+  gate_ccp,				/* gate */
+  do_ssa_ccp_last,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_CCP,				/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_verify_ssa
+  | TODO_verify_stmts | TODO_ggc_collect/* todo_flags_finish */
+ }
+};
 
 
 /* Try to optimize out __builtin_stack_restore.  Optimize it out
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c (revision 179043)
+++ gcc/gimplify.c (working copy)
@@ -1321,7 +1321,8 @@ gimplify_vla_decl (tree decl, gimple_seq
      things: First, it lets the rest of the gimplifier know what
      replacement to use.  Second, it lets the debug info know
      where to find the value.  */
-  ptr_type = build_pointer_type (TREE_TYPE (decl));
+  ptr_type = build_pointer_type (build_aligned_type (TREE_TYPE (decl),
+						     DECL_ALIGN (decl)));
   addr = create_tmp_var (ptr_type, get_name (decl));
   DECL_IGNORED_P (addr) = 0;
   t = build_fold_indirect_ref (addr);
Index: gcc/passes.c
===================================================================
--- gcc/passes.c (revision 179043)
+++ gcc/passes.c (working copy)
@@ -1321,7 +1321,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_object_sizes);
-      NEXT_PASS (pass_ccp);
+      NEXT_PASS (pass_ccp_last);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);


* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-09-29 15:11   ` Tom de Vries
@ 2011-09-30 14:35     ` Richard Guenther
  2011-10-01 15:44       ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2011-09-30 14:35 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> Richard,
>>>
>>> I got a patch for PR50527.
>>>
>>> The patch prevents the alignment of vla-related allocas from being set to
>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out to be smaller after the
>>> alloca is folded.
>>>
>>> Bootstrapped and regtested on x86_64.
>>>
>>> OK for trunk?
>>
>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>> the vectorizer then will no longer see that the arrays are properly aligned.
>>
>> I'm not sure what the best thing to do is here, other than trying to record
>> the alignment requirement of the VLA somewhere.
>>
>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>> has the issue that it will force stack-realignment which isn't free (and the
>> point was to make the decl cheaper than the alloca).  But that might
>> possibly be the better choice.
>>
>> Any other thoughts?
>
> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
> for the new array prevents stack realignment for folded vla-allocas, also for
> large vlas.
>
> This will not help in vectorizing large folded vla-allocas, but I think it's not
> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
> been the case up until we started to fold). If you want to trigger vectorization
> for a vla, you can still use the aligned attribute on the declaration.
>
> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
> an attribute on the decl. This patch exploits this by setting it at the end of
> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
> propagation though, because although the ptr_info of the lhs is propagated via
> copy_prop afterwards, it's not propagated anymore via ccp.
>
> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
> not fold during ccp3.

Ugh, somehow I like this the least ;)

How about lowering VLAs to

  p = __builtin_alloca (...);
  p = __builtin_assume_aligned (p, DECL_ALIGN (vla));

and not assume anything for alloca itself if it feeds a
__builtin_assume_aligned?

Or rather introduce a __builtin_alloca_with_align () and for VLAs do

 p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));

that's less awkward to use?
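(As an aside, not part of the proposal itself: the effect of an alloca with an
explicit alignment can be emulated by over-allocating and rounding the pointer
up to the requested power-of-two boundary. A hedged sketch in plain C:)

```c
#include <stddef.h>
#include <stdint.h>

/* Rounds P up to the next multiple of ALIGN, which must be a power of
   two.  An alloca_with_align (size, align) could then be emulated as
   align_up (alloca (size + align - 1), align).  */
static void *
align_up (void *p, size_t align)
{
  uintptr_t u = (uintptr_t) p;
  return (void *) ((u + align - 1) & ~(uintptr_t) (align - 1));
}
```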

Sorry for not having a clear plan here ;)

Richard.

> Thanks,
> - Tom
>
>


* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-09-30 14:35     ` Richard Guenther
@ 2011-10-01 15:44       ` Tom de Vries
  2011-10-04  7:50         ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2011-10-01 15:44 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

[-- Attachment #1: Type: text/plain, Size: 3126 bytes --]

On 09/30/2011 03:29 PM, Richard Guenther wrote:
> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> Richard,
>>>>
>>>> I got a patch for PR50527.
>>>>
>>>> The patch prevents the alignment of vla-related allocas from being set to
>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out to be smaller after the
>>>> alloca is folded.
>>>>
>>>> Bootstrapped and regtested on x86_64.
>>>>
>>>> OK for trunk?
>>>
>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>
>>> I'm not sure what the best thing to do is here, other than trying to record
>>> the alignment requirement of the VLA somewhere.
>>>
>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>> has the issue that it will force stack-realignment which isn't free (and the
>>> point was to make the decl cheaper than the alloca).  But that might
>>> possibly be the better choice.
>>>
>>> Any other thoughts?
>>
>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>> for the new array prevents stack realignment for folded vla-allocas, also for
>> large vlas.
>>
>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>> been the case up until we started to fold). If you want to trigger vectorization
>> for a vla, you can still use the aligned attribute on the declaration.
>>
>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>> an attribute on the decl. This patch exploits this by setting it at the end of
>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>> propagation though, because although the ptr_info of the lhs is propagated via
>> copy_prop afterwards, it's not propagated anymore via ccp.
>>
>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>> not fold during ccp3.
> 
> Ugh, somehow I like this the least ;)
> 
> How about lowering VLAs to
> 
>   p = __builtin_alloca (...);
>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
> 
> and not assume anything for alloca itself if it feeds a
> __builtin_assume_aligned?
> 
> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
> 
>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
> 
> that's less awkward to use?
> 
> Sorry for not having a clear plan here ;)
> 

Using assume_aligned is a more orthogonal way to represent this in gimple, but
indeed harder to use.

Another possibility is to add a 'tree vla_decl' field to struct
gimple_statement_call, which is probably the easiest to implement.

But I think __builtin_alloca_with_align might have a use beyond vlas, so I
decided to try this one. The attached patch implements my first stab at it (now
testing on x86_64).

Is this an acceptable approach?

Thanks,
- Tom


> Richard.
> 


[-- Attachment #2: pr50527.6.patch --]
[-- Type: text/x-patch, Size: 19554 bytes --]

Index: gcc/tree.c
===================================================================
--- gcc/tree.c (revision 179210)
+++ gcc/tree.c (working copy)
@@ -9483,9 +9483,18 @@ build_common_builtin_nodes (void)
 			    "alloca", ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
     }
 
+  ftype = build_function_type_list (ptr_type_node, size_type_node,
+				    size_type_node, NULL_TREE);
+  local_define_builtin ("__builtin_alloca_with_align", ftype,
+			BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align",
+			ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
+
   /* If we're checking the stack, `alloca' can throw.  */
   if (flag_stack_check)
-    TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+    {
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN]) = 0;
+    }
 
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node,
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h (revision 179210)
+++ gcc/tree-pass.h (working copy)
@@ -389,6 +389,7 @@ extern struct gimple_opt_pass pass_iv_op
 extern struct gimple_opt_pass pass_tree_loop_done;
 extern struct gimple_opt_pass pass_ch;
 extern struct gimple_opt_pass pass_ccp;
+extern struct gimple_opt_pass pass_ccp_last;
 extern struct gimple_opt_pass pass_phi_only_cprop;
 extern struct gimple_opt_pass pass_build_ssa;
 extern struct gimple_opt_pass pass_build_alias;
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c (revision 179210)
+++ gcc/builtins.c (working copy)
@@ -5304,6 +5304,7 @@ expand_builtin (tree exp, rtx target, rt
       && !called_as_built_in (fndecl)
       && DECL_ASSEMBLER_NAME_SET_P (fndecl)
       && fcode != BUILT_IN_ALLOCA
+      && fcode != BUILT_IN_ALLOCA_WITH_ALIGN
       && fcode != BUILT_IN_FREE)
     return expand_call (exp, target, ignore);
 
@@ -5559,6 +5560,7 @@ expand_builtin (tree exp, rtx target, rt
 	return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);
 
     case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
       /* If the allocation stems from the declaration of a variable-sized
 	 object, it cannot accumulate.  */
       target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
@@ -13561,6 +13563,7 @@ is_inexpensive_builtin (tree decl)
       {
       case BUILT_IN_ABS:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
       case BUILT_IN_BSWAP32:
       case BUILT_IN_BSWAP64:
       case BUILT_IN_CLZ:
Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -167,8 +167,13 @@ typedef struct prop_value_d prop_value_t
    doing the store).  */
 static prop_value_t *const_val;
 
+/* Indicates whether we're in pass_ccp_last.  */
+static bool ccp_last;
+
 static void canonicalize_float_value (prop_value_t *);
 static bool ccp_fold_stmt (gimple_stmt_iterator *);
+static bool fold_builtin_alloca_with_align_p (gimple);
+static bool builtin_alloca_with_align_p (gimple);
 
 /* Dump constant propagation value VAL to file OUTF prefixed by PREFIX.  */
 
@@ -813,6 +818,18 @@ ccp_finalize (void)
 	  || !POINTER_TYPE_P (TREE_TYPE (name)))
 	continue;
 
+      if (ccp_last && !SSA_NAME_IS_DEFAULT_DEF (name))
+	{
+	  gimple def = SSA_NAME_DEF_STMT (name);
+	  if (builtin_alloca_with_align_p (def)
+	      && !fold_builtin_alloca_with_align_p (def))
+	    {
+	      pi = get_ptr_info (name);
+	      pi->align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+	      continue;
+	    }
+	}
+
       val = get_value (name);
       if (val->lattice_val != CONSTANT
 	  || TREE_CODE (val->value) != INTEGER_CST)
@@ -1492,6 +1509,7 @@ evaluate_stmt (gimple stmt)
   tree simplified = NULL_TREE;
   ccp_lattice_t likelyvalue = likely_value (stmt);
   bool is_constant = false;
+  unsigned int align;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1632,10 +1650,14 @@ evaluate_stmt (gimple stmt)
 	      break;
 
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
+	      align = (builtin_alloca_with_align_p (stmt)
+		       ? TREE_INT_CST_LOW (gimple_call_arg (stmt, 1))
+		       : BIGGEST_ALIGNMENT);
 	      val.lattice_val = CONSTANT;
 	      val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	      val.mask = shwi_to_double_int
-		  	   (~(((HOST_WIDE_INT) BIGGEST_ALIGNMENT)
+		  	   (~(((HOST_WIDE_INT) align)
 			      / BITS_PER_UNIT - 1));
 	      break;
 
@@ -1685,27 +1707,45 @@ evaluate_stmt (gimple stmt)
   return val;
 }
 
-/* Detects a vla-related alloca with a constant argument.  Declares fixed-size
-   array and return the address, if found, otherwise returns NULL_TREE.  */
+/* Return true if STMT is a call to __builtin_alloca_with_align.  */
 
-static tree
-fold_builtin_alloca_for_var (gimple stmt)
+static bool
+builtin_alloca_with_align_p (gimple stmt)
 {
-  unsigned HOST_WIDE_INT size, threshold, n_elem;
-  tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;
+  tree fndecl;
+  if (!is_gimple_call (stmt))
+    return false;
+
+  fndecl = gimple_call_fndecl (stmt);
+
+  return (fndecl != NULL_TREE
+	  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
+}
+
+/* Returns true if STMT is a call to __builtin_alloca_with_align that will be
+   folded.  */
+
+static bool
+fold_builtin_alloca_with_align_p (gimple stmt)
+{
+  unsigned HOST_WIDE_INT size, threshold;
+  tree lhs, arg, block;
+
+  if (!builtin_alloca_with_align_p (stmt)
+      || !gimple_call_alloca_for_var_p (stmt))
+    return false;
 
   /* Get lhs.  */
   lhs = gimple_call_lhs (stmt);
   if (lhs == NULL_TREE)
-    return NULL_TREE;
+    return false;
 
   /* Detect constant argument.  */
   arg = get_constant_value (gimple_call_arg (stmt, 0));
   if (arg == NULL_TREE
       || TREE_CODE (arg) != INTEGER_CST
       || !host_integerp (arg, 1))
-    return NULL_TREE;
+    return false;
 
   size = TREE_INT_CST_LOW (arg);
 
@@ -1718,28 +1758,49 @@ fold_builtin_alloca_for_var (gimple stmt
         && TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))
     threshold /= 10;
   if (size > threshold)
-    return NULL_TREE;
+    return false;
+
+  return true;
+}
+
+/* Detects a __builtin_alloca_with_align with a constant size argument.
+   Declares a fixed-size array and returns the address, if found, otherwise
+   returns NULL_TREE.  */
+
+static tree
+fold_builtin_alloca_with_align (gimple stmt)
+{
+  unsigned HOST_WIDE_INT size, n_elem;
+  tree lhs, arg, var, elem_type, array_type;
+  struct ptr_info_def *pi;
+
+  if (!fold_builtin_alloca_with_align_p (stmt))
+    return NULL;
+
+  lhs = gimple_call_lhs (stmt);
+  pi = SSA_NAME_PTR_INFO (lhs);
+  arg = get_constant_value (gimple_call_arg (stmt, 0));
+  size = TREE_INT_CST_LOW (arg);
 
   /* Declare array.  */
   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
   n_elem = size * 8 / BITS_PER_UNIT;
-  align = MIN (size * 8, BIGGEST_ALIGNMENT);
-  if (align < BITS_PER_UNIT)
-    align = BITS_PER_UNIT;
   array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type, NULL);
-  DECL_ALIGN (var) = align;
-  {
-    struct ptr_info_def *pi = SSA_NAME_PTR_INFO (lhs);
-    if (pi != NULL && !pi->pt.anything)
-      {
-	bool singleton_p;
-	unsigned uid;
-	singleton_p = pt_solution_singleton_p (&pi->pt, &uid);
-	gcc_assert (singleton_p);
-	SET_DECL_PT_UID (var, uid);
-      }
-  }
+  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  /* Make sure DECL_ALIGN guarantees the alignment propagated from the ptr_info
+     of lhs.  */
+  gcc_checking_assert (DECL_ALIGN (var) >= pi->align);
+
+  if (!pi->pt.anything)
+    {
+      bool singleton_p;
+      unsigned uid;
+      singleton_p = pt_solution_singleton_p (&pi->pt, &uid);
+      gcc_assert (singleton_p);
+      SET_DECL_PT_UID (var, uid);
+    }
 
   /* Fold alloca to the address of the array.  */
   return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
@@ -1816,9 +1877,9 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
         /* The heuristic of fold_builtin_alloca_for_var differs before and after
            inlining, so we don't require the arg to be changed into a constant
            for folding, but just to be constant.  */
-        if (gimple_call_alloca_for_var_p (stmt))
+        if (builtin_alloca_with_align_p (stmt))
           {
-            tree new_rhs = fold_builtin_alloca_for_var (stmt);
+            tree new_rhs = fold_builtin_alloca_with_align (stmt);
             if (new_rhs)
 	      {
 		bool res = update_call_from_tree (gsi, new_rhs);
@@ -2030,6 +2091,20 @@ do_ssa_ccp (void)
     return 0;
 }
 
+/* Last invocation of SSA Conditional Constant Propagation.
+   At the end of the last invocation, we know which builtin_alloca_with_align
+   will not be folded, and we can assign BIGGEST_ALIGNMENT for those in
+   ccp_finalize.  */
+
+static unsigned int
+do_ssa_ccp_last (void)
+{
+  unsigned int res;
+  ccp_last = true;
+  res = do_ssa_ccp ();
+  ccp_last = false;
+  return res;
+}
 
 static bool
 gate_ccp (void)
@@ -2058,6 +2133,25 @@ struct gimple_opt_pass pass_ccp =
  }
 };
 
+struct gimple_opt_pass pass_ccp_last =
+{
+ {
+  GIMPLE_PASS,
+  "ccp3",				/* name */
+  gate_ccp,				/* gate */
+  do_ssa_ccp_last,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_CCP,				/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_verify_ssa
+  | TODO_verify_stmts | TODO_ggc_collect/* todo_flags_finish */
+ }
+};
 
 
 /* Try to optimize out __builtin_stack_restore.  Optimize it out
@@ -2093,7 +2187,8 @@ optimize_stack_restore (gimple_stmt_iter
       if (!callee
 	  || DECL_BUILT_IN_CLASS (callee) != BUILT_IN_NORMAL
 	  /* All regular builtins are ok, just obviously not alloca.  */
-	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA)
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA_WITH_ALIGN)
 	return NULL_TREE;
 
       if (DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE)
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def (revision 179210)
+++ gcc/builtins.def (working copy)
@@ -616,6 +616,7 @@ DEF_LIB_BUILTIN        (BUILT_IN_ABORT,
 DEF_LIB_BUILTIN        (BUILT_IN_ABS, "abs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_AGGREGATE_INCOMING_ADDRESS, "aggregate_incoming_address", BT_FN_PTR_VAR, ATTR_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN    (BUILT_IN_ALLOCA, "alloca", BT_FN_PTR_SIZE, ATTR_MALLOC_NOTHROW_LEAF_LIST)
+DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
 DEF_GCC_BUILTIN        (BUILT_IN_APPLY, "apply", BT_FN_PTR_PTR_FN_VOID_VAR_PTR_SIZE, ATTR_NULL)
 DEF_GCC_BUILTIN        (BUILT_IN_APPLY_ARGS, "apply_args", BT_FN_PTR_VAR, ATTR_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_BSWAP32, "bswap32", BT_FN_UINT32_UINT32, ATTR_CONST_NOTHROW_LEAF_LIST)
Index: gcc/ipa-pure-const.c
===================================================================
--- gcc/ipa-pure-const.c (revision 179210)
+++ gcc/ipa-pure-const.c (working copy)
@@ -437,6 +437,7 @@ special_builtin_state (enum pure_const_s
 	case BUILT_IN_RETURN:
 	case BUILT_IN_UNREACHABLE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_EH_POINTER:
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c (revision 179210)
+++ gcc/tree-ssa-alias.c (working copy)
@@ -1231,6 +1231,7 @@ ref_maybe_used_by_call_p_1 (gimple call,
 	case BUILT_IN_MALLOC:
 	case BUILT_IN_CALLOC:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_MEMSET:
@@ -1513,6 +1514,7 @@ call_may_clobber_ref_p_1 (gimple call, a
 	  return false;
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_ASSUME_ALIGNED:
 	  return false;
 	/* Freeing memory kills the pointed-to memory.  More importantly
Index: gcc/function.c
===================================================================
--- gcc/function.c (revision 179210)
+++ gcc/function.c (working copy)
@@ -3638,8 +3638,10 @@ gimplify_parameters (void)
 		  DECL_IGNORED_P (addr) = 0;
 		  local = build_fold_indirect_ref (addr);
 
-		  t = built_in_decls[BUILT_IN_ALLOCA];
-		  t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm));
+		  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+		  t = build_call_expr (t, 2, DECL_SIZE_UNIT (parm),
+				       build_int_cstu (size_type_node,
+						       DECL_ALIGN (parm)));
 		  /* The call has been built for a variable-sized object.  */
 		  CALL_ALLOCA_FOR_VAR_P (t) = 1;
 		  t = fold_convert (ptr_type, t);
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c (revision 179210)
+++ gcc/gimplify.c (working copy)
@@ -1329,8 +1329,9 @@ gimplify_vla_decl (tree decl, gimple_seq
   SET_DECL_VALUE_EXPR (decl, t);
   DECL_HAS_VALUE_EXPR_P (decl) = 1;
 
-  t = built_in_decls[BUILT_IN_ALLOCA];
-  t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+  t = build_call_expr (t, 2, DECL_SIZE_UNIT (decl),
+		       build_int_cstu (size_type_node, DECL_ALIGN (decl)));
   /* The call has been built for a variable-sized object.  */
   CALL_ALLOCA_FOR_VAR_P (t) = 1;
   t = fold_convert (ptr_type, t);
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c (revision 179210)
+++ gcc/cfgexpand.c (working copy)
@@ -1858,7 +1858,8 @@ expand_call_stmt (gimple stmt)
   CALL_EXPR_RETURN_SLOT_OPT (exp) = gimple_call_return_slot_opt_p (stmt);
   if (decl
       && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     CALL_ALLOCA_FOR_VAR_P (exp) = gimple_call_alloca_for_var_p (stmt);
   else
     CALL_FROM_THUNK_P (exp) = gimple_call_from_thunk_p (stmt);
Index: gcc/tree-mudflap.c
===================================================================
--- gcc/tree-mudflap.c (revision 179210)
+++ gcc/tree-mudflap.c (working copy)
@@ -969,7 +969,9 @@ mf_xform_statements (void)
             case GIMPLE_CALL:
               {
                 tree fndecl = gimple_call_fndecl (s);
-                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA))
+                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+			       || (DECL_FUNCTION_CODE (fndecl)
+				   == BUILT_IN_ALLOCA_WITH_ALIGN)))
                   gimple_call_set_cannot_inline (s, true);
               }
               break;
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c (revision 179210)
+++ gcc/tree-ssa-dce.c (working copy)
@@ -308,6 +308,7 @@ mark_stmt_if_obviously_necessary (gimple
 	    case BUILT_IN_MALLOC:
 	    case BUILT_IN_CALLOC:
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
 	      return;
 
 	    default:;
@@ -639,6 +640,7 @@ mark_all_reaching_defs_necessary_1 (ao_r
 	  case BUILT_IN_MALLOC:
 	  case BUILT_IN_CALLOC:
 	  case BUILT_IN_ALLOCA:
+	  case BUILT_IN_ALLOCA_WITH_ALIGN:
 	  case BUILT_IN_FREE:
 	    return false;
 
@@ -890,6 +892,8 @@ propagate_necessity (struct edge_list *e
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_FREE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_VA_END
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+		      || (DECL_FUNCTION_CODE (callee)
+			  == BUILT_IN_ALLOCA_WITH_ALIGN)
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c (revision 179210)
+++ gcc/varasm.c (working copy)
@@ -2104,7 +2104,8 @@ incorporeal_function_p (tree decl)
       const char *name;
 
       if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-	  && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+	  && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	      || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
 	return true;
 
       name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
Index: gcc/tree-object-size.c
===================================================================
--- gcc/tree-object-size.c (revision 179210)
+++ gcc/tree-object-size.c (working copy)
@@ -411,6 +411,7 @@ alloc_object_size (const_gimple call, in
 	/* fall through */
       case BUILT_IN_MALLOC:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
 	arg1 = 0;
       default:
 	break;
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c (revision 179210)
+++ gcc/gimple.c (working copy)
@@ -374,7 +374,8 @@ gimple_build_call_from_tree (tree t)
   gimple_call_set_return_slot_opt (call, CALL_EXPR_RETURN_SLOT_OPT (t));
   if (fndecl
       && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     gimple_call_set_alloca_for_var (call, CALL_ALLOCA_FOR_VAR_P (t));
   else
     gimple_call_set_from_thunk (call, CALL_FROM_THUNK_P (t));
Index: gcc/passes.c
===================================================================
--- gcc/passes.c (revision 179210)
+++ gcc/passes.c (working copy)
@@ -1321,7 +1321,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_object_sizes);
-      NEXT_PASS (pass_ccp);
+      NEXT_PASS (pass_ccp_last);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline))
+bar (char *a)
+{
+}
+
+void __attribute__((noinline))
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-01 15:44       ` Tom de Vries
@ 2011-10-04  7:50         ` Tom de Vries
  2011-10-04 13:06           ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2011-10-04  7:50 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

[-- Attachment #1: Type: text/plain, Size: 5327 bytes --]

On 10/01/2011 05:46 PM, Tom de Vries wrote:
> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I got a patch for PR50527.
>>>>>
>>>>> The patch prevents the alignment of vla-related allocas from being set to
>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>> the alloca.
>>>>>
>>>>> Bootstrapped and regtested on x86_64.
>>>>>
>>>>> OK for trunk?
>>>>
>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>
>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>> the alignment requirement of the VLA somewhere.
>>>>
>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>> point was to make the decl cheaper than the alloca).  But that might
>>>> possibly be the better choice.
>>>>
>>>> Any other thoughts?
>>>
>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>> large vlas.
>>>
>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>> been the case up until we started to fold). If you want to trigger vectorization
>>> for a vla, you can still use the aligned attribute on the declaration.
>>>
>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>> propagation though, because although the ptr_info of the lhs is propagated via
>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>
>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>> not fold during ccp3.
>>
>> Ugh, somehow I like this the least ;)
>>
>> How about lowering VLAs to
>>
>>   p = __builtin_alloca (...);
>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>
>> and not assume anything for alloca itself if it feeds a
>> __builtin_assume_aligned?
>>
>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>
>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>
>> that's less awkward to use?
>>
>> Sorry for not having a clear plan here ;)
>>
> 
> Using assume_aligned is a more orthogonal way to represent this in gimple, but
> indeed harder to use.
> 
> Another possibility is to add a 'tree vla_decl' field to struct
> gimple_statement_call, which is probably the easiest to implement.
> 
> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
> decided to try this one. Attached patch implements my first stab at this  (now
> testing on x86_64).
> 
> Is this an acceptable approach?
> 

Bootstrapped and reg-tested (including Ada) on x86_64.

OK for trunk?

Thanks,
- Tom

2011-10-03  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/50527
	* tree.c (build_common_builtin_nodes): Add local_define_builtin for
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
	arglist.
	(expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.  Expand
	BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
	(is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-pass.h (pass_ccp_last): Declare.
	* passes.c (init_optimization_passes): Replace 3rd pass_ccp with
	pass_ccp_last.
	* gimplify.c (gimplify_vla_decl): Lower vla to
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-ccp.c (ccp_last): Define.
	(ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
	(evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
	(builtin_alloca_with_align_p): New function.
	(fold_builtin_alloca_with_align_p): New function, factored out of ...
	(fold_builtin_alloca_for_var): Renamed to ...
	(fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
	Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
	(ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
	builtin_alloca_with_align_p.
	(do_ssa_ccp_last): New function.
	(pass_ccp_last): Define.
	(optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
	DEF_BUILTIN_STUB.
	* ipa-pure-const.c (special_builtin_state): Handle
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
	(call_may_clobber_ref_p_1): Same.
	* function.c (gimplify_parameters): Lower vla to
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-mudflap.c (mf_xform_statements): Same.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
	(mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
	* varasm.c (incorporeal_function_p): Same.
	* tree-object-size.c (alloc_object_size): Same.
	* gimple.c (gimple_build_call_from_tree): Same.

	* gcc.dg/pr50527.c: New test.



[-- Attachment #2: pr50527.10.patch --]
[-- Type: text/x-patch, Size: 20624 bytes --]

Index: gcc/tree.c
===================================================================
--- gcc/tree.c (revision 179210)
+++ gcc/tree.c (working copy)
@@ -9483,9 +9483,18 @@ build_common_builtin_nodes (void)
 			    "alloca", ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
     }
 
+  ftype = build_function_type_list (ptr_type_node, size_type_node,
+				    size_type_node, NULL_TREE);
+  local_define_builtin ("__builtin_alloca_with_align", ftype,
+			BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align",
+			ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
+
   /* If we're checking the stack, `alloca' can throw.  */
   if (flag_stack_check)
-    TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+    {
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN]) = 0;
+    }
 
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node,
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h (revision 179210)
+++ gcc/tree-pass.h (working copy)
@@ -389,6 +389,7 @@ extern struct gimple_opt_pass pass_iv_op
 extern struct gimple_opt_pass pass_tree_loop_done;
 extern struct gimple_opt_pass pass_ch;
 extern struct gimple_opt_pass pass_ccp;
+extern struct gimple_opt_pass pass_ccp_last;
 extern struct gimple_opt_pass pass_phi_only_cprop;
 extern struct gimple_opt_pass pass_build_ssa;
 extern struct gimple_opt_pass pass_build_alias;
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c (revision 179210)
+++ gcc/builtins.c (working copy)
@@ -4516,12 +4516,19 @@ expand_builtin_alloca (tree exp, bool ca
 {
   rtx op0;
   rtx result;
+  bool valid_arglist;
 
   /* Emit normal call if marked not-inlineable.  */
   if (CALL_CANNOT_INLINE_P (exp))
     return NULL_RTX;
 
-  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
+  valid_arglist
+    = ((DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+	== BUILT_IN_ALLOCA_WITH_ALIGN)
+       ? validate_arglist (exp, INTEGER_TYPE, INTEGER_TYPE, VOID_TYPE)
+       : validate_arglist (exp, INTEGER_TYPE, VOID_TYPE));
+
+  if (!valid_arglist)
     return NULL_RTX;
 
   /* Compute the argument.  */
@@ -5304,6 +5311,7 @@ expand_builtin (tree exp, rtx target, rt
       && !called_as_built_in (fndecl)
       && DECL_ASSEMBLER_NAME_SET_P (fndecl)
       && fcode != BUILT_IN_ALLOCA
+      && fcode != BUILT_IN_ALLOCA_WITH_ALIGN
       && fcode != BUILT_IN_FREE)
     return expand_call (exp, target, ignore);
 
@@ -5559,11 +5567,21 @@ expand_builtin (tree exp, rtx target, rt
 	return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);
 
     case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
       /* If the allocation stems from the declaration of a variable-sized
 	 object, it cannot accumulate.  */
       target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
       if (target)
 	return target;
+      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+	  == BUILT_IN_ALLOCA_WITH_ALIGN)
+	{
+	  tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
+					       built_in_decls[BUILT_IN_ALLOCA],
+					       1, CALL_EXPR_ARG (exp, 0));
+	  CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
+	  exp = new_call;
+	}
       break;
 
     case BUILT_IN_STACK_SAVE:
@@ -13561,6 +13579,7 @@ is_inexpensive_builtin (tree decl)
       {
       case BUILT_IN_ABS:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
       case BUILT_IN_BSWAP32:
       case BUILT_IN_BSWAP64:
       case BUILT_IN_CLZ:
Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -167,8 +167,13 @@ typedef struct prop_value_d prop_value_t
    doing the store).  */
 static prop_value_t *const_val;
 
+/* Indicates whether we're in pass_ccp_last.  */
+static bool ccp_last;
+
 static void canonicalize_float_value (prop_value_t *);
 static bool ccp_fold_stmt (gimple_stmt_iterator *);
+static bool fold_builtin_alloca_with_align_p (gimple);
+static bool builtin_alloca_with_align_p (gimple);
 
 /* Dump constant propagation value VAL to file OUTF prefixed by PREFIX.  */
 
@@ -813,6 +818,18 @@ ccp_finalize (void)
 	  || !POINTER_TYPE_P (TREE_TYPE (name)))
 	continue;
 
+      if (ccp_last && !SSA_NAME_IS_DEFAULT_DEF (name))
+	{
+	  gimple def = SSA_NAME_DEF_STMT (name);
+	  if (builtin_alloca_with_align_p (def)
+	      && !fold_builtin_alloca_with_align_p (def))
+	    {
+	      pi = get_ptr_info (name);
+	      pi->align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+	      continue;
+	    }
+	}
+
       val = get_value (name);
       if (val->lattice_val != CONSTANT
 	  || TREE_CODE (val->value) != INTEGER_CST)
@@ -1492,6 +1509,7 @@ evaluate_stmt (gimple stmt)
   tree simplified = NULL_TREE;
   ccp_lattice_t likelyvalue = likely_value (stmt);
   bool is_constant = false;
+  unsigned int align;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1632,10 +1650,14 @@ evaluate_stmt (gimple stmt)
 	      break;
 
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
+	      align = (builtin_alloca_with_align_p (stmt)
+		       ? TREE_INT_CST_LOW (gimple_call_arg (stmt, 1))
+		       : BIGGEST_ALIGNMENT);
 	      val.lattice_val = CONSTANT;
 	      val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	      val.mask = shwi_to_double_int
-		  	   (~(((HOST_WIDE_INT) BIGGEST_ALIGNMENT)
+		  	   (~(((HOST_WIDE_INT) align)
 			      / BITS_PER_UNIT - 1));
 	      break;
 
@@ -1685,27 +1707,45 @@ evaluate_stmt (gimple stmt)
   return val;
 }
 
-/* Detects a vla-related alloca with a constant argument.  Declares fixed-size
-   array and return the address, if found, otherwise returns NULL_TREE.  */
+/* Return true if STMT is a call to __builtin_alloca_with_align.  */
 
-static tree
-fold_builtin_alloca_for_var (gimple stmt)
+static bool
+builtin_alloca_with_align_p (gimple stmt)
 {
-  unsigned HOST_WIDE_INT size, threshold, n_elem;
-  tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;
+  tree fndecl;
+  if (!is_gimple_call (stmt))
+    return false;
+
+  fndecl = gimple_call_fndecl (stmt);
+
+  return (fndecl != NULL_TREE
+	  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
+}
+
+/* Return true if STMT is a call to __builtin_alloca_with_align that will be
+   folded.  */
+
+static bool
+fold_builtin_alloca_with_align_p (gimple stmt)
+{
+  unsigned HOST_WIDE_INT size, threshold;
+  tree lhs, arg, block;
+
+  if (!builtin_alloca_with_align_p (stmt)
+      || !gimple_call_alloca_for_var_p (stmt))
+    return false;
 
   /* Get lhs.  */
   lhs = gimple_call_lhs (stmt);
   if (lhs == NULL_TREE)
-    return NULL_TREE;
+    return false;
 
   /* Detect constant argument.  */
   arg = get_constant_value (gimple_call_arg (stmt, 0));
   if (arg == NULL_TREE
       || TREE_CODE (arg) != INTEGER_CST
       || !host_integerp (arg, 1))
-    return NULL_TREE;
+    return false;
 
   size = TREE_INT_CST_LOW (arg);
 
@@ -1718,28 +1758,49 @@ fold_builtin_alloca_for_var (gimple stmt
         && TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))
     threshold /= 10;
   if (size > threshold)
-    return NULL_TREE;
+    return false;
+
+  return true;
+}
+
+/* Detect a __builtin_alloca_with_align with a constant size argument.
+   If found, declare a fixed-size array, return its address; otherwise
+   return NULL_TREE.  */
+
+static tree
+fold_builtin_alloca_with_align (gimple stmt)
+{
+  unsigned HOST_WIDE_INT size, n_elem;
+  tree lhs, arg, var, elem_type, array_type;
+  struct ptr_info_def *pi;
+
+  if (!fold_builtin_alloca_with_align_p (stmt))
+    return NULL;
+
+  lhs = gimple_call_lhs (stmt);
+  pi = SSA_NAME_PTR_INFO (lhs);
+  arg = get_constant_value (gimple_call_arg (stmt, 0));
+  size = TREE_INT_CST_LOW (arg);
 
   /* Declare array.  */
   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
   n_elem = size * 8 / BITS_PER_UNIT;
-  align = MIN (size * 8, BIGGEST_ALIGNMENT);
-  if (align < BITS_PER_UNIT)
-    align = BITS_PER_UNIT;
   array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type, NULL);
-  DECL_ALIGN (var) = align;
-  {
-    struct ptr_info_def *pi = SSA_NAME_PTR_INFO (lhs);
-    if (pi != NULL && !pi->pt.anything)
-      {
-	bool singleton_p;
-	unsigned uid;
-	singleton_p = pt_solution_singleton_p (&pi->pt, &uid);
-	gcc_assert (singleton_p);
-	SET_DECL_PT_UID (var, uid);
-      }
-  }
+  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  /* Make sure DECL_ALIGN guarantees the alignment propagated from the
+     ptr_info of lhs.  */
+  gcc_checking_assert (DECL_ALIGN (var) >= (pi != NULL ? pi->align : 1));
+
+  if (pi != NULL && !pi->pt.anything)
+    {
+      bool singleton_p;
+      unsigned uid;
+      singleton_p = pt_solution_singleton_p (&pi->pt, &uid);
+      gcc_assert (singleton_p);
+      SET_DECL_PT_UID (var, uid);
+    }
 
   /* Fold alloca to the address of the array.  */
   return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
@@ -1816,9 +1877,9 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
         /* The heuristic of fold_builtin_alloca_for_var differs before and after
            inlining, so we don't require the arg to be changed into a constant
            for folding, but just to be constant.  */
-        if (gimple_call_alloca_for_var_p (stmt))
+        if (builtin_alloca_with_align_p (stmt))
           {
-            tree new_rhs = fold_builtin_alloca_for_var (stmt);
+            tree new_rhs = fold_builtin_alloca_with_align (stmt);
             if (new_rhs)
 	      {
 		bool res = update_call_from_tree (gsi, new_rhs);
@@ -2030,6 +2091,20 @@ do_ssa_ccp (void)
     return 0;
 }
 
+/* Last invocation of SSA Conditional Constant Propagation.
+   At the end of the last invocation, we know which __builtin_alloca_with_align
+   calls will not be folded, and we can assign BIGGEST_ALIGNMENT to those in
+   ccp_finalize.  */
+
+static unsigned int
+do_ssa_ccp_last (void)
+{
+  unsigned int res;
+  ccp_last = true;
+  res = do_ssa_ccp ();
+  ccp_last = false;
+  return res;
+}
 
 static bool
 gate_ccp (void)
@@ -2058,6 +2133,25 @@ struct gimple_opt_pass pass_ccp =
  }
 };
 
+struct gimple_opt_pass pass_ccp_last =
+{
+ {
+  GIMPLE_PASS,
+  "ccp3",				/* name */
+  gate_ccp,				/* gate */
+  do_ssa_ccp_last,			/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_CCP,				/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_verify_ssa
+  | TODO_verify_stmts | TODO_ggc_collect/* todo_flags_finish */
+ }
+};
 
 
 /* Try to optimize out __builtin_stack_restore.  Optimize it out
@@ -2093,7 +2187,8 @@ optimize_stack_restore (gimple_stmt_iter
       if (!callee
 	  || DECL_BUILT_IN_CLASS (callee) != BUILT_IN_NORMAL
 	  /* All regular builtins are ok, just obviously not alloca.  */
-	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA)
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA_WITH_ALIGN)
 	return NULL_TREE;
 
       if (DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE)
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def (revision 179210)
+++ gcc/builtins.def (working copy)
@@ -616,6 +616,7 @@ DEF_LIB_BUILTIN        (BUILT_IN_ABORT,
 DEF_LIB_BUILTIN        (BUILT_IN_ABS, "abs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_AGGREGATE_INCOMING_ADDRESS, "aggregate_incoming_address", BT_FN_PTR_VAR, ATTR_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN    (BUILT_IN_ALLOCA, "alloca", BT_FN_PTR_SIZE, ATTR_MALLOC_NOTHROW_LEAF_LIST)
+DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
 DEF_GCC_BUILTIN        (BUILT_IN_APPLY, "apply", BT_FN_PTR_PTR_FN_VOID_VAR_PTR_SIZE, ATTR_NULL)
 DEF_GCC_BUILTIN        (BUILT_IN_APPLY_ARGS, "apply_args", BT_FN_PTR_VAR, ATTR_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_BSWAP32, "bswap32", BT_FN_UINT32_UINT32, ATTR_CONST_NOTHROW_LEAF_LIST)
Index: gcc/ipa-pure-const.c
===================================================================
--- gcc/ipa-pure-const.c (revision 179210)
+++ gcc/ipa-pure-const.c (working copy)
@@ -437,6 +437,7 @@ special_builtin_state (enum pure_const_s
 	case BUILT_IN_RETURN:
 	case BUILT_IN_UNREACHABLE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_EH_POINTER:
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c (revision 179210)
+++ gcc/tree-ssa-alias.c (working copy)
@@ -1231,6 +1231,7 @@ ref_maybe_used_by_call_p_1 (gimple call,
 	case BUILT_IN_MALLOC:
 	case BUILT_IN_CALLOC:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_MEMSET:
@@ -1513,6 +1514,7 @@ call_may_clobber_ref_p_1 (gimple call, a
 	  return false;
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_ASSUME_ALIGNED:
 	  return false;
 	/* Freeing memory kills the pointed-to memory.  More importantly
Index: gcc/function.c
===================================================================
--- gcc/function.c (revision 179210)
+++ gcc/function.c (working copy)
@@ -3638,8 +3638,10 @@ gimplify_parameters (void)
 		  DECL_IGNORED_P (addr) = 0;
 		  local = build_fold_indirect_ref (addr);
 
-		  t = built_in_decls[BUILT_IN_ALLOCA];
-		  t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm));
+		  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+		  t = build_call_expr (t, 2, DECL_SIZE_UNIT (parm),
+				       build_int_cstu (size_type_node,
+						       DECL_ALIGN (parm)));
 		  /* The call has been built for a variable-sized object.  */
 		  CALL_ALLOCA_FOR_VAR_P (t) = 1;
 		  t = fold_convert (ptr_type, t);
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c (revision 179210)
+++ gcc/gimplify.c (working copy)
@@ -1329,8 +1329,9 @@ gimplify_vla_decl (tree decl, gimple_seq
   SET_DECL_VALUE_EXPR (decl, t);
   DECL_HAS_VALUE_EXPR_P (decl) = 1;
 
-  t = built_in_decls[BUILT_IN_ALLOCA];
-  t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+  t = build_call_expr (t, 2, DECL_SIZE_UNIT (decl),
+		       build_int_cstu (size_type_node, DECL_ALIGN (decl)));
   /* The call has been built for a variable-sized object.  */
   CALL_ALLOCA_FOR_VAR_P (t) = 1;
   t = fold_convert (ptr_type, t);
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c (revision 179210)
+++ gcc/cfgexpand.c (working copy)
@@ -1858,7 +1858,8 @@ expand_call_stmt (gimple stmt)
   CALL_EXPR_RETURN_SLOT_OPT (exp) = gimple_call_return_slot_opt_p (stmt);
   if (decl
       && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     CALL_ALLOCA_FOR_VAR_P (exp) = gimple_call_alloca_for_var_p (stmt);
   else
     CALL_FROM_THUNK_P (exp) = gimple_call_from_thunk_p (stmt);
Index: gcc/tree-mudflap.c
===================================================================
--- gcc/tree-mudflap.c (revision 179210)
+++ gcc/tree-mudflap.c (working copy)
@@ -969,7 +969,9 @@ mf_xform_statements (void)
             case GIMPLE_CALL:
               {
                 tree fndecl = gimple_call_fndecl (s);
-                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA))
+                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+			       || (DECL_FUNCTION_CODE (fndecl)
+				   == BUILT_IN_ALLOCA_WITH_ALIGN)))
                   gimple_call_set_cannot_inline (s, true);
               }
               break;
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c (revision 179210)
+++ gcc/tree-ssa-dce.c (working copy)
@@ -308,6 +308,7 @@ mark_stmt_if_obviously_necessary (gimple
 	    case BUILT_IN_MALLOC:
 	    case BUILT_IN_CALLOC:
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
 	      return;
 
 	    default:;
@@ -639,6 +640,7 @@ mark_all_reaching_defs_necessary_1 (ao_r
 	  case BUILT_IN_MALLOC:
 	  case BUILT_IN_CALLOC:
 	  case BUILT_IN_ALLOCA:
+	  case BUILT_IN_ALLOCA_WITH_ALIGN:
 	  case BUILT_IN_FREE:
 	    return false;
 
@@ -890,6 +892,8 @@ propagate_necessity (struct edge_list *e
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_FREE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_VA_END
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+		      || (DECL_FUNCTION_CODE (callee)
+			  == BUILT_IN_ALLOCA_WITH_ALIGN)
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c (revision 179210)
+++ gcc/varasm.c (working copy)
@@ -2104,7 +2104,8 @@ incorporeal_function_p (tree decl)
       const char *name;
 
       if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-	  && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+	  && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	      || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
 	return true;
 
       name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
Index: gcc/tree-object-size.c
===================================================================
--- gcc/tree-object-size.c (revision 179210)
+++ gcc/tree-object-size.c (working copy)
@@ -411,6 +411,7 @@ alloc_object_size (const_gimple call, in
 	/* fall through */
       case BUILT_IN_MALLOC:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
 	arg1 = 0;
       default:
 	break;
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c (revision 179210)
+++ gcc/gimple.c (working copy)
@@ -374,7 +374,8 @@ gimple_build_call_from_tree (tree t)
   gimple_call_set_return_slot_opt (call, CALL_EXPR_RETURN_SLOT_OPT (t));
   if (fndecl
       && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     gimple_call_set_alloca_for_var (call, CALL_ALLOCA_FOR_VAR_P (t));
   else
     gimple_call_set_from_thunk (call, CALL_FROM_THUNK_P (t));
Index: gcc/passes.c
===================================================================
--- gcc/passes.c (revision 179210)
+++ gcc/passes.c (working copy)
@@ -1321,7 +1321,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_object_sizes);
-      NEXT_PASS (pass_ccp);
+      NEXT_PASS (pass_ccp_last);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline))
+bar (char *a)
+{
+}
+
+void __attribute__((noinline))
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04  7:50         ` Tom de Vries
@ 2011-10-04 13:06           ` Richard Guenther
  2011-10-04 16:30             ` Tom de Vries
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2011-10-04 13:06 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I got a patch for PR50527.
>>>>>>
>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>> the alloca.
>>>>>>
>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>
>>>>>> OK for trunk?
>>>>>
>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>
>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>> the alignment requirement of the VLA somewhere.
>>>>>
>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>> possibly be the better choice.
>>>>>
>>>>> Any other thoughts?
>>>>
>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>> large vlas.
>>>>
>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>
>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>
>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>> not fold during ccp3.
>>>
>>> Ugh, somehow I like this the least ;)
>>>
>>> How about lowering VLAs to
>>>
>>>   p = __builtin_alloca (...);
>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>
>>> and not assume anything for alloca itself if it feeds a
>>> __builtin_assume_aligned?
>>>
>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>
>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>
>>> that's less awkward to use?
>>>
>>> Sorry for not having a clear plan here ;)
>>>
>>
>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>> indeed harder to use.
>>
>> Another possibility is to add a 'tree vla_decl' field to struct
>> gimple_statement_call, which is probably the easiest to implement.
>>
>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>> decided to try this one. Attached patch implements my first stab at this  (now
>> testing on x86_64).
>>
>> Is this an acceptable approach?
>>
>
> bootstrapped and reg-tested (including ada) on x86_64.
>
> Ok for trunk?

The idea is ok I think.  But

     case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
       /* If the allocation stems from the declaration of a variable-sized
         object, it cannot accumulate.  */
       target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
       if (target)
        return target;
+      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+         == BUILT_IN_ALLOCA_WITH_ALIGN)
+       {
+         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
+                                              built_in_decls[BUILT_IN_ALLOCA],
+                                              1, CALL_EXPR_ARG (exp, 0));
+         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
+         exp = new_call;
+       }

Ick.  Why can't the rest of the compiler not handle
BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
(thus, arrange things so the assembler name of alloca-with-align is alloca?)

I don't see why you still need the special late CCP pass.

-static tree
-fold_builtin_alloca_for_var (gimple stmt)
+static bool
+builtin_alloca_with_align_p (gimple stmt)
 {
-  unsigned HOST_WIDE_INT size, threshold, n_elem;
-  tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;
+  tree fndecl;
+  if (!is_gimple_call (stmt))
+    return false;
+
+  fndecl = gimple_call_fndecl (stmt);
+
+  return (fndecl != NULL_TREE
+         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
+}

equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).

+  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));

Now, users might call __builtin_alloca_with_align (8, align), thus use
a non-constant alignment argument.  You either need to reject this
in c-family/c-common.c:check_builtin_function_arguments with an
error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
I suppose).

+DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")

Oh, I see you are not exposing this to users, so ignore the comments
about non-constant align.  Can you move this
DEF down, near the other DEF_BUILTIN_STUBs and add a comment
that this is used for VLAs?

+                                      build_int_cstu (size_type_node,
+                                                      DECL_ALIGN (parm)));

size_int (DECL_ALIGN (parm)), similar in other places.

Let's see if there are comments from others on this - which I like.

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2011-10-03  Tom de Vries  <tom@codesourcery.com>
>
>        PR middle-end/50527
>        * tree.c (build_common_builtin_nodes): Add local_define_builtin for
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
>        arglist.
>        (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.  Expand
>        BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
>        (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-pass.h (pass_ccp_last): Declare.
>        * passes.c (init_optimization_passes): Replace 3rd pass_ccp with
>        pass_ccp_last.
>        * gimplify.c (gimplify_vla_decl): Lower vla to
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-ssa-ccp.c (cpp_last): Define.
>        (ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
>        (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
>        (builtin_alloca_with_align_p): New function.
>        (fold_builtin_alloca_with_align_p): New function, factored out of ...
>        (fold_builtin_alloca_for_var): Renamed to ...
>        (fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
>        Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
>        (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
>        builtin_alloca_with_align_p.
>        (do_ssa_ccp_last): New function.
>        (pass_ccp_last): Define.
>        (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
>        DEF_BUILTIN_STUB.
>        * ipa-pure-const.c (special_builtin_state): Handle
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
>        (call_may_clobber_ref_p_1): Same.
>        * function.c (gimplify_parameters): Lower vla to
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-mudflap.c (mf_xform_statements): Same.
>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
>        (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
>        * varasm.c (incorporeal_function_p): Same.
>        * tree-object-size.c (alloc_object_size): Same.
>        * gimple.c (gimple_build_call_from_tree): Same.
>
>        * gcc.dg/pr50527.c: New test.
>
>
>

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 13:06           ` Richard Guenther
@ 2011-10-04 16:30             ` Tom de Vries
  2011-10-04 18:54               ` Richard Henderson
  2011-10-05  8:54               ` Richard Guenther
  0 siblings, 2 replies; 17+ messages in thread
From: Tom de Vries @ 2011-10-04 16:30 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

On 10/04/2011 03:03 PM, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I got a patch for PR50527.
>>>>>>>
>>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>> the alloca.
>>>>>>>
>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>
>>>>>>> OK for trunk?
>>>>>>
>>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>
>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>
>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>>> possibly be the better choice.
>>>>>>
>>>>>> Any other thoughts?
>>>>>
>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>> large vlas.
>>>>>
>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>
>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>
>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>> not fold during ccp3.
>>>>
>>>> Ugh, somehow I like this the least ;)
>>>>
>>>> How about lowering VLAs to
>>>>
>>>>   p = __builtin_alloca (...);
>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>
>>>> and not assume anything for alloca itself if it feeds a
>>>> __builtin_assume_aligned?
>>>>
>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>
>>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>
>>>> that's less awkward to use?
>>>>
>>>> Sorry for not having a clear plan here ;)
>>>>
>>>
>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>> indeed harder to use.
>>>
>>> Another possibility is to add a 'tree vla_decl' field to struct
>>> gimple_statement_call, which is probably the easiest to implement.
>>>
>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>> decided to try this one. Attached patch implements my first stab at this  (now
>>> testing on x86_64).
>>>
>>> Is this an acceptable approach?
>>>
>>
>> bootstrapped and reg-tested (including ada) on x86_64.
>>
>> Ok for trunk?
> 
> The idea is ok I think.  But
> 
>      case BUILT_IN_ALLOCA:
> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>        /* If the allocation stems from the declaration of a variable-sized
>          object, it cannot accumulate.  */
>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>        if (target)
>         return target;
> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
> +       {
> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
> +                                              built_in_decls[BUILT_IN_ALLOCA],
> +                                              1, CALL_EXPR_ARG (exp, 0));
> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
> +         exp = new_call;
> +       }
> 
> Ick.  Why can't the rest of the compiler not handle
> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
> 

We can set the assembler name in the local_define_builtin call. But that will
still create a call alloca (12, 4). How do we deal with the second argument?

> I don't see why you still need the special late CCP pass.
> 

For alloca_with_align, the alignment will be at least the 2nd argument. This is
independent of folding, so we can propagate this information in every ccp pass.

If the alloca_with_align is not folded and will not be folded anymore (something
we know at the earliest after the propagation phase of the last ccp),
an alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
normal alloca. We try to exploit this by improving the ptr_info->align.

Well, that was the idea. But now I wonder, isn't it better to do this in
expand_builtin_alloca:
...
   /* Compute the argument.  */
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));

+  align =
+    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
+     BUILT_IN_ALLOCA_WITH_ALIGN)
+    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
+    : BIGGEST_ALIGNMENT;
+
+
   /* Allocate the desired space.  */
-  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
-					 cannot_accumulate);
+  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
   result = convert_memory_address (ptr_mode, result);

   return result;

...

This way, we lower the alignment requirement for vlas in general, not only when
folding (unless we expand to an actual alloca call).

If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
until expand, so we can't exploit that in the middle-end, so we lose the need
for ccp_last.

> -static tree
> -fold_builtin_alloca_for_var (gimple stmt)
> +static bool
> +builtin_alloca_with_align_p (gimple stmt)
>  {
> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
> -  tree lhs, arg, block, var, elem_type, array_type;
> -  unsigned int align;
> +  tree fndecl;
> +  if (!is_gimple_call (stmt))
> +    return false;
> +
> +  fndecl = gimple_call_fndecl (stmt);
> +
> +  return (fndecl != NULL_TREE
> +         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
> +}
> 
> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
> 
> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> 
> Now, users might call __builtin_alloca_with_align (8, align), thus use
> a non-constant alignment argument.  You either need to reject this
> in c-family/c-common.c:check_builtin_function_arguments with an
> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
> I suppose).
> 
> +DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
> 
> Oh, I see you are not exposing this to users, so ignore the comments
> about non-constant align.  Can you move this
> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
> that this is used for VLAs?
> 
> +                                      build_int_cstu (size_type_node,
> +                                                      DECL_ALIGN (parm)));
> 
> size_int (DECL_ALIGN (parm)), similar in other places.
> 
> Let's see if there are comments from others on this - which I like.
> 

Thanks,
- Tom

> Thanks,
> Richard.
> 
>> Thanks,
>> - Tom
>>


>> 2011-10-03  Tom de Vries  <tom@codesourcery.com>
>>
>>        PR middle-end/50527
>>        * tree.c (build_common_builtin_nodes): Add local_define_builtin for
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
>>        arglist.
>>        (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.  Expand
>>        BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
>>        (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * tree-pass.h (pass_ccp_last): Declare.
>>        * passes.c (init_optimization_passes): Replace 3rd pass_ccp with
>>        pass_ccp_last.
>>        * gimplify.c (gimplify_vla_decl): Lower vla to
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * tree-ssa-ccp.c (cpp_last): Define.
>>        (ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
>>        (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
>>        (builtin_alloca_with_align_p): New function.
>>        (fold_builtin_alloca_with_align_p): New function, factored out of ...
>>        (fold_builtin_alloca_for_var): Renamed to ...
>>        (fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
>>        Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
>>        (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
>>        builtin_alloca_with_align_p.
>>        (do_ssa_ccp_last): New function.
>>        (pass_ccp_last): Define.
>>        (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
>>        DEF_BUILTIN_STUB.
>>        * ipa-pure-const.c (special_builtin_state): Handle
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
>>        (call_may_clobber_ref_p_1): Same.
>>        * function.c (gimplify_parameters): Lower vla to
>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>        * tree-mudflap.c (mf_xform_statements): Same.
>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
>>        (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
>>        * varasm.c (incorporeal_function_p): Same.
>>        * tree-object-size.c (alloc_object_size): Same.
>>        * gimple.c (gimple_build_call_from_tree): Same.
>>
>>        * gcc.dg/pr50527.c: New test.
>>
>>
>>

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 16:30             ` Tom de Vries
@ 2011-10-04 18:54               ` Richard Henderson
  2011-10-04 20:19                 ` Tom de Vries
  2011-10-05  8:54               ` Richard Guenther
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2011-10-04 18:54 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Richard Guenther, gcc-patches >> "gcc-patches@gcc.gnu.org"

On 10/04/2011 09:28 AM, Tom de Vries wrote:
> Well, that was the idea. But now I wonder, isn't it better to do this in
> expand_builtin_alloca:
> ...
>    /* Compute the argument.  */
>    op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
> 
> +  align =
> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
> +     BUILT_IN_ALLOCA_WITH_ALIGN)
> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
> +    : BIGGEST_ALIGNMENT;
> +
> +
>    /* Allocate the desired space.  */
> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
> -					 cannot_accumulate);
> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);

Yes, this is better.

I've lost track of what you're trying to do with "folding" alloca?


r~

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 18:54               ` Richard Henderson
@ 2011-10-04 20:19                 ` Tom de Vries
  2011-10-04 20:59                   ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2011-10-04 20:19 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Guenther, gcc-patches >> "gcc-patches@gcc.gnu.org"

On 10/04/2011 08:51 PM, Richard Henderson wrote:
> On 10/04/2011 09:28 AM, Tom de Vries wrote:
>> Well, that was the idea. But now I wonder, isn't it better to do this in
>> expand_builtin_alloca:
>> ...
>>    /* Compute the argument.  */
>>    op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>>
>> +  align =
>> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
>> +     BUILT_IN_ALLOCA_WITH_ALIGN)
>> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
>> +    : BIGGEST_ALIGNMENT;
>> +
>> +
>>    /* Allocate the desired space.  */
>> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
>> -					 cannot_accumulate);
>> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
> 
> Yes, this is better.
> 
> I've lost track of what you're trying to do with "folding" alloca?
> 

In general, to fold vlas (which are lowered to allocas) to normal declarations,
if the alloca argument is constant.

This particular patch tries to address 2 issues:
- A bug in the current implementation (PR50527). We assume a different alignment
  while propagating in ccp, than during folding the alloca to a decl.
- We try to use the DECL_ALIGN of the vla as the DECL_ALIGN of the folded alloca
  decl. We try to carry the DECL_ALIGN from lowering to folding (and later)
  using the newly introduced __builtin_alloca_with_align. There are other ways
  to do this carrying:
  - introduce a vla_decl field in the call gimple
  - use a composition of __builtin_alloca and __builtin_assume_aligned
  I chose __builtin_alloca_with_align because it's simpler to use than
  the latter. The former is the simplest, but I think
  __builtin_alloca_with_align might have a use beyond vlas.

Any guidance on what to do if we have to expand the __builtin_alloca_with_align
to a function call, specifically, with the second argument? This is currently
handled like this, which is not very nice:
...
@@ -5559,11 +5573,21 @@ expand_builtin (tree exp, rtx target, rt
 	return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);

     case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
       /* If the allocation stems from the declaration of a variable-sized
 	 object, it cannot accumulate.  */
       target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
       if (target)
 	return target;
+      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+	  == BUILT_IN_ALLOCA_WITH_ALIGN)
+	{
+	  tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
+					       built_in_decls[BUILT_IN_ALLOCA],
+					       1, CALL_EXPR_ARG (exp, 0));
+	  CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
+	  exp = new_call;
+	}
       break;

     case BUILT_IN_STACK_SAVE:
...

Thanks,
- Tom

> 
> r~

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 20:19                 ` Tom de Vries
@ 2011-10-04 20:59                   ` Richard Henderson
  2011-10-04 21:36                     ` Tom de Vries
  2011-10-05  8:47                     ` Richard Guenther
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Henderson @ 2011-10-04 20:59 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Richard Guenther, gcc-patches >> "gcc-patches@gcc.gnu.org"

On 10/04/2011 01:17 PM, Tom de Vries wrote:
> In general, to fold vlas (which are lowered to allocas) to normal declarations,
> if the alloca argument is constant.

Ah.  Ok, I suppose.  How often are you seeing this happening?  I can imagine
a few instances via inlining, but even there not so much...

> Any guidance on what to do if we have to expand the __builtin_alloca_with_align
> to a function call, specifically, with the second argument? This is currently
> handled like this, which is not very nice:

Don't do anything special.  Just let it fall through as with alloca.  This should
never happen, except for stupid user tricks like -fno-builtin-alloca_with_align.
And if the user does that, they get what they deserve wrt needing to implement
that function in their runtime.


r~

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 20:59                   ` Richard Henderson
@ 2011-10-04 21:36                     ` Tom de Vries
  2011-10-04 23:13                       ` Tom de Vries
  2011-10-05  8:47                     ` Richard Guenther
  1 sibling, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2011-10-04 21:36 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Guenther, gcc-patches >> "gcc-patches@gcc.gnu.org"

On 10/04/2011 10:58 PM, Richard Henderson wrote:
> On 10/04/2011 01:17 PM, Tom de Vries wrote:
>> In general, to fold vlas (which are lowered to allocas) to normal declarations,
>> if the alloca argument is constant.
> 
> Ah.  Ok, I suppose.  How often are you seeing this happening?  I can imagine
> a few instances via inlining, but even there not so much...
>

I have no statistics on this. But something simple like this is still translated
into a vla, and can be folded:
...
const int n = 5;
int array[n];
...

>> Any guidance on what to do if we have to expand the __builtin_alloca_with_align
>> to a function call, specifically, with the second argument? This is currently
>> handled like this, which is not very nice:
> 
> Don't do anything special.  Just let it fall through as with alloca.  This should
> never happen, except for stupid user tricks like -fno-builtin-alloca_with_align.
> And if the user does that, they get what they deserve wrt needing to implement
> that function in their runtime.
> 

What currently causes this behaviour is -fmudflap; see mf_xform_statements in
tree-mudflap.c. The patch handles __builtin_alloca_with_align the same
as __builtin_alloca in that function, meaning that both are marked with
gimple_call_set_cannot_inline.

So:
- we always expand alloca_with_align in expand_builtin_alloca, even for
  -fmudflap, or
- we need to handle the expansion of the call, or
- ... ?

Thanks,
- Tom

> 
> r~

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 21:36                     ` Tom de Vries
@ 2011-10-04 23:13                       ` Tom de Vries
  0 siblings, 0 replies; 17+ messages in thread
From: Tom de Vries @ 2011-10-04 23:13 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Guenther, gcc-patches >> "gcc-patches@gcc.gnu.org"

On 10/04/2011 11:31 PM, Tom de Vries wrote:
> On 10/04/2011 10:58 PM, Richard Henderson wrote:
>> On 10/04/2011 01:17 PM, Tom de Vries wrote:
>>> In general, to fold vlas (which are lowered to allocas) to normal declarations,
>>> if the alloca argument is constant.
>>
>> Ah.  Ok, I suppose.  How often are you seeing this happening?  I can imagine
>> a few instances via inlining, but even there not so much...
>>
> 
> I have no statistics on this. But something simple like this is still translated
> into a vla, and can be folded:
> ...
> const int n = 5;
> int array[n];
> ...
> 
>>> Any guidance on what to do if we have to expand the __builtin_alloca_with_align
>>> to a function call, specifically, with the second argument? This is currently
>>> handled like this, which is not very nice:
>>
>> Don't do anything special.  Just let it fall through as with alloca.  This should
>> never happen, except for stupid user tricks like -fno-builtin-alloca_with_align.
>> And if the user does that, they get what they deserve wrt needing to implement
>> that function in their runtime.
>>
> 
> What currently causes this behaviour is -fmudflap; see mf_xform_statements in
> tree-mudflap.c. The patch handles __builtin_alloca_with_align the same
> as __builtin_alloca in that function, meaning that both are marked with
> gimple_call_set_cannot_inline.
> 
> So:
> - we always expand alloca_with_align in expand_builtin_alloca, even for
>   -fmudflap, or
> - we need to handle the expansion of the call, or
> - ... ?
> 

Or we don't lower to alloca_with_align when -fmudflap is present?
...
Index: gimplify.c
===================================================================
--- gimplify.c (revision 179210)
+++ gimplify.c (working copy)
@@ -1329,8 +1329,17 @@ gimplify_vla_decl (tree decl, gimple_seq
   SET_DECL_VALUE_EXPR (decl, t);
   DECL_HAS_VALUE_EXPR_P (decl) = 1;

-  t = built_in_decls[BUILT_IN_ALLOCA];
-  t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+  if (flag_mudflap)
+    {
+      t = built_in_decls[BUILT_IN_ALLOCA];
+      t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+    }
+  else
+    {
+      t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+      t = build_call_expr (t, 2, DECL_SIZE_UNIT (decl),
+			   build_int_cstu (size_type_node, DECL_ALIGN (decl)));
+    }
   /* The call has been built for a variable-sized object.  */
   CALL_ALLOCA_FOR_VAR_P (t) = 1;
   t = fold_convert (ptr_type, t);
...

> Thanks,
> - Tom
> 
>>
>> r~
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 20:59                   ` Richard Henderson
  2011-10-04 21:36                     ` Tom de Vries
@ 2011-10-05  8:47                     ` Richard Guenther
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Guenther @ 2011-10-05  8:47 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Tom de Vries, gcc-patches >> "gcc-patches@gcc.gnu.org"

On Tue, Oct 4, 2011 at 10:58 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/04/2011 01:17 PM, Tom de Vries wrote:
>> In general, to fold vlas (which are lowered to allocas) to normal declarations,
>> if the alloca argument is constant.
>
> Ah.  Ok, I suppose.  How often are you seeing this happening?  I can imagine
> a few instances via inlining, but even there not so much...
>
>> Any guidance on what to do if we have to expand the __builtin_alloca_with_align
>> to a function call, specifically, with the second argument? This is currently
>> handled like this, which is not very nice:
>
> Don't do anything special.  Just let it fall through as with alloca.  This should
> never happen, except for stupid user tricks like -fno-builtin-alloca_with_align.
> And if the user does that, they get what they deserve wrt needing to implement
> that function in their runtime.

Yes, especially as we only use builtin_alloca_with_align for VLAs.  Setting
the assembler name to alloca might be nice to users at least; the
excess argument shouldn't be any problem (well, hopefully targets don't
have a special ABI for alloca ...)

Richard.

>
> r~
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-04 16:30             ` Tom de Vries
  2011-10-04 18:54               ` Richard Henderson
@ 2011-10-05  8:54               ` Richard Guenther
  2011-10-05 21:15                 ` Tom de Vries
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2011-10-05  8:54 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches >> "gcc-patches@gcc.gnu.org"

On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I got a patch for PR50527.
>>>>>>>>
>>>>>>>> The patch prevents the alignment of vla-related allocas from being set to
>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>>> the alloca.
>>>>>>>>
>>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>>
>>>>>>>> OK for trunk?
>>>>>>>
>>>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>>
>>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>>
>>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>>>> possibly be the better choice.
>>>>>>>
>>>>>>> Any other thoughts?
>>>>>>
>>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>>> large vlas.
>>>>>>
>>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>>
>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>>
>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>>> not fold during ccp3.
>>>>>
>>>>> Ugh, somehow I like this the least ;)
>>>>>
>>>>> How about lowering VLAs to
>>>>>
>>>>>   p = __builtin_alloca (...);
>>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>>
>>>>> and not assume anything for alloca itself if it feeds a
>>>>> __builtin_assume_aligned?
>>>>>
>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>>
>>>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>>
>>>>> that's less awkward to use?
>>>>>
>>>>> Sorry for not having a clear plan here ;)
>>>>>
>>>>
>>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>>> indeed harder to use.
>>>>
>>>> Another possibility is to add a 'tree vla_decl' field to struct
>>>> gimple_statement_call, which is probably the easiest to implement.
>>>>
>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>>> decided to try this one. Attached patch implements my first stab at this  (now
>>>> testing on x86_64).
>>>>
>>>> Is this an acceptable approach?
>>>>
>>>
>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>
>>> Ok for trunk?
>>
>> The idea is ok I think.  But
>>
>>      case BUILT_IN_ALLOCA:
>> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>>        /* If the allocation stems from the declaration of a variable-sized
>>          object, it cannot accumulate.  */
>>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>        if (target)
>>         return target;
>> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
>> +       {
>> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>> +                                              built_in_decls[BUILT_IN_ALLOCA],
>> +                                              1, CALL_EXPR_ARG (exp, 0));
>> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>> +         exp = new_call;
>> +       }
>>
>> Ick.  Why can't the rest of the compiler handle
>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>
>
> We can set the assembler name in the local_define_builtin call. But that will
> still create a call alloca (12, 4). How do we deal with the second argument?
>
>> I don't see why you still need the special late CCP pass.
>>
>
> For alloca_with_align, the align will minimally be the 2nd argument. This is
> independent of folding, and we can propagate this information in every ccp.
>
> If the alloca_with_align is not folded and will not be folded anymore (something
> we know at the earliest after the propagation phase of the last ccp),
> the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
> normal alloca. We try to exploit this by improving the ptr_info->align.

I'd just assume the (lower) constant alignment of the argument always.

> Well, that was the idea. But now I wonder, isn't it better to do this in
> expand_builtin_alloca:
> ...
>   /* Compute the argument.  */
>   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>
> +  align =
> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
> +     BUILT_IN_ALLOCA_WITH_ALIGN)
> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
> +    : BIGGEST_ALIGNMENT;
> +
> +
>   /* Allocate the desired space.  */
> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
> -                                        cannot_accumulate);
> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
>   result = convert_memory_address (ptr_mode, result);

Yes, that's probably a good idea anyway - it will reduce excess
stack re-alignment for VLAs.

>   return result;
>
> ...
>
> This way, we lower the alignment requirement for vlas in general, not only when
> folding (unless we expand to an actual alloca call).
>
> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
> until expand, so we can't exploit that in the middle-end, so we lose the need
> for ccp_last.

Good ;)

Thanks,
Richard.

>> -static tree
>> -fold_builtin_alloca_for_var (gimple stmt)
>> +static bool
>> +builtin_alloca_with_align_p (gimple stmt)
>>  {
>> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
>> -  tree lhs, arg, block, var, elem_type, array_type;
>> -  unsigned int align;
>> +  tree fndecl;
>> +  if (!is_gimple_call (stmt))
>> +    return false;
>> +
>> +  fndecl = gimple_call_fndecl (stmt);
>> +
>> +  return (fndecl != NULL_TREE
>> +         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
>> +}
>>
>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
>>
>> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>
>> Now, users might call __builtin_alloca_with_align (8, align), thus use
>> a non-constant alignment argument.  You either need to reject this
>> in c-family/c-common.c:check_builtin_function_arguments with an
>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
>> I suppose).
>>
>> +DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
>>
>> Oh, I see you are not exposing this to users, so ignore the comments
>> about non-constant align.  Can you move this
>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
>> that this is used for VLAs?
>>
>> +                                      build_int_cstu (size_type_node,
>> +                                                      DECL_ALIGN (parm)));
>>
>> size_int (DECL_ALIGN (parm)), similar in other places.
>>
>> Let's see if there are comments from others on this - which I like.
>>
>
> Thanks,
> - Tom
>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> - Tom
>>>
>
>
>>> 2011-10-03  Tom de Vries  <tom@codesourcery.com>
>>>
>>>        PR middle-end/50527
>>>        * tree.c (build_common_builtin_nodes): Add local_define_builtin for
>>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
>>>        arglist.
>>>        (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.  Expand
>>>        BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA.
>>>        (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        * tree-pass.h (pass_ccp_last): Declare.
>>>        * passes.c (init_optimization_passes): Replace 3rd pass_ccp with
>>>        pass_ccp_last.
>>>        * gimplify.c (gimplify_vla_decl): Lower vla to
>>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        * tree-ssa-ccp.c (ccp_last): Define.
>>>        (ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        (builtin_alloca_with_align_p): New function.
>>>        (fold_builtin_alloca_with_align_p): New function, factored out of ...
>>>        (fold_builtin_alloca_for_var): Renamed to ...
>>>        (fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p.
>>>        Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument.
>>>        (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if
>>>        builtin_alloca_with_align_p.
>>>        (do_ssa_ccp_last): New function.
>>>        (pass_ccp_last): Define.
>>>        (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
>>>        DEF_BUILTIN_STUB.
>>>        * ipa-pure-const.c (special_builtin_state): Handle
>>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        * tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
>>>        (call_may_clobber_ref_p_1): Same.
>>>        * function.c (gimplify_parameters): Lower vla to
>>>        BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        * cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>>>        * tree-mudflap.c (mf_xform_statements): Same.
>>>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
>>>        (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
>>>        * varasm.c (incorporeal_function_p): Same.
>>>        * tree-object-size.c (alloc_object_size): Same.
>>>        * gimple.c (gimple_build_call_from_tree): Same.
>>>
>>>        * gcc.dg/pr50527.c: New test.
>>>
>>>
>>>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-05  8:54               ` Richard Guenther
@ 2011-10-05 21:15                 ` Tom de Vries
  2011-10-06 10:08                   ` Richard Guenther
  0 siblings, 1 reply; 17+ messages in thread
From: Tom de Vries @ 2011-10-05 21:15 UTC (permalink / raw)
  To: Richard Guenther
  Cc: gcc-patches >> "gcc-patches@gcc.gnu.org",
	Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 10499 bytes --]

On 10/05/2011 10:46 AM, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I got a patch for PR50527.
>>>>>>>>>
>>>>>>>>> The patch prevents the alignment of vla-related allocas from being set to
>>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>>>> the alloca.
>>>>>>>>>
>>>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>>>
>>>>>>>>> OK for trunk?
>>>>>>>>
>>>>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>>>
>>>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>>>
>>>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>>>>> possibly be the better choice.
>>>>>>>>
>>>>>>>> Any other thoughts?
>>>>>>>
>>>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>>>> large vlas.
>>>>>>>
>>>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>>>
>>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>>>
>>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>>>> not fold during ccp3.
>>>>>>
>>>>>> Ugh, somehow I like this the least ;)
>>>>>>
>>>>>> How about lowering VLAs to
>>>>>>
>>>>>>   p = __builtin_alloca (...);
>>>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>>>
>>>>>> and not assume anything for alloca itself if it feeds a
>>>>>> __builtin_assume_aligned?
>>>>>>
>>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>>>
>>>>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>>>
>>>>>> that's less awkward to use?
>>>>>>
>>>>>> Sorry for not having a clear plan here ;)
>>>>>>
>>>>>
>>>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>>>> indeed harder to use.
>>>>>
>>>>> Another possibility is to add a 'tree vla_decl' field to struct
>>>>> gimple_statement_call, which is probably the easiest to implement.
>>>>>
>>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>>>> decided to try this one. Attached patch implements my first stab at this  (now
>>>>> testing on x86_64).
>>>>>
>>>>> Is this an acceptable approach?
>>>>>
>>>>
>>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>>
>>>> Ok for trunk?
>>>
>>> The idea is ok I think.  But
>>>
>>>      case BUILT_IN_ALLOCA:
>>> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>        /* If the allocation stems from the declaration of a variable-sized
>>>          object, it cannot accumulate.  */
>>>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>>        if (target)
>>>         return target;
>>> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>>> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
>>> +       {
>>> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>>> +                                              built_in_decls[BUILT_IN_ALLOCA],
>>> +                                              1, CALL_EXPR_ARG (exp, 0));
>>> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>>> +         exp = new_call;
>>> +       }
>>>
>>> Ick.  Why can't the rest of the compiler handle
>>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>>
>>
>> We can set the assembler name in the local_define_builtin call. But that will
>> still create a call alloca (12, 4). How do we deal with the second argument?
>>
>>> I don't see why you still need the special late CCP pass.
>>>
>>
>> For alloca_with_align, the align will minimally be the 2nd argument. This is
>> independent of folding, and we can propagate this information in every ccp.
>>
>> If the alloca_with_align is not folded and will not be folded anymore (something
>> we know at the earliest after the propagation phase of the last ccp),
>> the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
>> normal alloca. We try to exploit this by improving the ptr_info->align.
> 
> I'd just assume the (lower) constant alignment of the argument always.
> 
>> Well, that was the idea. But now I wonder, isn't it better to do this in
>> expand_builtin_alloca:
>> ...
>>   /* Compute the argument.  */
>>   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>>
>> +  align =
>> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
>> +     BUILT_IN_ALLOCA_WITH_ALIGN)
>> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
>> +    : BIGGEST_ALIGNMENT;
>> +
>> +
>>   /* Allocate the desired space.  */
>> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
>> -                                        cannot_accumulate);
>> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
>>   result = convert_memory_address (ptr_mode, result);
> 
> Yes, that's probably a good idea anyway - it will reduce excess
> stack re-alignment for VLAs.
> 
>>   return result;
>>
>> ...
>>
>> This way, we lower the alignment requirement for vlas in general, not only when
>> folding (unless we expand to an actual alloca call).
>>
>> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
>> until expand, so we can't exploit that in the middle-end, so we lose the need
>> for ccp_last.
> 
> Good ;)
> 

Handled this way now.

> Thanks,
> Richard.
> 
>>> -static tree
>>> -fold_builtin_alloca_for_var (gimple stmt)
>>> +static bool
>>> +builtin_alloca_with_align_p (gimple stmt)
>>>  {
>>> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
>>> -  tree lhs, arg, block, var, elem_type, array_type;
>>> -  unsigned int align;
>>> +  tree fndecl;
>>> +  if (!is_gimple_call (stmt))
>>> +    return false;
>>> +
>>> +  fndecl = gimple_call_fndecl (stmt);
>>> +
>>> +  return (fndecl != NULL_TREE
>>> +         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
>>> +}
>>>
>>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
>>>
>>> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>>
>>> Now, users might call __builtin_alloca_with_align (8, align), thus use
>>> a non-constant alignment argument.  You either need to reject this
>>> in c-family/c-common.c:check_builtin_function_arguments with an
>>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
>>> I suppose).
>>>
>>> +DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
>>>
>>> Oh, I see you are not exposing this to users, so ignore the comments
>>> about non-constant align.

>>> Can you move this
>>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
>>> that this is used for VLAs?

Done.

>>>
>>> +                                      build_int_cstu (size_type_node,
>>> +                                                      DECL_ALIGN (parm)));
>>>
>>> size_int (DECL_ALIGN (parm)), similar in other places.
>>>

Done.

>>> Let's see if there are comments from others on this - which I like.
>>>
>>
>> Thanks,
>> - Tom
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>
>>

Also, the alloca_with_align (a,b) now appears in assembly as alloca (a, b), as
discussed in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00307.html.

Bootstrapped and regtested (including ada) on x86_64.

OK for trunk?

Thanks,
- Tom

2011-10-05  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/50527
	* tree.c (build_common_builtin_nodes): Add local_define_builtin for
	BUILT_IN_ALLOCA_WITH_ALIGN.  Mark that BUILT_IN_ALLOCA_WITH_ALIGN can
	throw.
	* builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
	arglist.  Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
	(expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	(is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-ccp.c (evaluate_stmt): Set align for
	BUILT_IN_ALLOCA_WITH_ALIGN.
	(fold_builtin_alloca_for_var): Rename to ...
	(fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd
	BUILT_IN_ALLOCA_WITH_ALIGN argument.
	(ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using
	fold_builtin_alloca_with_align.
	(optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
	DEF_BUILTIN_STUB.
	* ipa-pure-const.c (special_builtin_state): Handle
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
	(call_may_clobber_ref_p_1): Same.
	* function.c (gimplify_parameters): Lower vla to
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* gimplify.c (gimplify_vla_decl): Same.
	* cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-mudflap.c (mf_xform_statements): Same.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
	(mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
	* varasm.c (incorporeal_function_p): Same.
	* tree-object-size.c (alloc_object_size): Same.
	* gimple.c (gimple_build_call_from_tree): Same.

	* gcc.dg/pr50527.c: New test.

[-- Attachment #2: pr50527.13.patch --]
[-- Type: text/x-patch, Size: 15940 bytes --]

Index: gcc/tree.c
===================================================================
--- gcc/tree.c (revision 179210)
+++ gcc/tree.c (working copy)
@@ -9483,9 +9483,18 @@ build_common_builtin_nodes (void)
 			    "alloca", ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
     }
 
+  ftype = build_function_type_list (ptr_type_node, size_type_node,
+				    size_type_node, NULL_TREE);
+  local_define_builtin ("__builtin_alloca_with_align", ftype,
+			BUILT_IN_ALLOCA_WITH_ALIGN, "alloca",
+			ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
+
   /* If we're checking the stack, `alloca' can throw.  */
   if (flag_stack_check)
-    TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+    {
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN]) = 0;
+    }
 
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node,
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c (revision 179210)
+++ gcc/builtins.c (working copy)
@@ -4516,20 +4516,33 @@ expand_builtin_alloca (tree exp, bool ca
 {
   rtx op0;
   rtx result;
+  bool valid_arglist;
+  unsigned int align;
+  bool alloca_with_align = (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+			    == BUILT_IN_ALLOCA_WITH_ALIGN);
 
   /* Emit normal call if marked not-inlineable.  */
   if (CALL_CANNOT_INLINE_P (exp))
     return NULL_RTX;
 
-  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
+  valid_arglist
+    = (alloca_with_align
+       ? validate_arglist (exp, INTEGER_TYPE, INTEGER_TYPE, VOID_TYPE)
+       : validate_arglist (exp, INTEGER_TYPE, VOID_TYPE));
+
+  if (!valid_arglist)
     return NULL_RTX;
 
   /* Compute the argument.  */
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 
+  /* Compute the alignment.  */
+  align = (alloca_with_align
+	   ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
+	   : BIGGEST_ALIGNMENT);
+
   /* Allocate the desired space.  */
-  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
-					 cannot_accumulate);
+  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
   result = convert_memory_address (ptr_mode, result);
 
   return result;
@@ -5304,6 +5317,7 @@ expand_builtin (tree exp, rtx target, rt
       && !called_as_built_in (fndecl)
       && DECL_ASSEMBLER_NAME_SET_P (fndecl)
       && fcode != BUILT_IN_ALLOCA
+      && fcode != BUILT_IN_ALLOCA_WITH_ALIGN
       && fcode != BUILT_IN_FREE)
     return expand_call (exp, target, ignore);
 
@@ -5559,6 +5573,7 @@ expand_builtin (tree exp, rtx target, rt
 	return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);
 
     case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
       /* If the allocation stems from the declaration of a variable-sized
 	 object, it cannot accumulate.  */
       target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
@@ -13561,6 +13576,7 @@ is_inexpensive_builtin (tree decl)
       {
       case BUILT_IN_ABS:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
       case BUILT_IN_BSWAP32:
       case BUILT_IN_BSWAP64:
       case BUILT_IN_CLZ:
Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -1492,6 +1492,7 @@ evaluate_stmt (gimple stmt)
   tree simplified = NULL_TREE;
   ccp_lattice_t likelyvalue = likely_value (stmt);
   bool is_constant = false;
+  unsigned int align;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1632,10 +1633,14 @@ evaluate_stmt (gimple stmt)
 	      break;
 
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
+	      align = (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN
+		       ? TREE_INT_CST_LOW (gimple_call_arg (stmt, 1))
+		       : BIGGEST_ALIGNMENT);
 	      val.lattice_val = CONSTANT;
 	      val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	      val.mask = shwi_to_double_int
-		  	   (~(((HOST_WIDE_INT) BIGGEST_ALIGNMENT)
+		  	   (~(((HOST_WIDE_INT) align)
 			      / BITS_PER_UNIT - 1));
 	      break;
 
@@ -1685,15 +1690,15 @@ evaluate_stmt (gimple stmt)
   return val;
 }
 
-/* Detects a vla-related alloca with a constant argument.  Declares fixed-size
-   array and return the address, if found, otherwise returns NULL_TREE.  */
+/* Detects a __builtin_alloca_with_align with constant size argument.  Declares
+   fixed-size array and returns the address, if found, otherwise returns
+   NULL_TREE.  */
 
 static tree
-fold_builtin_alloca_for_var (gimple stmt)
+fold_builtin_alloca_with_align (gimple stmt)
 {
   unsigned HOST_WIDE_INT size, threshold, n_elem;
   tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;
 
   /* Get lhs.  */
   lhs = gimple_call_lhs (stmt);
@@ -1709,10 +1714,10 @@ fold_builtin_alloca_for_var (gimple stmt
 
   size = TREE_INT_CST_LOW (arg);
 
-  /* Heuristic: don't fold large vlas.  */
+  /* Heuristic: don't fold large allocas.  */
   threshold = (unsigned HOST_WIDE_INT)PARAM_VALUE (PARAM_LARGE_STACK_FRAME);
-  /* In case a vla is declared at function scope, it has the same lifetime as a
-     declared array, so we allow a larger size.  */
+  /* In case the alloca is located at function entry, it has the same lifetime
+     as a declared array, so we allow a larger size.  */
   block = gimple_block (stmt);
   if (!(cfun->after_inlining
         && TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))
@@ -1723,12 +1728,9 @@ fold_builtin_alloca_for_var (gimple stmt
   /* Declare array.  */
   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
   n_elem = size * 8 / BITS_PER_UNIT;
-  align = MIN (size * 8, BIGGEST_ALIGNMENT);
-  if (align < BITS_PER_UNIT)
-    align = BITS_PER_UNIT;
   array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type, NULL);
-  DECL_ALIGN (var) = align;
+  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
   {
     struct ptr_info_def *pi = SSA_NAME_PTR_INFO (lhs);
     if (pi != NULL && !pi->pt.anything)
@@ -1813,12 +1815,12 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
 	if (gimple_call_internal_p (stmt))
 	  return false;
 
-        /* The heuristic of fold_builtin_alloca_for_var differs before and after
-           inlining, so we don't require the arg to be changed into a constant
-           for folding, but just to be constant.  */
-        if (gimple_call_alloca_for_var_p (stmt))
+        /* The heuristic of fold_builtin_alloca_with_align differs before and
+	   after inlining, so we don't require the arg to be changed into a
+	   constant for folding, but just to be constant.  */
+        if (gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN))
           {
-            tree new_rhs = fold_builtin_alloca_for_var (stmt);
+            tree new_rhs = fold_builtin_alloca_with_align (stmt);
             if (new_rhs)
 	      {
 		bool res = update_call_from_tree (gsi, new_rhs);
@@ -2093,7 +2095,8 @@ optimize_stack_restore (gimple_stmt_iter
       if (!callee
 	  || DECL_BUILT_IN_CLASS (callee) != BUILT_IN_NORMAL
 	  /* All regular builtins are ok, just obviously not alloca.  */
-	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA)
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA_WITH_ALIGN)
 	return NULL_TREE;
 
       if (DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE)
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def (revision 179210)
+++ gcc/builtins.def (working copy)
@@ -738,6 +738,7 @@ DEF_BUILTIN_STUB (BUILT_IN_SETJMP_RECEIV
 /* Implementing variable sized local variables.  */
 DEF_BUILTIN_STUB (BUILT_IN_STACK_SAVE, "__builtin_stack_save")
 DEF_BUILTIN_STUB (BUILT_IN_STACK_RESTORE, "__builtin_stack_restore")
+DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "__builtin_alloca_with_align")
 
 /* Object size checking builtins.  */
 DEF_GCC_BUILTIN	       (BUILT_IN_OBJECT_SIZE, "object_size", BT_FN_SIZE_CONST_PTR_INT, ATTR_PURE_NOTHROW_LEAF_LIST)
Index: gcc/ipa-pure-const.c
===================================================================
--- gcc/ipa-pure-const.c (revision 179210)
+++ gcc/ipa-pure-const.c (working copy)
@@ -437,6 +437,7 @@ special_builtin_state (enum pure_const_s
 	case BUILT_IN_RETURN:
 	case BUILT_IN_UNREACHABLE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_EH_POINTER:
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c (revision 179210)
+++ gcc/tree-ssa-alias.c (working copy)
@@ -1231,6 +1231,7 @@ ref_maybe_used_by_call_p_1 (gimple call,
 	case BUILT_IN_MALLOC:
 	case BUILT_IN_CALLOC:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_MEMSET:
@@ -1513,6 +1514,7 @@ call_may_clobber_ref_p_1 (gimple call, a
 	  return false;
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_ASSUME_ALIGNED:
 	  return false;
 	/* Freeing memory kills the pointed-to memory.  More importantly
Index: gcc/function.c
===================================================================
--- gcc/function.c (revision 179210)
+++ gcc/function.c (working copy)
@@ -3638,8 +3638,10 @@ gimplify_parameters (void)
 		  DECL_IGNORED_P (addr) = 0;
 		  local = build_fold_indirect_ref (addr);
 
-		  t = built_in_decls[BUILT_IN_ALLOCA];
-		  t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm));
+		  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+		  t = build_call_expr (t, 2, DECL_SIZE_UNIT (parm),
+				       size_int (DECL_ALIGN (parm)));
+
 		  /* The call has been built for a variable-sized object.  */
 		  CALL_ALLOCA_FOR_VAR_P (t) = 1;
 		  t = fold_convert (ptr_type, t);
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c (revision 179210)
+++ gcc/gimplify.c (working copy)
@@ -1329,8 +1329,9 @@ gimplify_vla_decl (tree decl, gimple_seq
   SET_DECL_VALUE_EXPR (decl, t);
   DECL_HAS_VALUE_EXPR_P (decl) = 1;
 
-  t = built_in_decls[BUILT_IN_ALLOCA];
-  t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+  t = build_call_expr (t, 2, DECL_SIZE_UNIT (decl),
+		       size_int (DECL_ALIGN (decl)));
   /* The call has been built for a variable-sized object.  */
   CALL_ALLOCA_FOR_VAR_P (t) = 1;
   t = fold_convert (ptr_type, t);
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c (revision 179210)
+++ gcc/cfgexpand.c (working copy)
@@ -1858,7 +1858,8 @@ expand_call_stmt (gimple stmt)
   CALL_EXPR_RETURN_SLOT_OPT (exp) = gimple_call_return_slot_opt_p (stmt);
   if (decl
       && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     CALL_ALLOCA_FOR_VAR_P (exp) = gimple_call_alloca_for_var_p (stmt);
   else
     CALL_FROM_THUNK_P (exp) = gimple_call_from_thunk_p (stmt);
Index: gcc/tree-mudflap.c
===================================================================
--- gcc/tree-mudflap.c (revision 179210)
+++ gcc/tree-mudflap.c (working copy)
@@ -969,7 +969,9 @@ mf_xform_statements (void)
             case GIMPLE_CALL:
               {
                 tree fndecl = gimple_call_fndecl (s);
-                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA))
+                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+			       || (DECL_FUNCTION_CODE (fndecl)
+				   == BUILT_IN_ALLOCA_WITH_ALIGN)))
                   gimple_call_set_cannot_inline (s, true);
               }
               break;
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c (revision 179210)
+++ gcc/tree-ssa-dce.c (working copy)
@@ -308,6 +308,7 @@ mark_stmt_if_obviously_necessary (gimple
 	    case BUILT_IN_MALLOC:
 	    case BUILT_IN_CALLOC:
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
 	      return;
 
 	    default:;
@@ -639,6 +640,7 @@ mark_all_reaching_defs_necessary_1 (ao_r
 	  case BUILT_IN_MALLOC:
 	  case BUILT_IN_CALLOC:
 	  case BUILT_IN_ALLOCA:
+	  case BUILT_IN_ALLOCA_WITH_ALIGN:
 	  case BUILT_IN_FREE:
 	    return false;
 
@@ -890,6 +892,8 @@ propagate_necessity (struct edge_list *e
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_FREE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_VA_END
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+		      || (DECL_FUNCTION_CODE (callee)
+			  == BUILT_IN_ALLOCA_WITH_ALIGN)
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c (revision 179210)
+++ gcc/varasm.c (working copy)
@@ -2104,7 +2104,8 @@ incorporeal_function_p (tree decl)
       const char *name;
 
       if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-	  && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+	  && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	      || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
 	return true;
 
       name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
Index: gcc/tree-object-size.c
===================================================================
--- gcc/tree-object-size.c (revision 179210)
+++ gcc/tree-object-size.c (working copy)
@@ -411,6 +411,7 @@ alloc_object_size (const_gimple call, in
 	/* fall through */
       case BUILT_IN_MALLOC:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
 	arg1 = 0;
       default:
 	break;
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c (revision 179210)
+++ gcc/gimple.c (working copy)
@@ -374,7 +374,8 @@ gimple_build_call_from_tree (tree t)
   gimple_call_set_return_slot_opt (call, CALL_EXPR_RETURN_SLOT_OPT (t));
   if (fndecl
       && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     gimple_call_set_alloca_for_var (call, CALL_ALLOCA_FOR_VAR_P (t));
   else
     gimple_call_set_from_thunk (call, CALL_FROM_THUNK_P (t));
Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline))
+bar (char *a)
+{
+}
+
+void __attribute__((noinline))
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
  2011-10-05 21:15                 ` Tom de Vries
@ 2011-10-06 10:08                   ` Richard Guenther
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Guenther @ 2011-10-06 10:08 UTC (permalink / raw)
  To: Tom de Vries
  Cc: gcc-patches >> "gcc-patches@gcc.gnu.org",
	Richard Henderson

On Wed, Oct 5, 2011 at 11:07 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 10/05/2011 10:46 AM, Richard Guenther wrote:
>> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>>>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I got a patch for PR50527.
>>>>>>>>>>
>>>>>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>>>>> the alloca.
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>>>>
>>>>>>>>>> OK for trunk?
>>>>>>>>>
>>>>>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>>>>
>>>>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>>>>
>>>>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>>>>>> possibly be the better choice.
>>>>>>>>>
>>>>>>>>> Any other thoughts?
>>>>>>>>
>>>>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>>>>> large vlas.
>>>>>>>>
>>>>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>>>>
>>>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>>>>
>>>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>>>>> not fold during ccp3.
>>>>>>>
>>>>>>> Ugh, somehow I like this the least ;)
>>>>>>>
>>>>>>> How about lowering VLAs to
>>>>>>>
>>>>>>>   p = __builtin_alloca (...);
>>>>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>>>>
>>>>>>> and not assume anything for alloca itself if it feeds a
>>>>>>> __builtin_assume_aligned?
>>>>>>>
>>>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>>>>
>>>>>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>>>>
>>>>>>> that's less awkward to use?
>>>>>>>
>>>>>>> Sorry for not having a clear plan here ;)
>>>>>>>
>>>>>>
>>>>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>>>>> indeed harder to use.
>>>>>>
>>>>>> Another possibility is to add a 'tree vla_decl' field to struct
>>>>>> gimple_statement_call, which is probably the easiest to implement.
>>>>>>
>>>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>>>>> decided to try this one. Attached patch implements my first stab at this  (now
>>>>>> testing on x86_64).
>>>>>>
>>>>>> Is this an acceptable approach?
>>>>>>
>>>>>
>>>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>>>
>>>>> Ok for trunk?
>>>>
>>>> The idea is ok I think.  But
>>>>
>>>>      case BUILT_IN_ALLOCA:
>>>> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>>        /* If the allocation stems from the declaration of a variable-sized
>>>>          object, it cannot accumulate.  */
>>>>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>>>        if (target)
>>>>         return target;
>>>> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>>>> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
>>>> +       {
>>>> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>>>> +                                              built_in_decls[BUILT_IN_ALLOCA],
>>>> +                                              1, CALL_EXPR_ARG (exp, 0));
>>>> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>>>> +         exp = new_call;
>>>> +       }
>>>>
>>>> Ick.  Why can't the rest of the compiler not handle
>>>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>>>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>>>
>>>
>>> We can set the assembler name in the local_define_builtin call. But that will
>>> still create a call alloca (12, 4). How do we deal with the second argument?
>>>
>>>> I don't see why you still need the special late CCP pass.
>>>>
>>>
>>> For alloca_with_align, the alignment will be at least its 2nd argument.  This is
>>> independent of folding, and we can propagate this information in every ccp.
>>>
>>> If the alloca_with_align is not folded and will not be folded anymore (something
>>> we know at the earliest after the propagation phase of the last ccp),
>>> the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
>>> normal alloca. We try to exploit this by improving the ptr_info->align.
>>
>> I'd just assume the (lower) constant alignment of the argument always.
>>
>>> Well, that was the idea. But now I wonder, isn't it better to do this in
>>> expand_builtin_alloca:
>>> ...
>>>   /* Compute the argument.  */
>>>   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>>>
>>> +  align =
>>> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
>>> +     BUILT_IN_ALLOCA_WITH_ALIGN)
>>> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
>>> +    : BIGGEST_ALIGNMENT;
>>> +
>>> +
>>>   /* Allocate the desired space.  */
>>> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
>>> -                                        cannot_accumulate);
>>> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
>>>   result = convert_memory_address (ptr_mode, result);
>>
>> Yes, that's probably a good idea anyway - it will reduce excess
>> stack re-alignment for VLAs.
>>
>>>   return result;
>>>
>>> ...
>>>
>>> This way, we lower the alignment requirement for vlas in general, not only when
>>> folding (unless we expand to an actual alloca call).
>>>
>>> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
>>> until expand, so we can't exploit that in the middle-end, so we lose the need
>>> for ccp_last.
>>
>> Good ;)
>>
>
> Handled this way now.
>
>> Thanks,
>> Richard.
>>
>>>> -static tree
>>>> -fold_builtin_alloca_for_var (gimple stmt)
>>>> +static bool
>>>> +builtin_alloca_with_align_p (gimple stmt)
>>>>  {
>>>> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
>>>> -  tree lhs, arg, block, var, elem_type, array_type;
>>>> -  unsigned int align;
>>>> +  tree fndecl;
>>>> +  if (!is_gimple_call (stmt))
>>>> +    return false;
>>>> +
>>>> +  fndecl = gimple_call_fndecl (stmt);
>>>> +
>>>> +  return (fndecl != NULL_TREE
>>>> +         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
>>>> +}
>>>>
>>>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
>>>>
>>>> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>>>
>>>> Now, users might call __builtin_alloca_with_align (8, align), thus use
>>>> a non-constant alignment argument.  You either need to reject this
>>>> in c-family/c-common.c:check_builtin_function_arguments with an
>>>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
>>>> I suppose).
>>>>
>>>> +DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
>>>>
>>>> Oh, I see you are not exposing this to users, so ignore the comments
>>>> about non-constant align.
>
>>>> Can you move this
>>>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
>>>> that this is used for VLAs?
>
> Done.
>
>>>>
>>>> +                                      build_int_cstu (size_type_node,
>>>> +                                                      DECL_ALIGN (parm)));
>>>>
>>>> size_int (DECL_ALIGN (parm)), similar in other places.
>>>>
>
> Done.
>
>>>> Let's see if there are comments from others on this - which I like.
>>>>
>>>
>>> Thanks,
>>> - Tom
>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>
>>>
>
> Also, the alloca_with_align (a,b) now appears in assembly as alloca (a, b), as
> discussed in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00307.html.
>
> Bootstrapped and regtested (including ada) on x86_64.
>
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2011-10-05  Tom de Vries  <tom@codesourcery.com>
>
>        PR middle-end/50527
>        * tree.c (build_common_builtin_nodes): Add local_define_builtin for
>        BUILT_IN_ALLOCA_WITH_ALIGN.  Mark that BUILT_IN_ALLOCA_WITH_ALIGN can
>        throw.
>        * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
>        arglist.  Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
>        (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-ssa-ccp.c (evaluate_stmt): Set align for
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        (fold_builtin_alloca_for_var): Rename to ...
>        (fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd
>        BUILT_IN_ALLOCA_WITH_ALIGN argument.
>        (ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using
>        fold_builtin_alloca_with_align.
>        (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
>        DEF_BUILTIN_STUB.
>        * ipa-pure-const.c (special_builtin_state): Handle
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
>        (call_may_clobber_ref_p_1): Same.
>        * function.c (gimplify_parameters): Lower vla to
>        BUILT_IN_ALLOCA_WITH_ALIGN.
>        * gimplify.c (gimplify_vla_decl): Same.
>        * cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
>        * tree-mudflap.c (mf_xform_statements): Same.
>        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
>        (mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
>        * varasm.c (incorporeal_function_p): Same.
>        * tree-object-size.c (alloc_object_size): Same.
>        * gimple.c (gimple_build_call_from_tree): Same.
>
>        * gcc.dg/pr50527.c: New test.
>


end of thread, other threads:[~2011-10-06  9:48 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-28 10:56 [PATCH, PR50527] Don't assume alignment of vla-related allocas Tom de Vries
2011-09-28 11:35 ` Richard Guenther
2011-09-29 15:11   ` Tom de Vries
2011-09-30 14:35     ` Richard Guenther
2011-10-01 15:44       ` Tom de Vries
2011-10-04  7:50         ` Tom de Vries
2011-10-04 13:06           ` Richard Guenther
2011-10-04 16:30             ` Tom de Vries
2011-10-04 18:54               ` Richard Henderson
2011-10-04 20:19                 ` Tom de Vries
2011-10-04 20:59                   ` Richard Henderson
2011-10-04 21:36                     ` Tom de Vries
2011-10-04 23:13                       ` Tom de Vries
2011-10-05  8:47                     ` Richard Guenther
2011-10-05  8:54               ` Richard Guenther
2011-10-05 21:15                 ` Tom de Vries
2011-10-06 10:08                   ` Richard Guenther
