public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Implement -fstack-usage option
@ 2010-08-05 15:42 Eric Botcazou
  2010-08-05 19:13 ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Botcazou @ 2010-08-05 15:42 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3784 bytes --]

Hi,

this is again something that has been for a while in our tree.  The command 
line option -fstack-usage is added and makes the compiler generate a second 
output file (xxx.su) for each compilation unit, which contains one line per 
function emitted in the assembly file and describing the stack usage of the 
function.  The exact format is documented in the manual.  The primary goal is 
for the data to be conservatively correct, in particular to give an upper 
bound of the actual stack usage at run time.

Implementation-wise, there is nothing sophisticated, but various quirks of 
expand_call and allocate_dynamic_stack_space need to be properly handled.
It's worth noting that this is implemented only for 7 architectures (Alpha, 
x86/x86-64, IA-64, MIPS, PA, PowerPC and SPARC) but extending it to others 
should be a trivial task for the respective maintainers.

Tested on x86-64-suse-linux, OK for mainline?  If so, what form(s) of tests 
should be written?


2010-08-05  Eric Botcazou  <ebotcazou@adacore.com>

	Stack usage support
	* common.opt (-fstack-usage): New option.
	* doc/invoke.texi (Debugging options): Document it.
	* builtins.c (expand_builtin_apply): Pass TRUE as 4th argument to
	allocate_dynamic_stack_space.
	(expand_builtin_alloca): Add 4th bool parameter CANNOT_ACCUMULATE
	and propagate it to allocate_dynamic_stack_space.
	(expand_builtin) <BUILT_IN_ALLOCA>: Adjust for above change.
	* calls.c (initialize_argument_information): Pass TRUE as 4th
	argument to allocate_dynamic_stack_space.
	(expand_call): Set current_function_has_unbounded_dynamic_stack_size
	to 1 when pushing a variable-sized argument onto the stack.  Pass
	TRUE as 4th argument to allocate_dynamic_stack_space.
	Update current_function_pushed_stack_size.
	(emit_library_call_value_1): Likewise.
	* explow.c (allocate_dynamic_stack_space): Add 4th bool parameter
	CANNOT_ACCUMULATE.  If flag_stack_usage, look into the size and
	attempt to find an upper bound.  Remove redundant code for the
	SETJMP_VIA_SAVE_AREA case.
	* expr.h (allocate_dynamic_stack_space): Add 4th bool parameter.
	* function.h (struct stack_usage): New structure.
	(current_function_static_stack_size): New macro.
	(current_function_dynamic_stack_size): Likewise.
	(current_function_pushed_stack_size): Likewise.
	(current_function_dynamic_alloc_count): Likewise.
	(current_function_has_unbounded_dynamic_stack_size): Likewise.
	(current_function_allocates_dynamic_stack_space): Likewise.
	(struct function): Add new field 'su'.
	* function.c (instantiate_virtual_regs): If SETJMP_VIA_SAVE_AREA,
	add the value of the dynamic offset to the dynamic stack usage.
	(prepare_function_start): Allocate cfun->su if flag_stack_usage.
	(rest_of_handle_thread_prologue_and_epilogue): Call output_stack_usage.
	* gimplify.c (gimplify_decl_expr): Set ALLOCA_FOR_VAR_P on the call
	to BUILT_IN_ALLOCA for variable-sized objects.
	* output.h (output_stack_usage): Declare.
	* toplev.c (stack_usage_file): New file pointer.
	(output_stack_usage): New function.
	(open_auxiliary_file): Likewise.
	(lang_dependent_init): Open file if flag_stack_usage is set.
	(finalize): Close file if stack_usage_file is not null.
	* tree.h (ALLOCA_FOR_VAR_P): New macro.
	* config/alpha/alpha.c (compute_frame_size): New function.
	(alpha_expand_prologue): Use it.
	(alpha_start_function): Likewise.
	(alpha_expand_epilogue): Likewise.
	Set current_function_static_stack_size.
	* config/i386/i386.c (ix86_expand_prologue): Likewise.
	* config/ia64/ia64.c (ia64_expand_prologue): Likewise.
	* config/mips/mips.c (mips_expand_prologue): Likewise.
	* config/pa/pa.c (hppa_expand_prologue): Likewise.
	* config/rs6000/rs6000.c (rs6000_emit_prologue): Likewise.
	* config/sparc/sparc.c (sparc_expand_prologue): Likewise.


-- 
Eric Botcazou

[-- Attachment #2: p46.diff --]
[-- Type: text/x-diff, Size: 32137 bytes --]

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 162897)
+++ doc/invoke.texi	(working copy)
@@ -313,7 +313,7 @@ Objective-C and Objective-C++ Dialects}.
 -fmem-report -fpre-ipa-mem-report -fpost-ipa-mem-report -fprofile-arcs @gol
 -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
 -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
--ftest-coverage  -ftime-report -fvar-tracking @gol
+-fstack-usage  -ftest-coverage  -ftime-report -fvar-tracking @gol
 -fvar-tracking-assignments  -fvar-tracking-assignments-toggle @gol
 -g  -g@var{level}  -gtoggle  -gcoff  -gdwarf-@var{version} @gol
 -ggdb  -gstabs  -gstabs+  -gstrict-dwarf  -gno-strict-dwarf @gol
@@ -4839,6 +4839,39 @@ allocation when it finishes.
 Makes the compiler print some statistics about permanent memory
 allocation before or after interprocedural optimization.
 
+@item -fstack-usage
+@opindex fstack-usage
+Makes the compiler output stack usage information for the program, on a
+per-function basis.  The filename for the dump is made by appending
+@file{.su} to the AUXNAME.  AUXNAME is generated from the name of
+the output file, if explicitly specified and it is not an executable,
+otherwise it is the basename of the source file.  An entry is made up
+of three fields:
+
+@itemize
+@item
+The name of the function.
+@item
+A number of bytes.
+@item
+One or more qualifiers: @code{static}, @code{dynamic}, @code{bounded}.
+@end itemize
+
+The qualifier @code{static} means that the function manipulates the stack
+statically: a fixed number of bytes are allocated for the frame on function
+entry and released on function exit; no stack adjustments are otherwise made
+in the function.  The second field is this fixed number of bytes.
+
+The qualifier @code{dynamic} means that the function manipulates the stack
+dynamically: in addition to the static allocation described above, stack
+adjustments are made in the body of the function, for example to push/pop
+arguments around function calls.  If the qualifier @code{bounded} is also
+present, the amount of these adjustments is bounded at compile-time and
+the second field is an upper bound of the total amount of stack used by
+the function.  If it is not present, the amount of these adjustments is
+not bounded at compile-time and the second field only represents the
+bounded part.
+
 @item -fprofile-arcs
 @opindex fprofile-arcs
 Add code so that program flow @dfn{arcs} are instrumented.  During
Index: tree.h
===================================================================
--- tree.h	(revision 162897)
+++ tree.h	(working copy)
@@ -511,7 +511,8 @@ struct GTY(()) tree_common {
            BLOCK
            all decls
 
-       CALL_FROM_THUNK_P in
+       CALL_FROM_THUNK_P and
+       ALLOCA_FOR_VAR_P in
            CALL_EXPR
 
    side_effects_flag:
@@ -1318,6 +1319,10 @@ extern void omp_clause_range_check_faile
    thunked-to function.  */
 #define CALL_FROM_THUNK_P(NODE) (CALL_EXPR_CHECK (NODE)->base.protected_flag)
 
+/* In a CALL_EXPR, if the function being called is BUILT_IN_ALLOCA, means that
+   it has been built for the declaration of a variable-sized object.  */
+#define ALLOCA_FOR_VAR_P(NODE) (CALL_EXPR_CHECK (NODE)->base.protected_flag)
+
 /* In a type, nonzero means that all objects of the type are guaranteed by the
    language or front-end to be properly aligned, so we can indicate that a MEM
    of this type is aligned at least to the alignment of the type, even if it
Index: builtins.c
===================================================================
--- builtins.c	(revision 162897)
+++ builtins.c	(working copy)
@@ -132,7 +132,7 @@ static rtx expand_builtin_memset (tree,
 static rtx expand_builtin_memset_args (tree, tree, tree, rtx, enum machine_mode, tree);
 static rtx expand_builtin_bzero (tree);
 static rtx expand_builtin_strlen (tree, rtx, enum machine_mode);
-static rtx expand_builtin_alloca (tree, rtx);
+static rtx expand_builtin_alloca (tree, rtx, bool);
 static rtx expand_builtin_unop (enum machine_mode, tree, rtx, rtx, optab);
 static rtx expand_builtin_frame_address (tree, tree);
 static tree stabilize_va_list_loc (location_t, tree, int);
@@ -1520,8 +1520,10 @@ expand_builtin_apply (rtx function, rtx
     emit_stack_save (SAVE_BLOCK, &old_stack_level, NULL_RTX);
 
   /* Allocate a block of memory onto the stack and copy the memory
-     arguments to the outgoing arguments address.  */
-  allocate_dynamic_stack_space (argsize, 0, BITS_PER_UNIT);
+     arguments to the outgoing arguments address.  We can pass TRUE
+     as the 4th argument because we just saved the stack pointer
+     and will restore it right after the call.  */
+  allocate_dynamic_stack_space (argsize, 0, BITS_PER_UNIT, TRUE);
 
   /* Set DRAP flag to true, even though allocate_dynamic_stack_space
      may have already set current_function_calls_alloca to true.
@@ -4881,12 +4883,13 @@ expand_builtin_frame_address (tree fndec
     }
 }
 
-/* Expand EXP, a call to the alloca builtin.  Return NULL_RTX if
-   we failed and the caller should emit a normal call, otherwise try to get
-   the result in TARGET, if convenient.  */
+/* Expand EXP, a call to the alloca builtin.  Return NULL_RTX if we
+   failed and the caller should emit a normal call, otherwise try to
+   get the result in TARGET, if convenient.  CANNOT_ACCUMULATE is the
+   same as for allocate_dynamic_stack_space. */
 
 static rtx
-expand_builtin_alloca (tree exp, rtx target)
+expand_builtin_alloca (tree exp, rtx target, bool cannot_accumulate)
 {
   rtx op0;
   rtx result;
@@ -4902,7 +4905,8 @@ expand_builtin_alloca (tree exp, rtx tar
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 
   /* Allocate the desired space.  */
-  result = allocate_dynamic_stack_space (op0, target, BITS_PER_UNIT);
+  result = allocate_dynamic_stack_space (op0, target, BITS_PER_UNIT,
+					 cannot_accumulate);
   result = convert_memory_address (ptr_mode, result);
 
   return result;
@@ -5937,7 +5941,9 @@ expand_builtin (tree exp, rtx target, rt
 	return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);
 
     case BUILT_IN_ALLOCA:
-      target = expand_builtin_alloca (exp, target);
+      /* If the allocation stems from the declaration of a variable-sized
+	 object, it cannot accumulate.  */
+      target = expand_builtin_alloca (exp, target, ALLOCA_FOR_VAR_P (exp));
       if (target)
 	return target;
       break;
Index: toplev.c
===================================================================
--- toplev.c	(revision 162897)
+++ toplev.c	(working copy)
@@ -350,6 +350,7 @@ static const param_info lang_independent
 
 FILE *asm_out_file;
 FILE *aux_info_file;
+FILE *stack_usage_file = NULL;
 FILE *dump_file = NULL;
 const char *dump_file_name;
 
@@ -1558,6 +1559,88 @@ alloc_for_identifier_to_locale (size_t l
   return ggc_alloc_atomic (len);
 }
 
+/* Output stack usage information.  */
+void
+output_stack_usage (void)
+{
+  static bool warning_issued = false;
+  enum stack_usage_kind_type { STATIC = 0, DYNAMIC, DYNAMIC_BOUNDED };
+  const char *stack_usage_kind_str[] = {
+    "static",
+    "dynamic",
+    "dynamic,bounded"
+  };
+  HOST_WIDE_INT stack_usage = current_function_static_stack_size;
+  enum stack_usage_kind_type stack_usage_kind;
+  expanded_location loc;
+  const char *raw_id, *id;
+
+  if (stack_usage < 0)
+    {
+      if (!warning_issued)
+	{
+	  warning (0, "-fstack-usage not supported for this target");
+	  warning_issued = true;
+	}
+      return;
+    }
+
+  stack_usage_kind = STATIC;
+
+  /* Add the maximum amount of space pushed onto the stack.  */
+  if (current_function_pushed_stack_size > 0)
+    {
+      stack_usage += current_function_pushed_stack_size;
+      stack_usage_kind = DYNAMIC_BOUNDED;
+    }
+
+  /* Now on to the tricky part: dynamic stack allocation.  */
+  if (current_function_allocates_dynamic_stack_space)
+    {
+      if (current_function_has_unbounded_dynamic_stack_size)
+	stack_usage_kind = DYNAMIC;
+      else
+	stack_usage_kind = DYNAMIC_BOUNDED;
+
+      /* Add the size even in the unbounded case, this can't hurt.  */
+      stack_usage += current_function_dynamic_stack_size;
+    }
+
+  loc = expand_location (DECL_SOURCE_LOCATION (current_function_decl));
+
+  /* Strip the scope prefix if any.  */
+  raw_id = lang_hooks.decl_printable_name (current_function_decl, 2);
+  id = strrchr (raw_id, '.');
+  if (id)
+    id++;
+  else
+    id = raw_id;
+
+  fprintf (stack_usage_file,
+	   "%s:%d:%d:%s\t"HOST_WIDE_INT_PRINT_DEC"\t%s\n",
+	   basename (loc.file),
+	   loc.line,
+	   loc.column,
+	   id,
+	   stack_usage,
+	   stack_usage_kind_str[stack_usage_kind]);
+}
+
+/* Open an auxiliary output file.  */
+static FILE *
+open_auxiliary_file (const char *ext)
+{
+  char *filename;
+  FILE *file;
+
+  filename = concat (aux_base_name, ".", ext, NULL);
+  file = fopen (filename, "w");
+  if (!file)
+    fatal_error ("can't open %s for writing: %m", filename);
+  free (filename);
+  return file;
+}
+
 /* Initialization of the front end environment, before command line
    options are parsed.  Signal handlers, internationalization etc.
    ARGV0 is main's argv[0].  */
@@ -2173,6 +2256,10 @@ lang_dependent_init (const char *name)
 
   init_asm_output (name);
 
+  /* If stack usage information is desired, open the output file.  */
+  if (flag_stack_usage)
+    stack_usage_file = open_auxiliary_file ("su");
+
   /* This creates various _DECL nodes, so needs to be called after the
      front end is initialized.  */
   init_eh ();
@@ -2254,6 +2341,9 @@ finalize (void)
 	unlink_if_ordinary (asm_file_name);
     }
 
+  if (stack_usage_file)
+    fclose (stack_usage_file);
+
   statistics_fini ();
   finish_optimization_passes ();
 
Index: expr.h
===================================================================
--- expr.h	(revision 162897)
+++ expr.h	(working copy)
@@ -641,9 +641,8 @@ extern void emit_stack_restore (enum sav
 /* Invoke emit_stack_save for the nonlocal_goto_save_area.  */
 extern void update_nonlocal_goto_save_area (void);
 
-/* Allocate some space on the stack dynamically and return its address.  An rtx
-   says how many bytes.  */
-extern rtx allocate_dynamic_stack_space (rtx, rtx, int);
+/* Allocate some space on the stack dynamically and return its address.  */
+extern rtx allocate_dynamic_stack_space (rtx, rtx, int, bool);
 
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
Index: function.c
===================================================================
--- function.c	(revision 162897)
+++ function.c	(working copy)
@@ -1899,6 +1899,18 @@ instantiate_virtual_regs (void)
   /* Indicate that, from now on, assign_stack_local should use
      frame_pointer_rtx.  */
   virtuals_instantiated = 1;
+
+  /* See allocate_dynamic_stack_space for the rationale.  */
+#ifdef SETJMP_VIA_SAVE_AREA
+  if (flag_stack_usage && cfun->calls_setjmp)
+    {
+      int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
+      dynamic_offset = (dynamic_offset + align - 1) / align * align;
+      current_function_dynamic_stack_size
+	+= current_function_dynamic_alloc_count * dynamic_offset;
+    }
+#endif
+
   return 0;
 }
 
@@ -4334,6 +4346,12 @@ prepare_function_start (void)
   init_expr ();
   default_rtl_profile ();
 
+  if (flag_stack_usage)
+    {
+      cfun->su = ggc_alloc_cleared_stack_usage ();
+      cfun->su->static_stack_size = -1;
+    }
+
   cse_not_expected = ! optimize;
 
   /* Caller save not needed yet.  */
@@ -5722,12 +5740,17 @@ rest_of_handle_thread_prologue_and_epilo
 {
   if (optimize)
     cleanup_cfg (CLEANUP_EXPENSIVE);
+
   /* On some machines, the prologue and epilogue code, or parts thereof,
      can be represented as RTL.  Doing so lets us schedule insns between
      it and the rest of the code and also allows delayed branch
      scheduling to operate in the epilogue.  */
-
   thread_prologue_and_epilogue_insns ();
+
+  /* The stack usage info is finalized during prologue expansion.  */
+  if (flag_stack_usage)
+    output_stack_usage ();
+
   return 0;
 }
 
Index: function.h
===================================================================
--- function.h	(revision 162897)
+++ function.h	(working copy)
@@ -468,6 +468,37 @@ extern GTY(()) struct rtl_data x_rtl;
    want to do differently.  */
 #define crtl (&x_rtl)
 
+struct GTY(()) stack_usage
+{
+  /* # of bytes of static stack space allocated by the function.  */
+  HOST_WIDE_INT static_stack_size;
+
+  /* # of bytes of dynamic stack space allocated by the function.  This is
+     meaningful only if has_unbounded_dynamic_stack_size is zero.  */
+  HOST_WIDE_INT dynamic_stack_size;
+
+  /* # of bytes of space pushed onto the stack after the prologue.  If
+     !ACCUMULATE_OUTGOING_ARGS, it contains the outgoing arguments.  */
+  int pushed_stack_size;
+
+  /* # of dynamic allocations in the function.  */
+  unsigned int dynamic_alloc_count : 31;
+
+  /* Nonzero if the amount of stack space allocated dynamically cannot
+     be bounded at compile-time.  */
+  unsigned int has_unbounded_dynamic_stack_size : 1;
+};
+
+#define current_function_static_stack_size (cfun->su->static_stack_size)
+#define current_function_dynamic_stack_size (cfun->su->dynamic_stack_size)
+#define current_function_pushed_stack_size (cfun->su->pushed_stack_size)
+#define current_function_dynamic_alloc_count (cfun->su->dynamic_alloc_count)
+#define current_function_has_unbounded_dynamic_stack_size \
+  (cfun->su->has_unbounded_dynamic_stack_size)
+#define current_function_allocates_dynamic_stack_space    \
+  (current_function_dynamic_stack_size != 0               \
+   || current_function_has_unbounded_dynamic_stack_size)
+
 /* This structure can save all the important global and static variables
    describing the status of the current function.  */
 
@@ -486,6 +517,9 @@ struct GTY(()) function {
   /* The loops in this function.  */
   struct loops *x_current_loops;
 
+  /* The stack usage of this function.  */
+  struct stack_usage *su;
+
   /* Value histograms attached to particular statements.  */
   htab_t GTY((skip)) value_histograms;
 
Index: gimplify.c
===================================================================
--- gimplify.c	(revision 162897)
+++ gimplify.c	(working copy)
@@ -1329,6 +1329,8 @@ gimplify_vla_decl (tree decl, gimple_seq
 
   t = built_in_decls[BUILT_IN_ALLOCA];
   t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+  /* The call has been built for a variable-sized object.  */
+  ALLOCA_FOR_VAR_P (t) = 1;
   t = fold_convert (ptr_type, t);
   t = build2 (MODIFY_EXPR, TREE_TYPE (addr), addr, t);
 
Index: calls.c
===================================================================
--- calls.c	(revision 162897)
+++ calls.c	(working copy)
@@ -1091,9 +1091,13 @@ initialize_argument_information (int num
 		      pending_stack_adjust = 0;
 		    }
 
+		  /* We can pass TRUE as the 4th argument because we just
+		     saved the stack pointer and will restore it right after
+		     the call.  */
 		  copy = gen_rtx_MEM (BLKmode,
 				      allocate_dynamic_stack_space
-				      (size_rtx, NULL_RTX, TYPE_ALIGN (type)));
+				      (size_rtx, NULL_RTX,
+				       TYPE_ALIGN (type), TRUE));
 		  set_mem_attributes (copy, type, 1);
 		}
 	      else
@@ -2488,6 +2492,8 @@ expand_call (tree exp, rtx target, int i
 	      stack_arg_under_construction = 0;
 	    }
 	  argblock = push_block (ARGS_SIZE_RTX (adjusted_args_size), 0, 0);
+	  if (flag_stack_usage)
+	    current_function_has_unbounded_dynamic_stack_size = 1;
 	}
       else
 	{
@@ -2649,8 +2655,11 @@ expand_call (tree exp, rtx target, int i
 		  stack_usage_map = stack_usage_map_buf;
 		  highest_outgoing_arg_in_use = 0;
 		}
+	      /* We can pass TRUE as the 4th argument because we just
+		 saved the stack pointer and will restore it right after
+		 the call.  */
 	      allocate_dynamic_stack_space (push_size, NULL_RTX,
-					    BITS_PER_UNIT);
+					    BITS_PER_UNIT, TRUE);
 	    }
 
 	  /* If argument evaluation might modify the stack pointer,
@@ -2690,6 +2699,19 @@ expand_call (tree exp, rtx target, int i
 	 be deferred during the evaluation of the arguments.  */
       NO_DEFER_POP;
 
+      if (flag_stack_usage
+	  && !ACCUMULATE_OUTGOING_ARGS && pass && adjusted_args_size.var == 0)
+	{
+	  int pushed = adjusted_args_size.constant + pending_stack_adjust;
+
+	  /* Record the maximum pushed stack space size.  We need to
+	     delay it this far to take into account the optimization
+	     done by combine_pending_stack_adjustment_and_call.  */
+
+	  if (pushed > current_function_pushed_stack_size)
+	    current_function_pushed_stack_size = pushed;
+	}
+
       funexp = rtx_for_function_call (fndecl, addr);
 
       /* Figure out the register where the value, if any, will come back.  */
@@ -3547,6 +3569,13 @@ emit_library_call_value_1 (int retval, r
   if (args_size.constant > crtl->outgoing_args_size)
     crtl->outgoing_args_size = args_size.constant;
 
+  if (flag_stack_usage && !ACCUMULATE_OUTGOING_ARGS)
+    {
+      int pushed = args_size.constant + pending_stack_adjust;
+      if (pushed > current_function_pushed_stack_size)
+	current_function_pushed_stack_size = pushed;
+    }
+
   if (ACCUMULATE_OUTGOING_ARGS)
     {
       /* Since the stack pointer will never be pushed, it is possible for
Index: explow.c
===================================================================
--- explow.c	(revision 162897)
+++ explow.c	(working copy)
@@ -1110,11 +1110,23 @@ update_nonlocal_goto_save_area (void)
    SIZE is an rtx representing the size of the area.
    TARGET is a place in which the address can be placed.
 
-   KNOWN_ALIGN is the alignment (in bits) that we know SIZE has.  */
+   KNOWN_ALIGN is the alignment (in bits) that we know SIZE has.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code (that always exists
+   if the function does not use the depressed stack pointer mechanism).  */
 
 rtx
-allocate_dynamic_stack_space (rtx size, rtx target, int known_align)
+allocate_dynamic_stack_space (rtx size, rtx target, int known_align,
+			      bool cannot_accumulate)
 {
+  HOST_WIDE_INT stack_usage_size = -1;
+  bool known_align_valid = true;
+
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
      address anyway.  */
@@ -1124,6 +1136,37 @@ allocate_dynamic_stack_space (rtx size,
   /* Otherwise, show we're calling alloca or equivalent.  */
   cfun->calls_alloca = 1;
 
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage)
+    {
+      if (GET_CODE (size) == CONST_INT)
+	stack_usage_size = INTVAL (size);
+      else if (GET_CODE (size) == REG)
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx insn, set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (GET_CODE (SET_SRC (set)) == CONST_INT)
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && GET_CODE (XEXP (note, 0)) == CONST_INT)
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
+	}
+    }
+
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
@@ -1153,10 +1196,17 @@ allocate_dynamic_stack_space (rtx size,
 #endif
 
   if (MUST_ALIGN)
-    size
-      = force_operand (plus_constant (size,
-				      BIGGEST_ALIGNMENT / BITS_PER_UNIT - 1),
-		       NULL_RTX);
+    {
+      size
+        = force_operand (plus_constant (size,
+					BIGGEST_ALIGNMENT / BITS_PER_UNIT - 1),
+			 NULL_RTX);
+
+      if (flag_stack_usage)
+	stack_usage_size += BIGGEST_ALIGNMENT / BITS_PER_UNIT - 1;
+
+      known_align_valid = false;
+    }
 
 #ifdef SETJMP_VIA_SAVE_AREA
   /* If setjmp restores regs from a save area in the stack frame,
@@ -1170,32 +1220,7 @@ allocate_dynamic_stack_space (rtx size,
      would use reg notes to store the "optimized" size and fix things
      up later.  These days we know this information before we ever
      start building RTL so the reg notes are unnecessary.  */
-  if (!cfun->calls_setjmp)
-    {
-      int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
-
-      /* ??? Code below assumes that the save area needs maximal
-	 alignment.  This constraint may be too strong.  */
-      gcc_assert (PREFERRED_STACK_BOUNDARY == BIGGEST_ALIGNMENT);
-
-      if (CONST_INT_P (size))
-	{
-	  HOST_WIDE_INT new_size = INTVAL (size) / align * align;
-
-	  if (INTVAL (size) != new_size)
-	    size = GEN_INT (new_size);
-	}
-      else
-	{
-	  /* Since we know overflow is not possible, we avoid using
-	     CEIL_DIV_EXPR and use TRUNC_DIV_EXPR instead.  */
-	  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size,
-				GEN_INT (align), NULL_RTX, 1);
-	  size = expand_mult (Pmode, size,
-			      GEN_INT (align), NULL_RTX, 1);
-	}
-    }
-  else
+  if (cfun->calls_setjmp)
     {
       rtx dynamic_offset
 	= expand_binop (Pmode, sub_optab, virtual_stack_dynamic_rtx,
@@ -1203,6 +1228,14 @@ allocate_dynamic_stack_space (rtx size,
 
       size = expand_binop (Pmode, add_optab, size, dynamic_offset,
 			   NULL_RTX, 1, OPTAB_LIB_WIDEN);
+
+      /* The above dynamic offset cannot be computed statically at this
+	 point, but it will be possible to do so after RTL expansion is
+	 done.  Record how many times we will need to add it.  */
+      if (flag_stack_usage)
+	current_function_dynamic_alloc_count++;
+
+      known_align_valid = false;
     }
 #endif /* SETJMP_VIA_SAVE_AREA */
 
@@ -1219,13 +1252,28 @@ allocate_dynamic_stack_space (rtx size,
      insns.  Since this is an extremely rare event, we have no reliable
      way of knowing which systems have this problem.  So we avoid even
      momentarily mis-aligning the stack.  */
+  if (!known_align_valid || known_align % PREFERRED_STACK_BOUNDARY != 0)
+    {
+      size = round_push (size);
 
-  /* If we added a variable amount to SIZE,
-     we can no longer assume it is aligned.  */
-#if !defined (SETJMP_VIA_SAVE_AREA)
-  if (MUST_ALIGN || known_align % PREFERRED_STACK_BOUNDARY != 0)
-#endif
-    size = round_push (size);
+      if (flag_stack_usage)
+	{
+	  int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
+	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	}
+    }
+
+  /* The size is supposed to be fully adjusted at this point so record it
+     if stack usage info is requested.  */
+  if (flag_stack_usage)
+    {
+      current_function_dynamic_stack_size += stack_usage_size;
+
+      /* ??? This is gross but the only safe stance in the absence
+	 of stack usage oriented flow analysis.  */
+      if (!cannot_accumulate)
+	current_function_has_unbounded_dynamic_stack_size = 1;
+    }
 
   do_pending_stack_adjust ();
 
Index: common.opt
===================================================================
--- common.opt	(revision 162897)
+++ common.opt	(working copy)
@@ -1222,6 +1222,10 @@ fstack-protector-all
 Common Report RejectNegative Var(flag_stack_protect, 2) VarExists
 Use a stack protection method for every function
 
+fstack-usage
+Common RejectNegative Var(flag_stack_usage)
+Output stack usage information on a per-function basis
+
 fstrength-reduce
 Common
 Does nothing.  Preserved for backward compatibility.
Index: output.h
===================================================================
--- output.h	(revision 162897)
+++ output.h	(working copy)
@@ -639,6 +639,9 @@ extern int maybe_assemble_visibility (tr
 
 extern int default_address_cost (rtx, bool);
 
+/* Output stack usage information.  */
+extern void output_stack_usage (void);
+
 /* dbxout helper functions */
 #if defined DBX_DEBUGGING_INFO || defined XCOFF_DEBUGGING_INFO
 
Index: config/alpha/alpha.c
===================================================================
--- config/alpha/alpha.c	(revision 162897)
+++ config/alpha/alpha.c	(working copy)
@@ -7763,6 +7763,30 @@ emit_frame_store (unsigned int regno, rt
   emit_frame_store_1 (reg, base_reg, frame_bias, base_ofs, reg);
 }
 
+/* Compute the frame size.  SIZE is the size of the "naked" frame
+   and SA_SIZE is the size of the register save area.  */
+
+static HOST_WIDE_INT
+compute_frame_size (HOST_WIDE_INT size, HOST_WIDE_INT sa_size)
+{
+  if (TARGET_ABI_OPEN_VMS)
+    return ALPHA_ROUND (sa_size 
+			+ (alpha_procedure_type == PT_STACK ? 8 : 0)
+			+ size
+			+ crtl->args.pretend_args_size);
+  else if (TARGET_ABI_UNICOSMK)
+    /* We have to allocate space for the DSIB if we generate a frame.  */
+    return ALPHA_ROUND (sa_size
+			+ (alpha_procedure_type == PT_STACK ? 48 : 0))
+	   + ALPHA_ROUND (size
+			  + crtl->outgoing_args_size);
+  else
+    return ALPHA_ROUND (crtl->outgoing_args_size)
+	   + sa_size
+	   + ALPHA_ROUND (size
+			  + crtl->args.pretend_args_size);
+}
+
 /* Write function prologue.  */
 
 /* On vms we have two kinds of functions:
@@ -7796,24 +7820,10 @@ alpha_expand_prologue (void)
   int i;
 
   sa_size = alpha_sa_size ();
+  frame_size = compute_frame_size (get_frame_size (), sa_size);
 
-  frame_size = get_frame_size ();
-  if (TARGET_ABI_OPEN_VMS)
-    frame_size = ALPHA_ROUND (sa_size
-			      + (alpha_procedure_type == PT_STACK ? 8 : 0)
-			      + frame_size
-			      + crtl->args.pretend_args_size);
-  else if (TARGET_ABI_UNICOSMK)
-    /* We have to allocate space for the DSIB if we generate a frame.  */
-    frame_size = ALPHA_ROUND (sa_size
-			      + (alpha_procedure_type == PT_STACK ? 48 : 0))
-		 + ALPHA_ROUND (frame_size
-				+ crtl->outgoing_args_size);
-  else
-    frame_size = (ALPHA_ROUND (crtl->outgoing_args_size)
-		  + sa_size
-		  + ALPHA_ROUND (frame_size
-				 + crtl->args.pretend_args_size));
+  if (flag_stack_usage)
+    current_function_static_stack_size = frame_size;
 
   if (TARGET_ABI_OPEN_VMS)
     reg_offset = 8 + 8 * cfun->machine->uses_condition_handler;
@@ -8135,23 +8145,7 @@ alpha_start_function (FILE *file, const
 
   alpha_fnname = fnname;
   sa_size = alpha_sa_size ();
-
-  frame_size = get_frame_size ();
-  if (TARGET_ABI_OPEN_VMS)
-    frame_size = ALPHA_ROUND (sa_size
-			      + (alpha_procedure_type == PT_STACK ? 8 : 0)
-			      + frame_size
-			      + crtl->args.pretend_args_size);
-  else if (TARGET_ABI_UNICOSMK)
-    frame_size = ALPHA_ROUND (sa_size
-			      + (alpha_procedure_type == PT_STACK ? 48 : 0))
-		 + ALPHA_ROUND (frame_size
-			      + crtl->outgoing_args_size);
-  else
-    frame_size = (ALPHA_ROUND (crtl->outgoing_args_size)
-		  + sa_size
-		  + ALPHA_ROUND (frame_size
-				 + crtl->args.pretend_args_size));
+  frame_size = compute_frame_size (get_frame_size (), sa_size);
 
   if (TARGET_ABI_OPEN_VMS)
     reg_offset = 8 + 8 * cfun->machine->uses_condition_handler;
@@ -8353,23 +8347,7 @@ alpha_expand_epilogue (void)
   int i;
 
   sa_size = alpha_sa_size ();
-
-  frame_size = get_frame_size ();
-  if (TARGET_ABI_OPEN_VMS)
-    frame_size = ALPHA_ROUND (sa_size
-			      + (alpha_procedure_type == PT_STACK ? 8 : 0)
-			      + frame_size
-			      + crtl->args.pretend_args_size);
-  else if (TARGET_ABI_UNICOSMK)
-    frame_size = ALPHA_ROUND (sa_size
-			      + (alpha_procedure_type == PT_STACK ? 48 : 0))
-		 + ALPHA_ROUND (frame_size
-			      + crtl->outgoing_args_size);
-  else
-    frame_size = (ALPHA_ROUND (crtl->outgoing_args_size)
-		  + sa_size
-		  + ALPHA_ROUND (frame_size
-				 + crtl->args.pretend_args_size));
+  frame_size = compute_frame_size (get_frame_size (), sa_size);
 
   if (TARGET_ABI_OPEN_VMS)
     {
Index: config/sparc/sparc.c
===================================================================
--- config/sparc/sparc.c	(revision 162897)
+++ config/sparc/sparc.c	(working copy)
@@ -4402,6 +4402,9 @@ sparc_expand_prologue (void)
   /* Advertise that the data calculated just above are now valid.  */
   sparc_prologue_data_valid_p = true;
 
+  if (flag_stack_usage)
+    current_function_static_stack_size = actual_fsize;
+
   if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK && actual_fsize)
     sparc_emit_probe_stack_range (STACK_CHECK_PROTECT, actual_fsize);
 
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 162897)
+++ config/i386/i386.c	(working copy)
@@ -9588,6 +9588,27 @@ ix86_expand_prologue (void)
     }
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
+  if (flag_stack_usage)
+    {
+      /* We start to count from ARG_POINTER.  */
+      HOST_WIDE_INT stack_size = frame.stack_pointer_offset;
+
+      /* If it was realigned, take into account the fake frame.  */
+      if (stack_realign_drap)
+	{
+	  if (ix86_static_chain_on_stack)
+	    stack_size += UNITS_PER_WORD;
+
+	  if (!call_used_regs[REGNO (crtl->drap_reg)])
+	    stack_size += UNITS_PER_WORD;
+
+	  /* This takes into account the new return address slot.  */
+	  stack_size += crtl->stack_alignment_needed / BITS_PER_UNIT;
+	}
+
+      current_function_static_stack_size = stack_size;
+    }
+
   /* The stack has already been decremented by the instruction calling us
      so we need to probe unconditionally to preserve the protection area.  */
   if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
Index: config/ia64/ia64.c
===================================================================
--- config/ia64/ia64.c	(revision 162897)
+++ config/ia64/ia64.c	(working copy)
@@ -3097,6 +3097,9 @@ ia64_expand_prologue (void)
   ia64_compute_frame_size (get_frame_size ());
   last_scratch_gr_reg = 15;
 
+  if (flag_stack_usage)
+    current_function_static_stack_size = current_frame_info.total_size;
+
   if (dump_file) 
     {
       fprintf (dump_file, "ia64 frame related registers "
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 162897)
+++ config/rs6000/rs6000.c	(working copy)
@@ -19403,6 +19403,9 @@ rs6000_emit_prologue (void)
 			      && call_used_regs[STATIC_CHAIN_REGNUM]);
   HOST_WIDE_INT sp_offset = 0;
 
+  if (flag_stack_usage)
+    current_function_static_stack_size = info->total_size;
+
   if (TARGET_FIX_AND_CONTINUE)
     {
       /* gdb on darwin arranges to forward a function from the old
Index: config/pa/pa.c
===================================================================
--- config/pa/pa.c	(revision 162897)
+++ config/pa/pa.c	(working copy)
@@ -3688,6 +3688,8 @@ hppa_expand_prologue (void)
     local_fsize += STARTING_FRAME_OFFSET;
 
   actual_fsize = compute_frame_size (size, &save_fregs);
+  if (flag_stack_usage)
+    current_function_static_stack_size = actual_fsize;
 
   /* Compute a few things we will use often.  */
   tmpreg = gen_rtx_REG (word_mode, 1);
Index: config/mips/mips.c
===================================================================
--- config/mips/mips.c	(revision 162897)
+++ config/mips/mips.c	(working copy)
@@ -10061,6 +10061,9 @@ mips_expand_prologue (void)
   frame = &cfun->machine->frame;
   size = frame->total_size;
 
+  if (flag_stack_usage)
+    current_function_static_stack_size = size;
+
   /* Save the registers.  Allocate up to MIPS_MAX_FIRST_STACK_STEP
      bytes beforehand; this is enough to cover the register save area
      without going out of range.  */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-08-05 15:42 [PATCH] Implement -fstack-usage option Eric Botcazou
@ 2010-08-05 19:13 ` Richard Henderson
  2010-08-06 17:42   ` Eric Botcazou
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2010-08-05 19:13 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

On 08/05/2010 08:42 AM, Eric Botcazou wrote:
> Hi,
> 
> this is again something that has been for a while in our tree.  The command 
> line option -fstack-usage is added and makes the compiler generate a second 
> output file (xxx.su) for each compilation unit, which contains one line per 
> function emitted in the assembly file and describing the stack usage of the 
> function.  The exact format is documented in the manual.  The primary goal is 
> for the data to be conservatively correct, in particular to give an upper 
> bound of the actual stack usage at run time.
> 
> Implementation-wise, there is nothing sophisticated, but various quirks of 
> expand_call and allocate_dynamic_stack_space need to be properly handled.
> It's worth noting that this is implemented only for 7 architectures (Alpha, 
> x86/x86-64, IA-64, MIPS, PA, PowerPC and SPARC) but extending it to others 
> should be a trivial task for the respective maintainers.

Most of this looks fairly sane.

I do wonder if stack re-alignment should fall under the heading of
dynamic_stack_size though.  I.e.

> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 162897)
> +++ config/i386/i386.c	(working copy)
> @@ -9588,6 +9588,27 @@ ix86_expand_prologue (void)
>      }
>    allocate = frame.stack_pointer_offset - m->fs.sp_offset;
>  
> +  if (flag_stack_usage)
> +    {
> +      /* We start to count from ARG_POINTER.  */
> +      HOST_WIDE_INT stack_size = frame.stack_pointer_offset;
> +
> +      /* If it was realigned, take into account the fake frame.  */
> +      if (stack_realign_drap)
> +	{
> +	  if (ix86_static_chain_on_stack)
> +	    stack_size += UNITS_PER_WORD;
> +
> +	  if (!call_used_regs[REGNO (crtl->drap_reg)])
> +	    stack_size += UNITS_PER_WORD;
> +
> +	  /* This takes into account the new return address slot.  */
> +	  stack_size += crtl->stack_alignment_needed / BITS_PER_UNIT;

This last line ought to add to the dynamic stack size instead.
(And you're over-estimating by 1 minimal-stack-alignment-unit.  ;-)


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-08-05 19:13 ` Richard Henderson
@ 2010-08-06 17:42   ` Eric Botcazou
  2010-08-30 17:20     ` Eric Botcazou
  2010-09-01 20:55     ` Jack Howarth
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Botcazou @ 2010-08-06 17:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 778 bytes --]

> Most of this looks fairly sane.

Thanks.

> This last line ought to add to the dynamic stack size instead.

This makes sense indeed.

> (And you're over-estimating by 1 minimal-stack-alignment-unit.  ;-)

I think that this is mitigated by not adding the word for the new return 
address slot; that's what I was trying to say in the comment.  Here is a 
corrected version, attached.  I've also added a testcase for this, as well
as a generic one.


	* config/i386/i386.c (ix86_expand_prologue): Set stack usage info.
testsuite/
	* lib/gcc-dg.exp (cleanup-stack-usage): New procedure.
	* lib/scanasm.exp (scan-stack-usage): Likewise.
	(scan-stack-usage-not): Likewise.
	* gcc.dg/stack-usage-1.c: New test/
	* gcc.target/i386/stack-usage-realign.c: Likewise.


-- 
Eric Botcazou

[-- Attachment #2: p.diff --]
[-- Type: text/x-diff, Size: 3645 bytes --]

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 162897)
+++ config/i386/i386.c	(working copy)
@@ -9588,6 +9588,29 @@ ix86_expand_prologue (void)
     }
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
+  if (flag_stack_usage)
+    {
+      /* We start to count from ARG_POINTER.  */
+      HOST_WIDE_INT stack_size = frame.stack_pointer_offset;
+
+      /* If it was realigned, take into account the fake frame.  */
+      if (stack_realign_drap)
+	{
+	  if (ix86_static_chain_on_stack)
+	    stack_size += UNITS_PER_WORD;
+
+	  if (!call_used_regs[REGNO (crtl->drap_reg)])
+	    stack_size += UNITS_PER_WORD;
+
+	  /* This over-estimates by 1 minimal-stack-alignment-unit but
+	     mitigates that by counting in the new return address slot.  */
+	  current_function_dynamic_stack_size
+	    += crtl->stack_alignment_needed / BITS_PER_UNIT;
+	}
+
+      current_function_static_stack_size = stack_size;
+    }
+
   /* The stack has already been decremented by the instruction calling us
      so we need to probe unconditionally to preserve the protection area.  */
   if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
Index: testsuite/lib/gcc-dg.exp
===================================================================
--- testsuite/lib/gcc-dg.exp	(revision 162897)
+++ testsuite/lib/gcc-dg.exp	(working copy)
@@ -1,5 +1,5 @@
-#   Copyright (C) 1997, 1999, 2000, 2003, 2004, 2005, 2006, 2007, 2008, 2009
-#   Free Software Foundation, Inc.
+#   Copyright (C) 1997, 1999, 2000, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
+#   2010 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -460,6 +460,11 @@ proc cleanup-ipa-dump { suffix } {
   cleanup-dump "\[0-9\]\[0-9\]\[0-9\]i.$suffix"
 }
 
+# Remove a stack usage file for the current test.
+proc cleanup-stack-usage { args } {
+  cleanup-dump "su"
+}
+
 # Remove all dump files with the provided suffix.
 proc cleanup-dump { suffix } {
     # This assumes that we are three frames down from dg-test or some other
Index: testsuite/lib/scanasm.exp
===================================================================
--- testsuite/lib/scanasm.exp	(revision 162897)
+++ testsuite/lib/scanasm.exp	(working copy)
@@ -1,4 +1,5 @@
-# Copyright (C) 2000, 2002, 2003, 2007, 2008 Free Software Foundation, Inc.
+# Copyright (C) 2000, 2002, 2003, 2007, 2008, 2010
+# Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -154,6 +155,28 @@ proc scan-file-not { output_file args }
     dg-scan "scan-file-not" 0 $testcase $output_file $args
 }
 
+# Look for a pattern in the .su file produced by the compiler.  See
+# dg-scan for details.
+
+proc scan-stack-usage { args } {
+    upvar 2 name testcase
+    set testcase [lindex $testcase 0]
+    set output_file "[file rootname [file tail $testcase]].su"
+
+    dg-scan "scan-file" 1 $testcase $output_file $args
+}
+
+# Check that a pattern is not present in the .su file produced by the
+# compiler.  See dg-scan for details.
+
+proc scan-stack-usage-not { args } {
+    upvar 2 name testcase
+    set testcase [lindex $testcase 0]
+    set output_file "[file rootname [file tail $testcase]].su"
+
+    dg-scan "scan-file-not" 0 $testcase $output_file $args
+}
+
 # Call pass if pattern is present given number of times, otherwise fail.
 proc scan-assembler-times { args } {
     if { [llength $args] < 2 } {

[-- Attachment #3: stack-usage-1.c --]
[-- Type: text/x-csrc, Size: 1182 bytes --]

/* { dg-do compile } */
/* { dg-options "-fstack-usage" } */

/* This is aimed at testing basic support for -fstack-usage in the back-ends.
   See the SPARC back-end for an example (grep flag_stack_usage in sparc.c).
   Once it is implemented, adjust SIZE below so that the stack usage for the
   function FOO is reported as 256 or 264 in the stack usage (.su) file.
   Then check that this is the actual stack usage in the assembly file.  */

#if defined(__i386__)
#  define SIZE 248
#elif defined(__x86_64__)
#  define SIZE 356
#elif defined (__sparc__)
#  if defined (__arch64__)
#    define SIZE 76
#  else
#    define SIZE 160
#  endif
#elif defined(__hppa__)
#  define SIZE 192
#elif defined (__alpha__)
#  define SIZE 240
#elif defined (__ia64__)
#  define SIZE 272
#elif defined(__mips__)
#  define SIZE 240
#elif defined (__powerpc__) || defined (__PPC__) || defined (__ppc__) \
      || defined (__POWERPC__) || defined (PPC) || defined (_IBMR2)
#  define SIZE 240
#else
#  define SIZE 256
#endif

int foo (void)
{
  char arr[SIZE];
  arr[0] = 1;
  return 0;
}

/* { dg-final { scan-stack-usage "foo\t\(256|264\)\tstatic" } } */
/* { dg-final { cleanup-stack-usage } } */

[-- Attachment #4: stack-usage-realign.c --]
[-- Type: text/x-csrc, Size: 378 bytes --]

/* { dg-do compile } */
/* { dg-require-effective-target ilp32 } */
/* { dg-options "-fstack-usage -msse2 -mforce-drap" } */

typedef int __attribute__((vector_size(16))) vec;

vec foo (vec v)
{
  return v;
}

int main (void)
{
  vec V;
  V = foo (V);
  return 0;
}

/* { dg-final { scan-stack-usage "main\t48\tdynamic,bounded" } } */
/* { dg-final { cleanup-stack-usage } } */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-08-06 17:42   ` Eric Botcazou
@ 2010-08-30 17:20     ` Eric Botcazou
  2010-08-30 18:00       ` Richard Henderson
  2010-09-01 20:55     ` Jack Howarth
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Botcazou @ 2010-08-30 17:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches

> I think that this is mitigated by not adding the word for the new return
> address slot; that's what I was trying to say in the comment.  Here is a
> corrected version, attached.  I've also added a testcase for this, as well
> as a generic one.
>
>
> 	* config/i386/i386.c (ix86_expand_prologue): Set stack usage info.
> testsuite/
> 	* lib/gcc-dg.exp (cleanup-stack-usage): New procedure.
> 	* lib/scanasm.exp (scan-stack-usage): Likewise.
> 	(scan-stack-usage-not): Likewise.
> 	* gcc.dg/stack-usage-1.c: New test/
> 	* gcc.target/i386/stack-usage-realign.c: Likewise.

Do you think this is good enough to be applied with the rest of the patch on 
the mainline?

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-08-30 17:20     ` Eric Botcazou
@ 2010-08-30 18:00       ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2010-08-30 18:00 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

On 08/30/2010 09:59 AM, Eric Botcazou wrote:
>> I think that this is mitigated by not adding the word for the new return
>> address slot; that's what I was trying to say in the comment.  Here is a
>> corrected version, attached.  I've also added a testcase for this, as well
>> as a generic one.
>>
>>
>> 	* config/i386/i386.c (ix86_expand_prologue): Set stack usage info.
>> testsuite/
>> 	* lib/gcc-dg.exp (cleanup-stack-usage): New procedure.
>> 	* lib/scanasm.exp (scan-stack-usage): Likewise.
>> 	(scan-stack-usage-not): Likewise.
>> 	* gcc.dg/stack-usage-1.c: New test/
>> 	* gcc.target/i386/stack-usage-realign.c: Likewise.
> 
> Do you think this is good enough to be applied with the rest of the patch on 
> the mainline?
> 

Yes, this is fine.


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-08-06 17:42   ` Eric Botcazou
  2010-08-30 17:20     ` Eric Botcazou
@ 2010-09-01 20:55     ` Jack Howarth
  2010-09-01 21:32       ` Eric Botcazou
  1 sibling, 1 reply; 11+ messages in thread
From: Jack Howarth @ 2010-09-01 20:55 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Henderson, gcc-patches

On Fri, Aug 06, 2010 at 07:41:55PM +0200, Eric Botcazou wrote:
> > Most of this looks fairly sane.
> 
> Thanks.
> 
> > This last line ought to add to the dynamic stack size instead.
> 
> This makes sense indeed.
> 
> > (And you're over-estimating by 1 minimal-stack-alignment-unit.  ;-)
> 
> I think that this is mitigated by not adding the word for the new return 
> address slot; that's what I was trying to say in the comment.  Here is a 
> corrected version, attached.  I've also added a testcase for this, as well
> as a generic one.

Eric,
   Was this patch tested on x86_64-apple-darwin10 at adacore? Since the
patch was committed as r163660, we have been randomly failing three different
testcases...

FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c compilation,  -Os  (internal
compiler error)
UNRESOLVED: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -Os 
FAIL: gcc.dg/stack-usage-1.c scan-file foo\\t(256|264)\\tstatic
FAIL: gcc.target/i386/stack-usage-realign.c scan-file
main\\t48\\tdynamic,bounded

which Richard Henderson says looks like random memory corruption.
                  Jack

> 
> 
> 	* config/i386/i386.c (ix86_expand_prologue): Set stack usage info.
> testsuite/
> 	* lib/gcc-dg.exp (cleanup-stack-usage): New procedure.
> 	* lib/scanasm.exp (scan-stack-usage): Likewise.
> 	(scan-stack-usage-not): Likewise.
> 	* gcc.dg/stack-usage-1.c: New test/
> 	* gcc.target/i386/stack-usage-realign.c: Likewise.
> 
> 
> -- 
> Eric Botcazou

> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 162897)
> +++ config/i386/i386.c	(working copy)
> @@ -9588,6 +9588,29 @@ ix86_expand_prologue (void)
>      }
>    allocate = frame.stack_pointer_offset - m->fs.sp_offset;
>  
> +  if (flag_stack_usage)
> +    {
> +      /* We start to count from ARG_POINTER.  */
> +      HOST_WIDE_INT stack_size = frame.stack_pointer_offset;
> +
> +      /* If it was realigned, take into account the fake frame.  */
> +      if (stack_realign_drap)
> +	{
> +	  if (ix86_static_chain_on_stack)
> +	    stack_size += UNITS_PER_WORD;
> +
> +	  if (!call_used_regs[REGNO (crtl->drap_reg)])
> +	    stack_size += UNITS_PER_WORD;
> +
> +	  /* This over-estimates by 1 minimal-stack-alignment-unit but
> +	     mitigates that by counting in the new return address slot.  */
> +	  current_function_dynamic_stack_size
> +	    += crtl->stack_alignment_needed / BITS_PER_UNIT;
> +	}
> +
> +      current_function_static_stack_size = stack_size;
> +    }
> +
>    /* The stack has already been decremented by the instruction calling us
>       so we need to probe unconditionally to preserve the protection area.  */
>    if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
> Index: testsuite/lib/gcc-dg.exp
> ===================================================================
> --- testsuite/lib/gcc-dg.exp	(revision 162897)
> +++ testsuite/lib/gcc-dg.exp	(working copy)
> @@ -1,5 +1,5 @@
> -#   Copyright (C) 1997, 1999, 2000, 2003, 2004, 2005, 2006, 2007, 2008, 2009
> -#   Free Software Foundation, Inc.
> +#   Copyright (C) 1997, 1999, 2000, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
> +#   2010 Free Software Foundation, Inc.
>  
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> @@ -460,6 +460,11 @@ proc cleanup-ipa-dump { suffix } {
>    cleanup-dump "\[0-9\]\[0-9\]\[0-9\]i.$suffix"
>  }
>  
> +# Remove a stack usage file for the current test.
> +proc cleanup-stack-usage { args } {
> +  cleanup-dump "su"
> +}
> +
>  # Remove all dump files with the provided suffix.
>  proc cleanup-dump { suffix } {
>      # This assumes that we are three frames down from dg-test or some other
> Index: testsuite/lib/scanasm.exp
> ===================================================================
> --- testsuite/lib/scanasm.exp	(revision 162897)
> +++ testsuite/lib/scanasm.exp	(working copy)
> @@ -1,4 +1,5 @@
> -# Copyright (C) 2000, 2002, 2003, 2007, 2008 Free Software Foundation, Inc.
> +# Copyright (C) 2000, 2002, 2003, 2007, 2008, 2010
> +# Free Software Foundation, Inc.
>  
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> @@ -154,6 +155,28 @@ proc scan-file-not { output_file args }
>      dg-scan "scan-file-not" 0 $testcase $output_file $args
>  }
>  
> +# Look for a pattern in the .su file produced by the compiler.  See
> +# dg-scan for details.
> +
> +proc scan-stack-usage { args } {
> +    upvar 2 name testcase
> +    set testcase [lindex $testcase 0]
> +    set output_file "[file rootname [file tail $testcase]].su"
> +
> +    dg-scan "scan-file" 1 $testcase $output_file $args
> +}
> +
> +# Check that a pattern is not present in the .su file produced by the
> +# compiler.  See dg-scan for details.
> +
> +proc scan-stack-usage-not { args } {
> +    upvar 2 name testcase
> +    set testcase [lindex $testcase 0]
> +    set output_file "[file rootname [file tail $testcase]].su"
> +
> +    dg-scan "scan-file-not" 0 $testcase $output_file $args
> +}
> +
>  # Call pass if pattern is present given number of times, otherwise fail.
>  proc scan-assembler-times { args } {
>      if { [llength $args] < 2 } {

> /* { dg-do compile } */
> /* { dg-options "-fstack-usage" } */
> 
> /* This is aimed at testing basic support for -fstack-usage in the back-ends.
>    See the SPARC back-end for an example (grep flag_stack_usage in sparc.c).
>    Once it is implemented, adjust SIZE below so that the stack usage for the
>    function FOO is reported as 256 or 264 in the stack usage (.su) file.
>    Then check that this is the actual stack usage in the assembly file.  */
> 
> #if defined(__i386__)
> #  define SIZE 248
> #elif defined(__x86_64__)
> #  define SIZE 356
> #elif defined (__sparc__)
> #  if defined (__arch64__)
> #    define SIZE 76
> #  else
> #    define SIZE 160
> #  endif
> #elif defined(__hppa__)
> #  define SIZE 192
> #elif defined (__alpha__)
> #  define SIZE 240
> #elif defined (__ia64__)
> #  define SIZE 272
> #elif defined(__mips__)
> #  define SIZE 240
> #elif defined (__powerpc__) || defined (__PPC__) || defined (__ppc__) \
>       || defined (__POWERPC__) || defined (PPC) || defined (_IBMR2)
> #  define SIZE 240
> #else
> #  define SIZE 256
> #endif
> 
> int foo (void)
> {
>   char arr[SIZE];
>   arr[0] = 1;
>   return 0;
> }
> 
> /* { dg-final { scan-stack-usage "foo\t\(256|264\)\tstatic" } } */
> /* { dg-final { cleanup-stack-usage } } */

> /* { dg-do compile } */
> /* { dg-require-effective-target ilp32 } */
> /* { dg-options "-fstack-usage -msse2 -mforce-drap" } */
> 
> typedef int __attribute__((vector_size(16))) vec;
> 
> vec foo (vec v)
> {
>   return v;
> }
> 
> int main (void)
> {
>   vec V;
>   V = foo (V);
>   return 0;
> }
> 
> /* { dg-final { scan-stack-usage "main\t48\tdynamic,bounded" } } */
> /* { dg-final { cleanup-stack-usage } } */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-09-01 20:55     ` Jack Howarth
@ 2010-09-01 21:32       ` Eric Botcazou
  2010-09-02  0:08         ` Jack Howarth
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Botcazou @ 2010-09-01 21:32 UTC (permalink / raw)
  To: Jack Howarth; +Cc: Richard Henderson, gcc-patches

> Since the patch was committed as r163660, we have been randomly failing
> three  different testcases...
>
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c compilation,  -Os 
> (internal compiler error)
> UNRESOLVED: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -Os
> FAIL: gcc.dg/stack-usage-1.c scan-file foo\\t(256|264)\\tstatic
> FAIL: gcc.target/i386/stack-usage-realign.c scan-file
> main\\t48\\tdynamic,bounded

Do the last 2 really pass sometimes?  They should _always_ fail on 32-bit 
Darwin as far as I can see...

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-09-01 21:32       ` Eric Botcazou
@ 2010-09-02  0:08         ` Jack Howarth
  2010-09-02 11:38           ` Eric Botcazou
  0 siblings, 1 reply; 11+ messages in thread
From: Jack Howarth @ 2010-09-02  0:08 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Henderson, gcc-patches

On Wed, Sep 01, 2010 at 11:22:26PM +0200, Eric Botcazou wrote:
> > Since the patch was committed as r163660, we have been randomly failing
> > three  different testcases...
> >
> > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c compilation,  -Os 
> > (internal compiler error)
> > UNRESOLVED: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -Os
> > FAIL: gcc.dg/stack-usage-1.c scan-file foo\\t(256|264)\\tstatic
> > FAIL: gcc.target/i386/stack-usage-realign.c scan-file
> > main\\t48\\tdynamic,bounded
> 
> Do the last 2 really pass sometimes?  They should _always_ fail on 32-bit 
> Darwin as far as I can see...

Sorry, my mistake. The gcc.dg/stack-usage-1.c scan-file foo\\t(256|264)\\tstatic
and gcc.target/i386/stack-usage-realign.c scan-file main\\t48\\tdynamic,bounded
tests always compile and then fail on execution at -m32. Their code generation is
always consistent. It is the gcc.c-torture/execute/builtins/sprintf-chk.c
compilation which is exhibiting three behaviors...

1) consistent code
2) random differences in code
3) compiler ICE

My main concern is if this bug will spill over into production code generation.
             Jack

> 
> -- 
> Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-09-02  0:08         ` Jack Howarth
@ 2010-09-02 11:38           ` Eric Botcazou
  2010-09-02 13:24             ` Jack Howarth
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Botcazou @ 2010-09-02 11:38 UTC (permalink / raw)
  To: Jack Howarth; +Cc: Richard Henderson, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1163 bytes --]

> Sorry, my mistake. The gcc.dg/stack-usage-1.c scan-file
> foo\\t(256|264)\\tstatic and gcc.target/i386/stack-usage-realign.c
> scan-file main\\t48\\tdynamic,bounded tests always compile and then fail on
> execution at -m32.  Their code generation is always consistent.

Not on execution but on scanning of the stack usage file.  Adjusted thusly,
applied on the mainline.


        * gcc.dg/stack-usage-1.c: Adjust on i386/Darwin.
        * gcc.target/i386/stack-usage-realign.c: Skip on i386/Darwin.


> It is the gcc.c-torture/execute/builtins/sprintf-chk.c compilation which is
> exhibiting three behaviors...
>
> 1) consistent code
> 2) random differences in code
> 3) compiler ICE
>
> My main concern is if this bug will spill over into production code
> generation.

Sure, but I don't really see the relationship with my patch.  It has been used 
for more than 4 years in AdaCore compilers based on 3 different GCC versions 
on a wide range of architectures and OSes (including Darwin) so I've a hard 
time believing it could be responsible for random failures like this one, 
especially if you don't pass -fstack-usage to the compiler.

-- 
Eric Botcazou

[-- Attachment #2: p.diff --]
[-- Type: text/x-diff, Size: 990 bytes --]

Index: gcc.target/i386/stack-usage-realign.c
===================================================================
--- gcc.target/i386/stack-usage-realign.c	(revision 163745)
+++ gcc.target/i386/stack-usage-realign.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ilp32 } */
+/* { dg-skip-if "no stack realignment" { *-*-darwin* } { "*" } { "" } } */
 /* { dg-options "-fstack-usage -msse2 -mforce-drap" } */
 
 typedef int __attribute__((vector_size(16))) vec;
Index: gcc.dg/stack-usage-1.c
===================================================================
--- gcc.dg/stack-usage-1.c	(revision 163745)
+++ gcc.dg/stack-usage-1.c	(working copy)
@@ -8,7 +8,11 @@
    Then check that this is the actual stack usage in the assembly file.  */
 
 #if defined(__i386__)
-#  define SIZE 248
+#  if defined (__MACH__)
+#    define SIZE 232
+#  else
+#    define SIZE 248
+#  endif
 #elif defined(__x86_64__)
 #  define SIZE 356
 #elif defined (__sparc__)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-09-02 11:38           ` Eric Botcazou
@ 2010-09-02 13:24             ` Jack Howarth
  2010-09-02 14:39               ` Eric Botcazou
  0 siblings, 1 reply; 11+ messages in thread
From: Jack Howarth @ 2010-09-02 13:24 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Henderson, gcc-patches

On Thu, Sep 02, 2010 at 01:01:48PM +0200, Eric Botcazou wrote:
> 
> > It is the gcc.c-torture/execute/builtins/sprintf-chk.c compilation which is
> > exhibiting three behaviors...
> >
> > 1) consistent code
> > 2) random differences in code
> > 3) compiler ICE
> >
> > My main concern is if this bug will spill over into production code
> > generation.
> 
> Sure, but I don't really see the relationship with my patch.  It has been used 
> for more than 4 years in AdaCore compilers based on 3 different GCC versions 
> on a wide range of architectures and OSes (including Darwin) so I've a hard 
> time believing it could be responsible for random failures like this one, 
> especially if you don't pass -fstack-usage to the compiler.
> 

Eric,
   Looking at the patch, it seems to me that I could strip out everything
wrappered in if flag_stack_usage statements and still end up with very
significant code changes in the compiler execution even without -fstack-usage
being used. For example, the calls to allocate_dynamic_stack_space() in
calls.c and changes like...

@@ -1223,13 +1255,28 @@
      insns.  Since this is an extremely rare event, we have no reliable
      way of knowing which systems have this problem.  So we avoid even
      momentarily mis-aligning the stack.  */
+  if (!known_align_valid || known_align % PREFERRED_STACK_BOUNDARY != 0)
+    {
+      size = round_push (size);
 
-  /* If we added a variable amount to SIZE,
-     we can no longer assume it is aligned.  */
-#if !defined (SETJMP_VIA_SAVE_AREA)
-  if (MUST_ALIGN || known_align % PREFERRED_STACK_BOUNDARY != 0)
-#endif
-    size = round_push (size);
+      if (flag_stack_usage)
+	{
+	  int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
+	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	}
+    }
+
+  /* The size is supposed to be fully adjusted at this point so record it
+     if stack usage info is requested.  */
+  if (flag_stack_usage)
+    {
+      current_function_dynamic_stack_size += stack_usage_size;
+
+      /* ??? This is gross but the only safe stance in the absence
+	 of stack usage oriented flow analysis.  */
+      if (!cannot_accumulate)
+	current_function_has_unbounded_dynamic_stack_size = 1;
+    }
 
   do_pending_stack_adjust ();

It seems like a leap of faith to say this code executes identically
to the previous in the absence of the -fstack-usage flag.
                Jack

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Implement -fstack-usage option
  2010-09-02 13:24             ` Jack Howarth
@ 2010-09-02 14:39               ` Eric Botcazou
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Botcazou @ 2010-09-02 14:39 UTC (permalink / raw)
  To: Jack Howarth; +Cc: Richard Henderson, gcc-patches

> It seems like a leap of faith to say this code executes identically
> to the previous in the absence of the -fstack-usage flag.

No, it isn't, just look at both codes and you'll be convinced as well.  The 
only (possible) changes are for SETJMP_VIA_SAVE_AREA targets, i.e. SPARC.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-09-02 14:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-05 15:42 [PATCH] Implement -fstack-usage option Eric Botcazou
2010-08-05 19:13 ` Richard Henderson
2010-08-06 17:42   ` Eric Botcazou
2010-08-30 17:20     ` Eric Botcazou
2010-08-30 18:00       ` Richard Henderson
2010-09-01 20:55     ` Jack Howarth
2010-09-01 21:32       ` Eric Botcazou
2010-09-02  0:08         ` Jack Howarth
2010-09-02 11:38           ` Eric Botcazou
2010-09-02 13:24             ` Jack Howarth
2010-09-02 14:39               ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).