public inbox for gcc@gcc.gnu.org
* Re: [cft] aligning main's stack frame
       [not found] ` <20051016201110.GA7226@redhat.com.suse.lists.egcs>
@ 2005-10-17 15:30   ` Andi Kleen
  2005-10-17 19:15     ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2005-10-17 15:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

Richard Henderson <rth@redhat.com> writes:

> main:
>         leal    4(%esp), %ecx		# create argument pointer
>         andl    $-16, %esp		# align stack
>         pushl   -4(%ecx)		# copy return address

This will misalign the call/ret stack in the CPU, leading to branch
mispredictions on many of the following RETs. On main it's probably
not a big issue, but for other functions it might be.

-Andi

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [cft] aligning main's stack frame
  2005-10-17 15:30   ` [cft] aligning main's stack frame Andi Kleen
@ 2005-10-17 19:15     ` Richard Henderson
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2005-10-17 19:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc

On Mon, Oct 17, 2005 at 12:25:46PM +0200, Andi Kleen wrote:
> > main:
> >         leal    4(%esp), %ecx		# create argument pointer
> >         andl    $-16, %esp		# align stack
> >         pushl   -4(%ecx)		# copy return address
> 
> This will misalign the call/ret stack in the CPU, leading to branch
> mispredictions on many of the following RETs. On main it's probably
> not a big issue, but for other functions it might be.

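No, it won't.  I don't actually use that copy for the return insn;
the epilogue puts %esp back on the original return-address slot
before the ret.

I cheat and move the CFA, and the copy satisfies the unwinder's
expectation of a return address at CFA-4 during the body of the
function.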


r~


* Re: [cft] aligning main's stack frame
  2005-10-16 20:11 ` [cft] aligning main's stack frame Richard Henderson
  2005-10-16 20:59   ` Kean Johnston
@ 2005-11-03  1:46   ` Richard Henderson
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2005-11-03  1:46 UTC (permalink / raw)
  To: Kean Johnston, gcc-patches, gcc

After some more testing, I went ahead and committed this.


r~


* Re: [cft] aligning main's stack frame
  2005-10-16 20:11 ` [cft] aligning main's stack frame Richard Henderson
@ 2005-10-16 20:59   ` Kean Johnston
  2005-11-03  1:46   ` Richard Henderson
  1 sibling, 0 replies; 5+ messages in thread
From: Kean Johnston @ 2005-10-16 20:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, gcc

> This should get more than just bootstrap testing.  Anyone care to
> help out here?
I'm bringing my mainline tree up to speed, as all the porting work
I recently did was on the 4.0 branch, but once that's done I'll
be glad to help out. Aside from the full testsuite, I will compile
Xorg and an internal package I have, which is a collection of
about 70 different open source libraries (gtk, cairo, libxml2, etc.),
plus perl and apache/php. If that all works, then that's about as
well tested as I can think to get. It's a lot of code, some of
which really stresses the compiler (Xorg and Perl in particular).

Kean


* [cft] aligning main's stack frame
  2005-10-16  3:43 About the alignment of main Kean Johnston
@ 2005-10-16 20:11 ` Richard Henderson
  2005-10-16 20:59   ` Kean Johnston
  2005-11-03  1:46   ` Richard Henderson
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Henderson @ 2005-10-16 20:11 UTC (permalink / raw)
  To: Kean Johnston; +Cc: gcc-patches, gcc

So remember all that stuff I said earlier about it being intractably
hard to realign the current stack frame?  It appears that the generic
bits of the compiler have improved since I last tried it.  In
particular, the argument pointer can now be a pseudo register.  I
think we had to add this for hppa64, but I'm not sure.

The realignment isn't free, and does disable tail-calls, so I wouldn't
suggest changing the current 16-byte preferred-stack-alignment.  But
it may be possible to use this on any function with at least one free
register on entry (i.e. regparm < (3 - need_static_chain_p)).

I don't do it here, but it *would* be possible to add an attribute to
enable this for any random function, or even (gasp) a command-line flag
that turns it on for sse-using functions with preferred-stack-alignment
back down to 4.  Something like this wouldn't be appropriate for gcc 4.1.

So given

	void dummy (void);
	int main1(int, char **);
	int main(int ac, char **av)
	{
	  dummy ();
	  return main1(ac, av);
	}

with -mtune=pentium4 -O2 we emit:

main:
        leal    4(%esp), %ecx		# create argument pointer
        andl    $-16, %esp		# align stack
        pushl   -4(%ecx)		# copy return address
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %esi
        pushl   %ebx
        pushl   %ecx			# save argument pointer
        subl    $12, %esp
        movl    (%ecx), %esi		# load arguments via arg pointer
        movl    4(%ecx), %ebx
        call    dummy
        movl    %ebx, 4(%esp)
        movl    %esi, (%esp)
        call    main1
        addl    $12, %esp
        popl    %ecx
        popl    %ebx
        popl    %esi
        popl    %ebp
        leal    -4(%ecx), %esp		# restore post-call stack pointer
        ret

There are a number of constraints that must be satisfied here.  The
argument pointer, being a pseudo register, can't carry an offset, so we
can't just do straight movs between %ecx and %esp at the beginning and
end of the function.  Copying the return address tremendously
simplifies the unwind data we generate, which for this function is:

  DW_CFA_advance_loc: 4 to 00000004
  DW_CFA_def_cfa: r1 ofs 0
  DW_CFA_register: r4 in r1
  DW_CFA_advance_loc: 6 to 0000000a
  DW_CFA_def_cfa: r4 ofs 4
  DW_CFA_advance_loc: 1 to 0000000b
  DW_CFA_def_cfa_offset: 8
  DW_CFA_offset: r5 at cfa-8
  DW_CFA_advance_loc: 2 to 0000000d
  DW_CFA_def_cfa_reg: r5
  DW_CFA_advance_loc: 3 to 00000010
  DW_CFA_offset: r4 at cfa-20
  DW_CFA_offset: r3 at cfa-16
  DW_CFA_offset: r6 at cfa-12

Finally, this code allows -fomit-frame-pointer to be effective in 
main again.

This should get more than just bootstrap testing.  Anyone care to
help out here?



r~


	* dwarf2out.c (dwarf2out_reg_save_reg): New.
	(dwarf2out_frame_debug_expr): Return after dwarf_handle_frame_unspec.
	* function.c (assign_parms): Use calls.internal_arg_pointer.
	(expand_main_function): Remove FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN
	code.
	* target-def.h (TARGET_INTERNAL_ARG_POINTER): New.
	(TARGET_CALLS): Add it.
	* target.h (struct gcc_target): Add calls.internal_arg_pointer.
	* targhooks.c (default_internal_arg_pointer): New.
	* targhooks.h (default_internal_arg_pointer): Declare.
	* tree.h (dwarf2out_reg_save_reg): Declare.
	* doc/tm.texi (FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN): Remove.

	* config/i386/i386.c (dbx_register_map): Add return column.
	(dbx64_register_map, svr4_dbx_register_map): Likewise.
	(TARGET_INTERNAL_ARG_POINTER, ix86_internal_arg_pointer): New.
	(TARGET_DWARF_HANDLE_FRAME_UNSPEC, ix86_dwarf_handle_frame_unspec): New.
	(ix86_function_ok_for_sibcall): Disable if force_align_arg_pointer.
	(ix86_save_reg): Save force_align_arg_pointer.
	(ix86_emit_save_regs): Make regno unsigned.
	(ix86_emit_save_regs_using_mov): Likewise.
	(ix86_expand_prologue): Handle force_align_arg_pointer.
	(ix86_expand_epilogue): Likewise.
	* config/i386/i386.h (dbx_register_map): Update.
	(dbx64_register_map, svr4_dbx_register_map): Update.
	(struct machine_function): Add force_align_arg_pointer.
	* config/i386/i386.md (UNSPEC_REG_SAVE, UNSPEC_DEF_CFA): New.
	(UNSPEC_TP, UNSPEC_TLS_GD, UNSPEC_TLS_LD_BASE): Renumber.
	(TARGET_PUSH_MEMORY peepholes): Disable if RTX_FRAME_RELATED_P.

Index: dwarf2out.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/dwarf2out.c,v
retrieving revision 1.615
diff -u -p -d -r1.615 dwarf2out.c
--- dwarf2out.c	6 Oct 2005 19:33:02 -0000	1.615
+++ dwarf2out.c	16 Oct 2005 19:29:44 -0000
@@ -1271,6 +1271,30 @@ clobbers_queued_reg_save (rtx insn)
   return false;
 }
 
+/* Entry point for saving the first register into the second.  */
+
+void
+dwarf2out_reg_save_reg (const char *label, rtx reg, rtx sreg)
+{
+  size_t i;
+  unsigned int regno, sregno;
+
+  for (i = 0; i < num_regs_saved_in_regs; i++)
+    if (REGNO (regs_saved_in_regs[i].orig_reg) == REGNO (reg))
+      break;
+  if (i == num_regs_saved_in_regs)
+    {
+      gcc_assert (i != ARRAY_SIZE (regs_saved_in_regs));
+      num_regs_saved_in_regs++;
+    }
+  regs_saved_in_regs[i].orig_reg = reg;
+  regs_saved_in_regs[i].saved_in_reg = sreg;
+
+  regno = DWARF_FRAME_REGNUM (REGNO (reg));
+  sregno = DWARF_FRAME_REGNUM (REGNO (sreg));
+  reg_save (label, regno, sregno, 0);
+}
+
 /* What register, if any, is currently saved in REG?  */
 
 static rtx
@@ -1659,7 +1683,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	case UNSPEC_VOLATILE:
 	  gcc_assert (targetm.dwarf_handle_frame_unspec);
 	  targetm.dwarf_handle_frame_unspec (label, expr, XINT (src, 1));
-	  break;
+	  return;
 
 	default:
 	  gcc_unreachable ();
Index: function.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/function.c,v
retrieving revision 1.646
diff -u -p -d -r1.646 function.c
--- function.c	12 Oct 2005 23:34:09 -0000	1.646
+++ function.c	16 Oct 2005 19:29:45 -0000
@@ -2894,22 +2894,9 @@ assign_parms (tree fndecl)
 {
   struct assign_parm_data_all all;
   tree fnargs, parm;
-  rtx internal_arg_pointer;
-
-  /* If the reg that the virtual arg pointer will be translated into is
-     not a fixed reg or is the stack pointer, make a copy of the virtual
-     arg pointer, and address parms via the copy.  The frame pointer is
-     considered fixed even though it is not marked as such.
-
-     The second time through, simply use ap to avoid generating rtx.  */
 
-  if ((ARG_POINTER_REGNUM == STACK_POINTER_REGNUM
-       || ! (fixed_regs[ARG_POINTER_REGNUM]
-	     || ARG_POINTER_REGNUM == FRAME_POINTER_REGNUM)))
-    internal_arg_pointer = copy_to_reg (virtual_incoming_args_rtx);
-  else
-    internal_arg_pointer = virtual_incoming_args_rtx;
-  current_function_internal_arg_pointer = internal_arg_pointer;
+  current_function_internal_arg_pointer
+    = targetm.calls.internal_arg_pointer ();
 
   assign_parms_initialize_all (&all);
   fnargs = assign_parms_augmented_arg_list (&all);
@@ -3917,42 +3904,6 @@ struct tree_opt_pass pass_init_function 
 void
 expand_main_function (void)
 {
-#ifdef FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN
-  if (FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN)
-    {
-      int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
-      rtx tmp, seq;
-
-      start_sequence ();
-      /* Forcibly align the stack.  */
-#ifdef STACK_GROWS_DOWNWARD
-      tmp = expand_simple_binop (Pmode, AND, stack_pointer_rtx, GEN_INT(-align),
-				 stack_pointer_rtx, 1, OPTAB_WIDEN);
-#else
-      tmp = expand_simple_binop (Pmode, PLUS, stack_pointer_rtx,
-				 GEN_INT (align - 1), NULL_RTX, 1, OPTAB_WIDEN);
-      tmp = expand_simple_binop (Pmode, AND, tmp, GEN_INT (-align),
-				 stack_pointer_rtx, 1, OPTAB_WIDEN);
-#endif
-      if (tmp != stack_pointer_rtx)
-	emit_move_insn (stack_pointer_rtx, tmp);
-
-      /* Enlist allocate_dynamic_stack_space to pick up the pieces.  */
-      tmp = force_reg (Pmode, const0_rtx);
-      allocate_dynamic_stack_space (tmp, NULL_RTX, BIGGEST_ALIGNMENT);
-      seq = get_insns ();
-      end_sequence ();
-
-      for (tmp = get_last_insn (); tmp; tmp = PREV_INSN (tmp))
-	if (NOTE_P (tmp) && NOTE_LINE_NUMBER (tmp) == NOTE_INSN_FUNCTION_BEG)
-	  break;
-      if (tmp)
-	emit_insn_before (seq, tmp);
-      else
-	emit_insn (seq);
-    }
-#endif
-
 #if (defined(INVOKE__main)				\
      || (!defined(HAS_INIT_SECTION)			\
 	 && !defined(INIT_SECTION_ASM_OP)		\
Index: target-def.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/target-def.h,v
retrieving revision 1.134
diff -u -p -d -r1.134 target-def.h
--- target-def.h	12 Oct 2005 20:54:48 -0000	1.134
+++ target-def.h	16 Oct 2005 19:29:45 -0000
@@ -445,6 +445,7 @@ Foundation, 51 Franklin Street, Fifth Fl
 #define TARGET_ARG_PARTIAL_BYTES hook_int_CUMULATIVE_ARGS_mode_tree_bool_0
 
 #define TARGET_FUNCTION_VALUE default_function_value
+#define TARGET_INTERNAL_ARG_POINTER default_internal_arg_pointer
 
 #define TARGET_CALLS {						\
    TARGET_PROMOTE_FUNCTION_ARGS,				\
@@ -463,7 +464,8 @@ Foundation, 51 Franklin Street, Fifth Fl
    TARGET_CALLEE_COPIES,					\
    TARGET_ARG_PARTIAL_BYTES,					\
    TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN,			\
-   TARGET_FUNCTION_VALUE					\
+   TARGET_FUNCTION_VALUE,					\
+   TARGET_INTERNAL_ARG_POINTER					\
    }
 
 #ifndef TARGET_UNWIND_TABLES_DEFAULT
Index: target.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/target.h,v
retrieving revision 1.145
diff -u -p -d -r1.145 target.h
--- target.h	12 Oct 2005 20:54:48 -0000	1.145
+++ target.h	16 Oct 2005 19:29:46 -0000
@@ -613,6 +613,10 @@ struct gcc_target
        specified by FN_DECL_OR_TYPE with a return type of RET_TYPE.  */
     rtx (*function_value) (tree ret_type, tree fn_decl_or_type,
 			   bool outgoing);
+
+    /* Return an rtx for the argument pointer incoming to the
+       current function.  */
+    rtx (*internal_arg_pointer) (void);
   } calls;
 
   /* Return the diagnostic message string if conversion from FROMTYPE
Index: targhooks.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/targhooks.c,v
retrieving revision 2.46
diff -u -p -d -r2.46 targhooks.c
--- targhooks.c	14 Jul 2005 07:39:55 -0000	2.46
+++ targhooks.c	16 Oct 2005 19:29:46 -0000
@@ -62,6 +62,7 @@ Software Foundation, 51 Franklin Street,
 #include "tm_p.h"
 #include "target-def.h"
 #include "ggc.h"
+#include "hard-reg-set.h"
 
 
 void
@@ -439,4 +440,19 @@ default_function_value (tree ret_type AT
 #endif
 }
 
+rtx
+default_internal_arg_pointer (void)
+{
+  /* If the reg that the virtual arg pointer will be translated into is
+     not a fixed reg or is the stack pointer, make a copy of the virtual
+     arg pointer, and address parms via the copy.  The frame pointer is
+     considered fixed even though it is not marked as such.  */
+  if ((ARG_POINTER_REGNUM == STACK_POINTER_REGNUM
+       || ! (fixed_regs[ARG_POINTER_REGNUM]
+	     || ARG_POINTER_REGNUM == FRAME_POINTER_REGNUM)))
+    return copy_to_reg (virtual_incoming_args_rtx);
+  else
+    return virtual_incoming_args_rtx;
+}
+
 #include "gt-targhooks.h"
Index: targhooks.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/targhooks.h,v
retrieving revision 2.33
diff -u -p -d -r2.33 targhooks.h
--- targhooks.h	14 Jul 2005 07:39:56 -0000	2.33
+++ targhooks.h	16 Oct 2005 19:29:46 -0000
@@ -68,4 +68,4 @@ extern const char *hook_invalid_arg_for_
   (tree, tree, tree);
 extern bool hook_bool_rtx_commutative_p (rtx, int);
 extern rtx default_function_value (tree, tree, bool);
-
+extern rtx default_internal_arg_pointer (void);
Index: tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.h,v
retrieving revision 1.759
diff -u -p -d -r1.759 tree.h
--- tree.h	12 Oct 2005 23:34:09 -0000	1.759
+++ tree.h	16 Oct 2005 19:29:47 -0000
@@ -4119,6 +4119,10 @@ extern void dwarf2out_return_save (const
 
 extern void dwarf2out_return_reg (const char *, unsigned);
 
+/* Entry point for saving the first register into the second.  */
+
+extern void dwarf2out_reg_save_reg (const char *, rtx, rtx);
+
 /* In tree-inline.c  */
 
 /* The type of a set of already-visited pointers.  Functions for creating
Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.863
diff -u -p -d -r1.863 i386.c
--- config/i386/i386.c	12 Oct 2005 20:54:49 -0000	1.863
+++ config/i386/i386.c	16 Oct 2005 19:29:50 -0000
@@ -627,7 +627,7 @@ enum reg_class const regclass_map[FIRST_
 
 /* The "default" register map used in 32bit mode.  */
 
-int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
+int const dbx_register_map[FIRST_PSEUDO_REGISTER+1] =
 {
   0, 2, 1, 3, 6, 7, 4, 5,		/* general regs */
   12, 13, 14, 15, 16, 17, 18, 19,	/* fp regs */
@@ -636,6 +636,7 @@ int const dbx_register_map[FIRST_PSEUDO_
   29, 30, 31, 32, 33, 34, 35, 36,       /* MMX */
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended integer registers */
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
+  8					/* return column */
 };
 
 static int const x86_64_int_parameter_registers[6] =
@@ -650,7 +651,7 @@ static int const x86_64_int_return_regis
 };
 
 /* The "default" register map used in 64bit mode.  */
-int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
+int const dbx64_register_map[FIRST_PSEUDO_REGISTER+1] =
 {
   0, 1, 2, 3, 4, 5, 6, 7,		/* general regs */
   33, 34, 35, 36, 37, 38, 39, 40,	/* fp regs */
@@ -659,6 +660,7 @@ int const dbx64_register_map[FIRST_PSEUD
   41, 42, 43, 44, 45, 46, 47, 48,       /* MMX */
   8,9,10,11,12,13,14,15,		/* extended integer registers */
   25, 26, 27, 28, 29, 30, 31, 32,	/* extended SSE registers */
+  16					/* return column */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -715,7 +717,7 @@ int const dbx64_register_map[FIRST_PSEUD
 	17 for %st(6) (gcc regno = 14)
 	18 for %st(7) (gcc regno = 15)
 */
-int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
+int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER+1] =
 {
   0, 2, 1, 3, 6, 7, 5, 4,		/* general regs */
   11, 12, 13, 14, 15, 16, 17, 18,	/* fp regs */
@@ -724,6 +726,7 @@ int const svr4_dbx_register_map[FIRST_PS
   29, 30, 31, 32, 33, 34, 35, 36,	/* MMX registers */
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended integer registers */
   -1, -1, -1, -1, -1, -1, -1, -1,	/* extended SSE registers */
+  8					/* return column */
 };
 
 /* Test and compare insns in i386.md store the information needed to
@@ -913,6 +916,8 @@ static void ix86_init_builtins (void);
 static rtx ix86_expand_builtin (tree, rtx, rtx, enum machine_mode, int);
 static const char *ix86_mangle_fundamental_type (tree);
 static tree ix86_stack_protect_fail (void);
+static rtx ix86_internal_arg_pointer (void);
+static void ix86_dwarf_handle_frame_unspec (const char *, rtx, int);
 
 /* This function is only used on Solaris.  */
 static void i386_solaris_elf_named_section (const char *, unsigned int, tree)
@@ -1081,6 +1086,10 @@ static void x86_64_elf_select_section (t
 #define TARGET_MUST_PASS_IN_STACK ix86_must_pass_in_stack
 #undef TARGET_PASS_BY_REFERENCE
 #define TARGET_PASS_BY_REFERENCE ix86_pass_by_reference
+#undef TARGET_INTERNAL_ARG_POINTER
+#define TARGET_INTERNAL_ARG_POINTER ix86_internal_arg_pointer
+#undef TARGET_DWARF_HANDLE_FRAME_UNSPEC
+#define TARGET_DWARF_HANDLE_FRAME_UNSPEC ix86_dwarf_handle_frame_unspec
 
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
@@ -1987,6 +1996,11 @@ ix86_function_ok_for_sibcall (tree decl,
     return false;
 #endif
 
+  /* If we force-aligned the stack, then sibcalling would unalign the
+     stack, which may break the called function.  */
+  if (cfun->machine->force_align_arg_pointer)
+    return false;
+
   /* Otherwise okay.  That also includes certain types of indirect calls.  */
   return true;
 }
@@ -4508,6 +4522,10 @@ ix86_save_reg (unsigned int regno, int m
 	}
     }
 
+  if (cfun->machine->force_align_arg_pointer
+      && regno == REGNO (cfun->machine->force_align_arg_pointer))
+    return 1;
+
   return (regs_ever_live[regno]
 	  && !call_used_regs[regno]
 	  && !fixed_regs[regno]
@@ -4719,10 +4737,10 @@ ix86_compute_frame_layout (struct ix86_f
 static void
 ix86_emit_save_regs (void)
 {
-  int regno;
+  unsigned int regno;
   rtx insn;
 
-  for (regno = FIRST_PSEUDO_REGISTER - 1; regno >= 0; regno--)
+  for (regno = FIRST_PSEUDO_REGISTER; regno-- > 0; )
     if (ix86_save_reg (regno, true))
       {
 	insn = emit_insn (gen_push (gen_rtx_REG (Pmode, regno)));
@@ -4735,7 +4753,7 @@ ix86_emit_save_regs (void)
 static void
 ix86_emit_save_regs_using_mov (rtx pointer, HOST_WIDE_INT offset)
 {
-  int regno;
+  unsigned int regno;
   rtx insn;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
@@ -4783,6 +4801,47 @@ pro_epilogue_adjust_stack (rtx dest, rtx
     RTX_FRAME_RELATED_P (insn) = 1;
 }
 
+/* Handle the TARGET_INTERNAL_ARG_POINTER hook.  */
+
+static rtx
+ix86_internal_arg_pointer (void)
+{
+  if (FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN
+      && DECL_NAME (current_function_decl)
+      && MAIN_NAME_P (DECL_NAME (current_function_decl))
+      && DECL_FILE_SCOPE_P (current_function_decl))
+    {
+      cfun->machine->force_align_arg_pointer = gen_rtx_REG (Pmode, 2);
+      return copy_to_reg (cfun->machine->force_align_arg_pointer);
+    }
+  else
+    return virtual_incoming_args_rtx;
+}
+
+/* Handle the TARGET_DWARF_HANDLE_FRAME_UNSPEC hook.
+   This is called from dwarf2out.c to emit call frame instructions
+   for frame-related insns containing UNSPECs and UNSPEC_VOLATILEs. */
+static void
+ix86_dwarf_handle_frame_unspec (const char *label, rtx pattern, int index)
+{
+  rtx unspec = SET_SRC (pattern);
+  gcc_assert (GET_CODE (unspec) == UNSPEC);
+
+  switch (index)
+    {
+    case UNSPEC_REG_SAVE:
+      dwarf2out_reg_save_reg (label, XVECEXP (unspec, 0, 0),
+			      SET_DEST (pattern));
+      break;
+    case UNSPEC_DEF_CFA:
+      dwarf2out_def_cfa (label, REGNO (SET_DEST (pattern)),
+			 INTVAL (XVECEXP (unspec, 0, 0)));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Expand the prologue into a bunch of separate insns.  */
 
 void
@@ -4795,6 +4854,52 @@ ix86_expand_prologue (void)
 
   ix86_compute_frame_layout (&frame);
 
+  if (cfun->machine->force_align_arg_pointer)
+    {
+      rtx x, y;
+
+      /* Grab the argument pointer.  */
+      x = plus_constant (stack_pointer_rtx, 4);
+      y = cfun->machine->force_align_arg_pointer;
+      insn = emit_insn (gen_rtx_SET (VOIDmode, y, x));
+      RTX_FRAME_RELATED_P (insn) = 1;
+
+      /* The unwind info consists of two parts: install the fafp as the cfa,
+	 and record the fafp as the "save register" of the stack pointer.
+	 The latter is there in order that the unwinder can see where it
+	 should restore the stack pointer across the and insn.  */
+      x = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (1, const0_rtx), UNSPEC_DEF_CFA);
+      x = gen_rtx_SET (VOIDmode, y, x);
+      RTX_FRAME_RELATED_P (x) = 1;
+      y = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (1, stack_pointer_rtx),
+			  UNSPEC_REG_SAVE);
+      y = gen_rtx_SET (VOIDmode, cfun->machine->force_align_arg_pointer, y);
+      RTX_FRAME_RELATED_P (y) = 1;
+      x = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, x, y));
+      x = gen_rtx_EXPR_LIST (REG_FRAME_RELATED_EXPR, x, NULL);
+      REG_NOTES (insn) = x;
+
+      /* Align the stack.  */
+      emit_insn (gen_andsi3 (stack_pointer_rtx, stack_pointer_rtx,
+			     GEN_INT (-16)));
+
+      /* And here we cheat like madmen with the unwind info.  We force the
+	 cfa register back to sp+4, which is exactly what it was at the
+	 start of the function.  Re-pushing the return address results in
+	 the return address at the same spot relative to the cfa, and thus
+	 is correct wrt the unwind info.  */
+      x = cfun->machine->force_align_arg_pointer;
+      x = gen_frame_mem (Pmode, plus_constant (x, -4));
+      insn = emit_insn (gen_push (x));
+      RTX_FRAME_RELATED_P (insn) = 1;
+
+      x = GEN_INT (4);
+      x = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (1, x), UNSPEC_DEF_CFA);
+      x = gen_rtx_SET (VOIDmode, stack_pointer_rtx, x);
+      x = gen_rtx_EXPR_LIST (REG_FRAME_RELATED_EXPR, x, NULL);
+      REG_NOTES (insn) = x;
+    }
+
   /* Note: AT&T enter does NOT have reversed args.  Enter is probably
      slower on all targets.  Also sdb doesn't like it.  */
 
@@ -5072,6 +5177,13 @@ ix86_expand_epilogue (int style)
 	}
     }
 
+  if (cfun->machine->force_align_arg_pointer)
+    {
+      emit_insn (gen_addsi3 (stack_pointer_rtx,
+			     cfun->machine->force_align_arg_pointer,
+			     GEN_INT (-4)));
+    }
+
   /* Sibcall epilogues don't want a return instruction.  */
   if (style == 0)
     return;
Index: config/i386/i386.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.h,v
retrieving revision 1.446
diff -u -p -d -r1.446 i386.h
--- config/i386/i386.h	4 Oct 2005 14:07:24 -0000	1.446
+++ config/i386/i386.h	16 Oct 2005 19:29:51 -0000
@@ -1990,9 +1990,9 @@ number as al, and ax.
 #define DBX_REGISTER_NUMBER(N) \
   (TARGET_64BIT ? dbx64_register_map[(N)] : dbx_register_map[(N)])
 
-extern int const dbx_register_map[FIRST_PSEUDO_REGISTER];
-extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
-extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
+extern int const dbx_register_map[FIRST_PSEUDO_REGISTER+1];
+extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER+1];
+extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER+1];
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
@@ -2263,6 +2263,7 @@ struct machine_function GTY(())
 {
   struct stack_local_entry *stack_locals;
   const char *some_ld_name;
+  rtx force_align_arg_pointer;
   int save_varrargs_registers;
   int accesses_prev_frame;
   int optimize_mode_switching[MAX_386_ENTITIES];
Index: config/i386/i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.659
diff -u -p -d -r1.659 i386.md
--- config/i386/i386.md	11 Oct 2005 08:42:24 -0000	1.659
+++ config/i386/i386.md	16 Oct 2005 19:29:55 -0000
@@ -66,11 +66,13 @@
    (UNSPEC_STACK_ALLOC		11)
    (UNSPEC_SET_GOT		12)
    (UNSPEC_SSE_PROLOGUE_SAVE	13)
+   (UNSPEC_REG_SAVE		14)
+   (UNSPEC_DEF_CFA		15)
 
    ; TLS support
-   (UNSPEC_TP			15)
-   (UNSPEC_TLS_GD		16)
-   (UNSPEC_TLS_LD_BASE		17)
+   (UNSPEC_TP			16)
+   (UNSPEC_TLS_GD		17)
+   (UNSPEC_TLS_LD_BASE		18)
 
    ; Other random patterns
    (UNSPEC_SCAS			20)
@@ -18929,7 +18931,8 @@
   [(set (match_operand:SI 0 "push_operand" "")
 	(match_operand:SI 1 "memory_operand" ""))
    (match_scratch:SI 2 "r")]
-  "! optimize_size && ! TARGET_PUSH_MEMORY"
+  "!optimize_size && !TARGET_PUSH_MEMORY
+   && !RTX_FRAME_RELATED_P (peep2_next_insn (0))"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (match_dup 2))]
   "")
@@ -18938,7 +18941,8 @@
   [(set (match_operand:DI 0 "push_operand" "")
 	(match_operand:DI 1 "memory_operand" ""))
    (match_scratch:DI 2 "r")]
-  "! optimize_size && ! TARGET_PUSH_MEMORY"
+  "!optimize_size && !TARGET_PUSH_MEMORY
+   && !RTX_FRAME_RELATED_P (peep2_next_insn (0))"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (match_dup 2))]
   "")
@@ -18949,7 +18953,8 @@
   [(set (match_operand:SF 0 "push_operand" "")
 	(match_operand:SF 1 "memory_operand" ""))
    (match_scratch:SF 2 "r")]
-  "! optimize_size && ! TARGET_PUSH_MEMORY"
+  "!optimize_size && !TARGET_PUSH_MEMORY
+   && !RTX_FRAME_RELATED_P (peep2_next_insn (0))"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (match_dup 2))]
   "")
@@ -18958,7 +18963,8 @@
   [(set (match_operand:HI 0 "push_operand" "")
 	(match_operand:HI 1 "memory_operand" ""))
    (match_scratch:HI 2 "r")]
-  "! optimize_size && ! TARGET_PUSH_MEMORY"
+  "!optimize_size && !TARGET_PUSH_MEMORY
+   && !RTX_FRAME_RELATED_P (peep2_next_insn (0))"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (match_dup 2))]
   "")
@@ -18967,7 +18973,8 @@
   [(set (match_operand:QI 0 "push_operand" "")
 	(match_operand:QI 1 "memory_operand" ""))
    (match_scratch:QI 2 "q")]
-  "! optimize_size && ! TARGET_PUSH_MEMORY"
+  "!optimize_size && !TARGET_PUSH_MEMORY
+   && !RTX_FRAME_RELATED_P (peep2_next_insn (0))"
   [(set (match_dup 2) (match_dup 1))
    (set (match_dup 0) (match_dup 2))]
   "")
Index: doc/tm.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
retrieving revision 1.447
diff -u -p -d -r1.447 tm.texi
--- doc/tm.texi	12 Oct 2005 20:54:49 -0000	1.447
+++ doc/tm.texi	16 Oct 2005 19:29:57 -0000
@@ -1033,18 +1033,6 @@ macro must evaluate to a value equal to 
 @code{STACK_BOUNDARY}.
 @end defmac
 
-@defmac FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN
-A C expression that evaluates true if @code{PREFERRED_STACK_BOUNDARY} is
-not guaranteed by the runtime and we should emit code to align the stack
-at the beginning of @code{main}.
-
-@cindex @code{PUSH_ROUNDING}, interaction with @code{PREFERRED_STACK_BOUNDARY}
-If @code{PUSH_ROUNDING} is not defined, the stack will always be aligned
-to the specified boundary.  If @code{PUSH_ROUNDING} is defined and specifies
-a less strict alignment than @code{PREFERRED_STACK_BOUNDARY}, the stack may
-be momentarily unaligned while pushing arguments.
-@end defmac
-
 @defmac FUNCTION_BOUNDARY
 Alignment required for a function entry point, in bits.
 @end defmac


Thread overview: 5+ messages
     [not found] <4351AE1C.1010704@sco.com.suse.lists.egcs>
     [not found] ` <20051016201110.GA7226@redhat.com.suse.lists.egcs>
2005-10-17 15:30   ` [cft] aligning main's stack frame Andi Kleen
2005-10-17 19:15     ` Richard Henderson
2005-10-16  3:43 About the alignment of main Kean Johnston
2005-10-16 20:11 ` [cft] aligning main's stack frame Richard Henderson
2005-10-16 20:59   ` Kean Johnston
2005-11-03  1:46   ` Richard Henderson
