public inbox for gcc-patches@gcc.gnu.org
* [stack]: V9: Merge stack branch with trunk.
@ 2008-06-30 22:02 H.J. Lu
  0 siblings, 0 replies; only message in thread
From: H.J. Lu @ 2008-06-30 22:02 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-patches, rguenther, Ye, Joey, Guo, Xuepeng

[-- Attachment #1: Type: text/plain, Size: 6634 bytes --]

On Mon, Jun 30, 2008 at 2:26 PM, Ian Lance Taylor <iant@google.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>
>> BTW, I can post a complete stack alignment patch
>> with a separate patch for testcases.
>
> I think that would be helpful at this point.  Thanks.
>

Hi Ian,

Here is the complete patch for stack alignment. The
testcase patch is at

http://gcc.gnu.org/ml/gcc-patches/2008-06/msg01848.html

I am also enclosing the design document for stack
alignment. The middle-end change is to collect stack
alignment information and reserve frame pointer for
stack alignment. We also update DWARF output
to support stack alignment.  The rest is done in backend.

Thanks.


-- 
H.J.
---
2008-06-30  Joey Ye  <joey.ye@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

	* builtins.c (expand_builtin_setjmp_receiver): Replace
	virtual_incoming_args_rtx with
	crtl->args.internal_arg_pointer.
	(expand_builtin_apply_args_1): Likewise.
	(expand_builtin_longjmp): Need DRAP for stack alignment.
	(expand_builtin_apply): Likewise.

	* caller-save.c (setup_save_areas): Call assign_stack_local_1
	instead of assign_stack_local to allow alignment reduction.

	* calls.c (emit_call_1): Need DRAP for stack alignment if
	return pops.
	(expand_call): Replace virtual_incoming_args_rtx with
	crtl->args.internal_arg_pointer.
	* stmt.c (expand_nl_goto_receiver): Likewise.

	* cfgexpand.c (get_decl_align_unit): Estimate stack variable
	alignment and store to stack_alignment_estimated and
	max_used_stack_slot_alignment.
	(expand_one_var): Likewise.
	(expand_stack_alignment): New function.
	(tree_expand_cfg): Initialize max_used_stack_slot_alignment
	and stack_alignment_estimated fields in rtl_data.  Call
	expand_stack_alignment at end.

	* defaults.h (INCOMING_STACK_BOUNDARY): New.
	(MAX_STACK_ALIGNMENT): Likewise.
	(MAX_SUPPORTED_STACK_ALIGNMENT): Likewise.
	(SUPPORTS_STACK_ALIGNMENT): Likewise.

	* emit-rtl.c (gen_reg_rtx): Estimate stack alignment for
	stack alignment when generating virtual registers.

	* function.c (assign_stack_local): Renamed to ...
	(assign_stack_local_1): This.  Add a parameter to indicate
	if it is OK to reduce alignment.
	(assign_stack_local): Use it.
	(instantiate_new_reg): Instantiate virtual incoming args rtx
	to vDRAP if stack realignment and DRAP is needed.
	(assign_parms): Collect parameter/return type alignment and
	contribute to stack_alignment_estimated.
	(locate_and_pad_parm): Likewise.
	(get_arg_pointer_save_area): Replace virtual_incoming_args_rtx
	with crtl->args.internal_arg_pointer.

	* function.h (rtl_data): Add new field drap_reg,
	max_used_stack_slot_alignment, stack_alignment_estimated,
	stack_realign_needed, need_drap, stack_realign_processed and
	stack_realign_finalized.
	(stack_realign_fp): New macro.
	(stack_realign_drap): Likewise.

	* global.c (compute_regsets): Frame pointer is needed when
	stack is realigned.  Can eliminate frame pointer when stack is
	realigned and dynamic realigned argument pointer isn't used.

	* rtl.h (assign_stack_local_1): Declare new function.

	* reload1.c (update_eliminables):  Frame pointer is needed
	when stack is realigned.
	(init_elim_table): Can eliminate frame pointer when stack is
	realigned and dynamic realigned argument pointer isn't used.

	* target-def.h (TARGET_UPDATE_STACK_BOUNDARY): New.
	(TARGET_GET_DRAP_RTX): Likewise.
	(TARGET_CALLS): Add TARGET_UPDATE_STACK_BOUNDARY and
	TARGET_GET_DRAP_RTX.

	* target.h (gcc_target): Add update_stack_boundary and
	get_drap_rtx.

	* tree-vectorizer.c (vect_can_force_dr_alignment_p): Replace
	STACK_BOUNDARY with MAX_STACK_ALIGNMENT.

2008-06-30  Xuepeng Guo  <xuepeng.guo@intel.com>

	* dwarf2out.c (dw_fde_struct): Add cfa_uses_expression.
	(add_cfi): Don't redefine CFA when CFA was defined with
	DW_CFA_def_cfa_expression.
	(reg_save): Moved before build_cfa_loc.  Handle stack
	alignment.
	(dwarf2out_frame_debug_expr): Add rules 16-19 to handle stack
	realign.
	(int_loc_descriptor): Moved before output_cfa_loc.
	(output_cfa_loc): Handle DW_CFA_expression.
	(based_loc_descr): Update assert for stack realign.
	(compute_frame_pointer_to_fb_displacement): Likewise.

2008-06-30  Joey Ye  <joey.ye@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (ix86_force_align_arg_pointer_string):
	Break long line.
	(ix86_gen_andsp): New.
	(ix86_user_incoming_stack_boundary): Likewise.
	(ix86_default_incoming_stack_boundary): Likewise.
	(ix86_incoming_stack_boundary): Likewise.
	(ix86_can_eliminate): Likewise.
	(find_drap_reg): Likewise.
	(ix86_update_stack_boundary): Likewise.
	(ix86_get_drap_rtx): Likewise.
	(ix86_finalize_stack_realign_flags): Likewise.
	(TARGET_UPDATE_STACK_BOUNDARY): Likewise.
	(TARGET_GET_DRAP_RTX): Likewise.
	(override_options): Override option value for new options.
	(ix86_function_ok_for_sibcall): Remove check for
	force_align_arg_pointer.
	(ix86_handle_cconv_attribute): Likewise.
	(ix86_function_regparm): Likewise.
	(setup_incoming_varargs_64): Don't set stack_alignment_needed
	here.
	(ix86_va_start): Replace virtual_incoming_args_rtx with
	crtl->args.internal_arg_pointer.
	(ix86_select_alt_pic_regnum): Check DRAP register.
	(ix86_save_reg): Replace force_align_arg_pointer with drap_reg.
	(ix86_compute_frame_layout): Compute frame layout wrt stack
	realignment.
	(ix86_internal_arg_pointer): Just return
	virtual_incoming_args_rtx.
	(ix86_expand_prologue): Decide if stack realignment is needed
	and generate prologue code accordingly.
	(ix86_expand_epilogue): Generate epilogue code depending on
	whether stack realignment is really needed.
	
	* config/i386/i386.h (MAIN_STACK_BOUNDARY): New.
	(ABI_STACK_BOUNDARY): Likewise.
	(PREFERRED_STACK_BOUNDARY_DEFAULT): Likewise.
	(STACK_REALIGN_DEFAULT): Likewise.
	(INCOMING_STACK_BOUNDARY): Likewise.
	(MAX_STACK_ALIGNMENT): Likewise.
	(ix86_incoming_stack_boundary): Likewise.
	(FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN): Removed.
	(REAL_PIC_OFFSET_TABLE_REGNUM): Updated to use BX_REG.
	(CAN_ELIMINATE): Defined with ix86_can_eliminate.
	(machine_function): Remove force_align_arg_pointer.

	* config/i386/i386.md (BX_REG): New.
	(R13_REG): Likewise.

	* config/i386/i386.opt (mforce_drap): New.
	(mincoming-stack-boundary): Likewise.
	(mstackrealign): Add Init(-1).

	* config/i386/i386-protos.h (ix86_can_eliminate): New

2008-06-30  H.J. Lu  <hongjiu.lu@intel.com>

	* doc/extend.texi: Update force_align_arg_pointer.

	* doc/invoke.texi: Document -mincoming-stack-boundary.  Update
	-mstackrealign.

	* doc/tm.texi (MAX_STACK_ALIGNMENT): Add macro.
	(INCOMING_STACK_BOUNDARY): Likewise.
	(TARGET_UPDATE_STACK_BOUNDARY): New target hook.
	(TARGET_GET_DRAP_RTX): Likewise.

[-- Attachment #2: gcc-stack-v9-1.patch.bz2 --]
[-- Type: application/x-bzip2, Size: 19726 bytes --]

[-- Attachment #3: design-doc-stackalign.txt --]
[-- Type: text/plain, Size: 17450 bytes --]

-- 0. MOTIVATION --
Some local variables (such as of __m128 type or marked with alignment
attribute) require stack aligned at a boundary larger than the default stack
boundary. Current GCC partially supports this with limitations. We are
proposing a new design to fully solve the problem.


-- 1. CURRENT IMPLEMENTATION --
There are two ways current GCC supports bigger than default stack
alignment.  One is to make sure that stack is aligned at program entry
point, and then ensure that for each non-leaf function, its frame size is
aligned. This approach doesn't work when linking with libs or objects
compiled by other psABI conforming compilers. Some problems are logged as
PR 33721. Another is to adjust stack alignment at the entry point of a
function if it is marked with __attribute__ ((force_align_arg_pointer))
or -mstackrealign option is provided. This method guarantees the alignment
in most of the cases but with following problems and limitations:

*  Only 16 bytes alignment is supported
*  Adjusting stack alignment at each function prologue hurts performance
unnecessarily, because not all functions need bigger alignment. In fact,
commonly only those functions which have SSE variables defined locally
(either declared by the user or compiler generated internal temporary
variables) need corresponding alignment.
*  Doesn't support x86_64 for the cases when required stack alignment
is > 16 bytes
*  Emits inefficient and complicated prologue/epilogue code to adjust
stack alignment
*  Doesn't work with nested functions
*  Has a bug handling register parameters, which resulted in a cpu2006
failure. A patch is available as a workaround.

-- 2. NEW PROPOSAL: DESIGN --
Here, we propose a new design to fully support stack alignment while
overcoming above problems. The new design will
*  Support arbitrary alignment value, including 4,8,16,32...
*  Adjust function stack alignment only when necessary
*  Initial development will be on i386 and x86_64, but can be extended
to other platforms
*  Emit more efficient prologue/epilogue code
*  Coexist with special features like dynamic stack allocation (alloca),
nested functions, register parameter passing, PIC code and tail call
optimization
*  Be able to debug and unwind stack

2.1 Support arbitrary alignment value
Different source code and optimizations require different stack alignments,
as in the following table:
Feature         Alignment (bytes)
i386_ABI        4
x86_64_ABI      16
char            1
short           2
int             4
long            4/8*
long long       8
__m64           8
__m128          16
float           4
double          8
long double     16
user specified  any power of 2

*Note: 4 for i386, 8 for x86_64
The new design will support any alignment value in this table.

2.2 Adjust function stack alignment only when necessary

Current GCC defines following macros related to stack alignment:
i. STACK_BOUNDARY in bits, which is preferred by hardware, 32 for i386 and
64 for x86_64. It is the minimum stack boundary. It is fixed.
ii. PREFERRED_STACK_BOUNDARY. It sets the stack alignment when calling a
function. It may be set at command line and has no impact on stack
alignment at function entry. This proposal requires PREFERRED >= STACK, and
by default sets PREFERRED to ABI_STACK_BOUNDARY.

This design will define a few more macros, or concepts not explicitly
defined in code:
iii. ABI_STACK_BOUNDARY in bits, which is the stack boundary specified by
psABI, 32 for i386 and 128 for x86_64.  ABI_STACK_BOUNDARY >=
STACK_BOUNDARY. It is fixed for a given psABI.
iv. LOCAL_STACK_BOUNDARY in bits. Each function stack has its own stack
alignment requirement, which depends on the alignment of its stack variables,
LOCAL_STACK_BOUNDARY = MAX (alignment of each effective stack variable).
v. INCOMING_STACK_BOUNDARY in bits, which is the stack boundary at function
entry. If a function is marked with __attribute__ ((force_align_arg_pointer))
or -mstackrealign option is provided, INCOMING = STACK_BOUNDARY. Otherwise,
INCOMING == PREFERRED_STACK_BOUNDARY because a function is typically called 
locally with the same PREFERRED_STACK_BOUNDARY. For those functions whose
PREFERRED is larger than ABI, it is the caller's responsibility to invoke
them with appropriate PREFERRED.
vi. REQUIRED_STACK_ALIGNMENT in bits, which is stack alignment required by
local variables and calling other function. REQUIRED_STACK_ALIGNMENT ==
MAX(LOCAL_STACK_BOUNDARY,PREFERRED_STACK_BOUNDARY) in case of a non-leaf
function. For a leaf function, REQUIRED_STACK_ALIGNMENT ==
MAX(LOCAL_STACK_BOUNDARY,STACK_BOUNDARY).

This proposal won't adjust the stack when INCOMING_STACK_BOUNDARY >=
REQUIRED_STACK_ALIGNMENT. It adjusts the stack to REQUIRED_STACK_ALIGNMENT
in the prologue only when INCOMING_STACK_BOUNDARY < REQUIRED_STACK_ALIGNMENT,
or when PREFERRED_STACK_BOUNDARY of the entry function is less than
ABI_STACK_BOUNDARY.

2.3 Initial development on i386 and x86_64
We initially support i386 and x86_64. In this document we focus on i386,
which is harder to implement because of its small register file.  But all
that we discuss can be easily applied to x86_64.

2.4 Emit more efficient prologue/epilogue
When a function needs to adjust stack alignment and has no dynamic stack
allocation, this design will generate following example prologue/epilogue
code:
IA32 example Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        andl      $-16, %esp
        subl      $4, %esp         // Adjust local frame size by 4
Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        ret
Locals will be addressed as esp + offset and parameters as ebp + offset. 
Add x86_64 example here.

Thus BP points to parameter frame and SP points to local frame.

2.5 Coexist with special features
Stack alignment adjustment will coexist with various GCC features
that have special calling conventions and frame layout, such as dynamic
stack allocation (alloca), nested functions and parameter passing via
registers to local functions.

Hard register usage on i386 is the main obstacle to making the proposal work
with these GCC features. This design requires an additional hard register
in prologue/epilogue in case of dynamic stack allocation. The register is
called the Dynamic Realigned Argument Pointer, or DRAP. Because i386 PIC
requires BX as GOT pointer, i386 may use AX, DX and CX as parameter
passing registers, and DRAP has to work with setjmp/longjmp, there are
few candidates to choose from.  The current proposal uses DI as DRAP because
it won't conflict with i386 PIC or regparm and it is preserved across
setjmp/longjmp since it is callee-saved.

X86_64 is much easier. This proposal just chooses R12 as DRAP, which is
also preserved across setjmp/longjmp since it is callee-saved.

DRAP will be assigned to a virtual register, or VDRAP, in the prologue so
that the DRAP hard register itself is free for the register allocator in the
function body. Usually VDRAP will be allocated to the same DRAP register, so
the additional register move instruction is often removed.

2.5.1 When stack alignment adjustment comes together with alloca, following
example prologue/epilogue will be emitted:
Prologue:
       pushl     %edi                     // Save callee save reg edi
       leal      8(%esp), %edi            // Save address of parameter frame
       andl      $-16, %esp               // Align local stack

//  Reserve two stack slots and save return address 
//  and previous frame pointer into them. By
//  pointing new ebp to them, we build a pseudo 
//  stack for unwinding.
       pushl     4(%edi)                  //  save return address
       pushl     %ebp                     //  save old ebp
       movl      %esp, %ebp               //  point ebp to pseudo frame start

       subl      $24, %esp                // adjust local frame size
       movl      %edi, vreg1

epilogue:
       movl      vreg1, %edi
       movl      %ebp, %esp               // Restore esp to pseudo frame start
       popl      %ebp
       leal      -8(%edi), %esp           // restore esp to real frame start
       popl      %edi                     // Restore edi
       ret

Locals will be addressed as ebp - offset, parameters as vreg1 + offset

Where DI is used to set up virtual parameter frame pointer, BP points to
local frame and SP points to dynamic allocation frame.

2.5.2 Nested functions will automatically work because they use CX as the
static chain pointer, which won't conflict with any registers used by stack
alignment adjustment, even when nested functions are called via function
pointer and a function stub on stack.

2.5.3 GCC may optimize to use registers to pass parameters. At most AX, DX
and CX will be used. Such optimization won't conflict with stack alignment
adjustment thus it should automatically work.

2.5.4 I386 PIC uses EBX as GOT pointer. This design works well under i386
PIC:

For example:
i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp
        call      .L1
.L1:
        popl      %ebx
        movl      %edi, vreg1

Body:  // code for alloca
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

Locals will be addressed as ebp - offset, parameters as vreg1 + offset,
ebx has the GOT pointer.

2.6 Debug and unwind will work since DWARF2 has the flexibility to define
different frame pointers.

2.7 Some intrinsics rely on stack layout and need to be handled accordingly.
They are __builtin_return_address and __builtin_frame_address. This proposal
will set up a pseudo frame slot to help the unwinder find the return address
and parent frame address by emitting the following prologue code after
adjusting alignment:
        pushl     4(%edi)
        pushl     %ebp


-- 3. NEW PROPOSAL: IMPLEMENTATION --
The proposed implementation can be partitioned into following subtasks.
*  Alignment requirement collection
*  Frames addressing
*  Alignment code generation
*  Debug and unwind information

3.1 Collect alignment requirement
The alignment of variables in the source depends on their types or on
user-specified attributes, and ultimately determines the function's alignment
requirement. In this proposal we collect the requirement by comparing each
variable's alignment with cfun->stack_alignment_needed and storing the larger
value back into it when the variable is actually expanded. The benefit of
collecting at this point is that it largely avoids the effects of
optimization. The drawback is that the result may be conservative, because
some variables may be put into registers after the register allocation pass.

3.2 Frames addressing
Adding parameter frame, local frame, static frame and dynamic frame with
appropriate pointers, either hard registers or virtual registers.

Backend will customize CAN_ELIMINATE hook to assign hard registers to
corresponding virtual registers.

3.3 Alignment code generation
Emit prologue/epilogue code to guarantee correct stack alignment based on
each function's alignment requirement collected previously.

Modification should happen in ix86_expand_prologue and ix86_expand_epilogue.
Code to be emitted can follow the above design in a straightforward manner.

3.4 Debug information
Emit debug and unwind information for aligned stacks. It also happens in
ix86_expand_prologue and ix86_expand_epilogue, corresponding to the
prologue/epilogue code emitted.

4. Code Example

Simple function:
void foo()
{
 
   volatile int local;
   ...
}

i686 Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        subl      $4, %esp         // Adjust local frame size by 4
i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        ret



x86_64 Prologue:
        pushq     %rbp
        movq      %rsp, %rbp
        subq      $16, %rsp
x86_64 Epilogue:
        movq      %rbp, %rsp
        popq      %rbp
        ret

Pure 16 bytes align:
void foo()
{
    volatile __m128 m = _mm_set_ps1(0.f);
}

i686 Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        andl      $-16, %esp
        subl      $16, %esp     // this is space for m, 16 byte aligned
i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        ret

x86_64 Prologue:
        pushq     %rbp
        movq      %rsp, %rbp
        andq      $-16, %rsp
        subq      $16, %rsp
x86_64 Epilogue:
        movq      %rbp, %rsp
        popq      %rbp
        ret

32 bytes align with alloca:
void foo(int size)
{
    char * ptr=alloca(size);
    volatile int __attribute__((aligned(32))) m = 0;
    ...
}

i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-32, %esp
        pushl     4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp

Body:  // code for alloca
        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

void foo(int dummy1, int dummy2, int dummy3, int dummy4,
         int dummy5, int dummy6, int size)
{
    char * ptr=alloca(size);
    volatile int __attribute__((aligned(32))) m = 0;
    ...
}
x86_64 Prologue:
        pushq     %rbx
        leaq      16(%rsp), %rbx
        andq      $-32, %rsp
        pushq     8(%rbx)
        pushq     %rbp
        movq      %rsp, %rbp
        subq      $24, %rsp

Body:
        movq      %rbx, vreg1
        movl      (vreg1), %eax
        subq      %rax, %rsp
        andq      $-16, %rsp
        movq      %rsp, %rax

x86_64 Epilogue:
        movq      %rbp, %rsp
        popq      %rbp
        movq      %rbx, %rsp
        popq      %rbx
        ret

m128 and PIC
int g_i;
void foo()
{
    volatile __m128 m = _mm_set_ps1(0.f);
    g_i = 123;
    ...
}

i686 Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        andl      $-16, %esp
        pushl     %ebx
        subl      $16, %esp
        call      .L1
.L1:
        popl      %ebx
	...

i686 Epilogue:
        addl      $16, %esp
        popl      %ebx
        movl      %ebp, %esp
        popl      %ebp
        ret

m128 + alloca + PIC
void foo(int size)
{
    char * ptr=alloca(size);
    volatile __m128 m = _mm_set_ps1(0.f);
    ...
}
i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp
        call      .L1
.L1:
        popl      %ebx

Body:
        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

m128 + alloca + PIC + library call
void foo(int size)
{
    char * ptr=alloca(size);
    volatile __m128 m = _mm_set_ps1(0.f);
    printf("Hello\n");
    ...
}

i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp
        call      .L1
.L1:
        popl      %ebx

i686 Body:
        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

Body:
        call      printf@PLT

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

m128 and nested function and PIC
void foo()
{
    void bar(int arg1, int arg2)
    {
         volatile __m128 m = _mm_set_ps1(0.f);
         ...
    }
    bar(1,2);
}

i686:
foo:
        ...
        movl      %ebp, %ecx
        call      bar@PLT
        ...

bar:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp
        call      .L1
.L1:
        popl      %ebx

        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax
        ...

        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

m128, dynamic stack alloc and register parameter function call
static void bar(int arg1, int arg2, int arg3)
{
    char * ptr=alloca(size);
    volatile __m128 m = _mm_set_ps1(0.f);
    ...
}

void foo()
{
    bar(1,2,3);
}

i686 foo:
        movl      $1, %eax
        movl      $2, %edx
        movl      $3, %ecx
        call      bar
        ...
bar:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp

        movl      %edi, vreg1
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax
	...
	
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret
