* RFC: [PATCH] Add __builtin_ia32_stack_top
From: H.J. Lu @ 2015-07-21 22:56 UTC
To: gcc-patches; +Cc: Uros Bizjak

When __builtin_frame_address is used to retrieve the address of the
function stack frame, the frame pointer is always kept, which wastes one
register and 2 instructions.  For x86-32, one less register means
significant negative impact on performance.  This patch adds a new
builtin function, __builtin_ia32_stack_top, to the x86 backend.  It
returns the stack address when the function is called.

Any comments or feedback?

Thanks.

H.J.
---
gcc/

        PR target/66960
        * config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is
        used and the stack address has been taken.
        (ix86_builtins): Add IX86_BUILTIN_STACK_TOP.
        (ix86_init_mmx_sse_builtins): Add __builtin_ia32_stack_top.
        (ix86_expand_builtin): Handle IX86_BUILTIN_STACK_TOP.
        * config/i386/i386.h (machine_function): Add stack_top_taken.
        * doc/extend.texi: Document __builtin_ia32_stack_top.

gcc/testsuite/

        PR target/66960
        * gcc.target/i386/pr66960-1.c: New test.
        * gcc.target/i386/pr66960-2.c: Likewise.
        * gcc.target/i386/pr66960-3.c: Likewise.
        * gcc.target/i386/pr66960-4.c: Likewise.
--- gcc/config/i386/i386.c | 28 ++++++++++++++++++++++++++ gcc/config/i386/i386.h | 4 ++++ gcc/doc/extend.texi | 3 +++ gcc/testsuite/gcc.target/i386/pr66960-1.c | 33 +++++++++++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr66960-2.c | 33 +++++++++++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr66960-3.c | 17 ++++++++++++++++ gcc/testsuite/gcc.target/i386/pr66960-4.c | 21 ++++++++++++++++++++ 7 files changed, 139 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-4.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8b9baf8..a252473 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11613,6 +11613,12 @@ ix86_expand_prologue (void) { int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT; + /* Can't use DRAP if the stack address has been taken. */ + if (cfun->machine->stack_top_taken) + sorry ("%<__builtin_ia32_stack_top%> not supported with stack" + " realignment. This may be worked around by adding" + " -maccumulate-outgoing-args."); + /* Only need to push parameter pointer reg if it is caller saved. */ if (!call_used_regs[REGNO (crtl->drap_reg)]) { @@ -30779,6 +30785,9 @@ enum ix86_builtins IX86_BUILTIN_READ_FLAGS, IX86_BUILTIN_WRITE_FLAGS, + /* Get the stack address when the function is called. */ + IX86_BUILTIN_STACK_TOP, + IX86_BUILTIN_MAX }; @@ -34391,6 +34400,10 @@ ix86_init_mmx_sse_builtins (void) def_builtin (OPTION_MASK_ISA_MWAITX, "__builtin_ia32_mwaitx", VOID_FTYPE_UNSIGNED_UNSIGNED_UNSIGNED, IX86_BUILTIN_MWAITX); + /* Get the stack address when the function is called. 
*/ + def_builtin (0, "__builtin_ia32_stack_top", + PVOID_FTYPE_VOID, IX86_BUILTIN_STACK_TOP); + /* Add FMA4 multi-arg argument instructions */ for (i = 0, d = bdesc_multi_arg; i < ARRAY_SIZE (bdesc_multi_arg); i++, d++) { @@ -40325,6 +40338,21 @@ addcarryx: emit_insn (gen_xabort (op0)); return 0; + case IX86_BUILTIN_STACK_TOP: + cfun->machine->stack_top_taken = true; + + if (!target + || GET_MODE (target) != Pmode + || !register_operand (target, Pmode)) + target = gen_reg_rtx (Pmode); + + /* After the prologue, stack top is at -WORD(AP) in the current + frame. */ + emit_insn (gen_rtx_SET (target, + plus_constant (Pmode, arg_pointer_rtx, + -UNITS_PER_WORD))); + return target; + default: break; } diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index ab668fe..a33a9eb 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2483,6 +2483,10 @@ struct GTY(()) machine_function { /* If true, it is safe to not save/restore DRAP register. */ BOOL_BITFIELD no_drap_save_restore : 1; + /* If true, the stack address of the current function has been + taken. */ + BOOL_BITFIELD stack_top_taken : 1; + /* If true, there is register available for argument passing. This is used only in ix86_function_ok_for_sibcall by 32-bit to determine if there is scratch register available for indirect sibcall. In diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index b18d8fb..a41defc 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -16661,6 +16661,9 @@ The following built-in function is always available. @item void __builtin_ia32_pause (void) Generates the @code{pause} machine instruction with a compiler memory barrier. + +@item void *__builtin_ia32_stack_top (void) +Retrieves the stack address when the function is called. 
@end table The following floating-point built-in functions are made available in the diff --git a/gcc/testsuite/gcc.target/i386/pr66960-1.c b/gcc/testsuite/gcc.target/i386/pr66960-1.c new file mode 100644 index 0000000..74181cf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr66960-1.c @@ -0,0 +1,33 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fomit-frame-pointer" { target { lp64 } } } */ +/* { dg-options "-O2 -fomit-frame-pointer -maddress-mode=short" { target { x32 } } } */ +/* { dg-options "-O2 -fomit-frame-pointer -miamcu" { target { ia32 } } } */ + +extern char **environ; +extern void exit (int status); +extern int main (long argc, char **argv, char **envp); + +void +_start (void) +{ + void *argc_p = __builtin_ia32_stack_top (); + char **argv = (char **) (argc_p + sizeof (void *)); + long argc = *(long *) argc_p; + int status; + + environ = argv + argc + 1; + + status = main (argc, argv, environ); + + exit (status); +} + +/* { dg-final { scan-assembler "movq\[ \t\]8\\(%rsp\\), %rdi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]16\\(%rsp\\), %rsi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]24\\(%rsp,%rdi,8\\), %rdx" { target lp64 } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]8\\(%esp\\), %edi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]12\\(%rsp\\), %esi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]4\\(%rsi,%rdi,4\\), %edx" { target x32 } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]\\(%esp\\), %eax" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]4\\(%esp\\), %edx" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]8\\(%esp,%eax,4\\), %ecx" { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66960-2.c b/gcc/testsuite/gcc.target/i386/pr66960-2.c new file mode 100644 index 0000000..8252c2c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr66960-2.c @@ -0,0 +1,33 @@ +/* { dg-do 
compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-omit-frame-pointer" { target { lp64 } } } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -maddress-mode=short" { target { x32 } } } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -miamcu" { target { ia32 } } } */ + +extern char **environ; +extern void exit (int status); +extern int main (long argc, char **argv, char **envp); + +void +_start (void) +{ + void *argc_p = __builtin_ia32_stack_top (); + char **argv = (char **) (argc_p + sizeof (void *)); + long argc = *(long *) argc_p; + int status; + + environ = argv + argc + 1; + + status = main (argc, argv, environ); + + exit (status); +} + +/* { dg-final { scan-assembler "movq\[ \t\]8\\(%rbp\\), %rdi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]16\\(%rbp\\), %rsi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]24\\(%rbp,%rdi,8\\), %rdx" { target lp64 } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]8\\(%ebp\\), %edi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]12\\(%rbp\\), %esi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]4\\(%rsi,%rdi,4\\), %edx" { target x32 } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]4\\(%ebp\\), %eax" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]8\\(%ebp\\), %edx" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]12\\(%ebp,%eax,4\\), %ecx" { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66960-3.c b/gcc/testsuite/gcc.target/i386/pr66960-3.c new file mode 100644 index 0000000..1698003 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr66960-3.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -mno-accumulate-outgoing-args" { target { lp64 } } } */ +/* { dg-options "-O2 -mno-accumulate-outgoing-args -maddress-mode=short" { target { x32 } } } */ +/* { dg-options "-O2 -mno-accumulate-outgoing-args -miamcu" { target { ia32 } } } */ + +extern 
void abort (void); +extern int check_int (int *i, int align); +typedef int aligned __attribute__((aligned(64))); + +void * +foo (void) +{ + aligned j; + if (check_int (&j, __alignof__(j)) != j) + abort (); + return __builtin_ia32_stack_top (); +} /* { dg-message "sorry, unimplemented: .__builtin_ia32_stack_top. not supported" } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66960-4.c b/gcc/testsuite/gcc.target/i386/pr66960-4.c new file mode 100644 index 0000000..82a032c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr66960-4.c @@ -0,0 +1,21 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -maccumulate-outgoing-args" { target { lp64 } } } */ +/* { dg-options "-O2 -maccumulate-outgoing-args -maddress-mode=short" { target { x32 } } } */ +/* { dg-options "-O2 -maccumulate-outgoing-args -miamcu" { target { ia32 } } } */ + +extern void abort (void); +extern int check_int (int *i, int align); +typedef int aligned __attribute__((aligned(64))); + +void * +foo (void) +{ + aligned j; + if (check_int (&j, __alignof__(j)) != j) + abort (); + return __builtin_ia32_stack_top (); +} + +/* { dg-final { scan-assembler "leaq\[ \t\]8\\(%rbp\\), %rax" { target lp64 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]8\\(%rbp\\), %eax" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]4\\(%ebp\\), %eax" { target ia32 } } } */ -- 2.4.3 ^ permalink raw reply [flat|nested] 7+ messages in thread
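Editorial note on the _start tests above: they rely on the usual Linux process-entry layout, where the word the builtin points at holds argc, followed by the argv[] pointers, a NULL terminator, and then the envp[] pointers. The following is only a minimal sketch of that pointer arithmetic, run against a fake block in ordinary memory rather than the real stack. The names startup_view, decode_startup_block and startup_block_demo are hypothetical and not part of the patch, and memcpy stands in for the direct *(long *) load used by the test so that the sketch stays within standard C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the startup block; not part of the patch.  */
struct startup_view
{
  long argc;
  char **argv;
  char **envp;
};

/* Same arithmetic as the _start test: argc sits at ARGC_P, argv[]
   starts one pointer-sized slot later, and envp[] follows argv[] and
   its NULL terminator.  */
static struct startup_view
decode_startup_block (void *argc_p)
{
  struct startup_view v;
  memcpy (&v.argc, argc_p, sizeof v.argc);
  v.argv = (char **) ((char *) argc_p + sizeof (void *));
  v.envp = v.argv + v.argc + 1;
  return v;
}

/* Build a fake block (argc, argv[0], argv[1], NULL, envp[0], NULL)
   and check that decoding recovers every piece.  Returns 1 on
   success, 0 otherwise.  */
static int
startup_block_demo (void)
{
  static char a0[] = "prog", a1[] = "-v", e0[] = "HOME=/";
  char *block[6] = { 0 };
  long argc = 2;
  struct startup_view v;

  memcpy (&block[0], &argc, sizeof argc);
  block[1] = a0;
  block[2] = a1;
  block[3] = NULL;
  block[4] = e0;
  block[5] = NULL;

  v = decode_startup_block (block);
  return v.argc == 2
	 && v.argv[0] == a0 && v.argv[1] == a1 && v.argv[2] == NULL
	 && v.envp[0] == e0 && v.envp[1] == NULL;
}
```

Calling startup_block_demo () from any test driver exercises the decode; in a real _start the pointer would come from the stack rather than from an array.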
* Re: RFC: [PATCH] Add __builtin_ia32_stack_top
From: H.J. Lu @ 2015-07-22 12:21 UTC
To: GCC Patches; +Cc: Uros Bizjak

On Tue, Jul 21, 2015 at 2:45 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> When __builtin_frame_address is used to retrieve the address of the
> function stack frame, the frame pointer is always kept, which wastes one
> register and 2 instructions.  For x86-32, one less register means
> significant negative impact on performance.  This patch adds a new
> builtin function, __builtin_ia32_stack_top, to the x86 backend.  It
> returns the stack address when the function is called.
>
> Any comments or feedback?
>
> Thanks.
>
>
> H.J.
> ---
> gcc/
>
>         PR target/66960
>         * config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is
>         used and the stack address has been taken.
>         (ix86_builtins): Add IX86_BUILTIN_STACK_TOP.
>         (ix86_init_mmx_sse_builtins): Add __builtin_ia32_stack_top.
>         (ix86_expand_builtin): Handle IX86_BUILTIN_STACK_TOP.
>         * config/i386/i386.h (machine_function): Add stack_top_taken.
>         * doc/extend.texi: Document __builtin_ia32_stack_top.
>

I got feedback suggesting __builtin_stack_top instead of
__builtin_ia32_stack_top.  But I don't know if

+      /* After the prologue, stack top is at -WORD(AP) in the current
+         frame.  */
+      emit_insn (gen_rtx_SET (target,
+                              plus_constant (Pmode, arg_pointer_rtx,
+                                             -UNITS_PER_WORD)));

is true for all backends.  If it works on all backends, I can move
it to builtins.c.

--
H.J.
* Re: RFC: [PATCH] Add __builtin_ia32_stack_top
From: Segher Boessenkool @ 2015-07-22 13:59 UTC
To: H.J. Lu; +Cc: GCC Patches, Uros Bizjak

On Wed, Jul 22, 2015 at 05:10:04AM -0700, H.J. Lu wrote:
> I got feedback suggesting __builtin_stack_top instead of
> __builtin_ia32_stack_top.  But I don't know if
>
> +      /* After the prologue, stack top is at -WORD(AP) in the current
> +         frame.  */
> +      emit_insn (gen_rtx_SET (target,
> +                              plus_constant (Pmode, arg_pointer_rtx,
> +                                             -UNITS_PER_WORD)));
>
> is true for all backends.  If it works on all backends, I can move
> it to builtins.c.

It doesn't afaik.  But can't you define INITIAL_FRAME_ADDRESS_RTX?


Segher
* Re: RFC: [PATCH] Add __builtin_ia32_stack_top
From: H.J. Lu @ 2015-07-22 14:14 UTC
To: Segher Boessenkool; +Cc: GCC Patches, Uros Bizjak

On Wed, Jul 22, 2015 at 6:55 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Wed, Jul 22, 2015 at 05:10:04AM -0700, H.J. Lu wrote:
>> I got feedback suggesting __builtin_stack_top instead of
>> __builtin_ia32_stack_top.  But I don't know if
>>
>> +      /* After the prologue, stack top is at -WORD(AP) in the current
>> +         frame.  */
>> +      emit_insn (gen_rtx_SET (target,
>> +                              plus_constant (Pmode, arg_pointer_rtx,
>> +                                             -UNITS_PER_WORD)));
>>
>> is true for all backends.  If it works on all backends, I can move
>> it to builtins.c.
>
> It doesn't afaik.  But can't you define INITIAL_FRAME_ADDRESS_RTX?
>
>
> Segher

Does INITIAL_FRAME_ADDRESS_RTX point to the stack top?  It certainly
can't be defined for x86.  I will write a middle-end patch and leave it
to each backend to enable it.

--
H.J.
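Editorial note: the "leave it to each backend to enable it" idea can be pictured as a per-target hook that is NULL unless a backend opts in, which is also the shape the follow-up patch in this thread takes with its stack_top_rtx hook. The sketch below is plain C with hypothetical names (target_hooks, expand_stack_top, the two hook tables); it does not use GCC's real internals, and strings stand in for the RTL expression and for the diagnostic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a table of target hooks.  A NULL entry
   means the backend has not enabled the feature.  */
struct target_hooks
{
  const char *(*stack_top) (void);
};

/* What an x86-like backend might return: the argument pointer minus
   one word, written here as a string instead of RTL.  */
static const char *
x86_stack_top (void)
{
  return "arg_pointer - word_size";
}

static const struct target_hooks x86_hooks = { x86_stack_top };
static const struct target_hooks other_hooks = { NULL };

/* Expand the builtin for target T: use the hook when the backend
   provides one, otherwise report the builtin as unsupported.  */
static const char *
expand_stack_top (const struct target_hooks *t)
{
  if (t->stack_top)
    return t->stack_top ();
  return "sorry, unimplemented";
}
```

With this shape, supporting another target only means filling in its table entry; the generic expander does not change.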
* Re: RFC: [PATCH] Add __builtin_ia32_stack_top
From: H.J. Lu @ 2015-07-22 16:01 UTC
To: Segher Boessenkool; +Cc: GCC Patches, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 2822 bytes --]

On Wed, Jul 22, 2015 at 6:59 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 22, 2015 at 6:55 AM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>> On Wed, Jul 22, 2015 at 05:10:04AM -0700, H.J. Lu wrote:
>>> I got feedback suggesting __builtin_stack_top instead of
>>> __builtin_ia32_stack_top.  But I don't know if
>>>
>>> +      /* After the prologue, stack top is at -WORD(AP) in the current
>>> +         frame.  */
>>> +      emit_insn (gen_rtx_SET (target,
>>> +                              plus_constant (Pmode, arg_pointer_rtx,
>>> +                                             -UNITS_PER_WORD)));
>>>
>>> is true for all backends.  If it works on all backends, I can move
>>> it to builtins.c.
>>
>> It doesn't afaik.  But can't you define INITIAL_FRAME_ADDRESS_RTX?
>>
>>
>> Segher
>
> Does INITIAL_FRAME_ADDRESS_RTX point to the stack top?  It certainly
> can't be defined for x86.  I will write a middle-end patch and leave it
> to each backend to enable it.

Here is a patch.  Any comments or feedback?

Thanks.

--
H.J.
---
When __builtin_frame_address is used to retrieve the address of the
function stack frame, the frame pointer is always kept, which wastes one
register and 2 instructions.  For x86-32, one less register means
significant negative impact on performance.  This patch adds a new
builtin function, __builtin_stack_top.  It returns the stack address
when the function is called.

This patch only enables __builtin_stack_top for the x86 backend.  Using
__builtin_stack_top with other backends will lead to

sorry, unimplemented: ‘__builtin_stack_top’ not supported on this target

TARGET_STACK_TOP_RTX must be defined to enable __builtin_stack_top.
default_stack_top_rtx may be extended to support more backends, including those with INITIAL_FRAME_ADDRESS_RTX. gcc/ PR target/66960 * builtin-types.def (BT_FN_PTR_VOID): New function type. * builtins.c (expand_builtin): Handle BUILT_IN_STACK_TOP. (is_simple_builtin): Likewise. * ipa-pure-const.c (special_builtin_state): Likewise. * builtins.def: Add BUILT_IN_STACK_TOP. * function.h (function): Add stack_top_taken. * target.def (stack_top_rtx): New target hook. * targhooks.c (default_stack_top_rtx): New. * targhooks.h (default_stack_top_rtx): Likewise. * config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is used and the stack address has been taken. (TARGET_STACK_TOP_RTX): New. * doc/extend.texi: Document __builtin_stack_top. * doc/tm.texi.in (TARGET_STACK_TOP_RTX): New. * doc/tm.texi: Regenerated. gcc/testsuite/ PR target/66960 * gcc.target/i386/pr66960-1.c: New test. * gcc.target/i386/pr66960-2.c: Likewise. * gcc.target/i386/pr66960-3.c: Likewise. * gcc.target/i386/pr66960-4.c: Likewise. * gcc.target/i386/pr66960-5.c: Likewise. [-- Attachment #2: 0001-Add-__builtin_stack_top.patch --] [-- Type: text/x-patch, Size: 16689 bytes --] From 53c2dd6e303d48eccf050696020b3765d3c4c382 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Tue, 21 Jul 2015 14:32:09 -0700 Subject: [PATCH] Add __builtin_stack_top MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When __builtin_frame_address is used to retrieve the address of the function stack frame, the frame pointer is always kept, which wastes one register and 2 instructions. For x86-32, one less register means significant negative impact on performance. This patch adds a new builtin function, __builtin_stack_top. It returns the stack address when the function is called. This patch only enables __builtin_stack_top for x86 backend. 
Using __builtin_stack_top with other backends will lead to sorry, unimplemented: ‘__builtin_stack_top’ not supported on this target TARGET_STACK_TOP_RTX must be defined to enable __builtin_stack_top. default_stack_top_rtx may be extended to support more backends, including those with INITIAL_FRAME_ADDRESS_RTX. gcc/ PR target/66960 * builtin-types.def (BT_FN_PTR_VOID): New function type. * builtins.c (expand_builtin): Handle BUILT_IN_STACK_TOP. (is_simple_builtin): Likewise. * ipa-pure-const.c (special_builtin_state): Likewise. * builtins.def: Add BUILT_IN_STACK_TOP. * function.h (function): Add stack_top_taken. * target.def (stack_top_rtx): New target hook. * targhooks.c (default_stack_top_rtx): New. * targhooks.h (default_stack_top_rtx): Likewise. * config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is used and the stack address has been taken. (TARGET_STACK_TOP_RTX): New. * doc/extend.texi: Document __builtin_stack_top. * doc/tm.texi.in (TARGET_STACK_TOP_RTX): New. * doc/tm.texi: Regenerated. gcc/testsuite/ PR target/66960 * gcc.target/i386/pr66960-1.c: New test. * gcc.target/i386/pr66960-2.c: Likewise. * gcc.target/i386/pr66960-3.c: Likewise. * gcc.target/i386/pr66960-4.c: Likewise. * gcc.target/i386/pr66960-5.c: Likewise. 
--- gcc/builtin-types.def | 1 + gcc/builtins.c | 11 +++++++++++ gcc/builtins.def | 1 + gcc/config/i386/i386.c | 8 ++++++++ gcc/doc/extend.texi | 7 +++++++ gcc/doc/tm.texi | 5 +++++ gcc/doc/tm.texi.in | 2 ++ gcc/function.h | 3 +++ gcc/ipa-pure-const.c | 1 + gcc/target.def | 7 +++++++ gcc/targhooks.c | 9 +++++++++ gcc/targhooks.h | 3 +++ gcc/testsuite/gcc.target/i386/pr66960-1.c | 33 +++++++++++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr66960-2.c | 33 +++++++++++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr66960-3.c | 17 ++++++++++++++++ gcc/testsuite/gcc.target/i386/pr66960-4.c | 21 ++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr66960-5.c | 21 ++++++++++++++++++++ 17 files changed, 183 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-5.c diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 0e34531..2b6b5ab 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -177,6 +177,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_COMPLEX_LONGDOUBLE_LONGDOUBLE, BT_COMPLEX_LONGDOUBLE, BT_LONGDOUBLE) DEF_FUNCTION_TYPE_1 (BT_FN_PTR_UINT, BT_PTR, BT_UINT) DEF_FUNCTION_TYPE_1 (BT_FN_PTR_SIZE, BT_PTR, BT_SIZE) +DEF_FUNCTION_TYPE_1 (BT_FN_PTR_VOID, BT_PTR, BT_VOID) DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT) DEF_FUNCTION_TYPE_1 (BT_FN_INT_UINT, BT_INT, BT_UINT) DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONG, BT_INT, BT_LONG) diff --git a/gcc/builtins.c b/gcc/builtins.c index 1750e25..94514b4 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -6218,6 +6218,16 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, case BUILT_IN_CONSTANT_P: return const0_rtx; + case BUILT_IN_STACK_TOP: + if (targetm.calls.stack_top_rtx) + { + cfun->stack_top_taken = true; + return 
targetm.calls.stack_top_rtx (); + } + else + sorry ("%<__builtin_stack_top%> not supported on this target"); + break; + case BUILT_IN_FRAME_ADDRESS: case BUILT_IN_RETURN_ADDRESS: return expand_builtin_frame_address (fndecl, exp); @@ -12407,6 +12417,7 @@ is_simple_builtin (tree decl) case BUILT_IN_RETURN: case BUILT_IN_AGGREGATE_INCOMING_ADDRESS: case BUILT_IN_FRAME_ADDRESS: + case BUILT_IN_STACK_TOP: case BUILT_IN_VA_END: case BUILT_IN_STACK_SAVE: case BUILT_IN_STACK_RESTORE: diff --git a/gcc/builtins.def b/gcc/builtins.def index 80e4a9c..62f0523 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -778,6 +778,7 @@ DEF_EXT_LIB_BUILTIN (BUILT_IN_FFSL, "ffsl", BT_FN_INT_LONG, ATTR_CONST_NOTHRO DEF_EXT_LIB_BUILTIN (BUILT_IN_FFSLL, "ffsll", BT_FN_INT_LONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_EXT_LIB_BUILTIN (BUILT_IN_FORK, "fork", BT_FN_PID, ATTR_NOTHROW_LIST) DEF_GCC_BUILTIN (BUILT_IN_FRAME_ADDRESS, "frame_address", BT_FN_PTR_UINT, ATTR_NULL) +DEF_GCC_BUILTIN (BUILT_IN_STACK_TOP, "stack_top", BT_FN_PTR_VOID, ATTR_NULL) /* [trans-mem]: Adjust BUILT_IN_TM_FREE if BUILT_IN_FREE is changed. */ DEF_LIB_BUILTIN (BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN (BUILT_IN_FROB_RETURN_ADDR, "frob_return_addr", BT_FN_PTR_PTR, ATTR_NULL) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b10569a..6abd96e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11613,6 +11613,12 @@ ix86_expand_prologue (void) { int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT; + /* Can't use DRAP if the stack address has been taken. */ + if (cfun->stack_top_taken) + sorry ("%<__builtin_stack_top%> not supported with stack" + " realignment. This may be worked around by adding" + " -maccumulate-outgoing-args."); + /* Only need to push parameter pointer reg if it is caller saved. 
*/ if (!call_used_regs[REGNO (crtl->drap_reg)]) { @@ -52610,6 +52616,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load, #define TARGET_UPDATE_STACK_BOUNDARY ix86_update_stack_boundary #undef TARGET_GET_DRAP_RTX #define TARGET_GET_DRAP_RTX ix86_get_drap_rtx +#undef TARGET_STACK_TOP_RTX +#define TARGET_STACK_TOP_RTX default_stack_top_rtx #undef TARGET_STRICT_ARGUMENT_NAMING #define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true #undef TARGET_STATIC_CHAIN diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index b18d8fb..e08b9f9 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8695,6 +8695,13 @@ This function should only be used with a nonzero argument for debugging purposes. @end deftypefn +@deftypefn {Built-in Function} {void *} __builtin_stack_top (void) +This function is similar to calling @code{__builtin_frame_address} +with a value of @code{0}, but it returns the stack address when the +function is called. Unlike @code{__builtin_frame_address}, the frame +pointer register is kept only when necessary. +@end deftypefn + @node Vector Extensions @section Using Vector Instructions through Built-in Functions diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index b911b7d..428e746 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11483,6 +11483,11 @@ argument list due to stack realignment. Return @code{NULL} if no DRAP is needed. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_STACK_TOP_RTX (void) +This hook should return an rtx for the stack address when the function +is called. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void) When optimization is disabled, this hook indicates whether or not arguments should be allocated to stack slots. Normally, GCC allocates diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 47550cc..c68338a 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8181,6 +8181,8 @@ and the associated definitions of those functions. 
@hook TARGET_GET_DRAP_RTX +@hook TARGET_STACK_TOP_RTX + @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS @hook TARGET_CONST_ANCHOR diff --git a/gcc/function.h b/gcc/function.h index e92c17c..dd1c38a 100644 --- a/gcc/function.h +++ b/gcc/function.h @@ -378,6 +378,9 @@ struct GTY(()) function { /* Set when the tail call has been identified. */ unsigned int tail_call_marked : 1; + + /* Set when the address of the stack top has been taken. */ + unsigned int stack_top_taken : 1; }; /* Add the decl D to the local_decls list of FUN. */ diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index 8fd8c36..2405082 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -480,6 +480,7 @@ special_builtin_state (enum pure_const_state_e *state, bool *looping, case BUILT_IN_CXA_END_CLEANUP: case BUILT_IN_EH_COPY_VALUES: case BUILT_IN_FRAME_ADDRESS: + case BUILT_IN_STACK_TOP: case BUILT_IN_APPLY: case BUILT_IN_APPLY_ARGS: *looping = false; diff --git a/gcc/target.def b/gcc/target.def index 4edc209..7a30f39 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4525,6 +4525,13 @@ argument list due to stack realignment. Return @code{NULL} if no DRAP\n\ is needed.", rtx, (void), NULL) +/* Get the stack address when the function is called. */ +DEFHOOK +(stack_top_rtx, + "This hook should return an rtx for the stack address when the function\n\ +is called.", + rtx, (void), NULL) + /* Return true if all function parameters should be spilled to the stack. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 3eca47e..f188272 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1926,4 +1926,13 @@ can_use_doloop_if_innermost (const widest_int &, const widest_int &, return loop_depth == 1; } +/* Get the stack address when the function is called. After the + prologue, stack top is at -WORD(AP) in the current frame. 
*/ + +rtx +default_stack_top_rtx (void) +{ + return plus_constant (Pmode, arg_pointer_rtx, -UNITS_PER_WORD); +} + #include "gt-targhooks.h" diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 5ae991d..094a589 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -240,4 +240,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE tree type ATTRIBUTE_UNUSED, int *pretend_arg_size ATTRIBUTE_UNUSED, int second_time ATTRIBUTE_UNUSED); + +extern rtx default_stack_top_rtx (void); + #endif /* GCC_TARGHOOKS_H */ diff --git a/gcc/testsuite/gcc.target/i386/pr66960-1.c b/gcc/testsuite/gcc.target/i386/pr66960-1.c new file mode 100644 index 0000000..aaab3cf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr66960-1.c @@ -0,0 +1,33 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fomit-frame-pointer" { target { lp64 } } } */ +/* { dg-options "-O2 -fomit-frame-pointer -maddress-mode=short" { target { x32 } } } */ +/* { dg-options "-O2 -fomit-frame-pointer -miamcu" { target { ia32 } } } */ + +extern char **environ; +extern void exit (int status); +extern int main (long argc, char **argv, char **envp); + +void +_start (void) +{ + void *argc_p = __builtin_stack_top (); + char **argv = (char **) (argc_p + sizeof (void *)); + long argc = *(long *) argc_p; + int status; + + environ = argv + argc + 1; + + status = main (argc, argv, environ); + + exit (status); +} + +/* { dg-final { scan-assembler "movq\[ \t\]8\\(%rsp\\), %rdi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]16\\(%rsp\\), %rsi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]24\\(%rsp,%rdi,8\\), %rdx" { target lp64 } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]8\\(%esp\\), %edi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]12\\(%rsp\\), %esi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]4\\(%rsi,%rdi,4\\), %edx" { target x32 } } } */ +/* { dg-final { scan-assembler "movl\[ 
\t\]\\(%esp\\), %eax" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]4\\(%esp\\), %edx" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]8\\(%esp,%eax,4\\), %ecx" { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66960-2.c b/gcc/testsuite/gcc.target/i386/pr66960-2.c new file mode 100644 index 0000000..b9dbde2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr66960-2.c @@ -0,0 +1,33 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-omit-frame-pointer" { target { lp64 } } } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -maddress-mode=short" { target { x32 } } } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -miamcu" { target { ia32 } } } */ + +extern char **environ; +extern void exit (int status); +extern int main (long argc, char **argv, char **envp); + +void +_start (void) +{ + void *argc_p = __builtin_stack_top (); + char **argv = (char **) (argc_p + sizeof (void *)); + long argc = *(long *) argc_p; + int status; + + environ = argv + argc + 1; + + status = main (argc, argv, environ); + + exit (status); +} + +/* { dg-final { scan-assembler "movq\[ \t\]8\\(%rbp\\), %rdi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]16\\(%rbp\\), %rsi" { target lp64 } } } */ +/* { dg-final { scan-assembler "leaq\[ \t\]24\\(%rbp,%rdi,8\\), %rdx" { target lp64 } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]8\\(%ebp\\), %edi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]12\\(%rbp\\), %esi" { target x32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]4\\(%rsi,%rdi,4\\), %edx" { target x32 } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]4\\(%ebp\\), %eax" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]8\\(%ebp\\), %edx" { target ia32 } } } */ +/* { dg-final { scan-assembler "leal\[ \t\]12\\(%ebp,%eax,4\\), %ecx" { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr66960-3.c 
b/gcc/testsuite/gcc.target/i386/pr66960-3.c
new file mode 100644
index 0000000..48cf25e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args" { target { lp64 } } } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args -maddress-mode=short" { target { x32 } } } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args -miamcu" { target { ia32 } } } */
+
+extern void abort (void);
+extern int check_int (int *i, int align);
+typedef int aligned __attribute__((aligned(64)));
+
+void *
+foo (void)
+{
+  aligned j;
+  if (check_int (&j, __alignof__(j)) != j)
+    abort ();
+  return __builtin_stack_top ();
+} /* { dg-message "sorry, unimplemented: .__builtin_stack_top. not supported" } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66960-4.c b/gcc/testsuite/gcc.target/i386/pr66960-4.c
new file mode 100644
index 0000000..44c0b26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-4.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -maccumulate-outgoing-args" { target { lp64 } } } */
+/* { dg-options "-O2 -maccumulate-outgoing-args -maddress-mode=short" { target { x32 } } } */
+/* { dg-options "-O2 -maccumulate-outgoing-args -miamcu" { target { ia32 } } } */
+
+extern void abort (void);
+extern int check_int (int *i, int align);
+typedef int aligned __attribute__((aligned(64)));
+
+void *
+foo (void)
+{
+  aligned j;
+  if (check_int (&j, __alignof__(j)) != j)
+    abort ();
+  return __builtin_stack_top ();
+}
+
+/* { dg-final { scan-assembler "leaq\[ \t\]8\\(%rbp\\), %rax" { target lp64 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]8\\(%rbp\\), %eax" { target x32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]4\\(%ebp\\), %eax" { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66960-5.c b/gcc/testsuite/gcc.target/i386/pr66960-5.c
new file mode 100644
index 0000000..d449437
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-5.c
@@ -0,0 +1,21 @@
+/* { dg-do link } */
+/* { dg-options "-O" } */
+
+extern void link_error (void);
+
+__attribute__ ((noinline, noclone))
+void
+foo (void)
+{
+  void **p = __builtin_stack_top ();
+  void *ra = __builtin_return_address (0);
+  if (*p != ra)
+    link_error ();
+}
+
+int
+main (void)
+{
+  foo ();
+  return 0;
+}
--
2.4.3
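For comparison with pr66960-5.c above, here is a rough sketch of the same invariant written with builtins that exist in released GCC. It assumes an x86 target with the conventional frame-pointer prologue, where the return address sits one word above the saved frame pointer; the helper name is hypothetical and is not part of the patch. It also shows the cost the patch is trying to avoid: using __builtin_frame_address forces the frame pointer to be kept.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch.  Assumes x86 with a
   frame-pointer prologue: the frame pointer points at the saved
   caller frame pointer, and the return address is the next word up.
   That next word is the location the proposed __builtin_stack_top
   would return.  */
__attribute__ ((noinline))
int
return_address_is_above_frame_pointer (void)
{
  void **frame = (void **) __builtin_frame_address (0);
  void *ra = __builtin_return_address (0);

  /* frame[0] is the saved frame pointer, frame[1] the return address
     (under the layout assumption stated above).  */
  return frame[1] == ra;
}
```

Calling return_address_is_above_frame_pointer () from main should yield a nonzero value on targets that match the stated layout; on other targets the layout assumption may simply not hold.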
* [PATCH] Add __builtin_stack_top to x86 backend
  2015-07-21 22:56 RFC: [PATCH] Add __builtin_ia32_stack_top H.J. Lu
  2015-07-22 12:21 ` H.J. Lu
@ 2015-07-30 19:44 ` H.J. Lu
  2015-08-03  8:31   ` Uros Bizjak
  1 sibling, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2015-07-30 19:44 UTC
  To: gcc-patches, Uros Bizjak

On Tue, Jul 21, 2015 at 02:45:39PM -0700, H.J. Lu wrote:
> When __builtin_frame_address is used to retrieve the address of the
> function stack frame, the frame pointer is always kept, which wastes one
> register and 2 instructions.  For x86-32, one less register means
> significant negative impact on performance.  This patch adds a new
> builtin function, __builtin_ia32_stack_top, to x86 backend.  It
> returns the stack address when the function is called.
>
> Any comments, feedbacks?
>

Although this function is generic, its implementation is target
specific.  I submitted a generic patch:

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01859.html

So far there has been no interest from other backends.  Here is a patch
to implement __builtin_stack_top in the x86 backend.  We can update the
x86 backend after it is added to the middle-end.  OK for trunk?

Thanks.

H.J.
--
gcc/

	PR target/66960
	* config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is
	used and the stack address has been taken.
	(ix86_builtins): Add IX86_BUILTIN_STACK_TOP.
	(ix86_init_mmx_sse_builtins): Add __builtin_stack_top.
	(ix86_expand_builtin): Handle IX86_BUILTIN_STACK_TOP.
	* config/i386/i386.h (machine_function): Add stack_top_taken.
	* doc/extend.texi: Document __builtin_stack_top.

gcc/testsuite/

	PR target/66960
	* gcc.target/i386/pr66960-1.c: New test.
	* gcc.target/i386/pr66960-2.c: Likewise.
	* gcc.target/i386/pr66960-3.c: Likewise.
	* gcc.target/i386/pr66960-4.c: Likewise.
	* gcc.target/i386/pr66960-5.c: Likewise.
---
 gcc/config/i386/i386.c                    | 28 ++++++++++++++++++++++++++
 gcc/config/i386/i386.h                    |  4 ++++
 gcc/doc/extend.texi                       |  6 ++++++
 gcc/testsuite/gcc.target/i386/pr66960-1.c | 33 +++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr66960-2.c | 33 +++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr66960-3.c | 17 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr66960-4.c | 21 ++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr66960-5.c | 21 ++++++++++++++++++++
 8 files changed, 163 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66960-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ede8ea0..ef7ba6d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11575,6 +11575,12 @@ ix86_expand_prologue (void)
     {
       int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
 
+      /* Can't use DRAP if the stack address has been taken.  */
+      if (cfun->machine->stack_top_taken)
+        sorry ("%<__builtin_stack_top%> not supported with stack"
+               " realignment.  This may be worked around by adding"
+               " -maccumulate-outgoing-args.");
+
       /* Only need to push parameter pointer reg if it is caller saved.  */
       if (!call_used_regs[REGNO (crtl->drap_reg)])
         {
@@ -30741,6 +30747,9 @@ enum ix86_builtins
   IX86_BUILTIN_READ_FLAGS,
   IX86_BUILTIN_WRITE_FLAGS,
 
+  /* Get the stack address when the function is called.  */
+  IX86_BUILTIN_STACK_TOP,
+
   IX86_BUILTIN_MAX
 };
 
@@ -34353,6 +34362,10 @@ ix86_init_mmx_sse_builtins (void)
   def_builtin (OPTION_MASK_ISA_MWAITX, "__builtin_ia32_mwaitx",
                VOID_FTYPE_UNSIGNED_UNSIGNED_UNSIGNED, IX86_BUILTIN_MWAITX);
 
+  /* Get the stack address when the function is called.  */
+  def_builtin (0, "__builtin_stack_top",
+               PVOID_FTYPE_VOID, IX86_BUILTIN_STACK_TOP);
+
   /* Add FMA4 multi-arg argument instructions */
   for (i = 0, d = bdesc_multi_arg; i < ARRAY_SIZE (bdesc_multi_arg); i++, d++)
     {
@@ -40291,6 +40304,21 @@ addcarryx:
       emit_insn (gen_xabort (op0));
       return 0;
 
+    case IX86_BUILTIN_STACK_TOP:
+      cfun->machine->stack_top_taken = true;
+
+      if (!target
+          || GET_MODE (target) != Pmode
+          || !register_operand (target, Pmode))
+        target = gen_reg_rtx (Pmode);
+
+      /* After the prologue, stack top is at -WORD(AP) in the current
+         frame.  */
+      emit_insn (gen_rtx_SET (target,
+                              plus_constant (Pmode, arg_pointer_rtx,
+                                             -UNITS_PER_WORD)));
+      return target;
+
     default:
       break;
     }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7bd23ec..3781a22 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2492,6 +2492,10 @@ struct GTY(()) machine_function {
   /* If true, it is safe to not save/restore DRAP register.  */
   BOOL_BITFIELD no_drap_save_restore : 1;
 
+  /* If true, the stack address of the current function has been
+     taken.  */
+  BOOL_BITFIELD stack_top_taken : 1;
+
   /* If true, there is register available for argument passing.  This
      is used only in ix86_function_ok_for_sibcall by 32-bit to determine
      if there is scratch register available for indirect sibcall.  In
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b18d8fb..971b584 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16661,6 +16661,12 @@ The following built-in function is always available.
 @item void __builtin_ia32_pause (void)
 Generates the @code{pause} machine instruction with a compiler memory
 barrier.
+
+@item void *__builtin_stack_top (void)
+This function is similar to calling @code{__builtin_frame_address}
+with a value of @code{0}, but it returns the stack address when the
+function is called.  Unlike @code{__builtin_frame_address}, the frame
+pointer register isn't required.
 @end table
 
 The following floating-point built-in functions are made available in the
diff --git a/gcc/testsuite/gcc.target/i386/pr66960-1.c b/gcc/testsuite/gcc.target/i386/pr66960-1.c
new file mode 100644
index 0000000..aaab3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-1.c
@@ -0,0 +1,33 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fomit-frame-pointer" { target { lp64 } } } */
+/* { dg-options "-O2 -fomit-frame-pointer -maddress-mode=short" { target { x32 } } } */
+/* { dg-options "-O2 -fomit-frame-pointer -miamcu" { target { ia32 } } } */
+
+extern char **environ;
+extern void exit (int status);
+extern int main (long argc, char **argv, char **envp);
+
+void
+_start (void)
+{
+  void *argc_p = __builtin_stack_top ();
+  char **argv = (char **) (argc_p + sizeof (void *));
+  long argc = *(long *) argc_p;
+  int status;
+
+  environ = argv + argc + 1;
+
+  status = main (argc, argv, environ);
+
+  exit (status);
+}
+
+/* { dg-final { scan-assembler "movq\[ \t\]8\\(%rsp\\), %rdi" { target lp64 } } } */
+/* { dg-final { scan-assembler "leaq\[ \t\]16\\(%rsp\\), %rsi" { target lp64 } } } */
+/* { dg-final { scan-assembler "leaq\[ \t\]24\\(%rsp,%rdi,8\\), %rdx" { target lp64 } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]8\\(%esp\\), %edi" { target x32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]12\\(%rsp\\), %esi" { target x32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]4\\(%rsi,%rdi,4\\), %edx" { target x32 } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]\\(%esp\\), %eax" { target ia32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]4\\(%esp\\), %edx" { target ia32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]8\\(%esp,%eax,4\\), %ecx" { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66960-2.c b/gcc/testsuite/gcc.target/i386/pr66960-2.c
new file mode 100644
index 0000000..b9dbde2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" { target { lp64 } } } */
+/* { dg-options "-O2 -fno-omit-frame-pointer -maddress-mode=short" { target { x32 } } } */
+/* { dg-options "-O2 -fno-omit-frame-pointer -miamcu" { target { ia32 } } } */
+
+extern char **environ;
+extern void exit (int status);
+extern int main (long argc, char **argv, char **envp);
+
+void
+_start (void)
+{
+  void *argc_p = __builtin_stack_top ();
+  char **argv = (char **) (argc_p + sizeof (void *));
+  long argc = *(long *) argc_p;
+  int status;
+
+  environ = argv + argc + 1;
+
+  status = main (argc, argv, environ);
+
+  exit (status);
+}
+
+/* { dg-final { scan-assembler "movq\[ \t\]8\\(%rbp\\), %rdi" { target lp64 } } } */
+/* { dg-final { scan-assembler "leaq\[ \t\]16\\(%rbp\\), %rsi" { target lp64 } } } */
+/* { dg-final { scan-assembler "leaq\[ \t\]24\\(%rbp,%rdi,8\\), %rdx" { target lp64 } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]8\\(%ebp\\), %edi" { target x32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]12\\(%rbp\\), %esi" { target x32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]4\\(%rsi,%rdi,4\\), %edx" { target x32 } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]4\\(%ebp\\), %eax" { target ia32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]8\\(%ebp\\), %edx" { target ia32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]12\\(%ebp,%eax,4\\), %ecx" { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66960-3.c b/gcc/testsuite/gcc.target/i386/pr66960-3.c
new file mode 100644
index 0000000..48cf25e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args" { target { lp64 } } } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args -maddress-mode=short" { target { x32 } } } */
+/* { dg-options "-O2 -mno-accumulate-outgoing-args -miamcu" { target { ia32 } } } */
+
+extern void abort (void);
+extern int check_int (int *i, int align);
+typedef int aligned __attribute__((aligned(64)));
+
+void *
+foo (void)
+{
+  aligned j;
+  if (check_int (&j, __alignof__(j)) != j)
+    abort ();
+  return __builtin_stack_top ();
+} /* { dg-message "sorry, unimplemented: .__builtin_stack_top. not supported" } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66960-4.c b/gcc/testsuite/gcc.target/i386/pr66960-4.c
new file mode 100644
index 0000000..44c0b26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-4.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -maccumulate-outgoing-args" { target { lp64 } } } */
+/* { dg-options "-O2 -maccumulate-outgoing-args -maddress-mode=short" { target { x32 } } } */
+/* { dg-options "-O2 -maccumulate-outgoing-args -miamcu" { target { ia32 } } } */
+
+extern void abort (void);
+extern int check_int (int *i, int align);
+typedef int aligned __attribute__((aligned(64)));
+
+void *
+foo (void)
+{
+  aligned j;
+  if (check_int (&j, __alignof__(j)) != j)
+    abort ();
+  return __builtin_stack_top ();
+}
+
+/* { dg-final { scan-assembler "leaq\[ \t\]8\\(%rbp\\), %rax" { target lp64 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]8\\(%rbp\\), %eax" { target x32 } } } */
+/* { dg-final { scan-assembler "leal\[ \t\]4\\(%ebp\\), %eax" { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66960-5.c b/gcc/testsuite/gcc.target/i386/pr66960-5.c
new file mode 100644
index 0000000..d449437
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66960-5.c
@@ -0,0 +1,21 @@
+/* { dg-do link } */
+/* { dg-options "-O" } */
+
+extern void link_error (void);
+
+__attribute__ ((noinline, noclone))
+void
+foo (void)
+{
+  void **p = __builtin_stack_top ();
+  void *ra = __builtin_return_address (0);
+  if (*p != ra)
+    link_error ();
+}
+
+int
+main (void)
+{
+  foo ();
+  return 0;
+}
--
2.4.3
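The _start functions in pr66960-1.c and pr66960-2.c read argc, argv and the environment directly from the address returned by __builtin_stack_top.  The layout they rely on can be sketched in portable C against a simulated stack image.  Everything below is illustrative: the struct, the function names and the fake stack are hypothetical, and the sketch assumes that argc occupies one pointer-sized word, as it does on the targets the tests cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three values _start hands to main.  */
struct startup_args
{
  long argc;
  char **argv;
  char **envp;
};

/* Decode a stack image laid out as the tests expect:
     word 0            argc
     words 1..argc     argv[0..argc-1]
     word argc+1       NULL terminator of argv
     following words   envp[], NULL terminated
   The tests read argc with *(long *) argc_p; this sketch reads the
   word as a pointer-sized integer instead, which is equivalent when
   long and pointers have the same size.  */
struct startup_args
parse_initial_stack (void *stack_top)
{
  char **words = (char **) stack_top;
  struct startup_args args;

  args.argc = (long) (intptr_t) words[0];
  args.argv = words + 1;
  args.envp = args.argv + args.argc + 1;
  return args;
}

/* Build a fake two-argument stack image and check the decoding.
   Returns 1 when every field matches the image.  */
int
example_layout_ok (void)
{
  char *image[] = {
    (char *) (intptr_t) 2,	/* argc */
    "prog", "arg1", NULL,	/* argv[] */
    "HOME=/", NULL		/* envp[] */
  };
  struct startup_args args = parse_initial_stack (image);

  return args.argc == 2
	 && args.argv == &image[1]
	 && args.argv[args.argc] == NULL
	 && args.envp == &image[4]
	 && strcmp (args.envp[0], "HOME=/") == 0;
}
```

In the real tests the image is the initial process stack set up by the kernel, and the address comes from the proposed builtin rather than from an array.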
* Re: [PATCH] Add __builtin_stack_top to x86 backend
  2015-07-30 19:44 ` [PATCH] Add __builtin_stack_top to x86 backend H.J. Lu
@ 2015-08-03  8:31   ` Uros Bizjak
  0 siblings, 0 replies; 7+ messages in thread
From: Uros Bizjak @ 2015-08-03  8:31 UTC
  To: H.J. Lu; +Cc: gcc-patches, Segher Boessenkool

On Thu, Jul 30, 2015 at 8:41 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> On Tue, Jul 21, 2015 at 02:45:39PM -0700, H.J. Lu wrote:
>> When __builtin_frame_address is used to retrieve the address of the
>> function stack frame, the frame pointer is always kept, which wastes one
>> register and 2 instructions.  For x86-32, one less register means
>> significant negative impact on performance.  This patch adds a new
>> builtin function, __builtin_ia32_stack_top, to x86 backend.  It
>> returns the stack address when the function is called.
>>
>> Any comments, feedbacks?
>>
>
> Although this function is generic, its implementation is target
> specific.  I submitted a generic patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01859.html
>
> So far there has been no interest from other backends.  Here is a patch
> to implement __builtin_stack_top in the x86 backend.  We can update the
> x86 backend after it is added to the middle-end.  OK for trunk?

I think that the discussion about the generic implementation should come
to some conclusion first.  From the discussion, there was no resolution
on which way to go.

Uros.
Thread overview: 7+ messages
2015-07-21 22:56 RFC: [PATCH] Add __builtin_ia32_stack_top H.J. Lu
2015-07-22 12:21 ` H.J. Lu
2015-07-22 13:59 ` Segher Boessenkool
2015-07-22 14:14 ` H.J. Lu
2015-07-22 16:01 ` H.J. Lu
2015-07-30 19:44 ` [PATCH] Add __builtin_stack_top to x86 backend H.J. Lu
2015-08-03  8:31 ` Uros Bizjak