* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-10 15:19 H.J. Lu
From: H.J. Lu @ 2015-05-10 15:19 UTC
To: Michael Matz; +Cc: Sriraman Tallam, GCC Patches, David Li
On Sat, May 9, 2015 at 9:34 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, May 4, 2015 at 7:45 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Thu, 30 Apr 2015, Sriraman Tallam wrote:
>>
>>> We noticed that one of our benchmarks sped-up by ~1% when we eliminated
>>> PLT stubs for some of the hot external library functions like memcmp,
>>> pow. The win was from better icache and itlb performance. The main
>>> reason was that the PLT stubs had no spatial locality with the
>>> call-sites. I have started looking at ways to tell the compiler to
>>> eliminate PLT stubs (in-effect inline them) for specified external
>>> functions, for x86_64. I have a proposal and a patch and I would like to
>>> hear what you think.
>>>
>>> This comes with caveats. This cannot be generally done for all
>>> functions marked extern as it is impossible for the compiler to say if a
>>> function is "truly extern" (defined in a shared library). If a function
>>> is not truly extern (ends up defined in the final executable), then
>>> calling it indirectly is a performance penalty as it could have been a
>>> direct call.
>>
>> This can be fixed by Alan's idea.
>>
>>> Further, the newly created GOT entries are fixed up at
>>> start-up and do not get lazily bound.
>>
>> And this can be fixed by some enhancements in the linker and dynamic
>> linker. The idea is to still generate a PLT stub and make its GOT entry
>> point to it initially (like a normal got.plt slot). Then the first
>> indirect call will use the address of PLT entry (starting lazy resolution)
>> and update the GOT slot with the real address, so further indirect calls
>> will directly go to the function.
>>
>> This requires a new asm marker (and hence new reloc) as normally if
>> there's a GOT slot it's filled by the real symbol's address, unlike if
>> there's only a got.plt slot. E.g. a
>>
>> call *foo@GOTPLT(%rip)
>>
>> would generate a GOT slot (and fill its address into above call insn), but
>> generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.
>>
>
> I added the "relax" prefix support to x86 assembler on users/hjl/relax
> branch
>
> at
>
> https://sourceware.org/git/?p=binutils-gdb.git;a=summary
>
> [hjl@gnu-tools-1 relax-3]$ cat r.S
> .text
> relax jmp foo
> relax call foo
> relax jmp foo@plt
> relax call foo@plt
> [hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
> [hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o
>
> r.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 0000000000000000 <.text>:
> 0: 66 e9 00 00 00 00 data16 jmpq 0x6 2: R_X86_64_RELAX_PC32 foo-0x4
> 6: 66 e8 00 00 00 00 data16 callq 0xc 8: R_X86_64_RELAX_PC32 foo-0x4
> c: 66 e9 00 00 00 00 data16 jmpq 0x12 e: R_X86_64_RELAX_PLT32 foo-0x4
> 12: 66 e8 00 00 00 00 data16 callq 0x18 14: R_X86_64_RELAX_PLT32 foo-0x4
> [hjl@gnu-tools-1 relax-3]$
>
> Right now, the relax relocations are treated as PC32/PLT32 relocations.
> I am working on linker support.
>
I implemented the linker support for x86-64:

00000000 <main>:
0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <main+0x9> 5: R_X86_64_PC32 plt-0x4
9: e8 00 00 00 00 callq e <main+0xe> a: R_X86_64_PLT32 plt-0x4
e: e8 00 00 00 00 callq 13 <main+0x13> f: R_X86_64_PC32 bar-0x4
13: 66 e8 00 00 00 00 data16 callq 19 <main+0x19> 15: R_X86_64_RELAX_PC32 bar-0x4
19: 66 e8 00 00 00 00 data16 callq 1f <main+0x1f> 1b: R_X86_64_RELAX_PLT32 bar-0x4
1f: 66 e8 00 00 00 00 data16 callq 25 <main+0x25> 21: R_X86_64_RELAX_PC32 foo-0x4
25: 66 e8 00 00 00 00 data16 callq 2b <main+0x2b> 27: R_X86_64_RELAX_PLT32 foo-0x4
2b: 31 c0 xor %eax,%eax
2d: 48 83 c4 08 add $0x8,%rsp
31: c3 retq

00400460 <main>:
400460: 48 83 ec 08 sub $0x8,%rsp
400464: e8 d7 ff ff ff callq 400440 <plt@plt>
400469: e8 d2 ff ff ff callq 400440 <plt@plt>
40046e: e8 ad ff ff ff callq 400420 <bar@plt>
400473: ff 15 ff 03 20 00 callq *0x2003ff(%rip) # 600878 <_DYNAMIC+0xf8>
400479: ff 15 f9 03 20 00 callq *0x2003f9(%rip) # 600878 <_DYNAMIC+0xf8>
40047f: 66 e8 f3 00 00 00 data16 callq 400578 <foo>
400485: 66 e8 ed 00 00 00 data16 callq 400578 <foo>
40048b: 31 c0 xor %eax,%eax
40048d: 48 83 c4 08 add $0x8,%rsp
400491: c3 retq
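
The targets printed in the linked listing can be checked by hand. For both the direct form (e8 rel32) and the indirect form (ff 15 disp32), the 32-bit little-endian displacement is added to the address of the following instruction. A minimal sketch in C, where read_le32 and rel_target are hypothetical helper names and the byte values are copied from the listing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian displacement.  The final conversion to a
   signed type relies on the usual two's-complement behaviour of GCC/Clang. */
static int32_t read_le32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Address referenced by a RIP-relative call or jump.  INSN_LEN is the full
   instruction length; the displacement occupies its last four bytes and is
   relative to the end of the instruction. */
static uint64_t rel_target(uint64_t insn_addr, const uint8_t *insn,
                           size_t insn_len)
{
    assert(insn_len >= 5);
    int32_t disp = read_le32(insn + insn_len - 4);
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For example, the bytes ff 15 ff 03 20 00 at 0x400473 give 0x400479 + 0x2003ff = 0x600878, which is the GOT slot shown in the listing, and 66 e8 f3 00 00 00 at 0x40047f gives 0x400578 <foo>.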
Sriraman, can you give it a try?
--
H.J.
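
Michael Matz's suggestion quoted above (a GOT slot that initially points at the PLT stub and is overwritten with the real address on the first call) can be modelled in plain C with a function pointer. This is only an illustration of the control flow, not how the dynamic linker is implemented; foo_slot, lazy_stub, real_foo and call_foo are made-up names.

```c
typedef int (*fn_t)(int);

/* Stand-in for the real function that lives in a shared library. */
static int real_foo(int x) { return x + 1; }

static int resolver_runs;          /* how often lazy resolution happened */
static int lazy_stub(int x);       /* plays the role of the PLT stub */

/* The "GOT slot": it starts out pointing at the stub, like a .got.plt
   entry, instead of being filled with the final address at start-up. */
static fn_t foo_slot = lazy_stub;

static int lazy_stub(int x)
{
    resolver_runs++;
    foo_slot = real_foo;           /* the resolver patches the slot ... */
    return real_foo(x);            /* ... and completes the original call */
}

/* Models `call *foo@GOTPLT(%rip)`: every call is indirect via the slot. */
static int call_foo(int x) { return foo_slot(x); }
```

The first call_foo() goes through lazy_stub and rewrites foo_slot; every later call reaches real_foo directly through the slot, so the stub is touched only once.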
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-21 21:31 Sriraman Tallam
From: Sriraman Tallam @ 2015-05-21 21:31 UTC
To: H.J. Lu, Michael Matz; +Cc: David Li, GCC Patches

On Sun, May 10, 2015 at 10:01 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>
> On Sun, May 10, 2015, 8:19 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> [...]
> I implemented the linker support for x86-64:
> [...]
> Sriraman, can you give it a try?

I like HJ's proposal here and it is important that the linker fixes
unnecessary indirect calls to direct ones.

However, independently I think my original proposal is still useful
and I want to pitch it again for the following reasons.

AFAIU, Alexander Monakov's -fno-plt does not solve the following:

* Does not do anything for non-PIC code. The compiler does not
generate a @PLT call but the linker will route all external calls via
PLT. We noticed a problem with non-PIC executables where the PLT
stubs were causing too many icache misses and are interested in a
solution for this.
* Aggressively uses indirect calls even if the final symbol is not
truly external. This needs HJ's linker patch to fix unnecessary
indirect calls to direct calls.

My original proposal, for x86_64 only, was to add
-fno-plt=<function-name>. This lets the user decide for which
functions PLT must be avoided. Let the compiler always generate an
indirect call using call *func@GOTPCREL(%rip). We could do this for
non-PIC code too. No need for linker fixups since this relies on the
user to know that func is from a shared object.

I am reattaching the patch.

Thanks
Sri

>
> Thanks! Will do!
>
> Sri
>
> --
> H.J.
>

[-- Attachment #2: avoid_plt_patch.txt --]
[-- Type: text/plain, Size: 4091 bytes --]

    * common.opt (-fno-plt=): New option.
    * config/i386/i386.c (avoid_plt_to_call): New function.
    (ix86_output_call_insn): Check if PLT needs to be avoided
    and call or jump indirectly if true.
    * opts-global.c (htab_str_eq): New function.
    (avoid_plt_fnsymbol_names_tab): New htab.
    (handle_common_deferred_options): Handle -fno-plt=

Index: common.opt
===================================================================
--- common.opt (revision 222892)
+++ common.opt (working copy)
@@ -1087,6 +1087,11 @@ fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
 -fdbg-cnt=<counter>:<limit>[,<counter>:<limit>,...] Set the debug counter limit.
 
+fno-plt=
+Common RejectNegative Joined Var(common_deferred_options) Defer
+-fno-plt=<symbol1> Avoid going through the PLT when calling the specified function.
+Allow multiple instances of this option with different function names.
+
 fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 Map one directory name to another in debug information
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 222892)
+++ config/i386/i386.c (working copy)
@@ -25282,6 +25282,25 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+extern htab_t avoid_plt_fnsymbol_names_tab;
+/* If the function referenced by call_op is to a external function
+   and calls via PLT must be avoided as specified by -fno-plt=, then
+   return true.  */
+
+static int
+avoid_plt_to_call(rtx call_op)
+{
+  const char *name;
+  if (GET_CODE (call_op) != SYMBOL_REF
+      || SYMBOL_REF_LOCAL_P (call_op)
+      || avoid_plt_fnsymbol_names_tab == NULL)
+    return 0;
+  name = XSTR (call_op, 0);
+  if (htab_find_slot (avoid_plt_fnsymbol_names_tab, name, NO_INSERT) != NULL)
+    return 1;
+  return 0;
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -25294,7 +25313,12 @@ ix86_output_call_insn (rtx insn, rtx call_op)
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "jmp\t%P0";
+        {
+          if (avoid_plt_to_call (call_op))
+            xasm = "jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "jmp\t%P0";
+        }
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W.  */
       else if (TARGET_SEH)
@@ -25346,9 +25370,15 @@ ix86_output_call_insn (rtx insn, rtx call_op)
     }
 
   if (direct_p)
-    xasm = "call\t%P0";
+    {
+      if (avoid_plt_to_call (call_op))
+        xasm = "call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "call\t%P0";
+    }
   else
     xasm = "call\t%A0";
+
   output_asm_insn (xasm, &call_op);
 
Index: opts-global.c
===================================================================
--- opts-global.c (revision 222892)
+++ opts-global.c (working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "xregex.h"
 #include "attribs.h"
 #include "stringpool.h"
+#include "hash-table.h"
 
 typedef const char *const_char_p; /* For DEF_VEC_P.  */
 
@@ -420,6 +421,17 @@ decode_options (struct gcc_options *opts, struct g
   finish_options (opts, opts_set, loc);
 }
 
+/* Helper function for the hash table that compares the
+   existing entry (S1) with the given string (S2).  */
+
+static int
+htab_str_eq (const void *s1, const void *s2)
+{
+  return !strcmp ((const char *)s1, (const char *) s2);
+}
+
+htab_t avoid_plt_fnsymbol_names_tab = NULL;
+
 /* Process common options that have been deferred until after the
    handlers have been called for all options.  */
 
@@ -539,6 +551,15 @@ handle_common_deferred_options (void)
           stack_limit_rtx = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (opt->arg));
           break;
 
+        case OPT_fno_plt_:
+          void **slot;
+          if (avoid_plt_fnsymbol_names_tab == NULL)
+            avoid_plt_fnsymbol_names_tab = htab_create (10, htab_hash_string,
+                                                        htab_str_eq, NULL);
+          slot = htab_find_slot (avoid_plt_fnsymbol_names_tab, opt->arg, INSERT);
+          *slot = (void *)opt->arg;
+          break;
+
         default:
           gcc_unreachable ();
         }
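
The bookkeeping in the patch above boils down to a set of symbol names: each -fno-plt=<name> option adds a name, and the call emitter asks whether the callee's name is in the set. A stand-alone sketch of that idea follows. It deliberately uses a small fixed array instead of libiberty's htab, and the names noplt_names, record_noplt_option and avoid_plt_for are hypothetical, so it should be read as a toy model rather than as GCC code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NOPLT_NAMES 64

/* Hypothetical stand-in for the patch's avoid_plt_fnsymbol_names_tab. */
static const char *noplt_names[MAX_NOPLT_NAMES];
static size_t noplt_count;

/* Lookup side, analogous to avoid_plt_to_call: an exact string match on
   the assembler-level symbol name. */
static int avoid_plt_for(const char *symbol)
{
    for (size_t i = 0; i < noplt_count; i++)
        if (strcmp(noplt_names[i], symbol) == 0)
            return 1;
    return 0;
}

/* Option side, analogous to the OPT_fno_plt_ case.  Returns 1 when ARG is
   a "-fno-plt=NAME" option whose NAME is now in the set, 0 otherwise. */
static int record_noplt_option(const char *arg)
{
    static const char prefix[] = "-fno-plt=";
    const size_t plen = sizeof prefix - 1;

    if (strncmp(arg, prefix, plen) != 0 || arg[plen] == '\0')
        return 0;                        /* not our option, or empty name */
    if (avoid_plt_for(arg + plen))
        return 1;                        /* repeated name: nothing to add */
    if (noplt_count == MAX_NOPLT_NAMES)
        return 0;                        /* table full in this toy model */
    noplt_names[noplt_count++] = arg + plen;   /* points into ARG itself */
    return 1;
}
```

After feeding it -fno-plt=memcmp and -fno-plt=pow, avoid_plt_for("memcmp") is nonzero while avoid_plt_for("memcpy") is zero. Because the match is on the exact symbol string, a C++ function would have to be named by its mangled name.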
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-21 21:39 Sriraman Tallam
From: Sriraman Tallam @ 2015-05-21 21:39 UTC
To: H.J. Lu, Michael Matz, Jan Hubicka, amonakov
Cc: David Li, GCC Patches, Richard Biener

On Thu, May 21, 2015 at 2:12 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> [...]
> My original proposal, for x86_64 only, was to add
> -fno-plt=<function-name>. This lets the user decide for which
> functions PLT must be avoided. Let the compiler always generate an
> indirect call using call *func@GOTPCREL(%rip). We could do this for
> non-PIC code too. No need for linker fixups since this relies on the
> user to know that func is from a shared object.
>
> I am reattaching the patch.

I also want to add that my proposal with -fno-plt=<fn_name> does not
take away anything from the newly added -fno-plt option or the linker
patch HJ proposed. It is orthogonal and lets the user decide the
subset of external functions for which PLT must be avoided.

Thanks
Sri

[-- Attachment #2: avoid_plt_patch.txt --]
[-- Type: text/plain, Size: 4091 bytes --]
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-21 22:02 Pedro Alves
From: Pedro Alves @ 2015-05-21 22:02 UTC
To: Sriraman Tallam, H.J. Lu, Michael Matz; +Cc: David Li, GCC Patches

On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
>
> My original proposal, for x86_64 only, was to add
> -fno-plt=<function-name>. This lets the user decide for which
> functions PLT must be avoided. Let the compiler always generate an
> indirect call using call *func@GOTPCREL(%rip). We could do this for
> non-PIC code too. No need for linker fixups since this relies on the
> user to know that func is from a shared object.

Having to pass function names on the command line seems like an odd
interface. E.g., you'll need to pass the mangled name for
C++ functions. Any reason this isn't a function attribute?

Thanks,
Pedro Alves
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-21 22:02 Jakub Jelinek
From: Jakub Jelinek @ 2015-05-21 22:02 UTC
To: Pedro Alves; +Cc: Sriraman Tallam, H.J. Lu, Michael Matz, David Li, GCC Patches

On Thu, May 21, 2015 at 10:51:50PM +0100, Pedro Alves wrote:
> On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
> >
> > My original proposal, for x86_64 only, was to add
> > -fno-plt=<function-name>. This lets the user decide for which
> > functions PLT must be avoided. Let the compiler always generate an
> > indirect call using call *func@GOTPCREL(%rip). We could do this for
> > non-PIC code too. No need for linker fixups since this relies on the
> > user to know that func is from a shared object.
>
> Having to pass function names on the command line seems like an odd
> interface. E.g., you'll need to pass the mangled name for
> C++ functions. Any reason this isn't a function attribute?

I strongly second this. Similar reasons for why we haven't added
the asan blacklisting from the command line, one really should use
function attributes for this kind of thing.

Jakub
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-22 1:47 H.J. Lu
From: H.J. Lu @ 2015-05-22 1:47 UTC
To: Jakub Jelinek
Cc: Pedro Alves, Sriraman Tallam, Michael Matz, David Li, GCC Patches

On Thu, May 21, 2015 at 2:58 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 21, 2015 at 10:51:50PM +0100, Pedro Alves wrote:
>> On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
>> >
>> > My original proposal, for x86_64 only, was to add
>> > -fno-plt=<function-name>. This lets the user decide for which
>> > functions PLT must be avoided. Let the compiler always generate an
>> > indirect call using call *func@GOTPCREL(%rip). We could do this for
>> > non-PIC code too. No need for linker fixups since this relies on the
>> > user to know that func is from a shared object.
>>
>> Having to pass function names on the command line seems like an odd
>> interface. E.g., you'll need to pass the mangled name for
>> C++ functions. Any reason this isn't a function attribute?
>
> I strongly second this. Similar reasons for why we haven't added
> the asan blacklisting from the command line, one really should use
> function attributes for this kind of thing.
>

We can extend attribute to add something similar to "dllimport"

--
H.J.
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-22 3:38 Xinliang David Li
From: Xinliang David Li @ 2015-05-22 3:38 UTC
To: Jakub Jelinek
Cc: Pedro Alves, Sriraman Tallam, H.J. Lu, Michael Matz, GCC Patches

We have -finstrument-functions-exclude-function-list=.. in GCC, though
it is not using mangled names.

David

On Thu, May 21, 2015 at 2:58 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 21, 2015 at 10:51:50PM +0100, Pedro Alves wrote:
>> On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
>> >
>> > My original proposal, for x86_64 only, was to add
>> > -fno-plt=<function-name>. This lets the user decide for which
>> > functions PLT must be avoided. Let the compiler always generate an
>> > indirect call using call *func@GOTPCREL(%rip). We could do this for
>> > non-PIC code too. No need for linker fixups since this relies on the
>> > user to know that func is from a shared object.
>>
>> Having to pass function names on the command line seems like an odd
>> interface. E.g., you'll need to pass the mangled name for
>> C++ functions. Any reason this isn't a function attribute?
>
> I strongly second this. Similar reasons for why we haven't added
> the asan blacklisting from the command line, one really should use
> function attributes for this kind of thing.
>
> Jakub
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
@ 2015-05-21 22:34 Sriraman Tallam
From: Sriraman Tallam @ 2015-05-21 22:34 UTC
To: Pedro Alves; +Cc: H.J. Lu, Michael Matz, David Li, GCC Patches

On Thu, May 21, 2015 at 2:51 PM, Pedro Alves <palves@redhat.com> wrote:
> On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
>>
>> My original proposal, for x86_64 only, was to add
>> -fno-plt=<function-name>. This lets the user decide for which
>> functions PLT must be avoided. Let the compiler always generate an
>> indirect call using call *func@GOTPCREL(%rip). We could do this for
>> non-PIC code too. No need for linker fixups since this relies on the
>> user to know that func is from a shared object.
>
> Having to pass function names on the command line seems like an odd
> interface. E.g., you'll need to pass the mangled name for
> C++ functions. Any reason this isn't a function attribute?

It is not clear to me where I would stick the attribute. Example
usage in foo.cc:

#include<string.h>

int main() {
  int n = memcmp(....);
}

I want memcmp to not go through PLT, do you propose explicitly
re-declaring it in foo.cc with the attribute?

Thanks
Sri

>
> Thanks,
> Pedro Alves
>
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-21 22:34 ` Sriraman Tallam
@ 2015-05-22 9:22 ` Pedro Alves
2015-05-22 15:13 ` Sriraman Tallam
2015-05-28 18:53 ` Sriraman Tallam
0 siblings, 2 replies; 65+ messages in thread
From: Pedro Alves @ 2015-05-22 9:22 UTC (permalink / raw)
To: Sriraman Tallam; +Cc: H.J. Lu, Michael Matz, David Li, GCC Patches

On 05/21/2015 11:02 PM, Sriraman Tallam wrote:
> On Thu, May 21, 2015 at 2:51 PM, Pedro Alves <palves@redhat.com> wrote:
>> On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
>>>
>>> My original proposal, for x86_64 only, was to add
>>> -fno-plt=<function-name>. This lets the user decide for which
>>> functions PLT must be avoided. Let the compiler always generate an
>>> indirect call using call *func@GOTPCREL(%rip). We could do this for
>>> non-PIC code too. No need for linker fixups since this relies on the
>>> user to know that func is from a shared object.
>>
>> Having to pass function names on the command line seems like an odd
>> interface. E.g, you'll need to pass the mangled name for
>> C++ functions. Any reason this isn't a function attribute?
>
> It is not clear to me where I would stick the attribute. Example
> usage in foo.cc:
>
> #include<string.h>
>
> int main() {
>   int n = memcmp(....);
> }
>
> I want memcmp to not go through PLT, do you propose explicitly
> re-declaring it in foo.cc with the attribute?

I guess you'd do:

#include<string.h>

__attribute__((no_plt)) typeof (memcpy) memcpy;

int main() {
  int n = memcmp(....);
}

or even:

#include<string.h>

int main() {
  if (hotpath) {
    __attribute__((no_plt)) typeof (memcpy) memcpy;
    for (..) {
      int n = memcmp(....);
    }
  } else {
    int n = memcmp(....);
  }
}

or globally:

$ cat no-plt/string.h:
#include_next <string.h>
__attribute__((no_plt)) typeof (memcpy) memcpy;

$ gcc -I no-plt/ ...

Thanks,
Pedro Alves

^ permalink raw reply [flat|nested] 65+ messages in thread
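
Editorial sketch of the redeclaration approach suggested in the message above, assuming GCC or a compatible compiler. The attribute was only a proposal at this point in the thread (spelled no_plt here and noplt in the later patch), so the sketch guards it with __has_attribute and compiles to a plain call where the attribute is unavailable. The snippets above redeclare memcpy while calling memcmp, which looks like a slip, so memcmp is used throughout. NOPLT and same_prefix are hypothetical names introduced for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Older compilers lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* NOPLT is a hypothetical helper macro: it expands to the attribute only
   when the compiler reports support for it, and to nothing otherwise. */
#if __has_attribute(noplt)
#define NOPLT __attribute__((noplt))
#else
#define NOPLT
#endif

/* Redeclare memcmp with its existing type plus the attribute, so that
   calls in this translation unit are asked to bypass the PLT. */
NOPLT __typeof__(memcmp) memcmp;

/* Hypothetical hot-path helper: returns 1 if the first n bytes match. */
int same_prefix(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

Whether the call really avoids the PLT depends on the compiler version, target and flags; inspecting the generated assembly (for example with gcc -S) is the way to check.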
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-22 9:22 ` Pedro Alves
@ 2015-05-22 15:13 ` Sriraman Tallam
2015-05-28 18:53 ` Sriraman Tallam
1 sibling, 0 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-22 15:13 UTC (permalink / raw)
To: Pedro Alves; +Cc: H.J. Lu, Michael Matz, David Li, GCC Patches

On Fri, May 22, 2015 at 2:00 AM, Pedro Alves <palves@redhat.com> wrote:
> On 05/21/2015 11:02 PM, Sriraman Tallam wrote:
>> On Thu, May 21, 2015 at 2:51 PM, Pedro Alves <palves@redhat.com> wrote:
>>> On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
>>>>
>>>> My original proposal, for x86_64 only, was to add
>>>> -fno-plt=<function-name>. This lets the user decide for which
>>>> functions PLT must be avoided. Let the compiler always generate an
>>>> indirect call using call *func@GOTPCREL(%rip). We could do this for
>>>> non-PIC code too. No need for linker fixups since this relies on the
>>>> user to know that func is from a shared object.
>>>
>>> Having to pass function names on the command line seems like an odd
>>> interface. E.g, you'll need to pass the mangled name for
>>> C++ functions. Any reason this isn't a function attribute?
>>
>> It is not clear to me where I would stick the attribute. Example
>> usage in foo.cc:
>>
>> #include<string.h>
>>
>> int main() {
>>   int n = memcmp(....);
>> }
>>
>> I want memcmp to not go through PLT, do you propose explicitly
>> re-declaring it in foo.cc with the attribute?
>
> I guess you'd do:
>
> #include<string.h>
>
> __attribute__((no_plt)) typeof (memcpy) memcpy;
>
> int main() {
>   int n = memcmp(....);
> }
>
> or even:
>
> #include<string.h>
>
> int main() {
>   if (hotpath) {
>     __attribute__((no_plt)) typeof (memcpy) memcpy;
>     for (..) {
>       int n = memcmp(....);
>     }
>   } else {
>     int n = memcmp(....);
>   }
> }
>
> or globally:
>
> $ cat no-plt/string.h:
> #include_next <string.h>
> __attribute__((no_plt)) typeof (memcpy) memcpy;
>
> $ gcc -I no-plt/ ...

That looks good, thanks.

Sri

>
> Thanks,
> Pedro Alves
>

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-22 9:22 ` Pedro Alves 2015-05-22 15:13 ` Sriraman Tallam @ 2015-05-28 18:53 ` Sriraman Tallam 2015-05-28 19:05 ` H.J. Lu 1 sibling, 1 reply; 65+ messages in thread From: Sriraman Tallam @ 2015-05-28 18:53 UTC (permalink / raw) To: Pedro Alves; +Cc: H.J. Lu, Michael Matz, David Li, GCC Patches, Jan Hubicka [-- Attachment #1: Type: text/plain, Size: 2132 bytes --] I have attached a patch that adds the new attribute "noplt". Please review. * config/i386/i386.c (avoid_plt_to_call): New function. (ix86_output_call_insn): Generate indirect call for functions marked with "noplt" attribute. (attribute_spec ix86_attribute_): Define new attribute "noplt". * doc/extend.texi: Document new attribute "noplt". * gcc.target/i386/noplt-1.c: New testcase. * gcc.target/i386/noplt-2.c: New testcase. Thanks Sri On Fri, May 22, 2015 at 2:00 AM, Pedro Alves <palves@redhat.com> wrote: > On 05/21/2015 11:02 PM, Sriraman Tallam wrote: >> On Thu, May 21, 2015 at 2:51 PM, Pedro Alves <palves@redhat.com> wrote: >>> On 05/21/2015 10:12 PM, Sriraman Tallam wrote: >>>> >>>> My original proposal, for x86_64 only, was to add >>>> -fno-plt=<function-name>. This lets the user decide for which >>>> functions PLT must be avoided. Let the compiler always generate an >>>> indirect call using call *func@GOTPCREL(%rip). We could do this for >>>> non-PIC code too. No need for linker fixups since this relies on the >>>> user to know that func is from a shared object. >>> >>> Having to pass function names on the command line seems like an odd >>> interface. E.g, you'll need to pass the mangled name for >>> C++ functions. Any reason this isn't a function attribute? >> >> It is not clear to me where I would stick the attribute. 
Example >> usage in foo.cc: >> >> #include<string.h> >> >> int main() { >> int n = memcmp(....); >> } >> >> I want memcmp to not go through PLT, do you propose explicitly >> re-declaring it in foo.cc with the attribute? > > I guess you'd do: > > #include<string.h> > > __attribute__((no_plt)) typeof (memcpy) memcpy; > > int main() { > int n = memcmp(....); > } > > or even: > > #include<string.h> > > int main() { > if (hotpath) { > __attribute__((no_plt)) typeof (memcpy) memcpy; > for (..) { > int n = memcmp(....); > } > } else { > int n = memcmp(....); > } > } > > or globally: > > $ cat no-plt/string.h: > #include_next <string.h> > __attribute__((no_plt)) typeof (memcpy) memcpy; > > $ gcc -I no-plt/ ... > > Thanks, > Pedro Alves > [-- Attachment #2: noplt_attrib_patch.txt --] [-- Type: text/plain, Size: 4098 bytes --] * config/i386/i386.c (avoid_plt_to_call): New function. (ix86_output_call_insn): Generate indirect call for functions marked with "noplt" attribute. (attribute_spec ix86_attribute_): Define new attribute "noplt". * doc/extend.texi: Document new attribute "noplt". * gcc.target/i386/noplt-1.c: New testcase. * gcc.target/i386/noplt-2.c: New testcase. Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 223720) +++ config/i386/i386.c (working copy) @@ -25599,6 +25599,25 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call return call; } +/* Return true if the function being called was marked with attribute + "noplt". If this function is defined, this should return false. */ +static bool +avoid_plt_to_call (rtx call_op) +{ + if (GET_CODE (call_op) != SYMBOL_REF + || SYMBOL_REF_LOCAL_P (call_op)) + return false; + + tree symbol_decl = SYMBOL_REF_DECL (call_op); + + if (symbol_decl != NULL_TREE + && TREE_CODE (symbol_decl) == FUNCTION_DECL + && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl))) + return true; + + return false; +} + /* Output the assembly for a call instruction. 
*/ const char * @@ -25611,7 +25630,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op if (SIBLING_CALL_P (insn)) { if (direct_p) - xasm = "%!jmp\t%P0"; + { + if (TARGET_64BIT && avoid_plt_to_call (call_op)) + xasm = "jmp\t*%p0@GOTPCREL(%%rip)"; + else + xasm = "jmp\t%P0"; + } /* SEH epilogue detection requires the indirect branch case to include REX.W. */ else if (TARGET_SEH) @@ -25654,7 +25678,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op } if (direct_p) - xasm = "%!call\t%P0"; + { + if (TARGET_64BIT && avoid_plt_to_call (call_op)) + xasm = "call\t*%p0@GOTPCREL(%%rip)"; + else + xasm = "call\t%P0"; + } else xasm = "%!call\t%A0"; @@ -46628,6 +46657,9 @@ static const struct attribute_spec ix86_attribute_ false }, { "callee_pop_aggregate_return", 1, 1, false, true, true, ix86_handle_callee_pop_aggregate_return, true }, + /* Attribute to avoid calling function via PLT. */ + { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute, + false }, /* End element. */ { NULL, 0, 0, false, false, false, NULL, false } }; Index: doc/extend.texi =================================================================== --- doc/extend.texi (revision 223720) +++ doc/extend.texi (working copy) @@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca assume that the called function pops off the stack space used to pass arguments, unless it takes a variable number of arguments. +@item noplt +@cindex @code{noplt} function attribute, x86-64 +@cindex functions whose calls do not go via PLT +On x86-64 targets. the @code{noplt} attribute causes the compiler to +call this external function indirectly using a GOT entry and avoid the +PLT. 
+ @item target (@var{options}) @cindex @code{target} function attribute As discussed in @ref{Common Function Attributes}, this attribute Index: testsuite/gcc.target/i386/noplt-1.c =================================================================== --- testsuite/gcc.target/i386/noplt-1.c (revision 0) +++ testsuite/gcc.target/i386/noplt-1.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-* } } */ + + +__attribute__ ((noplt)) +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-2.c =================================================================== --- testsuite/gcc.target/i386/noplt-2.c (revision 0) +++ testsuite/gcc.target/i386/noplt-2.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-* } } */ +/* { dg-options "-O2" } */ + + +__attribute__ ((noplt)) +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-28 18:53 ` Sriraman Tallam
@ 2015-05-28 19:05 ` H.J. Lu
2015-05-28 19:48 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: H.J. Lu @ 2015-05-28 19:05 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> I have attached a patch that adds the new attribute "noplt". Please review.
>
> * config/i386/i386.c (avoid_plt_to_call): New function.
> (ix86_output_call_insn): Generate indirect call for functions
> marked with "noplt" attribute.
> (attribute_spec ix86_attribute_): Define new attribute "noplt".
> * doc/extend.texi: Document new attribute "noplt".
> * gcc.target/i386/noplt-1.c: New testcase.
> * gcc.target/i386/noplt-2.c: New testcase.
>

2 comments:

1. Don't remove "%!" prefix before call/jmp. It is needed for MPX.
2. Don't you need to check

&& !TARGET_MACHO
&& !TARGET_SEH
&& !TARGET_PECOFF

since it only works for ELF.

--
H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-28 19:05 ` H.J. Lu
@ 2015-05-28 19:48 ` Sriraman Tallam
2015-05-28 20:19 ` H.J. Lu
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-28 19:48 UTC (permalink / raw)
To: H.J. Lu
Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka, amonakov

On Thu, May 28, 2015 at 11:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> I have attached a patch that adds the new attribute "noplt". Please review.
>>
>> * config/i386/i386.c (avoid_plt_to_call): New function.
>> (ix86_output_call_insn): Generate indirect call for functions
>> marked with "noplt" attribute.
>> (attribute_spec ix86_attribute_): Define new attribute "noplt".
>> * doc/extend.texi: Document new attribute "noplt".
>> * gcc.target/i386/noplt-1.c: New testcase.
>> * gcc.target/i386/noplt-2.c: New testcase.
>>
>
> 2 comments:
>
> 1. Don't remove "%!" prefix before call/jmp. It is needed for MPX.
> 2. Don't you need to check
>
> && !TARGET_MACHO
> && !TARGET_SEH
> && !TARGET_PECOFF
>
> since it only works for ELF.

Ok, I will make this change. OTOH, is it just better to piggy-back on
existing -fno-plt change by Alex in calls.c
and do this:

Index: calls.c
===================================================================
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
               ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
               : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
-           && !targetm.binds_local_p (fndecl_or_type))
+           && !targetm.binds_local_p (fndecl_or_type)
+           && ((flag_pic && !flag_plt)
+               || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
     {
       funexp = force_reg (Pmode, funexp);
     }

Thanks
Sri

>
> --
> H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
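
Editorial sketch: the thread contrasts a call through a PLT stub with an indirect call through a GOT entry (call *func@GOTPCREL(%rip)). The C below is only a source-level analogy of the second form, not what the compiler emits: memcmp_slot is a hypothetical pointer-sized cell standing in for the GOT entry, and compare_via_slot is a hypothetical caller that loads the address from that cell and calls through it. An optimizing compiler is free to turn this back into a direct call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slot playing the role of a GOT entry: one pointer-sized
   cell holding the callee's address.  For a real GOT entry the dynamic
   linker fills the cell in; here a static initializer does. */
static int (*memcmp_slot)(const void *, const void *, size_t) = memcmp;

/* Hypothetical caller: load the address from the slot, then call through
   it, with no intermediate stub in between. */
int compare_via_slot(const void *a, const void *b, size_t n)
{
    return memcmp_slot(a, b, n);
}
```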
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-28 19:48 ` Sriraman Tallam @ 2015-05-28 20:19 ` H.J. Lu 2015-05-28 21:27 ` Sriraman Tallam 0 siblings, 1 reply; 65+ messages in thread From: H.J. Lu @ 2015-05-28 20:19 UTC (permalink / raw) To: Sriraman Tallam Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka, Alexander Monakov On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam <tmsriram@google.com> wrote: > On Thu, May 28, 2015 at 11:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote: >>> I have attached a patch that adds the new attribute "noplt". Please review. >>> >>> * config/i386/i386.c (avoid_plt_to_call): New function. >>> (ix86_output_call_insn): Generate indirect call for functions >>> marked with "noplt" attribute. >>> (attribute_spec ix86_attribute_): Define new attribute "noplt". >>> * doc/extend.texi: Document new attribute "noplt". >>> * gcc.target/i386/noplt-1.c: New testcase. >>> * gcc.target/i386/noplt-2.c: New testcase. >>> >> >> 2 comments: >> >> 1. Don't remove "%!" prefix before call/jmp. It is needed for MPX. >> 2. Don't you need to check >> >> && !TARGET_MACHO >> && !TARGET_SEH >> && !TARGET_PECOFF >> >> since it only works for ELF. > > Ok, I will make this change. OTOH, is it just better to piggy-back on > existing -fno-plt change by Alex in calls.c > and do this: > > Index: calls.c > =================================================================== > --- calls.c (revision 223720) > +++ calls.c (working copy) > @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun > && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) > ? 
force_not_mem (memory_address (FUNCTION_MODE, funexp)) > : memory_address (FUNCTION_MODE, funexp)); > - else if (flag_pic && !flag_plt && fndecl_or_type > + else if (fndecl_or_type > && TREE_CODE (fndecl_or_type) == FUNCTION_DECL > - && !targetm.binds_local_p (fndecl_or_type)) > + && !targetm.binds_local_p (fndecl_or_type) > + && ((flag_pic && !flag_plt) > + || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type))))) > { > funexp = force_reg (Pmode, funexp); > } > Does it work on non-PIC calls? -- H.J. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-28 20:19 ` H.J. Lu
@ 2015-05-28 21:27 ` Sriraman Tallam
2015-05-28 21:31 ` H.J. Lu
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-28 21:27 UTC (permalink / raw)
To: H.J. Lu; +Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 2171 bytes --]

On Thu, May 28, 2015 at 12:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Thu, May 28, 2015 at 11:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> I have attached a patch that adds the new attribute "noplt". Please review.
>>>>
>>>> * config/i386/i386.c (avoid_plt_to_call): New function.
>>>> (ix86_output_call_insn): Generate indirect call for functions
>>>> marked with "noplt" attribute.
>>>> (attribute_spec ix86_attribute_): Define new attribute "noplt".
>>>> * doc/extend.texi: Document new attribute "noplt".
>>>> * gcc.target/i386/noplt-1.c: New testcase.
>>>> * gcc.target/i386/noplt-2.c: New testcase.
>>>>
>>>
>>> 2 comments:
>>>
>>> 1. Don't remove "%!" prefix before call/jmp. It is needed for MPX.
>>> 2. Don't you need to check
>>>
>>> && !TARGET_MACHO
>>> && !TARGET_SEH
>>> && !TARGET_PECOFF
>>>
>>> since it only works for ELF.
>>
>> Ok, I will make this change. OTOH, is it just better to piggy-back on
>> existing -fno-plt change by Alex in calls.c
>> and do this:
>>
>> Index: calls.c
>> ===================================================================
>> --- calls.c (revision 223720)
>> +++ calls.c (working copy)
>> @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
>>                 && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
>>                ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
>>                : memory_address (FUNCTION_MODE, funexp));
>> -  else if (flag_pic && !flag_plt && fndecl_or_type
>> +  else if (fndecl_or_type
>>             && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>> -           && !targetm.binds_local_p (fndecl_or_type))
>> +           && !targetm.binds_local_p (fndecl_or_type)
>> +           && ((flag_pic && !flag_plt)
>> +               || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
>>      {
>>        funexp = force_reg (Pmode, funexp);
>>      }
>>
>
> Does it work on non-PIC calls?

You are right, it doesn't work. I have attached the patch with the
changes you mentioned.

Thanks
Sri

>
> --
> H.J.

[-- Attachment #2: noplt_attrib_patch.txt --]
[-- Type: text/plain, Size: 4218 bytes --]

* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn): Generate indirect call for functions
marked with "noplt" attribute.
(attribute_spec ix86_attribute_): Define new attribute "noplt".
* doc/extend.texi: Document new attribute "noplt".
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -25599,6 +25599,25 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+/* Return true if the function being called was marked with attribute
+   "noplt". If this function is defined, this should return false. */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (GET_CODE (call_op) != SYMBOL_REF
+      || SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction. */
 
 const char *
@@ -25611,7 +25630,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+              && TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W. */
       else if (TARGET_SEH)
@@ -25654,7 +25679,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
     }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+          && TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46659,9 @@ static const struct attribute_spec ix86_attribute_
     false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT. */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element. */
   { NULL, 0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} function attribute, x86-64
+@cindex functions whose calls do not go via PLT
+On x86-64 targets, the @code{noplt} attribute causes the compiler to
+call this external function indirectly using a GOT entry and avoid the
+PLT.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 As discussed in @ref{Common Function Attributes}, this attribute
Index: testsuite/gcc.target/i386/noplt-1.c
===================================================================
--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-* } } */
+
+
+__attribute__ ((noplt))
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-2.c
===================================================================
--- testsuite/gcc.target/i386/noplt-2.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-2.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+
+__attribute__ ((noplt))
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-28 21:27 ` Sriraman Tallam
@ 2015-05-28 21:31 ` H.J. Lu
2015-05-28 21:52 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: H.J. Lu @ 2015-05-28 21:31 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Thu, May 28, 2015 at 12:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Thu, May 28, 2015 at 11:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> I have attached a patch that adds the new attribute "noplt". Please review.
>>>>>
>>>>> * config/i386/i386.c (avoid_plt_to_call): New function.
>>>>> (ix86_output_call_insn): Generate indirect call for functions
>>>>> marked with "noplt" attribute.
>>>>> (attribute_spec ix86_attribute_): Define new attribute "noplt".
>>>>> * doc/extend.texi: Document new attribute "noplt".
>>>>> * gcc.target/i386/noplt-1.c: New testcase.
>>>>> * gcc.target/i386/noplt-2.c: New testcase.
>>>>>
>>>>
>>>> 2 comments:
>>>>
>>>> 1. Don't remove "%!" prefix before call/jmp. It is needed for MPX.
>>>> 2. Don't you need to check
>>>>
>>>> && !TARGET_MACHO
>>>> && !TARGET_SEH
>>>> && !TARGET_PECOFF
>>>>
>>>> since it only works for ELF.
>>>
>>> Ok, I will make this change. OTOH, is it just better to piggy-back on
>>> existing -fno-plt change by Alex in calls.c
>>> and do this:
>>>
>>> Index: calls.c
>>> ===================================================================
>>> --- calls.c (revision 223720)
>>> +++ calls.c (working copy)
>>> @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
>>>                 && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
>>>                ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
>>>                : memory_address (FUNCTION_MODE, funexp));
>>> -  else if (flag_pic && !flag_plt && fndecl_or_type
>>> +  else if (fndecl_or_type
>>>             && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>>> -           && !targetm.binds_local_p (fndecl_or_type))
>>> +           && !targetm.binds_local_p (fndecl_or_type)
>>> +           && ((flag_pic && !flag_plt)
>>> +               || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
>>>      {
>>>        funexp = force_reg (Pmode, funexp);
>>>      }
>>>
>>
>> Does it work on non-PIC calls?
>
> You are right, it doesn't work. I have attached the patch with the
> changes you mentioned.
>

Since direct_p is true, do we need

+ if (GET_CODE (call_op) != SYMBOL_REF
+ || SYMBOL_REF_LOCAL_P (call_op))
+ return false;

H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-28 21:31 ` H.J. Lu
@ 2015-05-28 21:52 ` Sriraman Tallam
2015-05-28 22:48 ` H.J. Lu
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-28 21:52 UTC (permalink / raw)
To: H.J. Lu; +Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

On Thu, May 28, 2015 at 2:01 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Thu, May 28, 2015 at 12:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Thu, May 28, 2015 at 11:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> I have attached a patch that adds the new attribute "noplt". Please review.
>>>>>>
>>>>>> * config/i386/i386.c (avoid_plt_to_call): New function.
>>>>>> (ix86_output_call_insn): Generate indirect call for functions
>>>>>> marked with "noplt" attribute.
>>>>>> (attribute_spec ix86_attribute_): Define new attribute "noplt".
>>>>>> * doc/extend.texi: Document new attribute "noplt".
>>>>>> * gcc.target/i386/noplt-1.c: New testcase.
>>>>>> * gcc.target/i386/noplt-2.c: New testcase.
>>>>>>
>>>>>
>>>>> 2 comments:
>>>>>
>>>>> 1. Don't remove "%!" prefix before call/jmp. It is needed for MPX.
>>>>> 2. Don't you need to check
>>>>>
>>>>> && !TARGET_MACHO
>>>>> && !TARGET_SEH
>>>>> && !TARGET_PECOFF
>>>>>
>>>>> since it only works for ELF.
>>>>
>>>> Ok, I will make this change. OTOH, is it just better to piggy-back on
>>>> existing -fno-plt change by Alex in calls.c
>>>> and do this:
>>>>
>>>> Index: calls.c
>>>> ===================================================================
>>>> --- calls.c (revision 223720)
>>>> +++ calls.c (working copy)
>>>> @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
>>>>                 && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
>>>>                ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
>>>>                : memory_address (FUNCTION_MODE, funexp));
>>>> -  else if (flag_pic && !flag_plt && fndecl_or_type
>>>> +  else if (fndecl_or_type
>>>>             && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>>>> -           && !targetm.binds_local_p (fndecl_or_type))
>>>> +           && !targetm.binds_local_p (fndecl_or_type)
>>>> +           && ((flag_pic && !flag_plt)
>>>> +               || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
>>>>      {
>>>>        funexp = force_reg (Pmode, funexp);
>>>>      }
>>>>
>>>
>>> Does it work on non-PIC calls?
>>
>> You are right, it doesn't work. I have attached the patch with the
>> changes you mentioned.
>>
>
> Since direct_p is true, do we need
>
> + if (GET_CODE (call_op) != SYMBOL_REF
> + || SYMBOL_REF_LOCAL_P (call_op))
> + return false;

We do need it, because in the case below I do not want an
indirect call:

__attribute__((noplt))
int foo() {
  return 0;
}

int main()
{
  return foo();
}

Assuming foo is not inlined, if I remove the lines you mentioned, I
will get an indirect call which is unnecessary.

Thanks
Sri

>
> H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
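
Editorial sketch: the conditions discussed in the review so far (x86-64 only, ELF only, attribute present, callee not bound locally) can be summarized as a small truth table. The C below is a toy model under those stated assumptions and is not GCC code: struct callee_info, struct target_info and call_goes_via_got are hypothetical names that stand in for SYMBOL_REF_LOCAL_P, the noplt attribute lookup, TARGET_64BIT and the three object-format checks.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for what the compiler knows about a callee. */
struct callee_info {
    bool binds_locally;   /* models SYMBOL_REF_LOCAL_P: defined in this module */
    bool has_noplt;       /* declared with the noplt attribute */
};

/* Hypothetical stand-ins for the target checks raised in the review. */
struct target_info {
    bool is_64bit;        /* models TARGET_64BIT */
    bool is_elf;          /* models !TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF */
};

/* Returns true when the call would be emitted through the GOT
   (call *foo@GOTPCREL(%rip)) rather than as a direct call. */
bool call_goes_via_got(struct target_info t, struct callee_info c)
{
    if (!t.is_64bit || !t.is_elf)
        return false;   /* the patch only handles x86-64 ELF */
    if (c.binds_locally)
        return false;   /* local definition: a direct call is cheaper */
    return c.has_noplt;
}
```

The second check is the point of the message above: a noplt function that is defined in the same module should still be called directly.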
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-28 21:52 ` Sriraman Tallam @ 2015-05-28 22:48 ` H.J. Lu 2015-05-29 3:51 ` Sriraman Tallam 0 siblings, 1 reply; 65+ messages in thread From: H.J. Lu @ 2015-05-28 22:48 UTC (permalink / raw) To: Sriraman Tallam Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka On Thu, May 28, 2015 at 2:27 PM, Sriraman Tallam <tmsriram@google.com> wrote: > On Thu, May 28, 2015 at 2:01 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam <tmsriram@google.com> wrote: >>> On Thu, May 28, 2015 at 12:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>>> On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam <tmsriram@google.com> wrote: >>>>> On Thu, May 28, 2015 at 11:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >>>>>> On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam <tmsriram@google.com> wrote: >>>>>>> I have attached a patch that adds the new attribute "noplt". Please review. >>>>>>> >>>>>>> * config/i386/i386.c (avoid_plt_to_call): New function. >>>>>>> (ix86_output_call_insn): Generate indirect call for functions >>>>>>> marked with "noplt" attribute. >>>>>>> (attribute_spec ix86_attribute_): Define new attribute "noplt". >>>>>>> * doc/extend.texi: Document new attribute "noplt". >>>>>>> * gcc.target/i386/noplt-1.c: New testcase. >>>>>>> * gcc.target/i386/noplt-2.c: New testcase. >>>>>>> >>>>>> >>>>>> 2 comments: >>>>>> >>>>>> 1. Don't remove "%!" prefix before call/jmp. It is needed for MPX. >>>>>> 2. Don't you need to check >>>>>> >>>>>> && !TARGET_MACHO >>>>>> && !TARGET_SEH >>>>>> && !TARGET_PECOFF >>>>>> >>>>>> since it only works for ELF. >>>>> >>>>> Ok, I will make this change. 
OTOH, is it just better to piggy-back on >>>>> existing -fno-plt change by Alex in calls.c >>>>> and do this: >>>>> >>>>> Index: calls.c >>>>> =================================================================== >>>>> --- calls.c (revision 223720) >>>>> +++ calls.c (working copy) >>>>> @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun >>>>> && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) >>>>> ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) >>>>> : memory_address (FUNCTION_MODE, funexp)); >>>>> - else if (flag_pic && !flag_plt && fndecl_or_type >>>>> + else if (fndecl_or_type >>>>> && TREE_CODE (fndecl_or_type) == FUNCTION_DECL >>>>> - && !targetm.binds_local_p (fndecl_or_type)) >>>>> + && !targetm.binds_local_p (fndecl_or_type) >>>>> + && ((flag_pic && !flag_plt) >>>>> + || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type))))) >>>>> { >>>>> funexp = force_reg (Pmode, funexp); >>>>> } >>>>> >>>> >>>> Does it work on non-PIC calls? >>> >>> You are right, it doesnt work. I have attached the patch with the >>> changes you mentioned. >>> >> >> Since direct_p is true, do wee need >> >> + if (GET_CODE (call_op) != SYMBOL_REF >> + || SYMBOL_REF_LOCAL_P (call_op)) >> + return false; > > We do need it right because for this case below, I do not want an > indirect call: > > __attribute__((noplt)) > int foo() { > return 0; > } > > int main() > { > return foo(); > } > > Assuming foo is not inlined, if I remove the lines you mentioned, I > will get an indirect call which is unnecessary. > I meant the "GET_CODE (call_op) != SYMBOL_REF" part isn't needed. -- H.J. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-28 22:48 ` H.J. Lu
@ 2015-05-29 3:51 ` Sriraman Tallam
2015-05-29 5:13 ` H.J. Lu
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-29 3:51 UTC (permalink / raw)
To: H.J. Lu; +Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 3318 bytes --]

On Thu, May 28, 2015 at 2:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 28, 2015 at 2:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Thu, May 28, 2015 at 2:01 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> [...]
>>> Since direct_p is true, do we need
>>>
>>> +  if (GET_CODE (call_op) != SYMBOL_REF
>>> +      || SYMBOL_REF_LOCAL_P (call_op))
>>> +    return false;
>>
>> We do need it right because for this case below, I do not want an
>> indirect call:
>>
>> __attribute__((noplt))
>> int foo() {
>>   return 0;
>> }
>>
>> int main()
>> {
>>   return foo();
>> }
>>
>> Assuming foo is not inlined, if I remove the lines you mentioned, I
>> will get an indirect call which is unnecessary.
>>
>
> I meant the "GET_CODE (call_op) != SYMBOL_REF" part isn't
> needed.

I should have realized that :), sorry. Patch fixed.

Thanks
Sri

>
>
>
> --
> H.J.

[-- Attachment #2: noplt_attrib_patch.txt --]
[-- Type: text/plain, Size: 4175 bytes --]

	* config/i386/i386.c (avoid_plt_to_call): New function.
	(ix86_output_call_insn): Generate indirect call for functions
	marked with "noplt" attribute.
	(attribute_spec ix86_attribute_): Define new attribute "noplt".
	* doc/extend.texi: Document new attribute "noplt".
	* gcc.target/i386/noplt-1.c: New testcase.
	* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+/* Return true if the function being called was marked with attribute
+   "noplt". If this function is defined, this should return false. */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction. */
 
 const char *
@@ -25611,7 +25629,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+              && TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W. */
       else if (TARGET_SEH)
@@ -25654,7 +25678,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
     }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+          && TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46658,9 @@ static const struct attribute_spec ix86_attribute_
     false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT. */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element. */
   { NULL, 0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} function attribute, x86-64
+@cindex functions whose calls do not go via PLT
+On x86-64 targets, the @code{noplt} attribute causes the compiler to
+call this external function indirectly using a GOT entry and avoid the
+PLT.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 As discussed in @ref{Common Function Attributes}, this attribute
Index: testsuite/gcc.target/i386/noplt-1.c
===================================================================
--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-* } } */
+
+
+__attribute__ ((noplt))
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-2.c
===================================================================
--- testsuite/gcc.target/i386/noplt-2.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-2.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O2" } */
+
+
+__attribute__ ((noplt))
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 3:51 ` Sriraman Tallam
@ 2015-05-29 5:13 ` H.J. Lu
2015-05-29 7:13 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: H.J. Lu @ 2015-05-29 5:13 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

On Thu, May 28, 2015 at 4:54 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Thu, May 28, 2015 at 2:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> [...]
>> I meant the "GET_CODE (call_op) != SYMBOL_REF" part isn't
>> needed.
>
> I should have realized that :), sorry. Patch fixed.
>

--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-* } } */
...
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

The test will fail on Windows and Darwin.

--
H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 5:13 ` H.J. Lu
@ 2015-05-29 7:13 ` Sriraman Tallam
2015-05-29 17:36 ` Sriraman Tallam
2015-05-29 20:50 ` Jan Hubicka
0 siblings, 2 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-29 7:13 UTC (permalink / raw)
To: H.J. Lu; +Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 3993 bytes --]

On Thu, May 28, 2015 at 5:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, May 28, 2015 at 4:54 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> [...]
>> I should have realized that :), sorry. Patch fixed.
>>
>
> --- testsuite/gcc.target/i386/noplt-1.c (revision 0)
> +++ testsuite/gcc.target/i386/noplt-1.c (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target x86_64-*-* } } */
> ...
> +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
>
> The test will fail on Windows and Darwin.

Changed to use x86_64-*-linux* target.

>
>
> --
> H.J.

[-- Attachment #2: noplt_attrib_patch.txt --]
[-- Type: text/plain, Size: 4185 bytes --]

	* config/i386/i386.c (avoid_plt_to_call): New function.
	(ix86_output_call_insn): Generate indirect call for functions
	marked with "noplt" attribute.
	(attribute_spec ix86_attribute_): Define new attribute "noplt".
	* doc/extend.texi: Document new attribute "noplt".
	* gcc.target/i386/noplt-1.c: New testcase.
	* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+/* Return true if the function being called was marked with attribute
+   "noplt". If this function is defined, this should return false. */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction. */
 
 const char *
@@ -25611,7 +25629,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+              && TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W. */
       else if (TARGET_SEH)
@@ -25654,7 +25678,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
     }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+          && TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46658,9 @@ static const struct attribute_spec ix86_attribute_
     false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT. */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element. */
   { NULL, 0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} function attribute, x86-64
+@cindex functions whose calls do not go via PLT
+On x86-64 targets, the @code{noplt} attribute causes the compiler to
+call this external function indirectly using a GOT entry and avoid the
+PLT.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 As discussed in @ref{Common Function Attributes}, this attribute
Index: testsuite/gcc.target/i386/noplt-1.c
===================================================================
--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+
+
+__attribute__ ((noplt))
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-2.c
===================================================================
--- testsuite/gcc.target/i386/noplt-2.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-2.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-O2" } */
+
+
+__attribute__ ((noplt))
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

^ permalink raw reply [flat|nested] 65+ messages in thread
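For illustration only, here is a minimal sketch of how user code might apply the attribute from the patch above to one of the hot libc functions named at the start of the thread. It assumes a compiler that accepts `noplt` and answers `__has_attribute (noplt)`; elsewhere the hypothetical `DEMO_NOPLT` macro expands to nothing and the call is emitted as usual. `DEMO_NOPLT`, `demo_equal_prefix` and `demo_self_check` are names made up for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Use the attribute only where the compiler advertises it; DEMO_NOPLT is a
   hypothetical helper macro, not something the patch defines.  */
#if defined(__has_attribute)
#  if __has_attribute(noplt)
#    define DEMO_NOPLT __attribute__((noplt))
#  endif
#endif
#ifndef DEMO_NOPLT
#  define DEMO_NOPLT
#endif

/* Declaration of an external library function carrying the attribute.
   With the patch, calls to it are meant to go through the GOT
   (call *memcmp@GOTPCREL(%rip)) rather than through a PLT stub.  */
DEMO_NOPLT int memcmp(const void *lhs, const void *rhs, size_t n);

/* Return 1 if the first n bytes of a and b are equal, 0 otherwise.  */
int demo_equal_prefix(const char *a, const char *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}

/* Small self-check; the behaviour is the same with or without the
   attribute, only the call sequence differs.  */
void demo_self_check(void)
{
    assert(demo_equal_prefix("abcd", "abce", 3));
    assert(!demo_equal_prefix("abcd", "abce", 4));
}
```

Whether the indirect form is actually emitted can be checked the same way the testcases do, by scanning the generated assembly for `@GOTPCREL(%rip)`. The compiler may also expand a call like this inline, in which case no call instruction appears at all.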
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 7:13 ` Sriraman Tallam
@ 2015-05-29 17:36 ` Sriraman Tallam
2015-05-29 17:52 ` H.J. Lu
2015-05-29 20:50 ` Jan Hubicka
1 sibling, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-29 17:36 UTC (permalink / raw)
To: H.J. Lu; +Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

Hi HJ,

Is this ok to commit?

Thanks
Sri

On Thu, May 28, 2015 at 11:03 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Thu, May 28, 2015 at 5:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> [...]
>> --- testsuite/gcc.target/i386/noplt-1.c (revision 0)
>> +++ testsuite/gcc.target/i386/noplt-1.c (working copy)
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile { target x86_64-*-* } } */
>> ...
>> +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
>>
>> The test will fail on Windows and Darwin.
>
> Changed to use x86_64-*-linux* target.
>
>>
>>
>> --
>> H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 17:36 ` Sriraman Tallam
@ 2015-05-29 17:52 ` H.J. Lu
2015-05-29 18:33 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: H.J. Lu @ 2015-05-29 17:52 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

On Fri, May 29, 2015 at 10:20 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi HJ,
>
> Is this ok to commit?
>

Looks good to me. But I can't approve it.

--
H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 17:52 ` H.J. Lu
@ 2015-05-29 18:33 ` Sriraman Tallam
0 siblings, 0 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-29 18:33 UTC (permalink / raw)
To: H.J. Lu
Cc: Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka, Uros Bizjak

+Uros

On Fri, May 29, 2015 at 10:25 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 29, 2015 at 10:20 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi HJ,
>>
>> Is this ok to commit?
>>
>
> Looks good to me. But I can't approve it.
>
> --
> H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 7:13 ` Sriraman Tallam
2015-05-29 17:36 ` Sriraman Tallam
@ 2015-05-29 20:50 ` Jan Hubicka
2015-05-29 22:56 ` Sriraman Tallam
1 sibling, 1 reply; 65+ messages in thread
From: Jan Hubicka @ 2015-05-29 20:50 UTC (permalink / raw)
To: Sriraman Tallam
Cc: H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches, Jan Hubicka

> * config/i386/i386.c (avoid_plt_to_call): New function.
> (ix86_output_call_insn): Generate indirect call for functions
> marked with "noplt" attribute.
> (attribute_spec ix86_attribute_): Define new attribute "noplt".
> * doc/extend.texi: Document new attribute "noplt".
> * gcc.target/i386/noplt-1.c: New testcase.
> * gcc.target/i386/noplt-2.c: New testcase.
>
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c (revision 223720)
> +++ config/i386/i386.c (working copy)
> @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>    return call;
>  }
>
> +/* Return true if the function being called was marked with attribute
> +   "noplt". If this function is defined, this should return false. */
> +static bool
> +avoid_plt_to_call (rtx call_op)
> +{
> +  if (SYMBOL_REF_LOCAL_P (call_op))
> +    return false;
> +
> +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
> +
> +  if (symbol_decl != NULL_TREE
> +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
> +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
> +    return true;
> +
> +  return false;
> +}

OK, now we have __attribute__ (optimize("noplt")) which binds to the caller and makes
all calls in the function skip the PLT, and __attribute__ ("noplt") which binds to the
callee and makes all calls to the function not use the PLT.

That sort of makes sense to me, but why is the "noplt" attribute not implemented at the
generic level just like -fplt? Is it only because every target supporting PLT would need
an update in its call expansion patterns?

Also I think the PLT calls have EBX in call fusage, which is added by ix86_expand_call.
  else
    {
      /* Static functions and indirect calls don't need the pic register. */
      if (flag_pic
          && (!TARGET_64BIT
              || (ix86_cmodel == CM_LARGE_PIC
                  && DEFAULT_ABI != MS_ABI))
          && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
        {
          use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
          if (ix86_use_pseudo_pic_reg ())
            emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
                            pic_offset_table_rtx);
        }

I think you want to take that away from FUSAGE there just like we do for local calls
(and in fact the code should already check flag_pic && flag_plt, I suppose).

Honza

^ permalink raw reply [flat|nested] 65+ messages in thread
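One possible reading of the suggestion above, written as a standalone boolean model rather than as GCC code: the PIC register is recorded as used by the call only when the call really goes through the PLT. The struct and its field names are hypothetical stand-ins for the compiler state tested in ix86_expand_call (flag_pic, flag_plt, TARGET_64BIT, the code model check, SYMBOL_REF_LOCAL_P and the "noplt" lookup). This is a sketch of the idea, not the change that was committed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, flattened view of the inputs that the condition in
   ix86_expand_call looks at.  */
struct call_model {
    bool flag_pic;              /* compiling position-independent code */
    bool flag_plt;              /* false when -fno-plt is in effect */
    bool target_64bit;          /* stands in for TARGET_64BIT */
    bool large_pic_non_ms_abi;  /* CM_LARGE_PIC && DEFAULT_ABI != MS_ABI */
    bool callee_is_symbol_ref;  /* call target is a SYMBOL_REF */
    bool callee_binds_local;    /* stands in for SYMBOL_REF_LOCAL_P */
    bool callee_has_noplt;      /* callee carries the "noplt" attribute */
};

/* True if the call should list the PIC register in its function usage.
   The first four tests mirror the existing condition; the last two are
   the extra checks the review asks for.  */
bool model_call_uses_pic_register(const struct call_model *c)
{
    return c->flag_pic
        && (!c->target_64bit || c->large_pic_non_ms_abi)
        && c->callee_is_symbol_ref
        && !c->callee_binds_local
        && c->flag_plt
        && !c->callee_has_noplt;
}

/* A 32-bit PIC call to an external symbol uses the PIC register via the
   PLT; once the PLT is avoided, it no longer does in this model.  */
void model_self_check(void)
{
    struct call_model c = { true, true, false, false, true, false, false };
    assert(model_call_uses_pic_register(&c));
    c.flag_plt = false;
    assert(!model_call_uses_pic_register(&c));
}
```

Local calls already fall out through the `callee_binds_local` test, which is the existing behaviour the message refers to.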
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 20:50 ` Jan Hubicka
@ 2015-05-29 22:56 ` Sriraman Tallam
2015-05-29 23:08 ` Sriraman Tallam
[not found] ` <CAJA7tRYsMiq7rx34c=z6KwRdwYxxaeP6Z6qzA4XEwnJSMT7z=Q@mail.gmail.com>
0 siblings, 2 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-29 22:56 UTC (permalink / raw)
To: Jan Hubicka; +Cc: H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]

On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> * config/i386/i386.c (avoid_plt_to_call): New function.
>> (ix86_output_call_insn): Generate indirect call for functions
>> marked with "noplt" attribute.
>> (attribute_spec ix86_attribute_): Define new attribute "noplt".
>> * doc/extend.texi: Document new attribute "noplt".
>> * gcc.target/i386/noplt-1.c: New testcase.
>> * gcc.target/i386/noplt-2.c: New testcase.
>>
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c (revision 223720)
>> +++ config/i386/i386.c (working copy)
>> @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>>    return call;
>>  }
>>
>> +/* Return true if the function being called was marked with attribute
>> +   "noplt". If this function is defined, this should return false. */
>> +static bool
>> +avoid_plt_to_call (rtx call_op)
>> +{
>> +  if (SYMBOL_REF_LOCAL_P (call_op))
>> +    return false;
>> +
>> +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
>> +
>> +  if (symbol_decl != NULL_TREE
>> +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
>> +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
>> +    return true;
>> +
>> +  return false;
>> +}
>
> OK, now we have __attribute__ (optimize("noplt")) which binds to the caller and makes
> all calls in the function to skip PLT and __attribute__ ("noplt") which binds to callee
> and makes all calls to function to not use PLT.
>
> That sort of makes sense to me, but why "noplt" attribute is not implemented at generic level
> just like -fplt? Is it only because every target supporting PLT would need update in its
> call expansion patterns?

Yes, that is what I had in mind.

>
> Also I think the PLT calls have EBX in call fusage which is added by ix86_expand_call.
>   else
>     {
>       /* Static functions and indirect calls don't need the pic register. */
>       if (flag_pic
>           && (!TARGET_64BIT
>               || (ix86_cmodel == CM_LARGE_PIC
>                   && DEFAULT_ABI != MS_ABI))
>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>         {
>           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
>           if (ix86_use_pseudo_pic_reg ())
>             emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
>                             pic_offset_table_rtx);
>         }
>
> I think you want to take that away from FUSAGE there just like we do for local calls
> (and in fact the code should already check flag_pic && flag_plt I suppose.

Done that now and patch attached.

Thanks
Sri

>
> Honza

[-- Attachment #2: noplt_attrib_patch.txt --]
[-- Type: text/plain, Size: 5221 bytes --]

	* config/i386/i386.c (avoid_plt_to_call): New function.
	(ix86_expand_call): Don't use the PIC register when external function
	calls are not made via PLT.
	(ix86_output_call_insn): Generate indirect call for functions
	marked with "noplt" attribute.
	(attribute_spec ix86_attribute_): Define new attribute "noplt".
	* doc/extend.texi: Document new attribute "noplt".
	* gcc.target/i386/noplt-1.c: New testcase.
	* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -25475,6 +25475,28 @@ construct_plt_address (rtx symbol)
   return tmp;
 }
 
+/* Return true if the function being called was marked with attribute
+   "noplt". If this function is defined, this should return false. This
+   is currently used only with 64-bit ELF targets. */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (!TARGET_64BIT || TARGET_MACHO || TARGET_SEH || TARGET_PECOFF)
+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 rtx
 ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
                   rtx callarg2,
@@ -25497,13 +25519,16 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
     }
   else
     {
-      /* Static functions and indirect calls don't need the pic register. */
+      /* Static functions and indirect calls don't need the pic register. Also,
+         check if PLT was explicitly avoided via no-plt or "noplt" attribute, making
+         it an indirect call. */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+          && flag_plt && !avoid_plt_to_call (XEXP (fnaddr, 0)))
         {
           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
@@ -25611,7 +25636,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+              && TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W. */
       else if (TARGET_SEH)
@@ -25654,7 +25685,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
     }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+          && TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46665,9 @@ static const struct attribute_spec ix86_attribute_
     false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT. */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element. */
   { NULL, 0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} function attribute, x86-64
+@cindex functions whose calls do not go via PLT
+On x86-64 targets, the @code{noplt} attribute causes the compiler to
+call this external function indirectly using a GOT entry and avoid the
+PLT.
+ @item target (@var{options}) @cindex @code{target} function attribute As discussed in @ref{Common Function Attributes}, this attribute Index: testsuite/gcc.target/i386/noplt-1.c =================================================================== --- testsuite/gcc.target/i386/noplt-1.c (revision 0) +++ testsuite/gcc.target/i386/noplt-1.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ + + +__attribute__ ((noplt)) +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-2.c =================================================================== --- testsuite/gcc.target/i386/noplt-2.c (revision 0) +++ testsuite/gcc.target/i386/noplt-2.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2" } */ + + +__attribute__ ((noplt)) +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ ^ permalink raw reply [flat|nested] 65+ messages in thread
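For readers following the patch above, the decision made by avoid_plt_to_call can be sketched outside of GCC as a plain C predicate. This is only an illustrative model, not GCC code: struct call_target and model_avoid_plt_to_call are hypothetical names, and the boolean fields are assumed stand-ins for the target macros, SYMBOL_REF_LOCAL_P, the FUNCTION_DECL check and the lookup_attribute call in the posted function.

```c
#include <stdbool.h>

/* Hypothetical stand-in for what the patch reads from the call operand.
   None of these names exist in GCC.  */
struct call_target
{
  bool target_64bit_elf;  /* TARGET_64BIT and not Mach-O, SEH or PE-COFF.  */
  bool binds_locally;     /* Models SYMBOL_REF_LOCAL_P (call_op).  */
  bool is_function_decl;  /* Decl is non-null and a FUNCTION_DECL.  */
  bool has_noplt_attr;    /* Models lookup_attribute ("noplt", ...).  */
};

/* Follows the order of checks in the posted avoid_plt_to_call: the
   target gate first, then the "defined locally" bail-out, then the
   attribute lookup.  */
static bool
model_avoid_plt_to_call (const struct call_target *t)
{
  if (!t->target_64bit_elf)
    return false;

  if (t->binds_locally)
    return false;

  return t->is_function_decl && t->has_noplt_attr;
}
```

Under this model, an external function carrying the attribute on a 64-bit ELF target is the only case that returns true; a locally bound function returns false even when it carries the attribute, which matches the comment "If this function is defined, this should return false".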
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-29 22:56 ` Sriraman Tallam
@ 2015-05-29 23:08 ` Sriraman Tallam
[not found] ` <CAJA7tRYsMiq7rx34c=z6KwRdwYxxaeP6Z6qzA4XEwnJSMT7z=Q@mail.gmail.com>
1 sibling, 0 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-29 23:08 UTC (permalink / raw)
To: Jan Hubicka; +Cc: H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2983 bytes --]

Made one more change; new patch attached.

Thanks
Sri

On Fri, May 29, 2015 at 2:37 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>         * config/i386/i386.c (avoid_plt_to_call): New function.
>>>         (ix86_output_call_insn): Generate indirect call for functions
>>>         marked with "noplt" attribute.
>>>         (attribute_spec ix86_attribute_): Define new attribute "noplt".
>>>         * doc/extend.texi: Document new attribute "noplt".
>>>         * gcc.target/i386/noplt-1.c: New testcase.
>>>         * gcc.target/i386/noplt-2.c: New testcase.
>>>
>>> Index: config/i386/i386.c
>>> ===================================================================
>>> --- config/i386/i386.c (revision 223720)
>>> +++ config/i386/i386.c (working copy)
>>> @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>>>    return call;
>>>  }
>>>
>>> +/* Return true if the function being called was marked with attribute
>>> +   "noplt". If this function is defined, this should return false. */
>>> +static bool
>>> +avoid_plt_to_call (rtx call_op)
>>> +{
>>> +  if (SYMBOL_REF_LOCAL_P (call_op))
>>> +    return false;
>>> +
>>> +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
>>> +
>>> +  if (symbol_decl != NULL_TREE
>>> +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
>>> +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
>>> +    return true;
>>> +
>>> +  return false;
>>> +}
>>
>> OK, now we have __attribute__ (optimize("noplt")) which binds to the caller and makes
>> all calls in the function to skip PLT and __attribute__ ("noplt") which binds to callee
>> and makes all calls to function to not use PLT.
>>
>> That sort of makes sense to me, but why "noplt" attribute is not implemented at generic level
>> just like -fplt? Is it only because every target supporting PLT would need update in its
>> call expansion patterns?
>
> Yes, that is what I had in mind.
>
>>
>> Also I think the PLT calls have EBX in call fusage which is added by ix86_expand_call.
>>   else
>>     {
>>       /* Static functions and indirect calls don't need the pic register. */
>>       if (flag_pic
>>           && (!TARGET_64BIT
>>               || (ix86_cmodel == CM_LARGE_PIC
>>                   && DEFAULT_ABI != MS_ABI))
>>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>         {
>>           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
>>           if (ix86_use_pseudo_pic_reg ())
>>             emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
>>                             pic_offset_table_rtx);
>>         }
>>
>> I think you want to take that away from FUSAGE there just like we do for local calls
>> (and in fact the code should already check flag_pic && flag_plt I suppose.
>
> Done that now and patch attached.
>
> Thanks
> Sri
>
>>
>> Honza

[-- Attachment #2: noplt_attrib_patch.txt --]
[-- Type: text/plain, Size: 5077 bytes --]

        * config/i386/i386.c (avoid_plt_to_call): New function.
        (ix86_expand_call): Don't use the PIC register when external function
        calls are not made via PLT.
        (ix86_output_call_insn): Generate indirect call for functions
        marked with "noplt" attribute.
        (attribute_spec ix86_attribute_): Define new attribute "noplt".
        * doc/extend.texi: Document new attribute "noplt".
        * gcc.target/i386/noplt-1.c: New testcase.
        * gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -25475,6 +25475,28 @@ construct_plt_address (rtx symbol)
   return tmp;
 }
 
+/* Return true if the function being called was marked with attribute
+   "noplt". If this function is defined, this should return false. This
+   is currently used only with 64-bit ELF targets. */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (!TARGET_64BIT || TARGET_MACHO || TARGET_SEH || TARGET_PECOFF)
+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 rtx
 ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
                   rtx callarg2,
@@ -25497,13 +25519,16 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
     }
   else
     {
-      /* Static functions and indirect calls don't need the pic register. */
+      /* Static functions and indirect calls don't need the pic register. Also,
+         check if PLT was explicitly avoided via no-plt or "noplt" attribute, making
+         it an indirect call. */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+          && flag_plt && !avoid_plt_to_call (XEXP (fnaddr, 0)))
         {
           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
@@ -25611,7 +25636,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W. */
       else if (TARGET_SEH)
@@ -25654,7 +25684,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
     }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46663,9 @@ static const struct attribute_spec ix86_attribute_
     false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT. */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element. */
   { NULL, 0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} function attribute, x86-64
+@cindex functions whose calls do not go via PLT
+On x86-64 targets, the @code{noplt} attribute causes the compiler to
+call this external function indirectly using a GOT entry and avoid the
+PLT.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 As discussed in @ref{Common Function Attributes}, this attribute
Index: testsuite/gcc.target/i386/noplt-1.c
===================================================================
--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+
+
+__attribute__ ((noplt))
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-2.c
===================================================================
--- testsuite/gcc.target/i386/noplt-2.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-2.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-O2" } */
+
+
+__attribute__ ((noplt))
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

^ permalink raw reply [flat|nested] 65+ messages in thread
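The effect of the revised ix86_output_call_insn hunks on direct calls can be summarised with a small stand-alone sketch. It is an assumption-laden model rather than the real function: model_direct_call_template is a hypothetical name, the avoid_plt flag stands for the result of avoid_plt_to_call, and only the direct_p branches shown in the posted hunks are covered (the indirect and SEH paths are left out because the patch does not show them in full).

```c
#include <stdbool.h>

/* Returns the output template that the direct-call branches in the
   posted hunks would select.  SIBCALL models SIBLING_CALL_P (insn);
   AVOID_PLT models avoid_plt_to_call (call_op).  Hypothetical helper,
   not part of GCC.  */
static const char *
model_direct_call_template (bool sibcall, bool avoid_plt)
{
  if (sibcall)
    return avoid_plt ? "%!jmp\t*%p0@GOTPCREL(%%rip)" : "%!jmp\t%P0";

  return avoid_plt ? "%!call\t*%p0@GOTPCREL(%%rip)" : "%!call\t%P0";
}
```

The two GOTPCREL templates correspond to what the new testcases scan for: noplt-1.c expects a call through foo@GOTPCREL(%rip), and noplt-2.c, compiled with -O2 so the call becomes a sibling call, expects a jmp through the same GOT entry.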
[parent not found: <CAJA7tRYsMiq7rx34c=z6KwRdwYxxaeP6Z6qzA4XEwnJSMT7z=Q@mail.gmail.com>]
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
[not found] ` <CAJA7tRYsMiq7rx34c=z6KwRdwYxxaeP6Z6qzA4XEwnJSMT7z=Q@mail.gmail.com>
@ 2015-05-30 4:44 ` Sriraman Tallam
2015-06-01 8:24 ` Ramana Radhakrishnan
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-30 4:44 UTC (permalink / raw)
To: ramrad01
Cc: Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Fri, May 29, 2015 at 3:24 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
>
>
> On Friday, 29 May 2015, Sriraman Tallam <tmsriram@google.com> wrote:
>>
>> On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> * config/i386/i386.c (avoid_plt_to_call): New function.
>> >> (ix86_output_call_insn): Generate indirect call for functions
>> >> marked with "noplt" attribute.
>> >> (attribute_spec ix86_attribute_): Define new attribute "noplt".
>> >> * doc/extend.texi: Document new attribute "noplt".
>> >> * gcc.target/i386/noplt-1.c: New testcase.
>> >> * gcc.target/i386/noplt-2.c: New testcase.
>> >>
>> >> Index: config/i386/i386.c
>> >> ===================================================================
>> >> --- config/i386/i386.c (revision 223720)
>> >> +++ config/i386/i386.c (working copy)
>> >> @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx
>> >> call
>> >>    return call;
>> >>  }
>> >>
>> >> +/* Return true if the function being called was marked with attribute
>> >> +   "noplt". If this function is defined, this should return false.
>> >> */
>> >> +static bool
>> >> +avoid_plt_to_call (rtx call_op)
>> >> +{
>> >> +  if (SYMBOL_REF_LOCAL_P (call_op))
>> >> +    return false;
>> >> +
>> >> +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
>> >> +
>> >> +  if (symbol_decl != NULL_TREE
>> >> +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
>> >> +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
>> >> +    return true;
>> >> +
>> >> +  return false;
>> >> +}
>> >
>> > OK, now we have __attribute__ (optimize("noplt")) which binds to the
>> > caller and makes
>> > all calls in the function to skip PLT and __attribute__ ("noplt") which
>> > binds to callee
>> > and makes all calls to function to not use PLT.
>> >
>> > That sort of makes sense to me, but why "noplt" attribute is not
>> > implemented at generic level
>> > just like -fplt? Is it only because every target supporting PLT would
>> > need update in its
>> > call expansion patterns?
>>
>> Yes, that is what I had in mind.
>>
>
>
> Why isn't it just an indirect call in the cases that would require a GOT
> slot and a direct call otherwise ? I'm trying to work out what's so
> different on each target that mandates this to be in the target backend.
> Also it would be better to push the tests into gcc.dg if you can and check
> for the absence of a relocation so that folks at least see these as being
> UNSUPPORTED on their target.

I am not familiar with PLT calls for other targets. I can move the
tests to gcc.dg, but what relocation are you suggesting I check for?

Thanks
Sri

>
>
>
> Ramana
>>
>> >
>> > Also I think the PLT calls have EBX in call fusage which is added by
>> > ix86_expand_call.
>> >   else
>> >     {
>> >       /* Static functions and indirect calls don't need the pic
>> > register. */
>> >       if (flag_pic
>> >           && (!TARGET_64BIT
>> >               || (ix86_cmodel == CM_LARGE_PIC
>> >                   && DEFAULT_ABI != MS_ABI))
>> >           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>> >           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>> >         {
>> >           use_reg (&use, gen_rtx_REG (Pmode,
>> > REAL_PIC_OFFSET_TABLE_REGNUM));
>> >           if (ix86_use_pseudo_pic_reg ())
>> >             emit_move_insn (gen_rtx_REG (Pmode,
>> > REAL_PIC_OFFSET_TABLE_REGNUM),
>> >                             pic_offset_table_rtx);
>> >         }
>> >
>> > I think you want to take that away from FUSAGE there just like we do for
>> > local calls
>> > (and in fact the code should already check flag_pic && flag_plt I
>> > suppose.
>>
>> Done that now and patch attached.
>>
>> Thanks
>> Sri
>>
>> >
>> > Honza

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-30 4:44 ` Sriraman Tallam
@ 2015-06-01 8:24 ` Ramana Radhakrishnan
2015-06-01 18:01 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-01 8:24 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

>> Why isn't it just an indirect call in the cases that would require a GOT
>> slot and a direct call otherwise ? I'm trying to work out what's so
>> different on each target that mandates this to be in the target backend.
>> Also it would be better to push the tests into gcc.dg if you can and check
>> for the absence of a relocation so that folks at least see these as being
>> UNSUPPORTED on their target.
>

To be even more explicit, shouldn't this be handled similarly to the way in
which -fno-plt is handled, in a target-agnostic manner? After all, if you
can handle this for the command line, doing the same for a function which
has been decorated with attribute((noplt)) should be simple.

> I am not familiar with PLT calls for other targets. I can move the
> tests to gcc.dg but what relocation are you suggesting I check for?

Move the test to gcc.dg, add a target_support_no_plt function in
testsuite/lib/target-supports.exp, mark this as being supported only on
x86, and use scan-assembler to scan for PLT relocations for x86. Other
targets can add things as they deem fit.

In any case, on a large number of ELF/Linux targets I would have thought
the absence of a JMP_SLOT relocation would be good enough to check that this
is working correctly.

regards
Ramana

>
> Thanks
> Sri
>
>
>>
>>
>>
>> Ramana
>>>
>>>>
>>>> Also I think the PLT calls have EBX in call fusage which is added by
>>>> ix86_expand_call.
>>>>   else
>>>>     {
>>>>       /* Static functions and indirect calls don't need the pic
>>>> register. */
>>>>       if (flag_pic
>>>>           && (!TARGET_64BIT
>>>>               || (ix86_cmodel == CM_LARGE_PIC
>>>>                   && DEFAULT_ABI != MS_ABI))
>>>>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>         {
>>>>           use_reg (&use, gen_rtx_REG (Pmode,
>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>           if (ix86_use_pseudo_pic_reg ())
>>>>             emit_move_insn (gen_rtx_REG (Pmode,
>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>                             pic_offset_table_rtx);
>>>>         }
>>>>
>>>> I think you want to take that away from FUSAGE there just like we do for
>>>> local calls
>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>> suppose.
>>>
>>> Done that now and patch attached.
>>>
>>> Thanks
>>> Sri
>>>
>>>>
>>>> Honza
>

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-01 8:24 ` Ramana Radhakrishnan
@ 2015-06-01 18:01 ` Sriraman Tallam
2015-06-01 18:41 ` Ramana Radhakrishnan
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-06-01 18:01 UTC (permalink / raw)
To: Ramana Radhakrishnan
Cc: Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
<ramana.radhakrishnan@arm.com> wrote:
>
>>> Why isn't it just an indirect call in the cases that would require a GOT
>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>> different on each target that mandates this to be in the target backend.
>>> Also it would be better to push the tests into gcc.dg if you can and
>>> check
>>> for the absence of a relocation so that folks at least see these as being
>>> UNSUPPORTED on their target.
>>
>>
>
>
> To be even more explicit, shouldn't this be handled similar to the way in
> which -fno-plt is handled in a target agnostic manner ? After all, if you
> can handle this for the command line, doing the same for a function which
> has been decorated with attribute((noplt)) should be simple.

-fno-plt does not work for non-PIC code, and having non-PIC code not use
the PLT was my primary motivation. In fact, if you go back in this thread,
I asked HJ whether I should piggyback on -fno-plt. I tried using
the -fno-plt implementation to do this by removing the flag_pic check
in calls.c, but that still does not work for non-PIC code.

>
>> I am not familiar with PLT calls for other targets. I can move the
>> tests to gcc.dg but what relocation are you suggesting I check for?
>
>
> Move the test to gcc.dg, add a target_support_no_plt function in
> testsuite/lib/target-supports.exp and mark this as being supported only on
> x86 and use scan-assembler to scan for PLT relocations for x86. Other
> targets can add things as they deem fit.
>
> In any case, on a large number of elf/ linux targets I would have thought
> the absence of a JMP_SLOT relocation would be good enough to check that this
> is working correctly.
>
> regards
> Ramana
>
>
>
>
>>
>> Thanks
>> Sri
>>
>>
>>>
>>>
>>>
>>> Ramana
>>>>
>>>>
>>>>>
>>>>> Also I think the PLT calls have EBX in call fusage which is added by
>>>>> ix86_expand_call.
>>>>>   else
>>>>>     {
>>>>>       /* Static functions and indirect calls don't need the pic
>>>>> register. */
>>>>>       if (flag_pic
>>>>>           && (!TARGET_64BIT
>>>>>               || (ix86_cmodel == CM_LARGE_PIC
>>>>>                   && DEFAULT_ABI != MS_ABI))
>>>>>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>         {
>>>>>           use_reg (&use, gen_rtx_REG (Pmode,
>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>           if (ix86_use_pseudo_pic_reg ())
>>>>>             emit_move_insn (gen_rtx_REG (Pmode,
>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>                             pic_offset_table_rtx);
>>>>>         }
>>>>>
>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>> for
>>>>> local calls
>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>> suppose.
>>>>
>>>>
>>>> Done that now and patch attached.
>>>>
>>>> Thanks
>>>> Sri
>>>>
>>>>>
>>>>> Honza
>>
>>
>

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-01 18:01 ` Sriraman Tallam
@ 2015-06-01 18:41 ` Ramana Radhakrishnan
2015-06-01 18:55 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-01 18:41 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
> <ramana.radhakrishnan@arm.com> wrote:
>>
>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>> different on each target that mandates this to be in the target backend.
>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>> check
>>>> for the absence of a relocation so that folks at least see these as being
>>>> UNSUPPORTED on their target.
>>>
>>>
>>
>>
>> To be even more explicit, shouldn't this be handled similar to the way in
>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>> can handle this for the command line, doing the same for a function which
>> has been decorated with attribute((noplt)) should be simple.
>
> -fno-plt does not work for non-PIC code, having non-PIC code not use
> PLT was my primary motivation. In fact, if you go back in this thread,
> I suggested to HJ if I should piggyback on -fno-plt. I tried using
> the -fno-plt implementation to do this by removing the flag_pic check
> in calls.c, but that does not still work for non-PIC code.

You're missing my point, unless I'm missing something basic here - I
should have been even more explicit and said -fPIC was a given in all
this discussion.

calls.c:229 has

  else if (flag_pic && !flag_plt && fndecl_or_type
           && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
           && !targetm.binds_local_p (fndecl_or_type))

Why can't we merge the check for the attribute noplt in here?

If a new attribute is added to the "GNU language" in this case, why
isn't this being treated in the same way as the command line option
has been treated? All this means is that we add an attribute and a
command line option to common code and then do not implement it in a
proper target-agnostic fashion.

regards
Ramana

>
>>
>>> I am not familiar with PLT calls for other targets. I can move the
>>> tests to gcc.dg but what relocation are you suggesting I check for?
>>
>>
>> Move the test to gcc.dg, add a target_support_no_plt function in
>> testsuite/lib/target-supports.exp and mark this as being supported only on
>> x86 and use scan-assembler to scan for PLT relocations for x86. Other
>> targets can add things as they deem fit.
>
>>
>> In any case, on a large number of elf/ linux targets I would have thought
>> the absence of a JMP_SLOT relocation would be good enough to check that this
>> is working correctly.
>>
>> regards
>> Ramana
>>
>>
>>
>>
>>>
>>> Thanks
>>> Sri
>>>
>>>
>>>>
>>>>
>>>>
>>>> Ramana
>>>>>
>>>>>
>>>>>>
>>>>>> Also I think the PLT calls have EBX in call fusage which is added by
>>>>>> ix86_expand_call.
>>>>>>   else
>>>>>>     {
>>>>>>       /* Static functions and indirect calls don't need the pic
>>>>>> register. */
>>>>>>       if (flag_pic
>>>>>>           && (!TARGET_64BIT
>>>>>>               || (ix86_cmodel == CM_LARGE_PIC
>>>>>>                   && DEFAULT_ABI != MS_ABI))
>>>>>>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>>         {
>>>>>>           use_reg (&use, gen_rtx_REG (Pmode,
>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>>           if (ix86_use_pseudo_pic_reg ())
>>>>>>             emit_move_insn (gen_rtx_REG (Pmode,
>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>>                             pic_offset_table_rtx);
>>>>>>         }
>>>>>>
>>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>>> for
>>>>>> local calls
>>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>>> suppose.
>>>>>
>>>>>
>>>>> Done that now and patch attached.
>>>>>
>>>>> Thanks
>>>>> Sri
>>>>>
>>>>>>
>>>>>> Honza
>>>
>>>
>>

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-01 18:41 ` Ramana Radhakrishnan
@ 2015-06-01 18:55 ` Sriraman Tallam
2015-06-01 20:33 ` Ramana Radhakrishnan
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-06-01 18:55 UTC (permalink / raw)
To: ramrad01
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
>> <ramana.radhakrishnan@arm.com> wrote:
>>>
>>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>>> different on each target that mandates this to be in the target backend.
>>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>>> check
>>>>> for the absence of a relocation so that folks at least see these as being
>>>>> UNSUPPORTED on their target.
>>>>
>>>>
>>>
>>>
>>> To be even more explicit, shouldn't this be handled similar to the way in
>>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>>> can handle this for the command line, doing the same for a function which
>>> has been decorated with attribute((noplt)) should be simple.
>>
>> -fno-plt does not work for non-PIC code, having non-PIC code not use
>> PLT was my primary motivation. In fact, if you go back in this thread,
>> I suggested to HJ if I should piggyback on -fno-plt. I tried using
>> the -fno-plt implementation to do this by removing the flag_pic check
>> in calls.c, but that does not still work for non-PIC code.
>
> You're missing my point, unless I'm missing something basic here - I
> should have been even more explicit and said -fPIC was a given in all
> this discussion.
>
> calls.c:229 has
>
>   else if (flag_pic && !flag_plt && fndecl_or_type
>            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>            && !targetm.binds_local_p (fndecl_or_type))
>
> why can't we merge the check in here for the attribute noplt ?

We can, and please see this thread; that is the exact patch I proposed:
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

However, there was one caveat. I want this working without -fPIC too.
Non-PIC code also generates PLT calls and I want them eliminated.

>
> If a new attribute is added to the "GNU language" in this case, why
> isn't this being treated in the same way as the command line option
> has been treated ? All this means is that we add an attribute and a
> command line option to common code and then not implement it in a
> proper target agnostic fashion.

You are right. This is the way I wanted it too, but I also wanted the
attribute to work without PIC. PLT calls are generated without -fPIC
and -fPIE too, and I wanted a solution for that. On looking at the
code in more detail,

* -fno-plt is made to work with -fPIC; is there a reason not to make
it work for non-PIC code? I can remove the flag_pic check from
calls.c.
* Then, I add the generic attribute "noplt" and everything is fine.

There is just one caveat with the above approach: for x86_64,
(*call_insn) will not generate indirect calls for *non-PIC* code
because constant_call_address_operand in predicates.md will evaluate
to false. This can be fixed appropriately in ix86_output_call_insn in
i386.c.

Is this alright? Sorry for the confusion, but the primary reason why
I did not do it the way you suggested is that we wanted the "noplt"
attribute to work for non-PIC code also.

Thanks
Sri

>
> regards
> Ramana
>
>
>>
>>>
>>>> I am not familiar with PLT calls for other targets. I can move the
>>>> tests to gcc.dg but what relocation are you suggesting I check for?
>>>
>>>
>>> Move the test to gcc.dg, add a target_support_no_plt function in
>>> testsuite/lib/target-supports.exp and mark this as being supported only on
>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other
>>> targets can add things as they deem fit.
>>
>>>
>>> In any case, on a large number of elf/ linux targets I would have thought
>>> the absence of a JMP_SLOT relocation would be good enough to check that this
>>> is working correctly.
>>>
>>> regards
>>> Ramana
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks
>>>> Sri
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ramana
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Also I think the PLT calls have EBX in call fusage which is added by
>>>>>>> ix86_expand_call.
>>>>>>>   else
>>>>>>>     {
>>>>>>>       /* Static functions and indirect calls don't need the pic
>>>>>>> register. */
>>>>>>>       if (flag_pic
>>>>>>>           && (!TARGET_64BIT
>>>>>>>               || (ix86_cmodel == CM_LARGE_PIC
>>>>>>>                   && DEFAULT_ABI != MS_ABI))
>>>>>>>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>>>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>>>         {
>>>>>>>           use_reg (&use, gen_rtx_REG (Pmode,
>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>>>           if (ix86_use_pseudo_pic_reg ())
>>>>>>>             emit_move_insn (gen_rtx_REG (Pmode,
>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>>>                             pic_offset_table_rtx);
>>>>>>>         }
>>>>>>>
>>>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>>>> for
>>>>>>> local calls
>>>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>>>> suppose.
>>>>>>
>>>>>>
>>>>>> Done that now and patch attached.
>>>>>>
>>>>>> Thanks
>>>>>> Sri
>>>>>>
>>>>>>>
>>>>>>> Honza
>>>>
>>>>
>>>

^ permalink raw reply [flat|nested] 65+ messages in thread
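To make the disagreement in the message above concrete, the calls.c condition quoted there and one reading of the proposed relaxation can be compared side by side in a small stand-alone sketch. Everything here is illustrative: struct callee_info, existing_check and proposed_check are hypothetical names, the struct fields are assumed stand-ins for the FUNCTION_DECL test, targetm.binds_local_p and a "noplt" attribute lookup, and proposed_check reflects only an interpretation of "remove the flag_pic check, then add the generic attribute", not a posted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the callee; not a GCC data structure.  */
struct callee_info
{
  bool is_function_decl;  /* TREE_CODE (decl) == FUNCTION_DECL.  */
  bool binds_local;       /* targetm.binds_local_p (decl).  */
  bool has_noplt_attr;    /* Decl carries the "noplt" attribute.  */
};

/* The condition quoted from calls.c: only PIC code built with
   -fno-plt turns the call into an indirect one.  */
static bool
existing_check (bool flag_pic, bool flag_plt, const struct callee_info *fn)
{
  return flag_pic && !flag_plt && fn != NULL
         && fn->is_function_decl && !fn->binds_local;
}

/* One reading of the proposal: no flag_pic requirement, and either
   -fno-plt or the attribute on the callee is enough.  */
static bool
proposed_check (bool flag_plt, const struct callee_info *fn)
{
  return fn != NULL && fn->is_function_decl && !fn->binds_local
         && (!flag_plt || fn->has_noplt_attr);
}
```

With an external function marked "noplt" in non-PIC code and no -fno-plt, existing_check is false while proposed_check is true, which is the non-PIC case the message says it wants covered. A locally bound function stays a direct call under both.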
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
  2015-06-01 18:55 ` Sriraman Tallam
@ 2015-06-01 20:33 ` Ramana Radhakrishnan
  2015-06-02 18:27 ` Sriraman Tallam
  0 siblings, 1 reply; 65+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-01 20:33 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
>>> <ramana.radhakrishnan@arm.com> wrote:
>>>>
>>>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>>>> different on each target that mandates this to be in the target backend.
>>>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>>>> check for the absence of a relocation so that folks at least see these
>>>>>> as being UNSUPPORTED on their target.
>>>>
>>>> To be even more explicit, shouldn't this be handled similar to the way in
>>>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>>>> can handle this for the command line, doing the same for a function which
>>>> has been decorated with attribute((noplt)) should be simple.
>>>
>>> -fno-plt does not work for non-PIC code, having non-PIC code not use
>>> PLT was my primary motivation. In fact, if you go back in this thread,
>>> I suggested to HJ if I should piggyback on -fno-plt. I tried using
>>> the -fno-plt implementation to do this by removing the flag_pic check
>>> in calls.c, but that does not still work for non-PIC code.

If you want __attribute__ ((noplt)) to work for non-PIC code, we
should look to code it in the same place surely by making all
__attribute__((noplt)) calls, indirect calls irrespective of whether
it's fpic or not.

>>
>> You're missing my point, unless I'm missing something basic here - I
>> should have been even more explicit and said -fPIC was a given in all
>> this discussion.
>>
>> calls.c:229 has
>>
>>   else if (flag_pic && !flag_plt && fndecl_or_type
>>            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>>            && !targetm.binds_local_p (fndecl_or_type))
>>
>> why can't we merge the check in here for the attribute noplt ?
>
> We can, and please see this thread, that is the exact patch I proposed:
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html
>
> However, there was one caveat. I want this working without -fPIC too.
> Non-PIC code also generates PLT calls and I want them eliminated.
>
>>
>> If a new attribute is added to the "GNU language" in this case, why
>> isn't this being treated in the same way as the command line option
>> has been treated ? All this means is that we add an attribute and a
>> command line option to common code and then not implement it in a
>> proper target agnostic fashion.
>
> You are right. This is the way I wanted it too but I also wanted the
> attribute to work without PIC. PLT calls are generated without -fPIC
> and -fPIE too and I wanted a solution for that. On looking at the
> code in more detail,
>
> * -fno-plt is made to work with -fPIC, is there a reason to not make
> it work for non-PIC code? I can remove the flag_pic check from
> calls.c

I don't think that's right, you probably have to allow that along with
(flag_pic || (decl && attribute_no_plt (decl))) - however it seems odd
to me that the language extension allows this but the flag doesn't.

> * Then, I add the generic attribute "noplt" and everything is fine.
>
> There is just one caveat with the above approach, for x86_64
> (*call_insn) will not generate indirect-calls for *non-PIC* code
> because constant_call_address_operand in predicates.md will evaluate
> to false. This can be fixed appropriately in ix86_output_call_insn in
> i386.c.

Yes, targets need to massage that into place but that's essentially
the mechanics of retaining indirect calls in each backend. -fno-plt
doesn't work for ARM / AArch64 with optimizers currently (and I
suspect on most other targets) because our predicates are too liberal,
fixed by treating "noplt" or -fno-plt as the equivalent of
-mlong-calls.

>
>
> Is this alright? Sorry for the confusion, but the primary reason why
> I did not do it the way you suggested is because we wanted "noplt"
> attribute to work for non-PIC code also.

If that is the case, then this is a slightly more complicated
condition in the same place. We then always have indirect calls for
functions that are marked noplt and just have target generate this
appropriately.

To be honest, this is trivial to implement in the ARM backend as one
would just piggy back on the longcalls work - despite that, IMNSHO
it's best done in a target independent manner.

regards
Ramana
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
  2015-06-01 20:33 ` Ramana Radhakrishnan
@ 2015-06-02 18:27 ` Sriraman Tallam
  2015-06-02 19:59 ` Bernhard Reutner-Fischer
  2015-06-02 21:09 ` Ramana Radhakrishnan
  0 siblings, 2 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-06-02 18:27 UTC (permalink / raw)
To: ramrad01
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 8134 bytes --]

On Mon, Jun 1, 2015 at 1:33 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
>> <ramana.gcc@googlemail.com> wrote:
>>> On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
>>>> <ramana.radhakrishnan@arm.com> wrote:
>>>>>
>>>>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>>>>> different on each target that mandates this to be in the target backend.
>>>>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>>>>> check for the absence of a relocation so that folks at least see these
>>>>>>> as being UNSUPPORTED on their target.
>>>>>
>>>>> To be even more explicit, shouldn't this be handled similar to the way in
>>>>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>>>>> can handle this for the command line, doing the same for a function which
>>>>> has been decorated with attribute((noplt)) should be simple.
>>>>
>>>> -fno-plt does not work for non-PIC code, having non-PIC code not use
>>>> PLT was my primary motivation. In fact, if you go back in this thread,
>>>> I suggested to HJ if I should piggyback on -fno-plt. I tried using
>>>> the -fno-plt implementation to do this by removing the flag_pic check
>>>> in calls.c, but that does not still work for non-PIC code.
>
> If you want __attribute__ ((noplt)) to work for non-PIC code, we
> should look to code it in the same place surely by making all
> __attribute__((noplt)) calls, indirect calls irrespective of whether
> it's fpic or not.
>
>
>>>
>>> You're missing my point, unless I'm missing something basic here - I
>>> should have been even more explicit and said -fPIC was a given in all
>>> this discussion.
>>>
>>> calls.c:229 has
>>>
>>>   else if (flag_pic && !flag_plt && fndecl_or_type
>>>            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>>>            && !targetm.binds_local_p (fndecl_or_type))
>>>
>>> why can't we merge the check in here for the attribute noplt ?
>>
>> We can, and please see this thread, that is the exact patch I proposed:
>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html
>>
>> However, there was one caveat. I want this working without -fPIC too.
>> Non-PIC code also generates PLT calls and I want them eliminated.
>>
>>>
>>> If a new attribute is added to the "GNU language" in this case, why
>>> isn't this being treated in the same way as the command line option
>>> has been treated ? All this means is that we add an attribute and a
>>> command line option to common code and then not implement it in a
>>> proper target agnostic fashion.
>>
>> You are right. This is the way I wanted it too but I also wanted the
>> attribute to work without PIC. PLT calls are generated without -fPIC
>> and -fPIE too and I wanted a solution for that. On looking at the
>> code in more detail,
>>
>> * -fno-plt is made to work with -fPIC, is there a reason to not make
>> it work for non-PIC code? I can remove the flag_pic check from
>> calls.c
>
> I don't think that's right, you probably have to allow that along with
> (flag_pic || (decl && attribute_no_plt (decl))) - however it seems odd
> to me that the language extension allows this but the flag doesn't.
>
>> * Then, I add the generic attribute "noplt" and everything is fine.
>>
>> There is just one caveat with the above approach, for x86_64
>> (*call_insn) will not generate indirect-calls for *non-PIC* code
>> because constant_call_address_operand in predicates.md will evaluate
>> to false. This can be fixed appropriately in ix86_output_call_insn in
>> i386.c.
>
> Yes, targets need to massage that into place but that's essentially
> the mechanics of retaining indirect calls in each backend. -fno-plt
> doesn't work for ARM / AArch64 with optimizers currently (and I
> suspect on most other targets) because our predicates are too liberal,
> fixed by treating "noplt" or -fno-plt as the equivalent of
> -mlong-calls.
>
>>
>>
>> Is this alright? Sorry for the confusion, but the primary reason why
>> I did not do it the way you suggested is because we wanted "noplt"
>> attribute to work for non-PIC code also.
>
> If that is the case, then this is a slightly more complicated
> condition in the same place. We then always have indirect calls for
> functions that are marked noplt and just have target generate this
> appropriately.

I have now modified this patch.

This patch does two things:

1) Adds new generic function attribute "no_plt" that is similar in
functionality to -fno-plt except that it applies only to calls to
functions that are marked with this attribute.
2) For x86_64, it makes -fno-plt(and the attribute) also work for
non-PIC code by directly generating an indirect call via a GOT entry.

For PIC code, no_plt merely shadows the implementation of -fno-plt, no
surprises here.

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call): Ditto.
(nopic_no_plt_attribute): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.


Please review.

Thanks
Sri

[-- Attachment #2: noplt_attrib_patch_new.txt --]
[-- Type: text/plain, Size: 9660 bytes --]

	* c-family/c-common.c (no_plt): New attribute.
	(handle_no_plt_attribute): New handler.
	* calls.c (prepare_call_address): Check for no_plt
	attribute.
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
	for no_plt attribute.
	(ix86_expand_call): Ditto.
	(nopic_no_plt_attribute): New function.
	(ix86_output_call_insn): Output indirect call for non-pic
	no plt calls.
	* doc/extend.texi (no_plt): Document new attribute.
	* testsuite/gcc.target/i386/noplt-1.c: New test.
	* testsuite/gcc.target/i386/noplt-2.c: New test.
	* testsuite/gcc.target/i386/noplt-3.c: New test.
	* testsuite/gcc.target/i386/noplt-4.c: New test.

This patch does two things:

* Adds new generic function attribute "no_plt" that is similar in
  functionality to -fno-plt except that it applies only to calls to
  functions that are marked with this attribute.
* For x86_64, it makes -fno-plt(and the attribute) also work for
  non-PIC code by directly generating an indirect call via a GOT entry.

Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c (revision 223720)
+++ c-family/c-common.c (working copy)
@@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
+static tree handle_no_plt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alias_attribute (tree *, tree, tree, int, bool *);
@@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab
                               handle_aligned_attribute, false },
   { "weak",                   0, 0, true,  false, false,
                               handle_weak_attribute, false },
+  { "no_plt",                 0, 0, true,  false, false,
+                              handle_no_plt_attribute, false },
   { "ifunc",                  1, 1, true,  false, false,
                               handle_ifunc_attribute, false },
   { "alias",                  1, 1, true,  false, false,
@@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }

+/* Handle a "no_plt" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_plt_attribute (tree *node, tree name,
+                         tree ARG_UNUSED (args),
+                         int ARG_UNUSED (flags),
+                         bool * ARG_UNUSED (no_add_attrs))
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    {
+      warning (OPT_Wattributes,
+               "%qE attribute is only applicable on functions", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
 /* Handle an "alias" or "ifunc" attribute; arguments as in
    struct attribute_spec.handler, except that IS_ALIAS tells us
    whether this is an alias as opposed to ifunc attribute.  */
Index: calls.c
===================================================================
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                    && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
                   ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
                   : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (flag_pic && fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+           && (!flag_plt
+               || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type)))
            && !targetm.binds_local_p (fndecl_or_type))
     {
       funexp = force_reg (Pmode, funexp);
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -5479,6 +5479,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
       && !TARGET_64BIT
       && flag_pic
       && flag_plt
+      && (TREE_CODE (decl) != FUNCTION_DECL
+          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
       && decl && !targetm.binds_local_p (decl))
     return false;

@@ -25497,13 +25499,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
     }
   else
     {
-      /* Static functions and indirect calls don't need the pic register.  */
+      /* Static functions and indirect calls don't need the pic register.  Also,
+         check if PLT was explicitly avoided via no-plt or "no_plt" attribute, making
+         it an indirect call.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+          && flag_plt
+          && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL
+              || !lookup_attribute ("no_plt",
+                 DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))))))
         {
           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
@@ -25599,6 +25607,34 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }

+/* Return true if the function being called was marked with attribute
+   "no_plt" or using -fno-plt and we are compiling for no-PIC and x86_64.
+   This is currently used only with 64-bit ELF targets to call the function
+   marked "no_plt" indirectly.  */
+
+static bool
+nopic_no_plt_attribute (rtx call_op)
+{
+  if (flag_pic)
+    return false;
+
+  if (!TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF)
+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && (!flag_plt
+          || lookup_attribute ("no_plt", DECL_ATTRIBUTES (symbol_decl))))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction.  */

 const char *
@@ -25610,7 +25646,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op

   if (SIBLING_CALL_P (insn))
     {
-      if (direct_p)
+      if (direct_p && nopic_no_plt_attribute (call_op))
+        xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+      else if (direct_p)
         xasm = "%!jmp\t%P0";
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W.  */
@@ -25653,7 +25691,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
         seh_nop_p = true;
     }

-  if (direct_p)
+  if (direct_p && nopic_no_plt_attribute (call_op))
+    xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+  else if (direct_p)
     xasm = "%!call\t%P0";
   else
     xasm = "%!call\t%A0";
Index: doc/extend.texi
===================================================================
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -2916,6 +2916,15 @@ the standard C library can be guaranteed not to th
 with the notable exceptions of @code{qsort} and @code{bsearch} that
 take function pointer arguments.

+@item no_plt
+@cindex @code{no_plt} function attribute
+The @code{no_plt} attribute is used to inform the compiler that a calls
+to the function should not use the PLT.  For example, external functions
+defined in shared objects are called from the executable using the PLT.
+This attribute on the function declaration calls these functions indirectly
+rather than going via the PLT.  This is similar to @option{-fno-plt} but
+is only applicable to calls to the function marked with this attribute.
+
 @item optimize
 @cindex @code{optimize} function attribute
 The @code{optimize} attribute is used to specify that a function is to
Index: testsuite/gcc.target/i386/noplt-1.c
===================================================================
--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-fno-pic" } */
+
+__attribute__ ((no_plt))
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-2.c
===================================================================
--- testsuite/gcc.target/i386/noplt-2.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-2.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-O2 -fno-pic" } */
+
+
+__attribute__ ((no_plt))
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-3.c
===================================================================
--- testsuite/gcc.target/i386/noplt-3.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-3.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-fno-pic -fno-plt" } */
+
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
Index: testsuite/gcc.target/i386/noplt-4.c
===================================================================
--- testsuite/gcc.target/i386/noplt-4.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-4.c (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-O2 -fno-pic -fno-plt" } */
+
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
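
Read together, the calls.c and i386.c hunks of this patch amount to a three-way decision per call site. The stand-alone C model below is only a sketch for illustration, not GCC code: `struct call_site`, `enum call_kind` and `classify_call` are hypothetical names, each field stands in for the GCC-internal query named in its comment, and the callee is assumed to be a FUNCTION_DECL with a known declaration.

```c
#include <stdbool.h>

/* Hypothetical model of the call forms the patch can produce.  */
enum call_kind
{
  CALL_DIRECT,            /* call foo (may be routed through a PLT stub) */
  CALL_INDIRECT_REG,      /* PIC: address forced into a register first */
  CALL_INDIRECT_GOTPCREL  /* x86_64 non-PIC: call *foo@GOTPCREL(%rip) */
};

/* Hypothetical stand-ins for the conditions tested in the patch.  */
struct call_site
{
  bool flag_pic;         /* stands in for flag_pic */
  bool flag_plt;         /* stands in for flag_plt; false under -fno-plt */
  bool has_no_plt_attr;  /* callee carries the proposed "no_plt" attribute */
  bool binds_local;      /* targetm.binds_local_p / SYMBOL_REF_LOCAL_P */
  bool x86_64_elf;       /* TARGET_64BIT and not Mach-O, SEH or PE/COFF */
};

static enum call_kind
classify_call (const struct call_site *c)
{
  /* The PLT is avoided either globally (-fno-plt) or per function.  */
  bool avoid_plt = !c->flag_plt || c->has_no_plt_attr;

  if (c->binds_local || !avoid_plt)
    return CALL_DIRECT;
  if (c->flag_pic)
    return CALL_INDIRECT_REG;       /* the prepare_call_address path */
  if (c->x86_64_elf)
    return CALL_INDIRECT_GOTPCREL;  /* the ix86_output_call_insn path */
  return CALL_DIRECT;               /* other non-PIC targets: unchanged */
}
```

Under this model, the noplt-1.c and noplt-3.c cases (non-PIC on x86_64, attribute or -fno-plt) both land in CALL_INDIRECT_GOTPCREL, which is the form the scan-assembler patterns look for.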
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
  2015-06-02 18:27 ` Sriraman Tallam
@ 2015-06-02 19:59 ` Bernhard Reutner-Fischer
  2015-06-02 20:09 ` Sriraman Tallam
  2015-06-02 21:09 ` Ramana Radhakrishnan
  1 sibling, 1 reply; 65+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-06-02 19:59 UTC (permalink / raw)
To: Sriraman Tallam, ramrad01
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On June 2, 2015 8:15:42 PM GMT+02:00, Sriraman Tallam <tmsriram@google.com> wrote:
[]

>I have now modified this patch.
>
>This patch does two things:
>
>1) Adds new generic function attribute "no_plt" that is similar in
>functionality to -fno-plt except that it applies only to calls to
>functions that are marked with this attribute.
>2) For x86_64, it makes -fno-plt(and the attribute) also work for
>non-PIC code by directly generating an indirect call via a GOT entry.
>
>For PIC code, no_plt merely shadows the implementation of -fno-plt, no
>surprises here.
>
>* c-family/c-common.c (no_plt): New attribute.
>(handle_no_plt_attribute): New handler.
>* calls.c (prepare_call_address): Check for no_plt
>attribute.
>* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
>for no_plt attribute.
>(ix86_expand_call): Ditto.
>(nopic_no_plt_attribute): New function.
>(ix86_output_call_insn): Output indirect call for non-pic
>no plt calls.
>* doc/extend.texi (no_plt): Document new attribute.
>* testsuite/gcc.target/i386/noplt-1.c: New test.
>* testsuite/gcc.target/i386/noplt-2.c: New test.
>* testsuite/gcc.target/i386/noplt-3.c: New test.
>* testsuite/gcc.target/i386/noplt-4.c: New test.
>
>
>Please review.

--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -5479,6 +5479,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
       && !TARGET_64BIT
       && flag_pic
       && flag_plt
+      && (TREE_CODE (decl) != FUNCTION_DECL
+          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
       && decl && !targetm.binds_local_p (decl))
     return false;

Wrong order or && decl is redundant. Stopped reading here.

Thanks,
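
The ordering remark can be illustrated outside GCC. In the hunk under review, TREE_CODE (decl) is evaluated before the `&& decl` test, so a null decl (which ix86_function_ok_for_sibcall receives for indirect calls) would be inspected before it is checked. The sketch below uses hypothetical stand-ins (`struct decl_model`, `enum decl_code`, `sibcall_blocked_by_plt`) rather than GCC's tree API, and shows the guard-first ordering that relies on the left-to-right short-circuit evaluation of `&&` in C.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TREE_CODE values that matter here.  */
enum decl_code { DECL_FUNCTION, DECL_OTHER };

/* Hypothetical stand-in for the properties queried on the callee.  */
struct decl_model
{
  enum decl_code code;
  bool has_no_plt_attr;  /* models lookup_attribute ("no_plt", ...) */
  bool binds_local;      /* models targetm.binds_local_p (decl) */
};

/* Returns true when a 32-bit PIC sibcall would have to go through the
   PLT.  The NULL test comes first, so the later operands of && are
   never evaluated on a null pointer.  */
static bool
sibcall_blocked_by_plt (const struct decl_model *decl,
                        bool flag_pic, bool flag_plt)
{
  return flag_pic
         && flag_plt
         && decl != NULL
         && (decl->code != DECL_FUNCTION || !decl->has_no_plt_attr)
         && !decl->binds_local;
}
```

With the guard first, an indirect call (null decl) simply yields false; with the attribute test placed ahead of the guard, the same input would dereference a null pointer.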
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
  2015-06-02 19:59 ` Bernhard Reutner-Fischer
@ 2015-06-02 20:09 ` Sriraman Tallam
  2015-06-02 21:18 ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-06-02 20:09 UTC (permalink / raw)
To: Bernhard Reutner-Fischer
Cc: ramrad01, Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1887 bytes --]

On Tue, Jun 2, 2015 at 12:32 PM, Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
> On June 2, 2015 8:15:42 PM GMT+02:00, Sriraman Tallam <tmsriram@google.com> wrote:
> []
>
>>I have now modified this patch.
>>
>>This patch does two things:
>>
>>1) Adds new generic function attribute "no_plt" that is similar in
>>functionality to -fno-plt except that it applies only to calls to
>>functions that are marked with this attribute.
>>2) For x86_64, it makes -fno-plt(and the attribute) also work for
>>non-PIC code by directly generating an indirect call via a GOT entry.
>>
>>For PIC code, no_plt merely shadows the implementation of -fno-plt, no
>>surprises here.
>>
>>* c-family/c-common.c (no_plt): New attribute.
>>(handle_no_plt_attribute): New handler.
>>* calls.c (prepare_call_address): Check for no_plt
>>attribute.
>>* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
>>for no_plt attribute.
>>(ix86_expand_call): Ditto.
>>(nopic_no_plt_attribute): New function.
>>(ix86_output_call_insn): Output indirect call for non-pic
>>no plt calls.
>>* doc/extend.texi (no_plt): Document new attribute.
>>* testsuite/gcc.target/i386/noplt-1.c: New test.
>>* testsuite/gcc.target/i386/noplt-2.c: New test.
>>* testsuite/gcc.target/i386/noplt-3.c: New test.
>>* testsuite/gcc.target/i386/noplt-4.c: New test.
>>
>>
>>Please review.
>
> --- config/i386/i386.c (revision 223720)
> +++ config/i386/i386.c (working copy)
> @@ -5479,6 +5479,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>        && !TARGET_64BIT
>        && flag_pic
>        && flag_plt
> +      && (TREE_CODE (decl) != FUNCTION_DECL
> +          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
>        && decl && !targetm.binds_local_p (decl))
>      return false;
>
> Wrong order or && decl is redundant. Stopped reading here.

Fixed and new patch attached.

Thanks
Sri

>
> Thanks,
>

[-- Attachment #2: noplt_attrib_patch_new.txt --]
[-- Type: text/plain, Size: 9785 bytes --]

	* c-family/c-common.c (no_plt): New attribute.
	(handle_no_plt_attribute): New handler.
	* calls.c (prepare_call_address): Check for no_plt
	attribute.
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
	for no_plt attribute.
	(ix86_expand_call): Ditto.
	(nopic_no_plt_attribute): New function.
	(ix86_output_call_insn): Output indirect call for non-pic
	no plt calls.
	* doc/extend.texi (no_plt): Document new attribute.
	* testsuite/gcc.target/i386/noplt-1.c: New test.
	* testsuite/gcc.target/i386/noplt-2.c: New test.
	* testsuite/gcc.target/i386/noplt-3.c: New test.
	* testsuite/gcc.target/i386/noplt-4.c: New test.

This patch does two things:

* Adds new generic function attribute "no_plt" that is similar in
  functionality to -fno-plt except that it applies only to calls to
  functions that are marked with this attribute.
* For x86_64, it makes -fno-plt(and the attribute) also work for
  non-PIC code by directly generating an indirect call via a GOT entry.

Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c (revision 223720)
+++ c-family/c-common.c (working copy)
@@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
+static tree handle_no_plt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alias_attribute (tree *, tree, tree, int, bool *);
@@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab
                               handle_aligned_attribute, false },
   { "weak",                   0, 0, true,  false, false,
                               handle_weak_attribute, false },
+  { "no_plt",                 0, 0, true,  false, false,
+                              handle_no_plt_attribute, false },
   { "ifunc",                  1, 1, true,  false, false,
                               handle_ifunc_attribute, false },
   { "alias",                  1, 1, true,  false, false,
@@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }

+/* Handle a "no_plt" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_plt_attribute (tree *node, tree name,
+                         tree ARG_UNUSED (args),
+                         int ARG_UNUSED (flags),
+                         bool * ARG_UNUSED (no_add_attrs))
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    {
+      warning (OPT_Wattributes,
+               "%qE attribute is only applicable on functions", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
 /* Handle an "alias" or "ifunc" attribute; arguments as in
    struct attribute_spec.handler, except that IS_ALIAS tells us
    whether this is an alias as opposed to ifunc attribute.  */
Index: calls.c
===================================================================
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                    && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
                   ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
                   : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (flag_pic && fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+           && (!flag_plt
+               || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type)))
            && !targetm.binds_local_p (fndecl_or_type))
     {
       funexp = force_reg (Pmode, funexp);
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 223720)
+++ config/i386/i386.c (working copy)
@@ -5479,7 +5479,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
       && !TARGET_64BIT
       && flag_pic
       && flag_plt
-      && decl && !targetm.binds_local_p (decl))
+      && decl
+      && (TREE_CODE (decl) != FUNCTION_DECL
+          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
+      && !targetm.binds_local_p (decl))
     return false;

   /* If we need to align the outgoing stack, then sibcalling would
@@ -25497,13 +25500,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
     }
   else
     {
-      /* Static functions and indirect calls don't need the pic register.  */
+      /* Static functions and indirect calls don't need the pic register.  Also,
+         check if PLT was explicitly avoided via no-plt or "no_plt" attribute, making
+         it an indirect call.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+          && flag_plt
+          && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL
+              || !lookup_attribute ("no_plt",
+                 DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))))))
         {
           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
@@ -25599,6 +25608,34 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }

+/* Return true if the function being called was marked with attribute
+   "no_plt" or using -fno-plt and we are compiling for no-PIC and x86_64.
+   This is currently used only with 64-bit ELF targets to call the function
+   marked "no_plt" indirectly.  */
+
+static bool
+nopic_no_plt_attribute (rtx call_op)
+{
+  if (flag_pic)
+    return false;
+
+  if (!TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF)
+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && (!flag_plt
+          || lookup_attribute ("no_plt", DECL_ATTRIBUTES (symbol_decl))))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction.  */

 const char *
@@ -25610,7 +25647,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op

   if (SIBLING_CALL_P (insn))
     {
-      if (direct_p)
+      if (direct_p && nopic_no_plt_attribute (call_op))
+        xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+      else if (direct_p)
         xasm = "%!jmp\t%P0";
       /* SEH epilogue detection requires the indirect branch case
          to include REX.W.
*/ @@ -25653,7 +25692,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op seh_nop_p = true; } - if (direct_p) + if (direct_p && nopic_no_plt_attribute (call_op)) + xasm = "%!call\t*%p0@GOTPCREL(%%rip)"; + else if (direct_p) xasm = "%!call\t%P0"; else xasm = "%!call\t%A0"; Index: doc/extend.texi =================================================================== --- doc/extend.texi (revision 223720) +++ doc/extend.texi (working copy) @@ -2916,6 +2916,15 @@ the standard C library can be guaranteed not to th with the notable exceptions of @code{qsort} and @code{bsearch} that take function pointer arguments. +@item no_plt +@cindex @code{no_plt} function attribute +The @code{no_plt} attribute is used to inform the compiler that a calls +to the function should not use the PLT. For example, external functions +defined in shared objects are called from the executable using the PLT. +This attribute on the function declaration calls these functions indirectly +rather than going via the PLT. This is similar to @option{-fno-plt} but +is only applicable to calls to the function marked with this attribute. 
+ @item optimize @cindex @code{optimize} function attribute The @code{optimize} attribute is used to specify that a function is to Index: testsuite/gcc.target/i386/noplt-1.c =================================================================== --- testsuite/gcc.target/i386/noplt-1.c (revision 0) +++ testsuite/gcc.target/i386/noplt-1.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic" } */ + +__attribute__ ((no_plt)) +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-2.c =================================================================== --- testsuite/gcc.target/i386/noplt-2.c (revision 0) +++ testsuite/gcc.target/i386/noplt-2.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic" } */ + + +__attribute__ ((no_plt)) +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-3.c =================================================================== --- testsuite/gcc.target/i386/noplt-3.c (revision 0) +++ testsuite/gcc.target/i386/noplt-3.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic -fno-plt" } */ + +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-4.c =================================================================== --- testsuite/gcc.target/i386/noplt-4.c (revision 0) +++ testsuite/gcc.target/i386/noplt-4.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ 
\t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ ^ permalink raw reply [flat|nested] 65+ messages in thread
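For readers who want to see what the attribute looks like from the user's side, here is a minimal sketch in C. It rests on several assumptions: the macro CALL_WITHOUT_PLT and the helper same_prefix are made up for this example, and the attribute is probed under both the "no_plt" spelling used in the patch and the "noplt" spelling that appears elsewhere in the thread. On a compiler that knows neither spelling the macro expands to nothing and the call is an ordinary one. Whether the call really ends up as an indirect call through the GOT depends on compiler version, target, and flags, so the sketch only shows the declaration pattern, not the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Probe for the attribute under either spelling.  If the compiler knows
   neither (or lacks __has_attribute), expand to nothing so the code still
   builds.  CALL_WITHOUT_PLT is a hypothetical name for this example.  */
#if defined(__has_attribute)
#  if __has_attribute(no_plt)
#    define CALL_WITHOUT_PLT __attribute__ ((no_plt))
#  elif __has_attribute(noplt)
#    define CALL_WITHOUT_PLT __attribute__ ((noplt))
#  endif
#endif
#ifndef CALL_WITHOUT_PLT
#  define CALL_WITHOUT_PLT
#endif

/* Declare a hot library function with the attribute attached, in the same
   way the noplt-*.c tests declare foo.  */
CALL_WITHOUT_PLT int memcmp (const void *, const void *, size_t);

/* Return 1 when the first n bytes of a and b are equal, else 0.
   same_prefix is a hypothetical helper, not part of the patch.  */
int
same_prefix (const char *a, const char *b, size_t n)
{
  assert (a != NULL && b != NULL);
  return memcmp (a, b, n) == 0;
}
```

The attribute sits on the declaration, so every call through that declaration is affected while other functions keep their normal calling sequence.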
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-02 20:09 ` Sriraman Tallam
@ 2015-06-02 21:18 ` Bernhard Reutner-Fischer
0 siblings, 0 replies; 65+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-06-02 21:18 UTC (permalink / raw)
To: Sriraman Tallam
Cc: ramrad01, Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On June 2, 2015 9:59:40 PM GMT+02:00, Sriraman Tallam <tmsriram@google.com> wrote:
>On Tue, Jun 2, 2015 at 12:32 PM, Bernhard Reutner-Fischer
><rep.dot.nop@gmail.com> wrote:
>> On June 2, 2015 8:15:42 PM GMT+02:00, Sriraman Tallam
>> <tmsriram@google.com> wrote:
>> []
>>
>>>I have now modified this patch.
>>>
>>>This patch does two things:
>>>
>>>1) Adds new generic function attribute "no_plt" that is similar in
>>>functionality to -fno-plt except that it applies only to calls to
>>>functions that are marked with this attribute.
>>>2) For x86_64, it makes -fno-plt(and the attribute) also work for
>>>non-PIC code by directly generating an indirect call via a GOT entry.
>>>
>>>For PIC code, no_plt merely shadows the implementation of -fno-plt, no
>>>surprises here.
>>>
>>>* c-family/c-common.c (no_plt): New attribute.
>>>(handle_no_plt_attribute): New handler.
>>>* calls.c (prepare_call_address): Check for no_plt
>>>attribute.
>>>* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
>>>for no_plt attribute.
>>>(ix86_expand_call): Ditto.
>>>(nopic_no_plt_attribute): New function.
>>>(ix86_output_call_insn): Output indirect call for non-pic
>>>no plt calls.
>>>* doc/extend.texi (no_plt): Document new attribute.
>>>* testsuite/gcc.target/i386/noplt-1.c: New test.
>>>* testsuite/gcc.target/i386/noplt-2.c: New test.
>>>* testsuite/gcc.target/i386/noplt-3.c: New test.
>>>* testsuite/gcc.target/i386/noplt-4.c: New test.
>>>
>>>
>>>Please review.
>>
>> --- config/i386/i386.c (revision 223720)
>> +++ config/i386/i386.c (working copy)
>> @@ -5479,6 +5479,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>>        && !TARGET_64BIT
>>        && flag_pic
>>        && flag_plt
>> +      && (TREE_CODE (decl) != FUNCTION_DECL
>> +          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
>>        && decl && !targetm.binds_local_p (decl))
>>      return false;
>>
>> Wrong order or && decl is redundant. Stopped reading here.
>
>Fixed and new patch

Just reading the diff I do not grok the different conditions in
ix86_function_ok_for_sibcall
ix86_expand_call
especially regarding CM_LARGE_PIC but I take it you've read more context.

-          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+          && flag_plt

s/! /!/;# while you touch or maybe that's OK -- check_GNU.sh would know, hopefully.

+/* Return true if the function being called was marked with attribute
+   "no_plt" or using -fno-plt and we are compiling for no-PIC and x86_64.
+   This is currently used only with 64-bit ELF targets to call the function

a function

+   marked "no_plt" indirectly.  */
+
+static bool
+nopic_no_plt_attribute (rtx call_op)

IIRC predicates ought to have a _p suffix but maybe that's outdated nowadays?

+{
+  if (flag_pic)
+    return false;
+
+  if (!TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF)

missing space after ||
We have a contrib/check*.sh style checker for patches in there.

+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && (!flag_plt
+          || lookup_attribute ("no_plt", DECL_ATTRIBUTES (symbol_decl))))
+    return true;
+
+  return false;
+}

+@item no_plt
+@cindex @code{no_plt} function attribute
+The @code{no_plt} attribute is used to inform the compiler that a calls

Doesn't parse. a call / calls

+to the function should not use the PLT.  For example, external functions

would be nice to have an xref to PLT definition for the casual reader, iff we have one or could have one easily.

+defined in shared objects are called from the executable using the PLT.
+This attribute on the function declaration calls these functions indirectly
+rather than going via the PLT.  This is similar to @option{-fno-plt} but
+is only applicable to calls to the function marked with this attribute.
+

smallexample (or you-name-it counterpart) for code-avoidance for bonus points, maybe.

Not a conceptual review due to current cellphone-impairedness, but looks somewhat plausible at first glance..
HTH && cheers,

^ permalink raw reply [flat|nested] 65+ messages in thread
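The "Wrong order or && decl is redundant" remark quoted in this review is about short-circuit evaluation: a pointer has to be tested for null before anything dereferences it. A minimal sketch of the corrected ordering follows, using a hypothetical toy_decl struct in place of GCC's tree nodes. The struct, its fields, and the function names are illustrative assumptions, and only the tail of the sibcall condition is modelled (the !TARGET_64BIT, flag_pic, and flag_plt tests are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GCC declaration node; only the three
   properties the condition looks at are modelled.  */
struct toy_decl
{
  bool is_function;   /* models TREE_CODE (decl) == FUNCTION_DECL */
  bool has_no_plt;    /* models a successful lookup_attribute ("no_plt", ...) */
  bool binds_local;   /* models targetm.binds_local_p (decl) */
};

/* Tail of the condition in the corrected order: the null test comes
   first, so && stops evaluating before any field is read.  */
bool
tail_condition (const struct toy_decl *decl)
{
  return decl != NULL
         && (!decl->is_function || !decl->has_no_plt)
         && !decl->binds_local;
}

/* Usage: a null decl is handled safely, and a marked function no longer
   satisfies the condition.  */
void
tail_condition_examples (void)
{
  struct toy_decl plain = { true, false, false };
  struct toy_decl marked = { true, true, false };
  struct toy_decl local = { true, false, true };

  assert (!tail_condition (NULL));
  assert (tail_condition (&plain));
  assert (!tail_condition (&marked));
  assert (!tail_condition (&local));
}
```

With the attribute test placed ahead of the null test, as in the reviewed hunk, the first operand would read through the pointer before the null test ever ran, which is why the later test looked either misplaced or redundant.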
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-02 18:27 ` Sriraman Tallam
2015-06-02 19:59 ` Bernhard Reutner-Fischer
@ 2015-06-02 21:09 ` Ramana Radhakrishnan
2015-06-02 21:25 ` Xinliang David Li
` (2 more replies)
1 sibling, 3 replies; 65+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-02 21:09 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Tue, Jun 2, 2015 at 7:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Mon, Jun 1, 2015 at 1:33 PM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
>>> <ramana.gcc@googlemail.com> wrote:
>>>> On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
>>>>> <ramana.radhakrishnan@arm.com> wrote:
>>>>>>
>>>>>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>>>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>>>>>> different on each target that mandates this to be in the target backend.
>>>>>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>>>>>> check
>>>>>>>> for the absence of a relocation so that folks at least see these as being
>>>>>>>> UNSUPPORTED on their target.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> To be even more explicit, shouldn't this be handled similar to the way in
>>>>>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>>>>>> can handle this for the command line, doing the same for a function which
>>>>>> has been decorated with attribute((noplt)) should be simple.
>>>>>
>>>>> -fno-plt does not work for non-PIC code, having non-PIC code not use
>>>>> PLT was my primary motivation. Infact, if you go back in this thread,
>>>>> I suggested to HJ if I should piggyback on -fno-plt. I tried using
>>>>> the -fno-plt implementation to do this by removing the flag_pic check
>>>>> in calls.c, but that does not still work for non-PIC code.
>>
>> If you want __attribute__ ((noplt)) to work for non-PIC code, we
>> should look to code it in the same place surely by making all
>> __attribute__((noplt)) calls, indirect calls irrespective of whether
>> it's fpic or not.
>>
>>
>>>>
>>>> You're missing my point, unless I'm missing something basic here - I
>>>> should have been even more explicit and said -fPIC was a given in all
>>>> this discussion.
>>>>
>>>> calls.c:229 has
>>>>
>>>> else if (flag_pic && !flag_plt && fndecl_or_type
>>>> && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>>>> && !targetm.binds_local_p (fndecl_or_type))
>>>>
>>>> why can't we merge the check in here for the attribute noplt ?
>>>
>>> We can and and please see this thread, that is the exact patch I proposed :
>>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html
>>>
>>> However, there was one caveat. I want this working without -fPIC too.
>>> non-PIC code also generates PLT calls and I want them eliminated.
>>>
>>>>
>>>> If a new attribute is added to the "GNU language" in this case, why
>>>> isn't this being treated in the same way as the command line option
>>>> has been treated ? All this means is that we add an attribute and a
>>>> command line option to common code and then not implement it in a
>>>> proper target agnostic fashion.
>>>
>>> You are right. This is the way I wanted it too but I also wanted the
>>> attribute to work without PIC. PLT calls are generated without -fPIC
>>> and -fPIE too and I wanted a solution for that. On looking at the
>>> code in more detail,
>>>
>>> * -fno-plt is made to work with -fPIC, is there a reason to not make
>>> it work for non-PIC code? I can remove the flag_pic check from
>>> calls.c
>>
>> I don't think that's right, you probably have to allow that along with
>> (flag_pic || (decl && attribute_no_plt (decl)) - however it seems odd
>> to me that the language extension allows this but the flag doesn't.
>>
>>> * Then, I add the generic attribute "noplt" and everything is fine.
>>>
>>> There is just one caveat with the above approach, for x86_64
>>> (*call_insn) will not generate indirect-calls for *non-PIC* code
>>> because constant_call_address_operand in predicates.md will evaluate
>>> to false. This can be fixed appropriately in ix86_output_call_insn in
>>> i386.c.
>>
>> Yes, targets need to massage that into place but that's essentially
>> the mechanics of retaining indirect calls in each backend. -fno-plt
>> doesn't work for ARM / AArch64 with optimizers currently (and I
>> suspect on most other targets) because our predicates are too liberal,
>> fixed by treating "noplt" or -fno-plt as the equivalent of
>> -mlong-calls.
>>
>>>
>>>
>>> Is this alright? Sorry for the confusion, but the primary reason why
>>> I did not do it the way you suggested is because we wanted "noplt"
>>> attribute to work for non-PIC code also.
>>
>> If that is the case, then this is a slightly more complicated
>> condition in the same place. We then always have indirect calls for
>> functions that are marked noplt and just have target generate this
>> appropriately.
>
> I have now modified this patch.

Thanks for taking care of this. I'll have a read through tomorrow
morning when I'm at my normal work machine.

>
> This patch does two things:
>
> 1) Adds new generic function attribute "no_plt" that is similar in
> functionality to -fno-plt except that it applies only to calls to
> functions that are marked with this attribute.
> 2) For x86_64, it makes -fno-plt(and the attribute) also work for
> non-PIC code by directly generating an indirect call via a GOT entry.

I'm sorry I'm going to push back again for the same reason.

Other than forcing targets to tweak their call insn patterns, the act
of generating the indirect call should remain in target independent
code. Sorry, not having the same behaviour on all platforms for
something like this is just a recipe for confusion.

regards
Ramana

>
> For PIC code, no_plt merely shadows the implementation of -fno-plt, no
> surprises here.
>
> * c-family/c-common.c (no_plt): New attribute.
> (handle_no_plt_attribute): New handler.
> * calls.c (prepare_call_address): Check for no_plt
> attribute.
> * config/i386/i386.c (ix86_function_ok_for_sibcall): Check
> for no_plt attribute.
> (ix86_expand_call): Ditto.
> (nopic_no_plt_attribute): New function.
> (ix86_output_call_insn): Output indirect call for non-pic
> no plt calls.
> * doc/extend.texi (no_plt): Document new attribute.
> * testsuite/gcc.target/i386/noplt-1.c: New test.
> * testsuite/gcc.target/i386/noplt-2.c: New test.
> * testsuite/gcc.target/i386/noplt-3.c: New test.
> * testsuite/gcc.target/i386/noplt-4.c: New test.
>
>
> Please review.
>
> Thanks
> Sri
>
>
>>
>> To be honest, this is trivial to implement in the ARM backend as one
>> would just piggy back on the longcalls work - despite that, IMNSHO
>> it's best done in a target independent manner.
>>
>> regards
>> Ramana
>>
>>>
>>> Thanks
>>> Sri
>>>
>>>>
>>>> regards
>>>> Ramana
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> I am not familiar with PLT calls for other targets. I can move the
>>>>>>> tests to gcc.dg but what relocation are you suggesting I check for?
>>>>>>
>>>>>>
>>>>>> Move the test to gcc.dg, add a target_support_no_plt function in
>>>>>> testsuite/lib/target-supports.exp and mark this as being supported only on
>>>>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other
>>>>>> targets can add things as they deem fit.
>>>>>
>>>>>>
>>>>>> In any case, on a large number of elf/ linux targets I would have thought
>>>>>> the absence of a JMP_SLOT relocation would be good enough to check that this
>>>>>> is working correctly.
>>>>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sri
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ramana
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by
>>>>>>>>>> ix86_expand_call.
>>>>>>>>>> else
>>>>>>>>>> {
>>>>>>>>>> /* Static functions and indirect calls don't need the pic
>>>>>>>>>> register. */
>>>>>>>>>> if (flag_pic
>>>>>>>>>> && (!TARGET_64BIT
>>>>>>>>>> || (ix86_cmodel == CM_LARGE_PIC
>>>>>>>>>> && DEFAULT_ABI != MS_ABI))
>>>>>>>>>> && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>>>>>> && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>>>>>> {
>>>>>>>>>> use_reg (&use, gen_rtx_REG (Pmode,
>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>>>>>> if (ix86_use_pseudo_pic_reg ())
>>>>>>>>>> emit_move_insn (gen_rtx_REG (Pmode,
>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>>>>>> pic_offset_table_rtx);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>>>>>>> for
>>>>>>>>>> local calls
>>>>>>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>>>>>>> suppose.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Done that now and patch attached.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Sri
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Honza
>>>>>>>
>>>>>>>
>>>>>>

^ permalink raw reply [flat|nested] 65+ messages in thread
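The disagreement in this message is about where the decision to call indirectly is made. A toy model in C may help keep the cases apart. It is only a sketch under stated assumptions: the struct and all three function names are invented for illustration, the FUNCTION_DECL and target checks of the real code are dropped, and unified_goes_indirect is just one possible reading of the "target independent" suggestion, not something proposed verbatim in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a call site; the booleans stand in for GCC internals.  */
struct call_site
{
  bool flag_pic;      /* compiling with -fPIC or -fPIE */
  bool flag_plt;      /* false when -fno-plt is given */
  bool has_no_plt;    /* callee declaration carries the attribute */
  bool binds_local;   /* callee is known to resolve locally */
};

/* Models the condition the patch uses in prepare_call_address (calls.c).  */
bool
generic_goes_indirect (struct call_site c)
{
  return c.flag_pic
         && (!c.flag_plt || c.has_no_plt)
         && !c.binds_local;
}

/* Models nopic_no_plt_attribute (i386.c), assuming a 64-bit ELF target.  */
bool
x86_64_nonpic_goes_indirect (struct call_site c)
{
  return !c.flag_pic
         && !c.binds_local
         && (!c.flag_plt || c.has_no_plt);
}

/* Hypothetical single condition that does not depend on flag_pic.  */
bool
unified_goes_indirect (struct call_site c)
{
  return (!c.flag_plt || c.has_no_plt) && !c.binds_local;
}

/* Check all 16 flag combinations: on x86_64 the two pieces of the patch
   together cover exactly what the single condition covers.  */
void
model_examples (void)
{
  for (int bits = 0; bits < 16; bits++)
    {
      struct call_site c = { bits & 1, bits & 2, bits & 4, bits & 8 };
      assert (unified_goes_indirect (c)
              == (generic_goes_indirect (c)
                  || x86_64_nonpic_goes_indirect (c)));
    }
}
```

In this model the x86_64 result is the same either way. What differs is every other target, where only generic_goes_indirect applies and a non-PIC call to a marked function would still go through the PLT, which is the cross-platform inconsistency being objected to.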
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-02 21:09 ` Ramana Radhakrishnan
@ 2015-06-02 21:25 ` Xinliang David Li
2015-06-02 21:52 ` Bernhard Reutner-Fischer
2015-06-02 21:40 ` Sriraman Tallam
2015-06-03 19:57 ` Richard Henderson
2 siblings, 1 reply; 65+ messages in thread
From: Xinliang David Li @ 2015-06-02 21:25 UTC (permalink / raw)
To: Ramana Radhakrishnan
Cc: Sriraman Tallam, Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, GCC Patches

On Tue, Jun 2, 2015 at 1:56 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, Jun 2, 2015 at 7:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> [...]
>> I have now modified this patch.
>
> Thanks for taking care of this. I'll have a read through tomorrow
> morning when I'm at my normal work machine.
>
>>
>> This patch does two things:
>>
>> 1) Adds new generic function attribute "no_plt" that is similar in
>> functionality to -fno-plt except that it applies only to calls to
>> functions that are marked with this attribute.
>> 2) For x86_64, it makes -fno-plt(and the attribute) also work for
>> non-PIC code by directly generating an indirect call via a GOT entry.
>
> I'm sorry I'm going to push back again for the same reason.
>
> Other than forcing targets to tweak their call insn patterns, the act
> of generating the indirect call should remain in target independent
> code. Sorry, not having the same behaviour on all platforms for
> something like this is just a recipe for confusion.

Do you have a good suggestion on the way to implement this (non PIC
no-plt) in a clean and target independent way? Regarding the
'confusion' part, is it a matter of documentation (can be updated when
more targets start to support it more efficiently)?

David

>
> regards
> Ramana
> [...]

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-02 21:25 ` Xinliang David Li
@ 2015-06-02 21:52 ` Bernhard Reutner-Fischer
0 siblings, 0 replies; 65+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-06-02 21:52 UTC (permalink / raw)
To: Xinliang David Li, Ramana Radhakrishnan
Cc: Sriraman Tallam, Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, GCC Patches

On June 2, 2015 11:22:03 PM GMT+02:00, Xinliang David Li <davidxl@google.com> wrote:

>> I'm sorry I'm going to push back again for the same reason.
>>
>> Other than forcing targets to tweak their call insn patterns, the act
>> of generating the indirect call should remain in target independent
>> code. Sorry, not having the same behaviour on all platforms for
>> something like this is just a recipe for confusion.

Everything else will be a nightmare for any real (widespread) use, yes.
Just doing this for x86, x86_64 and x32 gets us in an unpleasant situation
like the dances everybody had and has to do for ebx avoidance.

>
>Do you have a good suggestion on the way to implement this (non PIC
>no-plt) in a clean and target independent way? Regarding the

not offhand here, at least, fwiw.

>'confusion' part, is it a matter of documentation (can be updated when
>more targets start to support it more efficiently)?

I386 compatible relief in this respect certainly is nice but we ought to
handle this better throughout IMHO. Cannot devote time there myself though,
so just hoping you folks are able to put some effort into this.

PS: and please, pretty please clip your replies sensibly..

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-02 21:09 ` Ramana Radhakrishnan
2015-06-02 21:25 ` Xinliang David Li
@ 2015-06-02 21:40 ` Sriraman Tallam
2015-06-03 14:37 ` Ramana Radhakrishnan
2015-06-03 19:57 ` Richard Henderson
2 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-06-02 21:40 UTC (permalink / raw)
To: ramrad01
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Tue, Jun 2, 2015 at 1:56 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, Jun 2, 2015 at 7:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> [...]
>> I have now modified this patch.
>
> Thanks for taking care of this. I'll have a read through tomorrow
> morning when I'm at my normal work machine.
> >> >> This patch does two things: >> >> 1) Adds new generic function attribute "no_plt" that is similar in >> functionality to -fno-plt except that it applies only to calls to >> functions that are marked with this attribute. >> 2) For x86_64, it makes -fno-plt(and the attribute) also work for >> non-PIC code by directly generating an indirect call via a GOT entry. > > I'm sorry I'm going to push back again for the same reason. Let me describe the problem I am having in a little more detail: For the PIC case, I think there is no confusion. Both of us agree on what is being done. Attribute no_plt exactly shadows -fno-plt and is completely target independent. For the non-PIC case, this is where some target dependent portions are needed. This is because I simply cannot remove the flag_pic check in calls.c and force the address onto a register. Lets say I did that with this patch: Index: calls.c =================================================================== --- calls.c (revision 223720) +++ calls.c (working copy) @@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) : memory_address (FUNCTION_MODE, funexp)); - else if (flag_pic && !flag_plt && fndecl_or_type + else if (fndecl_or_type && TREE_CODE (fndecl_or_type) == FUNCTION_DECL + && (!flag_plt + || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type))) && !targetm.binds_local_p (fndecl_or_type)) { funexp = force_reg (Pmode, funexp); what would the code look like for this example below in the non-PIC case: __attribute__((no_plt)) extern int foo(); int main () { return foo(); } Without -O2: mov _Z3foov, %eax call *%eax The indirect call is there but this is wrong because this will force the linker to still create a PLT entry for foo and use that address. This is worse than calling the PLT directly as we end up calling the PLT indirectly. 
Now, with -O2: call *_Z3foov and again same story. The linker creates a PLT entry for foo and calls foo_plt indirectly. What we really need to do in the non-PIC case, if we need a target independent solution, is pretend that the call to foo is like a PIC call when we see the attribute. I looked at how to do this and the change to me seems pretty hairy and that is why it seemed like it is better to handle this in the target directly. Thanks Sri > > Other than forcing targets to tweak their call insn patterns, the act > of generating the indirect call should remain in target independent > code. Sorry, not having the same behaviour on all platforms for > something like this is just a recipe for confusion. > > regards > Ramana > >> >> For PIC code, no_plt merely shadows the implementation of -fno-plt, no >> surprises here. >> >> * c-family/c-common.c (no_plt): New attribute. >> (handle_no_plt_attribute): New handler. >> * calls.c (prepare_call_address): Check for no_plt >> attribute. >> * config/i386/i386.c (ix86_function_ok_for_sibcall): Check >> for no_plt attribute. >> (ix86_expand_call): Ditto. >> (nopic_no_plt_attribute): New function. >> (ix86_output_call_insn): Output indirect call for non-pic >> no plt calls. >> * doc/extend.texi (no_plt): Document new attribute. >> * testsuite/gcc.target/i386/noplt-1.c: New test. >> * testsuite/gcc.target/i386/noplt-2.c: New test. >> * testsuite/gcc.target/i386/noplt-3.c: New test. >> * testsuite/gcc.target/i386/noplt-4.c: New test. >> >> >> Please review. >> >> Thanks >> Sri >> >> >>> >>> To be honest, this is trivial to implement in the ARM backend as one >>> would just piggy back on the longcalls work - despite that, IMNSHO >>> it's best done in a target independent manner. >>> >>> regards >>> Ramana >>> >>>> >>>> Thanks >>>> Sri >>>> >>>>> >>>>> regards >>>>> Ramana >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> I am not familiar with PLT calls for other targets. 
I can move the >>>>>>>> tests to gcc.dg but what relocation are you suggesting I check for? >>>>>>> >>>>>>> >>>>>>> Move the test to gcc.dg, add a target_support_no_plt function in >>>>>>> testsuite/lib/target-supports.exp and mark this as being supported only on >>>>>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other >>>>>>> targets can add things as they deem fit. >>>>>> >>>>>>> >>>>>>> In any case, on a large number of elf/ linux targets I would have thought >>>>>>> the absence of a JMP_SLOT relocation would be good enough to check that this >>>>>>> is working correctly. >>>>>>> >>>>>>> regards >>>>>>> Ramana >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks >>>>>>>> Sri >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Ramana >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by >>>>>>>>>>> ix86_expand_call. >>>>>>>>>>> else >>>>>>>>>>> { >>>>>>>>>>> /* Static functions and indirect calls don't need the pic >>>>>>>>>>> register. */ >>>>>>>>>>> if (flag_pic >>>>>>>>>>> && (!TARGET_64BIT >>>>>>>>>>> || (ix86_cmodel == CM_LARGE_PIC >>>>>>>>>>> && DEFAULT_ABI != MS_ABI)) >>>>>>>>>>> && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF >>>>>>>>>>> && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) >>>>>>>>>>> { >>>>>>>>>>> use_reg (&use, gen_rtx_REG (Pmode, >>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM)); >>>>>>>>>>> if (ix86_use_pseudo_pic_reg ()) >>>>>>>>>>> emit_move_insn (gen_rtx_REG (Pmode, >>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM), >>>>>>>>>>> pic_offset_table_rtx); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> I think you want to take that away from FUSAGE there just like we do >>>>>>>>>>> for >>>>>>>>>>> local calls >>>>>>>>>>> (and in fact the code should already check flag_pic && flag_plt I >>>>>>>>>>> suppose. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Done that now and patch attached. 
>>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> Sri >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Honza >>>>>>>> >>>>>>>> >>>>>>> ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-02 21:40 ` Sriraman Tallam
@ 2015-06-03 14:37 ` Ramana Radhakrishnan
2015-06-03 18:53 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-03 14:37 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

Hi Sriraman,

Thanks for the detailed explanation, that was useful.

>>
>> I'm sorry I'm going to push back again for the same reason.
>
> Let me describe the problem I am having in a little more detail:
>
> For the PIC case, I think there is no confusion. Both of us agree on
> what is being done. Attribute no_plt exactly shadows -fno-plt and is
> completely target independent.

Agreed.

>
> For the non-PIC case, this is where some target dependent portions are
> needed. This is because I simply cannot remove the flag_pic check in
> calls.c and force the address onto a register. Lets say I did that
> with this patch:

Of course, I should have realized this earlier - sorry for being a pain.
We need to load the value from the GOT (or in an equivalent position-independent manner) and that is entirely handled by the backends; there's no easy interface to do this from the mid-end. I tried a horrible hack in calls.c, which was -

    int old_flag_pic = flag_pic;
    flag_pic = 1;
    funexp = force_reg (Pmode, funexp);
    flag_pic = old_flag_pic;

We then have to relax quite a lot of checks in a number of places across backends to handle !flag_plt, which ain't worth it.

I agree now that it will be much cleaner just to punt this into the backend, so it may be worth noting that making this work properly for the non-PIC case requires quite a degree of massaging in the backends.

Objections withdrawn.

Thanks, Ramana > > Index: calls.c > =================================================================== > --- calls.c (revision 223720) > +++ calls.c (working copy) > @@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun > && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) > ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) > : memory_address (FUNCTION_MODE, funexp)); > - else if (flag_pic && !flag_plt && fndecl_or_type > + else if (fndecl_or_type > && TREE_CODE (fndecl_or_type) == FUNCTION_DECL > + && (!flag_plt > + || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type))) > && !targetm.binds_local_p (fndecl_or_type)) > { > funexp = force_reg (Pmode, funexp); > > what would the code look like for this example below in the non-PIC case: > > __attribute__((no_plt)) > extern int foo(); > > int main () > { > return foo(); > } > > > Without -O2: > > mov _Z3foov, %eax > call *%eax > > The indirect call is there but this is wrong because this will force > the linker to still create a PLT entry for foo and use that address. > This is worse than calling the PLT directly as we end up calling the > PLT indirectly. > > Now, with -O2: > call *_Z3foov > > and again same story. The linker creates a PLT entry for foo and > calls foo_plt indirectly. > > What we really need to do in the non-PIC case, if we need a target > independent solution, is pretend that the call to foo is like a PIC > call when we see the attribute. I looked at how to do this and the > change to me seems pretty hairy and that is why it seemed like it is > better to handle this in the target directly. > > Thanks > Sri > > >> >> Other than forcing targets to tweak their call insn patterns, the act >> of generating the indirect call should remain in target independent >> code. Sorry, not having the same behaviour on all platforms for >> something like this is just a recipe for confusion. 
>> >> regards >> Ramana >> >>> >>> For PIC code, no_plt merely shadows the implementation of -fno-plt, no >>> surprises here. >>> >>> * c-family/c-common.c (no_plt): New attribute. >>> (handle_no_plt_attribute): New handler. >>> * calls.c (prepare_call_address): Check for no_plt >>> attribute. >>> * config/i386/i386.c (ix86_function_ok_for_sibcall): Check >>> for no_plt attribute. >>> (ix86_expand_call): Ditto. >>> (nopic_no_plt_attribute): New function. >>> (ix86_output_call_insn): Output indirect call for non-pic >>> no plt calls. >>> * doc/extend.texi (no_plt): Document new attribute. >>> * testsuite/gcc.target/i386/noplt-1.c: New test. >>> * testsuite/gcc.target/i386/noplt-2.c: New test. >>> * testsuite/gcc.target/i386/noplt-3.c: New test. >>> * testsuite/gcc.target/i386/noplt-4.c: New test. >>> >>> >>> Please review. >>> >>> Thanks >>> Sri >>> >>> >>>> >>>> To be honest, this is trivial to implement in the ARM backend as one >>>> would just piggy back on the longcalls work - despite that, IMNSHO >>>> it's best done in a target independent manner. >>>> >>>> regards >>>> Ramana >>>> >>>>> >>>>> Thanks >>>>> Sri >>>>> >>>>>> >>>>>> regards >>>>>> Ramana >>>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> I am not familiar with PLT calls for other targets. I can move the >>>>>>>>> tests to gcc.dg but what relocation are you suggesting I check for? >>>>>>>> >>>>>>>> >>>>>>>> Move the test to gcc.dg, add a target_support_no_plt function in >>>>>>>> testsuite/lib/target-supports.exp and mark this as being supported only on >>>>>>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other >>>>>>>> targets can add things as they deem fit. >>>>>>> >>>>>>>> >>>>>>>> In any case, on a large number of elf/ linux targets I would have thought >>>>>>>> the absence of a JMP_SLOT relocation would be good enough to check that this >>>>>>>> is working correctly. 
>>>>>>>> >>>>>>>> regards >>>>>>>> Ramana >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Sri >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ramana >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by >>>>>>>>>>>> ix86_expand_call. >>>>>>>>>>>> else >>>>>>>>>>>> { >>>>>>>>>>>> /* Static functions and indirect calls don't need the pic >>>>>>>>>>>> register. */ >>>>>>>>>>>> if (flag_pic >>>>>>>>>>>> && (!TARGET_64BIT >>>>>>>>>>>> || (ix86_cmodel == CM_LARGE_PIC >>>>>>>>>>>> && DEFAULT_ABI != MS_ABI)) >>>>>>>>>>>> && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF >>>>>>>>>>>> && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) >>>>>>>>>>>> { >>>>>>>>>>>> use_reg (&use, gen_rtx_REG (Pmode, >>>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM)); >>>>>>>>>>>> if (ix86_use_pseudo_pic_reg ()) >>>>>>>>>>>> emit_move_insn (gen_rtx_REG (Pmode, >>>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM), >>>>>>>>>>>> pic_offset_table_rtx); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> I think you want to take that away from FUSAGE there just like we do >>>>>>>>>>>> for >>>>>>>>>>>> local calls >>>>>>>>>>>> (and in fact the code should already check flag_pic && flag_plt I >>>>>>>>>>>> suppose. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Done that now and patch attached. >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Sri >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Honza >>>>>>>>> >>>>>>>>> >>>>>>>> > ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-03 14:37 ` Ramana Radhakrishnan
@ 2015-06-03 18:53 ` Sriraman Tallam
2015-06-03 20:16 ` Richard Henderson
0 siblings, 1 reply; 65+ messages in thread
From: Sriraman Tallam @ 2015-06-03 18:53 UTC (permalink / raw)
To: Ramana Radhakrishnan
Cc: Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1465 bytes --]

>
> I agree now that it will be much cleaner just to punt this into the backend,
> so it may be worth noting that making this work properly for the non-PIC
> case requires quite a degree of massaging in the backends.
>
> Objections withdrawn.

Thanks! I have attached the latest patch after making the changes Bernhard suggested. I have also added a comment saying that the non-PIC case needs to be handled specially by the backend.

* c-family/c-common.c (no_plt): New attribute.
	(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check for no_plt attribute.
	(ix86_expand_call): Ditto.
	(ix86_nopic_no_plt_attribute_p): New function.
	(ix86_output_call_insn): Output indirect call for non-pic no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* doc/invoke.texi: Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.

This patch does two things:

* Adds new generic function attribute "no_plt" that is similar in functionality to -fno-plt except that it applies only to calls to functions that are marked with this attribute.
* For x86_64, it makes -fno-plt (and the attribute) also work for non-PIC code by directly generating an indirect call via a GOT entry.

Sri > > Thanks, > Ramana [-- Attachment #2: noplt_attrib_patch_new.txt --] [-- Type: text/plain, Size: 11390 bytes --] * c-family/c-common.c (no_plt): New attribute. (handle_no_plt_attribute): New handler. * calls.c (prepare_call_address): Check for no_plt attribute. * config/i386/i386.c (ix86_function_ok_for_sibcall): Check for no_plt attribute. (ix86_expand_call): Ditto. (ix86_nopic_no_plt_attribute_p): New function. (ix86_output_call_insn): Output indirect call for non-pic no plt calls. * doc/extend.texi (no_plt): Document new attribute. * doc/invoke.texi: Document new attribute. * testsuite/gcc.target/i386/noplt-1.c: New test. * testsuite/gcc.target/i386/noplt-2.c: New test. * testsuite/gcc.target/i386/noplt-3.c: New test. * testsuite/gcc.target/i386/noplt-4.c: New test. This patch does two things: * Adds new generic function attribute "no_plt" that is similar in functionality to -fno-plt except that it applies only to calls to functions that are marked with this attribute. * For x86_64, it makes -fno-plt(and the attribute) also work for non-PIC code by directly generating an indirect call via a GOT entry. 
Index: c-family/c-common.c =================================================================== --- c-family/c-common.c (revision 223720) +++ c-family/c-common.c (working copy) @@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t static tree handle_section_attribute (tree *, tree, tree, int, bool *); static tree handle_aligned_attribute (tree *, tree, tree, int, bool *); static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ; +static tree handle_no_plt_attribute (tree *, tree, tree, int, bool *) ; static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *); static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *); static tree handle_alias_attribute (tree *, tree, tree, int, bool *); @@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab handle_aligned_attribute, false }, { "weak", 0, 0, true, false, false, handle_weak_attribute, false }, + { "no_plt", 0, 0, true, false, false, + handle_no_plt_attribute, false }, { "ifunc", 1, 1, true, false, false, handle_ifunc_attribute, false }, { "alias", 1, 1, true, false, false, @@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name, return NULL_TREE; } +/* Handle a "no_plt" attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_no_plt_attribute (tree *node, tree name, + tree ARG_UNUSED (args), + int ARG_UNUSED (flags), + bool * ARG_UNUSED (no_add_attrs)) +{ + if (TREE_CODE (*node) != FUNCTION_DECL) + { + warning (OPT_Wattributes, + "%qE attribute is only applicable on functions", name); + *no_add_attrs = true; + return NULL_TREE; + } + return NULL_TREE; +} + /* Handle an "alias" or "ifunc" attribute; arguments as in struct attribute_spec.handler, except that IS_ALIAS tells us whether this is an alias as opposed to ifunc attribute. 
*/ Index: calls.c =================================================================== --- calls.c (revision 223720) +++ calls.c (working copy) @@ -226,10 +226,16 @@ prepare_call_address (tree fndecl_or_type, rtx fun && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) : memory_address (FUNCTION_MODE, funexp)); - else if (flag_pic && !flag_plt && fndecl_or_type + else if (flag_pic + && fndecl_or_type && TREE_CODE (fndecl_or_type) == FUNCTION_DECL + && (!flag_plt + || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type))) && !targetm.binds_local_p (fndecl_or_type)) { + /* This is done only for PIC code. There is no easy interface to force the + function address into GOT for non-PIC case. non-PIC case needs to be + handled specially by the backend. */ funexp = force_reg (Pmode, funexp); } else if (! sibcallp) Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 223720) +++ config/i386/i386.c (working copy) @@ -5479,7 +5479,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) && !TARGET_64BIT && flag_pic && flag_plt - && decl && !targetm.binds_local_p (decl)) + && decl + && (TREE_CODE (decl) != FUNCTION_DECL + || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl))) + && !targetm.binds_local_p (decl)) return false; /* If we need to align the outgoing stack, then sibcalling would @@ -25497,13 +25500,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call } else { - /* Static functions and indirect calls don't need the pic register. */ + /* Static functions and indirect calls don't need the pic register. Also, + check if PLT was explicitly avoided via no-plt or "no_plt" attribute, making + it an indirect call. */ if (flag_pic && (!TARGET_64BIT || (ix86_cmodel == CM_LARGE_PIC && DEFAULT_ABI != MS_ABI)) && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF - && ! 
SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) + && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)) + && flag_plt + && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL + || !lookup_attribute ("no_plt", + DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0)))))) { use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); if (ix86_use_pseudo_pic_reg ()) @@ -25598,7 +25607,32 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call return call; } +/* Return true if the function being called was marked with attribute "no_plt" + or using -fno-plt and we are compiling for non-PIC and x86_64. We need to + handle the non-PIC case in the backend because there is no easy interface + for the front-end to force non-PLT calls to use the GOT. This is currently + used only with 64-bit ELF targets to call the function marked "no_plt" + indirectly. */ +static bool +ix86_nopic_no_plt_attribute_p (rtx call_op) +{ + if (flag_pic || ix86_cmodel == CM_LARGE + || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF + || SYMBOL_REF_LOCAL_P (call_op)) + return false; + + tree symbol_decl = SYMBOL_REF_DECL (call_op); + + if (symbol_decl != NULL_TREE + && TREE_CODE (symbol_decl) == FUNCTION_DECL + && (!flag_plt + || lookup_attribute ("no_plt", DECL_ATTRIBUTES (symbol_decl)))) + return true; + + return false; +} + /* Output the assembly for a call instruction. */ const char * @@ -25610,7 +25644,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op if (SIBLING_CALL_P (insn)) { - if (direct_p) + if (direct_p && ix86_nopic_no_plt_attribute_p (call_op)) + xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)"; + else if (direct_p) xasm = "%!jmp\t%P0"; /* SEH epilogue detection requires the indirect branch case to include REX.W. 
*/ @@ -25653,7 +25689,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op seh_nop_p = true; } - if (direct_p) + if (direct_p && ix86_nopic_no_plt_attribute_p (call_op)) + xasm = "%!call\t*%p0@GOTPCREL(%%rip)"; + else if (direct_p) xasm = "%!call\t%P0"; else xasm = "%!call\t%A0"; Index: doc/extend.texi =================================================================== --- doc/extend.texi (revision 223720) +++ doc/extend.texi (working copy) @@ -2916,6 +2916,35 @@ the standard C library can be guaranteed not to th with the notable exceptions of @code{qsort} and @code{bsearch} that take function pointer arguments. +@item no_plt +@cindex @code{no_plt} function attribute +The @code{no_plt} attribute is the counterpart to option @option{-fno-plt} and +does not use PLT for calls to functions marked with this attribute in position +independent code. + +@smallexample +@group +/* Externally defined function foo. */ +int foo () __attribute__ ((no_plt)); + +int +main (/* @r{@dots{}} */) +@{ + /* @r{@dots{}} */ + foo (); + /* @r{@dots{}} */ +@} +@end group +@end smallexample + +The @code{no_plt} attribute on function foo tells the compiler to assume that +the function foo is externally defined and the call to foo must avoid the PLT +in position independent code. + +Additionally, a few targets also convert calls to those functions that are +marked to not use the PLT to use the GOT instead for non-position independent +code. + @item optimize @cindex @code{optimize} function attribute The @code{optimize} attribute is used to specify that a function is to Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 223720) +++ doc/invoke.texi (working copy) @@ -23868,6 +23868,14 @@ PLT stubs expect GOT pointer in a specific registe register allocation freedom to the compiler. Lazy binding requires PLT: with @option{-fno-plt} all external symbols are resolved at load time. 
+Alternatively, function attribute @code{no_plt} can be used to avoid PLT +for calls to specific external functions by marking those functions with +this attribute. + +Additionally, a few targets also convert calls to those functions that are +marked to not use the PLT to use the GOT instead for non-position independent +code. + @item -fno-jump-tables @opindex fno-jump-tables Do not use jump tables for switch statements even where it would be Index: testsuite/gcc.target/i386/noplt-1.c =================================================================== --- testsuite/gcc.target/i386/noplt-1.c (revision 0) +++ testsuite/gcc.target/i386/noplt-1.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic" } */ + +__attribute__ ((no_plt)) +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-2.c =================================================================== --- testsuite/gcc.target/i386/noplt-2.c (revision 0) +++ testsuite/gcc.target/i386/noplt-2.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic" } */ + + +__attribute__ ((no_plt)) +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-3.c =================================================================== --- testsuite/gcc.target/i386/noplt-3.c (revision 0) +++ testsuite/gcc.target/i386/noplt-3.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic -fno-plt" } */ + +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-4.c 
=================================================================== --- testsuite/gcc.target/i386/noplt-4.c (revision 0) +++ testsuite/gcc.target/i386/noplt-4.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-03 18:53 ` Sriraman Tallam
@ 2015-06-03 20:16 ` Richard Henderson
2015-06-03 20:59 ` Sriraman Tallam
0 siblings, 1 reply; 65+ messages in thread
From: Richard Henderson @ 2015-06-03 20:16 UTC (permalink / raw)
To: Sriraman Tallam, Ramana Radhakrishnan
Cc: Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On 06/03/2015 11:38 AM, Sriraman Tallam wrote:
> + { "no_plt", 0, 0, true, false, false,
> + handle_no_plt_attribute, false },

Call it noplt. We don't add the underscore for noinline, noclone, etc.

> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c (revision 223720)
> +++ config/i386/i386.c (working copy)
> @@ -5479,7 +5479,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
> && !TARGET_64BIT
> && flag_pic
> && flag_plt
> - && decl && !targetm.binds_local_p (decl))
> + && decl
> + && (TREE_CODE (decl) != FUNCTION_DECL
> + || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
> + && !targetm.binds_local_p (decl))
> return false;
>
> /* If we need to align the outgoing stack, then sibcalling would

Is this really necessary? I'd expect DECL to be NULL in this case,
since the non-use of the PLT will mean that the (sib)call is indirect.

> @@ -25497,13 +25500,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
> }
> else
> {
> - /* Static functions and indirect calls don't need the pic register. */
> + /* Static functions and indirect calls don't need the pic register. Also,
> + check if PLT was explicitly avoided via no-plt or "no_plt" attribute, making
> + it an indirect call. */
> if (flag_pic
> && (!TARGET_64BIT
> || (ix86_cmodel == CM_LARGE_PIC
> && DEFAULT_ABI != MS_ABI))
> && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
> - && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
> + && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
> + && flag_plt
> + && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL
> + || !lookup_attribute ("no_plt",
> + DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))))))
> {
> use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
> if (ix86_use_pseudo_pic_reg ())

Why are you testing FUNCTION_DECL? Even if, somehow, the user were producing a
function call to a data symbol, why do you think that lookup_attribute would
produce incorrect results?

Similarly in ix86_nopic_no_plt_attribute_p.

r~

^ permalink raw reply [flat|nested] 65+ messages in thread
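For readers following the thread, here is a minimal sketch of how the attribute under review might be used from C, assuming the spelling noplt suggested in the review above and a compiler that accepts it. NOPLT and count_chars are hypothetical names chosen for illustration; they are not part of the patch. The guard lets the file build unchanged on compilers that do not know the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* NOPLT is a hypothetical helper macro, not part of the patch.  It expands
   to the attribute only when the compiler reports support for it, so the
   file still builds where the attribute is unknown.  */
#if defined(__has_attribute)
# if __has_attribute(noplt)
#  define NOPLT __attribute__ ((noplt))
# endif
#endif
#ifndef NOPLT
# define NOPLT
#endif

/* strlen is defined in the C library, so in a typical dynamically linked
   ELF build a call to it may be routed through a PLT stub.  Marking the
   declaration asks the compiler to avoid the PLT for calls to it.  */
NOPLT extern size_t strlen (const char *);

/* count_chars is a hypothetical caller used only for illustration.  */
size_t
count_chars (const char *s)
{
  assert (s != NULL);
  return strlen (s);
}
```

With the patch applied and a non-PIC x86_64 build, the call inside count_chars would be expected to go through the GOT entry for strlen, in the same form the noplt tests in the patch scan for. That is an expectation based on the thread, not a verified result, and the compiler may also expand strlen inline as a builtin.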
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-06-03 20:16 ` Richard Henderson @ 2015-06-03 20:59 ` Sriraman Tallam 2015-06-04 16:56 ` Sriraman Tallam 0 siblings, 1 reply; 65+ messages in thread From: Sriraman Tallam @ 2015-06-03 20:59 UTC (permalink / raw) To: Richard Henderson Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches [-- Attachment #1: Type: text/plain, Size: 2536 bytes --] On Wed, Jun 3, 2015 at 1:09 PM, Richard Henderson <rth@redhat.com> wrote: > On 06/03/2015 11:38 AM, Sriraman Tallam wrote: >> + { "no_plt", 0, 0, true, false, false, >> + handle_no_plt_attribute, false }, > > Call it noplt. We don't add the underscore for noinline, noclone, etc. Done. > > > >> Index: config/i386/i386.c >> =================================================================== >> --- config/i386/i386.c (revision 223720) >> +++ config/i386/i386.c (working copy) >> @@ -5479,7 +5479,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) >> && !TARGET_64BIT >> && flag_pic >> && flag_plt >> - && decl && !targetm.binds_local_p (decl)) >> + && decl >> + && (TREE_CODE (decl) != FUNCTION_DECL >> + || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl))) >> + && !targetm.binds_local_p (decl)) >> return false; >> >> /* If we need to align the outgoing stack, then sibcalling would > > Is this really necessary? I'd expect DECL to be NULL in this case, > since the non-use of the PLT will mean that the (sib)call is indirect. Removed. > > >> @@ -25497,13 +25500,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call >> } >> else >> { >> - /* Static functions and indirect calls don't need the pic register. */ >> + /* Static functions and indirect calls don't need the pic register. Also, >> + check if PLT was explicitly avoided via no-plt or "no_plt" attribute, making >> + it an indirect call. 
*/ >> if (flag_pic >> && (!TARGET_64BIT >> || (ix86_cmodel == CM_LARGE_PIC >> && DEFAULT_ABI != MS_ABI)) >> && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF >> - && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) >> + && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)) >> + && flag_plt >> + && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL >> + || !lookup_attribute ("no_plt", >> + DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0)))))) >> { >> use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); >> if (ix86_use_pseudo_pic_reg ()) > > Why are you testing FUNCTION_DECL? Even if, somehow, the user were producing a > function call to a data symbol, why do you think that lookup_attribute would > produce incorrect results? > > Similarly in ix86_nopic_no_plt_attribute_p. Fixed. Patch attached with those changes. Thanks Sri [-- Attachment #2: noplt_attrib_patch_new.txt --] [-- Type: text/plain, Size: 10794 bytes --] * c-family/c-common.c (noplt): New attribute. (handle_noplt_attribute): New handler. * calls.c (prepare_call_address): Check for noplt attribute. * config/i386/i386.c (ix86_function_ok_for_sibcall): Check for noplt attribute. (ix86_expand_call): Ditto. (ix86_nopic_noplt_attribute_p): New function. (ix86_output_call_insn): Output indirect call for non-pic no plt calls. * doc/extend.texi (noplt): Document new attribute. * doc/invoke.texi: Document new attribute. * testsuite/gcc.target/i386/noplt-1.c: New test. * testsuite/gcc.target/i386/noplt-2.c: New test. * testsuite/gcc.target/i386/noplt-3.c: New test. * testsuite/gcc.target/i386/noplt-4.c: New test. This patch does two things: * Adds new generic function attribute "noplt" that is similar in functionality to -fno-plt except that it applies only to calls to functions that are marked with this attribute. * For x86_64, it makes -fno-plt(and the attribute) also work for non-PIC code by directly generating an indirect call via a GOT entry. 
Index: c-family/c-common.c =================================================================== --- c-family/c-common.c (revision 223720) +++ c-family/c-common.c (working copy) @@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t static tree handle_section_attribute (tree *, tree, tree, int, bool *); static tree handle_aligned_attribute (tree *, tree, tree, int, bool *); static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ; +static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ; static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *); static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *); static tree handle_alias_attribute (tree *, tree, tree, int, bool *); @@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab handle_aligned_attribute, false }, { "weak", 0, 0, true, false, false, handle_weak_attribute, false }, + { "noplt", 0, 0, true, false, false, + handle_noplt_attribute, false }, { "ifunc", 1, 1, true, false, false, handle_ifunc_attribute, false }, { "alias", 1, 1, true, false, false, @@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name, return NULL_TREE; } +/* Handle a "noplt" attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_noplt_attribute (tree *node, tree name, + tree ARG_UNUSED (args), + int ARG_UNUSED (flags), + bool * ARG_UNUSED (no_add_attrs)) +{ + if (TREE_CODE (*node) != FUNCTION_DECL) + { + warning (OPT_Wattributes, + "%qE attribute is only applicable on functions", name); + *no_add_attrs = true; + return NULL_TREE; + } + return NULL_TREE; +} + /* Handle an "alias" or "ifunc" attribute; arguments as in struct attribute_spec.handler, except that IS_ALIAS tells us whether this is an alias as opposed to ifunc attribute. 
*/ Index: calls.c =================================================================== --- calls.c (revision 223720) +++ calls.c (working copy) @@ -226,10 +226,16 @@ prepare_call_address (tree fndecl_or_type, rtx fun && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) : memory_address (FUNCTION_MODE, funexp)); - else if (flag_pic && !flag_plt && fndecl_or_type + else if (flag_pic + && fndecl_or_type && TREE_CODE (fndecl_or_type) == FUNCTION_DECL + && (!flag_plt + || lookup_attribute ("noplt", DECL_ATTRIBUTES (fndecl_or_type))) && !targetm.binds_local_p (fndecl_or_type)) { + /* This is done only for PIC code. There is no easy interface to force the + function address into GOT for non-PIC case. non-PIC case needs to be + handled specially by the backend. */ funexp = force_reg (Pmode, funexp); } else if (! sibcallp) Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 223720) +++ config/i386/i386.c (working copy) @@ -25497,13 +25497,18 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call } else { - /* Static functions and indirect calls don't need the pic register. */ + /* Static functions and indirect calls don't need the pic register. Also, + check if PLT was explicitly avoided via no-plt or "noplt" attribute, making + it an indirect call. */ if (flag_pic && (!TARGET_64BIT || (ix86_cmodel == CM_LARGE_PIC && DEFAULT_ABI != MS_ABI)) && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF - && ! 
SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) + && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)) + && flag_plt + && !lookup_attribute ("noplt", + DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))))) { use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); if (ix86_use_pseudo_pic_reg ()) @@ -25598,7 +25603,31 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call return call; } +/* Return true if the function being called was marked with attribute "noplt" + or using -fno-plt and we are compiling for non-PIC and x86_64. We need to + handle the non-PIC case in the backend because there is no easy interface + for the front-end to force non-PLT calls to use the GOT. This is currently + used only with 64-bit ELF targets to call the function marked "noplt" + indirectly. */ +static bool +ix86_nopic_noplt_attribute_p (rtx call_op) +{ + if (flag_pic || ix86_cmodel == CM_LARGE + || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF + || SYMBOL_REF_LOCAL_P (call_op)) + return false; + + tree symbol_decl = SYMBOL_REF_DECL (call_op); + + if (!flag_plt + || (symbol_decl != NULL_TREE + && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))) + return true; + + return false; +} + /* Output the assembly for a call instruction. */ const char * @@ -25610,7 +25639,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op if (SIBLING_CALL_P (insn)) { - if (direct_p) + if (direct_p && ix86_nopic_noplt_attribute_p (call_op)) + xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)"; + else if (direct_p) xasm = "%!jmp\t%P0"; /* SEH epilogue detection requires the indirect branch case to include REX.W. 
*/ @@ -25653,7 +25684,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op seh_nop_p = true; } - if (direct_p) + if (direct_p && ix86_nopic_noplt_attribute_p (call_op)) + xasm = "%!call\t*%p0@GOTPCREL(%%rip)"; + else if (direct_p) xasm = "%!call\t%P0"; else xasm = "%!call\t%A0"; Index: doc/extend.texi =================================================================== --- doc/extend.texi (revision 223720) +++ doc/extend.texi (working copy) @@ -2916,6 +2916,35 @@ the standard C library can be guaranteed not to th with the notable exceptions of @code{qsort} and @code{bsearch} that take function pointer arguments. +@item noplt +@cindex @code{noplt} function attribute +The @code{noplt} attribute is the counterpart to option @option{-fno-plt} and +does not use PLT for calls to functions marked with this attribute in position +independent code. + +@smallexample +@group +/* Externally defined function foo. */ +int foo () __attribute__ ((noplt)); + +int +main (/* @r{@dots{}} */) +@{ + /* @r{@dots{}} */ + foo (); + /* @r{@dots{}} */ +@} +@end group +@end smallexample + +The @code{noplt} attribute on function foo tells the compiler to assume that +the function foo is externally defined and the call to foo must avoid the PLT +in position independent code. + +Additionally, a few targets also convert calls to those functions that are +marked to not use the PLT to use the GOT instead for non-position independent +code. + @item optimize @cindex @code{optimize} function attribute The @code{optimize} attribute is used to specify that a function is to Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 223720) +++ doc/invoke.texi (working copy) @@ -23868,6 +23868,14 @@ PLT stubs expect GOT pointer in a specific registe register allocation freedom to the compiler. Lazy binding requires PLT: with @option{-fno-plt} all external symbols are resolved at load time. 
+Alternatively, function attribute @code{noplt} can be used to avoid PLT +for calls to specific external functions by marking those functions with +this attribute. + +Additionally, a few targets also convert calls to those functions that are +marked to not use the PLT to use the GOT instead for non-position independent +code. + @item -fno-jump-tables @opindex fno-jump-tables Do not use jump tables for switch statements even where it would be Index: testsuite/gcc.target/i386/noplt-1.c =================================================================== --- testsuite/gcc.target/i386/noplt-1.c (revision 0) +++ testsuite/gcc.target/i386/noplt-1.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic" } */ + +__attribute__ ((noplt)) +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-2.c =================================================================== --- testsuite/gcc.target/i386/noplt-2.c (revision 0) +++ testsuite/gcc.target/i386/noplt-2.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic" } */ + + +__attribute__ ((noplt)) +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-3.c =================================================================== --- testsuite/gcc.target/i386/noplt-3.c (revision 0) +++ testsuite/gcc.target/i386/noplt-3.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic -fno-plt" } */ + +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-4.c =================================================================== 
--- testsuite/gcc.target/i386/noplt-4.c (revision 0) +++ testsuite/gcc.target/i386/noplt-4.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ ^ permalink raw reply [flat|nested] 65+ messages in thread
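For readers following the thread, here is a minimal sketch of how the attribute from the patch above might be applied in user code. It assumes a GCC release that includes the committed `noplt` attribute. The `NOPLT` macro and the `parse_count` helper are hypothetical names introduced only for illustration; the macro expands to nothing when the compiler does not report support for the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* NOPLT is a hypothetical convenience macro, not part of the patch.
   It expands to the attribute only when the compiler reports support. */
#if defined(__has_attribute)
#  if __has_attribute(noplt)
#    define NOPLT __attribute__((noplt))
#  endif
#endif
#ifndef NOPLT
#  define NOPLT
#endif

/* Declare an external library function with the attribute, in the same
   way the noplt-*.c tests declare foo.  atoi is used here only because
   it is available in the C library. */
extern int atoi(const char *text) NOPLT;

/* Hypothetical caller: the call to atoi below is the call site that the
   attribute is meant to affect. */
int parse_count(const char *text)
{
    assert(text != NULL);
    return atoi(text);
}
```

The program behaves the same with or without the attribute; only the call sequence is expected to differ. Inspecting the output of `gcc -S` is one way to check how the call to `atoi` was emitted, and the exact assembly depends on the compiler version, target, and flags.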
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-06-03 20:59 ` Sriraman Tallam @ 2015-06-04 16:56 ` Sriraman Tallam 2015-06-04 17:30 ` Richard Henderson 2015-07-24 19:02 ` H.J. Lu 0 siblings, 2 replies; 65+ messages in thread From: Sriraman Tallam @ 2015-06-04 16:56 UTC (permalink / raw) To: Richard Henderson Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches [-- Attachment #1: Type: text/plain, Size: 748 bytes --] > Patch attached with those changes. Is this patch alright to commit? * c-family/c-common.c (noplt): New attribute. (handle_noplt_attribute): New handler. * calls.c (prepare_call_address): Check for noplt attribute. * config/i386/i386.c (ix86_function_ok_for_sibcall): Check for noplt attribute. (ix86_expand_call): Ditto. (ix86_nopic_noplt_attribute_p): New function. (ix86_output_call_insn): Output indirect call for non-pic no plt calls. * doc/extend.texi (noplt): Document new attribute. * doc/invoke.texi: Document new attribute. * testsuite/gcc.target/i386/noplt-1.c: New test. * testsuite/gcc.target/i386/noplt-2.c: New test. * testsuite/gcc.target/i386/noplt-3.c: New test. * testsuite/gcc.target/i386/noplt-4.c: New test. Thanks Sri [-- Attachment #2: noplt_attrib_patch_new.txt --] [-- Type: text/plain, Size: 10794 bytes --] * c-family/c-common.c (noplt): New attribute. (handle_noplt_attribute): New handler. * calls.c (prepare_call_address): Check for noplt attribute. * config/i386/i386.c (ix86_function_ok_for_sibcall): Check for noplt attribute. (ix86_expand_call): Ditto. (ix86_nopic_noplt_attribute_p): New function. (ix86_output_call_insn): Output indirect call for non-pic no plt calls. * doc/extend.texi (noplt): Document new attribute. * doc/invoke.texi: Document new attribute. * testsuite/gcc.target/i386/noplt-1.c: New test. * testsuite/gcc.target/i386/noplt-2.c: New test. * testsuite/gcc.target/i386/noplt-3.c: New test. 
* testsuite/gcc.target/i386/noplt-4.c: New test. This patch does two things: * Adds new generic function attribute "noplt" that is similar in functionality to -fno-plt except that it applies only to calls to functions that are marked with this attribute. * For x86_64, it makes -fno-plt(and the attribute) also work for non-PIC code by directly generating an indirect call via a GOT entry. Index: c-family/c-common.c =================================================================== --- c-family/c-common.c (revision 223720) +++ c-family/c-common.c (working copy) @@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t static tree handle_section_attribute (tree *, tree, tree, int, bool *); static tree handle_aligned_attribute (tree *, tree, tree, int, bool *); static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ; +static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ; static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *); static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *); static tree handle_alias_attribute (tree *, tree, tree, int, bool *); @@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab handle_aligned_attribute, false }, { "weak", 0, 0, true, false, false, handle_weak_attribute, false }, + { "noplt", 0, 0, true, false, false, + handle_noplt_attribute, false }, { "ifunc", 1, 1, true, false, false, handle_ifunc_attribute, false }, { "alias", 1, 1, true, false, false, @@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name, return NULL_TREE; } +/* Handle a "noplt" attribute; arguments as in + struct attribute_spec.handler. 
*/ + +static tree +handle_noplt_attribute (tree *node, tree name, + tree ARG_UNUSED (args), + int ARG_UNUSED (flags), + bool * ARG_UNUSED (no_add_attrs)) +{ + if (TREE_CODE (*node) != FUNCTION_DECL) + { + warning (OPT_Wattributes, + "%qE attribute is only applicable on functions", name); + *no_add_attrs = true; + return NULL_TREE; + } + return NULL_TREE; +} + /* Handle an "alias" or "ifunc" attribute; arguments as in struct attribute_spec.handler, except that IS_ALIAS tells us whether this is an alias as opposed to ifunc attribute. */ Index: calls.c =================================================================== --- calls.c (revision 223720) +++ calls.c (working copy) @@ -226,10 +226,16 @@ prepare_call_address (tree fndecl_or_type, rtx fun && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) : memory_address (FUNCTION_MODE, funexp)); - else if (flag_pic && !flag_plt && fndecl_or_type + else if (flag_pic + && fndecl_or_type && TREE_CODE (fndecl_or_type) == FUNCTION_DECL + && (!flag_plt + || lookup_attribute ("noplt", DECL_ATTRIBUTES (fndecl_or_type))) && !targetm.binds_local_p (fndecl_or_type)) { + /* This is done only for PIC code. There is no easy interface to force the + function address into GOT for non-PIC case. non-PIC case needs to be + handled specially by the backend. */ funexp = force_reg (Pmode, funexp); } else if (! sibcallp) Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 223720) +++ config/i386/i386.c (working copy) @@ -25497,13 +25497,18 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call } else { - /* Static functions and indirect calls don't need the pic register. */ + /* Static functions and indirect calls don't need the pic register. Also, + check if PLT was explicitly avoided via no-plt or "noplt" attribute, making + it an indirect call. 
*/ if (flag_pic && (!TARGET_64BIT || (ix86_cmodel == CM_LARGE_PIC && DEFAULT_ABI != MS_ABI)) && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF - && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) + && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)) + && flag_plt + && !lookup_attribute ("noplt", + DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))))) { use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); if (ix86_use_pseudo_pic_reg ()) @@ -25598,7 +25603,31 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call return call; } +/* Return true if the function being called was marked with attribute "noplt" + or using -fno-plt and we are compiling for non-PIC and x86_64. We need to + handle the non-PIC case in the backend because there is no easy interface + for the front-end to force non-PLT calls to use the GOT. This is currently + used only with 64-bit ELF targets to call the function marked "noplt" + indirectly. */ +static bool +ix86_nopic_noplt_attribute_p (rtx call_op) +{ + if (flag_pic || ix86_cmodel == CM_LARGE + || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF + || SYMBOL_REF_LOCAL_P (call_op)) + return false; + + tree symbol_decl = SYMBOL_REF_DECL (call_op); + + if (!flag_plt + || (symbol_decl != NULL_TREE + && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))) + return true; + + return false; +} + /* Output the assembly for a call instruction. */ const char * @@ -25610,7 +25639,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op if (SIBLING_CALL_P (insn)) { - if (direct_p) + if (direct_p && ix86_nopic_noplt_attribute_p (call_op)) + xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)"; + else if (direct_p) xasm = "%!jmp\t%P0"; /* SEH epilogue detection requires the indirect branch case to include REX.W. 
*/ @@ -25653,7 +25684,9 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op seh_nop_p = true; } - if (direct_p) + if (direct_p && ix86_nopic_noplt_attribute_p (call_op)) + xasm = "%!call\t*%p0@GOTPCREL(%%rip)"; + else if (direct_p) xasm = "%!call\t%P0"; else xasm = "%!call\t%A0"; Index: doc/extend.texi =================================================================== --- doc/extend.texi (revision 223720) +++ doc/extend.texi (working copy) @@ -2916,6 +2916,35 @@ the standard C library can be guaranteed not to th with the notable exceptions of @code{qsort} and @code{bsearch} that take function pointer arguments. +@item noplt +@cindex @code{noplt} function attribute +The @code{noplt} attribute is the counterpart to option @option{-fno-plt} and +does not use PLT for calls to functions marked with this attribute in position +independent code. + +@smallexample +@group +/* Externally defined function foo. */ +int foo () __attribute__ ((noplt)); + +int +main (/* @r{@dots{}} */) +@{ + /* @r{@dots{}} */ + foo (); + /* @r{@dots{}} */ +@} +@end group +@end smallexample + +The @code{noplt} attribute on function foo tells the compiler to assume that +the function foo is externally defined and the call to foo must avoid the PLT +in position independent code. + +Additionally, a few targets also convert calls to those functions that are +marked to not use the PLT to use the GOT instead for non-position independent +code. + @item optimize @cindex @code{optimize} function attribute The @code{optimize} attribute is used to specify that a function is to Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 223720) +++ doc/invoke.texi (working copy) @@ -23868,6 +23868,14 @@ PLT stubs expect GOT pointer in a specific registe register allocation freedom to the compiler. Lazy binding requires PLT: with @option{-fno-plt} all external symbols are resolved at load time. 
+Alternatively, function attribute @code{noplt} can be used to avoid PLT +for calls to specific external functions by marking those functions with +this attribute. + +Additionally, a few targets also convert calls to those functions that are +marked to not use the PLT to use the GOT instead for non-position independent +code. + @item -fno-jump-tables @opindex fno-jump-tables Do not use jump tables for switch statements even where it would be Index: testsuite/gcc.target/i386/noplt-1.c =================================================================== --- testsuite/gcc.target/i386/noplt-1.c (revision 0) +++ testsuite/gcc.target/i386/noplt-1.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic" } */ + +__attribute__ ((noplt)) +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-2.c =================================================================== --- testsuite/gcc.target/i386/noplt-2.c (revision 0) +++ testsuite/gcc.target/i386/noplt-2.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic" } */ + + +__attribute__ ((noplt)) +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-3.c =================================================================== --- testsuite/gcc.target/i386/noplt-3.c (revision 0) +++ testsuite/gcc.target/i386/noplt-3.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-fno-pic -fno-plt" } */ + +void foo(); + +int main() +{ + foo(); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ Index: testsuite/gcc.target/i386/noplt-4.c =================================================================== 
--- testsuite/gcc.target/i386/noplt-4.c (revision 0) +++ testsuite/gcc.target/i386/noplt-4.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile { target x86_64-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -fno-plt" } */ + +int foo(); + +int main() +{ + return foo(); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-04 16:56 ` Sriraman Tallam
@ 2015-06-04 17:30 ` Richard Henderson
2015-06-04 21:34 ` Sriraman Tallam
2015-07-24 19:02 ` H.J. Lu
1 sibling, 1 reply; 65+ messages in thread
From: Richard Henderson @ 2015-06-04 17:30 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On 06/04/2015 09:54 AM, Sriraman Tallam wrote:
> + DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0)))))

Spacing.

> {
> use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
> if (ix86_use_pseudo_pic_reg ())
> @@ -25598,7 +25603,31 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>
> return call;
> }
> +/* Return true if the function being called was marked with attribute "noplt"

Vertical spacing.

> + || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF

Spacing.

Otherwise ok.


r~

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-04 17:30 ` Richard Henderson
@ 2015-06-04 21:34 ` Sriraman Tallam
0 siblings, 0 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-06-04 21:34 UTC (permalink / raw)
To: Richard Henderson
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On Thu, Jun 4, 2015 at 10:05 AM, Richard Henderson <rth@redhat.com> wrote:
> On 06/04/2015 09:54 AM, Sriraman Tallam wrote:
>> + DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0)))))
>
> Spacing.
>
>> {
>> use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
>> if (ix86_use_pseudo_pic_reg ())
>> @@ -25598,7 +25603,31 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>>
>> return call;
>> }
>> +/* Return true if the function being called was marked with attribute "noplt"
>
> Vertical spacing.
>
>> + || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF
>
> Spacing.
>
> Otherwise ok.

Made these changes and committed the patch.

I had to add one more check here to check if decl is not null before
looking at its attributes. It was causing a seg fault during boot-strap
with libgcc build.

+ && (SYMBOL_REF_DECL ((XEXP (fnaddr, 0))) == NULL_TREE // This line was added after the patch was approved.
+ || !lookup_attribute ("noplt",
+ DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP (fnaddr, 0))))))

Thanks
Sri

>
>
> r~

^ permalink raw reply [flat|nested] 65+ messages in thread
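The null check described in the message above can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: `struct decl`, `decl_has_attribute`, and `call_uses_plt` are hypothetical stand-ins and are not GCC's `tree` type, `lookup_attribute`, or any function from the patch. The point is the ordering of the tests: the declaration is compared against NULL before its attribute list is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a declaration node; not GCC's tree type. */
struct decl {
    const char *const *attributes;  /* NULL-terminated list of names */
};

/* Hypothetical stand-in for an attribute lookup.  D must be non-NULL,
   which is the precondition the committed fix protects. */
static bool decl_has_attribute(const struct decl *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (const char *const *a = d->attributes; a != NULL && *a != NULL; a++)
        if (strcmp(*a, name) == 0)
            return true;
    return false;
}

/* Mirrors the shape of the committed condition: the PLT path is taken
   when PLT use is enabled and the symbol either has no declaration at
   all or has one that is not marked "noplt". */
bool call_uses_plt(bool flag_plt, const struct decl *symbol_decl)
{
    return flag_plt
           && (symbol_decl == NULL
               || !decl_has_attribute(symbol_decl, "noplt"));
}
```

Because `||` short-circuits, `decl_has_attribute` is never reached with a NULL declaration, which is the failure mode the message reports.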
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-04 16:56 ` Sriraman Tallam
2015-06-04 17:30 ` Richard Henderson
@ 2015-07-24 19:02 ` H.J. Lu
1 sibling, 0 replies; 65+ messages in thread
From: H.J. Lu @ 2015-07-24 19:02 UTC (permalink / raw)
To: Sriraman Tallam
Cc: Richard Henderson, Ramana Radhakrishnan, Jan Hubicka, Pedro Alves, Michael Matz, David Li, GCC Patches

On Thu, Jun 4, 2015 at 9:54 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Patch attached with those changes.
>
> Is this patch alright to commit?
>
>
> * c-family/c-common.c (noplt): New attribute.
> (handle_noplt_attribute): New handler.
> * calls.c (prepare_call_address): Check for noplt attribute.
> * config/i386/i386.c (ix86_function_ok_for_sibcall): Check
> for noplt attribute.
> (ix86_expand_call): Ditto.
> (ix86_nopic_noplt_attribute_p): New function.
> (ix86_output_call_insn): Output indirect call for non-pic no plt calls.
> * doc/extend.texi (noplt): Document new attribute.
> * doc/invoke.texi: Document new attribute.
> * testsuite/gcc.target/i386/noplt-1.c: New test.
> * testsuite/gcc.target/i386/noplt-2.c: New test.
> * testsuite/gcc.target/i386/noplt-3.c: New test.
> * testsuite/gcc.target/i386/noplt-4.c: New test.
>

This may have caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67001

--
H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-06-02 21:09 ` Ramana Radhakrishnan
2015-06-02 21:25 ` Xinliang David Li
2015-06-02 21:40 ` Sriraman Tallam
@ 2015-06-03 19:57 ` Richard Henderson
2 siblings, 0 replies; 65+ messages in thread
From: Richard Henderson @ 2015-06-03 19:57 UTC (permalink / raw)
To: ramrad01, Sriraman Tallam
Cc: Ramana Radhakrishnan, Jan Hubicka, H.J. Lu, Pedro Alves, Michael Matz, David Li, GCC Patches

On 06/02/2015 01:56 PM, Ramana Radhakrishnan wrote:
> I'm sorry I'm going to push back again for the same reason.
>
> Other than forcing targets to tweak their call insn patterns, the act
> of generating the indirect call should remain in target independent
> code.

How is that going to help?

Unless a target tweaks its call insn patterns, combine or cse is going to
reconstruct the direct call from the indirect call. Indeed, the tweak itself
will be exactly what's needed to force the generation of the indirect call, no?


r~

^ permalink raw reply [flat|nested] 65+ messages in thread
* [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= @ 2015-05-01 0:31 Sriraman Tallam 2015-05-01 3:21 ` Alan Modra ` (2 more replies) 0 siblings, 3 replies; 65+ messages in thread From: Sriraman Tallam @ 2015-05-01 0:31 UTC (permalink / raw) To: GCC Patches, H.J. Lu, David Li [-- Attachment #1: Type: text/plain, Size: 3872 bytes --] Hi, We noticed that one of our benchmarks sped-up by ~1% when we eliminated PLT stubs for some of the hot external library functions like memcmp, pow. The win was from better icache and itlb performance. The main reason was that the PLT stubs had no spatial locality with the call-sites. I have started looking at ways to tell the compiler to eliminate PLT stubs (in-effect inline them) for specified external functions, for x86_64. I have a proposal and a patch and I would like to hear what you think. Here is a summary of what is happening currently. A call to an external function is direct but calls into the PLT stub which then jumps indirectly to the GOT entry. If I could replace the direct call to the PLT stub with an indirect call to a GOT entry which will hold the address of the external function, I have gotten rid of the PLT stub. Here is an example: foo.cc ===== extern int foo (); // Truly external library function, defined in a shared library. int main() { foo(); ... } Currently, this is what is happening. foo.s looks like this: main: ..... callq _Z3foov but the linker replaces this to call the PLT stub of foo instead. Function main calls the plt stub directly: 0000000000400766 <main>: …. 40076a: e8 71 fe ff ff callq 4005e0 <_Z3foov@plt> and the PLT stub does this: 00000000004005e0 <_Z3foov@plt>: 4005e0: jmpq *0x15d2(%rip) # 401bb8 <_GLOBAL_OFFSET_TABLE_+0x28> 4005e6: pushq $0x2 4005eb: jmpq 4005b0 <_init+0x28> The GOT entry at address 0x401bb8 contains the address of foo which will be lazily bound. 
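The two call paths discussed in this message can be modelled in plain C with function pointers. This is only a toy model under stated assumptions, not how the linker or dynamic loader is implemented: `foo_impl`, `lazy_resolver`, `got_foo_lazy`, `got_foo_eager`, `foo_plt`, and `resolve_count` are hypothetical names. One slot starts out pointing at a resolver and is patched on first use, as with a lazily bound PLT stub; the other slot is filled in up front and called through directly, with no stub in between.

```c
#include <assert.h>

typedef int (*fn_t)(void);

/* Counts how often the pretend dynamic linker had to resolve foo. */
static int resolve_count;

/* Stands in for foo as defined in a shared library. */
static int foo_impl(void) { return 7; }

static int lazy_resolver(void);

/* Lazily bound slot: initially points at the resolver, not at foo. */
static fn_t got_foo_lazy = lazy_resolver;

/* First call lands here: patch the slot, then forward the call. */
static int lazy_resolver(void)
{
    assert(got_foo_lazy == lazy_resolver);  /* slot must still be unbound */
    resolve_count++;
    got_foo_lazy = foo_impl;
    return foo_impl();
}

/* Models the PLT stub: an extra hop that jumps through the slot. */
static int foo_plt(void) { return got_foo_lazy(); }

/* Eagerly bound slot: filled in before any call is made. */
static fn_t got_foo_eager = foo_impl;

/* Call site as generated today: direct call to the stub. */
int call_via_plt(void) { return foo_plt(); }

/* Call site with the stub removed: indirect call through the slot. */
int call_via_got(void) { return got_foo_eager(); }
```

In this model `call_via_plt` pays for the extra hop on every call and for resolution on the first one, while `call_via_got` never resolves lazily because its slot is already populated.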
What my proposal does is change foo.s to look like this:

callq *_Z3foov@GOTPCREL(%rip)

which is indirectly calling foo via a GOT entry that contains the
address of foo. The address in the GOT entry is fixed up at load time
and the linker creates only one GOT entry per function irrespective of
the number of callers. a.out now looks like this:

0000000000400746 <main>:
...
40074a: ff 15 20 14 00 00 callq *0x1420(%rip) # 401b70 <_DYNAMIC+0x1e8>
...

Function main indirectly calls foo using the contents at location
0x401b70 which is actually a GOT entry containing the address of foo.
Notice that we have in effect inlined the PLT stub.

This comes with caveats. This cannot be generally done for all
functions marked extern as it is impossible for the compiler to say if a
function is "truly extern" (defined in a shared library). If a function
is not truly extern (ends up defined in the final executable), then
calling it indirectly is a performance penalty as it could have been a
direct call. Further, the newly created GOT entries are fixed up at
start-up and do not get lazily bound.

Given this, I propose adding a new option called
-fno-plt=<function-name> to the compiler. This tells the compiler that
we know that the function is truly extern and we want the indirect call
only for these call-sites. I have attached a patch that adds -fno-plt=
to GCC. Any number of "-fno-plt=" can be specified and all call-sites
corresponding to these named functions will be done indirectly using the
mechanism described above without the use of a PLT stub.

Alternatively, we can do this entirely in the linker. We can introduce
a new relocation type to tell the linker to convert all direct calls to
truly extern functions into indirect calls via GOT entries. The GCC
patch just seems simpler. Also, we could link statically, but we do not
want that, or we could copy the specific external functions into our
executable.
This might work for executable A but a different set of external
functions might be hot for executable B. We want a more general
solution.

Please let me know what you think.

Thanks
Sri

[-- Attachment #2: avoid_plt_patch.txt --]
[-- Type: text/plain, Size: 4091 bytes --]

	* common.opt (-fno-plt=): New option.
	* config/i386/i386.c (avoid_plt_to_call): New function.
	(ix86_output_call_insn): Check if PLT needs to be avoided
	and call or jump indirectly if true.
	* opts-global.c (htab_str_eq): New function.
	(avoid_plt_fnsymbol_names_tab): New htab.
	(handle_common_deferred_options): Handle -fno-plt=

Index: common.opt
===================================================================
--- common.opt (revision 222641)
+++ common.opt (working copy)
@@ -1087,6 +1087,11 @@ fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
 -fdbg-cnt=<counter>:<limit>[,<counter>:<limit>,...] Set the debug counter limit.
 
+fno-plt=
+Common RejectNegative Joined Var(common_deferred_options) Defer
+-fno-plt=<symbol1> Avoid going through the PLT when calling the specified function.
+Allow multiple instances of this option with different function names.
+
 fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 Map one directory name to another in debug information
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 222641)
+++ config/i386/i386.c (working copy)
@@ -25282,6 +25282,25 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+extern htab_t avoid_plt_fnsymbol_names_tab;
+/* If the function referenced by call_op is to a external function
+   and calls via PLT must be avoided as specified by -fno-plt=, then
+   return true.  */
+
+static int
+avoid_plt_to_call(rtx call_op)
+{
+  const char *name;
+  if (GET_CODE (call_op) != SYMBOL_REF
+      || SYMBOL_REF_LOCAL_P (call_op)
+      || avoid_plt_fnsymbol_names_tab == NULL)
+    return 0;
+  name = XSTR (call_op, 0);
+  if (htab_find_slot (avoid_plt_fnsymbol_names_tab, name, NO_INSERT) != NULL)
+    return 1;
+  return 0;
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -25294,7 +25313,12 @@ ix86_output_call_insn (rtx insn, rtx call_op)
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-	xasm = "jmp\t%P0";
+	{
+	  if (avoid_plt_to_call (call_op))
+	    xasm = "jmp\t*%p0@GOTPCREL(%%rip)";
+	  else
+	    xasm = "jmp\t%P0";
+	}
       /* SEH epilogue detection requires the indirect branch case
	 to include REX.W.  */
       else if (TARGET_SEH)
@@ -25346,9 +25370,15 @@ ix86_output_call_insn (rtx insn, rtx call_op)
     }
 
   if (direct_p)
-    xasm = "call\t%P0";
+    {
+      if (avoid_plt_to_call (call_op))
+	xasm = "call\t*%p0@GOTPCREL(%%rip)";
+      else
+	xasm = "call\t%P0";
+    }
   else
     xasm = "call\t%A0";
+
   output_asm_insn (xasm, &call_op);
Index: opts-global.c
===================================================================
--- opts-global.c (revision 222641)
+++ opts-global.c (working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see
 #include "xregex.h"
 #include "attribs.h"
 #include "stringpool.h"
+#include "hash-table.h"
 
 typedef const char *const_char_p; /* For DEF_VEC_P.  */
 
@@ -420,6 +421,17 @@ decode_options (struct gcc_options *opts, struct g
   finish_options (opts, opts_set, loc);
 }
 
+/* Helper function for the hash table that compares the
+   existing entry (S1) with the given string (S2).  */
+
+static int
+htab_str_eq (const void *s1, const void *s2)
+{
+  return !strcmp ((const char *)s1, (const char *) s2);
+}
+
+htab_t avoid_plt_fnsymbol_names_tab = NULL;
+
 /* Process common options that have been deferred until after the
    handlers have been called for all options.  */
 
@@ -539,6 +551,15 @@ handle_common_deferred_options (void)
	  stack_limit_rtx = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (opt->arg));
	  break;
 
+	case OPT_fno_plt_:
+	  void **slot;
+	  if (avoid_plt_fnsymbol_names_tab == NULL)
+	    avoid_plt_fnsymbol_names_tab = htab_create (10, htab_hash_string,
+							htab_str_eq, NULL);
+	  slot = htab_find_slot (avoid_plt_fnsymbol_names_tab, opt->arg, INSERT);
+	  *slot = (void *)opt->arg;
+	  break;
+
	default:
	  gcc_unreachable ();
	}

^ permalink raw reply [flat|nested] 65+ messages in thread
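The choice that avoid_plt_to_call and ix86_output_call_insn make in the patch can be sketched in standalone C. This is only a minimal illustration under stated assumptions, not GCC code: no_plt_names, name_listed and call_template are hypothetical names, a fixed array with a linear scan stands in for the htab filled from -fno-plt=, and the is_local flag stands in for SYMBOL_REF_LOCAL_P.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the names collected from repeated
   -fno-plt=<name> options (the patch keeps them in an htab).  */
static const char *const no_plt_names[] = { "memcmp", "pow" };

/* Return 1 if NAME was listed, 0 otherwise.  */
static int
name_listed (const char *name)
{
  size_t i;
  for (i = 0; i < sizeof no_plt_names / sizeof no_plt_names[0]; i++)
    if (strcmp (no_plt_names[i], name) == 0)
      return 1;
  return 0;
}

/* Pick the output template for a direct call to SYMBOL.  IS_LOCAL is
   an assumption standing in for SYMBOL_REF_LOCAL_P: symbols known to
   bind locally always keep the plain direct call.  */
const char *
call_template (const char *symbol, int is_local)
{
  if (!is_local && name_listed (symbol))
    return "call\t*%p0@GOTPCREL(%%rip)";  /* indirect call via the GOT */
  return "call\t%P0";                     /* normal direct call */
}
```

Note that the patch compares against the assembler-level symbol name (for C++ that would be the mangled name, such as _Z3foov in the example from the proposal), so a real option value would have to match that spelling.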
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-01 0:31 Sriraman Tallam
@ 2015-05-01 3:21 ` Alan Modra
2015-05-01 3:26 ` Sriraman Tallam
2015-05-01 15:01 ` Andi Kleen
2015-05-04 14:45 ` Michael Matz
2 siblings, 1 reply; 65+ messages in thread
From: Alan Modra @ 2015-05-01 3:21 UTC (permalink / raw)
To: Sriraman Tallam; +Cc: GCC Patches, H.J. Lu, David Li
On Thu, Apr 30, 2015 at 05:31:30PM -0700, Sriraman Tallam wrote:
> This comes with caveats. This cannot be generally done for all
> functions marked extern as it is impossible for the compiler to say if
> a function is "truly extern" (defined in a shared library). If a
> function is not truly extern(ends up defined in the final executable),
> then calling it indirectly is a performance penalty as it could have
> been a direct call. Further, the newly created GOT entries are fixed
> up at start-up and do not get lazily bound.

I've considered something similar for PowerPC (but didn't consider
doing so for a subset of calls). Losing lazy symbol resolution is
a real problem. The other problem you cite of indirect calls that
could be direct can be fixed in the linker relatively easily.
Edit this code
   0: ff 15 00 00 00 00   callq *0x0(%rip)   # 0x6
      2: R_X86_64_GOTPCREL foo-0x4
   6: ff 25 00 00 00 00   jmpq *0x0(%rip)   # 0xc
      8: R_X86_64_GOTPCREL foo-0x4
to this
   c: e8 00 00 00 00   callq 0x11
      d: R_X86_64_PC32 foo-0x4
  11: 90   nop
  12: e9 00 00 00 00   jmpq 0x17
      13: R_X86_64_PC32 foo-0x4
  17: 90   nop
You may need to have gcc or gas add a marker reloc to say exactly
where an instruction starts.

--
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply [flat|nested] 65+ messages in thread
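The byte-level edit Alan describes can be sketched as a small C routine. This is a rough illustration rather than code from any real linker: relax_got_branch is a hypothetical name, and it assumes the linker already knows that the bytes at insn start an instruction (the point of the marker reloc), that the callee's final address is known, and that it lies within signed 32-bit range of the call site.

```c
#include <stdint.h>

/* Rewrite a 6-byte "call *disp32(%rip)" (ff 15) or
   "jmp *disp32(%rip)" (ff 25) at INSN into a 5-byte direct
   call (e8) or jmp (e9) followed by a 1-byte nop (90).
   INSN_ADDR is the address the instruction will run at and
   TARGET the final address of the callee.  Returns 1 when the
   bytes were rewritten, 0 when they were left untouched.  */
int
relax_got_branch (uint8_t *insn, uint64_t insn_addr, uint64_t target)
{
  uint8_t opcode;
  int64_t delta;
  uint32_t rel;

  if (insn[0] != 0xff)
    return 0;
  if (insn[1] == 0x15)
    opcode = 0xe8;              /* indirect call -> direct call */
  else if (insn[1] == 0x25)
    opcode = 0xe9;              /* indirect jmp -> direct jmp */
  else
    return 0;

  /* rel32 is measured from the end of the 5-byte direct branch.  */
  delta = (int64_t) (target - (insn_addr + 5));
  if (delta < INT32_MIN || delta > INT32_MAX)
    return 0;                   /* out of range: keep the GOT form */

  rel = (uint32_t) (int32_t) delta;
  insn[0] = opcode;
  insn[1] = (uint8_t) (rel & 0xff);          /* little-endian rel32 */
  insn[2] = (uint8_t) ((rel >> 8) & 0xff);
  insn[3] = (uint8_t) ((rel >> 16) & 0xff);
  insn[4] = (uint8_t) ((rel >> 24) & 0xff);
  insn[5] = 0x90;                            /* pad with nop */
  return 1;
}
```

Keeping the rewritten sequence at the same six bytes is what lets the edit happen in place without moving any other code.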
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-01 3:21 ` Alan Modra
@ 2015-05-01 3:26 ` Sriraman Tallam
0 siblings, 0 replies; 65+ messages in thread
From: Sriraman Tallam @ 2015-05-01 3:26 UTC (permalink / raw)
To: Sriraman Tallam, GCC Patches, H.J. Lu, David Li
On Thu, Apr 30, 2015 at 8:21 PM, Alan Modra <amodra@gmail.com> wrote:
> On Thu, Apr 30, 2015 at 05:31:30PM -0700, Sriraman Tallam wrote:
>> This comes with caveats. This cannot be generally done for all
>> functions marked extern as it is impossible for the compiler to say if
>> a function is "truly extern" (defined in a shared library). If a
>> function is not truly extern(ends up defined in the final executable),
>> then calling it indirectly is a performance penalty as it could have
>> been a direct call. Further, the newly created GOT entries are fixed
>> up at start-up and do not get lazily bound.
>
> I've considered something similar for PowerPC (but didn't consider
> doing do so for a subset of calls). Losing lazy symbol resolution is
> a real problem.

With -fno-plt= option, you are choosing functions that are hot and PLT
must be avoided. Losing lazy binding on these should be perfectly
fine because they would be called.

Thanks
Sri

> The other problem you cite of indirect calls that
> could be direct can be fixed in the linker relatively easily.
> Edit this code
>    0: ff 15 00 00 00 00   callq *0x0(%rip)   # 0x6
>       2: R_X86_64_GOTPCREL foo-0x4
>    6: ff 25 00 00 00 00   jmpq *0x0(%rip)   # 0xc
>       8: R_X86_64_GOTPCREL foo-0x4
> to this
>    c: e8 00 00 00 00   callq 0x11
>       d: R_X86_64_PC32 foo-0x4
>   11: 90   nop
>   12: e9 00 00 00 00   jmpq 0x17
>       13: R_X86_64_PC32 foo-0x4
>   17: 90   nop
> You may need to have gcc or gas add a marker reloc to say exactly
> where an instruction starts.
>
> --
> Alan Modra
> Australia Development Lab, IBM

^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-01 0:31 Sriraman Tallam 2015-05-01 3:21 ` Alan Modra @ 2015-05-01 15:01 ` Andi Kleen 2015-05-01 16:19 ` Xinliang David Li 2015-05-01 17:50 ` Sriraman Tallam 2015-05-04 14:45 ` Michael Matz 2 siblings, 2 replies; 65+ messages in thread From: Andi Kleen @ 2015-05-01 15:01 UTC (permalink / raw) To: Sriraman Tallam; +Cc: GCC Patches, H.J. Lu, David Li Sriraman Tallam <tmsriram@google.com> writes: > > This comes with caveats. This cannot be generally done for all > functions marked extern as it is impossible for the compiler to say if > a function is "truly extern" (defined in a shared library). If a > function is not truly extern(ends up defined in the final executable), > then calling it indirectly is a performance penalty as it could have > been a direct call. Further, the newly created GOT entries are fixed > up at start-up and do not get lazily bound. This means you need to make it depend on -fno-semantic-interposition ? > Given this, I propose adding a new option called > -fno-plt=<function-name> to the compiler. This tells the compiler > that we know that the function is truly extern and we want the > indirect call only for these call-sites. I have attached a patch that > adds -fno-plt= to GCC. Any number of "-fno-plt=" can be specified and > all call-sites corresponding to these named functions will be done > indirectly using the mechanism described above without the use of a > PLT stub. The argument seems awkward. The command line may get very long. Better an attribute? Longer term it would be probably better to support it properly in the linker. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-01 15:01 ` Andi Kleen @ 2015-05-01 16:19 ` Xinliang David Li 2015-05-01 16:23 ` H.J. Lu 2015-05-01 17:50 ` Sriraman Tallam 1 sibling, 1 reply; 65+ messages in thread From: Xinliang David Li @ 2015-05-01 16:19 UTC (permalink / raw) To: Andi Kleen; +Cc: Sriraman Tallam, GCC Patches, H.J. Lu On Fri, May 1, 2015 at 8:01 AM, Andi Kleen <andi@firstfloor.org> wrote: > Sriraman Tallam <tmsriram@google.com> writes: >> >> This comes with caveats. This cannot be generally done for all >> functions marked extern as it is impossible for the compiler to say if >> a function is "truly extern" (defined in a shared library). If a >> function is not truly extern(ends up defined in the final executable), >> then calling it indirectly is a performance penalty as it could have >> been a direct call. Further, the newly created GOT entries are fixed >> up at start-up and do not get lazily bound. > > This means you need to make it depend on -fno-semantic-interposition ? > >> Given this, I propose adding a new option called >> -fno-plt=<function-name> to the compiler. This tells the compiler >> that we know that the function is truly extern and we want the >> indirect call only for these call-sites. I have attached a patch that >> adds -fno-plt= to GCC. Any number of "-fno-plt=" can be specified and >> all call-sites corresponding to these named functions will be done >> indirectly using the mechanism described above without the use of a >> PLT stub. > > The argument seems awkward. The command line may get very long. > Better an attribute? They are complementary. Perhaps another option like linker's --dynamic-list=<> that can take a file specifying the list of symbols. > > Longer term it would be probably better to support it properly > in the linker. > Linker solution has its own downside -- it require reserving more space conservatively for many callsites which end up being direct calls. 
David > -Andi > > -- > ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-01 16:19 ` Xinliang David Li @ 2015-05-01 16:23 ` H.J. Lu 2015-05-01 16:26 ` Xinliang David Li 0 siblings, 1 reply; 65+ messages in thread From: H.J. Lu @ 2015-05-01 16:23 UTC (permalink / raw) To: Xinliang David Li; +Cc: Andi Kleen, Sriraman Tallam, GCC Patches On Fri, May 1, 2015 at 9:19 AM, Xinliang David Li <davidxl@google.com> wrote: > On Fri, May 1, 2015 at 8:01 AM, Andi Kleen <andi@firstfloor.org> wrote: >> Sriraman Tallam <tmsriram@google.com> writes: >>> >>> This comes with caveats. This cannot be generally done for all >>> functions marked extern as it is impossible for the compiler to say if >>> a function is "truly extern" (defined in a shared library). If a >>> function is not truly extern(ends up defined in the final executable), >>> then calling it indirectly is a performance penalty as it could have >>> been a direct call. Further, the newly created GOT entries are fixed >>> up at start-up and do not get lazily bound. >> >> This means you need to make it depend on -fno-semantic-interposition ? >> >>> Given this, I propose adding a new option called >>> -fno-plt=<function-name> to the compiler. This tells the compiler >>> that we know that the function is truly extern and we want the >>> indirect call only for these call-sites. I have attached a patch that >>> adds -fno-plt= to GCC. Any number of "-fno-plt=" can be specified and >>> all call-sites corresponding to these named functions will be done >>> indirectly using the mechanism described above without the use of a >>> PLT stub. >> >> The argument seems awkward. The command line may get very long. >> Better an attribute? > > They are complementary. Perhaps another option like linker's > --dynamic-list=<> that can take a file specifying the list of symbols. > >> >> Longer term it would be probably better to support it properly >> in the linker. 
>> > > Linker solution has its own downside -- it require reserving more > space conservatively for many callsites which end up being direct > calls. > Can we do it automatically for LTO? -- H.J. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-01 16:23 ` H.J. Lu @ 2015-05-01 16:26 ` Xinliang David Li 2015-05-01 18:06 ` Sriraman Tallam 0 siblings, 1 reply; 65+ messages in thread From: Xinliang David Li @ 2015-05-01 16:26 UTC (permalink / raw) To: H.J. Lu; +Cc: Andi Kleen, Sriraman Tallam, GCC Patches yes -- it is good to turn this on by default in LTO mode without requiring user to specify the option. David On Fri, May 1, 2015 at 9:23 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Fri, May 1, 2015 at 9:19 AM, Xinliang David Li <davidxl@google.com> wrote: >> On Fri, May 1, 2015 at 8:01 AM, Andi Kleen <andi@firstfloor.org> wrote: >>> Sriraman Tallam <tmsriram@google.com> writes: >>>> >>>> This comes with caveats. This cannot be generally done for all >>>> functions marked extern as it is impossible for the compiler to say if >>>> a function is "truly extern" (defined in a shared library). If a >>>> function is not truly extern(ends up defined in the final executable), >>>> then calling it indirectly is a performance penalty as it could have >>>> been a direct call. Further, the newly created GOT entries are fixed >>>> up at start-up and do not get lazily bound. >>> >>> This means you need to make it depend on -fno-semantic-interposition ? >>> >>>> Given this, I propose adding a new option called >>>> -fno-plt=<function-name> to the compiler. This tells the compiler >>>> that we know that the function is truly extern and we want the >>>> indirect call only for these call-sites. I have attached a patch that >>>> adds -fno-plt= to GCC. Any number of "-fno-plt=" can be specified and >>>> all call-sites corresponding to these named functions will be done >>>> indirectly using the mechanism described above without the use of a >>>> PLT stub. >>> >>> The argument seems awkward. The command line may get very long. >>> Better an attribute? >> >> They are complementary. 
Perhaps another option like linker's >> --dynamic-list=<> that can take a file specifying the list of symbols. >> >>> >>> Longer term it would be probably better to support it properly >>> in the linker. >>> >> >> Linker solution has its own downside -- it require reserving more >> space conservatively for many callsites which end up being direct >> calls. >> > > Can we do it automatically for LTO? > > > -- > H.J. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-01 16:26 ` Xinliang David Li @ 2015-05-01 18:06 ` Sriraman Tallam 2015-05-02 12:12 ` Andi Kleen 0 siblings, 1 reply; 65+ messages in thread From: Sriraman Tallam @ 2015-05-01 18:06 UTC (permalink / raw) To: Xinliang David Li; +Cc: H.J. Lu, Andi Kleen, GCC Patches On Fri, May 1, 2015 at 9:26 AM, Xinliang David Li <davidxl@google.com> wrote: > yes -- it is good to turn this on by default in LTO mode without > requiring user to specify the option. Yes, with LTO, we would exactly know what the "truly extern" functions are and PLT stubs can be eliminated for all extern functions when early binding is specified. With lazy binding, we can eliminate the PLT stubs selectively for the hot extern functions. Thanks Sri > > David > > On Fri, May 1, 2015 at 9:23 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Fri, May 1, 2015 at 9:19 AM, Xinliang David Li <davidxl@google.com> wrote: >>> On Fri, May 1, 2015 at 8:01 AM, Andi Kleen <andi@firstfloor.org> wrote: >>>> Sriraman Tallam <tmsriram@google.com> writes: >>>>> >>>>> This comes with caveats. This cannot be generally done for all >>>>> functions marked extern as it is impossible for the compiler to say if >>>>> a function is "truly extern" (defined in a shared library). If a >>>>> function is not truly extern(ends up defined in the final executable), >>>>> then calling it indirectly is a performance penalty as it could have >>>>> been a direct call. Further, the newly created GOT entries are fixed >>>>> up at start-up and do not get lazily bound. >>>> >>>> This means you need to make it depend on -fno-semantic-interposition ? >>>> >>>>> Given this, I propose adding a new option called >>>>> -fno-plt=<function-name> to the compiler. This tells the compiler >>>>> that we know that the function is truly extern and we want the >>>>> indirect call only for these call-sites. I have attached a patch that >>>>> adds -fno-plt= to GCC. 
Any number of "-fno-plt=" can be specified and >>>>> all call-sites corresponding to these named functions will be done >>>>> indirectly using the mechanism described above without the use of a >>>>> PLT stub. >>>> >>>> The argument seems awkward. The command line may get very long. >>>> Better an attribute? >>> >>> They are complementary. Perhaps another option like linker's >>> --dynamic-list=<> that can take a file specifying the list of symbols. >>> >>>> >>>> Longer term it would be probably better to support it properly >>>> in the linker. >>>> >>> >>> Linker solution has its own downside -- it require reserving more >>> space conservatively for many callsites which end up being direct >>> calls. >>> >> >> Can we do it automatically for LTO? >> >> >> -- >> H.J. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-01 18:06 ` Sriraman Tallam @ 2015-05-02 12:12 ` Andi Kleen 0 siblings, 0 replies; 65+ messages in thread From: Andi Kleen @ 2015-05-02 12:12 UTC (permalink / raw) To: Sriraman Tallam; +Cc: Xinliang David Li, H.J. Lu, Andi Kleen, GCC Patches On Fri, May 01, 2015 at 11:05:58AM -0700, Sriraman Tallam wrote: > On Fri, May 1, 2015 at 9:26 AM, Xinliang David Li <davidxl@google.com> wrote: > > yes -- it is good to turn this on by default in LTO mode without > > requiring user to specify the option. > > Yes, with LTO, we would exactly know what the "truly extern" functions > are ... unless a function is overwritten somewhere else at dynamic link time That's why you may need -fno-semantic... -Andi ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-01 15:01 ` Andi Kleen 2015-05-01 16:19 ` Xinliang David Li @ 2015-05-01 17:50 ` Sriraman Tallam 1 sibling, 0 replies; 65+ messages in thread From: Sriraman Tallam @ 2015-05-01 17:50 UTC (permalink / raw) To: Andi Kleen; +Cc: GCC Patches, H.J. Lu, David Li On Fri, May 1, 2015 at 8:01 AM, Andi Kleen <andi@firstfloor.org> wrote: > Sriraman Tallam <tmsriram@google.com> writes: >> >> This comes with caveats. This cannot be generally done for all >> functions marked extern as it is impossible for the compiler to say if >> a function is "truly extern" (defined in a shared library). If a >> function is not truly extern(ends up defined in the final executable), >> then calling it indirectly is a performance penalty as it could have >> been a direct call. Further, the newly created GOT entries are fixed >> up at start-up and do not get lazily bound. > > This means you need to make it depend on -fno-semantic-interposition ? Please correct me if I am wrong but I do not see any dependency on semantic-interposition. The GOT entry created for the function pointer (whose PLT has been eliminated) has a dynamic relocation against it to fixup the address at run-time and the dynamic linker fills it with the right address. This is not a new mechanism. The same mechanism is used when we access function pointers with PIE for instance. Thanks Sri > >> Given this, I propose adding a new option called >> -fno-plt=<function-name> to the compiler. This tells the compiler >> that we know that the function is truly extern and we want the >> indirect call only for these call-sites. I have attached a patch that >> adds -fno-plt= to GCC. Any number of "-fno-plt=" can be specified and >> all call-sites corresponding to these named functions will be done >> indirectly using the mechanism described above without the use of a >> PLT stub. > > The argument seems awkward. 
The command line may get very long. > Better an attribute? > > Longer term it would be probably better to support it properly > in the linker. > > -Andi > > -- > ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-01 0:31 Sriraman Tallam
2015-05-01 3:21 ` Alan Modra
2015-05-01 15:01 ` Andi Kleen
@ 2015-05-04 14:45 ` Michael Matz
2015-05-04 16:43 ` Xinliang David Li
2015-05-09 16:35 ` H.J. Lu
2 siblings, 2 replies; 65+ messages in thread
From: Michael Matz @ 2015-05-04 14:45 UTC (permalink / raw)
To: Sriraman Tallam; +Cc: GCC Patches, H.J. Lu, David Li
Hi,

On Thu, 30 Apr 2015, Sriraman Tallam wrote:

> We noticed that one of our benchmarks sped-up by ~1% when we eliminated
> PLT stubs for some of the hot external library functions like memcmp,
> pow. The win was from better icache and itlb performance. The main
> reason was that the PLT stubs had no spatial locality with the
> call-sites. I have started looking at ways to tell the compiler to
> eliminate PLT stubs (in-effect inline them) for specified external
> functions, for x86_64. I have a proposal and a patch and I would like to
> hear what you think.
>
> This comes with caveats. This cannot be generally done for all
> functions marked extern as it is impossible for the compiler to say if a
> function is "truly extern" (defined in a shared library). If a function
> is not truly extern(ends up defined in the final executable), then
> calling it indirectly is a performance penalty as it could have been a
> direct call.

This can be fixed by Alan's idea.

> Further, the newly created GOT entries are fixed up at
> start-up and do not get lazily bound.

And this can be fixed by some enhancements in the linker and dynamic
linker. The idea is to still generate a PLT stub and make its GOT entry
point to it initially (like a normal got.plt slot). Then the first
indirect call will use the address of the PLT entry (starting lazy
resolution) and update the GOT slot with the real address, so further
indirect calls will go directly to the function.

This requires a new asm marker (and hence new reloc) as normally if
there's a GOT slot it's filled with the real symbol's address, unlike if
there's only a got.plt slot. E.g. a

  call *foo@GOTPLT(%rip)

would generate a GOT slot (and fill its address into the above call
insn), but generate a JUMP_SLOT reloc in the final executable, not a
GLOB_DAT one.


Ciao,
Michael.

^ permalink raw reply [flat|nested] 65+ messages in thread
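The lazy scheme Michael outlines can be mimicked in plain C with a function pointer playing the GOT slot. This is only a simulation under stated assumptions, not linker or dynamic-linker code: real_foo, foo_lazy_stub, foo_got_slot, call_foo and resolutions are hypothetical names, and the "resolution" is a simple pointer store rather than a symbol lookup.

```c
typedef int (*fn_t) (void);

/* Counts how often the slow resolution path ran.  */
static int resolve_count;

/* Stand-in for the function that lives in a shared library.  */
static int
real_foo (void)
{
  return 42;
}

static int foo_lazy_stub (void);

/* Simulated GOT slot.  It initially points at the stub, the way a
   got.plt slot initially points back into its PLT entry.  */
static fn_t foo_got_slot = foo_lazy_stub;

/* Plays the role of the PLT entry plus the dynamic linker's resolver:
   on the first call it patches the slot, then finishes the call.  */
static int
foo_lazy_stub (void)
{
  resolve_count++;
  foo_got_slot = real_foo;   /* later calls bypass the stub */
  return real_foo ();
}

/* Call site, in the spirit of "call *foo@GOTPLT(%rip)": always an
   indirect call through the slot, never a direct call to the stub.  */
int
call_foo (void)
{
  return foo_got_slot ();
}

/* Expose the counter so a caller can observe the one-time resolution.  */
int
resolutions (void)
{
  return resolve_count;
}
```

Calling call_foo twice goes through the stub only the first time; the second call lands on real_foo directly, which is the behaviour the GOTPLT-style reloc is meant to preserve while still dropping the direct call into the PLT.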
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-04 14:45 ` Michael Matz @ 2015-05-04 16:43 ` Xinliang David Li 2015-05-04 16:58 ` Michael Matz 2015-05-09 16:35 ` H.J. Lu 1 sibling, 1 reply; 65+ messages in thread From: Xinliang David Li @ 2015-05-04 16:43 UTC (permalink / raw) To: Michael Matz; +Cc: Sriraman Tallam, GCC Patches, H.J. Lu The use case proposed by Sri allows user to selectively eliminate PLT overhead for hot external calls only. In such scenarios, lazy binding won't be something matters to the user. David On Mon, May 4, 2015 at 7:45 AM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Thu, 30 Apr 2015, Sriraman Tallam wrote: > >> We noticed that one of our benchmarks sped-up by ~1% when we eliminated >> PLT stubs for some of the hot external library functions like memcmp, >> pow. The win was from better icache and itlb performance. The main >> reason was that the PLT stubs had no spatial locality with the >> call-sites. I have started looking at ways to tell the compiler to >> eliminate PLT stubs (in-effect inline them) for specified external >> functions, for x86_64. I have a proposal and a patch and I would like to >> hear what you think. >> >> This comes with caveats. This cannot be generally done for all >> functions marked extern as it is impossible for the compiler to say if a >> function is "truly extern" (defined in a shared library). If a function >> is not truly extern(ends up defined in the final executable), then >> calling it indirectly is a performance penalty as it could have been a >> direct call. > > This can be fixed by Alans idea. > >> Further, the newly created GOT entries are fixed up at >> start-up and do not get lazily bound. > > And this can be fixed by some enhancements in the linker and dynamic > linker. The idea is to still generate a PLT stub and make its GOT entry > point to it initially (like a normal got.plt slot). 
Then the first > indirect call will use the address of PLT entry (starting lazy resolution) > and update the GOT slot with the real address, so further indirect calls > will directly go to the function. > > This requires a new asm marker (and hence new reloc) as normally if > there's a GOT slot it's filled by the real symbols address, unlike if > there's only a got.plt slot. E.g. a > > call *foo@GOTPLT(%rip) > > would generate a GOT slot (and fill its address into above call insn), but > generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one. > > > Ciao, > Michael. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-04 16:43 ` Xinliang David Li @ 2015-05-04 16:58 ` Michael Matz 2015-05-04 17:22 ` Xinliang David Li 0 siblings, 1 reply; 65+ messages in thread From: Michael Matz @ 2015-05-04 16:58 UTC (permalink / raw) To: Xinliang David Li; +Cc: Sriraman Tallam, GCC Patches, H.J. Lu Hi, On Mon, 4 May 2015, Xinliang David Li wrote: > The use case proposed by Sri allows user to selectively eliminate PLT > overhead for hot external calls only. Yes, but only _because_ his approach doesn't use lazy binding. With the full solution such restriction to a subset of functions isn't necessary. And we should strive for going the full way, instead of adding hacks, shouldn't we? Ciao, Michael. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= 2015-05-04 16:58 ` Michael Matz @ 2015-05-04 17:22 ` Xinliang David Li 0 siblings, 0 replies; 65+ messages in thread From: Xinliang David Li @ 2015-05-04 17:22 UTC (permalink / raw) To: Michael Matz; +Cc: Sriraman Tallam, GCC Patches, H.J. Lu yes -- a full solution that supports lazy binding will be nice. David On Mon, May 4, 2015 at 9:58 AM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Mon, 4 May 2015, Xinliang David Li wrote: > >> The use case proposed by Sri allows user to selectively eliminate PLT >> overhead for hot external calls only. > > Yes, but only _because_ his approach doesn't use lazy binding. With the > full solution such restriction to a subset of functions isn't necessary. > And we should strive for going the full way, instead of adding hacks, > shouldn't we? > > > Ciao, > Michael. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
2015-05-04 14:45 ` Michael Matz
2015-05-04 16:43 ` Xinliang David Li
@ 2015-05-09 16:35 ` H.J. Lu
1 sibling, 0 replies; 65+ messages in thread
From: H.J. Lu @ 2015-05-09 16:35 UTC (permalink / raw)
To: Michael Matz; +Cc: Sriraman Tallam, GCC Patches, David Li
On Mon, May 4, 2015 at 7:45 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Thu, 30 Apr 2015, Sriraman Tallam wrote:
>
>> We noticed that one of our benchmarks sped-up by ~1% when we eliminated
>> PLT stubs for some of the hot external library functions like memcmp,
>> pow. The win was from better icache and itlb performance. The main
>> reason was that the PLT stubs had no spatial locality with the
>> call-sites. I have started looking at ways to tell the compiler to
>> eliminate PLT stubs (in-effect inline them) for specified external
>> functions, for x86_64. I have a proposal and a patch and I would like to
>> hear what you think.
>>
>> This comes with caveats. This cannot be generally done for all
>> functions marked extern as it is impossible for the compiler to say if a
>> function is "truly extern" (defined in a shared library). If a function
>> is not truly extern(ends up defined in the final executable), then
>> calling it indirectly is a performance penalty as it could have been a
>> direct call.
>
> This can be fixed by Alans idea.
>
>> Further, the newly created GOT entries are fixed up at
>> start-up and do not get lazily bound.
>
> And this can be fixed by some enhancements in the linker and dynamic
> linker. The idea is to still generate a PLT stub and make its GOT entry
> point to it initially (like a normal got.plt slot). Then the first
> indirect call will use the address of PLT entry (starting lazy resolution)
> and update the GOT slot with the real address, so further indirect calls
> will directly go to the function.
>
> This requires a new asm marker (and hence new reloc) as normally if
> there's a GOT slot it's filled by the real symbols address, unlike if
> there's only a got.plt slot. E.g. a
>
> call *foo@GOTPLT(%rip)
>
> would generate a GOT slot (and fill its address into above call insn), but
> generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.
>

I added the "relax" prefix support to x86 assembler on users/hjl/relax
branch at

https://sourceware.org/git/?p=binutils-gdb.git;a=summary

[hjl@gnu-tools-1 relax-3]$ cat r.S
	.text
	relax jmp foo
	relax call foo
	relax jmp foo@plt
	relax call foo@plt
[hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
[hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o

r.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0: 66 e9 00 00 00 00   data16 jmpq 0x6     2: R_X86_64_RELAX_PC32 foo-0x4
   6: 66 e8 00 00 00 00   data16 callq 0xc    8: R_X86_64_RELAX_PC32 foo-0x4
   c: 66 e9 00 00 00 00   data16 jmpq 0x12    e: R_X86_64_RELAX_PLT32 foo-0x4
  12: 66 e8 00 00 00 00   data16 callq 0x18   14: R_X86_64_RELAX_PLT32 foo-0x4
[hjl@gnu-tools-1 relax-3]$

Right now, the relax relocations are treated as PC32/PLT32 relocations.
I am working on linker support.

--
H.J.

^ permalink raw reply [flat|nested] 65+ messages in thread