Date: Thu, 21 May 2015 21:31:00 -0000
Subject: Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
From: Sriraman Tallam
To: "H.J. Lu", Michael Matz
Cc: David Li, GCC Patches

On Sun, May 10, 2015 at 10:01 AM, Sriraman Tallam wrote:
>
> On Sun, May 10, 2015, 8:19 AM H.J. Lu wrote:
>
> On Sat, May 9, 2015 at 9:34 AM, H.J. Lu wrote:
>> On Mon, May 4, 2015 at 7:45 AM, Michael Matz wrote:
>>> Hi,
>>>
>>> On Thu, 30 Apr 2015, Sriraman Tallam wrote:
>>>
>>>> We noticed that one of our benchmarks sped up by ~1% when we eliminated
>>>> PLT stubs for some of the hot external library functions like memcmp,
>>>> pow. The win was from better icache and itlb performance. The main
>>>> reason was that the PLT stubs had no spatial locality with the
>>>> call-sites. I have started looking at ways to tell the compiler to
>>>> eliminate PLT stubs (in effect, inline them) for specified external
>>>> functions, for x86_64. I have a proposal and a patch and I would like to
>>>> hear what you think.
>>>>
>>>> This comes with caveats. This cannot be generally done for all
>>>> functions marked extern as it is impossible for the compiler to say if a
>>>> function is "truly extern" (defined in a shared library). If a function
>>>> is not truly extern (ends up defined in the final executable), then
>>>> calling it indirectly is a performance penalty as it could have been a
>>>> direct call.
>>>
>>> This can be fixed by Alan's idea.
>>>
>>>> Further, the newly created GOT entries are fixed up at
>>>> start-up and do not get lazily bound.
>>>
>>> And this can be fixed by some enhancements in the linker and dynamic
>>> linker. The idea is to still generate a PLT stub and make its GOT entry
>>> point to it initially (like a normal got.plt slot).
>>> Then the first indirect call will use the address of the PLT entry
>>> (starting lazy resolution) and update the GOT slot with the real
>>> address, so further indirect calls will go directly to the function.
>>>
>>> This requires a new asm marker (and hence a new reloc) as normally if
>>> there's a GOT slot it's filled with the real symbol's address, unlike if
>>> there's only a got.plt slot. E.g. a
>>>
>>> call *foo@GOTPLT(%rip)
>>>
>>> would generate a GOT slot (and fill its address into the above call
>>> insn), but generate a JUMP_SLOT reloc in the final executable, not a
>>> GLOB_DAT one.
>>>
>>
>> I added "relax" prefix support to the x86 assembler on the
>> users/hjl/relax branch at
>>
>> https://sourceware.org/git/?p=binutils-gdb.git;a=summary
>>
>> [hjl@gnu-tools-1 relax-3]$ cat r.S
>> .text
>> relax jmp foo
>> relax call foo
>> relax jmp foo@plt
>> relax call foo@plt
>> [hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
>> [hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o
>>
>> r.o: file format elf64-x86-64
>>
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <.text>:
>>    0: 66 e9 00 00 00 00    data16 jmpq 0x6     2: R_X86_64_RELAX_PC32 foo-0x4
>>    6: 66 e8 00 00 00 00    data16 callq 0xc    8: R_X86_64_RELAX_PC32 foo-0x4
>>    c: 66 e9 00 00 00 00    data16 jmpq 0x12    e: R_X86_64_RELAX_PLT32 foo-0x4
>>   12: 66 e8 00 00 00 00    data16 callq 0x18  14: R_X86_64_RELAX_PLT32 foo-0x4
>> [hjl@gnu-tools-1 relax-3]$
>>
>> Right now, the relax relocations are treated as PC32/PLT32 relocations.
>> I am working on linker support.
>>
>
> I implemented the linker support for x86-64:
>
> 00000000 <...>:
>    0: 48 83 ec 08          sub    $0x8,%rsp
>    4: e8 00 00 00 00       callq  9            5: R_X86_64_PC32 plt-0x4
>    9: e8 00 00 00 00       callq  e            a: R_X86_64_PLT32 plt-0x4
>    e: e8 00 00 00 00       callq  13           f: R_X86_64_PC32 bar-0x4
>   13: 66 e8 00 00 00 00    data16 callq 19    15: R_X86_64_RELAX_PC32 bar-0x4
>   19: 66 e8 00 00 00 00    data16 callq 1f    1b: R_X86_64_RELAX_PLT32 bar-0x4
>   1f: 66 e8 00 00 00 00    data16 callq 25    21: R_X86_64_RELAX_PC32 foo-0x4
>   25: 66 e8 00 00 00 00    data16 callq 2b    27: R_X86_64_RELAX_PLT32 foo-0x4
>   2b: 31 c0                xor    %eax,%eax
>   2d: 48 83 c4 08          add    $0x8,%rsp
>   31: c3                   retq
>
> 00400460 <...>:
>   400460: 48 83 ec 08          sub    $0x8,%rsp
>   400464: e8 d7 ff ff ff       callq  400440
>   400469: e8 d2 ff ff ff       callq  400440
>   40046e: e8 ad ff ff ff       callq  400420
>   400473: ff 15 ff 03 20 00    callq  *0x2003ff(%rip)   # 600878 <_DYNAMIC+0xf8>
>   400479: ff 15 f9 03 20 00    callq  *0x2003f9(%rip)   # 600878 <_DYNAMIC+0xf8>
>   40047f: 66 e8 f3 00 00 00    data16 callq 400578
>   400485: 66 e8 ed 00 00 00    data16 callq 400578
>   40048b: 31 c0                xor    %eax,%eax
>   40048d: 48 83 c4 08          add    $0x8,%rsp
>   400491: c3                   retq
>
> Sriraman, can you give it a try?

I like HJ's proposal here, and it is important that the linker turns
unnecessary indirect calls into direct ones. Independently of that,
however, I think my original proposal is still useful, and I want to
pitch it again for the following reasons.

AFAIU, Alexander Monakov's -fno-plt does not address the following:

* It does nothing for non-PIC code. The compiler does not generate a
  @PLT call there, but the linker still routes all external calls
  through the PLT. We noticed a problem with non-PIC executables where
  the PLT stubs were causing too many icache misses, and we are
  interested in a solution for this.

* It aggressively uses indirect calls even when the final symbol is not
  truly external. HJ's linker patch is needed to turn those unnecessary
  indirect calls back into direct calls.

My original proposal, for x86_64 only, was to add -fno-plt=, which takes
the name of a function and may be given multiple times. This lets the
user decide for which functions the PLT must be avoided. The compiler
then always generates an indirect call for those functions using
call *func@GOTPCREL(%rip). We could do this for non-PIC code too. No
linker fixups are needed, since this relies on the user knowing that
func is defined in a shared object.

I am reattaching the patch.

Thanks
Sri

>
> Thanks! Will do!
>
> Sri
>
> --
> H.J.
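To make the call paths discussed in this thread concrete, here is a plain-C analogy. It is only a model, assuming a function pointer stands in for a GOT slot and an ordinary C function stands in for the PLT stub; hot_target, got_slot, lazy_resolver, plt_stub, call_via_plt and call_via_got are hypothetical names, not anything GCC or ld emits. The slot starts out pointing at a resolver, as in Michael's lazy-binding suggestion: the first indirect call goes through the resolver, which patches the slot, and later calls reach the target directly.

```c
#include <assert.h>

/* Hypothetical stand-in for a hot external function such as memcmp or pow. */
int hot_target(int x) { return x * 2; }

/* Number of times the simulated dynamic linker has resolved the symbol. */
int resolve_count = 0;

int lazy_resolver(int x);

/* Simulated GOT slot.  Like a got.plt slot, it initially points at the
   resolution path rather than at the real function. */
int (*got_slot)(int) = lazy_resolver;

/* Simulated lazy resolver: binds the symbol, patches the slot, and forwards
   the call, so only the first call through the slot pays for resolution. */
int lazy_resolver(int x)
{
  assert(got_slot == lazy_resolver); /* runs only while the slot is unbound */
  resolve_count++;
  got_slot = hot_target;
  return hot_target(x);
}

/* Default path: the call site branches to a PLT-like stub, which then jumps
   through the slot.  The stub is the extra hop with no locality to the
   call site. */
int plt_stub(int x) { return got_slot(x); }
int call_via_plt(int x) { return plt_stub(x); }

/* -fno-plt= style path: the call site loads the slot itself, the C analogue
   of "call *func@GOTPCREL(%rip)". */
int call_via_got(int x) { return got_slot(x); }
```

Calling call_via_got(1) and then call_via_got(2) runs the resolver once, and resolve_count stays at 1 afterwards. Initializing got_slot to hot_target instead would model the original proposal, where the GOT entry is filled in at start-up and never lazily bound.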
Attachment: avoid_plt_patch.txt

	* common.opt (-fno-plt=): New option.
	* config/i386/i386.c (avoid_plt_to_call): New function.
	(ix86_output_call_insn):  Check if PLT needs to be avoided
	and call or jump indirectly if true.
	* opts-global.c (htab_str_eq): New function.
	(avoid_plt_fnsymbol_names_tab): New htab.
	(handle_common_deferred_options): Handle -fno-plt=

Index: common.opt
===================================================================
--- common.opt	(revision 222892)
+++ common.opt	(working copy)
@@ -1087,6 +1087,11 @@ fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
 -fdbg-cnt=<counter>:<limit>[,<counter>:<limit>,...]	Set the debug counter limit.
 
+fno-plt=
+Common RejectNegative Joined Var(common_deferred_options) Defer
+-fno-plt=<symbol1>  Avoid going through the PLT when calling the specified function.
+Allow multiple instances of this option with different function names.
+
 fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 Map one directory name to another in debug information
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 222892)
+++ config/i386/i386.c	(working copy)
@@ -25282,6 +25282,25 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+extern htab_t avoid_plt_fnsymbol_names_tab;
+/* If the function referenced by call_op is to a external function
+   and calls via PLT must be avoided as specified by -fno-plt=, then
+   return true.  */
+
+static int
+avoid_plt_to_call(rtx call_op)
+{
+  const char *name;
+  if (GET_CODE (call_op) != SYMBOL_REF
+      || SYMBOL_REF_LOCAL_P (call_op)
+      || avoid_plt_fnsymbol_names_tab == NULL)
+    return 0;
+  name = XSTR (call_op, 0);
+  if (htab_find_slot (avoid_plt_fnsymbol_names_tab, name, NO_INSERT) != NULL)
+    return 1;
+  return 0;
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -25294,7 +25313,12 @@ ix86_output_call_insn (rtx insn, rtx call_op)
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-	xasm = "jmp\t%P0";
+	{
+	  if (avoid_plt_to_call (call_op))
+	    xasm = "jmp\t*%p0@GOTPCREL(%%rip)";
+	  else
+	    xasm = "jmp\t%P0";
+	}
       /* SEH epilogue detection requires the indirect branch case
 	 to include REX.W.  */
       else if (TARGET_SEH)
@@ -25346,9 +25370,15 @@ ix86_output_call_insn (rtx insn, rtx call_op)
     }
 
   if (direct_p)
-    xasm = "call\t%P0";
+    {
+      if (avoid_plt_to_call (call_op))
+        xasm = "call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "call\t%P0";
+    }
   else
     xasm = "call\t%A0";
+ 
 
   output_asm_insn (xasm, &call_op);
 
Index: opts-global.c
===================================================================
--- opts-global.c	(revision 222892)
+++ opts-global.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "xregex.h"
 #include "attribs.h"
 #include "stringpool.h"
+#include "hash-table.h"
 
 typedef const char *const_char_p; /* For DEF_VEC_P.  */
 
@@ -420,6 +421,17 @@ decode_options (struct gcc_options *opts, struct g
   finish_options (opts, opts_set, loc);
 }
 
+/* Helper function for the hash table that compares the
+   existing entry (S1) with the given string (S2).  */
+
+static int
+htab_str_eq (const void *s1, const void *s2)
+{
+  return !strcmp ((const char *)s1, (const char *) s2);
+}
+
+htab_t avoid_plt_fnsymbol_names_tab = NULL;
+
 /* Process common options that have been deferred until after the
    handlers have been called for all options.  */
 
@@ -539,6 +551,15 @@ handle_common_deferred_options (void)
 	  stack_limit_rtx = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (opt->arg));
 	  break;
 
+        case OPT_fno_plt_:
+	  void **slot;
+	  if (avoid_plt_fnsymbol_names_tab == NULL)
+	    avoid_plt_fnsymbol_names_tab = htab_create (10, htab_hash_string,
+							htab_str_eq, NULL);
+          slot = htab_find_slot (avoid_plt_fnsymbol_names_tab, opt->arg, INSERT);
+          *slot = (void *)opt->arg;
+          break;
+
 	default:
 	  gcc_unreachable ();
 	}
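As a rough, self-contained model of the attached patch, the sketch below mirrors its two pieces: collecting the names passed through repeated -fno-plt= options, and choosing the assembler template for a direct call to a named symbol. It is an illustration, not GCC code: noplt_add, noplt_avoid and call_template are hypothetical names, a linear array replaces the libiberty htab the patch uses, and the symbol_is_local flag stands in for SYMBOL_REF_LOCAL_P. The template strings are copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for avoid_plt_fnsymbol_names_tab: a small fixed-size
   set of names collected from repeated -fno-plt=<name> options. */
#define NOPLT_MAX 16
static const char *noplt_names[NOPLT_MAX];
static size_t noplt_count = 0;

/* Analogue of the OPT_fno_plt_ case: record NAME once; a duplicate is
   ignored, much like htab_find_slot (..., INSERT) on an existing key.
   Returns false only if the set is full. */
bool noplt_add(const char *name)
{
  for (size_t i = 0; i < noplt_count; i++)
    if (strcmp(noplt_names[i], name) == 0)
      return true;
  if (noplt_count == NOPLT_MAX)
    return false;
  noplt_names[noplt_count++] = name;
  return true;
}

/* Analogue of avoid_plt_to_call: only symbols that are not known to be
   local and that were named on the command line bypass the PLT. */
bool noplt_avoid(const char *name, bool symbol_is_local)
{
  assert(name != NULL);
  if (symbol_is_local || noplt_count == 0)
    return false;
  for (size_t i = 0; i < noplt_count; i++)
    if (strcmp(noplt_names[i], name) == 0)
      return true;
  return false;
}

/* Analogue of the template choice in ix86_output_call_insn for a direct
   call; SIBLING selects the jmp form used for sibling (tail) calls. */
const char *call_template(const char *name, bool symbol_is_local, bool sibling)
{
  if (noplt_avoid(name, symbol_is_local))
    return sibling ? "jmp\t*%p0@GOTPCREL(%%rip)" : "call\t*%p0@GOTPCREL(%%rip)";
  return sibling ? "jmp\t%P0" : "call\t%P0";
}
```

For example, after noplt_add("memcmp"), call_template("memcmp", false, false) returns the GOTPCREL form, while a call to a local symbol, or to a name that was never listed, keeps the ordinary %P0 form. A linear scan is adequate for the handful of names a user would typically list; the patch's hash table only starts to matter if that list grows large.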