From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
Date: Thu, 21 May 2015 21:39:00 -0000
Subject: Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
From: Sriraman Tallam
To: "H.J. Lu", Michael Matz, Jan Hubicka, amonakov@ispras.ru
Cc: David Li, GCC Patches, Richard Biener
Content-Type: multipart/mixed; boundary=20cf307abdc9877c7f05169e43dc
X-SW-Source: 2015-05/txt/msg02044.txt.bz2

--20cf307abdc9877c7f05169e43dc
Content-Type: text/plain; charset=UTF-8

On Thu, May 21, 2015 at 2:12 PM, Sriraman Tallam wrote:
> On Sun, May 10, 2015 at 10:01 AM, Sriraman Tallam wrote:
>>
>> On Sun, May 10, 2015, 8:19 AM H.J. Lu wrote:
>>
>> On Sat, May 9, 2015 at 9:34 AM, H.J. Lu wrote:
>>> On Mon, May 4, 2015 at 7:45 AM, Michael Matz wrote:
>>>> Hi,
>>>>
>>>> On Thu, 30 Apr 2015, Sriraman Tallam wrote:
>>>>
>>>>> We noticed that one of our benchmarks sped up by ~1% when we eliminated
>>>>> PLT stubs for some of the hot external library functions like memcmp,
>>>>> pow.  The win was from better icache and itlb performance.  The main
>>>>> reason was that the PLT stubs had no spatial locality with the
>>>>> call-sites.  I have started looking at ways to tell the compiler to
>>>>> eliminate PLT stubs (in effect inline them) for specified external
>>>>> functions, for x86_64.  I have a proposal and a patch and I would like to
>>>>> hear what you think.
>>>>>
>>>>> This comes with caveats.  This cannot be generally done for all
>>>>> functions marked extern as it is impossible for the compiler to say if a
>>>>> function is "truly extern" (defined in a shared library).  If a function
>>>>> is not truly extern (ends up defined in the final executable), then
>>>>> calling it indirectly is a performance penalty as it could have been a
>>>>> direct call.
>>>>
>>>> This can be fixed by Alan's idea.
>>>>
>>>>> Further, the newly created GOT entries are fixed up at
>>>>> start-up and do not get lazily bound.
>>>>
>>>> And this can be fixed by some enhancements in the linker and dynamic
>>>> linker.
>>>> The idea is to still generate a PLT stub and make its GOT entry
>>>> point to it initially (like a normal got.plt slot).  Then the first
>>>> indirect call will use the address of the PLT entry (starting lazy
>>>> resolution) and update the GOT slot with the real address, so further
>>>> indirect calls will directly go to the function.
>>>>
>>>> This requires a new asm marker (and hence new reloc) as normally if
>>>> there's a GOT slot it's filled by the real symbol's address, unlike if
>>>> there's only a got.plt slot.  E.g. a
>>>>
>>>>   call *foo@GOTPLT(%rip)
>>>>
>>>> would generate a GOT slot (and fill its address into the above call
>>>> insn), but generate a JUMP_SLOT reloc in the final executable, not a
>>>> GLOB_DAT one.
>>>>
>>>
>>> I added the "relax" prefix support to the x86 assembler on the
>>> users/hjl/relax branch at
>>>
>>> https://sourceware.org/git/?p=binutils-gdb.git;a=summary
>>>
>>> [hjl@gnu-tools-1 relax-3]$ cat r.S
>>> .text
>>> relax jmp foo
>>> relax call foo
>>> relax jmp foo@plt
>>> relax call foo@plt
>>> [hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
>>> [hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o
>>>
>>> r.o: file format elf64-x86-64
>>>
>>>
>>> Disassembly of section .text:
>>>
>>> 0000000000000000 <.text>:
>>>    0:  66 e9 00 00 00 00    data16 jmpq 0x6     2: R_X86_64_RELAX_PC32   foo-0x4
>>>    6:  66 e8 00 00 00 00    data16 callq 0xc    8: R_X86_64_RELAX_PC32   foo-0x4
>>>    c:  66 e9 00 00 00 00    data16 jmpq 0x12    e: R_X86_64_RELAX_PLT32  foo-0x4
>>>   12:  66 e8 00 00 00 00    data16 callq 0x18  14: R_X86_64_RELAX_PLT32  foo-0x4
>>> [hjl@gnu-tools-1 relax-3]$
>>>
>>> Right now, the relax relocations are treated as PC32/PLT32 relocations.
>>> I am working on linker support.
>>>
>>
>> I implemented the linker support for x86-64:
>>
>> 00000000 :
>>    0:  48 83 ec 08          sub    $0x8,%rsp
>>    4:  e8 00 00 00 00       callq  9          5: R_X86_64_PC32         plt-0x4
>>    9:  e8 00 00 00 00       callq  e          a: R_X86_64_PLT32        plt-0x4
>>    e:  e8 00 00 00 00       callq  13         f: R_X86_64_PC32         bar-0x4
>>   13:  66 e8 00 00 00 00    data16 callq 19  15: R_X86_64_RELAX_PC32   bar-0x4
>>   19:  66 e8 00 00 00 00    data16 callq 1f  1b: R_X86_64_RELAX_PLT32  bar-0x4
>>   1f:  66 e8 00 00 00 00    data16 callq 25  21: R_X86_64_RELAX_PC32   foo-0x4
>>   25:  66 e8 00 00 00 00    data16 callq 2b  27: R_X86_64_RELAX_PLT32  foo-0x4
>>   2b:  31 c0                xor    %eax,%eax
>>   2d:  48 83 c4 08          add    $0x8,%rsp
>>   31:  c3                   retq
>>
>> 00400460 :
>>   400460:  48 83 ec 08          sub    $0x8,%rsp
>>   400464:  e8 d7 ff ff ff       callq  400440
>>   400469:  e8 d2 ff ff ff       callq  400440
>>   40046e:  e8 ad ff ff ff       callq  400420
>>   400473:  ff 15 ff 03 20 00    callq  *0x2003ff(%rip)   # 600878 <_DYNAMIC+0xf8>
>>   400479:  ff 15 f9 03 20 00    callq  *0x2003f9(%rip)   # 600878 <_DYNAMIC+0xf8>
>>   40047f:  66 e8 f3 00 00 00    data16 callq 400578
>>   400485:  66 e8 ed 00 00 00    data16 callq 400578
>>   40048b:  31 c0                xor    %eax,%eax
>>   40048d:  48 83 c4 08          add    $0x8,%rsp
>>   400491:  c3                   retq
>>
>> Sriraman, can you give it a try?
>
>
> I like HJ's proposal here and it is important that the linker fixes
> unnecessary indirect calls to direct ones.
>
> However, independently I think my original proposal is still useful
> and I want to pitch it again for the following reasons.
>
> AFAIU, Alexander Monakov's -fno-plt does not solve the following:
>
> * Does not do anything for non-PIC code.  The compiler does not
> generate a @PLT call but the linker will route all external calls via
> the PLT.  We noticed a problem with non-PIC executables where the PLT
> stubs were causing too many icache misses and are interested in a
> solution for this.
> * Aggressively uses indirect calls even if the final symbol is not
> truly external.  This needs HJ's linker patch to turn unnecessary
> indirect calls into direct calls.
>
> My original proposal, for x86_64 only, was to add
> -fno-plt=.  This lets the user decide for which
> functions the PLT must be avoided.  Let the compiler always generate an
> indirect call using call *func@GOTPCREL(%rip).  We could do this for
> non-PIC code too.  No need for linker fixups since this relies on the
> user to know that func is from a shared object.
>
> I am reattaching the patch.

I also want to add that my proposal with -fno-plt= does not take away
anything from the newly added -fno-plt option or the linker patch HJ
proposed.  It is orthogonal and lets the user decide the subset of
external functions for which the PLT must be avoided.
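To make the intent concrete, here is a minimal, self-contained C sketch
(not the attached patch itself) of the decision the proposal asks the
compiler to make: names given via -fno-plt= are remembered, and a call
to a remembered name is printed as an indirect call through its GOT
entry instead of a plain call.  The helper names add_no_plt_name,
avoid_plt_p and format_call, and the fixed-size array, are assumptions
made for illustration only; the patch keeps the names in a hash table
inside GCC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the set of names collected from repeated
   -fno-plt=NAME options.  A fixed array keeps the sketch short.  */
#define MAX_NO_PLT 16
const char *no_plt_names[MAX_NO_PLT];
size_t no_plt_count;

/* Record one -fno-plt=NAME argument.
   Returns 0 on success, -1 if the table is full.  */
int
add_no_plt_name (const char *name)
{
  assert (name != NULL);
  if (no_plt_count == MAX_NO_PLT)
    return -1;
  no_plt_names[no_plt_count++] = name;
  return 0;
}

/* Return 1 if calls to NAME were requested to bypass the PLT.  */
int
avoid_plt_p (const char *name)
{
  for (size_t i = 0; i < no_plt_count; i++)
    if (strcmp (no_plt_names[i], name) == 0)
      return 1;
  return 0;
}

/* Write the assembly text for a call to NAME into BUF: an indirect
   call through the GOT entry for listed names, a plain call otherwise.  */
void
format_call (const char *name, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  if (avoid_plt_p (name))
    snprintf (buf, len, "call\t*%s@GOTPCREL(%%rip)", name);
  else
    snprintf (buf, len, "call\t%s", name);
}
```

With memcmp registered through add_no_plt_name, format_call would
produce "call *memcmp@GOTPCREL(%rip)", while an unregistered name such
as pow would still get a plain "call pow".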
Thanks
Sri

>
> Thanks
> Sri
>
>
>>
>> Thanks! Will do!
>>
>> Sri
>>
>> --
>> H.J.
>>
>>

--20cf307abdc9877c7f05169e43dc
Content-Type: text/plain; charset=US-ASCII; name="avoid_plt_patch.txt"
Content-Disposition: attachment; filename="avoid_plt_patch.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_i9yovk781
Content-length: 5547

CSogY29tbW9uLm9wdCAoLWZuby1wbHQ9KTogTmV3IG9wdGlvbi4KCSogY29u ZmlnL2kzODYvaTM4Ni5jIChhdm9pZF9wbHRfdG9fY2FsbCk6IE5ldyBmdW5j dGlvbi4KCShpeDg2X291dHB1dF9jYWxsX2luc24pOiAgQ2hlY2sgaWYgUExU IG5lZWRzIHRvIGJlIGF2b2lkZWQKCWFuZCBjYWxsIG9yIGp1bXAgaW5kaXJl Y3RseSBpZiB0cnVlLgoJKiBvcHRzLWdsb2JhbC5jIChodGFiX3N0cl9lcSk6 IE5ldyBmdW5jdGlvbi4KCShhdm9pZF9wbHRfZm5zeW1ib2xfbmFtZXNfdGFi KTogTmV3IGh0YWIuCgkoaGFuZGxlX2NvbW1vbl9kZWZlcnJlZF9vcHRpb25z KTogSGFuZGxlIC1mbm8tcGx0PQoKSW5kZXg6IGNvbW1vbi5vcHQKPT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PQotLS0gY29tbW9uLm9wdAkocmV2aXNpb24gMjIy ODkyKQorKysgY29tbW9uLm9wdAkod29ya2luZyBjb3B5KQpAQCAtMTA4Nyw2 ICsxMDg3LDExIEBAIGZkYmctY250PQogQ29tbW9uIFJlamVjdE5lZ2F0aXZl IEpvaW5lZCBWYXIoY29tbW9uX2RlZmVycmVkX29wdGlvbnMpIERlZmVyCiAt ZmRiZy1jbnQ9PGNvdW50ZXI+OjxsaW1pdD5bLDxjb3VudGVyPjo8bGltaXQ+ LC4uLl0JU2V0IHRoZSBkZWJ1ZyBjb3VudGVyIGxpbWl0LiAgIAogCitmbm8t cGx0PQorQ29tbW9uIFJlamVjdE5lZ2F0aXZlIEpvaW5lZCBWYXIoY29tbW9u X2RlZmVycmVkX29wdGlvbnMpIERlZmVyCistZm5vLXBsdD08c3ltYm9sMT4g IEF2b2lkIGdvaW5nIHRocm91Z2ggdGhlIFBMVCB3aGVuIGNhbGxpbmcgdGhl IHNwZWNpZmllZCBmdW5jdGlvbi4KK0FsbG93IG11bHRpcGxlIGluc3RhbmNl cyBvZiB0aGlzIG9wdGlvbiB3aXRoIGRpZmZlcmVudCBmdW5jdGlvbiBuYW1l cy4KKwogZmRlYnVnLXByZWZpeC1tYXA9CiBDb21tb24gSm9pbmVkIFJlamVj dE5lZ2F0aXZlIFZhcihjb21tb25fZGVmZXJyZWRfb3B0aW9ucykgRGVmZXIK IE1hcCBvbmUgZGlyZWN0b3J5IG5hbWUgdG8gYW5vdGhlciBpbiBkZWJ1ZyBp bmZvcm1hdGlvbgpJbmRleDogY29uZmlnL2kzODYvaTM4Ni5jCj09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT0KLS0tIGNvbmZpZy9pMzg2L2kzODYuYwkocmV2aXNp b24gMjIyODkyKQorKysgY29uZmlnL2kzODYvaTM4Ni5jCSh3b3JraW5nIGNv
cHkpCkBAIC0yNTI4Miw2ICsyNTI4MiwyNSBAQCBpeDg2X2V4cGFuZF9jYWxs IChydHggcmV0dmFsLCBydHggZm5hZGRyLCBydHggY2FsbAogICByZXR1cm4g Y2FsbDsKIH0KIAorZXh0ZXJuIGh0YWJfdCBhdm9pZF9wbHRfZm5zeW1ib2xf bmFtZXNfdGFiOworLyogSWYgdGhlIGZ1bmN0aW9uIHJlZmVyZW5jZWQgYnkg Y2FsbF9vcCBpcyB0byBhIGV4dGVybmFsIGZ1bmN0aW9uCisgICBhbmQgY2Fs bHMgdmlhIFBMVCBtdXN0IGJlIGF2b2lkZWQgYXMgc3BlY2lmaWVkIGJ5IC1m bm8tcGx0PSwgdGhlbgorICAgcmV0dXJuIHRydWUuICAqLworCitzdGF0aWMg aW50Cithdm9pZF9wbHRfdG9fY2FsbChydHggY2FsbF9vcCkKK3sKKyAgY29u c3QgY2hhciAqbmFtZTsKKyAgaWYgKEdFVF9DT0RFIChjYWxsX29wKSAhPSBT WU1CT0xfUkVGCisgICAgICB8fCBTWU1CT0xfUkVGX0xPQ0FMX1AgKGNhbGxf b3ApCisgICAgICB8fCBhdm9pZF9wbHRfZm5zeW1ib2xfbmFtZXNfdGFiID09 IE5VTEwpCisgICAgcmV0dXJuIDA7CisgIG5hbWUgPSBYU1RSIChjYWxsX29w LCAwKTsKKyAgaWYgKGh0YWJfZmluZF9zbG90IChhdm9pZF9wbHRfZm5zeW1i b2xfbmFtZXNfdGFiLCBuYW1lLCBOT19JTlNFUlQpICE9IE5VTEwpCisgICAg cmV0dXJuIDE7CisgIHJldHVybiAwOworfQorCiAvKiBPdXRwdXQgdGhlIGFz c2VtYmx5IGZvciBhIGNhbGwgaW5zdHJ1Y3Rpb24uICAqLwogCiBjb25zdCBj aGFyICoKQEAgLTI1Mjk0LDcgKzI1MzEzLDEyIEBAIGl4ODZfb3V0cHV0X2Nh bGxfaW5zbiAocnR4IGluc24sIHJ0eCBjYWxsX29wKQogICBpZiAoU0lCTElO R19DQUxMX1AgKGluc24pKQogICAgIHsKICAgICAgIGlmIChkaXJlY3RfcCkK LQl4YXNtID0gImptcFx0JVAwIjsKKwl7CisJICBpZiAoYXZvaWRfcGx0X3Rv X2NhbGwgKGNhbGxfb3ApKQorCSAgICB4YXNtID0gImptcFx0KiVwMEBHT1RQ Q1JFTCglJXJpcCkiOworCSAgZWxzZQorCSAgICB4YXNtID0gImptcFx0JVAw IjsKKwl9CiAgICAgICAvKiBTRUggZXBpbG9ndWUgZGV0ZWN0aW9uIHJlcXVp cmVzIHRoZSBpbmRpcmVjdCBicmFuY2ggY2FzZQogCSB0byBpbmNsdWRlIFJF WC5XLiAgKi8KICAgICAgIGVsc2UgaWYgKFRBUkdFVF9TRUgpCkBAIC0yNTM0 Niw5ICsyNTM3MCwxNSBAQCBpeDg2X291dHB1dF9jYWxsX2luc24gKHJ0eCBp bnNuLCBydHggY2FsbF9vcCkKICAgICB9CiAKICAgaWYgKGRpcmVjdF9wKQot ICAgIHhhc20gPSAiY2FsbFx0JVAwIjsKKyAgICB7CisgICAgICBpZiAoYXZv aWRfcGx0X3RvX2NhbGwgKGNhbGxfb3ApKQorICAgICAgICB4YXNtID0gImNh bGxcdColcDBAR09UUENSRUwoJSVyaXApIjsKKyAgICAgIGVsc2UKKyAgICAg ICAgeGFzbSA9ICJjYWxsXHQlUDAiOworICAgIH0KICAgZWxzZQogICAgIHhh c20gPSAiY2FsbFx0JUEwIjsKKyAKIAogICBvdXRwdXRfYXNtX2luc24gKHhh 
c20sICZjYWxsX29wKTsKIApJbmRleDogb3B0cy1nbG9iYWwuYwo9PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09Ci0tLSBvcHRzLWdsb2JhbC5jCShyZXZpc2lvbiAy MjI4OTIpCisrKyBvcHRzLWdsb2JhbC5jCSh3b3JraW5nIGNvcHkpCkBAIC00 Nyw2ICs0Nyw3IEBAIGFsb25nIHdpdGggR0NDOyBzZWUgdGhlIGZpbGUgQ09Q WUlORzMuICBJZiBub3Qgc2VlCiAjaW5jbHVkZSAieHJlZ2V4LmgiCiAjaW5j bHVkZSAiYXR0cmlicy5oIgogI2luY2x1ZGUgInN0cmluZ3Bvb2wuaCIKKyNp bmNsdWRlICJoYXNoLXRhYmxlLmgiCiAKIHR5cGVkZWYgY29uc3QgY2hhciAq Y29uc3RfY2hhcl9wOyAvKiBGb3IgREVGX1ZFQ19QLiAgKi8KIApAQCAtNDIw LDYgKzQyMSwxNyBAQCBkZWNvZGVfb3B0aW9ucyAoc3RydWN0IGdjY19vcHRp b25zICpvcHRzLCBzdHJ1Y3QgZwogICBmaW5pc2hfb3B0aW9ucyAob3B0cywg b3B0c19zZXQsIGxvYyk7CiB9CiAKKy8qIEhlbHBlciBmdW5jdGlvbiBmb3Ig dGhlIGhhc2ggdGFibGUgdGhhdCBjb21wYXJlcyB0aGUKKyAgIGV4aXN0aW5n IGVudHJ5IChTMSkgd2l0aCB0aGUgZ2l2ZW4gc3RyaW5nIChTMikuICAqLwor CitzdGF0aWMgaW50CitodGFiX3N0cl9lcSAoY29uc3Qgdm9pZCAqczEsIGNv bnN0IHZvaWQgKnMyKQoreworICByZXR1cm4gIXN0cmNtcCAoKGNvbnN0IGNo YXIgKilzMSwgKGNvbnN0IGNoYXIgKikgczIpOworfQorCitodGFiX3QgYXZv aWRfcGx0X2Zuc3ltYm9sX25hbWVzX3RhYiA9IE5VTEw7CisKIC8qIFByb2Nl c3MgY29tbW9uIG9wdGlvbnMgdGhhdCBoYXZlIGJlZW4gZGVmZXJyZWQgdW50 aWwgYWZ0ZXIgdGhlCiAgICBoYW5kbGVycyBoYXZlIGJlZW4gY2FsbGVkIGZv ciBhbGwgb3B0aW9ucy4gICovCiAKQEAgLTUzOSw2ICs1NTEsMTUgQEAgaGFu ZGxlX2NvbW1vbl9kZWZlcnJlZF9vcHRpb25zICh2b2lkKQogCSAgc3RhY2tf bGltaXRfcnR4ID0gZ2VuX3J0eF9TWU1CT0xfUkVGIChQbW9kZSwgZ2djX3N0 cmR1cCAob3B0LT5hcmcpKTsKIAkgIGJyZWFrOwogCisgICAgICAgIGNhc2Ug T1BUX2Zub19wbHRfOgorCSAgdm9pZCAqKnNsb3Q7CisJICBpZiAoYXZvaWRf cGx0X2Zuc3ltYm9sX25hbWVzX3RhYiA9PSBOVUxMKQorCSAgICBhdm9pZF9w bHRfZm5zeW1ib2xfbmFtZXNfdGFiID0gaHRhYl9jcmVhdGUgKDEwLCBodGFi X2hhc2hfc3RyaW5nLAorCQkJCQkJCWh0YWJfc3RyX2VxLCBOVUxMKTsKKyAg ICAgICAgICBzbG90ID0gaHRhYl9maW5kX3Nsb3QgKGF2b2lkX3BsdF9mbnN5 bWJvbF9uYW1lc190YWIsIG9wdC0+YXJnLCBJTlNFUlQpOworICAgICAgICAg ICpzbG90ID0gKHZvaWQgKilvcHQtPmFyZzsKKyAgICAgICAgICBicmVhazsK KwogCWRlZmF1bHQ6CiAJICBnY2NfdW5yZWFjaGFibGUgKCk7CiAJfQo= --20cf307abdc9877c7f05169e43dc--
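
P.S. As an aside on the lazily bound GOT slot idea quoted earlier in
this message (the slot first points at a stub, the first indirect call
resolves the symbol and rewrites the slot, later calls go straight to
the function), the behaviour can be modelled in plain C with a function
pointer.  This is only a rough single-threaded model under my own
assumptions, not how the linker or the dynamic linker implement it; the
names got_slot, lazy_stub, real_foo, call_foo and resolve_count are
hypothetical.

```c
#include <assert.h>

typedef int (*fn_t) (int);

/* Stand-in for the real function that would live in a shared library.  */
int
real_foo (int x)
{
  return x + 1;
}

/* Counts how many times the slow resolution path has run.  */
int resolve_count;

int lazy_stub (int x);

/* Model of the GOT slot: it initially points at the stub, much like a
   got.plt slot initially points back into its PLT entry.  */
fn_t got_slot = lazy_stub;

/* Model of the PLT stub plus lazy resolver: look up the real address,
   patch the slot, then forward the original call.  */
int
lazy_stub (int x)
{
  /* The stub should only be reached while the slot is unresolved.  */
  assert (got_slot == lazy_stub);
  resolve_count++;
  got_slot = real_foo;
  return got_slot (x);
}

/* Model of an indirect call through the slot, as in
   call *foo@GOTPLT(%rip).  */
int
call_foo (int x)
{
  return got_slot (x);
}
```

In this model the first call_foo goes through lazy_stub once, and every
later call reaches real_foo directly through the updated slot, which is
the property the quoted suggestion is after.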