From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 28293 invoked by alias); 4 May 2015 14:45:50 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 27634 invoked by uid 89); 4 May 2015 14:45:50 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,T_RP_MATCHES_RCVD autolearn=no version=3.3.2
X-HELO: mx2.suse.de
Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Mon, 04 May 2015 14:45:49 +0000
Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 6F0EEAAB1; Mon, 4 May 2015 14:45:45 +0000 (UTC)
Date: Mon, 04 May 2015 14:45:00 -0000
From: Michael Matz
To: Sriraman Tallam
cc: GCC Patches, "H.J. Lu", David Li
Subject: Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
In-Reply-To:
Message-ID:
References:
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-IsSubscribed: yes
X-SW-Source: 2015-05/txt/msg00219.txt.bz2

Hi,

On Thu, 30 Apr 2015, Sriraman Tallam wrote:

> We noticed that one of our benchmarks sped-up by ~1% when we eliminated
> PLT stubs for some of the hot external library functions like memcmp,
> pow. The win was from better icache and itlb performance. The main
> reason was that the PLT stubs had no spatial locality with the
> call-sites. I have started looking at ways to tell the compiler to
> eliminate PLT stubs (in-effect inline them) for specified external
> functions, for x86_64. I have a proposal and a patch and I would like to
> hear what you think.
>
> This comes with caveats.
> This cannot be generally done for all
> functions marked extern as it is impossible for the compiler to say if a
> function is "truly extern" (defined in a shared library). If a function
> is not truly extern (ends up defined in the final executable), then
> calling it indirectly is a performance penalty as it could have been a
> direct call.

This can be fixed by Alan's idea.

> Further, the newly created GOT entries are fixed up at
> start-up and do not get lazily bound.

And this can be fixed by some enhancements in the linker and dynamic
linker. The idea is to still generate a PLT stub and make its GOT entry
point to it initially (like a normal got.plt slot). Then the first
indirect call will use the address of the PLT entry (starting lazy
resolution) and update the GOT slot with the real address, so further
indirect calls will go directly to the function.

This requires a new asm marker (and hence a new reloc), as normally if
there's a GOT slot it's filled with the real symbol's address, unlike if
there's only a got.plt slot. E.g. a

  call *foo@GOTPLT(%rip)

would generate a GOT slot (and fill its address into the above call
insn), but generate a JUMP_SLOT reloc in the final executable, not a
GLOB_DAT one.


Ciao,
Michael.
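
To make the lazy-binding idea above concrete, here is a minimal sketch in plain C that only models the behaviour; it is not linker or dynamic-linker code. All names (foo_got_slot, foo_plt_stub, real_foo, call_foo, resolve_count) are hypothetical stand-ins: a function pointer plays the GOT slot, and a small function plays the PLT stub together with the resolver.

```c
#include <assert.h>

typedef int (*fn_t)(int);

/* Counts how often the slow (resolving) path runs. Hypothetical, for
   illustration only. */
int resolve_count = 0;

/* Stand-in for the real function that would live in a shared library. */
static int real_foo(int x)
{
    return x + 1;
}

static int foo_plt_stub(int x);

/* Stand-in for the GOT slot. It initially points at the PLT stub, which
   is the state a JUMP_SLOT-style slot has before its first use. */
static fn_t foo_got_slot = foo_plt_stub;

/* Stand-in for the PLT stub plus the dynamic linker's resolver: look up
   the real address, patch the slot, then finish the original call. */
static int foo_plt_stub(int x)
{
    /* The stub is only reachable while the slot is still unpatched. */
    assert(foo_got_slot == foo_plt_stub);
    resolve_count++;
    foo_got_slot = real_foo;
    return real_foo(x);
}

/* Models the call site: an indirect call through the slot, roughly what
   `call *foo@GOTPLT(%rip)` would do. */
int call_foo(int x)
{
    return foo_got_slot(x);
}
```

In this model the first call_foo() goes through the stub and patches the slot; every later call reaches real_foo directly, so resolve_count stays at 1.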