From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 106679 invoked by alias); 6 May 2015 19:05:24 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 106661 invoked by uid 89); 6 May 2015 19:05:24 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.4 required=5.0
 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS
 autolearn=ham version=3.3.2
X-HELO: mail-oi0-f54.google.com
Received: from mail-oi0-f54.google.com (HELO mail-oi0-f54.google.com) (209.85.218.54)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a)
 with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 06 May 2015 19:05:22 +0000
Received: by oica37 with SMTP id a37so15582785oic.0 for ;
 Wed, 06 May 2015 12:05:20 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.202.62.212 with SMTP id l203mr131922oia.67.1430939120194;
 Wed, 06 May 2015 12:05:20 -0700 (PDT)
Received: by 10.76.54.14 with HTTP; Wed, 6 May 2015 12:05:20 -0700 (PDT)
In-Reply-To: <20150506190128.GL17573@brightrain.aerifal.cx>
References: <5547AD8D.9080806@redhat.com>
 <20150504173955.GE1751@tucnak.redhat.com>
 <5547AF7C.9030500@redhat.com>
 <20150506154554.GZ1751@tucnak.redhat.com>
 <20150506173521.GJ17573@brightrain.aerifal.cx>
 <20150506183735.GK17573@brightrain.aerifal.cx>
 <20150506190128.GL17573@brightrain.aerifal.cx>
Date: Wed, 06 May 2015 19:05:00 -0000
Message-ID:
Subject: Re: [PATCH] Expand PIC calls without PLT with -fno-plt
From: "H.J. Lu"
To: Rich Felker
Cc: Alexander Monakov, Jakub Jelinek, Jeff Law, GCC Patches
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2015-05/txt/msg00496.txt.bz2

On Wed, May 6, 2015 at 12:01 PM, Rich Felker wrote:
> On Wed, May 06, 2015 at 11:44:57AM -0700, H.J. Lu wrote:
>> On Wed, May 6, 2015 at 11:37 AM, Rich Felker wrote:
>> > On Wed, May 06, 2015 at 11:26:29AM -0700, H.J.
Lu wrote:
>> >> On Wed, May 6, 2015 at 10:35 AM, Rich Felker wrote:
>> >> > On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
>> >> >> On Wed, 6 May 2015, Jakub Jelinek wrote:
>> >> >> > The linker would know very well what kind of relocations are used for
>> >> >> > particular PLT slot, and for the new relocations which would resolve to the
>> >> >> > address of the .got.plt slot it could just tweak corresponding 3rd insn
>> >> >> > in the slot, to not jump to first plt slot - 16, but a few bytes before that
>> >> >> > that would just load the address of _G_O_T_ into %ebx and then fallthru
>> >> >> > into the 0x4c2b7310 snippet above. The lazy binding would be a few ticks
>> >> >> > slower in that case, but no requirement on %ebx to contain _G_O_T_.
>> >> >>
>> >> >> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT stub.
>> >> >
>> >> > Indeed. And the situation is the same on almost all targets. The only
>> >> > exceptions are those with direct PC-relative addressing (like x86_64)
>> >> > and those with reserved inter-procedural linkage registers and
>> >> > efficient PC-relative address loading via them (like ARM and AArch64).
>> >> > MIPS (o32) is also an interesting exception in that the normal ABI is
>> >> > already PLT-free, and while callees need a PIC register loaded, it's a
>> >> > call-clobbered register, not a call-saved one, so it doesn't make the
>> >> > same kind of trouble,
>> >> >
>> >> > I really don't see a need to make no-PLT code gen support lazy binding
>> >> > when it's necessarily going to be costly to do so, and precludes most
>> >> > of the benefits of the no-PLT approach. Anyone still wanting/needing
>> >> > lazy binding semantics can use PLT, and can even choose on a per-TU
>> >> > basis (or maybe even more fine-grained with pragmas/attributes?).
>> >> > Those of us who are suffering the cost of PLT with no benefits
>> >> > (because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
>> >> > adding -fno-plt) and enjoy something like a 10% performance boost in
>> >> > PIC/PIE.
>> >> >
>> >>
>> >> There are things compiler can do for performance and correctness
>> >> if it is told what options will be passed to linker. -z now is one and
>> >> -Bsymbolic is another one:
>> >>
>> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65886
>> >>
>> >> I think we should add -fnow and -fsymbolic. Together with LTO,
>> >> we can generate faster executables as well as shared libraries.
>> >
>> > I don't see how knowing about -Bsymbolic can help the compiler
>> > optimize. Without visibility, it can't know whether the symbols will
>> > be defined in the same DSO. With visibility, it can already do the
>> > equivalent hints. Perhaps it helps in the case where the symbol is
>> > already defined (and non-weak) in the same TU, but I think in this
>> > case it should already be optimizing the reference. Symbol
>> > interposition over top of a non-weak symbol from the same TU is always
>> > invalid and the compiler should not be pessimizing code to make it
>> > work.
>>
>> -Bsymbolic will bind all references to local definitions in shared libraries,
>> with and without visibility, weak or non-weak. Compiler can use it
>> in binds_tls_local_p and we can generate much better codes in shared
>> libraries.
>
> Yes, I'm aware of what it does. But at compile-time the compiler can't
> know whether the referenced symbol will be defined in the same DSO
> unless this is visibility annotation telling it. Even when linking a
> shared library using -Bsymbolic, the library code can still make calls
> (or data references) to symbols in other DSOs.

Even without LTO, -fsymbolic -fPIC will generate better code for

---
int glob_a = 1;
int foo () { return glob_a; }
---

and

---
int glob_a (void) { return -1; }
int foo () { return glob_a (); }
---

>> > As for -fnow, I haven't thought about it much but I also don't see
>> > many places where it could help. The only benefit that comes to mind
>> > is on targets with weak memory order, where it would eliminate some of
>> > the cost of synchronizing TLSDESC lazy bindings (see Szabolcs Nagy's
>> > work on AArch64). It might also benefit PLT calls on such targets, but
>> > you would get a lot more benefit from -fno-plt, and in that case -fnow
>> > would not allow any further optimization.
>>
>> -fno-plt doesn't work with lazy binding. -fnow tells compiler that
>> lazy binding is not used and it can optimize without PLT. With
>> -flto -fnow, compiler can make much better choices.
>
> Ah, I see now you had LTO in mind. In that case the compiler does know
> when the symbol is defined in the same DSO for -Bsymbolic. So that
> clears up the usefulness of your proposed -fsymbolic. I still don't
> see how -fnow would have a lot of practical usefulness, but I'm
> certainly not opposed to it.
>
> Rich

-- 
H.J.
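
For readers following the thread, the two glob_a snippets in this message can be
combined into one compilable sketch and set beside the visibility annotation
that Rich mentions. Everything beyond the original snippets is an assumption
made for illustration: the function form of glob_a is renamed get_a so that
both examples fit in one file, and glob_b, get_b, bar, bar_call, foo_call and
check_all are hypothetical names that do not appear in the thread. The
-fsymbolic and -fnow options discussed in this message are proposals; the
sketch does not use them and relies only on the GCC/Clang visibility attribute.

```c
#include <assert.h>

/* Default visibility, as in the snippets from the message.  When this
   file is built with -fPIC for a shared library, the compiler generally
   has to allow for another module interposing glob_a or get_a.  */
int glob_a = 1;
int get_a (void) { return -1; }   /* called glob_a in the message */

int foo (void) { return glob_a; }
int foo_call (void) { return get_a (); }

/* Hypothetical hidden-visibility counterparts (GCC/Clang attribute).
   These definitions cannot be interposed from outside the DSO, so the
   compiler is free to bind the references locally.  */
__attribute__ ((visibility ("hidden"))) int glob_b = 1;
__attribute__ ((visibility ("hidden"))) int get_b (void) { return -1; }

int bar (void) { return glob_b; }
int bar_call (void) { return get_b (); }

/* Both variants return the same values; only the generated code for
   the references is expected to differ.  */
void check_all (void)
{
  assert (foo () == bar ());
  assert (foo_call () == bar_call ());
}
```

Compiling this with something like gcc -O2 -fPIC -S and comparing foo with bar
should show the difference in how the references are resolved, although the
exact instructions depend on the target and the compiler version.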