From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 83063 invoked by alias); 20 May 2015 01:06:20 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 83054 invoked by uid 89); 20 May 2015 01:06:20 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.3 required=5.0
 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RDNS_DYNAMIC,TVD_RCVD_IP
 autolearn=no version=3.3.2
X-HELO: brightrain.aerifal.cx
Received: from 216-12-86-13.cv.mvl.ntelos.net (HELO brightrain.aerifal.cx) (216.12.86.13)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Wed, 20 May 2015 01:06:18 +0000
Received: from dalias by brightrain.aerifal.cx with local (Exim 3.15 #2)
 id 1YusSQ-0004ul-00; Wed, 20 May 2015 01:06:02 +0000
Date: Wed, 20 May 2015 01:09:00 -0000
From: Rich Felker
To: "H.J. Lu"
Cc: Richard Henderson, Michael Matz, Jan Hubicka, Alexander Monakov,
 GCC Patches, Uros Bizjak
Subject: Re: [PATCH i386] Allow sibcalls in no-PLT PIC
Message-ID: <20150520010602.GR17573@brightrain.aerifal.cx>
References: <20150519180659.GG17573@brightrain.aerifal.cx>
 <555B87F4.30908@redhat.com> <555B8ACD.20503@redhat.com>
 <20150519201557.GK17573@brightrain.aerifal.cx>
 <20150519205420.GL17573@brightrain.aerifal.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
X-SW-Source: 2015-05/txt/msg01770.txt.bz2

On Tue, May 19, 2015 at 05:10:11PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:54 PM, Rich Felker wrote:
> > On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 1:15 PM, Rich Felker wrote:
> >> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson wrote:
> >> >> > On 05/19/2015 12:06 PM, H.J.
Lu wrote:
> >> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson wrote:
> >> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >> >>>> I'm still mildly worried that concerns for supporting
> >> >> >>>> relaxation might lead to decisions not to optimize code in ways that
> >> >> >>>> would be difficult to relax (e.g. certain types of address load
> >> >> >>>> reordering or hoisting) but I don't understand GCC internals
> >> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >> >>>
> >> >> >>> It is. The relaxation that HJ is working on requires that the reads from the
> >> >> >>> got not be hoisted. I'm not especially convinced that what he's working on is
> >> >> >>> a win.
> >> >> >>>
> >> >> >>> With LTO, the compiler can do the same job that he's attempting in the linker,
> >> >> >>> without an extra nop. Without LTO, leaving it to the linker means that you
> >> >> >>> can't hoist the load and hide the memory latency.
> >> >> >>>
> >> >> >>
> >> >> >> My relax approach won't take away any optimization done by compiler.
> >> >> >> It simply turns indirect branch into direct branch with a nop prefix at
> >> >> >> link-time. I am having a hard time to understand why we shouldn't do it.
> >> >> >
> >> >> > I well understand what you're doing.
> >> >> >
> >> >> > But my point is that the only time the compiler should present you with the
> >> >> > form of indirect branch you're looking for is when there's no place to hoist
> >> >> > the load.
> >> >> >
> >> >> > At which point, is it really worth adding a new relocation to the ABI? Is it
> >> >> > really worth adding new code to the linker that won't be exercised often?
> >> >>
> >> >> I believe there are plenty of indirect branches via GOT when compiling
> >> >> PIE/PIC with -fno-plt:
> >> >>
> >> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> >> extern void foo (void);
> >> >>
> >> >> void
> >> >> bar (void)
> >> >> {
> >> >>   foo ();
> >> >> }
> >> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> >> [hjl@gnu-6 gcc]$ cat x.s
> >> >> .file "x.c"
> >> >> .section .text.unlikely,"ax",@progbits
> >> >> .LCOLDB0:
> >> >> .text
> >> >> .LHOTB0:
> >> >> .p2align 4,,15
> >> >> .globl bar
> >> >> .type bar, @function
> >> >> bar:
> >> >> .LFB0:
> >> >> .cfi_startproc
> >> >> jmp *foo@GOTPCREL(%rip)
> >> >> .cfi_endproc
> >> >> .LFE0:
> >> >> .size bar, .-bar
> >> >
> >> > I agree these exist. What I question is whether the savings from the
> >> > linker being able to relax this to a direct call in the case where the
> >> > programmer failed to let the compiler make it a direct call to begin
> >> > with (by using hidden or protected visibility) are worth the cost of
> >> > not being able to hoist the load out of loops or schedule it earlier
> >> > in cases where relaxation is not possible because the call target is
> >> > not defined in the same DSO.
> >>
> >> Just for fun. I compiled binutils as PIE with -fno-plt -flto:
> >>
> >> [hjl@gnu-mic-2 gas]$ file as-new
> >> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> >> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> >> stripped
> >> [hjl@gnu-mic-2 gas]$
> >>
> >> There are 43:
> >>
> >> ff 25 21 93 2d 00 jmpq *0x2d9321(%rip) # 3d5f58 <_DYNAMIC+0x1e8>
> >>
> >> and 1983
> >>
> >> ff 15 eb f4 38 00 callq *0x38f4eb(%rip) # 3d60e0 <_DYNAMIC+0x370>
> >
> > How many of those would be relaxed? I suspect it depends a lot on
> > whether libbfd is static or shared.
>
> When shared libraries are enabled, there are 177 indirect branches
> to locally defined functions.
> Call to any locally defined functions,
> which aren't compiled with LTO, is indirect.

And are the above indirect calls/jumps (1983+43) candidates for
scheduling/hoisting the address load (that's not being done yet), or
are they the ones the compiler opted not to schedule/hoist?

The win from relaxation seems small here, but as long as you're not
going to block optimizations that would preclude relaxing, I don't see
any disadvantages to doing it.

Rich
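
For anyone following along, here is a rough source-level sketch of the two
things being weighed in this thread: hoisting the address load out of a
loop, and using hidden visibility so the compiler can emit a direct call
in the first place. It is only an illustration under assumptions. The
names call_in_loop, call_hoisted, local_helper and call_count are made up
for this example, and foo is defined locally (unlike the extern foo in the
quoted x.c) just so the sketch links on its own. What the compiler
actually emits depends on the GCC version and flags such as -fPIC
-fno-plt; the C code models the idea and does not guarantee any
particular instruction sequence.

```c
#include <stddef.h>

/* Counter used only to make the sketch observable; hypothetical. */
static int calls;

/* Stand-in for an external, default-visibility function like foo() in
   the quoted x.c.  Defined here only so the example is self-contained. */
void foo(void) { calls++; }

/* Hidden visibility: the symbol cannot be interposed from outside the
   DSO, so the compiler is free to call it directly even under -fPIC. */
__attribute__((visibility("hidden"))) void local_helper(void) { calls++; }

/* With -fPIC -fno-plt, each call to foo() may go through its GOT slot.
   Whether the address load is repeated per iteration or hoisted is the
   compiler's decision. */
void call_in_loop(size_t n)
{
    for (size_t i = 0; i < n; i++)
        foo();
}

/* Source-level model of hoisting: read the function address once before
   the loop, then make indirect calls through the saved pointer. */
void call_hoisted(size_t n)
{
    void (*fp)(void) = foo;
    for (size_t i = 0; i < n; i++)
        fp();
}

/* Accessor for the counter; hypothetical helper for checking behavior. */
int call_count(void) { return calls; }
```

Comparing the output of -S for call_in_loop and call_hoisted with and
without -fno-plt should show where the GOT load ends up on a given
compiler.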