Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
In-Reply-To: <20150515194824.GB14415@kam.mff.cuni.cz>
References: <1430757479-14241-1-git-send-email-amonakov@ispras.ru> <1430757479-14241-5-git-send-email-amonakov@ispras.ru> <20150515194824.GB14415@kam.mff.cuni.cz>
Date: Fri, 15 May 2015 20:23:00 -0000
Subject: Re: [PATCH i386] Allow sibcalls in no-PLT PIC
From: "H.J. Lu"
To: Jan Hubicka
Cc: Alexander Monakov, GCC Patches, Rich Felker, Uros Bizjak

On Fri, May 15, 2015 at 12:48 PM, Jan Hubicka wrote:
>> On Fri, May 15, 2015 at 9:27 AM, Alexander Monakov wrote:
>> > Ping?  Any comment about this patch?
>> >
>> > On Mon, 4 May 2015, Alexander Monakov wrote:
>> >
>> >> With -fno-plt, we don't have to reject even direct calls as sibcall
>> >> candidates.
>> >>
>> >> This patch depends on '-fplt' flag that is introduced in another patch.
>> >>
>> >> This patch requires that with -fno-plt all sibcall candidates go through
>> >> prepare_call_address that transforms the call to a GOT lookup.
>> >>
>> >> OK?
>> >>       * config/i386/i386.c (ix86_function_ok_for_sibcall): Check flag_plt.
>> >>
>> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> >> index f29e053..b734350 100644
>> >> --- a/gcc/config/i386/i386.c
>> >> +++ b/gcc/config/i386/i386.c
>> >> @@ -5448,12 +5448,13 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>> >>    /* If we are generating position-independent code, we cannot sibcall
>> >>       optimize any indirect call, or a direct call to a global function,
>> >>       as the PLT requires %ebx be live. (Darwin does not have a PLT.)  */
>> >>    if (!TARGET_MACHO
>> >>        && !TARGET_64BIT
>> >>        && flag_pic
>> >> +      && flag_plt
>> >>        && (decl && !targetm.binds_local_p (decl)))
>> >>      return false;
>> >>
>> >>    /* If we need to align the outgoing stack, then sibcalling would
>> >>       unalign the stack, which may break the called function.  */
>> >>    if (ix86_minimum_incoming_stack_boundary (true)
>> >>
>>
>> I think it should be done via psABI change similar to
>>
>> https://groups.google.com/forum/#!topic/x86-64-abi/n8GYMpqvBxI
>>
>> which I have implemented on users/hjl/relax branch in binutils.
>
> OK, I am trying to understand how relax branch works and what difference it makes.
> As I understand it, the main purpose is to be able to make relaxed call of
>
>   call function
>
> that will, in 64bit mode, either result in RIP relative call with extra NOP just
> before the instruction if FUNCTION binds within the DSO or in indirect call through
> GOT bypassing the PLT.  This saves overhead of PLT and increases every such call
> by extra NOP for no-LTO builds and even in LTO when the symbol is defined but
> interposable.  This is actually a really nice trick.
>
> Now this is about 32bit mode where explicit GOT pointer register is needed
> (how does this work with large code model on x86-64?).  It is needed by PLT, but I suppose
> that to implement the same relaxation for 32bit it would need to use EBX to look up the
> GOT pointer, too, so the check above would still be valid.
>

With relax branch in 32-bit, there are 2 cases:

1. PIC or PIE:  We generate

set up EBX
relax call foo@PLT

It is almost the same as we do now, except for the relax prefix.
If foo is defined in another shared library or may be preempted,
linker will generate

call *foo@GOTPLT(%ebx)

If foo turns out local, linker will output

relax call foo

2. Non PIC/PIE:  We generate

relax call foo

If foo is defined in a DSO, linker will generate

call/jmp *foo@GOTPLT

We don't set up EBX in this case.  If foo turns out local, linker
will output

relax call foo

-- 
H.J.
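
The two 32-bit cases above can be condensed into a small decision function.
This is only an illustrative sketch in C: relaxed_call_32 and its parameters
are made-up names, not part of binutils or GCC, and the returned strings
simply repeat the instruction forms quoted in the message.

```c
#include <stdbool.h>

/* Hypothetical helper (an assumption for illustration, not binutils code):
   given how "foo" resolves at link time, return the instruction form the
   message says the linker would emit for a relaxed call in 32-bit mode.  */
static const char *
relaxed_call_32 (bool pic_or_pie, bool foo_is_local)
{
  /* In both cases a foo that turns out local keeps the direct call.  */
  if (foo_is_local)
    return "relax call foo";

  /* Case 1 (PIC or PIE): EBX has been set up, so the indirect call goes
     through the GOT slot relative to EBX.  */
  if (pic_or_pie)
    return "call *foo@GOTPLT(%ebx)";

  /* Case 2 (non PIC/PIE): EBX is not set up in this case.  */
  return "call/jmp *foo@GOTPLT";
}
```

Read this way, EBX only matters in the PIC/PIE case where foo is not local.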