From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 127158 invoked by alias); 3 Jun 2015 14:27:16 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 127130 invoked by uid 89); 3 Jun 2015 14:27:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.3 required=5.0 tests=AWL,BAYES_50,KAM_STOCKGEN,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 03 Jun 2015 14:27:15 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5CFEE28; Wed, 3 Jun 2015 07:27:17 -0700 (PDT) Received: from collaborate-mta1.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.72.51.249]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTP id B82773F21B; Wed, 3 Jun 2015 07:27:12 -0700 (PDT) Received: from [10.2.206.27] (e105545-lin.cambridge.arm.com [10.2.206.27]) by collaborate-mta1.arm.com (Postfix) with ESMTPS id 8F32E13F730; Wed, 3 Jun 2015 09:27:11 -0500 (CDT) Message-ID: <556F0EBF.9030005@arm.com> Date: Wed, 03 Jun 2015 14:37:00 -0000 From: Ramana Radhakrishnan User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Sriraman Tallam CC: Jan Hubicka , "H.J. Lu" , Pedro Alves , Michael Matz , David Li , GCC Patches Subject: Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt= References: <20150529193552.GA52215@kam.mff.cuni.cz> <556C16B1.5080606@arm.com> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2015-06/txt/msg00327.txt.bz2 Hi Sriraman, Thanks for the detailed explanation, that was useful. >> >> I'm sorry I'm going to push back again for the same reason. > > Let me describe the problem I am having in a little more detail: > > For the PIC case, I think there is no confusion. Both of us agree on > what is being done. Attribute no_plt exactly shadows -fno-plt and is > completely target independent. Agreed. > > For the non-PIC case, this is where some target dependent portions are > needed. This is because I simply cannot remove the flag_pic check in > calls.c and force the address onto a register. Lets say I did that > with this patch: Of-course I should have realized this earlier - sorry for being a pain. We need to load the value from the GOT (or an equivalent position independent manner) and that is entirely handled by the backends, there's no easy interface to do this from the mid-end. I tried a horrible hack in calls.c which was - int old_flag_pic = flag_pic; flag_pic = 1; funexp = force_reg (Pmode, funexp); flag_pic = old_flag_pic; We then have to relax quite a lot of checks in a number of places across backends to handle !flag_plt which ain't worth it. I agree now that it will be much cleaner just to punt this into the backend, so it may be worth noting that making this work properly for the non-PIC case requires quite a degree of massaging in the backends. Objections withdrawn. Thanks, Ramana > > Index: calls.c > =================================================================== > --- calls.c (revision 223720) > +++ calls.c (working copy) > @@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun > && targetm.small_register_classes_for_mode_p (FUNCTION_MODE)) > ? force_not_mem (memory_address (FUNCTION_MODE, funexp)) > : memory_address (FUNCTION_MODE, funexp)); > - else if (flag_pic && !flag_plt && fndecl_or_type > + else if (fndecl_or_type > && TREE_CODE (fndecl_or_type) == FUNCTION_DECL > + && (!flag_plt > + || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type))) > && !targetm.binds_local_p (fndecl_or_type)) > { > funexp = force_reg (Pmode, funexp); > > what would the code look like for this example below in the non-PIC case: > > __attribute__((no_plt)) > extern int foo(); > > int main () > { > return foo(); > } > > > Without -O2: > > mov _Z3foov, %eax > call *%eax > > The indirect call is there but this is wrong because this will force > the linker to still create a PLT entry for foo and use that address. > This is worse than calling the PLT directly as we end up calling the > PLT indirectly. > > Now, with -O2: > call *_Z3foov > > and again same story. The linker creates a PLT entry for foo and > calls foo_plt indirectly. > > What we really need to do in the non-PIC case, if we need a target > independent solution, is pretend that the call to foo is like a PIC > call when we see the attribute. I looked at how to do this and the > change to me seems pretty hairy and that is why it seemed like it is > better to handle this in the target directly. > > Thanks > Sri > > >> >> Other than forcing targets to tweak their call insn patterns, the act >> of generating the indirect call should remain in target independent >> code. Sorry, not having the same behaviour on all platforms for >> something like this is just a recipe for confusion. >> >> regards >> Ramana >> >>> >>> For PIC code, no_plt merely shadows the implementation of -fno-plt, no >>> surprises here. >>> >>> * c-family/c-common.c (no_plt): New attribute. >>> (handle_no_plt_attribute): New handler. >>> * calls.c (prepare_call_address): Check for no_plt >>> attribute. >>> * config/i386/i386.c (ix86_function_ok_for_sibcall): Check >>> for no_plt attribute. >>> (ix86_expand_call): Ditto. >>> (nopic_no_plt_attribute): New function. >>> (ix86_output_call_insn): Output indirect call for non-pic >>> no plt calls. >>> * doc/extend.texi (no_plt): Document new attribute. >>> * testsuite/gcc.target/i386/noplt-1.c: New test. >>> * testsuite/gcc.target/i386/noplt-2.c: New test. >>> * testsuite/gcc.target/i386/noplt-3.c: New test. >>> * testsuite/gcc.target/i386/noplt-4.c: New test. >>> >>> >>> Please review. >>> >>> Thanks >>> Sri >>> >>> >>>> >>>> To be honest, this is trivial to implement in the ARM backend as one >>>> would just piggy back on the longcalls work - despite that, IMNSHO >>>> it's best done in a target independent manner. >>>> >>>> regards >>>> Ramana >>>> >>>>> >>>>> Thanks >>>>> Sri >>>>> >>>>>> >>>>>> regards >>>>>> Ramana >>>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> I am not familiar with PLT calls for other targets. I can move the >>>>>>>>> tests to gcc.dg but what relocation are you suggesting I check for? >>>>>>>> >>>>>>>> >>>>>>>> Move the test to gcc.dg, add a target_support_no_plt function in >>>>>>>> testsuite/lib/target-supports.exp and mark this as being supported only on >>>>>>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other >>>>>>>> targets can add things as they deem fit. >>>>>>> >>>>>>>> >>>>>>>> In any case, on a large number of elf/ linux targets I would have thought >>>>>>>> the absence of a JMP_SLOT relocation would be good enough to check that this >>>>>>>> is working correctly. >>>>>>>> >>>>>>>> regards >>>>>>>> Ramana >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Sri >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ramana >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by >>>>>>>>>>>> ix86_expand_call. >>>>>>>>>>>> else >>>>>>>>>>>> { >>>>>>>>>>>> /* Static functions and indirect calls don't need the pic >>>>>>>>>>>> register. */ >>>>>>>>>>>> if (flag_pic >>>>>>>>>>>> && (!TARGET_64BIT >>>>>>>>>>>> || (ix86_cmodel == CM_LARGE_PIC >>>>>>>>>>>> && DEFAULT_ABI != MS_ABI)) >>>>>>>>>>>> && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF >>>>>>>>>>>> && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) >>>>>>>>>>>> { >>>>>>>>>>>> use_reg (&use, gen_rtx_REG (Pmode, >>>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM)); >>>>>>>>>>>> if (ix86_use_pseudo_pic_reg ()) >>>>>>>>>>>> emit_move_insn (gen_rtx_REG (Pmode, >>>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM), >>>>>>>>>>>> pic_offset_table_rtx); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> I think you want to take that away from FUSAGE there just like we do >>>>>>>>>>>> for >>>>>>>>>>>> local calls >>>>>>>>>>>> (and in fact the code should already check flag_pic && flag_plt I >>>>>>>>>>>> suppose. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Done that now and patch attached. >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Sri >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Honza >>>>>>>>> >>>>>>>>> >>>>>>>> >