Date: Thu, 04 Dec 2014 19:32:00 -0000
Subject: Re: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations
From: Uros Bizjak
To: "H.J. Lu"
Cc: "gcc-patches@gcc.gnu.org", Sriraman Tallam, Jakub Jelinek

On Thu, Dec 4, 2014 at 5:46 PM, H.J. Lu wrote:
>>>>>>>> It would probably help reviewers if you pointed to the actual patch
>>>>>>>> submission [1], which unfortunately contains the explanation in the
>>>>>>>> patch itself [2], which further explains that this functionality is
>>>>>>>> currently only supported with gold, patched with [3].
>>>>>>>>
>>>>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html
>>>>>>>> [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt
>>>>>>>> [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>>>>>>
>>>>>>>> After a bit of the above detective work, I think that a new gcc option
>>>>>>>> is not necessary. Configure should detect whether the new functionality
>>>>>>>> is supported in the linker, and auto-configure gcc to use it when
>>>>>>>> appropriate.
>>>>>>>
>>>>>>> I think a GCC option is needed since one can use -fuse-ld= to
>>>>>>> change the linker.
>>>>>>
>>>>>> IMO, nobody will use this highly special x86_64-only option. It would
>>>>>> be best for gnu-ld to reach feature parity with gold as far as this
>>>>>> functionality is concerned. In this case, the optimization would be
>>>>>> auto-configured, and would fire automatically, without any user
>>>>>> intervention.
>>>>>>
>>>>>
>>>>> Let's do it. I implemented the same feature in the bfd linker on both
>>>>> master and the 2.25 branch.
>>>>>
>>>>
>>>> +bool
>>>> +i386_binds_local_p (const_tree exp)
>>>> +{
>>>> +  /* Globals marked extern are treated as local when linker copy relocations
>>>> +     support is available with -f{pie|PIE}.  */
>>>> +  if (TARGET_64BIT && ix86_copyrelocs && flag_pie
>>>> +      && TREE_CODE (exp) == VAR_DECL
>>>> +      && DECL_EXTERNAL (exp) && !DECL_WEAK (exp))
>>>> +    return true;
>>>> +  return default_binds_local_p (exp);
>>>> +}
>>>> +
>>>>
>>>> It returns true with -fPIE and false without -fPIE. It is lying to the compiler.
>>>> Maybe legitimate_pic_address_disp_p is a better place.
>>
>> Agreed.
>>
>>> Something like this?
>>
>> Yes.
>>
>> OK, if Jakub doesn't have any objections here. Please also add
>> Sriraman as author to the ChangeLog entry.
>>
>> Thanks,
>> Uros.
>
> Here is the patch. OK to install?
>
> Thanks.
>
> --
> H.J.
> ---
> Normally, with -fPIE/-fpie, GCC accesses globals that are extern to the
> module through the GOT.
> This takes two instructions: one to load the address of the global
> from the GOT and another to load its value. If it turns out that the
> global gets defined in the executable at link time, the access still
> goes through the GOT, because by then it is too late to generate a
> direct access.
>
> Examples:
>
> foo.cc
> ------
> int a_glob;
> int main () {
>   return a_glob; // defined in this file
> }
>
> With -O2 -fpie -pie, the generated code accesses the global directly
> via a PC-relative instruction:
>
> 5e0 :
>    mov    0x165a(%rip),%eax        # 1c40
>
> foo.cc
> ------
>
> extern int a_glob;
> int main () {
>   return a_glob; // defined in this file
> }
>
> With -O2 -fpie -pie, the generated code accesses the global via the GOT
> using two memory loads:
>
> 6f0 :
>    mov    0x1609(%rip),%rax        # 1d00 <_DYNAMIC+0x230>
>    mov    (%rax),%eax
>
> This is true even if, in the latter case, the global was defined in the
> executable through a different file.
>
> Experiments on Google benchmarks show that the extra memory loads
> affect performance by 1% to 5%.
>
> Solution - Copy Relocations:
>
> When the linker supports copy relocations, GCC can always assume that
> the global will be defined in the executable. For globals that are truly
> extern (come from shared objects), the linker will create copy relocations
> and have them defined in the executable. As a result, no global access
> needs to go through the GOT, which improves performance.
>
> This optimization only applies to undefined, non-weak global data.
> Access to undefined, weak global data must still go through the GOT.
>
> This patch checks at configure time whether the linker supports PIE
> with copy relocations (supported by gold and by the bfd linker in
> binutils 2.25) and enables this optimization if the linker support is
> available.
>
> gcc/
>
>         * configure.ac (HAVE_LD_PIE_COPYRELOC): Defined to 1 if
>         Linux/x86-64 linker supports PIE with copy reloc.
>         * config.in: Regenerated.
>         * configure: Likewise.
>
>         * config/i386/i386.c (legitimate_pic_address_disp_p): Allow
>         pc-relative address for undefined, non-weak, non-function
>         symbol reference in 64-bit PIE if linker supports PIE with
>         copy reloc.
>
>         * doc/sourcebuild.texi: Document pie_copyreloc target.
>
> gcc/testsuite/
>
>         * gcc.target/i386/pie-copyrelocs-1.c: New test.
>         * gcc.target/i386/pie-copyrelocs-2.c: Likewise.
>         * gcc.target/i386/pie-copyrelocs-3.c: Likewise.
>         * gcc.target/i386/pie-copyrelocs-4.c: Likewise.
>
>         * lib/target-supports.exp (check_effective_target_pie_copyreloc):
>         New procedure.

OK.

Thanks,
Uros.