From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
Date: Tue, 03 Feb 2015 19:25:00 -0000
Subject: Re: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy
 relocations
From: Sriraman Tallam
To: "H.J. Lu"
Cc: Uros Bizjak, "gcc-patches@gcc.gnu.org", Jakub Jelinek
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2015-02/txt/msg00156.txt.bz2

On Thu, Dec 4, 2014 at 8:46 AM, H.J. Lu wrote:
> On Thu, Dec 4, 2014 at 4:44 AM, Uros Bizjak wrote:
>> On Wed, Dec 3, 2014 at 10:35 PM, H.J. Lu wrote:
>>
>>>>>>>> It would probably help reviewers if you pointed to actual patch
>>>>>>>> submission [1], which unfortunately contains the explanation in the
>>>>>>>> patch itself [2], which further explains that this functionality is
>>>>>>>> currently only supported with gold, patched with [3].
>>>>>>>>
>>>>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html
>>>>>>>> [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt
>>>>>>>> [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>>>>>>
>>>>>>>> After a bit of the above detective work, I think that new gcc option
>>>>>>>> is not necessary. The configure should detect if new functionality is
>>>>>>>> supported in the linker, and auto-configure gcc to use it when
>>>>>>>> appropriate.
>>>>>>>
>>>>>>> I think GCC option is needed since one can use -fuse-ld= to
>>>>>>> change linker.
>>>>>>
>>>>>> IMO, nobody will use this highly special x86_64-only option. It would
>>>>>> be best for gnu-ld to reach feature parity with gold as far as this
>>>>>> functionality is concerned. In this case, the optimization would be
>>>>>> auto-configured, and would fire automatically, without any user
>>>>>> intervention.
>>>>>>
>>>>>
>>>>> Let's do it. I implemented the same feature in bfd linker on both
>>>>> master and 2.25 branch.
>>>>>
>>>>
>>>> +bool
>>>> +i386_binds_local_p (const_tree exp)
>>>> +{
>>>> +  /* Globals marked extern are treated as local when linker copy relocations
>>>> +     support is available with -f{pie|PIE}.  */
>>>> +  if (TARGET_64BIT && ix86_copyrelocs && flag_pie
>>>> +      && TREE_CODE (exp) == VAR_DECL
>>>> +      && DECL_EXTERNAL (exp) && !DECL_WEAK (exp))
>>>> +    return true;
>>>> +  return default_binds_local_p (exp);
>>>> +}
>>>> +
>>>>
>>>> It returns true with -fPIE and false without -fPIE. It is lying to compiler.
>>>> Maybe legitimate_pic_address_disp_p is a better place.
>>
>> Agreed.
>>
>>> Something like this?
>>
>> Yes.
>>
>> OK, if Jakub doesn't have any objections here. Please also add
>> Sriraman as author to ChangeLog entry.
>>
>> Thanks,
>> Uros.
>
> Here is the patch. OK to install?
>
> Thanks.
>
> --
> H.J.
> ---
> Normally, with -fPIE/-fpie, GCC accesses globals that are extern to the
> module using the GOT. This is two instructions, one to get the address
> of the global from the GOT and the other to get the value. If it turns
> out that the global gets defined in the executable at link-time, it still
> needs to go through the GOT as it is too late then to generate a direct
> access.
>
> Examples:
>
> foo.cc
> ------
> int a_glob;
> int main () {
>   return a_glob; // defined in this file
> }
>
> With -O2 -fpie -pie, the generated code directly accesses the global via
> PC-relative insn:
>
> 5e0 <main>:
>   mov    0x165a(%rip),%eax        # 1c40
>
> foo.cc
> ------
>
> extern int a_glob;
> int main () {
>   return a_glob; // declared extern, not defined in this file
> }
>
> With -O2 -fpie -pie, the generated code accesses global via GOT using
> two memory loads:
>
> 6f0 <main>:
>   mov    0x1609(%rip),%rax        # 1d00 <_DYNAMIC+0x230>
>   mov    (%rax),%eax
>
> This is true even if in the latter case the global was defined in the
> executable through a different file.
>
> Some experiments on google benchmarks show that the extra memory loads
> affect performance by 1% to 5%.
>
> Solution - Copy Relocations:
>
> When the linker supports copy relocations, GCC can always assume that
> the global will be defined in the executable. For globals that are truly
> extern (come from shared objects), the linker will create copy relocations
> and have them defined in the executable. Result is that no global access
> needs to go through the GOT and hence improves performance.
>
> This optimization only applies to undefined, non-weak global data.
> Undefined, weak global data access still must go through the GOT.

Hi H.J.,

This was the original patch to i386.c to let global accesses take
advantage of copy relocations and avoid the GOT.

@@ -13113,7 +13113,11 @@ legitimate_pic_address_disp_p (rtx disp)
                return true;
            }
          else if (!SYMBOL_REF_FAR_ADDR_P (op0)
-                  && SYMBOL_REF_LOCAL_P (op0)
+                  && (SYMBOL_REF_LOCAL_P (op0)
+                      || (HAVE_LD_PIE_COPYRELOC
+                          && flag_pie
+                          && !SYMBOL_REF_WEAK (op0)
+                          && !SYMBOL_REF_FUNCTION_P (op0)))
                   && ix86_cmodel != CM_LARGE_PIC)

I do not understand why weak global data access must go through the GOT
and cannot use copy relocations. Ultimately, there is only going to be
one copy of the global, defined either in the executable or in the
shared object, right? Can we remove the check for SYMBOL_REF_WEAK?

Thanks
Sri

>
> This patch checks if linker supports PIE with copy reloc, which is
> enabled in gold and bfd linker in binutils 2.25, at configure time
> and enables this optimization if the linker support is available.
>
> gcc/
>
>     * configure.ac (HAVE_LD_PIE_COPYRELOC): Defined to 1 if
>     Linux/x86-64 linker supports PIE with copy reloc.
>     * config.in: Regenerated.
>     * configure: Likewise.
>
>     * config/i386/i386.c (legitimate_pic_address_disp_p): Allow
>     pc-relative address for undefined, non-weak, non-function
>     symbol reference in 64-bit PIE if linker supports PIE with
>     copy reloc.
>
>     * doc/sourcebuild.texi: Document pie_copyreloc target.
>
> gcc/testsuite/
>
>     * gcc.target/i386/pie-copyrelocs-1.c: New test.
>     * gcc.target/i386/pie-copyrelocs-2.c: Likewise.
>     * gcc.target/i386/pie-copyrelocs-3.c: Likewise.
>     * gcc.target/i386/pie-copyrelocs-4.c: Likewise.
>
>     * lib/target-supports.exp (check_effective_target_pie_copyreloc):
>     New procedure.
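
P.S. To make the question about SYMBOL_REF_WEAK easier to discuss, here is
a rough standalone model of the condition in the legitimate_pic_address_disp_p
hunk. It is only a sketch: struct sym_info, struct build_cfg and
can_use_pcrel are hypothetical names made up for illustration, the booleans
merely stand in for the GCC macros named in the comments, and none of this
is actual GCC code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the symbol properties tested in
   the hunk; these are not GCC's real data structures.  */
struct sym_info
{
  bool far_addr;     /* models SYMBOL_REF_FAR_ADDR_P */
  bool binds_local;  /* models SYMBOL_REF_LOCAL_P */
  bool is_weak;      /* models SYMBOL_REF_WEAK */
  bool is_function;  /* models SYMBOL_REF_FUNCTION_P */
};

/* Hypothetical stand-in for the build configuration.  */
struct build_cfg
{
  bool have_ld_pie_copyreloc;  /* models HAVE_LD_PIE_COPYRELOC */
  bool flag_pie;               /* models flag_pie (-fpie/-fPIE) */
  bool large_pic_model;        /* models ix86_cmodel == CM_LARGE_PIC */
};

/* Returns true when the modelled condition permits a direct PC-relative
   access, and false when the access would go through the GOT.  */
static bool
can_use_pcrel (const struct sym_info *s, const struct build_cfg *c)
{
  /* The new alternative from the patch: rely on a copy relocation for
     undefined, non-weak, non-function symbols in PIE.  */
  bool copyreloc_ok = c->have_ld_pie_copyreloc
                      && c->flag_pie
                      && !s->is_weak
                      && !s->is_function;

  return !s->far_addr
         && (s->binds_local || copyreloc_ok)
         && !c->large_pic_model;
}
```

In this model, dropping the SYMBOL_REF_WEAK check would amount to removing
the !s->is_weak term, so weak extern data would take the same PC-relative
path as non-weak extern data whenever the linker support is present.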