From: "H.J. Lu"
To: ramrad01@arm.com
Cc: gcc-patches, Evgeny Stupachenko, Sriraman Tallam, Uros Bizjak
Date: Wed, 22 Apr 2015 23:08:00 -0000
Subject: Re: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc
References: <20150422163432.GA1053@intel.com>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm

On Wed, Apr 22, 2015 at 3:15 PM, Ramana Radhakrishnan wrote:
> On Wed, Apr 22, 2015 at 5:34 PM, H.J. Lu wrote:
>> Normally, with PIE, GCC accesses globals that are extern to the module
>> using GOT. This is two instructions, one to get the address of the global
>> from GOT and the other to get the value.
>> Examples:
>>
>> ---
>> extern int a_glob;
>> int
>> main ()
>> {
>>   return a_glob;
>> }
>> ---
>>
>> With PIE, the generated code accesses global via GOT using two memory
>> loads:
>>
>>         movq    a_glob@GOTPCREL(%rip), %rax
>>         movl    (%rax), %eax
>>
>> for 64-bit or
>>
>>         movl    a_glob@GOT(%ecx), %eax
>>         movl    (%eax), %eax
>>
>> for 32-bit.
>>
>> Some experiments on google and SPEC CPU benchmarks show that the extra
>> instruction affects performance by 1% to 5%.
>>
>> Solution - Copy Relocations:
>>
>> When the linker supports copy relocations, GCC can always assume that
>> the global will be defined in the executable. For globals that are
>> truly extern (come from shared objects), the linker will create copy
>> relocations and have them defined in the executable. Result is that
>> no global access needs to go through GOT and hence improves performance.
>> We can generate
>>
>>         movl    a_glob(%rip), %eax
>>
>> for 64-bit and
>>
>>         movl    a_glob@GOTOFF(%eax), %eax
>>
>> for 32-bit. This optimization only applies to undefined non-weak
>> non-TLS global data. Undefined weak global or TLS data access still
>> must go through GOT.
>>
>> This patch reverts legitimate_pic_address_disp_p change made in revision
>> 218397, which only applies to x86-64. Instead, this patch updates
>> targetm.binds_local_p to indicate if undefined non-weak non-TLS global
>> data is defined locally in PIE. It also introduces a new target hook,
>> binds_tls_local_p to distinguish TLS variable from non-TLS variable. By
>> default, binds_tls_local_p is the same as binds_local_p.
>>
>> This patch checks if 32-bit and 64-bit linkers support PIE with copy
>> reloc at configure time. 64-bit linker is enabled in binutils 2.25
>> and 32-bit linker is enabled in binutils 2.26. This optimization
>> is enabled only if the linker support is available.
>>
>> Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without
>> support for copy relocation in PIE. OK for trunk?
>>
>> Thanks.
>
>
> Looking at this my first reaction was that surely most (if not all ? )
> targets that use ELF and had copy relocs would benefit from this ?
> Couldn't we find a simpler way for targets to have this support ? I
> don't have a more constructive suggestion to make at the minute but
> getting this to work just from the targetm.binds_local_p (decl)
> interface would probably be better ?

default_binds_local_p_3 is a global function that the x86 backend uses
to implement targetm.binds_local_p. Any backend can use it to optimize
for copy relocation.

-- 
H.J.
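
For readers following the thread, the rule stated in the quoted patch
description (in PIE, undefined non-weak non-TLS global data may skip the
GOT when the linker supports copy relocations) can be sketched in C. This
is an illustrative model only, not GCC code: struct extern_data, the
functions direct_access_ok and x86_64_load_sequence, and the flag
linker_has_pie_copyreloc are hypothetical names and do not correspond to
default_binds_local_p_3 or any other real GCC interface. The sketch assumes
PIE code and a data symbol that is undefined (extern) in the module; the
instruction strings are the x86-64 sequences quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of an undefined (extern) data symbol
   referenced from PIE code.  Not a GCC data structure.  */
struct extern_data {
    bool is_weak;   /* declared weak */
    bool is_tls;    /* thread-local storage */
};

/* x86-64 access sequences quoted in the patch description.  */
static const char GOT_SEQ[] =
    "movq a_glob@GOTPCREL(%rip), %rax\n"
    "movl (%rax), %eax";
static const char DIRECT_SEQ[] =
    "movl a_glob(%rip), %eax";

/* Assumption: returns true when the symbol can be treated as defined
   in the executable, i.e. the linker will satisfy it with a copy
   relocation.  Weak and TLS symbols are excluded, as the patch
   description states.  */
bool direct_access_ok(const struct extern_data *sym,
                      bool linker_has_pie_copyreloc)
{
    assert(sym != NULL);
    return linker_has_pie_copyreloc && !sym->is_weak && !sym->is_tls;
}

/* Picks the one-instruction direct load when allowed, otherwise the
   two-instruction load through the GOT.  */
const char *x86_64_load_sequence(const struct extern_data *sym,
                                 bool linker_has_pie_copyreloc)
{
    return direct_access_ok(sym, linker_has_pie_copyreloc)
               ? DIRECT_SEQ
               : GOT_SEQ;
}
```

With a plain extern int such as a_glob and a linker that supports copy
relocations in PIE, the model selects the single RIP-relative load; a weak
or TLS symbol, or a linker without that support, falls back to the GOT
sequence.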