public inbox for gcc-patches@gcc.gnu.org
* [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations
@ 2014-05-15 18:34 Sriraman Tallam
  2014-05-19 18:11 ` Sriraman Tallam
  0 siblings, 1 reply; 63+ messages in thread
From: Sriraman Tallam @ 2014-05-15 18:34 UTC
  To: GCC Patches, David Li, Cary Coutant, Ian Lance Taylor

[-- Attachment #1: Type: text/plain, Size: 2296 bytes --]

Optimize access to globals with -fpie, x86_64 only:

Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
using the GOT.  This is two instructions, one to get the address of the global
from the GOT and the other to get the value.  If it turns out that the global
gets defined in the executable at link-time, it still needs to go through the
GOT as it is too late then to generate a direct access.

Examples:

foo.cc
------
int a_glob;
int main () {
  return a_glob; // defined in this file
}

With -O2 -fpie -pie, the generated code directly accesses the global via
PC-relative insn:

5e0   <main>:
   mov    0x165a(%rip),%eax        # 1c40 <a_glob>

foo.cc
------

extern int a_glob;
int main () {
  return a_glob; // declared here, defined elsewhere
}

With -O2 -fpie -pie, the generated code accesses the global via the GOT using
two memory loads:

6f0  <main>:
   mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
   mov    (%rax),%eax

This is true even if in the latter case the global was defined in the
executable through a different file.
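
To make the two access patterns concrete, here is a minimal C sketch that
models them at source level.  It is only an illustration: fake_got is a
hypothetical stand-in for a real GOT slot, which is filled in by the linker
and dynamic loader rather than by program code, and the function names are
made up for this example.

```c
#include <assert.h>
#include <stddef.h>

int a_glob = 42;

/* Hypothetical stand-in for a GOT slot; a real GOT is filled in by the
   linker and dynamic loader, not by program code.  */
static int *volatile fake_got[1] = { &a_glob };

/* Direct form: a single load from the global's own location.  */
int load_direct (void)
{
  return a_glob;
}

/* GOT form: first load the address from the table, then load the value.  */
int load_via_got (void)
{
  int *addr = fake_got[0];   /* load 1: address of the global */
  assert (addr != NULL);     /* the slot must have been filled in */
  return *addr;              /* load 2: the value itself */
}
```

Both functions return the same value; the difference is that load_via_got
needs one more dependent memory load to get there.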

Some experiments on Google benchmarks show that the extra memory loads degrade
performance by 1% to 5%.


Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that the
global will be defined in the executable.  For globals that are truly extern
(come from shared objects), the linker will create copy relocations and have
them defined in the executable.  As a result, no global access needs to go
through the GOT, which improves performance.
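
As a rough mental model of what a copy relocation does, the C sketch below
simulates it in a single program.  Everything in it is an assumption made for
illustration: lib_initial_value, exe_copy, lib_got_slot and the functions are
hypothetical names, and the real work is done by the linker and the dynamic
loader, not by user code.

```c
#include <assert.h>
#include <string.h>

/* Initial value as it sits in the (hypothetical) shared object's data.  */
static const int lib_initial_value = 7;

/* Space the linker reserves in the executable for the copied symbol.  */
static int exe_copy;

/* What the library's GOT entry resolves to after symbol binding.  */
static int *lib_got_slot;

/* Models what the dynamic loader does for a copy relocation at startup.  */
void apply_copy_reloc (void)
{
  memcpy (&exe_copy, &lib_initial_value, sizeof exe_copy);
  lib_got_slot = &exe_copy;   /* library now refers to the executable's copy */
}

/* Executable side: direct access, no GOT indirection.  */
int exe_read (void)
{
  return exe_copy;
}

/* Library side: still goes through its GOT, which points at the copy.  */
void lib_write (int v)
{
  assert (lib_got_slot != 0);
  *lib_got_slot = v;
}
```

After apply_copy_reloc, a lib_write is visible through exe_read, because both
sides refer to the single copy that lives in the executable.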

This recently submitted patch to the gold linker:
https://sourceware.org/ml/binutils/2014-05/msg00092.html
allows gold to generate copy relocations in -pie mode when
necessary.

I have added an option, -mld-pie-copyrelocs, which enables this when combined
with -fpie.  Note that the BFD linker does not support PIE copy relocations yet,
so this option cannot be used with it.

Please review.


ChangeLog:

* config/i386/i386.opt (mld-pie-copyrelocs): New option.
* config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
 address is still legitimate in the presence of copy relocations
 and -fpie.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.



Patch attached.
Thanks
Sri

[-- Attachment #2: gcc_pie_copyrelocs_patch.txt --]
[-- Type: text/plain, Size: 4850 bytes --]

Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 210437)
+++ config/i386/i386.opt	(working copy)
@@ -108,6 +108,10 @@ int x_ix86_dump_tunes
 TargetSave
 int x_ix86_force_align_arg_pointer
 
+;; -mld-pie-copyrelocs
+TargetSave
+int x_ix86_ld_pie_copyrelocs
+
 ;; -mforce-drap= 
 TargetSave
 int x_ix86_force_drap
@@ -291,6 +295,10 @@ mfancy-math-387
 Target RejectNegative Report InverseMask(NO_FANCY_MATH_387, USE_FANCY_MATH_387) Save
 Generate sin, cos, sqrt for FPU
 
+mld-pie-copyrelocs
+Target Report Var(ix86_ld_pie_copyrelocs) Init(0)
+Use linker copy relocs for pie
+
 mforce-drap
 Target Report Var(ix86_force_drap)
 Always use Dynamic Realigned Argument Pointer (DRAP) to realign stack
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 210437)
+++ config/i386/i386.c	(working copy)
@@ -12684,7 +12684,9 @@ legitimate_pic_address_disp_p (rtx disp)
 		return true;
 	    }
 	  else if (!SYMBOL_REF_FAR_ADDR_P (op0)
-		   && SYMBOL_REF_LOCAL_P (op0)
+		   && (SYMBOL_REF_LOCAL_P (op0)
+		       || (TARGET_64BIT && ix86_ld_pie_copyrelocs && flag_pie
+			   && !SYMBOL_REF_FUNCTION_P (op0)))
 		   && ix86_cmodel != CM_LARGE_PIC)
 	    return true;
 	  break;
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mld-pie-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mld-pie-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should never be accessed with a GOTPCREL  */ 
+/* { dg-final { scan-assembler-not "glob_a\\@GOTPCREL" { target { x86_64-*-* } } } } */
Index: testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c
===================================================================
--- testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
+++ testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test if -mno-ld-pie-copyrelocs does the right thing. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpie -mno-ld-pie-copyrelocs" } */
+
+extern int glob_a;
+
+int foo ()
+{
+  return glob_a;
+}
+
+/* glob_a should always be accessed via GOT  */ 
+/* { dg-final { scan-assembler "glob_a\\@GOT" { target { x86_64-*-* } } } } */
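
For readers who do not have i386.c at hand, the condition changed in
legitimate_pic_address_disp_p can be restated as a small standalone C
predicate.  This is a sketch, not GCC code: struct sym_info, struct
target_info and pc_relative_ok are hypothetical names, and the fields merely
flatten the macros and flags named in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the symbol properties the patched
   condition consults; GCC reads these from the SYMBOL_REF rtx.  */
struct sym_info
{
  bool far_addr;   /* SYMBOL_REF_FAR_ADDR_P */
  bool local;      /* SYMBOL_REF_LOCAL_P */
  bool function;   /* SYMBOL_REF_FUNCTION_P */
};

/* Hypothetical flattened view of the target flags involved.  */
struct target_info
{
  bool is_64bit;         /* TARGET_64BIT */
  bool pie_copyrelocs;   /* ix86_ld_pie_copyrelocs */
  bool pie;              /* flag_pie */
  bool large_pic;        /* ix86_cmodel == CM_LARGE_PIC */
};

/* Mirrors the patched "else if" condition: true means a PC-relative
   displacement to the symbol is accepted without going through the GOT.  */
bool pc_relative_ok (const struct sym_info *s, const struct target_info *t)
{
  assert (s && t);
  bool copyreloc_case = t->is_64bit && t->pie_copyrelocs && t->pie
			&& !s->function;
  return !s->far_addr && (s->local || copyreloc_case) && !t->large_pic;
}
```

With the option on, a non-local data symbol passes the check while a
non-local function symbol does not; with the option off, only local symbols
pass, as before the patch.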


* Re: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations
@ 2014-12-02 19:19 Uros Bizjak
  2014-12-02 19:39 ` H.J. Lu
  2014-12-02 19:40 ` H.J. Lu
  0 siblings, 2 replies; 63+ messages in thread
From: Uros Bizjak @ 2014-12-02 19:19 UTC
  To: gcc-patches; +Cc: Sriraman Tallam, H.J. Lu, Jakub Jelinek

Hello!

> Ping.
>> Ping.
>>> Ping.
>>>> Ping.

It would probably help reviewers if you pointed to the actual patch
submission [1]. Unfortunately, its explanation is in the patch
itself [2], which further explains that this functionality is
currently supported only with gold, patched with [3].

[1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html
[2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt
[3] https://sourceware.org/ml/binutils/2014-05/msg00092.html

After a bit of the above detective work, I think that a new gcc option
is not necessary. Configure should detect whether the new functionality
is supported by the linker, and auto-configure gcc to use it when
appropriate.

I have also added a couple of linker experts in the CC.

Uros.

* Re: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations
@ 2014-12-04 22:19 Dominique Dhumieres
  2014-12-04 23:54 ` H.J. Lu
  0 siblings, 1 reply; 63+ messages in thread
From: Dominique Dhumieres @ 2014-12-04 22:19 UTC
  To: gcc-patches; +Cc: hjl.tools

> Normally, with -fPIE/-fpie, GCC accesses globals that are extern to the
> module using the GOT.  This is two instructions, one to get the address
> of the global from the GOT and the other to get the value.  If it turns
> out that the global gets defined in the executable at link-time, it still
> needs to go through the GOT as it is too late then to generate a direct
> access.
>
> Examples:
>
> foo.cc
> ------
> int a_glob;
> int main () {
>   return a_glob; // defined in this file
> }
>
> With -O2 -fpie -pie, the generated code directly accesses the global via
> PC-relative insn:
>
> 5e0   <main>:
>    mov    0x165a(%rip),%eax        # 1c40 <a_glob>
>
> foo.cc
> ------
>
> extern int a_glob;
> int main () {
>   return a_glob; // declared here, defined elsewhere
> }
>
> With -O2 -fpie -pie, the generated code accesses global via GOT using
> two memory loads:
>
> 6f0  <main>:
>    mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
>    mov    (%rax),%eax
>
> This is true even if in the latter case the global was defined in the
> executable through a different file.
>
> Some experiments on Google benchmarks show that the extra memory loads
> degrade performance by 1% to 5%.
>
> Solution - Copy Relocations:
>
> When the linker supports copy relocations, GCC can always assume that
> the global will be defined in the executable.  For globals that are truly
> extern (come from shared objects), the linker will create copy relocations
> and have them defined in the executable. As a result, no global access
> needs to go through the GOT, which improves performance.
>
> This optimization only applies to undefined, non-weak global data.
> Undefined, weak global data access still must go through the GOT.
>
> This patch checks at configure time whether the linker supports PIE with
> copy relocs, which is enabled in the gold and bfd linkers in binutils 2.25,
> and enables this optimization if the linker support is available.
>
> gcc/
>
> * configure.ac (HAVE_LD_PIE_COPYRELOC): Defined to 1 if
> Linux/x86-64 linker supports PIE with copy reloc.
> * config.in: Regenerated.
> * configure: Likewise.
>
> * config/i386/i386.c (legitimate_pic_address_disp_p): Allow
> pc-relative address for undefined, non-weak, non-function
> symbol reference in 64-bit PIE if linker supports PIE with
> copy reloc.
>
> * doc/sourcebuild.texi: Document pie_copyreloc target.
>
> gcc/testsuite/
>
> * gcc.target/i386/pie-copyrelocs-1.c: New test.
> * gcc.target/i386/pie-copyrelocs-2.c: Likewise.
> * gcc.target/i386/pie-copyrelocs-3.c: Likewise.
> * gcc.target/i386/pie-copyrelocs-4.c: Likewise.
>
> * lib/target-supports.exp (check_effective_target_pie_copyreloc):
> New procedure.

It caused pr64189.

Dominique.


end of thread, other threads:[~2015-02-27 23:26 UTC | newest]

Thread overview: 63+ messages
2014-05-15 18:34 [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations Sriraman Tallam
2014-05-19 18:11 ` Sriraman Tallam
2014-06-09 22:55   ` Sriraman Tallam
2014-06-21  0:17     ` Sriraman Tallam
2014-06-26 17:55       ` Sriraman Tallam
2014-07-11 17:42         ` Sriraman Tallam
2014-09-02 18:15           ` Sriraman Tallam
2014-09-02 20:40       ` Richard Henderson
2014-09-03  7:25         ` Bernhard Reutner-Fischer
2014-09-08 22:19         ` Sriraman Tallam
2014-09-19 21:11           ` Sriraman Tallam
2014-09-29 17:57             ` Sriraman Tallam
2014-10-06 20:43               ` Sriraman Tallam
2014-11-10 23:35                 ` Sriraman Tallam
2014-12-02 18:01                   ` Sriraman Tallam
2014-12-02 19:06           ` H.J. Lu
2014-12-02 19:19 Uros Bizjak
2014-12-02 19:39 ` H.J. Lu
2014-12-02 19:40 ` H.J. Lu
2014-12-02 20:01   ` Uros Bizjak
2014-12-02 20:43     ` H.J. Lu
2014-12-02 20:19       ` Jakub Jelinek
2014-12-02 22:14         ` H.J. Lu
2014-12-02 23:21           ` H.J. Lu
2014-12-03 13:47     ` H.J. Lu
2014-12-03 15:01       ` H.J. Lu
2014-12-03 21:35         ` H.J. Lu
2014-12-04 12:44           ` Uros Bizjak
2014-12-04 16:46             ` H.J. Lu
2014-12-04 19:32               ` Uros Bizjak
2015-02-03 19:25               ` Sriraman Tallam
2015-02-03 19:26                 ` Sriraman Tallam
2015-02-03 19:36                 ` Jakub Jelinek
2015-02-03 21:20                   ` Sriraman Tallam
2015-02-03 21:29                     ` H.J. Lu
2015-02-03 21:36                       ` Sriraman Tallam
2015-02-03 22:03                         ` H.J. Lu
2015-02-03 22:19                           ` Jakub Jelinek
2015-02-04  1:16                             ` H.J. Lu
2015-02-04 18:27                               ` Sriraman Tallam
2015-02-04 18:31                                 ` Jakub Jelinek
2015-02-04 18:38                                   ` H.J. Lu
2015-02-04 18:42                                     ` Jakub Jelinek
2015-02-04 18:45                                       ` H.J. Lu
2015-02-04 18:51                                         ` Sriraman Tallam
2015-02-04 18:57                                           ` H.J. Lu
2015-02-04 21:53                                             ` Sriraman Tallam
2015-02-04 22:37                                               ` H.J. Lu
2015-02-04 22:47                                                 ` Bernhard Reutner-Fischer
2015-02-04 23:10                                                   ` H.J. Lu
2015-02-04 23:29                                                     ` H.J. Lu
2015-02-05 16:57                                                       ` Bernhard Reutner-Fischer
2015-02-05 18:54                                                       ` Richard Henderson
2015-02-05 19:01                                                         ` H.J. Lu
2015-02-05 19:59                                                           ` Richard Henderson
2015-02-05 22:05                                                             ` Sriraman Tallam
2015-02-05 22:47                                                               ` H.J. Lu
2015-02-05 22:48                                                                 ` Sriraman Tallam
2015-02-06 16:25                                                               ` H.J. Lu
2015-02-27 23:39               ` H.J. Lu
2015-02-27 23:46                 ` H.J. Lu
2014-12-04 22:19 Dominique Dhumieres
2014-12-04 23:54 ` H.J. Lu
