public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH][x86_64] Convert indirect call via GOT to direct when possible
@ 2016-05-20 20:28 Sriraman Tallam
  2016-05-20 20:32 ` H.J. Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-05-20 20:28 UTC (permalink / raw)
  To: binutils, Cary Coutant; +Cc: H.J. Lu, David Li

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

Hi,

   GCC has option -fno-plt which converts all extern calls to indirect
calls via GOT to prevent the linker for generating any PLT stubs.
However, if the function ends up defined in the executable this patch
will convert those indirect calls/jumps to direct.  Since the indirect
calls are one byte longer, an extra nop is needed at the beginning.

Here is a simple example:

main.c
---------
extern int foo();
int main() {
  return foo();
}

deffoo.c
-----------
int foo() {
  return 0;
}

$ gcc -fno-plt main.c deffoo.c
$objdump -d a.out

0000000000400626 <main>:
  ...
  40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
401a58 <_DYNAMIC+0x1d8>

The call is indirect even though foo is defined in the executable.

With this patch,
0000000000400606 <main>:
   ....
   40060a:       90                      nop
  40060b:       e8 03 00 00 00          callq  400613 <foo>

The call is now direct with an extra nop.

   Please review.

Thanks
Sri

* x86_64.cc (can_convert_callq_to_direct): New function.
Target_x86_64<size>::Scan::global: Check if an indirect call via
GOT can be converted to direct.
Target_x86_64<size>::Relocate::relocate: Change any indirect call
via GOT that can be converted.
* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/x86_64_indirect_call_to_direct1.s: New file.
* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

[-- Attachment #2: convert_indirect_call_patch.txt --]
[-- Type: text/plain, Size: 5731 bytes --]

	* x86_64.cc (can_convert_callq_to_direct): New function.
	Target_x86_64<size>::Scan::global: Check if an indirect call via
	GOT can be converted to direct.
	Target_x86_64<size>::Relocate::relocate: Change any indirect call
	via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.


diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index bf222c3..797c6b0 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1084,6 +1084,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..59b7de5 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -891,6 +891,21 @@ class Target_x86_64 : public Sized_target<size, false>
 	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
   }
 
+  // Check if relocation against this symbol is a candidate for
+  // conversion from
+  // (callq|jmpq) *foo@GOTPCREL(%rip) to
+  // nop
+  // (callq|jmpq) foo
+  static bool
+  can_convert_callq_to_direct(const Symbol* gsym)
+  {
+    gold_assert(gsym != NULL);
+    return (gsym->type() == elfcpp::STT_FUNC
+	    && !gsym->is_undefined ()
+	    && !gsym->is_from_dynobj()
+	    && !gsym->is_preemptible());
+  }
+
   // Adjust TLS relocation type based on the options and whether this
   // is a local symbol.
   static tls::Tls_optimization
@@ -2931,17 +2946,31 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCREL(%rip) to
+	// nop
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
 	if ((r_type == elfcpp::R_X86_64_GOTPCREL
 	     || r_type == elfcpp::R_X86_64_GOTPCRELX
 	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
 	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
+	    && (Target_x86_64<size>::can_convert_mov_to_lea(gsym)
+		|| Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
 	  {
 	    section_size_type stype;
 	    const unsigned char* view = object->section_contents(data_shndx,
 								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
+	    if (Target_x86_64<size>::can_convert_mov_to_lea(gsym)
+		&& view[reloc.get_r_offset() - 2] == 0x8b)
+	      break;
+
+	    // Opcode for call is 0xff 0x15 and opcode for jmp is 0xff 0x25
+	    if (Target_x86_64<size>::can_convert_callq_to_direct(gsym)
+		&& view[reloc.get_r_offset() - 2] == 0xff
+		&& (view[reloc.get_r_offset() - 1] == 0x15
+		    || view[reloc.get_r_offset() - 1] == 0x25))
 	      break;
 	  }
 
@@ -3634,6 +3663,28 @@ Target_x86_64<size>::Relocate::relocate(
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // (callq|jmpq) *foo@GOTPCREL(%rip) to
+      // nop
+      // (callq|jmpq) foo
+      else if (rela.get_r_offset() >= 2
+	       && view[-2] == 0xff
+	       && (view [-1] == 0x15 || view [-1] == 0x25)
+	       && (gsym != NULL
+		   && Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
+	{
+	  // Insert the 1-byte nop, whose opcode is 0x90.  This is needed
+	  // because the indirect call(jump) is one byte longer than the
+	  // direct call(jump).
+	  view[-2] = 0x90;
+	  // Insert the direct call (opcode 0xe8) or jmp (opcode 0xe9).
+	  if (view[-1] == 0x15)
+	    view[-1] = 0xe8;
+	  else
+	    view[-1] = 0xe9;
+	  // Convert GOTPCREL to 32-bit pc relative reloc.
+	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	}
       else
 	{
 	  if (gsym != NULL)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-20 20:28 [PATCH][x86_64] Convert indirect call via GOT to direct when possible Sriraman Tallam
@ 2016-05-20 20:32 ` H.J. Lu
  2016-05-20 21:15   ` Sriraman Tallam
  2016-05-27 22:14   ` Sriraman Tallam
  0 siblings, 2 replies; 26+ messages in thread
From: H.J. Lu @ 2016-05-20 20:32 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: binutils, Cary Coutant, David Li

On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>    GCC has option -fno-plt which converts all extern calls to indirect
> calls via GOT to prevent the linker for generating any PLT stubs.
> However, if the function ends up defined in the executable this patch
> will convert those indirect calls/jumps to direct.  Since the indirect
> calls are one byte longer, an extra nop is needed at the beginning.
>
> Here is a simple example:
>
> main.c
> ---------
> extern int foo();
> int main() {
>   return foo();
> }
>
> deffoo.c
> -----------
> int foo() {
>   return 0;
> }
>
> $ gcc -fno-plt main.c deffoo.c
> $objdump -d a.out
>
> 0000000000400626 <main>:
>   ...
>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
> 401a58 <_DYNAMIC+0x1d8>
>
> The call is indirect even though foo is defined in the executable.
>
> With this patch,
> 0000000000400606 <main>:
>    ....
>    40060a:       90                      nop
>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>
> The call is now direct with an extra nop.
>
>

Please try ld, which uses 0x67 prefix (addr32) instead of nop.
Also for

jmp *foo#GOTPCREL(%rip)

 ld converts it to

jmp foo
nop

-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-20 20:32 ` H.J. Lu
@ 2016-05-20 21:15   ` Sriraman Tallam
  2016-05-20 21:32     ` H.J. Lu
  2016-05-27 22:14   ` Sriraman Tallam
  1 sibling, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-05-20 21:15 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Cary Coutant, David Li

On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>    GCC has option -fno-plt which converts all extern calls to indirect
>> calls via GOT to prevent the linker for generating any PLT stubs.
>> However, if the function ends up defined in the executable this patch
>> will convert those indirect calls/jumps to direct.  Since the indirect
>> calls are one byte longer, an extra nop is needed at the beginning.
>>
>> Here is a simple example:
>>
>> main.c
>> ---------
>> extern int foo();
>> int main() {
>>   return foo();
>> }
>>
>> deffoo.c
>> -----------
>> int foo() {
>>   return 0;
>> }
>>
>> $ gcc -fno-plt main.c deffoo.c
>> $objdump -d a.out
>>
>> 0000000000400626 <main>:
>>   ...
>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>> 401a58 <_DYNAMIC+0x1d8>
>>
>> The call is indirect even though foo is defined in the executable.
>>
>> With this patch,
>> 0000000000400606 <main>:
>>    ....
>>    40060a:       90                      nop
>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>
>> The call is now direct with an extra nop.
>>
>>
>
> Please try ld, which uses 0x67 prefix (addr32) instead of nop.

Is this committed to ld?, trunk ld does not seem to do this.

Also a quick thing about -fPIE and -fno-plt.  The assembly looks like
this for the call:

movq foo@GOTPCREL(%rip), %rax
jmp *%rax

Why can't we make this a single jmp *foo@GOTPCREL(%rip)

This goes via the GOT if foo is external and that is always reachable
with a 32-bit offset. Did I miss anything obvious?


> Also for
>
> jmp *foo#GOTPCREL(%rip)
>
>  ld converts it to
>
> jmp foo
> nop
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-20 21:15   ` Sriraman Tallam
@ 2016-05-20 21:32     ` H.J. Lu
  2016-05-24 22:06       ` Sriraman Tallam
  0 siblings, 1 reply; 26+ messages in thread
From: H.J. Lu @ 2016-05-20 21:32 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: binutils, Cary Coutant, David Li

On Fri, May 20, 2016 at 2:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>> However, if the function ends up defined in the executable this patch
>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>
>>> Here is a simple example:
>>>
>>> main.c
>>> ---------
>>> extern int foo();
>>> int main() {
>>>   return foo();
>>> }
>>>
>>> deffoo.c
>>> -----------
>>> int foo() {
>>>   return 0;
>>> }
>>>
>>> $ gcc -fno-plt main.c deffoo.c
>>> $objdump -d a.out
>>>
>>> 0000000000400626 <main>:
>>>   ...
>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>> 401a58 <_DYNAMIC+0x1d8>
>>>
>>> The call is indirect even though foo is defined in the executable.
>>>
>>> With this patch,
>>> 0000000000400606 <main>:
>>>    ....
>>>    40060a:       90                      nop
>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>
>>> The call is now direct with an extra nop.
>>>
>>>
>>
>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>
> Is this committed to ld?, trunk ld does not seem to do this.

It works for me.

> Also a quick thing about -fPIE and -fno-plt.  The assembly looks like
> this for the call:
>
> movq foo@GOTPCREL(%rip), %rax
> jmp *%rax
>
> Why can't we make this a single jmp *foo@GOTPCREL(%rip)

Works for me:

[hjl@gnu-6 tmp]$ cat x.c
extern void bar (void);

void
foo (void)
{
  bar ();
}
[hjl@gnu-6 tmp]$ cat y.c
extern void foo (void);

void
bar (void)
{
}

int
main (void)
{
  foo ();
  return 0;
}
[hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
[hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.o y.o
gcc: error: y.o: No such file or directory
[hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
[hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.s y.s
[hjl@gnu-6 tmp]$ cat x.s
.file "x.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
jmp *bar@GOTPCREL(%rip)
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (GNU) 6.1.1 20160428"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 tmp]$ gdb a.out
GNU gdb (GDB) Fedora 7.10.1-31.fc23
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...(no debugging symbols found)...done.
(gdb) disass foo
Dump of assembler code for function foo:
   0x0000000000000660 <+0>: jmpq   0x670 <bar>
   0x0000000000000665 <+5>: nop
End of assembler dump.
(gdb)

> This goes via the GOT if foo is external and that is always reachable
> with a 32-bit offset. Did I miss anything obvious?
>
>
>> Also for
>>
>> jmp *foo#GOTPCREL(%rip)
>>
>>  ld converts it to
>>
>> jmp foo
>> nop
>>
>> --
>> H.J.



-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-20 21:32     ` H.J. Lu
@ 2016-05-24 22:06       ` Sriraman Tallam
  2016-05-24 22:39         ` Rafael Espíndola
  0 siblings, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-05-24 22:06 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Cary Coutant, David Li

On Fri, May 20, 2016 at 2:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 20, 2016 at 2:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>>> However, if the function ends up defined in the executable this patch
>>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>
>>>> Here is a simple example:
>>>>
>>>> main.c
>>>> ---------
>>>> extern int foo();
>>>> int main() {
>>>>   return foo();
>>>> }
>>>>
>>>> deffoo.c
>>>> -----------
>>>> int foo() {
>>>>   return 0;
>>>> }
>>>>
>>>> $ gcc -fno-plt main.c deffoo.c
>>>> $objdump -d a.out
>>>>
>>>> 0000000000400626 <main>:
>>>>   ...
>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>>> 401a58 <_DYNAMIC+0x1d8>
>>>>
>>>> The call is indirect even though foo is defined in the executable.
>>>>
>>>> With this patch,
>>>> 0000000000400606 <main>:
>>>>    ....
>>>>    40060a:       90                      nop
>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>
>>>> The call is now direct with an extra nop.
>>>>
>>>>
>>>
>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>
>> Is this committed to ld?, trunk ld does not seem to do this.
>
> It works for me.

I think I am missing something obvious here as this conversion is not
happening for me:

foo.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0: 31 c0                 xor    %eax,%eax
   2: c3                   retq

0000000000000003 <_start>:
   3: 31 c0                 xor    %eax,%eax
   5: ff 25 00 00 00 00     jmpq   *0x0(%rip)        # b <_start+0x8>
7: R_X86_64_GOTPCREL foo-0x4

Now,

$ ld --version
GNU ld (GNU Binutils) 2.26.51.20160524
Copyright (C) 2016 Free Software Foundation, Inc.

$ ld foo.o

$ objdump -d a.out
a.out:     file format elf64-x86-64


Disassembly of section .text:

00000000004000e8 <foo>:
  4000e8: 31 c0                 xor    %eax,%eax
  4000ea: c3                   retq

00000000004000eb <_start>:
  4000eb: 31 c0                 xor    %eax,%eax
  4000ed: ff 25 05 00 20 00     jmpq   *0x200005(%rip)        # 6000f8
<_start+0x20000d>


The jmpq is not converted and it should be I would assume.

>
>> Also a quick thing about -fPIE and -fno-plt.  The assembly looks like
>> this for the call:
>>
>> movq foo@GOTPCREL(%rip), %rax
>> jmp *%rax
>>
>> Why can't we make this a single jmp *foo@GOTPCREL(%rip)
>
> Works for me:

Right, this works at HEAD so maybe we are missing some patch that combines this.

>
> [hjl@gnu-6 tmp]$ cat x.c
> extern void bar (void);
>
> void
> foo (void)
> {
>   bar ();
> }
> [hjl@gnu-6 tmp]$ cat y.c
> extern void foo (void);
>
> void
> bar (void)
> {
> }
>
> int
> main (void)
> {
>   foo ();
>   return 0;
> }
> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.o y.o
> gcc: error: y.o: No such file or directory
> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.s y.s
> [hjl@gnu-6 tmp]$ cat x.s
> .file "x.c"
> .text
> .p2align 4,,15
> .globl foo
> .type foo, @function
> foo:
> .LFB0:
> .cfi_startproc
> jmp *bar@GOTPCREL(%rip)
> .cfi_endproc
> .LFE0:
> .size foo, .-foo
> .ident "GCC: (GNU) 6.1.1 20160428"
> .section .note.GNU-stack,"",@progbits
> [hjl@gnu-6 tmp]$ gdb a.out
> GNU gdb (GDB) Fedora 7.10.1-31.fc23
> Copyright (C) 2015 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from a.out...(no debugging symbols found)...done.
> (gdb) disass foo
> Dump of assembler code for function foo:
>    0x0000000000000660 <+0>: jmpq   0x670 <bar>
>    0x0000000000000665 <+5>: nop
> End of assembler dump.
> (gdb)
>
>> This goes via the GOT if foo is external and that is always reachable
>> with a 32-bit offset. Did I miss anything obvious?
>>
>>
>>> Also for
>>>
>>> jmp *foo#GOTPCREL(%rip)
>>>
>>>  ld converts it to
>>>
>>> jmp foo
>>> nop
>>>
>>> --
>>> H.J.
>
>
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-24 22:06       ` Sriraman Tallam
@ 2016-05-24 22:39         ` Rafael Espíndola
  2016-05-24 22:58           ` Sriraman Tallam
  0 siblings, 1 reply; 26+ messages in thread
From: Rafael Espíndola @ 2016-05-24 22:39 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: H.J. Lu, binutils, Cary Coutant, David Li

You need a R_X86_64_GOTPCRELX.  Pass -mrelax-relocations=yes to a new
version of gas.

Cheers,
Rafael


On 24 May 2016 at 18:06, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, May 20, 2016 at 2:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 20, 2016 at 2:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>
>>>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>>>> However, if the function ends up defined in the executable this patch
>>>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>>
>>>>> Here is a simple example:
>>>>>
>>>>> main.c
>>>>> ---------
>>>>> extern int foo();
>>>>> int main() {
>>>>>   return foo();
>>>>> }
>>>>>
>>>>> deffoo.c
>>>>> -----------
>>>>> int foo() {
>>>>>   return 0;
>>>>> }
>>>>>
>>>>> $ gcc -fno-plt main.c deffoo.c
>>>>> $objdump -d a.out
>>>>>
>>>>> 0000000000400626 <main>:
>>>>>   ...
>>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>>>> 401a58 <_DYNAMIC+0x1d8>
>>>>>
>>>>> The call is indirect even though foo is defined in the executable.
>>>>>
>>>>> With this patch,
>>>>> 0000000000400606 <main>:
>>>>>    ....
>>>>>    40060a:       90                      nop
>>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>>
>>>>> The call is now direct with an extra nop.
>>>>>
>>>>>
>>>>
>>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>>
>>> Is this committed to ld?, trunk ld does not seem to do this.
>>
>> It works for me.
>
> I think I am missing something obvious here as this conversion is not
> happening for me:
>
> foo.o:     file format elf64-x86-64
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
>    0: 31 c0                 xor    %eax,%eax
>    2: c3                   retq
>
> 0000000000000003 <_start>:
>    3: 31 c0                 xor    %eax,%eax
>    5: ff 25 00 00 00 00     jmpq   *0x0(%rip)        # b <_start+0x8>
> 7: R_X86_64_GOTPCREL foo-0x4
>
> Now,
>
> $ ld --version
> GNU ld (GNU Binutils) 2.26.51.20160524
> Copyright (C) 2016 Free Software Foundation, Inc.
>
> $ ld foo.o
>
> $ objdump -d a.out
> a.out:     file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 00000000004000e8 <foo>:
>   4000e8: 31 c0                 xor    %eax,%eax
>   4000ea: c3                   retq
>
> 00000000004000eb <_start>:
>   4000eb: 31 c0                 xor    %eax,%eax
>   4000ed: ff 25 05 00 20 00     jmpq   *0x200005(%rip)        # 6000f8
> <_start+0x20000d>
>
>
> The jmpq is not converted and it should be I would assume.
>
>>
>>> Also a quick thing about -fPIE and -fno-plt.  The assembly looks like
>>> this for the call:
>>>
>>> movq foo@GOTPCREL(%rip), %rax
>>> jmp *%rax
>>>
>>> Why can't we make this a single jmp *foo@GOTPCREL(%rip)
>>
>> Works for me:
>
> Right, this works at HEAD so maybe we are missing some patch that combines this.
>
>>
>> [hjl@gnu-6 tmp]$ cat x.c
>> extern void bar (void);
>>
>> void
>> foo (void)
>> {
>>   bar ();
>> }
>> [hjl@gnu-6 tmp]$ cat y.c
>> extern void foo (void);
>>
>> void
>> bar (void)
>> {
>> }
>>
>> int
>> main (void)
>> {
>>   foo ();
>>   return 0;
>> }
>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.o y.o
>> gcc: error: y.o: No such file or directory
>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.s y.s
>> [hjl@gnu-6 tmp]$ cat x.s
>> .file "x.c"
>> .text
>> .p2align 4,,15
>> .globl foo
>> .type foo, @function
>> foo:
>> .LFB0:
>> .cfi_startproc
>> jmp *bar@GOTPCREL(%rip)
>> .cfi_endproc
>> .LFE0:
>> .size foo, .-foo
>> .ident "GCC: (GNU) 6.1.1 20160428"
>> .section .note.GNU-stack,"",@progbits
>> [hjl@gnu-6 tmp]$ gdb a.out
>> GNU gdb (GDB) Fedora 7.10.1-31.fc23
>> Copyright (C) 2015 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> Type "show configuration" for configuration details.
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>.
>> Find the GDB manual and other documentation resources online at:
>> <http://www.gnu.org/software/gdb/documentation/>.
>> For help, type "help".
>> Type "apropos word" to search for commands related to "word"...
>> Reading symbols from a.out...(no debugging symbols found)...done.
>> (gdb) disass foo
>> Dump of assembler code for function foo:
>>    0x0000000000000660 <+0>: jmpq   0x670 <bar>
>>    0x0000000000000665 <+5>: nop
>> End of assembler dump.
>> (gdb)
>>
>>> This goes via the GOT if foo is external and that is always reachable
>>> with a 32-bit offset. Did I miss anything obvious?
>>>
>>>
>>>> Also for
>>>>
>>>> jmp *foo#GOTPCREL(%rip)
>>>>
>>>>  ld converts it to
>>>>
>>>> jmp foo
>>>> nop
>>>>
>>>> --
>>>> H.J.
>>
>>
>>
>> --
>> H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-24 22:39         ` Rafael Espíndola
@ 2016-05-24 22:58           ` Sriraman Tallam
  2016-05-25  2:06             ` Rafael Espíndola
  0 siblings, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-05-24 22:58 UTC (permalink / raw)
  To: Rafael Espíndola; +Cc: H.J. Lu, binutils, Cary Coutant, David Li

On Tue, May 24, 2016 at 3:39 PM, Rafael Espíndola
<rafael.espindola@gmail.com> wrote:
> You need a R_X86_64_GOTPCRELX.  Pass -mrelax-relocations=yes to a new
> version of gas.

Thanks! I will try that but why is this conversion not applicable for
just R_X86_64_GOTPCREL?  What is the difference?

Sri

>
> Cheers,
> Rafael
>
>
> On 24 May 2016 at 18:06, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, May 20, 2016 at 2:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 20, 2016 at 2:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>>>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>>>>> However, if the function ends up defined in the executable this patch
>>>>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>>>
>>>>>> Here is a simple example:
>>>>>>
>>>>>> main.c
>>>>>> ---------
>>>>>> extern int foo();
>>>>>> int main() {
>>>>>>   return foo();
>>>>>> }
>>>>>>
>>>>>> deffoo.c
>>>>>> -----------
>>>>>> int foo() {
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> $ gcc -fno-plt main.c deffoo.c
>>>>>> $objdump -d a.out
>>>>>>
>>>>>> 0000000000400626 <main>:
>>>>>>   ...
>>>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>>>>> 401a58 <_DYNAMIC+0x1d8>
>>>>>>
>>>>>> The call is indirect even though foo is defined in the executable.
>>>>>>
>>>>>> With this patch,
>>>>>> 0000000000400606 <main>:
>>>>>>    ....
>>>>>>    40060a:       90                      nop
>>>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>>>
>>>>>> The call is now direct with an extra nop.
>>>>>>
>>>>>>
>>>>>
>>>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>>>
>>>> Is this committed to ld?, trunk ld does not seem to do this.
>>>
>>> It works for me.
>>
>> I think I am missing something obvious here as this conversion is not
>> happening for me:
>>
>> foo.o:     file format elf64-x86-64
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <foo>:
>>    0: 31 c0                 xor    %eax,%eax
>>    2: c3                   retq
>>
>> 0000000000000003 <_start>:
>>    3: 31 c0                 xor    %eax,%eax
>>    5: ff 25 00 00 00 00     jmpq   *0x0(%rip)        # b <_start+0x8>
>> 7: R_X86_64_GOTPCREL foo-0x4
>>
>> Now,
>>
>> $ ld --version
>> GNU ld (GNU Binutils) 2.26.51.20160524
>> Copyright (C) 2016 Free Software Foundation, Inc.
>>
>> $ ld foo.o
>>
>> $ objdump -d a.out
>> a.out:     file format elf64-x86-64
>>
>>
>> Disassembly of section .text:
>>
>> 00000000004000e8 <foo>:
>>   4000e8: 31 c0                 xor    %eax,%eax
>>   4000ea: c3                   retq
>>
>> 00000000004000eb <_start>:
>>   4000eb: 31 c0                 xor    %eax,%eax
>>   4000ed: ff 25 05 00 20 00     jmpq   *0x200005(%rip)        # 6000f8
>> <_start+0x20000d>
>>
>>
>> The jmpq is not converted and it should be I would assume.
>>
>>>
>>>> Also a quick thing about -fPIE and -fno-plt.  The assembly looks like
>>>> this for the call:
>>>>
>>>> movq foo@GOTPCREL(%rip), %rax
>>>> jmp *%rax
>>>>
>>>> Why can't we make this a single jmp *foo@GOTPCREL(%rip)
>>>
>>> Works for me:
>>
>> Right, this works at HEAD so maybe we are missing some patch that combines this.
>>
>>>
>>> [hjl@gnu-6 tmp]$ cat x.c
>>> extern void bar (void);
>>>
>>> void
>>> foo (void)
>>> {
>>>   bar ();
>>> }
>>> [hjl@gnu-6 tmp]$ cat y.c
>>> extern void foo (void);
>>>
>>> void
>>> bar (void)
>>> {
>>> }
>>>
>>> int
>>> main (void)
>>> {
>>>   foo ();
>>>   return 0;
>>> }
>>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
>>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.o y.o
>>> gcc: error: y.o: No such file or directory
>>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -S -fPIE -fno-plt -O2 x.c y.c
>>> [hjl@gnu-6 tmp]$ /usr/gcc-6.1.1-x32/bin/gcc -pie x.s y.s
>>> [hjl@gnu-6 tmp]$ cat x.s
>>> .file "x.c"
>>> .text
>>> .p2align 4,,15
>>> .globl foo
>>> .type foo, @function
>>> foo:
>>> .LFB0:
>>> .cfi_startproc
>>> jmp *bar@GOTPCREL(%rip)
>>> .cfi_endproc
>>> .LFE0:
>>> .size foo, .-foo
>>> .ident "GCC: (GNU) 6.1.1 20160428"
>>> .section .note.GNU-stack,"",@progbits
>>> [hjl@gnu-6 tmp]$ gdb a.out
>>> GNU gdb (GDB) Fedora 7.10.1-31.fc23
>>> Copyright (C) 2015 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online at:
>>> <http://www.gnu.org/software/gdb/documentation/>.
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from a.out...(no debugging symbols found)...done.
>>> (gdb) disass foo
>>> Dump of assembler code for function foo:
>>>    0x0000000000000660 <+0>: jmpq   0x670 <bar>
>>>    0x0000000000000665 <+5>: nop
>>> End of assembler dump.
>>> (gdb)
>>>
>>>> This goes via the GOT if foo is external and that is always reachable
>>>> with a 32-bit offset. Did I miss anything obvious?
>>>>
>>>>
>>>>> Also for
>>>>>
>>>>> jmp *foo#GOTPCREL(%rip)
>>>>>
>>>>>  ld converts it to
>>>>>
>>>>> jmp foo
>>>>> nop
>>>>>
>>>>> --
>>>>> H.J.
>>>
>>>
>>>
>>> --
>>> H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-24 22:58           ` Sriraman Tallam
@ 2016-05-25  2:06             ` Rafael Espíndola
  0 siblings, 0 replies; 26+ messages in thread
From: Rafael Espíndola @ 2016-05-25  2:06 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: H.J. Lu, binutils, Cary Coutant, David Li

On 24 May 2016 at 18:58, Sriraman Tallam <tmsriram@google.com> wrote:
> On Tue, May 24, 2016 at 3:39 PM, Rafael Espíndola
> <rafael.espindola@gmail.com> wrote:
>> You need a R_X86_64_GOTPCRELX.  Pass -mrelax-relocations=yes to a new
>> version of gas.
>
> Thanks! I will try that but why is this conversion not applicable for
> just R_X86_64_GOTPCREL?  What is the difference?

The guarantees. It is hard (impossible?) to disassemble backwards in
x86, so the assembler produces these relocations to let the linker
know that the transformation is safe.

Cheers,
Rafael

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-20 20:32 ` H.J. Lu
  2016-05-20 21:15   ` Sriraman Tallam
@ 2016-05-27 22:14   ` Sriraman Tallam
  2016-05-28 17:44     ` H.J. Lu
  1 sibling, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-05-27 22:14 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Cary Coutant, David Li

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>>    GCC has option -fno-plt which converts all extern calls to indirect
>> calls via GOT to prevent the linker for generating any PLT stubs.
>> However, if the function ends up defined in the executable this patch
>> will convert those indirect calls/jumps to direct.  Since the indirect
>> calls are one byte longer, an extra nop is needed at the beginning.
>>
>> Here is a simple example:
>>
>> main.c
>> ---------
>> extern int foo();
>> int main() {
>>   return foo();
>> }
>>
>> deffoo.c
>> -----------
>> int foo() {
>>   return 0;
>> }
>>
>> $ gcc -fno-plt main.c deffoo.c
>> $objdump -d a.out
>>
>> 0000000000400626 <main>:
>>   ...
>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>> 401a58 <_DYNAMIC+0x1d8>
>>
>> The call is indirect even though foo is defined in the executable.
>>
>> With this patch,
>> 0000000000400606 <main>:
>>    ....
>>    40060a:       90                      nop
>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>
>> The call is now direct with an extra nop.
>>
>>
>
> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
> Also for
>
> jmp *foo#GOTPCREL(%rip)
>
>  ld converts it to
>
> jmp foo
> nop

I have modified the patch to keep it consistent with what ld produces.

Please take another look.

* x86_64.cc (can_convert_callq_to_direct): New function.
Target_x86_64<size>::Scan::global: Check if an indirect call via
GOT can be converted to direct.
Target_x86_64<size>::Relocate::relocate: Change any indirect call
via GOT that can be converted.
* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/x86_64_indirect_call_to_direct1.s: New file.
* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

Thanks
Sri

>
> --
> H.J.

[-- Attachment #2: convert_indirect_call_patch.txt --]
[-- Type: text/plain, Size: 7860 bytes --]

	* x86_64.cc (can_convert_callq_to_direct): New function.
	Target_x86_64<size>::Scan::global: Check if an indirect call via
	GOT can be converted to direct.
	Target_x86_64<size>::Relocate::relocate: Change any indirect call
	via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..d54d024 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 onwards Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..a298117 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -891,6 +891,22 @@ class Target_x86_64 : public Sized_target<size, false>
 	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
   }
 
+  // Convert
+  // callq *foo@GOTPCRELX(%rip) to
+  // addr32 callq foo
+  // and jmpq *foo@GOTPCRELX(%rip) to
+  // jmpq foo
+  // nop
+  static bool
+  can_convert_callq_to_direct(const Symbol* gsym)
+  {
+    gold_assert(gsym != NULL);
+    return (gsym->type() == elfcpp::STT_FUNC
+	    && !gsym->is_undefined ()
+	    && !gsym->is_from_dynobj()
+	    && !gsym->is_preemptible());
+  }
+
   // Adjust TLS relocation type based on the options and whether this
   // is a local symbol.
   static tls::Tls_optimization
@@ -2931,17 +2947,29 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCREL(%rip) to
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
-	if ((r_type == elfcpp::R_X86_64_GOTPCREL
-	     || r_type == elfcpp::R_X86_64_GOTPCRELX
+	if ((r_type == elfcpp::R_X86_64_GOTPCRELX
 	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
 	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
+	    && (Target_x86_64<size>::can_convert_mov_to_lea(gsym)
+		|| Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
 	  {
 	    section_size_type stype;
 	    const unsigned char* view = object->section_contents(data_shndx,
 								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
+	    if (Target_x86_64<size>::can_convert_mov_to_lea(gsym)
+		&& view[reloc.get_r_offset() - 2] == 0x8b)
+	      break;
+
+	    // Opcode for call is 0xff 0x15 and opcode for jmp is 0xff 0x25
+	    if (Target_x86_64<size>::can_convert_callq_to_direct(gsym)
+		&& view[reloc.get_r_offset() - 2] == 0xff
+		&& (view[reloc.get_r_offset() - 1] == 0x15
+		    || view[reloc.get_r_offset() - 1] == 0x25))
 	      break;
 	  }
 
@@ -3634,6 +3662,46 @@ Target_x86_64<size>::Relocate::relocate(
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if ((r_type == elfcpp::R_X86_64_GOTPCRELX
+		|| r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
+	       && rela.get_r_offset() >= 2
+	       && view[-2] == 0xff
+	       && (view [-1] == 0x15 || view [-1] == 0x25)
+	       && (gsym != NULL
+		   && Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
+	{
+	  if (view[-1] == 0x15)
+	    {
+	      // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+	      // Opcode of addr32 is 0x67 and opcode of direct callq is 0xe8.
+	      view[-2] = 0x67;
+	      view[-1] = 0xe8;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.
+	      Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	    }
+	  else
+	    {
+	      // Convert jmpq *foo@GOTPCRELX(%rip) to
+	      // jmpq foo
+	      // nop
+	      // The opcode of direct jmpq is 0xe9.
+	      view[-2] = 0xe9;
+	      // The opcode of nop is 0x90.
+	      view[3] = 0x90;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.  jmpq is rip
+	      // relative and since the instruction following the jmpq is now
+	      // the nop, offset the address by 1 byte.  The start of the
+              // relocation also moves ahead by 1 byte.
+	      Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+				    address - 1);
+	    }
+	}
       else
 	{
 	  if (gsym != NULL)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-27 22:14   ` Sriraman Tallam
@ 2016-05-28 17:44     ` H.J. Lu
  2016-05-31 18:03       ` Sriraman Tallam
  0 siblings, 1 reply; 26+ messages in thread
From: H.J. Lu @ 2016-05-28 17:44 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: binutils, Cary Coutant, David Li

On Fri, May 27, 2016 at 3:14 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>> However, if the function ends up defined in the executable this patch
>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>
>>> Here is a simple example:
>>>
>>> main.c
>>> ---------
>>> extern int foo();
>>> int main() {
>>>   return foo();
>>> }
>>>
>>> deffoo.c
>>> -----------
>>> int foo() {
>>>   return 0;
>>> }
>>>
>>> $ gcc -fno-plt main.c deffoo.c
>>> $objdump -d a.out
>>>
>>> 0000000000400626 <main>:
>>>   ...
>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>> 401a58 <_DYNAMIC+0x1d8>
>>>
>>> The call is indirect even though foo is defined in the executable.
>>>
>>> With this patch,
>>> 0000000000400606 <main>:
>>>    ....
>>>    40060a:       90                      nop
>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>
>>> The call is now direct with an extra nop.
>>>
>>>
>>
>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>> Also for
>>
>> jmp *foo#GOTPCREL(%rip)
>>
>>  ld converts it to
>>
>> jmp foo
>> nop
>
> I have modified the patch to keep it consistent with what ld produces.
>
> Please take another look.
>
> * x86_64.cc (can_convert_callq_to_direct): New function.
> Target_x86_64<size>::Scan::global: Check if an indirect call via
> GOT can be converted to direct.
> Target_x86_64<size>::Relocate::relocate: Change any indirect call
> via GOT that can be converted.
> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
> * testsuite/Makefile.in: Regenerate.
> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>

Do you need to check R_X86_64_REX_GOTPCRELX for branch?

-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-28 17:44     ` H.J. Lu
@ 2016-05-31 18:03       ` Sriraman Tallam
  2016-06-06 20:51         ` Sriraman Tallam
                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Sriraman Tallam @ 2016-05-31 18:03 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Cary Coutant, David Li

[-- Attachment #1: Type: text/plain, Size: 2375 bytes --]

On Sat, May 28, 2016 at 10:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 27, 2016 at 3:14 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>>> However, if the function ends up defined in the executable this patch
>>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>
>>>> Here is a simple example:
>>>>
>>>> main.c
>>>> ---------
>>>> extern int foo();
>>>> int main() {
>>>>   return foo();
>>>> }
>>>>
>>>> deffoo.c
>>>> -----------
>>>> int foo() {
>>>>   return 0;
>>>> }
>>>>
>>>> $ gcc -fno-plt main.c deffoo.c
>>>> $objdump -d a.out
>>>>
>>>> 0000000000400626 <main>:
>>>>   ...
>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>>> 401a58 <_DYNAMIC+0x1d8>
>>>>
>>>> The call is indirect even though foo is defined in the executable.
>>>>
>>>> With this patch,
>>>> 0000000000400606 <main>:
>>>>    ....
>>>>    40060a:       90                      nop
>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>
>>>> The call is now direct with an extra nop.
>>>>
>>>>
>>>
>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>> Also for
>>>
>>> jmp *foo#GOTPCREL(%rip)
>>>
>>>  ld converts it to
>>>
>>> jmp foo
>>> nop
>>
>> I have modified the patch to keep it consistent with what ld produces.
>>
>> Please take another look.
>>
>> * x86_64.cc (can_convert_callq_to_direct): New function.
>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>> GOT can be converted to direct.
>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>> via GOT that can be converted.
>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>> * testsuite/Makefile.in: Regenerate.
>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>
>
> Do you need to check R_X86_64_REX_GOTPCRELX for branch?

Ok, patch changed to not check for this and refactored a bit.

Thanks
Sri

>
> --
> H.J.

[-- Attachment #2: convert_indirect_call_patch.txt --]
[-- Type: text/plain, Size: 8065 bytes --]

	* x86_64.cc (can_convert_callq_to_direct): New function.
	Target_x86_64<size>::Scan::global: Check if an indirect call via
	GOT can be converted to direct.
	Target_x86_64<size>::Relocate::relocate: Change any indirect call
	via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..d54d024 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 onwards Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..d774d5b 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -891,6 +891,22 @@ class Target_x86_64 : public Sized_target<size, false>
 	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
   }
 
+  // Convert
+  // callq *foo@GOTPCRELX(%rip) to
+  // addr32 callq foo
+  // and jmpq *foo@GOTPCRELX(%rip) to
+  // jmpq foo
+  // nop
+  static bool
+  can_convert_callq_to_direct(const Symbol* gsym)
+  {
+    gold_assert(gsym != NULL);
+    return (gsym->type() == elfcpp::STT_FUNC
+	    && !gsym->is_undefined ()
+	    && !gsym->is_from_dynobj()
+	    && !gsym->is_preemptible());
+  }
+
   // Adjust TLS relocation type based on the options and whether this
   // is a local symbol.
   static tls::Tls_optimization
@@ -2931,17 +2947,34 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCRELX(%rip) to
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
-	if ((r_type == elfcpp::R_X86_64_GOTPCREL
-	     || r_type == elfcpp::R_X86_64_GOTPCRELX
-	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
-	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
+	bool do_convert_mov_to_lea
+	    = ((r_type == elfcpp::R_X86_64_GOTPCREL
+	        || r_type == elfcpp::R_X86_64_GOTPCRELX
+	        || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_mov_to_lea(gsym));
+	bool do_convert_callq_to_direct
+	    = (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_callq_to_direct(gsym));
+	if (do_convert_mov_to_lea || do_convert_callq_to_direct)
 	  {
 	    section_size_type stype;
 	    const unsigned char* view = object->section_contents(data_shndx,
 								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
+	    if (do_convert_mov_to_lea
+		&& view[reloc.get_r_offset() - 2] == 0x8b)
+	      break;
+
+	    if (do_convert_callq_to_direct
+	        && view[reloc.get_r_offset() - 2] == 0xff
+	        && (view[reloc.get_r_offset() - 1] == 0x15
+	    	    || view[reloc.get_r_offset() - 1] == 0x25))
 	      break;
 	  }
 
@@ -3634,6 +3667,45 @@ Target_x86_64<size>::Relocate::relocate(
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && rela.get_r_offset() >= 2
+	       && view[-2] == 0xff
+	       && (view [-1] == 0x15 || view [-1] == 0x25)
+	       && (gsym != NULL
+		   && Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
+	{
+	  if (view[-1] == 0x15)
+	    {
+	      // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+	      // Opcode of addr32 is 0x67 and opcode of direct callq is 0xe8.
+	      view[-2] = 0x67;
+	      view[-1] = 0xe8;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.
+	      Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	    }
+	  else
+	    {
+	      // Convert jmpq *foo@GOTPCRELX(%rip) to
+	      // jmpq foo
+	      // nop
+	      // The opcode of direct jmpq is 0xe9.
+	      view[-2] = 0xe9;
+	      // The opcode of nop is 0x90.
+	      view[3] = 0x90;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.  jmpq is rip
+	      // relative and since the instruction following the jmpq is now
+	      // the nop, offset the address by 1 byte.  The start of the
+              // relocation also moves ahead by 1 byte.
+	      Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+				    address - 1);
+	    }
+	}
       else
 	{
 	  if (gsym != NULL)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-31 18:03       ` Sriraman Tallam
@ 2016-06-06 20:51         ` Sriraman Tallam
  2016-06-08 22:22           ` Sriraman Tallam
  2016-06-10 20:06         ` Cary Coutant
  2016-06-10 23:00         ` Cary Coutant
  2 siblings, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-06-06 20:51 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Cary Coutant, David Li

[-- Attachment #1: Type: text/plain, Size: 3074 bytes --]

On Tue, May 31, 2016 at 11:02 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Sat, May 28, 2016 at 10:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 27, 2016 at 3:14 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>
>>>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>>>> However, if the function ends up defined in the executable this patch
>>>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>>
>>>>> Here is a simple example:
>>>>>
>>>>> main.c
>>>>> ---------
>>>>> extern int foo();
>>>>> int main() {
>>>>>   return foo();
>>>>> }
>>>>>
>>>>> deffoo.c
>>>>> -----------
>>>>> int foo() {
>>>>>   return 0;
>>>>> }
>>>>>
>>>>> $ gcc -fno-plt main.c deffoo.c
>>>>> $objdump -d a.out
>>>>>
>>>>> 0000000000400626 <main>:
>>>>>   ...
>>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>>>> 401a58 <_DYNAMIC+0x1d8>
>>>>>
>>>>> The call is indirect even though foo is defined in the executable.
>>>>>
>>>>> With this patch,
>>>>> 0000000000400606 <main>:
>>>>>    ....
>>>>>    40060a:       90                      nop
>>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>>
>>>>> The call is now direct with an extra nop.
>>>>>
>>>>>
>>>>
>>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>>> Also for
>>>>
>>>> jmp *foo#GOTPCREL(%rip)
>>>>
>>>>  ld converts it to
>>>>
>>>> jmp foo
>>>> nop
>>>
>>> I have modified the patch to keep it consistent with what ld produces.
>>>
>>> Please take another look.
>>>
>>> * x86_64.cc (can_convert_callq_to_direct): New function.
>>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>>> GOT can be converted to direct.
>>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>>> via GOT that can be converted.
>>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>>> * testsuite/Makefile.in: Regenerate.
>>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>>
>>
>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>
> Ok, patch changed to not check for this and refactored a bit.

Ping, Is this patch ok now?


* x86_64.cc (can_convert_callq_to_direct): New function.
Target_x86_64<size>::Scan::global: Check if an indirect call via
GOT can be converted to direct.
Target_x86_64<size>::Relocate::relocate: Change any indirect call
via GOT that can be converted.
* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/x86_64_indirect_call_to_direct1.s: New file.
* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

Patch attached.

Thanks
Sri

>
> Thanks
> Sri
>
>>
>> --
>> H.J.

[-- Attachment #2: convert_indirect_call_patch.txt --]
[-- Type: text/plain, Size: 8065 bytes --]

	* x86_64.cc (can_convert_callq_to_direct): New function.
	Target_x86_64<size>::Scan::global: Check if an indirect call via
	GOT can be converted to direct.
	Target_x86_64<size>::Relocate::relocate: Change any indirect call
	via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..d54d024 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 onwards Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..d774d5b 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -891,6 +891,22 @@ class Target_x86_64 : public Sized_target<size, false>
 	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
   }
 
+  // Convert
+  // callq *foo@GOTPCRELX(%rip) to
+  // addr32 callq foo
+  // and jmpq *foo@GOTPCRELX(%rip) to
+  // jmpq foo
+  // nop
+  static bool
+  can_convert_callq_to_direct(const Symbol* gsym)
+  {
+    gold_assert(gsym != NULL);
+    return (gsym->type() == elfcpp::STT_FUNC
+	    && !gsym->is_undefined ()
+	    && !gsym->is_from_dynobj()
+	    && !gsym->is_preemptible());
+  }
+
   // Adjust TLS relocation type based on the options and whether this
   // is a local symbol.
   static tls::Tls_optimization
@@ -2931,17 +2947,34 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCRELX(%rip) to
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
-	if ((r_type == elfcpp::R_X86_64_GOTPCREL
-	     || r_type == elfcpp::R_X86_64_GOTPCRELX
-	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
-	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
+	bool do_convert_mov_to_lea
+	    = ((r_type == elfcpp::R_X86_64_GOTPCREL
+	        || r_type == elfcpp::R_X86_64_GOTPCRELX
+	        || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_mov_to_lea(gsym));
+	bool do_convert_callq_to_direct
+	    = (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_callq_to_direct(gsym));
+	if (do_convert_mov_to_lea || do_convert_callq_to_direct)
 	  {
 	    section_size_type stype;
 	    const unsigned char* view = object->section_contents(data_shndx,
 								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
+	    if (do_convert_mov_to_lea
+		&& view[reloc.get_r_offset() - 2] == 0x8b)
+	      break;
+
+	    if (do_convert_callq_to_direct
+	        && view[reloc.get_r_offset() - 2] == 0xff
+	        && (view[reloc.get_r_offset() - 1] == 0x15
+	    	    || view[reloc.get_r_offset() - 1] == 0x25))
 	      break;
 	  }
 
@@ -3634,6 +3667,45 @@ Target_x86_64<size>::Relocate::relocate(
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && rela.get_r_offset() >= 2
+	       && view[-2] == 0xff
+	       && (view [-1] == 0x15 || view [-1] == 0x25)
+	       && (gsym != NULL
+		   && Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
+	{
+	  if (view[-1] == 0x15)
+	    {
+	      // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+	      // Opcode of addr32 is 0x67 and opcode of direct callq is 0xe8.
+	      view[-2] = 0x67;
+	      view[-1] = 0xe8;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.
+	      Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	    }
+	  else
+	    {
+	      // Convert jmpq *foo@GOTPCRELX(%rip) to
+	      // jmpq foo
+	      // nop
+	      // The opcode of direct jmpq is 0xe9.
+	      view[-2] = 0xe9;
+	      // The opcode of nop is 0x90.
+	      view[3] = 0x90;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.  jmpq is rip
+	      // relative and since the instruction following the jmpq is now
+	      // the nop, offset the address by 1 byte.  The start of the
+              // relocation also moves ahead by 1 byte.
+	      Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+				    address - 1);
+	    }
+	}
       else
 	{
 	  if (gsym != NULL)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-06 20:51         ` Sriraman Tallam
@ 2016-06-08 22:22           ` Sriraman Tallam
  2016-06-08 22:34             ` H.J. Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-06-08 22:22 UTC (permalink / raw)
  To: H.J. Lu; +Cc: binutils, Cary Coutant, David Li

[-- Attachment #1: Type: text/plain, Size: 3801 bytes --]

On Mon, Jun 6, 2016 at 1:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Tue, May 31, 2016 at 11:02 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Sat, May 28, 2016 at 10:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 27, 2016 at 3:14 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>>>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>>>>> However, if the function ends up defined in the executable this patch
>>>>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>>>
>>>>>> Here is a simple example:
>>>>>>
>>>>>> main.c
>>>>>> ---------
>>>>>> extern int foo();
>>>>>> int main() {
>>>>>>   return foo();
>>>>>> }
>>>>>>
>>>>>> deffoo.c
>>>>>> -----------
>>>>>> int foo() {
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> $ gcc -fno-plt main.c deffoo.c
>>>>>> $objdump -d a.out
>>>>>>
>>>>>> 0000000000400626 <main>:
>>>>>>   ...
>>>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>>>>> 401a58 <_DYNAMIC+0x1d8>
>>>>>>
>>>>>> The call is indirect even though foo is defined in the executable.
>>>>>>
>>>>>> With this patch,
>>>>>> 0000000000400606 <main>:
>>>>>>    ....
>>>>>>    40060a:       90                      nop
>>>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>>>
>>>>>> The call is now direct with an extra nop.
>>>>>>
>>>>>>
>>>>>
>>>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>>>> Also for
>>>>>
>>>>> jmp *foo#GOTPCREL(%rip)
>>>>>
>>>>>  ld converts it to
>>>>>
>>>>> jmp foo
>>>>> nop
>>>>
>>>> I have modified the patch to keep it consistent with what ld produces.
>>>>
>>>> Please take another look.
>>>>
>>>> * x86_64.cc (can_convert_callq_to_direct): New function.
>>>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>>>> GOT can be converted to direct.
>>>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>>>> via GOT that can be converted.
>>>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>>>> * testsuite/Makefile.in: Regenerate.
>>>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>>>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>>>
>>>
>>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>>
>> Ok, patch changed to not check for this and refactored a bit.
>
> Ping, Is this patch ok now?

Ping.  H.J. / Cary, is this good to go now?

* x86_64.cc (can_convert_callq_to_direct): New function.
Target_x86_64<size>::Scan::global: Check if an indirect call via
GOT can be converted to direct.
Target_x86_64<size>::Relocate::relocate: Change any indirect call
via GOT that can be converted.
* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/x86_64_indirect_call_to_direct1.s: New file.
* testsuite/x86_64_indirect_jump_to_direct1.s: New file.


Thanks
Sri


>
>
> * x86_64.cc (can_convert_callq_to_direct): New function.
> Target_x86_64<size>::Scan::global: Check if an indirect call via
> GOT can be converted to direct.
> Target_x86_64<size>::Relocate::relocate: Change any indirect call
> via GOT that can be converted.
> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
> * testsuite/Makefile.in: Regenerate.
> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>
> Patch attached.
>
> Thanks
> Sri
>
>>
>> Thanks
>> Sri
>>
>>>
>>> --
>>> H.J.

[-- Attachment #2: convert_indirect_call_patch.txt --]
[-- Type: text/plain, Size: 8065 bytes --]

	* x86_64.cc (can_convert_callq_to_direct): New function.
	Target_x86_64<size>::Scan::global: Check if an indirect call via
	GOT can be converted to direct.
	Target_x86_64<size>::Relocate::relocate: Change any indirect call
	via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..d54d024 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 onwards Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..d774d5b 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -891,6 +891,22 @@ class Target_x86_64 : public Sized_target<size, false>
 	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
   }
 
+  // Convert
+  // callq *foo@GOTPCRELX(%rip) to
+  // addr32 callq foo
+  // and jmpq *foo@GOTPCRELX(%rip) to
+  // jmpq foo
+  // nop
+  static bool
+  can_convert_callq_to_direct(const Symbol* gsym)
+  {
+    gold_assert(gsym != NULL);
+    return (gsym->type() == elfcpp::STT_FUNC
+	    && !gsym->is_undefined ()
+	    && !gsym->is_from_dynobj()
+	    && !gsym->is_preemptible());
+  }
+
   // Adjust TLS relocation type based on the options and whether this
   // is a local symbol.
   static tls::Tls_optimization
@@ -2931,17 +2947,34 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCRELX(%rip) to
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
-	if ((r_type == elfcpp::R_X86_64_GOTPCREL
-	     || r_type == elfcpp::R_X86_64_GOTPCRELX
-	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
-	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
+	bool do_convert_mov_to_lea
+	    = ((r_type == elfcpp::R_X86_64_GOTPCREL
+	        || r_type == elfcpp::R_X86_64_GOTPCRELX
+	        || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_mov_to_lea(gsym));
+	bool do_convert_callq_to_direct
+	    = (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && reloc.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_callq_to_direct(gsym));
+	if (do_convert_mov_to_lea || do_convert_callq_to_direct)
 	  {
 	    section_size_type stype;
 	    const unsigned char* view = object->section_contents(data_shndx,
 								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
+	    if (do_convert_mov_to_lea
+		&& view[reloc.get_r_offset() - 2] == 0x8b)
+	      break;
+
+	    if (do_convert_callq_to_direct
+	        && view[reloc.get_r_offset() - 2] == 0xff
+	        && (view[reloc.get_r_offset() - 1] == 0x15
+	    	    || view[reloc.get_r_offset() - 1] == 0x25))
 	      break;
 	  }
 
@@ -3634,6 +3667,45 @@ Target_x86_64<size>::Relocate::relocate(
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if (r_type == elfcpp::R_X86_64_GOTPCRELX
+	       && rela.get_r_offset() >= 2
+	       && view[-2] == 0xff
+	       && (view [-1] == 0x15 || view [-1] == 0x25)
+	       && (gsym != NULL
+		   && Target_x86_64<size>::can_convert_callq_to_direct(gsym)))
+	{
+	  if (view[-1] == 0x15)
+	    {
+	      // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+	      // Opcode of addr32 is 0x67 and opcode of direct callq is 0xe8.
+	      view[-2] = 0x67;
+	      view[-1] = 0xe8;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.
+	      Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	    }
+	  else
+	    {
+	      // Convert jmpq *foo@GOTPCRELX(%rip) to
+	      // jmpq foo
+	      // nop
+	      // The opcode of direct jmpq is 0xe9.
+	      view[-2] = 0xe9;
+	      // The opcode of nop is 0x90.
+	      view[3] = 0x90;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.  jmpq is rip
+	      // relative and since the instruction following the jmpq is now
+	      // the nop, offset the address by 1 byte.  The start of the
+              // relocation also moves ahead by 1 byte.
+	      Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+				    address - 1);
+	    }
+	}
       else
 	{
 	  if (gsym != NULL)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-08 22:22           ` Sriraman Tallam
@ 2016-06-08 22:34             ` H.J. Lu
  0 siblings, 0 replies; 26+ messages in thread
From: H.J. Lu @ 2016-06-08 22:34 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: binutils, Cary Coutant, David Li

On Wed, Jun 8, 2016 at 3:22 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Mon, Jun 6, 2016 at 1:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Tue, May 31, 2016 at 11:02 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Sat, May 28, 2016 at 10:44 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, May 27, 2016 at 3:14 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> On Fri, May 20, 2016 at 1:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Fri, May 20, 2016 at 1:27 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>    GCC has option -fno-plt which converts all extern calls to indirect
>>>>>>> calls via GOT to prevent the linker for generating any PLT stubs.
>>>>>>> However, if the function ends up defined in the executable this patch
>>>>>>> will convert those indirect calls/jumps to direct.  Since the indirect
>>>>>>> calls are one byte longer, an extra nop is needed at the beginning.
>>>>>>>
>>>>>>> Here is a simple example:
>>>>>>>
>>>>>>> main.c
>>>>>>> ---------
>>>>>>> extern int foo();
>>>>>>> int main() {
>>>>>>>   return foo();
>>>>>>> }
>>>>>>>
>>>>>>> deffoo.c
>>>>>>> -----------
>>>>>>> int foo() {
>>>>>>>   return 0;
>>>>>>> }
>>>>>>>
>>>>>>> $ gcc -fno-plt main.c deffoo.c
>>>>>>> $objdump -d a.out
>>>>>>>
>>>>>>> 0000000000400626 <main>:
>>>>>>>   ...
>>>>>>>   40062a:       ff 15 28 14 00 00       callq  *0x1428(%rip)        #
>>>>>>> 401a58 <_DYNAMIC+0x1d8>
>>>>>>>
>>>>>>> The call is indirect even though foo is defined in the executable.
>>>>>>>
>>>>>>> With this patch,
>>>>>>> 0000000000400606 <main>:
>>>>>>>    ....
>>>>>>>    40060a:       90                      nop
>>>>>>>   40060b:       e8 03 00 00 00          callq  400613 <foo>
>>>>>>>
>>>>>>> The call is now direct with an extra nop.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Please try ld, which uses 0x67 prefix (addr32) instead of nop.
>>>>>> Also for
>>>>>>
>>>>>> jmp *foo#GOTPCREL(%rip)
>>>>>>
>>>>>>  ld converts it to
>>>>>>
>>>>>> jmp foo
>>>>>> nop
>>>>>
>>>>> I have modified the patch to keep it consistent with what ld produces.
>>>>>
>>>>> Please take another look.
>>>>>
>>>>> * x86_64.cc (can_convert_callq_to_direct): New function.
>>>>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>>>>> GOT can be converted to direct.
>>>>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>>>>> via GOT that can be converted.
>>>>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>>>>> * testsuite/Makefile.in: Regenerate.
>>>>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>>>>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>>>>
>>>>
>>>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>>>
>>> Ok, patch changed to not check for this and refactored a bit.
>>
>> Ping, Is this patch ok now?
>
> Ping.  H.J. / Cary, is this good to go now?
>
> * x86_64.cc (can_convert_callq_to_direct): New function.
> Target_x86_64<size>::Scan::global: Check if an indirect call via
> GOT can be converted to direct.
> Target_x86_64<size>::Relocate::relocate: Change any indirect call
> via GOT that can be converted.
> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
> * testsuite/Makefile.in: Regenerate.
> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>
>

Looks good to me.  But I can't approve it.

-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-31 18:03       ` Sriraman Tallam
  2016-06-06 20:51         ` Sriraman Tallam
@ 2016-06-10 20:06         ` Cary Coutant
  2016-06-10 20:33           ` H.J. Lu
                             ` (2 more replies)
  2016-06-10 23:00         ` Cary Coutant
  2 siblings, 3 replies; 26+ messages in thread
From: Cary Coutant @ 2016-06-10 20:06 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: H.J. Lu, binutils, David Li

>>> * x86_64.cc (can_convert_callq_to_direct): New function.
>>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>>> GOT can be converted to direct.
>>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>>> via GOT that can be converted.
>>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>>> * testsuite/Makefile.in: Regenerate.
>>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>>
>>
>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>
> Ok, patch changed to not check for this and refactored a bit.

Thanks for the refactoring -- I was about to suggest that. There's
still too much logic outside the can_convert_* functions that is
duplicated in Scan::global() and in Relocate::relocate(), all because
we don't want to fetch the section contents without doing a few
preliminary checks. I'd like to move most of that logic into the
can_convert_* functions, so let's start with a Lazy_view class, which
will fetch the section contents only when needed:

template<int size>
class Lazy_view
{
 public:
  Lazy_view(Sized_relobj_file<size, false>* object, unsigned int data_shndx)
    : object_(object), data_shndx_(data_shndx), view_(NULL), view_size_(0)
  { }

  inline unsigned char
  operator[](size_t offset)
  {
    if (this->view_ == NULL)
      this->view_ = this->object_->section_contents(this->data_shndx_,
                                                    &this->view_size_,
                                                    true);
    if (offset >= this->view_size_)
      return 0;
    return this->view_[offset];
  }

 private:
  Sized_relobj_file<size, false>* object_;
  unsigned int data_shndx_;
  const unsigned char* view_;
  section_size_type view_size_;
};

Now we can move (almost) all of that external logic into the
can_convert_* routines, leaving us with this in Scan::global():

        Lazy_view<size> view(object, data_shndx);
        size_t r_offset = reloc.get_r_offset();
        if (r_offset >= 2
            && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
                                                           r_offset, &view))
          break;
        if (r_offset >= 2
            && Target_x86_64<size>::can_convert_callq_to_direct(gsym, r_type,

r_offset, &view))
          break;

... and this in Relocate::relocate():

        if ((gsym == NULL
             && rela.get_r_offset() >= 2
             && view[-2] == 0x8b
             && !psymval->is_ifunc_symbol())
            || (gsym != NULL
                && rela.get_r_offset() >= 2
                && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
                                                               0, &view)))
          {
            ...
          }
        else if (gsym != NULL
                 && rela.get_r_offset() >= 2
                 && Target_x86_64<size>::can_convert_callq_to_direct(gsym,
                                                                     r_type,
                                                                     0, &view))
          {
            if (view[-1] == 0x15)
              {
                ...
              }
            else
              {
                ...
              }
          }

Folding all the external logic into the can_convert_* functions, and
refactoring a bit gives this:

  template<class View_type>
  static inline bool
  can_convert_mov_to_lea(const Symbol* gsym, unsigned int r_type,
                         size_t r_offset, View_type* view)
  {
    gold_assert(gsym != NULL);
    // We cannot do the conversion unless it's one of these relocations.
    if (r_type != elfcpp::R_X86_64_GOTPCREL
        && r_type != elfcpp::R_X86_64_GOTPCRELX
        && r_type != elfcpp::R_X86_64_REX_GOTPCRELX)
      return false;
    // We cannot convert references to IFUNC symbols, or to symbols that
    // are not local to the current module.
    if (gsym->type() == elfcpp::STT_GNU_IFUNC
        || gsym->is_undefined ()
        || gsym->is_from_dynobj()
        || gsym->is_preemptible())
      return false;
    if (parameters->options().shared()
        && (gsym->visibility() == elfcpp::STV_DEFAULT
            || gsym->visibility() == elfcpp::STV_PROTECTED)
        && !parameters->options().Bsymbolic())
      return false;
    // We cannot convert references to the _DYNAMIC symbol.
    if (strcmp(gsym->name(), "_DYNAMIC") == 0)
      return false;
    // Check for a MOV opcode.
    return (*view)[r_offset - 2] == 0x8b;
  }

  template<class View_type>
  static inline bool
  can_convert_callq_to_direct(const Symbol* gsym, unsigned int r_type,
                              size_t r_offset, View_type* view)
  {
    gold_assert(gsym != NULL);
    // We cannot do the conversion unless it's a GOTPCRELX relocation.
    if (r_type != elfcpp::R_X86_64_GOTPCRELX)
      return false;
    // We cannot convert references to IFUNC symbols, or to symbols that
    // are not local to the current module.
    if (gsym->type() == elfcpp::STT_GNU_IFUNC
        || gsym->is_undefined ()
        || gsym->is_from_dynobj()
        || gsym->is_preemptible())
      return false;
    // Check for a CALLQ or JMPQ opcode.
    return ((*view)[r_offset - 2] == 0xff
            && ((*view)[r_offset - 1] == 0x15
                || (*view)[r_offset - 1] == 0x25));
  }

A couple of observations, though:

1. Sri, in your patch, you just test for sym type == STT_FUNC. Isn't
it sufficient to test for sym type != STT_GNU_IFUNC (as in the
convert-to-lea case)? I don't think it really matters -- if we see a
jump to an STT_OBJECT or STT_NOTYPE symbol, why isn't the
transformation just as valid?

2. HJ, given an R_X86_64_GOTPCRELX relocation, is it still necessary
to check the opcode during Scan::global()? Doesn't the relocation
guarantee that it's an appropriate instruction for the transformation?
I think in both cases, we could skip fetching the section contents if
we have this relocation.

3. This piece of can_convert_mov_to_lea has me a bit puzzled:

    if (parameters->options().shared()
        && (gsym->visibility() == elfcpp::STV_DEFAULT
            || gsym->visibility() == elfcpp::STV_PROTECTED)
        && !parameters->options().Bsymbolic())
      return false;

We've already tested for is_preemptible, which should take care of all
of these cases. If I remove this code, however, I get a couple of
failures that I need to investigate. At the very least, I suspect this
part of the logic can be simplified. I want to investigate this
further before committing the whole thing.

-cary

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-10 20:06         ` Cary Coutant
@ 2016-06-10 20:33           ` H.J. Lu
  2016-06-10 22:22             ` Cary Coutant
  2016-06-13 23:04           ` Sriraman Tallam
  2016-06-14 18:33           ` Sriraman Tallam
  2 siblings, 1 reply; 26+ messages in thread
From: H.J. Lu @ 2016-06-10 20:33 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Sriraman Tallam, binutils, David Li

On Fri, Jun 10, 2016 at 1:06 PM, Cary Coutant <ccoutant@gmail.com> wrote:

>   template<class View_type>
>   static inline bool
>   can_convert_callq_to_direct(const Symbol* gsym, unsigned int r_type,
>                               size_t r_offset, View_type* view)
>   {
>     gold_assert(gsym != NULL);
>     // We cannot do the conversion unless it's a GOTPCRELX relocation.
>     if (r_type != elfcpp::R_X86_64_GOTPCRELX)
>       return false;
>     // We cannot convert references to IFUNC symbols, or to symbols that
>     // are not local to the current module.
>     if (gsym->type() == elfcpp::STT_GNU_IFUNC
>         || gsym->is_undefined ()
>         || gsym->is_from_dynobj()
>         || gsym->is_preemptible())
>       return false;
>     // Check for a CALLQ or JMPQ opcode.
>     return ((*view)[r_offset - 2] == 0xff
>             && ((*view)[r_offset - 1] == 0x15
>                 || (*view)[r_offset - 1] == 0x25));
>   }
>
> A couple of observations, though:
>
> 1. Sri, in your patch, you just test for sym type == STT_FUNC. Isn't
> it sufficient to test for sym type != STT_GNU_IFUNC (as in the
> convert-to-lea case)? I don't think it really matters -- if we see a
> jump to an STT_OBJECT or STT_NOTYPE symbol, why isn't the
> transformation just as valid?
>
> 2. HJ, given an R_X86_64_GOTPCRELX relocation, is it still necessary
> to check the opcode during Scan::global()? Doesn't the relocation
> guarantee that it's an appropriate instruction for the transformation?
> I think in both cases, we could skip fetching the section contents if
> we have this relocation.

Doesn't gold make different decision based on opcode, like
branch vs non-branch?


-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-10 20:33           ` H.J. Lu
@ 2016-06-10 22:22             ` Cary Coutant
  2016-06-10 23:18               ` H.J. Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Cary Coutant @ 2016-06-10 22:22 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Sriraman Tallam, binutils, David Li

>> 2. HJ, given an R_X86_64_GOTPCRELX relocation, is it still necessary
>> to check the opcode during Scan::global()? Doesn't the relocation
>> guarantee that it's an appropriate instruction for the transformation?
>> I think in both cases, we could skip fetching the section contents if
>> we have this relocation.
>
> Doesn't gold make different decision based on opcode, like
> branch vs non-branch?

Only during actual relocation, when we necessarily have the section
contents already available. During relocation scanning, it would be
nice to avoid reading the section contents.

-cary

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-05-31 18:03       ` Sriraman Tallam
  2016-06-06 20:51         ` Sriraman Tallam
  2016-06-10 20:06         ` Cary Coutant
@ 2016-06-10 23:00         ` Cary Coutant
  2016-06-10 23:10           ` Joseph Myers
  2 siblings, 1 reply; 26+ messages in thread
From: Cary Coutant @ 2016-06-10 23:00 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: H.J. Lu, binutils, David Li

>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>
> Ok, patch changed to not check for this and refactored a bit.

+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 onwards Free Software Foundation, Inc.

I don't think "onwards" is a valid way of expressing copyright dates.
I haven't found any mention of it in the Gnu coding standards, the GPL
documentation, or in any existing sources. (Maybe I missed a policy
change?)

A range of years (e.g., "2012-2016") is only valid when the file was
updated in each of the years in the range. So a blanket "onwards" is
inappropriate for a file that probably won't get touched every year,
even if it is a valid way of expressing a range.

-cary

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-10 23:00         ` Cary Coutant
@ 2016-06-10 23:10           ` Joseph Myers
  0 siblings, 0 replies; 26+ messages in thread
From: Joseph Myers @ 2016-06-10 23:10 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Sriraman Tallam, H.J. Lu, binutils, David Li

On Fri, 10 Jun 2016, Cary Coutant wrote:

> A range of years (e.g., "2012-2016") is only valid when the file was
> updated in each of the years in the range. So a blanket "onwards" is

No, as long as there was a released version of the package (including in 
public version control) in each year.  See the Information for Maintainers 
of GNU Software.  binutils/README has the required statement about the 
meaning of ranges.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-10 22:22             ` Cary Coutant
@ 2016-06-10 23:18               ` H.J. Lu
  0 siblings, 0 replies; 26+ messages in thread
From: H.J. Lu @ 2016-06-10 23:18 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Sriraman Tallam, binutils, David Li

On Fri, Jun 10, 2016 at 3:22 PM, Cary Coutant <ccoutant@gmail.com> wrote:
>>> 2. HJ, given an R_X86_64_GOTPCRELX relocation, is it still necessary
>>> to check the opcode during Scan::global()? Doesn't the relocation
>>> guarantee that it's an appropriate instruction for the transformation?
>>> I think in both cases, we could skip fetching the section contents if
>>> we have this relocation.
>>
>> Doesn't gold make different decision based on opcode, like
>> branch vs non-branch?
>
> Only during actual relocation, when we necessarily have the section
> contents already available. During relocation scanning, it would be
> nice to avoid reading the section contents.
>
> -cary

Yes, no need to check opcode for GOTPCRELX during scan.

-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-10 20:06         ` Cary Coutant
  2016-06-10 20:33           ` H.J. Lu
@ 2016-06-13 23:04           ` Sriraman Tallam
  2016-06-14 18:33           ` Sriraman Tallam
  2 siblings, 0 replies; 26+ messages in thread
From: Sriraman Tallam @ 2016-06-13 23:04 UTC (permalink / raw)
  To: Cary Coutant; +Cc: H.J. Lu, binutils, David Li

On Fri, Jun 10, 2016 at 1:06 PM, Cary Coutant <ccoutant@gmail.com> wrote:
>>>> * x86_64.cc (can_convert_callq_to_direct): New function.
>>>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>>>> GOT can be converted to direct.
>>>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>>>> via GOT that can be converted.
>>>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>>>> * testsuite/Makefile.in: Regenerate.
>>>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>>>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>>>
>>>
>>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>>
>> Ok, patch changed to not check for this and refactored a bit.
>
> Thanks for the refactoring -- I was about to suggest that. There's
> still too much logic outside the can_convert_* functions that is
> duplicated in Scan::global() and in Relocate::relocate(), all because
> we don't want to fetch the section contents without doing a few
> preliminary checks. I'd like to move most of that logic into the
> can_convert_* functions, so let's start with a Lazy_view class, which
> will fetch the section contents only when needed:
>
> template<int size>
> class Lazy_view
> {
>  public:
>   Lazy_view(Sized_relobj_file<size, false>* object, unsigned int data_shndx)
>     : object_(object), data_shndx_(data_shndx), view_(NULL), view_size_(0)
>   { }
>
>   inline unsigned char
>   operator[](size_t offset)
>   {
>     if (this->view_ == NULL)
>       this->view_ = this->object_->section_contents(this->data_shndx_,
>                                                     &this->view_size_,
>                                                     true);
>     if (offset >= this->view_size_)
>       return 0;
>     return this->view_[offset];
>   }
>
>  private:
>   Sized_relobj_file<size, false>* object_;
>   unsigned int data_shndx_;
>   const unsigned char* view_;
>   section_size_type view_size_;
> };
>
> Now we can move (almost) all of that external logic into the
> can_convert_* routines, leaving us with this in Scan::global():
>
>         Lazy_view<size> view(object, data_shndx);
>         size_t r_offset = reloc.get_r_offset();
>         if (r_offset >= 2
>             && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
>                                                            r_offset, &view))
>           break;
>         if (r_offset >= 2
>             && Target_x86_64<size>::can_convert_callq_to_direct(gsym, r_type,
>
> r_offset, &view))
>           break;
>
> ... and this in Relocate::relocate():
>
>         if ((gsym == NULL
>              && rela.get_r_offset() >= 2
>              && view[-2] == 0x8b
>              && !psymval->is_ifunc_symbol())
>             || (gsym != NULL
>                 && rela.get_r_offset() >= 2
>                 && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
>                                                                0, &view)))
>           {
>             ...
>           }
>         else if (gsym != NULL
>                  && rela.get_r_offset() >= 2
>                  && Target_x86_64<size>::can_convert_callq_to_direct(gsym,
>                                                                      r_type,
>                                                                      0, &view))
>           {
>             if (view[-1] == 0x15)
>               {
>                 ...
>               }
>             else
>               {
>                 ...
>               }
>           }
>
> Folding all the external logic into the can_convert_* functions, and
> refactoring a bit gives this:
>
>   template<class View_type>
>   static inline bool
>   can_convert_mov_to_lea(const Symbol* gsym, unsigned int r_type,
>                          size_t r_offset, View_type* view)
>   {
>     gold_assert(gsym != NULL);
>     // We cannot do the conversion unless it's one of these relocations.
>     if (r_type != elfcpp::R_X86_64_GOTPCREL
>         && r_type != elfcpp::R_X86_64_GOTPCRELX
>         && r_type != elfcpp::R_X86_64_REX_GOTPCRELX)
>       return false;
>     // We cannot convert references to IFUNC symbols, or to symbols that
>     // are not local to the current module.
>     if (gsym->type() == elfcpp::STT_GNU_IFUNC
>         || gsym->is_undefined ()
>         || gsym->is_from_dynobj()
>         || gsym->is_preemptible())
>       return false;
>     if (parameters->options().shared()
>         && (gsym->visibility() == elfcpp::STV_DEFAULT
>             || gsym->visibility() == elfcpp::STV_PROTECTED)
>         && !parameters->options().Bsymbolic())
>       return false;
>     // We cannot convert references to the _DYNAMIC symbol.
>     if (strcmp(gsym->name(), "_DYNAMIC") == 0)
>       return false;
>     // Check for a MOV opcode.
>     return (*view)[r_offset - 2] == 0x8b;
>   }
>
>   template<class View_type>
>   static inline bool
>   can_convert_callq_to_direct(const Symbol* gsym, unsigned int r_type,
>                               size_t r_offset, View_type* view)
>   {
>     gold_assert(gsym != NULL);
>     // We cannot do the conversion unless it's a GOTPCRELX relocation.
>     if (r_type != elfcpp::R_X86_64_GOTPCRELX)
>       return false;
>     // We cannot convert references to IFUNC symbols, or to symbols that
>     // are not local to the current module.
>     if (gsym->type() == elfcpp::STT_GNU_IFUNC
>         || gsym->is_undefined ()
>         || gsym->is_from_dynobj()
>         || gsym->is_preemptible())
>       return false;
>     // Check for a CALLQ or JMPQ opcode.
>     return ((*view)[r_offset - 2] == 0xff
>             && ((*view)[r_offset - 1] == 0x15
>                 || (*view)[r_offset - 1] == 0x25));
>   }
>
> A couple of observations, though:
>
> 1. Sri, in your patch, you just test for sym type == STT_FUNC. Isn't
> it sufficient to test for sym type != STT_GNU_IFUNC (as in the
> convert-to-lea case)? I don't think it really matters -- if we see a
> jump to an STT_OBJECT or STT_NOTYPE symbol, why isn't the
> transformation just as valid?
>
> 2. HJ, given an R_X86_64_GOTPCRELX relocation, is it still necessary
> to check the opcode during Scan::global()? Doesn't the relocation
> guarantee that it's an appropriate instruction for the transformation?
> I think in both cases, we could skip fetching the section contents if
> we have this relocation.
>
> 3. This piece of can_convert_mov_to_lea has me a bit puzzled:
>
>     if (parameters->options().shared()
>         && (gsym->visibility() == elfcpp::STV_DEFAULT
>             || gsym->visibility() == elfcpp::STV_PROTECTED)
>         && !parameters->options().Bsymbolic())
>       return false;

This code seems to cover the one extra case about a protected symbol
in a shared object.  A protected symbol defined in a shared object is
not pre-emptible and is_preemptible() will return false.  However,
this code suggests the transformation is invalid in that case and  I
dont understand why.  If this code is being overly conservative, then
this check can be removed and the test cases deleted.  Nevertheless,
this code can be simplified to:

    if (parameters->options().shared()
        && gsym->visibility() == elfcpp::STV_PROTECTED)
      return false;

>
> We've already tested for is_preemptible, which should take care of all
> of these cases. If I remove this code, however, I get a couple of
> failures that I need to investigate. At the very least, I suspect this
> part of the logic can be simplified. I want to investigate this
> further before committing the whole thing.
>
> -cary

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-10 20:06         ` Cary Coutant
  2016-06-10 20:33           ` H.J. Lu
  2016-06-13 23:04           ` Sriraman Tallam
@ 2016-06-14 18:33           ` Sriraman Tallam
  2016-06-20 19:04             ` Cary Coutant
  2 siblings, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-06-14 18:33 UTC (permalink / raw)
  To: Cary Coutant; +Cc: H.J. Lu, binutils, David Li

[-- Attachment #1: Type: text/plain, Size: 8693 bytes --]

On Fri, Jun 10, 2016 at 1:06 PM, Cary Coutant <ccoutant@gmail.com> wrote:
>>>> * x86_64.cc (can_convert_callq_to_direct): New function.
>>>> Target_x86_64<size>::Scan::global: Check if an indirect call via
>>>> GOT can be converted to direct.
>>>> Target_x86_64<size>::Relocate::relocate: Change any indirect call
>>>> via GOT that can be converted.
>>>> * testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
>>>> * testsuite/Makefile.in: Regenerate.
>>>> * testsuite/x86_64_indirect_call_to_direct1.s: New file.
>>>> * testsuite/x86_64_indirect_jump_to_direct1.s: New file.
>>>>
>>>
>>> Do you need to check R_X86_64_REX_GOTPCRELX for branch?
>>
>> Ok, patch changed to not check for this and refactored a bit.
>
> Thanks for the refactoring -- I was about to suggest that. There's
> still too much logic outside the can_convert_* functions that is
> duplicated in Scan::global() and in Relocate::relocate(), all because
> we don't want to fetch the section contents without doing a few
> preliminary checks. I'd like to move most of that logic into the
> can_convert_* functions, so let's start with a Lazy_view class, which
> will fetch the section contents only when needed:
>
> template<int size>
> class Lazy_view
> {
>  public:
>   Lazy_view(Sized_relobj_file<size, false>* object, unsigned int data_shndx)
>     : object_(object), data_shndx_(data_shndx), view_(NULL), view_size_(0)
>   { }
>
>   inline unsigned char
>   operator[](size_t offset)
>   {
>     if (this->view_ == NULL)
>       this->view_ = this->object_->section_contents(this->data_shndx_,
>                                                     &this->view_size_,
>                                                     true);
>     if (offset >= this->view_size_)
>       return 0;
>     return this->view_[offset];
>   }
>
>  private:
>   Sized_relobj_file<size, false>* object_;
>   unsigned int data_shndx_;
>   const unsigned char* view_;
>   section_size_type view_size_;
> };
>
> Now we can move (almost) all of that external logic into the
> can_convert_* routines, leaving us with this in Scan::global():
>
>         Lazy_view<size> view(object, data_shndx);
>         size_t r_offset = reloc.get_r_offset();
>         if (r_offset >= 2
>             && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
>                                                            r_offset, &view))
>           break;
>         if (r_offset >= 2
>             && Target_x86_64<size>::can_convert_callq_to_direct(gsym, r_type,
>
> r_offset, &view))
>           break;
>
> ... and this in Relocate::relocate():
>
>         if ((gsym == NULL
>              && rela.get_r_offset() >= 2
>              && view[-2] == 0x8b
>              && !psymval->is_ifunc_symbol())
>             || (gsym != NULL
>                 && rela.get_r_offset() >= 2
>                 && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
>                                                                0, &view)))
>           {
>             ...
>           }
>         else if (gsym != NULL
>                  && rela.get_r_offset() >= 2
>                  && Target_x86_64<size>::can_convert_callq_to_direct(gsym,
>                                                                      r_type,
>                                                                      0, &view))
>           {
>             if (view[-1] == 0x15)
>               {
>                 ...
>               }
>             else
>               {
>                 ...
>               }
>           }
>
> Folding all the external logic into the can_convert_* functions, and
> refactoring a bit gives this:
>
>   template<class View_type>
>   static inline bool
>   can_convert_mov_to_lea(const Symbol* gsym, unsigned int r_type,
>                          size_t r_offset, View_type* view)
>   {
>     gold_assert(gsym != NULL);
>     // We cannot do the conversion unless it's one of these relocations.
>     if (r_type != elfcpp::R_X86_64_GOTPCREL
>         && r_type != elfcpp::R_X86_64_GOTPCRELX
>         && r_type != elfcpp::R_X86_64_REX_GOTPCRELX)
>       return false;
>     // We cannot convert references to IFUNC symbols, or to symbols that
>     // are not local to the current module.
>     if (gsym->type() == elfcpp::STT_GNU_IFUNC
>         || gsym->is_undefined ()
>         || gsym->is_from_dynobj()
>         || gsym->is_preemptible())
>       return false;
>     if (parameters->options().shared()
>         && (gsym->visibility() == elfcpp::STV_DEFAULT
>             || gsym->visibility() == elfcpp::STV_PROTECTED)
>         && !parameters->options().Bsymbolic())
>       return false;
>     // We cannot convert references to the _DYNAMIC symbol.
>     if (strcmp(gsym->name(), "_DYNAMIC") == 0)
>       return false;
>     // Check for a MOV opcode.
>     return (*view)[r_offset - 2] == 0x8b;
>   }
>
>   template<class View_type>
>   static inline bool
>   can_convert_callq_to_direct(const Symbol* gsym, unsigned int r_type,
>                               size_t r_offset, View_type* view)
>   {
>     gold_assert(gsym != NULL);
>     // We cannot do the conversion unless it's a GOTPCRELX relocation.
>     if (r_type != elfcpp::R_X86_64_GOTPCRELX)
>       return false;
>     // We cannot convert references to IFUNC symbols, or to symbols that
>     // are not local to the current module.
>     if (gsym->type() == elfcpp::STT_GNU_IFUNC
>         || gsym->is_undefined ()
>         || gsym->is_from_dynobj()
>         || gsym->is_preemptible())
>       return false;
>     // Check for a CALLQ or JMPQ opcode.
>     return ((*view)[r_offset - 2] == 0xff
>             && ((*view)[r_offset - 1] == 0x15
>                 || (*view)[r_offset - 1] == 0x25));
>   }
>
> A couple of observations, though:
>
> 1. Sri, in your patch, you just test for sym type == STT_FUNC. Isn't
> it sufficient to test for sym type != STT_GNU_IFUNC (as in the
> convert-to-lea case)? I don't think it really matters -- if we see a
> jump to an STT_OBJECT or STT_NOTYPE symbol, why isn't the
> transformation just as valid?
>
> 2. HJ, given an R_X86_64_GOTPCRELX relocation, is it still necessary
> to check the opcode during Scan::global()? Doesn't the relocation
> guarantee that it's an appropriate instruction for the transformation?
> I think in both cases, we could skip fetching the section contents if
> we have this relocation.
>
> 3. This piece of can_convert_mov_to_lea has me a bit puzzled:
>
>     if (parameters->options().shared()
>         && (gsym->visibility() == elfcpp::STV_DEFAULT
>             || gsym->visibility() == elfcpp::STV_PROTECTED)
>         && !parameters->options().Bsymbolic())
>       return false;
>
> We've already tested for is_preemptible, which should take care of all
> of these cases. If I remove this code, however, I get a couple of
> failures that I need to investigate. At the very least, I suspect this
> part of the logic can be simplified. I want to investigate this
> further before committing the whole thing.

This case is needed if the shared object specifies "protected"
visibility using an asm directive instead of the attribute.  In this
case, the compiler prefers to use the GOT.  Here is the relevant code
from the test that fails if this condition is removed:

// The function f2 is used to test that the executable can see the
// same function address for a protected function in the executable
// and in the shared library.  We can't use the visibility attribute
// here, becaues that may cause gcc to generate a PC relative reloc;
// we need it to get the value from the GOT.  I'm not sure this is
// really useful, given that it doesn't work with the visibility
// attribute.  This test exists here mainly because the glibc
// testsuite has the same test, and we want to make sure that gold
// passes the glibc testsuite.

extern "C" int f2();
asm(".protected f2");


Like I said earlier, the condition can be simplified and I have done that.


I am attaching the patch after making all the changes mentioned.
Please take a look.

* x86_64.cc (Lazy_view): New class.
(can_convert_mov_to_lea): Templatize function.  Make the function
check for appropriate relocation types and use the view parameter
to get section contents.
(can_convert_callq_to_direct): New function.
(Target_x86_64<size>::Scan::global): Refactor.
(Target_x86_64<size>::Relocate::relocate): Refactor. Change any indirect
call via GOT that can be converted.
* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/x86_64_indirect_call_to_direct1.s: New file.
* testsuite/x86_64_indirect_jump_to_direct1.s: New file.


Thanks
Sri



>
> -cary

[-- Attachment #2: convert_indirect_call_patch.txt --]
[-- Type: text/plain, Size: 11686 bytes --]

	* x86_64.cc (Lazy_view): New class.
	(can_convert_mov_to_lea): Templatize function.  Make the function
	check for appropriate relocation types and use the view parameter
	to get section contents.
	(can_convert_callq_to_direct): New function.
	(Target_x86_64<size>::Scan::global): Refactor.
	(Target_x86_64<size>::Relocate::relocate): Refactor. Change any indirect
	call via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.


diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..916e1a3 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..85a98da 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -403,6 +403,33 @@ class Output_data_plt_x86_64_standard : public Output_data_plt_x86_64<size>
   static const unsigned char plt_eh_frame_fde[plt_eh_frame_fde_size];
 };
 
+template<int size>
+class Lazy_view
+{
+ public:
+  Lazy_view(Sized_relobj_file<size, false>* object, unsigned int data_shndx)
+    : object_(object), data_shndx_(data_shndx), view_(NULL), view_size_(0)
+  { }
+
+  inline unsigned char
+  operator[](size_t offset)
+  {
+    if (this->view_ == NULL)
+      this->view_ = this->object_->section_contents(this->data_shndx_,
+                                                    &this->view_size_,
+                                                    true);
+    if (offset >= this->view_size_)
+      return 0;
+    return this->view_[offset];
+  }
+ 
+ private:
+  Sized_relobj_file<size, false>* object_;
+  unsigned int data_shndx_;
+  const unsigned char* view_;
+  section_size_type view_size_;
+};
+
 // The x86_64 target class.
 // See the ABI at
 //   http://www.x86-64.org/documentation/abi.pdf
@@ -876,19 +903,62 @@ class Target_x86_64 : public Sized_target<size, false>
   // conversion from
   // mov foo@GOTPCREL(%rip), %reg
   // to lea foo(%rip), %reg.
-  static bool
-  can_convert_mov_to_lea(const Symbol* gsym)
+  template<class View_type>
+  static inline bool
+  can_convert_mov_to_lea(const Symbol* gsym, unsigned int r_type,
+                         size_t r_offset, View_type* view)
+  {
+    gold_assert(gsym != NULL);
+    // We cannot do the conversion unless it's one of these relocations.
+    if (r_type != elfcpp::R_X86_64_GOTPCREL
+        && r_type != elfcpp::R_X86_64_GOTPCRELX
+        && r_type != elfcpp::R_X86_64_REX_GOTPCRELX)
+      return false;
+    // We cannot convert references to IFUNC symbols, or to symbols that
+    // are not local to the current module.
+    if (gsym->type() == elfcpp::STT_GNU_IFUNC
+        || gsym->is_undefined ()
+        || gsym->is_from_dynobj()
+        || gsym->is_preemptible())
+      return false;
+    // If we are building a shared object and the symbol is protected, we may
+    // need to go through the GOT.
+    if (parameters->options().shared()
+        && gsym->visibility() == elfcpp::STV_PROTECTED)
+      return false;
+    // We cannot convert references to the _DYNAMIC symbol.
+    if (strcmp(gsym->name(), "_DYNAMIC") == 0)
+      return false;
+    // Check for a MOV opcode.
+    return (*view)[r_offset - 2] == 0x8b;
+  }
+
+  // Convert
+  // callq *foo@GOTPCRELX(%rip) to
+  // addr32 callq foo
+  // and jmpq *foo@GOTPCRELX(%rip) to
+  // jmpq foo
+  // nop
+  template<class View_type>
+  static inline bool
+  can_convert_callq_to_direct(const Symbol* gsym, unsigned int r_type,
+			      size_t r_offset, View_type* view)
   {
     gold_assert(gsym != NULL);
-    return (gsym->type() != elfcpp::STT_GNU_IFUNC
-	    && !gsym->is_undefined ()
-	    && !gsym->is_from_dynobj()
-	    && !gsym->is_preemptible()
-	    && (!parameters->options().shared()
-		|| (gsym->visibility() != elfcpp::STV_DEFAULT
-		    && gsym->visibility() != elfcpp::STV_PROTECTED)
-		|| parameters->options().Bsymbolic())
-	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
+    // We cannot do the conversion unless it's a GOTPCRELX relocation.
+    if (r_type != elfcpp::R_X86_64_GOTPCRELX)
+      return false;
+    // We cannot convert references to IFUNC symbols, or to symbols that
+    // are not local to the current module.
+    if (gsym->type() == elfcpp::STT_GNU_IFUNC
+        || gsym->is_undefined ()
+        || gsym->is_from_dynobj()
+        || gsym->is_preemptible())
+      return false;
+    // Check for a CALLQ or JMPQ opcode.
+    return ((*view)[r_offset - 2] == 0xff
+            && ((*view)[r_offset - 1] == 0x15
+                || (*view)[r_offset - 1] == 0x25));
   }
 
   // Adjust TLS relocation type based on the options and whether this
@@ -2931,19 +3001,23 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCRELX(%rip) to
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
-	if ((r_type == elfcpp::R_X86_64_GOTPCREL
-	     || r_type == elfcpp::R_X86_64_GOTPCRELX
-	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
-	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
-	  {
-	    section_size_type stype;
-	    const unsigned char* view = object->section_contents(data_shndx,
-								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
-	      break;
-	  }
+
+	// If relocation type is R_X86_64_GOTPCRELX it is automatically a
+	// candidate for conversion.
+	if (r_type ==  elfcpp::R_X86_64_GOTPCRELX)
+	  break;
+
+        Lazy_view<size> view(object, data_shndx);
+        size_t r_offset = reloc.get_r_offset();
+        if (r_offset >= 2
+            && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
+                                                           r_offset, &view))
+          break;
 
 	if (gsym->final_value_is_known())
 	  {
@@ -3625,15 +3699,56 @@ Target_x86_64<size>::Relocate::relocate(
       // mov foo@GOTPCREL(%rip), %reg
       // to lea foo(%rip), %reg.
       // if possible.
-      if (rela.get_r_offset() >= 2
-	  && view[-2] == 0x8b
-	  && ((gsym == NULL && !psymval->is_ifunc_symbol())
-	      || (gsym != NULL
-		  && Target_x86_64<size>::can_convert_mov_to_lea(gsym))))
+       if ((gsym == NULL
+             && rela.get_r_offset() >= 2
+             && view[-2] == 0x8b
+             && !psymval->is_ifunc_symbol())
+            || (gsym != NULL
+                && rela.get_r_offset() >= 2
+                && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
+                                                               0, &view)))
 	{
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if (gsym != NULL
+	       && rela.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_callq_to_direct(gsym,
+								   r_type,
+								   0, &view))
+	{
+	  if (view[-1] == 0x15)
+	    {
+	      // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+	      // Opcode of addr32 is 0x67 and opcode of direct callq is 0xe8.
+	      view[-2] = 0x67;
+	      view[-1] = 0xe8;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.
+	      Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	    }
+	  else
+	    {
+	      // Convert jmpq *foo@GOTPCRELX(%rip) to
+	      // jmpq foo
+	      // nop
+	      // The opcode of direct jmpq is 0xe9.
+	      view[-2] = 0xe9;
+	      // The opcode of nop is 0x90.
+	      view[3] = 0x90;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.  jmpq is rip
+	      // relative and since the instruction following the jmpq is now
+	      // the nop, offset the address by 1 byte.  The start of the
+              // relocation also moves ahead by 1 byte.
+	      Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+				    address - 1);
+	    }
+	}
       else
 	{
 	  if (gsym != NULL)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-14 18:33           ` Sriraman Tallam
@ 2016-06-20 19:04             ` Cary Coutant
  2016-06-24 17:47               ` Sriraman Tallam
  0 siblings, 1 reply; 26+ messages in thread
From: Cary Coutant @ 2016-06-20 19:04 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: H.J. Lu, binutils, David Li

> I am attaching the patch after making all the changes mentioned.
> Please take a look.

+       // If relocation type is R_X86_64_GOTPCRELX it is automatically a
+       // candidate for conversion.
+       if (r_type ==  elfcpp::R_X86_64_GOTPCRELX)
+         break;

If you're not calling can_convert_callq_to_direct() here, then you
shouldn't be calling it here:

+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if (gsym != NULL
+              && rela.get_r_offset() >= 2
+              && Target_x86_64<size>::can_convert_callq_to_direct(gsym,
+                                                                  r_type,
+                                                                  0, &view))

What will happen if it returns false in relocate()? You'll get no
conversion, but also no GOT entry. Since you're only doing this for
the GOTPCRELX relocation, you don't need can_convert_callq_to_direct()
at all, but you will need an error case for when this is false:

+    return ((*view)[r_offset - 2] == 0xff
+            && ((*view)[r_offset - 1] == 0x15
+                || (*view)[r_offset - 1] == 0x25));

If you see a GOTPCRELX relocation, and the opcode isn't either 0xff
0x15 or 0xff 0x25, that's a bad input that should be diagnosed.

-cary

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-20 19:04             ` Cary Coutant
@ 2016-06-24 17:47               ` Sriraman Tallam
  2016-06-28 19:31                 ` Cary Coutant
  0 siblings, 1 reply; 26+ messages in thread
From: Sriraman Tallam @ 2016-06-24 17:47 UTC (permalink / raw)
  To: Cary Coutant; +Cc: H.J. Lu, binutils, David Li

[-- Attachment #1: Type: text/plain, Size: 2297 bytes --]

On Mon, Jun 20, 2016 at 12:04 PM, Cary Coutant <ccoutant@gmail.com> wrote:
>> I am attaching the patch after making all the changes mentioned.
>> Please take a look.
>
> +       // If relocation type is R_X86_64_GOTPCRELX it is automatically a
> +       // candidate for conversion.
> +       if (r_type ==  elfcpp::R_X86_64_GOTPCRELX)
> +         break;
>
> If you're not calling can_convert_callq_to_direct() here, then you
> shouldn't be calling it here:

Yes, good point.  I am a little confused about this new relocation
elfcpp::R_X86_64_GOTPCRELX.  I have attached a patch that reverts to
the change you suggested where I check for conversion always.
However,  is it safe to assume that a elfcpp::R_X86_64_GOTPCRELX
relocation implies that the instruction containing the relocation is
eligible for one of the conversions *always* ?

If this is true, then I could completely remove
can_convert_callq_to_direct  and simplify the code a lot more.
Otherwise, the check is needed in both places.   Will
-relax-relocations=yes in the assembler do the checks before
converting R_X86_64_GOTPCREL to  now R_X86_64_GOTPCRELX. The attached
patch now contains a conservative check in both places to
can_convert_callq_to_direct.

Thanks
Sri


>
> +      // Convert
> +      // callq *foo@GOTPCRELX(%rip) to
> +      // addr32 callq foo
> +      // and jmpq *foo@GOTPCRELX(%rip) to
> +      // jmpq foo
> +      // nop
> +      else if (gsym != NULL
> +              && rela.get_r_offset() >= 2
> +              && Target_x86_64<size>::can_convert_callq_to_direct(gsym,
> +                                                                  r_type,
> +                                                                  0, &view))
>
> What will happen if it returns false in relocate()? You'll get no
> conversion, but also no GOT entry. Since you're only doing this for
> the GOTPCRELX relocation, you don't need can_convert_callq_to_direct()
> at all, but you will need an error case for when this is false:
>
> +    return ((*view)[r_offset - 2] == 0xff
> +            && ((*view)[r_offset - 1] == 0x15
> +                || (*view)[r_offset - 1] == 0x25));
>
> If you see a GOTPCRELX relocation, and the opcode isn't either 0xff
> 0x15 or 0xff 0x25, that's a bad input that should be diagnosed.
>
> -cary

[-- Attachment #2: convert_indirect_call_patch.txt --]
[-- Type: text/plain, Size: 11676 bytes --]

	* x86_64.cc (Lazy_view): New class.
	(can_convert_mov_to_lea): Templatize function.  Make the function
	check for appropriate relocation types and use the view parameter
	to get section contents.
	(can_convert_callq_to_direct): New function.
	(Target_x86_64<size>::Scan::global): Refactor.
	(Target_x86_64<size>::Relocate::relocate): Refactor. Change any indirect
	call via GOT that can be converted.
	* testsuite/Makefile.am (x86_64_indirect_call_to_direct.sh): New test.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/x86_64_indirect_call_to_direct1.s: New file.
	* testsuite/x86_64_indirect_jump_to_direct1.s: New file.

diff --git a/gold/testsuite/Makefile.am b/gold/testsuite/Makefile.am
index 01cae9f..f5cc0db 100644
--- a/gold/testsuite/Makefile.am
+++ b/gold/testsuite/Makefile.am
@@ -1096,6 +1096,25 @@ x86_64_mov_to_lea13.stdout: x86_64_mov_to_lea13
 x86_64_mov_to_lea14.stdout: x86_64_mov_to_lea14
 	$(TEST_OBJDUMP) -dw $< > $@
 
+check_SCRIPTS += x86_64_indirect_call_to_direct.sh
+check_DATA += x86_64_indirect_call_to_direct1.stdout \
+	x86_64_indirect_jump_to_direct1.stdout
+MOSTLYCLEANFILES += x86_64_indirect_call_to_direct1 \
+	x86_64_indirect_jump_to_direct1
+
+x86_64_indirect_call_to_direct1.o: x86_64_indirect_call_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_call_to_direct1: x86_64_indirect_call_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_call_to_direct1.stdout: x86_64_indirect_call_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+x86_64_indirect_jump_to_direct1.o: x86_64_indirect_jump_to_direct1.s
+	$(TEST_AS) --64 -mrelax-relocations=yes -o $@ $<
+x86_64_indirect_jump_to_direct1: x86_64_indirect_jump_to_direct1.o gcctestdir/ld
+	gcctestdir/ld -o $@ $<
+x86_64_indirect_jump_to_direct1.stdout: x86_64_indirect_jump_to_direct1
+	$(TEST_OBJDUMP) -dw $< > $@
+
 check_SCRIPTS += x86_64_overflow_pc32.sh
 check_DATA += x86_64_overflow_pc32.err
 MOSTLYCLEANFILES += x86_64_overflow_pc32.err
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct.sh b/gold/testsuite/x86_64_indirect_call_to_direct.sh
index e69de29..916e1a3 100755
--- a/gold/testsuite/x86_64_indirect_call_to_direct.sh
+++ b/gold/testsuite/x86_64_indirect_call_to_direct.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+# x86_64_indirect_call_to_direct.sh -- a test for indirect call(jump) to direct
+# conversion.
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+# Written by Sriraman Tallam <tmsriram@google.com>
+
+# This file is part of gold.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+set -e
+
+grep -q "callq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_call_to_direct1.stdout
+grep -q "jmpq[ ]\+[a-f0-9]\+ <foo>" x86_64_indirect_jump_to_direct1.stdout
diff --git a/gold/testsuite/x86_64_indirect_call_to_direct1.s b/gold/testsuite/x86_64_indirect_call_to_direct1.s
index e69de29..5ca2e38 100644
--- a/gold/testsuite/x86_64_indirect_call_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_call_to_direct1.s
@@ -0,0 +1,12 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	call	*foo@GOTPCREL(%rip)
+	ret
+	.size	main, .-main
diff --git a/gold/testsuite/x86_64_indirect_jump_to_direct1.s b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
index e69de29..b817e34 100644
--- a/gold/testsuite/x86_64_indirect_jump_to_direct1.s
+++ b/gold/testsuite/x86_64_indirect_jump_to_direct1.s
@@ -0,0 +1,11 @@
+	.text
+	.globl	foo
+	.type	foo, @function
+foo:
+	ret
+	.size	foo, .-foo
+	.globl	main
+	.type	main, @function
+main:
+	jmp	*foo@GOTPCREL(%rip)
+	.size	main, .-main
diff --git a/gold/x86_64.cc b/gold/x86_64.cc
index 81126ef..3356271 100644
--- a/gold/x86_64.cc
+++ b/gold/x86_64.cc
@@ -403,6 +403,33 @@ class Output_data_plt_x86_64_standard : public Output_data_plt_x86_64<size>
   static const unsigned char plt_eh_frame_fde[plt_eh_frame_fde_size];
 };
 
+template<int size>
+class Lazy_view
+{
+ public:
+  Lazy_view(Sized_relobj_file<size, false>* object, unsigned int data_shndx)
+    : object_(object), data_shndx_(data_shndx), view_(NULL), view_size_(0)
+  { }
+
+  inline unsigned char
+  operator[](size_t offset)
+  {
+    if (this->view_ == NULL)
+      this->view_ = this->object_->section_contents(this->data_shndx_,
+                                                    &this->view_size_,
+                                                    true);
+    if (offset >= this->view_size_)
+      return 0;
+    return this->view_[offset];
+  }
+ 
+ private:
+  Sized_relobj_file<size, false>* object_;
+  unsigned int data_shndx_;
+  const unsigned char* view_;
+  section_size_type view_size_;
+};
+
 // The x86_64 target class.
 // See the ABI at
 //   http://www.x86-64.org/documentation/abi.pdf
@@ -876,19 +903,62 @@ class Target_x86_64 : public Sized_target<size, false>
   // conversion from
   // mov foo@GOTPCREL(%rip), %reg
   // to lea foo(%rip), %reg.
-  static bool
-  can_convert_mov_to_lea(const Symbol* gsym)
+  template<class View_type>
+  static inline bool
+  can_convert_mov_to_lea(const Symbol* gsym, unsigned int r_type,
+                         size_t r_offset, View_type* view)
   {
     gold_assert(gsym != NULL);
-    return (gsym->type() != elfcpp::STT_GNU_IFUNC
-	    && !gsym->is_undefined ()
-	    && !gsym->is_from_dynobj()
-	    && !gsym->is_preemptible()
-	    && (!parameters->options().shared()
-		|| (gsym->visibility() != elfcpp::STV_DEFAULT
-		    && gsym->visibility() != elfcpp::STV_PROTECTED)
-		|| parameters->options().Bsymbolic())
-	    && strcmp(gsym->name(), "_DYNAMIC") != 0);
+    // We cannot do the conversion unless it's one of these relocations.
+    if (r_type != elfcpp::R_X86_64_GOTPCREL
+        && r_type != elfcpp::R_X86_64_GOTPCRELX
+        && r_type != elfcpp::R_X86_64_REX_GOTPCRELX)
+      return false;
+    // We cannot convert references to IFUNC symbols, or to symbols that
+    // are not local to the current module.
+    if (gsym->type() == elfcpp::STT_GNU_IFUNC
+        || gsym->is_undefined ()
+        || gsym->is_from_dynobj()
+        || gsym->is_preemptible())
+      return false;
+    // If we are building a shared object and the symbol is protected, we may
+    // need to go through the GOT.
+    if (parameters->options().shared()
+        && gsym->visibility() == elfcpp::STV_PROTECTED)
+      return false;
+    // We cannot convert references to the _DYNAMIC symbol.
+    if (strcmp(gsym->name(), "_DYNAMIC") == 0)
+      return false;
+    // Check for a MOV opcode.
+    return (*view)[r_offset - 2] == 0x8b;
+  }
+
+  // Convert
+  // callq *foo@GOTPCRELX(%rip) to
+  // addr32 callq foo
+  // and jmpq *foo@GOTPCRELX(%rip) to
+  // jmpq foo
+  // nop
+  template<class View_type>
+  static inline bool
+  can_convert_callq_to_direct(const Symbol* gsym, unsigned int r_type,
+			      size_t r_offset, View_type* view)
+  {
+    gold_assert(gsym != NULL);
+    // We cannot do the conversion unless it's a GOTPCRELX relocation.
+    if (r_type != elfcpp::R_X86_64_GOTPCRELX)
+      return false;
+    // We cannot convert references to IFUNC symbols, or to symbols that
+    // are not local to the current module.
+    if (gsym->type() == elfcpp::STT_GNU_IFUNC
+        || gsym->is_undefined ()
+        || gsym->is_from_dynobj()
+        || gsym->is_preemptible())
+      return false;
+    // Check for a CALLQ or JMPQ opcode.
+    return ((*view)[r_offset - 2] == 0xff
+            && ((*view)[r_offset - 1] == 0x15
+                || (*view)[r_offset - 1] == 0x25));
   }
 
   // Adjust TLS relocation type based on the options and whether this
@@ -2931,19 +3001,24 @@ Target_x86_64<size>::Scan::global(Symbol_table* symtab,
 	// If we convert this from
 	// mov foo@GOTPCREL(%rip), %reg
 	// to lea foo(%rip), %reg.
+	// OR
+	// if we convert
+	// (callq|jmpq) *foo@GOTPCRELX(%rip) to
+	// (callq|jmpq) foo
 	// in Relocate::relocate, then there is nothing to do here.
-	if ((r_type == elfcpp::R_X86_64_GOTPCREL
-	     || r_type == elfcpp::R_X86_64_GOTPCRELX
-	     || r_type == elfcpp::R_X86_64_REX_GOTPCRELX)
-	    && reloc.get_r_offset() >= 2
-	    && Target_x86_64<size>::can_convert_mov_to_lea(gsym))
-	  {
-	    section_size_type stype;
-	    const unsigned char* view = object->section_contents(data_shndx,
-								 &stype, true);
-	    if (view[reloc.get_r_offset() - 2] == 0x8b)
-	      break;
-	  }
+
+        Lazy_view<size> view(object, data_shndx);
+        size_t r_offset = reloc.get_r_offset();
+        if (r_offset >= 2
+            && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
+                                                           r_offset, &view))
+          break;
+
+	if (r_offset >= 2
+	    && Target_x86_64<size>::can_convert_callq_to_direct(gsym, r_type,
+								r_offset,
+								&view))
+          break;
 
 	if (gsym->final_value_is_known())
 	  {
@@ -3625,15 +3700,56 @@ Target_x86_64<size>::Relocate::relocate(
       // mov foo@GOTPCREL(%rip), %reg
       // to lea foo(%rip), %reg.
       // if possible.
-      if (rela.get_r_offset() >= 2
-	  && view[-2] == 0x8b
-	  && ((gsym == NULL && !psymval->is_ifunc_symbol())
-	      || (gsym != NULL
-		  && Target_x86_64<size>::can_convert_mov_to_lea(gsym))))
+       if ((gsym == NULL
+             && rela.get_r_offset() >= 2
+             && view[-2] == 0x8b
+             && !psymval->is_ifunc_symbol())
+            || (gsym != NULL
+                && rela.get_r_offset() >= 2
+                && Target_x86_64<size>::can_convert_mov_to_lea(gsym, r_type,
+                                                               0, &view)))
 	{
 	  view[-2] = 0x8d;
 	  Reloc_funcs::pcrela32(view, object, psymval, addend, address);
 	}
+      // Convert
+      // callq *foo@GOTPCRELX(%rip) to
+      // addr32 callq foo
+      // and jmpq *foo@GOTPCRELX(%rip) to
+      // jmpq foo
+      // nop
+      else if (gsym != NULL
+	       && rela.get_r_offset() >= 2
+	       && Target_x86_64<size>::can_convert_callq_to_direct(gsym,
+								   r_type,
+								   0, &view))
+	{
+	  if (view[-1] == 0x15)
+	    {
+	      // Convert callq *foo@GOTPCRELX(%rip) to addr32 callq.
+	      // Opcode of addr32 is 0x67 and opcode of direct callq is 0xe8.
+	      view[-2] = 0x67;
+	      view[-1] = 0xe8;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.
+	      Reloc_funcs::pcrela32(view, object, psymval, addend, address);
+	    }
+	  else
+	    {
+	      // Convert jmpq *foo@GOTPCRELX(%rip) to
+	      // jmpq foo
+	      // nop
+	      // The opcode of direct jmpq is 0xe9.
+	      view[-2] = 0xe9;
+	      // The opcode of nop is 0x90.
+	      view[3] = 0x90;
+	      // Convert GOTPCRELX to 32-bit pc relative reloc.  jmpq is rip
+	      // relative and since the instruction following the jmpq is now
+	      // the nop, offset the address by 1 byte.  The start of the
+              // relocation also moves ahead by 1 byte.
+	      Reloc_funcs::pcrela32(&view[-1], object, psymval, addend,
+				    address - 1);
+	    }
+	}
       else
 	{
 	  if (gsym != NULL)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-24 17:47               ` Sriraman Tallam
@ 2016-06-28 19:31                 ` Cary Coutant
  2016-06-28 20:58                   ` H.J. Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Cary Coutant @ 2016-06-28 19:31 UTC (permalink / raw)
  To: Sriraman Tallam; +Cc: H.J. Lu, binutils, David Li

> Yes, good point.  I am a little confused about this new relocation
> elfcpp::R_X86_64_GOTPCRELX.  I have attached a patch that reverts to
> the change you suggested where I check for conversion always.
> However,  is it safe to assume that a elfcpp::R_X86_64_GOTPCRELX
> relocation implies that the instruction containing the relocation is
> eligible for one of the conversions *always* ?

Here's HJ's original proposal:

   https://groups.google.com/forum/#!topic/x86-64-abi/n9AWHogmVY0

It looks like that relocation marks several different instructions
sequences that are eligible for transformation, so it looks like you
may still have to look at the opcode. But it might be the case that in
Scan::global(), all we need to know is that *some* transformation is
going to happen that will remove the need for the GOT entry -- it
doesn't really matter which one.

HJ, can you clarify? If we have a GOTPCRELX relocation, what other
conditions are sufficient to guarantee that one of the proposed
translations will take place during relocation?

> If this is true, then I could completely remove
> can_convert_callq_to_direct  and simplify the code a lot more.
> Otherwise, the check is needed in both places.   Will
> -relax-relocations=yes in the assembler do the checks before
> converting R_X86_64_GOTPCREL to  now R_X86_64_GOTPCRELX. The attached
> patch now contains a conservative check in both places to
> can_convert_callq_to_direct.

There are other conditions in the can_convert* routines; I don't think
you can completely remove the routine.

Your latest version of the patch looks good; please go ahead and apply
it. I think for now it's safest to check the opcodes, and we can
discuss further improvements as a follow-up.

Thanks!

-cary

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH][x86_64] Convert indirect call via GOT to direct when possible
  2016-06-28 19:31                 ` Cary Coutant
@ 2016-06-28 20:58                   ` H.J. Lu
  0 siblings, 0 replies; 26+ messages in thread
From: H.J. Lu @ 2016-06-28 20:58 UTC (permalink / raw)
  To: Cary Coutant; +Cc: Sriraman Tallam, binutils, David Li

On Tue, Jun 28, 2016 at 12:31 PM, Cary Coutant <ccoutant@gmail.com> wrote:
>> Yes, good point.  I am a little confused about this new relocation
>> elfcpp::R_X86_64_GOTPCRELX.  I have attached a patch that reverts to
>> the change you suggested where I check for conversion always.
>> However,  is it safe to assume that a elfcpp::R_X86_64_GOTPCRELX
>> relocation implies that the instruction containing the relocation is
>> eligible for one of the conversions *always* ?
>
> Here's HJ's original proposal:
>
>    https://groups.google.com/forum/#!topic/x86-64-abi/n9AWHogmVY0
>
> It looks like that relocation marks several different instructions
> sequences that are eligible for transformation, so it looks like you
> may still have to look at the opcode. But it might be the case that in
> Scan::global(), all we need to know is that *some* transformation is
> going to happen that will remove the need for the GOT entry -- it
> doesn't really matter which one.
>
> HJ, can you clarify? If we have a GOTPCRELX relocation, what other
> conditions are sufficient to guarantee that one of the proposed
> translations will take place during relocation?
>

All GOTPCRELX relocations can be converted if symbol is resolved
locally without relocation overflow.

-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-06-28 20:58 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-20 20:28 [PATCH][x86_64] Convert indirect call via GOT to direct when possible Sriraman Tallam
2016-05-20 20:32 ` H.J. Lu
2016-05-20 21:15   ` Sriraman Tallam
2016-05-20 21:32     ` H.J. Lu
2016-05-24 22:06       ` Sriraman Tallam
2016-05-24 22:39         ` Rafael Espíndola
2016-05-24 22:58           ` Sriraman Tallam
2016-05-25  2:06             ` Rafael Espíndola
2016-05-27 22:14   ` Sriraman Tallam
2016-05-28 17:44     ` H.J. Lu
2016-05-31 18:03       ` Sriraman Tallam
2016-06-06 20:51         ` Sriraman Tallam
2016-06-08 22:22           ` Sriraman Tallam
2016-06-08 22:34             ` H.J. Lu
2016-06-10 20:06         ` Cary Coutant
2016-06-10 20:33           ` H.J. Lu
2016-06-10 22:22             ` Cary Coutant
2016-06-10 23:18               ` H.J. Lu
2016-06-13 23:04           ` Sriraman Tallam
2016-06-14 18:33           ` Sriraman Tallam
2016-06-20 19:04             ` Cary Coutant
2016-06-24 17:47               ` Sriraman Tallam
2016-06-28 19:31                 ` Cary Coutant
2016-06-28 20:58                   ` H.J. Lu
2016-06-10 23:00         ` Cary Coutant
2016-06-10 23:10           ` Joseph Myers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).