public inbox for gcc-patches@gcc.gnu.org
* PING^3: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
@ 2018-03-15 15:51 H.J. Lu
  2018-03-15 20:51 ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2018-03-15 15:51 UTC
  To: GCC Patches; +Cc: Jan Hubicka

On Sun, Mar 11, 2018 at 7:40 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 4:20 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Feb 27, 2018 at 11:39 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>> For x86 targets, when -fno-plt is used, external functions are called
>>> via GOT slot, in 64-bit mode:
>>>
>>>         [bnd] call/jmp *foo@GOTPCREL(%rip)
>>>
>>> and in 32-bit mode:
>>>
>>>         [bnd] call/jmp *foo@GOT[(%reg)]
>>>
>>> With -mindirect-branch=, they are converted to, in 64-bit mode:
>>>
>>>         pushq          foo@GOTPCREL(%rip)
>>>         [bnd] jmp      __x86_indirect_thunk[_bnd]
>>>
>>> and in 32-bit mode:
>>>
>>>         pushl          foo@GOT[(%reg)]
>>>         [bnd] jmp      __x86_indirect_thunk[_bnd]
>>>
>>> which are incompatible with CFI.  In 64-bit mode, since R11 is a scratch
>>> register, we generate:
>>>
>>>         movq           foo@GOTPCREL(%rip), %r11
>>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>>>
>>> instead.  We do it in ix86_output_indirect_branch so that we can use
>>> the newly proposed R_X86_64_THUNK_GOTPCRELX relocation:
>>>
>>> https://groups.google.com/forum/#!topic/x86-64-abi/eED5lzn3_Mg
>>>
>>>         movq           foo@GOTPCREL_THUNK(%rip), %r11
>>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>>>
>>> to load GOT slot into R11.  If foo is defined locally, the linker can
>>> convert
>>>
>>>         movq           foo@GOTPCREL_THUNK(%rip), %reg
>>>         call/jmp       __x86_indirect_thunk_reg
>>>
>>> to
>>>
>>>         call/jmp       foo
>>>         nop            0L(%rax)
>>>
>>> In 32-bit mode, since all caller-saved registers, EAX, EDX and ECX, may
>>> be used for function parameters, there is no scratch register available.  For
>>> -fno-plt -fno-pic -mindirect-branch=, we expand external function call
>>> to:
>>>
>>>         movl           foo@GOT, %reg
>>>         [bnd] call/jmp *%reg
>>>
>>> so that it can be converted to
>>>
>>>         movl           foo@GOT, %reg
>>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]reg
>>>
>>> in ix86_output_indirect_branch.  Since this is performed during RTL
>>> expansion, other instructions may be inserted between movl and call/jmp.
>>> Linker optimization isn't always possible.
>>>
>>> Tested on i686 and x86-64.  OK for trunk?
>>>
>>>
>>> H.J.
>>> ---
>>> gcc/
>>>
>>>         PR target/83970
>>>         * config/i386/constraints.md (Bs): Allow GOT_memory_operand
>>>         for TARGET_LP64 with indirect branch conversion.
>>>         (Bw): Likewise.
>>>         * config/i386/i386.c (ix86_expand_call): Handle -fno-plt with
>>>         -mindirect-branch=.
>>>         (ix86_nopic_noplt_attribute_p): Likewise.
>>>         (ix86_output_indirect_branch): In 64-bit mode, convert function
>>>         call via GOT with R11 as a scratch register using
>>>         __x86_indirect_thunk_r11.
>>>         (ix86_output_call_insn): In 64-bit mode, set xasm to NULL when
>>>         calling ix86_output_indirect_branch with function call via GOT.
>>>         * config/i386/i386.md (*call_got_thunk): New call pattern for
>>>         TARGET_LP64 with indirect branch conversion.
>>>         (*call_value_got_thunk): Likewise.
>>>
>>> gcc/testsuite/
>>>
>>>         PR target/83970
>>>         * gcc.target/i386/indirect-thunk-5.c: Updated.
>>>         * gcc.target/i386/indirect-thunk-6.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-13.c: New test.
>>>         * gcc.target/i386/indirect-thunk-14.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-bnd-5.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-bnd-6.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-extern-11.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-extern-12.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-inline-8.c: Likewise.
>>>         * gcc.target/i386/indirect-thunk-inline-9.c: Likewise.
>>
>> PING:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01527.html
>>
>
> PING.
>

PING.

-- 
H.J.

* Re: PING^3: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
  2018-03-15 15:51 PING^3: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT H.J. Lu
@ 2018-03-15 20:51 ` Jan Hubicka
  2018-03-15 21:03   ` H.J. Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2018-03-15 20:51 UTC
  To: H.J. Lu; +Cc: GCC Patches

> On Sun, Mar 11, 2018 at 7:40 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Mon, Mar 5, 2018 at 4:20 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >> On Tue, Feb 27, 2018 at 11:39 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> >>> For x86 targets, when -fno-plt is used, external functions are called
> >>> via GOT slot, in 64-bit mode:
> >>>
> >>>         [bnd] call/jmp *foo@GOTPCREL(%rip)
> >>>
> >>> and in 32-bit mode:
> >>>
> >>>         [bnd] call/jmp *foo@GOT[(%reg)]
> >>>
> >>> With -mindirect-branch=, they are converted to, in 64-bit mode:
> >>>
> >>>         pushq          foo@GOTPCREL(%rip)
> >>>         [bnd] jmp      __x86_indirect_thunk[_bnd]
> >>>
> >>> and in 32-bit mode:
> >>>
> >>>         pushl          foo@GOT[(%reg)]
> >>>         [bnd] jmp      __x86_indirect_thunk[_bnd]
> >>>
> >>> which are incompatible with CFI.  In 64-bit mode, since R11 is a scratch
> >>> register, we generate:
> >>>
> >>>         movq           foo@GOTPCREL(%rip), %r11
> >>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
> >>>
> >>> instead.  We do it in ix86_output_indirect_branch so that we can use
> >>> the newly proposed R_X86_64_THUNK_GOTPCRELX relocation:
> >>>
> >>> https://groups.google.com/forum/#!topic/x86-64-abi/eED5lzn3_Mg
> >>>
> >>>         movq           foo@GOTPCREL_THUNK(%rip), %r11
> >>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
> >>>
> >>> to load GOT slot into R11.  If foo is defined locally, the linker can
> >>> convert
> >>>
> >>>         movq           foo@GOTPCREL_THUNK(%rip), %reg
> >>>         call/jmp       __x86_indirect_thunk_reg
> >>>
> >>> to
> >>>
> >>>         call/jmp       foo
> >>>         nop            0L(%rax)
> >>>
> >>> In 32-bit mode, since all caller-saved registers, EAX, EDX and ECX, may
> >>> be used for function parameters, there is no scratch register available.  For
> >>> -fno-plt -fno-pic -mindirect-branch=, we expand external function call
> >>> to:
> >>>
> >>>         movl           foo@GOT, %reg
> >>>         [bnd] call/jmp *%reg
> >>>
> >>> so that it can be converted to
> >>>
> >>>         movl           foo@GOT, %reg
> >>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]reg
> >>>
> >>> in ix86_output_indirect_branch.  Since this is performed during RTL
> >>> expansion, other instructions may be inserted between movl and call/jmp.
> >>> Linker optimization isn't always possible.

I suppose we can just combine those into patterns if we want to prevent gcc from
interleaving this with other instructions.  However since this affects ABI and
not only return thunk, did you discuss the changes with LLVM folks as well?

It would be nice to not have diverging solutions.

Honza

* Re: PING^3: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
  2018-03-15 20:51 ` Jan Hubicka
@ 2018-03-15 21:03   ` H.J. Lu
  2018-03-15 21:42     ` Jan Hubicka
  0 siblings, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2018-03-15 21:03 UTC
  To: Jan Hubicka; +Cc: GCC Patches

On Thu, Mar 15, 2018 at 1:41 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Sun, Mar 11, 2018 at 7:40 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Mon, Mar 5, 2018 at 4:20 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> On Tue, Feb 27, 2018 at 11:39 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> >>> For x86 targets, when -fno-plt is used, external functions are called
>> >>> via GOT slot, in 64-bit mode:
>> >>>
>> >>>         [bnd] call/jmp *foo@GOTPCREL(%rip)
>> >>>
>> >>> and in 32-bit mode:
>> >>>
>> >>>         [bnd] call/jmp *foo@GOT[(%reg)]
>> >>>
>> >>> With -mindirect-branch=, they are converted to, in 64-bit mode:
>> >>>
>> >>>         pushq          foo@GOTPCREL(%rip)
>> >>>         [bnd] jmp      __x86_indirect_thunk[_bnd]
>> >>>
>> >>> and in 32-bit mode:
>> >>>
>> >>>         pushl          foo@GOT[(%reg)]
>> >>>         [bnd] jmp      __x86_indirect_thunk[_bnd]
>> >>>
>> >>> which are incompatible with CFI.  In 64-bit mode, since R11 is a scratch
>> >>> register, we generate:
>> >>>
>> >>>         movq           foo@GOTPCREL(%rip), %r11
>> >>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>> >>>
>> >>> instead.  We do it in ix86_output_indirect_branch so that we can use
>> >>> the newly proposed R_X86_64_THUNK_GOTPCRELX relocation:
>> >>>
>> >>> https://groups.google.com/forum/#!topic/x86-64-abi/eED5lzn3_Mg
>> >>>
>> >>>         movq           foo@GOTPCREL_THUNK(%rip), %r11
>> >>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>> >>>
>> >>> to load GOT slot into R11.  If foo is defined locally, the linker can
>> >>> convert
>> >>>
>> >>>         movq           foo@GOTPCREL_THUNK(%rip), %reg
>> >>>         call/jmp       __x86_indirect_thunk_reg
>> >>>
>> >>> to
>> >>>
>> >>>         call/jmp       foo
>> >>>         nop            0L(%rax)
>> >>>
>> >>> In 32-bit mode, since all caller-saved registers, EAX, EDX and ECX, may
>> >>> be used for function parameters, there is no scratch register available.  For
>> >>> -fno-plt -fno-pic -mindirect-branch=, we expand external function call
>> >>> to:
>> >>>
>> >>>         movl           foo@GOT, %reg
>> >>>         [bnd] call/jmp *%reg
>> >>>
>> >>> so that it can be converted to
>> >>>
>> >>>         movl           foo@GOT, %reg
>> >>>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]reg
>> >>>
>> >>> in ix86_output_indirect_branch.  Since this is performed during RTL
>> >>> expansion, other instructions may be inserted between movl and call/jmp.
>> >>> Linker optimization isn't always possible.
>
> I suppose we can just combine those into patterns if we want to prevent gcc from

I will look into it.

> interleaving this with other instructions.  However since this affects ABI and
> not only return thunk, did you discuss the changes with LLVM folks as well?

This doesn't change calling convention.   The new R_X86_64_THUNK_GOTPCRELX
relocation is an optimization.   It can be safely treated as
R_X86_64_GOTPCRELX.
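
For illustration only, the relaxation described in the patch summary (a GOT
load into R11 plus call/jmp through the thunk becoming a direct call/jmp
followed by a NOP) could be sketched in C roughly as below.  This is not code
from the patch or from any linker: the name relax_got_thunk_call and its
interface are hypothetical, and it assumes that the load and the branch are
adjacent and that the register is R11.  The sizes work out because the 7-byte
movq plus the 5-byte call/jmp occupy the same 12 bytes as a 5-byte direct
call/jmp plus the 7-byte nop 0L(%rax).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch or any real linker.
   INSN points at a 12-byte sequence located at address INSN_ADDR:
       movq  foo@GOTPCREL_THUNK(%rip), %r11    4c 8b 1d <disp32>
       call/jmp __x86_indirect_thunk_r11       e8/e9 <rel32>
   If foo resolves locally to TARGET_ADDR, rewrite it in place to
       call/jmp foo                            e8/e9 <rel32>
       nop   0L(%rax)                          0f 1f 80 00 00 00 00
   Returns 1 when rewritten, 0 when the bytes are left unchanged.  */
int
relax_got_thunk_call (uint8_t *insn, uint64_t insn_addr, uint64_t target_addr)
{
  static const uint8_t mov_r11[3] = { 0x4c, 0x8b, 0x1d };
  static const uint8_t nop7[7] = { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 };
  uint8_t branch = insn[7];
  int64_t disp;
  uint32_t rel;

  /* Only the exact movq-into-R11 + call/jmp rel32 shape is handled.  */
  if (memcmp (insn, mov_r11, sizeof mov_r11) != 0)
    return 0;
  if (branch != 0xe8 && branch != 0xe9)
    return 0;

  /* rel32 is relative to the end of the 5-byte direct branch.  */
  disp = (int64_t) target_addr - (int64_t) (insn_addr + 5);
  if (disp < INT32_MIN || disp > INT32_MAX)
    return 0;

  /* Emit the direct branch with a little-endian rel32, then the NOP.  */
  rel = (uint32_t) (int32_t) disp;
  insn[0] = branch;
  insn[1] = (uint8_t) (rel & 0xff);
  insn[2] = (uint8_t) ((rel >> 8) & 0xff);
  insn[3] = (uint8_t) ((rel >> 16) & 0xff);
  insn[4] = (uint8_t) ((rel >> 24) & 0xff);
  memcpy (insn + 5, nop7, sizeof nop7);
  return 1;
}
```

When the helper returns 0 the original bytes stay as they are, i.e. an
ordinary GOT load followed by a branch to the thunk, which is why skipping
the rewrite is always safe in this sketch.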

> It would be nice to not have diverging solutions.
>

That is why I posted the new relocation to x86-64 psABI group.

-- 
H.J.

* Re: PING^3: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
  2018-03-15 21:03   ` H.J. Lu
@ 2018-03-15 21:42     ` Jan Hubicka
  2018-03-15 22:39       ` H.J. Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Hubicka @ 2018-03-15 21:42 UTC
  To: H.J. Lu; +Cc: GCC Patches

> >> >>> in ix86_output_indirect_branch.  Since this is performed during RTL
> >> >>> expansion, other instructions may be inserted between movl and call/jmp.
> >> >>> Linker optimization isn't always possible.
> >
> > I suppose we can just combine those into patterns if we want to prevent gcc from
> 
> I will look into it.

I suppose we may want that for next stage1. Right now it would be nice to keep patches
simple if possible...
> 
> > interleaving this with other instructions.  However since this affects ABI and
> > not only return thunk, did you discuss the changes with LLVM folks as well?
> 
> This doesn't change calling convention.   The new R_X86_64_THUNK_GOTPCRELX
> relocation is an optimization.   It can be safely treated as
> R_X86_64_GOTPCRELX.
> 
> > It would be nice to not have diverging solutions.
> >
> 
> That is why I posted the new relocation to x86-64 psABI group.

I wonder if anyone from LLVM camp is reading.  I will take a look at the proposal too.

Honza
> 
> -- 
> H.J.

* Re: PING^3: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
  2018-03-15 21:42     ` Jan Hubicka
@ 2018-03-15 22:39       ` H.J. Lu
  0 siblings, 0 replies; 5+ messages in thread
From: H.J. Lu @ 2018-03-15 22:39 UTC
  To: Jan Hubicka; +Cc: GCC Patches

On Thu, Mar 15, 2018 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> >>> in ix86_output_indirect_branch.  Since this is performed during RTL
>> >> >>> expansion, other instructions may be inserted between movl and call/jmp.
>> >> >>> Linker optimization isn't always possible.
>> >
>> > I suppose we can just combine those into patterns if we want to prevent gcc from
>>
>> I will look into it.
>
> I suppose we may want that for next stage1. Right now it would be nice to keep patches
> simple if possible...

Sure.

>> > interleaving this with other instructions.  However since this affects ABI and
>> > not only return thunk, did you discuss the changes with LLVM folks as well?
>>
>> This doesn't change calling convention.   The new R_X86_64_THUNK_GOTPCRELX
>> relocation is an optimization.   It can be safely treated as
>> R_X86_64_GOTPCRELX.
>>
>> > It would be nice to not have diverging solutions.
>> >
>>
>> That is why I posted the new relocation to x86-64 psABI group.
>
> I wonder if anyone from LLVM camp is reading.  I will take a look at the proposal too.
>

Quite a few LLVM developers subscribe to the x86-64 psABI mailing list.


-- 
H.J.
