public inbox for gcc-patches@gcc.gnu.org
* PING: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
@ 2018-03-05 12:20 H.J. Lu
  2018-03-20 14:58 ` Jan Hubicka
  0 siblings, 1 reply; 3+ messages in thread
From: H.J. Lu @ 2018-03-05 12:20 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jan Hubicka

On Tue, Feb 27, 2018 at 11:39 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> For x86 targets, when -fno-plt is used, external functions are called
> via the GOT slot.  In 64-bit mode:
>
>         [bnd] call/jmp *foo@GOTPCREL(%rip)
>
> and in 32-bit mode:
>
>         [bnd] call/jmp *foo@GOT[(%reg)]
>
> With -mindirect-branch=, they are converted to, in 64-bit mode:
>
>         pushq          foo@GOTPCREL(%rip)
>         [bnd] jmp      __x86_indirect_thunk[_bnd]
>
> and in 32-bit mode:
>
>         pushl          foo@GOT[(%reg)]
>         [bnd] jmp      __x86_indirect_thunk[_bnd]
>
> which are incompatible with CFI.  In 64-bit mode, since R11 is a scratch
> register, we generate:
>
>         movq           foo@GOTPCREL(%rip), %r11
>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>
> instead.  We do it in ix86_output_indirect_branch so that we can use
> the newly proposed R_X86_64_THUNK_GOTPCRELX relocation:
>
> https://groups.google.com/forum/#!topic/x86-64-abi/eED5lzn3_Mg
>
>         movq           foo@GOTPCREL_THUNK(%rip), %r11
>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>
> to load the GOT slot into R11.  If foo is defined locally, the linker can
> convert
>
>         movq           foo@GOTPCREL_THUNK(%rip), %reg
>         call/jmp       __x86_indirect_thunk_reg
>
> to
>
>         call/jmp       foo
>         nop            0L(%rax)
>
> In 32-bit mode, since all caller-saved registers, EAX, EDX and ECX, may
> be used to pass function parameters, there is no scratch register
> available.  For -fno-plt -fno-pic -mindirect-branch=, we expand an
> external function call to:
>
>         movl           foo@GOT, %reg
>         [bnd] call/jmp *%reg
>
> so that it can be converted to
>
>         movl           foo@GOT, %reg
>         [bnd] call/jmp __x86_indirect_thunk_[bnd_]reg
>
> in ix86_output_indirect_branch.  Since this is performed during RTL
> expansion, other instructions may be inserted between movl and call/jmp,
> so linker optimization isn't always possible.
>
> Tested on i686 and x86-64.  OK for trunk?
>
>
> H.J.
> ---
> gcc/
>
>         PR target/83970
>         * config/i386/constraints.md (Bs): Allow GOT_memory_operand
>         for TARGET_LP64 with indirect branch conversion.
>         (Bw): Likewise.
>         * config/i386/i386.c (ix86_expand_call): Handle -fno-plt with
>         -mindirect-branch=.
>         (ix86_nopic_noplt_attribute_p): Likewise.
>         (ix86_output_indirect_branch): In 64-bit mode, convert function
>         call via GOT with R11 as a scratch register using
>         __x86_indirect_thunk_r11.
>         (ix86_output_call_insn): In 64-bit mode, set xasm to NULL when
>         calling ix86_output_indirect_branch with function call via GOT.
>         * config/i386/i386.md (*call_got_thunk): New call pattern for
>         TARGET_LP64 with indirect branch conversion.
>         (*call_value_got_thunk): Likewise.
>
> gcc/testsuite/
>
>         PR target/83970
>         * gcc.target/i386/indirect-thunk-5.c: Updated.
>         * gcc.target/i386/indirect-thunk-6.c: Likewise.
>         * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
>         * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
>         * gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
>         * gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
>         * gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
>         * gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
>         * gcc.target/i386/indirect-thunk-13.c: New test.
>         * gcc.target/i386/indirect-thunk-14.c: Likewise.
>         * gcc.target/i386/indirect-thunk-bnd-5.c: Likewise.
>         * gcc.target/i386/indirect-thunk-bnd-6.c: Likewise.
>         * gcc.target/i386/indirect-thunk-extern-11.c: Likewise.
>         * gcc.target/i386/indirect-thunk-extern-12.c: Likewise.
>         * gcc.target/i386/indirect-thunk-inline-8.c: Likewise.
>         * gcc.target/i386/indirect-thunk-inline-9.c: Likewise.
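
The sequence selection described in the quoted patch can be sketched as a small C model.  This is only an illustration, not the GCC implementation: emit_got_call, enum target_mode and the parameter names are hypothetical, and the 32-bit branch assumes the -fno-pic case from the message, with the register chosen by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model for illustration only; this is not GCC code.
   It picks the call sequence the message describes for an external
   function reached through its GOT slot under -fno-plt.  */
enum target_mode { MODE_32BIT, MODE_64BIT };

/* Formats the sequence for SYM into BUF and returns snprintf's result.
   REG names the register holding the GOT entry in 32-bit mode; the
   64-bit paths ignore it, and the 64-bit thunk path always uses R11.  */
static int
emit_got_call (char *buf, size_t len, enum target_mode mode,
               bool indirect_branch_thunk, const char *sym, const char *reg)
{
  if (mode == MODE_64BIT)
    {
      if (!indirect_branch_thunk)
        return snprintf (buf, len, "call *%s@GOTPCREL(%%rip)\n", sym);
      /* R11 is a scratch register in 64-bit mode, so load the GOT
         slot into it and call the per-register thunk.  */
      return snprintf (buf, len,
                       "movq %s@GOTPCREL(%%rip), %%r11\n"
                       "call __x86_indirect_thunk_r11\n", sym);
    }

  /* 32-bit, -fno-pic (assumed): without thunk conversion the call
     goes straight through the GOT slot.  */
  if (!indirect_branch_thunk)
    return snprintf (buf, len, "call *%s@GOT\n", sym);

  /* No fixed scratch register exists in 32-bit mode, so the GOT entry
     is loaded into whichever register REG names.  */
  return snprintf (buf, len,
                   "movl %s@GOT, %%%s\n"
                   "call __x86_indirect_thunk_%s\n", sym, reg, reg);
}
```

Calling emit_got_call (buf, sizeof buf, MODE_64BIT, true, "foo", NULL) fills buf with the movq/call pair shown above for 64-bit mode; passing MODE_32BIT with a register name such as "eax" yields the movl form.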

PING:

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01527.html


-- 
H.J.


* Re: PING: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
  2018-03-05 12:20 PING: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT H.J. Lu
@ 2018-03-20 14:58 ` Jan Hubicka
  2018-03-20 15:27   ` H.J. Lu
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Hubicka @ 2018-03-20 14:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

Patch is OK.  I am just a bit worried about how many additional features we
will need relatively late in stage4. My understanding is that at the moment
there are no direct plans to retpoline userland, but I see that it may change
in the future. Can you give us a brief overview of whether any parts are still
missing?

Thanks,
Honza


* Re: PING: [PATCH] x86: Force __x86_indirect_thunk_reg for function call via GOT
  2018-03-20 14:58 ` Jan Hubicka
@ 2018-03-20 15:27   ` H.J. Lu
  0 siblings, 0 replies; 3+ messages in thread
From: H.J. Lu @ 2018-03-20 15:27 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

On Tue, Mar 20, 2018 at 7:48 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Patch is OK.  I am just a bit worried about how many additional features we
> will need relatively late in stage4. My understanding is that at the moment
> there are no

I will punt it for GCC 9 and add new call patterns with a scratch register
based on your previous feedback:

https://gcc.gnu.org/ml/gcc-patches/2018-03/msg00766.html

> direct plans to retpoline userland, but I see that it may change in the
> future. Can you give us a brief overview of whether any parts are still
> missing?

This is the only one missing.  I do have a proposal to use retpoline with
CET in user space such that a single binary will support both retpoline
and CET.  It requires linker, GCC and glibc changes.  But I don't know
if there is a demand for it.


-- 
H.J.
