[x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

public inbox for gcc@gcc.gnu.org

* [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
@ 2013-07-23 19:49 H.J. Lu
  2013-07-24  8:44 ` Florian Weimer
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: H.J. Lu @ 2013-07-23 19:49 UTC (permalink / raw)
  To: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

Intel MPX:

http://software.intel.com/sites/default/files/319433-015.pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64.  Bound registers are cleared by branch instructions without
the BND prefix.  Branch instructions with the BND prefix keep the bound
register contents.  This leads to 2 requirements for the 64-bit MPX
run-time:
 1. The dynamic linker (ld.so) should save and restore bound registers
    during symbol lookup.
 2. Extend the current 16-byte PLT entry:

  ff 25 32 8b 21 00        jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00           pushq  $index
  e9 00 00 00 00           jmpq   PLT0

    which clears bound registers, to 32 bytes so that a BND prefix can be
    added to the branch instructions.
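
The size argument above can be checked with a little arithmetic.  The
following is only a sketch: it assumes the BND prefix is the single
byte 0xF2 placed in front of each branch, and it assumes PLT slots are
rounded up to a power of two (the helper names are made up for
illustration).

```python
# Byte counts for the instructions in a lazy-binding x86-64 PLT entry.
# Assumption: the BND prefix is one byte (0xF2) in front of a branch.
BND_PREFIX_LEN = 1

JMP_INDIRECT_RIP = 6   # ff 25 <disp32>   jmpq *name@GOTPCREL(%rip)
PUSH_IMM32 = 5         # 68 <imm32>       pushq $index
JMP_REL32 = 5          # e9 <rel32>       jmpq PLT0


def plt_entry_size(with_bnd: bool) -> int:
    """Return the number of code bytes in one PLT entry."""
    prefix = BND_PREFIX_LEN if with_bnd else 0
    # Only the two branches take the prefix; pushq is not a branch.
    return (JMP_INDIRECT_RIP + prefix) + PUSH_IMM32 + (JMP_REL32 + prefix)


def padded_size(code_bytes: int) -> int:
    """Round up to a power-of-two slot size, starting at 16 (assumption)."""
    slot = 16
    while slot < code_bytes:
        slot *= 2
    return slot


legacy = plt_entry_size(with_bnd=False)
mpx = plt_entry_size(with_bnd=True)
print("legacy:", legacy, "bytes, slot", padded_size(legacy))
print("mpx:   ", mpx, "bytes, slot", padded_size(mpx))
```

Under these assumptions the prefixed sequence needs 18 bytes, which no
longer fits in a 16-byte slot.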

There are 2 psABI considerations:

 1. Should PLT entries in all binaries, with and without MPX, be changed
    to 32 bytes, or just the necessary ones?
 2. Only a branch to a PLT entry with the BND prefix needs a 32-byte PLT
    entry.  If we use 32-byte PLT entries only when needed, the choice can
    be made by:
    a. A new MPX PLT relocation:
       i. No new run-time relocation, since the MPX PLT relocation is
          resolved to a branch to the PLT entry at link-time.
       ii. Pro: No new section.
       iii. Con:
        Need a new relocation.
        Can't mark executables or shared libraries.
    b. A new note section to indicate branches to external symbols with
       the BND prefix:
       i. A note section in relocatable files, and an addition to the
          PT_NOTE segment in executables and shared libraries.
       ii. Pro: No new relocation.
       iii. Con: A new section.

Here is the proposed note section:

An optional x86 feature note section, .note.x86-feature, to indicate
features in the input files. The contents of this note section are:

    .section        .note.x86-feature
    .align          4
    .long           .L1 - .L0
    .long           .L3 - .L2
    .long           1
.L0:
    .asciz         "x86 feature"
.L1:
    .align          4
.L2:
    .long        FeatureFlag     # Feature flag
.L3:

The only bit currently valid in FeatureFlag is

#define NT_X86_FEATURE_BND_INSN_RELOC    (0x1 << 0)

It should be set if a relocation against an externally visible symbol is
applied to an instruction with the BND prefix.

The remaining bits in FeatureFlag are reserved.
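
To make the byte layout of the proposed note concrete, here is a
minimal sketch that builds and parses it.  It assumes little-endian
32-bit words (as on x86) and the standard note header of name size,
descriptor size and type; the function names are hypothetical and are
not part of the proposal.

```python
import struct

NT_X86_FEATURE_BND_INSN_RELOC = 0x1 << 0  # bit defined in the proposal
NOTE_NAME = b"x86 feature\0"              # .asciz appends the NUL byte
NOTE_TYPE = 1                             # third .long in the listing


def pad4(data: bytes) -> bytes:
    """Pad with NUL bytes to a 4-byte boundary, like '.align 4'."""
    return data + b"\0" * (-len(data) % 4)


def build_feature_note(feature_flag: int) -> bytes:
    """Serialize the note: header, padded name, 4-byte FeatureFlag."""
    desc = struct.pack("<I", feature_flag)
    header = struct.pack("<III", len(NOTE_NAME), len(desc), NOTE_TYPE)
    return header + pad4(NOTE_NAME) + pad4(desc)


def parse_feature_note(note: bytes) -> int:
    """Return FeatureFlag, or raise ValueError for any other note."""
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name = note[12:12 + namesz]
    if name != NOTE_NAME or ntype != NOTE_TYPE or descsz != 4:
        raise ValueError("not an x86 feature note")
    desc_off = 12 + len(pad4(name))
    (flag,) = struct.unpack_from("<I", note, desc_off)
    return flag


note = build_feature_note(NT_X86_FEATURE_BND_INSN_RELOC)
print(len(note), hex(parse_feature_note(note)))
```

With the 12-byte name (11 characters plus NUL) no name padding is
needed, so the whole note is 12 + 12 + 4 bytes.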

If a linker supports the optional feature note section, it should follow
the rules below when processing relocatable input files to generate a
relocatable file, executable, or shared library:

1. Relocatable files without the feature note section are treated
as if FeatureFlag is zero.
2. A FeatureFlag bit is set in the output if it is set in any input
relocatable file.
3. The feature note section should be generated in the output file if any
FeatureFlag bit is set.
4. The feature note section should be included in the PT_NOTE segment
when generating an executable or shared library.
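
Rules 1 to 3 amount to a bitwise OR over the inputs.  A minimal sketch,
where None stands for an input file without the note section (the
function names are illustrative only):

```python
from typing import Iterable, Optional


def merge_feature_flags(input_flags: Iterable[Optional[int]]) -> int:
    """OR together FeatureFlag from all inputs (rules 1 and 2)."""
    merged = 0
    for flag in input_flags:
        # Rule 1: a missing note section counts as FeatureFlag == 0.
        merged |= 0 if flag is None else flag
    return merged


def should_emit_note(merged_flag: int) -> bool:
    """Rule 3: emit the note only if some FeatureFlag bit is set."""
    return merged_flag != 0


inputs = [None, 0x1, None]  # one MPX object among legacy objects
merged = merge_feature_flags(inputs)
print(hex(merged), should_emit_note(merged))
```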

I prefer the note section solution.  Any suggestions, comments?



--
H.J.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-23 19:49 [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX H.J. Lu
@ 2013-07-24  8:44 ` Florian Weimer
  2013-07-24 15:22   ` H.J. Lu
  2013-07-24 16:45 ` Ian Lance Taylor
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Florian Weimer @ 2013-07-24  8:44 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On 07/23/2013 09:49 PM, H.J. Lu wrote:
>   2. Extend the current 16-byte PLT entry:
>
>    ff 25 32 8b 21 00        jmpq   *name@GOTPCREL(%rip)
>    68 00 00 00 00           pushq  $index
>    e9 00 00 00 00           jmpq   PLT0
>
>      which clear bound registers, to 32-byte to add BND prefix to branch
>      instructions.

Would it be possible to use a different instruction sequence that stays
within the 16-byte limit?  Or restrict MPX support to BIND_NOW relocations?

-- 
Florian Weimer / Red Hat Product Security Team


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-24  8:44 ` Florian Weimer
@ 2013-07-24 15:22   ` H.J. Lu
  0 siblings, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2013-07-24 15:22 UTC (permalink / raw)
  To: Florian Weimer
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Wed, Jul 24, 2013 at 1:43 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 07/23/2013 09:49 PM, H.J. Lu wrote:
>>
>>   2. Extend the current 16-byte PLT entry:
>>
>>    ff 25 32 8b 21 00        jmpq   *name@GOTPCREL(%rip)
>>    68 00 00 00 00           pushq  $index
>>    e9 00 00 00 00           jmpq   PLT0
>>
>>      which clear bound registers, to 32-byte to add BND prefix to branch
>>      instructions.
>
>
> Would it be possible to use a different instruction sequence that stays in
> the 16 byte limit?  Or restrict MPX support to BIND_NOW relocations?
>

It isn't possible to add the BND prefix by using different insns in
the PLT.  The issue isn't about relocation.  The issue is that external
calls are routed via the PLT entry, which clears bound registers.  That
is why we need a different PLT entry to preserve bound registers.


--
H.J.


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-23 19:49 [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX H.J. Lu
  2013-07-24  8:44 ` Florian Weimer
@ 2013-07-24 16:45 ` Ian Lance Taylor
  2013-07-24 18:53   ` H.J. Lu
  2013-07-24 23:36 ` Roland McGrath
  2013-08-14 16:23 ` Jakub Jelinek
  3 siblings, 1 reply; 27+ messages in thread
From: Ian Lance Taylor @ 2013-07-24 16:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
> http://software.intel.com/sites/default/files/319433-015.pdf
>
> introduces 4 bound registers, which will be used for parameter passing
> in x86-64.  Bound registers are cleared by branch instructions.  Branch
> instructions with BND prefix will keep bound register contents.

I took a very quick look at the doc.  Why shouldn't we run the kernel
with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
registers on branch instructions?  That would let us avoid these
issues.


> I prefer the note section solution.  Any suggestions, comments?

I concur, but why not use the ELF attributes support rather than a new
note section?

Ian


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-24 16:45 ` Ian Lance Taylor
@ 2013-07-24 18:53   ` H.J. Lu
  2013-07-24 18:59     ` Ian Lance Taylor
  0 siblings, 1 reply; 27+ messages in thread
From: H.J. Lu @ 2013-07-24 18:53 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Wed, Jul 24, 2013 at 9:45 AM, Ian Lance Taylor <iant@google.com> wrote:
> On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> http://software.intel.com/sites/default/files/319433-015.pdf
>>
>> introduces 4 bound registers, which will be used for parameter passing
>> in x86-64.  Bound registers are cleared by branch instructions.  Branch
>> instructions with BND prefix will keep bound register contents.
>
> I took a very quick look at the doc.  Why shouldn't we run the kernel
> with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
> registers on branch instructions?  That would let us avoid these
> issues.

This doesn't work for legacy callees which return pointers.
The bound registers will be incorrect, since they were last set in
the most recent MPX function.  MPX callers will get wrong bounds on
pointers returned by legacy callees.
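
The failure mode can be illustrated with a toy model.  This is only a
sketch: it tracks a single register (BND0), treats "cleared" as bounds
covering the whole address space, and uses made-up addresses.

```python
INIT = (0, 2**64 - 1)  # simplified "cleared" bounds: everything allowed


class Cpu:
    """Toy model of BND0 across branches (not a real MPX emulator)."""

    def __init__(self, bndpreserve: bool):
        self.bndpreserve = bndpreserve
        self.bnd0 = INIT

    def branch(self, bnd_prefix: bool = False) -> None:
        """Model call/jmp/ret: clear BND0 unless it is preserved."""
        if not (bnd_prefix or self.bndpreserve):
            self.bnd0 = INIT


def call_legacy_returning_pointer(cpu: Cpu) -> tuple:
    """Return the bounds an MPX caller sees after a legacy call."""
    # The MPX caller last set BND0 for some unrelated 16-byte object.
    cpu.bnd0 = (0x1000, 0x100F)
    cpu.branch(bnd_prefix=True)    # MPX caller: "bnd call"
    cpu.branch(bnd_prefix=False)   # legacy callee: plain "ret"
    return cpu.bnd0                # bounds paired with the returned pointer


print(call_legacy_returning_pointer(Cpu(bndpreserve=False)))
print(call_legacy_returning_pointer(Cpu(bndpreserve=True)))
```

With BNDPRESERVE = 0 the legacy return resets BND0, so the returned
pointer is simply unchecked.  With BNDPRESERVE = 1 the stale bounds of
the unrelated object survive and get attached to the returned pointer.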

>
>> I prefer the note section solution.  Any suggestions, comments?
>
> I concur, but why not use the ELF attributes support rather than a new
> note section?
>

The issues are

1. ELF attributes target the static linker.  There is no support for
them in shared libraries or executables.  We may need it to make a
run-time decision, based on the MPX feature, to select a legacy or MPX
shared library.
2. ELF attribute lookup isn't very fast at run-time.


--
H.J.


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-24 18:53   ` H.J. Lu
@ 2013-07-24 18:59     ` Ian Lance Taylor
  2013-07-24 19:14       ` H.J. Lu
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Lance Taylor @ 2013-07-24 18:59 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Wed, Jul 24, 2013 at 11:53 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 24, 2013 at 9:45 AM, Ian Lance Taylor <iant@google.com> wrote:
>> On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>
>>> http://software.intel.com/sites/default/files/319433-015.pdf
>>>
>>> introduces 4 bound registers, which will be used for parameter passing
>>> in x86-64.  Bound registers are cleared by branch instructions.  Branch
>>> instructions with BND prefix will keep bound register contents.
>>
>> I took a very quick look at the doc.  Why shouldn't we run the kernel
>> with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
>> registers on branch instructions?  That would let us avoid these
>> issues.
>
> This doesn't work in case of legacy callees which return pointers.
> The bound registers will be incorrect since they are set in the
> last MPX function.  MPX callers will get wrong bounds on
> pointers returned by legacy callees

As far as I can see the compiler needs to know the pair of bound
registers associated with a pointer anyhow.  So if the compiler calls
some function and gets a pointer, it needs to know the bound registers
that go with that pointer.  Are you suggesting that not only are bound
registers passed as parameters to functions, they are also implicitly
returned by functions?

Ian


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-24 18:59     ` Ian Lance Taylor
@ 2013-07-24 19:14       ` H.J. Lu
  0 siblings, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2013-07-24 19:14 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Wed, Jul 24, 2013 at 11:59 AM, Ian Lance Taylor <iant@google.com> wrote:
> On Wed, Jul 24, 2013 at 11:53 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 24, 2013 at 9:45 AM, Ian Lance Taylor <iant@google.com> wrote:
>>> On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>> http://software.intel.com/sites/default/files/319433-015.pdf
>>>>
>>>> introduces 4 bound registers, which will be used for parameter passing
>>>> in x86-64.  Bound registers are cleared by branch instructions.  Branch
>>>> instructions with BND prefix will keep bound register contents.
>>>
>>> I took a very quick look at the doc.  Why shouldn't we run the kernel
>>> with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
>>> registers on branch instructions?  That would let us avoid these
>>> issues.
>>
>> This doesn't work in case of legacy callees which return pointers.
>> The bound registers will be incorrect since they are set in the
>> last MPX function.  MPX callers will get wrong bounds on
>> pointers returned by legacy callees
>
> As far as I can see the compiler needs to know the pair of bound
> registers associated with a pointer anyhow.  So if the compiler calls
> some function and gets a pointer, it needs to know the bound registers
> that go with that pointer.  Are you suggesting that not only are bound
> registers passed as parameters to functions, they are also implicitly
> returned by functions?
>

Yes, when a pointer is returned in a register.


--
H.J.


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-23 19:49 [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX H.J. Lu
  2013-07-24  8:44 ` Florian Weimer
  2013-07-24 16:45 ` Ian Lance Taylor
@ 2013-07-24 23:36 ` Roland McGrath
  2013-07-25  0:23   ` Ian Lance Taylor
  2013-07-25 17:11   ` H.J. Lu
  2013-08-14 16:23 ` Jakub Jelinek
  3 siblings, 2 replies; 27+ messages in thread
From: Roland McGrath @ 2013-07-24 23:36 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

I've read through the MPX spec once, but most of it is still not very
clear to me.  So please correct any misconceptions.  (HJ, if you answer
any or all of these questions in your usual style with just, "It's not a
problem," I will find you and I will kill you.  Explain!)

Will an MPX-using binary require an MPX-supporting dynamic linker to run
correctly?

* An old dynamic linker won't clobber %bndN directly, so that's not a
  problem.

* Does having the bounds registers set have any effect on regular/legacy
  code, or only when bndc[lun] instructions are used?  

  If it doesn't affect normal instructions, then I don't entirely
  understand why it would matter to clear %bnd* when entering or leaving
  legacy code.  Is it solely for the case of legacy code returning a
  pointer value, so that the new code would expect the new ABI wherein
  %bnd0 has been set to correspond to the pointer returned in %rax?

* What's the effect of entering the dynamic linker via "bnd jmp"
  (i.e. new MPX-using binary with new PLT, old dynamic linker)?  The old
  dynamic linker will leave %bndN et al exactly as they are, until its
  first unadorned branching instruction implicitly clears them.  So the
  only problem would be if the work _dl_runtime_{resolve,profile} does
  before its first branch/call were affected by the %bndN state.

If there are indeed any problems with this scenario, then you need a
plan to make new binaries require a new dynamic linker (and fail
gracefully in the absence of one, and have packaging systems grok the
dependency, etc.)

In a related vein, what's the effect of entering some legacy code via
"bnd jmp" (i.e. new binary using PLT call into legacy DSO)?  

* If the state of %bndN et al does not affect legacy code directly, then
  it's not a problem.  The legacy code will eventually use an unadorned
  branch instruction, and that will implicitly clear %bnd*.  (Even if
  it's a leaf function that's entirely branch-free, its return will
  count as such an unadorned branch instruction.)

* If that's not the case, then a PLT entry that jumps to legacy code
  will need to clear the %bndN state.  I see one straightforward
  approach, at the cost of a double-bounce (i.e. turning the normal
  double-bounce into a triple-bounce) when going from MPX code to legacy
  code.  Each PLT entry can be:

	bnd jmp *foo@GOTPCREL(%rip)
	pushq $N
	bnd jmp .Lplt0
	.balign 16
	jmp *foo@GOTPCREL+8(%rip)
	.balign 32

  and now each of those gets two (adjacent) GOT slots rather than just
  one.  When the dynamic linker resolves "foo" and sees that it's in a
  legacy DSO, it sets the foo GOT slot to point to .plt+(N*32 + 16) and
  the foo+1 GOT slot to point to the real target (resolution of "foo").
  After fixup, entering that PLT entry will do "bnd jmp" to the second
  half of the entry, which does (unadorned) "jmp" to the real target,
  implicitly clearing %bndN state.
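
The fixup described above can be sketched as follows.  This is an
illustration only: the function name and addresses are made up, N is
taken as the entry index used in the .plt+(N*32 + 16) formula, and
leaving the second slot unused for an MPX target is an assumption.

```python
PLT_ENTRY_SIZE = 32   # each entry ends with ".balign 32"
SECOND_HALF = 16      # the unadorned jmp follows ".balign 16"


def got_slots_after_fixup(plt_base, n, target, target_is_legacy):
    """Return (foo slot, foo+1 slot) once "foo" has been resolved."""
    if target_is_legacy:
        # "bnd jmp" lands on the second half of the same PLT entry,
        # whose plain "jmp" then clears the bound registers.
        trampoline = plt_base + n * PLT_ENTRY_SIZE + SECOND_HALF
        return (trampoline, target)
    # MPX-aware target: jump straight to it; second slot unused here.
    return (target, None)


slots = got_slots_after_fixup(0x400000, 3, 0x7F0000001000, True)
print([hex(s) for s in slots])
```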

Those are the background questions to help me understand better.
Now, to your specific questions.

I can't tell if you are proposing that a single object might contain
both 16-byte and 32-byte PLT slots next to each other in the same .plt
section.  That seems like a bad idea.  I can think of two things off
hand that expect PLT entries to be of uniform size, and there may well
be more.

* The foo@plt pseudo-symbols that e.g. objdump will display are based on
  the BFD backend knowing the size of PLT entries.  Arguably this ought
  to look at sh_entsize of .plt instead of using baked-in knowledge, but
  it doesn't.

* The linker-generated CFI for .plt is a single FDE for the whole
  section, using a DWARF expression covering all normal PLT entries
  together based on them having uniform size and contents.  (You could
  of course make the linker generate per-entry CFI, or partition the PLT
  into short and long entries and have the CFI treat the two partitions
  appropriately differently.  But that seems like a complication best
  avoided.)

Now, assuming we are talking about a uniform PLT in each object, there
is the question of whether to use a new PLT layout everywhere, or only
when linking an object with some input files that use MPX.  

* My initial reaction was to say that we should just change it
  unconditionally to keep things simple: use new linker, get new format,
  end of story.  Simplicity is good.

* But, doubling the size of PLT entries means more i-cache pressure.  If
  cache lines are 64 bytes, then today you fit four entries into a cache
  line.  Assuming PLT entries are more used than unused, this is a good
  thing.  Reducing that to two entries per cache line means twice as
  many i-cache misses if you hit a given PLT frequently (with even
  distribution of which entries you actually use--at any rate, it's
  "more" even if it's not "twice as many").  Perhaps this is enough cost
  in real-world situations to be worried about.  I really don't know.

* As I mentioned before, there are things floating around that think
  they know the size of PLT entries.  Realistically, there will be
  plenty of people using new tools to build binaries but not using MPX
  at all, and these people will give those binaries to people who have
  old tools.  In the case of someone running an old objdump on a new
  binary, they would see bogus foo@plt pseudo-symbols and be misled and
  confused.  Not to mention the unknown unknowns, i.e. other things that
  "know" the size of PLT entries that we don't know about or haven't
  thought of here.  It's just basic conservatism not to perturb things
  for these people who don't care about or need anything related to MPX
  at all.
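
The cache-line arithmetic in the i-cache bullet above can be sketched
as below.  It assumes 64-byte lines, a cache-line-aligned .plt, and it
ignores PLT0; the helper names are illustrative.

```python
CACHE_LINE = 64  # bytes; the line size assumed in the text above


def entries_per_line(entry_size, line_size=CACHE_LINE):
    """How many PLT entries fit in one cache line."""
    return line_size // entry_size


def lines_touched(entry_indices, entry_size, line_size=CACHE_LINE):
    """Count distinct cache lines covering the given PLT entries."""
    return len({(i * entry_size) // line_size for i in entry_indices})


hot = range(8)  # eight frequently used, adjacent PLT entries
print(entries_per_line(16), entries_per_line(32))
print(lines_touched(hot, 16), lines_touched(hot, 32))
```

For evenly used adjacent entries the 32-byte layout touches twice as
many lines; for sparse usage the ratio is lower, which matches the
"more, even if not twice as many" caveat.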

How a relocatable object is marked so that the linker knows whether its
code is MPX-compatible at link time and how a DSO/executable is marked
so that the dynamic linker knows at runtime are two separate subjects.

For relocatable objects, I don't think there is really any precedent for
using ELF notes to tell the linker things.  It seems much nicer if the
linker continues to treat notes completely normally, i.e. appending
input files' same-named note sections together like with any other named
section rather than magically recognizing and swallowing certain notes.
OTOH, the SHT_GNU_ATTRIBUTES mechanism exists for exactly this sort of
purpose and is used on other machines for very similar sorts of issues.
There is both precedent and existing code in binutils to have the linker
merge attribute sections from many input files together in a fashion
aware of the semantics of those sections, and to have those attributes
affect the linker's behavior in machine-specific ways.  I think you have
to make a very strong case to use anything other than SHT_GNU_ATTRIBUTES
for this sort of purpose in relocatable objects.

For linked objects, there are a couple of obvious choices.  They all require
that the linker have special knowledge to create the markings.  One
option is a note.  We use .note.ABI-tag for a similar purpose in libc,
but I don't know of any precedent for the linker synthesizing notes.
The most obvious choice is e_flags bits.  That's what other machines use
to mark ABI variants.  There are no bits assigned for x86 yet.  There
are obvious limitations to using e_flags, in that it's part of the
universal ELF psABI rather than something with vendor extensibility
built in like notes have, and in that there are only 32 bits available
to assign rather than being a wholly open-ended format like notes.  But
using e_flags is certainly simpler to synthesize in the linker and
simpler to recognize in the dynamic linker than a note format.  I think
you have to make at least a reasonable (objective) case to use a note
rather than e_flags, though I'm certainly not firmly against a note.

Finally, you've only mentioned x86-64.  The hardware details apply about
the same to x86-32 AFAICT.  If this is something that we'll eventually
want to do for x86-32 as well, then I think we should at least hash out
the plan for x86-32 fairly thoroughly before committing to a plan for
x86-64 (even if the actual implementation for x86-32 lags).  Probably
it's all much the same and working it through for x86-32 won't give us
any pause in our x86-64 plans, but we won't know until we actually do it.


Thanks,
Roland


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-24 23:36 ` Roland McGrath
@ 2013-07-25  0:23   ` Ian Lance Taylor
  2013-07-25 11:09     ` Ilya Enkovich
  2013-07-25 17:24     ` H.J. Lu
  2013-07-25 17:11   ` H.J. Lu
  1 sibling, 2 replies; 27+ messages in thread
From: Ian Lance Taylor @ 2013-07-25  0:23 UTC (permalink / raw)
  To: Roland McGrath
  Cc: H.J. Lu, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath <roland@hack.frob.com> wrote:
>
> Will an MPX-using binary require an MPX-supporting dynamic linker to run
> correctly?
>
> * An old dynamic linker won't clobber %bndN directly, so that's not a
>   problem.

These are my answers and likely incorrect.

It will clobber the registers indirectly, though, as soon as it
executes a branching instruction.  The effect will be that calls from
bnd-checked code to bnd-checked code through the dynamic linker will
not succeed.

I have not yet seen the changes this will require to the ABI, but I'm
making the natural assumptions: the first four pointer arguments to a
function will be associated with a pair of bound registers, and
similarly for a returned pointer.  I don't know what the proposal is
for struct parameters and return values.

> * Does having the bounds registers set have any effect on regular/legacy
>   code, or only when bndc[lun] instructions are used?

As far as I can tell, only when the bndXX instructions are used,
though I'd be happy to hear otherwise.

>   If it doesn't affect normal instructions, then I don't entirely
>   understand why it would matter to clear %bnd* when entering or leaving
>   legacy code.  Is it solely for the case of legacy code returning a
>   pointer value, so that the new code would expect the new ABI wherein
>   %bnd0 has been set to correspond to the pointer returned in %rax?

There is no problem with clearing the bnd registers when calling in or
out of legacy code.  The issue is avoiding clearing the pointers when
calling from bnd-enabled code to bnd-enabled code.

> * What's the effect of entering the dynamic linker via "bnd jmp"
>   (i.e. new MPX-using binary with new PLT, old dynamic linker)?  The old
>   dynamic linker will leave %bndN et al exactly as they are, until its
>   first unadorned branching instruction implicitly clears them.  So the
>   only problem would be if the work _dl_runtime_{resolve,profile} does
>   before its first branch/call were affected by the %bndN state.

"It's not a problem."

> In a related vein, what's the effect of entering some legacy code via
> "bnd jmp" (i.e. new binary using PLT call into legacy DSO)?
>
> * If the state of %bndN et al does not affect legacy code directly, then
>   it's not a problem.  The legacy code will eventually use an unadorned
>   branch instruction, and that will implicitly clear %bnd*.  (Even if
>   it's a leaf function that's entirely branch-free, its return will
>   count as such an unadorned branch instruction.)

Yes.

> * If that's not the case, ....

It is the case.

> I can't tell if you are proposing that a single object might contain
> both 16-byte and 32-byte PLT slots next to each other in the same .plt
> section.  That seems like a bad idea.  I can think of two things off
> hand that expect PLT entries to be of uniform size, and there may well
> be more.
>
> * The foo@plt pseudo-symbols that e.g. objdump will display are based on
>   the BFD backend knowing the size of PLT entries.  Arguably this ought
>   to look at sh_entsize of .plt instead of using baked-in knowledge, but
>   it doesn't.

This seems fixable.  Of course, we could also keep the PLT the same
length by changing it.  The current PLT entries are

    jmpq *GOT(sym)
    pushq offset
    jmpq plt0

The linker or dynamic linker initializes *GOT(sym) to point to the
second instruction in this sequence.  So we can keep the PLT at 16
bytes by simply changing it to jump somewhere else.

    bnd jmpq *GOT(sym)
    .skip 9

We have the linker or dynamic linker fill in *GOT(sym) to point to the
second PLT table.  When the dynamic linker is involved, we use another
DT tag to point to the second PLT.  The offsets are consistent: there
is one entry in each PLT table, so the dynamic linker can compute the
right value.  Then in the second PLT we have the sequence

    pushq offset
    bnd jmpq plt0

That gives the dynamic linker the offset that it needs to update
*GOT(sym) to point to the runtime symbol value.  So we get slightly
worse instruction cache handling the first time a function is called,
but after that we are the same as before.  And PLT entries are the
same size as always so everything is simpler.
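
A rough check of the sizes in this two-table scheme, plus the GOT
value the dynamic linker would compute.  This is a sketch under stated
assumptions: the BND prefix is one byte, the second PLT uses 16-byte
entries (its entry size is not given above), and PLT0 is ignored.  The
function name is hypothetical.

```python
BND_JMP_INDIRECT = 7   # f2 ff 25 <disp32>   bnd jmpq *GOT(sym)
SKIP = 9               # .skip 9
PUSH_IMM32 = 5         # 68 <imm32>          pushq offset
BND_JMP_REL32 = 6      # f2 e9 <rel32>       bnd jmpq plt0

FIRST_PLT_ENTSIZE = BND_JMP_INDIRECT + SKIP
SECOND_PLT_CODE = PUSH_IMM32 + BND_JMP_REL32
SECOND_PLT_ENTSIZE = 16  # assumption: not specified in the message


def initial_got_value(second_plt_base, index, entsize=SECOND_PLT_ENTSIZE):
    """Address in the second PLT that *GOT(sym) starts out pointing to."""
    # Both tables have one entry per symbol, so the same index works.
    return second_plt_base + index * entsize


print(FIRST_PLT_ENTSIZE, SECOND_PLT_CODE)
print(hex(initial_got_value(0x1000, 2)))
```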

The special DT tag will tell the dynamic linker to apply the special
processing.  No attribute is needed to change behaviour.  The issue
then is: a program linked in this way will not work with an old
dynamic linker, because the old dynamic linker will not initialize
GOT(sym) to the right value.  That is a problem for any scheme, so I
think that is OK.  But if that is a concern, we could actually handle
it by generating two PLTs.  One conventional PLT, and another as I just
outlined.  The linker branches to the new PLT, and initializes
GOT(sym) to point to the old PLT.  The dynamic linker spots this
because it recognizes the new DT tags, and cunningly rewrites the GOT
to point to the new PLT.  Cost is an extra jump the first time a
function is called when using the old dynamic linker.

Ian


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-25  0:23   ` Ian Lance Taylor
@ 2013-07-25 11:09     ` Ilya Enkovich
  2013-07-25 16:33       ` H.J. Lu
  2013-07-25 17:24     ` H.J. Lu
  1 sibling, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2013-07-25 11:09 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Roland McGrath, H.J. Lu, GNU C Library, GCC Development,
	Binutils, Girkar, Milind, Kreitzer, David L

2013/7/25 Ian Lance Taylor <iant@google.com>:
> On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath <roland@hack.frob.com> wrote:
>>
>> Will an MPX-using binary require an MPX-supporting dynamic linker to run
>> correctly?
>>
>> * An old dynamic linker won't clobber %bndN directly, so that's not a
>>   problem.
>
> These are my answers and likely incorrect.

Hi,

I want to add some comments to your answers.

>
> It will clobber the registers indirectly, though, as soon as it
> executes a branching instruction.  The effect will be that calls from
> bnd-checked code to bnd-checked code through the dynamic linker will
> not succeed.

I would not say that the call will fail. Some bound info will just be
lost. MPX binaries should still work correctly with an old dynamic
linker. The problem here is that when you decrease the level of MPX
support (use a legacy dynamic linker and legacy libraries) you decrease
the quality of bound violation detection. BTW, if the new PLT section is
used, then the table fixup after the first call will lead to correct
bounds transfer in subsequent calls.

>
> I have not yet seen the changes this will require to the ABI, but I'm
> making the natural assumptions: the first four pointer arguments to a
> function will be associated with a pair of bound registers, and
> similarly for a returned pointer.  I don't know what the proposal is
> for struct parameters and return values.

The general idea is to use bound registers for pointers passed in
registers. It does not matter if the pointer is part of a
structure. BND0 is used to return bounds for a returned pointer.

Of course, there are some more details (e.g. when more than 4 pointers
are passed in registers or when a vararg call is made).

>
>
>> * Does having the bounds registers set have any effect on regular/legacy
>>   code, or only when bndc[lun] instructions are used?
>
> As far as I can tell, only when the bndXX instructions are used,
> though I'd be happy to hear otherwise.

As usual, new registers affect context save/restore instructions.

>
>
>>   If it doesn't affect normal instructions, then I don't entirely
>>   understand why it would matter to clear %bnd* when entering or leaving
>>   legacy code.  Is it solely for the case of legacy code returning a
>>   pointer value, so that the new code would expect the new ABI wherein
>>   %bnd0 has been set to correspond to the pointer returned in %rax?
>
> There is no problem with clearing the bnd registers when calling in or
> out of legacy code.  The issue is avoiding clearing the pointers when
> calling from bnd-enabled code to bnd-enabled code.

When legacy code returns a pointer we need to clear at least BND0 to
avoid wrong bounds for the returned pointer.
We may also have a call sequence mpx code -> legacy code -> mpx code.
In such a case we have to clear all bound registers before calling mpx
code from legacy code. Otherwise the nested mpx code gets wrong bounds.

Thanks,
Ilya


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-25 11:09     ` Ilya Enkovich
@ 2013-07-25 16:33       ` H.J. Lu
  0 siblings, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2013-07-25 16:33 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: Ian Lance Taylor, Roland McGrath, GNU C Library, GCC Development,
	Binutils, Girkar, Milind, Kreitzer, David L

On Thu, Jul 25, 2013 at 4:08 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2013/7/25 Ian Lance Taylor <iant@google.com>:
>> On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath <roland@hack.frob.com> wrote:
>>>
>>> Will an MPX-using binary require an MPX-supporting dynamic linker to run
>>> correctly?
>>>
>>> * An old dynamic linker won't clobber %bndN directly, so that's not a
>>>   problem.
>>
>> These are my answers and likely incorrect.
>
> Hi,
>
> I want to add some comments to your answers.
>
>>
>> It will clobber the registers indirectly, though, as soon as it
>> executes a branching instruction.  The effect will be that calls from
>> bnd-checked code to bnd-checked code through the dynamic linker will
>> not succeed.
>
> I would not say that the call will fail. Some bound info will just be
> lost. MPX binaries should still work correctly with an old dynamic
> linker. The problem here is that when you decrease the level of MPX
> support (use a legacy dynamic linker and legacy libraries) you decrease
> the quality of bound violation detection. BTW if the new PLT section is
> used then table fixup after the first call will lead to correct bounds
> transfer in subsequent calls.

To make it clear, the sequence is

MPX code -> PLT -> ld.so -> PLT -> MPX library

If ld.so doesn't preserve bound registers, bound registers
will be cleared, which means the lower bound is 0 and
upper bound is -1 (MAX), when MPX library is reached.
The MPX library will work correctly, but without MPX
protections on pointers passed in registers.
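
To illustrate what "cleared" means for a check, here is a minimal
sketch in C.  It is only a simplified model, not how the hardware
stores bounds, and the names bnd_reg, bnd_cleared and bnd_check are
made up for this example.  With lower bound 0 and upper bound MAX,
every access passes the check, so the library runs but nothing is
caught.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one bound register (hypothetical type name). */
struct bnd_reg {
    uint64_t lb;  /* lower bound, inclusive */
    uint64_t ub;  /* upper bound, inclusive */
};

/* State after a branch without BND prefix: lower bound 0, upper bound MAX. */
static struct bnd_reg bnd_cleared(void)
{
    struct bnd_reg b = { 0, UINT64_MAX };
    return b;
}

/* Return true if the access [addr, addr + size) is within bounds.
   size must be at least 1. */
static bool bnd_check(struct bnd_reg b, uint64_t addr, uint64_t size)
{
    uint64_t last = addr + size - 1;

    if (last < addr)  /* address arithmetic wrapped around */
        return false;
    return addr >= b.lb && last <= b.ub;
}
```

With real bounds such as { 0x1000, 0x100f }, an 8-byte access at
0x100c is rejected.  With bnd_cleared() the same access is accepted.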


--
H.J.

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-24 23:36 ` Roland McGrath
  2013-07-25  0:23   ` Ian Lance Taylor
@ 2013-07-25 17:11   ` H.J. Lu
  1 sibling, 0 replies; 27+ messages in thread
From: H.J. Lu @ 2013-07-25 17:11 UTC (permalink / raw)
  To: Roland McGrath
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath <roland@hack.frob.com> wrote:
> I've read through the MPX spec once, but most of it is still not very
> clear to me.  So please correct any misconceptions.  (HJ, if you answer
> any or all of these questions in your usual style with just, "It's not a
> problem," I will find you and I will kill you.  Explain!)
>
> Will an MPX-using binary require an MPX-supporting dynamic linker to run
> correctly?

Yes.  But you may lose MPX protection in the MPX library since bound registers
are cleared in the first call with lazy binding:

MPX code -> PLT -> ld.so -> PLT -> MPX library

>
> Those are the background questions to help me understand better.
> Now, to your specific questions.
>
> Now, assuming we are talking about a uniform PLT in each object, there
> is the question of whether to use a new PLT layout everywhere, or only
> when linking an object with some input files that use MPX.

I am proposing the uniform PLT in each object.  That was my first
question.

> * My initial reaction was to say that we should just change it
>   unconditionally to keep things simple: use new linker, get new format,
>   end of story.  Simplicity is good.

This is my thinking also.

> * But, doubling the size of PLT entries means more i-cache pressure.  If
>   cache lines are 64 bytes, then today you fit four entries into a cache
>   line.  Assuming PLT entries are more used than unused, this is a good
>   thing.  Reducing that to two entries per cache line means twice as
>   many i-cache misses if you hit a given PLT frequently (with even
>   distribution of which entries you actually use--at any rate, it's
>   "more" even if it's not "twice as many").  Perhaps this is enough cost
>   in real-world situations to be worried about.  I really don't know.
>
> * As I mentioned before, there are things floating around that think
>   they know the size of PLT entries.  Realistically, there will be
>   plenty of people using new tools to build binaries but not using MPX
>   at all, and these people will give those binaries to people who have
>   old tools.  In the case of someone running an old objdump on a new
>   binary, they would see bogus foo@plt pseudo-symbols and be misled and
>   confused.  Not to mention the unknown unknowns, i.e. other things that
>   "know" the size of PLT entries that we don't know about or haven't
>   thought of here.  It's just basic conservatism not to perturb things
>   for these people who don't care about or need anything related to MPX
>   at all.

We can investigate if the old objdump can deal with PLT entry size
change.

> How a relocatable object is marked so that the linker knows whether its
> code is MPX-compatible at link time and how a DSO/executable is marked
> so that the dynamic linker knows at runtime are two separate subjects.
>
> For relocatable objects, I don't think there is really any precedent for
> using ELF notes to tell the linker things.  It seems much nicer if the

We have been using .note.GNU-stack section at link-time for a long time.

> linker continues to treat notes completely normally, i.e. appending
> input files' same-named note sections together like with any other named
> section rather than magically recognizing and swallowing certain notes.
> OTOH, the SHT_GNU_ATTRIBUTES mechanism exists for exactly this sort of
> purpose and is used on other machines for very similar sorts of issues.
> There is both precedent and existing code in binutils to have the linker
> merge attribute sections from many input files together in a fashion
> aware of the semantics of those sections, and to have those attributes
> affect the linker's behavior in machine-specific ways.  I think you have
> to make a very strong case to use anything other than SHT_GNU_ATTRIBUTES
> for this sort of purpose in relocatable objects.
>
> For linked objects, there a couple of obvious choices.  They all require
> that the linker have special knowledge to create the markings.  One
> option is a note.  We use .note.ABI-tag for a similar purpose in libc,
> but I don't know of any precedent for the linker synthesizing notes.
> The most obvious choice is e_flags bits.  That's what other machines use
> to mark ABI variants.  There are no bits assigned for x86 yet.  There
> are obvious limitations to using e_flags, in that it's part of the
> universal ELF psABI rather than something with vendor extensibility
> built in like notes have, and in that there are only 32 bits available
> to assign rather than being a wholly open-ended format like notes.  But
> using e_flags is certainly simpler to synthesize in the linker and
> simpler to recognize in the dynamic linker than a note format.  I think
> you have to make at least a reasonable (objective) case to use a note
> rather than e_flags, though I'm certainly not firmly against a note.

My main concerns are that e_flags isn't very extensible and
the old tools may not be able to handle it properly.  A note
section is backward compatible.  Given that MPX insns are
NOPs on older hardware, it is safe to ignore it.  If we use the note
section in linked objects, it is more consistent to also use it
in relocatable files.  We just need to dump the note section to
get the MPX info for both relocatable files and linked objects.

> Finally, you've only mentioned x86-64.  The hardware details apply about
> the same to x86-32 AFAICT.  If this is something that we'll eventually
> want to do for x86-32 as well, then I think we should at least hash out
> the plan for x86-32 fairly thoroughly before committing to a plan for
> x86-64 (even if the actual implementation for x86-32 lags).  Probably
> it's all much the same and working it through for x86-32 won't give us
> any pause in our x86-64 plans, but we won't know until we actually do it.
>

For ia32, my question is if MPX should be supported for functions
with the regparm attribute. If not, there is no problem with PLT
since bound registers won't be used for passing bounds for
pointers passed in registers and PLT isn't used for function returns.
If we want to support MPX for functions with the regparm attribute,
we will run into the same issue as x86-64.  My preference is not
to support MPX functions with the regparm attribute.

--
H.J.

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-25  0:23   ` Ian Lance Taylor
  2013-07-25 11:09     ` Ilya Enkovich
@ 2013-07-25 17:24     ` H.J. Lu
  2013-08-08  0:33       ` H.J. Lu
  1 sibling, 1 reply; 27+ messages in thread
From: H.J. Lu @ 2013-07-25 17:24 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Roland McGrath, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

On Wed, Jul 24, 2013 at 5:23 PM, Ian Lance Taylor <iant@google.com> wrote:
>> * The foo@plt pseudo-symbols that e.g. objdump will display are based on
>>   the BFD backend knowing the size of PLT entries.  Arguably this ought
>>   to look at sh_entsize of .plt instead of using baked-in knowledge, but
>>   it doesn't.
>
> This seems fixable.  Of course, we could also keep the PLT the same
> length by changing it.  The current PLT entries are
>
>     jmpq *GOT(sym)
>     pushq offset
>     jmpq plt0
>
> The linker or dynamic linker initializes *GOT(sym) to point to the
> second instruction in this sequence.  So we can keep the PLT at 16
> bytes by simply changing it to jump somewhere else.
>
>     bnd jmpq *GOT(sym)
>     .skip 9
>
> We have the linker or dynamic linker fill in *GOT(sym) to point to the
> second PLT table.  When the dynamic linker is involved, we use another
> DT tag to point to the second PLT.  The offsets are consistent: there
> is one entry in each PLT table, so the dynamic linker can compute the
> right value.  Then in the second PLT we have the sequence
>
>     pushq offset
>     bnd jmpq plt0
>
> That gives the dynamic linker the offset that it needs to update
> *GOT(sym) to point to the runtime symbol value.  So we get slightly
> worse instruction cache handling the first time a function is called,
> but after that we are the same as before.  And PLT entries are the
> same size as always so everything is simpler.
>
> The special DT tag will tell the dynamic linker to apply the special
> processing.  No attribute is needed to change behaviour.  The issue
> then is: a program linked in this way will not work with an old
> dynamic linker, because the old dynamic linker will not initialize
> GOT(sym) to the right value.  That is a problem for any scheme, so I
> think that is OK.  But if that is a concern, we could actually handle
> by generating two PLTs.  One conventional PLT, and another as I just
> outlined.  The linker branches to the new PLT, and initializes
> GOT(sym) to point to the old PLT.  The dynamic linker spots this
> because it recognizes the new DT tags, and cunningly rewrites the GOT
> to point to the new PLT.  Cost is an extra jump the first time a
> function is called when using the old dynamic linker.
>

I don't like the complexity.  I believe extending the PLT entry to
32 bytes works with the old ld.so.  If we are willing to have
mixed PLT entries, we can merge 2 16-byte PLT entries into one
super 32-byte PLT entry so that we can have

jmpq   *name@GOTPCREL(%rip)
pushq  $index
jmpq   PLT0
bnd jmpq   *name@GOTPCREL(%rip)
pushq  $index
bnd jmpq   PLT0
nop paddings
jmpq   *name@GOTPCREL(%rip)
pushq  $index
jmpq   PLT0

We can also have new link-time relocations for branches with BND
prefix and only create the super PLT entries when needed.  Of course,
unwind info may be incorrect for both approaches if we don't find a way
to fix it.

--
H.J.

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-25 17:24     ` H.J. Lu
@ 2013-08-08  0:33       ` H.J. Lu
  2013-08-08  7:19         ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: H.J. Lu @ 2013-08-08  0:33 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Roland McGrath, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

Here is the proposal to add Tag_GNU_X86_EXTERN_BRANCH and
NT_X86_FEATURE_PLT_BND.  Any comments?

--
H.J.
---
Intel MPX:

http://software.intel.com/sites/default/files/319433-015.pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64.  Bound registers are cleared by branch instructions.  Branch
instructions with BND prefix will keep bound register contents. This leads
to 2 requirements to 64-bit MPX run-time:
 1. Dynamic linker (ld.so) should save and restore bound registers during
    symbol lookup.
 2. Extend the current 16-byte PLT entry:

  ff 25 00 00 00 00        jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00           pushq  $index
  e9 00 00 00 00           jmpq   PLT0

    which clears bound registers, to 32-byte to add BND prefix to branch
    instructions:

  f2 ff 25 00 00 00 00        bnd jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00              pushq      $index
  f2 e9 00 00 00 00           bnd jmpq   PLT0
  0f 1f 80 00 00 00 00        nopl       0(%rax)
  0f 1f 80 00 00 00 00        nopl       0(%rax)

We use the .gnu_attribute directive to record an object attribute:

enum
{
  Tag_GNU_X86_EXTERN_BRANCH = 4,
};

for the types of external branch instructions in relocatable files.

enum
{
  /* All external branch instructions are legacy.  */
  Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
  /* There is at least one external branch instruction with BND prefix.  */
  Val_GNU_X86_EXTERN_BRANCH_BND = 1,
};

An x86 feature note section, .note.x86-feature, is used to indicate
features in executables and shared libraries. The contents of this note
section are:

    .section        .note.x86-feature
    .align          4
    .long           .L1 - .L0
    .long           .L3 - .L2
    .long           1
.L0:
    .asciz         "x86 feature"
.L1:
    .align          4
.L2:
    .long        FeatureFlag (Feature flag)
.L3:

The current valid bits in FeatureFlag are

#define NT_X86_FEATURE_PLT_BND    (0x1 << 0)

It should be set if PLT entry has BND prefix to preserve bound registers.

The remaining bits in FeatureFlag are reserved.
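
For illustration, a consumer could read this note roughly as follows.
This is a sketch under stated assumptions: the function name
parse_x86_feature_note is made up, the buffer is assumed to hold exactly
one note in the layout above, and the host is assumed to have the same
byte order as the file (little-endian on x86).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_X86_FEATURE_PLT_BND (0x1u << 0)
#define X86_FEATURE_NOTE_NAME  "x86 feature"  /* 11 chars + NUL = 12 bytes */
#define X86_FEATURE_NOTE_TYPE  1

/* Return 1 and store FeatureFlag in *flags if buf holds the note
   described above, otherwise return 0. */
static int parse_x86_feature_note(const uint8_t *buf, size_t len,
                                  uint32_t *flags)
{
    uint32_t namesz, descsz, type;
    size_t name_padded;

    if (len < 12)                      /* three 4-byte header words */
        return 0;
    memcpy(&namesz, buf, 4);
    memcpy(&descsz, buf + 4, 4);
    memcpy(&type, buf + 8, 4);

    if (type != X86_FEATURE_NOTE_TYPE
        || namesz != sizeof X86_FEATURE_NOTE_NAME
        || descsz != 4)
        return 0;

    /* The name is padded to a 4-byte boundary before the descriptor. */
    name_padded = ((size_t) namesz + 3) & ~(size_t) 3;
    if (len < 12 + name_padded + descsz)
        return 0;
    if (memcmp(buf + 12, X86_FEATURE_NOTE_NAME, namesz) != 0)
        return 0;

    memcpy(flags, buf + 12 + name_padded, 4);
    return 1;
}
```

The caller would then test (flags & NT_X86_FEATURE_PLT_BND) and ignore
the reserved bits.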

When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
Val_GNU_X86_EXTERN_BRANCH_BND.

When generating an executable or shared library, if a PLT is needed and
the Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
the 32-byte PLT entry should be used, and the feature note section should
be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and included
in the PT_NOTE segment.  The benefit of the note section is that it is
backward compatible with existing run-time and tools.
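
The merging rule and the PLT selection rule can be sketched in C as
below.  The tag and value names come from the proposal; the helper
names merge_extern_branch, plt_entry_size and feature_flag are made up
for this example and do not correspond to any existing linker code.

```c
#include <stddef.h>
#include <stdint.h>

enum { Tag_GNU_X86_EXTERN_BRANCH = 4 };

enum {
    Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
    Val_GNU_X86_EXTERN_BRANCH_BND = 1
};

#define NT_X86_FEATURE_PLT_BND (0x1u << 0)

/* Merge the attribute values of n input files: one BND input is
   enough to make the output BND. */
static int merge_extern_branch(const int *vals, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (vals[i] == Val_GNU_X86_EXTERN_BRANCH_BND)
            return Val_GNU_X86_EXTERN_BRANCH_BND;
    return Val_GNU_X86_EXTERN_BRANCH_LEGACY;
}

/* PLT entry size in bytes for the merged attribute value. */
static unsigned plt_entry_size(int merged)
{
    return merged == Val_GNU_X86_EXTERN_BRANCH_BND ? 32 : 16;
}

/* FeatureFlag for the note section; 0 means no bit needs to be set. */
static uint32_t feature_flag(int plt_needed, int merged)
{
    if (plt_needed && merged == Val_GNU_X86_EXTERN_BRANCH_BND)
        return NT_X86_FEATURE_PLT_BND;
    return 0;
}
```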

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-08-08  0:33       ` H.J. Lu
@ 2013-08-08  7:19         ` Jan Beulich
  2013-08-08 16:01           ` H.J. Lu
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-08-08  7:19 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GCC Development, Ian Lance Taylor, Roland McGrath,
	David L Kreitzer, Milind Girkar, Binutils, GNU C Library

>>> On 08.08.13 at 02:33, "H.J. Lu" <hjl.tools@gmail.com> wrote:
> We use the .gnu_attribute directive to record an object attribute:
> 
> enum
> {
>   Tag_GNU_X86_EXTERN_BRANCH = 4,
> };
> 
> for the types of external branch instructions in relocatable files.
> 
> enum
> {
>   /* All external branch instructions are legacy.  */
>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>   /* There is at least one external branch instruction with BND prefix.  */
>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
> };
> 
> An x86 feature note section, .note.x86-feature, is used to indicate
> features in executables and shared library. The contents of this note
> section are:
> 
>     .section        .note.x86-feature
>     .align          4
>     .long           .L1 - .L0
>     .long           .L3 - .L2
>     .long           1
> .L0:
>     .asciz         "x86 feature"
> .L1:
>     .align          4
> .L2:
>     .long        FeatureFlag (Feature flag)
> .L3:
> 
> The current valid bits in FeatureFlag are
> 
> #define NT_X86_FEATURE_PLT_BND    (0x1 << 0)
> 
> It should be set if PLT entry has BND prefix to preserve bound registers.
> 
> The remaining bits in FeatureFlag are reserved.
> 
> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
> Val_GNU_X86_EXTERN_BRANCH_BND.
> 
> When generating executable or shared library, if PLT is needed and
> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
> the 32-byte PLT entry should be used and the feature note section should
> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
> note section should be included in PT_NOTE segment. The benefit of the
> note section is it is backward compatible with existing run-time and tools.

While I can see the purpose of the attribute section, I don't see
what the note section is for: You don't mention at all what it's
consumed for, and I also can't see how it validly would be for
anything. That's because iirc note section contents, if not
understood by the consumer, are required to not have any effect
on the correctness of the program. Hence if loaded on a system
that is MPX capable, has an MPX aware kernel, but no MPX aware
user space (apart from this one executable or shared library, or
a set thereof), it ought to still work correctly. Which - afaict - it
won't if the dynamic loader itself isn't MPX aware.

Jan

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-08-08  7:19         ` Jan Beulich
@ 2013-08-08 16:01           ` H.J. Lu
  2013-08-09  7:08             ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: H.J. Lu @ 2013-08-08 16:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: GCC Development, Ian Lance Taylor, Roland McGrath,
	David L Kreitzer, Milind Girkar, Binutils, GNU C Library

On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 08.08.13 at 02:33, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>> We use the .gnu_attribute directive to record an object attribute:
>>
>> enum
>> {
>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>> };
>>
>> for the types of external branch instructions in relocatable files.
>>
>> enum
>> {
>>   /* All external branch instructions are legacy.  */
>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>   /* There is at least one external branch instruction with BND prefix.  */
>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>> };
>>
>> An x86 feature note section, .note.x86-feature, is used to indicate
>> features in executables and shared library. The contents of this note
>> section are:
>>
>>     .section        .note.x86-feature
>>     .align          4
>>     .long           .L1 - .L0
>>     .long           .L3 - .L2
>>     .long           1
>> .L0:
>>     .asciz         "x86 feature"
>> .L1:
>>     .align          4
>> .L2:
>>     .long        FeatureFlag (Feature flag)
>> .L3:
>>
>> The current valid bits in FeatureFlag are
>>
>> #define NT_X86_FEATURE_PLT_BND    (0x1 << 0)
>>
>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>
>> The remaining bits in FeatureFlag are reserved.
>>
>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>
>> When generating executable or shared library, if PLT is needed and
>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>> the 32-byte PLT entry should be used and the feature note section should
>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>> note section should be included in PT_NOTE segment. The benefit of the
>> note section is it is backward compatible with existing run-time and tools.
>
> While I can see the purpose of the attribute section, I don't see
> what the note section is for: You don't mention at all what it's
> consumed for, and I also can't see how it validly would be for
> anything. That's because iirc note section contents, if not
> understood by the consumer, are required to not have any effect
> on the correctness of the program. Hence if loaded on a system
> that is MPX capable, has an MPX aware kernel, but no MPX aware
> user space (apart from this one executable or shared library, or
> a set thereof), it ought to still work correctly. Which - afaict - it
> won't if the dynamic loader itself isn't MPX aware.
>

The note section isn't required for correctness.  But it can be used
by ld.so to select an alternate MPX aware shared library in a different
directory, instead of a legacy one.

There is another way to encode this information in the first entry
of PLT:

   0:    ff 35 00 00 00 00        pushq  GOT+8(%rip)
   6:    f2 ff 25 00 00 00 00     bnd jmpq *GOT+16(%rip)
   d:    0f 1f 44 00 00           nopl   0x0(%rax,%rax,1)
  12:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
  19:    0f 1f 80 00 00 00 01     nopl   0x1000000(%rax)

We can encode PLT property in 10 (4 + 4 + 2) bytes of
displacements of 3 nops.  In this example, the first bit
of the last byte of PLT0 is 1.
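
A reader of this encoding would only need to look at the last byte of
PLT0.  A rough sketch in C, where the names mpx_plt0 and
plt0_has_bnd_flag are made up for this example and the zero bytes in
the first two instructions stand for displacements filled in by the
linker:

```c
#include <stdint.h>

/* 32-byte PLT0 from the listing above. */
static const uint8_t mpx_plt0[32] = {
    0xff, 0x35, 0, 0, 0, 0,             /* pushq    GOT+8(%rip) */
    0xf2, 0xff, 0x25, 0, 0, 0, 0,       /* bnd jmpq *GOT+16(%rip) */
    0x0f, 0x1f, 0x44, 0x00, 0x00,       /* nopl     0x0(%rax,%rax,1) */
    0x0f, 0x1f, 0x80, 0, 0, 0, 0,       /* nopl     0x0(%rax) */
    0x0f, 0x1f, 0x80, 0, 0, 0, 0x01,    /* nopl     0x1000000(%rax) */
};

/* Return 1 if the first bit of the last byte of PLT0 is set. */
static int plt0_has_bnd_flag(const uint8_t *plt0)
{
    return plt0[31] & 1;
}
```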

-- 
H.J.

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-08-08 16:01           ` H.J. Lu
@ 2013-08-09  7:08             ` Jan Beulich
  2013-08-09 17:03               ` H.J. Lu
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-08-09  7:08 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GCC Development, Ian Lance Taylor, Roland McGrath,
	David L Kreitzer, Milind Girkar, Binutils, GNU C Library

>>> On 08.08.13 at 18:01, "H.J. Lu" <hjl.tools@gmail.com> wrote:
> On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 08.08.13 at 02:33, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>> We use the .gnu_attribute directive to record an object attribute:
>>>
>>> enum
>>> {
>>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>>> };
>>>
>>> for the types of external branch instructions in relocatable files.
>>>
>>> enum
>>> {
>>>   /* All external branch instructions are legacy.  */
>>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>>   /* There is at least one external branch instruction with BND prefix.  */
>>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>>> };
>>>
>>> An x86 feature note section, .note.x86-feature, is used to indicate
>>> features in executables and shared library. The contents of this note
>>> section are:
>>>
>>>     .section        .note.x86-feature
>>>     .align          4
>>>     .long           .L1 - .L0
>>>     .long           .L3 - .L2
>>>     .long           1
>>> .L0:
>>>     .asciz         "x86 feature"
>>> .L1:
>>>     .align          4
>>> .L2:
>>>     .long        FeatureFlag (Feature flag)
>>> .L3:
>>>
>>> The current valid bits in FeatureFlag are
>>>
>>> #define NT_X86_FEATURE_PLT_BND    (0x1 << 0)
>>>
>>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>>
>>> The remaining bits in FeatureFlag are reserved.
>>>
>>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>>
>>> When generating executable or shared library, if PLT is needed and
>>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>>> the 32-byte PLT entry should be used and the feature note section should
>>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>>> note section should be included in PT_NOTE segment. The benefit of the
>>> note section is it is backward compatible with existing run-time and tools.
>>
>> While I can see the purpose of the attribute section, I don't see
>> what the note section is for: You don't mention at all what it's
>> consumed for, and I also can't see how it validly would be for
>> anything. That's because iirc note section contents, if not
>> understood by the consumer, are required to not have any effect
>> on the correctness of the program. Hence if loaded on a system
>> that is MPX capable, has an MPX aware kernel, but no MPX aware
>> user space (apart from this one executable or shared library, or
>> a set thereof), it ought to still work correctly. Which - afaict - it
>> won't if the dynamic loader itself isn't MPX aware.
>>
> 
> The note section isn't required for correctness.  But it can be used
> by ld.so to select an alternate MPX aware shared library in a different
> directory, instead of a legacy one.

Okay, that clarifies your intentions with the note section. However,
then you need something else to make sure an MPX aware app can't
load on an MPX enabled kernel without MPX-enabled ld.so.

> There is another way to encode this information in the first entry
> of PLT:
> 
>    0:    ff 35 00 00 00 00        pushq  GOT+8(%rip)
>    6:    f2 ff 25 00 00 00 00     bnd jmpq *GOT+16(%rip)
>    d:    0f 1f 44 00 00           nopl   0x0(%rax,%rax,1)
>   12:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
>   19:    0f 1f 80 00 00 00 01     nopl   0x1000000(%rax)
> 
> We can encode PLT property in 10 (4 + 4 + 2) bytes of
> displacements of 3 nops.  In this example, the first bit
> of the last byte of PLT0 is 1.

While a nice idea, I think that's worse, because it is much harder to
determine from simply dumping information for a given binary.

Jan

* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-08-09  7:08             ` Jan Beulich
@ 2013-08-09 17:03               ` H.J. Lu
  2013-08-12  9:25                 ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: H.J. Lu @ 2013-08-09 17:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: GCC Development, Ian Lance Taylor, Roland McGrath,
	David L Kreitzer, Milind Girkar, Binutils, GNU C Library

On Fri, Aug 9, 2013 at 12:08 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 08.08.13 at 18:01, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>> On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 08.08.13 at 02:33, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>> We use the .gnu_attribute directive to record an object attribute:
>>>>
>>>> enum
>>>> {
>>>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>>>> };
>>>>
>>>> for the types of external branch instructions in relocatable files.
>>>>
>>>> enum
>>>> {
>>>>   /* All external branch instructions are legacy.  */
>>>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>>>   /* There is at least one external branch instruction with BND prefix.  */
>>>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>>>> };
>>>>
>>>> An x86 feature note section, .note.x86-feature, is used to indicate
>>>> features in executables and shared library. The contents of this note
>>>> section are:
>>>>
>>>>     .section        .note.x86-feature
>>>>     .align          4
>>>>     .long           .L1 - .L0
>>>>     .long           .L3 - .L2
>>>>     .long           1
>>>> .L0:
>>>>     .asciz         "x86 feature"
>>>> .L1:
>>>>     .align          4
>>>> .L2:
>>>>     .long        FeatureFlag (Feature flag)
>>>> .L3:
>>>>
>>>> The current valid bits in FeatureFlag are
>>>>
>>>> #define NT_X86_FEATURE_PLT_BND    (0x1 << 0)
>>>>
>>>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>>>
>>>> The remaining bits in FeatureFlag are reserved.
>>>>
>>>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>>>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>>>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>>>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>>>
>>>> When generating executable or shared library, if PLT is needed and
>>>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>>>> the 32-byte PLT entry should be used and the feature note section should
>>>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>>>> note section should be included in PT_NOTE segment. The benefit of the
>>>> note section is it is backward compatible with existing run-time and tools.
>>>
>>> While I can see the purpose of the attribute section, I don't see
>>> what the note section is for: You don't mention at all what it's
>>> consumed for, and I also can't see how it validly would be for
>>> anything. That's because iirc note section contents, if not
>>> understood by the consumer, are required to not have any effect
>>> on the correctness of the program. Hence if loaded on a system
>>> that is MPX capable, has an MPX aware kernel, but no MPX aware
>>> user space (apart from this one executable or shared library, or
>>> a set thereof), it ought to still work correctly. Which - afaict - it
>>> won't if the dynamic loader itself isn't MPX aware.
>>>
>>
>> The note section isn't required for correctness.  But it can be used
>> by ld.so to select an alternate MPX aware shared library in a different
>> directory, instead of a legacy one.
>
> Okay, that clarifies your intentions with the note section. However,
> then you need something else to make sure an MPX aware app can't
> load on an MPX enabled kernel without MPX-enabled ld.so.

The MPX enabled app will still run correctly.  ld.so will clear the bound
registers (which makes the bounds unlimited) for the first call with lazy binding.

>> There is another way to encode this information in the first entry
>> of PLT:
>>
>>    0:    ff 35 00 00 00 00        pushq  GOT+8(%rip)
>>    6:    f2 ff 25 00 00 00 00     bnd jmpq *GOT+16(%rip)
>>    d:    0f 1f 44 00 00           nopl   0x0(%rax,%rax,1)
>>   12:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
>>   19:    0f 1f 80 00 00 00 01     nopl   0x1000000(%rax)
>>
>> We can encode PLT property in 10 (4 + 4 + 2) bytes of
>> displacements of 3 nops.  In this example, the first bit
>> of the last byte of PLT0 is 1.
>
> While a nice idea, I think that's worse, because much harder to
> determine from simply dumping information for a given binary.
>

I agree.  That is why a note section is better.


-- 
H.J.
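
A minimal sketch of how a consumer such as ld.so might test the proposed
feature bit, assuming the .note.x86-feature layout given earlier in the
thread (name "x86 feature", type 1, one 32-bit FeatureFlag word). The
function names note_has_plt_bnd and build_feature_note are hypothetical,
and the fields are assumed to be in host byte order:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_X86_FEATURE_PLT_BND (0x1u << 0)

/* Round n up to the 4-byte alignment used by the note layout.  */
size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical helper: write one "x86 feature" note into out (at least
   28 bytes) and return the number of bytes written.  */
size_t build_feature_note(unsigned char *out, uint32_t flags)
{
    static const char name[] = "x86 feature";   /* 12 bytes with NUL */
    uint32_t namesz = sizeof name, descsz = 4, type = 1;

    memcpy(out, &namesz, 4);
    memcpy(out + 4, &descsz, 4);
    memcpy(out + 8, &type, 4);
    memcpy(out + 12, name, sizeof name);
    memcpy(out + 12 + align4(sizeof name), &flags, 4);
    return 12 + align4(sizeof name) + 4;
}

/* Hypothetical helper: return 1 if NT_X86_FEATURE_PLT_BND is set, 0 if it
   is clear, and -1 if the buffer is not a well-formed "x86 feature" note.  */
int note_has_plt_bnd(const unsigned char *note, size_t len)
{
    static const char name[] = "x86 feature";
    uint32_t namesz, descsz, type, flags;

    if (len < 12)
        return -1;
    memcpy(&namesz, note, 4);
    memcpy(&descsz, note + 4, 4);
    memcpy(&type, note + 8, 4);

    if (type != 1 || namesz != sizeof name || descsz < 4)
        return -1;
    if (len < 12 + align4(namesz) + 4)
        return -1;
    if (memcmp(note + 12, name, sizeof name) != 0)
        return -1;

    memcpy(&flags, note + 12 + align4(namesz), 4);
    return (flags & NT_X86_FEATURE_PLT_BND) ? 1 : 0;
}
```

A loader could use the result only as a hint for library selection, which
matches the statement above that the note is not required for correctness.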


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-08-09 17:03               ` H.J. Lu
@ 2013-08-12  9:25                 ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2013-08-12  9:25 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GCC Development, Ian Lance Taylor, Roland McGrath,
	David L Kreitzer, Milind Girkar, Binutils, GNU C Library

>>> On 09.08.13 at 19:03, "H.J. Lu" <hjl.tools@gmail.com> wrote:
> On Fri, Aug 9, 2013 at 12:08 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 08.08.13 at 18:01, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>> On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 08.08.13 at 02:33, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>>> We use the .gnu_attribute directive to record an object attribute:
>>>>>
>>>>> enum
>>>>> {
>>>>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>>>>> };
>>>>>
>>>>> for the types of external branch instructions in relocatable files.
>>>>>
>>>>> enum
>>>>> {
>>>>>   /* All external branch instructions are legacy.  */
>>>>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>>>>   /* There is at least one external branch instruction with BND prefix.  */
>>>>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>>>>> };
>>>>>
>>>>> An x86 feature note section, .note.x86-feature, is used to indicate
>>>>> features in executables and shared library. The contents of this note
>>>>> section are:
>>>>>
>>>>>     .section        .note.x86-feature
>>>>>     .align          4
>>>>>     .long           .L1 - .L0
>>>>>     .long           .L3 - .L2
>>>>>     .long           1
>>>>> .L0:
>>>>>     .asciz         "x86 feature"
>>>>> .L1:
>>>>>     .align          4
>>>>> .L2:
>>>>>     .long        FeatureFlag (Feature flag)
>>>>> .L3:
>>>>>
>>>>> The current valid bits in FeatureFlag are
>>>>>
>>>>> #define NT_X86_FEATURE_PLT_BND    (0x1 << 0)
>>>>>
>>>>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>>>>
>>>>> The remaining bits in FeatureFlag are reserved.
>>>>>
>>>>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>>>>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>>>>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>>>>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>>>>
>>>>> When generating executable or shared library, if PLT is needed and
>>>>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>>>>> the 32-byte PLT entry should be used and the feature note section should
>>>>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>>>>> note section should be included in PT_NOTE segment. The benefit of the
>>>>> note section is it is backward compatible with existing run-time and tools.
>>>>
>>>> While I can see the purpose of the attribute section, I don't see
>>>> what the note section is for: You don't mention at all what it's
>>>> consumed for, and I also can't see how it validly would be for
>>>> anything. That's because iirc note section contents, if not
>>>> understood by the consumer, is required to not have any effect
>>>> on the correctness of the program. Hence if loaded on a system
>>>> that is MPX capable, has an MPX aware kernel, but no MPX aware
>>>> user space (apart from this one executable or shared library, or
>>>> a set thereof), it ought to still work correctly. Which - afaict - it
>>>> won't if the dynamic loader itself isn't MPX aware.
>>>>
>>>
>>> The note section isn't required for correctness.  But it can be used
>>> by ld.so to select an alternate MPX aware shared library in a different
>>> directory, instead of a legacy one.
>>
>> Okay, that clarifies your intentions with the note section. However,
>> then you need something else to make sure an MPX aware app can't
>> load on an MPX enabled kernel without MPX-enabled ld.so.
> 
> The MPX enabled app will still run correctly.  ld.so will clear the bound
> registers (which makes the bounds unlimited) for the first call with lazy
> binding.

Only if those registers are used for their primary purpose. The
documentation specifically says that this isn't a requirement.
But anyway, I see we're once again not going to get anywhere
with this...

Jan
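
The attribute merging rule and the PLT sizing rule quoted in this message
can be sketched as follows. This is only an illustration under stated
assumptions: struct link_plan, merge_extern_branch and plan_output are
hypothetical names, not part of any linker.

```c
#include <stddef.h>

enum {
    Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
    Val_GNU_X86_EXTERN_BRANCH_BND = 1
};

#define NT_X86_FEATURE_PLT_BND (0x1u << 0)

/* Hypothetical summary of what the linker decides for one output file.  */
struct link_plan {
    int plt_entry_size;      /* 0 if no PLT, otherwise 16 or 32 */
    unsigned feature_flags;  /* FeatureFlag word for the note section */
    int emit_note;           /* nonzero if the feature note is generated */
};

/* Merge Tag_GNU_X86_EXTERN_BRANCH over all inputs: one BND input is
   enough to make the result BND.  */
int merge_extern_branch(const int *vals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (vals[i] == Val_GNU_X86_EXTERN_BRANCH_BND)
            return Val_GNU_X86_EXTERN_BRANCH_BND;
    return Val_GNU_X86_EXTERN_BRANCH_LEGACY;
}

/* Choose the PLT entry size and note contents from the merged value.  */
struct link_plan plan_output(int merged, int plt_needed)
{
    struct link_plan p = { 0, 0, 0 };

    if (!plt_needed)
        return p;
    p.plt_entry_size = 16;
    if (merged == Val_GNU_X86_EXTERN_BRANCH_BND) {
        p.plt_entry_size = 32;
        p.feature_flags |= NT_X86_FEATURE_PLT_BND;
        p.emit_note = 1;
    }
    return p;
}
```

Note that the decision is made per output file: a single BND input makes
every PLT entry 32 bytes.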


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-07-23 19:49 [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX H.J. Lu
                   ` (2 preceding siblings ...)
  2013-07-24 23:36 ` Roland McGrath
@ 2013-08-14 16:23 ` Jakub Jelinek
  2013-08-19 18:52   ` H.J. Lu
  3 siblings, 1 reply; 27+ messages in thread
From: Jakub Jelinek @ 2013-08-14 16:23 UTC (permalink / raw)
  To: H.J. Lu
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Tue, Jul 23, 2013 at 12:49:06PM -0700, H.J. Lu wrote:
> There are 2 psABI considerations:
> 
>  1. Should PLT entries in all binaries, with and without MPX, be changed
>     to 32-byte or just the necessary ones?

Ugh, please don't.

>  2. Only branch to PLT entry with BND prefix needs 32-byte PLT entry. If
>     we use 32-byte PLT entry only when needed, it can be decided by:
>     a. A new MPX PLT relocation:
>        i. No new run-time relocation since MPX PLT relocation is
>       resolved to branch to PLT entry at link-time.
>        ii. Pro: No new section.
>        iii. Con:
>         Need a new relocation.
>         Can't mark executable nor shared library.

I think I prefer a new relocation, @mpxplt or similar.  The linker could then
use the 32-byte PLT slot for both @plt and @mpxplt relocs if there is at
least one @mpxplt reloc for the symbol, otherwise it would use 16-byte PLT
slot.  And you can certainly mark executables or PIEs or shared libraries
this way, the linker could do that if it creates any 32-byte PLT slot.

	Jakub
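
The per-symbol rule described in this reply can be sketched as below. The
struct sym_relocs type and both function names are hypothetical; the
sketch only assumes that the linker counts @plt and @mpxplt references
for each symbol.

```c
#include <stddef.h>

/* Hypothetical per-symbol reference counts gathered by the linker.  */
struct sym_relocs {
    unsigned plt_relocs;     /* number of @plt references */
    unsigned mpxplt_relocs;  /* number of @mpxplt references */
};

/* Return the PLT slot size for one symbol: 0 if no slot is needed,
   32 if at least one @mpxplt reference exists, otherwise 16.  */
int plt_slot_size(const struct sym_relocs *s)
{
    if (s->plt_relocs == 0 && s->mpxplt_relocs == 0)
        return 0;
    return s->mpxplt_relocs > 0 ? 32 : 16;
}

/* The output can be marked as MPX if any 32-byte slot was created.  */
int output_uses_mpx_plt(const struct sym_relocs *syms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (plt_slot_size(&syms[i]) == 32)
            return 1;
    return 0;
}
```

Unlike a per-file attribute, this keeps 16-byte slots for symbols that are
never reached through a BND-prefixed branch.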


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-08-14 16:23 ` Jakub Jelinek
@ 2013-08-19 18:52   ` H.J. Lu
  2013-10-01 12:15     ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: H.J. Lu @ 2013-08-19 18:52 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: GNU C Library, GCC Development, Binutils, Girkar, Milind,
	Kreitzer, David L

On Wed, Aug 14, 2013 at 8:49 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jul 23, 2013 at 12:49:06PM -0700, H.J. Lu wrote:
>> There are 2 psABI considerations:
>>
>>  1. Should PLT entries in all binaries, with and without MPX, be changed
>>     to 32-byte or just the necessary ones?
>
> Ugh, please don't.
>
>>  2. Only branch to PLT entry with BND prefix needs 32-byte PLT entry. If
>>     we use 32-byte PLT entry only when needed, it can be decided by:
>>     a. A new MPX PLT relocation:
>>        i. No new run-time relocation since MPX PLT relocation is
>>       resolved to branch to PLT entry at link-time.
>>        ii. Pro: No new section.
>>        iii. Con:
>>         Need a new relocation.
>>         Can't mark executable nor shared library.
>
> I think I prefer a new relocation, @mpxplt or similar.  The linker could then
> use the 32-byte PLT slot for both @plt and @mpxplt relocs if there is at
> least one @mpxplt reloc for the symbol, otherwise it would use 16-byte PLT
> slot.  And you can certainly mark executables or PIEs or shared libraries
> this way, the linker could do that if it creates any 32-byte PLT slot.

We don't have to add @mpxplt since we have the "bnd" prefix.  We also
need to handle "bnd call foo" in executables.  We can add new BND
versions of the R_X86_64_PC32 and R_X86_64_PLT32 relocations, instead
of using the GNU attribute section.  Which approach is preferred?

-- 
H.J.
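
A rough sketch of the prefix-based selection suggested here, as an
assembler might apply it to a direct branch. The enum constants are
placeholders invented for illustration; the thread does not name or
number the BND relocation variants.

```c
/* Placeholder relocation kinds; not real ELF relocation numbers.  */
enum sketch_reloc {
    SKETCH_PC32,
    SKETCH_PLT32,
    SKETCH_PC32_BND,
    SKETCH_PLT32_BND
};

/* Pick the relocation for a direct call or jump.  has_bnd_prefix is
   nonzero when the instruction carries the BND prefix; via_plt is
   nonzero when the operand was written with @plt.  */
enum sketch_reloc select_branch_reloc(int has_bnd_prefix, int via_plt)
{
    if (via_plt)
        return has_bnd_prefix ? SKETCH_PLT32_BND : SKETCH_PLT32;
    return has_bnd_prefix ? SKETCH_PC32_BND : SKETCH_PC32;
}
```

This only covers instructions that can carry the prefix; references
without a branch instruction need some other way to request the BND
variant.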


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-08-19 18:52   ` H.J. Lu
@ 2013-10-01 12:15     ` Ilya Enkovich
  2013-10-01 12:27       ` Jakub Jelinek
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2013-10-01 12:15 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Jakub Jelinek, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

Hi all,

I'd like to restart the discussion on this topic. I see two viable
options in this thread for the MPX PLT entry.

The first one is to use a new relocation for calls requiring an
extended PLT entry. The linker can then decide which PLT entries
should be extended and use 16-byte entries when possible. The only
question here is how the dynamic linker can detect an MPX binary and
try to search for MPX shared libraries. Does it have access to the PLT
section to check it? Isn't it still better to just use a note section?

The second one is a note section. It does not have as good granularity
as a new relocation, but in most cases all calls in an MPX object file
would require an extended PLT entry. Therefore the linker creates an
extended PLT entry if it is used by a function from an object file
marked with the MPX note section. The only drawback here is that an old
linker will just silently ignore this note section, so the user has to
check the linker version.

Because of this drawback of the second approach, I would vote for the
new relocation, but still with a note section for the dynamic linker.

Thanks,
Ilya

2013/8/19 H.J. Lu <hjl.tools@gmail.com>:
> On Wed, Aug 14, 2013 at 8:49 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Jul 23, 2013 at 12:49:06PM -0700, H.J. Lu wrote:
>>> There are 2 psABI considerations:
>>>
>>>  1. Should PLT entries in all binaries, with and without MPX, be changed
>>>     to 32-byte or just the necessary ones?
>>
>> Ugh, please don't.
>>
>>>  2. Only branch to PLT entry with BND prefix needs 32-byte PLT entry. If
>>>     we use 32-byte PLT entry only when needed, it can be decided by:
>>>     a. A new MPX PLT relocation:
>>>        i. No new run-time relocation since MPX PLT relocation is
>>>       resolved to branch to PLT entry at link-time.
>>>        ii. Pro: No new section.
>>>        iii. Con:
>>>         Need a new relocation.
>>>         Can't mark executable nor shared library.
>>
>> I think I prefer a new relocation, @mpxplt or similar.  The linker could then
>> use the 32-byte PLT slot for both @plt and @mpxplt relocs if there is at
>> least one @mpxplt reloc for the symbol, otherwise it would use 16-byte PLT
>> slot.  And you can certainly mark executables or PIEs or shared libraries
>> this way, the linker could do that if it creates any 32-byte PLT slot.
>
> We don't have to add @mpxplt since we have the "bnd" prefix.  We also
> need to handle "bnd call foo" in executables.  We can add new BND
> versions of the R_X86_64_PC32 and R_X86_64_PLT32 relocations, instead
> of using the GNU attribute section.  Which approach is preferred?
>
> --
> H.J.


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-10-01 12:15     ` Ilya Enkovich
@ 2013-10-01 12:27       ` Jakub Jelinek
  2013-10-02 10:02         ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Jelinek @ 2013-10-01 12:27 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: H.J. Lu, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

On Tue, Oct 01, 2013 at 04:15:53PM +0400, Ilya Enkovich wrote:
> I'd like to restart the discussion on this topic. I see two viable
> options in this thread for the MPX PLT entry.
>
> The first one is to use a new relocation for calls requiring an
> extended PLT entry. The linker can then decide which PLT entries
> should be extended and use 16-byte entries when possible. The only
> question here is how the dynamic linker can detect an MPX binary and
> try to search for MPX shared libraries. Does it have access to the PLT
> section to check it? Isn't it still better to just use a note section?
>
> The second one is a note section. It does not have as good granularity
> as a new relocation, but in most cases all calls in an MPX object file
> would require an extended PLT entry. Therefore the linker creates an
> extended PLT entry if it is used by a function from an object file
> marked with the MPX note section. The only drawback here is that an old
> linker will just silently ignore this note section, so the user has to
> check the linker version.
>
> Because of this drawback of the second approach, I would vote for the
> new relocation, but still with a note section for the dynamic linker.

Whether the PLT is extended or not could be determined by the kind of
dynamic relocations applied to it.  Either the relocations for non-PLT
and PLT MPX calls are only for ld(1) purposes and not dynamic, or there
could also be some dynamic relocation, a counterpart of R_*_JMP_SLOT.
In the former case, if the dynamic linker needed to find out for some
reason whether the PLT is extended, the linker could add some .dynamic
tag; that is the usual way to handle this kind of stuff.

	Jakub


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-10-01 12:27       ` Jakub Jelinek
@ 2013-10-02 10:02         ` Ilya Enkovich
  2013-10-07  9:31           ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2013-10-02 10:02 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: H.J. Lu, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

2013/10/1 Jakub Jelinek <jakub@redhat.com>:
> On Tue, Oct 01, 2013 at 04:15:53PM +0400, Ilya Enkovich wrote:
>> I'd like to restart the discussion on this topic. I see two viable
>> options in this thread for the MPX PLT entry.
>>
>> The first one is to use a new relocation for calls requiring an
>> extended PLT entry. The linker can then decide which PLT entries
>> should be extended and use 16-byte entries when possible. The only
>> question here is how the dynamic linker can detect an MPX binary and
>> try to search for MPX shared libraries. Does it have access to the PLT
>> section to check it? Isn't it still better to just use a note section?
>>
>> The second one is a note section. It does not have as good granularity
>> as a new relocation, but in most cases all calls in an MPX object file
>> would require an extended PLT entry. Therefore the linker creates an
>> extended PLT entry if it is used by a function from an object file
>> marked with the MPX note section. The only drawback here is that an old
>> linker will just silently ignore this note section, so the user has to
>> check the linker version.
>>
>> Because of this drawback of the second approach, I would vote for the
>> new relocation, but still with a note section for the dynamic linker.
>
> Whether the PLT is extended or not could be determined by the kind of
> dynamic relocations applied to it.  Either the relocations for non-PLT
> and PLT MPX calls are only for ld(1) purposes and not dynamic, or there
> could also be some dynamic relocation, a counterpart of R_*_JMP_SLOT.
> In the former case, if the dynamic linker needed to find out for some
> reason whether the PLT is extended, the linker could add some .dynamic
> tag; that is the usual way to handle this kind of stuff.

I do not see a reason for a new dynamic relocation now. Adding MPX PLT
call relocations for ld is enough. As H.J. suggested, it wouldn't
require any changes in the compiler: 'as' can just check for the 'bnd'
prefix on the instruction and generate the proper relocation for ld.

Having an entry in the .dynamic section with a special MPX tag is a
good idea. No need for a new section then.

Does anyone see flaws in this scheme?

Thanks,
Ilya
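
A minimal sketch of how a dynamic linker might look for such a tag in the
.dynamic array. struct sketch_dyn mirrors the two-word layout of an ELF64
dynamic entry, and SKETCH_DT_MPX_PLT is a made-up tag value used only for
illustration; no such tag is defined in this thread.

```c
#include <stdint.h>

/* Two-word dynamic entry, laid out like an ELF64 dynamic entry.  */
struct sketch_dyn {
    int64_t d_tag;
    uint64_t d_val;
};

#define SKETCH_DT_NULL 0                /* terminator, same value as DT_NULL */
#define SKETCH_DT_MPX_PLT 0x70000001    /* hypothetical tag value */

/* Walk the array up to the terminating null entry and report whether
   the given tag is present.  */
int dynamic_has_tag(const struct sketch_dyn *dyn, int64_t tag)
{
    for (; dyn->d_tag != SKETCH_DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return 1;
    return 0;
}
```

Since the loader already walks .dynamic for every object it maps, such a
check would need no extra section and no parsing of PLT contents.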

>
>         Jakub


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-10-02 10:02         ` Ilya Enkovich
@ 2013-10-07  9:31           ` Ilya Enkovich
  2013-10-07  9:48             ` Jakub Jelinek
  0 siblings, 1 reply; 27+ messages in thread
From: Ilya Enkovich @ 2013-10-07  9:31 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: H.J. Lu, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

2013/10/2 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2013/10/1 Jakub Jelinek <jakub@redhat.com>:
>> On Tue, Oct 01, 2013 at 04:15:53PM +0400, Ilya Enkovich wrote:
>>> I'd like to restart the discussion on this topic. I see two viable
>>> options in this thread for the MPX PLT entry.
>>>
>>> The first one is to use a new relocation for calls requiring an
>>> extended PLT entry. The linker can then decide which PLT entries
>>> should be extended and use 16-byte entries when possible. The only
>>> question here is how the dynamic linker can detect an MPX binary and
>>> try to search for MPX shared libraries. Does it have access to the PLT
>>> section to check it? Isn't it still better to just use a note section?
>>>
>>> The second one is a note section. It does not have as good granularity
>>> as a new relocation, but in most cases all calls in an MPX object file
>>> would require an extended PLT entry. Therefore the linker creates an
>>> extended PLT entry if it is used by a function from an object file
>>> marked with the MPX note section. The only drawback here is that an old
>>> linker will just silently ignore this note section, so the user has to
>>> check the linker version.
>>>
>>> Because of this drawback of the second approach, I would vote for the
>>> new relocation, but still with a note section for the dynamic linker.
>>
>> Whether the PLT is extended or not could be determined by the kind of
>> dynamic relocations applied to it.  Either the relocations for non-PLT
>> and PLT MPX calls are only for ld(1) purposes and not dynamic, or there
>> could also be some dynamic relocation, a counterpart of R_*_JMP_SLOT.
>> In the former case, if the dynamic linker needed to find out for some
>> reason whether the PLT is extended, the linker could add some .dynamic
>> tag; that is the usual way to handle this kind of stuff.
>
> I do not see a reason for a new dynamic relocation now. Adding MPX PLT
> call relocations for ld is enough. As H.J. suggested, it wouldn't
> require any changes in the compiler: 'as' can just check for the 'bnd'
> prefix on the instruction and generate the proper relocation for ld.

It seems the assembler may not always be able to detect when an MPX
relocation is needed. For simple calls it can check for the 'bnd'
prefix, but for an indirect call we need to generate an MPX relocation
for the 'mov' instruction that stores the address of the called
function. This instruction does not have any prefix, and therefore the
compiler has to specify the relocation itself.


Ilya

>
> Having an entry in the .dynamic section with a special MPX tag is a
> good idea. No need for a new section then.
>
> Does anyone see flaws in this scheme?
>
> Thanks,
> Ilya
>
>>
>>         Jakub


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-10-07  9:31           ` Ilya Enkovich
@ 2013-10-07  9:48             ` Jakub Jelinek
  2013-10-07 10:00               ` Ilya Enkovich
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Jelinek @ 2013-10-07  9:48 UTC (permalink / raw)
  To: Ilya Enkovich
  Cc: H.J. Lu, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

On Mon, Oct 07, 2013 at 01:31:29PM +0400, Ilya Enkovich wrote:
> It seems the assembler may not always be able to detect when an MPX
> relocation is needed. For simple calls it can check for the 'bnd'
> prefix, but for an indirect call we need to generate an MPX relocation
> for the 'mov' instruction that stores the address of the called
> function. This instruction does not have any prefix, and therefore the
> compiler has to specify the relocation itself.

Ugh, not only mov I guess.
You can easily have:

int *fn1 (int *, int *);
int *fn2 (int *, int *);
typedef int *(*fnt) (int *, int *);
fnt fns[2] = { fn1, fn2 };

So perhaps we need some directive that will say that all the relocations
that could be used to refer to PLT slots need to be turned into the
corresponding MPX relocations?  Or an assembler switch.

	Jakub
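
To make the point concrete, here is a self-contained variant of the
declarations above. The function bodies and call_indexed are hypothetical
additions for illustration, and the typedef's return type is written as
int * so that it matches fn1 and fn2. The initializer of fns refers to
the functions from data, so there is no branch instruction whose prefix
an assembler could inspect; the call itself is indirect.

```c
/* Hypothetical bodies: fn1 returns its first argument, fn2 its second.  */
int *fn1(int *a, int *b) { (void)b; return a; }
int *fn2(int *a, int *b) { (void)a; return b; }

typedef int *(*fnt)(int *, int *);

/* The function addresses are stored in data; no call or jmp
   instruction is involved at this point.  */
fnt fns[2] = { fn1, fn2 };

/* The call goes through a pointer loaded from the table, so the callee
   is not visible as a symbol at the call site.  */
int *call_indexed(unsigned i, int *a, int *b)
{
    return fns[i & 1u](a, b);
}
```

Any scheme that relies on the 'bnd' prefix alone would miss references
like these, which is why a directive, an assembler switch, or an explicit
relocation syntax comes up.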


* Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX
  2013-10-07  9:48             ` Jakub Jelinek
@ 2013-10-07 10:00               ` Ilya Enkovich
  0 siblings, 0 replies; 27+ messages in thread
From: Ilya Enkovich @ 2013-10-07 10:00 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: H.J. Lu, GNU C Library, GCC Development, Binutils, Girkar,
	Milind, Kreitzer, David L

2013/10/7 Jakub Jelinek <jakub@redhat.com>:
> On Mon, Oct 07, 2013 at 01:31:29PM +0400, Ilya Enkovich wrote:
>> It seems the assembler may not always be able to detect when an MPX
>> relocation is needed. For simple calls it can check for the 'bnd'
>> prefix, but for an indirect call we need to generate an MPX relocation
>> for the 'mov' instruction that stores the address of the called
>> function. This instruction does not have any prefix, and therefore the
>> compiler has to specify the relocation itself.
>
> Ugh, not only mov I guess.
> You can easily have:
>
> int *fn1 (int *, int *);
> int *fn2 (int *, int *);
> typedef int *(*fnt) (int *, int *);
> fnt fns[2] = { fn1, fn2 };
>
> So perhaps we need some directive that will say that all the relocations
> that could be used to refer to PLT slots need to be turned into the
> corresponding MPX relocations?  Or an assembler switch.

The compiler can always make the decision, since it knows whether MPX
code is being generated for the current module. I think in this and all
other similar cases the compiler should use something like @mpx or
@mpxplt, and the assembler would then generate the proper relocation.
For your example it would look like:
  fns:
        .quad   fn1@mpx
        .quad   fn2@mpx
And if we introduce the new relocation syntax in the compiler, we can
also use it for MPX calls/jumps. Support in the assembler would then be
quite simple, with no need to decide when to use an MPX relocation.

Ilya
>
>         Jakub


end of thread, other threads:[~2013-10-07 10:00 UTC | newest]

Thread overview: 27+ messages
2013-07-23 19:49 [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX H.J. Lu
2013-07-24  8:44 ` Florian Weimer
2013-07-24 15:22   ` H.J. Lu
2013-07-24 16:45 ` Ian Lance Taylor
2013-07-24 18:53   ` H.J. Lu
2013-07-24 18:59     ` Ian Lance Taylor
2013-07-24 19:14       ` H.J. Lu
2013-07-24 23:36 ` Roland McGrath
2013-07-25  0:23   ` Ian Lance Taylor
2013-07-25 11:09     ` Ilya Enkovich
2013-07-25 16:33       ` H.J. Lu
2013-07-25 17:24     ` H.J. Lu
2013-08-08  0:33       ` H.J. Lu
2013-08-08  7:19         ` Jan Beulich
2013-08-08 16:01           ` H.J. Lu
2013-08-09  7:08             ` Jan Beulich
2013-08-09 17:03               ` H.J. Lu
2013-08-12  9:25                 ` Jan Beulich
2013-07-25 17:11   ` H.J. Lu
2013-08-14 16:23 ` Jakub Jelinek
2013-08-19 18:52   ` H.J. Lu
2013-10-01 12:15     ` Ilya Enkovich
2013-10-01 12:27       ` Jakub Jelinek
2013-10-02 10:02         ` Ilya Enkovich
2013-10-07  9:31           ` Ilya Enkovich
2013-10-07  9:48             ` Jakub Jelinek
2013-10-07 10:00               ` Ilya Enkovich
