public inbox for gnu-gabi@sourceware.org
* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00         ` Florian Weimer
@ 2018-01-01  0:00           ` Carlos O'Donell
  2018-01-01  0:00             ` Florian Weimer
                               ` (2 more replies)
  2018-01-01  0:00           ` H.J. Lu
  1 sibling, 3 replies; 33+ messages in thread
From: Carlos O'Donell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: H.J. Lu, Generic System V Application Binary Interface, gnu-gabi

On 03/22/2018 03:59 AM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> On 03/21/2018 03:04 PM, Florian Weimer wrote:
>>> * H. J. Lu:
>>>
>>>>> Could we ship a template for the PLT entries in ld.so instead?  And if
>>>>> needed, map it from the file together with an address array, like this?
>>>>
>>>> This won't work since linker needs to know exactly PLT layout to generate
>>>> JUMP_SLOT relocations for LD_AUDIT.
>>>
>>> Why would we need JUMP_SLOT relocations?  Couldn't we install suitable
>>> interceptors for GLOB_DAT relocations instead, as long as they resolve
>>> to external function symbols?
>>
>> I think your suggestion might work, but why alter the existing
>> behaviour which users expect and is documented in countless linker
>> text books?
> 
> If you have references, please add them to the glibc implementation or
> the wiki.  It would certainly help those who are trying to work on the
> code.

Well, Levine's "Linkers and Loaders",
https://www.iecc.com/linker/linker10.html, is the immediate reference
that I have on my shelf, and one that developers working on glibc/binutils
should read.

Likewise "Computer Systems: A Programmer's Perspective" by Bryant and
O'Hallaron, Chapter 7, "Linking", which explicitly covers how the present
GOT and PLT work.

I have added these under developer resources on the wiki front page.

> My understanding is that the whole thing is quite underdocumented.
> For LD_AUDIT in particular, we only have the Solaris documentation,
> and that's for an independent implementation.

Yes, LD_AUDIT is underdocumented; here we have only the pages from the
Linux man-pages project.

>> Existing tooling to process such relocations and entries could
>> remain unchanged and we would continue to support LD_AUDIT.
> 
> My understanding is that H.J.'s proposal requires changes when running
> in non-audit mode.  It certainly requires relinking all binaries,
> perhaps even with special flags.

It would require a relink only to fix existing binaries which are broken
by the use of -fno-plt, which is not an option that has seen general use
anywhere that I am aware of.

> Using ld.so-generated thunks for all GLOB_DAT function symbol
> relocations would happen in audit mode only and should work with
> existing binaries which were built with -Wl,-z,now.

This is a very good reason to prefer one method over another, that we
could fix existing binaries. However, I still think the complexity of
such a fix outweighs what we are trying to fix. Do we have another use
for such stubs?

Today we have to admit that -fno-plt is not compatible with auditing.

I would like to change that to ensure that in future releases we are
able to let users use -fno-plt *and* auditing.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00   ` H.J. Lu
@ 2018-01-01  0:00     ` Florian Weimer
  2018-01-01  0:00       ` Carlos O'Donell
  2018-01-01  0:00     ` Florian Weimer
  1 sibling, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Generic System V Application Binary Interface, gnu-gabi

* H. J. Lu:

>> Could we ship a template for the PLT entries in ld.so instead?  And if
>> needed, map it from the file together with an address array, like this?
>
> This won't work since linker needs to know exactly PLT layout to generate
> JUMP_SLOT relocations for LD_AUDIT.

Why would we need JUMP_SLOT relocations?  Couldn't we install suitable
interceptors for GLOB_DAT relocations instead, as long as they resolve
to external function symbols?


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00         ` Florian Weimer
  2018-01-01  0:00           ` Carlos O'Donell
@ 2018-01-01  0:00           ` H.J. Lu
  1 sibling, 0 replies; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Carlos O'Donell,
	Generic System V Application Binary Interface, gnu-gabi

On Thu, Mar 22, 2018 at 1:59 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Carlos O'Donell:
>
>> On 03/21/2018 03:04 PM, Florian Weimer wrote:
>>> * H. J. Lu:
>>>
>>>>> Could we ship a template for the PLT entries in ld.so instead?  And if
>>>>> needed, map it from the file together with an address array, like this?
>>>>
>>>> This won't work since linker needs to know exactly PLT layout to generate
>>>> JUMP_SLOT relocations for LD_AUDIT.
>>>
>>> Why would we need JUMP_SLOT relocations?  Couldn't we install suitable
>>> interceptors for GLOB_DAT relocations instead, as long as they resolve
>>> to external function symbols?
>>
>> I think your suggestion might work, but why alter the existing
>> behaviour which users expect and is documented in countless linker
>> text books?
>
> If you have references, please add them to the glibc implementation or
> the wiki.  It would certainly help those who are trying to work on the
> code.
>
> My understanding is that the whole thing is quite underdocumented.
> For LD_AUDIT in particular, we only have the Solaris documentation,
> and that's for an independent implementation.
>
>> Existing tooling to process such relocations and entries could
>> remain unchanged and we would continue to support LD_AUDIT.
>
> My understanding is that H.J.'s proposal requires changes when running
> in non-audit mode.  It certainly requires relinking all binaries,
> perhaps even with special flags.

Please see

https://github.com/hjl-tools/glibc/tree/hjl/plt/audit

and

https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/plt/audit

for my glibc and binutils implementations.  My changes are relatively
small and have minimal overhead when LD_AUDIT isn't used.

> Using ld.so-generated thunks for all GLOB_DAT function symbol
> relocations would happen in audit mode only and should work with
> existing binaries which were built with -Wl,-z,now.

It means putting parts of ld into ld.so.  This is a much bigger change.

-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00     ` Florian Weimer
@ 2018-01-01  0:00       ` Carlos O'Donell
  2018-01-01  0:00         ` Florian Weimer
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer, H.J. Lu
  Cc: Generic System V Application Binary Interface, gnu-gabi

On 03/21/2018 03:04 PM, Florian Weimer wrote:
> * H. J. Lu:
> 
>>> Could we ship a template for the PLT entries in ld.so instead?  And if
>>> needed, map it from the file together with an address array, like this?
>>
>> This won't work since linker needs to know exactly PLT layout to generate
>> JUMP_SLOT relocations for LD_AUDIT.
> 
> Why would we need JUMP_SLOT relocations?  Couldn't we install suitable
> interceptors for GLOB_DAT relocations instead, as long as they resolve
> to external function symbols?

I think your suggestion might work, but why alter the existing behaviour which
users expect and is documented in countless linker text books?

Existing tooling to process such relocations and entries could remain unchanged
and we would continue to support LD_AUDIT.

What other benefits would we gain from your suggestion?

Cheers,
Carlos.
 


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00               ` H.J. Lu
@ 2018-01-01  0:00                 ` Florian Weimer
  2018-01-01  0:00                   ` H.J. Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Carlos O'Donell,
	Generic System V Application Binary Interface, gnu-gabi

* H. J. Lu:

> On Thu, Mar 22, 2018 at 9:47 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * Carlos O'Donell:
>>
>>> Well, Levine's "Linkers and Loaders"
>>> https://www.iecc.com/linker/linker10.html, is the immediate reference
>>> that I have on my shelf, and that developers working on glibc/binutils
>>> should read.
>>
>> Thanks, I didn't know that.
>>
>>>> My understanding is that H.J.'s proposal requires changes when running
>>>> in non-audit mode.  It certainly requires relinking all binaries,
>>>> perhaps even with special flags.
>>>
>>> It would require a relink only to fix existing binaries which are broken
>>> by the use of -fno-plt, which is not an option that has seen general use
>>> anywhere that I am aware of.
>>
>> I don't think that's actually true.  BFD ld has not emitted
>> R_X86_64_JUMP_SLOT relocations with -z now for quite some time now.
>> This optimization predates -fno-plt.
>>
>
> Not true with binutils 2.30:
>
> [hjl@gnu-bdx-1 include]$ readelf -d /bin/ld | grep NOW
>  0x0000000000000018 (BIND_NOW)
>  0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
> [hjl@gnu-bdx-1 include]$ readelf -rW /bin/ld | grep JUMP_SLOT
> 00000000001b0868  0000000100000007 R_X86_64_JUMP_SLOT     0000000000000000 getenv@GLIBC_2.2.5 + 0
> ...

But binutils 2.28 or some earlier version exhibited different
behavior, right?
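
The two readelf checks quoted above can also be done programmatically. The following is a minimal sketch, assuming the dynamic section and the JMPREL relocation table have already been located and are passed in as arrays; the helper names has_bind_now and count_jump_slots are hypothetical and are not part of binutils or glibc. Only constants from <elf.h> are used.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the dynamic section requests immediate
   binding through DT_BIND_NOW, DF_BIND_NOW or DF_1_NOW.  The array must
   be terminated by a DT_NULL entry.  */
static bool has_bind_now(const Elf64_Dyn *dyn)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        if (dyn->d_tag == DT_BIND_NOW)
            return true;
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_BIND_NOW))
            return true;
        if (dyn->d_tag == DT_FLAGS_1 && (dyn->d_un.d_val & DF_1_NOW))
            return true;
    }
    return false;
}

/* Hypothetical helper: count R_X86_64_JUMP_SLOT entries among n RELA
   relocations (x86-64 only).  */
static size_t count_jump_slots(const Elf64_Rela *rela, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (ELF64_R_TYPE(rela[i].r_info) == R_X86_64_JUMP_SLOT)
            ++count;
    return count;
}
```

The r_info value 0000000100000007 in the readelf output decodes to symbol index 1 and relocation type 7, which is R_X86_64_JUMP_SLOT.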


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00     ` Cary Coutant
@ 2018-01-01  0:00       ` Alan Modra
  2018-01-01  0:00         ` H.J. Lu
  2018-01-01  0:00       ` H.J. Lu
  1 sibling, 1 reply; 33+ messages in thread
From: Alan Modra @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi; +Cc: Carlos O'Donell, gnu-gabi

On Wed, Mar 21, 2018 at 10:15:26PM -0700, Cary Coutant wrote:
> If you get rid of the GOT entry, and have the point of call jump
> indirectly through the PLTGOT entry, which is initialized to point to
> part (b) of the PLT entry, everything should work the same as without
> -fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
> PLT entry.
> 
> -cary
> 
> * I'm using parts (a) and (b) to refer to the two parts of a PLT
> entry: (a) an indirect jump via the PLTGOT entry, and (b) code that
> jumps to the lazy binding routine, passing the JUMP_SLOT index.

Yes, that essentially is what I've done for -fno-plt on powerpc.
The call stub code is inlined while the rest of the PLT is more or
less unchanged.  So you get all of the usual lazy-binding features
by default, and can use "-z now -z relro" if you want a read-only
PLT.
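
The split between parts (a) and (b) can be illustrated with a small simulation in plain C. This is only a sketch under simplifying assumptions: a function pointer stands in for the PLTGOT entry, an ordinary function stands in for the lazy-binding stub, and none of the names below come from glibc or binutils.

```c
#include <assert.h>

typedef int (*fn_t)(int);

/* Stands in for the external function that is eventually bound.  */
static int real_target(int x) { return x + 1; }

static int lazy_stub(int x);

/* Simulated PLTGOT entry, initialized to point to part (b).  */
static fn_t pltgot_slot = lazy_stub;

/* Number of times the simulated lazy binding routine has run.  */
static unsigned resolver_calls;

/* Part (b): enter the "lazy binding routine", patch the slot, and
   continue to the resolved function.  */
static int lazy_stub(int x)
{
    assert(pltgot_slot == lazy_stub);  /* only reached while unbound */
    ++resolver_calls;
    pltgot_slot = real_target;
    return pltgot_slot(x);
}

/* Part (a): the indirect jump via the PLTGOT entry, written here at the
   call site as it would be if -fno-plt inlined it.  */
static int call_site(int x)
{
    return pltgot_slot(x);
}
```

The first call through call_site goes via lazy_stub and binds the slot; later calls reach real_target directly, which is the usual lazy-binding behaviour described above.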

-- 
Alan Modra
Australia Development Lab, IBM


* RFC: Audit external function called indirectly via GOT
@ 2018-01-01  0:00 H.J. Lu
  2018-01-01  0:00 ` Cary Coutant
  2018-01-01  0:00 ` Florian Weimer
  0 siblings, 2 replies; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi; +Cc: gnu-gabi

Auditing of external function calls and their return values relies on
lazy binding with PLT.  When external functions are called indirectly
via GOT without using PLT, auditing stops working.

Here is a proposal to support auditing of external functions called
indirectly via GOT:

1. Add optional dynamic tags:

 #define DT_GNU_PLT     0x6ffffef4  /* Address of PLT section  */
 #define DT_GNU_PLTSZ   0x6ffffdf1  /* Size of PLT section  */
 #define DT_GNU_PLTENT  0x6ffffdf2  /* Size of one PLT entry  */
 #define DT_GNU_PLT0SZ  0x6ffffdf3  /* Size of the first PLT entry  */
 #define DT_GNU_PLTGOTSZ 0x6ffffdf4 /* Size of PLTGOT section  */

and update DT_FLAGS_1 with:

 #define DF_1_JMPRELIGN 0x10000000  /* DT_JMPREL can be ignored  */

2. Linker creates PLT entries for auditing external function calls via
GOT and sets DT_GNU_PLT, DT_GNU_PLTSZ, DT_GNU_PLTENT, DT_GNU_PLT0SZ and
DT_GNU_PLTGOTSZ.  If PLT isn't required for lazy binding, set the
DF_1_JMPRELIGN bit in DT_FLAGS_1.

3. When auditing is enabled at run-time, dynamic linker resolves GLOB_DAT
relocation to its corresponding PLT entry by finding the JUMP_SLOT relocation
against the same function and using its PLT slot as the function address.
On x86, the first PLT entry and the first 3 GOT slots are reserved.  GOT slot
is (JUMP_SLOT relocation offset - DT_PLTGOT) / size of GOT entry.  PLT
offset is (GOT slot - 3) * DT_GNU_PLTENT + DT_GNU_PLT0SZ.  PLT address
is DT_GNU_PLT + PLT offset.  DT_GNU_PLT, DT_GNU_PLTSZ, DT_PLTGOT and
DT_GNU_PLTGOTSZ can be used to check if GOT and PLT offsets are within
range.

4. If DF_1_JMPRELIGN is set, dynamic linker can ignore DT_JMPREL when
lazy binding is disabled.
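
The arithmetic in step 3 can be sketched in C as follows. This is a minimal illustration, not the code from the glibc branch: struct plt_layout and plt_address_for_jump_slot are hypothetical names, the fields simply hold the values of the dynamic tags named in the proposal, and all addresses are treated as link-time values (a real dynamic linker would also add the load address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the dynamic tag values used in step 3.  */
struct plt_layout {
    uint64_t pltgot;     /* DT_PLTGOT: address of PLTGOT */
    uint64_t pltgotsz;   /* DT_GNU_PLTGOTSZ: size of PLTGOT in bytes */
    uint64_t plt;        /* DT_GNU_PLT: address of PLT */
    uint64_t pltsz;      /* DT_GNU_PLTSZ: size of PLT in bytes */
    uint64_t pltent;     /* DT_GNU_PLTENT: size of one PLT entry */
    uint64_t plt0sz;     /* DT_GNU_PLT0SZ: size of the first PLT entry */
    uint64_t got_entsz;  /* size of one GOT entry, 8 on x86-64 */
};

/* On x86 the first 3 GOT slots are reserved.  */
#define RESERVED_GOT_SLOTS 3

/* Map the r_offset of a JUMP_SLOT relocation to the address of its PLT
   entry.  Returns false if the GOT or PLT offset is out of range.  */
static bool plt_address_for_jump_slot(const struct plt_layout *l,
                                      uint64_t r_offset, uint64_t *plt_addr)
{
    assert(l->got_entsz != 0);

    /* The relocation must point into the PLTGOT.  */
    if (r_offset < l->pltgot || r_offset - l->pltgot >= l->pltgotsz)
        return false;

    uint64_t got_slot = (r_offset - l->pltgot) / l->got_entsz;
    if (got_slot < RESERVED_GOT_SLOTS)
        return false;

    uint64_t plt_offset =
        (got_slot - RESERVED_GOT_SLOTS) * l->pltent + l->plt0sz;

    /* The whole PLT entry must fit inside the PLT section.  */
    if (plt_offset > l->pltsz || l->pltsz - plt_offset < l->pltent)
        return false;

    *plt_addr = l->plt + plt_offset;
    return true;
}
```

For example, with a 16-byte PLT0, 16-byte PLT entries and 8-byte GOT entries, the JUMP_SLOT relocation for GOT slot 3 maps to PLT offset 16, the first entry after PLT0.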

Any comments?

H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00           ` Carlos O'Donell
  2018-01-01  0:00             ` Florian Weimer
  2018-01-01  0:00             ` H.J. Lu
@ 2018-01-01  0:00             ` Cary Coutant
  2018-01-01  0:00               ` H.J. Lu
  2 siblings, 1 reply; 33+ messages in thread
From: Cary Coutant @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi; +Cc: Florian Weimer, H.J. Lu, gnu-gabi

> Today we have to admit that -fno-plt is not compatible with auditing.
>
> I would like to change that to ensure that in future releases we are
> able to let users use -fno-plt *and* auditing.

The security features are all about locking down the GOT and the
PLTGOT at program startup. The auditing features take advantage of the
lazy binding mechanism and want to fiddle with those tables
dynamically. I don't see how you're going to make the two compatible.

-cary


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00     ` Florian Weimer
@ 2018-01-01  0:00       ` H.J. Lu
  2018-01-01  0:00         ` Florian Weimer
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Generic System V Application Binary Interface, gnu-gabi

On Wed, Mar 28, 2018 at 11:37 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 03/20/2018 05:52 PM, H.J. Lu wrote:
>>
>> On Mon, Mar 19, 2018 at 1:21 AM, Florian Weimer <fweimer@redhat.com>
>> wrote:
>>>
>>> On 03/17/2018 02:31 PM, H.J. Lu wrote:
>>>>
>>>>
>>>> Auditing of external function calls and their return values relies on
>>>> lazy binding with PLT.  When external functions are called indirectly
>>>> via GOT without using PLT, auditing stops working.
>>>>
>>>> Here is a proposal to support auditing of external function called
>>>> indirectly via GOT:
>>>>
>>>> 1. Add optional dynamic tags:
>>>>
>>>>    #define DT_GNU_PLT     0x6ffffef4  /* Address of PLT section  */
>>>>    #define DT_GNU_PLTSZ   0x6ffffdf1  /* Size of PLT section  */
>>>>    #define DT_GNU_PLTENT  0x6ffffdf2  /* Size of one PLT entry  */
>>>>    #define DT_GNU_PLT0SZ  0x6ffffdf3  /* Size of the first PLT entry  */
>>>>    #define DT_GNU_PLTGOTSZ 0x6ffffdf4 /* Size of PLTGOT section  */
>>>>
>>>> and update DT_FLAGS_1 with:
>>>>
>>>>    #define DF_1_JMPRELIGN 0x10000000  /* DT_JMPREL can be ignored  */
>>>> 2. Linker creates PLT entries for auditing external function calls via
>>>> GOT and sets DT_GNU_PLT, DT_GNU_PLTSZ, DT_GNU_PLTENT, DT_GNU_PLT0SZ and
>>>> DT_GNU_PLTGOTSZ.  If PLT isn't required for lazy binding, set the
>>>> DF_1_JMPRELIGN bit in DT_FLAGS_1.
>>>
>>>
>>>
>>> Could we ship a template for the PLT entries in ld.so instead?  And if
>>> needed, map it from the file together with an address array, like this?
>>
>>
>> This won't work since linker needs to know exactly PLT layout to generate
>> JUMP_SLOT relocations for LD_AUDIT.
>
>
> I don't see why it would need JUMP_SLOT relocations if it simply
> auto-generates PLT stub equivalents and installs them in GLOB_DAT
> relocations.

My understanding is that LD_AUDIT is based on JUMP_SLOT relocations.

> Anyway, going back to the larger question of what we need here.
>
> I used this as a test case for audit support with BIND_NOW:
>
> latrace /bin/true --help
>
> Most of Fedora is compiled with BIND_NOW.  Fedora 26 does not print latrace
> messages (the problem I mentioned earlier), Fedora 27 works (yay), Fedora 28
> crashes (meh).
>
> So depending on which side Fedora 28+ falls, I think your approach might be
> viable.  I expect that a future binutils version would do this by default,
> and beyond the additional dynamic section tags, new PLT stubs would only be
> created for no-plt functions because current binutils is supposed to
> generate PLT entries again (after they went missing for -z now binaries for
> some time).
>

-fno-plt is a compiler option, not a linker option.  Linker generates PLT for
PLT32 relocations to external functions.


-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00           ` Alan Modra
@ 2018-01-01  0:00             ` H.J. Lu
  0 siblings, 0 replies; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Generic System V Application Binary Interface
  Cc: Carlos O'Donell, gnu-gabi

On Thu, Mar 22, 2018 at 6:30 AM, Alan Modra <amodra@gmail.com> wrote:
> On Thu, Mar 22, 2018 at 05:39:18AM -0700, H.J. Lu wrote:
>> On Thu, Mar 22, 2018 at 5:29 AM, Alan Modra <amodra@gmail.com> wrote:
>> > On Wed, Mar 21, 2018 at 10:15:26PM -0700, Cary Coutant wrote:
>> >> If you get rid of the GOT entry, and have the point of call jump
>> >> indirectly through the PLTGOT entry, which is initialized to point to
>> >> part (b) of the PLT entry, everything should work the same as without
>> >> -fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
>> >> PLT entry.
>> >>
>> >> -cary
>> >>
>> >> * I'm using parts (a) and (b) to refer to the two parts of a PLT
>> >> entry: (a) an indirect jump via the PLTGOT entry, and (b) code that
>> >> jumps to the lazy binding routine, passing the JUMP_SLOT index.
>> >
>> > Yes, that essentially is what I've done for -fno-plt on powerpc.
>> > The call stub code is inlined while the rest of the PLT is more or
>> > less unchanged.  So you get all of the usual lazy-binding features
>> > by default, and can use "-z now -z relro" if you want a read-only
>> > PLT.
>> >
>>
>> On x86, PLT is always read-only.  The issue is the writable PLTGOT.
>
> Yes, I do know how the x86 PLT works.  (Or to be more honest, how it
> used to work..)  To be clear, I was using PLT to refer to the whole
> scheme, ie. the code to do an indirect jump (x86 .plt), plus a table
> of addresses (x86 .plt.got), plus code for lazy binding (x86 .plt
> again).  Like x86 the powerpc PLT code to do indirect jumps and lazy
> binding is read-only nowadays.  -fno-plt on powerpc inlines the code
> to do the indirect jump, but leaves the table of addresses and the
> lazy binding code functionally unchanged.
>
>> On x86, -fno-plt removes the writable PLTGOT.
>
> I think that may have been a mistake.  You could have kept .plt.got
> functionally unchanged, giving you a writable .plt.got by default with
> -fno-plt, and read-only when "-z now -z relro".  Just like the usual
> -fplt case.
>

This is done on purpose.  See the "Alternate Code Sequences For
Security" chapter in the x86-64 psABI, version 1.0.

-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00 RFC: Audit external function called indirectly via GOT H.J. Lu
  2018-01-01  0:00 ` Cary Coutant
@ 2018-01-01  0:00 ` Florian Weimer
  2018-01-01  0:00   ` H.J. Lu
  1 sibling, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi, H.J. Lu; +Cc: gnu-gabi

On 03/17/2018 02:31 PM, H.J. Lu wrote:
> Auditing of external function calls and their return values relies on
> lazy binding with PLT.  When external functions are called indirectly
> via GOT without using PLT, auditing stops working.
> 
> Here is a proposal to support auditing of external function called
> indirectly via GOT:
> 
> 1. Add optional dynamic tags:
> 
>   #define DT_GNU_PLT     0x6ffffef4  /* Address of PLT section  */
>   #define DT_GNU_PLTSZ   0x6ffffdf1  /* Size of PLT section  */
>   #define DT_GNU_PLTENT  0x6ffffdf2  /* Size of one PLT entry  */
>   #define DT_GNU_PLT0SZ  0x6ffffdf3  /* Size of the first PLT entry  */
>   #define DT_GNU_PLTGOTSZ 0x6ffffdf4 /* Size of PLTGOT section  */
> 
> and update DT_FLAGS_1 with:
> 
>   #define DF_1_JMPRELIGN 0x10000000  /* DT_JMPREL can be ignored  */
> 2. Linker creates PLT entries for auditing external function calls via
> GOT and sets DT_GNU_PLT, DT_GNU_PLTSZ, DT_GNU_PLTENT, DT_GNU_PLT0SZ and
> DT_GNU_PLTGOTSZ.  If PLT isn't required for lazy binding, set the
> DF_1_JMPRELIGN bit in DT_FLAGS_1.

Could we ship a template for the PLT entries in ld.so instead?  And if 
needed, map it from the file together with an address array, like this?

   Data page with pointer
   PLT template from ld.so (loading pointers from the previous page)

This process can be repeated to obtain as many PLT stubs as needed.
It's not a real JIT, so SELinux will still be happy.

The data page would probably contain two pointers per PLT entry, not 
just one, so that the reserved PLT entries aren't necessary.
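
The address arithmetic behind such page pairs might look like the sketch below. Everything in it is an assumption made for illustration: the page size, the stub size (taken to equal the size of one data slot so that stub i and slot i sit exactly one page apart), the meaning of the two pointers, and all names. It only shows how a stub could find its own data at a fixed distance without any per-stub code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; a real implementation would query it.  */
#define THUNK_PAGE_SIZE 4096u

/* Hypothetical data slot: two pointers per stub.  Which two pointers
   are stored is a guess made for this sketch.  */
struct thunk_data {
    void *target;   /* where the stub finally jumps */
    void *context;  /* identifies the entry to the audit machinery */
};

/* Number of stubs that one data-page/code-page pair can hold.  */
static size_t thunks_per_pair(void)
{
    return THUNK_PAGE_SIZE / sizeof(struct thunk_data);
}

/* Address of stub i, given the base address of the pair.  The data page
   comes first and the mapped template page follows it.  */
static uintptr_t thunk_code_addr(uintptr_t pair_base, size_t i)
{
    assert(i < thunks_per_pair());
    return pair_base + THUNK_PAGE_SIZE + i * sizeof(struct thunk_data);
}

/* Each stub loads its pointers from exactly one page before itself.  */
static uintptr_t thunk_data_addr(uintptr_t code_addr)
{
    return code_addr - THUNK_PAGE_SIZE;
}
```

Because the distance from stub to slot is constant, the same template page can be mapped repeatedly from the file, each time paired with a fresh data page.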

> 3. When auditing is enabled at run-time, dynamic linker resolves GLOB_DAT
> relocation to its corresponding PLT entry by finding JUMP_SLOT relocation
> against the same function and use its PLT slot as the function address.

This step would stay the same.

I wonder if this would make it possible to restore audit support for 
existing binaries which lack PLT entries today.

Thanks,
Florian


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00       ` Carlos O'Donell
@ 2018-01-01  0:00         ` Florian Weimer
  2018-01-01  0:00           ` Carlos O'Donell
  2018-01-01  0:00           ` H.J. Lu
  0 siblings, 2 replies; 33+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: H.J. Lu, Generic System V Application Binary Interface, gnu-gabi

* Carlos O'Donell:

> On 03/21/2018 03:04 PM, Florian Weimer wrote:
>> * H. J. Lu:
>> 
>>>> Could we ship a template for the PLT entries in ld.so instead?  And if
>>>> needed, map it from the file together with an address array, like this?
>>>
>>> This won't work since linker needs to know exactly PLT layout to generate
>>> JUMP_SLOT relocations for LD_AUDIT.
>> 
>> Why would we need JUMP_SLOT relocations?  Couldn't we install suitable
>> interceptors for GLOB_DAT relocations instead, as long as they resolve
>> to external function symbols?
>
> I think your suggestion might work, but why alter the existing
> behaviour which users expect and is documented in countless linker
> text books?

If you have references, please add them to the glibc implementation or
the wiki.  It would certainly help those who are trying to work on the
code.

My understanding is that the whole thing is quite underdocumented.
For LD_AUDIT in particular, we only have the Solaris documentation,
and that's for an independent implementation.

> Existing tooling to process such relocations and entries could
> remain unchanged and we would continue to support LD_AUDIT.

My understanding is that H.J.'s proposal requires changes when running
in non-audit mode.  It certainly requires relinking all binaries,
perhaps even with special flags.

Using ld.so-generated thunks for all GLOB_DAT function symbol
relocations would happen in audit mode only and should work with
existing binaries which were built with -Wl,-z,now.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00       ` H.J. Lu
@ 2018-01-01  0:00         ` Florian Weimer
  0 siblings, 0 replies; 33+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Generic System V Application Binary Interface, gnu-gabi

On 03/28/2018 08:41 PM, H.J. Lu wrote:

>> I don't see why it would need JUMP_SLOT relocations if it simply
>> auto-generates PLT stub equivalents and installs them in GLOB_DAT
>> relocations.
> 
> My understanding is that LD_AUDIT is based on JUMP_SLOT relocations.

The current implementation on x86, yes, to avoid the need for run-time
code generation.  But that doesn't mean it's the best way forward.
Certainly not if the toolchain no longer generates JUMP_SLOT relocations 
(like it did at some point in the past).

>> Anyway, going back to the larger question of what we need here.
>>
>> I used this as a test case for audit support with BIND_NOW:
>>
>> latrace /bin/true --help
>>
>> Most of Fedora is compiled with BIND_NOW.  Fedora 26 does not print latrace
>> messages (the problem I mentioned earlier), Fedora 27 works (yay), Fedora 28
>> crashes (meh).
>>
>> So depending on which side Fedora 28+ falls, I think your approach might be
>> viable.  I expect that a future binutils version would do this by default,
>> and beyond the additional dynamic section tags, new PLT stubs would only be
>> created for no-plt functions because current binutils is supposed to
>> generate PLT entries again (after they went missing for -z now binaries for
>> some time).

> -fno-plt is a compiler option, not a linker option.  Linker generates PLT for
> PLT32 relocations to external functions.

That doesn't change the point—if future binutils versions elide 
JUMP_SLOT relocations, then your proposal is not going to solve our 
issue.  As I wrote, I cannot verify the current state because the 
toolchain regressed again.

Thanks,
Florian


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00       ` H.J. Lu
@ 2018-01-01  0:00         ` Cary Coutant
  2018-01-01  0:00           ` Cary Coutant
  2018-01-01  0:00           ` H.J. Lu
  0 siblings, 2 replies; 33+ messages in thread
From: Cary Coutant @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi; +Cc: Carlos O'Donell, gnu-gabi

>> My suggestion was that the GOT entry could be statically initialized
>> by the linker to point to the provisional PLT entry, rather than
>> forcing the dynamic loader to go through all this messy computation.
>> If auditing is not enabled, it would process the GLOB_DAT relocation
>> normally, and set the GOT entry to point to the actual function,
>
> elf_machine_plt_address in my glibc patch:
>
> https://github.com/hjl-tools/glibc/commit/aa8f2f5b9f395769f30d776649a11c2a045dd9e2
>
> has
>
> if (__glibc_unlikely (GLRO(dl_naudit) > 0)
>     && map->l_info[ADDRIDX (DT_GNU_PLT)]
>     && map->l_info[DT_JMPREL]
>     && ELFW(ST_TYPE) (refsym->st_info) == STT_FUNC)
>   {
>     /* Find the matching JUMP_SLOT relocation.  */
>   }
> else
>   {
>     /* Use the original resolution.  */
>   }
>
> If LD_AUDIT is unused, the whole thing is skipped.
>
>> bypassing the provisional PLT and PLTGOT entries completely. If
>> auditing is enabled, it could simply ignore the GLOB_DAT relocation
>> (or, if the binary is PIE, it could process it as a RELATIVE
>> relocation), and the -fno-plt calls will end up jumping to the
>> provisional PLT entry.
>>
>> (This is already how we handle the PLTGOT entries: the linker
>> statically initializes the entries to point to part (b)* of the PLT
>> entry, while putting JUMP_SLOT relocations for those entries into the
>> JMPREL table.)
>>
>> I think if you do that, none of these extra dynamic table entries will
>> be needed, except for the IGNORE_JMPREL flag that indicates there are
>> no JMPREL slots other than those for the provisional PLT entries. How
>> useful is that flag? If the final program has even one external call
>> that was *not* compiled with -fno-plt, you won't be able to set it.
>> Would it be better to partition the JMPREL and PLT tables into
>> "regular" and "provisional" entries? That would take just a single new
>> DT_PROVISIONAL_JMPREL entry to tell the dynamic loader where the
>> JMPREL entries for the provisional PLT entries begin; it can ignore
>> everything past that point when auditing is turned off.
>
> These new dynamic tags are used to compute PLT offset from GOT offset.
> See elf_machine_plt_address in my patch.

Kinda reminds me of "These go to 11."

What I'm suggesting eliminates the need for the dynamic loader to
compute the PLT offset from the GOT offset, and therefore eliminates
the need for all these additional DT entries.
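
To make the difference concrete, here is a minimal sketch of what the loader would leave in a -fno-plt GOT entry under the suggestion quoted above. The function got_entry_value and its parameters are hypothetical; the sketch assumes the linker has statically initialized the entry with the link-time address of the provisional PLT entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the suggestion.
   static_value: link-time address of the provisional PLT entry, as
                 written into the GOT entry by the linker.
   symbol_addr:  run-time address of the actual function.  */
static uint64_t got_entry_value(bool auditing, bool is_pie,
                                uint64_t load_bias, uint64_t static_value,
                                uint64_t symbol_addr)
{
    /* Assumption: a non-PIE executable runs at its link-time address.  */
    assert(is_pie || load_bias == 0);

    /* Auditing off: process the GLOB_DAT relocation normally, so the
       provisional PLT and PLTGOT entries are bypassed completely.  */
    if (!auditing)
        return symbol_addr;

    /* Auditing on: ignore the GLOB_DAT relocation, or treat it as a
       RELATIVE relocation for PIE, so calls land on the provisional
       PLT entry.  */
    return is_pie ? static_value + load_bias : static_value;
}
```

No lookup from GOT offset to PLT offset is involved in either branch, which is why the additional DT entries would not be needed.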

>> I suppose you may also want to partition the GLOB_DAT relocations, so
>> that the dynamic loader can easily figure out which ones to ignore
>> when auditing is enabled. That would take another dynamic table entry.
>>
>> Now, why do we need both the regular GOT entry and the provisional
>> PLTGOT entry? If the program is linked with -z relro and lazy binding,
>> you can put the GOT entries in the RELRO segment, and the PLTGOT
>> entries in writable data. That gives you the security when auditing is
>> turned off, and the ability to dynamically patch the PLTGOT when it's
>> turned on. In any other case, however, I see no reason to have both.
>> If you get rid of the GOT entry, and have the point of call jump
>> indirectly through the PLTGOT entry, which is initialized to point to
>> part (b) of the PLT entry, everything should work the same as without
>> -fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
>> PLT entry.
>
> I want to use both so that GOT is read-only after relocation in
> normal case and the writable PLTGOT is only used for LD_AUDIT.

But if the program isn't linked with relro, the PLTGOT entries remain
writable and you have no need for both. If it's linked with immediate
binding and relro, the PLTGOT entries become relro, and again you have
no need for both. The only case where you can make an argument for
both is when the program is linked with both relro and lazy binding.
But I don't see why you need the additional security if you're not
bothering to link with immediate binding.

-cary


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00   ` Carlos O'Donell
@ 2018-01-01  0:00     ` Cary Coutant
  2018-01-01  0:00       ` Alan Modra
  2018-01-01  0:00       ` H.J. Lu
  0 siblings, 2 replies; 33+ messages in thread
From: Cary Coutant @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: generic-abi, gnu-gabi

> To be specific we are talking about the Solaris LD_AUDIT support that is
> implemented in the GNU dynamic loader ld.so. This has been a very useful
> thing for developers to have, particularly those working on schemes that
> alter lookup paths or binding rules. Also those that use these hooks to
> do other useful auditing. There were a lot of Solaris LD_AUDIT users, and
> now there are a lot of users that use this same feature in the GNU tools.

The description of la_symbind*() says this:

       "The return value of la_symbind32() and la_symbind64() is the address
       to which control should be passed after the function returns.  If the
       auditing library is simply monitoring symbol bindings, then it should
       return sym->st_value.  A different value may be returned if the
       library wishes to direct control to an alternate location."

That implies that it is called only for symbols that are
dynamically-bound (i.e., lazy binding). Does this mean that you want
to cancel the immediate binding effects of -fno-plt?
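
A minimal sketch of the "simply monitoring" case from the quoted
description, assuming the glibc rtld-audit interface declared in
<link.h>. It only illustrates the return-value contract of
la_symbind64(); the choice to audit every object in la_objopen() is an
assumption made for the example, not something the thread specifies.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* the rtld-audit declarations in <link.h> need this */
#endif
#include <link.h>
#include <stdint.h>

/* Handshake with the dynamic loader: accept the version it offers.  */
unsigned int
la_version (unsigned int version)
{
  return version;
}

/* Request la_symbind*() callbacks for bindings to and from every
   object (an assumption for this sketch; a real library may filter).  */
unsigned int
la_objopen (struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
  (void) map;
  (void) lmid;
  (void) cookie;
  return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

/* Monitoring only: hand back the address the loader already chose.
   Returning a different address would redirect the call.  */
uintptr_t
la_symbind64 (Elf64_Sym *sym, unsigned int ndx, uintptr_t *refcook,
              uintptr_t *defcook, unsigned int *flags, const char *symname)
{
  (void) ndx;
  (void) refcook;
  (void) defcook;
  (void) flags;
  (void) symname;
  return sym->st_value;
}
```

Built as a shared object (for example with gcc -shared -fPIC) and named
in LD_AUDIT, a library like this would observe bindings without
changing them.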

> The problem comes when you build with -fno-plt, or if you elide a PLT slot
> for any other reason, there is no longer a place for the LD_AUDIT
> infrastructure to hook into.
>
> In the case of x86 the -fno-plt generated code is a direct call through
> the GOT. The GOT is RO after relocation (relro), and so most tooling expects
> that it cannot be changed. Therefore it's not entirely kosher to reuse the
> GOT for this purpose, though you could do that, in fact on x86 the GLOB_DAT
> reloc and GOT entry look an awful lot like a function descriptor and a call
> through that function descriptor (for arches that have non-code PLTs).
>
> By keeping the generation of the PLT slot, but not using it, you can go back
> and re-use that PLT entry for auditing. If you are RELRO then you are going
> to pay a performance cost for turning on auditing, you will be forced to
> go through the PLT call sequence every time, enter the loader, find your
> already computed resolution in the loader's cache, and continue. If you are
> non-RELRO you can finalize the binding in the PLT.

I'm not sure if you're saying that this is worse with -fno-plt than
without. Wouldn't you have the same performance cost either way, if
auditing is turned on?

> What does "statically relocated" mean?

If I'm reading HJ's proposal correctly, he's got: (1) a regular GOT
entry (with a GLOB_DAT relocation), (2) a "provisional" PLTGOT entry
(with a JUMP_SLOT relocation), and (3) a "provisional" PLT entry for
each external function, and all these extra dynamic table entries are
there so that:

(1) the dynamic loader can find the provisional PLTGOT entry for the
same function by matching the GLOB_DAT relocations with the JUMP_SLOT
relocations,
(2) use that to find the corresponding provisional PLT entry,
(3) relocate the GOT entry to point to that PLT entry,
(4) which will then proceed to use the PLTGOT entry for binding as if
-fno-plt had not been used.

My suggestion was that the GOT entry could be statically initialized
by the linker to point to the provisional PLT entry, rather than
forcing the dynamic loader to go through all this messy computation.
If auditing is not enabled, it would process the GLOB_DAT relocation
normally, and set the GOT entry to point to the actual function,
bypassing the provisional PLT and PLTGOT entries completely. If
auditing is enabled, it could simply ignore the GLOB_DAT relocation
(or, if the binary is PIE, it could process it as a RELATIVE
relocation), and the -fno-plt calls will end up jumping to the
provisional PLT entry.
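
The decision described in that paragraph can be sketched as a single
function. This is only an illustration under stated assumptions: the
function name and parameters are hypothetical, and a real loader would
write the result into the GOT slot rather than return it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value the loader stores in a GOT slot for a GLOB_DAT relocation
   against a function called via -fno-plt.

   link_time_value  what the linker put in the slot: the link-time
                    address of the provisional PLT entry
   load_bias        0 for a fixed-address executable, nonzero for PIE
   resolved         run-time address of the real function  */
uintptr_t
got_value_for_glob_dat (bool auditing, uintptr_t link_time_value,
                        uintptr_t load_bias, uintptr_t resolved)
{
  if (!auditing)
    return resolved;  /* normal GLOB_DAT processing, PLT bypassed */

  /* Auditing: keep pointing at the provisional PLT entry.  With a zero
     bias the slot is effectively left alone; for PIE this is the same
     adjustment a RELATIVE relocation would make.  */
  return link_time_value + load_bias;
}
```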

(This is already how we handle the PLTGOT entries: the linker
statically initializes the entries to point to part (b)* of the PLT
entry, while putting JUMP_SLOT relocations for those entries into the
JMPREL table.)

I think if you do that, none of these extra dynamic table entries will
be needed, except for the IGNORE_JMPREL flag that indicates there are
no JMPREL slots other than those for the provisional PLT entries. How
useful is that flag? If the final program has even one external call
that was *not* compiled with -fno-plt, you won't be able to set it.
Would it be better to partition the JMPREL and PLT tables into
"regular" and "provisional" entries? That would take just a single new
DT_PROVISIONAL_JMPREL entry to tell the dynamic loader where the
JMPREL entries for the provisional PLT entries begin; it can ignore
everything past that point when auditing is turned off.
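
A rough sketch of how a loader might use such a partition. The encoding
is an assumption: DT_PROVISIONAL_JMPREL is treated here as the index of
the first provisional JMPREL entry, whereas the suggestion above only
says the tag tells the loader where those entries begin.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of JMPREL entries the loader should process.

   total        entries in the whole JMPREL table
   provisional  index of the first provisional entry (hypothetical
                encoding of the proposed DT_PROVISIONAL_JMPREL tag)
   auditing     true when LD_AUDIT is in effect  */
size_t
jmprel_entries_to_process (size_t total, size_t provisional, bool auditing)
{
  if (provisional > total)
    provisional = total;  /* tolerate an out-of-range tag value */

  /* Regular entries come first; provisional ones only matter when
     auditing is turned on.  */
  return auditing ? total : provisional;
}
```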

I suppose you may also want to partition the GLOB_DAT relocations, so
that the dynamic loader can easily figure out which ones to ignore
when auditing is enabled. That would take another dynamic table entry.

Now, why do we need both the regular GOT entry and the provisional
PLTGOT entry? If the program is linked with -z relro and lazy binding,
you can put the GOT entries in the RELRO segment, and the PLTGOT
entries in writable data. That gives you the security when auditing is
turned off, and the ability to dynamically patch the PLTGOT when it's
turned on. In any other case, however, I see no reason to have both.
If you get rid of the GOT entry, and have the point of call jump
indirectly through the PLTGOT entry, which is initialized to point to
part (b) of the PLT entry, everything should work the same as without
-fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
PLT entry.

-cary

* I'm using parts (a) and (b) to refer to the two parts of a PLT
entry: (a) an indirect jump via the PLTGOT entry, and (b) code that
jumps to the lazy binding routine, passing the JUMP_SLOT index.

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00       ` Alan Modra
@ 2018-01-01  0:00         ` H.J. Lu
  2018-01-01  0:00           ` Alan Modra
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Generic System V Application Binary Interface
  Cc: Carlos O'Donell, gnu-gabi

On Thu, Mar 22, 2018 at 5:29 AM, Alan Modra <amodra@gmail.com> wrote:
> On Wed, Mar 21, 2018 at 10:15:26PM -0700, Cary Coutant wrote:
>> If you get rid of the GOT entry, and have the point of call jump
>> indirectly through the PLTGOT entry, which is initialized to point to
>> part (b) of the PLT entry, everything should work the same as without
>> -fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
>> PLT entry.
>>
>> -cary
>>
>> * I'm using parts (a) and (b) to refer to the two parts of a PLT
>> entry: (a) an indirect jump via the PLTGOT entry, and (b) code that
>> jumps to the lazy binding routine, passing the JUMP_SLOT index.
>
> Yes, that essentially is what I've done for -fno-plt on powerpc.
> The call stub code is inlined while the rest of the PLT is more or
> less unchanged.  So you get all of the usual lazy-binding features
> by default, and can use "-z now -z relro" if you want a read-only
> PLT.
>

On x86, PLT is always read-only.  The issue is the writable PLTGOT.
On x86, -fno-plt removes the writable PLTGOT.  My proposal puts
back the writable PLTGOT which is only used for LD_AUDIT.

-- 
H.J.

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00             ` H.J. Lu
@ 2018-01-01  0:00               ` Carlos O'Donell
  0 siblings, 0 replies; 33+ messages in thread
From: Carlos O'Donell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Florian Weimer, Generic System V Application Binary Interface, gnu-gabi

On 03/22/2018 11:01 AM, H.J. Lu wrote:
> On Thu, Mar 22, 2018 at 8:36 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> Using ld.so-generated thunks for all GLOB_DAT function symbol
>>> relocations would happen in audit mode only and should work with
>>> existing binaries which were built with -Wl,-z,now.
>>
>> This is a very good reason to prefer one method over another, that we
>> could fix existing binaries. However, I still think the complexity of
>> such a fix outweighs what we are trying to fix. Do we have another use
>> for such stubs?
> 
> If you take a look at BFD linker, it generates different PLT layouts for
> MPX and CET.  It is totally transparent to ld.so.  Putting all PLT choices
> as well as adding new ones in ld.so is very complex.  I don't believe they
> belong to ld.so.

Belief is not a good reason to choose one technical solution over another.

I agree with your statements though, there would be a lot of additional
complexity added to ld.so without much apparent gain for that complexity
e.g. fixing existing -fno-plt binaries to work with LD_AUDIT. Which is why
I asked Florian if he had *other* uses for the stubs, since that might
change the balance. I admit it would have to be a very good reason to make
me consider the added complexity to balance the use case.

I think your solution as you have defined it is the best option, but we
should circle back and make sure we answer all of Cary and Alan's questions
to their satisfaction and gain consensus.

Cheers,
Carlos.

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00             ` Cary Coutant
@ 2018-01-01  0:00               ` H.J. Lu
  2018-01-01  0:00                 ` Cary Coutant
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Cary Coutant
  Cc: Generic System V Application Binary Interface, Florian Weimer, gnu-gabi

On Thu, Mar 22, 2018 at 9:10 AM, Cary Coutant <ccoutant@gmail.com> wrote:
>> Today we have to admit that -fno-plt is not compatible with auditing.
>>
>> I would like to change that to ensure that in future releases we are
>> able to let users use -fno-plt *and* auditing.
>
> The security features are all about locking down the GOT and the
> PLTGOT at program startup. The auditing features take advantage of the
> lazy binding mechanism and want to fiddle with those tables
> dynamically. I don't see how you're going to make the two compatible.
>

That is exactly what my proposal does:

1. Provide both GOT and PLTGOT without lazy binding.
2. PLTGOT is unused without LD_AUDIT.
3. With LD_AUDIT, ld.so redirects GLOB_DAT relocation against GOT to
JUMP_SLOT relocation against PLTGOT.  This is not the same as lazy
binding since it happens every time a function is called, not just
the first time.

-- 
H.J.

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00                   ` H.J. Lu
@ 2018-01-01  0:00                     ` Cary Coutant
  0 siblings, 0 replies; 33+ messages in thread
From: Cary Coutant @ 2018-01-01  0:00 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Generic System V Application Binary Interface, Florian Weimer, gnu-gabi

>>> My suggestion was that the GOT entry could be statically initialized
>>> by the linker to point to the provisional PLT entry, rather than
>>> forcing the dynamic loader to go through all this messy computation.
>>> If auditing is not enabled, it would process the GLOB_DAT relocation
>>> normally, and set the GOT entry to point to the actual function,
>>> bypassing the provisional PLT and PLTGOT entries completely. If
>>> auditing is enabled, it could simply ignore the GLOB_DAT relocation
>>> (or, if the binary is PIE, it could process it as a RELATIVE
>>> relocation), and the -fno-plt calls will end up jumping to the
>>> provisional PLT entry.
>>>
>>> (This is already how we handle the PLTGOT entries: the linker
>>> statically initializes the entries to point to part (b)* of the PLT
>>> entry, while putting JUMP_SLOT relocations for those entries into the
>>> JMPREL table.)
>>>
>>> I think if you do that, none of these extra dynamic table entries will
>>> be needed, ...
>
> Your scheme is very similar to mine.  Both generate one GLOB_DAT
> and one JUMP_SLOT relocation for the same function symbol.  But
> only one of them should be used at run-time.  Your scheme may be
> simpler when LD_AUDIT is used since you don't need to update GOT
> slot.  But you still need to decide if a GLOB_DAT relocation should be
> skipped for LD_AUDIT.

That's why I then suggested this:

> I suppose you may also want to partition the GLOB_DAT relocations, so
> that the dynamic loader can easily figure out which ones to ignore
> when auditing is enabled. That would take another dynamic table entry.

-cary

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00               ` H.J. Lu
@ 2018-01-01  0:00                 ` Cary Coutant
  2018-01-01  0:00                   ` H.J. Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Cary Coutant @ 2018-01-01  0:00 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Generic System V Application Binary Interface, Florian Weimer, gnu-gabi

I haven't seen a response yet to the primary point I was trying to make:

> My suggestion was that the GOT entry could be statically initialized
> by the linker to point to the provisional PLT entry, rather than
> forcing the dynamic loader to go through all this messy computation.
> If auditing is not enabled, it would process the GLOB_DAT relocation
> normally, and set the GOT entry to point to the actual function,
> bypassing the provisional PLT and PLTGOT entries completely. If
> auditing is enabled, it could simply ignore the GLOB_DAT relocation
> (or, if the binary is PIE, it could process it as a RELATIVE
> relocation), and the -fno-plt calls will end up jumping to the
> provisional PLT entry.
>
> (This is already how we handle the PLTGOT entries: the linker
> statically initializes the entries to point to part (b)* of the PLT
> entry, while putting JUMP_SLOT relocations for those entries into the
> JMPREL table.)
>
> I think if you do that, none of these extra dynamic table entries will
> be needed, ...

... or this secondary point:

> ... except for the IGNORE_JMPREL flag that indicates there are
> no JMPREL slots other than those for the provisional PLT entries. How
> useful is that flag? If the final program has even one external call
> that was *not* compiled with -fno-plt, you won't be able to set it.
> Would it be better to partition the JMPREL and PLT tables into
> "regular" and "provisional" entries? That would take just a single new
> DT_PROVISIONAL_JMPREL entry to tell the dynamic loader where the
> JMPREL entries for the provisional PLT entries begin; it can ignore
> everything past that point when auditing is turned off.

-cary

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00 ` Cary Coutant
@ 2018-01-01  0:00   ` Carlos O'Donell
  2018-01-01  0:00     ` Cary Coutant
  0 siblings, 1 reply; 33+ messages in thread
From: Carlos O'Donell @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Cary Coutant, generic-abi; +Cc: gnu-gabi

On 03/21/2018 11:16 AM, Cary Coutant wrote:
>> Auditing of external function calls and their return values relies on
>> lazy binding with PLT.  When external functions are called indirectly
>> via GOT without using PLT, auditing stops working.
> 
> Could you give a little background here? Why does it stop working?
> What does auditing rely on? I didn't find anything about this in the
> psABI document.

To be specific we are talking about the Solaris LD_AUDIT support that is
implemented in the GNU dynamic loader ld.so. This has been a very useful
thing for developers to have, particularly those working on schemes that
alter lookup paths or binding rules. Also those that use these hooks to
do other useful auditing. There were a lot of Solaris LD_AUDIT users, and
now there are a lot of users that use this same feature in the GNU tools.

The problem comes when you build with -fno-plt, or if you elide a PLT slot
for any other reason, there is no longer a place for the LD_AUDIT
infrastructure to hook into.

In the case of x86 the -fno-plt generated code is a direct call through
the GOT. The GOT is RO after relocation (relro), and so most tooling expects
that it cannot be changed. Therefore it's not entirely kosher to reuse the
GOT for this purpose, though you could do that, in fact on x86 the GLOB_DAT
reloc and GOT entry look an awful lot like a function descriptor and a call
through that function descriptor (for arches that have non-code PLTs).

By keeping the generation of the PLT slot, but not using it, you can go back
and re-use that PLT entry for auditing. If you are RELRO then you are going
to pay a performance cost for turning on auditing, you will be forced to
go through the PLT call sequence every time, enter the loader, find your
already computed resolution in the loader's cache, and continue. If you are
non-RELRO you can finalize the binding in the PLT.

Again, all of this is to support LD_AUDIT, which traditionally used PLT
entries and I'd like to keep this developer tooling working even in the
presence of optimized binaries.

>> Here is a proposal to support auditing of external function called
>> indirectly via GOT:
>>
>> 1. Add optional dynamic tags:
>>
>>  #define DT_GNU_PLT     0x6ffffef4  /* Address of PLT section  */
>>  #define DT_GNU_PLTSZ   0x6ffffdf1  /* Size of PLT section  */
>>  #define DT_GNU_PLTENT  0x6ffffdf2  /* Size of one PLT entry  */
>>  #define DT_GNU_PLT0SZ  0x6ffffdf3  /* Size of the first PLT entry  */
>>  #define DT_GNU_PLTGOTSZ 0x6ffffdf4 /* Size of PLTGOT section  */
>>
>> and update DT_FLAGS_1 with:
>>
>>  #define DF_1_JMPRELIGN 0x10000000  /* DT_JMPREL can be ignored  */
>>
>> 2. Linker creates PLT entries for auditing external function calls via
>> GOT and sets DT_GNU_PLT, DT_GNU_PLTSZ, DT_GNU_PLTENT, DT_GNU_PLT0SZ and
>> DT_GNU_PLTGOTSZ.  If PLT isn't required for lazy binding, set the
>> DF_1_JMPRELIGN bit in DT_FLAGS_1.
>> 3. When auditing is enabled at run-time, dynamic linker resolves GLOB_DAT
>> relocation to its corresponding PLT entry by finding JUMP_SLOT relocation
>> against the same function and use its PLT slot as the function address.
>> On x86, the first PLT entry and the 3 GOT slots are reserved.  GOT slot
>> is (JUMP_SLOT relocation offset - DT_PLTGOT) / size of GOT entry.  PLT
>> offset is (GOT slot - 3) * DT_GNU_PLTENT + DT_GNU_PLT0SZ.  PLT address
>> is DT_GNU_PLT + PLT offset.  DT_GNU_PLT, DT_GNU_PLTSZ, DT_PLTGOT and
>> DT_GNU_PLTGOTSZ can be used to check if GOT and PLT offsets are within
>> range.
>> 4. If DF_1_JMPRELIGN is set, dynamic linker can ignore DT_JMPREL when
>> lazy binding is disabled.
>>
>> Any comments?
> 
> Maybe a little more background would help me understand this better,
> but I don't see why the GOT slots aren't being (or couldn't be)
> statically relocated to point to the PLT slots. If the linker does
> that, all the dynamic loader has to do is ignore the JMPREL
> relocations at startup, and let lazy binding happen. I don't see why
> it would need to go through this complicated matching process.

What does "statically relocated" mean?

It appears you are implying the GOT slots for these function calls could
be statically relocated to their respective PLT entries.

This is possible, but you have another problem.

You have a list of GLOB_DAT relocs to process, some of which would overwrite
the statically relocated entries, how do you figure out which these are
and avoid processing them when LD_AUDIT is enabled?
 
> (One trivial comment on your choice of naming: I can't see "JMPRELIGN"
> without reading it as a misspelled "jump re-align"! Maybe "IGN_JMPREL"
> would be better for human readers.)

Agreed.

Cheers,
Carlos.

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00 RFC: Audit external function called indirectly via GOT H.J. Lu
@ 2018-01-01  0:00 ` Cary Coutant
  2018-01-01  0:00   ` Carlos O'Donell
  2018-01-01  0:00 ` Florian Weimer
  1 sibling, 1 reply; 33+ messages in thread
From: Cary Coutant @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi; +Cc: gnu-gabi

> Auditing of external function calls and their return values relies on
> lazy binding with PLT.  When external functions are called indirectly
> via GOT without using PLT, auditing stops working.

Could you give a little background here? Why does it stop working?
What does auditing rely on? I didn't find anything about this in the
psABI document.

> Here is a proposal to support auditing of external function called
> indirectly via GOT:
>
> 1. Add optional dynamic tags:
>
>  #define DT_GNU_PLT     0x6ffffef4  /* Address of PLT section  */
>  #define DT_GNU_PLTSZ   0x6ffffdf1  /* Size of PLT section  */
>  #define DT_GNU_PLTENT  0x6ffffdf2  /* Size of one PLT entry  */
>  #define DT_GNU_PLT0SZ  0x6ffffdf3  /* Size of the first PLT entry  */
>  #define DT_GNU_PLTGOTSZ 0x6ffffdf4 /* Size of PLTGOT section  */
>
> and update DT_FLAGS_1 with:
>
>  #define DF_1_JMPRELIGN 0x10000000  /* DT_JMPREL can be ignored  */
>
> 2. Linker creates PLT entries for auditing external function calls via
> GOT and sets DT_GNU_PLT, DT_GNU_PLTSZ, DT_GNU_PLTENT, DT_GNU_PLT0SZ and
> DT_GNU_PLTGOTSZ.  If PLT isn't required for lazy binding, set the
> DF_1_JMPRELIGN bit in DT_FLAGS_1.
> 3. When auditing is enabled at run-time, dynamic linker resolves GLOB_DAT
> relocation to its corresponding PLT entry by finding JUMP_SLOT relocation
> against the same function and use its PLT slot as the function address.
> On x86, the first PLT entry and the 3 GOT slots are reserved.  GOT slot
> is (JUMP_SLOT relocation offset - DT_PLTGOT) / size of GOT entry.  PLT
> offset is (GOT slot - 3) * DT_GNU_PLTENT + DT_GNU_PLT0SZ.  PLT address
> is DT_GNU_PLT + PLT offset.  DT_GNU_PLT, DT_GNU_PLTSZ, DT_PLTGOT and
> DT_GNU_PLTGOTSZ can be used to check if GOT and PLT offsets are within
> range.
> 4. If DF_1_JMPRELIGN is set, dynamic linker can ignore DT_JMPREL when
> lazy binding is disabled.
>
> Any comments?
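
The arithmetic in step 3 of the quoted proposal can be sketched as
follows. This is an illustration only: struct plt_layout and the
function name are hypothetical, the fields stand for the values of the
proposed DT_GNU_* tags plus DT_PLTGOT, and the reserved-slot count of 3
is the x86 figure given in the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Values a loader would read from the dynamic section (hypothetical
   container for the tags named in the proposal).  */
struct plt_layout
{
  uintptr_t pltgot;       /* DT_PLTGOT */
  size_t pltgot_size;     /* DT_GNU_PLTGOTSZ */
  uintptr_t plt;          /* DT_GNU_PLT */
  size_t plt_size;        /* DT_GNU_PLTSZ */
  size_t plt_entry_size;  /* DT_GNU_PLTENT */
  size_t plt0_size;       /* DT_GNU_PLT0SZ */
  size_t got_entry_size;  /* size of one GOT entry */
  size_t reserved_got;    /* reserved GOT slots: 3 on x86 */
};

/* Map the PLTGOT slot named by a JUMP_SLOT relocation to the address
   of its PLT entry.  Returns 0 if any offset is out of range.  */
uintptr_t
plt_entry_for_jump_slot (const struct plt_layout *l, uintptr_t jump_slot_addr)
{
  if (l->got_entry_size == 0
      || jump_slot_addr < l->pltgot
      || jump_slot_addr - l->pltgot >= l->pltgot_size)
    return 0;

  /* GOT slot = (JUMP_SLOT offset - DT_PLTGOT) / size of GOT entry.  */
  size_t got_slot = (jump_slot_addr - l->pltgot) / l->got_entry_size;
  if (got_slot < l->reserved_got)
    return 0;

  /* PLT offset = (GOT slot - 3) * DT_GNU_PLTENT + DT_GNU_PLT0SZ.  */
  size_t plt_offset
    = (got_slot - l->reserved_got) * l->plt_entry_size + l->plt0_size;
  if (plt_offset > l->plt_size
      || l->plt_entry_size > l->plt_size - plt_offset)
    return 0;

  /* PLT address = DT_GNU_PLT + PLT offset.  */
  return l->plt + plt_offset;
}
```

With 8-byte GOT entries and 16-byte PLT entries, the first
non-reserved slot (index 3) maps to the PLT entry immediately after
PLT0.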

Maybe a little more background would help me understand this better,
but I don't see why the GOT slots aren't being (or couldn't be)
statically relocated to point to the PLT slots. If the linker does
that, all the dynamic loader has to do is ignore the JMPREL
relocations at startup, and let lazy binding happen. I don't see why
it would need to go through this complicated matching process.

(One trivial comment on your choice of naming: I can't see "JMPRELIGN"
without reading it as a misspelled "jump re-align"! Maybe "IGN_JMPREL"
would be better for human readers.)

-cary

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00   ` H.J. Lu
  2018-01-01  0:00     ` Florian Weimer
@ 2018-01-01  0:00     ` Florian Weimer
  2018-01-01  0:00       ` H.J. Lu
  1 sibling, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Generic System V Application Binary Interface, gnu-gabi

On 03/20/2018 05:52 PM, H.J. Lu wrote:
> On Mon, Mar 19, 2018 at 1:21 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 03/17/2018 02:31 PM, H.J. Lu wrote:
>>>
>>> Auditing of external function calls and their return values relies on
>>> lazy binding with PLT.  When external functions are called indirectly
>>> via GOT without using PLT, auditing stops working.
>>>
>>> Here is a proposal to support auditing of external function called
>>> indirectly via GOT:
>>>
>>> 1. Add optional dynamic tags:
>>>
>>>    #define DT_GNU_PLT     0x6ffffef4  /* Address of PLT section  */
>>>    #define DT_GNU_PLTSZ   0x6ffffdf1  /* Size of PLT section  */
>>>    #define DT_GNU_PLTENT  0x6ffffdf2  /* Size of one PLT entry  */
>>>    #define DT_GNU_PLT0SZ  0x6ffffdf3  /* Size of the first PLT entry  */
>>>    #define DT_GNU_PLTGOTSZ 0x6ffffdf4 /* Size of PLTGOT section  */
>>>
>>> and update DT_FLAGS_1 with:
>>>
>>>    #define DF_1_JMPRELIGN 0x10000000  /* DT_JMPREL can be ignored  */
>>>
>>> 2. Linker creates PLT entries for auditing external function calls via
>>> GOT and sets DT_GNU_PLT, DT_GNU_PLTSZ, DT_GNU_PLTENT, DT_GNU_PLT0SZ and
>>> DT_GNU_PLTGOTSZ.  If PLT isn't required for lazy binding, set the
>>> DF_1_JMPRELIGN bit in DT_FLAGS_1.
>>
>>
>> Could we ship a template for the PLT entries in ld.so instead?  And if
>> needed, map it from the file together with an address array, like this?
> 
> This won't work since linker needs to know exactly PLT layout to generate
> JUMP_SLOT relocations for LD_AUDIT.

I don't see why it would need JUMP_SLOT relocations if it simply 
auto-generates PLT stub equivalents and installs them in GLOB_DAT 
relocations.

Anyway, going back to the larger question of what we need here.

I used this as a test case for audit support with BIND_NOW:

latrace /bin/true --help

Most of Fedora is compiled with BIND_NOW.  Fedora 26 does not print 
latrace messages (the problem I mentioned earlier), Fedora 27 works 
(yay), Fedora 28 crashes (meh).

So depending on which side Fedora 28+ falls, I think your approach might 
be viable.  I expect that a future binutils version would do this by 
default, and beyond the additional dynamic section tags, new PLT stubs 
would only be created for no-plt functions because current binutils is 
supposed to generate PLT entries again (after they went missing for -z 
now binaries for some time).

Thanks,
Florian

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00         ` H.J. Lu
@ 2018-01-01  0:00           ` Alan Modra
  2018-01-01  0:00             ` H.J. Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Modra @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi; +Cc: Carlos O'Donell, gnu-gabi

On Thu, Mar 22, 2018 at 05:39:18AM -0700, H.J. Lu wrote:
> On Thu, Mar 22, 2018 at 5:29 AM, Alan Modra <amodra@gmail.com> wrote:
> > On Wed, Mar 21, 2018 at 10:15:26PM -0700, Cary Coutant wrote:
> >> If you get rid of the GOT entry, and have the point of call jump
> >> indirectly through the PLTGOT entry, which is initialized to point to
> >> part (b) of the PLT entry, everything should work the same as without
> >> -fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
> >> PLT entry.
> >>
> >> -cary
> >>
> >> * I'm using parts (a) and (b) to refer to the two parts of a PLT
> >> entry: (a) an indirect jump via the PLTGOT entry, and (b) code that
> >> jumps to the lazy binding routine, passing the JUMP_SLOT index.
> >
> > Yes, that essentially is what I've done for -fno-plt on powerpc.
> > The call stub code is inlined while the rest of the PLT is more or
> > less unchanged.  So you get all of the usual lazy-binding features
> > by default, and can use "-z now -z relro" if you want a read-only
> > PLT.
> >
> 
> On x86, PLT is always read-only.  The issue is the writable PLTGOT.

Yes, I do know how the x86 PLT works.  (Or to be more honest, how it
used to work..)  To be clear, I was using PLT to refer to the whole
scheme, ie. the code to do an indirect jump (x86 .plt), plus a table
of addresses (x86 .got.plt), plus code for lazy binding (x86 .plt
again).  Like x86 the powerpc PLT code to do indirect jumps and lazy
binding is read-only nowadays.  -fno-plt on powerpc inlines the code
to do the indirect jump, but leaves the table of addresses and the
lazy binding code functionally unchanged.

> On x86, -fno-plt removes the writable PLTGOT.

I think that may have been a mistake.  You could have kept .got.plt
functionally unchanged, giving you a writable .got.plt by default with
-fno-plt, and read-only when "-z now -z relro".  Just like the usual
-fplt case.

-- 
Alan Modra
Australia Development Lab, IBM

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00                 ` Cary Coutant
@ 2018-01-01  0:00                   ` H.J. Lu
  2018-01-01  0:00                     ` Cary Coutant
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Cary Coutant
  Cc: Generic System V Application Binary Interface, Florian Weimer, gnu-gabi

On Tue, Mar 27, 2018 at 6:39 PM, Cary Coutant <ccoutant@gmail.com> wrote:
> I haven't seen a response yet to the primary point I was trying to make:
>
>> My suggestion was that the GOT entry could be statically initialized
>> by the linker to point to the provisional PLT entry, rather than
>> forcing the dynamic loader to go through all this messy computation.
>> If auditing is not enabled, it would process the GLOB_DAT relocation
>> normally, and set the GOT entry to point to the actual function,
>> bypassing the provisional PLT and PLTGOT entries completely. If
>> auditing is enabled, it could simply ignore the GLOB_DAT relocation
>> (or, if the binary is PIE, it could process it as a RELATIVE
>> relocation), and the -fno-plt calls will end up jumping to the
>> provisional PLT entry.
>>
>> (This is already how we handle the PLTGOT entries: the linker
>> statically initializes the entries to point to part (b)* of the PLT
>> entry, while putting JUMP_SLOT relocations for those entries into the
>> JMPREL table.)
>>
>> I think if you do that, none of these extra dynamic table entries will
>> be needed, ...

Your scheme is very similar to mine.  Both generate one GLOB_DAT
and one JUMP_SLOT relocation for the same function symbol.  But
only one of them should be used at run-time.  Your scheme may be
simpler when LD_AUDIT is used since you don't need to update GOT
slot.  But you still need to decide if a GLOB_DAT relocation should be
skipped for LD_AUDIT.

> ... or this secondary point:
>
>> ... except for the IGNORE_JMPREL flag that indicates there are
>> no JMPREL slots other than those for the provisional PLT entries. How
>> useful is that flag? If the final program has even one external call
>> that was *not* compiled with -fno-plt, you won't be able to set it.
>> Would it be better to partition the JMPREL and PLT tables into
>> "regular" and "provisional" entries? That would take just a single new
>> DT_PROVISIONAL_JMPREL entry to tell the dynamic loader where the
>> JMPREL entries for the provisional PLT entries begin; it can ignore
>> everything past that point when auditing is turned off.
>

This sounds like a good idea.  I will take a look.

-- 
H.J.

* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00         ` Cary Coutant
  2018-01-01  0:00           ` Cary Coutant
@ 2018-01-01  0:00           ` H.J. Lu
  1 sibling, 0 replies; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Generic System V Application Binary Interface
  Cc: Carlos O'Donell, gnu-gabi

On Thu, Mar 22, 2018 at 9:02 AM, Cary Coutant <ccoutant@gmail.com> wrote:
>>> My suggestion was that the GOT entry could be statically initialized
>>> by the linker to point to the provisional PLT entry, rather than
>>> forcing the dynamic loader to go through all this messy computation.
>>> If auditing is not enabled, it would process the GLOB_DAT relocation
>>> normally, and set the GOT entry to point to the actual function,
>>
>> elf_machine_plt_address in my glibc patch:
>>
>> https://github.com/hjl-tools/glibc/commit/aa8f2f5b9f395769f30d776649a11c2a045dd9e2
>>
>> has
>>
>> if (__glibc_unlikely (GLRO(dl_naudit) > 0)
>>     && map->l_info[ADDRIDX (DT_GNU_PLT)]
>>     && map->l_info[DT_JMPREL]
>>     && ELFW(ST_TYPE) (refsym->st_info) == STT_FUNC)
>>   {
>>     /* Find the matching JUMP_SLOT relocation.  */
>>   }
>> else
>>   {
>>     /* Use the original resolution.  */
>>   }
>>
>> If LD_AUDIT is unused, the whole thing is skipped.
>>
>>> bypassing the provisional PLT and PLTGOT entries completely. If
>>> auditing is enabled, it could simply ignore the GLOB_DAT relocation
>>> (or, if the binary is PIE, it could process it as a RELATIVE
>>> relocation), and the -fno-plt calls will end up jumping to the
>>> provisional PLT entry.
>>>
>>> (This is already how we handle the PLTGOT entries: the linker
>>> statically initializes the entries to point to part (b)* of the PLT
>>> entry, while putting JUMP_SLOT relocations for those entries into the
>>> JMPREL table.)
>>>
>>> I think if you do that, none of these extra dynamic table entries will
>>> be needed, except for the IGNORE_JMPREL flag that indicates there are
>>> no JMPREL slots other than those for the provisional PLT entries. How
>>> useful is that flag? If the final program has even one external call
>>> that was *not* compiled with -fno-plt, you won't be able to set it.
>>> Would it be better to partition the JMPREL and PLT tables into
>>> "regular" and "provisional" entries? That would take just a single new
>>> DT_PROVISIONAL_JMPREL entry to tell the dynamic loader where the
>>> JMPREL entries for the provisional PLT entries begin; it can ignore
>>> everything past that point when auditing is turned off.
>>
>> These new dynamic tags are used to compute PLT offset from GOT offset.
>> See elf_machine_plt_address in my patch.
>
> Kinda reminds me of "These go to 11."
>
> What I'm suggesting eliminates the need for the dynamic loader to
> compute the PLT offset from the GOT offset, and therefore eliminates
> the need for all these additional DT entries.
>
>>> I suppose you may also want to partition the GLOB_DAT relocations, so
>>> that the dynamic loader can easily figure out which ones to ignore
>>> when auditing is enabled. That would take another dynamic table entry.
>>>
>>> Now, why do we need both the regular GOT entry and the provisional
>>> PLTGOT entry? If the program is linked with -z relro and lazy binding,
>>> you can put the GOT entries in the RELRO segment, and the PLTGOT
>>> entries in writable data. That gives you the security when auditing is
>>> turned off, and the ability to dynamically patch the PLTGOT when it's
>>> turned on. In any other case, however, I see no reason to have both.
>>> If you get rid of the GOT entry, and have the point of call jump
>>> indirectly through the PLTGOT entry, which is initialized to point to
>>> part (b) of the PLT entry, everything should work the same as without
>>> -fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
>>> PLT entry.
>>
>> I want to use both so that GOT is read-only after relocation in
>> normal case and the writable PLTGOT is only used for LD_AUDIT.
>
> But if the program isn't linked with relro, the PLTGOT entries remain
> writable and you have no need for both. If it's linked with immediate
> binding and relro, the PLTGOT entries become relro, and again you have
> no need for both. The only case where you can make an argument for
> both is when the program is linked with both relro and lazy binding.
> But I don't see why you need the additional security if you're not
> bothering to link with immediate binding.
>

On Linux, -z relro is the default.  With -fno-plt, lazy binding is off.
That is why I need both GOT and PLTGOT.  Since the PLTGOT is normally
unused (it is only used for LD_AUDIT), leaving it writable isn't a
security issue.

-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00                 ` Florian Weimer
@ 2018-01-01  0:00                   ` H.J. Lu
  0 siblings, 0 replies; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Carlos O'Donell,
	Generic System V Application Binary Interface, gnu-gabi

On Thu, Mar 22, 2018 at 10:15 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> On Thu, Mar 22, 2018 at 9:47 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * Carlos O'Donell:
>>>
>>>> Well, Levine's "Linkers and Loaders"
>>>> https://www.iecc.com/linker/linker10.html, is the immediate reference
>>>> that I have on my shelf, and that developers working on glibc/binutils
>>>> should read.
>>>
>>> Thanks, I didn't know that.
>>>
>>>>> My understanding is that H.J.'s proposal requires changes when running
>>>>> in non-audit mode.  It certainly requires relinking all binaries,
>>>>> perhaps even with special flags.
>>>>
>>>> It would require a relink only to fix existing binaries which are broken
>>>> by the use of -fno-plt, which is not an option that has seen general use
>>>> anywhere that I am aware of.
>>>
>>> I don't think that's actually true.  BFD ld has not emitted
>>> R_X86_64_JUMP_SLOT relocations with -z now for quite some time now.
>>> This optimization predates -fno-plt.
>>>
>>
>> Not true with binutils 2.30:
>>
>> [hjl@gnu-bdx-1 include]$ readelf -d /bin/ld | grep NOW
>>  0x0000000000000018 (BIND_NOW)
>>  0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
>> [hjl@gnu-bdx-1 include]$ readelf -rW /bin/ld | grep JUMP_SLOT
>> 00000000001b0868  0000000100000007 R_X86_64_JUMP_SLOT
>> 0000000000000000 getenv@GLIBC_2.2.5 + 0
>> ...
>
> But binutils 2.28 or some earlier version exhibited different
> behavior, right?

Yes.

-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00         ` Cary Coutant
@ 2018-01-01  0:00           ` Cary Coutant
  2018-01-01  0:00           ` H.J. Lu
  1 sibling, 0 replies; 33+ messages in thread
From: Cary Coutant @ 2018-01-01  0:00 UTC (permalink / raw)
  To: generic-abi; +Cc: Carlos O'Donell, gnu-gabi

>> I want to use both so that GOT is read-only after relocation in
>> normal case and the writable PLTGOT is only used for LD_AUDIT.
>
> But if the program isn't linked with relro, the PLTGOT entries remain
> writable and you have no need for both. If it's linked with immediate
> binding and relro, the PLTGOT entries become relro, and again you have
> no need for both. The only case where you can make an argument for
> both is when the program is linked with both relro and lazy binding.
> But I don't see why you need the additional security if you're not
> bothering to link with immediate binding.

Sorry, I meant to write "... if the program isn't linked with relro,
the *GOT* entries remain writable ...."

-cary


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00     ` Cary Coutant
  2018-01-01  0:00       ` Alan Modra
@ 2018-01-01  0:00       ` H.J. Lu
  2018-01-01  0:00         ` Cary Coutant
  1 sibling, 1 reply; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Generic System V Application Binary Interface
  Cc: Carlos O'Donell, gnu-gabi

On Wed, Mar 21, 2018 at 10:15 PM, Cary Coutant <ccoutant@gmail.com> wrote:
>> To be specific we are talking about the Solaris LD_AUDIT support that is
>> implemented in the GNU dynamic loader ld.so. This has been a very useful
>> thing for developers to have, particularly those working on schemes that
>> alter lookup paths or binding rules. Also those that use these hooks to
>> do other useful auditing. There were a lot of Solaris LD_AUDIT users, and
>> now there are a lot of users that use this same feature in the GNU tools.
>
> The description of la_symbind*() says this:
>
>        "The return value of la_symbind32() and la_symbind64() is the address
>        to which control should be passed after the function returns.  If the
>        auditing library is simply monitoring symbol bindings, then it should
>        return sym->st_value.  A different value may be returned if the
>        library wishes to direct control to an alternate location."
>
> That implies that it is called only for symbols that are
> dynamically-bound (i.e., lazy binding). Does this mean that you want
> to cancel the immediate binding effects of -fno-plt?
>
>> The problem comes when you build with -fno-plt, or if you elide a PLT slot
>> for any other reason, there is no longer a place for the LD_AUDIT
>> infrastructure to hook into.
>>
>> In the case of x86 the -fno-plt generated code is a direct call through
>> the GOT. The GOT is RO after relocation (relro), and so most tooling expects
>> that it cannot be changed. Therefore it's not entirely kosher to reuse the
>> GOT for this purpose, though you could do that, in fact on x86 the GLOB_DAT
>> reloc and GOT entry look an awful lot like a function descriptor and a call
>> through that function descriptor (for arches that have non-code PLTs).
>>
>> By keeping the generation of the PLT slot, but not using it, you can go back
>> and re-use that PLT entry for auditing. If you are RELRO then you are going
>> to pay a performance cost for turning on auditing, you will be forced to
>> go through the PLT call sequence every time, enter the loader, find your
>> already computed resolution in the loader's cache, and continue. If you are
>> non-RELRO you can finalize the binding in the PLT.
>
> I'm not sure if you're saying that this is worse with -fno-plt than
> without. Wouldn't you have the same performance cost either way, if
> auditing is turned on?
>
>> What does "statically relocated" mean?
>
> If I'm reading HJ's proposal correctly, he's got: (1) a regular GOT
> entry (with a GLOB_DAT relocation), (2) a "provisional" PLTGOT entry
> (with a JUMP_SLOT relocation), and (3) a "provisional" PLT entry for
> each external function, and all these extra dynamic table entries are
> there so that:
>
> (1) the dynamic loader can find the provisional PLTGOT entry for the
> same function by matching the GLOB_DAT relocations with the JUMP_SLOT
> relocations,
> (2) use that to find the corresponding provisional PLT entry,
> (3) relocate the GOT entry to point to that PLT entry,
> (4) which will then proceed to use the PLTGOT entry for binding as if
> -fno-plt had not been used.

That is correct.

> My suggestion was that the GOT entry could be statically initialized
> by the linker to point to the provisional PLT entry, rather than
> forcing the dynamic loader to go through all this messy computation.
> If auditing is not enabled, it would process the GLOB_DAT relocation
> normally, and set the GOT entry to point to the actual function,

elf_machine_plt_address in my glibc patch:

https://github.com/hjl-tools/glibc/commit/aa8f2f5b9f395769f30d776649a11c2a045dd9e2

has

if (__glibc_unlikely (GLRO(dl_naudit) > 0)
    && map->l_info[ADDRIDX (DT_GNU_PLT)]
    && map->l_info[DT_JMPREL]
    && ELFW(ST_TYPE) (refsym->st_info) == STT_FUNC)
  {
    /* Find the matching JUMP_SLOT relocation.  */
  }
else
  {
    /* Use the original resolution.  */
  }

If LD_AUDIT is unused, the whole thing is skipped.
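
As a rough illustration of the "find the matching JUMP_SLOT relocation"
step (not the code from the patch), the lookup can be pictured as a scan
of the JMPREL table for a JUMP_SLOT entry that names the same symbol
index as the GLOB_DAT relocation.  The Rela64 type, the macro names and
find_jump_slot below are hypothetical stand-ins chosen so the sketch
builds without <elf.h>; only the r_info split (symbol index in the high
32 bits, type in the low 32 bits) and the type value 7 for
R_X86_64_JUMP_SLOT are taken from the x86-64 ELF conventions.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for Elf64_Rela, so the sketch does not need <elf.h>.  */
typedef struct {
    uint64_t r_offset;  /* address of the GOT/PLTGOT slot to patch */
    uint64_t r_info;    /* symbol index (high 32 bits), type (low 32 bits) */
    int64_t  r_addend;
} Rela64;

#define RELA_SYM(info)   ((uint32_t) ((info) >> 32))
#define RELA_TYPE(info)  ((uint32_t) ((info) & 0xffffffffu))
#define RELOC_JUMP_SLOT  7u   /* value of R_X86_64_JUMP_SLOT */

/* Hypothetical helper: return the index of the JUMP_SLOT relocation in
   jmprel[] that refers to symbol index symidx, or -1 if there is none.  */
long
find_jump_slot (const Rela64 *jmprel, size_t count, uint32_t symidx)
{
    for (size_t i = 0; i < count; i++)
        if (RELA_TYPE (jmprel[i].r_info) == RELOC_JUMP_SLOT
            && RELA_SYM (jmprel[i].r_info) == symidx)
            return (long) i;
    return -1;
}
```

A linear scan is used only to keep the sketch short; it says nothing
about how the actual patch performs the search.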

> bypassing the provisional PLT and PLTGOT entries completely. If
> auditing is enabled, it could simply ignore the GLOB_DAT relocation
> (or, if the binary is PIE, it could process it as a RELATIVE
> relocation), and the -fno-plt calls will end up jumping to the
> provisional PLT entry.
>
> (This is already how we handle the PLTGOT entries: the linker
> statically initializes the entries to point to part (b)* of the PLT
> entry, while putting JUMP_SLOT relocations for those entries into the
> JMPREL table.)
>
> I think if you do that, none of these extra dynamic table entries will
> be needed, except for the IGNORE_JMPREL flag that indicates there are
> no JMPREL slots other than those for the provisional PLT entries. How
> useful is that flag? If the final program has even one external call
> that was *not* compiled with -fno-plt, you won't be able to set it.
> Would it be better to partition the JMPREL and PLT tables into
> "regular" and "provisional" entries? That would take just a single new
> DT_PROVISIONAL_JMPREL entry to tell the dynamic loader where the
> JMPREL entries for the provisional PLT entries begin; it can ignore
> everything past that point when auditing is turned off.

These new dynamic tags are used to compute PLT offset from GOT offset.
See elf_machine_plt_address in my patch.
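
To make the role of the new tags concrete, here is a minimal sketch of
the arithmetic they enable.  It is an assumption-laden illustration, not
a copy of elf_machine_plt_address: it takes the slot index as an input
(the patch derives the position from the GOT offset) and assumes that
PLT entries follow the first entry in the same order as the slots.  The
struct plt_layout and plt_slot_address names are hypothetical; the tag
names in the comments are the ones from the RFC.

```c
#include <stdint.h>

/* Values a loader would read from the proposed dynamic tags.
   The struct itself is hypothetical.  */
struct plt_layout {
    uint64_t plt_addr;   /* DT_GNU_PLT: address of the PLT section */
    uint64_t plt0_size;  /* DT_GNU_PLT0SZ: size of the first PLT entry */
    uint64_t plt_ent;    /* DT_GNU_PLTENT: size of one PLT entry */
};

/* Address of the PLT entry for slot number slot_index, assuming the
   entries follow PLT0 contiguously and in slot order.  */
uint64_t
plt_slot_address (const struct plt_layout *l, uint64_t slot_index)
{
    return l->plt_addr + l->plt0_size + slot_index * l->plt_ent;
}
```

With a PLT at 0x1000 and 16-byte entries (including PLT0), slot 3 would
land at 0x1000 + 16 + 3 * 16 = 0x1040 under these assumptions.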

> I suppose you may also want to partition the GLOB_DAT relocations, so
> that the dynamic loader can easily figure out which ones to ignore
> when auditing is enabled. That would take another dynamic table entry.
>
> Now, why do we need both the regular GOT entry and the provisional
> PLTGOT entry? If the program is linked with -z relro and lazy binding,
> you can put the GOT entries in the RELRO segment, and the PLTGOT
> entries in writable data. That gives you the security when auditing is
> turned off, and the ability to dynamically patch the PLTGOT when it's
> turned on. In any other case, however, I see no reason to have both.
> If you get rid of the GOT entry, and have the point of call jump
> indirectly through the PLTGOT entry, which is initialized to point to
> part (b) of the PLT entry, everything should work the same as without
> -fno-plt. Essentially, all -fno-plt would do is inline part (a) of the
> PLT entry.
>

I want to use both so that GOT is read-only after relocation in
normal case and the writable PLTGOT is only used for LD_AUDIT.

-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00           ` Carlos O'Donell
  2018-01-01  0:00             ` Florian Weimer
@ 2018-01-01  0:00             ` H.J. Lu
  2018-01-01  0:00               ` Carlos O'Donell
  2018-01-01  0:00             ` Cary Coutant
  2 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Florian Weimer, Generic System V Application Binary Interface, gnu-gabi

On Thu, Mar 22, 2018 at 8:36 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> Using ld.so-generated thunks for all GLOB_DAT function symbol
>> relocations would happen in audit mode only and should work with
>> existing binaries which were built with -Wl,-z,now.
>
> This is a very good reason to prefer one method over another, that we
> could fix existing binaries. However, I still think the complexity of
> such a fix outweighs what we are trying to fix. Do we have another use
> for such stubs?

If you take a look at the BFD linker, it generates different PLT layouts
for MPX and CET.  This is totally transparent to ld.so.  Putting all the
PLT choices in ld.so, as well as adding new ones there, is very complex.
I don't believe they belong in ld.so.

-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00             ` Florian Weimer
@ 2018-01-01  0:00               ` H.J. Lu
  2018-01-01  0:00                 ` Florian Weimer
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Carlos O'Donell,
	Generic System V Application Binary Interface, gnu-gabi

On Thu, Mar 22, 2018 at 9:47 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Carlos O'Donell:
>
>> Well, Levine's "Linkers and Loaders"
>> https://www.iecc.com/linker/linker10.html, is the immediate reference
>> that I have on my shelf, and that developers working on glibc/binutils
>> should read.
>
> Thanks, I didn't know that.
>
>>> My understanding is that H.J.'s proposal requires changes when running
>>> in non-audit mode.  It certainly requires relinking all binaries,
>>> perhaps even with special flags.
>>
>> It would require a relink only to fix existing binaries which are broken
>> by the use of -fno-plt, which is not an option that has seen general use
>> anywhere that I am aware of.
>
> I don't think that's actually true.  BFD ld has not emitted
> R_X86_64_JUMP_SLOT relocations with -z now for quite some time now.
> This optimization predates -fno-plt.
>

Not true with binutils 2.30:

[hjl@gnu-bdx-1 include]$ readelf -d /bin/ld | grep NOW
 0x0000000000000018 (BIND_NOW)
 0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
[hjl@gnu-bdx-1 include]$ readelf -rW /bin/ld | grep JUMP_SLOT
00000000001b0868  0000000100000007 R_X86_64_JUMP_SLOT
0000000000000000 getenv@GLIBC_2.2.5 + 0
...
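
For readers who want to reproduce the first readelf check in code, the
following sketch scans a dynamic section for the markers readelf prints
above: the BIND_NOW tag (0x18) and the NOW bit in FLAGS_1 (0x6ffffffb).
It also checks DF_BIND_NOW in DT_FLAGS, which is the other standard way
to request immediate binding.  Dyn64 and the TAG_ and FLAG_ names are
local stand-ins for the <elf.h> definitions, and
requests_immediate_binding is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for Elf64_Dyn, so the sketch does not need <elf.h>.  */
typedef struct {
    int64_t  d_tag;
    uint64_t d_val;
} Dyn64;

#define TAG_NULL       0
#define TAG_BIND_NOW   24            /* DT_BIND_NOW, shown as 0x18 */
#define TAG_FLAGS      30            /* DT_FLAGS */
#define TAG_FLAGS_1    0x6ffffffb    /* DT_FLAGS_1 */
#define FLAG_BIND_NOW  0x8           /* DF_BIND_NOW in DT_FLAGS */
#define FLAG_1_NOW     0x1           /* DF_1_NOW in DT_FLAGS_1 */

/* Return true if the DT_NULL-terminated dynamic array asks for
   immediate binding in any of the three usual ways.  */
bool
requests_immediate_binding (const Dyn64 *dyn)
{
    for (; dyn->d_tag != TAG_NULL; dyn++) {
        if (dyn->d_tag == TAG_BIND_NOW)
            return true;
        if (dyn->d_tag == TAG_FLAGS && (dyn->d_val & FLAG_BIND_NOW))
            return true;
        if (dyn->d_tag == TAG_FLAGS_1 && (dyn->d_val & FLAG_1_NOW))
            return true;
    }
    return false;
}
```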

-- 
H.J.


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00           ` Carlos O'Donell
@ 2018-01-01  0:00             ` Florian Weimer
  2018-01-01  0:00               ` H.J. Lu
  2018-01-01  0:00             ` H.J. Lu
  2018-01-01  0:00             ` Cary Coutant
  2 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: H.J. Lu, Generic System V Application Binary Interface, gnu-gabi

* Carlos O'Donell:

> Well, Levine's "Linkers and Loaders"
> https://www.iecc.com/linker/linker10.html, is the immediate reference
> that I have on my shelf, and that developers working on glibc/binutils
> should read.

Thanks, I didn't know that.

>> My understanding is that H.J.'s proposal requires changes when running
>> in non-audit mode.  It certainly requires relinking all binaries,
>> perhaps even with special flags.
>
> It would require a relink only to fix existing binaries which are broken
> by the use of -fno-plt, which is not an option that has seen general use
> anywhere that I am aware of.

I don't think that's actually true.  BFD ld has not emitted
R_X86_64_JUMP_SLOT relocations with -z now for quite some time now.
This optimization predates -fno-plt.

Please check some older BIND_NOW binaries to confirm this.

Does this alter your position?


* Re: RFC: Audit external function called indirectly via GOT
  2018-01-01  0:00 ` Florian Weimer
@ 2018-01-01  0:00   ` H.J. Lu
  2018-01-01  0:00     ` Florian Weimer
  2018-01-01  0:00     ` Florian Weimer
  0 siblings, 2 replies; 33+ messages in thread
From: H.J. Lu @ 2018-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Generic System V Application Binary Interface, gnu-gabi

On Mon, Mar 19, 2018 at 1:21 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 03/17/2018 02:31 PM, H.J. Lu wrote:
>>
>> Auditing of external function calls and their return values relies on
>> lazy binding with PLT.  When external functions are called indirectly
>> via GOT without using PLT, auditing stops working.
>>
>> Here is a proposal to support auditing of external function called
>> indirectly via GOT:
>>
>> 1. Add optional dynamic tags:
>>
>>   #define DT_GNU_PLT     0x6ffffef4  /* Address of PLT section  */
>>   #define DT_GNU_PLTSZ   0x6ffffdf1  /* Size of PLT section  */
>>   #define DT_GNU_PLTENT  0x6ffffdf2  /* Size of one PLT entry  */
>>   #define DT_GNU_PLT0SZ  0x6ffffdf3  /* Size of the first PLT entry  */
>>   #define DT_GNU_PLTGOTSZ 0x6ffffdf4 /* Size of PLTGOT section  */
>>
>> and update DT_FLAGS_1 with:
>>
>>   #define DF_1_JMPRELIGN 0x10000000  /* DT_JMPREL can be ignored  */
>>
>> 2. Linker creates PLT entries for auditing external function calls via
>> GOT and sets DT_GNU_PLT, DT_GNU_PLTSZ, DT_GNU_PLTENT, DT_GNU_PLT0SZ and
>> DT_GNU_PLTGOTSZ.  If PLT isn't required for lazy binding, set the
>> DF_1_JMPRELIGN bit in DT_FLAGS_1.
>
>
> Could we ship a template for the PLT entries in ld.so instead?  And if
> needed, map it from the file together with an address array, like this?

This won't work, since the linker needs to know the exact PLT layout to
generate JUMP_SLOT relocations for LD_AUDIT.

>   Data page with pointer
>   PLT template from ld.so (loading pointers from the previous page)
>
> This process can be repeated, to obtain as many PLT stubs as needed.
> It's not a real JIT, so SELinux will still be happy.
>
> The data page would probably contain two pointers per PLT entry, not just
> one, so that the reserved PLT entries aren't necessary.
>
>> 3. When auditing is enabled at run-time, the dynamic linker resolves a
>> GLOB_DAT relocation to its corresponding PLT entry by finding the
>> JUMP_SLOT relocation against the same function and using its PLT slot
>> as the function address.
>
>
> This step would stay the same.
>
> I wonder if this would make it possible to restore audit support for
> existing binaries which lack PLT entries today.
>

I don't think so.
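
For reference, the DF_1_JMPRELIGN bit from step 2 of the proposal quoted
above could be consulted along these lines.  This is only a sketch of
one plausible reading ("DT_JMPREL can be ignored" unless auditing is
active); the flag value is the one given in the RFC, while
must_process_jmprel is a hypothetical helper and not part of any
existing loader.

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed in this RFC, not an established flag: DT_JMPREL can be
   ignored.  */
#define DF_1_JMPRELIGN 0x10000000

/* Hypothetical helper: decide whether the JMPREL relocations need to be
   processed.  They are always needed when auditing is active; otherwise
   they can be skipped if the linker set DF_1_JMPRELIGN in DT_FLAGS_1.  */
bool
must_process_jmprel (uint64_t flags_1, bool auditing)
{
    return auditing || (flags_1 & DF_1_JMPRELIGN) == 0;
}
```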


-- 
H.J.


end of thread, other threads:[~2018-03-29  9:39 UTC | newest]
