public inbox for libc-alpha@sourceware.org
* Arm's SVE PCS and LD_AUDIT support?
@ 2020-02-01  3:37 Carlos O'Donell
  2020-02-03 11:35 ` Szabolcs Nagy
  0 siblings, 1 reply; 5+ messages in thread
From: Carlos O'Donell @ 2020-02-01  3:37 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha

Szabolcs,

One of the things I want to refactor is to move some of the LD_AUDIT
support up a level inside the dynamic loader and have it depend
less on the lazy-binding semantics.

I want only la_pltenter() and la_pltexit() to be affected by the
binding semantics, but today because lazy was the default, we aren't
there yet.

Florian and I were wondering if we couldn't implement the following:

- Leave PLT in place for SVE PCS but unused.

- Enable full save-restore in plt enter/exit conservatively for
  STO_AARCH64_VARIANT_PCS if LD_AUDIT is in use, possibly routing
  those symbols to a different _dl_profile_fixup_full_save?

Would that work? 

-- 
Cheers,
Carlos.


* Re: Arm's SVE PCS and LD_AUDIT support?
  2020-02-01  3:37 Arm's SVE PCS and LD_AUDIT support? Carlos O'Donell
@ 2020-02-03 11:35 ` Szabolcs Nagy
  2020-02-03 15:43   ` Carlos O'Donell
  0 siblings, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2020-02-03 11:35 UTC (permalink / raw)
  To: Carlos O'Donell, libc-alpha; +Cc: nd

On 01/02/2020 03:37, Carlos O'Donell wrote:
> Szabolcs,
> 
> One of the things I want to refactor is to move some of the LD_AUDIT
> support up a level inside the dynamic loader and have it depend
> less on the lazy-binding semantics.
> 
> I want only la_pltenter() and la_pltexit() to be affected by the
> binding semantics, but today because lazy was the default, we aren't
> there yet.

iirc currently any load time bound pltgot entries
will not go via plt hook of ldaudit because it
uses the same entry mechanism as lazy binding
(plt0 jumps to GOT[2]) so vpcs is not using ld
audit now.

will that change?

> Florian and I were wondering if we couldn't implement the following:
> 
> - Leave PLT in place for SVE PCS but unused.
> 
> - Enable full save-restore in plt enter/exit conservatively for
>   STO_AARCH64_VARIANT_PCS if LD_AUDIT is in use, possibly routing
>   those symbols to a different _dl_profile_fixup_full_save?
> 
> Would that work? 

i don't understand what is the new ld audit mechanism
for hooking into the plt if not GOT[1] & GOT[2].

when the vpcs abi was designed we were thinking about
adding a second entry point somewhere (e.g.
DT_AARCH64_VPCS_PLT and DT_AARCH64_VPCS_PLTGOT)
instead of using GOT[1] as PLTGOT initializer which
then jumps to GOT[2], variant_pcs symbols could
use a different entry point which can do whatever
it takes to make lazy binding work.

but it seemed a bit too much hassle for something
we don't really plan to use (bind now for vpcs is
good enough) and in principle the entry point can
handle variant_pcs and normal symbols differently,
it's just ugly because checking for the STO_*
symbol table flag at runtime has to happen in
asm since we don't know the pcs yet.

adding such second entry is still possible, or
the ld audit hook can do something ugly in asm.

if you can distinguish normal and vpcs syms in
the hook then i see no problem doing ld audit,
but the struct where the register state is saved
needs to be scalable (currently exposed to the
user in the plt callbacks).


* Re: Arm's SVE PCS and LD_AUDIT support?
  2020-02-03 11:35 ` Szabolcs Nagy
@ 2020-02-03 15:43   ` Carlos O'Donell
  2020-02-03 19:01     ` Szabolcs Nagy
  0 siblings, 1 reply; 5+ messages in thread
From: Carlos O'Donell @ 2020-02-03 15:43 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha; +Cc: nd

On 2/3/20 6:35 AM, Szabolcs Nagy wrote:
> On 01/02/2020 03:37, Carlos O'Donell wrote:
>> Szabolcs,
>>
>> One of the things I want to refactor is to move some of the LD_AUDIT
>> support up a level inside the dynamic loader and have it depend
>> less on the lazy-binding semantics.
>>
>> I want only la_pltenter() and la_pltexit() to be affected by the
>> binding semantics, but today because lazy was the default, we aren't
>> there yet.
> 
> iirc currently any load time bound pltgot entries
> will not go via plt hook of ldaudit because it
> uses the same entry mechanism as lazy binding
> (plt0 jumps to GOT[2]) so vpcs is not using ld
> audit now.

I'm sorry, I don't quite understand this sentence, but I think you
are saying:

- Currently the STO_AARCH64_VARIANT_PCS symbols cannot support
  la_pltenter() and la_pltexit() because they do not call through
  the loader's PLT hook.

I agree this is the current state.
 
> will that change?

I'm asking _you_ the question if we can change things to support 
LD_AUDIT and SVE PCS and _how_ we might change things.

I think you answer the _how_ below, by saying we could create an
alternate hook to avoid looking up the symbol's type.

>> Florian and I were wondering if we couldn't implement the following:
>>
>> - Leave PLT in place for SVE PCS but unused.
>>
>> - Enable full save-restore in plt enter/exit conservatively for
>>   STO_AARCH64_VARIANT_PCS if LD_AUDIT is in use, possibly routing
>>   those symbols to a different _dl_profile_fixup_full_save?
>>
>> Would that work? 
> 
> i don't understand what is the new ld audit mechanism
> for hooking into the plt if not GOT[1] & GOT[2].

You are asking for implementation details which I did not provide :-)

In dl-machine.h:elf_machine_lazy_rel() when we fully resolve the
STO_AARCH64_VARIANT_PCS symbol, we would instead need to point the
symbol at something new like a reserved GOT[3]/GOT[4].

So you define it like this:
DT_PLTGOT = GOT[0]
GOT[1] = link map
GOT[2] = hook for all symbols
GOT[3] = link map
GOT[4] = hook for STO_AARCH64_VARIANT_PCS

This would be lower-cost to develop but similar to DT_AARCH64_VPCS_PLTGOT 
and DT_AARCH64_VPCS_PLT.

For the sake of upgrades we want to use DT_AARCH64_VPCS_PLTGOT
to indicate the availability of the feature and define that it points
to DT_PLTGOT+2, and we use GOT[1]/GOT[2] as expected (really GOT[3]/GOT[4]).

This way old binaries keep working without LD_AUDIT, but new binaries
can redirect VPCS symbols into the second hook.
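
A minimal sketch of the slot selection this layout implies, in C. The
slot names, hook_slot_for() and the has_vpcs_pltgot parameter are
hypothetical and only illustrate the idea; they are not glibc code.
STO_AARCH64_VARIANT_PCS is the st_other flag from <elf.h>, defined
locally here in case the header predates it.

```c
#include <stdbool.h>

/* st_other flag marking a variant-PCS symbol (value as in <elf.h>).  */
#ifndef STO_AARCH64_VARIANT_PCS
# define STO_AARCH64_VARIANT_PCS 0x80
#endif

/* Hypothetical names for the proposed slots.  Only slots 0-2 exist
   today; slots 3 and 4 are the suggested addition.  */
enum got_slot
{
  NO_HOOK = -1,           /* bind directly, no loader hook */
  GOT_RESERVED = 0,       /* DT_PLTGOT points here */
  GOT_LINK_MAP = 1,       /* link map for the normal hook */
  GOT_HOOK = 2,           /* hook for all symbols */
  GOT_VPCS_LINK_MAP = 3,  /* link map for the variant-PCS hook */
  GOT_VPCS_HOOK = 4       /* hook for STO_AARCH64_VARIANT_PCS symbols */
};

bool
is_variant_pcs (unsigned char st_other)
{
  return (st_other & STO_AARCH64_VARIANT_PCS) != 0;
}

/* Choose the hook slot for a symbol.  has_vpcs_pltgot models "the
   object carries DT_AARCH64_VPCS_PLTGOT"; an old object without the
   tag keeps its variant-PCS symbols unhooked.  */
int
hook_slot_for (unsigned char st_other, bool has_vpcs_pltgot)
{
  if (!is_variant_pcs (st_other))
    return GOT_HOOK;
  return has_vpcs_pltgot ? GOT_VPCS_HOOK : NO_HOOK;
}
```

With DT_AARCH64_VPCS_PLTGOT defined as DT_PLTGOT+2, indexing [1] and
[2] from the new tag lands on slots 3 and 4 above, which is why the
second hook can reuse the existing entry convention.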

> when the vpcs abi was designed we were thinking about
> adding a second entry point somewhere (e.g.
> DT_AARCH64_VPCS_PLT and DT_AARCH64_VPCS_PLTGOT)
> instead of using GOT[1] as PLTGOT initializer which
> then jumps to GOT[2], variant_pcs symbols could
> use a different entry point which can do whatever
> it takes to make lazy binding work.

That would be a very robust design.

I think I'm suggesting a subset of this design.

> but it seemed a bit too much hassle for something
> we don't really plan to use (bind now for vpcs is
> good enough) and in principle the entry point can
> handle variant_pcs and normal symbols differently,
> it's just ugly because checking for the STO_*
> symbol table flag at runtime has to happen in
> asm since we don't know the pcs yet.

- If in lazy-binding mode.
  - Setup GOT[1]/GOT[2] to point to ld hook.
  - Normal lazily bound symbols go to the ld hook.
  - STO_AARCH64_VARIANT_PCS are immediately bound for performance.

- If in non-lazy binding mode.
  - Do nothing since we will relocate all PLT entries.
  - All work done in dl-machine.h:elf_machine_lazy_rel()

- If in ld-audit mode.
  - Setup GOT[1]/GOT[2] to point to ld hook.
  - Setup GOT[3]/GOT[4] to point to full-save ld hook.
  - Additionally relocate all STO_AARCH64_VARIANT_PCS to GOT[3]/GOT[4]
  - If no la_pltenter or la_pltexit is requested for the symbol we could
    finalize the relocation to the real symbol and avoid the full save
    for the hook.
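
The three modes above can be sketched as one decision function. This is
an illustration of the proposal only: the enum names and
initial_plt_target() are hypothetical, and the early finalization in
audit mode is the optional optimization from the last bullet, not
something the loader does today.

```c
#include <stdbool.h>

enum bind_mode { MODE_LAZY, MODE_NOW, MODE_AUDIT };

enum plt_target
{
  TARGET_LD_HOOK,         /* via GOT[1]/GOT[2], the normal ld hook */
  TARGET_FULL_SAVE_HOOK,  /* via GOT[3]/GOT[4], the full-save ld hook */
  TARGET_RESOLVED         /* bound straight to the real symbol */
};

/* Where a PLT slot initially points under the proposed scheme.
   wants_plt_callbacks is true when la_pltenter or la_pltexit was
   requested for the symbol.  */
enum plt_target
initial_plt_target (enum bind_mode mode, bool variant_pcs,
                    bool wants_plt_callbacks)
{
  switch (mode)
    {
    case MODE_NOW:
      /* Every PLT entry is relocated up front.  */
      return TARGET_RESOLVED;
    case MODE_LAZY:
      /* Variant-PCS symbols are bound immediately; the rest stay lazy.  */
      return variant_pcs ? TARGET_RESOLVED : TARGET_LD_HOOK;
    case MODE_AUDIT:
      if (!variant_pcs)
        return TARGET_LD_HOOK;
      /* Pay for the full save only if a callback will observe it.  */
      return wants_plt_callbacks ? TARGET_FULL_SAVE_HOOK : TARGET_RESOLVED;
    }
  return TARGET_RESOLVED;
}
```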

Notes:
- If you really wanted we could use the alternative hook to support
  lazy binding of STO_AARCH64_VARIANT_PCS, but I wouldn't bother.
- Florian and I discussed offloading the problem of VPCS detection to
  the user by moving la_symbind() really early and let the user, who
  knows the calls, return LA_SYMB_FULLSAVE (new flag) from la_symbind
  for those functions that need a full save and restore. The down side
  to this approach is silent corruption if you get this wrong. You could
  invert the meaning and say LA_SYMB_NOFULLSAVE and use it to speed up
  all the other symbols during auditing on aarch64. I'm wary of this
  approach because we could do a better job with just a little bit more
  design work.

> adding such second entry is still possible, or
> the ld audit hook can do something ugly in asm.

I think a second entry would be preferable to doing this all in asm.
 
> if you can distinguish normal and vpcs syms in
> the hook then i see no problem doing ld audit,
> but the struct where the register state is saved
> needs to be scalable (currently exposed to the
> user in the plt callbacks).

We would have to do the following:

- the la_aarch64_gnu_pltenter hook must inspect the symbol and detect
  if it is STO_AARCH64_VARIANT_PCS, and if so, then use a different
  scalable definition of La_aarch64_vpcs_regs and La_aarch64_vpcs_retval.

- Keep the La_aarch64_vpcs_regs and La_aarch64_vpcs_retval compatible
  with the existing regs and retval, and just extend them.
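
One way to keep the new types compatible is to make the existing
register block a prefix of the extended one. The sketch below uses
stand-in types (demo_regs, demo_vpcs_regs) and invented field names
rather than the real La_aarch64_* definitions; the vector-length field
and out-of-line save area are assumptions about how a scalable layout
could look, not a proposal for the actual ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the existing fixed-size register block.  */
struct demo_regs
{
  uint64_t xreg[8];   /* integer argument registers */
  uint64_t dreg[8];   /* low 64 bits of the FP/SIMD argument registers */
  uint64_t sp;
  uint64_t lr;
};

/* Hypothetical extension: the old block stays first so a callback
   written against the old layout still reads valid data.  */
struct demo_vpcs_regs
{
  struct demo_regs base;
  uint64_t vl_bytes;         /* SVE vector length in bytes at call time */
  unsigned char *sve_state;  /* save area whose size depends on vl_bytes */
};

/* The compatibility requirement, checked at compile time.  */
_Static_assert (offsetof (struct demo_vpcs_regs, base) == 0,
                "base registers must be a prefix of the extended block");

/* View an extended block through the old layout.  */
const struct demo_regs *
as_base_regs (const struct demo_vpcs_regs *regs)
{
  return &regs->base;
}
```

Keeping the scalable state behind a pointer leaves the struct itself a
fixed size, since the SVE vector length is not known when the audit
module is compiled.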

In summary:
- You suggest an alternative with DT_AARCH64_VPCS_PLT and DT_AARCH64_VPCS_PLTGOT
  to enable both lazy binding and ld audit.
- I suggest a crude DT_AARCH64_VPCS_PLTGOT-only solution just for ld audit.
  - Could be extended to support DT_AARCH64_VPCS_PLT by changing the value
    stored in DT_AARCH64_VPCS_PLTGOT.

Either solution requires:
- New La_aarch64_vpcs_regs, and La_aarch64_vpcs_retval.

-- 
Cheers,
Carlos.


* Re: Arm's SVE PCS and LD_AUDIT support?
  2020-02-03 15:43   ` Carlos O'Donell
@ 2020-02-03 19:01     ` Szabolcs Nagy
  2020-02-03 22:52       ` Carlos O'Donell
  0 siblings, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2020-02-03 19:01 UTC (permalink / raw)
  To: Carlos O'Donell, libc-alpha; +Cc: nd

On 03/02/2020 15:43, Carlos O'Donell wrote:
> On 2/3/20 6:35 AM, Szabolcs Nagy wrote:
>> On 01/02/2020 03:37, Carlos O'Donell wrote:
>>> One of the things I want to refactor is to move some of the LD_AUDIT
>>> support up a level inside the dynamic loader and have it depend
>>> less on the lazy-binding semantics.
>>>
>>> I want only la_pltenter() and la_pltexit() to be affected by the
>>> binding semantics, but today because lazy was the default, we aren't
>>> there yet.
>>
>> iirc currently any load time bound pltgot entries
>> will not go via plt hook of ldaudit because it
>> uses the same entry mechanism as lazy binding
>> (plt0 jumps to GOT[2]) so vpcs is not using ld
>> audit now.
> 
> I'm sorry, I don't quite understand this sentence, but I think you
> are saying:
> 
> - Currently the STO_AARCH64_VARIANT_PCS symbols cannot support
>   la_pltenter() and la_pltexit() because they do not call through
>   the loader's PLT hook.
> 
> I agree this is the current state.

yes, that's the current state and as far as i was
concerned this is not broken: ld audit still works,
just variant pcs symbols are not hooked.

i thought you might be planning something that makes
this behaviour broken (e.g. force all symbols through
a different hooking mechanism instead of GOT[2]).

>> will that change?
> 
> I'm asking _you_ the question if we can change things to support 
> LD_AUDIT and SVE PCS and _how_ we might change things.
> 
> I think you answer the _how_ below, by saying we could create an
> alternate hook to avoid looking up the symbol's type.

my idea was that the audit plt hooks were not very
reliable or widely used anyway so the current level
of support was a reasonable trade-off.

i think adding ld audit support for variant pcs is
doable and likely preferable to the current state,
but i would only spend significant abi design effort
on it if we expect that it will be used.

e.g. is it ok to just always save/restore the full
register state during ld audit (but only expose to
users a subset of the regs so the callback abi is
unchanged)?

the reason i wanted to avoid full save/restore during
lazy binding was to avoid stack usage issues, but
i'm less concerned about that in case of audit.
(should i be concerned? doesn't it currently use a
large amount of stack anyway?)

will ld audit work on '-z now' binaries?
will distros build binaries with -z now?
will users build tooling on top of ld audit?
do we plan to make compiler changes to make plt
calls more reliable?

i.e. i'm not sure what level of support i should
aim for and what to optimize for.

>>> Florian and I were wondering if we couldn't implement the following:
>>>
>>> - Leave PLT in place for SVE PCS but unused.
>>>
>>> - Enable full save-restore in plt enter/exit conservatively for
>>>   STO_AARCH64_VARIANT_PCS if LD_AUDIT is in use, possibly routing
>>>   those symbols to a different _dl_profile_fixup_full_save?
>>>
>>> Would that work? 
>>
>> i don't understand what is the new ld audit mechanism
>> for hooking into the plt if not GOT[1] & GOT[2].
> 
> You are asking for implementation details which I did not provide :-)
> 
> In dl-machine.h:elf_machine_lazy_rel() when we fully resolve the
> STO_AARCH64_VARIANT_PCS symbol, we would instead need to point the
> symbol at something new like a reserved GOT[3]/GOT[4].
> 
> So you define it like this:
> DT_PLTGOT = GOT[0]
> GOT[1] = link map
> GOT[2] = hook for all symbols
> GOT[3] = link map
> GOT[4] = hook for STO_AARCH64_VARIANT_PCS
> 
> This would be lower-cost to develop but similar to DT_AARCH64_VPCS_PLTGOT 
> and DT_AARCH64_VPCS_PLT.
> 
> For the sake of upgrades we want to use DT_AARCH64_VPCS_PLTGOT
> to indicate the availability of the feature and define that it points
> to DT_PLTGOT+2, and we use GOT[1]/GOT[2] as expected (really GOT[3]/GOT[4]).
> 
> This way old binaries keep working without LD_AUDIT, but new binaries
> can redirect VPCS symbols into the second hook.

i will have to refresh my memory about the details,
but something along these lines can work and can be
added to the psabi. (e.g. i'm not sure if the second
entry can be a generic entry point that ldso may use
however it likes, or if the semantics need to be tied
to vpcs/audit; it may depend on what other tools like
debuggers do with got[1]/got[2])

>> when the vpcs abi was designed we were thinking about
>> adding a second entry point somewhere (e.g.
>> DT_AARCH64_VPCS_PLT and DT_AARCH64_VPCS_PLTGOT)
>> instead of using GOT[1] as PLTGOT initializer which
>> then jumps to GOT[2], variant_pcs symbols could
>> use a different entry point which can do whatever
>> it takes to make lazy binding work.
> 
> That would be a very robust design.
> 
> I think I'm suggesting a subset of this design.
> 
>> but it seemed a bit too much hassle for something
>> we don't really plan to use (bind now for vpcs is
>> good enough) and in principle the entry point can
>> handle variant_pcs and normal symbols differently,
>> it's just ugly because checking for the STO_*
>> symbol table flag at runtime has to happen in
>> asm since we don't know the pcs yet.
> 
> - If in lazy-binding mode.
>   - Setup GOT[1]/GOT[2] to point to ld hook.
>   - Normal lazily bound symbols go to the ld hook.
>   - STO_AARCH64_VARIANT_PCS are immediately bound for performance.
> 
> - If in non-lazy binding mode.
>   - Do nothing since we will relocate all PLT entries.
>   - All work done in dl-machine.h:elf_machine_lazy_rel()
> 
> - If in ld-audit mode.
>   - Setup GOT[1]/GOT[2] to point to ld hook.
>   - Setup GOT[3]/GOT[4] to point to full-save ld hook.
>   - Additionally relocate all STO_AARCH64_VARIANT_PCS to GOT[3]/GOT[4]
>   - If no la_pltenter or la_pltexit is requested for the symbol we could
>     finalize the relocation to the real symbol and avoid the full save
>     for the hook.

will ld-audit mode work for binaries built with -z now?
(what if somebody used -z now in order to guarantee there
is no hooking because a magic pcs is in use somewhere?)

> 
> Notes:
> - If you really wanted we could use the alternative hook to support
>   lazy binding of STO_AARCH64_VARIANT_PCS, but I wouldn't bother.
> - Florian and I discussed offloading the problem of VPCS detection to
>   the user by moving la_symbind() really early and let the user, who
>   knows the calls, return LA_SYMB_FULLSAVE (new flag) from la_symbind
>   for those functions that need a full save and restore. The down side
>   to this approach is silent corruption if you get this wrong. You could
>   invert the meaning and say LA_SYMB_NOFULLSAVE and use it to speed up
>   all the other symbols during auditing on aarch64. I'm wary of this
>   approach because we could do a better job with just a little bit more
>   design work.
> 
>> adding such second entry is still possible, or
>> the ld audit hook can do something ugly in asm.
> 
> I think a second entry would be preferable to doing this all in asm.

ok.

>> if you can distinguish normal and vpcs syms in
>> the hook then i see no problem doing ld audit,
>> but the struct where the register state is saved
>> needs to be scalable (currently exposed to the
>> user in the plt callbacks).
> 
> We would have to do the following:
> 
> - the la_aarch64_gnu_pltenter hook must inspect the symbol and detect
>   if it is STO_AARCH64_VARIANT_PCS, and if so, then use a different
>   scalable definition of La_aarch64_vpcs_regs and La_aarch64_vpcs_retval.
> 
> - Keep the La_aarch64_vpcs_regs and La_aarch64_vpcs_retval compatible
>   with the existing regs and retval, and just extend them.

note that there are bugs in this area, so ld audit does
not work correctly for base pcs on aarch64 (i haven't
gotten around to applying
https://sourceware.org/ml/libc-alpha/2018-08/msg00019.html
)

i will have to think about how to best deal with
La_aarch64_vpcs_* types (it may well be that the
current setup is unusable and then we can break abi).

> In summary:
> - You suggest an alternative with DT_AARCH64_VPCS_PLT and DT_AARCH64_VPCS_PLTGOT
>   to enable both lazy binding and ld audit.
> - I suggest a crude DT_AARCH64_VPCS_PLTGOT-only solution just for ld audit.
>   - Could be extended to support DT_AARCH64_VPCS_PLT by changing the value
>     stored in DT_AARCH64_VPCS_PLTGOT.
> 
> Either solution requires:
> - New La_aarch64_vpcs_regs, and La_aarch64_vpcs_retval.
> 


* Re: Arm's SVE PCS and LD_AUDIT support?
  2020-02-03 19:01     ` Szabolcs Nagy
@ 2020-02-03 22:52       ` Carlos O'Donell
  0 siblings, 0 replies; 5+ messages in thread
From: Carlos O'Donell @ 2020-02-03 22:52 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha; +Cc: nd

On 2/3/20 2:01 PM, Szabolcs Nagy wrote:
> On 03/02/2020 15:43, Carlos O'Donell wrote:
>> On 2/3/20 6:35 AM, Szabolcs Nagy wrote:
>>> On 01/02/2020 03:37, Carlos O'Donell wrote:
>>>> One of the things I want to refactor is to move some of the LD_AUDIT
>>>> support up a level inside the dynamic loader and have it depend
>>>> less on the lazy-binding semantics.
>>>>
>>>> I want only la_pltenter() and la_pltexit() to be affected by the
>>>> binding semantics, but today because lazy was the default, we aren't
>>>> there yet.
>>>
>>> iirc currently any load time bound pltgot entries
>>> will not go via plt hook of ldaudit because it
>>> uses the same entry mechanism as lazy binding
>>> (plt0 jumps to GOT[2]) so vpcs is not using ld
>>> audit now.
>>
>> I'm sorry, I don't quite understand this sentence, but I think you
>> are saying:
>>
>> - Currently the STO_AARCH64_VARIANT_PCS symbols cannot support
>>   la_pltenter() and la_pltexit() because they do not call through
>>   the loader's PLT hook.
>>
>> I agree this is the current state.
> 
> yes, that's the current state and as far as i was
> concerned this is not broken: ld audit still works,
> just variant pcs symbols are not hooked.
> 
> i thought you might be planning something that makes
> this behaviour broken (e.g. force all symbols through
> a different hooking mechanism instead of GOT[2]).
> 
>>> will that change?
>>
>> I'm asking _you_ the question if we can change things to support 
>> LD_AUDIT and SVE PCS and _how_ we might change things.
>>
>> I think you answer the _how_ below, by saying we could create an
>> alternate hook to avoid looking up the symbol's type.

Sorry, I see here that you want to try to prioritize implementing
something for Arm's SVE PCS.

I'm not asking that you implement it, only that we try to have a
design in place so that if I eventually do some refactoring we
don't do anything that ruins it.

> my idea was that the audit plt hooks were not very
> reliable or widely used anyway so the current level
> of support was a reasonable trade-off.

It is a very reasonable trade-off.

> i think adding ld audit support for variant pcs is
> doable and likely preferable to the current state,
> but i would only spend significant abi design effort
> on it if we expect that it will be used.

I don't expect they will be used, but I wanted to have a plan
of some kind if I go in there and refactor things.

The *other* parts of the interface are way more useful than the
PLT entry/exit support since they let you control search scope
and binding which lets you alter linkage behaviour.

> e.g. is it ok to just always save/restore the full
> register state during ld audit (but only expose to
> users a subset of the regs so the callback abi is
> unchanged)?

I'd say just leave the implementation as it is.

I'm not asking you to implement this right now, only what it
would take and what we would be happy to have.

I think a full save/restore in all cases is a fine compromise.

I do not think it is acceptable to expose only a subset of the regs,
particularly those regs used as call arguments (the point of being
able to inspect and change those registers).
 
> the reason i wanted to avoid full save/restore during
> lazy binding was to avoid stack usage issues, but
> i'm less concerned about that in case of audit.
> (should i be concerned? doesn't it currently use a
> large amount of stack anyway?)

No. If the framesize is 0 then you don't use any copied frame and
jump to the cached resolved function.

Yes. If the framesize is consistently large, then for all such
functions we use the max value and copy across parameters and
registers. You need detailed caller information to do this robustly.

Therefore I'm not worried about stack usage since for most functions
it should be zero since they don't want to audit PLT enter/exit.

Thus saving/restoring all the time is a waste. Though we do it on
Intel because xsavec is not that costly.

I'm worried about overall performance.

When you enable auditing you automatically turn on profiling and
enter _dl_runtime_profile, and this has performance implications
for all symbols even if we don't register PLT enter/exit code
(no framesize).

We could attempt to finalize a symbol if LA_SYMB_NOPLTENTER/NOPLTEXIT
was passed, and that would ameliorate the performance issues, but
that's not what we do today.
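
For reference, this is roughly how an audit module opts out of PLT
enter/exit for symbols it does not care about. la_version(),
la_symbind64() and the LA_SYMB_* flags are the glibc audit interface
from <link.h>; traced_symbol is a hypothetical choice for the example.
Setting the flags only tells the loader not to call la_pltenter() and
la_pltexit() for the symbol; it does not by itself make the loader
finalize the PLT slot.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the only symbol this module wants PLT callbacks for.  */
static const char *const traced_symbol = "malloc";

unsigned int
la_version (unsigned int version)
{
  (void) version;
  /* Report the audit interface version this module was built against.  */
  return LAV_CURRENT;
}

uintptr_t
la_symbind64 (Elf64_Sym *sym, unsigned int ndx, uintptr_t *refcook,
              uintptr_t *defcook, unsigned int *flags, const char *symname)
{
  (void) ndx;
  (void) refcook;
  (void) defcook;

  /* Decline PLT enter/exit callbacks for everything but traced_symbol.  */
  if (strcmp (symname, traced_symbol) != 0)
    *flags |= LA_SYMB_NOPLTENTER | LA_SYMB_NOPLTEXIT;

  /* Leave the binding unchanged.  */
  return sym->st_value;
}
```

Built as a shared object and named in LD_AUDIT=, a module like this is
the case where finalizing the symbol early would avoid most of the
profiling-path cost.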

> will ld audit work on '-z now' binaries?

Yes, absolutely, with LD_AUDIT the dynamic loader actually enables
lazy binding dynamically.

And to be clear LD_BIND_NOW=1 conflicts with LD_AUDIT= and breaks
auditing, and I have an email about how to handle conflicting
environment variables.

> will distros build binaries with -z now?

They already do. Fedora builds with -z now, and we try to harden
everything.

> will users build tooling on top of ld audit?

Users already do, though largely only for object search and
symbol binding, because that lets you largely control the
behaviour of the loader.

> do we plan to make compiler changes to make plt
> calls more reliable?

It is the solution I'm suggesting and pushing internally at Red Hat.

I want to see the PLT emitted even if it's not immediately used, so
that the ABI of the PLT remains and can be used in cases where it is
required by inspection tooling.

Reconstituting the PLT entries dynamically (something Florian has
looked at) can be quite dangerous.

The other area where PLT entries are useful is if we eventually ever
need to do userspace live patching. We will want all PLT entries present
to quiesce calls into another DSO via the PLT entries.

> i.e. i'm not sure what level of support i should
> aim for and what to optimize for.

I can make some suggestions :-)

* Do not *implement* PLT enter/exit in Arm's SVE PCS.
  - It's a waste of time for now.

* Do *design* a strawman PLT enter/exit in Arm's SVE PCS so we don't
  break a potential solution with future refactoring of LD_AUDIT pieces.

  - I think we understand the solution here and I'm not worried now.

* We should absolutely make PLTs more robust, make them ABI, and thus
  allow future tooling to continue to be able to use them when required:

  - LD_AUDIT PLT enter/exit
  - User code wanting PLT by setting LD_BIND_NOW=0
    - Profiling via PLT without the need for interposing all symbols
    - PLT rewriting by user code.
  - Userspace live patching?
    - PLTs are interesting thread rendezvous points.

>>>> Florian and I were wondering if we couldn't implement the following:
>>>>
>>>> - Leave PLT in place for SVE PCS but unused.
>>>>
>>>> - Enable full save-restore in plt enter/exit conservatively for
>>>>   STO_AARCH64_VARIANT_PCS if LD_AUDIT is in use, possibly routing
>>>>   those symbols to a different _dl_profile_fixup_full_save?
>>>>
>>>> Would that work? 
>>>
>>> i don't understand what is the new ld audit mechanism
>>> for hooking into the plt if not GOT[1] & GOT[2].
>>
>> You are asking for implementation details which I did not provide :-)
>>
>> In dl-machine.h:elf_machine_lazy_rel() when we fully resolve the
>> STO_AARCH64_VARIANT_PCS symbol, we would instead need to point the
>> symbol at something new like a reserved GOT[3]/GOT[4].
>>
>> So you define it like this:
>> DT_PLTGOT = GOT[0]
>> GOT[1] = link map
>> GOT[2] = hook for all symbols
>> GOT[3] = link map
>> GOT[4] = hook for STO_AARCH64_VARIANT_PCS
>>
>> This would be lower-cost to develop but similar to DT_AARCH64_VPCS_PLTGOT 
>> and DT_AARCH64_VPCS_PLT.
>>
>> For the sake of upgrades we want to use DT_AARCH64_VPCS_PLTGOT
>> to indicate the availability of the feature and define that it points
>> to DT_PLTGOT+2, and we use GOT[1]/GOT[2] as expected (really GOT[3]/GOT[4]).
>>
>> This way old binaries keep working without LD_AUDIT, but new binaries
>> can redirect VPCS symbols into the second hook.
> 
> i will have to refresh my memory about the details,
> but something along these lines can work and can be
> added to the psabi. (e.g. i'm not sure if the second
> entry can be generic entry point ldso may use however
> it likes to, or the semantics needs to be tied to
> vpcs/audit, it may depend on what other tools like
> debuggers do with got[1]/got[2])

Yes, we'd have to look at debuggers.

>>> when the vpcs abi was designed we were thinking about
>>> adding a second entry point somewhere (e.g.
>>> DT_AARCH64_VPCS_PLT and DT_AARCH64_VPCS_PLTGOT)
>>> instead of using GOT[1] as PLTGOT initializer which
>>> then jumps to GOT[2], variant_pcs symbols could
>>> use a different entry point which can do whatever
>>> it takes to make lazy binding work.
>>
>> That would be a very robust design.
>>
>> I think I'm suggesting a subset of this design.
>>
>>> but it seemed a bit too much hassle for something
>>> we don't really plan to use (bind now for vpcs is
>>> good enough) and in principle the entry point can
>>> handle variant_pcs and normal symbols differently,
>>> it's just ugly because checking for the STO_*
>>> symbol table flag at runtime has to happen in
>>> asm since we don't know the pcs yet.
>>
>> - If in lazy-binding mode.
>>   - Setup GOT[1]/GOT[2] to point to ld hook.
>>   - Normal lazily bound symbols go to the ld hook.
>>   - STO_AARCH64_VARIANT_PCS are immediately bound for performance.
>>
>> - If in non-lazy binding mode.
>>   - Do nothing since we will relocate all PLT entries.
>>   - All work done in dl-machine.h:elf_machine_lazy_rel()
>>
>> - If in ld-audit mode.
>>   - Setup GOT[1]/GOT[2] to point to ld hook.
>>   - Setup GOT[3]/GOT[4] to point to full-save ld hook.
>>   - Additionally relocate all STO_AARCH64_VARIANT_PCS to GOT[3]/GOT[4]
>>   - If no la_pltenter or la_pltexit is requested for the symbol we could
>>     finalize the relocation to the real symbol and avoid the full save
>>     for the hook.
> 
> will ld-audit mode work for binaries built with -z now?

It already does. LD_AUDIT= disables -z now and enables lazy even if
everything was compiled -z now.

> (what if somebody used -z now in order to guarantee there
> is no hooking because a magic pcs is in use somewhere?)

You have no such guarantees. The loader will honour LD_AUDIT= and disable
-z now.

>>
>> Notes:
>> - If you really wanted we could use the alternative hook to support
>>   lazy binding of STO_AARCH64_VARIANT_PCS, but I wouldn't bother.
>> - Florian and I discussed offloading the problem of VPCS detection to
>>   the user by moving la_symbind() really early and let the user, who
>>   knows the calls, return LA_SYMB_FULLSAVE (new flag) from la_symbind
>>   for those functions that need a full save and restore. The down side
>>   to this approach is silent corruption if you get this wrong. You could
>>   invert the meaning and say LA_SYMB_NOFULLSAVE and use it to speed up
>>   all the other symbols during auditing on aarch64. I'm wary of this
>>   approach because we could do a better job with just a little bit more
>>   design work.
>>
>>> adding such second entry is still possible, or
>>> the ld audit hook can do something ugly in asm.
>>
>> I think a second entry would be preferable to doing this all in asm.
> 
> ok.
> 
>>> if you can distinguish normal and vpcs syms in
>>> the hook then i see no problem doing ld audit,
>>> but the struct where the register state is saved
>>> needs to be scalable (currently exposed to the
>>> user in the plt callbacks).
>>
>> We would have to do the following:
>>
>> - the la_aarch64_gnu_pltenter hook must inspect the symbol and detect
>>   if it is STO_AARCH64_VARIANT_PCS, and if so, then use a different
>>   scalable definition of La_aarch64_vpcs_regs and La_aarch64_vpcs_retval.
>>
>> - Keep the La_aarch64_vpcs_regs and La_aarch64_vpcs_retval compatible
>>   with the existing regs and retval, and just extend them.
> 
> note that there are bugs in this area, so ld audit does
> not work correctly for base pcs on aarch64 (i haven't
> gotten around to applying
> https://sourceware.org/ml/libc-alpha/2018-08/msg00019.html
> )

OK.

> i will have to think about how to best deal with
> La_aarch64_vpcs_* types (it may well be that the
> current setup is unusable and then we can break abi).

Your call.

>> In summary:
>> - You suggest an alternative with DT_AARCH64_VPCS_PLT and DT_AARCH64_VPCS_PLTGOT
>>   to enable both lazy binding and ld audit.
>> - I suggest a crude DT_AARCH64_VPCS_PLTGOT-only solution just for ld audit.
>>   - Could be extended to support DT_AARCH64_VPCS_PLT by changing the value
>>     stored in DT_AARCH64_VPCS_PLTGOT.
>>
>> Either solution requires:
>> - New La_aarch64_vpcs_regs, and La_aarch64_vpcs_retval.
>>
> 

In summary:
- I'm not suggesting we need to implement PLT enter/exit for Arm SVE.
- I'm suggesting we should think about a solution for PLT enter/exit for Arm SVE.
- I want to refactor some LD_AUDIT support to make symbol binding more reliable
  and independent of lazy symbol binding.
- I don't want the refactoring to potentially break Arm's future PLT enter/exit
  solution due to an architectural requirement I didn't think about.

-- 
Cheers,
Carlos.

