* [PATCH] Fix runtime linker auditing on aarch64
@ 2020-09-23 1:16 Ben Woodard
2020-09-23 12:22 ` Florian Weimer
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Ben Woodard @ 2020-09-23 1:16 UTC (permalink / raw)
To: libc-alpha
The dynamic linker's auditing was not working on aarch64. See PR#26643
https://sourceware.org/bugzilla/show_bug.cgi?id=26643
There were two distinct problems:
* _dl_runtime_resolve was not preserving x8, the indirect result location
register.
* The NEON Q registers pushed onto the stack by _dl_runtime_resolve
were twice the size of the D registers extracted from the stack frame by
_dl_runtime_profile.
To fix this:
* The La_aarch64_regs structure was expanded to include x8 and the
full-sized NEON V registers that the ABI requires to be preserved.
* _dl_runtime_profile needed to extract registers saved by
_dl_runtime_resolve and put them into the new correctly sized
La_aarch64_regs structure.
* The return value structure La_aarch64_retval also didn't have the correctly
sized NEON V registers.
As a couple of additional cleanups:
* The names of the NEON registers saved within the La_aarch64_regs and
La_aarch64_retval structures referred to the old D registers, which are
doubles. The registers are now quads and are called V, for vector
registers. So the field in each structure and the offsets within it
were renamed to use the more modern names.
* The ABI specification says that r0-r7, r8 (the indirect result location
register), and the NEON v0-v7 registers can be used to return values
from a function. Therefore, I added those to the La_aarch64_retval
structure so that it also correctly matches the ABI.
An additional problem not addressed by this patch is what to do about the
changes to the aarch64 ABI needed to support SVE. A discussion of that
was started on libc-alpha here:
https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html
---
sysdeps/aarch64/bits/link.h | 17 ++++----
sysdeps/aarch64/dl-link.sym | 4 +-
sysdeps/aarch64/dl-trampoline.S | 75 +++++++++++++++++++++------------
3 files changed, 59 insertions(+), 37 deletions(-)
diff --git a/sysdeps/aarch64/bits/link.h b/sysdeps/aarch64/bits/link.h
index 0c54e6ea7b..2b43ace57c 100644
--- a/sysdeps/aarch64/bits/link.h
+++ b/sysdeps/aarch64/bits/link.h
@@ -23,19 +23,20 @@
/* Registers for entry into PLT on AArch64. */
typedef struct La_aarch64_regs
{
- uint64_t lr_xreg[8];
- uint64_t lr_dreg[8];
- uint64_t lr_sp;
- uint64_t lr_lr;
+ uint64_t lr_xreg[9];
+ __uint128_t lr_vreg[8];
+ uint64_t lr_sp;
+ uint64_t lr_lr;
} La_aarch64_regs;
/* Return values for calls from PLT on AArch64. */
typedef struct La_aarch64_retval
{
- /* Up to two integer registers can be used for a return value. */
- uint64_t lrv_xreg[2];
- /* Up to four D registers can be used for a return value. */
- uint64_t lrv_dreg[4];
+ /* Up to eight integer registers and the indirect result location register
+ can be used for a return value. */
+ uint64_t lrv_xreg[9];
+ /* Up to eight V registers can be used for a return value. */
+ __uint128_t lrv_vreg[8];
} La_aarch64_retval;
__BEGIN_DECLS
diff --git a/sysdeps/aarch64/dl-link.sym b/sysdeps/aarch64/dl-link.sym
index d67d28b40c..70d153a1d5 100644
--- a/sysdeps/aarch64/dl-link.sym
+++ b/sysdeps/aarch64/dl-link.sym
@@ -7,9 +7,9 @@ DL_SIZEOF_RG sizeof(struct La_aarch64_regs)
DL_SIZEOF_RV sizeof(struct La_aarch64_retval)
DL_OFFSET_RG_X0 offsetof(struct La_aarch64_regs, lr_xreg)
-DL_OFFSET_RG_D0 offsetof(struct La_aarch64_regs, lr_dreg)
+DL_OFFSET_RG_V0 offsetof(struct La_aarch64_regs, lr_vreg)
DL_OFFSET_RG_SP offsetof(struct La_aarch64_regs, lr_sp)
DL_OFFSET_RG_LR offsetof(struct La_aarch64_regs, lr_lr)
DL_OFFSET_RV_X0 offsetof(struct La_aarch64_retval, lrv_xreg)
-DL_OFFSET_RV_D0 offsetof(struct La_aarch64_retval, lrv_dreg)
+DL_OFFSET_RV_V0 offsetof(struct La_aarch64_retval, lrv_vreg)
diff --git a/sysdeps/aarch64/dl-trampoline.S b/sysdeps/aarch64/dl-trampoline.S
index 794876fffa..c91341e8fc 100644
--- a/sysdeps/aarch64/dl-trampoline.S
+++ b/sysdeps/aarch64/dl-trampoline.S
@@ -46,6 +46,8 @@ _dl_runtime_resolve:
cfi_rel_offset (lr, 8)
/* Save arguments. */
+ /* Note: Saving x9 is not required by the ABI but the assembler requires
the immediate offset in operand 3 to be a multiple of 16. */
stp x8, x9, [sp, #-(80+8*16)]!
cfi_adjust_cfa_offset (80+8*16)
cfi_rel_offset (x8, 0)
@@ -183,19 +185,23 @@ _dl_runtime_profile:
stp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
cfi_rel_offset (x6, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 0)
cfi_rel_offset (x7, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 8)
-
- stp d0, d1, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
- cfi_rel_offset (d0, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0)
- cfi_rel_offset (d1, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0 + 8)
- stp d2, d3, [X29, #OFFSET_RG+ DL_OFFSET_RG_D0 + 16*1]
- cfi_rel_offset (d2, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 0)
- cfi_rel_offset (d3, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 8)
- stp d4, d5, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
- cfi_rel_offset (d4, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 0)
- cfi_rel_offset (d5, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 8)
- stp d6, d7, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
- cfi_rel_offset (d6, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 0)
- cfi_rel_offset (d7, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 8)
+ str x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0]
+ cfi_rel_offset (x8, OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0)
+ /* Note X9 is in the stack frame for alignment but it is not
+ required to be saved by the ABI */
+
+ stp q0, q1, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
+ cfi_rel_offset (q0, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0)
+ cfi_rel_offset (q1, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0 + 16)
+ stp q2, q3, [X29, #OFFSET_RG+ DL_OFFSET_RG_V0 + 32*1]
+ cfi_rel_offset (q2, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 0)
+ cfi_rel_offset (q3, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 16)
+ stp q4, q5, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
+ cfi_rel_offset (q4, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 0)
+ cfi_rel_offset (q5, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 16)
+ stp q6, q7, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
+ cfi_rel_offset (q6, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 0)
+ cfi_rel_offset (q7, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 16)
add x0, x29, #SF_SIZE + 16
ldr x1, [x29, #OFFSET_LR]
@@ -234,10 +240,11 @@ _dl_runtime_profile:
ldp x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
ldp x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
ldp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
- ldp d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
- ldp d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
- ldp d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
- ldp d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
+ ldr x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
+ ldp q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
+ ldp q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
+ ldp q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
+ ldp q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
cfi_def_cfa_register (sp)
ldp x29, x30, [x29, #0]
@@ -280,14 +287,21 @@ _dl_runtime_profile:
ldp x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
ldp x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
ldp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
- ldp d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
- ldp d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
- ldp d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
- ldp d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
+ ldr x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
+ ldp q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
+ ldp q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
+ ldp q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
+ ldp q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
blr ip0
- stp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
- stp d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
- stp d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
+ stp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
+ stp x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
+ stp x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
+ stp x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
+ str x8, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*4]
+ stp q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
+ stp q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
+ stp q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
+ stp q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
/* Setup call to pltexit */
ldp x0, x1, [x29, #OFFSET_SAVED_CALL_X0]
@@ -295,9 +309,16 @@ _dl_runtime_profile:
add x3, x29, #OFFSET_RV
bl _dl_call_pltexit
- ldp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
- ldp d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
- ldp d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
+ ldp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
+ ldp x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
+ ldp x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
+ ldp x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
+ ldr x8, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*4]
+ ldp q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
+ ldp q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
+ ldp q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
+ ldp q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
+
/* LR from within La_aarch64_reg */
ldr lr, [x29, #OFFSET_RG + DL_OFFSET_RG_LR]
cfi_restore(lr)
--
2.26.2
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 1:16 [PATCH] Fix runtime linker auditing on aarch64 Ben Woodard
@ 2020-09-23 12:22 ` Florian Weimer
2020-09-23 12:48 ` Szabolcs Nagy
2020-09-23 12:34 ` Szabolcs Nagy
2020-09-24 0:30 ` Carlos O'Donell
2 siblings, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2020-09-23 12:22 UTC (permalink / raw)
To: Ben Woodard via Libc-alpha
* Ben Woodard via Libc-alpha:
> To fix this
> * The La_aarch64_regs structure was expanded to include x8 and the full
> sized NEON V registers that are required to be preserved by the ABI.
Off-list, you said that the audit interface was completely broken on
AArch64. But it seems to be working enough for sotruss. So I do wonder
if we have to do a proper ABI transition here after all (bumping
LAV_CURRENT and all the consequences of that).
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 1:16 [PATCH] Fix runtime linker auditing on aarch64 Ben Woodard
2020-09-23 12:22 ` Florian Weimer
@ 2020-09-23 12:34 ` Szabolcs Nagy
2020-09-23 15:10 ` Ben Coyote Woodard
2020-09-23 18:00 ` Ben Coyote Woodard
2020-09-24 0:30 ` Carlos O'Donell
2 siblings, 2 replies; 15+ messages in thread
From: Szabolcs Nagy @ 2020-09-23 12:34 UTC (permalink / raw)
To: Ben Woodard; +Cc: libc-alpha
The 09/22/2020 18:16, Ben Woodard via Libc-alpha wrote:
> The dynamic linker's auditing was not working on aarch64. See PR#26643
> https://sourceware.org/bugzilla/show_bug.cgi?id=26643
>
> There were two distinct problems:
> * _dl_runtime_resolve was not preserving x8 the indirect result location
> register.
> * The NEON Q registers pushed onto the stack by _dl_runtime_resolve
> were twice the size of D registers extracted from the stack frame by
> _dl_runtime_profile.
>
> To fix this
> * The La_aarch64_regs structure was expanded to include x8 and the full
> sized NEON V registers that are required to be preserved by the ABI.
> * _dl_runtime_profile needed to extract registers saved by
> _dl_runtime_resolve and put them into the new correctly sized
> La_aarch64_regs structure.
> * The return value structure La_aarch64_retval also didn't have the correctly
> sized NEON V registers.
>
> As a couple of additional cleanups
> * The names of the NEON registers saved within the La_aarch64_regs and the
> La_aarch64_retval structures referred to the old D registers which were
> doubles. Now the registers are quads and are called V for vector registers.
> So the name of the field in the structure and the names of the offsets
> within that structure were named to use the more modern names.
> * The ABI specification says that r0-r7 + r8 the indirect result location
> register as well as the NEON v0-v7 registers can be used to return values
> from a function. Therefore, I added those to the La_aarch64_retval
> structure so that it also correctly matches the ABI.
>
> An additional problem not addressed by this patch is what to do about the
> changes to the aarch64 ABI needed to support SVE. A discussion about what to
> do about that was begun on libc-alpha here:
> https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html
SVE calls are special (marked as STO_AARCH64_VARIANT_PCS
in the dynamic symbol table) and currently load time
bound (like BIND_NOW) so i think they don't go through
the PLT0 sequence that calls the profile entry in ld.so
and thus audit hooks are not called for them.
this is probably not what LD_AUDIT users would want
(do they care about hooking into sve calls?), but
VARIANT_PCS essentially allows any call convention,
so all registers have to be saved and restored if
such call enters the dynamic linker which is a problem
if register state may be extended in the future
(although probably ldaudit is special enough that its
users can update glibc if they care about new regs?).
(one way to expose variant pcs calls to audit hooks
is to detect the symbol type in the asm entry point
and then call a different hook, but this sounds
sufficiently ugly that i think we would then prefer
to update the elf abi with a second plt entry sequence
for variant pcs calls that can just use a different
entry into ld.so and new linkers would generate that
for dsos with variant pcs symbols.)
> ---
> sysdeps/aarch64/bits/link.h | 17 ++++----
> sysdeps/aarch64/dl-link.sym | 4 +-
> sysdeps/aarch64/dl-trampoline.S | 75 +++++++++++++++++++++------------
> 3 files changed, 59 insertions(+), 37 deletions(-)
>
> diff --git a/sysdeps/aarch64/bits/link.h b/sysdeps/aarch64/bits/link.h
> index 0c54e6ea7b..2b43ace57c 100644
> --- a/sysdeps/aarch64/bits/link.h
> +++ b/sysdeps/aarch64/bits/link.h
> @@ -23,19 +23,20 @@
> /* Registers for entry into PLT on AArch64. */
> typedef struct La_aarch64_regs
> {
> - uint64_t lr_xreg[8];
> - uint64_t lr_dreg[8];
> - uint64_t lr_sp;
> - uint64_t lr_lr;
> + uint64_t lr_xreg[9];
> + __uint128_t lr_vreg[8];
> + uint64_t lr_sp;
> + uint64_t lr_lr;
> } La_aarch64_regs;
ok.
changing abi is fine with me: old abi was
unusably broken, if audit modules use some
versioning that's even better.
>
> /* Return values for calls from PLT on AArch64. */
> typedef struct La_aarch64_retval
> {
> - /* Up to two integer registers can be used for a return value. */
> - uint64_t lrv_xreg[2];
> - /* Up to four D registers can be used for a return value. */
> - uint64_t lrv_dreg[4];
> + /* Up to eight integer registers and the indirect result location register
> + can be used for a return value. */
> + uint64_t lrv_xreg[9];
x8 is not preserved so recording it at function exit
is not useful. (on entry it points to where results
are stored but on exit it can be clobbered)
> + /* Up to eight V registers can be used for a return value. */
> + __uint128_t lrv_vreg[8];
>
> } La_aarch64_retval;
> __BEGIN_DECLS
note: i don't like to use non-standard types in
public apis (like __uint128_t), but we already
made this mistake in the linux sigcontext, so this
is probably ok.
(my preference normally is to use a standard type
e.g. long double or char[] with alignment attr,
but in practice __uint128_t is probably easier to
deal with)
> diff --git a/sysdeps/aarch64/dl-link.sym b/sysdeps/aarch64/dl-link.sym
> index d67d28b40c..70d153a1d5 100644
> --- a/sysdeps/aarch64/dl-link.sym
> +++ b/sysdeps/aarch64/dl-link.sym
> @@ -7,9 +7,9 @@ DL_SIZEOF_RG sizeof(struct La_aarch64_regs)
> DL_SIZEOF_RV sizeof(struct La_aarch64_retval)
>
> DL_OFFSET_RG_X0 offsetof(struct La_aarch64_regs, lr_xreg)
> -DL_OFFSET_RG_D0 offsetof(struct La_aarch64_regs, lr_dreg)
> +DL_OFFSET_RG_V0 offsetof(struct La_aarch64_regs, lr_vreg)
> DL_OFFSET_RG_SP offsetof(struct La_aarch64_regs, lr_sp)
> DL_OFFSET_RG_LR offsetof(struct La_aarch64_regs, lr_lr)
>
> DL_OFFSET_RV_X0 offsetof(struct La_aarch64_retval, lrv_xreg)
> -DL_OFFSET_RV_D0 offsetof(struct La_aarch64_retval, lrv_dreg)
> +DL_OFFSET_RV_V0 offsetof(struct La_aarch64_retval, lrv_vreg)
ok.
> diff --git a/sysdeps/aarch64/dl-trampoline.S b/sysdeps/aarch64/dl-trampoline.S
> index 794876fffa..c91341e8fc 100644
> --- a/sysdeps/aarch64/dl-trampoline.S
> +++ b/sysdeps/aarch64/dl-trampoline.S
> @@ -46,6 +46,8 @@ _dl_runtime_resolve:
> cfi_rel_offset (lr, 8)
>
> /* Save arguments. */
> + /* Note: Saving x9 is not required by the ABI but the assembler requires
> + the immediate offset in operand 3 to be a multiple of 16. */
> stp x8, x9, [sp, #-(80+8*16)]!
> cfi_adjust_cfa_offset (80+8*16)
> cfi_rel_offset (x8, 0)
ok.
> @@ -183,19 +185,23 @@ _dl_runtime_profile:
there is a comment at the entry point with offsets
which i think is no longer valid (i think it's
ok to remove the offsets just document the order
of things on the stack)
> stp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
> cfi_rel_offset (x6, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 0)
> cfi_rel_offset (x7, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 8)
> -
> - stp d0, d1, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
> - cfi_rel_offset (d0, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0)
> - cfi_rel_offset (d1, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0 + 8)
> - stp d2, d3, [X29, #OFFSET_RG+ DL_OFFSET_RG_D0 + 16*1]
> - cfi_rel_offset (d2, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 0)
> - cfi_rel_offset (d3, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 8)
> - stp d4, d5, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
> - cfi_rel_offset (d4, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 0)
> - cfi_rel_offset (d5, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 8)
> - stp d6, d7, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
> - cfi_rel_offset (d6, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 0)
> - cfi_rel_offset (d7, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 8)
> + str x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0]
> + cfi_rel_offset (x8, OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0)
> + /* Note X9 is in the stack frame for alignment but it is not
> + required to be saved by the ABI */
> +
i dont see x9 here. you can just note that there is padding.
> + stp q0, q1, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
> + cfi_rel_offset (q0, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0)
> + cfi_rel_offset (q1, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0 + 16)
> + stp q2, q3, [X29, #OFFSET_RG+ DL_OFFSET_RG_V0 + 32*1]
> + cfi_rel_offset (q2, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 0)
> + cfi_rel_offset (q3, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 16)
> + stp q4, q5, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
> + cfi_rel_offset (q4, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 0)
> + cfi_rel_offset (q5, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 16)
> + stp q6, q7, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
> + cfi_rel_offset (q6, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 0)
> + cfi_rel_offset (q7, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 16)
>
> add x0, x29, #SF_SIZE + 16
> ldr x1, [x29, #OFFSET_LR]
> @@ -234,10 +240,11 @@ _dl_runtime_profile:
> ldp x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
> ldp x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
> ldp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
> - ldp d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
> - ldp d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
> - ldp d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
> - ldp d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
> + ldr x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
> + ldp q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 16*0]
> + ldp q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
> + ldp q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
> + ldp q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>
> cfi_def_cfa_register (sp)
> ldp x29, x30, [x29, #0]
> @@ -280,14 +287,21 @@ _dl_runtime_profile:
> ldp x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
> ldp x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
> ldp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
> - ldp d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
> - ldp d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
> - ldp d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
> - ldp d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
> + ldr x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
> + ldp q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
> + ldp q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
> + ldp q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
> + ldp q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
> blr ip0
ok.
> - stp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
> - stp d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
> - stp d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
> + stp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
> + stp x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
> + stp x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
> + stp x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
> + str x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
i think storing x8 is not useful.
> + stp q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
> + stp q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
> + stp q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
> + stp q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
>
> /* Setup call to pltexit */
> ldp x0, x1, [x29, #OFFSET_SAVED_CALL_X0]
> @@ -295,9 +309,16 @@ _dl_runtime_profile:
> add x3, x29, #OFFSET_RV
> bl _dl_call_pltexit
>
> - ldp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
> - ldp d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
> - ldp d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
> + ldp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
> + ldp x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
> + ldp x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
> + ldp x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
> + ldr x8, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*4]
> + ldp q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
> + ldp q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
> + ldp q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
> + ldp q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
> +
> /* LR from within La_aarch64_reg */
> ldr lr, [x29, #OFFSET_RG + DL_OFFSET_RG_LR]
> cfi_restore(lr)
thanks for the patch.
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 12:22 ` Florian Weimer
@ 2020-09-23 12:48 ` Szabolcs Nagy
2020-09-23 12:56 ` Florian Weimer
0 siblings, 1 reply; 15+ messages in thread
From: Szabolcs Nagy @ 2020-09-23 12:48 UTC (permalink / raw)
To: Florian Weimer; +Cc: Ben Woodard via Libc-alpha
The 09/23/2020 14:22, Florian Weimer via Libc-alpha wrote:
> * Ben Woodard via Libc-alpha:
>
> > To fix this
> > * The La_aarch64_regs structure was expanded to include x8 and the full
> > sized NEON V registers that are required to be preserved by the ABI.
>
> Off-list, you said that the audit interface was completely broken on
> AArch64. But it seems to be working enough for sotruss. So I do wonder
> if we have to do a proper ABI transition here after all (bumping
> LAV_CURRENT and all the consequences of that).
i think plt hooks currently don't work for functions
that take neon vector arguments because the save/restore
logic clobbers the top bits (but such extern calls are
not common since they need to use non-portable types)
but i agree if it's not too intrusive to bump the audit
abi then we should do so and then the incompatibility
can be detected at least.
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 12:48 ` Szabolcs Nagy
@ 2020-09-23 12:56 ` Florian Weimer
2020-09-23 13:18 ` Szabolcs Nagy
0 siblings, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2020-09-23 12:56 UTC (permalink / raw)
To: Szabolcs Nagy; +Cc: Ben Woodard via Libc-alpha
* Szabolcs Nagy:
> The 09/23/2020 14:22, Florian Weimer via Libc-alpha wrote:
>> * Ben Woodard via Libc-alpha:
>>
>> > To fix this
>> > * The La_aarch64_regs structure was expanded to include x8 and the full
>> > sized NEON V registers that are required to be preserved by the ABI.
>>
>> Off-list, you said that the audit interface was completely broken on
>> AArch64. But it seems to be working enough for sotruss. So I do wonder
>> if we have to do a proper ABI transition here after all (bumping
>> LAV_CURRENT and all the consequences of that).
>
> i think plt hooks currently don't work for functions
> that take neon vector arguments because the save/restore
> logic clobbers the top bits (but such extern calls are
> not common since they need to use non-portable types)
>
> but i agree if it's not too intrusive to bump the audit
> abi then we should do so and then the incompatibility
> can be detected at least.
The other question I had is whether we can do this once and make sure that the
CPU state is represented in such a way that we can efficiently save and
load it for later SVE support, so that we do not have to create two
copies (the architecture state and the audit representation), or bump
LAV_CURRENT for a new CPU.
(I'm aware that AArch64 would be pioneering audit support for vector
calling conventions.)
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 12:56 ` Florian Weimer
@ 2020-09-23 13:18 ` Szabolcs Nagy
2020-09-24 11:10 ` Florian Weimer
0 siblings, 1 reply; 15+ messages in thread
From: Szabolcs Nagy @ 2020-09-23 13:18 UTC (permalink / raw)
To: Florian Weimer; +Cc: Ben Woodard via Libc-alpha
The 09/23/2020 14:56, Florian Weimer wrote:
> * Szabolcs Nagy:
> > The 09/23/2020 14:22, Florian Weimer via Libc-alpha wrote:
> >> * Ben Woodard via Libc-alpha:
> >>
> >> > To fix this
> >> > * The La_aarch64_regs structure was expanded to include x8 and the full
> >> > sized NEON V registers that are required to be preserved by the ABI.
> >>
> >> Off-list, you said that the audit interface was completely broken on
> >> AArch64. But it seems to be working enough for sotruss. So I do wonder
> >> if we have to do a proper ABI transition here after all (bumping
> >> LAV_CURRENT and all the consequences of that).
> >
> > i think plt hooks currently don't work for functions
> > that take neon vector arguments because the save/restore
> > logic clobbers the top bits (but such extern calls are
> > not common since they need to use non-portable types)
> >
> > but i agree if it's not too intrusive to bump the audit
> > abi then we should do so and then the incompatibility
> > can be detected at least.
>
> The other question I had if we can do this once and make sure that the
> CPU state is represented in such a way that we can efficiently save and
> load it for later SVE support, so that we do not have to create two
> copies (the architecture state and the audit representation), or bump
> LAV_CURRENT for a new CPU.
>
> (I'm aware that AArch64 would be pioneering audit support for vector
> calling conventions.)
we don't have a way to do this in the architecture
(i.e. reg state save/restore operations)
we can have something in the kernel (which needs
to know about the supported registers) but i
assume using a syscall for this is too much overhead.
(maybe in vdso? sounds ugly if we need it before
vdso is set up)
(i can tell the architects about this requirement
in case they invent new register state.)
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 12:34 ` Szabolcs Nagy
@ 2020-09-23 15:10 ` Ben Coyote Woodard
2020-09-23 16:19 ` Szabolcs Nagy
2020-09-23 18:00 ` Ben Coyote Woodard
1 sibling, 1 reply; 15+ messages in thread
From: Ben Coyote Woodard @ 2020-09-23 15:10 UTC (permalink / raw)
To: Szabolcs Nagy; +Cc: libc-alpha
On Wed, Sep 23, 2020 at 13:34, Szabolcs Nagy <szabolcs.nagy@arm.com>
wrote:
> The 09/22/2020 18:16, Ben Woodard via Libc-alpha wrote:
>> The dynamic linker's auditing was not working on aarch64. See
>> PR#26643
>> <https://sourceware.org/bugzilla/show_bug.cgi?id=26643>
>>
>> There were two distinct problems:
>> * _dl_runtime_resolve was not preserving x8 the indirect result
>> location
>> register.
>> * The NEON Q registers pushed onto the stack by
>> _dl_runtime_resolve
>> were twice the size of D registers extracted from the stack
>> frame by
>> _dl_runtime_profile.
>>
>> To fix this
>> * The La_aarch64_regs structure was expanded to include x8 and
>> the full
>> sized NEON V registers that are required to be preserved by the
>> ABI.
>> * _dl_runtime_profile needed to extract registers saved by
>> _dl_runtime_resolve and put them into the new correctly sized
>> La_aarch64_regs structure.
>> * The return value structure La_aarch64_retval also didn't have
>> the correctly
>> sized NEON V registers.
>>
>> As a couple of additional cleanups
>> * The names of the NEON registers saved within the
>> La_aarch64_regs and the
>> La_aarch64_retval structures referred to the old D registers
>> which were
>> doubles. Now the registers are quads and are called V for
>> vector registers.
>> So the name of the field in the structure and the names of the
>> offsets
>> within that structure were named to use the more modern names.
>> * The ABI specification says that r0-r7 + r8 the indirect result
>> location
>> register as well as the NEON v0-v7 registers can be used to
>> return values
>> from a function. Therefore, I added those to the
>> La_aarch64_retval
>> structure so that it also correctly matches the ABI.
>>
>> An additional problem not addressed by this patch is what to do
>> about the
>> changes to the aarch64 ABI needed to support SVE. A discussion
>> about what to
>> do about that was begun on libc-alpha here:
>>
>> <https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html>
>
> SVE calls are special (marked as STO_AARCH64_VARIANT_PCS
> in the dynamic symbol table) and currently load time
> bound (like BIND_NOW) so i think they don't go through
> the PLT0 sequence that calls the profile entry in ld.so
> and thus audit hooks are not called for them.
>
> this is probably not what LD_AUDIT users would want
> (do they care about hooking into sve calls?), but
> VARIANT_PCS essentially allows any call convention,
> so all registers have to be saved and restored if
> such call enters the dynamic linker which is a problem
> if register state may be extended in the future
> (although probably ldaudit is special enough that its
> users can update glibc if they care about new regs?).
>
> (one way to expose variant pcs calls to audit hooks
> is to detect the symbol type in the asm entry point
> and then call a different hook, but this sounds
> sufficiently ugly that i think we would then prefer
> to update the elf abi with a second plt entry sequence
> for variant pcs calls that can just use a different
> entry into ld.so and new linkers would generate that
> for dsos with variant pcs symbols.)
I'll deal with the SVE stuff in a different email.
>
>> ---
>> sysdeps/aarch64/bits/link.h | 17 ++++----
>> sysdeps/aarch64/dl-link.sym | 4 +-
>> sysdeps/aarch64/dl-trampoline.S | 75
>> +++++++++++++++++++++------------
>> 3 files changed, 59 insertions(+), 37 deletions(-)
>>
>> diff --git a/sysdeps/aarch64/bits/link.h
>> b/sysdeps/aarch64/bits/link.h
>> index 0c54e6ea7b..2b43ace57c 100644
>> --- a/sysdeps/aarch64/bits/link.h
>> +++ b/sysdeps/aarch64/bits/link.h
>> @@ -23,19 +23,20 @@
>> /* Registers for entry into PLT on AArch64. */
>> typedef struct La_aarch64_regs
>> {
>> - uint64_t lr_xreg[8];
>> - uint64_t lr_dreg[8];
>> - uint64_t lr_sp;
>> - uint64_t lr_lr;
>> + uint64_t lr_xreg[9];
>> + __uint128_t lr_vreg[8];
>> + uint64_t lr_sp;
>> + uint64_t lr_lr;
>> } La_aarch64_regs;
>
> ok.
>
> changing abi is fine with me: old abi was
> unusably broken, if audit modules use some
> versioning that's even better.
>
>>
>> /* Return values for calls from PLT on AArch64. */
>> typedef struct La_aarch64_retval
>> {
>> - /* Up to two integer registers can be used for a return value.
>> */
>> - uint64_t lrv_xreg[2];
>> - /* Up to four D registers can be used for a return value. */
>> - uint64_t lrv_dreg[4];
>> + /* Up to eight integer registers and the indirect result
>> location register
>> + can be used for a return value. */
>> + uint64_t lrv_xreg[9];
>
> x8 is not preserved so recording it at function exit
> is not useful. (on entry it points to where results
> are stored but on exit it can be clobbered)
OK, that was not clear to me reading the AAPCS. Do you want to ping
your colleagues, the tech writers over at Arm, and see if they can
tighten up the language a bit?
I dropped it back down to 8.
>
>> + /* Up to eight V registers can be used for a return value. */
>> + __uint128_t lrv_vreg[8];
>>
>> } La_aarch64_retval;
>> __BEGIN_DECLS
>
> note: i don't like to use non-standard types in
> public apis (like __uint128_t), but we already
> made this mistake in the linux sigcontext, so this
> is probably ok.
>
> (my preference normally is to use a standard type
> e.g. long double or char[] with alignment attr,
> but in practice __uint128_t is probably easier to
> deal with)
>
I kind of prefer "long double" here as well; it is, after all, what it
likely is. I'm not really attached to __uint128_t. The previous version
of the interface had uint64_t, so when making the registers the correct
size I changed it to uint128_t, but that didn't compile, and when I
grepped the source I found __uint128_t. I didn't put a lot of thought
into that decision.
>
>> diff --git a/sysdeps/aarch64/dl-link.sym
>> b/sysdeps/aarch64/dl-link.sym
>> index d67d28b40c..70d153a1d5 100644
>> --- a/sysdeps/aarch64/dl-link.sym
>> +++ b/sysdeps/aarch64/dl-link.sym
>> @@ -7,9 +7,9 @@ DL_SIZEOF_RG sizeof(struct La_aarch64_regs)
>> DL_SIZEOF_RV sizeof(struct La_aarch64_retval)
>>
>> DL_OFFSET_RG_X0 offsetof(struct La_aarch64_regs, lr_xreg)
>> -DL_OFFSET_RG_D0 offsetof(struct La_aarch64_regs, lr_dreg)
>> +DL_OFFSET_RG_V0 offsetof(struct La_aarch64_regs, lr_vreg)
>> DL_OFFSET_RG_SP offsetof(struct La_aarch64_regs, lr_sp)
>> DL_OFFSET_RG_LR offsetof(struct La_aarch64_regs, lr_lr)
>>
>> DL_OFFSET_RV_X0 offsetof(struct La_aarch64_retval, lrv_xreg)
>> -DL_OFFSET_RV_D0 offsetof(struct La_aarch64_retval, lrv_dreg)
>> +DL_OFFSET_RV_V0 offsetof(struct La_aarch64_retval, lrv_vreg)
>
> ok.
>
>> diff --git a/sysdeps/aarch64/dl-trampoline.S
>> b/sysdeps/aarch64/dl-trampoline.S
>> index 794876fffa..c91341e8fc 100644
>> --- a/sysdeps/aarch64/dl-trampoline.S
>> +++ b/sysdeps/aarch64/dl-trampoline.S
>> @@ -46,6 +46,8 @@ _dl_runtime_resolve:
>> cfi_rel_offset (lr, 8)
>>
>> /* Save arguments. */
>> + /* Note: Saving x9 is not required by the ABI but the assember
>> requires
>> + the immediate values of operand 3 to be a multiple of 16 */
>> stp x8, x9, [sp, #-(80+8*16)]!
>> cfi_adjust_cfa_offset (80+8*16)
>> cfi_rel_offset (x8, 0)
>
> ok.
>
>> @@ -183,19 +185,23 @@ _dl_runtime_profile:
>
> there is a comment at the entry point with offsets
> which i think is no longer valid (i think it's
> ok to remove the offsets just document the order
> of things on the stack)
Oops, I missed that part. Fixed.
In an earlier version of the patch, I forgot to update the retval
structure, and I remember looking at that comment and thinking, "I
didn't change retval, so everything should be the same." But then I
did change the retval structure.
>
>> stp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>> cfi_rel_offset (x6, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 0)
>> cfi_rel_offset (x7, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 8)
>> -
>> - stp d0, d1, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>> - cfi_rel_offset (d0, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0)
>> - cfi_rel_offset (d1, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0 + 8)
>> - stp d2, d3, [X29, #OFFSET_RG+ DL_OFFSET_RG_D0 + 16*1]
>> - cfi_rel_offset (d2, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 0)
>> - cfi_rel_offset (d3, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 8)
>> - stp d4, d5, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>> - cfi_rel_offset (d4, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 0)
>> - cfi_rel_offset (d5, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 8)
>> - stp d6, d7, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>> - cfi_rel_offset (d6, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 0)
>> - cfi_rel_offset (d7, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 8)
>> + str x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0]
>> + cfi_rel_offset (x8, OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0)
>> + /* Note X9 is in the stack frame for alignment but it is not
>> + required to be saved by the ABI */
>> +
>
> i dont see x9 here. you can just note that there is padding.
Fixed.
>
>> + stp q0, q1, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
>> + cfi_rel_offset (q0, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0)
>> + cfi_rel_offset (q1, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0 + 16)
>> + stp q2, q3, [X29, #OFFSET_RG+ DL_OFFSET_RG_V0 + 32*1]
>> + cfi_rel_offset (q2, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 0)
>> + cfi_rel_offset (q3, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 16)
>> + stp q4, q5, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>> + cfi_rel_offset (q4, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 0)
>> + cfi_rel_offset (q5, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 16)
>> + stp q6, q7, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>> + cfi_rel_offset (q6, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 0)
>> + cfi_rel_offset (q7, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 16)
>>
>> add x0, x29, #SF_SIZE + 16
>> ldr x1, [x29, #OFFSET_LR]
>> @@ -234,10 +240,11 @@ _dl_runtime_profile:
>> ldp x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
>> ldp x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
>> ldp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>> - ldp d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>> - ldp d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
>> - ldp d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>> - ldp d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>> + ldr x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>> + ldp q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 16*0]
>> + ldp q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
>> + ldp q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>> + ldp q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>>
>> cfi_def_cfa_register (sp)
>> ldp x29, x30, [x29, #0]
>> @@ -280,14 +287,21 @@ _dl_runtime_profile:
>> ldp x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
>> ldp x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
>> ldp x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>> - ldp d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>> - ldp d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
>> - ldp d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>> - ldp d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>> + ldr x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>> + ldp q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
>> + ldp q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
>> + ldp q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>> + ldp q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>> blr ip0
>
> ok.
>
>> - stp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
>> - stp d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
>> - stp d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
>> + stp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
>> + stp x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
>> + stp x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
>> + stp x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
>> + str x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>
> i think storing x8 is not useful.
Fixed.
>
>> + stp q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
>> + stp q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
>> + stp q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
>> + stp q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
>>
>> /* Setup call to pltexit */
>> ldp x0, x1, [x29, #OFFSET_SAVED_CALL_X0]
>> @@ -295,9 +309,16 @@ _dl_runtime_profile:
>> add x3, x29, #OFFSET_RV
>> bl _dl_call_pltexit
>>
>> - ldp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
>> - ldp d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
>> - ldp d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
>> + ldp x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
>> + ldp x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
>> + ldp x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
>> + ldp x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
>> + ldr x8, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*4]
>> + ldp q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
>> + ldp q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
>> + ldp q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
>> + ldp q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
>> +
>> /* LR from within La_aarch64_reg */
>> ldr lr, [x29, #OFFSET_RG + DL_OFFSET_RG_LR]
>> cfi_restore(lr)
>
> thanks for the patch.
You're welcome. Stand by for V2 of the patch.
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 15:10 ` Ben Coyote Woodard
@ 2020-09-23 16:19 ` Szabolcs Nagy
2020-09-23 18:12 ` Ben Coyote Woodard
0 siblings, 1 reply; 15+ messages in thread
From: Szabolcs Nagy @ 2020-09-23 16:19 UTC (permalink / raw)
To: Ben Coyote Woodard; +Cc: libc-alpha
The 09/23/2020 08:10, Ben Coyote Woodard wrote:
> On Wed, Sep 23, 2020 at 13:34, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> > The 09/22/2020 18:16, Ben Woodard via Libc-alpha wrote:
> > > /* Return values for calls from PLT on AArch64. */
> > > typedef struct La_aarch64_retval
> > > {
> > > - /* Up to two integer registers can be used for a return value.
> > > */
> > > - uint64_t lrv_xreg[2];
> > > - /* Up to four D registers can be used for a return value. */
> > > - uint64_t lrv_dreg[4];
> > > + /* Up to eight integer registers and the indirect result
> > > location register
> > > + can be used for a return value. */
> > > + uint64_t lrv_xreg[9];
> >
> > x8 is not preserved so recording it at function exit
> > is not useful. (on entry it points to where results
> > are stored but on exit it can be clobbered)
>
> OK that was not clear to me reading the AAPCS. Do you want to ping you're
> colleagues the tech writers over at arm and see if they can tighten up the
> language a bit.
aapcs is now openly developed (on github)
so you can submit bug reports easily ;)
in this case section 6.1.1 does not say
if x8 is preserved or not, but 6.5 is quite
explicit i think:
https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#result-return
> > > + /* Up to eight V registers can be used for a return value. */
> > > + __uint128_t lrv_vreg[8];
> > >
> > > } La_aarch64_retval;
> > > __BEGIN_DECLS
> >
> > note: i don't like to use non-standard types in
> > public apis (like __uint128_t), but we already
> > made this mistake in the linux sigcontext, so this
> > is probably ok.
> >
> > (my preference normally is to use a standard type
> > e.g. long double or char[] with alignment attr,
> > but in practice __uint128_t is probably easier to
> > deal with)
> >
>
> I kind of prefer "long double" here as well. It is after all what it likely
> is. I'm not really attached to __uint128_t; the previous version of the
> interface had uint64_t and so when making the registers the correct size I
> changed it to uint128_t but that didn't compile and then when I grepped the
> source I found __uint128_t. It wasn't like I put a lot of thought into that
> decision.
hm, i think the common case is to inspect float
or double arguments in the v regs, so the
__uint128_t is probably easier for the user
to deal with. (and sigcontext etc already
uses that for save/restore of the registers
so it is better for interop and least surprise)
it's unfortunate that there is no standard
uint128_t type. (i think the arm way would
be to use "uint8x16_t" or similar type from
arm_neon.h but in practice that's harder
to use)
so keep this as is.
(but this reminds me that the current hooks
are broken for long double functions, not
just for functions taking neon vector args)
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 12:34 ` Szabolcs Nagy
2020-09-23 15:10 ` Ben Coyote Woodard
@ 2020-09-23 18:00 ` Ben Coyote Woodard
1 sibling, 0 replies; 15+ messages in thread
From: Ben Coyote Woodard @ 2020-09-23 18:00 UTC (permalink / raw)
To: Szabolcs Nagy; +Cc: libc-alpha
On Wed, Sep 23, 2020 at 13:34, Szabolcs Nagy <szabolcs.nagy@arm.com>
wrote:
>>
>> An additional problem not addressed by this patch is what to do
>> about the
>> changes to the aarch64 ABI needed to support SVE. A discussion
>> about what to
>> do about that was begun on libc-alpha here:
>>
>> <https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html>
>
> SVE calls are special (marked as STO_AARCH64_VARIANT_PCS
> in the dynamic symbol table) and currently load time
> bound (like BIND_NOW) so i think they don't go through
> the PLT0 sequence that calls the profile entry in ld.so
> and thus audit hooks are not called for them.
>
> this is probably not what LD_AUDIT users would want
> (do they care about hooking into sve calls?), but
Most certainly!
The two big users of LD_AUDIT that I often work with are spindle
(https://computing.llnl.gov/projects/spindle) and hpctoolkit
(http://hpctoolkit.org/). I was talking to a colleague over at
Cray/HPE, where they are selling the Apollo 80, which uses the A64FX
processor with SVE, and he immediately mentioned spindle and said he
would talk to the Cray team that handles spindle. So this problem with
SVE may not have been reported by users yet, but it is just barely over
the horizon.
> VARIANT_PCS essentially allows any call convention,
That seems like a really odd choice to me. You folks over at Arm did
publish an AAPCS for processors with SVE, defining which registers need
to be saved versus which can be clobbered:
https://developer.arm.com/documentation/100986/0000/?search=5eec7447e24a5e02d07b2774
I've been trying to figure out how to handle this. I needed to know
what the structure of La_aarch64_{regs,retval} was before I really dug
deeply into it. If we can just define how you want the structures
presented, I will update my auditing patch to use that layout and then
we can figure out how to fill it in later.
The pieces that I have but hadn't quite put together are:
- There is HWCAP_SVE in sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h.
As I understand it so far, this is a capability flag that the dynamic
linker could use when picking which version of a library to load. If it
were propagated into the dynamic linker, we could tell, at least at a
library level, whether any functions in a library use SVE. If none do,
then we wouldn't have to save the SVE registers.
- I haven't checked whether libraries do in fact set this HWCAP flag.
If they don't, then that is probably a binutils issue.
- I haven't yet figured out how to find and make use of this
library flag where I need it, in _dl_runtime_resolve and
_dl_runtime_profile. If they are not relatively easy to find, I
considered pushing the processor's hwcap and the library's hwcap onto
the stack from the C code that calls those functions.
- If the library's hwcap flags are not available to an audit
library, we need to define a way to make them available, because they
are needed for correct library selection. Consider, for example, the
case where an audit library is trying to interpose its own selection of
a library over the normal selection made by the runtime linker. It
needs to be able to evaluate whether the hardware has all the
capabilities that the library requires.
- Similarly, if a library does not make use of any of the NEON
registers, then we may be able to get away with saving just the X regs
and skipping the V regs. Those are less of a problem than the
potentially 2048-bit Z regs, but it all adds up.
However, all of that happens at the library level. Since the functions
are marked with a different type in the dynamic section, it would be
even better to make the decision about what needs to be saved at a
per-function level.
I noted that I can potentially use ID_AA64PFR0_EL1[35:32] to see
whether the processor has SVE, then read ZCR_EL1[3:0] to figure out the
current length of the SVE registers.
Then I would use ld1 and st1 to copy registers of that length into a
char[] or something.
Admittedly, that is all less than half baked, which is why I wanted to
agree on the La_aarch64_{regs,retval} structures first.
> so all registers have to be saved and restored if
> such call enters the dynamic linker which is a problem
> if register state may be extended in the future
> (although probably ldaudit is special enough that its
> users can update glibc if they care about new regs?).
>
> (one way to expose variant pcs calls to audit hooks
> is to detect the symbol type in the asm entry point
> and then call a different hook, but this sounds
> sufficiently ugly that i think we would then prefer
> to update the elf abi with a second plt entry sequence
> for variant pcs calls that can just use a different
> entry into ld.so and new linkers would generate that
> for dsos with variant pcs symbols.)
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 16:19 ` Szabolcs Nagy
@ 2020-09-23 18:12 ` Ben Coyote Woodard
0 siblings, 0 replies; 15+ messages in thread
From: Ben Coyote Woodard @ 2020-09-23 18:12 UTC (permalink / raw)
To: Szabolcs Nagy; +Cc: libc-alpha
On Wed, Sep 23, 2020 at 17:19, Szabolcs Nagy <szabolcs.nagy@arm.com>
wrote:
> The 09/23/2020 08:10, Ben Coyote Woodard wrote:
>> On Wed, Sep 23, 2020 at 13:34, Szabolcs Nagy <szabolcs.nagy@arm.com
>> <mailto:szabolcs.nagy@arm.com>> wrote:
>> > The 09/22/2020 18:16, Ben Woodard via Libc-alpha wrote:
>> > > /* Return values for calls from PLT on AArch64. */
>> > > typedef struct La_aarch64_retval
>> > > {
>> > > - /* Up to two integer registers can be used for a return
>> value.
>> > > */
>> > > - uint64_t lrv_xreg[2];
>> > > - /* Up to four D registers can be used for a return value.
>> */
>> > > - uint64_t lrv_dreg[4];
>> > > + /* Up to eight integer registers and the indirect result
>> > > location register
>> > > + can be used for a return value. */
>> > > + uint64_t lrv_xreg[9];
>> >
>> > x8 is not preserved so recording it at function exit
>> > is not useful. (on entry it points to where results
>> > are stored but on exit it can be clobbered)
>>
>> OK that was not clear to me reading the AAPCS. Do you want to ping
>> you're
>> colleagues the tech writers over at arm and see if they can tighten
>> up the
>> language a bit.
>
> aapcs is now openly developed (on github)
> so you can submit bug reports easily ;)
>
> in this case section 6.1.1 does not say
> if x8 is preserved or not, but 6.5 is quite
> explicit i think:
>
> <https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#result-return>
>
Good enough for me. I either didn't read that section or that last line
was not in the earlier version that I read.
>> > > + /* Up to eight V registers can be used for a return value.
>> */
>> > > + __uint128_t lrv_vreg[8];
>> > >
>> > > } La_aarch64_retval;
>> > > __BEGIN_DECLS
>> >
>> > note: i don't like to use non-standard types in
>> > public apis (like __uint128_t), but we already
>> > made this mistake in the linux sigcontext, so this
>> > is probably ok.
>> >
>> > (my preference normally is to use a standard type
>> > e.g. long double or char[] with alignment attr,
>> > but in practice __uint128_t is probably easier to
>> > deal with)
>> >
>>
>> I kind of prefer "long double" here as well. It is after all what
>> it likely
>> is. I'm not really attached to __uint128_t; the previous version of
>> the
>> interface had uint64_t and so when making the registers the correct
>> size I
>> changed it to uint128_t but that didn't compile and then when I
>> grepped the
>> source I found __uint128_t. It wasn't like I put a lot of thought
>> into that
>> decision.
>
> hm, i think the common case is to inspect float
> or double arguments in the v regs, so the
> __uint128_t is probably easier for the user
> to deal with. (and sigcontext etc already
> uses that for save/restore of the registers
> so it is better for interop and least surprise)
>
> it's unfortunate that there is no standard
> uint128_t type. (i think the arm way would
> be to use "uint8x16_t" or similar type from
> arm_neon.h but in practice that's harder
> to use)
>
> so keep this as is.
>
OK, I'll change it back in a V3 of my patch.
Since it is a trivial change, I will wait a bit and see if anyone else
has any feedback before I send it.
> (but this reminds me that the current hooks
> are broken for long double functions, not
> just for functions taking neon vector args)
Yep, and what triggered my looking into this code was a function that
used x8, the indirect result location register. The rest of the
problems I found by inspection.
-ben
>
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 1:16 [PATCH] Fix runtime linker auditing on aarch64 Ben Woodard
2020-09-23 12:22 ` Florian Weimer
2020-09-23 12:34 ` Szabolcs Nagy
@ 2020-09-24 0:30 ` Carlos O'Donell
2020-09-24 3:14 ` Ben Coyote Woodard
2 siblings, 1 reply; 15+ messages in thread
From: Carlos O'Donell @ 2020-09-24 0:30 UTC (permalink / raw)
To: Ben Woodard, libc-alpha, Szabolcs Nagy
On 9/22/20 9:16 PM, Ben Woodard via Libc-alpha wrote:
> The dynamic linker's auditing was not working on aarch64. See PR#26643
> https://sourceware.org/bugzilla/show_bug.cgi?id=26643
>
> There were two distinct problems:
> * _dl_runtime_resolve was not preserving x8 the indirect result location
> register.
> * The NEON Q registers pushed onto the stack by _dl_runtime_resolve
> were twice the size of D registers extracted from the stack frame by
> _dl_runtime_profile.
>
> To fix this
> * The La_aarch64_regs structure was expanded to include x8 and the full
> sized NEON V registers that are required to be preserved by the ABI.
> * _dl_runtime_profile needed to extract registers saved by
> _dl_runtime_resolve and put them into the new correctly sized
> La_aarch64_regs structure.
> * The return value structure La_aarch64_retval also didn't have the correctly
> sized NEON V registers.
>
> As a couple of additional cleanups
> * The names of the NEON registers saved within the La_aarch64_regs and the
> La_aarch_retval structures referred to the old D registers which were
> doubles. Now the registers are quads and are called V for vector registers.
> So the name of the field in the structure and the names of the offsets
> within that structure were named to use the more modern names.
> * The ABI specification says that r0-r7 + r8 the indirect result location
> register as well as the NEON v0-v7 registers can be used to return values
> from a function. Therefore, I added those to the La_aarch64_retval
> structure so that it also correctly matches the ABI.
>
> An additional problem not addressed by this patch is what to do about the
> changes to the aarch64 ABI needed to support SVE. A discussion about what to
> do about that was begun on libc-alpha here:
> https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html
> ---
> sysdeps/aarch64/bits/link.h | 17 ++++----
> sysdeps/aarch64/dl-link.sym | 4 +-
> sysdeps/aarch64/dl-trampoline.S | 75 +++++++++++++++++++++------------
> 3 files changed, 59 insertions(+), 37 deletions(-)
>
> diff --git a/sysdeps/aarch64/bits/link.h b/sysdeps/aarch64/bits/link.h
> index 0c54e6ea7b..2b43ace57c 100644
> --- a/sysdeps/aarch64/bits/link.h
> +++ b/sysdeps/aarch64/bits/link.h
> @@ -23,19 +23,20 @@
> /* Registers for entry into PLT on AArch64. */
> typedef struct La_aarch64_regs
> {
> - uint64_t lr_xreg[8];
> - uint64_t lr_dreg[8];
> - uint64_t lr_sp;
> - uint64_t lr_lr;
> + uint64_t lr_xreg[9];
> + __uint128_t lr_vreg[8];
> + uint64_t lr_sp;
> + uint64_t lr_lr;
This breaks ABI and does not address what to do about SVE.
If you argue that LD_AUDIT was always broken for AArch64 then
you get away with breaking ABI *once* and that one time you
break it to fix the ABI should include all the currently known
breakages that are out there.
I think we'll need feedback from Arm on this to get their input
on the direction to take here.
Granted we always have LAV_CURRENT we can bump to change the
interface, but that's a lot of code to write to handle that
and it would require struct-copying to support a newer larger
sized structure.
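To illustrate the size of the break (the old_regs/new_regs names are mine, mirroring the layouts before and after the patch):

```c
#include <stdint.h>

/* The pre-patch layout stored V registers as 64-bit doubles; the patched
   layout adds x8 and widens the vector slots to 128 bits.  __uint128_t
   is 16-byte aligned, so padding also appears after xreg[9].  */
struct old_regs { uint64_t xreg[8]; uint64_t dreg[8]; uint64_t sp, lr; };
struct new_regs { uint64_t xreg[9]; __uint128_t vreg[8]; uint64_t sp, lr; };

/* sizeof (struct old_regs) == 144 but sizeof (struct new_regs) == 224,
   so an old audit module handed the new frame reads every field past the
   x registers at the wrong offset.  */
```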
--
Cheers,
Carlos.
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-24 0:30 ` Carlos O'Donell
@ 2020-09-24 3:14 ` Ben Coyote Woodard
2020-09-24 8:04 ` Szabolcs Nagy
2020-09-24 9:58 ` Florian Weimer
0 siblings, 2 replies; 15+ messages in thread
From: Ben Coyote Woodard @ 2020-09-24 3:14 UTC (permalink / raw)
To: Carlos O'Donell, libc-alpha, Szabolcs Nagy
On 9/23/20 5:30 PM, Carlos O'Donell wrote:
> On 9/22/20 9:16 PM, Ben Woodard via Libc-alpha wrote:
>> The dynamic linker's auditing was not working on aarch64. See PR#26643
>> https://sourceware.org/bugzilla/show_bug.cgi?id=26643
>>
>> There were two distinct problems:
>> * _dl_runtime_resolve was not preserving x8 the indirect result location
>> register.
>> * The NEON Q registers pushed onto the stack by _dl_runtime_resolve
>> were twice the size of D registers extracted from the stack frame by
>> _dl_runtime_profile.
>>
>> To fix this
>> * The La_aarch64_regs structure was expanded to include x8 and the full
>> sized NEON V registers that are required to be preserved by the ABI.
>> * _dl_runtime_profile needed to extract registers saved by
>> _dl_runtime_resolve and put them into the new correctly sized
>> La_aarch64_regs structure.
>> * The return value structure La_aarch64_retval also didn't have the correctly
>> sized NEON V registers.
>>
>> As a couple of additional cleanups
>> * The names of the NEON registers saved within the La_aarch64_regs and the
>> La_aarch_retval structures referred to the old D registers which were
>> doubles. Now the registers are quads and are called V for vector registers.
>> So the name of the field in the structure and the names of the offsets
>> within that structure were named to use the more modern names.
>> * The ABI specification says that r0-r7 + r8 the indirect result location
>> register as well as the NEON v0-v7 registers can be used to return values
>> from a function. Therefore, I added those to the La_aarch64_retval
>> structure so that it also correctly matches the ABI.
>>
>> An additional problem not addressed by this patch is what to do about the
>> changes to the aarch64 ABI needed to support SVE. A discussion about what to
>> do about that was begun on libc-alpha here:
>> https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html
>> ---
>> sysdeps/aarch64/bits/link.h | 17 ++++----
>> sysdeps/aarch64/dl-link.sym | 4 +-
>> sysdeps/aarch64/dl-trampoline.S | 75 +++++++++++++++++++++------------
>> 3 files changed, 59 insertions(+), 37 deletions(-)
>>
>> diff --git a/sysdeps/aarch64/bits/link.h b/sysdeps/aarch64/bits/link.h
>> index 0c54e6ea7b..2b43ace57c 100644
>> --- a/sysdeps/aarch64/bits/link.h
>> +++ b/sysdeps/aarch64/bits/link.h
>> @@ -23,19 +23,20 @@
>> /* Registers for entry into PLT on AArch64. */
>> typedef struct La_aarch64_regs
>> {
>> - uint64_t lr_xreg[8];
>> - uint64_t lr_dreg[8];
>> - uint64_t lr_sp;
>> - uint64_t lr_lr;
>> + uint64_t lr_xreg[9];
>> + __uint128_t lr_vreg[8];
>> + uint64_t lr_sp;
>> + uint64_t lr_lr;
> This breaks ABI and does not address what to do about SVE.
>
> If you argue that LD_AUDIT was always broken for AArch64 then
> you get away with breaking ABI *once* and that one time you
> break it to fix the ABI should include all the currently known
> breakages that are out there.
I agree with this and that was literally why I didn't submit this patch
until you encouraged me to submit it.
LD_AUDIT has always been broken for AArch64. I think that we are currently up
to four cases where it wouldn't work:
- functions that used x8 for indirect parameter references
- long double parameters
- NEON registers for parameters
- NEON registers for return values
Szabolcs did point out an interesting twist to the question which I
think is worth considering:
Currently the SVE functions are not handled by the PLT/GOT subsystem and
are therefore unauditable because they have a different linkage type.
It will take some additional code to make them auditable. This will also
likely require changes to compilers and binutils, and possibly additional
kernel support.
He also says that there is no architecturally defined way to save or
restore these registers even though ARM did specify an ABI that included
saving them. He's much more of an expert on the ARM architecture than I
am, but I have yet to convince myself that this is true; I'm just starting
to tinker with SVE assembly instructions (how does the kernel context switch?).
With all those things being true, even though there are SVE-enabled
processors currently for sale, it seems like the full hardware
enablement of SVE, including making functions auditable, is a long way
off. This seems like it should allow time to implement the code
necessary to allow bumps of LAV_CURRENT on a per-arch level.
So a reasonable plan may be:
1) fix the problems with aarch64 auditing now. We literally have tool
authors impacted by the broken audit interface now.
2) add per arch LAV_CURRENT bumps to accommodate ABI changes
3) do the backend work in the kernel, the compilers and binutils to make
SVE auditable
4) then finally do the work needed to wire up auditing of SVE functions
including potentially changing the ABI again and bumping LAV_CURRENT in
glibc.
>
> I think we'll need feedback from Arm on this to get their input
> on the direction to take here.
More than just feedback, I would personally love it if ARM actually did
the work of 2-4 as part of their enablement of ARMv8.4 or whichever
minor version of the v8 ISA makes SVE mandatory rather than an optional
feature.
> Granted we always have LAV_CURRENT we can bump to change the
> interface, but that's a lot of code to write to handle that
> and it would require struct-copying to support a newer larger
> sized structure.
>
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-24 3:14 ` Ben Coyote Woodard
@ 2020-09-24 8:04 ` Szabolcs Nagy
2020-09-24 9:58 ` Florian Weimer
1 sibling, 0 replies; 15+ messages in thread
From: Szabolcs Nagy @ 2020-09-24 8:04 UTC (permalink / raw)
To: Ben Coyote Woodard; +Cc: Carlos O'Donell, libc-alpha
The 09/23/2020 20:14, Ben Coyote Woodard wrote:
>
> On 9/23/20 5:30 PM, Carlos O'Donell wrote:
> > On 9/22/20 9:16 PM, Ben Woodard via Libc-alpha wrote:
> > > The dynamic linker's auditing was not working on aarch64. See PR#26643
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=26643
> > >
> > > There were two distinct problems:
> > > * _dl_runtime_resolve was not preserving x8 the indirect result location
> > > register.
> > > * The NEON Q registers pushed onto the stack by _dl_runtime_resolve
> > > were twice the size of D registers extracted from the stack frame by
> > > _dl_runtime_profile.
> > >
> > > To fix this
> > > * The La_aarch64_regs structure was expanded to include x8 and the full
> > > sized NEON V registers that are required to be preserved by the ABI.
> > > * _dl_runtime_profile needed to extract registers saved by
> > > _dl_runtime_resolve and put them into the new correctly sized
> > > La_aarch64_regs structure.
> > > * The return value structure La_aarch64_retval also didn't have the correctly
> > > sized NEON V registers.
> > >
> > > As a couple of additional cleanups
> > > * The names of the NEON registers saved within the La_aarch64_regs and the
> > > La_aarch_retval structures referred to the old D registers which were
> > > doubles. Now the registers are quads and are called V for vector registers.
> > > So the name of the field in the structure and the names of the offsets
> > > within that structure were named to use the more modern names.
> > > * The ABI specification says that r0-r7 + r8 the indirect result location
> > > register as well as the NEON v0-v7 registers can be used to return values
> > > from a function. Therefore, I added those to the La_aarch64_retval
> > > structure so that it also correctly matches the ABI.
> > >
> > > An additional problem not addressed by this patch is what to do about the
> > > changes to the aarch64 ABI needed to support SVE. A discussion about what to
> > > do about that was begun on libc-alpha here:
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html
> > > ---
> > > sysdeps/aarch64/bits/link.h | 17 ++++----
> > > sysdeps/aarch64/dl-link.sym | 4 +-
> > > sysdeps/aarch64/dl-trampoline.S | 75 +++++++++++++++++++++------------
> > > 3 files changed, 59 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/sysdeps/aarch64/bits/link.h b/sysdeps/aarch64/bits/link.h
> > > index 0c54e6ea7b..2b43ace57c 100644
> > > --- a/sysdeps/aarch64/bits/link.h
> > > +++ b/sysdeps/aarch64/bits/link.h
> > > @@ -23,19 +23,20 @@
> > > /* Registers for entry into PLT on AArch64. */
> > > typedef struct La_aarch64_regs
> > > {
> > > - uint64_t lr_xreg[8];
> > > - uint64_t lr_dreg[8];
> > > - uint64_t lr_sp;
> > > - uint64_t lr_lr;
> > > + uint64_t lr_xreg[9];
> > > + __uint128_t lr_vreg[8];
> > > + uint64_t lr_sp;
> > > + uint64_t lr_lr;
> > This breaks ABI and does not address what to do about SVE.
> >
> > If you argue that LD_AUDIT was always broken for AArch64 then
> > you get away with breaking ABI *once* and that one time you
> > break it to fix the ABI should include all the currently known
> > breakages that are out there.
>
> I agree with this and that was literally why I didn't submit this patch
> until you encouraged me to submit it.
>
> LD_AUDIT has always been broken for AArch64. I think that we are currently up
> to four cases where it wouldn't work:
> - functions that used x8 for indirect parameter references
> - long double parameters
> - NEON registers for parameters
> - NEON registers for return values
>
> Szabolcs did point out an interesting twist to the question which I think is
> worth considering:
>
> Currently the SVE functions are not handled by the PLT/GOT subsystem and are
> therefore unauditable due to them having a different linkage type. It will
> take some additional code to make them auditable. This will also likely
> require changes to compilers and binutils and likely additional kernel
> support.
it can be done with only changing glibc (but it
may not be the best solution).
however the problem is not just with sve calls:
variant pcs is there for a reason, there are other
call conventions (vector pcs using advanced simd
and future/custom extensions and those are all
handled by the same mechanism), so whatever hook
we introduce has to deal with a more general problem.
solutions i can see:
0) proposed patch: only solve auditing for base pcs.
users who care about sve pcs will have to use a
different solution.
1) audit entry always saves *all* registers and one
extra bit (== is the symbol variant pcs) and then
the user hook has to deal with that.
this requires no toolchain changes, but potentially
slows down the common case for auditing and the abi
has to be versioned: new reg state needs libc update.
(and auditing will require more stack space)
2) same, but with two separate audit hooks (dispatch
can be in libc or linker generated), then the base pcs
hook is fast and reliable and only the variant pcs
hook needs complications. (dispatch in libc is ugly
because the asm entry code has to inspect elf symbol
table flags without clobbering registers)
3) introduce separate elf symbol table markings for sve,
vector pcs etc and have separate hooks for them so
user can deal with the different call conventions
more easily. this needs more toolchain and elf abi
work, and i think there are not enough symbol table
bits for this (i.e. may require significant elf
extension to make it work).
>
> He also says that there is no architecturally defined way to save or restore
> these registers even though ARM did specify an ABI that included saving
> them. He's much more of an expert on the ARM architecture than I am but I
> have yet to convince myself that is true. but I'm just starting to tinker
> with SVE assembly instructions (how does the kernel context swap?)
sorry i was probably not clear: there is of course
a way to save and restore sve registers, but there
is no *future proof* way, i.e. if new register
state is introduced for whatever cpu extension
then we will have to update code in glibc again.
in principle there could be save/restore instructions
that save all registers that are commonly required
for a context switch in a defined format (which can
be versioned and specified in the architecture)
then the audit hooks would just directly expose
that to the user.
>
> With all those things being true, even though there are currently SVE
> enabled processors currently for sale, it seems like the full hardware
> enablement of SVE including making functions auditable is a long way off.
> This seems like it should allow time to implement the code necessary to
> allow bumps of LAV_CURRENT on a per arch level.
>
> So a reasonable plan may be:
> 1) fix the problems with aarch64 auditing now. We literally have tool
> authors impacted by the broken audit interface now.
> 2) add per arch LAV_CURRENT bumps to accommodate ABI changes
> 3) do the backend work in the kernel, the compilers and binutils to make SVE
> auditable
> 4) then finally do the work needed wire up auditing of SVE functions
> including potentially changing the ABI again and bumping LAV_CURRENT in
> glibc.
>
> >
> > I think we'll need feedback from Arm on this to get their input
> > on the direction to take here.
>
> More than just feedback, I would personally love it if ARM actually did the
> work of 2-4 as part of their enablement of ARMv8.4 or whichever minor
> version of the v8 ISA makes SVE mandatory rather than an optional feature.
(sve is currently optional in all arch versions.)
i have no experience with auditing and expected
that hooking base pcs function calls is enough.
note that currently the tooling does not guarantee
that all extern calls go via a PLT, so auditing
cannot reliably hook all calls, which tells me
it is enough to do it on a best effort basis. if
users really want reliable audit PLT hooks, then
that's even more work that goes beyond sve. this
has to be justified for arm to care about fixing it.
> > Granted we always have LAV_CURRENT we can bump to change the
> > interface, but that's a lot of code to write to handle that
> > and it would require struct-copying to support a newer larger
> > sized structure.
> >
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-24 3:14 ` Ben Coyote Woodard
2020-09-24 8:04 ` Szabolcs Nagy
@ 2020-09-24 9:58 ` Florian Weimer
1 sibling, 0 replies; 15+ messages in thread
From: Florian Weimer @ 2020-09-24 9:58 UTC (permalink / raw)
To: Ben Coyote Woodard via Libc-alpha
* Ben Coyote Woodard via Libc-alpha:
> LD_AUDIT has always been broken for AArch64.
It was not completely broken, even the PLT enter/exit hooks somewhat
worked.
Therefore, you really need to increase LAV_CURRENT (for AArch64) in your
patch, so that we can patch in backwards compatibility code later if it
turns out that your patch broke existing audit modules that worked
before. The dynamic linker can treat both versions (1 and 2) in the
same way if you and the AArch64 maintainers agree not to implement the
compatibility code. But with the LAV_CURRENT change, we at least have
the option to add compatibility support if necessary.
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
* Re: [PATCH] Fix runtime linker auditing on aarch64
2020-09-23 13:18 ` Szabolcs Nagy
@ 2020-09-24 11:10 ` Florian Weimer
0 siblings, 0 replies; 15+ messages in thread
From: Florian Weimer @ 2020-09-24 11:10 UTC (permalink / raw)
To: Szabolcs Nagy; +Cc: Ben Woodard via Libc-alpha
* Szabolcs Nagy:
> we don't have a way to do this in the architecture
> (i.e. reg state save/restore operations)
> we can have something in the kernel (which needs
> to know about the supported registers) but i
> assume using a syscall for this is too much overhead.
Yes, seems like a system call would be problematic.
> (maybe in vdso? sounds ugly if we need it before
> vdso is set up)
Indeed.
If AArch64 doesn't have a generic mechanism, that certainly simplifies
matters because there's no dynamic linker alignment to the hardware
needed then.
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill