From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 23 Sep 2020 11:00:40 -0700
From: Ben Coyote Woodard
Subject: Re: [PATCH] Fix runtime linker auditing on aarch64
To: Szabolcs Nagy
Cc: libc-alpha@sourceware.org
Message-Id: <41I4HQ.3U0CTNIN4QWR3@redhat.com>
In-Reply-To: <20200923123426.GD16385@arm.com>
References: <20200923011613.2243151-1-woodard@redhat.com> <20200923123426.GD16385@arm.com>

On Wed, Sep 23, 2020 at 13:34, Szabolcs Nagy wrote:
>>
>> An additional problem not addressed by this patch is what to do about
>> the changes to the aarch64 ABI needed to support SVE.
>> A discussion about what to do about that was begun on libc-alpha here:
>>
>
> SVE calls are special (marked as STO_AARCH64_VARIANT_PCS
> in the dynamic symbol table) and currently load time
> bound (like BIND_NOW) so i think they don't go through
> the PLT0 sequence that calls the profile entry in ld.so
> and thus audit hooks are not called for them.
>
> this is probably not what LD_AUDIT users would want
> (do they care about hooking into sve calls?), but

Most certainly!! The two big users of LD_AUDIT that I often work with are
spindle (https://computing.llnl.gov/projects/spindle) and hpctoolkit
(http://hpctoolkit.org/). I was talking to a colleague over at Cray/HPE,
where they are selling the Apollo 80, which uses the A64FX processor and
therefore has SVE, and he immediately mentioned spindle and said that he
would talk to the Cray team that handles spindle. So this problem with SVE
may not have been reported by users yet, but it is just barely over the
horizon.

> VARIANT_PCS essentially allows any call convention,

That seems like a really odd choice to me. You guys over at ARM did
publish an AAPCS for processors with SVE that defines which registers need
to be saved and which can be clobbered:
https://developer.arm.com/documentation/100986/0000/?search=5eec7447e24a5e02d07b2774

I've been trying to figure out how to handle this. I kind of needed to
know what the structure of La_aarch64_{regs,retval} would be before I
really dug deeply into it. If we can agree on how you want the structures
presented, I will update my auditing patch to use that layout and then we
can figure out how to fill them in later. (A rough sketch of what an audit
module would look like with the proposed layout follows the list below.)

The pieces that I have but haven't quite put together yet are:

- There is HWCAP_SVE in sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h. As I
  understand it so far, this is an ELF attribute that the dynamic linker
  could use when picking which version of a library to load. If it were
  propagated into the dynamic linker, we could tell, at least at a library
  level, whether any functions in a library use SVE. If none do, we would
  not have to save the SVE registers.

- I haven't checked whether libraries do in fact set this HWCAP flag. If
  they don't, then that is probably a binutils issue.

- I haven't yet figured out how to find and make use of this library flag
  where I need it, in _dl_runtime_resolve and _dl_runtime_profile. If it
  is not easy to get at there, I have considered pushing the processor's
  hwcap and the library's hwcap onto the stack from the C code that calls
  those functions.

- If the library's hwcap flags are not available in an audit library, we
  need to define a way to make them available, because they are needed to
  make the correct library selection. Consider, for example, the case
  where an audit library is trying to interpose its own selection of a
  library over the normal selection made by the runtime linker. It needs
  to be able to evaluate whether the hardware has all of the capabilities
  that the library requires (see the second sketch below).

- Similarly, if a library does not make use of any of the NEON registers,
  then we may be able to get away with saving just the X regs and skipping
  the V regs. They are less of a problem than the potentially 2048-bit Z
  regs, but it adds up. However, all of that happens at the library level;
  since the functions are marked with a different symbol type in the
  dynamic symbol table, it would be even better to make the decision about
  what needs to be saved at a per-function level.
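To make the structure question concrete, here is a rough, untested sketch
of what an audit module would look like with the layout proposed in this
patch. The hook names (la_version, la_objopen, la_aarch64_gnu_pltenter)
are the existing rtld-audit interface; only the lr_xreg[9]/lr_vreg[8]
field layout is assumed from the patch:

/* audit-sketch.c: rough sketch only, not tested.
   Build: gcc -shared -fPIC audit-sketch.c -o audit.so
   Run:   LD_AUDIT=$PWD/audit.so ./someprog  */
#define _GNU_SOURCE
#include <link.h>
#include <inttypes.h>
#include <stdio.h>

unsigned int
la_version (unsigned int version)
{
  return LAV_CURRENT;
}

unsigned int
la_objopen (struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
  /* Ask for PLT tracing of every object so the pltenter hook fires.  */
  return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

ElfW(Addr)
la_aarch64_gnu_pltenter (ElfW(Sym) *sym, unsigned int ndx,
                         uintptr_t *refcook, uintptr_t *defcook,
                         La_aarch64_regs *regs, unsigned int *flags,
                         const char *symname, long int *framesizep)
{
  /* First integer argument, the indirect result register x8 (the new
     ninth element of lr_xreg in this patch), and the caller's sp.  */
  printf ("pltenter %s: x0=0x%" PRIx64 " x8=0x%" PRIx64 " sp=0x%" PRIx64 "\n",
          symname, regs->lr_xreg[0], regs->lr_xreg[8], regs->lr_sp);
  return sym->st_value;
}

Once the layout is pinned down, tools like spindle and hpctoolkit should
only need a recompile against the new bits/link.h to see x8 and the full
q registers.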
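And for the hwcap point in the list above, this is the sort of thing I
mean. getauxval(AT_HWCAP) and HWCAP_SVE exist today; sve_variant_for() is
a made-up placeholder for whatever per-library hwcap metadata we end up
exposing. The sketch only redirects to a hypothetical SVE build of a
library when the processor can actually run it:

/* objsearch-sketch.c: rough sketch only, not tested.  */
#define _GNU_SOURCE
#include <link.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, HWCAP_SVE via bits/hwcap.h */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

unsigned int
la_version (unsigned int version)
{
  return LAV_CURRENT;
}

/* Hypothetical helper: fill BUF with the path of an SVE-enabled build of
   NAME if we ship one and return true.  Stubbed out for the sketch.  */
static bool
sve_variant_for (const char *name, char *buf, size_t len)
{
  (void) name; (void) buf; (void) len;
  return false;
}

char *
la_objsearch (const char *name, uintptr_t *cookie, unsigned int flag)
{
  static char redirected[4096];

  /* Only interpose an SVE build when the hardware has SVE.  */
  if ((getauxval (AT_HWCAP) & HWCAP_SVE)
      && sve_variant_for (name, redirected, sizeof redirected))
    return redirected;

  return (char *) name;   /* keep the runtime linker's own choice */
}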
I noted that I can potentially use ID_AA64PFR0_EL1 [35:32] to see if the
processor has SVE, and then read ZCR_EL1 [3:0] to figure out the current
vector length of the SVE registers. Then I would use ld1 and st1 to copy
that many bytes into a char[] or something. Admittedly that is all less
than half baked, which is why I wanted to agree on the
La_aarch64_{regs,retval} structures first.

> so all registers have to be saved and restored if
> such call enters the dynamic linker which is a problem
> if register state may be extended in the future
> (although probably ldaudit is special enough that its
> users can update glibc if they care about new regs?).
>
> (one way to expose variant pcs calls to audit hooks
> is to detect the symbol type in the asm entry point
> and then call a different hook, but this sounds
> sufficiently ugly that i think we would then prefer
> to update the elf abi with a second plt entry sequence
> for variant pcs calls that can just use a different
> entry into ld.so and new linkers would generate that
> for dsos with variant pcs symbols.)
>
>> ---
>>  sysdeps/aarch64/bits/link.h     | 17 ++++----
>>  sysdeps/aarch64/dl-link.sym     |  4 +-
>>  sysdeps/aarch64/dl-trampoline.S | 75 +++++++++++++++++++++------------
>>  3 files changed, 59 insertions(+), 37 deletions(-)
>>
>> diff --git a/sysdeps/aarch64/bits/link.h b/sysdeps/aarch64/bits/link.h
>> index 0c54e6ea7b..2b43ace57c 100644
>> --- a/sysdeps/aarch64/bits/link.h
>> +++ b/sysdeps/aarch64/bits/link.h
>> @@ -23,19 +23,20 @@
>>  /* Registers for entry into PLT on AArch64.  */
>>  typedef struct La_aarch64_regs
>>  {
>> -  uint64_t lr_xreg[8];
>> -  uint64_t lr_dreg[8];
>> -  uint64_t lr_sp;
>> -  uint64_t lr_lr;
>> +  uint64_t lr_xreg[9];
>> +  __uint128_t lr_vreg[8];
>> +  uint64_t lr_sp;
>> +  uint64_t lr_lr;
>>  } La_aarch64_regs;
>
> ok.
>
> changing abi is fine with me: old abi was
> unusably broken, if audit modules use some
> versioning that's even better.
>
>>
>>  /* Return values for calls from PLT on AArch64.  */
>>  typedef struct La_aarch64_retval
>>  {
>> -  /* Up to two integer registers can be used for a return value.  */
>> -  uint64_t lrv_xreg[2];
>> -  /* Up to four D registers can be used for a return value.  */
>> -  uint64_t lrv_dreg[4];
>> +  /* Up to eight integer registers and the indirect result location register
>> +     can be used for a return value.  */
>> +  uint64_t lrv_xreg[9];
>
> x8 is not preserved so recording it at function exit
> is not useful. (on entry it points to where results
> are stored but on exit it can be clobbered)
>
>> +  /* Up to eight V registers can be used for a return value.  */
>> +  __uint128_t lrv_vreg[8];
>>
>>  } La_aarch64_retval;
>>  __BEGIN_DECLS
>
> note: i don't like to use non-standard types in
> public apis (like __uint128_t), but we already
> made this mistake in the linux sigcontext, so this
> is probably ok.
>
> (my preference normally is to use a standard type
> e.g.
> long double or char[] with alignment attr,
> but in practice __uint128_t is probably easier
> to deal with)
>
>
>> diff --git a/sysdeps/aarch64/dl-link.sym b/sysdeps/aarch64/dl-link.sym
>> index d67d28b40c..70d153a1d5 100644
>> --- a/sysdeps/aarch64/dl-link.sym
>> +++ b/sysdeps/aarch64/dl-link.sym
>> @@ -7,9 +7,9 @@ DL_SIZEOF_RG		sizeof(struct La_aarch64_regs)
>>  DL_SIZEOF_RV		sizeof(struct La_aarch64_retval)
>>
>>  DL_OFFSET_RG_X0		offsetof(struct La_aarch64_regs, lr_xreg)
>> -DL_OFFSET_RG_D0		offsetof(struct La_aarch64_regs, lr_dreg)
>> +DL_OFFSET_RG_V0		offsetof(struct La_aarch64_regs, lr_vreg)
>>  DL_OFFSET_RG_SP		offsetof(struct La_aarch64_regs, lr_sp)
>>  DL_OFFSET_RG_LR		offsetof(struct La_aarch64_regs, lr_lr)
>>
>>  DL_OFFSET_RV_X0		offsetof(struct La_aarch64_retval, lrv_xreg)
>> -DL_OFFSET_RV_D0		offsetof(struct La_aarch64_retval, lrv_dreg)
>> +DL_OFFSET_RV_V0		offsetof(struct La_aarch64_retval, lrv_vreg)
>
> ok.
>
>> diff --git a/sysdeps/aarch64/dl-trampoline.S b/sysdeps/aarch64/dl-trampoline.S
>> index 794876fffa..c91341e8fc 100644
>> --- a/sysdeps/aarch64/dl-trampoline.S
>> +++ b/sysdeps/aarch64/dl-trampoline.S
>> @@ -46,6 +46,8 @@ _dl_runtime_resolve:
>>  	cfi_rel_offset (lr, 8)
>>
>>  	/* Save arguments.  */
>> +	/* Note: Saving x9 is not required by the ABI but the assember requires
>> +	   the immediate values of operand 3 to be a multiple of 16 */
>>  	stp	x8, x9, [sp, #-(80+8*16)]!
>>  	cfi_adjust_cfa_offset (80+8*16)
>>  	cfi_rel_offset (x8, 0)
>
> ok.
>
>> @@ -183,19 +185,23 @@ _dl_runtime_profile:
>
> there is a comment at the entry point with offsets
> which i think is no longer valid (i think it's
> ok to remove the offsets just document the order
> of things on the stack)
>
>>  	stp	x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>>  	cfi_rel_offset (x6, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 0)
>>  	cfi_rel_offset (x7, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 8)
>> -
>> -	stp	d0, d1, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>> -	cfi_rel_offset (d0, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0)
>> -	cfi_rel_offset (d1, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0 + 8)
>> -	stp	d2, d3, [X29, #OFFSET_RG+ DL_OFFSET_RG_D0 + 16*1]
>> -	cfi_rel_offset (d2, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 0)
>> -	cfi_rel_offset (d3, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 8)
>> -	stp	d4, d5, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>> -	cfi_rel_offset (d4, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 0)
>> -	cfi_rel_offset (d5, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 8)
>> -	stp	d6, d7, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>> -	cfi_rel_offset (d6, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 0)
>> -	cfi_rel_offset (d7, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 8)
>> +	str	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0]
>> +	cfi_rel_offset (x8, OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0)
>> +	/* Note X9 is in the stack frame for alignment but it is not
>> +	   required to be saved by the ABI */
>> +
>
> i dont see x9 here. you can just note that there is padding.
>
>> +	stp	q0, q1, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
>> +	cfi_rel_offset (q0, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0)
>> +	cfi_rel_offset (q1, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0 + 16)
>> +	stp	q2, q3, [X29, #OFFSET_RG+ DL_OFFSET_RG_V0 + 32*1]
>> +	cfi_rel_offset (q2, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 0)
>> +	cfi_rel_offset (q3, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 16)
>> +	stp	q4, q5, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>> +	cfi_rel_offset (q4, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 0)
>> +	cfi_rel_offset (q5, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 16)
>> +	stp	q6, q7, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>> +	cfi_rel_offset (q6, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 0)
>> +	cfi_rel_offset (q7, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 16)
>>
>>  	add	x0, x29, #SF_SIZE + 16
>>  	ldr	x1, [x29, #OFFSET_LR]
>> @@ -234,10 +240,11 @@ _dl_runtime_profile:
>>  	ldp	x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
>>  	ldp	x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
>>  	ldp	x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>> -	ldp	d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>> -	ldp	d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
>> -	ldp	d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>> -	ldp	d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>> +	ldr	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>> +	ldp	q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 16*0]
>> +	ldp	q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
>> +	ldp	q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>> +	ldp	q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>>
>>  	cfi_def_cfa_register (sp)
>>  	ldp	x29, x30, [x29, #0]
>> @@ -280,14 +287,21 @@ _dl_runtime_profile:
>>  	ldp	x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
>>  	ldp	x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
>>  	ldp	x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>> -	ldp	d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>> -	ldp	d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
>> -	ldp	d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>> -	ldp	d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>> +	ldr	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>> +	ldp	q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
>> +	ldp	q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
>> +	ldp	q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>> +	ldp	q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>>  	blr	ip0
>
> ok.
>
>> -	stp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
>> -	stp	d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
>> -	stp	d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
>> +	stp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
>> +	stp	x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
>> +	stp	x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
>> +	stp	x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
>> +	str	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>
> i think storing x8 is not useful.
>
>> +	stp	q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
>> +	stp	q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
>> +	stp	q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
>> +	stp	q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
>>
>>  	/* Setup call to pltexit */
>>  	ldp	x0, x1, [x29, #OFFSET_SAVED_CALL_X0]
>> @@ -295,9 +309,16 @@ _dl_runtime_profile:
>>  	add	x3, x29, #OFFSET_RV
>>  	bl	_dl_call_pltexit
>>
>> -	ldp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
>> -	ldp	d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
>> -	ldp	d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
>> +	ldp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
>> +	ldp	x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
>> +	ldp	x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
>> +	ldp	x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
>> +	ldr	x8, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*4]
>> +	ldp	q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
>> +	ldp	q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
>> +	ldp	q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
>> +	ldp	q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
>> +
>>  	/* LR from within La_aarch64_reg */
>>  	ldr	lr, [x29, #OFFSET_RG + DL_OFFSET_RG_LR]
>>  	cfi_restore(lr)
>
> thanks for the patch.
>