This patch series modifies hand-written assembly files for Arm targets, conditionally enabling branch target identification as well as address return signature and verification based on Armv8.1-M Pointer Authentication [1] using ACLE feature test macros at compile-time [2]. The incorportaion of PACBTI functionality in function prologues/ epilogues is dictated by the arguments passed to the `-mbranch-protection' flag at the time of Newlib compilation. Regression tested on arm-none-eabi with and without MVE extension and for Newlib and Newlib-nano. [1] <https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension> [2] <https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros> Victor Do Nascimento (8): newlib: libc: define M-profile PACBTI-enablement macros newlib: libc: strcmp M-profile PACBTI-enablement newlib: libc: strlen M-profile PACBTI-enablement newlib: libc: memchr M-profile PACBTI-enablement newlib: libc: memcpy M-profile PACBTI-enablement newlib: libc: setjmp/longjmp M-profile PACBTI-enablement newlib: libc: aeabi_memmove M-profile PACBTI-enablement newlib: libc: aeabi_memset M-profile PACBTI-enablement .../libc/machine/arm/aeabi_memmove-thumb2.S | 54 ++++++++++++-- newlib/libc/machine/arm/aeabi_memset-thumb2.S | 47 +++++++++++- newlib/libc/machine/arm/memchr.S | 55 ++++++++++++-- newlib/libc/machine/arm/memcpy-armv7m.S | 74 +++++++++++++++++-- newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++ newlib/libc/machine/arm/setjmp.S | 33 ++++++++- newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +- newlib/libc/machine/arm/strcmp-armv7.S | 45 ++++++++--- newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++-- newlib/libc/machine/arm/strlen-armv7.S | 53 ++++++++++++- newlib/libc/machine/arm/strlen-thumb2-Os.S | 13 +++- 11 files changed, 435 insertions(+), 46 deletions(-) create mode 100644 newlib/libc/machine/arm/pacbti.h -- 2.36.1
Create an assembly header file that conditionally defines fuction prologues/epilogues depending on the compile-time mbranch-protection argument values. * newlib/libc/machine/arm/pacbti.h: New. --- newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) create mode 100644 newlib/libc/machine/arm/pacbti.h diff --git a/newlib/libc/machine/arm/pacbti.h b/newlib/libc/machine/arm/pacbti.h new file mode 100644 index 000000000..4c31d00bc --- /dev/null +++ b/newlib/libc/machine/arm/pacbti.h @@ -0,0 +1,64 @@ +/* + * Copyright (c) 2022 Arm Ltd + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. The name of the company may not be used to endorse or promote + * products derived from this software without specific prior written + * permission. + * + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* Checki whether leaf function PAC signing has been requested + in the -mbranch-protect compile-time option. */ +#define LEAF_PROTECT_BIT 2 +#define __HAVE_PAC_LEAF \ + __ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT) + +/* Macro to handle function entry depending on branch-protection + schemes. */ + .macro pacbti_prologue +#if __HAVE_PAC_LEAF +#if __ARM_FEATURE_BTI_DEFAULT + pacbti ip, lr, sp +#else + pac ip, lr, sp +#endif /* __ARM_FEATURE_BTI_DEFAULT */ + .cfi_register 143, 12 + str ip, [sp, #-4]! + .save {ra_auth_code} + .cfi_def_cfa_offset 4 + .cfi_offset 143, -4 +#elif __ARM_FEATURE_BTI_DEFAULT + bti +#endif /* __HAVE_PAC_LEAF */ + .endm + +/* Macro to handle different branch exchange cases depending on + branch-protection schemes. */ + .macro pacbti_epilogue +#if __HAVE_PAC_LEAF + ldr ip, [sp], #4 + .cfi_restore 143 + .cfi_def_cfa_offset 0 + aut ip, lr, sp +#endif /* __HAVE_PAC_LEAF */ + bx lr + .endm -- 2.36.1
Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. This patch enables PACBTI for all relevant variants of strcmp: * Newlib for armv8.1-m.main+pacbti * Newlib for armv8.1-m.main+pacbti+mve * Newlib-nano --- newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +++- newlib/libc/machine/arm/strcmp-armv7.S | 45 +++++++++++++++++------ newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++++++++++--- 3 files changed, 71 insertions(+), 17 deletions(-) diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S b/newlib/libc/machine/arm/strcmp-arm-tiny.S index 607a41daf..8085eb4df 100644 --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S @@ -29,10 +29,14 @@ /* Tiny version of strcmp in ARM state. Used only when optimizing for size. Also supports Thumb-2. */ +#include "pacbti.h" + .syntax unified def_fn strcmp + .fnstart .cfi_sections .debug_frame .cfi_startproc + pacbti_prologue 1: ldrb r2, [r0], #1 ldrb r3, [r1], #1 @@ -42,6 +46,7 @@ def_fn strcmp beq 1b 2: subs r0, r2, r3 - bx lr + pacbti_epilogue .cfi_endproc + .fnend .size strcmp, . - strcmp diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S index 2f93bfb73..584c89f7e 100644 --- a/newlib/libc/machine/arm/strcmp-armv7.S +++ b/newlib/libc/machine/arm/strcmp-armv7.S @@ -45,6 +45,8 @@ .thumb .syntax unified +#include "pacbti.h" + /* Parameters and result. */ #define src1 r0 #define src2 r1 @@ -91,8 +93,9 @@ ldrd r4, r5, [sp], #16 .cfi_restore 4 .cfi_restore 5 + .cfi_adjust_cfa_offset -16 sub result, result, r1, lsr #24 - bx lr + pacbti_epilogue #else /* To use the big-endian trick we'd have to reverse all three words. that's slower than this approach. */ @@ -112,22 +115,28 @@ ldrd r4, r5, [sp], #16 .cfi_restore 4 .cfi_restore 5 + .cfi_adjust_cfa_offset -16 sub result, result, r1 - bx lr + pacbti_epilogue #endif .endm + .text .p2align 5 + .fnstart + .cfi_sections .debug_frame + .cfi_startproc .Lstrcmp_start_addr: #ifndef STRCMP_NO_PRECHECK .Lfastpath_exit: sub r0, r2, r3 - bx lr + pacbti_epilogue nop #endif def_fn strcmp + pacbti_prologue #ifndef STRCMP_NO_PRECHECK ldrb r2, [src1] ldrb r3, [src2] @@ -136,16 +145,26 @@ def_fn strcmp cmpcs r2, r3 bne .Lfastpath_exit #endif - .cfi_sections .debug_frame - .cfi_startproc strd r4, r5, [sp, #-16]! - .cfi_def_cfa_offset 16 + .save {r4, r5} + .cfi_adjust_cfa_offset 16 +#if __HAVE_PAC_LEAF + .cfi_offset 4, -20 + .cfi_offset 5, -16 +#else .cfi_offset 4, -16 .cfi_offset 5, -12 +#endif /* __HAVE_PAC_LEAF */ orr tmp1, src1, src2 strd r6, r7, [sp, #8] + .save {r6, r7} +#if __HAVE_PAC_LEAF + .cfi_offset 6, -12 + .cfi_offset 7, -8 +#else .cfi_offset 6, -8 .cfi_offset 7, -4 +#endif /* __HAVE_PAC_LEAF */ mvn const_m1, #0 lsl r2, tmp1, #29 cbz r2, .Lloop_aligned8 @@ -270,7 +289,6 @@ def_fn strcmp ldr data1, [src1], #4 beq .Laligned_m2 bcs .Laligned_m1 - #ifdef STRCMP_NO_PRECHECK ldrb data2, [src2, #1] uxtb tmp1, data1, ror #BYTE1_OFFSET @@ -314,7 +332,8 @@ def_fn strcmp mov result, tmp1 ldr r4, [sp], #16 .cfi_restore 4 - bx lr + .cfi_adjust_cfa_offset -16 + pacbti_epilogue #ifndef STRCMP_NO_PRECHECK .Laligned_m1: @@ -364,8 +383,9 @@ def_fn strcmp /* R6/7 Not used in this sequence. */ .cfi_restore 6 .cfi_restore 7 + .cfi_adjust_cfa_offset -16 neg result, result - bx lr + pacbti_epilogue 6: .cfi_restore_state @@ -441,7 +461,8 @@ def_fn strcmp /* R6/7 not used in this sequence. */ .cfi_restore 6 .cfi_restore 7 - bx lr + .cfi_adjust_cfa_offset -16 + pacbti_epilogue .Lstrcmp_tail: .cfi_restore_state @@ -463,7 +484,9 @@ def_fn strcmp /* R6/7 not used in this sequence. */ .cfi_restore 6 .cfi_restore 7 + .cfi_adjust_cfa_offset -16 sub result, result, data2, lsr #24 - bx lr + pacbti_epilogue .cfi_endproc + .fnend .size strcmp, . - .Lstrcmp_start_addr diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S index cdb4912df..fe1519f4d 100644 --- a/newlib/libc/machine/arm/strcmp-armv7m.S +++ b/newlib/libc/machine/arm/strcmp-armv7m.S @@ -29,6 +29,8 @@ /* Very similar to the generic code, but uses Thumb2 as implemented in ARMv7-M. */ +#include "pacbti.h" + /* Parameters and result. */ #define src1 r0 #define src2 r1 @@ -44,8 +46,10 @@ .thumb .syntax unified def_fn strcmp + .fnstart .cfi_sections .debug_frame .cfi_startproc + pacbti_prologue eor tmp1, src1, src2 tst tmp1, #3 /* Strings not at same byte offset from a word boundary. */ @@ -106,7 +110,7 @@ def_fn strcmp lsrs result, result, #24 subs result, result, data2 #endif - bx lr + pacbti_epilogue #if 0 @@ -214,12 +218,18 @@ def_fn strcmp cmpcs data1, data2 beq .Lstrcmp_unaligned sub result, data1, data2 - bx lr + pacbti_epilogue 2: stmfd sp!, {r5} - .cfi_def_cfa_offset 4 + .save {r5} + .cfi_adjust_cfa_offset 4 +#if __HAVE_PAC_LEAF + .cfi_offset 5, -8 /* Account for ip register already on stack. */ +#else .cfi_offset 5, -4 +#endif /* __HAVE_PAC_LEAF */ + ldr data1, [src1], #4 and tmp2, src2, #3 @@ -353,10 +363,17 @@ def_fn strcmp .Lstrcmp_done_equal: mov result, #0 .cfi_remember_state +#if __HAVE_PAC_LEAF + pop {r5, ip} + .cfi_restore 5 + .cfi_restore 143 + aut ip, lr, sp +#else ldmfd sp!, {r5} .cfi_restore 5 +#endif /* __HAVE_PAC_LEAF */ .cfi_def_cfa_offset 0 - bx lr + bx lr .Lstrcmp_tail: .cfi_restore_state @@ -370,9 +387,18 @@ def_fn strcmp S2LOEQ data2, data2, #8 beq .Lstrcmp_tail sub result, r2, result +#if __HAVE_PAC_LEAF + pop {r5, ip} + .cfi_restore 5 + .cfi_restore 143 + .cfi_def_cfa_offset 0 + aut ip, lr, sp +#else ldmfd sp!, {r5} .cfi_restore 5 .cfi_def_cfa_offset 0 - bx lr +#endif /* __HAVE_PAC_LEAF */ + bx lr .cfi_endproc + .fnend .size strcmp, . - strcmp -- 2.36.1
Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. This patch enables PACBTI for all relevant variants of strlen: * Newlib for armv8.1-m.main+pacbti * Newlib for armv8.1-m.main+pacbti+mve * Newlib-nano --- newlib/libc/machine/arm/strlen-armv7.S | 53 ++++++++++++++++++++-- newlib/libc/machine/arm/strlen-thumb2-Os.S | 13 ++++-- 2 files changed, 60 insertions(+), 6 deletions(-) diff --git a/newlib/libc/machine/arm/strlen-armv7.S b/newlib/libc/machine/arm/strlen-armv7.S index f3dda0d60..0492800f9 100644 --- a/newlib/libc/machine/arm/strlen-armv7.S +++ b/newlib/libc/machine/arm/strlen-armv7.S @@ -59,6 +59,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include "acle-compat.h" +#include "pacbti.h" .macro def_fn f p2align=0 .text @@ -78,7 +79,11 @@ /* This code requires Thumb. */ #if __ARM_ARCH_PROFILE == 'M' +#if __ARM_ARCH >= 8 + /* keep config inherited from -march=. */ +#else .arch armv7e-m +#endif /* if __ARM_ARCH >= 8 */ #else .arch armv6t2 #endif @@ -100,8 +105,35 @@ #define tmp2 r5 def_fn strlen p2align=6 + .fnstart + .cfi_startproc + /* common pacbti_prologue macro from pacbti.h not used. + handwritten prologue saves one push instruction. */ +#if __HAVE_PAC_LEAF +#if __ARM_FEATURE_BTI_DEFAULT + pacbti ip, lr, sp +#else + pac ip, lr, sp +#endif /* __ARM_FEATURE_BTI_DEFAULT */ + .cfi_register 143, 12 + push {r4, r5, ip} + .save {r4, r5, ra_auth_code} + .cfi_def_cfa_offset 12 + .cfi_offset 143, -4 + .cfi_offset 5, -8 + .cfi_offset 4, -12 + +#else +#if __ARM_FEATURE_BTI_DEFAULT + bti +#endif /* __ARM_FEATURE_BTI_DEFAULT */ + push {r4, r5} + .save {r4, r5} + .cfi_def_cfa_offset 8 + .cfi_offset 5, -4 + .cfi_offset 4, -8 +#endif /* __HAVE_PAC_LEAF */ pld [srcin, #0] - strd r4, r5, [sp, #-8]! bic src, srcin, #7 mvn const_m1, #0 ands tmp1, srcin, #7 /* (8 - bytes) to alignment. */ @@ -159,9 +191,22 @@ def_fn strlen p2align=6 rev data1a, data1a #endif clz data1a, data1a - ldrd r4, r5, [sp], #8 add result, result, data1a, lsr #3 /* Bits -> Bytes. */ - bx lr +#if __HAVE_PAC_LEAF + pop {r4, r5, ip} + .cfi_restore 4 + .cfi_restore 5 + .cfi_restore 143 + .cfi_def_cfa_offset 0 + aut ip, lr, sp +#else + ldrd r4, r5, [sp], #8 + .cfi_restore 4 + .cfi_restore 5 + .cfi_def_cfa_offset 0 +#endif /* __HAVE_PAC_LEAF */ + bx lr + .Lmisaligned8: ldrd data1a, data1b, [src] @@ -177,4 +222,6 @@ def_fn strlen p2align=6 movne data1a, const_m1 mov const_0, #0 b .Lstart_realigned + .cfi_endproc + .fnend .size strlen, . - strlen diff --git a/newlib/libc/machine/arm/strlen-thumb2-Os.S b/newlib/libc/machine/arm/strlen-thumb2-Os.S index 961f41a0a..571068dac 100644 --- a/newlib/libc/machine/arm/strlen-thumb2-Os.S +++ b/newlib/libc/machine/arm/strlen-thumb2-Os.S @@ -25,6 +25,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include "acle-compat.h" +#include "pacbti.h" .macro def_fn f p2align=0 .text @@ -33,8 +34,9 @@ .type \f, %function \f: .endm - -#if __ARM_ARCH_ISA_THUMB >= 2 && __ARM_ARCH >= 7 +#if __ARM_ARCH_PROFILE == 'M' && __ARM_ARCH >= 8 + /* keep config inherited from -march=. */ +#elif __ARM_ARCH_ISA_THUMB >= 2 && __ARM_ARCH >= 7 .arch armv7 #else .arch armv6t2 @@ -44,11 +46,16 @@ .syntax unified def_fn strlen p2align=1 + .fnstart + .cfi_startproc + pacbti_prologue mov r3, r0 1: ldrb.w r2, [r3], #1 cmp r2, #0 bne 1b subs r0, r3, r0 subs r0, #1 - bx lr + pacbti_epilogue + .cfi_endproc + .fnend .size strlen, . - strlen -- 2.36.1
Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. --- newlib/libc/machine/arm/memchr.S | 55 ++++++++++++++++++++++++++++---- 1 file changed, 48 insertions(+), 7 deletions(-) diff --git a/newlib/libc/machine/arm/memchr.S b/newlib/libc/machine/arm/memchr.S index 1a4c6512c..6bdccb934 100644 --- a/newlib/libc/machine/arm/memchr.S +++ b/newlib/libc/machine/arm/memchr.S @@ -76,6 +76,7 @@ .syntax unified #include "acle-compat.h" +#include "pacbti.h" @ NOTE: This ifdef MUST match the one in memchr-stub.c #if defined (__ARM_NEON__) || defined (__ARM_NEON) @@ -267,10 +268,14 @@ memchr: #elif __ARM_ARCH_ISA_THUMB >= 2 && defined (__ARM_FEATURE_DSP) #if __ARM_ARCH_PROFILE == 'M' - .arch armv7e-m +#if __ARM_ARCH >= 8 + /* keep config inherited from -march=. */ #else - .arch armv6t2 -#endif + .arch armv7e-m +#endif /* __ARM_ARCH >= 8 */ +#else + .arch armv6t2 +#endif /* __ARM_ARCH_PROFILE == 'M' */ @ this lets us check a flag in a 00/ff byte easily in either endianness #ifdef __ARMEB__ @@ -287,11 +292,14 @@ memchr: .p2align 4,,15 .global memchr .type memchr,%function + .fnstart + .cfi_startproc memchr: @ r0 = start of memory to scan @ r1 = character to look for @ r2 = length @ returns r0 = pointer to character or NULL if not found + pacbti_prologue and r1,r1,#0xff @ Don't trust the caller to pass a char cmp r2,#16 @ If short don't bother with anything clever @@ -313,6 +321,19 @@ memchr: 10: @ We are aligned, we know we have at least 8 bytes to work with push {r4,r5,r6,r7} + .save {r4-r7} + .cfi_adjust_cfa_offset 16 +#if __HAVE_PAC_LEAF + .cfi_offset 4, -20 + .cfi_offset 5, -16 + .cfi_offset 6, -12 + .cfi_offset 7, -8 +#else + .cfi_offset 4, -16 + .cfi_offset 5, -12 + .cfi_offset 6, -8 + .cfi_offset 7, -4 +#endif /* __HAVE_PAC_LEAF*/ orr r1, r1, r1, lsl #8 @ expand the match word across all bytes orr r1, r1, r1, lsl #16 bic r4, r2, #7 @ Number of double words to work with * 8 @@ -334,6 +355,11 @@ memchr: bne 15b @ (Flags from the subs above) pop {r4,r5,r6,r7} + .cfi_restore 7 + .cfi_restore 6 + .cfi_restore 5 + .cfi_restore 4 + .cfi_adjust_cfa_offset -16 and r1,r1,#0xff @ r1 back to a single character and r2,r2,#7 @ Leave the count remaining as the number @ after the double words have been done @@ -350,11 +376,11 @@ memchr: 40: movs r0,#0 @ not found - bx lr + pacbti_epilogue 50: subs r0,r0,#1 @ found - bx lr + pacbti_epilogue 60: @ We're here because the fast path found a hit @ now we have to track down exactly which word it was @@ -378,9 +404,24 @@ memchr: addeq r0,r0,#1 61: - pop {r4,r5,r6,r7} subs r0,r0,#1 - bx lr +#if __HAVE_PAC_LEAF + pop {r4,r5,r6,r7,ip} + .cfi_restore 143 +#else + pop {r4,r5,r6,r7} +#endif /* __HAVE_PAC_LEAF*/ + .cfi_restore 7 + .cfi_restore 6 + .cfi_restore 5 + .cfi_restore 4 + .cfi_def_cfa_offset 0 +#if __HAVE_PAC_LEAF + aut ip, lr, sp +#endif /* __HAVE_PAC_LEAF*/ + bx lr + .cfi_endproc + .fnend #else /* Defined in memchr-stub.c. */ #endif -- 2.36.1
Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. --- newlib/libc/machine/arm/memcpy-armv7m.S | 74 ++++++++++++++++++++++--- 1 file changed, 66 insertions(+), 8 deletions(-) diff --git a/newlib/libc/machine/arm/memcpy-armv7m.S b/newlib/libc/machine/arm/memcpy-armv7m.S index c8bff36f6..d9b9e4984 100644 --- a/newlib/libc/machine/arm/memcpy-armv7m.S +++ b/newlib/libc/machine/arm/memcpy-armv7m.S @@ -46,6 +46,8 @@ __OPT_BIG_BLOCK_SIZE: Size of big block in words. Default to 64. __OPT_MID_BLOCK_SIZE: Size of big block in words. Default to 16. */ +#include "pacbti.h" + #ifndef __OPT_BIG_BLOCK_SIZE #define __OPT_BIG_BLOCK_SIZE (4 * 16) #endif @@ -85,6 +87,8 @@ .global memcpy .thumb .thumb_func + .fnstart + .cfi_startproc .type memcpy, %function memcpy: @ r0: dst @@ -93,10 +97,31 @@ memcpy: #ifdef __ARM_FEATURE_UNALIGNED /* In case of UNALIGNED access supported, ip is not used in function body. */ + pacbti_prologue mov ip, r0 #else +#if __HAVE_PAC_LEAF +#if __ARM_FEATURE_BTI_DEFAULT + pacbti ip, lr, sp +#else + pac ip, lr, sp +#endif /* __ARM_FEATURE_BTI_DEFAULT */ + .cfi_register 143, 12 + push {r0, ip} + .save {r0, ra_auth_code} + .cfi_def_cfa_offset 8 + .cfi_offset 0, -8 + .cfi_offset 143, -4 +#else +#if __ARM_FEATURE_BTI_DEFAULT + bti +#endif /* __ARM_FEATURE_BTI_DEFAULT */ push {r0} -#endif + .save {r0} + .cfi_adjust_cfa_offset 4 + .cfi_offset 0, -4 +#endif /* __HAVE_PAC_LEAF */ +#endif /* __ARM_FEATURE_UNALIGNED */ orr r3, r1, r0 ands r3, r3, #3 bne .Lmisaligned_copy @@ -135,13 +160,13 @@ memcpy: ldr r3, [r1], #4 str r3, [r0], #4 END_UNROLL -#else /* __ARM_ARCH_7M__ */ +#else ldr r3, [r1, \offset] str r3, [r0, \offset] END_UNROLL adds r0, __OPT_MID_BLOCK_SIZE adds r1, __OPT_MID_BLOCK_SIZE -#endif +#endif /* __ARM_ARCH_7M__ */ subs r2, __OPT_MID_BLOCK_SIZE bhs .Lmid_block_loop @@ -180,10 +205,21 @@ memcpy: .Ldone: #ifdef __ARM_FEATURE_UNALIGNED mov r0, ip + pacbti_epilogue +#else +#if __HAVE_PAC_LEAF + pop {r0, ra_auth_code} + .cfi_restore 0 + .cfi_restore 143 + .cfi_def_cfa_offset 0 + aut ip, lr, sp #else pop {r0} -#endif - bx lr + .cfi_restore 0 + .cfi_def_cfa_offset 0 +#endif /* __HAVE_PAC_LEAF */ + bx lr +#endif /* __ARM_FEATURE_UNALIGNED */ .align 2 .Lmisaligned_copy: @@ -247,6 +283,15 @@ memcpy: /* dst is aligned, but src isn't. Misaligned copy. */ push {r4, r5} + .save {r4, r5} + .cfi_adjust_cfa_offset 8 +#if __HAVE_PAC_LEAF /* we pushed just the pac code. */ + .cfi_offset 4, -12 + .cfi_offset 5, -8 +#else /* we haven't pushed anything to stack. */ + .cfi_offset 4, -8 + .cfi_offset 5, -4 +#endif /* __HAVE_PAC_LEAF */ subs r2, #4 /* Backward r1 by misaligned bytes, to make r1 aligned. @@ -299,6 +344,7 @@ memcpy: adds r2, #4 subs r1, ip pop {r4, r5} + .cfi_adjust_cfa_offset -8 #endif /* __ARM_FEATURE_UNALIGNED */ @@ -321,9 +367,21 @@ memcpy: #ifdef __ARM_FEATURE_UNALIGNED mov r0, ip + pacbti_epilogue +#else +#if __HAVE_PAC_LEAF + pop {r0, ra_auth_code} + .cfi_restore 0 + .cfi_restore 143 + .cfi_def_cfa_offset 0 + aut ip, lr, sp #else pop {r0} -#endif - bx lr - + .cfi_restore 0 + .cfi_def_cfa_offset 0 +#endif /* __HAVE_PAC_LEAF */ + bx lr +#endif /* __ARM_FEATURE_UNALIGNED */ + .cfi_endproc + .fnend .size memcpy, .-memcpy -- 2.36.1
Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. --- newlib/libc/machine/arm/setjmp.S | 33 ++++++++++++++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/newlib/libc/machine/arm/setjmp.S b/newlib/libc/machine/arm/setjmp.S index 21d6ff9e7..d60dd57df 100644 --- a/newlib/libc/machine/arm/setjmp.S +++ b/newlib/libc/machine/arm/setjmp.S @@ -157,11 +157,15 @@ SYM (.arm_start_of.\name): .globl SYM (\name) TYPE (\name) SYM (\name): + .fnstart + .cfi_startproc PROLOGUE \name .endm .macro FUNC_END name RET + .cfi_endproc + .fnend SIZE (\name) .endm @@ -173,11 +177,26 @@ SYM (\name): /* Save all the callee-preserved registers into the jump buffer. */ #ifdef __thumb2__ +#if __ARM_FEATURE_PAC_DEFAULT +#if __ARM_FEATURE_BTI_DEFAULT + pacbti ip, lr, sp +#else + pac ip, lr, sp +#endif /* __ARM_FEATURE_BTI_DEFAULT */ + .cfi_register 143, 12 + mov a4, ip + mov ip, sp + stmea a1!, { a4, v1-v7, fp, ip, lr } +#else +#if __ARM_FEATURE_BTI_DEFAULT + bti +#endif /* __ARM_FEATURE_BTI_DEFAULT */ mov ip, sp stmea a1!, { v1-v7, fp, ip, lr } +#endif /* __ARM_FEATURE_PAC_DEFAULT */ #else stmea a1!, { v1-v7, fp, ip, sp, lr } -#endif +#endif /* __thumb2__ */ #if 0 /* Simulator does not cope with FP instructions yet. */ #ifndef __SOFTFP__ @@ -200,11 +219,17 @@ SYM (\name): /* Restore the registers, retrieving the state when setjmp() was called. */ #ifdef __thumb2__ +#if __ARM_FEATURE_PAC_DEFAULT + ldmfd a1!, { a4, v1-v7, fp, ip, lr } + mov sp, ip + mov ip, a4 +#else ldmfd a1!, { v1-v7, fp, ip, lr } mov sp, ip +#endif /* __ARM_FEATURE_PAC_DEFAULT */ #else ldmfd a1!, { v1-v7, fp, ip, sp, lr } -#endif +#endif /* __thumb2__ */ #if 0 /* Simulator does not cope with FP instructions yet. */ #ifndef __SOFTFP__ @@ -220,5 +245,9 @@ SYM (\name): #endif moveq a1, #1 +#if __ARM_FEATURE_PAC_DEFAULT + aut ip, lr, sp +#endif + FUNC_END longjmp #endif -- 2.36.1
Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. --- .../libc/machine/arm/aeabi_memmove-thumb2.S | 54 +++++++++++++++++-- 1 file changed, 49 insertions(+), 5 deletions(-) diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb2.S b/newlib/libc/machine/arm/aeabi_memmove-thumb2.S index e9504437b..34054d96c 100644 --- a/newlib/libc/machine/arm/aeabi_memmove-thumb2.S +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb2.S @@ -26,6 +26,8 @@ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ + #include "pacbti.h" + .thumb .syntax unified .global __aeabi_memmove @@ -33,8 +35,28 @@ ASM_ALIAS __aeabi_memmove4 __aeabi_memmove ASM_ALIAS __aeabi_memmove8 __aeabi_memmove __aeabi_memmove: - cmp r0, r1 + .fnstart + .cfi_startproc +#if __HAVE_PAC_LEAF +#if __ARM_FEATURE_BTI_DEFAULT + pacbti ip, lr, sp +#else + pac ip, lr, sp +#endif /* __ARM_FEATURE_BTI_DEFAULT */ + .cfi_register 143, 12 + push {r4, ip} + .save {r4, ra_auth_code} + .cfi_def_cfa_offset 8 + .cfi_offset 4, -8 + .cfi_offset 143, -4 +#elif __ARM_FEATURE_BTI_DEFAULT + bti push {r4} + .save {r4} + .cfi_def_cfa_offset 4 + .cfi_offset 4, -4 +#endif /* __HAVE_PAC_LEAF */ + cmp r0, r1 bls 3f adds r3, r1, r2 cmp r0, r3 @@ -48,8 +70,18 @@ __aeabi_memmove: strb r4, [r1, #-1]! bne 1b 2: - pop {r4} - bx lr +#if __HAVE_PAC_LEAF + pop {r4, ip} + .cfi_restore 4 + .cfi_restore 143 + .cfi_def_cfa_offset 0 + aut ip, lr, sp +#else + pop {r4} + .cfi_restore 4 + .cfi_def_cfa_offset 0 +#endif /* __HAVE_PAC_LEAF */ + bx lr 3: cmp r2, #0 beq 2b @@ -60,6 +92,18 @@ __aeabi_memmove: cmp r2, r1 strb r4, [r3, #1]! bne 4b - pop {r4} - bx lr +#if __HAVE_PAC_LEAF + pop {r4, ip} + .cfi_restore 4 + .cfi_restore 143 + .cfi_def_cfa_offset 0 + aut ip, lr, sp +#else + pop {r4} + .cfi_restore 4 + .cfi_def_cfa_offset 0 +#endif /* __HAVE_PAC_LEAF */ + bx lr + .cfi_endproc + .fnend .size __aeabi_memmove, . - __aeabi_memmove -- 2.36.1
Add function prologue/epilogue to conditionally add BTI landing pads and/or PAC code generation & authentication instructions depending on compilation flags. --- newlib/libc/machine/arm/aeabi_memset-thumb2.S | 47 ++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb2.S b/newlib/libc/machine/arm/aeabi_memset-thumb2.S index eaca1d8d7..8bd4b5eaa 100644 --- a/newlib/libc/machine/arm/aeabi_memset-thumb2.S +++ b/newlib/libc/machine/arm/aeabi_memset-thumb2.S @@ -26,14 +26,43 @@ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include "pacbti.h" + .thumb .syntax unified .global __aeabi_memset .type __aeabi_memset, %function + .fnstart + .cfi_startproc ASM_ALIAS __aeabi_memset4 __aeabi_memset ASM_ALIAS __aeabi_memset8 __aeabi_memset __aeabi_memset: +#if __HAVE_PAC_LEAF +#if __ARM_FEATURE_BTI_DEFAULT + pacbti ip, lr, sp +#else + pac ip, lr, sp +#endif /* __ARM_FEATURE_BTI_DEFAULT */ + .cfi_register 143, 12 + push {r4, r5, r6, ip} + .save {r4, r5, r6, ra_auth_code} + .cfi_def_cfa_offset 16 + .cfi_offset 4, -16 + .cfi_offset 5, -12 + .cfi_offset 6, -8 + .cfi_offset 143, -4 +#else +#if __ARM_FEATURE_BTI_DEFAULT + bti +#endif /* __ARM_FEATURE_BTI_DEFAULT */ push {r4, r5, r6} + .save {r4, r5, r6} + .cfi_def_cfa_offset 12 + .cfi_offset 4, -12 + .cfi_offset 5, -8 + .cfi_offset 6, -4 +#endif /* __HAVE_PAC_LEAF */ + lsls r4, r0, #30 beq 10f subs r4, r1, #1 @@ -98,10 +127,26 @@ __aeabi_memset: cmp r3, r4 bne 8b 9: +#if __HAVE_PAC_LEAF + pop {r4, r5, r6, ip} + .cfi_restore 143 + .cfi_restore 6 + .cfi_restore 5 + .cfi_restore 4 + .cfi_def_cfa_offset 0 + aut ip, lr, sp +#else pop {r4, r5, r6} - bx lr + .cfi_restore 6 + .cfi_restore 5 + .cfi_restore 4 + .cfi_def_cfa_offset 0 +#endif /* __HAVE_PAC_LEAF */ + bx lr 10: mov r4, r1 mov r3, r0 b 3b + .cfi_endproc + .fnend .size __aeabi_memset, . - __aeabi_memset -- 2.36.1
On 03/08/2022 16:35, Victor Do Nascimento wrote: > Create an assembly header file that conditionally defines fuction > prologues/epilogues depending on the compile-time mbranch-protection > argument values. > > * newlib/libc/machine/arm/pacbti.h: New. > --- > newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++ > 1 file changed, 64 insertions(+) > create mode 100644 newlib/libc/machine/arm/pacbti.h > > diff --git a/newlib/libc/machine/arm/pacbti.h b/newlib/libc/machine/arm/pacbti.h > new file mode 100644 > index 000000000..4c31d00bc > --- /dev/null > +++ b/newlib/libc/machine/arm/pacbti.h > @@ -0,0 +1,64 @@ > +/* > + * Copyright (c) 2022 Arm Ltd > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in the > + * documentation and/or other materials provided with the distribution. > + * 3. The name of the company may not be used to endorse or promote > + * products derived from this software without specific prior written > + * permission. > + * > + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED > + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF > + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. > + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED > + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING > + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +/* Checki whether leaf function PAC signing has been requested > + in the -mbranch-protect compile-time option. */ > +#define LEAF_PROTECT_BIT 2 > +#define __HAVE_PAC_LEAF \ > + __ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT) Either this header needs to avoid polluting the user namespace, or it doesn't. If it does, then LEAF_PROTECT_BIT fails to do that. If it doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid polluting the reserved namespace. I suspect this header is private to newlib (won't be exported to users), so should not be prefixing names with __. > + > +/* Macro to handle function entry depending on branch-protection > + schemes. */ > + .macro pacbti_prologue > +#if __HAVE_PAC_LEAF > +#if __ARM_FEATURE_BTI_DEFAULT > + pacbti ip, lr, sp > +#else > + pac ip, lr, sp > +#endif /* __ARM_FEATURE_BTI_DEFAULT */ > + .cfi_register 143, 12 > + str ip, [sp, #-4]! This causes the stack to lose 8-byte alignment. Whilst for leaf functions that's probably not a problem, the macro should have an option where the user can ask for alignment to be preserved. But there should also be an option to not save IP on the stack at all (for when the user does not modify IP in the function body). > + .save {ra_auth_code} > + .cfi_def_cfa_offset 4 > + .cfi_offset 143, -4 > +#elif __ARM_FEATURE_BTI_DEFAULT > + bti > +#endif /* __HAVE_PAC_LEAF */ > + .endm > + > +/* Macro to handle different branch exchange cases depending on > + branch-protection schemes. */ > + .macro pacbti_epilogue > +#if __HAVE_PAC_LEAF > + ldr ip, [sp], #4 > + .cfi_restore 143 > + .cfi_def_cfa_offset 0 > + aut ip, lr, sp > +#endif /* __HAVE_PAC_LEAF */ > + bx lr > + .endm I think these macros are really misnamed, firstly, they're only for leaf functions and secondly, they (particularly the epilogue) does something even if PAC is not needed. So I think they should be renamed as 'leaf_prologue' and 'leaf_epilogue' respectively. In consequence, I think this file should really be merged into the existing arm_asm.h, then we don't need yet another header file. Finally, the header needs to define a (C) macro that defines how much to adjust the CFI offset by for various scenarios: - When PAC is not used - When PAC is used with no alignment - When PAC is used and asked for 8-byte alignment to be preserved. R.
On 03/08/2022 16:35, Victor Do Nascimento wrote: > Add function prologue/epilogue to conditionally add BTI landing pads > and/or PAC code generation & authentication instructions depending on > compilation flags. > > This patch enables PACBTI for all relevant variants of strcmp: > * Newlib for armv8.1-m.main+pacbti > * Newlib for armv8.1-m.main+pacbti+mve > * Newlib-nano > --- > newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +++- > newlib/libc/machine/arm/strcmp-armv7.S | 45 +++++++++++++++++------ > newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++++++++++--- > 3 files changed, 71 insertions(+), 17 deletions(-) > > diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S b/newlib/libc/machine/arm/strcmp-arm-tiny.S > index 607a41daf..8085eb4df 100644 > --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S > +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S > @@ -29,10 +29,14 @@ > /* Tiny version of strcmp in ARM state. Used only when optimizing > for size. Also supports Thumb-2. */ > > +#include "pacbti.h" > + > .syntax unified > def_fn strcmp > + .fnstart > .cfi_sections .debug_frame > .cfi_startproc > + pacbti_prologue See comment on patch1 about the names of these macros. > 1: > ldrb r2, [r0], #1 > ldrb r3, [r1], #1 > @@ -42,6 +46,7 @@ def_fn strcmp > beq 1b > 2: > subs r0, r2, r3 > - bx lr > + pacbti_epilogue > .cfi_endproc > + .fnend > .size strcmp, . - strcmp > diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S > index 2f93bfb73..584c89f7e 100644 > --- a/newlib/libc/machine/arm/strcmp-armv7.S > +++ b/newlib/libc/machine/arm/strcmp-armv7.S > @@ -45,6 +45,8 @@ > .thumb > .syntax unified > > +#include "pacbti.h" > + > /* Parameters and result. */ > #define src1 r0 > #define src2 r1 > @@ -91,8 +93,9 @@ > ldrd r4, r5, [sp], #16 > .cfi_restore 4 > .cfi_restore 5 > + .cfi_adjust_cfa_offset -16 This looks like it's an omission from the original file and thus should be separated out as a separate patch. > sub result, result, r1, lsr #24 > - bx lr > + pacbti_epilogue > #else > /* To use the big-endian trick we'd have to reverse all three words. > that's slower than this approach. */ > @@ -112,22 +115,28 @@ > ldrd r4, r5, [sp], #16 > .cfi_restore 4 > .cfi_restore 5 > + .cfi_adjust_cfa_offset -16 > sub result, result, r1 > > - bx lr > + pacbti_epilogue > #endif > .endm > > + > .text > .p2align 5 > + .fnstart > + .cfi_sections .debug_frame > + .cfi_startproc > .Lstrcmp_start_addr: > #ifndef STRCMP_NO_PRECHECK > .Lfastpath_exit: > sub r0, r2, r3 > - bx lr > + pacbti_epilogue > nop > #endif > def_fn strcmp > + pacbti_prologue > #ifndef STRCMP_NO_PRECHECK > ldrb r2, [src1] > ldrb r3, [src2] > @@ -136,16 +145,26 @@ def_fn strcmp > cmpcs r2, r3 > bne .Lfastpath_exit > #endif > - .cfi_sections .debug_frame > - .cfi_startproc > strd r4, r5, [sp, #-16]! > - .cfi_def_cfa_offset 16 > + .save {r4, r5} Hmm, I've just remembered that leaf functions cannot throw exceptions (the EHABI only supports synchronous exceptions), so I don't think we need .save directives at all. Instead, such functions should be marked with ".cantunwind". You still want the dwarf unwind information as that's for the debugger, it's just the '.save' directives that aren't needed. > + .cfi_adjust_cfa_offset 16 > +#if __HAVE_PAC_LEAF > + .cfi_offset 4, -20 > + .cfi_offset 5, -16 > +#else > .cfi_offset 4, -16 > .cfi_offset 5, -12 > +#endif /* __HAVE_PAC_LEAF */ Ugh! This is not the right way to do this. Instead (see comments on patch 1) use a macro that defines the additional adjustment to use in various contexts, something like .cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ) > orr tmp1, src1, src2 > strd r6, r7, [sp, #8] > + .save {r6, r7} > +#if __HAVE_PAC_LEAF > + .cfi_offset 6, -12 > + .cfi_offset 7, -8 > +#else > .cfi_offset 6, -8 > .cfi_offset 7, -4 > +#endif /* __HAVE_PAC_LEAF */ > mvn const_m1, #0 > lsl r2, tmp1, #29 > cbz r2, .Lloop_aligned8 > @@ -270,7 +289,6 @@ def_fn strcmp > ldr data1, [src1], #4 > beq .Laligned_m2 > bcs .Laligned_m1 > - > #ifdef STRCMP_NO_PRECHECK > ldrb data2, [src2, #1] > uxtb tmp1, data1, ror #BYTE1_OFFSET > @@ -314,7 +332,8 @@ def_fn strcmp > mov result, tmp1 > ldr r4, [sp], #16 > .cfi_restore 4 > - bx lr > + .cfi_adjust_cfa_offset -16 > + pacbti_epilogue > > #ifndef STRCMP_NO_PRECHECK > .Laligned_m1: > @@ -364,8 +383,9 @@ def_fn strcmp > /* R6/7 Not used in this sequence. */ > .cfi_restore 6 > .cfi_restore 7 > + .cfi_adjust_cfa_offset -16 > neg result, result > - bx lr > + pacbti_epilogue > > 6: > .cfi_restore_state > @@ -441,7 +461,8 @@ def_fn strcmp > /* R6/7 not used in this sequence. */ > .cfi_restore 6 > .cfi_restore 7 > - bx lr > + .cfi_adjust_cfa_offset -16 > + pacbti_epilogue > > .Lstrcmp_tail: > .cfi_restore_state > @@ -463,7 +484,9 @@ def_fn strcmp > /* R6/7 not used in this sequence. */ > .cfi_restore 6 > .cfi_restore 7 > + .cfi_adjust_cfa_offset -16 > sub result, result, data2, lsr #24 > - bx lr > + pacbti_epilogue > .cfi_endproc > + .fnend > .size strcmp, . - .Lstrcmp_start_addr > diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S > index cdb4912df..fe1519f4d 100644 > --- a/newlib/libc/machine/arm/strcmp-armv7m.S > +++ b/newlib/libc/machine/arm/strcmp-armv7m.S > @@ -29,6 +29,8 @@ > /* Very similar to the generic code, but uses Thumb2 as implemented > in ARMv7-M. */ > > +#include "pacbti.h" > + > /* Parameters and result. */ > #define src1 r0 > #define src2 r1 > @@ -44,8 +46,10 @@ > .thumb > .syntax unified > def_fn strcmp > + .fnstart > .cfi_sections .debug_frame > .cfi_startproc > + pacbti_prologue > eor tmp1, src1, src2 > tst tmp1, #3 > /* Strings not at same byte offset from a word boundary. */ > @@ -106,7 +110,7 @@ def_fn strcmp > lsrs result, result, #24 > subs result, result, data2 > #endif > - bx lr > + pacbti_epilogue > > > #if 0 > @@ -214,12 +218,18 @@ def_fn strcmp > cmpcs data1, data2 > beq .Lstrcmp_unaligned > sub result, data1, data2 > - bx lr > + pacbti_epilogue > > 2: > stmfd sp!, {r5} > - .cfi_def_cfa_offset 4 > + .save {r5} > + .cfi_adjust_cfa_offset 4 > +#if __HAVE_PAC_LEAF > + .cfi_offset 5, -8 /* Account for ip register already on stack. */ > +#else > .cfi_offset 5, -4 > +#endif /* __HAVE_PAC_LEAF */ > + > > ldr data1, [src1], #4 > and tmp2, src2, #3 > @@ -353,10 +363,17 @@ def_fn strcmp > .Lstrcmp_done_equal: > mov result, #0 > .cfi_remember_state > +#if __HAVE_PAC_LEAF > + pop {r5, ip} > + .cfi_restore 5 > + .cfi_restore 143 > + aut ip, lr, sp > +#else > ldmfd sp!, {r5} > .cfi_restore 5 > +#endif /* __HAVE_PAC_LEAF */ I think we could define a macro for this as well, something like pop_reg_and_pac r5 While you're doing that, please change the single-register ldmfd into 'pop'. > .cfi_def_cfa_offset 0 > - bx lr > + bx lr stray change to the BX instruction. > > .Lstrcmp_tail: > .cfi_restore_state > @@ -370,9 +387,18 @@ def_fn strcmp > S2LOEQ data2, data2, #8 > beq .Lstrcmp_tail > sub result, r2, result > +#if __HAVE_PAC_LEAF > + pop {r5, ip} > + .cfi_restore 5 > + .cfi_restore 143 > + .cfi_def_cfa_offset 0 > + aut ip, lr, sp > +#else > ldmfd sp!, {r5} > .cfi_restore 5 > .cfi_def_cfa_offset 0 > - bx lr > +#endif /* __HAVE_PAC_LEAF */ Same as above > + bx lr tab between mnemonic and operands. > .cfi_endproc > + .fnend > .size strcmp, . - strcmp R.
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > On 03/08/2022 16:35, Victor Do Nascimento wrote: >> Create an assembly header file that conditionally defines fuction >> prologues/epilogues depending on the compile-time mbranch-protection >> argument values. >> * newlib/libc/machine/arm/pacbti.h: New. >> --- >> newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++ >> 1 file changed, 64 insertions(+) >> create mode 100644 newlib/libc/machine/arm/pacbti.h >> diff --git a/newlib/libc/machine/arm/pacbti.h >> b/newlib/libc/machine/arm/pacbti.h >> new file mode 100644 >> index 000000000..4c31d00bc >> --- /dev/null >> +++ b/newlib/libc/machine/arm/pacbti.h >> @@ -0,0 +1,64 @@ >> +/* >> + * Copyright (c) 2022 Arm Ltd >> + * All rights reserved. >> + * >> + * Redistribution and use in source and binary forms, with or without >> + * modification, are permitted provided that the following conditions >> + * are met: >> + * 1. Redistributions of source code must retain the above copyright >> + * notice, this list of conditions and the following disclaimer. >> + * 2. Redistributions in binary form must reproduce the above copyright >> + * notice, this list of conditions and the following disclaimer in the >> + * documentation and/or other materials provided with the distribution. >> + * 3. The name of the company may not be used to endorse or promote >> + * products derived from this software without specific prior written >> + * permission. >> + * >> + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED >> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF >> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. >> + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED >> + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR >> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF >> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING >> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS >> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. >> + */ >> + >> +/* Checki whether leaf function PAC signing has been requested >> + in the -mbranch-protect compile-time option. */ >> +#define LEAF_PROTECT_BIT 2 >> +#define __HAVE_PAC_LEAF \ >> + __ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT) > > Either this header needs to avoid polluting the user namespace, or it > doesn't. If it does, then LEAF_PROTECT_BIT fails to do that. If it > doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid > polluting the reserved namespace. I suspect this header is private to > newlib (won't be exported to users), so should not be prefixing names > with __. > >> + >> +/* Macro to handle function entry depending on branch-protection >> + schemes. */ >> + .macro pacbti_prologue >> +#if __HAVE_PAC_LEAF >> +#if __ARM_FEATURE_BTI_DEFAULT >> + pacbti ip, lr, sp >> +#else >> + pac ip, lr, sp >> +#endif /* __ARM_FEATURE_BTI_DEFAULT */ >> + .cfi_register 143, 12 >> + str ip, [sp, #-4]! > > This causes the stack to lose 8-byte alignment. Whilst for leaf > functions that's probably not a problem, the macro should have an > option where the user can ask for alignment to be preserved. But > there should also be an option to not save IP on the stack at all (for > when the user does not modify IP in the function body). Thanks for the feedback here. So we introduce two options the user is given control over. I guess this raises two questions concerning design choices to be made. 1. How do we expose these options to the user, and 2. What should we define as default values? Regarding defaults, in order to maximize performance my initial guess would be that we don't push IP to the stack, such that we save on instructions by omitting pushes and pops and automatically preserve stack alignment that way. Even so, that leaves the question of how to best allow users to modify these parameters prior to compiling the library. Any thoughts?? >> + .save {ra_auth_code} >> + .cfi_def_cfa_offset 4 >> + .cfi_offset 143, -4 >> +#elif __ARM_FEATURE_BTI_DEFAULT >> + bti >> +#endif /* __HAVE_PAC_LEAF */ >> + .endm >> + >> +/* Macro to handle different branch exchange cases depending on >> + branch-protection schemes. */ >> + .macro pacbti_epilogue >> +#if __HAVE_PAC_LEAF >> + ldr ip, [sp], #4 >> + .cfi_restore 143 >> + .cfi_def_cfa_offset 0 >> + aut ip, lr, sp >> +#endif /* __HAVE_PAC_LEAF */ >> + bx lr >> + .endm > > I think these macros are really misnamed, firstly, they're only for > leaf functions and secondly, they (particularly the epilogue) does > something even if PAC is not needed. So I think they should be > renamed as 'leaf_prologue' and 'leaf_epilogue' respectively. Agreed. > In consequence, I think this file should really be merged into the > existing arm_asm.h, then we don't need yet another header file. > > Finally, the header needs to define a (C) macro that defines how much > to adjust the CFI offset by for various scenarios: > - When PAC is not used > - When PAC is used with no alignment > - When PAC is used and asked for 8-byte alignment to be preserved. I'm thinking we set the value of PAC_CFI_ADJ contigent on values of PAC_LEAF_PUSH_SP and PAC_PRESERVE_STACK_ALIGN when HAVE_PAC_LEAF is set (once I define these new macros, of course). V. > R.
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > On 03/08/2022 16:35, Victor Do Nascimento wrote: >> Add function prologue/epilogue to conditionally add BTI landing pads >> and/or PAC code generation & authentication instructions depending on >> compilation flags. >> This patch enables PACBTI for all relevant variants of strcmp: >> * Newlib for armv8.1-m.main+pacbti >> * Newlib for armv8.1-m.main+pacbti+mve >> * Newlib-nano >> --- >> newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +++- >> newlib/libc/machine/arm/strcmp-armv7.S | 45 +++++++++++++++++------ >> newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++++++++++--- >> 3 files changed, 71 insertions(+), 17 deletions(-) >> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S >> b/newlib/libc/machine/arm/strcmp-arm-tiny.S >> index 607a41daf..8085eb4df 100644 >> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S >> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S >> @@ -29,10 +29,14 @@ >> /* Tiny version of strcmp in ARM state. Used only when optimizing >> for size. Also supports Thumb-2. */ >> +#include "pacbti.h" >> + >> .syntax unified >> def_fn strcmp >> + .fnstart >> .cfi_sections .debug_frame >> .cfi_startproc >> + pacbti_prologue > > See comment on patch1 about the names of these macros. > >> 1: >> ldrb r2, [r0], #1 >> ldrb r3, [r1], #1 >> @@ -42,6 +46,7 @@ def_fn strcmp >> beq 1b >> 2: >> subs r0, r2, r3 >> - bx lr >> + pacbti_epilogue >> .cfi_endproc >> + .fnend >> .size strcmp, . - strcmp >> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S >> index 2f93bfb73..584c89f7e 100644 >> --- a/newlib/libc/machine/arm/strcmp-armv7.S >> +++ b/newlib/libc/machine/arm/strcmp-armv7.S >> @@ -45,6 +45,8 @@ >> .thumb >> .syntax unified >> +#include "pacbti.h" >> + >> /* Parameters and result. */ >> #define src1 r0 >> #define src2 r1 >> @@ -91,8 +93,9 @@ >> ldrd r4, r5, [sp], #16 >> .cfi_restore 4 >> .cfi_restore 5 >> + .cfi_adjust_cfa_offset -16 > > This looks like it's an omission from the original file and thus > should be separated out as a separate patch. > >> sub result, result, r1, lsr #24 >> - bx lr >> + pacbti_epilogue >> #else >> /* To use the big-endian trick we'd have to reverse all three words. >> that's slower than this approach. */ >> @@ -112,22 +115,28 @@ >> ldrd r4, r5, [sp], #16 >> .cfi_restore 4 >> .cfi_restore 5 >> + .cfi_adjust_cfa_offset -16 >> sub result, result, r1 >> - bx lr >> + pacbti_epilogue >> #endif >> .endm >> + >> .text >> .p2align 5 >> + .fnstart >> + .cfi_sections .debug_frame >> + .cfi_startproc >> .Lstrcmp_start_addr: >> #ifndef STRCMP_NO_PRECHECK >> .Lfastpath_exit: >> sub r0, r2, r3 >> - bx lr >> + pacbti_epilogue >> nop >> #endif >> def_fn strcmp >> + pacbti_prologue >> #ifndef STRCMP_NO_PRECHECK >> ldrb r2, [src1] >> ldrb r3, [src2] >> @@ -136,16 +145,26 @@ def_fn strcmp >> cmpcs r2, r3 >> bne .Lfastpath_exit >> #endif >> - .cfi_sections .debug_frame >> - .cfi_startproc >> strd r4, r5, [sp, #-16]! >> - .cfi_def_cfa_offset 16 >> + .save {r4, r5} > > Hmm, I've just remembered that leaf functions cannot throw exceptions > (the EHABI only supports synchronous exceptions), so I don't think we > need .save directives at all. Instead, such functions should be > marked with ".cantunwind". You still want the dwarf unwind > information as that's for the debugger, it's just the '.save' > directives that aren't needed. > >> + .cfi_adjust_cfa_offset 16 >> +#if __HAVE_PAC_LEAF >> + .cfi_offset 4, -20 >> + .cfi_offset 5, -16 >> +#else >> .cfi_offset 4, -16 >> .cfi_offset 5, -12 >> +#endif /* __HAVE_PAC_LEAF */ > > Ugh! This is not the right way to do this. Instead (see comments on > patch 1) use a macro that defines the additional adjustment to use in > various contexts, something like > > .cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ) > >> orr tmp1, src1, src2 >> strd r6, r7, [sp, #8] >> + .save {r6, r7} >> +#if __HAVE_PAC_LEAF >> + .cfi_offset 6, -12 >> + .cfi_offset 7, -8 >> +#else >> .cfi_offset 6, -8 >> .cfi_offset 7, -4 >> +#endif /* __HAVE_PAC_LEAF */ >> mvn const_m1, #0 >> lsl r2, tmp1, #29 >> cbz r2, .Lloop_aligned8 >> @@ -270,7 +289,6 @@ def_fn strcmp >> ldr data1, [src1], #4 >> beq .Laligned_m2 >> bcs .Laligned_m1 >> - >> #ifdef STRCMP_NO_PRECHECK >> ldrb data2, [src2, #1] >> uxtb tmp1, data1, ror #BYTE1_OFFSET >> @@ -314,7 +332,8 @@ def_fn strcmp >> mov result, tmp1 >> ldr r4, [sp], #16 >> .cfi_restore 4 >> - bx lr >> + .cfi_adjust_cfa_offset -16 >> + pacbti_epilogue >> #ifndef STRCMP_NO_PRECHECK >> .Laligned_m1: >> @@ -364,8 +383,9 @@ def_fn strcmp >> /* R6/7 Not used in this sequence. */ >> .cfi_restore 6 >> .cfi_restore 7 >> + .cfi_adjust_cfa_offset -16 >> neg result, result >> - bx lr >> + pacbti_epilogue >> 6: >> .cfi_restore_state >> @@ -441,7 +461,8 @@ def_fn strcmp >> /* R6/7 not used in this sequence. */ >> .cfi_restore 6 >> .cfi_restore 7 >> - bx lr >> + .cfi_adjust_cfa_offset -16 >> + pacbti_epilogue >> .Lstrcmp_tail: >> .cfi_restore_state >> @@ -463,7 +484,9 @@ def_fn strcmp >> /* R6/7 not used in this sequence. */ >> .cfi_restore 6 >> .cfi_restore 7 >> + .cfi_adjust_cfa_offset -16 >> sub result, result, data2, lsr #24 >> - bx lr >> + pacbti_epilogue >> .cfi_endproc >> + .fnend >> .size strcmp, . - .Lstrcmp_start_addr >> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S >> index cdb4912df..fe1519f4d 100644 >> --- a/newlib/libc/machine/arm/strcmp-armv7m.S >> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S >> @@ -29,6 +29,8 @@ >> /* Very similar to the generic code, but uses Thumb2 as implemented >> in ARMv7-M. */ >> +#include "pacbti.h" >> + >> /* Parameters and result. */ >> #define src1 r0 >> #define src2 r1 >> @@ -44,8 +46,10 @@ >> .thumb >> .syntax unified >> def_fn strcmp >> + .fnstart >> .cfi_sections .debug_frame >> .cfi_startproc >> + pacbti_prologue >> eor tmp1, src1, src2 >> tst tmp1, #3 >> /* Strings not at same byte offset from a word boundary. */ >> @@ -106,7 +110,7 @@ def_fn strcmp >> lsrs result, result, #24 >> subs result, result, data2 >> #endif >> - bx lr >> + pacbti_epilogue >> #if 0 >> @@ -214,12 +218,18 @@ def_fn strcmp >> cmpcs data1, data2 >> beq .Lstrcmp_unaligned >> sub result, data1, data2 >> - bx lr >> + pacbti_epilogue >> 2: >> stmfd sp!, {r5} >> - .cfi_def_cfa_offset 4 >> + .save {r5} >> + .cfi_adjust_cfa_offset 4 >> +#if __HAVE_PAC_LEAF >> + .cfi_offset 5, -8 /* Account for ip register already on stack. */ >> +#else >> .cfi_offset 5, -4 >> +#endif /* __HAVE_PAC_LEAF */ >> + >> ldr data1, [src1], #4 >> and tmp2, src2, #3 >> @@ -353,10 +363,17 @@ def_fn strcmp >> .Lstrcmp_done_equal: >> mov result, #0 >> .cfi_remember_state >> +#if __HAVE_PAC_LEAF >> + pop {r5, ip} >> + .cfi_restore 5 >> + .cfi_restore 143 >> + aut ip, lr, sp >> +#else >> ldmfd sp!, {r5} >> .cfi_restore 5 >> +#endif /* __HAVE_PAC_LEAF */ > > I think we could define a macro for this as well, something like > pop_reg_and_pac r5 This sounds good, but a survey through relevant function epilogues reveals a series of different `pop' scenarios. On the one hand, we have pop {r5, ip}, as we have here but on another we also have, for example, pop {r4,r5,r6,r7,ip}. Do we wish to provide a short-hand notation for one pop operation but not the other via macros? My initial understanding is that assembler macros don't provide us with the necessary flexibility to handle all relevant cases. I think the use of a macro here would greatly enhance code readability and by extension maintainability, but at first sight it feels inelegant to provide a solution applicable only to a single case of a more general problem. Any wisdom here would be appreciated. V. > While you're doing that, please change the single-register ldmfd into 'pop'. > >> .cfi_def_cfa_offset 0 >> - bx lr >> + bx lr > > stray change to the BX instruction. > >> .Lstrcmp_tail: >> .cfi_restore_state >> @@ -370,9 +387,18 @@ def_fn strcmp >> S2LOEQ data2, data2, #8 >> beq .Lstrcmp_tail >> sub result, r2, result >> +#if __HAVE_PAC_LEAF >> + pop {r5, ip} >> + .cfi_restore 5 >> + .cfi_restore 143 >> + .cfi_def_cfa_offset 0 >> + aut ip, lr, sp >> +#else >> ldmfd sp!, {r5} >> .cfi_restore 5 >> .cfi_def_cfa_offset 0 >> - bx lr >> +#endif /* __HAVE_PAC_LEAF */ > > Same as above > >> + bx lr > > tab between mnemonic and operands. > >> .cfi_endproc >> + .fnend >> .size strcmp, . - strcmp > > R.
On 05/08/2022 11:51, Victor L. Do Nascimento wrote: > Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > >> On 03/08/2022 16:35, Victor Do Nascimento wrote: >>> Add function prologue/epilogue to conditionally add BTI landing pads >>> and/or PAC code generation & authentication instructions depending on >>> compilation flags. >>> This patch enables PACBTI for all relevant variants of strcmp: >>> * Newlib for armv8.1-m.main+pacbti >>> * Newlib for armv8.1-m.main+pacbti+mve >>> * Newlib-nano >>> --- >>> newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +++- >>> newlib/libc/machine/arm/strcmp-armv7.S | 45 +++++++++++++++++------ >>> newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++++++++++--- >>> 3 files changed, 71 insertions(+), 17 deletions(-) >>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> index 607a41daf..8085eb4df 100644 >>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> @@ -29,10 +29,14 @@ >>> /* Tiny version of strcmp in ARM state. Used only when optimizing >>> for size. Also supports Thumb-2. */ >>> +#include "pacbti.h" >>> + >>> .syntax unified >>> def_fn strcmp >>> + .fnstart >>> .cfi_sections .debug_frame >>> .cfi_startproc >>> + pacbti_prologue >> >> See comment on patch1 about the names of these macros. >> >>> 1: >>> ldrb r2, [r0], #1 >>> ldrb r3, [r1], #1 >>> @@ -42,6 +46,7 @@ def_fn strcmp >>> beq 1b >>> 2: >>> subs r0, r2, r3 >>> - bx lr >>> + pacbti_epilogue >>> .cfi_endproc >>> + .fnend >>> .size strcmp, . - strcmp >>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S >>> index 2f93bfb73..584c89f7e 100644 >>> --- a/newlib/libc/machine/arm/strcmp-armv7.S >>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S >>> @@ -45,6 +45,8 @@ >>> .thumb >>> .syntax unified >>> +#include "pacbti.h" >>> + >>> /* Parameters and result. */ >>> #define src1 r0 >>> #define src2 r1 >>> @@ -91,8 +93,9 @@ >>> ldrd r4, r5, [sp], #16 >>> .cfi_restore 4 >>> .cfi_restore 5 >>> + .cfi_adjust_cfa_offset -16 >> >> This looks like it's an omission from the original file and thus >> should be separated out as a separate patch. >> >>> sub result, result, r1, lsr #24 >>> - bx lr >>> + pacbti_epilogue >>> #else >>> /* To use the big-endian trick we'd have to reverse all three words. >>> that's slower than this approach. */ >>> @@ -112,22 +115,28 @@ >>> ldrd r4, r5, [sp], #16 >>> .cfi_restore 4 >>> .cfi_restore 5 >>> + .cfi_adjust_cfa_offset -16 >>> sub result, result, r1 >>> - bx lr >>> + pacbti_epilogue >>> #endif >>> .endm >>> + >>> .text >>> .p2align 5 >>> + .fnstart >>> + .cfi_sections .debug_frame >>> + .cfi_startproc >>> .Lstrcmp_start_addr: >>> #ifndef STRCMP_NO_PRECHECK >>> .Lfastpath_exit: >>> sub r0, r2, r3 >>> - bx lr >>> + pacbti_epilogue >>> nop >>> #endif >>> def_fn strcmp >>> + pacbti_prologue >>> #ifndef STRCMP_NO_PRECHECK >>> ldrb r2, [src1] >>> ldrb r3, [src2] >>> @@ -136,16 +145,26 @@ def_fn strcmp >>> cmpcs r2, r3 >>> bne .Lfastpath_exit >>> #endif >>> - .cfi_sections .debug_frame >>> - .cfi_startproc >>> strd r4, r5, [sp, #-16]! >>> - .cfi_def_cfa_offset 16 >>> + .save {r4, r5} >> >> Hmm, I've just remembered that leaf functions cannot throw exceptions >> (the EHABI only supports synchronous exceptions), so I don't think we >> need .save directives at all. Instead, such functions should be >> marked with ".cantunwind". You still want the dwarf unwind >> information as that's for the debugger, it's just the '.save' >> directives that aren't needed. >> >>> + .cfi_adjust_cfa_offset 16 >>> +#if __HAVE_PAC_LEAF >>> + .cfi_offset 4, -20 >>> + .cfi_offset 5, -16 >>> +#else >>> .cfi_offset 4, -16 >>> .cfi_offset 5, -12 >>> +#endif /* __HAVE_PAC_LEAF */ >> >> Ugh! This is not the right way to do this. Instead (see comments on >> patch 1) use a macro that defines the additional adjustment to use in >> various contexts, something like >> >> .cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ) >> >>> orr tmp1, src1, src2 >>> strd r6, r7, [sp, #8] >>> + .save {r6, r7} >>> +#if __HAVE_PAC_LEAF >>> + .cfi_offset 6, -12 >>> + .cfi_offset 7, -8 >>> +#else >>> .cfi_offset 6, -8 >>> .cfi_offset 7, -4 >>> +#endif /* __HAVE_PAC_LEAF */ >>> mvn const_m1, #0 >>> lsl r2, tmp1, #29 >>> cbz r2, .Lloop_aligned8 >>> @@ -270,7 +289,6 @@ def_fn strcmp >>> ldr data1, [src1], #4 >>> beq .Laligned_m2 >>> bcs .Laligned_m1 >>> - >>> #ifdef STRCMP_NO_PRECHECK >>> ldrb data2, [src2, #1] >>> uxtb tmp1, data1, ror #BYTE1_OFFSET >>> @@ -314,7 +332,8 @@ def_fn strcmp >>> mov result, tmp1 >>> ldr r4, [sp], #16 >>> .cfi_restore 4 >>> - bx lr >>> + .cfi_adjust_cfa_offset -16 >>> + pacbti_epilogue >>> #ifndef STRCMP_NO_PRECHECK >>> .Laligned_m1: >>> @@ -364,8 +383,9 @@ def_fn strcmp >>> /* R6/7 Not used in this sequence. */ >>> .cfi_restore 6 >>> .cfi_restore 7 >>> + .cfi_adjust_cfa_offset -16 >>> neg result, result >>> - bx lr >>> + pacbti_epilogue >>> 6: >>> .cfi_restore_state >>> @@ -441,7 +461,8 @@ def_fn strcmp >>> /* R6/7 not used in this sequence. */ >>> .cfi_restore 6 >>> .cfi_restore 7 >>> - bx lr >>> + .cfi_adjust_cfa_offset -16 >>> + pacbti_epilogue >>> .Lstrcmp_tail: >>> .cfi_restore_state >>> @@ -463,7 +484,9 @@ def_fn strcmp >>> /* R6/7 not used in this sequence. */ >>> .cfi_restore 6 >>> .cfi_restore 7 >>> + .cfi_adjust_cfa_offset -16 >>> sub result, result, data2, lsr #24 >>> - bx lr >>> + pacbti_epilogue >>> .cfi_endproc >>> + .fnend >>> .size strcmp, . - .Lstrcmp_start_addr >>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S >>> index cdb4912df..fe1519f4d 100644 >>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S >>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S >>> @@ -29,6 +29,8 @@ >>> /* Very similar to the generic code, but uses Thumb2 as implemented >>> in ARMv7-M. */ >>> +#include "pacbti.h" >>> + >>> /* Parameters and result. */ >>> #define src1 r0 >>> #define src2 r1 >>> @@ -44,8 +46,10 @@ >>> .thumb >>> .syntax unified >>> def_fn strcmp >>> + .fnstart >>> .cfi_sections .debug_frame >>> .cfi_startproc >>> + pacbti_prologue >>> eor tmp1, src1, src2 >>> tst tmp1, #3 >>> /* Strings not at same byte offset from a word boundary. */ >>> @@ -106,7 +110,7 @@ def_fn strcmp >>> lsrs result, result, #24 >>> subs result, result, data2 >>> #endif >>> - bx lr >>> + pacbti_epilogue >>> #if 0 >>> @@ -214,12 +218,18 @@ def_fn strcmp >>> cmpcs data1, data2 >>> beq .Lstrcmp_unaligned >>> sub result, data1, data2 >>> - bx lr >>> + pacbti_epilogue >>> 2: >>> stmfd sp!, {r5} >>> - .cfi_def_cfa_offset 4 >>> + .save {r5} >>> + .cfi_adjust_cfa_offset 4 >>> +#if __HAVE_PAC_LEAF >>> + .cfi_offset 5, -8 /* Account for ip register already on stack. */ >>> +#else >>> .cfi_offset 5, -4 >>> +#endif /* __HAVE_PAC_LEAF */ >>> + >>> ldr data1, [src1], #4 >>> and tmp2, src2, #3 >>> @@ -353,10 +363,17 @@ def_fn strcmp >>> .Lstrcmp_done_equal: >>> mov result, #0 >>> .cfi_remember_state >>> +#if __HAVE_PAC_LEAF >>> + pop {r5, ip} >>> + .cfi_restore 5 >>> + .cfi_restore 143 >>> + aut ip, lr, sp >>> +#else >>> ldmfd sp!, {r5} >>> .cfi_restore 5 >>> +#endif /* __HAVE_PAC_LEAF */ >> >> I think we could define a macro for this as well, something like >> pop_reg_and_pac r5 > > This sounds good, but a survey through relevant function epilogues > reveals a series of different `pop' scenarios. On the one hand, we have > pop {r5, ip}, as we have here but on another we also have, for example, > pop {r4,r5,r6,r7,ip}. > > Do we wish to provide a short-hand notation for one pop operation but > not the other via macros? My initial understanding is that assembler > macros don't provide us with the necessary flexibility to handle all > relevant cases. > > I think the use of a macro here would greatly enhance code readability > and by extension maintainability, but at first sight it feels inelegant > to provide a solution applicable only to a single case of a more general > problem. > > Any wisdom here would be appreciated. Yes, this is all quite tricky. Fortunately, the vast majority of time registers saved on the stack frame are consecutively numbered. So I think it's possible to write some macros to handle this. The example below is not complete (I'll leave you to flesh out all the different cases :), but I think it shows that we can achieve what's needed. @ Emit .cfi_offset directives for a consecutive sequence of registers .macro cfisavelist first, last, count .cfi_offset \last, -4*(\count) .if \last-\first cfisavelist \first, \last-1, \count+1 .endif .endm @ Create a prologue entry sequence handling PAC/BTI, if required and emitting @ CFI directives for .macro prologue first=-1, last=-1, savepac=1 .if \first != -1 .if \last != -1 .if \savepac push {r\first-r\last, ip} .cfi_offset 143, -4 cfisavelist \first, \last, 2 .else push {r\first-r\last} cfisavelist \first, \last, 1 .endif .else push {r\first} cfisavelist \first, \first, 1 .endif .else ... .endif .endm .text .global foo .type foo, %function foo: .cfi_startproc prologue 4, 7, savepac=0 .cfi_endproc .text .global bar .type bar, %function bar: .cfi_startproc prologue 4, 7 .cfi_endproc .text .global wom .type wom, %function wom: .cfi_startproc prologue 4, savepac=0 .cfi_endproc .text .global bat .type bat, %function bat: .cfi_startproc prologue savepac=0 .cfi_endproc R. > > V. > >> While you're doing that, please change the single-register ldmfd into 'pop'. >> >>> .cfi_def_cfa_offset 0 >>> - bx lr >>> + bx lr >> >> stray change to the BX instruction. >> >>> .Lstrcmp_tail: >>> .cfi_restore_state >>> @@ -370,9 +387,18 @@ def_fn strcmp >>> S2LOEQ data2, data2, #8 >>> beq .Lstrcmp_tail >>> sub result, r2, result >>> +#if __HAVE_PAC_LEAF >>> + pop {r5, ip} >>> + .cfi_restore 5 >>> + .cfi_restore 143 >>> + .cfi_def_cfa_offset 0 >>> + aut ip, lr, sp >>> +#else >>> ldmfd sp!, {r5} >>> .cfi_restore 5 >>> .cfi_def_cfa_offset 0 >>> - bx lr >>> +#endif /* __HAVE_PAC_LEAF */ >> >> Same as above >> >>> + bx lr >> >> tab between mnemonic and operands. >> >>> .cfi_endproc >>> + .fnend >>> .size strcmp, . - strcmp >> >> R.
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > On 05/08/2022 11:51, Victor L. Do Nascimento wrote: >> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: >> >>> On 03/08/2022 16:35, Victor Do Nascimento wrote: >>>> Add function prologue/epilogue to conditionally add BTI landing pads >>>> and/or PAC code generation & authentication instructions depending on >>>> compilation flags. >>>> This patch enables PACBTI for all relevant variants of strcmp: >>>> * Newlib for armv8.1-m.main+pacbti >>>> * Newlib for armv8.1-m.main+pacbti+mve >>>> * Newlib-nano >>>> --- >>>> newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +++- >>>> newlib/libc/machine/arm/strcmp-armv7.S | 45 +++++++++++++++++------ >>>> newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++++++++++--- >>>> 3 files changed, 71 insertions(+), 17 deletions(-) >>>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>> index 607a41daf..8085eb4df 100644 >>>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>> @@ -29,10 +29,14 @@ >>>> /* Tiny version of strcmp in ARM state. Used only when optimizing >>>> for size. Also supports Thumb-2. */ >>>> +#include "pacbti.h" >>>> + >>>> .syntax unified >>>> def_fn strcmp >>>> + .fnstart >>>> .cfi_sections .debug_frame >>>> .cfi_startproc >>>> + pacbti_prologue >>> >>> See comment on patch1 about the names of these macros. >>> >>>> 1: >>>> ldrb r2, [r0], #1 >>>> ldrb r3, [r1], #1 >>>> @@ -42,6 +46,7 @@ def_fn strcmp >>>> beq 1b >>>> 2: >>>> subs r0, r2, r3 >>>> - bx lr >>>> + pacbti_epilogue >>>> .cfi_endproc >>>> + .fnend >>>> .size strcmp, . - strcmp >>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S >>>> index 2f93bfb73..584c89f7e 100644 >>>> --- a/newlib/libc/machine/arm/strcmp-armv7.S >>>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S >>>> @@ -45,6 +45,8 @@ >>>> .thumb >>>> .syntax unified >>>> +#include "pacbti.h" >>>> + >>>> /* Parameters and result. */ >>>> #define src1 r0 >>>> #define src2 r1 >>>> @@ -91,8 +93,9 @@ >>>> ldrd r4, r5, [sp], #16 >>>> .cfi_restore 4 >>>> .cfi_restore 5 >>>> + .cfi_adjust_cfa_offset -16 >>> >>> This looks like it's an omission from the original file and thus >>> should be separated out as a separate patch. >>> >>>> sub result, result, r1, lsr #24 >>>> - bx lr >>>> + pacbti_epilogue >>>> #else >>>> /* To use the big-endian trick we'd have to reverse all three words. >>>> that's slower than this approach. */ >>>> @@ -112,22 +115,28 @@ >>>> ldrd r4, r5, [sp], #16 >>>> .cfi_restore 4 >>>> .cfi_restore 5 >>>> + .cfi_adjust_cfa_offset -16 >>>> sub result, result, r1 >>>> - bx lr >>>> + pacbti_epilogue >>>> #endif >>>> .endm >>>> + >>>> .text >>>> .p2align 5 >>>> + .fnstart >>>> + .cfi_sections .debug_frame >>>> + .cfi_startproc >>>> .Lstrcmp_start_addr: >>>> #ifndef STRCMP_NO_PRECHECK >>>> .Lfastpath_exit: >>>> sub r0, r2, r3 >>>> - bx lr >>>> + pacbti_epilogue >>>> nop >>>> #endif >>>> def_fn strcmp >>>> + pacbti_prologue >>>> #ifndef STRCMP_NO_PRECHECK >>>> ldrb r2, [src1] >>>> ldrb r3, [src2] >>>> @@ -136,16 +145,26 @@ def_fn strcmp >>>> cmpcs r2, r3 >>>> bne .Lfastpath_exit >>>> #endif >>>> - .cfi_sections .debug_frame >>>> - .cfi_startproc >>>> strd r4, r5, [sp, #-16]! >>>> - .cfi_def_cfa_offset 16 >>>> + .save {r4, r5} >>> >>> Hmm, I've just remembered that leaf functions cannot throw exceptions >>> (the EHABI only supports synchronous exceptions), so I don't think we >>> need .save directives at all. Instead, such functions should be >>> marked with ".cantunwind". You still want the dwarf unwind >>> information as that's for the debugger, it's just the '.save' >>> directives that aren't needed. >>> >>>> + .cfi_adjust_cfa_offset 16 >>>> +#if __HAVE_PAC_LEAF >>>> + .cfi_offset 4, -20 >>>> + .cfi_offset 5, -16 >>>> +#else >>>> .cfi_offset 4, -16 >>>> .cfi_offset 5, -12 >>>> +#endif /* __HAVE_PAC_LEAF */ >>> >>> Ugh! This is not the right way to do this. Instead (see comments on >>> patch 1) use a macro that defines the additional adjustment to use in >>> various contexts, something like >>> >>> .cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ) >>> >>>> orr tmp1, src1, src2 >>>> strd r6, r7, [sp, #8] >>>> + .save {r6, r7} >>>> +#if __HAVE_PAC_LEAF >>>> + .cfi_offset 6, -12 >>>> + .cfi_offset 7, -8 >>>> +#else >>>> .cfi_offset 6, -8 >>>> .cfi_offset 7, -4 >>>> +#endif /* __HAVE_PAC_LEAF */ >>>> mvn const_m1, #0 >>>> lsl r2, tmp1, #29 >>>> cbz r2, .Lloop_aligned8 >>>> @@ -270,7 +289,6 @@ def_fn strcmp >>>> ldr data1, [src1], #4 >>>> beq .Laligned_m2 >>>> bcs .Laligned_m1 >>>> - >>>> #ifdef STRCMP_NO_PRECHECK >>>> ldrb data2, [src2, #1] >>>> uxtb tmp1, data1, ror #BYTE1_OFFSET >>>> @@ -314,7 +332,8 @@ def_fn strcmp >>>> mov result, tmp1 >>>> ldr r4, [sp], #16 >>>> .cfi_restore 4 >>>> - bx lr >>>> + .cfi_adjust_cfa_offset -16 >>>> + pacbti_epilogue >>>> #ifndef STRCMP_NO_PRECHECK >>>> .Laligned_m1: >>>> @@ -364,8 +383,9 @@ def_fn strcmp >>>> /* R6/7 Not used in this sequence. */ >>>> .cfi_restore 6 >>>> .cfi_restore 7 >>>> + .cfi_adjust_cfa_offset -16 >>>> neg result, result >>>> - bx lr >>>> + pacbti_epilogue >>>> 6: >>>> .cfi_restore_state >>>> @@ -441,7 +461,8 @@ def_fn strcmp >>>> /* R6/7 not used in this sequence. */ >>>> .cfi_restore 6 >>>> .cfi_restore 7 >>>> - bx lr >>>> + .cfi_adjust_cfa_offset -16 >>>> + pacbti_epilogue >>>> .Lstrcmp_tail: >>>> .cfi_restore_state >>>> @@ -463,7 +484,9 @@ def_fn strcmp >>>> /* R6/7 not used in this sequence. */ >>>> .cfi_restore 6 >>>> .cfi_restore 7 >>>> + .cfi_adjust_cfa_offset -16 >>>> sub result, result, data2, lsr #24 >>>> - bx lr >>>> + pacbti_epilogue >>>> .cfi_endproc >>>> + .fnend >>>> .size strcmp, . - .Lstrcmp_start_addr >>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S >>>> index cdb4912df..fe1519f4d 100644 >>>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S >>>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S >>>> @@ -29,6 +29,8 @@ >>>> /* Very similar to the generic code, but uses Thumb2 as implemented >>>> in ARMv7-M. */ >>>> +#include "pacbti.h" >>>> + >>>> /* Parameters and result. */ >>>> #define src1 r0 >>>> #define src2 r1 >>>> @@ -44,8 +46,10 @@ >>>> .thumb >>>> .syntax unified >>>> def_fn strcmp >>>> + .fnstart >>>> .cfi_sections .debug_frame >>>> .cfi_startproc >>>> + pacbti_prologue >>>> eor tmp1, src1, src2 >>>> tst tmp1, #3 >>>> /* Strings not at same byte offset from a word boundary. */ >>>> @@ -106,7 +110,7 @@ def_fn strcmp >>>> lsrs result, result, #24 >>>> subs result, result, data2 >>>> #endif >>>> - bx lr >>>> + pacbti_epilogue >>>> #if 0 >>>> @@ -214,12 +218,18 @@ def_fn strcmp >>>> cmpcs data1, data2 >>>> beq .Lstrcmp_unaligned >>>> sub result, data1, data2 >>>> - bx lr >>>> + pacbti_epilogue >>>> 2: >>>> stmfd sp!, {r5} >>>> - .cfi_def_cfa_offset 4 >>>> + .save {r5} >>>> + .cfi_adjust_cfa_offset 4 >>>> +#if __HAVE_PAC_LEAF >>>> + .cfi_offset 5, -8 /* Account for ip register already on stack. */ >>>> +#else >>>> .cfi_offset 5, -4 >>>> +#endif /* __HAVE_PAC_LEAF */ >>>> + >>>> ldr data1, [src1], #4 >>>> and tmp2, src2, #3 >>>> @@ -353,10 +363,17 @@ def_fn strcmp >>>> .Lstrcmp_done_equal: >>>> mov result, #0 >>>> .cfi_remember_state >>>> +#if __HAVE_PAC_LEAF >>>> + pop {r5, ip} >>>> + .cfi_restore 5 >>>> + .cfi_restore 143 >>>> + aut ip, lr, sp >>>> +#else >>>> ldmfd sp!, {r5} >>>> .cfi_restore 5 >>>> +#endif /* __HAVE_PAC_LEAF */ >>> >>> I think we could define a macro for this as well, something like >>> pop_reg_and_pac r5 >> This sounds good, but a survey through relevant function epilogues >> reveals a series of different `pop' scenarios. On the one hand, we have >> pop {r5, ip}, as we have here but on another we also have, for example, >> pop {r4,r5,r6,r7,ip}. >> Do we wish to provide a short-hand notation for one pop operation >> but >> not the other via macros? My initial understanding is that assembler >> macros don't provide us with the necessary flexibility to handle all >> relevant cases. >> I think the use of a macro here would greatly enhance code >> readability >> and by extension maintainability, but at first sight it feels inelegant >> to provide a solution applicable only to a single case of a more general >> problem. >> Any wisdom here would be appreciated. > > Yes, this is all quite tricky. Fortunately, the vast majority of time > registers saved on the stack frame are consecutively numbered. So I > think it's possible to write some macros to handle this. The example > below is not complete (I'll leave you to flesh out all the different > cases :), but I think it shows that we can achieve what's needed. > > @ Emit .cfi_offset directives for a consecutive sequence of registers > .macro cfisavelist first, last, count > .cfi_offset \last, -4*(\count) > .if \last-\first > cfisavelist \first, \last-1, \count+1 > .endif > .endm > > @ Create a prologue entry sequence handling PAC/BTI, if required and > emitting > @ CFI directives for > .macro prologue first=-1, last=-1, savepac=1 > .if \first != -1 > .if \last != -1 > .if \savepac > push {r\first-r\last, ip} > .cfi_offset 143, -4 > cfisavelist \first, \last, 2 > .else > push {r\first-r\last} > cfisavelist \first, \last, 1 > .endif > .else > push {r\first} > cfisavelist \first, \first, 1 > .endif > .else > ... > .endif > .endm > > .text > .global foo > .type foo, %function > foo: > .cfi_startproc > prologue 4, 7, savepac=0 > .cfi_endproc > > .text > .global bar > .type bar, %function > bar: > .cfi_startproc > prologue 4, 7 > .cfi_endproc > > .text > .global wom > .type wom, %function > wom: > .cfi_startproc > prologue 4, savepac=0 > .cfi_endproc > > .text > .global bat > .type bat, %function > bat: > .cfi_startproc > prologue savepac=0 > .cfi_endproc > > R. Thanks for these, they'll come in extremely handy. Regarding the pop_reg_and_pac macro you suggested, I'd concocted a solution which at first sight seemed to make minimal assumptions about what we're asking it to pop and, looking at initial example disassemblies, seems to behave well. I'll share it with you here in case you come across anything warranting feedback. Furthermore, I will see whether it will be of any use in building upon your example. .macro cfi_restore_list first:req, rest:vararg .cfi_restore \first .ifnb \rest cfi_restore_list \rest .endif .endm .macro pop_reg_and_pac first:req, rest:vararg .ifnb \rest pop {\first, \rest, ip} .else pop {\first, ip} .endif cfi_restore_list \first, \rest .cfi_restore 143 aut ip, lr, sp .endm As always, thanks for your input. V. >> V. >> >>> While you're doing that, please change the single-register ldmfd into 'pop'. >>> >>>> .cfi_def_cfa_offset 0 >>>> - bx lr >>>> + bx lr >>> >>> stray change to the BX instruction. >>> >>>> .Lstrcmp_tail: >>>> .cfi_restore_state >>>> @@ -370,9 +387,18 @@ def_fn strcmp >>>> S2LOEQ data2, data2, #8 >>>> beq .Lstrcmp_tail >>>> sub result, r2, result >>>> +#if __HAVE_PAC_LEAF >>>> + pop {r5, ip} >>>> + .cfi_restore 5 >>>> + .cfi_restore 143 >>>> + .cfi_def_cfa_offset 0 >>>> + aut ip, lr, sp >>>> +#else >>>> ldmfd sp!, {r5} >>>> .cfi_restore 5 >>>> .cfi_def_cfa_offset 0 >>>> - bx lr >>>> +#endif /* __HAVE_PAC_LEAF */ >>> >>> Same as above >>> >>>> + bx lr >>> >>> tab between mnemonic and operands. >>> >>>> .cfi_endproc >>>> + .fnend >>>> .size strcmp, . - strcmp >>> >>> R.
On 05/08/2022 15:58, Victor L. Do Nascimento wrote: > Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > >> On 05/08/2022 11:51, Victor L. Do Nascimento wrote: >>> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: >>> >>>> On 03/08/2022 16:35, Victor Do Nascimento wrote: >>>>> Add function prologue/epilogue to conditionally add BTI landing pads >>>>> and/or PAC code generation & authentication instructions depending on >>>>> compilation flags. >>>>> This patch enables PACBTI for all relevant variants of strcmp: >>>>> * Newlib for armv8.1-m.main+pacbti >>>>> * Newlib for armv8.1-m.main+pacbti+mve >>>>> * Newlib-nano >>>>> --- >>>>> newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +++- >>>>> newlib/libc/machine/arm/strcmp-armv7.S | 45 +++++++++++++++++------ >>>>> newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++++++++++--- >>>>> 3 files changed, 71 insertions(+), 17 deletions(-) >>>>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>>> index 607a41daf..8085eb4df 100644 >>>>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>>>> @@ -29,10 +29,14 @@ >>>>> /* Tiny version of strcmp in ARM state. Used only when optimizing >>>>> for size. Also supports Thumb-2. */ >>>>> +#include "pacbti.h" >>>>> + >>>>> .syntax unified >>>>> def_fn strcmp >>>>> + .fnstart >>>>> .cfi_sections .debug_frame >>>>> .cfi_startproc >>>>> + pacbti_prologue >>>> >>>> See comment on patch1 about the names of these macros. >>>> >>>>> 1: >>>>> ldrb r2, [r0], #1 >>>>> ldrb r3, [r1], #1 >>>>> @@ -42,6 +46,7 @@ def_fn strcmp >>>>> beq 1b >>>>> 2: >>>>> subs r0, r2, r3 >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> .cfi_endproc >>>>> + .fnend >>>>> .size strcmp, . - strcmp >>>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S >>>>> index 2f93bfb73..584c89f7e 100644 >>>>> --- a/newlib/libc/machine/arm/strcmp-armv7.S >>>>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S >>>>> @@ -45,6 +45,8 @@ >>>>> .thumb >>>>> .syntax unified >>>>> +#include "pacbti.h" >>>>> + >>>>> /* Parameters and result. */ >>>>> #define src1 r0 >>>>> #define src2 r1 >>>>> @@ -91,8 +93,9 @@ >>>>> ldrd r4, r5, [sp], #16 >>>>> .cfi_restore 4 >>>>> .cfi_restore 5 >>>>> + .cfi_adjust_cfa_offset -16 >>>> >>>> This looks like it's an omission from the original file and thus >>>> should be separated out as a separate patch. >>>> >>>>> sub result, result, r1, lsr #24 >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> #else >>>>> /* To use the big-endian trick we'd have to reverse all three words. >>>>> that's slower than this approach. */ >>>>> @@ -112,22 +115,28 @@ >>>>> ldrd r4, r5, [sp], #16 >>>>> .cfi_restore 4 >>>>> .cfi_restore 5 >>>>> + .cfi_adjust_cfa_offset -16 >>>>> sub result, result, r1 >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> #endif >>>>> .endm >>>>> + >>>>> .text >>>>> .p2align 5 >>>>> + .fnstart >>>>> + .cfi_sections .debug_frame >>>>> + .cfi_startproc >>>>> .Lstrcmp_start_addr: >>>>> #ifndef STRCMP_NO_PRECHECK >>>>> .Lfastpath_exit: >>>>> sub r0, r2, r3 >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> nop >>>>> #endif >>>>> def_fn strcmp >>>>> + pacbti_prologue >>>>> #ifndef STRCMP_NO_PRECHECK >>>>> ldrb r2, [src1] >>>>> ldrb r3, [src2] >>>>> @@ -136,16 +145,26 @@ def_fn strcmp >>>>> cmpcs r2, r3 >>>>> bne .Lfastpath_exit >>>>> #endif >>>>> - .cfi_sections .debug_frame >>>>> - .cfi_startproc >>>>> strd r4, r5, [sp, #-16]! >>>>> - .cfi_def_cfa_offset 16 >>>>> + .save {r4, r5} >>>> >>>> Hmm, I've just remembered that leaf functions cannot throw exceptions >>>> (the EHABI only supports synchronous exceptions), so I don't think we >>>> need .save directives at all. Instead, such functions should be >>>> marked with ".cantunwind". You still want the dwarf unwind >>>> information as that's for the debugger, it's just the '.save' >>>> directives that aren't needed. >>>> >>>>> + .cfi_adjust_cfa_offset 16 >>>>> +#if __HAVE_PAC_LEAF >>>>> + .cfi_offset 4, -20 >>>>> + .cfi_offset 5, -16 >>>>> +#else >>>>> .cfi_offset 4, -16 >>>>> .cfi_offset 5, -12 >>>>> +#endif /* __HAVE_PAC_LEAF */ >>>> >>>> Ugh! This is not the right way to do this. Instead (see comments on >>>> patch 1) use a macro that defines the additional adjustment to use in >>>> various contexts, something like >>>> >>>> .cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ) >>>> >>>>> orr tmp1, src1, src2 >>>>> strd r6, r7, [sp, #8] >>>>> + .save {r6, r7} >>>>> +#if __HAVE_PAC_LEAF >>>>> + .cfi_offset 6, -12 >>>>> + .cfi_offset 7, -8 >>>>> +#else >>>>> .cfi_offset 6, -8 >>>>> .cfi_offset 7, -4 >>>>> +#endif /* __HAVE_PAC_LEAF */ >>>>> mvn const_m1, #0 >>>>> lsl r2, tmp1, #29 >>>>> cbz r2, .Lloop_aligned8 >>>>> @@ -270,7 +289,6 @@ def_fn strcmp >>>>> ldr data1, [src1], #4 >>>>> beq .Laligned_m2 >>>>> bcs .Laligned_m1 >>>>> - >>>>> #ifdef STRCMP_NO_PRECHECK >>>>> ldrb data2, [src2, #1] >>>>> uxtb tmp1, data1, ror #BYTE1_OFFSET >>>>> @@ -314,7 +332,8 @@ def_fn strcmp >>>>> mov result, tmp1 >>>>> ldr r4, [sp], #16 >>>>> .cfi_restore 4 >>>>> - bx lr >>>>> + .cfi_adjust_cfa_offset -16 >>>>> + pacbti_epilogue >>>>> #ifndef STRCMP_NO_PRECHECK >>>>> .Laligned_m1: >>>>> @@ -364,8 +383,9 @@ def_fn strcmp >>>>> /* R6/7 Not used in this sequence. */ >>>>> .cfi_restore 6 >>>>> .cfi_restore 7 >>>>> + .cfi_adjust_cfa_offset -16 >>>>> neg result, result >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> 6: >>>>> .cfi_restore_state >>>>> @@ -441,7 +461,8 @@ def_fn strcmp >>>>> /* R6/7 not used in this sequence. */ >>>>> .cfi_restore 6 >>>>> .cfi_restore 7 >>>>> - bx lr >>>>> + .cfi_adjust_cfa_offset -16 >>>>> + pacbti_epilogue >>>>> .Lstrcmp_tail: >>>>> .cfi_restore_state >>>>> @@ -463,7 +484,9 @@ def_fn strcmp >>>>> /* R6/7 not used in this sequence. */ >>>>> .cfi_restore 6 >>>>> .cfi_restore 7 >>>>> + .cfi_adjust_cfa_offset -16 >>>>> sub result, result, data2, lsr #24 >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> .cfi_endproc >>>>> + .fnend >>>>> .size strcmp, . - .Lstrcmp_start_addr >>>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S >>>>> index cdb4912df..fe1519f4d 100644 >>>>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S >>>>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S >>>>> @@ -29,6 +29,8 @@ >>>>> /* Very similar to the generic code, but uses Thumb2 as implemented >>>>> in ARMv7-M. */ >>>>> +#include "pacbti.h" >>>>> + >>>>> /* Parameters and result. */ >>>>> #define src1 r0 >>>>> #define src2 r1 >>>>> @@ -44,8 +46,10 @@ >>>>> .thumb >>>>> .syntax unified >>>>> def_fn strcmp >>>>> + .fnstart >>>>> .cfi_sections .debug_frame >>>>> .cfi_startproc >>>>> + pacbti_prologue >>>>> eor tmp1, src1, src2 >>>>> tst tmp1, #3 >>>>> /* Strings not at same byte offset from a word boundary. */ >>>>> @@ -106,7 +110,7 @@ def_fn strcmp >>>>> lsrs result, result, #24 >>>>> subs result, result, data2 >>>>> #endif >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> #if 0 >>>>> @@ -214,12 +218,18 @@ def_fn strcmp >>>>> cmpcs data1, data2 >>>>> beq .Lstrcmp_unaligned >>>>> sub result, data1, data2 >>>>> - bx lr >>>>> + pacbti_epilogue >>>>> 2: >>>>> stmfd sp!, {r5} >>>>> - .cfi_def_cfa_offset 4 >>>>> + .save {r5} >>>>> + .cfi_adjust_cfa_offset 4 >>>>> +#if __HAVE_PAC_LEAF >>>>> + .cfi_offset 5, -8 /* Account for ip register already on stack. */ >>>>> +#else >>>>> .cfi_offset 5, -4 >>>>> +#endif /* __HAVE_PAC_LEAF */ >>>>> + >>>>> ldr data1, [src1], #4 >>>>> and tmp2, src2, #3 >>>>> @@ -353,10 +363,17 @@ def_fn strcmp >>>>> .Lstrcmp_done_equal: >>>>> mov result, #0 >>>>> .cfi_remember_state >>>>> +#if __HAVE_PAC_LEAF >>>>> + pop {r5, ip} >>>>> + .cfi_restore 5 >>>>> + .cfi_restore 143 >>>>> + aut ip, lr, sp >>>>> +#else >>>>> ldmfd sp!, {r5} >>>>> .cfi_restore 5 >>>>> +#endif /* __HAVE_PAC_LEAF */ >>>> >>>> I think we could define a macro for this as well, something like >>>> pop_reg_and_pac r5 >>> This sounds good, but a survey through relevant function epilogues >>> reveals a series of different `pop' scenarios. On the one hand, we have >>> pop {r5, ip}, as we have here but on another we also have, for example, >>> pop {r4,r5,r6,r7,ip}. >>> Do we wish to provide a short-hand notation for one pop operation >>> but >>> not the other via macros? My initial understanding is that assembler >>> macros don't provide us with the necessary flexibility to handle all >>> relevant cases. >>> I think the use of a macro here would greatly enhance code >>> readability >>> and by extension maintainability, but at first sight it feels inelegant >>> to provide a solution applicable only to a single case of a more general >>> problem. >>> Any wisdom here would be appreciated. >> >> Yes, this is all quite tricky. Fortunately, the vast majority of time >> registers saved on the stack frame are consecutively numbered. So I >> think it's possible to write some macros to handle this. The example >> below is not complete (I'll leave you to flesh out all the different >> cases :), but I think it shows that we can achieve what's needed. >> >> @ Emit .cfi_offset directives for a consecutive sequence of registers >> .macro cfisavelist first, last, count >> .cfi_offset \last, -4*(\count) >> .if \last-\first >> cfisavelist \first, \last-1, \count+1 >> .endif >> .endm >> >> @ Create a prologue entry sequence handling PAC/BTI, if required and >> emitting >> @ CFI directives for >> .macro prologue first=-1, last=-1, savepac=1 >> .if \first != -1 >> .if \last != -1 >> .if \savepac >> push {r\first-r\last, ip} >> .cfi_offset 143, -4 >> cfisavelist \first, \last, 2 >> .else >> push {r\first-r\last} >> cfisavelist \first, \last, 1 >> .endif >> .else >> push {r\first} >> cfisavelist \first, \first, 1 >> .endif >> .else >> ... >> .endif >> .endm >> >> .text >> .global foo >> .type foo, %function >> foo: >> .cfi_startproc >> prologue 4, 7, savepac=0 >> .cfi_endproc >> >> .text >> .global bar >> .type bar, %function >> bar: >> .cfi_startproc >> prologue 4, 7 >> .cfi_endproc >> >> .text >> .global wom >> .type wom, %function >> wom: >> .cfi_startproc >> prologue 4, savepac=0 >> .cfi_endproc >> >> .text >> .global bat >> .type bat, %function >> bat: >> .cfi_startproc >> prologue savepac=0 >> .cfi_endproc >> >> R. > > Thanks for these, they'll come in extremely handy. Regarding the > pop_reg_and_pac macro you suggested, I'd concocted a solution which at > first sight seemed to make minimal assumptions about what we're asking > it to pop and, looking at initial example disassemblies, seems to > behave well. > > I'll share it with you here in case you come across anything warranting > feedback. > > Furthermore, I will see whether it will be of any use in building upon > your example. > > .macro cfi_restore_list first:req, rest:vararg > .cfi_restore \first > .ifnb \rest > cfi_restore_list \rest > .endif > .endm > > .macro pop_reg_and_pac first:req, rest:vararg > .ifnb \rest > pop {\first, \rest, ip} > .else > pop {\first, ip} > .endif > cfi_restore_list \first, \rest > .cfi_restore 143 > aut ip, lr, sp > .endm > Yes, I tried something similar to start with, but couldn't make it work for the depth-first recursion needed for the cfi save directives (because you need to count the list of registers to save in order to work out the offsets of each one). The second problem is that, AFAICT, macros can't be used as values to parameters (they simply produce text), so you can't map names onto values (and I was trying to avoid register names in .cfi statements). R. > As always, thanks for your input. > V. > >>> V. >>> >>>> While you're doing that, please change the single-register ldmfd into 'pop'. >>>> >>>>> .cfi_def_cfa_offset 0 >>>>> - bx lr >>>>> + bx lr >>>> >>>> stray change to the BX instruction. >>>> >>>>> .Lstrcmp_tail: >>>>> .cfi_restore_state >>>>> @@ -370,9 +387,18 @@ def_fn strcmp >>>>> S2LOEQ data2, data2, #8 >>>>> beq .Lstrcmp_tail >>>>> sub result, r2, result >>>>> +#if __HAVE_PAC_LEAF >>>>> + pop {r5, ip} >>>>> + .cfi_restore 5 >>>>> + .cfi_restore 143 >>>>> + .cfi_def_cfa_offset 0 >>>>> + aut ip, lr, sp >>>>> +#else >>>>> ldmfd sp!, {r5} >>>>> .cfi_restore 5 >>>>> .cfi_def_cfa_offset 0 >>>>> - bx lr >>>>> +#endif /* __HAVE_PAC_LEAF */ >>>> >>>> Same as above >>>> >>>>> + bx lr >>>> >>>> tab between mnemonic and operands. >>>> >>>>> .cfi_endproc >>>>> + .fnend >>>>> .size strcmp, . - strcmp >>>> >>>> R.
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > On 03/08/2022 16:35, Victor Do Nascimento wrote: >> Create an assembly header file that conditionally defines fuction >> prologues/epilogues depending on the compile-time mbranch-protection >> argument values. >> * newlib/libc/machine/arm/pacbti.h: New. >> --- >> newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++ >> 1 file changed, 64 insertions(+) >> create mode 100644 newlib/libc/machine/arm/pacbti.h >> diff --git a/newlib/libc/machine/arm/pacbti.h >> b/newlib/libc/machine/arm/pacbti.h >> new file mode 100644 >> index 000000000..4c31d00bc >> --- /dev/null >> +++ b/newlib/libc/machine/arm/pacbti.h >> @@ -0,0 +1,64 @@ >> +/* >> + * Copyright (c) 2022 Arm Ltd >> + * All rights reserved. >> + * >> + * Redistribution and use in source and binary forms, with or without >> + * modification, are permitted provided that the following conditions >> + * are met: >> + * 1. Redistributions of source code must retain the above copyright >> + * notice, this list of conditions and the following disclaimer. >> + * 2. Redistributions in binary form must reproduce the above copyright >> + * notice, this list of conditions and the following disclaimer in the >> + * documentation and/or other materials provided with the distribution. >> + * 3. The name of the company may not be used to endorse or promote >> + * products derived from this software without specific prior written >> + * permission. >> + * >> + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED >> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF >> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. >> + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED >> + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR >> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF >> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING >> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS >> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. >> + */ >> + >> +/* Checki whether leaf function PAC signing has been requested >> + in the -mbranch-protect compile-time option. */ >> +#define LEAF_PROTECT_BIT 2 >> +#define __HAVE_PAC_LEAF \ >> + __ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT) > > Either this header needs to avoid polluting the user namespace, or it > doesn't. If it does, then LEAF_PROTECT_BIT fails to do that. If it > doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid > polluting the reserved namespace. I suspect this header is private to > newlib (won't be exported to users), so should not be prefixing names > with __. > >> + >> +/* Macro to handle function entry depending on branch-protection >> + schemes. */ >> + .macro pacbti_prologue >> +#if __HAVE_PAC_LEAF >> +#if __ARM_FEATURE_BTI_DEFAULT >> + pacbti ip, lr, sp >> +#else >> + pac ip, lr, sp >> +#endif /* __ARM_FEATURE_BTI_DEFAULT */ >> + .cfi_register 143, 12 >> + str ip, [sp, #-4]! > > This causes the stack to lose 8-byte alignment. Whilst for leaf > functions that's probably not a problem, the macro should have an > option where the user can ask for alignment to be preserved. But > there should also be an option to not save IP on the stack at all (for > when the user does not modify IP in the function body). Work is ongoing on implementing the macro with all necessary features. A better macro implementation for the prologue is given as follows (but still lacks any alignment preservation logic): /* Emit .cfi_offset directives for a consecutive sequence of registers. PAC_CFI_ADJ will be 4 bytes if IP reg is pushed. */ .macro cfisavelist first, last, index=1 .cfi_offset \last, -4*(\index) - PAC_CFI_ADJ .if \last-\first cfisavelist \first, \last-1, \index+1 .endif .endm /* Create a prologue entry sequence handling PAC/BTI, if required and emitting CFI directives for generated PAC code and any pushed registers. */ #define NREG ((\last-\first)+1) .macro prologue first=-1, last=-1, savepac=PAC_LEAF_PUSH_IP #if HAVE_PAC_LEAF #if __ARM_FEATURE_BTI_DEFAULT pacbti ip, lr, sp #else pac ip, lr, sp #endif /* __ARM_FEATURE_BTI_DEFAULT */ .cfi_register 143, 12 #else #if __ARM_FEATURE_BTI_DEFAULT bti #endif /* __ARM_FEATURE_BTI_DEFAULT */ #endif /* HAVE_PAC_LEAF */ .if \first != -1 .if \last != -1 .if \savepac push {r\first-r\last, ip} .cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ .cfi_offset 143, -PAC_CFI_ADJ cfisavelist \first, \last .else push {r\first-r\last} .cfi_adjust_cfa_offset NREG*4 cfisavelist \first, \last .endif .else .if \savepac push {r\first, ip} .cfi_adjust_cfa_offset 4 + PAC_CFI_ADJ .cfi_offset 143, -PAC_CFI_ADJ cfisavelist \first, \first .else // !\savepac push {r\first} cfisavelist \first, \first .endif .endif .else // \first == -1 .if \savepac push {ip} .cfi_adjust_cfa_offset PAC_CFI_ADJ .cfi_offset 143, -PAC_CFI_ADJ .endif .endif .endm The question remains of how I can ammend the register list in the push directive when we wish to push an extra dummy register to the stack for alignment purposes. A hypothetical solution (albeit a broken one, as we can't redefine macro arguments in the body of the macro) for when pushing an even no. of registers + the pac code is: .if \first != -1 .if \last != -1 .if \savepac + .if \align_requested && ( ( (NREG+1) % 2 ) != 0 && (\first != 0) ) + \first = \first - 1 + .endif push {r\first-r\last, ip} .cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ + .if \align_requested && ( ((NREG+1) % 2 ) != 0 && (\first != 0) ) + /* Having pushed the dummy, restore original register range. */ + \first = \first + 1 + .endif .cfi_offset 143, -PAC_CFI_ADJ cfisavelist \first, \last .else ... The logic above seems correct to me, but I need to figure out a way of adjusting the value of \first in the least invasive way. If we emit the push instruction via a separate macro, we could then pass the \first argument as \first-1 (e.g. pushmacro \first=\first-1, last=last, savepac=1). Even so, creating a macro just to emit the push instruction seems overkill. We may end up having way too many macros. Do you have any insights on how I may be able to fix the above code to allow for alignment preservation elegantly within the limitations of what assembler macros can do? Cheers, V. >> + .save {ra_auth_code} >> + .cfi_def_cfa_offset 4 >> + .cfi_offset 143, -4 >> +#elif __ARM_FEATURE_BTI_DEFAULT >> + bti >> +#endif /* __HAVE_PAC_LEAF */ >> + .endm >> + >> +/* Macro to handle different branch exchange cases depending on >> + branch-protection schemes. */ >> + .macro pacbti_epilogue >> +#if __HAVE_PAC_LEAF >> + ldr ip, [sp], #4 >> + .cfi_restore 143 >> + .cfi_def_cfa_offset 0 >> + aut ip, lr, sp >> +#endif /* __HAVE_PAC_LEAF */ >> + bx lr >> + .endm > > I think these macros are really misnamed, firstly, they're only for > leaf functions and secondly, they (particularly the epilogue) does > something even if PAC is not needed. So I think they should be > renamed as 'leaf_prologue' and 'leaf_epilogue' respectively. > > In consequence, I think this file should really be merged into the > existing arm_asm.h, then we don't need yet another header file. > > Finally, the header needs to define a (C) macro that defines how much > to adjust the CFI offset by for various scenarios: > - When PAC is not used > - When PAC is used with no alignment > - When PAC is used and asked for 8-byte alignment to be preserved. > > R.
"Victor L. Do Nascimento" <victor.donascimento@arm.com> writes: > Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > >> On 03/08/2022 16:35, Victor Do Nascimento wrote: >>> Create an assembly header file that conditionally defines fuction >>> prologues/epilogues depending on the compile-time mbranch-protection >>> argument values. >>> * newlib/libc/machine/arm/pacbti.h: New. >>> --- >>> newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++ >>> 1 file changed, 64 insertions(+) >>> create mode 100644 newlib/libc/machine/arm/pacbti.h >>> diff --git a/newlib/libc/machine/arm/pacbti.h >>> b/newlib/libc/machine/arm/pacbti.h >>> new file mode 100644 >>> index 000000000..4c31d00bc >>> --- /dev/null >>> +++ b/newlib/libc/machine/arm/pacbti.h >>> @@ -0,0 +1,64 @@ >>> +/* >>> + * Copyright (c) 2022 Arm Ltd >>> + * All rights reserved. >>> + * >>> + * Redistribution and use in source and binary forms, with or without >>> + * modification, are permitted provided that the following conditions >>> + * are met: >>> + * 1. Redistributions of source code must retain the above copyright >>> + * notice, this list of conditions and the following disclaimer. >>> + * 2. Redistributions in binary form must reproduce the above copyright >>> + * notice, this list of conditions and the following disclaimer in the >>> + * documentation and/or other materials provided with the distribution. >>> + * 3. The name of the company may not be used to endorse or promote >>> + * products derived from this software without specific prior written >>> + * permission. >>> + * >>> + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED >>> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF >>> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. >>> + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED >>> + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR >>> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF >>> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING >>> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS >>> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. >>> + */ >>> + >>> +/* Checki whether leaf function PAC signing has been requested >>> + in the -mbranch-protect compile-time option. */ >>> +#define LEAF_PROTECT_BIT 2 >>> +#define __HAVE_PAC_LEAF \ >>> + __ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT) >> >> Either this header needs to avoid polluting the user namespace, or it >> doesn't. If it does, then LEAF_PROTECT_BIT fails to do that. If it >> doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid >> polluting the reserved namespace. I suspect this header is private to >> newlib (won't be exported to users), so should not be prefixing names >> with __. >> >>> + >>> +/* Macro to handle function entry depending on branch-protection >>> + schemes. */ >>> + .macro pacbti_prologue >>> +#if __HAVE_PAC_LEAF >>> +#if __ARM_FEATURE_BTI_DEFAULT >>> + pacbti ip, lr, sp >>> +#else >>> + pac ip, lr, sp >>> +#endif /* __ARM_FEATURE_BTI_DEFAULT */ >>> + .cfi_register 143, 12 >>> + str ip, [sp, #-4]! >> >> This causes the stack to lose 8-byte alignment. Whilst for leaf >> functions that's probably not a problem, the macro should have an >> option where the user can ask for alignment to be preserved. But >> there should also be an option to not save IP on the stack at all (for >> when the user does not modify IP in the function body). > > Work is ongoing on implementing the macro with all necessary features. > > A better macro implementation for the prologue is given as follows (but > still lacks any alignment preservation logic): > > /* Emit .cfi_offset directives for a consecutive sequence of registers. > PAC_CFI_ADJ will be 4 bytes if IP reg is pushed. */ > .macro cfisavelist first, last, index=1 > .cfi_offset \last, -4*(\index) - PAC_CFI_ADJ > .if \last-\first > cfisavelist \first, \last-1, \index+1 > .endif > .endm > > /* Create a prologue entry sequence handling PAC/BTI, if required and emitting > CFI directives for generated PAC code and any pushed registers. */ > > #define NREG ((\last-\first)+1) > > .macro prologue first=-1, last=-1, savepac=PAC_LEAF_PUSH_IP > #if HAVE_PAC_LEAF > #if __ARM_FEATURE_BTI_DEFAULT > pacbti ip, lr, sp > #else > pac ip, lr, sp > #endif /* __ARM_FEATURE_BTI_DEFAULT */ > .cfi_register 143, 12 > #else > #if __ARM_FEATURE_BTI_DEFAULT > bti > #endif /* __ARM_FEATURE_BTI_DEFAULT */ > #endif /* HAVE_PAC_LEAF */ > .if \first != -1 > .if \last != -1 > .if \savepac > push {r\first-r\last, ip} > .cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ > .cfi_offset 143, -PAC_CFI_ADJ > cfisavelist \first, \last > .else > push {r\first-r\last} > .cfi_adjust_cfa_offset NREG*4 > cfisavelist \first, \last > .endif > .else > .if \savepac > push {r\first, ip} > .cfi_adjust_cfa_offset 4 + PAC_CFI_ADJ > .cfi_offset 143, -PAC_CFI_ADJ > cfisavelist \first, \first > .else // !\savepac > push {r\first} > cfisavelist \first, \first > .endif > .endif > .else // \first == -1 > .if \savepac > push {ip} > .cfi_adjust_cfa_offset PAC_CFI_ADJ > .cfi_offset 143, -PAC_CFI_ADJ > .endif > .endif > .endm > > The question remains of how I can ammend the register list in the push > directive when we wish to push an extra dummy register to the stack for > alignment purposes. > > A hypothetical solution (albeit a broken one, as we can't redefine macro > arguments in the body of the macro) for when pushing an even no. of > registers + the pac code is: > > .if \first != -1 > .if \last != -1 > .if \savepac > > + .if \align_requested && ( ( (NREG+1) % 2 ) != 0 && (\first != 0) ) > + \first = \first - 1 > + .endif > > push {r\first-r\last, ip} > .cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ > > + .if \align_requested && ( ((NREG+1) % 2 ) != 0 && (\first != 0) ) > + /* Having pushed the dummy, restore original register range. */ > + \first = \first + 1 > + .endif > > .cfi_offset 143, -PAC_CFI_ADJ > cfisavelist \first, \last > > .else > ... > > The logic above seems correct to me, but I need to figure out a way of > adjusting the value of \first in the least invasive way. > > If we emit the push instruction via a separate macro, we could then pass > the \first argument as \first-1 (e.g. pushmacro \first=\first-1, > last=last, savepac=1). Even so, creating a macro just to emit the push > instruction seems overkill. We may end up having way too many macros. > > Do you have any insights on how I may be able to fix the above code to > allow for alignment preservation elegantly within the limitations of > what assembler macros can do? > > Cheers, > V. > I've also been thinking about simplifying the align logic, such that if we request alignment preservation, it always pushes a dummy register along w/ the pac code. This way, we don't preserve alignment, but rather alignment "parity. If in the absence of pushing the pac-code a simple push operation would result in a misaligned stack pointer, I believe that adding the pac code to the register list should also result in the preservation of misalignment. I think that if in a hypothetical situation code gets written such that the original prologue results in a misaligned stack, it's correct to assume that the developer is aware of this and will manually compensate for it later on in the code if alignment is needed. The risk I see is that having too much hidden logic in our prologue macro might result in unforseen bugs for coders with insufficient familiarity with it. The alignment preservation feature should behave as follows: "With this turned on, no perceptable difference is observed in program behaviour irrespective of the presence/absence of the pac code on the stack." >>> + .save {ra_auth_code} >>> + .cfi_def_cfa_offset 4 >>> + .cfi_offset 143, -4 >>> +#elif __ARM_FEATURE_BTI_DEFAULT >>> + bti >>> +#endif /* __HAVE_PAC_LEAF */ >>> + .endm >>> + >>> +/* Macro to handle different branch exchange cases depending on >>> + branch-protection schemes. */ >>> + .macro pacbti_epilogue >>> +#if __HAVE_PAC_LEAF >>> + ldr ip, [sp], #4 >>> + .cfi_restore 143 >>> + .cfi_def_cfa_offset 0 >>> + aut ip, lr, sp >>> +#endif /* __HAVE_PAC_LEAF */ >>> + bx lr >>> + .endm >> >> I think these macros are really misnamed, firstly, they're only for >> leaf functions and secondly, they (particularly the epilogue) does >> something even if PAC is not needed. So I think they should be >> renamed as 'leaf_prologue' and 'leaf_epilogue' respectively. >> >> In consequence, I think this file should really be merged into the >> existing arm_asm.h, then we don't need yet another header file. >> >> Finally, the header needs to define a (C) macro that defines how much >> to adjust the CFI offset by for various scenarios: >> - When PAC is not used >> - When PAC is used with no alignment >> - When PAC is used and asked for 8-byte alignment to be preserved. This creates anoter level of complexity. As you mention that users should have the option of whether or not to push the pac code onto the stack when it won't be overwritten over the course of a leaf fuction, the logic for this C macro will take this into account. This leads us to create something as follows: #if HAVE_PAC_LEAF /* Pac code requested for leaf functions. */ # if PAC_LEAF_PUSH_IP /* Always push pac code to the stack. */ # if ALIGN_PRESERVE # define PAC_CFI_ADJ 8 # else # define PAC_CFI_ADJ 4 # endif # else # define PAC_CFI_ADJ 0 # endif /* PAC_LEAF_PUSH_IP*/ #else # define PAC_CFI_ADJ 0 #endif /* HAVE_PAC_LEAF */ Thus, if PAC_LEAF_PUSH_IP is not requested, PAC_CFI_ADJ is set to 0. This creates a problem for functions where pushing the pac code onto the stack is mandatory, as is the case for functions where the register storing it will be reused in the body of the function. In That case, we need a second macro which is independent of the value of PAC_LEAF_PUSH_IP. This leads to our second macro to be used for such functions: #if HAVE_PAC_LEAF # if ALIGN_PRESERVE # define PAC_CFI_ADJ_DEFAULT 8 # else # define PAC_CFI_ADJ_DEFAULT 4 # endif #endif and developers would have to choose correctly whether to write something like .cfi_offset <reg>, -(<base_offset>+PAC_CFI_ADJ) or .cfi_offset <reg>, -(<base_offset>+PAC_CFI_ADJ_DEFAULT) Does this sound remotely acceptable? We could use an assembler macro rather than C macro and calculate the offset based on some passed argument, but either way we find ourselves catering for a growing number of use cases and with it the complexity of any solution grows. I just thought I'd ask what your thoughts were on these matters. V. >> >> R.