From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 61480385829F for ; Fri, 5 Aug 2022 14:34:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 61480385829F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=foss.arm.com Authentication-Results: sourceware.org; spf=none smtp.mailfrom=foss.arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D5D33106F; Fri, 5 Aug 2022 07:34:41 -0700 (PDT) Received: from [10.2.78.27] (unknown [10.2.78.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 964F13F73B; Fri, 5 Aug 2022 07:34:40 -0700 (PDT) Message-ID: Date: Fri, 5 Aug 2022 15:34:39 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement Content-Language: en-GB To: "Victor L. Do Nascimento" Cc: newlib@sourceware.org, richard.earnshaw@arm.com References: <20220803153524.20631-1-victor.donascimento@arm.com> <20220803153524.20631-3-victor.donascimento@arm.com> From: Richard Earnshaw In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3496.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, NICE_REPLY_A, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: newlib@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Newlib mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Aug 2022 14:34:43 -0000 On 05/08/2022 11:51, Victor L. Do Nascimento wrote: > Richard Earnshaw writes: > >> On 03/08/2022 16:35, Victor Do Nascimento wrote: >>> Add function prologue/epilogue to conditionally add BTI landing pads >>> and/or PAC code generation & authentication instructions depending on >>> compilation flags. >>> This patch enables PACBTI for all relevant variants of strcmp: >>> * Newlib for armv8.1-m.main+pacbti >>> * Newlib for armv8.1-m.main+pacbti+mve >>> * Newlib-nano >>> --- >>> newlib/libc/machine/arm/strcmp-arm-tiny.S | 7 +++- >>> newlib/libc/machine/arm/strcmp-armv7.S | 45 +++++++++++++++++------ >>> newlib/libc/machine/arm/strcmp-armv7m.S | 36 +++++++++++++++--- >>> 3 files changed, 71 insertions(+), 17 deletions(-) >>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> index 607a41daf..8085eb4df 100644 >>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S >>> @@ -29,10 +29,14 @@ >>> /* Tiny version of strcmp in ARM state. Used only when optimizing >>> for size. Also supports Thumb-2. */ >>> +#include "pacbti.h" >>> + >>> .syntax unified >>> def_fn strcmp >>> + .fnstart >>> .cfi_sections .debug_frame >>> .cfi_startproc >>> + pacbti_prologue >> >> See comment on patch1 about the names of these macros. >> >>> 1: >>> ldrb r2, [r0], #1 >>> ldrb r3, [r1], #1 >>> @@ -42,6 +46,7 @@ def_fn strcmp >>> beq 1b >>> 2: >>> subs r0, r2, r3 >>> - bx lr >>> + pacbti_epilogue >>> .cfi_endproc >>> + .fnend >>> .size strcmp, . - strcmp >>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S >>> index 2f93bfb73..584c89f7e 100644 >>> --- a/newlib/libc/machine/arm/strcmp-armv7.S >>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S >>> @@ -45,6 +45,8 @@ >>> .thumb >>> .syntax unified >>> +#include "pacbti.h" >>> + >>> /* Parameters and result. */ >>> #define src1 r0 >>> #define src2 r1 >>> @@ -91,8 +93,9 @@ >>> ldrd r4, r5, [sp], #16 >>> .cfi_restore 4 >>> .cfi_restore 5 >>> + .cfi_adjust_cfa_offset -16 >> >> This looks like it's an omission from the original file and thus >> should be separated out as a separate patch. >> >>> sub result, result, r1, lsr #24 >>> - bx lr >>> + pacbti_epilogue >>> #else >>> /* To use the big-endian trick we'd have to reverse all three words. >>> that's slower than this approach. */ >>> @@ -112,22 +115,28 @@ >>> ldrd r4, r5, [sp], #16 >>> .cfi_restore 4 >>> .cfi_restore 5 >>> + .cfi_adjust_cfa_offset -16 >>> sub result, result, r1 >>> - bx lr >>> + pacbti_epilogue >>> #endif >>> .endm >>> + >>> .text >>> .p2align 5 >>> + .fnstart >>> + .cfi_sections .debug_frame >>> + .cfi_startproc >>> .Lstrcmp_start_addr: >>> #ifndef STRCMP_NO_PRECHECK >>> .Lfastpath_exit: >>> sub r0, r2, r3 >>> - bx lr >>> + pacbti_epilogue >>> nop >>> #endif >>> def_fn strcmp >>> + pacbti_prologue >>> #ifndef STRCMP_NO_PRECHECK >>> ldrb r2, [src1] >>> ldrb r3, [src2] >>> @@ -136,16 +145,26 @@ def_fn strcmp >>> cmpcs r2, r3 >>> bne .Lfastpath_exit >>> #endif >>> - .cfi_sections .debug_frame >>> - .cfi_startproc >>> strd r4, r5, [sp, #-16]! >>> - .cfi_def_cfa_offset 16 >>> + .save {r4, r5} >> >> Hmm, I've just remembered that leaf functions cannot throw exceptions >> (the EHABI only supports synchronous exceptions), so I don't think we >> need .save directives at all. Instead, such functions should be >> marked with ".cantunwind". You still want the dwarf unwind >> information as that's for the debugger, it's just the '.save' >> directives that aren't needed. >> >>> + .cfi_adjust_cfa_offset 16 >>> +#if __HAVE_PAC_LEAF >>> + .cfi_offset 4, -20 >>> + .cfi_offset 5, -16 >>> +#else >>> .cfi_offset 4, -16 >>> .cfi_offset 5, -12 >>> +#endif /* __HAVE_PAC_LEAF */ >> >> Ugh! This is not the right way to do this. Instead (see comments on >> patch 1) use a macro that defines the additional adjustment to use in >> various contexts, something like >> >> .cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ) >> >>> orr tmp1, src1, src2 >>> strd r6, r7, [sp, #8] >>> + .save {r6, r7} >>> +#if __HAVE_PAC_LEAF >>> + .cfi_offset 6, -12 >>> + .cfi_offset 7, -8 >>> +#else >>> .cfi_offset 6, -8 >>> .cfi_offset 7, -4 >>> +#endif /* __HAVE_PAC_LEAF */ >>> mvn const_m1, #0 >>> lsl r2, tmp1, #29 >>> cbz r2, .Lloop_aligned8 >>> @@ -270,7 +289,6 @@ def_fn strcmp >>> ldr data1, [src1], #4 >>> beq .Laligned_m2 >>> bcs .Laligned_m1 >>> - >>> #ifdef STRCMP_NO_PRECHECK >>> ldrb data2, [src2, #1] >>> uxtb tmp1, data1, ror #BYTE1_OFFSET >>> @@ -314,7 +332,8 @@ def_fn strcmp >>> mov result, tmp1 >>> ldr r4, [sp], #16 >>> .cfi_restore 4 >>> - bx lr >>> + .cfi_adjust_cfa_offset -16 >>> + pacbti_epilogue >>> #ifndef STRCMP_NO_PRECHECK >>> .Laligned_m1: >>> @@ -364,8 +383,9 @@ def_fn strcmp >>> /* R6/7 Not used in this sequence. */ >>> .cfi_restore 6 >>> .cfi_restore 7 >>> + .cfi_adjust_cfa_offset -16 >>> neg result, result >>> - bx lr >>> + pacbti_epilogue >>> 6: >>> .cfi_restore_state >>> @@ -441,7 +461,8 @@ def_fn strcmp >>> /* R6/7 not used in this sequence. */ >>> .cfi_restore 6 >>> .cfi_restore 7 >>> - bx lr >>> + .cfi_adjust_cfa_offset -16 >>> + pacbti_epilogue >>> .Lstrcmp_tail: >>> .cfi_restore_state >>> @@ -463,7 +484,9 @@ def_fn strcmp >>> /* R6/7 not used in this sequence. */ >>> .cfi_restore 6 >>> .cfi_restore 7 >>> + .cfi_adjust_cfa_offset -16 >>> sub result, result, data2, lsr #24 >>> - bx lr >>> + pacbti_epilogue >>> .cfi_endproc >>> + .fnend >>> .size strcmp, . - .Lstrcmp_start_addr >>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S >>> index cdb4912df..fe1519f4d 100644 >>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S >>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S >>> @@ -29,6 +29,8 @@ >>> /* Very similar to the generic code, but uses Thumb2 as implemented >>> in ARMv7-M. */ >>> +#include "pacbti.h" >>> + >>> /* Parameters and result. */ >>> #define src1 r0 >>> #define src2 r1 >>> @@ -44,8 +46,10 @@ >>> .thumb >>> .syntax unified >>> def_fn strcmp >>> + .fnstart >>> .cfi_sections .debug_frame >>> .cfi_startproc >>> + pacbti_prologue >>> eor tmp1, src1, src2 >>> tst tmp1, #3 >>> /* Strings not at same byte offset from a word boundary. */ >>> @@ -106,7 +110,7 @@ def_fn strcmp >>> lsrs result, result, #24 >>> subs result, result, data2 >>> #endif >>> - bx lr >>> + pacbti_epilogue >>> #if 0 >>> @@ -214,12 +218,18 @@ def_fn strcmp >>> cmpcs data1, data2 >>> beq .Lstrcmp_unaligned >>> sub result, data1, data2 >>> - bx lr >>> + pacbti_epilogue >>> 2: >>> stmfd sp!, {r5} >>> - .cfi_def_cfa_offset 4 >>> + .save {r5} >>> + .cfi_adjust_cfa_offset 4 >>> +#if __HAVE_PAC_LEAF >>> + .cfi_offset 5, -8 /* Account for ip register already on stack. */ >>> +#else >>> .cfi_offset 5, -4 >>> +#endif /* __HAVE_PAC_LEAF */ >>> + >>> ldr data1, [src1], #4 >>> and tmp2, src2, #3 >>> @@ -353,10 +363,17 @@ def_fn strcmp >>> .Lstrcmp_done_equal: >>> mov result, #0 >>> .cfi_remember_state >>> +#if __HAVE_PAC_LEAF >>> + pop {r5, ip} >>> + .cfi_restore 5 >>> + .cfi_restore 143 >>> + aut ip, lr, sp >>> +#else >>> ldmfd sp!, {r5} >>> .cfi_restore 5 >>> +#endif /* __HAVE_PAC_LEAF */ >> >> I think we could define a macro for this as well, something like >> pop_reg_and_pac r5 > > This sounds good, but a survey through relevant function epilogues > reveals a series of different `pop' scenarios. On the one hand, we have > pop {r5, ip}, as we have here but on another we also have, for example, > pop {r4,r5,r6,r7,ip}. > > Do we wish to provide a short-hand notation for one pop operation but > not the other via macros? My initial understanding is that assembler > macros don't provide us with the necessary flexibility to handle all > relevant cases. > > I think the use of a macro here would greatly enhance code readability > and by extension maintainability, but at first sight it feels inelegant > to provide a solution applicable only to a single case of a more general > problem. > > Any wisdom here would be appreciated. Yes, this is all quite tricky. Fortunately, the vast majority of time registers saved on the stack frame are consecutively numbered. So I think it's possible to write some macros to handle this. The example below is not complete (I'll leave you to flesh out all the different cases :), but I think it shows that we can achieve what's needed. @ Emit .cfi_offset directives for a consecutive sequence of registers .macro cfisavelist first, last, count .cfi_offset \last, -4*(\count) .if \last-\first cfisavelist \first, \last-1, \count+1 .endif .endm @ Create a prologue entry sequence handling PAC/BTI, if required and emitting @ CFI directives for .macro prologue first=-1, last=-1, savepac=1 .if \first != -1 .if \last != -1 .if \savepac push {r\first-r\last, ip} .cfi_offset 143, -4 cfisavelist \first, \last, 2 .else push {r\first-r\last} cfisavelist \first, \last, 1 .endif .else push {r\first} cfisavelist \first, \first, 1 .endif .else ... .endif .endm .text .global foo .type foo, %function foo: .cfi_startproc prologue 4, 7, savepac=0 .cfi_endproc .text .global bar .type bar, %function bar: .cfi_startproc prologue 4, 7 .cfi_endproc .text .global wom .type wom, %function wom: .cfi_startproc prologue 4, savepac=0 .cfi_endproc .text .global bat .type bat, %function bat: .cfi_startproc prologue savepac=0 .cfi_endproc R. > > V. > >> While you're doing that, please change the single-register ldmfd into 'pop'. >> >>> .cfi_def_cfa_offset 0 >>> - bx lr >>> + bx lr >> >> stray change to the BX instruction. >> >>> .Lstrcmp_tail: >>> .cfi_restore_state >>> @@ -370,9 +387,18 @@ def_fn strcmp >>> S2LOEQ data2, data2, #8 >>> beq .Lstrcmp_tail >>> sub result, r2, result >>> +#if __HAVE_PAC_LEAF >>> + pop {r5, ip} >>> + .cfi_restore 5 >>> + .cfi_restore 143 >>> + .cfi_def_cfa_offset 0 >>> + aut ip, lr, sp >>> +#else >>> ldmfd sp!, {r5} >>> .cfi_restore 5 >>> .cfi_def_cfa_offset 0 >>> - bx lr >>> +#endif /* __HAVE_PAC_LEAF */ >> >> Same as above >> >>> + bx lr >> >> tab between mnemonic and operands. >> >>> .cfi_endproc >>> + .fnend >>> .size strcmp, . - strcmp >> >> R.