public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality
@ 2022-08-03 15:35 Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros Victor Do Nascimento
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

This patch series modifies hand-written assembly files for Arm targets,
conditionally enabling branch target identification as well as address
return signature and verification based on Armv8.1-M Pointer
Authentication [1] using ACLE feature test macros at compile-time [2].

The incorportaion of PACBTI functionality in function prologues/
epilogues is dictated by the arguments passed to the
`-mbranch-protection' flag at the time of Newlib compilation.

Regression tested on arm-none-eabi with and without MVE extension and
for Newlib and Newlib-nano.

[1]
<https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension>
[2] <https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros>

Victor Do Nascimento (8):
  newlib: libc: define M-profile PACBTI-enablement macros
  newlib: libc: strcmp M-profile PACBTI-enablement
  newlib: libc: strlen M-profile PACBTI-enablement
  newlib: libc: memchr M-profile PACBTI-enablement
  newlib: libc: memcpy M-profile PACBTI-enablement
  newlib: libc: setjmp/longjmp M-profile PACBTI-enablement
  newlib: libc: aeabi_memmove M-profile PACBTI-enablement
  newlib: libc: aeabi_memset M-profile PACBTI-enablement

 .../libc/machine/arm/aeabi_memmove-thumb2.S   | 54 ++++++++++++--
 newlib/libc/machine/arm/aeabi_memset-thumb2.S | 47 +++++++++++-
 newlib/libc/machine/arm/memchr.S              | 55 ++++++++++++--
 newlib/libc/machine/arm/memcpy-armv7m.S       | 74 +++++++++++++++++--
 newlib/libc/machine/arm/pacbti.h              | 64 ++++++++++++++++
 newlib/libc/machine/arm/setjmp.S              | 33 ++++++++-
 newlib/libc/machine/arm/strcmp-arm-tiny.S     |  7 +-
 newlib/libc/machine/arm/strcmp-armv7.S        | 45 ++++++++---
 newlib/libc/machine/arm/strcmp-armv7m.S       | 36 +++++++--
 newlib/libc/machine/arm/strlen-armv7.S        | 53 ++++++++++++-
 newlib/libc/machine/arm/strlen-thumb2-Os.S    | 13 +++-
 11 files changed, 435 insertions(+), 46 deletions(-)
 create mode 100644 newlib/libc/machine/arm/pacbti.h

-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  2022-08-04 15:19   ` Richard Earnshaw
  2022-08-03 15:35 ` [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement Victor Do Nascimento
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Create an assembly header file that conditionally defines fuction
prologues/epilogues depending on the compile-time mbranch-protection
argument values.

* newlib/libc/machine/arm/pacbti.h: New.
---
 newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 newlib/libc/machine/arm/pacbti.h

diff --git a/newlib/libc/machine/arm/pacbti.h b/newlib/libc/machine/arm/pacbti.h
new file mode 100644
index 000000000..4c31d00bc
--- /dev/null
+++ b/newlib/libc/machine/arm/pacbti.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright (c) 2022 Arm Ltd
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. The name of the company may not be used to endorse or promote
+ *    products derived from this software without specific prior written
+ *    permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+ * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* Checki whether leaf function PAC signing has been requested
+   in the -mbranch-protect compile-time option.  */
+#define LEAF_PROTECT_BIT 2
+#define __HAVE_PAC_LEAF \
+	__ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT)
+
+/* Macro to handle function entry depending on branch-protection
+   schemes.  */
+	.macro pacbti_prologue
+#if __HAVE_PAC_LEAF
+#if __ARM_FEATURE_BTI_DEFAULT
+	pacbti ip, lr, sp
+#else
+	pac ip, lr, sp
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
+	.cfi_register 143, 12
+	str ip, [sp, #-4]!
+	.save {ra_auth_code}
+	.cfi_def_cfa_offset 4
+	.cfi_offset 143, -4
+#elif __ARM_FEATURE_BTI_DEFAULT
+	bti
+#endif /* __HAVE_PAC_LEAF */
+	.endm
+
+/* Macro to handle different branch exchange cases depending on
+   branch-protection schemes.  */
+	.macro pacbti_epilogue
+#if __HAVE_PAC_LEAF
+	ldr ip, [sp], #4
+	.cfi_restore 143
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
+	.endm
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  2022-08-04 15:48   ` Richard Earnshaw
  2022-08-03 15:35 ` [PATCH v2 3/8] newlib: libc: strlen " Victor Do Nascimento
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.

This patch enables PACBTI for all relevant variants of strcmp:
     * Newlib for armv8.1-m.main+pacbti
     * Newlib for armv8.1-m.main+pacbti+mve
     * Newlib-nano
---
 newlib/libc/machine/arm/strcmp-arm-tiny.S |  7 +++-
 newlib/libc/machine/arm/strcmp-armv7.S    | 45 +++++++++++++++++------
 newlib/libc/machine/arm/strcmp-armv7m.S   | 36 +++++++++++++++---
 3 files changed, 71 insertions(+), 17 deletions(-)

diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S b/newlib/libc/machine/arm/strcmp-arm-tiny.S
index 607a41daf..8085eb4df 100644
--- a/newlib/libc/machine/arm/strcmp-arm-tiny.S
+++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S
@@ -29,10 +29,14 @@
 /* Tiny version of strcmp in ARM state.  Used only when optimizing
    for size.  Also supports Thumb-2.  */
 
+#include "pacbti.h"
+
 	.syntax unified
 def_fn strcmp
+	.fnstart
 	.cfi_sections .debug_frame
 	.cfi_startproc
+	pacbti_prologue
 1:
 	ldrb	r2, [r0], #1
 	ldrb	r3, [r1], #1
@@ -42,6 +46,7 @@ def_fn strcmp
 	beq	1b
 2:
 	subs	r0, r2, r3
-	bx	lr
+	pacbti_epilogue
 	.cfi_endproc
+	.fnend
 	.size	strcmp, . - strcmp
diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S
index 2f93bfb73..584c89f7e 100644
--- a/newlib/libc/machine/arm/strcmp-armv7.S
+++ b/newlib/libc/machine/arm/strcmp-armv7.S
@@ -45,6 +45,8 @@
 	.thumb
 	.syntax unified
 
+#include "pacbti.h"
+
 /* Parameters and result.  */
 #define src1		r0
 #define src2		r1
@@ -91,8 +93,9 @@
 	ldrd	r4, r5, [sp], #16
 	.cfi_restore 4
 	.cfi_restore 5
+	.cfi_adjust_cfa_offset -16
 	sub	result, result, r1, lsr #24
-	bx	lr
+	pacbti_epilogue
 #else
 	/* To use the big-endian trick we'd have to reverse all three words.
 	   that's slower than this approach.  */
@@ -112,22 +115,28 @@
 	ldrd	r4, r5, [sp], #16
 	.cfi_restore 4
 	.cfi_restore 5
+	.cfi_adjust_cfa_offset -16
 	sub	result, result, r1
 
-	bx	lr
+	pacbti_epilogue
 #endif
 	.endm
 
+
 	.text
 	.p2align	5
+	.fnstart
+	.cfi_sections .debug_frame
+	.cfi_startproc
 .Lstrcmp_start_addr:
 #ifndef STRCMP_NO_PRECHECK
 .Lfastpath_exit:
 	sub	r0, r2, r3
-	bx	lr
+	pacbti_epilogue
 	nop
 #endif
 def_fn	strcmp
+	pacbti_prologue
 #ifndef STRCMP_NO_PRECHECK
 	ldrb	r2, [src1]
 	ldrb	r3, [src2]
@@ -136,16 +145,26 @@ def_fn	strcmp
 	cmpcs	r2, r3
 	bne	.Lfastpath_exit
 #endif
-	.cfi_sections .debug_frame
-	.cfi_startproc
 	strd	r4, r5, [sp, #-16]!
-	.cfi_def_cfa_offset 16
+	.save {r4, r5}
+	.cfi_adjust_cfa_offset 16
+#if __HAVE_PAC_LEAF
+	.cfi_offset 4, -20
+	.cfi_offset 5, -16
+#else
 	.cfi_offset 4, -16
 	.cfi_offset 5, -12
+#endif /* __HAVE_PAC_LEAF */
 	orr	tmp1, src1, src2
 	strd	r6, r7, [sp, #8]
+	.save {r6, r7}
+#if __HAVE_PAC_LEAF
+	.cfi_offset 6, -12
+	.cfi_offset 7, -8
+#else
 	.cfi_offset 6, -8
 	.cfi_offset 7, -4
+#endif /* __HAVE_PAC_LEAF */
 	mvn	const_m1, #0
 	lsl	r2, tmp1, #29
 	cbz	r2, .Lloop_aligned8
@@ -270,7 +289,6 @@ def_fn	strcmp
 	ldr	data1, [src1], #4
 	beq	.Laligned_m2
 	bcs	.Laligned_m1
-
 #ifdef STRCMP_NO_PRECHECK
 	ldrb	data2, [src2, #1]
 	uxtb	tmp1, data1, ror #BYTE1_OFFSET
@@ -314,7 +332,8 @@ def_fn	strcmp
 	mov	result, tmp1
 	ldr	r4, [sp], #16
 	.cfi_restore 4
-	bx	lr
+	.cfi_adjust_cfa_offset -16
+	pacbti_epilogue
 
 #ifndef STRCMP_NO_PRECHECK
 .Laligned_m1:
@@ -364,8 +383,9 @@ def_fn	strcmp
 	/* R6/7 Not used in this sequence.  */
 	.cfi_restore 6
 	.cfi_restore 7
+	.cfi_adjust_cfa_offset -16
 	neg	result, result
-	bx	lr
+	pacbti_epilogue
 
 6:
 	.cfi_restore_state
@@ -441,7 +461,8 @@ def_fn	strcmp
 	/* R6/7 not used in this sequence.  */
 	.cfi_restore 6
 	.cfi_restore 7
-	bx	lr
+	.cfi_adjust_cfa_offset -16
+	pacbti_epilogue
 
 .Lstrcmp_tail:
 	.cfi_restore_state
@@ -463,7 +484,9 @@ def_fn	strcmp
 	/* R6/7 not used in this sequence.  */
 	.cfi_restore 6
 	.cfi_restore 7
+	.cfi_adjust_cfa_offset -16
 	sub	result, result, data2, lsr #24
-	bx	lr
+	pacbti_epilogue
 	.cfi_endproc
+	.fnend
 	.size strcmp, . - .Lstrcmp_start_addr
diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S
index cdb4912df..fe1519f4d 100644
--- a/newlib/libc/machine/arm/strcmp-armv7m.S
+++ b/newlib/libc/machine/arm/strcmp-armv7m.S
@@ -29,6 +29,8 @@
 /* Very similar to the generic code, but uses Thumb2 as implemented
    in ARMv7-M.  */
 
+#include "pacbti.h"
+
 /* Parameters and result.  */
 #define src1		r0
 #define src2		r1
@@ -44,8 +46,10 @@
 	.thumb
 	.syntax unified
 def_fn strcmp
+	.fnstart
 	.cfi_sections .debug_frame
 	.cfi_startproc
+	pacbti_prologue
 	eor	tmp1, src1, src2
 	tst	tmp1, #3
 	/* Strings not at same byte offset from a word boundary.  */
@@ -106,7 +110,7 @@ def_fn strcmp
 	lsrs	result, result, #24
 	subs	result, result, data2
 #endif
-	bx	lr
+	pacbti_epilogue
 
 
 #if 0
@@ -214,12 +218,18 @@ def_fn strcmp
 	cmpcs	data1, data2
 	beq	.Lstrcmp_unaligned
 	sub	result, data1, data2
-	bx	lr
+	pacbti_epilogue
 
 2:
 	stmfd	sp!, {r5}
-	.cfi_def_cfa_offset 4
+	.save {r5}
+	.cfi_adjust_cfa_offset 4
+#if __HAVE_PAC_LEAF
+	.cfi_offset 5, -8 /* Account for ip register already on stack.  */
+#else
 	.cfi_offset 5, -4
+#endif /* __HAVE_PAC_LEAF */
+
 
 	ldr	data1, [src1], #4
 	and	tmp2, src2, #3
@@ -353,10 +363,17 @@ def_fn strcmp
 .Lstrcmp_done_equal:
 	mov	result, #0
 	.cfi_remember_state
+#if __HAVE_PAC_LEAF
+	pop {r5, ip}
+	.cfi_restore 5
+	.cfi_restore 143
+	aut ip, lr, sp
+#else
 	ldmfd	sp!, {r5}
 	.cfi_restore 5
+#endif /* __HAVE_PAC_LEAF */
 	.cfi_def_cfa_offset 0
-	bx	lr
+	bx lr
 
 .Lstrcmp_tail:
 	.cfi_restore_state
@@ -370,9 +387,18 @@ def_fn strcmp
 	S2LOEQ	data2, data2, #8
 	beq	.Lstrcmp_tail
 	sub	result, r2, result
+#if __HAVE_PAC_LEAF
+	pop {r5, ip}
+	.cfi_restore 5
+	.cfi_restore 143
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
+#else
 	ldmfd	sp!, {r5}
 	.cfi_restore 5
 	.cfi_def_cfa_offset 0
-	bx	lr
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
 	.cfi_endproc
+	.fnend
 	.size strcmp, . - strcmp
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 3/8] newlib: libc: strlen M-profile PACBTI-enablement
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 4/8] newlib: libc: memchr " Victor Do Nascimento
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.

This patch enables PACBTI for all relevant variants of strlen:
     * Newlib for armv8.1-m.main+pacbti
     * Newlib for armv8.1-m.main+pacbti+mve
     * Newlib-nano
---
 newlib/libc/machine/arm/strlen-armv7.S     | 53 ++++++++++++++++++++--
 newlib/libc/machine/arm/strlen-thumb2-Os.S | 13 ++++--
 2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/newlib/libc/machine/arm/strlen-armv7.S b/newlib/libc/machine/arm/strlen-armv7.S
index f3dda0d60..0492800f9 100644
--- a/newlib/libc/machine/arm/strlen-armv7.S
+++ b/newlib/libc/machine/arm/strlen-armv7.S
@@ -59,6 +59,7 @@
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  */
 
 #include "acle-compat.h"
+#include "pacbti.h"
 
 	.macro def_fn f p2align=0
 	.text
@@ -78,7 +79,11 @@
 
 	/* This code requires Thumb.  */
 #if __ARM_ARCH_PROFILE == 'M'
+#if __ARM_ARCH >= 8
+	/* keep config inherited from -march=.  */
+#else
 	.arch   armv7e-m
+#endif /* if __ARM_ARCH >= 8 */
 #else
 	.arch	armv6t2
 #endif
@@ -100,8 +105,35 @@
 #define tmp2		r5
 
 def_fn	strlen p2align=6
+	.fnstart
+	.cfi_startproc
+	/* common pacbti_prologue macro from pacbti.h not used.
+	   handwritten prologue saves one push instruction.  */
+#if __HAVE_PAC_LEAF
+#if __ARM_FEATURE_BTI_DEFAULT
+	pacbti ip, lr, sp
+#else
+	pac ip, lr, sp
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
+	.cfi_register 143, 12
+	push    {r4, r5, ip}
+	.save   {r4, r5, ra_auth_code}
+	.cfi_def_cfa_offset 12
+	.cfi_offset 143, -4
+	.cfi_offset 5, -8
+	.cfi_offset 4, -12
+
+#else
+#if __ARM_FEATURE_BTI_DEFAULT
+	bti
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
+	push    {r4, r5}
+	.save   {r4, r5}
+	.cfi_def_cfa_offset 8
+	.cfi_offset 5, -4
+	.cfi_offset 4, -8
+#endif /* __HAVE_PAC_LEAF */
 	pld	[srcin, #0]
-	strd	r4, r5, [sp, #-8]!
 	bic	src, srcin, #7
 	mvn	const_m1, #0
 	ands	tmp1, srcin, #7		/* (8 - bytes) to alignment.  */
@@ -159,9 +191,22 @@ def_fn	strlen p2align=6
 	rev	data1a, data1a
 #endif
 	clz	data1a, data1a
-	ldrd	r4, r5, [sp], #8
 	add	result, result, data1a, lsr #3	/* Bits -> Bytes.  */
-	bx	lr
+#if __HAVE_PAC_LEAF
+	pop	{r4, r5, ip}
+	.cfi_restore 4
+	.cfi_restore 5
+	.cfi_restore 143
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
+#else
+	ldrd	r4, r5, [sp], #8
+	.cfi_restore 4
+	.cfi_restore 5
+	.cfi_def_cfa_offset 0
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
+
 
 .Lmisaligned8:
 	ldrd	data1a, data1b, [src]
@@ -177,4 +222,6 @@ def_fn	strlen p2align=6
 	movne	data1a, const_m1
 	mov	const_0, #0
 	b	.Lstart_realigned
+	.cfi_endproc
+	.fnend
 	.size	strlen, . - strlen
diff --git a/newlib/libc/machine/arm/strlen-thumb2-Os.S b/newlib/libc/machine/arm/strlen-thumb2-Os.S
index 961f41a0a..571068dac 100644
--- a/newlib/libc/machine/arm/strlen-thumb2-Os.S
+++ b/newlib/libc/machine/arm/strlen-thumb2-Os.S
@@ -25,6 +25,7 @@
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  */
 
 #include "acle-compat.h"
+#include "pacbti.h"
 
 	.macro def_fn f p2align=0
 	.text
@@ -33,8 +34,9 @@
 	.type \f, %function
 \f:
 	.endm
-
-#if __ARM_ARCH_ISA_THUMB >= 2 && __ARM_ARCH >= 7
+#if __ARM_ARCH_PROFILE == 'M' && __ARM_ARCH >= 8
+	/* keep config inherited from -march=.  */
+#elif __ARM_ARCH_ISA_THUMB >= 2 && __ARM_ARCH >= 7
 	.arch   armv7
 #else
 	.arch	armv6t2
@@ -44,11 +46,16 @@
 	.syntax unified
 
 def_fn	strlen p2align=1
+	.fnstart
+	.cfi_startproc
+	pacbti_prologue
 	mov     r3, r0
 1:	ldrb.w  r2, [r3], #1
 	cmp     r2, #0
 	bne	1b
 	subs    r0, r3, r0
 	subs    r0, #1
-	bx      lr
+	pacbti_epilogue
+	.cfi_endproc
+	.fnend
 	.size	strlen, . - strlen
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 4/8] newlib: libc: memchr M-profile PACBTI-enablement
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
                   ` (2 preceding siblings ...)
  2022-08-03 15:35 ` [PATCH v2 3/8] newlib: libc: strlen " Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 5/8] newlib: libc: memcpy " Victor Do Nascimento
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.
---
 newlib/libc/machine/arm/memchr.S | 55 ++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/newlib/libc/machine/arm/memchr.S b/newlib/libc/machine/arm/memchr.S
index 1a4c6512c..6bdccb934 100644
--- a/newlib/libc/machine/arm/memchr.S
+++ b/newlib/libc/machine/arm/memchr.S
@@ -76,6 +76,7 @@
 	.syntax unified
 
 #include "acle-compat.h"
+#include "pacbti.h"
 
 @ NOTE: This ifdef MUST match the one in memchr-stub.c
 #if defined (__ARM_NEON__) || defined (__ARM_NEON)
@@ -267,10 +268,14 @@ memchr:
 #elif __ARM_ARCH_ISA_THUMB >= 2 && defined (__ARM_FEATURE_DSP)
 
 #if __ARM_ARCH_PROFILE == 'M'
-       .arch armv7e-m
+#if __ARM_ARCH >= 8
+	/* keep config inherited from -march=.  */
 #else
-       .arch armv6t2
-#endif
+	.arch armv7e-m
+#endif /* __ARM_ARCH >= 8 */
+#else
+	.arch armv6t2
+#endif /* __ARM_ARCH_PROFILE == 'M' */
 
 @ this lets us check a flag in a 00/ff byte easily in either endianness
 #ifdef __ARMEB__
@@ -287,11 +292,14 @@ memchr:
 	.p2align 4,,15
 	.global memchr
 	.type memchr,%function
+	.fnstart
+	.cfi_startproc
 memchr:
 	@ r0 = start of memory to scan
 	@ r1 = character to look for
 	@ r2 = length
 	@ returns r0 = pointer to character or NULL if not found
+	pacbti_prologue
 	and	r1,r1,#0xff	@ Don't trust the caller to pass a char
 
 	cmp	r2,#16		@ If short don't bother with anything clever
@@ -313,6 +321,19 @@ memchr:
 10:
 	@ We are aligned, we know we have at least 8 bytes to work with
 	push	{r4,r5,r6,r7}
+	.save   {r4-r7}
+	.cfi_adjust_cfa_offset 16
+#if __HAVE_PAC_LEAF
+	.cfi_offset 4, -20
+	.cfi_offset 5, -16
+	.cfi_offset 6, -12
+	.cfi_offset 7, -8
+#else
+	.cfi_offset 4, -16
+	.cfi_offset 5, -12
+	.cfi_offset 6, -8
+	.cfi_offset 7, -4
+#endif /*  __HAVE_PAC_LEAF*/
 	orr	r1, r1, r1, lsl #8	@ expand the match word across all bytes
 	orr	r1, r1, r1, lsl #16
 	bic	r4, r2, #7	@ Number of double words to work with * 8
@@ -334,6 +355,11 @@ memchr:
 	bne	15b		@ (Flags from the subs above)
 
 	pop	{r4,r5,r6,r7}
+	.cfi_restore 7
+	.cfi_restore 6
+	.cfi_restore 5
+	.cfi_restore 4
+	.cfi_adjust_cfa_offset -16
 	and	r1,r1,#0xff	@ r1 back to a single character
 	and	r2,r2,#7	@ Leave the count remaining as the number
 				@ after the double words have been done
@@ -350,11 +376,11 @@ memchr:
 
 40:
 	movs	r0,#0		@ not found
-	bx	lr
+	pacbti_epilogue
 
 50:
 	subs	r0,r0,#1	@ found
-	bx	lr
+	pacbti_epilogue
 
 60:  @ We're here because the fast path found a hit 
      @ now we have to track down exactly which word it was
@@ -378,9 +404,24 @@ memchr:
 	addeq	r0,r0,#1
 
 61:
-	pop	{r4,r5,r6,r7}
 	subs	r0,r0,#1
-	bx	lr
+#if __HAVE_PAC_LEAF
+	pop	{r4,r5,r6,r7,ip}
+	.cfi_restore 143
+#else
+	pop	{r4,r5,r6,r7}
+#endif /* __HAVE_PAC_LEAF*/
+	.cfi_restore 7
+	.cfi_restore 6
+	.cfi_restore 5
+	.cfi_restore 4
+	.cfi_def_cfa_offset 0
+#if __HAVE_PAC_LEAF
+	aut ip, lr, sp
+#endif /* __HAVE_PAC_LEAF*/
+	bx lr
+	.cfi_endproc
+	.fnend
 #else
   /* Defined in memchr-stub.c.  */
 #endif
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 5/8] newlib: libc: memcpy M-profile PACBTI-enablement
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
                   ` (3 preceding siblings ...)
  2022-08-03 15:35 ` [PATCH v2 4/8] newlib: libc: memchr " Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 6/8] newlib: libc: setjmp/longjmp " Victor Do Nascimento
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.
---
 newlib/libc/machine/arm/memcpy-armv7m.S | 74 ++++++++++++++++++++++---
 1 file changed, 66 insertions(+), 8 deletions(-)

diff --git a/newlib/libc/machine/arm/memcpy-armv7m.S b/newlib/libc/machine/arm/memcpy-armv7m.S
index c8bff36f6..d9b9e4984 100644
--- a/newlib/libc/machine/arm/memcpy-armv7m.S
+++ b/newlib/libc/machine/arm/memcpy-armv7m.S
@@ -46,6 +46,8 @@
      __OPT_BIG_BLOCK_SIZE: Size of big block in words.  Default to 64.
      __OPT_MID_BLOCK_SIZE: Size of big block in words.  Default to 16.
  */
+#include "pacbti.h"
+
 #ifndef __OPT_BIG_BLOCK_SIZE
 #define __OPT_BIG_BLOCK_SIZE (4 * 16)
 #endif
@@ -85,6 +87,8 @@
 	.global	memcpy
 	.thumb
 	.thumb_func
+	.fnstart
+	.cfi_startproc
 	.type	memcpy, %function
 memcpy:
 	@ r0: dst
@@ -93,10 +97,31 @@ memcpy:
 #ifdef __ARM_FEATURE_UNALIGNED
 	/* In case of UNALIGNED access supported, ip is not used in
 	   function body.  */
+	pacbti_prologue
 	mov	ip, r0
 #else
+#if __HAVE_PAC_LEAF
+#if __ARM_FEATURE_BTI_DEFAULT
+	pacbti	ip, lr, sp
+#else
+	pac	ip, lr, sp
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
+	.cfi_register 143, 12
+	push	{r0, ip}
+	.save	{r0, ra_auth_code}
+	.cfi_def_cfa_offset 8
+	.cfi_offset 0, -8
+	.cfi_offset 143, -4
+#else
+#if __ARM_FEATURE_BTI_DEFAULT
+	bti
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
 	push	{r0}
-#endif
+	.save	{r0}
+	.cfi_adjust_cfa_offset 4
+	.cfi_offset 0, -4
+#endif /* __HAVE_PAC_LEAF */
+#endif /* __ARM_FEATURE_UNALIGNED */
 	orr	r3, r1, r0
 	ands	r3, r3, #3
 	bne	.Lmisaligned_copy
@@ -135,13 +160,13 @@ memcpy:
 	ldr	r3, [r1], #4
 	str	r3, [r0], #4
 	END_UNROLL
-#else /* __ARM_ARCH_7M__ */
+#else
 	ldr	r3, [r1, \offset]
 	str	r3, [r0, \offset]
 	END_UNROLL
 	adds    r0, __OPT_MID_BLOCK_SIZE
 	adds    r1, __OPT_MID_BLOCK_SIZE
-#endif
+#endif /* __ARM_ARCH_7M__ */
 	subs	r2, __OPT_MID_BLOCK_SIZE
 	bhs	.Lmid_block_loop
 
@@ -180,10 +205,21 @@ memcpy:
 .Ldone:
 #ifdef __ARM_FEATURE_UNALIGNED
 	mov	r0, ip
+	pacbti_epilogue
+#else
+#if __HAVE_PAC_LEAF
+	pop	{r0, ra_auth_code}
+	.cfi_restore 0
+	.cfi_restore 143
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
 #else
 	pop	{r0}
-#endif
-	bx	lr
+	.cfi_restore 0
+	.cfi_def_cfa_offset 0
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
+#endif /*  __ARM_FEATURE_UNALIGNED */
 
 	.align 2
 .Lmisaligned_copy:
@@ -247,6 +283,15 @@ memcpy:
 	/* dst is aligned, but src isn't.  Misaligned copy.  */
 
 	push	{r4, r5}
+	.save   {r4, r5}
+	.cfi_adjust_cfa_offset 8
+#if __HAVE_PAC_LEAF /* we pushed just the pac code.  */
+	.cfi_offset 4, -12
+	.cfi_offset 5, -8
+#else /* we haven't pushed anything to stack.  */
+	.cfi_offset 4, -8
+	.cfi_offset 5, -4
+#endif /* __HAVE_PAC_LEAF */
 	subs	r2, #4
 
 	/* Backward r1 by misaligned bytes, to make r1 aligned.
@@ -299,6 +344,7 @@ memcpy:
 	adds	r2, #4
 	subs	r1, ip
 	pop	{r4, r5}
+	.cfi_adjust_cfa_offset -8
 
 #endif /* __ARM_FEATURE_UNALIGNED */
 
@@ -321,9 +367,21 @@ memcpy:
 
 #ifdef __ARM_FEATURE_UNALIGNED
 	mov	r0, ip
+	pacbti_epilogue
+#else
+#if __HAVE_PAC_LEAF
+	pop	{r0, ra_auth_code}
+	.cfi_restore 0
+	.cfi_restore 143
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
 #else
 	pop	{r0}
-#endif
-	bx	lr
-
+	.cfi_restore 0
+	.cfi_def_cfa_offset 0
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
+#endif /* __ARM_FEATURE_UNALIGNED */
+	.cfi_endproc
+	.fnend
 	.size	memcpy, .-memcpy
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 6/8] newlib: libc: setjmp/longjmp M-profile PACBTI-enablement
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
                   ` (4 preceding siblings ...)
  2022-08-03 15:35 ` [PATCH v2 5/8] newlib: libc: memcpy " Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 7/8] newlib: libc: aeabi_memmove " Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 8/8] newlib: libc: aeabi_memset " Victor Do Nascimento
  7 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.
---
 newlib/libc/machine/arm/setjmp.S | 33 ++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/newlib/libc/machine/arm/setjmp.S b/newlib/libc/machine/arm/setjmp.S
index 21d6ff9e7..d60dd57df 100644
--- a/newlib/libc/machine/arm/setjmp.S
+++ b/newlib/libc/machine/arm/setjmp.S
@@ -157,11 +157,15 @@ SYM (.arm_start_of.\name):
 	.globl SYM (\name)
 	TYPE (\name)
 SYM (\name):
+	.fnstart
+	.cfi_startproc
 	PROLOGUE \name
 .endm
 
 .macro FUNC_END name
 	RET
+	.cfi_endproc
+	.fnend
 	SIZE (\name)
 .endm
 	
@@ -173,11 +177,26 @@ SYM (\name):
 
 	/* Save all the callee-preserved registers into the jump buffer.  */
 #ifdef __thumb2__
+#if __ARM_FEATURE_PAC_DEFAULT
+#if __ARM_FEATURE_BTI_DEFAULT
+	pacbti		ip, lr, sp
+#else
+	pac		ip, lr, sp
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
+	.cfi_register 143, 12
+	mov		a4, ip
+	mov		ip, sp
+	stmea		a1!, { a4, v1-v7, fp, ip, lr }
+#else
+#if __ARM_FEATURE_BTI_DEFAULT
+	bti
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
 	mov		ip, sp
 	stmea		a1!, { v1-v7, fp, ip, lr }
+#endif /* __ARM_FEATURE_PAC_DEFAULT */
 #else
 	stmea		a1!, { v1-v7, fp, ip, sp, lr }
-#endif
+#endif /* __thumb2__ */
 	
 #if 0	/* Simulator does not cope with FP instructions yet.  */
 #ifndef __SOFTFP__
@@ -200,11 +219,17 @@ SYM (\name):
 	
 	/* Restore the registers, retrieving the state when setjmp() was called.  */
 #ifdef __thumb2__
+#if __ARM_FEATURE_PAC_DEFAULT
+	ldmfd		a1!, { a4, v1-v7, fp, ip, lr }
+	mov		sp, ip
+	mov         ip, a4
+#else
 	ldmfd		a1!, { v1-v7, fp, ip, lr }
 	mov		sp, ip
+#endif /* __ARM_FEATURE_PAC_DEFAULT */
 #else
 	ldmfd		a1!, { v1-v7, fp, ip, sp, lr }
-#endif
+#endif /* __thumb2__ */
 	
 #if 0	/* Simulator does not cope with FP instructions yet.  */
 #ifndef __SOFTFP__
@@ -220,5 +245,9 @@ SYM (\name):
 #endif
 	moveq		a1, #1
 
+#if __ARM_FEATURE_PAC_DEFAULT
+	aut		ip, lr, sp
+#endif
+
 	FUNC_END longjmp
 #endif
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 7/8] newlib: libc: aeabi_memmove M-profile PACBTI-enablement
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
                   ` (5 preceding siblings ...)
  2022-08-03 15:35 ` [PATCH v2 6/8] newlib: libc: setjmp/longjmp " Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  2022-08-03 15:35 ` [PATCH v2 8/8] newlib: libc: aeabi_memset " Victor Do Nascimento
  7 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.
---
 .../libc/machine/arm/aeabi_memmove-thumb2.S   | 54 +++++++++++++++++--
 1 file changed, 49 insertions(+), 5 deletions(-)

diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb2.S b/newlib/libc/machine/arm/aeabi_memmove-thumb2.S
index e9504437b..34054d96c 100644
--- a/newlib/libc/machine/arm/aeabi_memmove-thumb2.S
+++ b/newlib/libc/machine/arm/aeabi_memmove-thumb2.S
@@ -26,6 +26,8 @@
  * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+	#include "pacbti.h"
+
 	.thumb
 	.syntax unified
 	.global __aeabi_memmove
@@ -33,8 +35,28 @@
 	ASM_ALIAS __aeabi_memmove4 __aeabi_memmove
 	ASM_ALIAS __aeabi_memmove8 __aeabi_memmove
 __aeabi_memmove:
-	cmp	r0, r1
+	.fnstart
+	.cfi_startproc
+#if __HAVE_PAC_LEAF
+#if __ARM_FEATURE_BTI_DEFAULT
+	pacbti ip, lr, sp
+#else
+	pac ip, lr, sp
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
+	.cfi_register 143, 12
+	push	{r4, ip}
+	.save {r4, ra_auth_code}
+	.cfi_def_cfa_offset 8
+	.cfi_offset 4, -8
+	.cfi_offset 143, -4
+#elif __ARM_FEATURE_BTI_DEFAULT
+	bti
 	push	{r4}
+	.save   {r4}
+	.cfi_def_cfa_offset 4
+	.cfi_offset 4, -4
+#endif /* __HAVE_PAC_LEAF */
+	cmp	r0, r1
 	bls	3f
 	adds	r3, r1, r2
 	cmp	r0, r3
@@ -48,8 +70,18 @@ __aeabi_memmove:
 	strb	r4, [r1, #-1]!
 	bne	1b
 2:
-	pop	{r4}
-	bx	lr
+#if __HAVE_PAC_LEAF
+	pop {r4, ip}
+	.cfi_restore 4
+	.cfi_restore 143
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
+#else
+	pop {r4}
+	.cfi_restore 4
+	.cfi_def_cfa_offset 0
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
 3:
 	cmp	r2, #0
 	beq	2b
@@ -60,6 +92,18 @@ __aeabi_memmove:
 	cmp	r2, r1
 	strb	r4, [r3, #1]!
 	bne	4b
-	pop	{r4}
-	bx	lr
+#if __HAVE_PAC_LEAF
+	pop {r4, ip}
+	.cfi_restore 4
+	.cfi_restore 143
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
+#else
+	pop {r4}
+	.cfi_restore 4
+	.cfi_def_cfa_offset 0
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
+	.cfi_endproc
+	.fnend
 	.size __aeabi_memmove, . - __aeabi_memmove
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 8/8] newlib: libc: aeabi_memset M-profile PACBTI-enablement
  2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
                   ` (6 preceding siblings ...)
  2022-08-03 15:35 ` [PATCH v2 7/8] newlib: libc: aeabi_memmove " Victor Do Nascimento
@ 2022-08-03 15:35 ` Victor Do Nascimento
  7 siblings, 0 replies; 18+ messages in thread
From: Victor Do Nascimento @ 2022-08-03 15:35 UTC (permalink / raw)
  To: newlib; +Cc: richard.earnshaw, Victor Do Nascimento

Add function prologue/epilogue to conditionally add BTI landing pads
and/or PAC code generation & authentication instructions depending on
compilation flags.
---
 newlib/libc/machine/arm/aeabi_memset-thumb2.S | 47 ++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb2.S b/newlib/libc/machine/arm/aeabi_memset-thumb2.S
index eaca1d8d7..8bd4b5eaa 100644
--- a/newlib/libc/machine/arm/aeabi_memset-thumb2.S
+++ b/newlib/libc/machine/arm/aeabi_memset-thumb2.S
@@ -26,14 +26,43 @@
  * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include "pacbti.h"
+
 	.thumb
 	.syntax unified
 	.global __aeabi_memset
 	.type	__aeabi_memset, %function
+	.fnstart
+	.cfi_startproc
 	ASM_ALIAS __aeabi_memset4 __aeabi_memset
 	ASM_ALIAS __aeabi_memset8 __aeabi_memset
 __aeabi_memset:
+#if __HAVE_PAC_LEAF
+#if __ARM_FEATURE_BTI_DEFAULT
+	pacbti	ip, lr, sp
+#else
+	pac	ip, lr, sp
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
+	.cfi_register 143, 12
+	push	{r4, r5, r6, ip}
+	.save	{r4, r5, r6, ra_auth_code}
+	.cfi_def_cfa_offset 16
+	.cfi_offset 4, -16
+	.cfi_offset 5, -12
+	.cfi_offset 6, -8
+	.cfi_offset 143, -4
+#else
+#if __ARM_FEATURE_BTI_DEFAULT
+	bti
+#endif /* __ARM_FEATURE_BTI_DEFAULT */
 	push	{r4, r5, r6}
+	.save	{r4, r5, r6}
+	.cfi_def_cfa_offset 12
+	.cfi_offset 4, -12
+	.cfi_offset 5, -8
+	.cfi_offset 6, -4
+#endif /* __HAVE_PAC_LEAF */
+
 	lsls	r4, r0, #30
 	beq	10f
 	subs	r4, r1, #1
@@ -98,10 +127,26 @@ __aeabi_memset:
 	cmp	r3, r4
 	bne	8b
 9:
+#if __HAVE_PAC_LEAF
+	pop	{r4, r5, r6, ip}
+	.cfi_restore 143
+	.cfi_restore 6
+	.cfi_restore 5
+	.cfi_restore 4
+	.cfi_def_cfa_offset 0
+	aut ip, lr, sp
+#else
 	pop	{r4, r5, r6}
-	bx	lr
+	.cfi_restore 6
+	.cfi_restore 5
+	.cfi_restore 4
+	.cfi_def_cfa_offset 0
+#endif /* __HAVE_PAC_LEAF */
+	bx lr
 10:
 	mov	r4, r1
 	mov	r3, r0
 	b	3b
+	.cfi_endproc
+	.fnend
 	.size __aeabi_memset, . - __aeabi_memset
-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros
  2022-08-03 15:35 ` [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros Victor Do Nascimento
@ 2022-08-04 15:19   ` Richard Earnshaw
  2022-08-05  9:21     ` Victor L. Do Nascimento
  2022-08-16 16:11     ` Victor L. Do Nascimento
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Earnshaw @ 2022-08-04 15:19 UTC (permalink / raw)
  To: Victor Do Nascimento, newlib; +Cc: richard.earnshaw



On 03/08/2022 16:35, Victor Do Nascimento wrote:
> Create an assembly header file that conditionally defines fuction
> prologues/epilogues depending on the compile-time mbranch-protection
> argument values.
> 
> * newlib/libc/machine/arm/pacbti.h: New.
> ---
>   newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++
>   1 file changed, 64 insertions(+)
>   create mode 100644 newlib/libc/machine/arm/pacbti.h
> 
> diff --git a/newlib/libc/machine/arm/pacbti.h b/newlib/libc/machine/arm/pacbti.h
> new file mode 100644
> index 000000000..4c31d00bc
> --- /dev/null
> +++ b/newlib/libc/machine/arm/pacbti.h
> @@ -0,0 +1,64 @@
> +/*
> + * Copyright (c) 2022 Arm Ltd
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. The name of the company may not be used to endorse or promote
> + *    products derived from this software without specific prior written
> + *    permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
> + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
> + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +/* Checki whether leaf function PAC signing has been requested
> +   in the -mbranch-protect compile-time option.  */
> +#define LEAF_PROTECT_BIT 2
> +#define __HAVE_PAC_LEAF \
> +	__ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT)

Either this header needs to avoid polluting the user namespace, or it 
doesn't.  If it does, then LEAF_PROTECT_BIT fails to do that.  If it 
doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid 
polluting the reserved namespace.  I suspect this header is private to 
newlib (won't be exported to users), so should not be prefixing names 
with __.

> +
> +/* Macro to handle function entry depending on branch-protection
> +   schemes.  */
> +	.macro pacbti_prologue
> +#if __HAVE_PAC_LEAF
> +#if __ARM_FEATURE_BTI_DEFAULT
> +	pacbti ip, lr, sp
> +#else
> +	pac ip, lr, sp
> +#endif /* __ARM_FEATURE_BTI_DEFAULT */
> +	.cfi_register 143, 12
> +	str ip, [sp, #-4]!

This causes the stack to lose 8-byte alignment.  Whilst for leaf 
functions that's probably not a problem, the macro should have an option 
where the user can ask for alignment to be preserved.  But there should 
also be an option to not save IP on the stack at all (for when the user 
does not modify IP in the function body).

> +	.save {ra_auth_code}
> +	.cfi_def_cfa_offset 4
> +	.cfi_offset 143, -4
> +#elif __ARM_FEATURE_BTI_DEFAULT
> +	bti
> +#endif /* __HAVE_PAC_LEAF */
> +	.endm
> +
> +/* Macro to handle different branch exchange cases depending on
> +   branch-protection schemes.  */
> +	.macro pacbti_epilogue
> +#if __HAVE_PAC_LEAF
> +	ldr ip, [sp], #4
> +	.cfi_restore 143
> +	.cfi_def_cfa_offset 0
> +	aut ip, lr, sp
> +#endif /* __HAVE_PAC_LEAF */
> +	bx lr
> +	.endm

I think these macros are really misnamed, firstly, they're only for leaf 
functions and secondly, they (particularly the epilogue) does something 
even if PAC is not needed.  So I think they should be renamed as 
'leaf_prologue' and 'leaf_epilogue' respectively.

In consequence, I think this file should really be merged into the 
existing arm_asm.h, then we don't need yet another header file.

Finally, the header needs to define a (C) macro that defines how much to 
adjust the CFI offset by for various scenarios:
- When PAC is not used
- When PAC is used with no alignment
- When PAC is used and asked for 8-byte alignment to be preserved.

R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement
  2022-08-03 15:35 ` [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement Victor Do Nascimento
@ 2022-08-04 15:48   ` Richard Earnshaw
  2022-08-05 10:51     ` Victor L. Do Nascimento
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw @ 2022-08-04 15:48 UTC (permalink / raw)
  To: Victor Do Nascimento, newlib; +Cc: richard.earnshaw



On 03/08/2022 16:35, Victor Do Nascimento wrote:
> Add function prologue/epilogue to conditionally add BTI landing pads
> and/or PAC code generation & authentication instructions depending on
> compilation flags.
> 
> This patch enables PACBTI for all relevant variants of strcmp:
>       * Newlib for armv8.1-m.main+pacbti
>       * Newlib for armv8.1-m.main+pacbti+mve
>       * Newlib-nano
> ---
>   newlib/libc/machine/arm/strcmp-arm-tiny.S |  7 +++-
>   newlib/libc/machine/arm/strcmp-armv7.S    | 45 +++++++++++++++++------
>   newlib/libc/machine/arm/strcmp-armv7m.S   | 36 +++++++++++++++---
>   3 files changed, 71 insertions(+), 17 deletions(-)
> 
> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S b/newlib/libc/machine/arm/strcmp-arm-tiny.S
> index 607a41daf..8085eb4df 100644
> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S
> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S
> @@ -29,10 +29,14 @@
>   /* Tiny version of strcmp in ARM state.  Used only when optimizing
>      for size.  Also supports Thumb-2.  */
>   
> +#include "pacbti.h"
> +
>   	.syntax unified
>   def_fn strcmp
> +	.fnstart
>   	.cfi_sections .debug_frame
>   	.cfi_startproc
> +	pacbti_prologue

See comment on patch1 about the names of these macros.

>   1:
>   	ldrb	r2, [r0], #1
>   	ldrb	r3, [r1], #1
> @@ -42,6 +46,7 @@ def_fn strcmp
>   	beq	1b
>   2:
>   	subs	r0, r2, r3
> -	bx	lr
> +	pacbti_epilogue
>   	.cfi_endproc
> +	.fnend
>   	.size	strcmp, . - strcmp
> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S
> index 2f93bfb73..584c89f7e 100644
> --- a/newlib/libc/machine/arm/strcmp-armv7.S
> +++ b/newlib/libc/machine/arm/strcmp-armv7.S
> @@ -45,6 +45,8 @@
>   	.thumb
>   	.syntax unified
>   
> +#include "pacbti.h"
> +
>   /* Parameters and result.  */
>   #define src1		r0
>   #define src2		r1
> @@ -91,8 +93,9 @@
>   	ldrd	r4, r5, [sp], #16
>   	.cfi_restore 4
>   	.cfi_restore 5
> +	.cfi_adjust_cfa_offset -16

This looks like it's an omission from the original file and thus should 
be separated out as a separate patch.

>   	sub	result, result, r1, lsr #24
> -	bx	lr
> +	pacbti_epilogue
>   #else
>   	/* To use the big-endian trick we'd have to reverse all three words.
>   	   that's slower than this approach.  */
> @@ -112,22 +115,28 @@
>   	ldrd	r4, r5, [sp], #16
>   	.cfi_restore 4
>   	.cfi_restore 5
> +	.cfi_adjust_cfa_offset -16
>   	sub	result, result, r1
>   
> -	bx	lr
> +	pacbti_epilogue
>   #endif
>   	.endm
>   
> +
>   	.text
>   	.p2align	5
> +	.fnstart
> +	.cfi_sections .debug_frame
> +	.cfi_startproc
>   .Lstrcmp_start_addr:
>   #ifndef STRCMP_NO_PRECHECK
>   .Lfastpath_exit:
>   	sub	r0, r2, r3
> -	bx	lr
> +	pacbti_epilogue
>   	nop
>   #endif
>   def_fn	strcmp
> +	pacbti_prologue
>   #ifndef STRCMP_NO_PRECHECK
>   	ldrb	r2, [src1]
>   	ldrb	r3, [src2]
> @@ -136,16 +145,26 @@ def_fn	strcmp
>   	cmpcs	r2, r3
>   	bne	.Lfastpath_exit
>   #endif
> -	.cfi_sections .debug_frame
> -	.cfi_startproc
>   	strd	r4, r5, [sp, #-16]!
> -	.cfi_def_cfa_offset 16
> +	.save {r4, r5}

Hmm, I've just remembered that leaf functions cannot throw exceptions 
(the EHABI only supports synchronous exceptions), so I don't think we 
need .save directives at all.  Instead, such functions should be marked 
with ".cantunwind".  You still want the dwarf unwind information as 
that's for the debugger, it's just the '.save' directives that aren't 
needed.

> +	.cfi_adjust_cfa_offset 16
> +#if __HAVE_PAC_LEAF
> +	.cfi_offset 4, -20
> +	.cfi_offset 5, -16
> +#else
>   	.cfi_offset 4, -16
>   	.cfi_offset 5, -12
> +#endif /* __HAVE_PAC_LEAF */

Ugh!  This is not the right way to do this.  Instead (see comments on 
patch 1) use a macro that defines the additional adjustment to use in 
various contexts, something like

	.cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ)

>   	orr	tmp1, src1, src2
>   	strd	r6, r7, [sp, #8]
> +	.save {r6, r7}
> +#if __HAVE_PAC_LEAF
> +	.cfi_offset 6, -12
> +	.cfi_offset 7, -8
> +#else
>   	.cfi_offset 6, -8
>   	.cfi_offset 7, -4
> +#endif /* __HAVE_PAC_LEAF */
>   	mvn	const_m1, #0
>   	lsl	r2, tmp1, #29
>   	cbz	r2, .Lloop_aligned8
> @@ -270,7 +289,6 @@ def_fn	strcmp
>   	ldr	data1, [src1], #4
>   	beq	.Laligned_m2
>   	bcs	.Laligned_m1
> -
>   #ifdef STRCMP_NO_PRECHECK
>   	ldrb	data2, [src2, #1]
>   	uxtb	tmp1, data1, ror #BYTE1_OFFSET
> @@ -314,7 +332,8 @@ def_fn	strcmp
>   	mov	result, tmp1
>   	ldr	r4, [sp], #16
>   	.cfi_restore 4
> -	bx	lr
> +	.cfi_adjust_cfa_offset -16
> +	pacbti_epilogue
>   
>   #ifndef STRCMP_NO_PRECHECK
>   .Laligned_m1:
> @@ -364,8 +383,9 @@ def_fn	strcmp
>   	/* R6/7 Not used in this sequence.  */
>   	.cfi_restore 6
>   	.cfi_restore 7
> +	.cfi_adjust_cfa_offset -16
>   	neg	result, result
> -	bx	lr
> +	pacbti_epilogue
>   
>   6:
>   	.cfi_restore_state
> @@ -441,7 +461,8 @@ def_fn	strcmp
>   	/* R6/7 not used in this sequence.  */
>   	.cfi_restore 6
>   	.cfi_restore 7
> -	bx	lr
> +	.cfi_adjust_cfa_offset -16
> +	pacbti_epilogue
>   
>   .Lstrcmp_tail:
>   	.cfi_restore_state
> @@ -463,7 +484,9 @@ def_fn	strcmp
>   	/* R6/7 not used in this sequence.  */
>   	.cfi_restore 6
>   	.cfi_restore 7
> +	.cfi_adjust_cfa_offset -16
>   	sub	result, result, data2, lsr #24
> -	bx	lr
> +	pacbti_epilogue
>   	.cfi_endproc
> +	.fnend
>   	.size strcmp, . - .Lstrcmp_start_addr
> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S
> index cdb4912df..fe1519f4d 100644
> --- a/newlib/libc/machine/arm/strcmp-armv7m.S
> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S
> @@ -29,6 +29,8 @@
>   /* Very similar to the generic code, but uses Thumb2 as implemented
>      in ARMv7-M.  */
>   
> +#include "pacbti.h"
> +
>   /* Parameters and result.  */
>   #define src1		r0
>   #define src2		r1
> @@ -44,8 +46,10 @@
>   	.thumb
>   	.syntax unified
>   def_fn strcmp
> +	.fnstart
>   	.cfi_sections .debug_frame
>   	.cfi_startproc
> +	pacbti_prologue
>   	eor	tmp1, src1, src2
>   	tst	tmp1, #3
>   	/* Strings not at same byte offset from a word boundary.  */
> @@ -106,7 +110,7 @@ def_fn strcmp
>   	lsrs	result, result, #24
>   	subs	result, result, data2
>   #endif
> -	bx	lr
> +	pacbti_epilogue
>   
>   
>   #if 0
> @@ -214,12 +218,18 @@ def_fn strcmp
>   	cmpcs	data1, data2
>   	beq	.Lstrcmp_unaligned
>   	sub	result, data1, data2
> -	bx	lr
> +	pacbti_epilogue
>   
>   2:
>   	stmfd	sp!, {r5}
> -	.cfi_def_cfa_offset 4
> +	.save {r5}
> +	.cfi_adjust_cfa_offset 4
> +#if __HAVE_PAC_LEAF
> +	.cfi_offset 5, -8 /* Account for ip register already on stack.  */
> +#else
>   	.cfi_offset 5, -4
> +#endif /* __HAVE_PAC_LEAF */
> +
>   
>   	ldr	data1, [src1], #4
>   	and	tmp2, src2, #3
> @@ -353,10 +363,17 @@ def_fn strcmp
>   .Lstrcmp_done_equal:
>   	mov	result, #0
>   	.cfi_remember_state
> +#if __HAVE_PAC_LEAF
> +	pop {r5, ip}
> +	.cfi_restore 5
> +	.cfi_restore 143
> +	aut ip, lr, sp
> +#else
>   	ldmfd	sp!, {r5}
>   	.cfi_restore 5
> +#endif /* __HAVE_PAC_LEAF */

I think we could define a macro for this as well, something like
         pop_reg_and_pac r5

While you're doing that, please change the single-register ldmfd into 'pop'.

>   	.cfi_def_cfa_offset 0
> -	bx	lr
> +	bx lr

stray change to the BX instruction.

>   
>   .Lstrcmp_tail:
>   	.cfi_restore_state
> @@ -370,9 +387,18 @@ def_fn strcmp
>   	S2LOEQ	data2, data2, #8
>   	beq	.Lstrcmp_tail
>   	sub	result, r2, result
> +#if __HAVE_PAC_LEAF
> +	pop {r5, ip}
> +	.cfi_restore 5
> +	.cfi_restore 143
> +	.cfi_def_cfa_offset 0
> +	aut ip, lr, sp
> +#else
>   	ldmfd	sp!, {r5}
>   	.cfi_restore 5
>   	.cfi_def_cfa_offset 0
> -	bx	lr
> +#endif /* __HAVE_PAC_LEAF */

Same as above

> +	bx lr

tab between mnemonic and operands.

>   	.cfi_endproc
> +	.fnend
>   	.size strcmp, . - strcmp

R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros
  2022-08-04 15:19   ` Richard Earnshaw
@ 2022-08-05  9:21     ` Victor L. Do Nascimento
  2022-08-16 16:11     ` Victor L. Do Nascimento
  1 sibling, 0 replies; 18+ messages in thread
From: Victor L. Do Nascimento @ 2022-08-05  9:21 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: newlib, richard.earnshaw

Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:

> On 03/08/2022 16:35, Victor Do Nascimento wrote:
>> Create an assembly header file that conditionally defines fuction
>> prologues/epilogues depending on the compile-time mbranch-protection
>> argument values.
>> * newlib/libc/machine/arm/pacbti.h: New.
>> ---
>>   newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++
>>   1 file changed, 64 insertions(+)
>>   create mode 100644 newlib/libc/machine/arm/pacbti.h
>> diff --git a/newlib/libc/machine/arm/pacbti.h
>> b/newlib/libc/machine/arm/pacbti.h
>> new file mode 100644
>> index 000000000..4c31d00bc
>> --- /dev/null
>> +++ b/newlib/libc/machine/arm/pacbti.h
>> @@ -0,0 +1,64 @@
>> +/*
>> + * Copyright (c) 2022 Arm Ltd
>> + * All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + * 1. Redistributions of source code must retain the above copyright
>> + *    notice, this list of conditions and the following disclaimer.
>> + * 2. Redistributions in binary form must reproduce the above copyright
>> + *    notice, this list of conditions and the following disclaimer in the
>> + *    documentation and/or other materials provided with the distribution.
>> + * 3. The name of the company may not be used to endorse or promote
>> + *    products derived from this software without specific prior written
>> + *    permission.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
>> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
>> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
>> + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
>> + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
>> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
>> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
>> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
>> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +/* Checki whether leaf function PAC signing has been requested
>> +   in the -mbranch-protect compile-time option.  */
>> +#define LEAF_PROTECT_BIT 2
>> +#define __HAVE_PAC_LEAF \
>> +	__ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT)
>
> Either this header needs to avoid polluting the user namespace, or it
> doesn't.  If it does, then LEAF_PROTECT_BIT fails to do that.  If it
> doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid
> polluting the reserved namespace.  I suspect this header is private to
> newlib (won't be exported to users), so should not be prefixing names
> with __.
>
>> +
>> +/* Macro to handle function entry depending on branch-protection
>> +   schemes.  */
>> +	.macro pacbti_prologue
>> +#if __HAVE_PAC_LEAF
>> +#if __ARM_FEATURE_BTI_DEFAULT
>> +	pacbti ip, lr, sp
>> +#else
>> +	pac ip, lr, sp
>> +#endif /* __ARM_FEATURE_BTI_DEFAULT */
>> +	.cfi_register 143, 12
>> +	str ip, [sp, #-4]!
>
> This causes the stack to lose 8-byte alignment.  Whilst for leaf
> functions that's probably not a problem, the macro should have an
> option where the user can ask for alignment to be preserved.  But
> there should also be an option to not save IP on the stack at all (for
> when the user does not modify IP in the function body).

Thanks for the feedback here. So we introduce two options the user is
given control over. I guess this raises two questions concerning design
choices to be made.

1. How do we expose these options to the user, and
2. What should we define as default values?

Regarding defaults, in order to maximize performance my initial guess
would be that we don't push IP to the stack, such that we save on
instructions by omitting pushes and pops and automatically preserve
stack alignment that way.

Even so, that leaves the question of how to best allow users to modify
these parameters prior to compiling the library.

Any thoughts??

>> +	.save {ra_auth_code}
>> +	.cfi_def_cfa_offset 4
>> +	.cfi_offset 143, -4
>> +#elif __ARM_FEATURE_BTI_DEFAULT
>> +	bti
>> +#endif /* __HAVE_PAC_LEAF */
>> +	.endm
>> +
>> +/* Macro to handle different branch exchange cases depending on
>> +   branch-protection schemes.  */
>> +	.macro pacbti_epilogue
>> +#if __HAVE_PAC_LEAF
>> +	ldr ip, [sp], #4
>> +	.cfi_restore 143
>> +	.cfi_def_cfa_offset 0
>> +	aut ip, lr, sp
>> +#endif /* __HAVE_PAC_LEAF */
>> +	bx lr
>> +	.endm
>
> I think these macros are really misnamed, firstly, they're only for
> leaf functions and secondly, they (particularly the epilogue) does
> something even if PAC is not needed.  So I think they should be
> renamed as 'leaf_prologue' and 'leaf_epilogue' respectively.

Agreed.

> In consequence, I think this file should really be merged into the
> existing arm_asm.h, then we don't need yet another header file.
>
> Finally, the header needs to define a (C) macro that defines how much
> to adjust the CFI offset by for various scenarios:
> - When PAC is not used
> - When PAC is used with no alignment
> - When PAC is used and asked for 8-byte alignment to be preserved.

I'm thinking we set the value of PAC_CFI_ADJ contigent on values of
PAC_LEAF_PUSH_SP and PAC_PRESERVE_STACK_ALIGN when HAVE_PAC_LEAF is
set (once I define these new macros, of course).

V.

> R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement
  2022-08-04 15:48   ` Richard Earnshaw
@ 2022-08-05 10:51     ` Victor L. Do Nascimento
  2022-08-05 14:34       ` Richard Earnshaw
  0 siblings, 1 reply; 18+ messages in thread
From: Victor L. Do Nascimento @ 2022-08-05 10:51 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: newlib, richard.earnshaw

Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:

> On 03/08/2022 16:35, Victor Do Nascimento wrote:
>> Add function prologue/epilogue to conditionally add BTI landing pads
>> and/or PAC code generation & authentication instructions depending on
>> compilation flags.
>> This patch enables PACBTI for all relevant variants of strcmp:
>>       * Newlib for armv8.1-m.main+pacbti
>>       * Newlib for armv8.1-m.main+pacbti+mve
>>       * Newlib-nano
>> ---
>>   newlib/libc/machine/arm/strcmp-arm-tiny.S |  7 +++-
>>   newlib/libc/machine/arm/strcmp-armv7.S    | 45 +++++++++++++++++------
>>   newlib/libc/machine/arm/strcmp-armv7m.S   | 36 +++++++++++++++---
>>   3 files changed, 71 insertions(+), 17 deletions(-)
>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>> index 607a41daf..8085eb4df 100644
>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>> @@ -29,10 +29,14 @@
>>   /* Tiny version of strcmp in ARM state.  Used only when optimizing
>>      for size.  Also supports Thumb-2.  */
>>   +#include "pacbti.h"
>> +
>>   	.syntax unified
>>   def_fn strcmp
>> +	.fnstart
>>   	.cfi_sections .debug_frame
>>   	.cfi_startproc
>> +	pacbti_prologue
>
> See comment on patch1 about the names of these macros.
>
>>   1:
>>   	ldrb	r2, [r0], #1
>>   	ldrb	r3, [r1], #1
>> @@ -42,6 +46,7 @@ def_fn strcmp
>>   	beq	1b
>>   2:
>>   	subs	r0, r2, r3
>> -	bx	lr
>> +	pacbti_epilogue
>>   	.cfi_endproc
>> +	.fnend
>>   	.size	strcmp, . - strcmp
>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S
>> index 2f93bfb73..584c89f7e 100644
>> --- a/newlib/libc/machine/arm/strcmp-armv7.S
>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S
>> @@ -45,6 +45,8 @@
>>   	.thumb
>>   	.syntax unified
>>   +#include "pacbti.h"
>> +
>>   /* Parameters and result.  */
>>   #define src1		r0
>>   #define src2		r1
>> @@ -91,8 +93,9 @@
>>   	ldrd	r4, r5, [sp], #16
>>   	.cfi_restore 4
>>   	.cfi_restore 5
>> +	.cfi_adjust_cfa_offset -16
>
> This looks like it's an omission from the original file and thus
> should be separated out as a separate patch.
>
>>   	sub	result, result, r1, lsr #24
>> -	bx	lr
>> +	pacbti_epilogue
>>   #else
>>   	/* To use the big-endian trick we'd have to reverse all three words.
>>   	   that's slower than this approach.  */
>> @@ -112,22 +115,28 @@
>>   	ldrd	r4, r5, [sp], #16
>>   	.cfi_restore 4
>>   	.cfi_restore 5
>> +	.cfi_adjust_cfa_offset -16
>>   	sub	result, result, r1
>>   -	bx	lr
>> +	pacbti_epilogue
>>   #endif
>>   	.endm
>>   +
>>   	.text
>>   	.p2align	5
>> +	.fnstart
>> +	.cfi_sections .debug_frame
>> +	.cfi_startproc
>>   .Lstrcmp_start_addr:
>>   #ifndef STRCMP_NO_PRECHECK
>>   .Lfastpath_exit:
>>   	sub	r0, r2, r3
>> -	bx	lr
>> +	pacbti_epilogue
>>   	nop
>>   #endif
>>   def_fn	strcmp
>> +	pacbti_prologue
>>   #ifndef STRCMP_NO_PRECHECK
>>   	ldrb	r2, [src1]
>>   	ldrb	r3, [src2]
>> @@ -136,16 +145,26 @@ def_fn	strcmp
>>   	cmpcs	r2, r3
>>   	bne	.Lfastpath_exit
>>   #endif
>> -	.cfi_sections .debug_frame
>> -	.cfi_startproc
>>   	strd	r4, r5, [sp, #-16]!
>> -	.cfi_def_cfa_offset 16
>> +	.save {r4, r5}
>
> Hmm, I've just remembered that leaf functions cannot throw exceptions
> (the EHABI only supports synchronous exceptions), so I don't think we
> need .save directives at all.  Instead, such functions should be
> marked with ".cantunwind".  You still want the dwarf unwind
> information as that's for the debugger, it's just the '.save'
> directives that aren't needed.
>
>> +	.cfi_adjust_cfa_offset 16
>> +#if __HAVE_PAC_LEAF
>> +	.cfi_offset 4, -20
>> +	.cfi_offset 5, -16
>> +#else
>>   	.cfi_offset 4, -16
>>   	.cfi_offset 5, -12
>> +#endif /* __HAVE_PAC_LEAF */
>
> Ugh!  This is not the right way to do this.  Instead (see comments on
> patch 1) use a macro that defines the additional adjustment to use in
> various contexts, something like
>
> 	.cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ)
>
>>   	orr	tmp1, src1, src2
>>   	strd	r6, r7, [sp, #8]
>> +	.save {r6, r7}
>> +#if __HAVE_PAC_LEAF
>> +	.cfi_offset 6, -12
>> +	.cfi_offset 7, -8
>> +#else
>>   	.cfi_offset 6, -8
>>   	.cfi_offset 7, -4
>> +#endif /* __HAVE_PAC_LEAF */
>>   	mvn	const_m1, #0
>>   	lsl	r2, tmp1, #29
>>   	cbz	r2, .Lloop_aligned8
>> @@ -270,7 +289,6 @@ def_fn	strcmp
>>   	ldr	data1, [src1], #4
>>   	beq	.Laligned_m2
>>   	bcs	.Laligned_m1
>> -
>>   #ifdef STRCMP_NO_PRECHECK
>>   	ldrb	data2, [src2, #1]
>>   	uxtb	tmp1, data1, ror #BYTE1_OFFSET
>> @@ -314,7 +332,8 @@ def_fn	strcmp
>>   	mov	result, tmp1
>>   	ldr	r4, [sp], #16
>>   	.cfi_restore 4
>> -	bx	lr
>> +	.cfi_adjust_cfa_offset -16
>> +	pacbti_epilogue
>>     #ifndef STRCMP_NO_PRECHECK
>>   .Laligned_m1:
>> @@ -364,8 +383,9 @@ def_fn	strcmp
>>   	/* R6/7 Not used in this sequence.  */
>>   	.cfi_restore 6
>>   	.cfi_restore 7
>> +	.cfi_adjust_cfa_offset -16
>>   	neg	result, result
>> -	bx	lr
>> +	pacbti_epilogue
>>     6:
>>   	.cfi_restore_state
>> @@ -441,7 +461,8 @@ def_fn	strcmp
>>   	/* R6/7 not used in this sequence.  */
>>   	.cfi_restore 6
>>   	.cfi_restore 7
>> -	bx	lr
>> +	.cfi_adjust_cfa_offset -16
>> +	pacbti_epilogue
>>     .Lstrcmp_tail:
>>   	.cfi_restore_state
>> @@ -463,7 +484,9 @@ def_fn	strcmp
>>   	/* R6/7 not used in this sequence.  */
>>   	.cfi_restore 6
>>   	.cfi_restore 7
>> +	.cfi_adjust_cfa_offset -16
>>   	sub	result, result, data2, lsr #24
>> -	bx	lr
>> +	pacbti_epilogue
>>   	.cfi_endproc
>> +	.fnend
>>   	.size strcmp, . - .Lstrcmp_start_addr
>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S
>> index cdb4912df..fe1519f4d 100644
>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S
>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S
>> @@ -29,6 +29,8 @@
>>   /* Very similar to the generic code, but uses Thumb2 as implemented
>>      in ARMv7-M.  */
>>   +#include "pacbti.h"
>> +
>>   /* Parameters and result.  */
>>   #define src1		r0
>>   #define src2		r1
>> @@ -44,8 +46,10 @@
>>   	.thumb
>>   	.syntax unified
>>   def_fn strcmp
>> +	.fnstart
>>   	.cfi_sections .debug_frame
>>   	.cfi_startproc
>> +	pacbti_prologue
>>   	eor	tmp1, src1, src2
>>   	tst	tmp1, #3
>>   	/* Strings not at same byte offset from a word boundary.  */
>> @@ -106,7 +110,7 @@ def_fn strcmp
>>   	lsrs	result, result, #24
>>   	subs	result, result, data2
>>   #endif
>> -	bx	lr
>> +	pacbti_epilogue
>>       #if 0
>> @@ -214,12 +218,18 @@ def_fn strcmp
>>   	cmpcs	data1, data2
>>   	beq	.Lstrcmp_unaligned
>>   	sub	result, data1, data2
>> -	bx	lr
>> +	pacbti_epilogue
>>     2:
>>   	stmfd	sp!, {r5}
>> -	.cfi_def_cfa_offset 4
>> +	.save {r5}
>> +	.cfi_adjust_cfa_offset 4
>> +#if __HAVE_PAC_LEAF
>> +	.cfi_offset 5, -8 /* Account for ip register already on stack.  */
>> +#else
>>   	.cfi_offset 5, -4
>> +#endif /* __HAVE_PAC_LEAF */
>> +
>>     	ldr	data1, [src1], #4
>>   	and	tmp2, src2, #3
>> @@ -353,10 +363,17 @@ def_fn strcmp
>>   .Lstrcmp_done_equal:
>>   	mov	result, #0
>>   	.cfi_remember_state
>> +#if __HAVE_PAC_LEAF
>> +	pop {r5, ip}
>> +	.cfi_restore 5
>> +	.cfi_restore 143
>> +	aut ip, lr, sp
>> +#else
>>   	ldmfd	sp!, {r5}
>>   	.cfi_restore 5
>> +#endif /* __HAVE_PAC_LEAF */
>
> I think we could define a macro for this as well, something like
>         pop_reg_and_pac r5

This sounds good, but a survey through relevant function epilogues
reveals a series of different `pop' scenarios. On the one hand, we have
pop {r5, ip}, as we have here but on another we also have, for example,
pop {r4,r5,r6,r7,ip}.

Do we wish to provide a short-hand notation for one pop operation but
not the other via macros? My initial understanding is that assembler
macros don't provide us with the necessary flexibility to handle all
relevant cases.

I think the use of a macro here would greatly enhance code readability
and by extension maintainability, but at first sight it feels inelegant
to provide a solution applicable only to a single case of a more general
problem.

Any wisdom here would be appreciated.

V.

> While you're doing that, please change the single-register ldmfd into 'pop'.
>
>>   	.cfi_def_cfa_offset 0
>> -	bx	lr
>> +	bx lr
>
> stray change to the BX instruction.
>
>>     .Lstrcmp_tail:
>>   	.cfi_restore_state
>> @@ -370,9 +387,18 @@ def_fn strcmp
>>   	S2LOEQ	data2, data2, #8
>>   	beq	.Lstrcmp_tail
>>   	sub	result, r2, result
>> +#if __HAVE_PAC_LEAF
>> +	pop {r5, ip}
>> +	.cfi_restore 5
>> +	.cfi_restore 143
>> +	.cfi_def_cfa_offset 0
>> +	aut ip, lr, sp
>> +#else
>>   	ldmfd	sp!, {r5}
>>   	.cfi_restore 5
>>   	.cfi_def_cfa_offset 0
>> -	bx	lr
>> +#endif /* __HAVE_PAC_LEAF */
>
> Same as above
>
>> +	bx lr
>
> tab between mnemonic and operands.
>
>>   	.cfi_endproc
>> +	.fnend
>>   	.size strcmp, . - strcmp
>
> R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement
  2022-08-05 10:51     ` Victor L. Do Nascimento
@ 2022-08-05 14:34       ` Richard Earnshaw
  2022-08-05 14:58         ` Victor L. Do Nascimento
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Earnshaw @ 2022-08-05 14:34 UTC (permalink / raw)
  To: Victor L. Do Nascimento; +Cc: newlib, richard.earnshaw



On 05/08/2022 11:51, Victor L. Do Nascimento wrote:
> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> 
>> On 03/08/2022 16:35, Victor Do Nascimento wrote:
>>> Add function prologue/epilogue to conditionally add BTI landing pads
>>> and/or PAC code generation & authentication instructions depending on
>>> compilation flags.
>>> This patch enables PACBTI for all relevant variants of strcmp:
>>>        * Newlib for armv8.1-m.main+pacbti
>>>        * Newlib for armv8.1-m.main+pacbti+mve
>>>        * Newlib-nano
>>> ---
>>>    newlib/libc/machine/arm/strcmp-arm-tiny.S |  7 +++-
>>>    newlib/libc/machine/arm/strcmp-armv7.S    | 45 +++++++++++++++++------
>>>    newlib/libc/machine/arm/strcmp-armv7m.S   | 36 +++++++++++++++---
>>>    3 files changed, 71 insertions(+), 17 deletions(-)
>>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>> index 607a41daf..8085eb4df 100644
>>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>> @@ -29,10 +29,14 @@
>>>    /* Tiny version of strcmp in ARM state.  Used only when optimizing
>>>       for size.  Also supports Thumb-2.  */
>>>    +#include "pacbti.h"
>>> +
>>>    	.syntax unified
>>>    def_fn strcmp
>>> +	.fnstart
>>>    	.cfi_sections .debug_frame
>>>    	.cfi_startproc
>>> +	pacbti_prologue
>>
>> See comment on patch1 about the names of these macros.
>>
>>>    1:
>>>    	ldrb	r2, [r0], #1
>>>    	ldrb	r3, [r1], #1
>>> @@ -42,6 +46,7 @@ def_fn strcmp
>>>    	beq	1b
>>>    2:
>>>    	subs	r0, r2, r3
>>> -	bx	lr
>>> +	pacbti_epilogue
>>>    	.cfi_endproc
>>> +	.fnend
>>>    	.size	strcmp, . - strcmp
>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S
>>> index 2f93bfb73..584c89f7e 100644
>>> --- a/newlib/libc/machine/arm/strcmp-armv7.S
>>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S
>>> @@ -45,6 +45,8 @@
>>>    	.thumb
>>>    	.syntax unified
>>>    +#include "pacbti.h"
>>> +
>>>    /* Parameters and result.  */
>>>    #define src1		r0
>>>    #define src2		r1
>>> @@ -91,8 +93,9 @@
>>>    	ldrd	r4, r5, [sp], #16
>>>    	.cfi_restore 4
>>>    	.cfi_restore 5
>>> +	.cfi_adjust_cfa_offset -16
>>
>> This looks like it's an omission from the original file and thus
>> should be separated out as a separate patch.
>>
>>>    	sub	result, result, r1, lsr #24
>>> -	bx	lr
>>> +	pacbti_epilogue
>>>    #else
>>>    	/* To use the big-endian trick we'd have to reverse all three words.
>>>    	   that's slower than this approach.  */
>>> @@ -112,22 +115,28 @@
>>>    	ldrd	r4, r5, [sp], #16
>>>    	.cfi_restore 4
>>>    	.cfi_restore 5
>>> +	.cfi_adjust_cfa_offset -16
>>>    	sub	result, result, r1
>>>    -	bx	lr
>>> +	pacbti_epilogue
>>>    #endif
>>>    	.endm
>>>    +
>>>    	.text
>>>    	.p2align	5
>>> +	.fnstart
>>> +	.cfi_sections .debug_frame
>>> +	.cfi_startproc
>>>    .Lstrcmp_start_addr:
>>>    #ifndef STRCMP_NO_PRECHECK
>>>    .Lfastpath_exit:
>>>    	sub	r0, r2, r3
>>> -	bx	lr
>>> +	pacbti_epilogue
>>>    	nop
>>>    #endif
>>>    def_fn	strcmp
>>> +	pacbti_prologue
>>>    #ifndef STRCMP_NO_PRECHECK
>>>    	ldrb	r2, [src1]
>>>    	ldrb	r3, [src2]
>>> @@ -136,16 +145,26 @@ def_fn	strcmp
>>>    	cmpcs	r2, r3
>>>    	bne	.Lfastpath_exit
>>>    #endif
>>> -	.cfi_sections .debug_frame
>>> -	.cfi_startproc
>>>    	strd	r4, r5, [sp, #-16]!
>>> -	.cfi_def_cfa_offset 16
>>> +	.save {r4, r5}
>>
>> Hmm, I've just remembered that leaf functions cannot throw exceptions
>> (the EHABI only supports synchronous exceptions), so I don't think we
>> need .save directives at all.  Instead, such functions should be
>> marked with ".cantunwind".  You still want the dwarf unwind
>> information as that's for the debugger, it's just the '.save'
>> directives that aren't needed.
>>
>>> +	.cfi_adjust_cfa_offset 16
>>> +#if __HAVE_PAC_LEAF
>>> +	.cfi_offset 4, -20
>>> +	.cfi_offset 5, -16
>>> +#else
>>>    	.cfi_offset 4, -16
>>>    	.cfi_offset 5, -12
>>> +#endif /* __HAVE_PAC_LEAF */
>>
>> Ugh!  This is not the right way to do this.  Instead (see comments on
>> patch 1) use a macro that defines the additional adjustment to use in
>> various contexts, something like
>>
>> 	.cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ)
>>
>>>    	orr	tmp1, src1, src2
>>>    	strd	r6, r7, [sp, #8]
>>> +	.save {r6, r7}
>>> +#if __HAVE_PAC_LEAF
>>> +	.cfi_offset 6, -12
>>> +	.cfi_offset 7, -8
>>> +#else
>>>    	.cfi_offset 6, -8
>>>    	.cfi_offset 7, -4
>>> +#endif /* __HAVE_PAC_LEAF */
>>>    	mvn	const_m1, #0
>>>    	lsl	r2, tmp1, #29
>>>    	cbz	r2, .Lloop_aligned8
>>> @@ -270,7 +289,6 @@ def_fn	strcmp
>>>    	ldr	data1, [src1], #4
>>>    	beq	.Laligned_m2
>>>    	bcs	.Laligned_m1
>>> -
>>>    #ifdef STRCMP_NO_PRECHECK
>>>    	ldrb	data2, [src2, #1]
>>>    	uxtb	tmp1, data1, ror #BYTE1_OFFSET
>>> @@ -314,7 +332,8 @@ def_fn	strcmp
>>>    	mov	result, tmp1
>>>    	ldr	r4, [sp], #16
>>>    	.cfi_restore 4
>>> -	bx	lr
>>> +	.cfi_adjust_cfa_offset -16
>>> +	pacbti_epilogue
>>>      #ifndef STRCMP_NO_PRECHECK
>>>    .Laligned_m1:
>>> @@ -364,8 +383,9 @@ def_fn	strcmp
>>>    	/* R6/7 Not used in this sequence.  */
>>>    	.cfi_restore 6
>>>    	.cfi_restore 7
>>> +	.cfi_adjust_cfa_offset -16
>>>    	neg	result, result
>>> -	bx	lr
>>> +	pacbti_epilogue
>>>      6:
>>>    	.cfi_restore_state
>>> @@ -441,7 +461,8 @@ def_fn	strcmp
>>>    	/* R6/7 not used in this sequence.  */
>>>    	.cfi_restore 6
>>>    	.cfi_restore 7
>>> -	bx	lr
>>> +	.cfi_adjust_cfa_offset -16
>>> +	pacbti_epilogue
>>>      .Lstrcmp_tail:
>>>    	.cfi_restore_state
>>> @@ -463,7 +484,9 @@ def_fn	strcmp
>>>    	/* R6/7 not used in this sequence.  */
>>>    	.cfi_restore 6
>>>    	.cfi_restore 7
>>> +	.cfi_adjust_cfa_offset -16
>>>    	sub	result, result, data2, lsr #24
>>> -	bx	lr
>>> +	pacbti_epilogue
>>>    	.cfi_endproc
>>> +	.fnend
>>>    	.size strcmp, . - .Lstrcmp_start_addr
>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S
>>> index cdb4912df..fe1519f4d 100644
>>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S
>>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S
>>> @@ -29,6 +29,8 @@
>>>    /* Very similar to the generic code, but uses Thumb2 as implemented
>>>       in ARMv7-M.  */
>>>    +#include "pacbti.h"
>>> +
>>>    /* Parameters and result.  */
>>>    #define src1		r0
>>>    #define src2		r1
>>> @@ -44,8 +46,10 @@
>>>    	.thumb
>>>    	.syntax unified
>>>    def_fn strcmp
>>> +	.fnstart
>>>    	.cfi_sections .debug_frame
>>>    	.cfi_startproc
>>> +	pacbti_prologue
>>>    	eor	tmp1, src1, src2
>>>    	tst	tmp1, #3
>>>    	/* Strings not at same byte offset from a word boundary.  */
>>> @@ -106,7 +110,7 @@ def_fn strcmp
>>>    	lsrs	result, result, #24
>>>    	subs	result, result, data2
>>>    #endif
>>> -	bx	lr
>>> +	pacbti_epilogue
>>>        #if 0
>>> @@ -214,12 +218,18 @@ def_fn strcmp
>>>    	cmpcs	data1, data2
>>>    	beq	.Lstrcmp_unaligned
>>>    	sub	result, data1, data2
>>> -	bx	lr
>>> +	pacbti_epilogue
>>>      2:
>>>    	stmfd	sp!, {r5}
>>> -	.cfi_def_cfa_offset 4
>>> +	.save {r5}
>>> +	.cfi_adjust_cfa_offset 4
>>> +#if __HAVE_PAC_LEAF
>>> +	.cfi_offset 5, -8 /* Account for ip register already on stack.  */
>>> +#else
>>>    	.cfi_offset 5, -4
>>> +#endif /* __HAVE_PAC_LEAF */
>>> +
>>>      	ldr	data1, [src1], #4
>>>    	and	tmp2, src2, #3
>>> @@ -353,10 +363,17 @@ def_fn strcmp
>>>    .Lstrcmp_done_equal:
>>>    	mov	result, #0
>>>    	.cfi_remember_state
>>> +#if __HAVE_PAC_LEAF
>>> +	pop {r5, ip}
>>> +	.cfi_restore 5
>>> +	.cfi_restore 143
>>> +	aut ip, lr, sp
>>> +#else
>>>    	ldmfd	sp!, {r5}
>>>    	.cfi_restore 5
>>> +#endif /* __HAVE_PAC_LEAF */
>>
>> I think we could define a macro for this as well, something like
>>          pop_reg_and_pac r5
> 
> This sounds good, but a survey through relevant function epilogues
> reveals a series of different `pop' scenarios. On the one hand, we have
> pop {r5, ip}, as we have here but on another we also have, for example,
> pop {r4,r5,r6,r7,ip}.
> 
> Do we wish to provide a short-hand notation for one pop operation but
> not the other via macros? My initial understanding is that assembler
> macros don't provide us with the necessary flexibility to handle all
> relevant cases.
> 
> I think the use of a macro here would greatly enhance code readability
> and by extension maintainability, but at first sight it feels inelegant
> to provide a solution applicable only to a single case of a more general
> problem.
> 
> Any wisdom here would be appreciated.

Yes, this is all quite tricky.  Fortunately, the vast majority of time 
registers saved on the stack frame are consecutively numbered.  So I 
think it's possible to write some macros to handle this.  The example 
below is not complete (I'll leave you to flesh out all the different 
cases :), but I think it shows that we can achieve what's needed.

@ Emit .cfi_offset directives for a consecutive sequence of registers
	.macro cfisavelist first, last, count
	.cfi_offset \last, -4*(\count)
	.if \last-\first
	cfisavelist \first, \last-1, \count+1
	.endif
	.endm

@ Create a prologue entry sequence handling PAC/BTI, if required and 
emitting
@ CFI directives for 	
	.macro prologue first=-1, last=-1, savepac=1
	.if \first != -1
	.if \last != -1
	.if \savepac
	push {r\first-r\last, ip}
	.cfi_offset 143, -4
	cfisavelist \first, \last, 2
	.else
	push {r\first-r\last}
	cfisavelist \first, \last, 1
	.endif
	.else
	push {r\first}
	cfisavelist \first, \first, 1
	.endif
	.else
	...
	.endif
	.endm

	.text
	.global foo
	.type foo, %function
foo:
	.cfi_startproc
	prologue 4, 7, savepac=0
	.cfi_endproc

	.text
	.global bar
	.type bar, %function
bar:
	.cfi_startproc
	prologue 4, 7
	.cfi_endproc

	.text
	.global wom
	.type wom, %function
wom:
	.cfi_startproc
	prologue 4, savepac=0
	.cfi_endproc
	
	.text
	.global bat
	.type bat, %function
bat:
	.cfi_startproc
	prologue savepac=0
	.cfi_endproc

R.
> 
> V.
> 
>> While you're doing that, please change the single-register ldmfd into 'pop'.
>>
>>>    	.cfi_def_cfa_offset 0
>>> -	bx	lr
>>> +	bx lr
>>
>> stray change to the BX instruction.
>>
>>>      .Lstrcmp_tail:
>>>    	.cfi_restore_state
>>> @@ -370,9 +387,18 @@ def_fn strcmp
>>>    	S2LOEQ	data2, data2, #8
>>>    	beq	.Lstrcmp_tail
>>>    	sub	result, r2, result
>>> +#if __HAVE_PAC_LEAF
>>> +	pop {r5, ip}
>>> +	.cfi_restore 5
>>> +	.cfi_restore 143
>>> +	.cfi_def_cfa_offset 0
>>> +	aut ip, lr, sp
>>> +#else
>>>    	ldmfd	sp!, {r5}
>>>    	.cfi_restore 5
>>>    	.cfi_def_cfa_offset 0
>>> -	bx	lr
>>> +#endif /* __HAVE_PAC_LEAF */
>>
>> Same as above
>>
>>> +	bx lr
>>
>> tab between mnemonic and operands.
>>
>>>    	.cfi_endproc
>>> +	.fnend
>>>    	.size strcmp, . - strcmp
>>
>> R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement
  2022-08-05 14:34       ` Richard Earnshaw
@ 2022-08-05 14:58         ` Victor L. Do Nascimento
  2022-08-05 15:15           ` Richard Earnshaw
  0 siblings, 1 reply; 18+ messages in thread
From: Victor L. Do Nascimento @ 2022-08-05 14:58 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: newlib, richard.earnshaw

Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:

> On 05/08/2022 11:51, Victor L. Do Nascimento wrote:
>> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
>> 
>>> On 03/08/2022 16:35, Victor Do Nascimento wrote:
>>>> Add function prologue/epilogue to conditionally add BTI landing pads
>>>> and/or PAC code generation & authentication instructions depending on
>>>> compilation flags.
>>>> This patch enables PACBTI for all relevant variants of strcmp:
>>>>        * Newlib for armv8.1-m.main+pacbti
>>>>        * Newlib for armv8.1-m.main+pacbti+mve
>>>>        * Newlib-nano
>>>> ---
>>>>    newlib/libc/machine/arm/strcmp-arm-tiny.S |  7 +++-
>>>>    newlib/libc/machine/arm/strcmp-armv7.S    | 45 +++++++++++++++++------
>>>>    newlib/libc/machine/arm/strcmp-armv7m.S   | 36 +++++++++++++++---
>>>>    3 files changed, 71 insertions(+), 17 deletions(-)
>>>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>> index 607a41daf..8085eb4df 100644
>>>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>> @@ -29,10 +29,14 @@
>>>>    /* Tiny version of strcmp in ARM state.  Used only when optimizing
>>>>       for size.  Also supports Thumb-2.  */
>>>>    +#include "pacbti.h"
>>>> +
>>>>    	.syntax unified
>>>>    def_fn strcmp
>>>> +	.fnstart
>>>>    	.cfi_sections .debug_frame
>>>>    	.cfi_startproc
>>>> +	pacbti_prologue
>>>
>>> See comment on patch1 about the names of these macros.
>>>
>>>>    1:
>>>>    	ldrb	r2, [r0], #1
>>>>    	ldrb	r3, [r1], #1
>>>> @@ -42,6 +46,7 @@ def_fn strcmp
>>>>    	beq	1b
>>>>    2:
>>>>    	subs	r0, r2, r3
>>>> -	bx	lr
>>>> +	pacbti_epilogue
>>>>    	.cfi_endproc
>>>> +	.fnend
>>>>    	.size	strcmp, . - strcmp
>>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S
>>>> index 2f93bfb73..584c89f7e 100644
>>>> --- a/newlib/libc/machine/arm/strcmp-armv7.S
>>>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S
>>>> @@ -45,6 +45,8 @@
>>>>    	.thumb
>>>>    	.syntax unified
>>>>    +#include "pacbti.h"
>>>> +
>>>>    /* Parameters and result.  */
>>>>    #define src1		r0
>>>>    #define src2		r1
>>>> @@ -91,8 +93,9 @@
>>>>    	ldrd	r4, r5, [sp], #16
>>>>    	.cfi_restore 4
>>>>    	.cfi_restore 5
>>>> +	.cfi_adjust_cfa_offset -16
>>>
>>> This looks like it's an omission from the original file and thus
>>> should be separated out as a separate patch.
>>>
>>>>    	sub	result, result, r1, lsr #24
>>>> -	bx	lr
>>>> +	pacbti_epilogue
>>>>    #else
>>>>    	/* To use the big-endian trick we'd have to reverse all three words.
>>>>    	   that's slower than this approach.  */
>>>> @@ -112,22 +115,28 @@
>>>>    	ldrd	r4, r5, [sp], #16
>>>>    	.cfi_restore 4
>>>>    	.cfi_restore 5
>>>> +	.cfi_adjust_cfa_offset -16
>>>>    	sub	result, result, r1
>>>>    -	bx	lr
>>>> +	pacbti_epilogue
>>>>    #endif
>>>>    	.endm
>>>>    +
>>>>    	.text
>>>>    	.p2align	5
>>>> +	.fnstart
>>>> +	.cfi_sections .debug_frame
>>>> +	.cfi_startproc
>>>>    .Lstrcmp_start_addr:
>>>>    #ifndef STRCMP_NO_PRECHECK
>>>>    .Lfastpath_exit:
>>>>    	sub	r0, r2, r3
>>>> -	bx	lr
>>>> +	pacbti_epilogue
>>>>    	nop
>>>>    #endif
>>>>    def_fn	strcmp
>>>> +	pacbti_prologue
>>>>    #ifndef STRCMP_NO_PRECHECK
>>>>    	ldrb	r2, [src1]
>>>>    	ldrb	r3, [src2]
>>>> @@ -136,16 +145,26 @@ def_fn	strcmp
>>>>    	cmpcs	r2, r3
>>>>    	bne	.Lfastpath_exit
>>>>    #endif
>>>> -	.cfi_sections .debug_frame
>>>> -	.cfi_startproc
>>>>    	strd	r4, r5, [sp, #-16]!
>>>> -	.cfi_def_cfa_offset 16
>>>> +	.save {r4, r5}
>>>
>>> Hmm, I've just remembered that leaf functions cannot throw exceptions
>>> (the EHABI only supports synchronous exceptions), so I don't think we
>>> need .save directives at all.  Instead, such functions should be
>>> marked with ".cantunwind".  You still want the dwarf unwind
>>> information as that's for the debugger, it's just the '.save'
>>> directives that aren't needed.
>>>
>>>> +	.cfi_adjust_cfa_offset 16
>>>> +#if __HAVE_PAC_LEAF
>>>> +	.cfi_offset 4, -20
>>>> +	.cfi_offset 5, -16
>>>> +#else
>>>>    	.cfi_offset 4, -16
>>>>    	.cfi_offset 5, -12
>>>> +#endif /* __HAVE_PAC_LEAF */
>>>
>>> Ugh!  This is not the right way to do this.  Instead (see comments on
>>> patch 1) use a macro that defines the additional adjustment to use in
>>> various contexts, something like
>>>
>>> 	.cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ)
>>>
>>>>    	orr	tmp1, src1, src2
>>>>    	strd	r6, r7, [sp, #8]
>>>> +	.save {r6, r7}
>>>> +#if __HAVE_PAC_LEAF
>>>> +	.cfi_offset 6, -12
>>>> +	.cfi_offset 7, -8
>>>> +#else
>>>>    	.cfi_offset 6, -8
>>>>    	.cfi_offset 7, -4
>>>> +#endif /* __HAVE_PAC_LEAF */
>>>>    	mvn	const_m1, #0
>>>>    	lsl	r2, tmp1, #29
>>>>    	cbz	r2, .Lloop_aligned8
>>>> @@ -270,7 +289,6 @@ def_fn	strcmp
>>>>    	ldr	data1, [src1], #4
>>>>    	beq	.Laligned_m2
>>>>    	bcs	.Laligned_m1
>>>> -
>>>>    #ifdef STRCMP_NO_PRECHECK
>>>>    	ldrb	data2, [src2, #1]
>>>>    	uxtb	tmp1, data1, ror #BYTE1_OFFSET
>>>> @@ -314,7 +332,8 @@ def_fn	strcmp
>>>>    	mov	result, tmp1
>>>>    	ldr	r4, [sp], #16
>>>>    	.cfi_restore 4
>>>> -	bx	lr
>>>> +	.cfi_adjust_cfa_offset -16
>>>> +	pacbti_epilogue
>>>>      #ifndef STRCMP_NO_PRECHECK
>>>>    .Laligned_m1:
>>>> @@ -364,8 +383,9 @@ def_fn	strcmp
>>>>    	/* R6/7 Not used in this sequence.  */
>>>>    	.cfi_restore 6
>>>>    	.cfi_restore 7
>>>> +	.cfi_adjust_cfa_offset -16
>>>>    	neg	result, result
>>>> -	bx	lr
>>>> +	pacbti_epilogue
>>>>      6:
>>>>    	.cfi_restore_state
>>>> @@ -441,7 +461,8 @@ def_fn	strcmp
>>>>    	/* R6/7 not used in this sequence.  */
>>>>    	.cfi_restore 6
>>>>    	.cfi_restore 7
>>>> -	bx	lr
>>>> +	.cfi_adjust_cfa_offset -16
>>>> +	pacbti_epilogue
>>>>      .Lstrcmp_tail:
>>>>    	.cfi_restore_state
>>>> @@ -463,7 +484,9 @@ def_fn	strcmp
>>>>    	/* R6/7 not used in this sequence.  */
>>>>    	.cfi_restore 6
>>>>    	.cfi_restore 7
>>>> +	.cfi_adjust_cfa_offset -16
>>>>    	sub	result, result, data2, lsr #24
>>>> -	bx	lr
>>>> +	pacbti_epilogue
>>>>    	.cfi_endproc
>>>> +	.fnend
>>>>    	.size strcmp, . - .Lstrcmp_start_addr
>>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S
>>>> index cdb4912df..fe1519f4d 100644
>>>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S
>>>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S
>>>> @@ -29,6 +29,8 @@
>>>>    /* Very similar to the generic code, but uses Thumb2 as implemented
>>>>       in ARMv7-M.  */
>>>>    +#include "pacbti.h"
>>>> +
>>>>    /* Parameters and result.  */
>>>>    #define src1		r0
>>>>    #define src2		r1
>>>> @@ -44,8 +46,10 @@
>>>>    	.thumb
>>>>    	.syntax unified
>>>>    def_fn strcmp
>>>> +	.fnstart
>>>>    	.cfi_sections .debug_frame
>>>>    	.cfi_startproc
>>>> +	pacbti_prologue
>>>>    	eor	tmp1, src1, src2
>>>>    	tst	tmp1, #3
>>>>    	/* Strings not at same byte offset from a word boundary.  */
>>>> @@ -106,7 +110,7 @@ def_fn strcmp
>>>>    	lsrs	result, result, #24
>>>>    	subs	result, result, data2
>>>>    #endif
>>>> -	bx	lr
>>>> +	pacbti_epilogue
>>>>        #if 0
>>>> @@ -214,12 +218,18 @@ def_fn strcmp
>>>>    	cmpcs	data1, data2
>>>>    	beq	.Lstrcmp_unaligned
>>>>    	sub	result, data1, data2
>>>> -	bx	lr
>>>> +	pacbti_epilogue
>>>>      2:
>>>>    	stmfd	sp!, {r5}
>>>> -	.cfi_def_cfa_offset 4
>>>> +	.save {r5}
>>>> +	.cfi_adjust_cfa_offset 4
>>>> +#if __HAVE_PAC_LEAF
>>>> +	.cfi_offset 5, -8 /* Account for ip register already on stack.  */
>>>> +#else
>>>>    	.cfi_offset 5, -4
>>>> +#endif /* __HAVE_PAC_LEAF */
>>>> +
>>>>      	ldr	data1, [src1], #4
>>>>    	and	tmp2, src2, #3
>>>> @@ -353,10 +363,17 @@ def_fn strcmp
>>>>    .Lstrcmp_done_equal:
>>>>    	mov	result, #0
>>>>    	.cfi_remember_state
>>>> +#if __HAVE_PAC_LEAF
>>>> +	pop {r5, ip}
>>>> +	.cfi_restore 5
>>>> +	.cfi_restore 143
>>>> +	aut ip, lr, sp
>>>> +#else
>>>>    	ldmfd	sp!, {r5}
>>>>    	.cfi_restore 5
>>>> +#endif /* __HAVE_PAC_LEAF */
>>>
>>> I think we could define a macro for this as well, something like
>>>          pop_reg_and_pac r5
>> This sounds good, but a survey through relevant function epilogues
>> reveals a series of different `pop' scenarios. On the one hand, we have
>> pop {r5, ip}, as we have here but on another we also have, for example,
>> pop {r4,r5,r6,r7,ip}.
>> Do we wish to provide a short-hand notation for one pop operation
>> but
>> not the other via macros? My initial understanding is that assembler
>> macros don't provide us with the necessary flexibility to handle all
>> relevant cases.
>> I think the use of a macro here would greatly enhance code
>> readability
>> and by extension maintainability, but at first sight it feels inelegant
>> to provide a solution applicable only to a single case of a more general
>> problem.
>> Any wisdom here would be appreciated.
>
> Yes, this is all quite tricky.  Fortunately, the vast majority of time
> registers saved on the stack frame are consecutively numbered.  So I
> think it's possible to write some macros to handle this.  The example
> below is not complete (I'll leave you to flesh out all the different
> cases :), but I think it shows that we can achieve what's needed.
>
> @ Emit .cfi_offset directives for a consecutive sequence of registers
> 	.macro cfisavelist first, last, count
> 	.cfi_offset \last, -4*(\count)
> 	.if \last-\first
> 	cfisavelist \first, \last-1, \count+1
> 	.endif
> 	.endm
>
> @ Create a prologue entry sequence handling PAC/BTI, if required and
> emitting
> @ CFI directives for 	
> 	.macro prologue first=-1, last=-1, savepac=1
> 	.if \first != -1
> 	.if \last != -1
> 	.if \savepac
> 	push {r\first-r\last, ip}
> 	.cfi_offset 143, -4
> 	cfisavelist \first, \last, 2
> 	.else
> 	push {r\first-r\last}
> 	cfisavelist \first, \last, 1
> 	.endif
> 	.else
> 	push {r\first}
> 	cfisavelist \first, \first, 1
> 	.endif
> 	.else
> 	...
> 	.endif
> 	.endm
>
> 	.text
> 	.global foo
> 	.type foo, %function
> foo:
> 	.cfi_startproc
> 	prologue 4, 7, savepac=0
> 	.cfi_endproc
>
> 	.text
> 	.global bar
> 	.type bar, %function
> bar:
> 	.cfi_startproc
> 	prologue 4, 7
> 	.cfi_endproc
>
> 	.text
> 	.global wom
> 	.type wom, %function
> wom:
> 	.cfi_startproc
> 	prologue 4, savepac=0
> 	.cfi_endproc
> 	
> 	.text
> 	.global bat
> 	.type bat, %function
> bat:
> 	.cfi_startproc
> 	prologue savepac=0
> 	.cfi_endproc
>
> R.

Thanks for these, they'll come in extremely handy. Regarding the
pop_reg_and_pac macro you suggested, I'd concocted a solution which at
first sight seemed to make minimal assumptions about what we're asking
it  to pop and, looking at initial example disassemblies, seems to
behave well.

I'll share it with you here in case you come across anything warranting
feedback.

Furthermore, I will see whether it will be of any use in building upon
your example.

	.macro cfi_restore_list first:req, rest:vararg
	.cfi_restore \first
	.ifnb \rest
	cfi_restore_list \rest
	.endif
	.endm

	.macro pop_reg_and_pac first:req, rest:vararg
	.ifnb \rest
	pop {\first, \rest, ip}
	.else
	pop {\first, ip}
	.endif
	cfi_restore_list \first, \rest
	.cfi_restore 143
	aut ip, lr, sp
	.endm

As always, thanks for your input.
V.

>> V.
>> 
>>> While you're doing that, please change the single-register ldmfd into 'pop'.
>>>
>>>>    	.cfi_def_cfa_offset 0
>>>> -	bx	lr
>>>> +	bx lr
>>>
>>> stray change to the BX instruction.
>>>
>>>>      .Lstrcmp_tail:
>>>>    	.cfi_restore_state
>>>> @@ -370,9 +387,18 @@ def_fn strcmp
>>>>    	S2LOEQ	data2, data2, #8
>>>>    	beq	.Lstrcmp_tail
>>>>    	sub	result, r2, result
>>>> +#if __HAVE_PAC_LEAF
>>>> +	pop {r5, ip}
>>>> +	.cfi_restore 5
>>>> +	.cfi_restore 143
>>>> +	.cfi_def_cfa_offset 0
>>>> +	aut ip, lr, sp
>>>> +#else
>>>>    	ldmfd	sp!, {r5}
>>>>    	.cfi_restore 5
>>>>    	.cfi_def_cfa_offset 0
>>>> -	bx	lr
>>>> +#endif /* __HAVE_PAC_LEAF */
>>>
>>> Same as above
>>>
>>>> +	bx lr
>>>
>>> tab between mnemonic and operands.
>>>
>>>>    	.cfi_endproc
>>>> +	.fnend
>>>>    	.size strcmp, . - strcmp
>>>
>>> R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement
  2022-08-05 14:58         ` Victor L. Do Nascimento
@ 2022-08-05 15:15           ` Richard Earnshaw
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Earnshaw @ 2022-08-05 15:15 UTC (permalink / raw)
  To: Victor L. Do Nascimento; +Cc: newlib, richard.earnshaw



On 05/08/2022 15:58, Victor L. Do Nascimento wrote:
> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> 
>> On 05/08/2022 11:51, Victor L. Do Nascimento wrote:
>>> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
>>>
>>>> On 03/08/2022 16:35, Victor Do Nascimento wrote:
>>>>> Add function prologue/epilogue to conditionally add BTI landing pads
>>>>> and/or PAC code generation & authentication instructions depending on
>>>>> compilation flags.
>>>>> This patch enables PACBTI for all relevant variants of strcmp:
>>>>>         * Newlib for armv8.1-m.main+pacbti
>>>>>         * Newlib for armv8.1-m.main+pacbti+mve
>>>>>         * Newlib-nano
>>>>> ---
>>>>>     newlib/libc/machine/arm/strcmp-arm-tiny.S |  7 +++-
>>>>>     newlib/libc/machine/arm/strcmp-armv7.S    | 45 +++++++++++++++++------
>>>>>     newlib/libc/machine/arm/strcmp-armv7m.S   | 36 +++++++++++++++---
>>>>>     3 files changed, 71 insertions(+), 17 deletions(-)
>>>>> diff --git a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>>> b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>>> index 607a41daf..8085eb4df 100644
>>>>> --- a/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>>> +++ b/newlib/libc/machine/arm/strcmp-arm-tiny.S
>>>>> @@ -29,10 +29,14 @@
>>>>>     /* Tiny version of strcmp in ARM state.  Used only when optimizing
>>>>>        for size.  Also supports Thumb-2.  */
>>>>>     +#include "pacbti.h"
>>>>> +
>>>>>     	.syntax unified
>>>>>     def_fn strcmp
>>>>> +	.fnstart
>>>>>     	.cfi_sections .debug_frame
>>>>>     	.cfi_startproc
>>>>> +	pacbti_prologue
>>>>
>>>> See comment on patch1 about the names of these macros.
>>>>
>>>>>     1:
>>>>>     	ldrb	r2, [r0], #1
>>>>>     	ldrb	r3, [r1], #1
>>>>> @@ -42,6 +46,7 @@ def_fn strcmp
>>>>>     	beq	1b
>>>>>     2:
>>>>>     	subs	r0, r2, r3
>>>>> -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>     	.cfi_endproc
>>>>> +	.fnend
>>>>>     	.size	strcmp, . - strcmp
>>>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7.S b/newlib/libc/machine/arm/strcmp-armv7.S
>>>>> index 2f93bfb73..584c89f7e 100644
>>>>> --- a/newlib/libc/machine/arm/strcmp-armv7.S
>>>>> +++ b/newlib/libc/machine/arm/strcmp-armv7.S
>>>>> @@ -45,6 +45,8 @@
>>>>>     	.thumb
>>>>>     	.syntax unified
>>>>>     +#include "pacbti.h"
>>>>> +
>>>>>     /* Parameters and result.  */
>>>>>     #define src1		r0
>>>>>     #define src2		r1
>>>>> @@ -91,8 +93,9 @@
>>>>>     	ldrd	r4, r5, [sp], #16
>>>>>     	.cfi_restore 4
>>>>>     	.cfi_restore 5
>>>>> +	.cfi_adjust_cfa_offset -16
>>>>
>>>> This looks like it's an omission from the original file and thus
>>>> should be separated out as a separate patch.
>>>>
>>>>>     	sub	result, result, r1, lsr #24
>>>>> -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>     #else
>>>>>     	/* To use the big-endian trick we'd have to reverse all three words.
>>>>>     	   that's slower than this approach.  */
>>>>> @@ -112,22 +115,28 @@
>>>>>     	ldrd	r4, r5, [sp], #16
>>>>>     	.cfi_restore 4
>>>>>     	.cfi_restore 5
>>>>> +	.cfi_adjust_cfa_offset -16
>>>>>     	sub	result, result, r1
>>>>>     -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>     #endif
>>>>>     	.endm
>>>>>     +
>>>>>     	.text
>>>>>     	.p2align	5
>>>>> +	.fnstart
>>>>> +	.cfi_sections .debug_frame
>>>>> +	.cfi_startproc
>>>>>     .Lstrcmp_start_addr:
>>>>>     #ifndef STRCMP_NO_PRECHECK
>>>>>     .Lfastpath_exit:
>>>>>     	sub	r0, r2, r3
>>>>> -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>     	nop
>>>>>     #endif
>>>>>     def_fn	strcmp
>>>>> +	pacbti_prologue
>>>>>     #ifndef STRCMP_NO_PRECHECK
>>>>>     	ldrb	r2, [src1]
>>>>>     	ldrb	r3, [src2]
>>>>> @@ -136,16 +145,26 @@ def_fn	strcmp
>>>>>     	cmpcs	r2, r3
>>>>>     	bne	.Lfastpath_exit
>>>>>     #endif
>>>>> -	.cfi_sections .debug_frame
>>>>> -	.cfi_startproc
>>>>>     	strd	r4, r5, [sp, #-16]!
>>>>> -	.cfi_def_cfa_offset 16
>>>>> +	.save {r4, r5}
>>>>
>>>> Hmm, I've just remembered that leaf functions cannot throw exceptions
>>>> (the EHABI only supports synchronous exceptions), so I don't think we
>>>> need .save directives at all.  Instead, such functions should be
>>>> marked with ".cantunwind".  You still want the dwarf unwind
>>>> information as that's for the debugger, it's just the '.save'
>>>> directives that aren't needed.
>>>>
>>>>> +	.cfi_adjust_cfa_offset 16
>>>>> +#if __HAVE_PAC_LEAF
>>>>> +	.cfi_offset 4, -20
>>>>> +	.cfi_offset 5, -16
>>>>> +#else
>>>>>     	.cfi_offset 4, -16
>>>>>     	.cfi_offset 5, -12
>>>>> +#endif /* __HAVE_PAC_LEAF */
>>>>
>>>> Ugh!  This is not the right way to do this.  Instead (see comments on
>>>> patch 1) use a macro that defines the additional adjustment to use in
>>>> various contexts, something like
>>>>
>>>> 	.cfi_offset 4, -(16+PAC_UNALIGNED_CFI_ADJ)
>>>>
>>>>>     	orr	tmp1, src1, src2
>>>>>     	strd	r6, r7, [sp, #8]
>>>>> +	.save {r6, r7}
>>>>> +#if __HAVE_PAC_LEAF
>>>>> +	.cfi_offset 6, -12
>>>>> +	.cfi_offset 7, -8
>>>>> +#else
>>>>>     	.cfi_offset 6, -8
>>>>>     	.cfi_offset 7, -4
>>>>> +#endif /* __HAVE_PAC_LEAF */
>>>>>     	mvn	const_m1, #0
>>>>>     	lsl	r2, tmp1, #29
>>>>>     	cbz	r2, .Lloop_aligned8
>>>>> @@ -270,7 +289,6 @@ def_fn	strcmp
>>>>>     	ldr	data1, [src1], #4
>>>>>     	beq	.Laligned_m2
>>>>>     	bcs	.Laligned_m1
>>>>> -
>>>>>     #ifdef STRCMP_NO_PRECHECK
>>>>>     	ldrb	data2, [src2, #1]
>>>>>     	uxtb	tmp1, data1, ror #BYTE1_OFFSET
>>>>> @@ -314,7 +332,8 @@ def_fn	strcmp
>>>>>     	mov	result, tmp1
>>>>>     	ldr	r4, [sp], #16
>>>>>     	.cfi_restore 4
>>>>> -	bx	lr
>>>>> +	.cfi_adjust_cfa_offset -16
>>>>> +	pacbti_epilogue
>>>>>       #ifndef STRCMP_NO_PRECHECK
>>>>>     .Laligned_m1:
>>>>> @@ -364,8 +383,9 @@ def_fn	strcmp
>>>>>     	/* R6/7 Not used in this sequence.  */
>>>>>     	.cfi_restore 6
>>>>>     	.cfi_restore 7
>>>>> +	.cfi_adjust_cfa_offset -16
>>>>>     	neg	result, result
>>>>> -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>       6:
>>>>>     	.cfi_restore_state
>>>>> @@ -441,7 +461,8 @@ def_fn	strcmp
>>>>>     	/* R6/7 not used in this sequence.  */
>>>>>     	.cfi_restore 6
>>>>>     	.cfi_restore 7
>>>>> -	bx	lr
>>>>> +	.cfi_adjust_cfa_offset -16
>>>>> +	pacbti_epilogue
>>>>>       .Lstrcmp_tail:
>>>>>     	.cfi_restore_state
>>>>> @@ -463,7 +484,9 @@ def_fn	strcmp
>>>>>     	/* R6/7 not used in this sequence.  */
>>>>>     	.cfi_restore 6
>>>>>     	.cfi_restore 7
>>>>> +	.cfi_adjust_cfa_offset -16
>>>>>     	sub	result, result, data2, lsr #24
>>>>> -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>     	.cfi_endproc
>>>>> +	.fnend
>>>>>     	.size strcmp, . - .Lstrcmp_start_addr
>>>>> diff --git a/newlib/libc/machine/arm/strcmp-armv7m.S b/newlib/libc/machine/arm/strcmp-armv7m.S
>>>>> index cdb4912df..fe1519f4d 100644
>>>>> --- a/newlib/libc/machine/arm/strcmp-armv7m.S
>>>>> +++ b/newlib/libc/machine/arm/strcmp-armv7m.S
>>>>> @@ -29,6 +29,8 @@
>>>>>     /* Very similar to the generic code, but uses Thumb2 as implemented
>>>>>        in ARMv7-M.  */
>>>>>     +#include "pacbti.h"
>>>>> +
>>>>>     /* Parameters and result.  */
>>>>>     #define src1		r0
>>>>>     #define src2		r1
>>>>> @@ -44,8 +46,10 @@
>>>>>     	.thumb
>>>>>     	.syntax unified
>>>>>     def_fn strcmp
>>>>> +	.fnstart
>>>>>     	.cfi_sections .debug_frame
>>>>>     	.cfi_startproc
>>>>> +	pacbti_prologue
>>>>>     	eor	tmp1, src1, src2
>>>>>     	tst	tmp1, #3
>>>>>     	/* Strings not at same byte offset from a word boundary.  */
>>>>> @@ -106,7 +110,7 @@ def_fn strcmp
>>>>>     	lsrs	result, result, #24
>>>>>     	subs	result, result, data2
>>>>>     #endif
>>>>> -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>         #if 0
>>>>> @@ -214,12 +218,18 @@ def_fn strcmp
>>>>>     	cmpcs	data1, data2
>>>>>     	beq	.Lstrcmp_unaligned
>>>>>     	sub	result, data1, data2
>>>>> -	bx	lr
>>>>> +	pacbti_epilogue
>>>>>       2:
>>>>>     	stmfd	sp!, {r5}
>>>>> -	.cfi_def_cfa_offset 4
>>>>> +	.save {r5}
>>>>> +	.cfi_adjust_cfa_offset 4
>>>>> +#if __HAVE_PAC_LEAF
>>>>> +	.cfi_offset 5, -8 /* Account for ip register already on stack.  */
>>>>> +#else
>>>>>     	.cfi_offset 5, -4
>>>>> +#endif /* __HAVE_PAC_LEAF */
>>>>> +
>>>>>       	ldr	data1, [src1], #4
>>>>>     	and	tmp2, src2, #3
>>>>> @@ -353,10 +363,17 @@ def_fn strcmp
>>>>>     .Lstrcmp_done_equal:
>>>>>     	mov	result, #0
>>>>>     	.cfi_remember_state
>>>>> +#if __HAVE_PAC_LEAF
>>>>> +	pop {r5, ip}
>>>>> +	.cfi_restore 5
>>>>> +	.cfi_restore 143
>>>>> +	aut ip, lr, sp
>>>>> +#else
>>>>>     	ldmfd	sp!, {r5}
>>>>>     	.cfi_restore 5
>>>>> +#endif /* __HAVE_PAC_LEAF */
>>>>
>>>> I think we could define a macro for this as well, something like
>>>>           pop_reg_and_pac r5
>>> This sounds good, but a survey through relevant function epilogues
>>> reveals a series of different `pop' scenarios. On the one hand, we have
>>> pop {r5, ip}, as we have here but on another we also have, for example,
>>> pop {r4,r5,r6,r7,ip}.
>>> Do we wish to provide a short-hand notation for one pop operation
>>> but
>>> not the other via macros? My initial understanding is that assembler
>>> macros don't provide us with the necessary flexibility to handle all
>>> relevant cases.
>>> I think the use of a macro here would greatly enhance code
>>> readability
>>> and by extension maintainability, but at first sight it feels inelegant
>>> to provide a solution applicable only to a single case of a more general
>>> problem.
>>> Any wisdom here would be appreciated.
>>
>> Yes, this is all quite tricky.  Fortunately, the vast majority of time
>> registers saved on the stack frame are consecutively numbered.  So I
>> think it's possible to write some macros to handle this.  The example
>> below is not complete (I'll leave you to flesh out all the different
>> cases :), but I think it shows that we can achieve what's needed.
>>
>> @ Emit .cfi_offset directives for a consecutive sequence of registers
>> 	.macro cfisavelist first, last, count
>> 	.cfi_offset \last, -4*(\count)
>> 	.if \last-\first
>> 	cfisavelist \first, \last-1, \count+1
>> 	.endif
>> 	.endm
>>
>> @ Create a prologue entry sequence handling PAC/BTI, if required and
>> emitting
>> @ CFI directives for 	
>> 	.macro prologue first=-1, last=-1, savepac=1
>> 	.if \first != -1
>> 	.if \last != -1
>> 	.if \savepac
>> 	push {r\first-r\last, ip}
>> 	.cfi_offset 143, -4
>> 	cfisavelist \first, \last, 2
>> 	.else
>> 	push {r\first-r\last}
>> 	cfisavelist \first, \last, 1
>> 	.endif
>> 	.else
>> 	push {r\first}
>> 	cfisavelist \first, \first, 1
>> 	.endif
>> 	.else
>> 	...
>> 	.endif
>> 	.endm
>>
>> 	.text
>> 	.global foo
>> 	.type foo, %function
>> foo:
>> 	.cfi_startproc
>> 	prologue 4, 7, savepac=0
>> 	.cfi_endproc
>>
>> 	.text
>> 	.global bar
>> 	.type bar, %function
>> bar:
>> 	.cfi_startproc
>> 	prologue 4, 7
>> 	.cfi_endproc
>>
>> 	.text
>> 	.global wom
>> 	.type wom, %function
>> wom:
>> 	.cfi_startproc
>> 	prologue 4, savepac=0
>> 	.cfi_endproc
>> 	
>> 	.text
>> 	.global bat
>> 	.type bat, %function
>> bat:
>> 	.cfi_startproc
>> 	prologue savepac=0
>> 	.cfi_endproc
>>
>> R.
> 
> Thanks for these, they'll come in extremely handy. Regarding the
> pop_reg_and_pac macro you suggested, I'd concocted a solution which at
> first sight seemed to make minimal assumptions about what we're asking
> it  to pop and, looking at initial example disassemblies, seems to
> behave well.
> 
> I'll share it with you here in case you come across anything warranting
> feedback.
> 
> Furthermore, I will see whether it will be of any use in building upon
> your example.
> 
> 	.macro cfi_restore_list first:req, rest:vararg
> 	.cfi_restore \first
> 	.ifnb \rest
> 	cfi_restore_list \rest
> 	.endif
> 	.endm
> 
> 	.macro pop_reg_and_pac first:req, rest:vararg
> 	.ifnb \rest
> 	pop {\first, \rest, ip}
> 	.else
> 	pop {\first, ip}
> 	.endif
> 	cfi_restore_list \first, \rest
> 	.cfi_restore 143
> 	aut ip, lr, sp
> 	.endm
> 

Yes, I tried something similar to start with, but couldn't make it work 
for the depth-first recursion needed for the cfi save directives 
(because you need to count the list of registers to save in order to 
work out the offsets of each one).  The second problem is that, AFAICT, 
macros can't be used as values to parameters (they simply produce text), 
so you can't map names onto values (and I was trying to avoid register 
names in .cfi statements).

R.


> As always, thanks for your input.
> V.
> 
>>> V.
>>>
>>>> While you're doing that, please change the single-register ldmfd into 'pop'.
>>>>
>>>>>     	.cfi_def_cfa_offset 0
>>>>> -	bx	lr
>>>>> +	bx lr
>>>>
>>>> stray change to the BX instruction.
>>>>
>>>>>       .Lstrcmp_tail:
>>>>>     	.cfi_restore_state
>>>>> @@ -370,9 +387,18 @@ def_fn strcmp
>>>>>     	S2LOEQ	data2, data2, #8
>>>>>     	beq	.Lstrcmp_tail
>>>>>     	sub	result, r2, result
>>>>> +#if __HAVE_PAC_LEAF
>>>>> +	pop {r5, ip}
>>>>> +	.cfi_restore 5
>>>>> +	.cfi_restore 143
>>>>> +	.cfi_def_cfa_offset 0
>>>>> +	aut ip, lr, sp
>>>>> +#else
>>>>>     	ldmfd	sp!, {r5}
>>>>>     	.cfi_restore 5
>>>>>     	.cfi_def_cfa_offset 0
>>>>> -	bx	lr
>>>>> +#endif /* __HAVE_PAC_LEAF */
>>>>
>>>> Same as above
>>>>
>>>>> +	bx lr
>>>>
>>>> tab between mnemonic and operands.
>>>>
>>>>>     	.cfi_endproc
>>>>> +	.fnend
>>>>>     	.size strcmp, . - strcmp
>>>>
>>>> R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros
  2022-08-04 15:19   ` Richard Earnshaw
  2022-08-05  9:21     ` Victor L. Do Nascimento
@ 2022-08-16 16:11     ` Victor L. Do Nascimento
  2022-08-18 16:28       ` Victor L. Do Nascimento
  1 sibling, 1 reply; 18+ messages in thread
From: Victor L. Do Nascimento @ 2022-08-16 16:11 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: newlib, richard.earnshaw

Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:

> On 03/08/2022 16:35, Victor Do Nascimento wrote:
>> Create an assembly header file that conditionally defines fuction
>> prologues/epilogues depending on the compile-time mbranch-protection
>> argument values.
>> * newlib/libc/machine/arm/pacbti.h: New.
>> ---
>>   newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++
>>   1 file changed, 64 insertions(+)
>>   create mode 100644 newlib/libc/machine/arm/pacbti.h
>> diff --git a/newlib/libc/machine/arm/pacbti.h
>> b/newlib/libc/machine/arm/pacbti.h
>> new file mode 100644
>> index 000000000..4c31d00bc
>> --- /dev/null
>> +++ b/newlib/libc/machine/arm/pacbti.h
>> @@ -0,0 +1,64 @@
>> +/*
>> + * Copyright (c) 2022 Arm Ltd
>> + * All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + * 1. Redistributions of source code must retain the above copyright
>> + *    notice, this list of conditions and the following disclaimer.
>> + * 2. Redistributions in binary form must reproduce the above copyright
>> + *    notice, this list of conditions and the following disclaimer in the
>> + *    documentation and/or other materials provided with the distribution.
>> + * 3. The name of the company may not be used to endorse or promote
>> + *    products derived from this software without specific prior written
>> + *    permission.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
>> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
>> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
>> + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
>> + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
>> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
>> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
>> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
>> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +/* Checki whether leaf function PAC signing has been requested
>> +   in the -mbranch-protect compile-time option.  */
>> +#define LEAF_PROTECT_BIT 2
>> +#define __HAVE_PAC_LEAF \
>> +	__ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT)
>
> Either this header needs to avoid polluting the user namespace, or it
> doesn't.  If it does, then LEAF_PROTECT_BIT fails to do that.  If it
> doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid
> polluting the reserved namespace.  I suspect this header is private to
> newlib (won't be exported to users), so should not be prefixing names
> with __.
>
>> +
>> +/* Macro to handle function entry depending on branch-protection
>> +   schemes.  */
>> +	.macro pacbti_prologue
>> +#if __HAVE_PAC_LEAF
>> +#if __ARM_FEATURE_BTI_DEFAULT
>> +	pacbti ip, lr, sp
>> +#else
>> +	pac ip, lr, sp
>> +#endif /* __ARM_FEATURE_BTI_DEFAULT */
>> +	.cfi_register 143, 12
>> +	str ip, [sp, #-4]!
>
> This causes the stack to lose 8-byte alignment.  Whilst for leaf
> functions that's probably not a problem, the macro should have an
> option where the user can ask for alignment to be preserved.  But
> there should also be an option to not save IP on the stack at all (for
> when the user does not modify IP in the function body).

Work is ongoing on implementing the macro with all necessary features.

A better macro implementation for the prologue is given as follows (but
still lacks any alignment preservation logic):

/* Emit .cfi_offset directives for a consecutive sequence of registers.
   PAC_CFI_ADJ will be 4 bytes if IP reg is pushed. */
	.macro cfisavelist first, last, index=1
	.cfi_offset \last, -4*(\index) - PAC_CFI_ADJ
	.if \last-\first
	cfisavelist \first, \last-1, \index+1
	.endif
	.endm

/* Create a prologue entry sequence handling PAC/BTI, if required and emitting
   CFI directives for generated PAC code and any pushed registers.  */

#define NREG ((\last-\first)+1)
	
	.macro prologue first=-1, last=-1, savepac=PAC_LEAF_PUSH_IP
#if HAVE_PAC_LEAF
#if __ARM_FEATURE_BTI_DEFAULT
	pacbti	ip, lr, sp
#else
	pac	ip, lr, sp
#endif /* __ARM_FEATURE_BTI_DEFAULT */
	.cfi_register 143, 12
#else
#if __ARM_FEATURE_BTI_DEFAULT
	bti
#endif /* __ARM_FEATURE_BTI_DEFAULT */
#endif /* HAVE_PAC_LEAF */
	.if \first != -1
	.if \last != -1
	.if \savepac
	push {r\first-r\last, ip}
	.cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ
	.cfi_offset 143, -PAC_CFI_ADJ
	cfisavelist \first, \last
	.else
	push {r\first-r\last}
	.cfi_adjust_cfa_offset NREG*4
	cfisavelist \first, \last
	.endif
	.else
	.if \savepac
	push {r\first, ip}
	.cfi_adjust_cfa_offset 4 + PAC_CFI_ADJ
	.cfi_offset 143, -PAC_CFI_ADJ 
	cfisavelist \first, \first
	.else // !\savepac
	push {r\first}
	cfisavelist \first, \first
	.endif
	.endif
	.else // \first == -1
	.if \savepac
	push {ip}
	.cfi_adjust_cfa_offset PAC_CFI_ADJ
	.cfi_offset 143, -PAC_CFI_ADJ
	.endif
	.endif
	.endm

The question remains of how I can ammend the register list in the push
directive when we wish to push an extra dummy register to the stack for
alignment purposes.

A hypothetical solution (albeit a broken one, as we can't redefine macro
arguments in the body of the macro) for when pushing an even no. of
registers + the pac code is:

	.if \first != -1
	.if \last != -1
	.if \savepac

+	.if \align_requested && ( ( (NREG+1) % 2 ) != 0 && (\first != 0) )
+	\first = \first - 1
+       .endif
        
 	push {r\first-r\last, ip}
	.cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ

+       .if \align_requested && ( ((NREG+1) % 2 ) != 0 && (\first != 0) )
+       /* Having pushed the dummy, restore original register range.  */
+       \first = \first + 1
+       .endif

	.cfi_offset 143, -PAC_CFI_ADJ
 	cfisavelist \first, \last

        .else
        ...

The logic above seems correct to me, but I need to figure out a way of
adjusting the value of \first in the least invasive way.

If we emit the push instruction via a separate macro, we could then pass
the \first argument as \first-1 (e.g. pushmacro \first=\first-1,
last=last, savepac=1). Even so, creating a macro just to emit the push
instruction seems overkill. We may end up having way too many macros.

Do you have any insights on how I may be able to fix the above code to
allow for alignment preservation elegantly within the limitations of
what assembler macros can do?

Cheers,
V.

>> +	.save {ra_auth_code}
>> +	.cfi_def_cfa_offset 4
>> +	.cfi_offset 143, -4
>> +#elif __ARM_FEATURE_BTI_DEFAULT
>> +	bti
>> +#endif /* __HAVE_PAC_LEAF */
>> +	.endm
>> +
>> +/* Macro to handle different branch exchange cases depending on
>> +   branch-protection schemes.  */
>> +	.macro pacbti_epilogue
>> +#if __HAVE_PAC_LEAF
>> +	ldr ip, [sp], #4
>> +	.cfi_restore 143
>> +	.cfi_def_cfa_offset 0
>> +	aut ip, lr, sp
>> +#endif /* __HAVE_PAC_LEAF */
>> +	bx lr
>> +	.endm
>
> I think these macros are really misnamed, firstly, they're only for
> leaf functions and secondly, they (particularly the epilogue) does
> something even if PAC is not needed.  So I think they should be
> renamed as 'leaf_prologue' and 'leaf_epilogue' respectively.
>
> In consequence, I think this file should really be merged into the
> existing arm_asm.h, then we don't need yet another header file.
>
> Finally, the header needs to define a (C) macro that defines how much
> to adjust the CFI offset by for various scenarios:
> - When PAC is not used
> - When PAC is used with no alignment
> - When PAC is used and asked for 8-byte alignment to be preserved.
>
> R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros
  2022-08-16 16:11     ` Victor L. Do Nascimento
@ 2022-08-18 16:28       ` Victor L. Do Nascimento
  0 siblings, 0 replies; 18+ messages in thread
From: Victor L. Do Nascimento @ 2022-08-18 16:28 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: newlib, richard.earnshaw

"Victor L. Do Nascimento" <victor.donascimento@arm.com> writes:

> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
>
>> On 03/08/2022 16:35, Victor Do Nascimento wrote:
>>> Create an assembly header file that conditionally defines fuction
>>> prologues/epilogues depending on the compile-time mbranch-protection
>>> argument values.
>>> * newlib/libc/machine/arm/pacbti.h: New.
>>> ---
>>>   newlib/libc/machine/arm/pacbti.h | 64 ++++++++++++++++++++++++++++++++
>>>   1 file changed, 64 insertions(+)
>>>   create mode 100644 newlib/libc/machine/arm/pacbti.h
>>> diff --git a/newlib/libc/machine/arm/pacbti.h
>>> b/newlib/libc/machine/arm/pacbti.h
>>> new file mode 100644
>>> index 000000000..4c31d00bc
>>> --- /dev/null
>>> +++ b/newlib/libc/machine/arm/pacbti.h
>>> @@ -0,0 +1,64 @@
>>> +/*
>>> + * Copyright (c) 2022 Arm Ltd
>>> + * All rights reserved.
>>> + *
>>> + * Redistribution and use in source and binary forms, with or without
>>> + * modification, are permitted provided that the following conditions
>>> + * are met:
>>> + * 1. Redistributions of source code must retain the above copyright
>>> + *    notice, this list of conditions and the following disclaimer.
>>> + * 2. Redistributions in binary form must reproduce the above copyright
>>> + *    notice, this list of conditions and the following disclaimer in the
>>> + *    documentation and/or other materials provided with the distribution.
>>> + * 3. The name of the company may not be used to endorse or promote
>>> + *    products derived from this software without specific prior written
>>> + *    permission.
>>> + *
>>> + * THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
>>> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
>>> + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
>>> + * IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
>>> + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
>>> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
>>> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
>>> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
>>> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>> + */
>>> +
>>> +/* Checki whether leaf function PAC signing has been requested
>>> +   in the -mbranch-protect compile-time option.  */
>>> +#define LEAF_PROTECT_BIT 2
>>> +#define __HAVE_PAC_LEAF \
>>> +	__ARM_FEATURE_PAC_DEFAULT & (1 << LEAF_PROTECT_BIT)
>>
>> Either this header needs to avoid polluting the user namespace, or it
>> doesn't.  If it does, then LEAF_PROTECT_BIT fails to do that.  If it
>> doesn't then __HAVE_PAC_LEAF should really be HAVE_PAC_LEAF (to avoid
>> polluting the reserved namespace.  I suspect this header is private to
>> newlib (won't be exported to users), so should not be prefixing names
>> with __.
>>
>>> +
>>> +/* Macro to handle function entry depending on branch-protection
>>> +   schemes.  */
>>> +	.macro pacbti_prologue
>>> +#if __HAVE_PAC_LEAF
>>> +#if __ARM_FEATURE_BTI_DEFAULT
>>> +	pacbti ip, lr, sp
>>> +#else
>>> +	pac ip, lr, sp
>>> +#endif /* __ARM_FEATURE_BTI_DEFAULT */
>>> +	.cfi_register 143, 12
>>> +	str ip, [sp, #-4]!
>>
>> This causes the stack to lose 8-byte alignment.  Whilst for leaf
>> functions that's probably not a problem, the macro should have an
>> option where the user can ask for alignment to be preserved.  But
>> there should also be an option to not save IP on the stack at all (for
>> when the user does not modify IP in the function body).
>
> Work is ongoing on implementing the macro with all necessary features.
>
> A better macro implementation for the prologue is given as follows (but
> still lacks any alignment preservation logic):
>
> /* Emit .cfi_offset directives for a consecutive sequence of registers.
>    PAC_CFI_ADJ will be 4 bytes if IP reg is pushed. */
> 	.macro cfisavelist first, last, index=1
> 	.cfi_offset \last, -4*(\index) - PAC_CFI_ADJ
> 	.if \last-\first
> 	cfisavelist \first, \last-1, \index+1
> 	.endif
> 	.endm
>
> /* Create a prologue entry sequence handling PAC/BTI, if required and emitting
>    CFI directives for generated PAC code and any pushed registers.  */
>
> #define NREG ((\last-\first)+1)
> 	
> 	.macro prologue first=-1, last=-1, savepac=PAC_LEAF_PUSH_IP
> #if HAVE_PAC_LEAF
> #if __ARM_FEATURE_BTI_DEFAULT
> 	pacbti	ip, lr, sp
> #else
> 	pac	ip, lr, sp
> #endif /* __ARM_FEATURE_BTI_DEFAULT */
> 	.cfi_register 143, 12
> #else
> #if __ARM_FEATURE_BTI_DEFAULT
> 	bti
> #endif /* __ARM_FEATURE_BTI_DEFAULT */
> #endif /* HAVE_PAC_LEAF */
> 	.if \first != -1
> 	.if \last != -1
> 	.if \savepac
> 	push {r\first-r\last, ip}
> 	.cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ
> 	.cfi_offset 143, -PAC_CFI_ADJ
> 	cfisavelist \first, \last
> 	.else
> 	push {r\first-r\last}
> 	.cfi_adjust_cfa_offset NREG*4
> 	cfisavelist \first, \last
> 	.endif
> 	.else
> 	.if \savepac
> 	push {r\first, ip}
> 	.cfi_adjust_cfa_offset 4 + PAC_CFI_ADJ
> 	.cfi_offset 143, -PAC_CFI_ADJ 
> 	cfisavelist \first, \first
> 	.else // !\savepac
> 	push {r\first}
> 	cfisavelist \first, \first
> 	.endif
> 	.endif
> 	.else // \first == -1
> 	.if \savepac
> 	push {ip}
> 	.cfi_adjust_cfa_offset PAC_CFI_ADJ
> 	.cfi_offset 143, -PAC_CFI_ADJ
> 	.endif
> 	.endif
> 	.endm
>
> The question remains of how I can ammend the register list in the push
> directive when we wish to push an extra dummy register to the stack for
> alignment purposes.
>
> A hypothetical solution (albeit a broken one, as we can't redefine macro
> arguments in the body of the macro) for when pushing an even no. of
> registers + the pac code is:
>
> 	.if \first != -1
> 	.if \last != -1
> 	.if \savepac
>
> +	.if \align_requested && ( ( (NREG+1) % 2 ) != 0 && (\first != 0) )
> +	\first = \first - 1
> +       .endif
>         
>  	push {r\first-r\last, ip}
> 	.cfi_adjust_cfa_offset NREG*4 + PAC_CFI_ADJ
>
> +       .if \align_requested && ( ((NREG+1) % 2 ) != 0 && (\first != 0) )
> +       /* Having pushed the dummy, restore original register range.  */
> +       \first = \first + 1
> +       .endif
>
> 	.cfi_offset 143, -PAC_CFI_ADJ
>  	cfisavelist \first, \last
>
>         .else
>         ...
>
> The logic above seems correct to me, but I need to figure out a way of
> adjusting the value of \first in the least invasive way.
>
> If we emit the push instruction via a separate macro, we could then pass
> the \first argument as \first-1 (e.g. pushmacro \first=\first-1,
> last=last, savepac=1). Even so, creating a macro just to emit the push
> instruction seems overkill. We may end up having way too many macros.
>
> Do you have any insights on how I may be able to fix the above code to
> allow for alignment preservation elegantly within the limitations of
> what assembler macros can do?
>
> Cheers,
> V.
>

I've also been thinking about simplifying the align logic, such that if
we request alignment preservation, it always pushes a dummy register along
w/ the pac code.  This way, we don't preserve alignment, but rather
alignment "parity.

If in the absence of pushing the pac-code a simple push operation would
result in a misaligned stack pointer, I believe that adding the pac code
to the register list should also result in the preservation of
misalignment.

I think that if in a hypothetical situation code gets written such
that the original prologue results in a misaligned stack, it's correct
to assume that the developer is aware of this and will manually
compensate for it later on in the code if alignment is needed.  The
risk I see is that having too much hidden logic in our prologue macro
might result in unforseen bugs for coders with insufficient familiarity
with it.

The alignment preservation feature should behave as follows: "With this
turned on, no perceptable difference is observed in program behaviour
irrespective of the presence/absence of the pac code on the stack."

>>> +	.save {ra_auth_code}
>>> +	.cfi_def_cfa_offset 4
>>> +	.cfi_offset 143, -4
>>> +#elif __ARM_FEATURE_BTI_DEFAULT
>>> +	bti
>>> +#endif /* __HAVE_PAC_LEAF */
>>> +	.endm
>>> +
>>> +/* Macro to handle different branch exchange cases depending on
>>> +   branch-protection schemes.  */
>>> +	.macro pacbti_epilogue
>>> +#if __HAVE_PAC_LEAF
>>> +	ldr ip, [sp], #4
>>> +	.cfi_restore 143
>>> +	.cfi_def_cfa_offset 0
>>> +	aut ip, lr, sp
>>> +#endif /* __HAVE_PAC_LEAF */
>>> +	bx lr
>>> +	.endm
>>
>> I think these macros are really misnamed, firstly, they're only for
>> leaf functions and secondly, they (particularly the epilogue) does
>> something even if PAC is not needed.  So I think they should be
>> renamed as 'leaf_prologue' and 'leaf_epilogue' respectively.
>>
>> In consequence, I think this file should really be merged into the
>> existing arm_asm.h, then we don't need yet another header file.
>>
>> Finally, the header needs to define a (C) macro that defines how much
>> to adjust the CFI offset by for various scenarios:
>> - When PAC is not used
>> - When PAC is used with no alignment
>> - When PAC is used and asked for 8-byte alignment to be preserved.

This creates anoter level of complexity. As you mention that users
should have the option of whether or not to push the pac code onto the stack
when it won't be overwritten over the course of a leaf fuction, the
logic for this C macro will take this into account.  This leads us to
create something as follows:

#if HAVE_PAC_LEAF        /* Pac code requested for leaf functions.  */
# if PAC_LEAF_PUSH_IP    /* Always push pac code to  the stack.  */
#  if ALIGN_PRESERVE
#   define PAC_CFI_ADJ 8
#  else
#   define PAC_CFI_ADJ 4
#  endif
# else
#  define PAC_CFI_ADJ 0
# endif /* PAC_LEAF_PUSH_IP*/
#else
# define PAC_CFI_ADJ 0
#endif /* HAVE_PAC_LEAF */

Thus, if PAC_LEAF_PUSH_IP is not requested, PAC_CFI_ADJ is set to 0.
This creates a problem for functions where pushing the pac code onto
the stack is mandatory, as is the case for functions where the register
storing it will be reused in the body of the function.
In That case, we need a second macro which is independent of the value
of PAC_LEAF_PUSH_IP.  This leads to our second macro to be used for such
functions: 

#if HAVE_PAC_LEAF
# if ALIGN_PRESERVE	
#  define PAC_CFI_ADJ_DEFAULT 8
# else
#  define PAC_CFI_ADJ_DEFAULT 4
# endif
#endif

and developers would have to choose correctly whether to write something
like

       	.cfi_offset <reg>, -(<base_offset>+PAC_CFI_ADJ)
 or
	.cfi_offset <reg>, -(<base_offset>+PAC_CFI_ADJ_DEFAULT)

Does this sound remotely acceptable?

We could use an assembler macro rather than C macro and calculate the
offset based on some passed argument, but either way we find ourselves
catering for a growing number of use cases and with it the complexity of
any solution grows.

I just thought I'd ask what your thoughts were on these matters.
V.

>>
>> R.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-08-18 16:28 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-03 15:35 [PATCH v2 0/8] Implement assembly cortex-M PACBTI functionality Victor Do Nascimento
2022-08-03 15:35 ` [PATCH v2 1/8] newlib: libc: define M-profile PACBTI-enablement macros Victor Do Nascimento
2022-08-04 15:19   ` Richard Earnshaw
2022-08-05  9:21     ` Victor L. Do Nascimento
2022-08-16 16:11     ` Victor L. Do Nascimento
2022-08-18 16:28       ` Victor L. Do Nascimento
2022-08-03 15:35 ` [PATCH v2 2/8] newlib: libc: strcmp M-profile PACBTI-enablement Victor Do Nascimento
2022-08-04 15:48   ` Richard Earnshaw
2022-08-05 10:51     ` Victor L. Do Nascimento
2022-08-05 14:34       ` Richard Earnshaw
2022-08-05 14:58         ` Victor L. Do Nascimento
2022-08-05 15:15           ` Richard Earnshaw
2022-08-03 15:35 ` [PATCH v2 3/8] newlib: libc: strlen " Victor Do Nascimento
2022-08-03 15:35 ` [PATCH v2 4/8] newlib: libc: memchr " Victor Do Nascimento
2022-08-03 15:35 ` [PATCH v2 5/8] newlib: libc: memcpy " Victor Do Nascimento
2022-08-03 15:35 ` [PATCH v2 6/8] newlib: libc: setjmp/longjmp " Victor Do Nascimento
2022-08-03 15:35 ` [PATCH v2 7/8] newlib: libc: aeabi_memmove " Victor Do Nascimento
2022-08-03 15:35 ` [PATCH v2 8/8] newlib: libc: aeabi_memset " Victor Do Nascimento

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).