public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 1/2] ld: pru: Merge the bss input sections into data
@ 2023-01-29 15:32 Dimitar Dimitrov
  2023-01-29 15:38 ` [PATCH 2/2] ld: pru: Add optional section alignments Dimitar Dimitrov
  2023-02-01  9:43 ` [PATCH 1/2] ld: pru: Merge the bss input sections into data Nick Clifton
  0 siblings, 2 replies; 4+ messages in thread
From: Dimitar Dimitrov @ 2023-01-29 15:32 UTC (permalink / raw)
  To: binutils; +Cc: Dimitar Dimitrov

The popular method to load PRU firmware is through the remoteproc Linux
kernel driver.  In order to save a few bytes from the firmware, the PRU
CRT0 is spared from calling memset for the bss segment [1].  Instead the
host loader is supposed to zero out the bss segment.  This is important
for PRU, which typically has only 8KB for instruction memory.

The legacy non-mainline PRU host driver relied on the default
behaviour of the kernel core remoteproc [2].  That default is to zero
out the loadable memory regions not backed by file storage (i.e. the
bss sections).  This worked for the libgloss' CRT0.

But the PRU loader merged in mainline Linux explicitly changes the
default behaviour [3].  It no longer is zeroing out memory regions.
Hence the bss sections are not initialized - neither by CRT0, nor by the
host loader.

This patch fixes the issue by aligning the GNU LD default linker script
with the mainline Linux kernel expectation.  Since the mainline kernel
driver is submitted by the PRU manufacturer itself (Text Instruments),
we can consider that as defining the ABI.

This change has been tested on Beaglebone AI-64 [4].  Static counter
variables in the firmware are now always starting from zero, as
expected.  There was only one new toolchain test failure in orphan3.d,
due to reordering of the output sections.  I believe this is a harmless
issue.  I could not rewrite the PASS criteria to ignore the output
section ordering, so I have disabled that test case for PRU.

Ok for master?

[1] https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=libgloss/pru/crt0.S;h=b3f0d53a93acc372f461007553e7688ca77753c9;hb=HEAD#l40
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/remoteproc_elf_loader.c?h=v6.1#n228
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/pru_rproc.c?h=v6.1#n641
[4] https://beagleboard.org/ai-64

ld/ChangeLog:

	* scripttempl/pru.sc (.data): Merge .bss input sections into the
	.data output section.
	* testsuite/ld-elf/orphan3.d: Disable for PRU..

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
---
 ld/scripttempl/pru.sc         | 24 ++++++++++++++----------
 ld/testsuite/ld-elf/orphan3.d |  2 +-
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/ld/scripttempl/pru.sc b/ld/scripttempl/pru.sc
index 56d07bea15d..7cb80694641 100644
--- a/ld/scripttempl/pru.sc
+++ b/ld/scripttempl/pru.sc
@@ -149,17 +149,13 @@ SECTIONS
     ${RELOCATING+*(.gnu.linkonce.r*)}
     ${RELOCATING+. = ALIGN(4);}
     ${RELOCATING+ PROVIDE (_data_end = .) ; }
-  } ${RELOCATING+ > dmem }
 
-  /* Linux remoteproc loader requires the resource_table section
-     start address to be aligned to 8 bytes.  */
-  .resource_table ${RELOCATING-0} ${RELOCATING+ ALIGN(8)} :
-  {
-    KEEP (*(.resource_table))
-  } ${RELOCATING+ > dmem}
-
-  .bss ${RELOCATING-0} :
-  {
+    ${RELOCATING+/* Merge the bss input sections into the output
+      data section.  The Linux kernel's remoteproc PRU ELF loader
+      will not memzero the bss section.  The CRT0 will not either, in order
+      to reduce the final firmware's instruction memory size.  Hence
+      present bss sections as regular data sections, at the negligible
+      expense of increasing the ELF file size.  */}
     ${RELOCATING+ PROVIDE (_bss_start = .) ; }
     *(.bss)
     ${RELOCATING+ *(.bss.*)}
@@ -167,6 +163,14 @@ SECTIONS
     ${RELOCATING+*(.gnu.linkonce.b*)}
     ${RELOCATING+*(COMMON)}
     ${RELOCATING+ PROVIDE (_bss_end = .) ; }
+
+  } ${RELOCATING+ > dmem}
+
+  /* Linux remoteproc loader requires the resource_table section
+     start address to be aligned to 8 bytes.  */
+  .resource_table ${RELOCATING-0} ${RELOCATING+ ALIGN (8)} :
+  {
+    KEEP (*(.resource_table))
   } ${RELOCATING+ > dmem}
 
   /* Global data not cleared after reset.  */
diff --git a/ld/testsuite/ld-elf/orphan3.d b/ld/testsuite/ld-elf/orphan3.d
index af6ce25e0fb..ec101643dea 100644
--- a/ld/testsuite/ld-elf/orphan3.d
+++ b/ld/testsuite/ld-elf/orphan3.d
@@ -7,7 +7,7 @@
 #ld:
 #readelf: -S --wide
 #xfail: [uses_genelf]
-#xfail: xstormy16-*-*
+#xfail: xstormy16-*-* pru-*-*
 
 #...
   \[[ 0-9]+\] \.foo +PROGBITS +[0-9a-f]+ +[0-9a-f]+ +0+20 +0+ +A +0 +0 +[0-9]+
-- 
2.39.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2] ld: pru: Add optional section alignments
  2023-01-29 15:32 [PATCH 1/2] ld: pru: Merge the bss input sections into data Dimitar Dimitrov
@ 2023-01-29 15:38 ` Dimitar Dimitrov
  2023-02-01  9:43   ` Nick Clifton
  2023-02-01  9:43 ` [PATCH 1/2] ld: pru: Merge the bss input sections into data Nick Clifton
  1 sibling, 1 reply; 4+ messages in thread
From: Dimitar Dimitrov @ 2023-01-29 15:38 UTC (permalink / raw)
  To: binutils; +Cc: Dimitar Dimitrov

The Texas Instruments SoCs with AARCH64 host processors have stricter
alignment requirements than ones with ARM32 host processors.  It's not
only the requirement for resource_table to be aligned to 8.  But also
any loadable segment size must be a multiple of 4 [1].

The current PRU default linker script may output a segment size not
aligned to 4, which would cause firmware load failure on AARCH64 hosts.

Fix this by using COMMONPAGESIZE and MAXPAGESIZE to signify respectively
the section memory size requirement and the resource table section's
start address alignment.  This would avoid penalizing the ARM32 hosts,
for which the default values (1 and 1) are sufficient.

For AARCH64 hosts, the alignments would be overwritten from GCC spec
files using the linker command line, e.g.:
  -z common-page-size=4 -z max-page-size=8

Ok for master?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/pru_rproc.c?h=v6.1#n555

ld/ChangeLog:

	* scripttempl/pru.sc (_data_end): Remove the alignment.
	(.data): Align output section size to COMMONPAGESIZE.
	(.resource_table): Ditto.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
---
 ld/scripttempl/pru.sc | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/ld/scripttempl/pru.sc b/ld/scripttempl/pru.sc
index 7cb80694641..4229bed99bb 100644
--- a/ld/scripttempl/pru.sc
+++ b/ld/scripttempl/pru.sc
@@ -147,7 +147,6 @@ SECTIONS
     ${RELOCATING+ *(.rodata:*)}
     ${RELOCATING+*(.gnu.linkonce.d*)}
     ${RELOCATING+*(.gnu.linkonce.r*)}
-    ${RELOCATING+. = ALIGN(4);}
     ${RELOCATING+ PROVIDE (_data_end = .) ; }
 
     ${RELOCATING+/* Merge the bss input sections into the output
@@ -164,13 +163,22 @@ SECTIONS
     ${RELOCATING+*(COMMON)}
     ${RELOCATING+ PROVIDE (_bss_end = .) ; }
 
+    ${RELOCATING+/* In case this is the last input section, align to
+      keep the loadable segment size a multiple of the common page size.
+      Some SoCs have stricter memory size requirements than others.  */
+    . = ALIGN (CONSTANT (COMMONPAGESIZE));}
   } ${RELOCATING+ > dmem}
 
   /* Linux remoteproc loader requires the resource_table section
-     start address to be aligned to 8 bytes.  */
-  .resource_table ${RELOCATING-0} ${RELOCATING+ ALIGN (8)} :
+     start address to be aligned to 8 bytes for SoCs with AARCH64
+     host processors.  */
+  .resource_table ${RELOCATING-0} ${RELOCATING+ ALIGN (CONSTANT (MAXPAGESIZE))} :
   {
     KEEP (*(.resource_table))
+    ${RELOCATING+/* In case this is the last input section, align to
+      keep the loadable segment size a multiple of the common page size.
+      Some SoCs have stricter memory size requirements than others.  */
+    . = ALIGN (CONSTANT (COMMONPAGESIZE));}
   } ${RELOCATING+ > dmem}
 
   /* Global data not cleared after reset.  */
-- 
2.39.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] ld: pru: Merge the bss input sections into data
  2023-01-29 15:32 [PATCH 1/2] ld: pru: Merge the bss input sections into data Dimitar Dimitrov
  2023-01-29 15:38 ` [PATCH 2/2] ld: pru: Add optional section alignments Dimitar Dimitrov
@ 2023-02-01  9:43 ` Nick Clifton
  1 sibling, 0 replies; 4+ messages in thread
From: Nick Clifton @ 2023-02-01  9:43 UTC (permalink / raw)
  To: Dimitar Dimitrov, binutils

Hi Dimitar,

> This change has been tested on Beaglebone AI-64 [4].  Static counter
> variables in the firmware are now always starting from zero, as
> expected.  There was only one new toolchain test failure in orphan3.d,
> due to reordering of the output sections.  I believe this is a harmless
> issue.  I could not rewrite the PASS criteria to ignore the output
> section ordering, so I have disabled that test case for PRU.
> 
> Ok for master?

Approved - please apply.

Cheers
   Nick



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] ld: pru: Add optional section alignments
  2023-01-29 15:38 ` [PATCH 2/2] ld: pru: Add optional section alignments Dimitar Dimitrov
@ 2023-02-01  9:43   ` Nick Clifton
  0 siblings, 0 replies; 4+ messages in thread
From: Nick Clifton @ 2023-02-01  9:43 UTC (permalink / raw)
  To: Dimitar Dimitrov, binutils

Hi Dimitar,

> ld/ChangeLog:
> 
> 	* scripttempl/pru.sc (_data_end): Remove the alignment.
> 	(.data): Align output section size to COMMONPAGESIZE.
> 	(.resource_table): Ditto.

Approved - please apply.

Cheers
   Nick



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-02-01  9:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-29 15:32 [PATCH 1/2] ld: pru: Merge the bss input sections into data Dimitar Dimitrov
2023-01-29 15:38 ` [PATCH 2/2] ld: pru: Add optional section alignments Dimitar Dimitrov
2023-02-01  9:43   ` Nick Clifton
2023-02-01  9:43 ` [PATCH 1/2] ld: pru: Merge the bss input sections into data Nick Clifton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).