public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed
* [PATCH,MIPS 2/3] Enable reorder for crt0.S
@ 2014-11-18 12:14 Matthew Fortune
  2014-11-26 18:00 ` Steve Ellcey
  2019-07-04 17:23 ` Richard Sandiford
  0 siblings, 2 replies; 8+ messages in thread
From: Matthew Fortune @ 2014-11-18 12:14 UTC (permalink / raw)
  To: newlib

Hi,

As part of a long term plan to reduce the amount of hand written .set noreorder
code, I have reworked the crt0.S file so that the assembler can fill delay
slots instead of them being explicitly filled.  The reason for doing this is
to enable future auto-conversion of delay slot branches to 'compact' branches
present in the R6 architecture. Auto-conversion is not possible in a .set
noreorder block as any delay slot branch with a non-NOP delay slot would have
to be reordered!!! to convert to a compact branch without a delay slot.
Writing code in a natural linear order is (subjectively) also much simpler to
digest and maintain.

One ugly piece of code had to be reworked in the zerobss loop which was using
a pseudo-instruction BLTU as the branch. The structure of the old-loop was
clearly aiming to produce a tight loop with one instruction and the delay slot
filled but the expansion of the BLTU would have undone this anyway. This has
been reworked to create the kind of loop originally intended and have the
assembler fill the delay slot. The precise behaviour of the loop is subtly
different from before for two reasons:

1) When the _fbss and _end symbols have the same value then the old loop would
   have written zero to every address from _fbss to the end of memory (or an
   exception occurred). The new loop is skipped if the two symbols are the same.
2) The old loop wrote zero to address of _end which is past the end of the
   bss range. The new loop does not do this.
3) When _fbss is greater then _end at the start then the old loop would have
   written one element and exited. The new loop will attempt to write zero
   to every address from _fbss to the end of memory, wrap and continue to
   _end (or hit an exception). This change in behaviour is fine as the
   scenario is invalid anyway.
4) The _end marker is now aligned to 4-bytes to ensure that the last element
   written does not reach beyond the address of _end. This is also necessary
   as the termination condition is an equality test instead of an ordered
   test so (_end - _fbss) must be a multiple of 4-bytes.

Delay slot filling will occur when libgloss is built with GCC and an
optimisation level greater than zero. This gets translated to an assembler
optimisation level of '2'.

All instance of JAL <reg> have been changed to JALR <reg> as there is no
special handling in the JAL macro in binutils for a register operand and
JALR is the real underlying instruction.

This change is primarily verified by code inspection but has also been run
through some small test programs.

Thanks,
Matthew

libgloss/

	* mips/crt0.S: Remove .set noreorder throughout.  Change JAL <reg> to
	JALR <reg> throughout.
	(zerobss): Open code the bltu macro instruction so that the
	zero-loop does not have a NOP in the branch delay slot.
---
 libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
index 599e79c..f66ef1b 100644
--- a/libgloss/mips/crt0.S
+++ b/libgloss/mips/crt0.S
@@ -57,13 +57,14 @@
 	.globl	_start
 	.ent	_start
 _start:
-	.set	noreorder
 #ifdef __mips_embedded_pic
 #define PICBASE start_PICBASE
+	.set	noreorder
 	PICBASE = .+8
         bal	PICBASE
 	nop
 	move	s0,$31
+	.set	reorder
 #endif
 #if __mips<3
 #  define STATUS_MASK (SR_CU1|SR_PE)
@@ -89,9 +90,7 @@ _start:
 	/* Avoid hazard from FPU enable and other SR changes.  */
 	LA (t0, hardware_hazard_hook)
 	beq	t0,zero,1f
-	nop
-	jal	t0
-	nop
+	jalr	t0
 1:
 
 /* Check for FPU presence.  Don't check if we know that soft_float is
@@ -105,11 +104,8 @@ _start:
 	mfc1	t1,fp1
 	nop
 	bne	t0,t2,1f	/* check for match */
-	nop
 	bne	t1,zero,1f	/* double check */
-	nop
 	j	2f		/* FPU is present. */
-	nop
 #endif
 1:
 	/* FPU is not present.  Set status register to say that. */
@@ -119,9 +115,7 @@ _start:
 	/* Avoid hazard from FPU disable.  */
 	LA (t0, hardware_hazard_hook)
 	beq	t0,zero,2f
-	nop
-	jal	t0
-	nop
+	jalr	t0
 2:
 
 
@@ -129,7 +123,6 @@ _start:
    doesn't get confused.  */
 	LA (v0, 3f)
 	jr	v0
-	nop
 3:
 	LA (gp, _gp)				# set the global data pointer
 	.end _start
@@ -145,21 +138,20 @@ _start:
 zerobss:
 	LA (v0, _fbss)
 	LA (v1, _end)
-3:
-	sw	zero,0(v0)
-	bltu	v0,v1,3b
-	addiu	v0,v0,4				# executed in delay slot
-
+	beq	v0,v1,2f
+1:
+	addiu	v0,v0,4
+	sw	zero,-4(v0)
+	bne	v0,v1,1b
+2:
 	la	t0, __lstack			# make a small stack so we
 	addiu	sp, t0, STARTUP_STACK_SIZE	# can run some C code
 	la	a0, __memsize			# get the usable memory size
 	jal	get_mem_info
-	nop
 
 	/* setup the stack pointer */
 	LA (t0, __stack)			# is __stack set ?
 	bne	t0,zero,4f
-	nop
 
 	/* NOTE: a0[0] contains the amount of memory available, and
 	         not the last memory address. */
@@ -189,19 +181,14 @@ zerobss:
 init:
 	LA (t9, hardware_init_hook)		# init the hardware if needed
 	beq	t9,zero,6f
-	nop
-	jal	t9
-	nop
+	jalr	t9
 6:
 	LA (t9, software_init_hook)		# init the hardware if needed
 	beq	t9,zero,7f
-	nop
-	jal	t9
-	nop
+	jalr	t9
 7:
 	LA (a0, _fini)
 	jal	atexit
-	nop
 
 #ifdef GCRT0
 	.globl	_ftext
@@ -209,12 +196,10 @@ init:
 	LA (a0, _ftext)
 	LA (a1, _etext)
 	jal	monstartup
-	nop
 #endif
 
 
 	jal	_init				# run global constructors
-	nop
 
 	addiu	a1,sp,32			# argv = sp + 32
 	addiu	a2,sp,40			# envp = sp + 40
@@ -225,13 +210,13 @@ init:
 	sw	zero,(a1)
 	sw	zero,(a2)
 #endif
-	jal	main				# call the program start function
 	move	a0,zero				# set argc to 0
+	jal	main				# call the program start function
 
 	# fall through to the "exit" routine
+	move	a0,v0				# pass through the exit code
 	jal	exit				# call libc exit to run the G++
 						# destructors
-	move	a0,v0				# pass through the exit code
 	.end	init
 
  
@@ -257,27 +242,25 @@ _exit:
 	/* Need to reinit PICBASE, since we might be called via exit()
 	   rather than via a return path which would restore old s0.  */
 #define PICBASE exit_PICBASE
+	.set	noreorder
 	PICBASE = .+8
 	bal	PICBASE
 	nop
 	move	s0,$31
+	.set	reorder
 #endif
 #ifdef GCRT0
 	LA (t0, _mcleanup)
-	jal	t0
-	nop
+	jalr	t0
 #endif
 	LA (t0, hardware_exit_hook)
 	beq	t0,zero,1f
-	nop
-	jal	t0
-	nop
+	jalr	t0
 1:
 
 	# break instruction can cope with 0xfffff, but GAS limits the range:
 	break	1023
 	b	7b				# but loop back just in-case
-	nop
 	.end _exit
  
 /* Assume the PICBASE set up above is no longer valid below here.  */
-- 
1.9.4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
  2014-11-18 12:14 [PATCH,MIPS 2/3] Enable reorder for crt0.S Matthew Fortune
@ 2014-11-26 18:00 ` Steve Ellcey
  2014-11-27  0:15   ` Matthew Fortune
  2019-07-04 17:23 ` Richard Sandiford
  1 sibling, 1 reply; 8+ messages in thread
From: Steve Ellcey @ 2014-11-26 18:00 UTC (permalink / raw)
  To: newlib

On Tue, 2014-11-18 at 12:14 +0000, Matthew Fortune wrote:

> libgloss/
> 
> 	* mips/crt0.S: Remove .set noreorder throughout.  Change JAL <reg> to
> 	JALR <reg> throughout.
> 	(zerobss): Open code the bltu macro instruction so that the
> 	zero-loop does not have a NOP in the branch delay slot.

This looks OK to me.

Steve Ellcey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH,MIPS 2/3] Enable reorder for crt0.S
  2014-11-26 18:00 ` Steve Ellcey
@ 2014-11-27  0:15   ` Matthew Fortune
  2014-11-27  0:28     ` Jeff Johnston
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Fortune @ 2014-11-27  0:15 UTC (permalink / raw)
  To: Steve Ellcey, newlib

[-- Attachment #1: Type: text/plain, Size: 137 bytes --]

> This looks OK to me.

Re-submitting this as an attachment as the inline copy would not apply
cleanly for Jeff.

Thanks,
Matthew

[-- Attachment #2: 0002-Enable-reorder.patch --]
[-- Type: application/octet-stream, Size: 3964 bytes --]

From 597dbd2da2b3c63841340b9164ed2ab4341515d0 Mon Sep 17 00:00:00 2001
From: Matthew Fortune <matthew.fortune@imgtec.com>
Date: Mon, 17 Nov 2014 11:59:53 +0000
Subject: [PATCH 2/3] Enable reorder

libgloss/

	* mips/crt0.S: Remove .set noreorder throughout.
	(zerobss): Open code the bltu macro instruction so that the
	zero-loop does not have a NOP in the branch delay slot.
---
 libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
index 599e79c..f66ef1b 100644
--- a/libgloss/mips/crt0.S
+++ b/libgloss/mips/crt0.S
@@ -57,13 +57,14 @@
 	.globl	_start
 	.ent	_start
 _start:
-	.set	noreorder
 #ifdef __mips_embedded_pic
 #define PICBASE start_PICBASE
+	.set	noreorder
 	PICBASE = .+8
         bal	PICBASE
 	nop
 	move	s0,$31
+	.set	reorder
 #endif
 #if __mips<3
 #  define STATUS_MASK (SR_CU1|SR_PE)
@@ -89,9 +90,7 @@ _start:
 	/* Avoid hazard from FPU enable and other SR changes.  */
 	LA (t0, hardware_hazard_hook)
 	beq	t0,zero,1f
-	nop
-	jal	t0
-	nop
+	jalr	t0
 1:
 
 /* Check for FPU presence.  Don't check if we know that soft_float is
@@ -105,11 +104,8 @@ _start:
 	mfc1	t1,fp1
 	nop
 	bne	t0,t2,1f	/* check for match */
-	nop
 	bne	t1,zero,1f	/* double check */
-	nop
 	j	2f		/* FPU is present. */
-	nop
 #endif
 1:
 	/* FPU is not present.  Set status register to say that. */
@@ -119,9 +115,7 @@ _start:
 	/* Avoid hazard from FPU disable.  */
 	LA (t0, hardware_hazard_hook)
 	beq	t0,zero,2f
-	nop
-	jal	t0
-	nop
+	jalr	t0
 2:
 
 
@@ -129,7 +123,6 @@ _start:
    doesn't get confused.  */
 	LA (v0, 3f)
 	jr	v0
-	nop
 3:
 	LA (gp, _gp)				# set the global data pointer
 	.end _start
@@ -145,21 +138,20 @@ _start:
 zerobss:
 	LA (v0, _fbss)
 	LA (v1, _end)
-3:
-	sw	zero,0(v0)
-	bltu	v0,v1,3b
-	addiu	v0,v0,4				# executed in delay slot
-
+	beq	v0,v1,2f
+1:
+	addiu	v0,v0,4
+	sw	zero,-4(v0)
+	bne	v0,v1,1b
+2:
 	la	t0, __lstack			# make a small stack so we
 	addiu	sp, t0, STARTUP_STACK_SIZE	# can run some C code
 	la	a0, __memsize			# get the usable memory size
 	jal	get_mem_info
-	nop
 
 	/* setup the stack pointer */
 	LA (t0, __stack)			# is __stack set ?
 	bne	t0,zero,4f
-	nop
 
 	/* NOTE: a0[0] contains the amount of memory available, and
 	         not the last memory address. */
@@ -189,19 +181,14 @@ zerobss:
 init:
 	LA (t9, hardware_init_hook)		# init the hardware if needed
 	beq	t9,zero,6f
-	nop
-	jal	t9
-	nop
+	jalr	t9
 6:
 	LA (t9, software_init_hook)		# init the hardware if needed
 	beq	t9,zero,7f
-	nop
-	jal	t9
-	nop
+	jalr	t9
 7:
 	LA (a0, _fini)
 	jal	atexit
-	nop
 
 #ifdef GCRT0
 	.globl	_ftext
@@ -209,12 +196,10 @@ init:
 	LA (a0, _ftext)
 	LA (a1, _etext)
 	jal	monstartup
-	nop
 #endif
 
 
 	jal	_init				# run global constructors
-	nop
 
 	addiu	a1,sp,32			# argv = sp + 32
 	addiu	a2,sp,40			# envp = sp + 40
@@ -225,13 +210,13 @@ init:
 	sw	zero,(a1)
 	sw	zero,(a2)
 #endif
-	jal	main				# call the program start function
 	move	a0,zero				# set argc to 0
+	jal	main				# call the program start function
 
 	# fall through to the "exit" routine
+	move	a0,v0				# pass through the exit code
 	jal	exit				# call libc exit to run the G++
 						# destructors
-	move	a0,v0				# pass through the exit code
 	.end	init
 
  
@@ -257,27 +242,25 @@ _exit:
 	/* Need to reinit PICBASE, since we might be called via exit()
 	   rather than via a return path which would restore old s0.  */
 #define PICBASE exit_PICBASE
+	.set	noreorder
 	PICBASE = .+8
 	bal	PICBASE
 	nop
 	move	s0,$31
+	.set	reorder
 #endif
 #ifdef GCRT0
 	LA (t0, _mcleanup)
-	jal	t0
-	nop
+	jalr	t0
 #endif
 	LA (t0, hardware_exit_hook)
 	beq	t0,zero,1f
-	nop
-	jal	t0
-	nop
+	jalr	t0
 1:
 
 	# break instruction can cope with 0xfffff, but GAS limits the range:
 	break	1023
 	b	7b				# but loop back just in-case
-	nop
 	.end _exit
  
 /* Assume the PICBASE set up above is no longer valid below here.  */
-- 
1.9.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
  2014-11-27  0:15   ` Matthew Fortune
@ 2014-11-27  0:28     ` Jeff Johnston
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Johnston @ 2014-11-27  0:28 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: Steve Ellcey, newlib

Thanks, that did it.  Patch applied.

-- Jeff J.

----- Original Message -----
> From: "Matthew Fortune" <Matthew.Fortune@imgtec.com>
> To: "Steve Ellcey" <Steve.Ellcey@imgtec.com>, newlib@sourceware.org
> Sent: Wednesday, November 26, 2014 7:15:43 PM
> Subject: RE: [PATCH,MIPS 2/3] Enable reorder for crt0.S
> 
> > This looks OK to me.
> 
> Re-submitting this as an attachment as the inline copy would not apply
> cleanly for Jeff.
> 
> Thanks,
> Matthew
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
  2014-11-18 12:14 [PATCH,MIPS 2/3] Enable reorder for crt0.S Matthew Fortune
  2014-11-26 18:00 ` Steve Ellcey
@ 2019-07-04 17:23 ` Richard Sandiford
  2019-07-06  3:26   ` Faraz Shahbazker
  2019-07-30  0:00   ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
  1 sibling, 2 replies; 8+ messages in thread
From: Richard Sandiford @ 2019-07-04 17:23 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: newlib

Hi,

Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Hi,
>
> As part of a long term plan to reduce the amount of hand written .set noreorder
> code, I have reworked the crt0.S file so that the assembler can fill delay
> slots instead of them being explicitly filled.  The reason for doing this is
> to enable future auto-conversion of delay slot branches to 'compact' branches
> present in the R6 architecture. Auto-conversion is not possible in a .set
> noreorder block as any delay slot branch with a non-NOP delay slot would have
> to be reordered!!! to convert to a compact branch without a delay slot.
> Writing code in a natural linear order is (subjectively) also much simpler to
> digest and maintain.
>
> One ugly piece of code had to be reworked in the zerobss loop which was using
> a pseudo-instruction BLTU as the branch. The structure of the old-loop was
> clearly aiming to produce a tight loop with one instruction and the delay slot
> filled but the expansion of the BLTU would have undone this anyway. This has
> been reworked to create the kind of loop originally intended and have the
> assembler fill the delay slot. The precise behaviour of the loop is subtly
> different from before for two reasons:
>
> 1) When the _fbss and _end symbols have the same value then the old loop would
>    have written zero to every address from _fbss to the end of memory (or an
>    exception occurred). The new loop is skipped if the two symbols are the same.
> 2) The old loop wrote zero to address of _end which is past the end of the
>    bss range. The new loop does not do this.
> 3) When _fbss is greater then _end at the start then the old loop would have
>    written one element and exited. The new loop will attempt to write zero
>    to every address from _fbss to the end of memory, wrap and continue to
>    _end (or hit an exception). This change in behaviour is fine as the
>    scenario is invalid anyway.
> 4) The _end marker is now aligned to 4-bytes to ensure that the last element
>    written does not reach beyond the address of _end. This is also necessary
>    as the termination condition is an equality test instead of an ordered
>    test so (_end - _fbss) must be a multiple of 4-bytes.

Sorry to jump on this old patch, but I couldn't see anything that did
(4).  I was trying to test mipsisa64-elf with idt64.ld and many tests
end up with an _end that isn't four-byte aligned, leading to an
"infinite" zeroing loop.

I guess we should add:

  . = ALIGN(4);

in front of:

   PROVIDE (end = .);
   _end = .;

for each script that doesn't already align to 4 or beyond.

Thanks,
Richard

>
> Delay slot filling will occur when libgloss is built with GCC and an
> optimisation level greater than zero. This gets translated to an assembler
> optimisation level of '2'.
>
> All instance of JAL <reg> have been changed to JALR <reg> as there is no
> special handling in the JAL macro in binutils for a register operand and
> JALR is the real underlying instruction.
>
> This change is primarily verified by code inspection but has also been run
> through some small test programs.
>
> Thanks,
> Matthew
>
> libgloss/
>
> 	* mips/crt0.S: Remove .set noreorder throughout.  Change JAL <reg> to
> 	JALR <reg> throughout.
> 	(zerobss): Open code the bltu macro instruction so that the
> 	zero-loop does not have a NOP in the branch delay slot.
> ---
>  libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
>  1 file changed, 18 insertions(+), 35 deletions(-)
>
> diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
> index 599e79c..f66ef1b 100644
> --- a/libgloss/mips/crt0.S
> +++ b/libgloss/mips/crt0.S
> @@ -57,13 +57,14 @@
>  	.globl	_start
>  	.ent	_start
>  _start:
> -	.set	noreorder
>  #ifdef __mips_embedded_pic
>  #define PICBASE start_PICBASE
> +	.set	noreorder
>  	PICBASE = .+8
>          bal	PICBASE
>  	nop
>  	move	s0,$31
> +	.set	reorder
>  #endif
>  #if __mips<3
>  #  define STATUS_MASK (SR_CU1|SR_PE)
> @@ -89,9 +90,7 @@ _start:
>  	/* Avoid hazard from FPU enable and other SR changes.  */
>  	LA (t0, hardware_hazard_hook)
>  	beq	t0,zero,1f
> -	nop
> -	jal	t0
> -	nop
> +	jalr	t0
>  1:
>  
>  /* Check for FPU presence.  Don't check if we know that soft_float is
> @@ -105,11 +104,8 @@ _start:
>  	mfc1	t1,fp1
>  	nop
>  	bne	t0,t2,1f	/* check for match */
> -	nop
>  	bne	t1,zero,1f	/* double check */
> -	nop
>  	j	2f		/* FPU is present. */
> -	nop
>  #endif
>  1:
>  	/* FPU is not present.  Set status register to say that. */
> @@ -119,9 +115,7 @@ _start:
>  	/* Avoid hazard from FPU disable.  */
>  	LA (t0, hardware_hazard_hook)
>  	beq	t0,zero,2f
> -	nop
> -	jal	t0
> -	nop
> +	jalr	t0
>  2:
>  
>  
> @@ -129,7 +123,6 @@ _start:
>     doesn't get confused.  */
>  	LA (v0, 3f)
>  	jr	v0
> -	nop
>  3:
>  	LA (gp, _gp)				# set the global data pointer
>  	.end _start
> @@ -145,21 +138,20 @@ _start:
>  zerobss:
>  	LA (v0, _fbss)
>  	LA (v1, _end)
> -3:
> -	sw	zero,0(v0)
> -	bltu	v0,v1,3b
> -	addiu	v0,v0,4				# executed in delay slot
> -
> +	beq	v0,v1,2f
> +1:
> +	addiu	v0,v0,4
> +	sw	zero,-4(v0)
> +	bne	v0,v1,1b
> +2:
>  	la	t0, __lstack			# make a small stack so we
>  	addiu	sp, t0, STARTUP_STACK_SIZE	# can run some C code
>  	la	a0, __memsize			# get the usable memory size
>  	jal	get_mem_info
> -	nop
>  
>  	/* setup the stack pointer */
>  	LA (t0, __stack)			# is __stack set ?
>  	bne	t0,zero,4f
> -	nop
>  
>  	/* NOTE: a0[0] contains the amount of memory available, and
>  	         not the last memory address. */
> @@ -189,19 +181,14 @@ zerobss:
>  init:
>  	LA (t9, hardware_init_hook)		# init the hardware if needed
>  	beq	t9,zero,6f
> -	nop
> -	jal	t9
> -	nop
> +	jalr	t9
>  6:
>  	LA (t9, software_init_hook)		# init the hardware if needed
>  	beq	t9,zero,7f
> -	nop
> -	jal	t9
> -	nop
> +	jalr	t9
>  7:
>  	LA (a0, _fini)
>  	jal	atexit
> -	nop
>  
>  #ifdef GCRT0
>  	.globl	_ftext
> @@ -209,12 +196,10 @@ init:
>  	LA (a0, _ftext)
>  	LA (a1, _etext)
>  	jal	monstartup
> -	nop
>  #endif
>  
>  
>  	jal	_init				# run global constructors
> -	nop
>  
>  	addiu	a1,sp,32			# argv = sp + 32
>  	addiu	a2,sp,40			# envp = sp + 40
> @@ -225,13 +210,13 @@ init:
>  	sw	zero,(a1)
>  	sw	zero,(a2)
>  #endif
> -	jal	main				# call the program start function
>  	move	a0,zero				# set argc to 0
> +	jal	main				# call the program start function
>  
>  	# fall through to the "exit" routine
> +	move	a0,v0				# pass through the exit code
>  	jal	exit				# call libc exit to run the G++
>  						# destructors
> -	move	a0,v0				# pass through the exit code
>  	.end	init
>  
>   
> @@ -257,27 +242,25 @@ _exit:
>  	/* Need to reinit PICBASE, since we might be called via exit()
>  	   rather than via a return path which would restore old s0.  */
>  #define PICBASE exit_PICBASE
> +	.set	noreorder
>  	PICBASE = .+8
>  	bal	PICBASE
>  	nop
>  	move	s0,$31
> +	.set	reorder
>  #endif
>  #ifdef GCRT0
>  	LA (t0, _mcleanup)
> -	jal	t0
> -	nop
> +	jalr	t0
>  #endif
>  	LA (t0, hardware_exit_hook)
>  	beq	t0,zero,1f
> -	nop
> -	jal	t0
> -	nop
> +	jalr	t0
>  1:
>  
>  	# break instruction can cope with 0xfffff, but GAS limits the range:
>  	break	1023
>  	b	7b				# but loop back just in-case
> -	nop
>  	.end _exit
>   
>  /* Assume the PICBASE set up above is no longer valid below here.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
  2019-07-04 17:23 ` Richard Sandiford
@ 2019-07-06  3:26   ` Faraz Shahbazker
  2019-07-30  0:00   ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
  1 sibling, 0 replies; 8+ messages in thread
From: Faraz Shahbazker @ 2019-07-06  3:26 UTC (permalink / raw)
  To: Richard Sandiford, Matthew Fortune; +Cc: newlib

Hi Richard,


Looks like an oversight. From an internal branch of that vintage, I can see that the alignment was intended, but only committed upstream for mti*.ld scripts and that too snuck in accidentally as part of an unrelated commit (see https://sourceware.org/ml/newlib-cvs/2014-q4/msg00025.html). Probably a mistake in splitting patches for submission?


I am not familiar will all these target boards. Okay to go ahead and fix all MIPS scripts?


Regards,

Faraz

________________________________
From: newlib-owner@sourceware.org <newlib-owner@sourceware.org> on behalf of Richard Sandiford <richard.sandiford@arm.com>
Sent: Thursday, July 4, 2019 10:23:49 AM
To: Matthew Fortune
Cc: newlib@sourceware.org
Subject: Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S

Hi,

Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Hi,
>
> As part of a long term plan to reduce the amount of hand written .set noreorder
> code, I have reworked the crt0.S file so that the assembler can fill delay
> slots instead of them being explicitly filled.  The reason for doing this is
> to enable future auto-conversion of delay slot branches to 'compact' branches
> present in the R6 architecture. Auto-conversion is not possible in a .set
> noreorder block as any delay slot branch with a non-NOP delay slot would have
> to be reordered!!! to convert to a compact branch without a delay slot.
> Writing code in a natural linear order is (subjectively) also much simpler to
> digest and maintain.
>
> One ugly piece of code had to be reworked in the zerobss loop which was using
> a pseudo-instruction BLTU as the branch. The structure of the old-loop was
> clearly aiming to produce a tight loop with one instruction and the delay slot
> filled but the expansion of the BLTU would have undone this anyway. This has
> been reworked to create the kind of loop originally intended and have the
> assembler fill the delay slot. The precise behaviour of the loop is subtly
> different from before for two reasons:
>
> 1) When the _fbss and _end symbols have the same value then the old loop would
>    have written zero to every address from _fbss to the end of memory (or an
>    exception occurred). The new loop is skipped if the two symbols are the same.
> 2) The old loop wrote zero to address of _end which is past the end of the
>    bss range. The new loop does not do this.
> 3) When _fbss is greater then _end at the start then the old loop would have
>    written one element and exited. The new loop will attempt to write zero
>    to every address from _fbss to the end of memory, wrap and continue to
>    _end (or hit an exception). This change in behaviour is fine as the
>    scenario is invalid anyway.
> 4) The _end marker is now aligned to 4-bytes to ensure that the last element
>    written does not reach beyond the address of _end. This is also necessary
>    as the termination condition is an equality test instead of an ordered
>    test so (_end - _fbss) must be a multiple of 4-bytes.

Sorry to jump on this old patch, but I couldn't see anything that did
(4).  I was trying to test mipsisa64-elf with idt64.ld and many tests
end up with an _end that isn't four-byte aligned, leading to an
"infinite" zeroing loop.

I guess we should add:

  . = ALIGN(4);

in front of:

   PROVIDE (end = .);
   _end = .;

for each script that doesn't already align to 4 or beyond.

Thanks,
Richard

>
> Delay slot filling will occur when libgloss is built with GCC and an
> optimisation level greater than zero. This gets translated to an assembler
> optimisation level of '2'.
>
> All instance of JAL <reg> have been changed to JALR <reg> as there is no
> special handling in the JAL macro in binutils for a register operand and
> JALR is the real underlying instruction.
>
> This change is primarily verified by code inspection but has also been run
> through some small test programs.
>
> Thanks,
> Matthew
>
> libgloss/
>
>        * mips/crt0.S: Remove .set noreorder throughout.  Change JAL <reg> to
>        JALR <reg> throughout.
>        (zerobss): Open code the bltu macro instruction so that the
>        zero-loop does not have a NOP in the branch delay slot.
> ---
>  libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
>  1 file changed, 18 insertions(+), 35 deletions(-)
>
> diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
> index 599e79c..f66ef1b 100644
> --- a/libgloss/mips/crt0.S
> +++ b/libgloss/mips/crt0.S
> @@ -57,13 +57,14 @@
>        .globl  _start
>        .ent    _start
>  _start:
> -     .set    noreorder
>  #ifdef __mips_embedded_pic
>  #define PICBASE start_PICBASE
> +     .set    noreorder
>        PICBASE = .+8
>          bal  PICBASE
>        nop
>        move    s0,$31
> +     .set    reorder
>  #endif
>  #if __mips<3
>  #  define STATUS_MASK (SR_CU1|SR_PE)
> @@ -89,9 +90,7 @@ _start:
>        /* Avoid hazard from FPU enable and other SR changes.  */
>        LA (t0, hardware_hazard_hook)
>        beq     t0,zero,1f
> -     nop
> -     jal     t0
> -     nop
> +     jalr    t0
>  1:
>
>  /* Check for FPU presence.  Don't check if we know that soft_float is
> @@ -105,11 +104,8 @@ _start:
>        mfc1    t1,fp1
>        nop
>        bne     t0,t2,1f        /* check for match */
> -     nop
>        bne     t1,zero,1f      /* double check */
> -     nop
>        j       2f              /* FPU is present. */
> -     nop
>  #endif
>  1:
>        /* FPU is not present.  Set status register to say that. */
> @@ -119,9 +115,7 @@ _start:
>        /* Avoid hazard from FPU disable.  */
>        LA (t0, hardware_hazard_hook)
>        beq     t0,zero,2f
> -     nop
> -     jal     t0
> -     nop
> +     jalr    t0
>  2:
>
>
> @@ -129,7 +123,6 @@ _start:
>     doesn't get confused.  */
>        LA (v0, 3f)
>        jr      v0
> -     nop
>  3:
>        LA (gp, _gp)                            # set the global data pointer
>        .end _start
> @@ -145,21 +138,20 @@ _start:
>  zerobss:
>        LA (v0, _fbss)
>        LA (v1, _end)
> -3:
> -     sw      zero,0(v0)
> -     bltu    v0,v1,3b
> -     addiu   v0,v0,4                         # executed in delay slot
> -
> +     beq     v0,v1,2f
> +1:
> +     addiu   v0,v0,4
> +     sw      zero,-4(v0)
> +     bne     v0,v1,1b
> +2:
>        la      t0, __lstack                    # make a small stack so we
>        addiu   sp, t0, STARTUP_STACK_SIZE      # can run some C code
>        la      a0, __memsize                   # get the usable memory size
>        jal     get_mem_info
> -     nop
>
>        /* setup the stack pointer */
>        LA (t0, __stack)                        # is __stack set ?
>        bne     t0,zero,4f
> -     nop
>
>        /* NOTE: a0[0] contains the amount of memory available, and
>                 not the last memory address. */
> @@ -189,19 +181,14 @@ zerobss:
>  init:
>        LA (t9, hardware_init_hook)             # init the hardware if needed
>        beq     t9,zero,6f
> -     nop
> -     jal     t9
> -     nop
> +     jalr    t9
>  6:
>        LA (t9, software_init_hook)             # init the hardware if needed
>        beq     t9,zero,7f
> -     nop
> -     jal     t9
> -     nop
> +     jalr    t9
>  7:
>        LA (a0, _fini)
>        jal     atexit
> -     nop
>
>  #ifdef GCRT0
>        .globl  _ftext
> @@ -209,12 +196,10 @@ init:
>        LA (a0, _ftext)
>        LA (a1, _etext)
>        jal     monstartup
> -     nop
>  #endif
>
>
>        jal     _init                           # run global constructors
> -     nop
>
>        addiu   a1,sp,32                        # argv = sp + 32
>        addiu   a2,sp,40                        # envp = sp + 40
> @@ -225,13 +210,13 @@ init:
>        sw      zero,(a1)
>        sw      zero,(a2)
>  #endif
> -     jal     main                            # call the program start function
>        move    a0,zero                         # set argc to 0
> +     jal     main                            # call the program start function
>
>        # fall through to the "exit" routine
> +     move    a0,v0                           # pass through the exit code
>        jal     exit                            # call libc exit to run the G++
>                                                # destructors
> -     move    a0,v0                           # pass through the exit code
>        .end    init
>
>
> @@ -257,27 +242,25 @@ _exit:
>        /* Need to reinit PICBASE, since we might be called via exit()
>           rather than via a return path which would restore old s0.  */
>  #define PICBASE exit_PICBASE
> +     .set    noreorder
>        PICBASE = .+8
>        bal     PICBASE
>        nop
>        move    s0,$31
> +     .set    reorder
>  #endif
>  #ifdef GCRT0
>        LA (t0, _mcleanup)
> -     jal     t0
> -     nop
> +     jalr    t0
>  #endif
>        LA (t0, hardware_exit_hook)
>        beq     t0,zero,1f
> -     nop
> -     jal     t0
> -     nop
> +     jalr    t0
>  1:
>
>        # break instruction can cope with 0xfffff, but GAS limits the range:
>        break   1023
>        b       7b                              # but loop back just in-case
> -     nop
>        .end _exit
>
>  /* Assume the PICBASE set up above is no longer valid below here.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts
  2019-07-04 17:23 ` Richard Sandiford
  2019-07-06  3:26   ` Faraz Shahbazker
@ 2019-07-30  0:00   ` Faraz Shahbazker
  2019-07-30  7:39     ` Corinna Vinschen
  1 sibling, 1 reply; 8+ messages in thread
From: Faraz Shahbazker @ 2019-07-30  0:00 UTC (permalink / raw)
  To: newlib; +Cc: Faraz Shahbazker, Richard Sandiford

Left-over part of commit 84b2a020daa17d8ee5c9ec979c3d56f95e69573b

The _end marker must be aligned to 4-bytes to ensure that the last
element written does not reach beyond the address of _end.  This is
also necessary as the termination condition is an equality test
instead of an ordered test so (_end - _fbss) must be a multiple of
4-bytes.  The alignment is already correct for mti*.ld files, fix
it for all remaining MIPS scripts that don't already align to at
least 4.

libgloss/
	* mips/array.ld: Align _end to 4 byte boundary.
	* mips/ddb-kseg0.ld: Likewise.
	* mips/ddb.ld: Likewise.
	* mips/dve.ld: Likewise.
	* mips/idt.ld: Likewise.
	* mips/idt32.ld: Likewise.
	* mips/idt64.ld: Likewise.
	* mips/idtecoff.ld: Likewise.
	* mips/jmr3904app-java.ld: Likewise.
	* mips/jmr3904app.ld: Likewise.
	* mips/jmr3904dram-java.ld: Likewise.
	* mips/jmr3904dram.ld: Likewise.
	* mips/lsi.ld: Likewise.
	* mips/pmon.ld: Likewise.
	* mips/sde32.ld: Likewise.
	* mips/sde64.ld: Likewise.
---
 libgloss/mips/array.ld            | 1 +
 libgloss/mips/ddb-kseg0.ld        | 1 +
 libgloss/mips/ddb.ld              | 1 +
 libgloss/mips/dve.ld              | 1 +
 libgloss/mips/idt.ld              | 1 +
 libgloss/mips/idt32.ld            | 1 +
 libgloss/mips/idt64.ld            | 1 +
 libgloss/mips/idtecoff.ld         | 1 +
 libgloss/mips/jmr3904app-java.ld  | 1 +
 libgloss/mips/jmr3904app.ld       | 1 +
 libgloss/mips/jmr3904dram-java.ld | 1 +
 libgloss/mips/jmr3904dram.ld      | 1 +
 libgloss/mips/lsi.ld              | 1 +
 libgloss/mips/pmon.ld             | 1 +
 libgloss/mips/sde32.ld            | 1 +
 libgloss/mips/sde64.ld            | 1 +
 16 files changed, 16 insertions(+)

diff --git a/libgloss/mips/array.ld b/libgloss/mips/array.ld
index 0492ae5..5cdcf40 100644
--- a/libgloss/mips/array.ld
+++ b/libgloss/mips/array.ld
@@ -182,6 +182,7 @@ SECTIONS
     *(.gnu.linkonce.b.*)
     *(COMMON)
   }
+  . = ALIGN(4);
    end = .;
    _end = .;
 }
diff --git a/libgloss/mips/ddb-kseg0.ld b/libgloss/mips/ddb-kseg0.ld
index 8c1f926..a8643fd 100644
--- a/libgloss/mips/ddb-kseg0.ld
+++ b/libgloss/mips/ddb-kseg0.ld
@@ -135,6 +135,7 @@ SECTIONS
     *(COMMON)
   }
 
+   . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/ddb.ld b/libgloss/mips/ddb.ld
index 299106f..7b899d4 100644
--- a/libgloss/mips/ddb.ld
+++ b/libgloss/mips/ddb.ld
@@ -135,6 +135,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/dve.ld b/libgloss/mips/dve.ld
index 96abbbe..e28c9c2 100644
--- a/libgloss/mips/dve.ld
+++ b/libgloss/mips/dve.ld
@@ -137,6 +137,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/idt.ld b/libgloss/mips/idt.ld
index b4608bf..a779569 100644
--- a/libgloss/mips/idt.ld
+++ b/libgloss/mips/idt.ld
@@ -143,6 +143,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
   PROVIDE (end = .);
   _end = .;
 
diff --git a/libgloss/mips/idt32.ld b/libgloss/mips/idt32.ld
index 5084df7..8d4e4d6 100644
--- a/libgloss/mips/idt32.ld
+++ b/libgloss/mips/idt32.ld
@@ -144,6 +144,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
    PROVIDE (end = .);
    _end = .;
 
diff --git a/libgloss/mips/idt64.ld b/libgloss/mips/idt64.ld
index a1121c6..8d996bc 100644
--- a/libgloss/mips/idt64.ld
+++ b/libgloss/mips/idt64.ld
@@ -145,6 +145,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
   PROVIDE (end = .);
   _end = .;
 
diff --git a/libgloss/mips/idtecoff.ld b/libgloss/mips/idtecoff.ld
index 0297c60..57111e1 100644
--- a/libgloss/mips/idtecoff.ld
+++ b/libgloss/mips/idtecoff.ld
@@ -95,6 +95,7 @@ SECTIONS
     *(.gnu.linkonce.b.*)
     *(COMMON)
   }
+  . = ALIGN(4);
    end = .;
    _end = .;
 }
diff --git a/libgloss/mips/jmr3904app-java.ld b/libgloss/mips/jmr3904app-java.ld
index 92de26d..c9539fd 100644
--- a/libgloss/mips/jmr3904app-java.ld
+++ b/libgloss/mips/jmr3904app-java.ld
@@ -99,6 +99,7 @@ SECTIONS
     . = __stack ;
   }
 
+  . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/jmr3904app.ld b/libgloss/mips/jmr3904app.ld
index 367fc47..35c2fec 100644
--- a/libgloss/mips/jmr3904app.ld
+++ b/libgloss/mips/jmr3904app.ld
@@ -137,6 +137,7 @@ SECTIONS
     . = __stack ;
   }
 
+  . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/jmr3904dram-java.ld b/libgloss/mips/jmr3904dram-java.ld
index 4c0681a..aedd434 100644
--- a/libgloss/mips/jmr3904dram-java.ld
+++ b/libgloss/mips/jmr3904dram-java.ld
@@ -98,6 +98,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/jmr3904dram.ld b/libgloss/mips/jmr3904dram.ld
index 9e7d255..168c318 100644
--- a/libgloss/mips/jmr3904dram.ld
+++ b/libgloss/mips/jmr3904dram.ld
@@ -95,6 +95,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/lsi.ld b/libgloss/mips/lsi.ld
index 780c31c..e350419 100644
--- a/libgloss/mips/lsi.ld
+++ b/libgloss/mips/lsi.ld
@@ -133,6 +133,7 @@ SECTIONS
     *(.gnu.linkonce.b.*)
     *(COMMON)
   }
+  . = ALIGN(4);
    end = .;
    _end = .;
 }
diff --git a/libgloss/mips/pmon.ld b/libgloss/mips/pmon.ld
index fff6f66..1608cd9 100644
--- a/libgloss/mips/pmon.ld
+++ b/libgloss/mips/pmon.ld
@@ -137,6 +137,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
    end = .;
    _end = .;
 
diff --git a/libgloss/mips/sde32.ld b/libgloss/mips/sde32.ld
index 7273107..4ef3b69 100644
--- a/libgloss/mips/sde32.ld
+++ b/libgloss/mips/sde32.ld
@@ -144,6 +144,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
   PROVIDE (end = .);
   _end = .;
 
diff --git a/libgloss/mips/sde64.ld b/libgloss/mips/sde64.ld
index 0bcbe98..40338a1 100644
--- a/libgloss/mips/sde64.ld
+++ b/libgloss/mips/sde64.ld
@@ -146,6 +146,7 @@ SECTIONS
     *(COMMON)
   }
 
+  . = ALIGN(4);
   PROVIDE (end = .);
   _end = .;
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts
  2019-07-30  0:00   ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
@ 2019-07-30  7:39     ` Corinna Vinschen
  0 siblings, 0 replies; 8+ messages in thread
From: Corinna Vinschen @ 2019-07-30  7:39 UTC (permalink / raw)
  To: Faraz Shahbazker; +Cc: newlib, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1294 bytes --]

On Jul 30 00:00, Faraz Shahbazker wrote:
> Left-over part of commit 84b2a020daa17d8ee5c9ec979c3d56f95e69573b
> 
> The _end marker must be aligned to 4-bytes to ensure that the last
> element written does not reach beyond the address of _end.  This is
> also necessary as the termination condition is an equality test
> instead of an ordered test so (_end - _fbss) must be a multiple of
> 4-bytes.  The alignment is already correct for mti*.ld files, fix
> it for all remaining MIPS scripts that don't already align to at
> least 4.
> 
> libgloss/
> 	* mips/array.ld: Align _end to 4 byte boundary.
> 	* mips/ddb-kseg0.ld: Likewise.
> 	* mips/ddb.ld: Likewise.
> 	* mips/dve.ld: Likewise.
> 	* mips/idt.ld: Likewise.
> 	* mips/idt32.ld: Likewise.
> 	* mips/idt64.ld: Likewise.
> 	* mips/idtecoff.ld: Likewise.
> 	* mips/jmr3904app-java.ld: Likewise.
> 	* mips/jmr3904app.ld: Likewise.
> 	* mips/jmr3904dram-java.ld: Likewise.
> 	* mips/jmr3904dram.ld: Likewise.
> 	* mips/lsi.ld: Likewise.
> 	* mips/pmon.ld: Likewise.
> 	* mips/sde32.ld: Likewise.
> 	* mips/sde64.ld: Likewise.

Pushed.  I just dropped the CVS ChangeLog remnant from the commit
message.  We really don't do that anymore.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-07-30  7:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-18 12:14 [PATCH,MIPS 2/3] Enable reorder for crt0.S Matthew Fortune
2014-11-26 18:00 ` Steve Ellcey
2014-11-27  0:15   ` Matthew Fortune
2014-11-27  0:28     ` Jeff Johnston
2019-07-04 17:23 ` Richard Sandiford
2019-07-06  3:26   ` Faraz Shahbazker
2019-07-30  0:00   ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
2019-07-30  7:39     ` Corinna Vinschen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).