* [PATCH,MIPS 2/3] Enable reorder for crt0.S
@ 2014-11-18 12:14 Matthew Fortune
2014-11-26 18:00 ` Steve Ellcey
2019-07-04 17:23 ` Richard Sandiford
0 siblings, 2 replies; 8+ messages in thread
From: Matthew Fortune @ 2014-11-18 12:14 UTC (permalink / raw)
To: newlib
Hi,
As part of a long term plan to reduce the amount of hand written .set noreorder
code, I have reworked the crt0.S file so that the assembler can fill delay
slots instead of them being explicitly filled. The reason for doing this is
to enable future auto-conversion of delay slot branches to 'compact' branches
present in the R6 architecture. Auto-conversion is not possible in a .set
noreorder block as any delay slot branch with a non-NOP delay slot would have
to be reordered!!! to convert to a compact branch without a delay slot.
Writing code in a natural linear order is (subjectively) also much simpler to
digest and maintain.
One ugly piece of code had to be reworked in the zerobss loop which was using
a pseudo-instruction BLTU as the branch. The structure of the old-loop was
clearly aiming to produce a tight loop with one instruction and the delay slot
filled but the expansion of the BLTU would have undone this anyway. This has
been reworked to create the kind of loop originally intended and have the
assembler fill the delay slot. The precise behaviour of the loop is subtly
different from before for two reasons:
1) When the _fbss and _end symbols have the same value then the old loop would
have written zero to every address from _fbss to the end of memory (or an
exception occurred). The new loop is skipped if the two symbols are the same.
2) The old loop wrote zero to address of _end which is past the end of the
bss range. The new loop does not do this.
3) When _fbss is greater then _end at the start then the old loop would have
written one element and exited. The new loop will attempt to write zero
to every address from _fbss to the end of memory, wrap and continue to
_end (or hit an exception). This change in behaviour is fine as the
scenario is invalid anyway.
4) The _end marker is now aligned to 4-bytes to ensure that the last element
written does not reach beyond the address of _end. This is also necessary
as the termination condition is an equality test instead of an ordered
test so (_end - _fbss) must be a multiple of 4-bytes.
Delay slot filling will occur when libgloss is built with GCC and an
optimisation level greater than zero. This gets translated to an assembler
optimisation level of '2'.
All instance of JAL <reg> have been changed to JALR <reg> as there is no
special handling in the JAL macro in binutils for a register operand and
JALR is the real underlying instruction.
This change is primarily verified by code inspection but has also been run
through some small test programs.
Thanks,
Matthew
libgloss/
* mips/crt0.S: Remove .set noreorder throughout. Change JAL <reg> to
JALR <reg> throughout.
(zerobss): Open code the bltu macro instruction so that the
zero-loop does not have a NOP in the branch delay slot.
---
libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
1 file changed, 18 insertions(+), 35 deletions(-)
diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
index 599e79c..f66ef1b 100644
--- a/libgloss/mips/crt0.S
+++ b/libgloss/mips/crt0.S
@@ -57,13 +57,14 @@
.globl _start
.ent _start
_start:
- .set noreorder
#ifdef __mips_embedded_pic
#define PICBASE start_PICBASE
+ .set noreorder
PICBASE = .+8
bal PICBASE
nop
move s0,$31
+ .set reorder
#endif
#if __mips<3
# define STATUS_MASK (SR_CU1|SR_PE)
@@ -89,9 +90,7 @@ _start:
/* Avoid hazard from FPU enable and other SR changes. */
LA (t0, hardware_hazard_hook)
beq t0,zero,1f
- nop
- jal t0
- nop
+ jalr t0
1:
/* Check for FPU presence. Don't check if we know that soft_float is
@@ -105,11 +104,8 @@ _start:
mfc1 t1,fp1
nop
bne t0,t2,1f /* check for match */
- nop
bne t1,zero,1f /* double check */
- nop
j 2f /* FPU is present. */
- nop
#endif
1:
/* FPU is not present. Set status register to say that. */
@@ -119,9 +115,7 @@ _start:
/* Avoid hazard from FPU disable. */
LA (t0, hardware_hazard_hook)
beq t0,zero,2f
- nop
- jal t0
- nop
+ jalr t0
2:
@@ -129,7 +123,6 @@ _start:
doesn't get confused. */
LA (v0, 3f)
jr v0
- nop
3:
LA (gp, _gp) # set the global data pointer
.end _start
@@ -145,21 +138,20 @@ _start:
zerobss:
LA (v0, _fbss)
LA (v1, _end)
-3:
- sw zero,0(v0)
- bltu v0,v1,3b
- addiu v0,v0,4 # executed in delay slot
-
+ beq v0,v1,2f
+1:
+ addiu v0,v0,4
+ sw zero,-4(v0)
+ bne v0,v1,1b
+2:
la t0, __lstack # make a small stack so we
addiu sp, t0, STARTUP_STACK_SIZE # can run some C code
la a0, __memsize # get the usable memory size
jal get_mem_info
- nop
/* setup the stack pointer */
LA (t0, __stack) # is __stack set ?
bne t0,zero,4f
- nop
/* NOTE: a0[0] contains the amount of memory available, and
not the last memory address. */
@@ -189,19 +181,14 @@ zerobss:
init:
LA (t9, hardware_init_hook) # init the hardware if needed
beq t9,zero,6f
- nop
- jal t9
- nop
+ jalr t9
6:
LA (t9, software_init_hook) # init the hardware if needed
beq t9,zero,7f
- nop
- jal t9
- nop
+ jalr t9
7:
LA (a0, _fini)
jal atexit
- nop
#ifdef GCRT0
.globl _ftext
@@ -209,12 +196,10 @@ init:
LA (a0, _ftext)
LA (a1, _etext)
jal monstartup
- nop
#endif
jal _init # run global constructors
- nop
addiu a1,sp,32 # argv = sp + 32
addiu a2,sp,40 # envp = sp + 40
@@ -225,13 +210,13 @@ init:
sw zero,(a1)
sw zero,(a2)
#endif
- jal main # call the program start function
move a0,zero # set argc to 0
+ jal main # call the program start function
# fall through to the "exit" routine
+ move a0,v0 # pass through the exit code
jal exit # call libc exit to run the G++
# destructors
- move a0,v0 # pass through the exit code
.end init
@@ -257,27 +242,25 @@ _exit:
/* Need to reinit PICBASE, since we might be called via exit()
rather than via a return path which would restore old s0. */
#define PICBASE exit_PICBASE
+ .set noreorder
PICBASE = .+8
bal PICBASE
nop
move s0,$31
+ .set reorder
#endif
#ifdef GCRT0
LA (t0, _mcleanup)
- jal t0
- nop
+ jalr t0
#endif
LA (t0, hardware_exit_hook)
beq t0,zero,1f
- nop
- jal t0
- nop
+ jalr t0
1:
# break instruction can cope with 0xfffff, but GAS limits the range:
break 1023
b 7b # but loop back just in-case
- nop
.end _exit
/* Assume the PICBASE set up above is no longer valid below here. */
--
1.9.4
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
2014-11-18 12:14 [PATCH,MIPS 2/3] Enable reorder for crt0.S Matthew Fortune
@ 2014-11-26 18:00 ` Steve Ellcey
2014-11-27 0:15 ` Matthew Fortune
2019-07-04 17:23 ` Richard Sandiford
1 sibling, 1 reply; 8+ messages in thread
From: Steve Ellcey @ 2014-11-26 18:00 UTC (permalink / raw)
To: newlib
On Tue, 2014-11-18 at 12:14 +0000, Matthew Fortune wrote:
> libgloss/
>
> * mips/crt0.S: Remove .set noreorder throughout. Change JAL <reg> to
> JALR <reg> throughout.
> (zerobss): Open code the bltu macro instruction so that the
> zero-loop does not have a NOP in the branch delay slot.
This looks OK to me.
Steve Ellcey
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH,MIPS 2/3] Enable reorder for crt0.S
2014-11-26 18:00 ` Steve Ellcey
@ 2014-11-27 0:15 ` Matthew Fortune
2014-11-27 0:28 ` Jeff Johnston
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Fortune @ 2014-11-27 0:15 UTC (permalink / raw)
To: Steve Ellcey, newlib
[-- Attachment #1: Type: text/plain, Size: 137 bytes --]
> This looks OK to me.
Re-submitting this as an attachment as the inline copy would not apply
cleanly for Jeff.
Thanks,
Matthew
[-- Attachment #2: 0002-Enable-reorder.patch --]
[-- Type: application/octet-stream, Size: 3964 bytes --]
From 597dbd2da2b3c63841340b9164ed2ab4341515d0 Mon Sep 17 00:00:00 2001
From: Matthew Fortune <matthew.fortune@imgtec.com>
Date: Mon, 17 Nov 2014 11:59:53 +0000
Subject: [PATCH 2/3] Enable reorder
libgloss/
* mips/crt0.S: Remove .set noreorder throughout.
(zerobss): Open code the bltu macro instruction so that the
zero-loop does not have a NOP in the branch delay slot.
---
libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
1 file changed, 18 insertions(+), 35 deletions(-)
diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
index 599e79c..f66ef1b 100644
--- a/libgloss/mips/crt0.S
+++ b/libgloss/mips/crt0.S
@@ -57,13 +57,14 @@
.globl _start
.ent _start
_start:
- .set noreorder
#ifdef __mips_embedded_pic
#define PICBASE start_PICBASE
+ .set noreorder
PICBASE = .+8
bal PICBASE
nop
move s0,$31
+ .set reorder
#endif
#if __mips<3
# define STATUS_MASK (SR_CU1|SR_PE)
@@ -89,9 +90,7 @@ _start:
/* Avoid hazard from FPU enable and other SR changes. */
LA (t0, hardware_hazard_hook)
beq t0,zero,1f
- nop
- jal t0
- nop
+ jalr t0
1:
/* Check for FPU presence. Don't check if we know that soft_float is
@@ -105,11 +104,8 @@ _start:
mfc1 t1,fp1
nop
bne t0,t2,1f /* check for match */
- nop
bne t1,zero,1f /* double check */
- nop
j 2f /* FPU is present. */
- nop
#endif
1:
/* FPU is not present. Set status register to say that. */
@@ -119,9 +115,7 @@ _start:
/* Avoid hazard from FPU disable. */
LA (t0, hardware_hazard_hook)
beq t0,zero,2f
- nop
- jal t0
- nop
+ jalr t0
2:
@@ -129,7 +123,6 @@ _start:
doesn't get confused. */
LA (v0, 3f)
jr v0
- nop
3:
LA (gp, _gp) # set the global data pointer
.end _start
@@ -145,21 +138,20 @@ _start:
zerobss:
LA (v0, _fbss)
LA (v1, _end)
-3:
- sw zero,0(v0)
- bltu v0,v1,3b
- addiu v0,v0,4 # executed in delay slot
-
+ beq v0,v1,2f
+1:
+ addiu v0,v0,4
+ sw zero,-4(v0)
+ bne v0,v1,1b
+2:
la t0, __lstack # make a small stack so we
addiu sp, t0, STARTUP_STACK_SIZE # can run some C code
la a0, __memsize # get the usable memory size
jal get_mem_info
- nop
/* setup the stack pointer */
LA (t0, __stack) # is __stack set ?
bne t0,zero,4f
- nop
/* NOTE: a0[0] contains the amount of memory available, and
not the last memory address. */
@@ -189,19 +181,14 @@ zerobss:
init:
LA (t9, hardware_init_hook) # init the hardware if needed
beq t9,zero,6f
- nop
- jal t9
- nop
+ jalr t9
6:
LA (t9, software_init_hook) # init the hardware if needed
beq t9,zero,7f
- nop
- jal t9
- nop
+ jalr t9
7:
LA (a0, _fini)
jal atexit
- nop
#ifdef GCRT0
.globl _ftext
@@ -209,12 +196,10 @@ init:
LA (a0, _ftext)
LA (a1, _etext)
jal monstartup
- nop
#endif
jal _init # run global constructors
- nop
addiu a1,sp,32 # argv = sp + 32
addiu a2,sp,40 # envp = sp + 40
@@ -225,13 +210,13 @@ init:
sw zero,(a1)
sw zero,(a2)
#endif
- jal main # call the program start function
move a0,zero # set argc to 0
+ jal main # call the program start function
# fall through to the "exit" routine
+ move a0,v0 # pass through the exit code
jal exit # call libc exit to run the G++
# destructors
- move a0,v0 # pass through the exit code
.end init
@@ -257,27 +242,25 @@ _exit:
/* Need to reinit PICBASE, since we might be called via exit()
rather than via a return path which would restore old s0. */
#define PICBASE exit_PICBASE
+ .set noreorder
PICBASE = .+8
bal PICBASE
nop
move s0,$31
+ .set reorder
#endif
#ifdef GCRT0
LA (t0, _mcleanup)
- jal t0
- nop
+ jalr t0
#endif
LA (t0, hardware_exit_hook)
beq t0,zero,1f
- nop
- jal t0
- nop
+ jalr t0
1:
# break instruction can cope with 0xfffff, but GAS limits the range:
break 1023
b 7b # but loop back just in-case
- nop
.end _exit
/* Assume the PICBASE set up above is no longer valid below here. */
--
1.9.4
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
2014-11-27 0:15 ` Matthew Fortune
@ 2014-11-27 0:28 ` Jeff Johnston
0 siblings, 0 replies; 8+ messages in thread
From: Jeff Johnston @ 2014-11-27 0:28 UTC (permalink / raw)
To: Matthew Fortune; +Cc: Steve Ellcey, newlib
Thanks, that did it. Patch applied.
-- Jeff J.
----- Original Message -----
> From: "Matthew Fortune" <Matthew.Fortune@imgtec.com>
> To: "Steve Ellcey" <Steve.Ellcey@imgtec.com>, newlib@sourceware.org
> Sent: Wednesday, November 26, 2014 7:15:43 PM
> Subject: RE: [PATCH,MIPS 2/3] Enable reorder for crt0.S
>
> > This looks OK to me.
>
> Re-submitting this as an attachment as the inline copy would not apply
> cleanly for Jeff.
>
> Thanks,
> Matthew
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
2014-11-18 12:14 [PATCH,MIPS 2/3] Enable reorder for crt0.S Matthew Fortune
2014-11-26 18:00 ` Steve Ellcey
@ 2019-07-04 17:23 ` Richard Sandiford
2019-07-06 3:26 ` Faraz Shahbazker
2019-07-30 0:00 ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
1 sibling, 2 replies; 8+ messages in thread
From: Richard Sandiford @ 2019-07-04 17:23 UTC (permalink / raw)
To: Matthew Fortune; +Cc: newlib
Hi,
Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Hi,
>
> As part of a long term plan to reduce the amount of hand written .set noreorder
> code, I have reworked the crt0.S file so that the assembler can fill delay
> slots instead of them being explicitly filled. The reason for doing this is
> to enable future auto-conversion of delay slot branches to 'compact' branches
> present in the R6 architecture. Auto-conversion is not possible in a .set
> noreorder block as any delay slot branch with a non-NOP delay slot would have
> to be reordered!!! to convert to a compact branch without a delay slot.
> Writing code in a natural linear order is (subjectively) also much simpler to
> digest and maintain.
>
> One ugly piece of code had to be reworked in the zerobss loop which was using
> a pseudo-instruction BLTU as the branch. The structure of the old-loop was
> clearly aiming to produce a tight loop with one instruction and the delay slot
> filled but the expansion of the BLTU would have undone this anyway. This has
> been reworked to create the kind of loop originally intended and have the
> assembler fill the delay slot. The precise behaviour of the loop is subtly
> different from before for two reasons:
>
> 1) When the _fbss and _end symbols have the same value then the old loop would
> have written zero to every address from _fbss to the end of memory (or an
> exception occurred). The new loop is skipped if the two symbols are the same.
> 2) The old loop wrote zero to address of _end which is past the end of the
> bss range. The new loop does not do this.
> 3) When _fbss is greater then _end at the start then the old loop would have
> written one element and exited. The new loop will attempt to write zero
> to every address from _fbss to the end of memory, wrap and continue to
> _end (or hit an exception). This change in behaviour is fine as the
> scenario is invalid anyway.
> 4) The _end marker is now aligned to 4-bytes to ensure that the last element
> written does not reach beyond the address of _end. This is also necessary
> as the termination condition is an equality test instead of an ordered
> test so (_end - _fbss) must be a multiple of 4-bytes.
Sorry to jump on this old patch, but I couldn't see anything that did
(4). I was trying to test mipsisa64-elf with idt64.ld and many tests
end up with an _end that isn't four-byte aligned, leading to an
"infinite" zeroing loop.
I guess we should add:
. = ALIGN(4);
in front of:
PROVIDE (end = .);
_end = .;
for each script that doesn't already align to 4 or beyond.
Thanks,
Richard
>
> Delay slot filling will occur when libgloss is built with GCC and an
> optimisation level greater than zero. This gets translated to an assembler
> optimisation level of '2'.
>
> All instance of JAL <reg> have been changed to JALR <reg> as there is no
> special handling in the JAL macro in binutils for a register operand and
> JALR is the real underlying instruction.
>
> This change is primarily verified by code inspection but has also been run
> through some small test programs.
>
> Thanks,
> Matthew
>
> libgloss/
>
> * mips/crt0.S: Remove .set noreorder throughout. Change JAL <reg> to
> JALR <reg> throughout.
> (zerobss): Open code the bltu macro instruction so that the
> zero-loop does not have a NOP in the branch delay slot.
> ---
> libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
> 1 file changed, 18 insertions(+), 35 deletions(-)
>
> diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
> index 599e79c..f66ef1b 100644
> --- a/libgloss/mips/crt0.S
> +++ b/libgloss/mips/crt0.S
> @@ -57,13 +57,14 @@
> .globl _start
> .ent _start
> _start:
> - .set noreorder
> #ifdef __mips_embedded_pic
> #define PICBASE start_PICBASE
> + .set noreorder
> PICBASE = .+8
> bal PICBASE
> nop
> move s0,$31
> + .set reorder
> #endif
> #if __mips<3
> # define STATUS_MASK (SR_CU1|SR_PE)
> @@ -89,9 +90,7 @@ _start:
> /* Avoid hazard from FPU enable and other SR changes. */
> LA (t0, hardware_hazard_hook)
> beq t0,zero,1f
> - nop
> - jal t0
> - nop
> + jalr t0
> 1:
>
> /* Check for FPU presence. Don't check if we know that soft_float is
> @@ -105,11 +104,8 @@ _start:
> mfc1 t1,fp1
> nop
> bne t0,t2,1f /* check for match */
> - nop
> bne t1,zero,1f /* double check */
> - nop
> j 2f /* FPU is present. */
> - nop
> #endif
> 1:
> /* FPU is not present. Set status register to say that. */
> @@ -119,9 +115,7 @@ _start:
> /* Avoid hazard from FPU disable. */
> LA (t0, hardware_hazard_hook)
> beq t0,zero,2f
> - nop
> - jal t0
> - nop
> + jalr t0
> 2:
>
>
> @@ -129,7 +123,6 @@ _start:
> doesn't get confused. */
> LA (v0, 3f)
> jr v0
> - nop
> 3:
> LA (gp, _gp) # set the global data pointer
> .end _start
> @@ -145,21 +138,20 @@ _start:
> zerobss:
> LA (v0, _fbss)
> LA (v1, _end)
> -3:
> - sw zero,0(v0)
> - bltu v0,v1,3b
> - addiu v0,v0,4 # executed in delay slot
> -
> + beq v0,v1,2f
> +1:
> + addiu v0,v0,4
> + sw zero,-4(v0)
> + bne v0,v1,1b
> +2:
> la t0, __lstack # make a small stack so we
> addiu sp, t0, STARTUP_STACK_SIZE # can run some C code
> la a0, __memsize # get the usable memory size
> jal get_mem_info
> - nop
>
> /* setup the stack pointer */
> LA (t0, __stack) # is __stack set ?
> bne t0,zero,4f
> - nop
>
> /* NOTE: a0[0] contains the amount of memory available, and
> not the last memory address. */
> @@ -189,19 +181,14 @@ zerobss:
> init:
> LA (t9, hardware_init_hook) # init the hardware if needed
> beq t9,zero,6f
> - nop
> - jal t9
> - nop
> + jalr t9
> 6:
> LA (t9, software_init_hook) # init the hardware if needed
> beq t9,zero,7f
> - nop
> - jal t9
> - nop
> + jalr t9
> 7:
> LA (a0, _fini)
> jal atexit
> - nop
>
> #ifdef GCRT0
> .globl _ftext
> @@ -209,12 +196,10 @@ init:
> LA (a0, _ftext)
> LA (a1, _etext)
> jal monstartup
> - nop
> #endif
>
>
> jal _init # run global constructors
> - nop
>
> addiu a1,sp,32 # argv = sp + 32
> addiu a2,sp,40 # envp = sp + 40
> @@ -225,13 +210,13 @@ init:
> sw zero,(a1)
> sw zero,(a2)
> #endif
> - jal main # call the program start function
> move a0,zero # set argc to 0
> + jal main # call the program start function
>
> # fall through to the "exit" routine
> + move a0,v0 # pass through the exit code
> jal exit # call libc exit to run the G++
> # destructors
> - move a0,v0 # pass through the exit code
> .end init
>
>
> @@ -257,27 +242,25 @@ _exit:
> /* Need to reinit PICBASE, since we might be called via exit()
> rather than via a return path which would restore old s0. */
> #define PICBASE exit_PICBASE
> + .set noreorder
> PICBASE = .+8
> bal PICBASE
> nop
> move s0,$31
> + .set reorder
> #endif
> #ifdef GCRT0
> LA (t0, _mcleanup)
> - jal t0
> - nop
> + jalr t0
> #endif
> LA (t0, hardware_exit_hook)
> beq t0,zero,1f
> - nop
> - jal t0
> - nop
> + jalr t0
> 1:
>
> # break instruction can cope with 0xfffff, but GAS limits the range:
> break 1023
> b 7b # but loop back just in-case
> - nop
> .end _exit
>
> /* Assume the PICBASE set up above is no longer valid below here. */
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
2019-07-04 17:23 ` Richard Sandiford
@ 2019-07-06 3:26 ` Faraz Shahbazker
2019-07-30 0:00 ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
1 sibling, 0 replies; 8+ messages in thread
From: Faraz Shahbazker @ 2019-07-06 3:26 UTC (permalink / raw)
To: Richard Sandiford, Matthew Fortune; +Cc: newlib
Hi Richard,
Looks like an oversight. From an internal branch of that vintage, I can see that the alignment was intended, but only committed upstream for mti*.ld scripts and that too snuck in accidentally as part of an unrelated commit (see https://sourceware.org/ml/newlib-cvs/2014-q4/msg00025.html). Probably a mistake in splitting patches for submission?
I am not familiar will all these target boards. Okay to go ahead and fix all MIPS scripts?
Regards,
Faraz
________________________________
From: newlib-owner@sourceware.org <newlib-owner@sourceware.org> on behalf of Richard Sandiford <richard.sandiford@arm.com>
Sent: Thursday, July 4, 2019 10:23:49 AM
To: Matthew Fortune
Cc: newlib@sourceware.org
Subject: Re: [PATCH,MIPS 2/3] Enable reorder for crt0.S
Hi,
Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Hi,
>
> As part of a long term plan to reduce the amount of hand written .set noreorder
> code, I have reworked the crt0.S file so that the assembler can fill delay
> slots instead of them being explicitly filled. The reason for doing this is
> to enable future auto-conversion of delay slot branches to 'compact' branches
> present in the R6 architecture. Auto-conversion is not possible in a .set
> noreorder block as any delay slot branch with a non-NOP delay slot would have
> to be reordered!!! to convert to a compact branch without a delay slot.
> Writing code in a natural linear order is (subjectively) also much simpler to
> digest and maintain.
>
> One ugly piece of code had to be reworked in the zerobss loop which was using
> a pseudo-instruction BLTU as the branch. The structure of the old-loop was
> clearly aiming to produce a tight loop with one instruction and the delay slot
> filled but the expansion of the BLTU would have undone this anyway. This has
> been reworked to create the kind of loop originally intended and have the
> assembler fill the delay slot. The precise behaviour of the loop is subtly
> different from before for two reasons:
>
> 1) When the _fbss and _end symbols have the same value then the old loop would
> have written zero to every address from _fbss to the end of memory (or an
> exception occurred). The new loop is skipped if the two symbols are the same.
> 2) The old loop wrote zero to address of _end which is past the end of the
> bss range. The new loop does not do this.
> 3) When _fbss is greater then _end at the start then the old loop would have
> written one element and exited. The new loop will attempt to write zero
> to every address from _fbss to the end of memory, wrap and continue to
> _end (or hit an exception). This change in behaviour is fine as the
> scenario is invalid anyway.
> 4) The _end marker is now aligned to 4-bytes to ensure that the last element
> written does not reach beyond the address of _end. This is also necessary
> as the termination condition is an equality test instead of an ordered
> test so (_end - _fbss) must be a multiple of 4-bytes.
Sorry to jump on this old patch, but I couldn't see anything that did
(4). I was trying to test mipsisa64-elf with idt64.ld and many tests
end up with an _end that isn't four-byte aligned, leading to an
"infinite" zeroing loop.
I guess we should add:
. = ALIGN(4);
in front of:
PROVIDE (end = .);
_end = .;
for each script that doesn't already align to 4 or beyond.
Thanks,
Richard
>
> Delay slot filling will occur when libgloss is built with GCC and an
> optimisation level greater than zero. This gets translated to an assembler
> optimisation level of '2'.
>
> All instance of JAL <reg> have been changed to JALR <reg> as there is no
> special handling in the JAL macro in binutils for a register operand and
> JALR is the real underlying instruction.
>
> This change is primarily verified by code inspection but has also been run
> through some small test programs.
>
> Thanks,
> Matthew
>
> libgloss/
>
> * mips/crt0.S: Remove .set noreorder throughout. Change JAL <reg> to
> JALR <reg> throughout.
> (zerobss): Open code the bltu macro instruction so that the
> zero-loop does not have a NOP in the branch delay slot.
> ---
> libgloss/mips/crt0.S | 53 ++++++++++++++++++----------------------------------
> 1 file changed, 18 insertions(+), 35 deletions(-)
>
> diff --git a/libgloss/mips/crt0.S b/libgloss/mips/crt0.S
> index 599e79c..f66ef1b 100644
> --- a/libgloss/mips/crt0.S
> +++ b/libgloss/mips/crt0.S
> @@ -57,13 +57,14 @@
> .globl _start
> .ent _start
> _start:
> - .set noreorder
> #ifdef __mips_embedded_pic
> #define PICBASE start_PICBASE
> + .set noreorder
> PICBASE = .+8
> bal PICBASE
> nop
> move s0,$31
> + .set reorder
> #endif
> #if __mips<3
> # define STATUS_MASK (SR_CU1|SR_PE)
> @@ -89,9 +90,7 @@ _start:
> /* Avoid hazard from FPU enable and other SR changes. */
> LA (t0, hardware_hazard_hook)
> beq t0,zero,1f
> - nop
> - jal t0
> - nop
> + jalr t0
> 1:
>
> /* Check for FPU presence. Don't check if we know that soft_float is
> @@ -105,11 +104,8 @@ _start:
> mfc1 t1,fp1
> nop
> bne t0,t2,1f /* check for match */
> - nop
> bne t1,zero,1f /* double check */
> - nop
> j 2f /* FPU is present. */
> - nop
> #endif
> 1:
> /* FPU is not present. Set status register to say that. */
> @@ -119,9 +115,7 @@ _start:
> /* Avoid hazard from FPU disable. */
> LA (t0, hardware_hazard_hook)
> beq t0,zero,2f
> - nop
> - jal t0
> - nop
> + jalr t0
> 2:
>
>
> @@ -129,7 +123,6 @@ _start:
> doesn't get confused. */
> LA (v0, 3f)
> jr v0
> - nop
> 3:
> LA (gp, _gp) # set the global data pointer
> .end _start
> @@ -145,21 +138,20 @@ _start:
> zerobss:
> LA (v0, _fbss)
> LA (v1, _end)
> -3:
> - sw zero,0(v0)
> - bltu v0,v1,3b
> - addiu v0,v0,4 # executed in delay slot
> -
> + beq v0,v1,2f
> +1:
> + addiu v0,v0,4
> + sw zero,-4(v0)
> + bne v0,v1,1b
> +2:
> la t0, __lstack # make a small stack so we
> addiu sp, t0, STARTUP_STACK_SIZE # can run some C code
> la a0, __memsize # get the usable memory size
> jal get_mem_info
> - nop
>
> /* setup the stack pointer */
> LA (t0, __stack) # is __stack set ?
> bne t0,zero,4f
> - nop
>
> /* NOTE: a0[0] contains the amount of memory available, and
> not the last memory address. */
> @@ -189,19 +181,14 @@ zerobss:
> init:
> LA (t9, hardware_init_hook) # init the hardware if needed
> beq t9,zero,6f
> - nop
> - jal t9
> - nop
> + jalr t9
> 6:
> LA (t9, software_init_hook) # init the hardware if needed
> beq t9,zero,7f
> - nop
> - jal t9
> - nop
> + jalr t9
> 7:
> LA (a0, _fini)
> jal atexit
> - nop
>
> #ifdef GCRT0
> .globl _ftext
> @@ -209,12 +196,10 @@ init:
> LA (a0, _ftext)
> LA (a1, _etext)
> jal monstartup
> - nop
> #endif
>
>
> jal _init # run global constructors
> - nop
>
> addiu a1,sp,32 # argv = sp + 32
> addiu a2,sp,40 # envp = sp + 40
> @@ -225,13 +210,13 @@ init:
> sw zero,(a1)
> sw zero,(a2)
> #endif
> - jal main # call the program start function
> move a0,zero # set argc to 0
> + jal main # call the program start function
>
> # fall through to the "exit" routine
> + move a0,v0 # pass through the exit code
> jal exit # call libc exit to run the G++
> # destructors
> - move a0,v0 # pass through the exit code
> .end init
>
>
> @@ -257,27 +242,25 @@ _exit:
> /* Need to reinit PICBASE, since we might be called via exit()
> rather than via a return path which would restore old s0. */
> #define PICBASE exit_PICBASE
> + .set noreorder
> PICBASE = .+8
> bal PICBASE
> nop
> move s0,$31
> + .set reorder
> #endif
> #ifdef GCRT0
> LA (t0, _mcleanup)
> - jal t0
> - nop
> + jalr t0
> #endif
> LA (t0, hardware_exit_hook)
> beq t0,zero,1f
> - nop
> - jal t0
> - nop
> + jalr t0
> 1:
>
> # break instruction can cope with 0xfffff, but GAS limits the range:
> break 1023
> b 7b # but loop back just in-case
> - nop
> .end _exit
>
> /* Assume the PICBASE set up above is no longer valid below here. */
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts
2019-07-04 17:23 ` Richard Sandiford
2019-07-06 3:26 ` Faraz Shahbazker
@ 2019-07-30 0:00 ` Faraz Shahbazker
2019-07-30 7:39 ` Corinna Vinschen
1 sibling, 1 reply; 8+ messages in thread
From: Faraz Shahbazker @ 2019-07-30 0:00 UTC (permalink / raw)
To: newlib; +Cc: Faraz Shahbazker, Richard Sandiford
Left-over part of commit 84b2a020daa17d8ee5c9ec979c3d56f95e69573b
The _end marker must be aligned to 4-bytes to ensure that the last
element written does not reach beyond the address of _end. This is
also necessary as the termination condition is an equality test
instead of an ordered test so (_end - _fbss) must be a multiple of
4-bytes. The alignment is already correct for mti*.ld files, fix
it for all remaining MIPS scripts that don't already align to at
least 4.
libgloss/
* mips/array.ld: Align _end to 4 byte boundary.
* mips/ddb-kseg0.ld: Likewise.
* mips/ddb.ld: Likewise.
* mips/dve.ld: Likewise.
* mips/idt.ld: Likewise.
* mips/idt32.ld: Likewise.
* mips/idt64.ld: Likewise.
* mips/idtecoff.ld: Likewise.
* mips/jmr3904app-java.ld: Likewise.
* mips/jmr3904app.ld: Likewise.
* mips/jmr3904dram-java.ld: Likewise.
* mips/jmr3904dram.ld: Likewise.
* mips/lsi.ld: Likewise.
* mips/pmon.ld: Likewise.
* mips/sde32.ld: Likewise.
* mips/sde64.ld: Likewise.
---
libgloss/mips/array.ld | 1 +
libgloss/mips/ddb-kseg0.ld | 1 +
libgloss/mips/ddb.ld | 1 +
libgloss/mips/dve.ld | 1 +
libgloss/mips/idt.ld | 1 +
libgloss/mips/idt32.ld | 1 +
libgloss/mips/idt64.ld | 1 +
libgloss/mips/idtecoff.ld | 1 +
libgloss/mips/jmr3904app-java.ld | 1 +
libgloss/mips/jmr3904app.ld | 1 +
libgloss/mips/jmr3904dram-java.ld | 1 +
libgloss/mips/jmr3904dram.ld | 1 +
libgloss/mips/lsi.ld | 1 +
libgloss/mips/pmon.ld | 1 +
libgloss/mips/sde32.ld | 1 +
libgloss/mips/sde64.ld | 1 +
16 files changed, 16 insertions(+)
diff --git a/libgloss/mips/array.ld b/libgloss/mips/array.ld
index 0492ae5..5cdcf40 100644
--- a/libgloss/mips/array.ld
+++ b/libgloss/mips/array.ld
@@ -182,6 +182,7 @@ SECTIONS
*(.gnu.linkonce.b.*)
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
}
diff --git a/libgloss/mips/ddb-kseg0.ld b/libgloss/mips/ddb-kseg0.ld
index 8c1f926..a8643fd 100644
--- a/libgloss/mips/ddb-kseg0.ld
+++ b/libgloss/mips/ddb-kseg0.ld
@@ -135,6 +135,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/ddb.ld b/libgloss/mips/ddb.ld
index 299106f..7b899d4 100644
--- a/libgloss/mips/ddb.ld
+++ b/libgloss/mips/ddb.ld
@@ -135,6 +135,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/dve.ld b/libgloss/mips/dve.ld
index 96abbbe..e28c9c2 100644
--- a/libgloss/mips/dve.ld
+++ b/libgloss/mips/dve.ld
@@ -137,6 +137,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/idt.ld b/libgloss/mips/idt.ld
index b4608bf..a779569 100644
--- a/libgloss/mips/idt.ld
+++ b/libgloss/mips/idt.ld
@@ -143,6 +143,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
PROVIDE (end = .);
_end = .;
diff --git a/libgloss/mips/idt32.ld b/libgloss/mips/idt32.ld
index 5084df7..8d4e4d6 100644
--- a/libgloss/mips/idt32.ld
+++ b/libgloss/mips/idt32.ld
@@ -144,6 +144,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
PROVIDE (end = .);
_end = .;
diff --git a/libgloss/mips/idt64.ld b/libgloss/mips/idt64.ld
index a1121c6..8d996bc 100644
--- a/libgloss/mips/idt64.ld
+++ b/libgloss/mips/idt64.ld
@@ -145,6 +145,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
PROVIDE (end = .);
_end = .;
diff --git a/libgloss/mips/idtecoff.ld b/libgloss/mips/idtecoff.ld
index 0297c60..57111e1 100644
--- a/libgloss/mips/idtecoff.ld
+++ b/libgloss/mips/idtecoff.ld
@@ -95,6 +95,7 @@ SECTIONS
*(.gnu.linkonce.b.*)
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
}
diff --git a/libgloss/mips/jmr3904app-java.ld b/libgloss/mips/jmr3904app-java.ld
index 92de26d..c9539fd 100644
--- a/libgloss/mips/jmr3904app-java.ld
+++ b/libgloss/mips/jmr3904app-java.ld
@@ -99,6 +99,7 @@ SECTIONS
. = __stack ;
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/jmr3904app.ld b/libgloss/mips/jmr3904app.ld
index 367fc47..35c2fec 100644
--- a/libgloss/mips/jmr3904app.ld
+++ b/libgloss/mips/jmr3904app.ld
@@ -137,6 +137,7 @@ SECTIONS
. = __stack ;
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/jmr3904dram-java.ld b/libgloss/mips/jmr3904dram-java.ld
index 4c0681a..aedd434 100644
--- a/libgloss/mips/jmr3904dram-java.ld
+++ b/libgloss/mips/jmr3904dram-java.ld
@@ -98,6 +98,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/jmr3904dram.ld b/libgloss/mips/jmr3904dram.ld
index 9e7d255..168c318 100644
--- a/libgloss/mips/jmr3904dram.ld
+++ b/libgloss/mips/jmr3904dram.ld
@@ -95,6 +95,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/lsi.ld b/libgloss/mips/lsi.ld
index 780c31c..e350419 100644
--- a/libgloss/mips/lsi.ld
+++ b/libgloss/mips/lsi.ld
@@ -133,6 +133,7 @@ SECTIONS
*(.gnu.linkonce.b.*)
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
}
diff --git a/libgloss/mips/pmon.ld b/libgloss/mips/pmon.ld
index fff6f66..1608cd9 100644
--- a/libgloss/mips/pmon.ld
+++ b/libgloss/mips/pmon.ld
@@ -137,6 +137,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
end = .;
_end = .;
diff --git a/libgloss/mips/sde32.ld b/libgloss/mips/sde32.ld
index 7273107..4ef3b69 100644
--- a/libgloss/mips/sde32.ld
+++ b/libgloss/mips/sde32.ld
@@ -144,6 +144,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
PROVIDE (end = .);
_end = .;
diff --git a/libgloss/mips/sde64.ld b/libgloss/mips/sde64.ld
index 0bcbe98..40338a1 100644
--- a/libgloss/mips/sde64.ld
+++ b/libgloss/mips/sde64.ld
@@ -146,6 +146,7 @@ SECTIONS
*(COMMON)
}
+ . = ALIGN(4);
PROVIDE (end = .);
_end = .;
--
2.7.4
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts
2019-07-30 0:00 ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
@ 2019-07-30 7:39 ` Corinna Vinschen
0 siblings, 0 replies; 8+ messages in thread
From: Corinna Vinschen @ 2019-07-30 7:39 UTC (permalink / raw)
To: Faraz Shahbazker; +Cc: newlib, Richard Sandiford
[-- Attachment #1: Type: text/plain, Size: 1294 bytes --]
On Jul 30 00:00, Faraz Shahbazker wrote:
> Left-over part of commit 84b2a020daa17d8ee5c9ec979c3d56f95e69573b
>
> The _end marker must be aligned to 4-bytes to ensure that the last
> element written does not reach beyond the address of _end. This is
> also necessary as the termination condition is an equality test
> instead of an ordered test so (_end - _fbss) must be a multiple of
> 4-bytes. The alignment is already correct for mti*.ld files, fix
> it for all remaining MIPS scripts that don't already align to at
> least 4.
>
> libgloss/
> * mips/array.ld: Align _end to 4 byte boundary.
> * mips/ddb-kseg0.ld: Likewise.
> * mips/ddb.ld: Likewise.
> * mips/dve.ld: Likewise.
> * mips/idt.ld: Likewise.
> * mips/idt32.ld: Likewise.
> * mips/idt64.ld: Likewise.
> * mips/idtecoff.ld: Likewise.
> * mips/jmr3904app-java.ld: Likewise.
> * mips/jmr3904app.ld: Likewise.
> * mips/jmr3904dram-java.ld: Likewise.
> * mips/jmr3904dram.ld: Likewise.
> * mips/lsi.ld: Likewise.
> * mips/pmon.ld: Likewise.
> * mips/sde32.ld: Likewise.
> * mips/sde64.ld: Likewise.
Pushed. I just dropped the CVS ChangeLog remnant from the commit
message. We really don't do that anymore.
Thanks,
Corinna
--
Corinna Vinschen
Cygwin Maintainer
Red Hat
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-07-30 7:39 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-18 12:14 [PATCH,MIPS 2/3] Enable reorder for crt0.S Matthew Fortune
2014-11-26 18:00 ` Steve Ellcey
2014-11-27 0:15 ` Matthew Fortune
2014-11-27 0:28 ` Jeff Johnston
2019-07-04 17:23 ` Richard Sandiford
2019-07-06 3:26 ` Faraz Shahbazker
2019-07-30 0:00 ` [PATCH, MIPS] Align _end symbol to at least 4 in all MIPS scripts Faraz Shahbazker
2019-07-30 7:39 ` Corinna Vinschen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).