* [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops @ 2019-07-21 17:39 Fredrik Noring 2019-07-21 22:22 ` Maciej W. Rozycki 0 siblings, 1 reply; 10+ messages in thread From: Fredrik Noring @ 2019-07-21 17:39 UTC (permalink / raw) To: Paul Burton, Matthew Fortune Cc: Maciej W. Rozycki, Jürgen Urban, gcc-patches Hi Paul, Matthew, Paul -- as I'm preparing the R5900 kernel patches there was a USB DMA series and breakage that needed attention. The fixes ending with ff2437befd8f ("usb: host: Fix excessive alignment restriction for local memory allocations") are now merged with Linus' kernel, and recommended for the R5900 series. The initial R5900 patch submission is getting closer. :) The present problem is related to GCC and the R5900 short loop bug[1]. It turns out GCC emits assembly code like loop: addiu $5,$5,1 addiu $4,$4,1 lb $2,-1($5) .set noreorder .set nomacro bne $2,$3,loop sb $2,-1($4) .set macro .set reorder that is undefined for the R5900 (this short loop has five instructions), for simple C code such as while ((*s++ = *p++) != '\n') ; in the kernel. The noreorder directive prohibits GAS from corrections, and GAS really ought to give an error for it, I think. In the meantime, I have a tool that does machine code analysis of ELF objects to catch undefined R5900 short loops, including those made with assembler macros in the kernel. [ In theory, GAS could actually insert NOPs prior to the noreorder directive, to make the loop longer that six instructions, but GAS does not have that kind of capability. Another option that GCC prevents is to place a NOP in the delay slot. ] A reasonable fix for GCC is perhaps to update gcc/config/mips/mips.md to not make explicit use of the branch delay slot, as suggested by the patch below? Then GCC will emit loop: addiu $5,$5,1 addiu $4,$4,1 lb $2,-1($5) sb $2,-1($4) bne $2,$3,loop that GAS will adjust in the ELF object to 4: 24a50001 addiu a1,a1,1 8: 24840001 addiu a0,a0,1 c: 80a2ffff lb v0,-1(a1) 10: a082ffff sb v0,-1(a0) 14: 1443fffb bne v0,v1,4 18: 00000000 nop where a NOP is placed in the delay slot to avoid the bug. For longer loops, this relies on GAS making appropriate use of the delay slot. I'm not sure if GAS is good at that, though? Fredrik References: [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gas/config/tc-mips.c;h=b7b4b6989a12d3091a02de7155fbea3adbf1c9d7;hb=HEAD#l7133 On the R5900 short loops need to be fixed by inserting a NOP in the branch delay slot. The short loop bug under certain conditions causes loops to execute only once or twice. We must ensure that the assembler never generates loops that satisfy all of the following conditions: - a loop consists of less than or equal to six instructions (including the branch delay slot); - a loop contains only one conditional branch instruction at the end of the loop; - a loop does not contain any other branch or jump instructions; - a branch delay slot of the loop is not NOP (EE 2.9 or later). We need to do this because of a hardware bug in the R5900 chip. diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md index e17b1d522f0..acd31a8960c 100644 --- a/gcc/config/mips/mips.md +++ b/gcc/config/mips/mips.md @@ -1091,6 +1091,7 @@ (define_delay (and (eq_attr "type" "branch") (not (match_test "TARGET_MIPS16")) + (not (match_test "TARGET_FIX_R5900")) (eq_attr "branch_likely" "yes")) [(eq_attr "can_delay" "yes") (nil) @@ -1100,6 +1101,7 @@ ;; not annul on false. (define_delay (and (eq_attr "type" "branch") (not (match_test "TARGET_MIPS16")) + (not (match_test "TARGET_FIX_R5900")) (ior (match_test "TARGET_CB_NEVER") (and (eq_attr "compact_form" "maybe") (not (match_test "TARGET_CB_ALWAYS"))) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-21 17:39 [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops Fredrik Noring @ 2019-07-21 22:22 ` Maciej W. Rozycki 2019-07-22 15:56 ` Fredrik Noring 0 siblings, 1 reply; 10+ messages in thread From: Maciej W. Rozycki @ 2019-07-21 22:22 UTC (permalink / raw) To: Fredrik Noring Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches Hi Fredrik, > The present problem is related to GCC and the R5900 short loop bug[1]. It > turns out GCC emits assembly code like > > loop: addiu $5,$5,1 > addiu $4,$4,1 > lb $2,-1($5) > .set noreorder > .set nomacro > bne $2,$3,loop > sb $2,-1($4) > .set macro > .set reorder > > that is undefined for the R5900 (this short loop has five instructions), > for simple C code such as > > while ((*s++ = *p++) != '\n') > ; > > in the kernel. The noreorder directive prohibits GAS from corrections, and > GAS really ought to give an error for it, I think. In the meantime, I have a > tool that does machine code analysis of ELF objects to catch undefined R5900 > short loops, including those made with assembler macros in the kernel. I think that should be a GAS warning really (similarly to macros that expand to multiple instructions in a delay slot) as people ought to be allowed to do what they wish, and then `-Werror' can be used for code quality enforcement (and possibly disabled on a case-by-case basis). > [ In theory, GAS could actually insert NOPs prior to the noreorder directive, > to make the loop longer that six instructions, but GAS does not have that > kind of capability. Another option that GCC prevents is to place a NOP in > the delay slot. ] Well, GAS does have that capability, although of course it is not enabled for `noreorder' code. For generated code I think however that usually it will be cheaper performance-wise if a non-trivial delay-slot instruction is never produced in the affected cases (i.e. a dummy NOP is always used). > A reasonable fix for GCC is perhaps to update gcc/config/mips/mips.md to not > make explicit use of the branch delay slot, as suggested by the patch below? > Then GCC will emit > > loop: addiu $5,$5,1 > addiu $4,$4,1 > lb $2,-1($5) > sb $2,-1($4) > bne $2,$3,loop > > that GAS will adjust in the ELF object to > > 4: 24a50001 addiu a1,a1,1 > 8: 24840001 addiu a0,a0,1 > c: 80a2ffff lb v0,-1(a1) > 10: a082ffff sb v0,-1(a0) > 14: 1443fffb bne v0,v1,4 > 18: 00000000 nop > > where a NOP is placed in the delay slot to avoid the bug. For longer loops, > this relies on GAS making appropriate use of the delay slot. I'm not sure if > GAS is good at that, though? I'm sort-of surprised that GCC has produced `reorder' code here, making it rely on GAS for delay slot scheduling. Have you used an unusual set of options that prevents GCC from making `noreorder' code, which as I recall is the usual default (not for the MIPS16 mode IIRC, as well as some obscure corner cases)? > On the R5900 short loops need to be fixed by inserting a NOP in the > branch delay slot. > > The short loop bug under certain conditions causes loops to execute > only once or twice. We must ensure that the assembler never > generates loops that satisfy all of the following conditions: > > - a loop consists of less than or equal to six instructions > (including the branch delay slot); > - a loop contains only one conditional branch instruction at the end > of the loop; > - a loop does not contain any other branch or jump instructions; > - a branch delay slot of the loop is not NOP (EE 2.9 or later). > > We need to do this because of a hardware bug in the R5900 chip. > > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md > index e17b1d522f0..acd31a8960c 100644 > --- a/gcc/config/mips/mips.md > +++ b/gcc/config/mips/mips.md > @@ -1091,6 +1091,7 @@ > > (define_delay (and (eq_attr "type" "branch") > (not (match_test "TARGET_MIPS16")) > + (not (match_test "TARGET_FIX_R5900")) > (eq_attr "branch_likely" "yes")) > [(eq_attr "can_delay" "yes") > (nil) > @@ -1100,6 +1101,7 @@ > ;; not annul on false. > (define_delay (and (eq_attr "type" "branch") > (not (match_test "TARGET_MIPS16")) > + (not (match_test "TARGET_FIX_R5900")) > (ior (match_test "TARGET_CB_NEVER") > (and (eq_attr "compact_form" "maybe") > (not (match_test "TARGET_CB_ALWAYS"))) > I think you need to modify the default `can_delay' attribute definition instead (in the same way). An improved future version might determine the exact conditions as noted in your proposed commit description, however I'd suggest making this simple change first. HTH, Maciej ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-21 22:22 ` Maciej W. Rozycki @ 2019-07-22 15:56 ` Fredrik Noring 2019-07-22 21:48 ` Maciej W. Rozycki 0 siblings, 1 reply; 10+ messages in thread From: Fredrik Noring @ 2019-07-22 15:56 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches Hi Maciej, I'm glad to hear from you again! > I think that should be a GAS warning really (similarly to macros that > expand to multiple instructions in a delay slot) as people ought to be > allowed to do what they wish, and then `-Werror' can be used for code > quality enforcement (and possibly disabled on a case-by-case basis). How can one reasonably prevent the bug when compiling a whole Linux distribution with thousands of packages if GAS merely warns and proceeds in many cases? > > [ In theory, GAS could actually insert NOPs prior to the noreorder directive, > > to make the loop longer that six instructions, but GAS does not have that > > kind of capability. Another option that GCC prevents is to place a NOP in > > the delay slot. ] > > Well, GAS does have that capability, although of course it is not enabled > for `noreorder' code. Hmm? I'm unable to make sense of that, since such NOPs are unaffected by noreorder. This is what I meant: loop: addiu $5,$5,1 addiu $4,$4,1 lb $2,-1($5) nop /* These two NOPs would prevent the */ nop /* bug while preserving noreorder. */ .set noreorder .set nomacro bne $2,$3,loop sb $2,-1($4) .set macro .set reorder > For generated code I think however that usually it > will be cheaper performance-wise if a non-trivial delay-slot instruction > is never produced in the affected cases (i.e. a dummy NOP is always used). I suppose so too. One observation is that R5900 BogoMIPS went down from 584 to 195 when switching from the generic kernel variant .set noreorder 1: bnez a0,1b addiu a0,a0,-1 .set reorder that is undefined for R5900 in arch/mips/lib/delay.c, to a special variant beqz a0,2f 1: addiu a0,a0,-1 bnez a0,1b 2: for the R5900 where GAS will place a NOP in the branch delay slot. With loop unrolling BogoMIPS could be inflated again, of course. In general, unrolling is a bit better for the R5900 than general MIPS. Perhaps GCC could be informed about that, possibly via profile-guided optimisation. > > A reasonable fix for GCC is perhaps to update gcc/config/mips/mips.md to not > > make explicit use of the branch delay slot, as suggested by the patch below? > > Then GCC will emit > > > > loop: addiu $5,$5,1 > > addiu $4,$4,1 > > lb $2,-1($5) > > sb $2,-1($4) > > bne $2,$3,loop > > > > that GAS will adjust in the ELF object to > > > > 4: 24a50001 addiu a1,a1,1 > > 8: 24840001 addiu a0,a0,1 > > c: 80a2ffff lb v0,-1(a1) > > 10: a082ffff sb v0,-1(a0) > > 14: 1443fffb bne v0,v1,4 > > 18: 00000000 nop > > > > where a NOP is placed in the delay slot to avoid the bug. For longer loops, > > this relies on GAS making appropriate use of the delay slot. I'm not sure if > > GAS is good at that, though? > > I'm sort-of surprised that GCC has produced `reorder' code here, making > it rely on GAS for delay slot scheduling. Have you used an unusual set of > options that prevents GCC from making `noreorder' code, which as I recall > is the usual default (not for the MIPS16 mode IIRC, as well as some > obscure corner cases)? I used the suggested patch, and recompiled the kernel with it, and verified with the machine code tool that there were no undefined loops. Wouldn't that be enough? > > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md > > index e17b1d522f0..acd31a8960c 100644 > > --- a/gcc/config/mips/mips.md > > +++ b/gcc/config/mips/mips.md > > @@ -1091,6 +1091,7 @@ > > > > (define_delay (and (eq_attr "type" "branch") > > (not (match_test "TARGET_MIPS16")) > > + (not (match_test "TARGET_FIX_R5900")) > > (eq_attr "branch_likely" "yes")) > > [(eq_attr "can_delay" "yes") > > (nil) > > @@ -1100,6 +1101,7 @@ > > ;; not annul on false. > > (define_delay (and (eq_attr "type" "branch") > > (not (match_test "TARGET_MIPS16")) > > + (not (match_test "TARGET_FIX_R5900")) > > (ior (match_test "TARGET_CB_NEVER") > > (and (eq_attr "compact_form" "maybe") > > (not (match_test "TARGET_CB_ALWAYS"))) > > > > I think you need to modify the default `can_delay' attribute definition > instead (in the same way). I tried to limit the case to branch delays only, which is a rough approximation. Call and jump delay slots are acceptable. Are you referring to this piece? ;; Can the instruction be put into a delay slot? (define_attr "can_delay" "no,yes" (if_then_else (and (eq_attr "type" "!branch,call,jump") (eq_attr "hazard" "none") (match_test "get_attr_insn_count (insn) == 1")) (const_string "yes") (const_string "no"))) > An improved future version might determine the > exact conditions as noted in your proposed commit description, however I'd > suggest making this simple change first. Learning the exact conditions, in a hardware sense, would be interesting indeed, as some short loops actually do appear to work despite being labeled as undefined. A fairly deep understanding of the R5900 implementation is essential for such an undertaking. :) Fredrik ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-22 15:56 ` Fredrik Noring @ 2019-07-22 21:48 ` Maciej W. Rozycki 2019-07-22 22:19 ` Jeff Law 2019-07-23 17:08 ` Fredrik Noring 0 siblings, 2 replies; 10+ messages in thread From: Maciej W. Rozycki @ 2019-07-22 21:48 UTC (permalink / raw) To: Fredrik Noring Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches Hi Fredrik, > I'm glad to hear from you again! I'm not dead, just distracted. > > I think that should be a GAS warning really (similarly to macros that > > expand to multiple instructions in a delay slot) as people ought to be > > allowed to do what they wish, and then `-Werror' can be used for code > > quality enforcement (and possibly disabled on a case-by-case basis). > > How can one reasonably prevent the bug when compiling a whole Linux > distribution with thousands of packages if GAS merely warns and proceeds > in many cases? I think the tools must not set a policy. By using `.set noreorder' the user told the toolchain he knows better and asked to keep its hands away. As I say you can use `-Werror' for code auditing. And in most cases manually scheduled delay slots in handcoded assembly are a sign of a problem with the piece of code affected, regardless of the R5900 erratum. > > > [ In theory, GAS could actually insert NOPs prior to the noreorder directive, > > > to make the loop longer that six instructions, but GAS does not have that > > > kind of capability. Another option that GCC prevents is to place a NOP in > > > the delay slot. ] > > > > Well, GAS does have that capability, although of course it is not enabled > > for `noreorder' code. > > Hmm? I'm unable to make sense of that, since such NOPs are unaffected by > noreorder. This is what I meant: > > loop: addiu $5,$5,1 > addiu $4,$4,1 > lb $2,-1($5) > nop /* These two NOPs would prevent the */ > nop /* bug while preserving noreorder. */ > .set noreorder > .set nomacro > bne $2,$3,loop > sb $2,-1($4) > .set macro > .set reorder See `nops_for_insn'. However again, as I noted, `.set noreorder' tells GAS not to analyse the dependencies for code within. There is no need to schedule this delay slot manually, and if this is generated code, then the producer (GCC) should have taken care of the hazards, be it architectural or errata. If this is manually written code, then the author asked for trouble. > > For generated code I think however that usually it > > will be cheaper performance-wise if a non-trivial delay-slot instruction > > is never produced in the affected cases (i.e. a dummy NOP is always used). > > I suppose so too. One observation is that R5900 BogoMIPS went down from > 584 to 195 when switching from the generic kernel variant > > .set noreorder > 1: bnez a0,1b > addiu a0,a0,-1 > .set reorder > > that is undefined for R5900 in arch/mips/lib/delay.c, to a special variant > > beqz a0,2f > 1: addiu a0,a0,-1 > bnez a0,1b > 2: > > for the R5900 where GAS will place a NOP in the branch delay slot. With > loop unrolling BogoMIPS could be inflated again, of course. In general, > unrolling is a bit better for the R5900 than general MIPS. Perhaps GCC > could be informed about that, possibly via profile-guided optimisation. Well, BogoMIPS is just an arbitrary number. There is a data dependency here, so manual delay slot scheduling was unavoidable. I'd suggest using this as a functional equivalent for the R5900: addiu a0,a0,1 1: addiu a0,a0,-1 bnez a0,1b and avoiding the irregularity for a0==0. > > > A reasonable fix for GCC is perhaps to update gcc/config/mips/mips.md to not > > > make explicit use of the branch delay slot, as suggested by the patch below? > > > Then GCC will emit > > > > > > loop: addiu $5,$5,1 > > > addiu $4,$4,1 > > > lb $2,-1($5) > > > sb $2,-1($4) > > > bne $2,$3,loop > > > > > > that GAS will adjust in the ELF object to > > > > > > 4: 24a50001 addiu a1,a1,1 > > > 8: 24840001 addiu a0,a0,1 > > > c: 80a2ffff lb v0,-1(a1) > > > 10: a082ffff sb v0,-1(a0) > > > 14: 1443fffb bne v0,v1,4 > > > 18: 00000000 nop > > > > > > where a NOP is placed in the delay slot to avoid the bug. For longer loops, > > > this relies on GAS making appropriate use of the delay slot. I'm not sure if > > > GAS is good at that, though? > > > > I'm sort-of surprised that GCC has produced `reorder' code here, making > > it rely on GAS for delay slot scheduling. Have you used an unusual set of > > options that prevents GCC from making `noreorder' code, which as I recall > > is the usual default (not for the MIPS16 mode IIRC, as well as some > > obscure corner cases)? > > I used the suggested patch, and recompiled the kernel with it, and verified > with the machine code tool that there were no undefined loops. Wouldn't that > be enough? It would break if GAS was invoked without `-mfix-r5900' (e.g. by using an explicit `-Wa,-mno-fix-r5900' option). > > > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md > > > index e17b1d522f0..acd31a8960c 100644 > > > --- a/gcc/config/mips/mips.md > > > +++ b/gcc/config/mips/mips.md > > > @@ -1091,6 +1091,7 @@ > > > > > > (define_delay (and (eq_attr "type" "branch") > > > (not (match_test "TARGET_MIPS16")) > > > + (not (match_test "TARGET_FIX_R5900")) > > > (eq_attr "branch_likely" "yes")) > > > [(eq_attr "can_delay" "yes") > > > (nil) > > > @@ -1100,6 +1101,7 @@ > > > ;; not annul on false. > > > (define_delay (and (eq_attr "type" "branch") > > > (not (match_test "TARGET_MIPS16")) > > > + (not (match_test "TARGET_FIX_R5900")) > > > (ior (match_test "TARGET_CB_NEVER") > > > (and (eq_attr "compact_form" "maybe") > > > (not (match_test "TARGET_CB_ALWAYS"))) > > > > > > > I think you need to modify the default `can_delay' attribute definition > > instead (in the same way). > > I tried to limit the case to branch delays only, which is a rough > approximation. Call and jump delay slots are acceptable. Are you referring > to this piece? > > ;; Can the instruction be put into a delay slot? > (define_attr "can_delay" "no,yes" > (if_then_else (and (eq_attr "type" "!branch,call,jump") > (eq_attr "hazard" "none") > (match_test "get_attr_insn_count (insn) == 1")) > (const_string "yes") > (const_string "no"))) Yes. My suggestion does not preclude limiting the workaround to branches only while being precise about what the situation is, i.e. branches still require a delay slot instruction (unlike MIPS16 or uMIPS compact branches) but no real instruction is suitable. I'd prefer if GCC actually scheduled the required dummy NOP instruction explicitly. > > An improved future version might determine the > > exact conditions as noted in your proposed commit description, however I'd > > suggest making this simple change first. > > Learning the exact conditions, in a hardware sense, would be interesting > indeed, as some short loops actually do appear to work despite being labeled > as undefined. A fairly deep understanding of the R5900 implementation is > essential for such an undertaking. :) Ack, although in this case I only meant what was stated in the commit description, i.e. avoiding forcing a dummy delay slot instruction where known for sure not to be needed, e.g. a long loop, a forward branch, etc. Maciej ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-22 21:48 ` Maciej W. Rozycki @ 2019-07-22 22:19 ` Jeff Law 2019-07-23 17:08 ` Fredrik Noring 1 sibling, 0 replies; 10+ messages in thread From: Jeff Law @ 2019-07-22 22:19 UTC (permalink / raw) To: Maciej W. Rozycki, Fredrik Noring Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches On 7/22/19 3:47 PM, Maciej W. Rozycki wrote: > Hi Fredrik, > >> I'm glad to hear from you again! > > I'm not dead, just distracted. > >>> I think that should be a GAS warning really (similarly to macros that >>> expand to multiple instructions in a delay slot) as people ought to be >>> allowed to do what they wish, and then `-Werror' can be used for code >>> quality enforcement (and possibly disabled on a case-by-case basis). >> >> How can one reasonably prevent the bug when compiling a whole Linux >> distribution with thousands of packages if GAS merely warns and proceeds >> in many cases? > > I think the tools must not set a policy. By using `.set noreorder' the > user told the toolchain he knows better and asked to keep its hands away. Agreed, 100%. Jeff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-22 21:48 ` Maciej W. Rozycki 2019-07-22 22:19 ` Jeff Law @ 2019-07-23 17:08 ` Fredrik Noring 2019-07-24 11:11 ` Maciej W. Rozycki 1 sibling, 1 reply; 10+ messages in thread From: Fredrik Noring @ 2019-07-23 17:08 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches Hi Maciej, > > How can one reasonably prevent the bug when compiling a whole Linux > > distribution with thousands of packages if GAS merely warns and proceeds > > in many cases? > > I think the tools must not set a policy. By using `.set noreorder' the > user told the toolchain he knows better and asked to keep its hands away. > > As I say you can use `-Werror' for code auditing. And in most cases > manually scheduled delay slots in handcoded assembly are a sign of a > problem with the piece of code affected, regardless of the R5900 erratum. Well, it seems practical to use unofficial tools and a patched GAS to fix this assembly bug, then. It's a grave problem for the R5900 and it needs to be reliably detected. > > > > [ In theory, GAS could actually insert NOPs prior to the noreorder directive, > > > > to make the loop longer that six instructions, but GAS does not have that > > > > kind of capability. Another option that GCC prevents is to place a NOP in > > > > the delay slot. ] > > > > > > Well, GAS does have that capability, although of course it is not enabled > > > for `noreorder' code. > > > > Hmm? I'm unable to make sense of that, since such NOPs are unaffected by > > noreorder. This is what I meant: > > > > loop: addiu $5,$5,1 > > addiu $4,$4,1 > > lb $2,-1($5) > > nop /* These two NOPs would prevent the */ > > nop /* bug while preserving noreorder. */ > > .set noreorder > > .set nomacro > > bne $2,$3,loop > > sb $2,-1($4) > > .set macro > > .set reorder > > See `nops_for_insn'. However again, as I noted, `.set noreorder' tells > GAS not to analyse the dependencies for code within. There is no need to > schedule this delay slot manually, and if this is generated code, then the > producer (GCC) should have taken care of the hazards, be it architectural > or errata. If this is manually written code, then the author asked for > trouble. I'm using the principle above to unobtrusively adjust problematic kernel code, via a short_loop_war macro. Here is one patched instance: --- a/arch/mips/boot/compressed/head.S +++ b/arch/mips/boot/compressed/head.S @@ -13,6 +13,7 @@ */ #include <asm/asm.h> +#include <asm/asmmacro.h> #include <asm/regdef.h> .set noreorder @@ -29,6 +30,7 @@ start: PTR_LA a0, _edata PTR_LA a2, _end 1: sw zero, 0(a0) + short_loop_war(1b) bne a2, a0, 1b addiu a0, a0, 4 @@ -48,6 +50,7 @@ start: jr k0 nop 3: + short_loop_war(3b) b 3b nop END(start) The short_loop_war macro is designed to be placed just before the branch, and it inserts NOPs as necessary for the R5900: #ifdef CONFIG_CPU_R5900 .macro short_loop_war loop_target .set instruction_count,2 + (. - \loop_target) / 4 .ifgt 7 - instruction_count .rept 7 - instruction_count nop .endr .endif .endm #else .macro short_loop_war loop_target .endm #endif > Well, BogoMIPS is just an arbitrary number. So presumably the noreorder BogoMIPS variant can be replaced with a single reorder variant that works with all MIPS implementations? > There is a data dependency here, so manual delay slot scheduling was > unavoidable. I'd suggest using this as a functional equivalent for the > R5900: > > addiu a0,a0,1 > 1: addiu a0,a0,-1 > bnez a0,1b > > and avoiding the irregularity for a0==0. Thanks! > > I used the suggested patch, and recompiled the kernel with it, and verified > > with the machine code tool that there were no undefined loops. Wouldn't that > > be enough? > > It would break if GAS was invoked without `-mfix-r5900' (e.g. by using an > explicit `-Wa,-mno-fix-r5900' option). I see, hmm. > > I tried to limit the case to branch delays only, which is a rough > > approximation. Call and jump delay slots are acceptable. Are you referring > > to this piece? > > > > ;; Can the instruction be put into a delay slot? > > (define_attr "can_delay" "no,yes" > > (if_then_else (and (eq_attr "type" "!branch,call,jump") > > (eq_attr "hazard" "none") > > (match_test "get_attr_insn_count (insn) == 1")) > > (const_string "yes") > > (const_string "no"))) > > Yes. My suggestion does not preclude limiting the workaround to branches > only while being precise about what the situation is, i.e. branches still > require a delay slot instruction (unlike MIPS16 or uMIPS compact branches) > but no real instruction is suitable. I'd prefer if GCC actually scheduled > the required dummy NOP instruction explicitly. OK. Fredrik ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-23 17:08 ` Fredrik Noring @ 2019-07-24 11:11 ` Maciej W. Rozycki 2019-07-24 15:03 ` Fredrik Noring 0 siblings, 1 reply; 10+ messages in thread From: Maciej W. Rozycki @ 2019-07-24 11:11 UTC (permalink / raw) To: Fredrik Noring Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches Fredrik, > > > How can one reasonably prevent the bug when compiling a whole Linux > > > distribution with thousands of packages if GAS merely warns and proceeds > > > in many cases? > > > > I think the tools must not set a policy. By using `.set noreorder' the > > user told the toolchain he knows better and asked to keep its hands away. > > > > As I say you can use `-Werror' for code auditing. And in most cases > > manually scheduled delay slots in handcoded assembly are a sign of a > > problem with the piece of code affected, regardless of the R5900 erratum. > > Well, it seems practical to use unofficial tools and a patched GAS to fix > this assembly bug, then. It's a grave problem for the R5900 and it needs to > be reliably detected. Why? What is wrong with using `-Werror'? Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to the assembly stage only if you are concerned about bad high-level language code quality interfering with your goal. > > See `nops_for_insn'. However again, as I noted, `.set noreorder' tells > > GAS not to analyse the dependencies for code within. There is no need to > > schedule this delay slot manually, and if this is generated code, then the > > producer (GCC) should have taken care of the hazards, be it architectural > > or errata. If this is manually written code, then the author asked for > > trouble. > > I'm using the principle above to unobtrusively adjust problematic kernel > code, via a short_loop_war macro. Here is one patched instance: > > --- a/arch/mips/boot/compressed/head.S > +++ b/arch/mips/boot/compressed/head.S > @@ -13,6 +13,7 @@ > */ > > #include <asm/asm.h> > +#include <asm/asmmacro.h> > #include <asm/regdef.h> > > .set noreorder > @@ -29,6 +30,7 @@ start: > PTR_LA a0, _edata > PTR_LA a2, _end > 1: sw zero, 0(a0) > + short_loop_war(1b) > bne a2, a0, 1b > addiu a0, a0, 4 > > @@ -48,6 +50,7 @@ start: > jr k0 > nop > 3: > + short_loop_war(3b) > b 3b > nop Is it needed here given that the delay slot instruction is a NOP anyway? Also this source does not need `.set noreorder', except for the loop around `1:' where a data dependency is present with the delay slot instruction. And I am surprised that it even assembles as it uses `.cprestore' without the required offset argument (not that this pseudo-op makes any sense here). And it's weird overall, e.g. it loads $ra explicitly rather than using JALR (or JAL even). But these are unrelated problems. > > Well, BogoMIPS is just an arbitrary number. > > So presumably the noreorder BogoMIPS variant can be replaced with a > single reorder variant that works with all MIPS implementations? Sure, BogoMIPS just records how quickly the delay loop runs, and therefore how many iterations are required for a given amount of time to spend twiddling thumbs. Maciej ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-24 11:11 ` Maciej W. Rozycki @ 2019-07-24 15:03 ` Fredrik Noring 2019-07-24 15:32 ` Maciej W. Rozycki 0 siblings, 1 reply; 10+ messages in thread From: Fredrik Noring @ 2019-07-24 15:03 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches Hi Maciej, > > > > How can one reasonably prevent the bug when compiling a whole Linux > > > > distribution with thousands of packages if GAS merely warns and proceeds > > > > in many cases? > > > > > > I think the tools must not set a policy. By using `.set noreorder' the > > > user told the toolchain he knows better and asked to keep its hands away. > > > > > > As I say you can use `-Werror' for code auditing. And in most cases > > > manually scheduled delay slots in handcoded assembly are a sign of a > > > problem with the piece of code affected, regardless of the R5900 erratum. > > > > Well, it seems practical to use unofficial tools and a patched GAS to fix > > this assembly bug, then. It's a grave problem for the R5900 and it needs to > > be reliably detected. > > Why? What is wrong with using `-Werror'? > > Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to > the assembly stage only if you are concerned about bad high-level language > code quality interfering with your goal. It appears unfeasible to fix thousands of build scripts to work like that. Although the numbers with actual MIPS code in them likely are very small, some may source macros and other stuff via various libraries. Some distributions, such as Gentoo, do have global CFLAGS but such mechanism are unreliable. I understand that one may want to take a principled stance with "hands away" and "author asked for trouble", but fact is that whomever put a given noreorder directive into a piece of code most likely didn't have R5900 errata in mind, and R5900 users most likely don't want the assembler to generate code that is known to cause subtle bugs of all imaginable kinds, including data corruption. Hence my recommendation. It's not really an argument because I'm not going to ask that the official GAS changes behaviour on this. > > 3: > > + short_loop_war(3b) > > b 3b > > nop > > Is it needed here given that the delay slot instruction is a NOP anyway? Ah, no. That is a left-over from the note that "a branch delay slot of the loop is not NOP (EE 2.9 or later)", as it appears a NOP is not enough for EE before 2.9. Perhaps the short_loop_war macro can be improved to give errors when misused. For example, it could potentially give an error if it is placed outside of noreorder, but it doesn't seem like GAS can inform the macro whether it is inside or outside. > Also this source does not need `.set noreorder', except for the loop > around `1:' where a data dependency is present with the delay slot > instruction. That data dependency can surely be reworked too, so as to make the whole piece reorderable? > And I am surprised that it even assembles as it uses > `.cprestore' without the required offset argument (not that this pseudo-op > makes any sense here). And it's weird overall, e.g. it loads $ra > explicitly rather than using JALR (or JAL even). But these are unrelated > problems. Good to know, thanks. :) Fredrik ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-24 15:03 ` Fredrik Noring @ 2019-07-24 15:32 ` Maciej W. Rozycki 2019-07-24 15:39 ` Jeff Law 0 siblings, 1 reply; 10+ messages in thread From: Maciej W. Rozycki @ 2019-07-24 15:32 UTC (permalink / raw) To: Fredrik Noring Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches Fredrik, > > Why? What is wrong with using `-Werror'? > > > > Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to > > the assembly stage only if you are concerned about bad high-level language > > code quality interfering with your goal. > > It appears unfeasible to fix thousands of build scripts to work like that. > Although the numbers with actual MIPS code in them likely are very small, > some may source macros and other stuff via various libraries. Some > distributions, such as Gentoo, do have global CFLAGS but such mechanism are > unreliable. I'd expect a sane distribution packager to prepare/patch any disobedient software packages to accept global distribution-specific flags, as there are often more important flags to pass which all packages need to be build with. However if they haven't, then put the flags in CC/CXX/etc. as you'd do with multilib selection options, or if that fails too, then wrap your compiler driver used for a distribution build into a shell script found according to $PATH (which may be non-standard) that adds any options required implicitly. I've done that many times when I found myself in situation where I had to override some silly hardcoded assumptions, including removing or transforming rather than adding options passed. > I understand that one may want to take a principled stance with "hands away" > and "author asked for trouble", but fact is that whomever put a given > noreorder directive into a piece of code most likely didn't have R5900 > errata in mind, and R5900 users most likely don't want the assembler to > generate code that is known to cause subtle bugs of all imaginable kinds, > including data corruption. Yes, and a warning is I think a reasonable solution. Alternatively maybe you could persuade binutils maintainers to have a `configure' option to make `-fatal-warnings' GAS's default for a given build. > > > 3: > > > + short_loop_war(3b) > > > b 3b > > > nop > > > > Is it needed here given that the delay slot instruction is a NOP anyway? > > Ah, no. That is a left-over from the note that "a branch delay slot of the > loop is not NOP (EE 2.9 or later)", as it appears a NOP is not enough for > EE before 2.9. Ah, OK then. > > Also this source does not need `.set noreorder', except for the loop > > around `1:' where a data dependency is present with the delay slot > > instruction. > > That data dependency can surely be reworked too, so as to make the whole > piece reorderable? The SW and ADDIU instructions can surely be swapped, e.g.: PTR_LA a0, _edata - 4 PTR_LA a2, _end 1: addiu a0, a0, 4 sw zero, 0(a0) bne a2, a0, 1b You can make it all `reorder' then. Note that this preserves the final out-of-range store, which I gather is deliberate for performance reasons. Maciej ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops 2019-07-24 15:32 ` Maciej W. Rozycki @ 2019-07-24 15:39 ` Jeff Law 0 siblings, 0 replies; 10+ messages in thread From: Jeff Law @ 2019-07-24 15:39 UTC (permalink / raw) To: Maciej W. Rozycki, Fredrik Noring Cc: Paul Burton, Matthew Fortune, Jürgen Urban, gcc-patches On 7/24/19 9:31 AM, Maciej W. Rozycki wrote: > Fredrik, > >>> Why? What is wrong with using `-Werror'? >>> >>> Or you could use `-Wa,-fatal-warnings' to limit making warnings fatal to >>> the assembly stage only if you are concerned about bad high-level language >>> code quality interfering with your goal. >> >> It appears unfeasible to fix thousands of build scripts to work like that. >> Although the numbers with actual MIPS code in them likely are very small, >> some may source macros and other stuff via various libraries. Some >> distributions, such as Gentoo, do have global CFLAGS but such mechanism are >> unreliable. > > I'd expect a sane distribution packager to prepare/patch any disobedient > software packages to accept global distribution-specific flags, as there > are often more important flags to pass which all packages need to be build > with. Getting good flags injection and being able to audit it is painful, but amazingly helpful (insert plug here for Nick's annobin/annocheck). Fixing packages to accept the standard methods of flag injection as well as addressing issues that arise is all standard procedure. Jeff ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-07-24 15:35 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-07-21 17:39 [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops Fredrik Noring 2019-07-21 22:22 ` Maciej W. Rozycki 2019-07-22 15:56 ` Fredrik Noring 2019-07-22 21:48 ` Maciej W. Rozycki 2019-07-22 22:19 ` Jeff Law 2019-07-23 17:08 ` Fredrik Noring 2019-07-24 11:11 ` Maciej W. Rozycki 2019-07-24 15:03 ` Fredrik Noring 2019-07-24 15:32 ` Maciej W. Rozycki 2019-07-24 15:39 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).