* [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs
@ 2013-07-29 17:36 Maciej W. Rozycki
2013-07-29 19:24 ` Richard Sandiford
0 siblings, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2013-07-29 17:36 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford
Hi,
A shortcoming of older versions of GAS makes branch swapping not happen
if the instruction to be reordered into a branch delay slot immediately
follows a delay slot of another branch. This happens to hit some MIPS16
call stubs, e.g. (from libgcc.a):
00000000 <__mips16_call_stub_sf_0>:
0: 03e09021 move s2,ra
4: 0040f809 jalr v0
8: 0040c821 move t9,v0
c: 44020000 mfc1 v0,$f0
10: 02400008 jr s2
14: 00000000 nop
The shortcoming has been recently lifted, but I gather GCC generally wants
to (and does) schedule delay slots elsewhere manually, so why not to do so
here as well.
The piece of code above is generated from libgcc/config/mips/mips16.S
with a macro called DELAYf() meant for pieces that read from an FPR.
There's a complementing macro called DELAYt() to write an FPR that does
schedule the delay slot manually. The reason for such an arrangement is I
believe a possibility that a read from CP1 may require another instruction
to complete before the value read is available in the destination GPR (a
coprocessor move delay slot).
I believe the only legacy MIPS processors that implemented the MIPS16 ASE
in its original variation (i.e. with no compact jumps, no SAVE/RESTORE,
and no extend instructions) were the LSI's TinyRISC cores. It's unclear
to me from TinyRISC documentation whether these cores suffered from the
coprocessor move delay slot. They featured a short three-stage pipeline
that had a bypass implemented to make data from memory loads available to
the immediately following instruction if needed, in parallel to the
destination register write back, to avoid load delay slots.
Unfortunately documentation does not mention whether such a bypass was
available for coprocessor moves or not, even though the instructions are
said to have the very same pipeline stages as memory moves. It is
therefore safe to assume coprocessor move delay slots were required.
OTOH no modern MIPS architecture processor requires coprocessor move
delay slots (they were lifted with the MIPS IV ISA legacy ISA already),
hence the current arrangement incurs unnecessary text space consumption
and a performance hit for all the modern targets. Especially as in many
cases the cases the next instruction executed after the branch delay slot
will not access the GPR anyway and thus will not cause any potential
pipeline stall even with any less efficient architecture implementations.
This change therefore enables manual delay-slot scheduling of
move-from-CP1 instructions whenever the stubs are built for the MIPS IV or
a newer ISA. It makes the stub above look like this:
00000000 <__mips16_call_stub_sf_0>:
0: 03e09021 move s2,ra
4: 0040f809 jalr v0
8: 0040c821 move t9,v0
c: 02400008 jr s2
10: 44020000 mfc1 v0,$f0
These stubs are I believe not really covered in our testing, because they
require a mixed standard-MIPS/MIPS16 environment. I have therefore
verified libgcc.a object code by inspection to be still correct after this
change, i.e. no change at all with current GAS (that otherwise schedules
these move-from-CP1 instructions into the following jump's delay slot
automatically) and the expected improved code with old GAS (that otherwise
inserts a NOP into that delay slot instead).
OK to apply?
2013-07-29 Maciej W. Rozycki <macro@codesourcery.com>
libgcc/
* config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV
ISA and up.
Maciej
gcc-mips16-stub-delay-slot.patch
Index: gcc-fsf-trunk-quilt/libgcc/config/mips/mips16.S
===================================================================
--- gcc-fsf-trunk-quilt.orig/libgcc/config/mips/mips16.S 2013-03-27 15:20:54.000000000 +0000
+++ gcc-fsf-trunk-quilt/libgcc/config/mips/mips16.S 2013-07-13 02:40:38.300930313 +0100
@@ -89,8 +89,13 @@ see the files COPYING3 and COPYING.RUNTI
OPCODE, OP2; \
.set reorder
+#if __mips >= 4
+/* Coprocessor moves are interlocked from the MIPS IV ISA up. */
+#define DELAYf(T, OPCODE, OP2) DELAYt (T, OPCODE, OP2)
+#else
/* Use "OPCODE. OP2" and jump to T. */
#define DELAYf(T, OPCODE, OP2) OPCODE, OP2; jr T
+#endif
/* MOVE_SF_BYTE0(D)
Move the first single-precision floating-point argument between
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs
2013-07-29 17:36 [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs Maciej W. Rozycki
@ 2013-07-29 19:24 ` Richard Sandiford
2013-07-29 19:32 ` Richard Sandiford
2013-07-29 21:36 ` Maciej W. Rozycki
0 siblings, 2 replies; 5+ messages in thread
From: Richard Sandiford @ 2013-07-29 19:24 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: gcc-patches
"Maciej W. Rozycki" <macro@codesourcery.com> writes:
> I believe the only legacy MIPS processors that implemented the MIPS16 ASE
> in its original variation (i.e. with no compact jumps, no SAVE/RESTORE,
> and no extend instructions) were the LSI's TinyRISC cores.
Ah, hadn't realised that the original version had no EXTEND instruction.
Which TinyRISC iteration was that? The LR4102 had it AFAIK.
FWIW GCC and binutils always assume that EXTEND is available and just
make a distinction between "original MIPS16" (which sounds like it might
not be as original as I'd thought) and "MIPS16e". The VR4121 and VR4131
were also in the "original MIPS16" category (no compact jumps, SAVE/RESTORE,
etc.). But they also had no FPU.
> These stubs are I believe not really covered in our testing, because they
> require a mixed standard-MIPS/MIPS16 environment.
What's the barrier to testing a mixed environment? The normal *-linux-gnu
configurations have no MIPS16 multilibs, so you should be able to test it
on a plain mips-linux-gnu configuration using --target_flags unix/-mips16.
FWIW I've been using the mips64-linux-gnu equivalent
(--target_flags unix/-mabi=32/-mips16) without problems.
Or if you don't want to test on GNU/Linux, you should be able to use something
like mips64-elf configured with whichever --with-arch= you like (and an
appropriate simulator). Long time since I tried that though. I prefer
testing on GNU/Linux because it also covers libgcc.so symbol visibility
and has better libgfortran support.
> libgcc/
> * config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV
> ISA and up.
OK, thanks, but please do run it through the testsuite first.
Richard
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs
2013-07-29 19:24 ` Richard Sandiford
@ 2013-07-29 19:32 ` Richard Sandiford
2013-07-29 21:36 ` Maciej W. Rozycki
1 sibling, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2013-07-29 19:32 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: gcc-patches
Richard Sandiford <rdsandiford@googlemail.com> writes:
> "Maciej W. Rozycki" <macro@codesourcery.com> writes:
>> I believe the only legacy MIPS processors that implemented the MIPS16 ASE
>> in its original variation (i.e. with no compact jumps, no SAVE/RESTORE,
>> and no extend instructions) were the LSI's TinyRISC cores.
>
> Ah, hadn't realised that the original version had no EXTEND instruction.
> Which TinyRISC iteration was that? The LR4102 had it AFAIK.
*sigh*. I only just realised you meant the _register_ extend instructions
(SEB, etc.). Sorry about that.
Richard
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs
2013-07-29 19:24 ` Richard Sandiford
2013-07-29 19:32 ` Richard Sandiford
@ 2013-07-29 21:36 ` Maciej W. Rozycki
2013-07-30 18:44 ` Maciej W. Rozycki
1 sibling, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2013-07-29 21:36 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches
On Mon, 29 Jul 2013, Richard Sandiford wrote:
> > I believe the only legacy MIPS processors that implemented the MIPS16 ASE
> > in its original variation (i.e. with no compact jumps, no SAVE/RESTORE,
> > and no extend instructions) were the LSI's TinyRISC cores.
>
> Ah, hadn't realised that the original version had no EXTEND instruction.
> Which TinyRISC iteration was that? The LR4102 had it AFAIK.
>
> FWIW GCC and binutils always assume that EXTEND is available and just
> make a distinction between "original MIPS16" (which sounds like it might
> not be as original as I'd thought) and "MIPS16e". The VR4121 and VR4131
> were also in the "original MIPS16" category (no compact jumps, SAVE/RESTORE,
> etc.). But they also had no FPU.
I meant the sign- and zero-extend instructions indeed, not the EXTEND
opcode (not really a distinct instruction, more like a prefix), sorry for
not being clear enough. Thanks for the reminder of the NEC silicon.
> > These stubs are I believe not really covered in our testing, because they
> > require a mixed standard-MIPS/MIPS16 environment.
>
> What's the barrier to testing a mixed environment? The normal *-linux-gnu
> configurations have no MIPS16 multilibs, so you should be able to test it
> on a plain mips-linux-gnu configuration using --target_flags unix/-mips16.
> FWIW I've been using the mips64-linux-gnu equivalent
> (--target_flags unix/-mabi=32/-mips16) without problems.
>
> Or if you don't want to test on GNU/Linux, you should be able to use something
> like mips64-elf configured with whichever --with-arch= you like (and an
> appropriate simulator). Long time since I tried that though. I prefer
> testing on GNU/Linux because it also covers libgcc.so symbol visibility
> and has better libgfortran support.
We don't have specific coverage, but something in the testsuite might
happen to pull one or more of these thunks indeed.
> > libgcc/
> > * config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV
> > ISA and up.
>
> OK, thanks, but please do run it through the testsuite first.
I'll see if I can do it -- the MIPS/Linux tree I used for recent MIPS32r2
MADD.fmt testing has no MIPS16 multilibs configured, so it might happen to
just work with -mips16 passed as an extra option (otherwise MIPS16 libs
would be automagically picked instead). I'll check if binaries executeed
really pulled any of the thunks concerned.
Maciej
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs
2013-07-29 21:36 ` Maciej W. Rozycki
@ 2013-07-30 18:44 ` Maciej W. Rozycki
0 siblings, 0 replies; 5+ messages in thread
From: Maciej W. Rozycki @ 2013-07-30 18:44 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches
On Mon, 29 Jul 2013, Maciej W. Rozycki wrote:
> > > These stubs are I believe not really covered in our testing, because they
> > > require a mixed standard-MIPS/MIPS16 environment.
> >
> > What's the barrier to testing a mixed environment? The normal *-linux-gnu
> > configurations have no MIPS16 multilibs, so you should be able to test it
> > on a plain mips-linux-gnu configuration using --target_flags unix/-mips16.
> > FWIW I've been using the mips64-linux-gnu equivalent
> > (--target_flags unix/-mabi=32/-mips16) without problems.
> >
> > Or if you don't want to test on GNU/Linux, you should be able to use something
> > like mips64-elf configured with whichever --with-arch= you like (and an
> > appropriate simulator). Long time since I tried that though. I prefer
> > testing on GNU/Linux because it also covers libgcc.so symbol visibility
> > and has better libgfortran support.
>
> We don't have specific coverage, but something in the testsuite might
> happen to pull one or more of these thunks indeed.
It is as I feared, the coverage is sparse. The only call/return stubs
pulled across the testsuite are:
__mips16_call_stub_sf_5
__mips16_call_stub_sc_0
__mips16_call_stub_df_0
__mips16_call_stub_df_1
__mips16_call_stub_df_2
__mips16_call_stub_df_10
(there are some call stubs too and some PIC stubs, but they all use
different code). Of these only the first suffers from the old GAS
shortcoming:
00000000 <__mips16_call_stub_sf_5>:
0: 03e09021 move s2,ra
4: 44846000 mtc1 a0,$f12
8: 44857000 mtc1 a1,$f14
c: 0040f809 jalr v0
10: 0040c821 move t9,v0
14: 44020000 mfc1 v0,$f0
18: 02400008 jr s2
1c: 00000000 nop
as only stubs that return a single float result are affected (ones that
return a single complex or a double result are not, because they use more
than one CP1 move in their epilogues and old GAS is capable enough to
schedule the last move into the JR's delay slot itself).
> > > libgcc/
> > > * config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV
> > > ISA and up.
> >
> > OK, thanks, but please do run it through the testsuite first.
>
> I'll see if I can do it -- the MIPS/Linux tree I used for recent MIPS32r2
> MADD.fmt testing has no MIPS16 multilibs configured, so it might happen to
> just work with -mips16 passed as an extra option (otherwise MIPS16 libs
> would be automagically picked instead). I'll check if binaries executeed
> really pulled any of the thunks concerned.
No regressions seen, applied now. Thanks for your review.
Maciej
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-07-30 18:36 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-29 17:36 [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs Maciej W. Rozycki
2013-07-29 19:24 ` Richard Sandiford
2013-07-29 19:32 ` Richard Sandiford
2013-07-29 21:36 ` Maciej W. Rozycki
2013-07-30 18:44 ` Maciej W. Rozycki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).