mips-elf (10.2.0) - Question about floating point constants

public inbox for gcc-help@gcc.gnu.org

* mips-elf (10.2.0) - Question about floating point constants
@ 2021-03-24 10:33 Yt Dl
  2021-09-26  7:43 ` Yt Dl
  0 siblings, 1 reply; 2+ messages in thread
From: Yt Dl @ 2021-03-24 10:33 UTC
  To: gcc-help

Hello,

I am working on some code targeting vr4300 using mips-elf-gcc (on msys2),
in which gcc was configured with:

--build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --prefix="./"
--target=mips-elf --enable-languages=c,c++ --without-headers --with-newlib
--with-gnu-as=./bin/mips-elf-as.exe --with-gnu-ld=./bin/mips-elf-ld.exe
--enable-checking=release --enable-shared --enable-shared-libgcc
--disable-decimal-float --disable-gold --disable-libatomic
--disable-libgomp --disable-libitm --disable-libquadmath
--disable-libquadmath-support --disable-libsanitizer --disable-libssp
--disable-libunwind-exceptions --disable-libvtv --disable-multilib
--disable-nls --disable-rpath --disable-symvers --disable-threads
--disable-win32-registry --enable-lto --enable-plugin --enable-static
--without-included-gettext

While everything is working, I found that gcc always places single
precision floating point constants into data sections instead of using
coprocessor moves; this may not always be optimal.

Please consider the following example:

// ----------------------------------------------------------------
// cctest.c
// ----------------------------------------------------------------

extern struct {
    float x;
    float y;
    float z;
} var;

void *test() {
    float t;

    t = 5.0;
    var.x = var.x + t;
    var.y = 10.0;
    var.z = 60.0;
    return (void*)&var;
}
// ----------------------------------------------------------------

To the best of my knowledge, I would expect the compiler to produce
something like:

; ----------------------------------------------------------------
lui $2, %hi(var)
lui $1, 0x40A0          ; 5.0
addiu $2,$2,%lo(var)
mtc1 $1, $f2
lwc1 $f0, 0x0($2)
lui $3, 0x4120          ; 10.0
lui $4, 0x4270          ; 60.0
sw $3, 0x4($2)
add.s $f0, $f0, $f2
sw $4, 0x8($2)
jr $31
swc1 $f0, 0x0($2)
; ----------------------------------------------------------------

However, gcc produces:

; ----------------------------------------------------------------
; cctest.s
; ----------------------------------------------------------------

; .text
lui $3,%hi(var)
lui $2,%hi($LC0)
lwc1 $f0,%lo(var)($3)
lwc1 $f2,%lo($LC0)($2)
lui $5,%hi($LC1)
add.s $f0,$f0,$f2
addiu $2,$3,%lo(var)
lui $4,%hi($LC2)
swc1 $f0,%lo(var)($3)
lwc1 $f0,%lo($LC1)($5)
swc1 $f0,4($2)
lwc1 $f0,%lo($LC2)($4)
jr $31
swc1 $f0,8($2)

; .rodata
.align 2
$LC0:
.word 1084227584
.align 2
$LC1:
.word 1092616192
.align 2
$LC2:
.word 1114636288
; ----------------------------------------------------------------
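The decimal .word values above are the IEEE-754 single-precision bit
patterns of 5.0, 10.0 and 60.0, and their upper halves are the lui
immediates used in the hand-written listing. A minimal sketch to check that
correspondence on any host with a 32-bit float follows; the helper names
(float_bits, lui_immediate, single_lui_suffices) are made up for
illustration and are not part of gcc.

```c
#include <stdint.h>
#include <string.h>

/* Return the IEEE-754 single-precision bit pattern of f
 * (assumes float is 32 bits wide, as on the hosts discussed here). */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to reinterpret */
    return bits;
}

/* Upper 16 bits of the pattern: the immediate a lui would carry. */
static uint16_t lui_immediate(float f)
{
    return (uint16_t)(float_bits(f) >> 16);
}

/* 1 when the low 16 bits are zero, i.e. lui alone builds the pattern
 * and no extra ori is needed before the move to the FPU. */
static int single_lui_suffices(float f)
{
    return (float_bits(f) & 0xFFFFu) == 0;
}
```

For example, float_bits(5.0f) is 1084227584 (0x40A00000), matching $LC0,
and lui_immediate(5.0f) is 0x40A0. All three constants in cctest.c have a
zero low half, which is why one lui per constant is enough here.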

with the following flags given:

-G0 -fomit-frame-pointer -fno-PIC -fno-stack-protector -fno-common
-fno-zero-initialized-in-bss -mips3 -march=vr4300 -mtune=vr4300 -mabi=32
-mlong32 -mno-shared -mgp32 -mhard-float -mno-check-zero-division
-mno-abicalls -mno-memcpy -mbranch-likely -O3

According to the VR4300/VR4305/VR4310 manual (p222, p230), both lwc1 and
mtc1 take 1 cycle to complete and interlock for 1 cycle when a load-use
hazard occurs; thus, using mtc1 should save some space without loss of
performance in this case.
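To put a rough number on the space saving, the two listings can be compared
by counting 4-byte instructions plus 4-byte .rodata words. The sketch below
only does that arithmetic; the counts (12 instructions for the hand-written
version, 14 instructions plus 3 literal words for the gcc output) are read
off the listings as posted, the function names are hypothetical, and
alignment padding or section overhead is ignored.

```c
/* Sizes on MIPS III in 32-bit mode: fixed 4-byte instructions, and one
 * 4-byte .rodata word per single-precision literal. */
enum { INSN_BYTES = 4, WORD_BYTES = 4 };

/* Total footprint: instructions in .text plus literal words in .rodata. */
static unsigned footprint_bytes(unsigned insns, unsigned rodata_words)
{
    return insns * INSN_BYTES + rodata_words * WORD_BYTES;
}

/* Hand-written listing: 12 instructions, no literal pool. */
static unsigned expected_listing_bytes(void)
{
    return footprint_bytes(12, 0);
}

/* gcc 10.2.0 listing: 14 instructions plus $LC0, $LC1, $LC2. */
static unsigned gcc_listing_bytes(void)
{
    return footprint_bytes(14, 3);
}
```

Under those assumptions the hand-written version occupies 48 bytes against
68 bytes for the gcc output, a difference of 20 bytes for this one function.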

In fact, some ancient compilers do use mtc1 to load floating-point
constants.

Therefore, I am wondering why gcc always prefers to place such constants in
data sections instead of transferring them from general purpose registers -
is this intended or not?

And, if this is not intended, how could I change this?

Many thanks.

* Re: mips-elf (10.2.0) - Question about floating point constants
  2021-03-24 10:33 mips-elf (10.2.0) - Question about floating point constants Yt Dl
@ 2021-09-26  7:43 ` Yt Dl
  0 siblings, 0 replies; 2+ messages in thread
From: Yt Dl @ 2021-09-26  7:43 UTC
  To: gcc-help

I noticed that ancient version(s) of gcc did the trick:

; ----------------------------------------------------------------
; cctest.egcs112.s

; compiler: egcs-mips-linux-1.1.2-4.i386
; binutils: binutils-mips-linux-2.9.5-3.i386
; options: -O2 -non_shared -mips3 -G 0 -mcpu=4300

; .text
.set noreorder
.cpload $25 ; Not sure why GPT was used with "-G 0"
.set reorder ; Allow as to reorder instructions
la $2,var
li.s $f6,5.00000000000000000000e0 ; This pseudo-op expands to lui + mtc1
l.s $f0,0($2)
li.s $f2,1.00000000000000000000e1
li.s $f4,6.00000000000000000000e1
add.s $f0,$f0,$f6
s.s $f2,4($2)
s.s $f4,8($2)
.set noreorder
.set nomacro
j $31
s.s $f0,0($2)
.set macro
.set reorder
; ----------------------------------------------------------------

It turns out that some optimizations for 32-bit code were probably dropped
at some point when 64-bit support was added.

Currently, in the gcc source, the only way defined in mips.c and mips.md to
materialize a single-precision immediate is to load it from memory. In
short, while this optimization is not possible with modern releases of gcc,
it can be done by switching back to 199x versions or by making a custom
build that adds it back manually.
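Short of patching the compiler, a source-level workaround may be possible
with inline assembly that moves the bit pattern from a GPR to an FPR
directly. The following is only a sketch under stated assumptions: the
helper name f32_from_bits is hypothetical, the "f" and "r" constraints and
the __mips_hard_float macro are the ones GCC provides for MIPS hard-float
targets, and the code it generates on 10.2.0 has not been checked here, so
the disassembly (including any mtc1 hazard handling) should be inspected
before relying on it. On other targets it falls back to a plain memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from its IEEE-754 bit pattern.
 * On MIPS hard-float builds with GCC this asks for a GPR-to-FPR move
 * (mtc1) instead of letting the compiler fold the value back into a
 * literal-pool load; elsewhere it is an ordinary bit reinterpretation. */
static inline float f32_from_bits(uint32_t bits)
{
    float f;
#if defined(__GNUC__) && defined(__mips__) && defined(__mips_hard_float)
    /* mtc1 rt, fs: copy general register rt into FPU register fs. */
    __asm__("mtc1 %1, %0" : "=f"(f) : "r"(bits));
#else
    memcpy(&f, &bits, sizeof f);
#endif
    return f;
}
```

In cctest.c this would be used as t = f32_from_bits(0x40A00000u) in place
of t = 5.0. The cost is that the constant becomes opaque to the optimizer,
so constant folding through it is lost.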

But I still think there is no reason to force gcc to load single-precision
immediates from memory even when FPRs are available, given that such an
optimization once existed in historical version(s) of gcc.

Hence, I would like to know why this was removed, and whether that was
intended.

Many thanks.
