public inbox for gcc-bugs@sourceware.org
* [Bug target/110105] New: ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
From: pavel.morozkin at gmail dot com @ 2023-06-03 16:50 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105
Bug ID: 110105
Summary: ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pavel.morozkin at gmail dot com
Target Milestone: ---
This code:
__fp16 mul(__fp16 x, __fp16 y, __fp16 z)
{
    return x * y + z;
}
when compiled with:
gcc -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16
produces the following assembly code:
mul:
    vcvtb.f32.f16 s0, s0
    vcvtb.f32.f16 s1, s1
    vcvtb.f32.f16 s2, s2
    vfma.f32 s2, s0, s1
    vcvtb.f16.f32 s0, s2
    bx lr
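Here each operand is widened with vcvtb.f32.f16, the multiply-add is done with vfma.f32, and the result is narrowed back with vcvtb.f16.f32, whereas a single vfma.f16 is expected.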
From: pavel.morozkin at gmail dot com @ 2023-06-03 16:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105
--- Comment #1 from Pavel M <pavel.morozkin at gmail dot com> ---
Demo: https://godbolt.org/z/9s7eb9b1K.
From: pinskia at gcc dot gnu.org @ 2023-06-03 17:56 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
float mul(float x, float y, float z)
{
    return ((double)x) * y + z;
}
This also produces the conversions ...
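Pinski's example shows that such conversions follow from C's own typing rules. A minimal portable C11 sketch (the IS_DOUBLE macro and mul_float are hypothetical names added only for illustration): once x is cast to double, the usual arithmetic conversions convert y and z to double as well. The whole expression therefore has type double, and the float result is produced only by the conversion on return.

```c
/* Evaluates to 1 if expression e has type double, 0 otherwise
   (C11 _Generic; e itself is not evaluated). */
#define IS_DOUBLE(e) _Generic((e), double: 1, default: 0)

/* Pinski's example: the cast makes x a double, and the usual arithmetic
   conversions then convert y and z to double, so the multiply and the add
   are done in double. Only the return converts the result back to float. */
float mul(float x, float y, float z)
{
    return ((double)x) * y + z;
}

/* Hypothetical comparison: without the cast, the expression stays in float. */
float mul_float(float x, float y, float z)
{
    return x * y + z;
}
```

For example, with float x = 1.5f, y = 2.0f, z = 0.25f, IS_DOUBLE(((double)x) * y + z) is 1, IS_DOUBLE(x * y + z) is 0, and both functions return 3.25f.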
From: rsandifo at gcc dot gnu.org @ 2023-06-09 21:12 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rsandifo at gcc dot gnu.org
--- Comment #3 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
This is deliberate, since __fp16 is only a “storage type”:
all __fp16 arithmetic happens in float, a bit like all short
arithmetic happens in int.
It works if you use _Float16 instead:
_Float16 mul(_Float16 x, _Float16 y, _Float16 z)
{
    return x * y + z;
}
which compiles to:
    vfma.f16 s2, s0, s1
    vmov s0, s2 @ __fp16
    bx lr
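The short/int analogy in this comment corresponds to C's integer promotions. A minimal sketch, assuming the usual 16-bit short and 32-bit int (IS_INT and short_product are hypothetical names): both short operands are promoted to int before the multiply, so 200 * 200 yields 40000 even though that value does not fit in a 16-bit short.

```c
/* Evaluates to 1 if expression e has type int, 0 otherwise (C11 _Generic). */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

/* Both operands are promoted from short to int (integer promotions), so the
   product is computed and returned in int, never in short. */
int short_product(short a, short b)
{
    return a * b;
}
```

Here short plays the role of __fp16 and int the role of float: the narrow type is used for storage, and the arithmetic happens in the wider type.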
From: pavel.morozkin at gmail dot com @ 2023-06-12 18:48 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105
--- Comment #4 from Pavel M <pavel.morozkin at gmail dot com> ---
To: rsandifo@gcc.gnu.org
Thanks! I confused __fp16 with _Float16.
However, if __fp16 is only a “storage type”, then why does this code:
__fp16 mul(__fp16 x, __fp16 y)
{
    return x * y;
}
when compiled with -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16,
lead to this code:
mul:
    vmul.f16 s0, s0, s1
    bx lr
Here we see vmul.f16 instead of half->float->vmul.f32->float->half.
As a user, I expect half->float->vmul.f32->float->half (because __fp16 is only
a “storage type”).
Where are the conversions and the vmul.f32?
P.S. If the optimizer does this then, as far as I remember, half->float->op->float->half
does not always produce the same result as half->op->half: the results may
differ by +/-1 in the last bit. Any comments?
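The P.S. can be checked without ARM hardware by emulating binary16 rounding in software. This is a minimal sketch. It assumes IEEE-754 float and double with no excess precision (for example x86-64 with SSE). The helper names double_to_half_bits, mul_via_float, fma_via_float, mul_half and fma_half are hypothetical, not a GCC or libc API.

For a single multiply, the two paths agree. The product of two 11-bit significands needs at most 22 bits, so it is exact in float's 24-bit significand, and the only rounding is the final float-to-half conversion. For x*y + z, the float add rounds once and the conversion rounds again, and this double rounding can differ from a single binary16 rounding.

```c
#include <stdint.h>
#include <string.h>

/* Round a double to the nearest IEEE-754 binary16 value (ties to even) and
   return the binary16 bit pattern. Pure integer code, so no libm is needed. */
uint16_t double_to_half_bits(double v)
{
    uint64_t b;
    memcpy(&b, &v, sizeof b);
    uint16_t sign = (uint16_t)((b >> 48) & 0x8000);
    int exp = (int)((b >> 52) & 0x7FF);
    uint64_t frac = b & 0xFFFFFFFFFFFFFull;

    if (exp == 0x7FF)                 /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00 | (frac ? 0x200 : 0));
    if (exp == 0)                     /* zero or double subnormal: rounds to 0 */
        return sign;

    int e = exp - 1023;               /* v = mant * 2^(e - 52) */
    if (e > 15)                       /* |v| >= 2^16: overflows to infinity */
        return (uint16_t)(sign | 0x7C00);

    uint64_t mant = frac | (1ull << 52);
    /* Drop the bits below the binary16 quantum:
       2^(e-10) for normals, 2^-24 for subnormals. */
    int shift = (e >= -14) ? 42 : 28 - e;
    if (shift >= 54)                  /* |v| < 2^-25: rounds to zero */
        return sign;

    uint64_t kept = mant >> shift;
    uint64_t rem = mant & ((1ull << shift) - 1);
    uint64_t halfway = 1ull << (shift - 1);
    if (rem > halfway || (rem == halfway && (kept & 1)))
        kept++;                       /* round to nearest, ties to even */

    /* A carry out of the significand bumps the exponent field on its own. */
    uint32_t bits = (e >= -14) ? ((uint32_t)(e + 14) << 10) + (uint32_t)kept
                               : (uint32_t)kept;
    if (bits >= 0x7C00)               /* rounded up past 65504 */
        return (uint16_t)(sign | 0x7C00);
    return (uint16_t)(sign | bits);
}

/* Storage-type model: widen to float, compute in float, then round the
   float result to binary16. Widening binary16 to float is always exact. */
uint16_t mul_via_float(float x, float y)
{
    float p = x * y;      /* exact for binary16 inputs: <= 22 significant bits */
    return double_to_half_bits(p);
}

uint16_t fma_via_float(float x, float y, float z)
{
    float r = x * y + z;  /* product exact for binary16 inputs, so this matches
                             a fused vfma.f32: one rounding to float */
    return double_to_half_bits(r);
}

/* Reference results with a single rounding to binary16. The product of two
   binary16 values is exact in double. x*y + z is exact in double for the
   inputs used in the check below (not for all inputs). */
uint16_t mul_half(double x, double y)
{
    return double_to_half_bits(x * y);
}

uint16_t fma_half(double x, double y, double z)
{
    return double_to_half_bits(x * y + z);
}
```

Take x = 1090 * 2^-10, y = 481 * 2^-20 and z = 1, which are all exact binary16 values. Then x*y + z = 1 + 2^-11 + 2^-29.

- Rounding once to binary16 gives 0x3C01 (1 + 2^-10).
- Rounding to float first lands exactly on the halfway point 1 + 2^-11. That value then rounds to even, giving 0x3C00 (1.0), a one-ulp difference.

Comparing mul_via_float with mul_half over all pairs in [1, 2) finds no differences, as the exactness argument predicts. This is consistent with GCC narrowing a lone __fp16 multiply to vmul.f16 while keeping the float FMA in the original report, although the thread itself does not give that as the reason.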