[Bug target/110105] New: ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb

public inbox for gcc-bugs@sourceware.org

* [Bug target/110105] New: ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
From: pavel.morozkin at gmail dot com @ 2023-06-03 16:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105

            Bug ID: 110105
           Summary: ARM GCC: underoptimization: expected vfma.f16, actual
                    vcvtb-vfma.f32-vcvtb
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pavel.morozkin at gmail dot com
  Target Milestone: ---

This code:
__fp16 mul(__fp16 x, __fp16 y, __fp16 z)
{
    return x * y + z;
}

compiled as:
gcc -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16

produces the following assembler code:
mul:
        vcvtb.f32.f16   s0, s0
        vcvtb.f32.f16   s1, s1
        vcvtb.f32.f16   s2, s2
        vfma.f32        s2, s0, s1
        vcvtb.f16.f32   s0, s2
        bx      lr

Here GCC emits a vcvtb / vfma.f32 / vcvtb sequence, whereas a single vfma.f16
is expected.


* [Bug target/110105] ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
From: pavel.morozkin at gmail dot com @ 2023-06-03 16:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105

--- Comment #1 from Pavel M <pavel.morozkin at gmail dot com> ---
Demo: https://godbolt.org/z/9s7eb9b1K.


* [Bug target/110105] ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
From: pinskia at gcc dot gnu.org @ 2023-06-03 17:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
float mul(float x, float y, float z)
{
    return ((double)x) * y + z;
}

This analogous float/double code also produces the conversions ...


* [Bug target/110105] ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
From: rsandifo at gcc dot gnu.org @ 2023-06-09 21:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #3 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
This is deliberate, since __fp16 is only a “storage type”:
all __fp16 arithmetic happens on float, a bit like all short
arithmetic happens in int.

It works if you use _Float16 instead:

_Float16 mul(_Float16 x, _Float16 y, _Float16 z)
{
    return x * y + z;
}

        vfma.f16        s2, s0, s1
        vmov    s0, s2  @ __fp16
        bx      lr
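
The integer analogy above can be checked directly in C. The following is a
minimal sketch, not part of the original report; the function names are
hypothetical, and the value mentioned afterwards assumes int is at least
32 bits wide.

```c
#include <stddef.h>

/* Both operands are promoted to int before the addition, so the sum
   does not wrap at the range of short (assumes int is >= 32 bits). */
static int sum_of_shorts(short a, short b)
{
    return a + b;
}

/* The expression (a + b) has type int, not short. */
static size_t size_of_short_sum(void)
{
    short a = 1, b = 2;
    return sizeof(a + b);
}
```

With a 32-bit int, sum_of_shorts(30000, 30000) yields 60000, a value that does
not fit in a 16-bit short. __fp16 operands are widened to float before
arithmetic in the same way.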


* [Bug target/110105] ARM GCC: underoptimization: expected vfma.f16, actual vcvtb-vfma.f32-vcvtb
From: pavel.morozkin at gmail dot com @ 2023-06-12 18:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105

--- Comment #4 from Pavel M <pavel.morozkin at gmail dot com> ---
To: rsandifo@gcc.gnu.org

Thanks! I confused __fp16 with _Float16.

However, if __fp16 is only a “storage type”, then why does this code:
__fp16 mul(__fp16 x, __fp16 y)
{
    return x * y;
}

compiled with -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16

produce this output:
mul:
        vmul.f16        s0, s0, s1
        bx      lr

Here we see vmul.f16 instead of half->float->vmul.f32->float->half.

As a user, I expect half->float->vmul.f32->float->half (because __fp16 is only
a “storage type”).

Where are the conversions and the vmul.f32?

P.S. If the optimizer does this, then, as far as I remember,
half->float->op->float->half does not always produce the same result as
half->op->half. The results may differ by +/-1 in the last bit. Any comments?
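
On the P.S.: for a plain multiplication the two paths should agree. A half
value has an 11-bit significand (including the implicit bit) and a float has
24 bits, so the product of two half values needs at most 22 significand bits
and is exact in float; only the final conversion to half rounds. The sketch
below checks the significand part of that argument by brute force. It is an
illustration only, not taken from GCC, and the function name is hypothetical.

```c
#include <stdint.h>

/* Counts pairs of half-precision significands (1..2047, which covers
   both subnormal and normal values) whose integer product is NOT
   exactly representable as a float. */
static unsigned count_inexact_products(void)
{
    unsigned inexact = 0;
    for (uint32_t a = 1; a < 2048; a++) {
        for (uint32_t b = 1; b < 2048; b++) {
            uint32_t p = a * b;      /* at most 22 bits wide */
            float f = (float)p;      /* float has a 24-bit significand */
            if ((uint32_t)f != p)
                inexact++;
        }
    }
    return inexact;
}
```

The count is zero, which is consistent with a single vmul.f16 giving the same
result as widening to float, multiplying and narrowing. The same argument does
not carry over to x * y + z: the exact sum may need more than 24 bits, so
vfma.f32 followed by a conversion to half rounds twice and may differ from
vfma.f16 in the last bit.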
