[Bug target/100638] New: FP16 vector compare missed optimization on AArch64

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/100638] New: FP16 vector compare missed optimization on AArch64
@ 2021-05-17 13:50 tnfchris at gcc dot gnu.org
  2021-05-17 13:51 ` [Bug target/100638] " tnfchris at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2021-05-17 13:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100638

            Bug ID: 100638
           Summary: FP16 vector compare missed optimization on AArch64
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-*

The following testcase

```
#include <arm_neon.h>

void foo(float16_t *x, uint16x8_t *out) {
    float16x8x2_t xk = vld2q_f16(x);
    float16x8_t xk_re = xk.val[0];
    float16x8_t xk_im = xk.val[1];
    uint16x8_t theta_rx = xk_re < 0;
    uint16x8_t theta_ix = xk_im < 0;
    out[0] = theta_rx;
    out[1] = theta_ix;
}
```
on AArch64 with `-Ofast -march=armv8.2-a+fp16` generates

```
foo:
        ld2     {v0.8h - v1.8h}, [x0]
        mov     w2, -1
        fcvt    s2, h0
        fcvt    s18, h1
        dup     h3, v0.h[1]
        dup     h24, v0.h[2]
        dup     h23, v0.h[3]
        fcmpe   s2, #0.0
        fcvt    s3, h3
        fcvt    s24, h24
        dup     h22, v0.h[4]
        csel    w0, w2, wzr, mi
        fcvt    s23, h23
        fcmpe   s3, #0.0
        dup     v2.4h, w0
        dup     h17, v1.h[1]
        dup     h16, v1.h[2]
        csel    w0, w2, wzr, mi
        fcmpe   s24, #0.0
        dup     h7, v1.h[3]
        dup     h6, v1.h[4]
        dup     h5, v1.h[5]
        dup     h4, v1.h[6]
        dup     h3, v1.h[7]
        mov     v1.16b, v2.16b
        dup     h21, v0.h[5]
        fcvt    s22, h22
        fcvt    s17, h17
        dup     h20, v0.h[6]
        ins     v1.h[1], w0
        csel    w0, w2, wzr, mi
        fcmpe   s23, #0.0
        fcvt    s21, h21
        dup     h19, v0.h[7]
        fcvt    s20, h20
        fcvt    s16, h16
        ins     v1.h[2], w0
        fcvt    s7, h7
        csel    w0, w2, wzr, mi
        fcmpe   s22, #0.0
        fcvt    s19, h19
        fcvt    s6, h6
        fcvt    s5, h5
        fcvt    s4, h4
        ins     v1.h[3], w0
        fcvt    s3, h3
        csel    w0, w2, wzr, mi
        fcmpe   s21, #0.0
        ins     v1.h[4], w0
        csel    w0, w2, wzr, mi
        fcmpe   s20, #0.0
        ins     v1.h[5], w0
        csel    w0, w2, wzr, mi
        fcmpe   s19, #0.0
        ins     v1.h[6], w0
        csel    w0, w2, wzr, mi
        fcmpe   s18, #0.0
        ins     v1.h[7], w0
        csel    w0, w2, wzr, mi
        fcmpe   s17, #0.0
        dup     v0.4h, w0
        csel    w0, w2, wzr, mi
        fcmpe   s16, #0.0
        ins     v0.h[1], w0
        csel    w0, w2, wzr, mi
        fcmpe   s7, #0.0
        ins     v0.h[2], w0
        csel    w0, w2, wzr, mi
        fcmpe   s6, #0.0
        ins     v0.h[3], w0
        csel    w0, w2, wzr, mi
        fcmpe   s5, #0.0
        ins     v0.h[4], w0
        csel    w0, w2, wzr, mi
        fcmpe   s4, #0.0
        ins     v0.h[5], w0
        csel    w0, w2, wzr, mi
        fcmpe   s3, #0.0
        ins     v0.h[6], w0
        csel    w2, w2, wzr, mi
        ins     v0.h[7], w2
        stp     q1, q0, [x1]
        ret
```

instead of simply

```
foo:
        ld2     { v0.8h, v1.8h }, [x0]
        fcmlt   v2.8h, v0.8h, #0.0
        fcmlt   v0.8h, v1.8h, #0.0
        stp     q2, q0, [x1]
        ret
```

This is happening because veclower doesn't find a pattern for the FP16 vector
compare and then lowers to operations on scalar.

However even the lowered operations are inefficient:

```
        fcvt    s23, h23
        fcmpe   s23, #0.0
```

indicates that the backend doesn't know how to do this operation on fp16.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/100638] FP16 vector compare missed optimization on AArch64
  2021-05-17 13:50 [Bug target/100638] New: FP16 vector compare missed optimization on AArch64 tnfchris at gcc dot gnu.org
@ 2021-05-17 13:51 ` tnfchris at gcc dot gnu.org
  2021-05-22 23:53 ` pinskia at gcc dot gnu.org
  2024-01-26  1:58 ` [Bug target/100638] FP16 (vector) " pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2021-05-17 13:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100638

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/100638] FP16 vector compare missed optimization on AArch64
  2021-05-17 13:50 [Bug target/100638] New: FP16 vector compare missed optimization on AArch64 tnfchris at gcc dot gnu.org
  2021-05-17 13:51 ` [Bug target/100638] " tnfchris at gcc dot gnu.org
@ 2021-05-22 23:53 ` pinskia at gcc dot gnu.org
  2024-01-26  1:58 ` [Bug target/100638] FP16 (vector) " pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-05-22 23:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100638

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-05-22
   Target Milestone|12.0                        |---
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/100638] FP16 (vector) compare missed optimization on AArch64
  2021-05-17 13:50 [Bug target/100638] New: FP16 vector compare missed optimization on AArch64 tnfchris at gcc dot gnu.org
  2021-05-17 13:51 ` [Bug target/100638] " tnfchris at gcc dot gnu.org
  2021-05-22 23:53 ` pinskia at gcc dot gnu.org
@ 2024-01-26  1:58 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-26  1:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100638

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|FP16 vector compare missed  |FP16 (vector) compare
                   |optimization on AArch64     |missed optimization on
                   |                            |AArch64

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #0)
> However even the lowered operations are inefficient:
> 
> ```
>         fcvt    s23, h23
>         fcmpe   s23, #0.0
> ```
Actually that comes from expand:
```
;; _16 = _15 < 0.0;

(insn 48 47 49 (set (reg:SF 194)
        (float_extend:SF (reg:HF 102 [ _15 ]))) "/app/example.c":8:16 -1
     (nil))

(insn 49 48 50 (set (reg:HF 196)
        (const_double:HF 0.0 [0x0.0p+0])) "/app/example.c":8:16 -1
     (nil))

(insn 50 49 51 (set (reg:SF 195)
        (float_extend:SF (reg:HF 196))) "/app/example.c":8:16 -1
     (nil))

(insn 51 50 52 (set (reg:CCFPE 66 cc)
        (compare:CCFPE (reg:SF 194)
            (reg:SF 195))) "/app/example.c":8:16 -1
     (nil))
```

Which can reproduce with just a simple:
```
void foo(_Float16 *x, unsigned short *out) {
    *out = -(*x < 0.0f16);
}
```

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-01-26  1:58 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-17 13:50 [Bug target/100638] New: FP16 vector compare missed optimization on AArch64 tnfchris at gcc dot gnu.org
2021-05-17 13:51 ` [Bug target/100638] " tnfchris at gcc dot gnu.org
2021-05-22 23:53 ` pinskia at gcc dot gnu.org
2024-01-26  1:58 ` [Bug target/100638] FP16 (vector) " pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).