[Bug target/100165] New: fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin

public inbox for gcc-bugs@sourceware.org

* [Bug target/100165] New: fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
From: pinskia at gcc dot gnu.org @ 2021-04-21  0:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

            Bug ID: 100165
           Summary: fmov could be used to zero out the upper bits instead
                    of movi/zip or movi/ins with __builtin_shuffle and
                    zero vector
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-*-*

Take:
typedef double V __attribute__((vector_size(16)));
typedef long long VI __attribute__((vector_size(16)));

V
foo (V x)
{
  return __builtin_shuffle (x, (V) { 0, 0 }, (VI) { 0, 3 });
}

---- CUT ----
Or
typedef float V __attribute__((vector_size(16)));
typedef int VI __attribute__((vector_size(16)));

V
foo (V x)
{
  return __builtin_shuffle (x, (V) { 0, 0, 0, 0 }, (VI) {0, 1, 4, 5});
}
---- CUT ----
Both should just produce:
fmov d0, d0
ret
---- CUT ----
The x86_64-specific version of this is PR 94680, which I just confirmed today.
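
To make the claim above concrete, here is a minimal sketch (assuming GCC, since `__builtin_shuffle` is a GCC vector extension) that checks both testcases against a byte-level reference: keep the low 8 bytes of the input and clear the high 8 bytes. The names `shuffle_v2df`, `shuffle_v4sf`, `zero_upper_half` and `check_shuffles` are hypothetical and only used for illustration; they are not part of the bug report or of GCC.

```c
#include <assert.h>
#include <string.h>

typedef double    V2D __attribute__((vector_size(16)));
typedef long long V2I __attribute__((vector_size(16)));
typedef float     V4F __attribute__((vector_size(16)));
typedef int       V4I __attribute__((vector_size(16)));

/* First testcase: lane 0 of x, then lane 1 of the zero vector. */
V2D shuffle_v2df(V2D x)
{
  return __builtin_shuffle(x, (V2D){0, 0}, (V2I){0, 3});
}

/* Second testcase: lanes 0-1 of x, then lanes 0-1 of the zero vector. */
V4F shuffle_v4sf(V4F x)
{
  return __builtin_shuffle(x, (V4F){0, 0, 0, 0}, (V4I){0, 1, 4, 5});
}

/* Hypothetical reference: copy the first 8 bytes (lanes at the lowest
   addresses) and leave the remaining 8 bytes zero. */
static void zero_upper_half(unsigned char dst[16], const void *src)
{
  memset(dst, 0, 16);
  memcpy(dst, src, 8);
}

/* Aborts via assert if either shuffle differs from the reference. */
void check_shuffles(V2D d, V4F f)
{
  unsigned char want[16];
  V2D rd = shuffle_v2df(d);
  V4F rf = shuffle_v4sf(f);

  zero_upper_half(want, &d);
  assert(memcmp(&rd, want, 16) == 0);

  zero_upper_half(want, &f);
  assert(memcmp(&rf, want, 16) == 0);
}
```

Calling `check_shuffles` with any inputs exercises both shuffles. Since the two functions compute the same 128-bit result for the same input bits, a single instruction that keeps the low 64 bits and clears the rest is sufficient for either one.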

* [Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
From: pinskia at gcc dot gnu.org @ 2021-08-25  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This variant:
V
foo (V x)
{
  return __builtin_shuffle (x, (V) { 0, 0, 0, 0 }, (VI) { 0, 1, 6, 7 });
}

produces:
        movi    v1.4s, 0
        ins     v0.d[1], v1.d[1]

That is an improvement, but a single fmov would still be better :).
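
The masks seen so far ({0, 3}, {0, 1, 4, 5} and {0, 1, 6, 7}) share one shape: the low half selects the first vector's lanes in order, and the high half selects any lane of the zero vector. A rough sketch of a check for that shape follows. `mask_zeroes_upper_half` is a hypothetical helper written for illustration, not GCC's actual matching code, and it assumes the mask indices are already reduced to the range 0 to 2n-1 and that the second shuffle operand is an all-zero vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when a two-input shuffle mask of n
   elements keeps lanes 0..n/2-1 of the first vector in place and takes
   every remaining lane from the second (assumed all-zero) vector. */
int mask_zeroes_upper_half(const int *mask, size_t n)
{
  assert(mask != NULL);

  if (n == 0 || n % 2 != 0)
    return 0;

  /* Low half: must be the identity on the first vector. */
  for (size_t i = 0; i < n / 2; i++)
    if (mask[i] != (int) i)
      return 0;

  /* High half: any index in n..2n-1 selects a zero lane. */
  for (size_t i = n / 2; i < n; i++)
    if (mask[i] < (int) n || mask[i] >= (int) (2 * n))
      return 0;

  return 1;
}
```

All three masks from the testcases pass this check, while an identity mask such as {0, 1, 2, 3} or a swapped one such as {4, 5, 0, 1} does not.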

* [Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
From: pinskia at gcc dot gnu.org @ 2023-11-12 21:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 56564
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56564&action=edit
Full testcase

* [Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
From: pinskia at gcc dot gnu.org @ 2023-11-12 21:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-11-12
     Ever confirmed|0                           |1

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Currently the trunk produces:
```
foo:
        ins     v0.d[1], xzr
        ret
foo1:
        movi    v31.4s, 0
        zip1    v0.2d, v0.2d, v31.2d
        ret
foo2:
        ins     v0.d[1], xzr
        ret
```

This is better than 10.x, but it still does not use fmov.

* [Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
From: pinskia at gcc dot gnu.org @ 2023-11-12 21:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Mine, I will handle this. Most likely for GCC 15 though.

* [Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
From: pinskia at gcc dot gnu.org @ 2023-11-12 21:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

* [Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
From: pinskia at gcc dot gnu.org @ 2024-02-27  8:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For the cases that produce ins, it should be easy to modify the pattern below
to emit fmov instead, that is, when `elt == 0`:

(define_insn "aarch64_simd_vec_set_zero<mode>"
  [(set (match_operand:VALLS_F16 0 "register_operand" "=w")
        (vec_merge:VALLS_F16
            (match_operand:VALLS_F16 1 "aarch64_simd_imm_zero" "")
            (match_operand:VALLS_F16 3 "register_operand" "0")
            (match_operand:SI 2 "immediate_operand" "i")))]
  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
  {
    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
  }
)
