[Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64

* [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64
@ 2021-01-22 11:27 ktkachov at gcc dot gnu.org
  2021-01-22 12:36 ` [Bug target/98792] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2021-01-22 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792

            Bug ID: 98792
           Summary: Fail to use SHRN instructions for narrowing shift on
                    aarch64
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

#define N 1024
unsigned short res[N];
unsigned int in[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    res[i] = in[i] >> 3;
}

With -O3 -mcpu=neoverse-n1 on aarch64, GCC generates the following loop:
.L2:
        ldp     q1, q0, [x0]
        add     x0, x0, 32
        ushr    v1.4s, v1.4s, 3
        ushr    v0.4s, v0.4s, 3
        xtn     v2.4h, v1.4s
        xtn2    v2.8h, v0.4s
        str     q2, [x1], 16
        cmp     x0, x2
        bne     .L2

It could be using the SHRN narrowing shift instruction instead. LLVM can do it
(some other inefficiencies aside):
.LBB0_1:                                // %vector.body
                                        // =>This Inner Loop Header: Depth=1
        add     x11, x10, x8
        ldp     q0, q1, [x11]
        add     x8, x8, #32                     // =32
        cmp     x8, #1, lsl #12                 // =4096
        shrn    v0.4h, v0.4s, #3
        shrn    v1.4h, v1.4s, #3
        stp     d0, d1, [x9, #-8]
        add     x9, x9, #16                     // =16
        b.ne    .LBB0_1

Some backend patterns can probably handle it, but maybe the vectoriser can do
something useful earlier as well?
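
To make the equivalence concrete, here is a minimal C sketch (not taken from GCC or LLVM) of the two forms. The function names `shift_then_narrow` and `narrowing_shift` are hypothetical. On AArch64 it uses the ACLE intrinsics `vshrq_n_u32`, `vmovn_u32` and `vshrn_n_u32`; elsewhere it falls back to a scalar loop with the same per-lane semantics. It assumes the element count is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Two-step form: shift each 32-bit lane right by 3, then truncate to
   16 bits.  This corresponds to the USHR + XTN sequence.  */
void
shift_then_narrow (uint16_t *dst, const uint32_t *src, size_t n)
{
  assert (n % 4 == 0);
  for (size_t i = 0; i < n; i += 4)
    {
#if defined(__aarch64__)
      uint32x4_t shifted = vshrq_n_u32 (vld1q_u32 (src + i), 3);
      vst1_u16 (dst + i, vmovn_u32 (shifted));
#else
      for (size_t j = 0; j < 4; j++)
        dst[i + j] = (uint16_t) (src[i + j] >> 3);
#endif
    }
}

/* Fused form: one narrowing shift per group of four lanes.  This
   corresponds to a single SHRN.  */
void
narrowing_shift (uint16_t *dst, const uint32_t *src, size_t n)
{
  assert (n % 4 == 0);
  for (size_t i = 0; i < n; i += 4)
    {
#if defined(__aarch64__)
      vst1_u16 (dst + i, vshrn_n_u32 (vld1q_u32 (src + i), 3));
#else
      for (size_t j = 0; j < 4; j++)
        dst[i + j] = (uint16_t) (src[i + j] >> 3);
#endif
    }
}
```

Both functions should store identical values for any input, which is why replacing the USHR + XTN pair with SHRN is a pure instruction-selection improvement.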

* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
  2021-01-22 11:27 [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64 ktkachov at gcc dot gnu.org
@ 2021-01-22 12:36 ` rguenth at gcc dot gnu.org
  2021-03-07  1:29 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-22 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer would need such a concept, e.g. a named pattern plus a
vectorizer pattern that recognizes it.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
  2021-01-22 11:27 [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64 ktkachov at gcc dot gnu.org
  2021-01-22 12:36 ` [Bug target/98792] " rguenth at gcc dot gnu.org
@ 2021-03-07  1:29 ` pinskia at gcc dot gnu.org
  2021-09-02 10:21 ` pinskia at gcc dot gnu.org
  2023-12-16  2:57 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-03-07  1:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-03-07
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.


(insn 17 16 18 3 (set (reg:V8HI 109 [ vect__3.8 ])
        (vec_concat:V8HI (truncate:V4HI (reg:V4SI 105 [ vect__2.7 ]))
            (truncate:V4HI (reg:V4SI 107 [ vect__2.7 ])))) "t9.c":9:16 1942 {vec_pack_trunc_v4si}
     (expr_list:REG_DEAD (reg:V4SI 107 [ vect__2.7 ])
        (expr_list:REG_DEAD (reg:V4SI 105 [ vect__2.7 ])
            (nil))))
(insn 18 17 19 3 (set (mem:V8HI (post_inc:DI (reg:DI 92 [ ivtmp.16 ])) [2 MEM <vector(8) short unsigned int> [(short unsigned int *)_7]+0 S16 A128])
        (reg:V8HI 109 [ vect__3.8 ])) "t9.c":9:16 1161 {*aarch64_simd_movv8hi}
     (expr_list:REG_DEAD (reg:V8HI 109 [ vect__3.8 ])
        (expr_list:REG_INC (reg:DI 92 [ ivtmp.16 ])
            (nil))))
Part of the problem is the above: the narrowing is expressed as a single
vec_concat of two truncates (vec_pack_trunc_v4si).
So this might need to be done at the gimple level, such that we don't create
the vec_concat in the first place.
That is, if we had the RTL for:
        ushr    v1.4s, v1.4s, 3
        ushr    v0.4s, v0.4s, 3
        xtn     v2.4h, v1.4s
        xtn     v3.4h, v0.4s
        stp     d2, d3, [x1], 16
I think combine would have done its job.

* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
  2021-01-22 11:27 [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64 ktkachov at gcc dot gnu.org
  2021-01-22 12:36 ` [Bug target/98792] " rguenth at gcc dot gnu.org
  2021-03-07  1:29 ` pinskia at gcc dot gnu.org
@ 2021-09-02 10:21 ` pinskia at gcc dot gnu.org
  2023-12-16  2:57 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-02 10:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We now produce shrn2 (for the high half) but still not shrn (for the low half):

.L2:
        ldp     q0, q1, [x0]
        add     x0, x0, 32
        ushr    v0.4s, v0.4s, 3
        xtn     v0.4h, v0.4s
        shrn2   v0.8h, v1.4s, 3
        str     q0, [x1], 16
        cmp     x0, x2
        bne     .L2
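
For comparison, the fully fused form of one loop iteration can be sketched with ACLE intrinsics. This is an illustration under assumptions, not GCC's output: the function name `narrow_shift3_x8` is hypothetical, the element count is assumed to be a multiple of 8, and non-AArch64 builds use a scalar fallback with the same semantics. `vshrn_n_u32` fills the low four 16-bit lanes (SHRN) and `vshrn_high_n_u32` fills the high four while keeping the low ones (SHRN2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Narrow eight 32-bit values per iteration into eight 16-bit values,
   shifting each right by 3 first.  */
void
narrow_shift3_x8 (uint16_t *dst, const uint32_t *src, size_t n)
{
  assert (n % 8 == 0);
  for (size_t i = 0; i < n; i += 8)
    {
#if defined(__aarch64__)
      uint32x4_t lo = vld1q_u32 (src + i);
      uint32x4_t hi = vld1q_u32 (src + i + 4);
      /* SHRN: low half of the result vector.  */
      uint16x4_t low_half = vshrn_n_u32 (lo, 3);
      /* SHRN2: high half, with the low half passed through unchanged.  */
      uint16x8_t both = vshrn_high_n_u32 (low_half, hi, 3);
      vst1q_u16 (dst + i, both);
#else
      for (size_t j = 0; j < 8; j++)
        dst[i + j] = (uint16_t) (src[i + j] >> 3);
#endif
    }
}
```

In the loop above, the remaining `ushr` + `xtn` pair plays the role of the `vshrn_n_u32` call here, so only the low half is still unfused.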

* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
  2021-01-22 11:27 [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64 ktkachov at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-09-02 10:21 ` pinskia at gcc dot gnu.org
@ 2023-12-16  2:57 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-16  2:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It was fixed in GCC 12 by one of the following commits:
r12-7142-g83d7e720cd1d07
r12-7141-gbce43c0493f65d
r12-7140-g4057266ce5afc1
r12-7138-gaeef5c57f161ad
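
A regression test for this report could look roughly like the following sketch. It reuses the testcase from the original report and adds DejaGnu directives in the style of the GCC testsuite. The scan patterns are assumptions about the expected code (an `shrn`/`shrn2` pair and no remaining `xtn`); they are not copied from the commits listed above and should be checked against the actual compiler output.

```c
/* { dg-do compile } */
/* { dg-options "-O3 -mcpu=neoverse-n1" } */

#define N 1024
unsigned short res[N];
unsigned int in[N];

/* Testcase from the original report: a right shift by 3 followed by an
   implicit truncation from 32 to 16 bits.  */
void
foo (void)
{
  for (int i = 0; i < N; i++)
    res[i] = in[i] >> 3;
}

/* Assumed expectations, to be verified against real output.  */
/* { dg-final { scan-assembler {\tshrn\t} } } */
/* { dg-final { scan-assembler {\tshrn2\t} } } */
/* { dg-final { scan-assembler-not {\txtn\t} } } */
```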
