public inbox for gcc-bugs@sourceware.org
* [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64
@ 2021-01-22 11:27 ktkachov at gcc dot gnu.org
  2021-01-22 12:36 ` [Bug target/98792] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2021-01-22 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792

            Bug ID: 98792
           Summary: Fail to use SHRN instructions for narrowing shift on
                    aarch64
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

#define N 1024
unsigned short res[N];
unsigned int in[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    res[i] = in[i] >> 3;
}

With -O3 -mcpu=neoverse-n1 on aarch64, GCC generates the following loop:
.L2:
        ldp     q1, q0, [x0]
        add     x0, x0, 32
        ushr    v1.4s, v1.4s, 3
        ushr    v0.4s, v0.4s, 3
        xtn     v2.4h, v1.4s
        xtn2    v2.8h, v0.4s
        str     q2, [x1], 16
        cmp     x0, x2
        bne     .L2

It could use the SHRN (shift right narrow) instruction instead. LLVM does so
(some other inefficiencies aside):
.LBB0_1:                                // %vector.body
                                        // =>This Inner Loop Header: Depth=1
        add     x11, x10, x8
        ldp     q0, q1, [x11]
        add     x8, x8, #32                     // =32
        cmp     x8, #1, lsl #12                 // =4096
        shrn    v0.4h, v0.4s, #3
        shrn    v1.4h, v1.4s, #3
        stp     d0, d1, [x9, #-8]
        add     x9, x9, #16                     // =16
        b.ne    .LBB0_1
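SHRN shifts each lane right by an immediate and keeps only the truncated low half. Per lane, it therefore computes exactly what GCC's USHR+XTN pair computes. The sketch below models one 32-bit lane in portable C and checks this on sample values for every legal 4S -> 4H shift immediate (1..16). The function names are hypothetical illustrations, not GCC or ACLE APIs. The signed SSHR+XTN case is included as well, as an assumption worth checking: for n <= 16 the sign-fill bits of an arithmetic shift never reach the low 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane-level models (not GCC or ACLE APIs) of one 32-bit lane
   going through the instruction sequences in this report. The shift n is an
   immediate in the range 1..16, as allowed for a 4S -> 4H SHRN. */

/* SHRN: the result is bits [n+15:n] of the source lane. */
static uint16_t shrn_model(uint32_t x, unsigned n)
{
    uint16_t r = 0;
    for (unsigned b = 0; b < 16; b++)
        if (n + b < 32 && ((x >> (n + b)) & 1u))   /* always in range for n <= 16 */
            r |= (uint16_t)(1u << b);
    return r;
}

/* USHR #n then XTN, as GCC emits for unsigned int -> unsigned short. */
static uint16_t ushr_then_xtn(uint32_t x, unsigned n)
{
    return (uint16_t)(x >> n);
}

/* SSHR #n then XTN, the signed analogue. The arithmetic shift is spelled out
   so as not to rely on implementation-defined >> of negative values. */
static uint16_t sshr_then_xtn(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if (x & 0x80000000u)
        shifted |= ~(UINT32_MAX >> n);   /* copy the sign bit into the vacated bits */
    return (uint16_t)shifted;
}

/* Returns 1 if SHRN agrees with both two-instruction sequences on a set of
   sample lanes for every legal shift amount, 0 otherwise. */
int shrn_matches_narrowing_sequences(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x00000001u, 0x0000FFFFu, 0x00010000u, 0x12345678u,
        0x7FFFFFFFu, 0x80000000u, 0xDEADBEEFu, 0xFFFFFFFFu
    };
    for (unsigned n = 1; n <= 16; n++) {
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            uint32_t x = samples[i];
            uint16_t want = shrn_model(x, n);
            if (ushr_then_xtn(x, n) != want || sshr_then_xtn(x, n) != want)
                return 0;
        }
    }
    return 1;
}
```

shrn_matches_narrowing_sequences() returns 1. The high bits produced by either shift land only in bit positions 32-n and above, and the narrowing discards those positions whenever n <= 16.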

Some backend patterns can probably match the USHR+XTN combination, but maybe
the vectoriser can do something useful earlier as well?
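For comparison, the codegen the report asks for can be written by hand with ACLE NEON intrinsics. This is a sketch under assumptions: it assumes an AArch64 compiler that provides <arm_neon.h>, where vshrn_n_u32 and vshrn_high_n_u32 are the intrinsics for SHRN and SHRN2. The exact instructions a given GCC version emits for them are not verified here. The function name foo_shrn is hypothetical. On other targets, a plain scalar loop computes the same result so the file still builds and runs.

```c
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

#define N 1024
unsigned short res[N];
unsigned int in[N];

/* Hypothetical hand-vectorised version of foo(). On AArch64 the intrinsics
   correspond to a SHRN/SHRN2 sequence; elsewhere a scalar loop computes the
   same result. */
void foo_shrn(void)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    for (int i = 0; i < N; i += 8) {                 /* N is a multiple of 8 */
        uint32x4_t lo = vld1q_u32(&in[i]);
        uint32x4_t hi = vld1q_u32(&in[i + 4]);
        uint16x4_t narrow_lo = vshrn_n_u32(lo, 3);              /* shrn  v.4h, v.4s, #3 */
        uint16x8_t narrow = vshrn_high_n_u32(narrow_lo, hi, 3); /* shrn2 v.8h, v.4s, #3 */
        vst1q_u16(&res[i], narrow);                  /* one 128-bit store */
    }
#else
    for (int i = 0; i < N; i++)
        res[i] = (unsigned short)(in[i] >> 3);
#endif
}
```

SHRN2 writes the upper half of the destination, just as XTN2 does in GCC's current loop. Using it keeps the single 128-bit `str q` store instead of the pair of 64-bit stores (`stp d0, d1`) in the LLVM output.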



Thread overview: 5+ messages
2021-01-22 11:27 [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64 ktkachov at gcc dot gnu.org
2021-01-22 12:36 ` [Bug target/98792] " rguenth at gcc dot gnu.org
2021-03-07  1:29 ` pinskia at gcc dot gnu.org
2021-09-02 10:21 ` pinskia at gcc dot gnu.org
2023-12-16  2:57 ` pinskia at gcc dot gnu.org
