public inbox for gcc-bugs@sourceware.org
* [Bug target/98792] New: Fail to use SHRN instructions for narrowing shift on aarch64
From: ktkachov at gcc dot gnu.org @ 2021-01-22 11:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792
Bug ID: 98792
Summary: Fail to use SHRN instructions for narrowing shift on
aarch64
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
#define N 1024
unsigned short res[N];
unsigned int in[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    res[i] = in[i] >> 3;
}

with -O3 -mcpu=neoverse-n1 on aarch64 generates the loop:
.L2:
        ldp     q1, q0, [x0]
        add     x0, x0, 32
        ushr    v1.4s, v1.4s, 3
        ushr    v0.4s, v0.4s, 3
        xtn     v2.4h, v1.4s
        xtn2    v2.8h, v0.4s
        str     q2, [x1], 16
        cmp     x0, x2
        bne     .L2

It could be using the SHRN narrowing shift instruction instead. LLVM can do it
(some other inefficiencies aside):
.LBB0_1:                                // %vector.body
                                        // =>This Inner Loop Header: Depth=1
        add     x11, x10, x8
        ldp     q0, q1, [x11]
        add     x8, x8, #32             // =32
        cmp     x8, #1, lsl #12         // =4096
        shrn    v0.4h, v0.4s, #3
        shrn    v1.4h, v1.4s, #3
        stp     d0, d1, [x9, #-8]
        add     x9, x9, #16             // =16
        b.ne    .LBB0_1

Some backend patterns can probably handle it, but maybe the vectoriser can do
something useful earlier as well?
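
For reference, a minimal scalar sketch of what the two instruction sequences compute per lane. It assumes only the architectural behaviour of SHRN (logical shift right by an immediate, then truncation to the half-width element). The helper names are hypothetical and are not part of GCC or of the testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one lane of USHR followed by XTN, i.e. shift the
   32-bit element and then keep its low 16 bits.  */
static uint16_t
lane_ushr_xtn (uint32_t x, unsigned shift)
{
  assert (shift >= 1 && shift <= 16);  /* SHRN immediate range for .4h */
  uint32_t shifted = x >> shift;       /* ushr v.4s, v.4s, #shift */
  return (uint16_t) shifted;           /* xtn  v.4h, v.4s */
}

/* Hypothetical helper: one lane of SHRN, which does both steps at once.  */
static uint16_t
lane_shrn (uint32_t x, unsigned shift)
{
  assert (shift >= 1 && shift <= 16);
  return (uint16_t) (x >> shift);      /* shrn v.4h, v.4s, #shift */
}

/* Returns 1 if both forms agree on a few edge-case inputs, else 0.  */
static int
narrowing_forms_agree (void)
{
  static const uint32_t samples[]
    = { 0u, 7u, 8u, 0x7fff8u, 0x80000u, 0xffffffffu };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    if (lane_ushr_xtn (samples[i], 3) != lane_shrn (samples[i], 3))
      return 0;
  return 1;
}
```

Since the two forms are equivalent lane by lane, replacing the USHR + XTN pair with a single SHRN should be a pure instruction-count saving for this loop.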
* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
From: rguenth at gcc dot gnu.org @ 2021-01-22 12:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer would need such a concept, e.g. a named pattern and a
vectorizer pattern recognizing it.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
From: pinskia at gcc dot gnu.org @ 2021-03-07 1:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-03-07
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
(insn 17 16 18 3 (set (reg:V8HI 109 [ vect__3.8 ])
        (vec_concat:V8HI (truncate:V4HI (reg:V4SI 105 [ vect__2.7 ]))
            (truncate:V4HI (reg:V4SI 107 [ vect__2.7 ])))) "t9.c":9:16 1942 {vec_pack_trunc_v4si}
     (expr_list:REG_DEAD (reg:V4SI 107 [ vect__2.7 ])
        (expr_list:REG_DEAD (reg:V4SI 105 [ vect__2.7 ])
            (nil))))
(insn 18 17 19 3 (set (mem:V8HI (post_inc:DI (reg:DI 92 [ ivtmp.16 ])) [2 MEM <vector(8) short unsigned int> [(short unsigned int *)_7]+0 S16 A128])
        (reg:V8HI 109 [ vect__3.8 ])) "t9.c":9:16 1161 {*aarch64_simd_movv8hi}
     (expr_list:REG_DEAD (reg:V8HI 109 [ vect__3.8 ])
        (expr_list:REG_INC (reg:DI 92 [ ivtmp.16 ])
            (nil))))

Part of the problem is the above.
So this might need to be done at the gimple level such that we don't do the
vec_concat in the first place ....
That is if we had the RTL for:
        ushr    v1.4s, v1.4s, 3
        ushr    v0.4s, v0.4s, 3
        xtn     v2.4h, v1.4s
        xtn     v3.4h, v0.4s
        stp     d2, d3, [x1], 16
I think combine would have done its job.
* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
From: pinskia at gcc dot gnu.org @ 2021-09-02 10:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We do produce shrn2 but not shrn now:
.L2:
        ldp     q0, q1, [x0]
        add     x0, x0, 32
        ushr    v0.4s, v0.4s, 3
        xtn     v0.4h, v0.4s
        shrn2   v0.8h, v1.4s, 3
        str     q0, [x1], 16
        cmp     x0, x2
        bne     .L2
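
To make the remaining gap concrete, here is a scalar sketch of what one loop iteration would compute if both halves used the narrowing shift (SHRN for the low half, SHRN2 for the high half). It assumes the architectural behaviour that SHRN clears the upper half of the destination while SHRN2 leaves the lower half untouched. The type and function names are hypothetical models, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit register viewed as 8 x 16-bit lanes.  */
typedef struct { uint16_t h[8]; } v8hi_model;

/* SHRN: narrow four 32-bit lanes into the low half; the upper half is
   cleared.  */
static v8hi_model
model_shrn (const uint32_t src[4], unsigned shift)
{
  assert (shift >= 1 && shift <= 16);
  v8hi_model r = { { 0 } };
  for (int i = 0; i < 4; i++)
    r.h[i] = (uint16_t) (src[i] >> shift);
  return r;
}

/* SHRN2: narrow four 32-bit lanes into the upper half and leave the
   lower half of the destination unchanged.  */
static v8hi_model
model_shrn2 (v8hi_model dst, const uint32_t src[4], unsigned shift)
{
  assert (shift >= 1 && shift <= 16);
  for (int i = 0; i < 4; i++)
    dst.h[4 + i] = (uint16_t) (src[i] >> shift);
  return dst;
}

/* One vector iteration of the testcase: res[0..7] = in[0..7] >> 3.  */
static v8hi_model
model_loop_body (const uint32_t in[8])
{
  v8hi_model lo = model_shrn (in, 3);   /* replaces ushr + xtn */
  return model_shrn2 (lo, in + 4, 3);   /* already emitted as shrn2 */
}
```

In this model the low half is the only part still produced by the two-instruction USHR + XTN sequence in the loop above.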
* [Bug target/98792] Fail to use SHRN instructions for narrowing shift on aarch64
From: pinskia at gcc dot gnu.org @ 2023-12-16 2:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98792
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It was fixed in GCC 12 by one of the following commits:
r12-7142-g83d7e720cd1d07
r12-7141-gbce43c0493f65d
r12-7140-g4057266ce5afc1
r12-7138-gaeef5c57f161ad