public inbox for gcc-bugs@sourceware.org
* [Bug target/95967] New: Poor aarch64 vector constructor code when using arm_neon.h
From: rsandifo at gcc dot gnu.org @ 2020-06-29 15:20 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95967
Bug ID: 95967
Summary: Poor aarch64 vector constructor code when using arm_neon.h
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
Depends on: 95962
Blocks: 95958
Target Milestone: ---
Target: aarch64*-*-*
We generate poor code for the attached functions:
f1:
        movi v4.4s, 0
        ins v4.s[0], v0.s[0]
        ins v4.s[1], v1.s[0]
        ins v4.s[2], v2.s[0]
        mov v0.16b, v4.16b
        ins v0.s[3], v3.s[0]
        ret
f2:
        dup v0.4s, v0.s[0]
        ins v0.s[1], v1.s[0]
        ins v0.s[2], v2.s[0]
        ins v0.s[3], v3.s[0]
        ret
f3:
        sub sp, sp, #16
        stp s0, s1, [sp]
        stp s2, s3, [sp, 8]
        ldr q0, [sp]
        add sp, sp, 16
        ret
g1:
        movi v0.4s, 0
        ld1 {v0.s}[0], [x0]
        ld1 {v0.s}[1], [x1]
        ld1 {v0.s}[2], [x2]
        ld1 {v0.s}[3], [x3]
        ret
g2:
        ld1r {v0.4s}, [x0]
        ld1 {v0.s}[1], [x1]
        ld1 {v0.s}[2], [x2]
        ld1 {v0.s}[3], [x3]
        ret
g3:
        sub sp, sp, #16
        ldr s0, [x3]
        ldr s3, [x0]
        ldr s2, [x1]
        ldr s1, [x2]
        stp s3, s2, [sp]
        stp s1, s0, [sp, 8]
        ldr q0, [sp]
        add sp, sp, 16
        ret
All three f functions should generate:
        mov v0.s[1], v1.s[0]
        mov v0.s[2], v2.s[0]
        mov v0.s[3], v3.s[0]
        ret
and all three g functions should generate:
        ldr s0, [x0]
        ld1 { v0.s }[1], [x1]
        ld1 { v0.s }[2], [x2]
        ld1 { v0.s }[3], [x3]
        ret
which is what current Clang does.
Getting the right code for f3 and g3 depends on the fix for PR95962.
There's a reasonable chance that PR95962 will be enough on its own
to fix f3 and g3, but I included them just in case it isn't.
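The attachment is not reproduced in this message, so the source below is only
a guess at the shape of the f functions, inferred from the assembly above:
f1 inserts four lanes into a zero vector, f2 splats s0 and overwrites lanes
1-3, and f3 loads from a local array. The function bodies, the float element
type, and the helper names same_lanes and check_f are assumptions. The #else
branch is a stand-in so that the sketch also builds on non-Arm hosts with GCC
or Clang; it is not part of arm_neon.h.

```c
#include <assert.h>

#ifdef __ARM_NEON
#include <arm_neon.h>
#else
/* Stand-ins for non-Arm hosts only, built on GCC/Clang vector extensions.
   These are NOT the real arm_neon.h definitions.  */
typedef float float32x4_t __attribute__ ((vector_size (16)));
static inline float32x4_t vdupq_n_f32 (float x)
{ return (float32x4_t) { x, x, x, x }; }
static inline float32x4_t vld1q_f32 (const float *p)
{ return (float32x4_t) { p[0], p[1], p[2], p[3] }; }
static inline float32x4_t set_lane (float x, float32x4_t v, int lane)
{ v[lane] = x; return v; }
static inline float get_lane (float32x4_t v, int lane)
{ return v[lane]; }
#define vsetq_lane_f32(x, v, lane) set_lane ((x), (v), (lane))
#define vgetq_lane_f32(v, lane) get_lane ((v), (lane))
#endif

/* Guess at f1: start from zero and insert all four lanes.  */
float32x4_t f1 (float s0, float s1, float s2, float s3)
{
  float32x4_t v = vdupq_n_f32 (0.0f);
  v = vsetq_lane_f32 (s0, v, 0);
  v = vsetq_lane_f32 (s1, v, 1);
  v = vsetq_lane_f32 (s2, v, 2);
  v = vsetq_lane_f32 (s3, v, 3);
  return v;
}

/* Guess at f2: splat s0, then overwrite lanes 1-3.  */
float32x4_t f2 (float s0, float s1, float s2, float s3)
{
  float32x4_t v = vdupq_n_f32 (s0);
  v = vsetq_lane_f32 (s1, v, 1);
  v = vsetq_lane_f32 (s2, v, 2);
  v = vsetq_lane_f32 (s3, v, 3);
  return v;
}

/* Guess at f3: build a local array and load it as a vector.  */
float32x4_t f3 (float s0, float s1, float s2, float s3)
{
  float a[4] = { s0, s1, s2, s3 };
  return vld1q_f32 (a);
}

/* Nonzero if all four lanes of A and B compare equal.  */
int same_lanes (float32x4_t a, float32x4_t b)
{
  return vgetq_lane_f32 (a, 0) == vgetq_lane_f32 (b, 0)
         && vgetq_lane_f32 (a, 1) == vgetq_lane_f32 (b, 1)
         && vgetq_lane_f32 (a, 2) == vgetq_lane_f32 (b, 2)
         && vgetq_lane_f32 (a, 3) == vgetq_lane_f32 (b, 3);
}

/* All three variants describe the same vector {s0, s1, s2, s3}.  */
void check_f (void)
{
  float32x4_t r = f1 (1.0f, 2.0f, 3.0f, 4.0f);
  assert (same_lanes (r, f2 (1.0f, 2.0f, 3.0f, 4.0f)));
  assert (same_lanes (r, f3 (1.0f, 2.0f, 3.0f, 4.0f)));
}
```

Since the three functions compute the same value, the same instruction
sequence would be valid for all of them, which is the point of the expected
output above.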
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
[Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95962
[Bug 95962] Inefficient code for simple arm_neon.h iota operation
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: rsandifo at gcc dot gnu.org @ 2020-06-29 15:21 UTC
--- Comment #1 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Created attachment 48802
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48802&action=edit
6 constructor functions
This time with attachment -- not sure what happened last time.
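For readers without access to the attachment, the g functions plausibly look
something like the following. This is a guess reconstructed from the g1-g3
assembly in the report (a zero vector plus four lane loads, a ld1r splat plus
three lane loads, and a local array that is loaded whole); the bodies, the
float element type, and the helper names same_lanes and check_g are
assumptions. The #else branch is a stand-in for non-Arm hosts and is not part
of arm_neon.h.

```c
#include <assert.h>

#ifdef __ARM_NEON
#include <arm_neon.h>
#else
/* Stand-ins for non-Arm hosts only, built on GCC/Clang vector extensions.
   These are NOT the real arm_neon.h definitions.  */
typedef float float32x4_t __attribute__ ((vector_size (16)));
static inline float32x4_t vdupq_n_f32 (float x)
{ return (float32x4_t) { x, x, x, x }; }
static inline float32x4_t vld1q_dup_f32 (const float *p)
{ return (float32x4_t) { *p, *p, *p, *p }; }
static inline float32x4_t vld1q_f32 (const float *p)
{ return (float32x4_t) { p[0], p[1], p[2], p[3] }; }
static inline float32x4_t load_lane (const float *p, float32x4_t v, int lane)
{ v[lane] = *p; return v; }
static inline float get_lane (float32x4_t v, int lane)
{ return v[lane]; }
#define vld1q_lane_f32(p, v, lane) load_lane ((p), (v), (lane))
#define vgetq_lane_f32(v, lane) get_lane ((v), (lane))
#endif

/* Guess at g1: start from zero and load all four lanes.  */
float32x4_t g1 (const float *p0, const float *p1,
                const float *p2, const float *p3)
{
  float32x4_t v = vdupq_n_f32 (0.0f);
  v = vld1q_lane_f32 (p0, v, 0);
  v = vld1q_lane_f32 (p1, v, 1);
  v = vld1q_lane_f32 (p2, v, 2);
  v = vld1q_lane_f32 (p3, v, 3);
  return v;
}

/* Guess at g2: load-and-splat *p0, then load lanes 1-3.  */
float32x4_t g2 (const float *p0, const float *p1,
                const float *p2, const float *p3)
{
  float32x4_t v = vld1q_dup_f32 (p0);
  v = vld1q_lane_f32 (p1, v, 1);
  v = vld1q_lane_f32 (p2, v, 2);
  v = vld1q_lane_f32 (p3, v, 3);
  return v;
}

/* Guess at g3: gather into a local array and load it as a vector.  */
float32x4_t g3 (const float *p0, const float *p1,
                const float *p2, const float *p3)
{
  float a[4] = { *p0, *p1, *p2, *p3 };
  return vld1q_f32 (a);
}

/* Nonzero if all four lanes of A and B compare equal.  */
int same_lanes (float32x4_t a, float32x4_t b)
{
  return vgetq_lane_f32 (a, 0) == vgetq_lane_f32 (b, 0)
         && vgetq_lane_f32 (a, 1) == vgetq_lane_f32 (b, 1)
         && vgetq_lane_f32 (a, 2) == vgetq_lane_f32 (b, 2)
         && vgetq_lane_f32 (a, 3) == vgetq_lane_f32 (b, 3);
}

/* All three variants describe the same vector {*p0, *p1, *p2, *p3}.  */
void check_g (void)
{
  float a = 1.0f, b = 2.0f, c = 3.0f, d = 4.0f;
  float32x4_t r = g1 (&a, &b, &c, &d);
  assert (same_lanes (r, g2 (&a, &b, &c, &d)));
  assert (same_lanes (r, g3 (&a, &b, &c, &d)));
}
```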
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: pinskia at gcc dot gnu.org @ 2021-05-30 23:08 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: pinskia at gcc dot gnu.org @ 2021-05-30 23:10 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-05-30
     Ever confirmed|0                           |1
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: pinskia at gcc dot gnu.org @ 2021-05-30 23:13 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu.org
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 50891
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50891&action=edit
Start of the patch which should fix most of it
This patch should fix the majority of the problem, though I have not
tested it on the testcase. Basically, it takes the BIT_INSERT_EXPRs and
"combines" them so that they become a CONSTRUCTOR.
I am still deciding whether this belongs in reassoc or in forwprop.
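As a rough source-level picture of the two forms (not the patch itself, which
is only in the attachment), the sketch below uses GCC/Clang generic vector
extensions. build_by_inserts writes one lane at a time, which is conceptually
what a chain of BIT_INSERT_EXPRs expresses, and build_by_constructor names all
four elements at once, like a CONSTRUCTOR. The type v4sf and all function
names are made up for illustration, and the sketch makes no claim about the
exact GIMPLE that GCC produces for this source.

```c
#include <assert.h>
#include <string.h>

/* Four floats in one 128-bit vector (GCC/Clang vector extension).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Lane-by-lane form: each assignment replaces one 32-bit lane, much as
   BIT_INSERT_EXPR <v, s, 32 * lane> would.  */
v4sf build_by_inserts (float s0, float s1, float s2, float s3)
{
  v4sf v = { 0.0f, 0.0f, 0.0f, 0.0f };
  v[0] = s0;
  v[1] = s1;
  v[2] = s2;
  v[3] = s3;
  return v;
}

/* Constructor form: the whole vector is described in one expression.  */
v4sf build_by_constructor (float s0, float s1, float s2, float s3)
{
  return (v4sf) { s0, s1, s2, s3 };
}

/* Nonzero if A and B have identical bit patterns.  */
int same_bits (v4sf a, v4sf b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}

/* Every lane of the zero vector is overwritten, so the two forms agree.  */
void check_equivalence (void)
{
  assert (same_bits (build_by_inserts (1.0f, 2.0f, 3.0f, 4.0f),
                     build_by_constructor (1.0f, 2.0f, 3.0f, 4.0f)));
}
```

Because every lane of the initial zero vector is overwritten, the initial
value is dead, which is why the constructor form is the better input for the
back end.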
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: pinskia at gcc dot gnu.org @ 2021-05-30 23:19 UTC
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> Created attachment 50891 [details]
> Start of the patch which should fix most of it
>
> This patch should fix the majority of the problem, though I have not
> tested it on the testcase. Basically, it takes the BIT_INSERT_EXPRs and
> "combines" them so that they become a CONSTRUCTOR.
> I am still deciding whether this belongs in reassoc or in forwprop.
This patch looks like it only fixes f1 and g1, most likely because I did not
handle a CONSTRUCTOR as the initial value:
f2:
  _6 = {s0_2(D), s0_2(D), s0_2(D), s0_2(D)};
  __builtin_aarch64_im_lane_boundsi (16, 4, 1);
  __builtin_aarch64_im_lane_boundsi (16, 4, 2);
  __builtin_aarch64_im_lane_boundsi (16, 4, 3);
  _10 = BIT_INSERT_EXPR <_6, s1_3(D), 32>;
  _12 = BIT_INSERT_EXPR <_10, s2_4(D), 64>;
  __vec_14 = BIT_INSERT_EXPR <_12, s3_5(D), 96>;
I will look into adding that in a few weeks and add a testcase for it too.
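To make the missing case concrete, here is a small standalone model of the
fold, assuming 32-bit lanes in a 4-lane vector as in the dump above: start
from the elements of the initial CONSTRUCTOR and let each insert at bit
offset N replace element N / 32. This is only an illustration in plain C, not
GCC code; struct insert, fold_inserts, and check_f2_fold are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

#define NLANES 4      /* lanes in the vector */
#define LANE_BITS 32  /* bits per lane */

/* Hypothetical stand-in for BIT_INSERT_EXPR <vec, value, bit_offset>.  */
struct insert
{
  float value;
  unsigned bit_offset;
};

/* Fold a chain of N inserts into ELTS, the elements of the initial
   CONSTRUCTOR.  Returns 1 on success.  Returns 0 and leaves ELTS untouched
   if any insert does not replace exactly one in-range lane.  */
int fold_inserts (float elts[NLANES], const struct insert *chain, size_t n)
{
  /* First pass: validate every insert before changing anything.  */
  for (size_t i = 0; i < n; i++)
    if (chain[i].bit_offset % LANE_BITS != 0
        || chain[i].bit_offset / LANE_BITS >= NLANES)
      return 0;

  /* Second pass: apply in order, so later inserts win as in the chain.  */
  for (size_t i = 0; i < n; i++)
    elts[chain[i].bit_offset / LANE_BITS] = chain[i].value;
  return 1;
}

/* The f2 shape: {s0, s0, s0, s0} followed by inserts at bits 32, 64, 96.  */
void check_f2_fold (void)
{
  float elts[NLANES] = { 10.0f, 10.0f, 10.0f, 10.0f };
  const struct insert chain[] = { { 11.0f, 32 }, { 12.0f, 64 }, { 13.0f, 96 } };

  assert (fold_inserts (elts, chain, 3));
  assert (elts[0] == 10.0f && elts[1] == 11.0f);
  assert (elts[2] == 12.0f && elts[3] == 13.0f);
}
```

For f2 the result is {s0, s1, s2, s3}, the same CONSTRUCTOR that f1 should
reduce to.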
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: pinskia at gcc dot gnu.org @ 2021-06-03 0:46 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |93237
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
f1 is really PR 93237.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93237
[Bug 93237] vector defined using inserts is not converted into constructors
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: tnfchris at gcc dot gnu.org @ 2021-08-12 7:37 UTC
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
--- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
f3 and g3 no longer use the stack:
f3:
        ins v0.s[1], v1.s[0]
        sub sp, sp, #16
        add sp, sp, 16
        ins v0.s[2], v2.s[0]
        ins v0.s[3], v3.s[0]
        ret
g3:
        ldr s0, [x0]
        sub sp, sp, #16
        ld1 {v0.s}[1], [x1]
        ld1 {v0.s}[2], [x2]
        ld1 {v0.s}[3], [x3]
        add sp, sp, 16
        ret
We still allocate the stack space for it, though (but that's a general AArch64 issue).
For the other cases, the only thing left is the initializations.
* [Bug target/95967] Poor aarch64 vector constructor code when using arm_neon.h
From: pinskia at gcc dot gnu.org @ 2023-05-12 6:06 UTC
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #6)
> f3 and g3 no longer use the stack:
...
> Though we still allocate the space for it (but that's a general AArch64
> issue). For the other cases, the only thing left is the initializations.
That was fixed on the trunk, maybe even for GCC 13.