public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/103781] New: [AArch64, 11 regr.] Failed partial vectorization of mulv2di3
@ 2021-12-20 20:54 husseydevin at gmail dot com
From: husseydevin at gmail dot com @ 2021-12-20 20:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
Bug ID: 103781
Summary: [AArch64, 11 regr.] Failed partial vectorization of mulv2di3
Product: gcc
Version: 11.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
As of GCC 11, the AArch64 backend is very greedy in trying to vectorize
mulv2di3. However, there is no mulv2di3 pattern, so it does both multiplies in
scalar registers and then inserts the products into a vector just for the add.
The bad codegen should be obvious.
#include <stdint.h>
void fma_u64(uint64_t *restrict acc, const uint64_t *restrict x,
             const uint64_t *restrict y)
{
    for (int i = 0; i < 16384; i++) {
        acc[0] += *x++ * *y++;
        acc[1] += *x++ * *y++;
    }
}
gcc-11 -O3
fma_u64:
.LFB0:
        .cfi_startproc
        ldr     q1, [x0]
        add     x6, x1, 262144
        .p2align 3,,7
.L2:
        ldr     x4, [x1], 16
        ldr     x5, [x2], 16
        ldr     x3, [x1, -8]
        mul     x4, x4, x5
        ldr     x5, [x2, -8]
        fmov    d0, x4
        ins     v0.d[1], x5
        mul     x3, x3, x5
        ins     v0.d[1], x3
        add     v1.2d, v1.2d, v0.2d
        cmp     x1, x6
        bne     .L2
        str     q1, [x0]
        ret
        .cfi_endproc
GCC 10.2.1 emits better code.
fma_u64:
.LFB0:
        .cfi_startproc
        ldp     x4, x3, [x0]
        add     x9, x1, 262144
        .p2align 3,,7
.L2:
        ldr     x8, [x1], 16
        ldr     x7, [x2], 16
        ldr     x6, [x1, -8]
        ldr     x5, [x2, -8]
        madd    x4, x8, x7, x4
        madd    x3, x6, x5, x3
        cmp     x9, x1
        bne     .L2
        stp     x4, x3, [x0]
        ret
        .cfi_endproc
However, the ideal code would be a 2-iteration unroll.
Side note: why not ldp in the loop?
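For illustration only, the 2-iteration unroll mentioned above could be written by hand roughly as follows. This is a sketch, not what any GCC version emits: the name fma_u64_unroll2 and the local accumulators a0/a1 are hypothetical, and it assumes the trip count stays an even constant as in the testcase.

```c
#include <stdint.h>

/* Reference: the loop from the report, unchanged in behaviour. */
void fma_u64(uint64_t *restrict acc, const uint64_t *restrict x,
             const uint64_t *restrict y)
{
    for (int i = 0; i < 16384; i++) {
        acc[0] += *x++ * *y++;
        acc[1] += *x++ * *y++;
    }
}

/* Hypothetical hand-unrolled variant: two source iterations per trip,
   with the accumulators kept in locals so that every product can feed
   a scalar multiply-add. */
void fma_u64_unroll2(uint64_t *restrict acc, const uint64_t *restrict x,
                     const uint64_t *restrict y)
{
    uint64_t a0 = acc[0], a1 = acc[1];
    for (int i = 0; i < 16384; i += 2) {
        a0 += x[0] * y[0];
        a1 += x[1] * y[1];
        a0 += x[2] * y[2];
        a1 += x[3] * y[3];
        x += 4;
        y += 4;
    }
    acc[0] = a0;
    acc[1] = a1;
}
```

Both functions read 32768 elements from each input, and unsigned wraparound makes the reordered additions give the same result.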
* [Bug middle-end/103781] [AArch64, 11 regr.] Failed partial vectorization of mulv2di3
From: pinskia at gcc dot gnu.org @ 2021-12-20 20:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Side note: why not ldp in the loop?
Because LDP formation is badly done in general (file a different bug for
that). It is a known issue that ldp/stp formation is not really good.
> As of GCC 11, the AArch64 backend is very greedy in trying to vectorize mulv2di3.
No, what you are actually seeing is SLP vectorization: since a V2DI mul does
not exist, the multiplies stay scalar and only the add is done as a vector.
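As a rough model of what the SLP'd GCC 11 loop does, here is a sketch using the GCC/Clang vector_size extension. The names v2du and fma_u64_slp_model and the explicit trip count n are assumptions for illustration, not taken from the report; the comments map each step to the instruction it stands in for in the GCC 11 output.

```c
#include <stdint.h>

/* GCC/Clang vector extension: two 64-bit lanes, like an AArch64 .2d register. */
typedef uint64_t v2du __attribute__((vector_size(16)));

/* Hypothetical model of the GCC 11 loop: scalar multiplies, a vector
   built from the two products, then a single 2-lane vector add. */
void fma_u64_slp_model(uint64_t *restrict acc, const uint64_t *restrict x,
                       const uint64_t *restrict y, int n)
{
    v2du sum = { acc[0], acc[1] };                  /* ldr q1, [x0] */
    for (int i = 0; i < n; i++) {
        uint64_t p0 = x[2 * i] * y[2 * i];          /* mul (scalar) */
        uint64_t p1 = x[2 * i + 1] * y[2 * i + 1];  /* mul (scalar) */
        v2du prod = { p0, p1 };                     /* fmov + ins */
        sum += prod;                                /* add v1.2d, v1.2d, v0.2d */
    }
    acc[0] = sum[0];                                /* str q1, [x0] */
    acc[1] = sum[1];
}
```

The general-register-to-vector transfers in the middle of the loop are the part that the GCC 10 madd version avoids.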
* [Bug target/103781] Cost model for SLP for aarch64 is not so good still
From: husseydevin at gmail dot com @ 2021-12-20 21:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
--- Comment #2 from Devin Hussey <husseydevin at gmail dot com> ---
Yeah my bad, I meant SLP, I get them mixed up all the time.
* [Bug target/103781] Cost model for SLP for aarch64 is not so good still
From: pinskia at gcc dot gnu.org @ 2021-12-20 21:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think it is the generic (and cortex-a53) cost model that is bad;
-mcpu=cortex-a57 and -mcpu=neoverse-n1 are fine.
* [Bug target/103781] generic/cortex-a53 cost model for SLP for aarch64 is not good
From: husseydevin at gmail dot com @ 2021-12-20 21:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
--- Comment #4 from Devin Hussey <husseydevin at gmail dot com> ---
Makes sense because the multiplier is what, 5 cycles on an A53?
* [Bug target/103781] generic/cortex-a53 cost model for SLP for aarch64 is not good
From: pinskia at gcc dot gnu.org @ 2023-12-16 4:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pinskia at gcc dot gnu.org
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I know that the generic cost model has changed on the trunk but I am not sure
this one is fixed ...
* [Bug target/103781] generic/cortex-a53 cost model for SLP for aarch64 is not good
From: pinskia at gcc dot gnu.org @ 2024-01-26 0:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2024-01-26
Ever confirmed|0 |1
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
Note that if SVE is turned on, we get this for the inner loop on the trunk:
```
.L2:
        ldr     q30, [x1], 16
        ldr     q29, [x2], 16
        mul     z29.d, z30.d, z29.d
        add     v31.2d, v31.2d, v29.2d
        cmp     x1, x3
        bne     .L2
```
This is 100% what you want, as the multiply is then vectorized as well.