public inbox for gcc-bugs@sourceware.org
* [Bug target/98143] New: arm: missed vectorization with MVE compared to Neon
@ 2020-12-04 14:22 clyon at gcc dot gnu.org
2021-04-21 15:31 ` [Bug target/98143] " clyon at gcc dot gnu.org
0 siblings, 1 reply; 2+ messages in thread
From: clyon at gcc dot gnu.org @ 2020-12-04 14:22 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98143
Bug ID: 98143
Summary: arm: missed vectorization with MVE compared to Neon
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: clyon at gcc dot gnu.org
Target Milestone: ---
While working on enabling auto-vectorization for MVE, I noticed a missed
optimization compared to Neon:
#include <stdint.h>
uint16_t *dest;
void func()
{
int i;
for (i=0;i<16;i++)
dest[i]=3;
}
Compiled with -O3 -S -dp -mfloat-abi=hard -mfpu=auto -mcpu=cortex-a9 -mthumb:
func:
movw r3, #:lower16:.LANCHOR0 @ 15 [c=4 l=4] *thumb2_movsi_vfp/4
vmov.i16 q8, #3 @ v8hi @ 7 [c=4 l=4] *neon_movv8hi/2
movt r3, #:upper16:.LANCHOR0 @ 16 [c=4 l=4] *arm_movt/0
ldr r3, [r3] @ 14 [c=12 l=4] *thumb2_movsi_vfp/5
vst1.16 {q8}, [r3]! @ 8 [c=8 l=4] *movmisalignv8hi_neon_store
vst1.16 {q8}, [r3] @ 11 [c=8 l=4] *movmisalignv8hi_neon_store
bx lr @ 44 [c=8 l=4] *thumb2_return
Compiled with -O3 -S -dp -mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve
-mthumb:
func:
movs r2, #3 @ 7 [c=4 l=2] *thumb2_movsi_shortim
ldr r3, .L3 @ 5 [c=12 l=4] *thumb2_movsi_vfp/5
ldr r3, [r3] @ 6 [c=12 l=4] *thumb2_movsi_vfp/5
strh r2, [r3] @ movhi @ 9 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #2] @ movhi @ 12 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #4] @ movhi @ 15 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #6] @ movhi @ 18 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #8] @ movhi @ 21 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #10] @ movhi @ 24 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #12] @ movhi @ 27 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #14] @ movhi @ 30 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #16] @ movhi @ 33 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #18] @ movhi @ 36 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #20] @ movhi @ 39 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #22] @ movhi @ 42 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #24] @ movhi @ 45 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #26] @ movhi @ 48 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #28] @ movhi @ 51 [c=4 l=4] *thumb2_movhi_vfp/4
strh r2, [r3, #30] @ movhi @ 54 [c=4 l=4] *thumb2_movhi_vfp/4
bx lr @ 84 [c=8 l=4] *thumb2_return
This PR is about building the constant; the problems with the stores are
probably part of PR97875.
In summary, with Neon we build the constant vector with:
vmov.i16 q8, #3 @ v8hi @ 7 [c=4 l=4] *neon_movv8hi/2
but with MVE:
movs r2, #3 @ 7 [c=4 l=2] *thumb2_movsi_shortim
and then store it as a 16-bit value as many times as needed.
I haven't managed to understand why we can't use mve_mov<mode> from mve.md,
which has an alternative with the "Dm" constraint that should work here.
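For comparison, here is a minimal hand-written sketch of the shape the vectorizer would be expected to produce for this loop: one splat of the constant followed by two 128-bit stores. It uses GCC's generic vector extensions rather than MVE intrinsics, so it builds on any target. The names func_vec and v8hi_t are hypothetical and do not come from the test case. Whether the constant ends up in a vector immediate move, a literal-pool load, or scalar code still depends on the backend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Eight 16-bit lanes in one 128-bit vector (GCC/Clang vector extension).
   v8hi_t is a hypothetical name chosen to mirror the V8HI mode. */
typedef uint16_t v8hi_t __attribute__((vector_size(16)));

uint16_t *dest;

/* Hand-vectorized equivalent of the loop in func(): 16 halfwords set to 3. */
void func_vec(void)
{
    /* The constant vector is built once; this is the part the PR is about. */
    const v8hi_t threes = {3, 3, 3, 3, 3, 3, 3, 3};

    assert(dest != NULL);

    /* memcpy expresses a vector store with no alignment assumption,
       matching the movmisalign store patterns seen in the listings. */
    memcpy(dest, &threes, sizeof threes);
    memcpy(dest + 8, &threes, sizeof threes);
}
```

Compiling this with the same options as above and inspecting the -dp output gives a quick way to see which pattern the backend picks for the constant, independently of the loop vectorizer.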
* [Bug target/98143] arm: missed vectorization with MVE compared to Neon
2020-12-04 14:22 [Bug target/98143] New: arm: missed vectorization with MVE compared to Neon clyon at gcc dot gnu.org
@ 2021-04-21 15:31 ` clyon at gcc dot gnu.org
0 siblings, 0 replies; 2+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-04-21 15:31 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98143
--- Comment #1 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Current trunk generates the following for MVE:
ldr r3, .L3+16 @ 5 [c=12 l=4] *thumb2_movsi_vfp/5
vldr.64 d6, .L3 @ 7 [c=8 l=4] *mve_movv8hi/8
vldr.64 d7, .L3+8
ldr r3, [r3] @ 14 [c=12 l=4] *thumb2_movsi_vfp/5
mov r2, r3 @ 17 [c=4 l=2] *thumb2_movsi_vfp/0
adds r3, r3, #16 @ 18 [c=4 l=2] *thumb2_addsi_short/1
vstrh.16 q3, [r2] @ 8 [c=8 l=4] *movmisalignv8hi_mve_store
vstrh.16 q3, [r3] @ 11 [c=8 l=4] *movmisalignv8hi_mve_store
bx lr @ 45 [c=8 l=4] *thumb2_return
.L4:
.align 3
.L3:
.short 3
.short 3
.short 3
.short 3
.short 3
.short 3
.short 3
.short 3