public inbox for gcc-bugs@sourceware.org
* [Bug target/63277] New: ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
From: janne-gcc at jannau dot net @ 2014-09-16 12:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
Bug ID: 63277
Summary: ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2
for shuffling data unnecessarily around
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: janne-gcc at jannau dot net
Created attachment 33500
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33500&action=edit
small example source code
armv7a-hardfloat-linux-gnueabi-gcc-5.0.0 -v
Using built-in specs.
COLLECT_GCC=/opt/gcc/bin/armv7a-hardfloat-linux-gnueabi-gcc-5.0.0
COLLECT_LTO_WRAPPER=/opt/gcc/libexec/gcc/armv7a-hardfloat-linux-gnueabi/5.0.0/lto-wrapper
Target: armv7a-hardfloat-linux-gnueabi
Configured with: /home/janne/src/gcc-trunk/configure --host=x86_64-pc-linux-gnu
--target=armv7a-hardfloat-linux-gnueabi --build=x86_64-pc-linux-gnu
--prefix=/opt/gcc/ --enable-languages=c,c++,fortran --enable-obsolete
--enable-secureplt --disable-werror --with-system-zlib --enable-nls
--without-included-gettext --enable-checking=release --enable-libstdcxx-time
--enable-poison-system-directories
--with-sysroot=/usr/armv7a-hardfloat-linux-gnueabi --disable-bootstrap
--enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-altivec
--disable-fixed-point --with-float=hard --with-arch=armv7-a --with-float=hard
--with-fpu=neon --disable-libgcj --enable-libgomp --disable-libmudflap
--disable-libssp --enable-lto --without-cloog
Thread model: posix
gcc version 5.0.0 20140916 (experimental) (GCC)
armv7a-hardfloat-linux-gnueabi-gcc-5.0.0 -march=armv7-a -mfpu=neon -O3 -S
arm_neon_excessive_vmov.c -o -
.arch armv7-a
.eabi_attribute 27, 3
.eabi_attribute 28, 1
.fpu neon
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "arm_neon_excessive_vmov.c"
.text
.align 2
.global gf_w8_split_multiply_region_neon
.type gf_w8_split_multiply_region_neon, %function
gf_w8_split_multiply_region_neon:
@ args = 4, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
str lr, [sp, #-4]!
mov r3, r3, asl #4
ldr ip, [sp, #4]
add lr, r0, #4096
add lr, lr, r3
add r0, r0, r3
vmov.i8 q15, #15 @ v16qi
add ip, r2, ip, lsl #4
vld1.8 {d18-d19}, [lr]
cmp r1, ip
vld1.8 {d16-d17}, [r0]
ldrcs pc, [sp], #4
vmov d27, d18 @ v8qi
vmov d26, d19 @ v8qi
vmov d25, d16 @ v8qi
vmov d24, d17 @ v8qi
.L3:
vld1.8 {d18-d19}, [r1]
vmov d20, d25 @ v8qi
vmov d21, d24 @ v8qi
add r1, r1, #16
vshr.u8 q14, q9, #4
cmp r1, ip
vmov d22, d27 @ v8qi
vmov d23, d26 @ v8qi
vtbl.8 d16, {d20, d21}, d28
vand q9, q9, q15
vtbl.8 d28, {d20, d21}, d29
vtbl.8 d17, {d22, d23}, d18
vmov d29, d28 @ v8qi
vmov d28, d16 @ v8qi
vtbl.8 d16, {d22, d23}, d19
vswp d16, d17
veor q8, q8, q14
vst1.8 {d16-d17}, [r2]
add r2, r2, #16
bcc .L3
ldr pc, [sp], #4
.size gf_w8_split_multiply_region_neon, .-gf_w8_split_multiply_region_neon
.ident "GCC: (GNU) 5.0.0 20140916 (experimental)"
.section .note.GNU-stack,"",%progbits
None of the vmovs or the vswp are needed: clang 3.4 generates assembly without
them from the same source file.
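For readers without the attachment at hand, the per-byte work of the loop can be read off the listing: each source byte is split into its high and low nibble, each nibble indexes a 16-byte table, and the two results are XORed. The following is only a scalar sketch reconstructed from the assembly, not the attachment's code; the function name and the names tbl_hi and tbl_lo are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the loop at .L3, reconstructed from the assembly above
 * (not taken from the attachment). tbl_hi and tbl_lo are hypothetical
 * names for the two 16-byte tables loaded before the loop. */
void split_multiply_region_model(const uint8_t tbl_hi[16],
                                 const uint8_t tbl_lo[16],
                                 const uint8_t *src, uint8_t *dst,
                                 size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t hi = src[i] >> 4;    /* vshr.u8 q14, q9, #4 */
        uint8_t lo = src[i] & 0x0f;  /* vand q9, q9, q15 (q15 = 15 per byte) */
        /* Each vtbl.8 performs eight such lookups at once;
         * veor combines the two partial results. */
        dst[i] = tbl_hi[hi] ^ tbl_lo[lo];
    }
}
```

Nothing in this data flow requires moving a table between registers inside the loop, which is why the vmov instructions in the listing are pure overhead.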
* [Bug target/63277] ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
From: ktkachov at gcc dot gnu.org @ 2014-09-16 13:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
ktkachov at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target| |arm
Status|UNCONFIRMED |NEW
Last reconfirmed| |2014-09-16
CC| |ktkachov at gcc dot gnu.org
Ever confirmed|0 |1
Known to fail| |5.0
--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.
The vmovs and vswp are generated from the implementation of our vcombine. The
define_insn_and_split for neon_vcombine<mode> splits into moves/swaps after
reload, thus not giving the register allocator a chance to optimise the moves
away.
* [Bug target/63277] ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
From: janne-gcc at jannau dot net @ 2014-09-16 14:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
--- Comment #2 from Janne Grunau <janne-gcc at jannau dot net> ---
It is not only the vcombine.
The handling of the table vectors is even more dreadful. The loads are combined
into properly paired registers. They are then moved, in reverse order, to
different registers, only to be reassembled inside the loop into properly paired
registers for vtbl2. See the attached arm_neon_excessive_vmov_wo_vcombine.c
* [Bug target/63277] ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
From: janne-gcc at jannau dot net @ 2014-09-16 14:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
--- Comment #3 from Janne Grunau <janne-gcc at jannau dot net> ---
Created attachment 33501
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33501&action=edit
arm_neon_excessive_vmov_wo_vcombine.c
* [Bug target/63277] ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
From: cbaylis at gcc dot gnu.org @ 2014-09-18 1:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
cbaylis at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |cbaylis at gcc dot gnu.org
--- Comment #4 from cbaylis at gcc dot gnu.org ---
A much simplified test case based on arm_neon_excessive_vmov_wo_vcombine.c:
$ arm-unknown-linux-gnueabihf-gcc -O2 -S -o - -mfpu=neon mini.c
#include <arm_neon.h>
void f(int8_t *p)
{
  int8x16_t v;
  int8x8_t v2;
  int8x8x2_t vx;
  v = vld1q_s8(p);
  v2 = vld1_s8(p);
  vx.val[0] = vget_low_s8(v);
  vx.val[1] = vget_high_s8(v);
  v2 = vtbl2_s8(vx, v2);
  vst1_s8(p, v2);
}
With -dp, the generated code is:
f:
vld1.8 {d18-d19}, [r0] @ 6 neon_vld1v16qi [length = 4]
vmov d16, d18 @ v8qi @ 10 *neon_movv8qi/1 [length = 4]
vld1.8 {d20}, [r0] @ 7 neon_vld1v8qi [length = 4]
vmov d17, d19 @ v8qi @ 11 *neon_movv8qi/1 [length = 4]
vtbl.8 d16, {d16, d17}, d20 @ 12 neon_vtbl2v8qi [length = 4]
vst1.8 {d16}, [r0] @ 13 neon_vst1v8qi [length = 4]
bx lr @ 24 *thumb2_return [length = 4]
By the time IRA runs, the insns which result in the moves look like this:
(insn 9 18 11 2 (set (subreg:V8QI (reg/v:TI 116 [ vx ]) 0)
(subreg:V8QI (reg:V16QI 114 [ D.14019 ]) 0)) /tmp/mini.c:11 827
{*neon_movv8qi}
The registers 116 and 114 are allocated to different hard registers, as they
conflict. Presumably, the register allocator could be taught to treat this
subreg->subreg move as a copy and allow the same hard register to be allocated.
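To see why these moves are logically pure copies, it may help to model the test case in scalar C: filling vx from the low and high halves of v and then doing a vtbl2 lookup gives the same result as indexing the 16 bytes of v directly. This is only an illustrative sketch. The type and function names below are hypothetical stand-ins for the NEON types and intrinsics, and the model assumes the usual VTBL behaviour that out-of-range indices yield 0.

```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-ins for the NEON types; the names are hypothetical. */
typedef struct { uint8_t b[16]; } vec16_model;        /* models int8x16_t  */
typedef struct { uint8_t val[2][8]; } vec8x2_model;   /* models int8x8x2_t */

/* Models one lane of vtbl2: an index in 0..15 selects one of the
 * 16 table bytes, any other index yields 0. */
uint8_t vtbl2_lane_model(const vec8x2_model *t, uint8_t idx)
{
    return idx < 16 ? t->val[idx / 8][idx % 8] : 0;
}

/* Models vx.val[0] = vget_low(v); vx.val[1] = vget_high(v). */
vec8x2_model split_halves_model(const vec16_model *v)
{
    vec8x2_model vx;
    memcpy(vx.val[0], &v->b[0], 8);  /* low half:  bytes 0..7  */
    memcpy(vx.val[1], &v->b[8], 8);  /* high half: bytes 8..15 */
    return vx;
}
```

In this model the pair vx holds exactly the bytes of v in the same order, which matches the observation that d18/d19 could have served as the vtbl table directly had the two pseudos shared a hard register.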
* [Bug rtl-optimization/63277] ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
From: ramana at gcc dot gnu.org @ 2015-05-14 10:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |ra
CC| |ramana at gcc dot gnu.org,
| |vmakarov at gcc dot gnu.org
Component|target |rtl-optimization
--- Comment #5 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
Adding Vlad to CC.
* [Bug rtl-optimization/63277] ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
From: ramana at gcc dot gnu.org @ 2015-06-25 20:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |47562
--- Comment #6 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
Link to meta bug.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47562
[Bug 47562] [meta-bug] keep track of Neon enhancements