public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
@ 2012-01-24 15:29 eric.batut at allegorithmic dot com
2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
` (13 more replies)
0 siblings, 14 replies; 15+ messages in thread
From: eric.batut at allegorithmic dot com @ 2012-01-24 15:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
Bug #: 51980
Summary: ARM - Neon code polluted by useless stores to the
stack with vuzpq / vzipq / vtrnq
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: eric.batut@allegorithmic.com
Created attachment 26442
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26442
Minimal repro case (C file)
When using UZP/ZIP/TRN Neon intrinsics, gcc-trunk generates a whole lot of
stack operations (and associated stack alignment operations) even if everything
can purely be done using Neon registers.
Compiler used is GCC trunk, rev 183468, compiled with Android's build-gcc.sh
(arm-linux-androideabi).
Command line is:
arm-linux-androideabi-g++ -c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard
-mfpu=vfp -flax-vector-conversions -mfpu=neon -O2 -o test.s test.c -S
Generated assembly code for attached C file is:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
vabd.u8 q1, q0, q1
stmfd sp!, {r4, fp} <= Unnecessary
add fp, sp, #4 <= Unnecessary
sub sp, sp, #48 <= Unnecessary
add r3, sp, #15 <= Unnecessary
vmull.u8 q0, d2, d2
bic r3, r3, #15 <= Unnecessary
vmull.u8 q8, d3, d3
vuzp.32 q0, q8
vstmia r3, {d0-d1} <= Unnecessary, caused by vuzp.32
vstr d16, [r3, #16] <= Unnecessary, caused by vuzp.32
vstr d17, [r3, #24] <= Unnecessary, caused by vuzp.32
vpaddl.u16 q0, q0
vpadal.u16 q0, q8
sub sp, fp, #4 <= Unnecessary
ldmfd sp!, {r4, fp} <= Unnecessary
bx lr
As no stack operation is needed in this function, ideally the following should
be generated instead:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
vabd.u8 q1, q0, q1
vmull.u8 q0, d2, d2
vmull.u8 q8, d3, d3
vuzp.32 q0, q8
vpaddl.u16 q0, q0
vpadal.u16 q0, q8
bx lr
This makes even tight Neon functions written with intrinsics much larger and
slower than necessary, and makes it very hard to write performance-oriented
code with intrinsics in arm-gcc.
gcc -v yields:
Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++
COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu
--with-gnu-as --with-gnu-ld --enable-languages=c,c++
--with-gmp=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp
--enable-threads --disable-nls --disable-libmudflap --disable-libgomp
--disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls
--with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace
--enable-initfini-array --disable-nls
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot
--with-binutils-version=2.21.53 --with-mpfr-version=3.0.1
--with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6
--with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3
--program-transform-name='s,^,arm-linux-androideabi-,'
Thread model: posix
gcc version 4.7.0 20120124 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'
/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/cc1plus
-quiet -v -imultilib armv7-a -D_GNU_SOURCE test.c -mbionic -fPIC -quiet
-dumpbase test.c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp
-mfpu=neon -mtls-dialect=gnu -auxbase-strip test.s -O2 -version
-flax-vector-conversions -o test.s -fno-exceptions -fno-rtti
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/arm-linux-androideabi/armv7-a"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/backward"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/local/include"
#include "..." search starts here:
#include <...> search starts here:
/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include
/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include-fixed
/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include
/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/include
End of search list.
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d84173bb26a7319ac9d4c1278a6a7e04
COMPILER_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/bin/
LIBRARY_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/lib/
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
@ 2012-01-24 15:31 ` rguenth at gcc dot gnu.org
2012-01-27 14:50 ` eric.batut at allegorithmic dot com
` (12 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-01-24 15:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-01-24
Ever Confirmed|0 |1
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-24 14:57:43 UTC ---
It looks like the neon builtins are not properly marked as pure/const, that
certainly is a road-block for optimizations. The heavy use of UNSPECs is
another.
Confirmed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
@ 2012-01-27 14:50 ` eric.batut at allegorithmic dot com
2012-01-27 15:51 ` ramana at gcc dot gnu.org
` (11 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: eric.batut at allegorithmic dot com @ 2012-01-27 14:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
Eric Batut <eric.batut at allegorithmic dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ramana at gcc dot gnu.org,
| |rsandifo at gcc dot gnu.org
--- Comment #2 from Eric Batut <eric.batut at allegorithmic dot com> 2012-01-27 14:13:08 UTC ---
Adding the usual suspects for ARM-related bugs.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
2012-01-27 14:50 ` eric.batut at allegorithmic dot com
@ 2012-01-27 15:51 ` ramana at gcc dot gnu.org
2012-03-30 8:18 ` ramana at gcc dot gnu.org
` (10 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-01-27 15:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target|arm-linux-androideabi |arm-linux-androideabi,
| |arm*-*-*eabi
--- Comment #3 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-01-27 15:25:29 UTC ---
(In reply to comment #1)
> It looks like the neon builtins are not properly marked as pure/const, that
> certainly is a road-block for optimizations.
> The heavy use of UNSPECs is
> another.
yes, one other problem is that a lot of the neon intrinsics don't expand into
an equivalent RTL - you still need the unspecs for the polynomial types but in
general a large number of the intrinsics that are in the form of unspecs could
use the underlying vec_ expanders that are also present.
>
> Confirmed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (2 preceding siblings ...)
2012-01-27 15:51 ` ramana at gcc dot gnu.org
@ 2012-03-30 8:18 ` ramana at gcc dot gnu.org
2012-03-30 8:40 ` ramana at gcc dot gnu.org
` (9 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-03-30 8:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 07:58:49 UTC ---
Your testcase is broken - it doesn't honour reinterpret_casts properly . This
is a better testcase.
#include <arm_neon.h>
uint32x4_t sqrlen4D_16u8( const uint8x16_t A, const uint8x16_t B )
{
const uint8x16_t absAB = vabdq_u8( A, B );
const uint16x8_t square_l = vmull_u8( vget_low_u8( absAB ), vget_low_u8( absAB
) );
const uint16x8_t square_h = vmull_u8( vget_high_u8( absAB ), vget_high_u8(
absAB ) );
const uint32x4x2_t rgrgrgrg_babababa = vuzpq_u32( vreinterpretq_u32_u16
(square_l), vreinterpretq_u32_u16 (square_h) );
const uint16x8_t rgrgrgrg = vreinterpretq_u16_u32 (rgrgrgrg_babababa.val[0]);
const uint16x8_t babababa = vreinterpretq_u16_u32 (rgrgrgrg_babababa.val[1]);
const uint32x4_t rpg_rpg_rpg_rpg = vpaddlq_u16( rgrgrgrg );
const uint32x4_t dp = vpadalq_u16( rpg_rpg_rpg_rpg, babababa );
return ( dp );
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (3 preceding siblings ...)
2012-03-30 8:18 ` ramana at gcc dot gnu.org
@ 2012-03-30 8:40 ` ramana at gcc dot gnu.org
2012-07-05 16:46 ` ramana at gcc dot gnu.org
` (8 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-03-30 8:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #5 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 08:17:21 UTC ---
Experimenting with :
Applying the patch of PR48941 and the patch for lower-subreg here
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01886.html
I now see : We still have too many moves for my liking but the gratuituous
spilling is now gone.
.cpu cortex-a9
.eabi_attribute 27, 3
.fpu neon
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "t2.c"
.text
.align 2
.global sqrlen4D_16u8
.type sqrlen4D_16u8, %function
sqrlen4D_16u8:
@ args = 16, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vmov d16, r0, r1 @ v16qi
vmov d17, r2, r3
vldmia sp, {d18-d19}
vabd.u8 q10, q8, q9
vmull.u8 q11, d20, d20
vmull.u8 q10, d21, d21
vmov q8, q11 @ v4si -- unnecessary ?
vmov q9, q10 @ v4si -- unnecessary ?
vuzp.32 q8, q9
vpaddl.u16 q10, q8
vmov q11, q10 @ v4si -- unnecessary
vpadal.u16 q11, q9
vmov r0, r1, d22 @ v4si
vmov r2, r3, d23
bx lr
.size sqrlen4D_16u8, .-sqrlen4D_16u8
.ident "GCC: (GNU) 4.8.0 20120330 (experimental)"
.section .note.GNU-stack,"",%progbits
This probably makes it a dup of PR48941 but it's starting to look more
promising now.
Eric, could you try the 2 patches and see what you get - This isn't something
to be gratuitously backported as we still have to see the effects elsewhere but
it would be worth seeing if this helps on your intrinsics testcases.
Ramana
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (4 preceding siblings ...)
2012-03-30 8:40 ` ramana at gcc dot gnu.org
@ 2012-07-05 16:46 ` ramana at gcc dot gnu.org
2013-05-28 19:30 ` mgretton at gcc dot gnu.org
` (7 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-07-05 16:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #6 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-07-05 16:45:32 UTC ---
Author: ramana
Date: Thu Jul 5 16:45:18 2012
New Revision: 189294
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189294
Log:
2012-07-05 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
PR target/49891
PR target/51980
* gcc/testsuite/gcc.target/arm/neon/vtrnf32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vtrns32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vtrnu32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vzipf32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vzips32.c: Update.
* gcc/testsuite/gcc.target/arm/neon/vzipu32.c: Update.
2012-07-05 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
Julian Brown <julian@codesourcery.com>
PR target/49891
PR target/51980
* config/arm/neon-gen.ml (return_by_ptr): Delete.
(print_function): Handle empty strings.
(return): Delete use of return_by_ptr.
(mask_shape_for_shuffle): New function.
(mask_elems): Likewise.
(shuffle_fn): Likewise.
(params): Simplify and remove use of return_by_ptr.
(get_shuffle): New function.
(print_variant): Update.
* config/arm/neon.ml (rev_elems): New function.
(permute_range): Likewise.
(zip_range): Likewise.
(uzip_range): Likewise.
(trn_range): Likewise.
(zip_elems): Likewise.
(uzip_elems): Likewise.
(trn_elems): Likewise.
(features): New enumeration Use_shuffle. Delete ReturnPtr.
(pf_su_8_16): New.
(suf_32): New.
(ops): Update entries for Vrev64, Vrev32, Vrev16, Vtr, Vzip, Vuzp.
* config/arm/arm_neon.h: Regenerate.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/arm_neon.h
trunk/gcc/config/arm/neon-gen.ml
trunk/gcc/config/arm/neon.ml
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/arm/neon/vtrnf32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vtrns32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vtrnu32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vzipf32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vzips32.c
trunk/gcc/testsuite/gcc.target/arm/neon/vzipu32.c
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (5 preceding siblings ...)
2012-07-05 16:46 ` ramana at gcc dot gnu.org
@ 2013-05-28 19:30 ` mgretton at gcc dot gnu.org
2014-01-22 12:19 ` StaffLeavers at arm dot com
` (6 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: mgretton at gcc dot gnu.org @ 2013-05-28 19:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
mgretton at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mgretton at gcc dot gnu.org
--- Comment #7 from mgretton at gcc dot gnu.org ---
Testing the testcase in #4 with a recent trunk (gcc version 4.9.0 20130528
(experimental) (GCC)) gives the following results:
arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
sqrlen4D_16u8:
vmov d18, r0, r1 @ v16qi
vmov d19, r2, r3
vld1.64 {d16-d17}, [sp:64]
vabd.u8 q8, q9, q8
vmull.u8 q9, d16, d16
vmull.u8 q8, d17, d17
vuzp.32 q9, q8
vpaddl.u16 q9, q9
vmov q10, q9 @ v4si
vpadal.u16 q10, q8
vmov r0, r1, d20 @ v4si
vmov r2, r3, d21
bx lr
arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=hard -O2 -mthumb:
sqrlen4D_16u8:
vabd.u8 q1, q0, q1
vmull.u8 q0, d2, d2
vmull.u8 q8, d3, d3
vuzp.32 q0, q8
vpaddl.u16 q0, q0
vpadal.u16 q0, q8
bx lr
So code generation seems to be OK for hard-float ABI but the soft-float version
has some issues with an extra vmov between the vpaddl and vpadal.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (7 preceding siblings ...)
2014-01-22 12:19 ` StaffLeavers at arm dot com
@ 2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
2014-01-22 12:20 ` StaffLeavers at arm dot com
` (4 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2014-01-22 12:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
ktkachov at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ktkachov at gcc dot gnu.org
--- Comment #8 from ktkachov at gcc dot gnu.org ---
> arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
> sqrlen4D_16u8:
> vmov d18, r0, r1 @ v16qi
> vmov d19, r2, r3
> vld1.64 {d16-d17}, [sp:64]
> vabd.u8 q8, q9, q8
> vmull.u8 q9, d16, d16
> vmull.u8 q8, d17, d17
> vuzp.32 q9, q8
> vpaddl.u16 q9, q9
> vmov q10, q9 @ v4si
> vpadal.u16 q10, q8
> vmov r0, r1, d20 @ v4si
> vmov r2, r3, d21
> bx lr
With current trunk I'm getting for the softfp case:
push {lr} @ 40 *push_multi [length = 2]
vmov d16, r0, r1 @ v16qi @ 37 *neon_movv16qi/6 [length
= 8]
vmov d17, r2, r3
add lr, sp, #4 @ 36 *arm_addsi3/5 [length = 4]
vldr d18, [sp, #4] @ 3 *neon_movv16qi/4 [length = 8]
vldr d19, [sp, #12]
vabd.u8 q9, q8, q9 @ 7 neon_vabdv16qi [length = 4]
vmull.u8 q8, d18, d18 @ 14 neon_vmullv8qi [length = 4]
vmull.u8 q9, d19, d19 @ 16 neon_vmullv8qi [length = 4]
vuzp.32 q8, q9 @ 18 *neon_vuzpv4si_insn [length = 4]
vpaddl.u16 q8, q8 @ 22 neon_vpaddlv8hi [length = 4]
vpadal.u16 q8, q9 @ 28 neon_vpadalv8hi [length = 4]
vmov r0, r1, d16 @ v4si @ 39 *neon_movv4si/5 [length = 8]
vmov r2, r3, d17
ldr pc, [sp], #4 @ 45 *ldr_with_return [length = 4]
The move between the vpad*s is gone, but there's a couple of redundant loads
and some register spillage.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (6 preceding siblings ...)
2013-05-28 19:30 ` mgretton at gcc dot gnu.org
@ 2014-01-22 12:19 ` StaffLeavers at arm dot com
2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
` (5 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #9 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.
Your email will be forwarded to their line manager.
Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com
Thank you.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (8 preceding siblings ...)
2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
@ 2014-01-22 12:20 ` StaffLeavers at arm dot com
2014-01-22 12:21 ` StaffLeavers at arm dot com
` (3 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:20 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #10 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.
Your email will be forwarded to their line manager.
Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com
Thank you.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (9 preceding siblings ...)
2014-01-22 12:20 ` StaffLeavers at arm dot com
@ 2014-01-22 12:21 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
` (2 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #11 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.
Your email will be forwarded to their line manager.
Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com
Thank you.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (10 preceding siblings ...)
2014-01-22 12:21 ` StaffLeavers at arm dot com
@ 2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-06-13 15:38 ` christophe.lyon at st dot com
13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #12 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.
Your email will be forwarded to their line manager.
Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com
Thank you.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (11 preceding siblings ...)
2014-01-22 12:22 ` StaffLeavers at arm dot com
@ 2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-06-13 15:38 ` christophe.lyon at st dot com
13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
--- Comment #13 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.
Your email will be forwarded to their line manager.
Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com
Thank you.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
` (12 preceding siblings ...)
2014-01-22 12:22 ` StaffLeavers at arm dot com
@ 2014-06-13 15:38 ` christophe.lyon at st dot com
13 siblings, 0 replies; 15+ messages in thread
From: christophe.lyon at st dot com @ 2014-06-13 15:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
christophe.lyon at st dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |christophe.lyon at st dot com
--- Comment #14 from christophe.lyon at st dot com ---
As of current trunk the softfp case looks like this:
sqrlen4D_16u8:
vmov d16, r0, r1 @ v16qi
vmov d17, r2, r3
vld1.64 {d18-d19}, [sp:64]
vabd.u8 q9, q8, q9
vmull.u8 q8, d18, d18
vmull.u8 q9, d19, d19
vuzp.32 q8, q9
vpaddl.u16 q8, q8
vpadal.u16 q8, q9
vmov r0, r1, d16 @ v4si
vmov r2, r3, d17
bx lr
which looks quite good.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-06-13 15:38 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
2012-01-27 14:50 ` eric.batut at allegorithmic dot com
2012-01-27 15:51 ` ramana at gcc dot gnu.org
2012-03-30 8:18 ` ramana at gcc dot gnu.org
2012-03-30 8:40 ` ramana at gcc dot gnu.org
2012-07-05 16:46 ` ramana at gcc dot gnu.org
2013-05-28 19:30 ` mgretton at gcc dot gnu.org
2014-01-22 12:19 ` StaffLeavers at arm dot com
2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
2014-01-22 12:20 ` StaffLeavers at arm dot com
2014-01-22 12:21 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-06-13 15:38 ` christophe.lyon at st dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).