From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 12112 invoked by alias); 24 Jan 2012 14:32:47 -0000
Received: (qmail 12087 invoked by uid 22791); 24 Jan 2012 14:32:45 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,SARE_SUB_OBFU_Q0,TW_DM,TW_JL,TW_UZ,TW_ZP
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 24 Jan 2012 14:32:32 +0000
From: "eric.batut at allegorithmic dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
Date: Tue, 24 Jan 2012 15:29:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: eric.batut at allegorithmic dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-01/txt/msg02800.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

             Bug #: 51980
           Summary: ARM - Neon code polluted by useless stores to the stack
                    with vuzpq / vzipq / vtrnq
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: eric.batut@allegorithmic.com

Created attachment 26442
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26442
Minimal repro case (C file)

When using UZP/ZIP/TRN Neon intrinsics, gcc-trunk generates a whole lot of stack
operations (and associated stack alignment operations) even if everything can
purely be done using Neon registers.

Compiler used is GCC trunk, rev 183468, compiled with Android's build-gcc.sh
(arm-linux-androideabi).

Command line is:
arm-linux-androideabi-g++ -c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp -flax-vector-conversions -mfpu=neon -O2 -o test.s test.c -S

Generated assembly code for attached C file is:

_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8     q1, q0, q1
    stmfd       sp!, {r4, fp}       <= Unnecessary
    add         fp, sp, #4          <= Unnecessary
    sub         sp, sp, #48         <= Unnecessary
    add         r3, sp, #15         <= Unnecessary
    vmull.u8    q0, d2, d2
    bic         r3, r3, #15         <= Unnecessary
    vmull.u8    q8, d3, d3
    vuzp.32     q0, q8
    vstmia      r3, {d0-d1}         <= Unnecessary, caused by vuzp.32
    vstr        d16, [r3, #16]      <= Unnecessary, caused by vuzp.32
    vstr        d17, [r3, #24]      <= Unnecessary, caused by vuzp.32
    vpaddl.u16  q0, q0
    vpadal.u16  q0, q8
    sub         sp, fp, #4          <= Unnecessary
    ldmfd       sp!, {r4, fp}       <= Unnecessary
    bx          lr

As no stack operation is needed in this function, ideally the following should
be generated instead:

_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8     q1, q0, q1
    vmull.u8    q0, d2, d2
    vmull.u8    q8, d3, d3
    vuzp.32     q0, q8
    vpaddl.u16  q0, q0
    vpadal.u16  q0, q8
    bx          lr

This makes even tight Neon functions written with intrinsics much larger and
slower than necessary, and makes it very hard to write performance-oriented
code with intrinsics in arm-gcc.

gcc -v yields:

Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++
COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure --prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86 --target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --with-gmp=/tmp/ndk-eb/build/toolchain/temp-install --with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install --with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp --enable-threads --disable-nls --disable-libmudflap --disable-libgomp --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls --with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace --enable-initfini-array --disable-nls --prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86 --with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot --with-binutils-version=2.21.53 --with-mpfr-version=3.0.1 --with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6 --with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3 --program-transform-name='s,^,arm-linux-androideabi-,'
Thread model: posix
gcc version 4.7.0 20120124 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard' '-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S' '-v' '-mtls-dialect=gnu'
 /home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/cc1plus -quiet -v -imultilib armv7-a -D_GNU_SOURCE test.c -mbionic -fPIC -quiet -dumpbase test.c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp -mfpu=neon -mtls-dialect=gnu -auxbase-strip test.s -O2 -version -flax-vector-conversions -o test.s -fno-exceptions -fno-rtti
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
        compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version 5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory "/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0"
ignoring nonexistent directory "/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/arm-linux-androideabi/armv7-a"
ignoring nonexistent directory "/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/backward"
ignoring nonexistent directory "/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/local/include"
#include "..." search starts here:
#include <...> search starts here:
 /home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include
 /home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include-fixed
 /home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include
 /home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/include
End of search list.
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
        compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version 5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d84173bb26a7319ac9d4c1278a6a7e04
COMPILER_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/bin/
LIBRARY_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/lib/
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard' '-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S' '-v' '-mtls-dialect=gnu'
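
For readers without the attachment: the repro file is not reproduced in this
message. Reading only the ideal six-instruction sequence quoted in the report,
the function appears to take two vectors of 16 unsigned bytes, treat them as
four packed 4D points, and return the four squared distances. The portable
scalar sketch below models that sequence step by step. It is an assumption
derived from the assembly, not the attached code; the name
sqrlen4D_16u8_model and the array-based signature are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of the register-only sequence
   vabd.u8 / vmull.u8 x2 / vuzp.32 / vpaddl.u16 / vpadal.u16.
   Not the attached repro: reconstructed from the assembly only. */
void sqrlen4D_16u8_model(const uint8_t a[16], const uint8_t b[16],
                         uint32_t out[4])
{
    uint16_t sq[16];   /* sq[0..7] models q0, sq[8..15] models q8 */
    uint16_t even[8];  /* q0 after vuzp.32 */
    uint16_t odd[8];   /* q8 after vuzp.32 */
    int i;

    assert(a && b && out);

    /* vabd.u8 then vmull.u8: |a - b| per byte, squared, widened to 16 bits.
       255 * 255 = 65025 still fits in a uint16_t. */
    for (i = 0; i < 16; i++) {
        uint8_t d = (uint8_t)abs((int)a[i] - (int)b[i]);
        sq[i] = (uint16_t)(d * d);
    }

    /* vuzp.32: view sq[] as eight 32-bit elements (two 16-bit lanes each);
       even-numbered elements go to one register, odd-numbered to the other. */
    for (i = 0; i < 4; i++) {
        even[2 * i]     = sq[4 * i];
        even[2 * i + 1] = sq[4 * i + 1];
        odd[2 * i]      = sq[4 * i + 2];
        odd[2 * i + 1]  = sq[4 * i + 3];
    }

    /* vpaddl.u16: pairwise add adjacent 16-bit lanes into 32-bit lanes.
       vpadal.u16: same pairwise add, accumulated into the previous result. */
    for (i = 0; i < 4; i++) {
        out[i]  = (uint32_t)even[2 * i] + even[2 * i + 1];
        out[i] += (uint32_t)odd[2 * i] + odd[2 * i + 1];
    }
}
```

Each out[i] ends up as the sum of the four squared byte differences at
indices 4*i to 4*i+3. Every intermediate value in the model corresponds to a
Neon register in the ideal sequence, which is consistent with the report's
point that no stack traffic should be required.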