public inbox for gcc-bugs@sourceware.org
* [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
@ 2012-01-24 15:29 eric.batut at allegorithmic dot com
  2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: eric.batut at allegorithmic dot com @ 2012-01-24 15:29 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

             Bug #: 51980
           Summary: ARM - Neon code polluted by useless stores to the
                    stack with vuzpq / vzipq / vtrnq
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: eric.batut@allegorithmic.com


Created attachment 26442
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26442
Minimal repro case (C file)

When using the UZP/ZIP/TRN Neon intrinsics (vuzpq / vzipq / vtrnq), GCC trunk
generates a large number of stack operations (and the associated stack
alignment operations) even when everything can be done purely in Neon
registers.

Compiler used is GCC trunk, rev 183468, compiled with Android's build-gcc.sh
(arm-linux-androideabi).

Command line is:
arm-linux-androideabi-g++ -c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard
-mfpu=vfp -flax-vector-conversions -mfpu=neon -O2 -o test.s test.c -S

The generated assembly code for the attached C file is:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8    q1, q0, q1
    stmfd    sp!, {r4, fp}       <= Unnecessary
    add    fp, sp, #4          <= Unnecessary
    sub    sp, sp, #48         <= Unnecessary
    add    r3, sp, #15         <= Unnecessary
    vmull.u8    q0, d2, d2
    bic    r3, r3, #15         <= Unnecessary
    vmull.u8    q8, d3, d3
    vuzp.32    q0, q8
    vstmia    r3, {d0-d1}         <= Unnecessary, caused by vuzp.32
    vstr    d16, [r3, #16]      <= Unnecessary, caused by vuzp.32
    vstr    d17, [r3, #24]      <= Unnecessary, caused by vuzp.32
    vpaddl.u16    q0, q0
    vpadal.u16    q0, q8
    sub    sp, fp, #4          <= Unnecessary
    ldmfd    sp!, {r4, fp}       <= Unnecessary
    bx    lr

As no stack operation is needed in this function, ideally the following should
be generated instead:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8    q1, q0, q1
    vmull.u8    q0, d2, d2
    vmull.u8    q8, d3, d3
    vuzp.32    q0, q8
    vpaddl.u16    q0, q0
    vpadal.u16    q0, q8
    bx    lr

This makes even tight Neon functions written with intrinsics much larger and
slower than necessary, and makes it very hard to write performance-oriented
code with intrinsics in arm-gcc.
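
For readers without access to the attachment, here is a rough sketch of the
kind of function that would map onto the six Neon instructions listed above.
It is inferred from the assembly only, not copied from attachment 26442: the
name sqrlen4D_16u8_sketch, the pointer-based signature and the scalar fallback
are all assumptions made so that the example also builds on non-ARM hosts. The
function in the report takes and returns Neon vectors by value.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical reconstruction (not the attached file): treats a and b as
 * four 4-component byte vectors each and writes the squared length of each
 * of the four difference vectors to out[0..3]. */
void sqrlen4D_16u8_sketch(const uint8_t a[16], const uint8_t b[16],
                          uint32_t out[4])
{
    assert(a && b && out);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One intrinsic per instruction of the expected sequence. */
    uint8x16_t d  = vabdq_u8(vld1q_u8(a), vld1q_u8(b));         /* vabd.u8    */
    uint16x8_t lo = vmull_u8(vget_low_u8(d),  vget_low_u8(d));  /* vmull.u8   */
    uint16x8_t hi = vmull_u8(vget_high_u8(d), vget_high_u8(d)); /* vmull.u8   */
    /* De-interleave the 32-bit pairs of squares; this is the intrinsic the
     * report associates with the extra stores. */
    uint32x4x2_t u = vuzpq_u32(vreinterpretq_u32_u16(lo),
                               vreinterpretq_u32_u16(hi));      /* vuzp.32    */
    uint32x4_t s = vpaddlq_u16(vreinterpretq_u16_u32(u.val[0])); /* vpaddl.u16 */
    s = vpadalq_u16(s, vreinterpretq_u16_u32(u.val[1]));        /* vpadal.u16 */
    vst1q_u32(out, s);
#else
    /* Scalar model of the same arithmetic for hosts without Neon. */
    for (int v = 0; v < 4; ++v) {
        uint32_t sum = 0;
        for (int i = 0; i < 4; ++i) {
            int diff = (int)a[4 * v + i] - (int)b[4 * v + i];
            sum += (uint32_t)(diff * diff);
        }
        out[v] = sum;
    }
#endif
}
```

Because this variant goes through pointers, its Neon path contains two vector
loads and one vector store that the by-value original does not need. Any
remaining references to sp or fp in the output would be the stack traffic
described in this report.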

gcc -v yields:
Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++
COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu
--with-gnu-as --with-gnu-ld --enable-languages=c,c++
--with-gmp=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp
--enable-threads --disable-nls --disable-libmudflap --disable-libgomp
--disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls
--with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace
--enable-initfini-array --disable-nls
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot
--with-binutils-version=2.21.53 --with-mpfr-version=3.0.1
--with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6
--with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3
--program-transform-name='s,^,arm-linux-androideabi-,'
Thread model: posix
gcc version 4.7.0 20120124 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/cc1plus
-quiet -v -imultilib armv7-a -D_GNU_SOURCE test.c -mbionic -fPIC -quiet
-dumpbase test.c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp
-mfpu=neon -mtls-dialect=gnu -auxbase-strip test.s -O2 -version
-flax-vector-conversions -o test.s -fno-exceptions -fno-rtti
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
    compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/arm-linux-androideabi/armv7-a"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/backward"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/local/include"
#include "..." search starts here:
#include <...> search starts here:

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include-fixed

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/include
End of search list.
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
    compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d84173bb26a7319ac9d4c1278a6a7e04
COMPILER_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/bin/
LIBRARY_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/lib/
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'




Thread overview: 15+ messages
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
2012-01-27 14:50 ` eric.batut at allegorithmic dot com
2012-01-27 15:51 ` ramana at gcc dot gnu.org
2012-03-30  8:18 ` ramana at gcc dot gnu.org
2012-03-30  8:40 ` ramana at gcc dot gnu.org
2012-07-05 16:46 ` ramana at gcc dot gnu.org
2013-05-28 19:30 ` mgretton at gcc dot gnu.org
2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
2014-01-22 12:19 ` StaffLeavers at arm dot com
2014-01-22 12:20 ` StaffLeavers at arm dot com
2014-01-22 12:21 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-06-13 15:38 ` christophe.lyon at st dot com
