public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
@ 2012-01-24 15:29 eric.batut at allegorithmic dot com
  2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: eric.batut at allegorithmic dot com @ 2012-01-24 15:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

             Bug #: 51980
           Summary: ARM - Neon code polluted by useless stores to the
                    stack with vuzpq / vzipq / vtrnq
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: eric.batut@allegorithmic.com


Created attachment 26442
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26442
Minimal repro case (C file)

When using UZP/ZIP/TRN Neon intrinsics, gcc-trunk generates a whole lot of
stack operations (and associated stack alignment operations) even if everything
can purely be done using Neon registers. 

Compiler used is GCC trunk, rev 183468, compiled with Android's build-gcc.sh
(arm-linux-androideabi).

Command line is:
arm-linux-androideabi-g++ -c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard
-mfpu=vfp -flax-vector-conversions -mfpu=neon -O2 -o test.s test.c -S

Generated assembly code for attached C file is:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8    q1, q0, q1
    stmfd    sp!, {r4, fp}       <= Unnecessary
    add    fp, sp, #4          <= Unnecessary
    sub    sp, sp, #48         <= Unnecessary
    add    r3, sp, #15         <= Unnecessary
    vmull.u8    q0, d2, d2
    bic    r3, r3, #15         <= Unnecessary
    vmull.u8    q8, d3, d3
    vuzp.32    q0, q8
    vstmia    r3, {d0-d1}         <= Unnecessary, caused by vuzp.32
    vstr    d16, [r3, #16]      <= Unnecessary, caused by vuzp.32
    vstr    d17, [r3, #24]      <= Unnecessary, caused by vuzp.32
    vpaddl.u16    q0, q0
    vpadal.u16    q0, q8
    sub    sp, fp, #4          <= Unnecessary
    ldmfd    sp!, {r4, fp}       <= Unnecessary
    bx    lr

As no stack operation is needed in this function, ideally the following should
be generated instead:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8    q1, q0, q1
    vmull.u8    q0, d2, d2
    vmull.u8    q8, d3, d3
    vuzp.32    q0, q8
    vpaddl.u16    q0, q0
    vpadal.u16    q0, q8
    bx    lr

This makes even tight Neon functions written with intrinsics much larger and
slower than necessary, and makes it very hard to write performance-oriented
code with intrinsics in arm-gcc.

gcc -v yields:
Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++
COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu
--with-gnu-as --with-gnu-ld --enable-languages=c,c++
--with-gmp=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp
--enable-threads --disable-nls --disable-libmudflap --disable-libgomp
--disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls
--with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace
--enable-initfini-array --disable-nls
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot
--with-binutils-version=2.21.53 --with-mpfr-version=3.0.1
--with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6
--with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3
--program-transform-name='s,^,arm-linux-androideabi-,'
Thread model: posix
gcc version 4.7.0 20120124 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/cc1plus
-quiet -v -imultilib armv7-a -D_GNU_SOURCE test.c -mbionic -fPIC -quiet
-dumpbase test.c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp
-mfpu=neon -mtls-dialect=gnu -auxbase-strip test.s -O2 -version
-flax-vector-conversions -o test.s -fno-exceptions -fno-rtti
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
    compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/arm-linux-androideabi/armv7-a"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0/backward"
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/local/include"
#include "..." search starts here:
#include <...> search starts here:

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include-fixed

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/include
End of search list.
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
    compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d84173bb26a7319ac9d4c1278a6a7e04
COMPILER_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/bin/
LIBRARY_PATH=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/armv7-a/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/lib/:/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot/usr/lib/
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
@ 2012-01-24 15:31 ` rguenth at gcc dot gnu.org
  2012-01-27 14:50 ` eric.batut at allegorithmic dot com
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-01-24 15:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-01-24
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-24 14:57:43 UTC ---
It looks like the neon builtins are not properly marked as pure/const, that
certainly is a road-block for optimizations.  The heavy use of UNSPECs is
another.

Confirmed.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
  2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
@ 2012-01-27 14:50 ` eric.batut at allegorithmic dot com
  2012-01-27 15:51 ` ramana at gcc dot gnu.org
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: eric.batut at allegorithmic dot com @ 2012-01-27 14:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

Eric Batut <eric.batut at allegorithmic dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #2 from Eric Batut <eric.batut at allegorithmic dot com> 2012-01-27 14:13:08 UTC ---
Adding the usual suspects for ARM-related bugs.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
  2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
  2012-01-27 14:50 ` eric.batut at allegorithmic dot com
@ 2012-01-27 15:51 ` ramana at gcc dot gnu.org
  2012-03-30  8:18 ` ramana at gcc dot gnu.org
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-01-27 15:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|arm-linux-androideabi       |arm-linux-androideabi,
                   |                            |arm*-*-*eabi

--- Comment #3 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-01-27 15:25:29 UTC ---
(In reply to comment #1)
> It looks like the neon builtins are not properly marked as pure/const, that
> certainly is a road-block for optimizations.  



> The heavy use of UNSPECs is
> another.

yes, one other problem is that a lot of the neon intrinsics don't expand into
an equivalent RTL - you still need the unspecs for the polynomial types but in
general a large number of the intrinsics that are in the form of unspecs could
use the underlying vec_ expanders that are also present. 




> 
> Confirmed.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (2 preceding siblings ...)
  2012-01-27 15:51 ` ramana at gcc dot gnu.org
@ 2012-03-30  8:18 ` ramana at gcc dot gnu.org
  2012-03-30  8:40 ` ramana at gcc dot gnu.org
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-03-30  8:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 07:58:49 UTC ---
Your testcase is broken - it doesn't honour reinterpret_casts properly . This
is  a better testcase. 

#include <arm_neon.h>

uint32x4_t sqrlen4D_16u8( const uint8x16_t A, const uint8x16_t B )
{
 const uint8x16_t absAB = vabdq_u8( A, B );
 const uint16x8_t square_l = vmull_u8( vget_low_u8( absAB ), vget_low_u8( absAB
) );
 const uint16x8_t square_h = vmull_u8( vget_high_u8( absAB ), vget_high_u8(
absAB ) );
 const uint32x4x2_t rgrgrgrg_babababa = vuzpq_u32( vreinterpretq_u32_u16
(square_l), vreinterpretq_u32_u16 (square_h) );
 const uint16x8_t rgrgrgrg = vreinterpretq_u16_u32 (rgrgrgrg_babababa.val[0]);
 const uint16x8_t babababa = vreinterpretq_u16_u32 (rgrgrgrg_babababa.val[1]);
 const uint32x4_t rpg_rpg_rpg_rpg = vpaddlq_u16( rgrgrgrg );
 const uint32x4_t dp = vpadalq_u16( rpg_rpg_rpg_rpg, babababa );
 return ( dp );
}


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (3 preceding siblings ...)
  2012-03-30  8:18 ` ramana at gcc dot gnu.org
@ 2012-03-30  8:40 ` ramana at gcc dot gnu.org
  2012-07-05 16:46 ` ramana at gcc dot gnu.org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-03-30  8:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #5 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 08:17:21 UTC ---
Experimenting with : 

Applying the patch of PR48941 and the patch for lower-subreg here

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01886.html

I now see : We still have too many moves for my liking but the gratuituous
spilling is now gone. 

      .cpu cortex-a9
        .eabi_attribute 27, 3
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "t2.c"
        .text
        .align  2
        .global sqrlen4D_16u8
        .type   sqrlen4D_16u8, %function
sqrlen4D_16u8:
        @ args = 16, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vmov    d16, r0, r1  @ v16qi
        vmov    d17, r2, r3
        vldmia  sp, {d18-d19}
        vabd.u8 q10, q8, q9
        vmull.u8        q11, d20, d20
        vmull.u8        q10, d21, d21
        vmov    q8, q11  @ v4si  -- unnecessary ? 
        vmov    q9, q10  @ v4si  -- unnecessary ? 
        vuzp.32 q8, q9
        vpaddl.u16      q10, q8
        vmov    q11, q10  @ v4si  -- unnecessary
        vpadal.u16      q11, q9
        vmov    r0, r1, d22  @ v4si
        vmov    r2, r3, d23
        bx      lr
        .size   sqrlen4D_16u8, .-sqrlen4D_16u8
        .ident  "GCC: (GNU) 4.8.0 20120330 (experimental)"
        .section        .note.GNU-stack,"",%progbits

This probably makes it a dup of PR48941 but it's starting to look more
promising now. 

Eric, could you try the 2 patches and see what you get - This isn't something
to be gratuitously backported as we still have to see the effects elsewhere but
it would be worth seeing if this helps on your intrinsics testcases. 

Ramana


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (4 preceding siblings ...)
  2012-03-30  8:40 ` ramana at gcc dot gnu.org
@ 2012-07-05 16:46 ` ramana at gcc dot gnu.org
  2013-05-28 19:30 ` mgretton at gcc dot gnu.org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: ramana at gcc dot gnu.org @ 2012-07-05 16:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #6 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-07-05 16:45:32 UTC ---
Author: ramana
Date: Thu Jul  5 16:45:18 2012
New Revision: 189294

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189294
Log:
2012-07-05  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

        PR target/49891
        PR target/51980
        * gcc/testsuite/gcc.target/arm/neon/vtrnf32.c: Update.
        * gcc/testsuite/gcc.target/arm/neon/vtrns32.c: Update.
        * gcc/testsuite/gcc.target/arm/neon/vtrnu32.c: Update.
        * gcc/testsuite/gcc.target/arm/neon/vzipf32.c: Update.
        * gcc/testsuite/gcc.target/arm/neon/vzips32.c: Update.
        * gcc/testsuite/gcc.target/arm/neon/vzipu32.c: Update.


2012-07-05  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>
        Julian Brown  <julian@codesourcery.com>

        PR target/49891
        PR target/51980
        * config/arm/neon-gen.ml (return_by_ptr): Delete.
        (print_function): Handle empty strings.
        (return): Delete use of return_by_ptr.
        (mask_shape_for_shuffle): New function.
        (mask_elems): Likewise.
        (shuffle_fn): Likewise.
        (params): Simplify and remove use of return_by_ptr.
        (get_shuffle): New function.
        (print_variant): Update.
        * config/arm/neon.ml (rev_elems): New function.
        (permute_range): Likewise.
        (zip_range): Likewise.
        (uzip_range): Likewise.
        (trn_range): Likewise.
        (zip_elems): Likewise.
        (uzip_elems): Likewise.
        (trn_elems): Likewise.
        (features): New enumeration Use_shuffle. Delete ReturnPtr.
        (pf_su_8_16): New.
        (suf_32): New.
        (ops): Update entries for Vrev64, Vrev32, Vrev16, Vtr, Vzip, Vuzp.
        * config/arm/arm_neon.h: Regenerate.




Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm_neon.h
    trunk/gcc/config/arm/neon-gen.ml
    trunk/gcc/config/arm/neon.ml
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/arm/neon/vtrnf32.c
    trunk/gcc/testsuite/gcc.target/arm/neon/vtrns32.c
    trunk/gcc/testsuite/gcc.target/arm/neon/vtrnu32.c
    trunk/gcc/testsuite/gcc.target/arm/neon/vzipf32.c
    trunk/gcc/testsuite/gcc.target/arm/neon/vzips32.c
    trunk/gcc/testsuite/gcc.target/arm/neon/vzipu32.c


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (5 preceding siblings ...)
  2012-07-05 16:46 ` ramana at gcc dot gnu.org
@ 2013-05-28 19:30 ` mgretton at gcc dot gnu.org
  2014-01-22 12:19 ` StaffLeavers at arm dot com
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: mgretton at gcc dot gnu.org @ 2013-05-28 19:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

mgretton at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mgretton at gcc dot gnu.org

--- Comment #7 from mgretton at gcc dot gnu.org ---
Testing the testcase in #4 with a recent trunk (gcc version 4.9.0 20130528
(experimental) (GCC)) gives the following results:

arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
sqrlen4D_16u8:
        vmov    d18, r0, r1  @ v16qi
        vmov    d19, r2, r3
        vld1.64 {d16-d17}, [sp:64]
        vabd.u8 q8, q9, q8
        vmull.u8        q9, d16, d16
        vmull.u8        q8, d17, d17
        vuzp.32 q9, q8
        vpaddl.u16      q9, q9
        vmov    q10, q9  @ v4si
        vpadal.u16      q10, q8
        vmov    r0, r1, d20  @ v4si
        vmov    r2, r3, d21
        bx      lr


arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=hard -O2 -mthumb:
sqrlen4D_16u8:
        vabd.u8 q1, q0, q1
        vmull.u8        q0, d2, d2
        vmull.u8        q8, d3, d3
        vuzp.32 q0, q8
        vpaddl.u16      q0, q0
        vpadal.u16      q0, q8
        bx      lr

So code generation seems to be OK for hard-float ABI but the soft-float version
has some issues with an extra vmov between the vpaddl and vpadal.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (7 preceding siblings ...)
  2014-01-22 12:19 ` StaffLeavers at arm dot com
@ 2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
  2014-01-22 12:20 ` StaffLeavers at arm dot com
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2014-01-22 12:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #8 from ktkachov at gcc dot gnu.org ---
> arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
> sqrlen4D_16u8:
>         vmov    d18, r0, r1  @ v16qi
>         vmov    d19, r2, r3
>         vld1.64 {d16-d17}, [sp:64]
>         vabd.u8 q8, q9, q8
>         vmull.u8        q9, d16, d16
>         vmull.u8        q8, d17, d17
>         vuzp.32 q9, q8
>         vpaddl.u16      q9, q9
>         vmov    q10, q9  @ v4si
>         vpadal.u16      q10, q8
>         vmov    r0, r1, d20  @ v4si
>         vmov    r2, r3, d21
>         bx      lr

With current trunk I'm getting for the softfp case:

        push    {lr}    @ 40    *push_multi     [length = 2]
        vmov    d16, r0, r1  @ v16qi    @ 37    *neon_movv16qi/6        [length
= 8]
        vmov    d17, r2, r3
        add     lr, sp, #4      @ 36    *arm_addsi3/5   [length = 4]
        vldr    d18, [sp, #4]   @ 3     *neon_movv16qi/4        [length = 8]
        vldr    d19, [sp, #12]
        vabd.u8 q9, q8, q9      @ 7     neon_vabdv16qi  [length = 4]
        vmull.u8        q8, d18, d18    @ 14    neon_vmullv8qi  [length = 4]
        vmull.u8        q9, d19, d19    @ 16    neon_vmullv8qi  [length = 4]
        vuzp.32 q8, q9  @ 18    *neon_vuzpv4si_insn     [length = 4]
        vpaddl.u16      q8, q8  @ 22    neon_vpaddlv8hi [length = 4]
        vpadal.u16      q8, q9  @ 28    neon_vpadalv8hi [length = 4]
        vmov    r0, r1, d16  @ v4si     @ 39    *neon_movv4si/5 [length = 8]
        vmov    r2, r3, d17
        ldr     pc, [sp], #4    @ 45    *ldr_with_return        [length = 4]


The move between the vpad*s is gone, but there's a couple of redundant loads
and some register spillage.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (6 preceding siblings ...)
  2013-05-28 19:30 ` mgretton at gcc dot gnu.org
@ 2014-01-22 12:19 ` StaffLeavers at arm dot com
  2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #9 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.

Your email will be forwarded to their line manager.


Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com

Thank you.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (8 preceding siblings ...)
  2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
@ 2014-01-22 12:20 ` StaffLeavers at arm dot com
  2014-01-22 12:21 ` StaffLeavers at arm dot com
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #10 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.

Your email will be forwarded to their line manager.


Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com

Thank you.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (9 preceding siblings ...)
  2014-01-22 12:20 ` StaffLeavers at arm dot com
@ 2014-01-22 12:21 ` StaffLeavers at arm dot com
  2014-01-22 12:22 ` StaffLeavers at arm dot com
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #11 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.

Your email will be forwarded to their line manager.


Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com

Thank you.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (10 preceding siblings ...)
  2014-01-22 12:21 ` StaffLeavers at arm dot com
@ 2014-01-22 12:22 ` StaffLeavers at arm dot com
  2014-01-22 12:22 ` StaffLeavers at arm dot com
  2014-06-13 15:38 ` christophe.lyon at st dot com
  13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #12 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.

Your email will be forwarded to their line manager.


Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com

Thank you.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (11 preceding siblings ...)
  2014-01-22 12:22 ` StaffLeavers at arm dot com
@ 2014-01-22 12:22 ` StaffLeavers at arm dot com
  2014-06-13 15:38 ` christophe.lyon at st dot com
  13 siblings, 0 replies; 15+ messages in thread
From: StaffLeavers at arm dot com @ 2014-01-22 12:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #13 from StaffLeavers at arm dot com ---
greta.yorsh no longer works for ARM.

Your email will be forwarded to their line manager.


Please do not reply to this email.
If you need more information, please email real-postmaster@arm.com

Thank you.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
  2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
                   ` (12 preceding siblings ...)
  2014-01-22 12:22 ` StaffLeavers at arm dot com
@ 2014-06-13 15:38 ` christophe.lyon at st dot com
  13 siblings, 0 replies; 15+ messages in thread
From: christophe.lyon at st dot com @ 2014-06-13 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

christophe.lyon at st dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christophe.lyon at st dot com

--- Comment #14 from christophe.lyon at st dot com ---
As of current trunk the softfp case looks like this:
sqrlen4D_16u8:
        vmov    d16, r0, r1  @ v16qi
        vmov    d17, r2, r3
        vld1.64 {d18-d19}, [sp:64]
        vabd.u8 q9, q8, q9
        vmull.u8        q8, d18, d18
        vmull.u8        q9, d19, d19
        vuzp.32 q8, q9
        vpaddl.u16      q8, q8
        vpadal.u16      q8, q9
        vmov    r0, r1, d16  @ v4si
        vmov    r2, r3, d17
        bx      lr

which looks quite good.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-06-13 15:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-24 15:29 [Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq eric.batut at allegorithmic dot com
2012-01-24 15:31 ` [Bug target/51980] " rguenth at gcc dot gnu.org
2012-01-27 14:50 ` eric.batut at allegorithmic dot com
2012-01-27 15:51 ` ramana at gcc dot gnu.org
2012-03-30  8:18 ` ramana at gcc dot gnu.org
2012-03-30  8:40 ` ramana at gcc dot gnu.org
2012-07-05 16:46 ` ramana at gcc dot gnu.org
2013-05-28 19:30 ` mgretton at gcc dot gnu.org
2014-01-22 12:19 ` StaffLeavers at arm dot com
2014-01-22 12:19 ` ktkachov at gcc dot gnu.org
2014-01-22 12:20 ` StaffLeavers at arm dot com
2014-01-22 12:21 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-01-22 12:22 ` StaffLeavers at arm dot com
2014-06-13 15:38 ` christophe.lyon at st dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).