[Bug target/48789] missed ARM optimization: use LDMIA

public inbox for gcc-bugs@sourceware.org

* [Bug target/48789] missed ARM optimization: use LDMIA
From: edwintorok at gmail dot com @ 2011-04-27 11:15 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48789

--- Comment #3 from Török Edwin <edwintorok at gmail dot com> 2011-04-27 11:11:02 UTC ---
Created attachment 24116
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24116
bench.c



* [Bug target/48789] missed ARM optimization: use LDMIA
From: edwintorok at gmail dot com @ 2011-04-27 11:20 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48789

--- Comment #2 from Török Edwin <edwintorok at gmail dot com> 2011-04-27 11:10:49 UTC ---
Created attachment 24115
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24115
test.S



* [Bug target/48789] missed ARM optimization: use LDMIA
From: edwintorok at gmail dot com @ 2011-04-27 11:20 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48789

--- Comment #1 from Török Edwin <edwintorok at gmail dot com> 2011-04-27 11:10:35 UTC ---
Created attachment 24114
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24114
reverse.c



* [Bug target/48789] New: missed ARM optimization: use LDMIA
From: edwintorok at gmail dot com @ 2011-04-27 11:24 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48789

           Summary: missed ARM optimization: use LDMIA
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: edwintorok@gmail.com
              Host: x86_64-linux-gnu
            Target: arm-elf
             Build: x86_64-linux-gnu


The attached testcase compiles to larger and slower code than the
hand-optimized version, although the C code closely follows the structure of
the hand-optimized assembly.

To reproduce the missed optimization:
arm-elf-gcc reverse.c -O3 -mcpu=arm946e-s -msoft-float

reverse_bytes_order_c2 has too many ldr/str instructions; it should use
ldmia/stmia, as the hand-optimized version does (reverse_bytes_order2 in
test.S).

Note: without -msoft-float, GCC generates faster code that uses VFP
instructions, but my CPU does not support VFP, so I have to disable
floating-point code generation.

Attachments:
reverse.c: the testcase
test.S: the hand-optimized version of the reverse_bytes_order_c2, called
reverse_bytes_order2 here (code from CHDK's lib/armutil/)
bench.c: a simple benchmark runner to compare gcc's version with the
hand-optimized one

This happens with both 4.6 and 4.5:
$ arm-elf-gcc -v
Using built-in specs.
COLLECT_GCC=../build-dir/arm/toolchain/bin/arm-elf-gcc
COLLECT_LTO_WRAPPER=/home/edwin/chdk/build-dir/arm/toolchain/libexec/gcc/arm-elf/4.6.0/lto-wrapper
Target: arm-elf
Configured with: ../gcc-4.6.0/configure --target=arm-elf
--prefix=/home/edwin/chdk/build-dir/arm/toolchain --enable-interwork
--enable-multilib --enable-languages=c --with-newlib
--with-gmp-include=/home/edwin/chdk/build-dir/build/gmp
--with-gmp-lib=/home/edwin/chdk/build-dir/build/gmp/.libs --without-headers
--disable-libssp --disable-nls --disable-zlib --disable-libc --disable-libm
--disable-intl --disable-hardfloat --disable-threads --with-gnu-as
--with-gnu-ld
Thread model: single
gcc version 4.6.0 (GCC) 

$ /opt/cfarm/release/4.5.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/opt/cfarm/release/4.5.0/bin/gcc
COLLECT_LTO_WRAPPER=/home/guerby/opt/release/4.5.0/bin/../libexec/gcc/armv7l-unknown-linux-gnueabi/4.5.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabi
Configured with: ../gcc-4.5.0/configure --prefix=/opt/cfarm/release/4.5.0
--enable-languages=c,ada --enable-__cxa_atexit --disable-nls
--enable-threads=posix --disable-multilib --with-gmp=/opt/cfarm/gmp-4.2.4
--with-mpfr=/opt/cfarm/mpfr-2.4.2 --with-mpc=/opt/cfarm/mpc-0.8
--with-cpu=cortex-a8 --with-fpu=neon --with-float=softfp --disable-werror
Thread model: posix
gcc version 4.5.0 (GCC) 

Some benchmarks (run on gcc33, which supports armv7; my own CPU does not, so
I can only use armv5te):
base: 0.340810 (hand-optimized assembly)
3: 0.840712 (alternate version)
c: 0.379164 (C code, compiled with -O3)
c2: 0.395410 (C code, unrolled 8 times as the hand assembly, compiled with -O3)

(Note: run the benchmark as ./a.out; ./a.out; ./a.out. I suspect CPU
frequency scaling makes the first run slower.)
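The timings above can be put in relative terms by dividing each one by the
hand-optimized baseline. A minimal C sketch (the function names slowdown and
print_slowdowns are hypothetical; the report does not state the time unit):

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a variant's time to the baseline time:
 * 1.0 means equal speed, larger values mean slower. */
double slowdown(double variant, double base)
{
    assert(base > 0.0);
    return variant / base;
}

/* Print each timing from the report relative to the
 * hand-optimized assembly ("base"). */
void print_slowdowns(void)
{
    const double base = 0.340810;
    const char *labels[] = {"3 (alternate)", "c (-O3)", "c2 (unrolled, -O3)"};
    const double times[] = {0.840712, 0.379164, 0.395410};
    for (int i = 0; i < 3; i++)
        printf("%-20s %.3fx\n", labels[i], slowdown(times[i], base));
}
```

By this arithmetic, the plain C version takes about 1.11 times as long as the
hand-optimized assembly, the unrolled c2 version about 1.16 times, and the
alternate version about 2.47 times.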

To run the benchmark:
/opt/cfarm/release/4.5.0/bin/gcc bench.c reverse.c test.S -O3 -mcpu=arm946e-s -msoft-float



* [Bug target/48789] missed ARM optimization: use LDMIA
From: ramana at gcc dot gnu.org @ 2011-07-27 16:40 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48789

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org

--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2011-07-27 16:40:09 UTC ---
There are a number of problems and not all of them are related to the backend. 


- ldm/stm aren't really first-class citizens as far as GCC is concerned.
There is currently no way to make the register allocator take increasing
addresses into account when choosing which registers to allocate.

- I suspect the performance issues you are seeing come from the number of
spills and fills generated in this case. With -fsched-pressure, things get
much better; in fact, no stack space is used at all. I haven't run any
benchmarks to check whether you get better performance in this particular
case.
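The first point above can be illustrated with a simplified legality check.
The C sketch below is hypothetical and is not GCC's implementation: it asks
whether a group of "ldr rN, [rB, #off]" loads could become a single
"ldmia rB, {...}" without writeback. Because LDM loads the lowest-numbered
register in its list from the lowest address, the merge only works if the
register numbers increase along with the addresses, which the register
allocator does not currently aim for. The struct ldr_insn and the function
can_merge_into_ldmia are invented for illustration, and the check ignores
special cases such as the base register or PC appearing in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one "ldr rN, [rB, #off]" instruction. */
struct ldr_insn {
    int reg;    /* destination register number, 0..15 */
    int base;   /* base register number */
    int offset; /* byte offset from the base register */
};

/* Return true if the loads, given in ascending offset order, could be
 * replaced by one "ldmia rB, {...}" (no writeback): same base register,
 * offsets 0, 4, 8, ... and strictly increasing register numbers.
 * Simplified: base-in-list, PC and alignment rules are not checked. */
bool can_merge_into_ldmia(const struct ldr_insn *ld, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (ld[i].base != ld[0].base)
            return false; /* all loads must share one base register */
        if (ld[i].offset != (int)(4 * i))
            return false; /* addresses must be consecutive words from rB */
        if (i > 0 && ld[i].reg <= ld[i - 1].reg)
            return false; /* registers must ascend with the addresses */
    }
    return true;
}
```

For example, loading r4, r5, r7 from offsets 0, 4, 8 of r1 can be merged,
but if the allocator assigns r5 to offset 0 and r4 to offset 4, the same two
loads can no longer become one ldmia.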


