public inbox for gcc-bugs@sourceware.org
* [Bug c/67366] New: Poor assembly generation for unaligned memory accesses on ARM v6 & v7 cpus
@ 2015-08-26 21:42 yann.collet.73 at gmail dot com
From: yann.collet.73 at gmail dot com @ 2015-08-26 21:42 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67366

            Bug ID: 67366
           Summary: Poor assembly generation for unaligned memory accesses
                    on ARM v6 & v7 cpus
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yann.collet.73 at gmail dot com
  Target Milestone: ---

Unaligned memory access used to be forbidden on ARM CPUs. But since ARMv6
(quite a few years ago now), it has been supported, at least for ordinary word
and halfword loads and stores.

However, GCC 4.5, 4.6, 4.7 and 4.8 seem to generate suboptimal code on these
targets.

In theory, it's illegal (undefined behavior in C) to issue a direct dereference
such as:

u32 read32(const void* ptr) { return *(const u32*)ptr; }

if ptr is not properly aligned.

There are two workarounds that I know of.
The first is to use the `packed` attribute, which is not portable (it is
compiler-specific).
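For illustration, the `packed` workaround usually takes a form like the sketch below. It relies on the GCC/clang `__attribute__((packed))` extension; the names `unalign32` and `read32_packed` are hypothetical. Because a packed struct has an alignment of 1, the compiler cannot assume the pointer is 4-byte aligned and must emit an access that is valid for unaligned addresses.

```c
#include <stdint.h>

typedef uint32_t u32;  /* assumed definition of u32 */

/* GCC/clang extension (not portable): packing gives the struct an
   alignment of 1, so accesses through it may be unaligned.
   unalign32 / read32_packed are hypothetical names. */
typedef struct { u32 v; } __attribute__((packed)) unalign32;

static u32 read32_packed(const void* ptr)
{
    return ((const unalign32*)ptr)->v;
}
```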

The second, and better, one is to use memcpy():

u32 read32(const void* ptr) { u32 v; memcpy(&v, ptr, sizeof(v)); return v; }

This version is portable and safe.
It also works very well on multiple platforms, such as x86/x64, PPC and ARM64,
where it is reduced to an optimal assembly sequence (a single instruction).
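For reference, here is a self-contained sketch of the memcpy() idiom with the headers and the `u32` typedef that the one-liner above assumes. The `write32()` counterpart is an illustrative addition, not part of the original report, showing that the same idiom covers stores. Whether a given compiler reduces each call to a single instruction still has to be checked in its assembly output.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;  /* assumed definition of u32 */

/* Portable unaligned load: memcpy() has no alignment requirement,
   and optimizing compilers usually lower it to a plain load. */
static u32 read32(const void* ptr)
{
    u32 v;
    memcpy(&v, ptr, sizeof(v));
    return v;
}

/* Illustrative counterpart for stores, using the same idiom. */
static void write32(void* ptr, u32 value)
{
    memcpy(ptr, &value, sizeof(value));
}
```

For example, `write32(buf + 1, 0x11223344)` followed by `read32(buf + 1)` returns `0x11223344`, even though `buf + 1` is generally not 4-byte aligned.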

Unfortunately, GCC 4.5, 4.6, 4.7 and 4.8 generate suboptimal assembly for this
function on ARMv6 or ARMv7:

read32(void const*):
        ldr     r0, [r0]        @ unaligned
        sub     sp, sp, #8
        str     r0, [sp, #4]    @ unaligned
        ldr     r0, [sp, #4]
        add     sp, sp, #8
        bx      lr

This is in stark contrast with clang, which generates much more efficient
assembly:

read32(void const*):                           @ @read32(void const*)
        ldr     r0, [r0]
        bx      lr

(The assembly can be generated and displayed using a simple online tool:
https://goo.gl/7FWDB8)

It's not that GCC is unaware of the CPU's unaligned memory access capability,
since it does use it (`ldr r0, [r0]`), but it then loses a lot of time on
useless operations on a discardable temporary variable, storing the data onto
the stack just to read it back again.


Inlining does not save the day. -O3 helps reduce the impact, but it remains
large.

In a recent exercise comparing efficient and inefficient memory access on ARMv6
and ARMv7, the measured difference was very large: up to 6x faster at -O2.
See:
http://fastcompression.blogspot.com/2015/08/accessing-unaligned-memory.html

This difference is definitely too large to be ignored.
As a consequence, to preserve performance, source code must try a number of
alternatives depending on target and compiler, and sometimes even compiler
version.
In some circumstances (GCC with ARMv6, or GCC <= 4.5), it's even necessary to
write illegal code (see the 1st version above) to reach optimal performance on
these targets.
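The kind of per-target selection described above can be sketched with the preprocessor. This is only an illustration: `READ32_METHOD` is a hypothetical configuration macro, and the default choices simply follow the observations in this report (direct access for GCC on ARMv6, `packed` for other GCC-compatible compilers, memcpy() elsewhere). Which method is actually fastest for a given compiler version and target has to be measured.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;  /* assumed definition of u32 */

/* READ32_METHOD is a hypothetical configuration macro:
     0 = memcpy()        (portable, safe)
     1 = packed struct   (GCC/clang extension)
     2 = direct access   (undefined behavior; only where known to work) */
#ifndef READ32_METHOD
#  if defined(__GNUC__) && !defined(__clang__) && \
      (defined(__ARM_ARCH_6__)  || defined(__ARM_ARCH_6J__) || \
       defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || \
       defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__))
#    define READ32_METHOD 2   /* GCC on ARMv6, per this report */
#  elif defined(__GNUC__)
#    define READ32_METHOD 1   /* GCC-compatible compilers (clang included) */
#  else
#    define READ32_METHOD 0   /* everything else */
#  endif
#endif

#if READ32_METHOD == 2
static u32 read32(const void* ptr) { return *(const u32*)ptr; }
#elif READ32_METHOD == 1
typedef struct { u32 v; } __attribute__((packed)) unalign32;
static u32 read32(const void* ptr) { return ((const unalign32*)ptr)->v; }
#else
static u32 read32(const void* ptr) { u32 v; memcpy(&v, ptr, sizeof(v)); return v; }
#endif
```

Defining `READ32_METHOD` on the command line (for example `-DREAD32_METHOD=0`) would override the default selection, which makes it easy to benchmark each variant on the same target.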

This looks like a waste of energy, and a recipe for bugs, especially compared
to clang, which generates clean code in all circumstances for all targets.


Considering the huge performance difference such an improvement could make, is
this something the GCC team would like to look into?


Regards




Thread overview: 16+ messages
2015-08-26 21:42 [Bug c/67366] New: Poor assembly generation for unaligned memory accesses on ARM v6 & v7 cpus yann.collet.73 at gmail dot com
2015-08-27  7:39 ` [Bug target/67366] " rguenth at gcc dot gnu.org
2015-08-27  9:36 ` rearnsha at gcc dot gnu.org
2015-08-27 10:21 ` rguenther at suse dot de
2015-08-27 10:42 ` ramana at gcc dot gnu.org
2015-08-27 10:47 ` ramana at gcc dot gnu.org
2015-08-27 11:08 ` ramana at gcc dot gnu.org
2015-08-27 11:13 ` rguenther at suse dot de
2015-08-27 11:17 ` ramana at gcc dot gnu.org
2015-08-27 14:31 ` rearnsha at gcc dot gnu.org
2015-08-27 14:36 ` rguenther at suse dot de
2015-08-27 14:45 ` ramana at gcc dot gnu.org
2015-09-09 15:28 ` ramana at gcc dot gnu.org
2015-10-09 11:08 ` ramana at gcc dot gnu.org
2015-10-11 10:33 ` fredrik.hederstierna@securitas-direct.com
2015-10-13  9:16 ` ramana at gcc dot gnu.org
