public inbox for gcc-bugs@sourceware.org
* [Bug target/63503] New: [AArch64] A57 executes fused multiply-add poorly in some situations
@ 2014-10-09 21:56 e.menezes at samsung dot com
  2014-10-09 22:01 ` [Bug target/63503] " pinskia at gcc dot gnu.org
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: e.menezes at samsung dot com @ 2014-10-09 21:56 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

            Bug ID: 63503
           Summary: [AArch64] A57 executes fused multiply-add poorly in
                    some situations
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: e.menezes at samsung dot com
                CC: spop at gcc dot gnu.org
            Target: aarch64-*

Curious why Geekbench's {D,S}GEMM built with GCC was 8-9% slower than when
built with LLVM, I was baffled to find that the code GCC emits for the
innermost loop of the algorithm's core is actually very good:

.L8:
    ldr d2, [x8, w5, uxtw 3]
    ldr d1, [x7, w5, uxtw 3]
    add w5, w5, 1
    cmp w5, w6
    fmadd   d0, d2, d1, d0
    bne .L8

LLVM's code is not so neat:

.LBB0_10:
    ldr d1, [x27, x22, lsl #3]
    ldr d2, [x9, x22, lsl #3]
    fmul    d1, d1, d2
    fadd    d0, d0, d1
    add w21, w21, #1
    add x22, x22, #1
    cmp w21, w24, uxtw
    b.ne .LBB0_10

However, it runs faster.

Methinks that the A57 microarchitecture performs some trick for discrete FP
operations but not for fused multiply-add, since both code sequences compute
the same thing apart from the intermediate rounding.  Whatever the cause, it
seems that fused multiply-add, and perhaps its cousins, is a performance hit
only when it depends on the result of a previous one, as here, where each
fmadd consumes the result of the fmadd in the previous loop iteration.
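
For reference, a minimal C sketch of the kind of loop that produces this
code, assuming the GEMM kernel boils down to a dot product.  The function
name and signature are hypothetical, not taken from Geekbench.  The point is
only that the accumulator is carried from one iteration to the next, so each
fused multiply-add has to wait for the previous one:

```c
#include <stddef.h>

/* Hypothetical stand-in for the GEMM inner loop: a plain dot product.
   The name and signature are illustrative, not from Geekbench. */
double dot_product(const double *a, const double *b, unsigned n)
{
    double acc = 0.0;

    for (unsigned i = 0; i < n; i++) {
        /* Loop-carried dependency: this iteration's result feeds the next.
           Where contraction is enabled, the compiler may fuse this into a
           single fmadd; otherwise it stays as fmul followed by fadd. */
        acc += a[i] * b[i];
    }

    return acc;
}
```

With GCC, building this once normally and once with -ffp-contract=off should
give the fused and the discrete variants of the same loop for comparison,
though the exact instruction selection depends on the compiler version and
options.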

I'll try to create a simple test case, but, in the meantime, please chime in
with your thoughts.



Thread overview: 25+ messages
2014-10-09 21:56 [Bug target/63503] New: [AArch64] A57 executes fused multiply-add poorly in some situations e.menezes at samsung dot com
2014-10-09 22:01 ` [Bug target/63503] " pinskia at gcc dot gnu.org
2014-10-09 22:02 ` pinskia at gcc dot gnu.org
2014-10-09 22:05 ` e.menezes at samsung dot com
2014-10-09 22:14 ` e.menezes at samsung dot com
2014-10-09 23:08 ` pinskia at gcc dot gnu.org
2014-10-10 13:24 ` wdijkstr at arm dot com
2014-10-10 13:59 ` ramana at gcc dot gnu.org
2014-10-14 20:21 ` e.menezes at samsung dot com
2014-10-14 22:38 ` e.menezes at samsung dot com
2014-10-21 17:47 ` wdijkstr at arm dot com
2014-10-21 18:35 ` pinskia at gcc dot gnu.org
2014-10-21 21:41 ` e.menezes at samsung dot com
2014-10-22 12:13 ` wdijkstr at arm dot com
2014-10-22 16:54 ` e.menezes at samsung dot com
2014-10-22 17:58 ` wdijkstr at arm dot com
2014-10-22 23:30 ` e.menezes at samsung dot com
2014-10-22 23:30 ` e.menezes at samsung dot com
2014-10-22 23:59 ` e.menezes at samsung dot com
2014-10-23  0:31 ` wdijkstr at arm dot com
2014-10-23 10:26 ` ramana.radhakrishnan at arm dot com
2014-10-28 20:57 ` e.menezes at samsung dot com
2014-10-28 22:54 ` e.menezes at samsung dot com
2014-10-29  0:07 ` wdijkstr at arm dot com
2015-04-28  8:11 ` thopre01 at gcc dot gnu.org
