[Bug tree-optimization/55623] New: [ARM] GCC should not prefer long dependency chains, they inhibit performance on superscalar processors

public inbox for gcc-bugs@sourceware.org

From: "siarhei.siamashka at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/55623] New: [ARM] GCC should not prefer long dependency chains, they inhibit performance on superscalar processors
Date: Sun, 09 Dec 2012 10:00:00 -0000
Message-ID: <bug-55623-4@http.gcc.gnu.org/bugzilla/>


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55623

             Bug #: 55623
           Summary: [ARM] GCC should not prefer long dependency chains,
                    they inhibit performance on superscalar processors
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: siarhei.siamashka@gmail.com


This is a missed optimization. In this particular case, GCC actually undoes the
programmer's attempt to optimize the code for superscalar dual-issue
processors.

$ arm-none-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -o badsched badsched.c
$ objdump -d badsched

00000000 <f1>:
   0:    e1a03120     lsr    r3, r0, #2
   4:    e08330a0     add    r3, r3, r0, lsr #1
   8:    e08331a0     add    r3, r3, r0, lsr #3
   c:    e0833220     add    r3, r3, r0, lsr #4
  10:    e08332a0     add    r3, r3, r0, lsr #5
  14:    e0833320     add    r3, r3, r0, lsr #6
  18:    e08333a0     add    r3, r3, r0, lsr #7
  1c:    e0833420     add    r3, r3, r0, lsr #8
  20:    e08334a0     add    r3, r3, r0, lsr #9
  24:    e0833520     add    r3, r3, r0, lsr #10
  28:    e08335a0     add    r3, r3, r0, lsr #11
  2c:    e0833620     add    r3, r3, r0, lsr #12
  30:    e08336a0     add    r3, r3, r0, lsr #13
  34:    e0833720     add    r3, r3, r0, lsr #14
  38:    e08337a0     add    r3, r3, r0, lsr #15
  3c:    e0833820     add    r3, r3, r0, lsr #16
  40:    e08338a0     add    r3, r3, r0, lsr #17
  44:    e0833920     add    r3, r3, r0, lsr #18
  48:    e08339a0     add    r3, r3, r0, lsr #19
  4c:    e0833a20     add    r3, r3, r0, lsr #20
  50:    e0833aa0     add    r3, r3, r0, lsr #21
  54:    e0833b20     add    r3, r3, r0, lsr #22
  58:    e0833ba0     add    r3, r3, r0, lsr #23
  5c:    e0830c20     add    r0, r3, r0, lsr #24
  60:    e12fff1e     bx    lr

00000064 <f2>:
  64:    e1a031a0     lsr    r3, r0, #3
  68:    e1a02220     lsr    r2, r0, #4
  6c:    e08330a0     add    r3, r3, r0, lsr #1
  70:    e0822120     add    r2, r2, r0, lsr #2
  74:    e08332a0     add    r3, r3, r0, lsr #5
  78:    e0822320     add    r2, r2, r0, lsr #6
  7c:    e08333a0     add    r3, r3, r0, lsr #7
  80:    e0822420     add    r2, r2, r0, lsr #8
  84:    e08334a0     add    r3, r3, r0, lsr #9
  88:    e0822520     add    r2, r2, r0, lsr #10
  8c:    e08335a0     add    r3, r3, r0, lsr #11
  90:    e0822620     add    r2, r2, r0, lsr #12
  94:    e08336a0     add    r3, r3, r0, lsr #13
  98:    e0822720     add    r2, r2, r0, lsr #14
  9c:    e08337a0     add    r3, r3, r0, lsr #15
  a0:    e0822820     add    r2, r2, r0, lsr #16
  a4:    e08338a0     add    r3, r3, r0, lsr #17
  a8:    e0822920     add    r2, r2, r0, lsr #18
  ac:    e08339a0     add    r3, r3, r0, lsr #19
  b0:    e0822a20     add    r2, r2, r0, lsr #20
  b4:    e0833aa0     add    r3, r3, r0, lsr #21
  b8:    e0822b20     add    r2, r2, r0, lsr #22
  bc:    e0833ba0     add    r3, r3, r0, lsr #23
  c0:    e0820c20     add    r0, r2, r0, lsr #24
  c4:    e0800003     add    r0, r0, r3
  c8:    e12fff1e     bx    lr

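Guess which of these two functions is faster. f1 is a single chain of 24
dependent instructions, while f2 computes the same sum as two independent
chains of 12 instructions each (odd shift counts in r3, even ones in r2),
joined by one final add, so a dual-issue core can run the two chains side by
side.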
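The C source (badsched.c) is not included in this message. The following is a
hypothetical sketch, assuming f1 and f2 compute the same sum of x >> 1 through
x >> 24 that the listings show. The OPT_BARRIER macro is an assumed stand-in
for the reporter's unspecified "asm workaround": an empty GCC extended-asm
statement that makes a value opaque to the optimizer, so the two partial sums
cannot be reassociated back into one long chain.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): an empty asm that claims to
   read and modify v, so GCC must treat v's value as unknown afterwards. */
#define OPT_BARRIER(v) __asm__("" : "+r"(v))

/* Single accumulator: each add has to wait for the previous one. */
uint32_t f1(uint32_t x)
{
    return (x >> 1) + (x >> 2) + (x >> 3) + (x >> 4) + (x >> 5) + (x >> 6)
         + (x >> 7) + (x >> 8) + (x >> 9) + (x >> 10) + (x >> 11) + (x >> 12)
         + (x >> 13) + (x >> 14) + (x >> 15) + (x >> 16) + (x >> 17) + (x >> 18)
         + (x >> 19) + (x >> 20) + (x >> 21) + (x >> 22) + (x >> 23) + (x >> 24);
}

/* Two independent accumulators, split like the f2 listing:
   odd shift counts (r3 there) and even shift counts (r2 there). */
uint32_t f2(uint32_t x)
{
    uint32_t odd = (x >> 1) + (x >> 3) + (x >> 5) + (x >> 7) + (x >> 9)
                 + (x >> 11) + (x >> 13) + (x >> 15) + (x >> 17) + (x >> 19)
                 + (x >> 21) + (x >> 23);
    uint32_t even = (x >> 2) + (x >> 4) + (x >> 6) + (x >> 8) + (x >> 10)
                  + (x >> 12) + (x >> 14) + (x >> 16) + (x >> 18) + (x >> 20)
                  + (x >> 22) + (x >> 24);
    /* Hide both partial sums from the optimizer so the final add stays a
       join of two separate chains instead of one 24-term chain. */
    OPT_BARRIER(odd);
    OPT_BARRIER(even);
    return odd + even;
}
```

Both functions return the same value for every input, because unsigned
addition wraps modulo 2^32 and is therefore associative; only the shape of the
dependency graph differs.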

=== Cortex-A8 @1000MHz ===

$ time ./badsched 1

real    0m2.512s
user    0m2.500s
sys    0m0.000s

$ time ./badsched 2

real    0m2.064s
user    0m2.008s
sys    0m0.008s

=== Cortex-A15 @1700MHz ===

real    0m2.786s
user    0m2.770s
sys    0m0.005s

real    0m1.451s
user    0m1.440s
sys    0m0.005s
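The benefit of splitting the sum can be estimated by counting dependence
height, i.e. the longest chain of instructions that must execute one after
another. This is an illustrative model, not part of the original report: it
assumes every lsr and add-with-shifted-operand has the same latency and that
issue width is unlimited, so actual Cortex-A8/A15 latencies and issue rules are
not modeled. The names chain_depth and ceil_log2 are hypothetical.

```c
/* Smallest L such that 2^L >= v (v >= 1): the number of levels needed to
   combine v partial sums in a balanced tree of pairwise adds. */
static unsigned ceil_log2(unsigned v)
{
    unsigned levels = 0;
    while ((1u << levels) < v)
        levels++;
    return levels;
}

/* Dependence height for summing n_terms shifted values using n_chains
   independent accumulators.
   Assumptions: n_chains >= 1, n_terms is a multiple of n_chains, and all
   instructions have equal latency. */
unsigned chain_depth(unsigned n_terms, unsigned n_chains)
{
    /* Each accumulator: one lsr followed by (n_terms / n_chains - 1) adds. */
    unsigned per_chain = n_terms / n_chains;
    /* Then the partial sums are joined by a tree of adds. */
    return per_chain + ceil_log2(n_chains);
}
```

With 24 terms, chain_depth(24, 1) gives 24, matching the 24 serially dependent
instructions in f1, and chain_depth(24, 2) gives 13, matching the two
12-instruction chains plus the final add in f2.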

Function call and loop overhead prevents the Cortex-A8 from showing the full
~2x speedup with "f2". We could mark these functions as static so that they
get inlined, but then the asm workaround becomes ineffective in a rather
interesting way, which also demonstrates instruction scheduling issues.

Thread overview: 10+ messages
2012-12-09 10:00 siarhei.siamashka at gmail dot com [this message]
2012-12-09 10:01 ` [Bug tree-optimization/55623] " siarhei.siamashka at gmail dot com
2012-12-09 10:48 ` [Bug middle-end/55623] " pinskia at gcc dot gnu.org
2012-12-09 11:19 ` siarhei.siamashka at gmail dot com
2012-12-09 11:22 ` siarhei.siamashka at gmail dot com
2012-12-09 11:36 ` steven at gcc dot gnu.org
2012-12-09 12:13 ` steven at gcc dot gnu.org
2012-12-09 12:15 ` steven at gcc dot gnu.org
2012-12-10  9:47 ` rguenth at gcc dot gnu.org
2012-12-11 16:33 ` ramana at gcc dot gnu.org
