[Bug rtl-optimization/49057] New: benchmark of gcc. a piece of loop code compiled by gcc-4.5.1 is slower compiled by gcc-4.4.2 when run on cortex-a9.

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/49057] New: benchmark of gcc. a piece of loop code compiled by gcc-4.5.1 is slower compiled by gcc-4.4.2 when run on cortex-a9.
@ 2011-05-19  7:22 kun.he at mediatek dot com
  2011-07-23 23:46 ` [Bug target/49057] scheduling difference of subs cause 80% performance difference pinskia at gcc dot gnu.org
  2011-07-24 14:27 ` rearnsha at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: kun.he at mediatek dot com @ 2011-05-19  7:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49057

           Summary: benchmark of gcc. a piece of loop code compiled by
                    gcc-4.5.1 is slower compiled by gcc-4.4.2 when run on
                    cortex-a9.
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: kun.he@mediatek.com


The following C code performs an “integer add” test. The variables n, i, i1,
i2 and loop_cnt are all of type ‘int’. The initial values are
loop_cnt=5000000, i=0, i1=3, i2=-3.
for (n = loop_cnt; n > 0; n--) {        /*    0    x     -x  - initial value */
                i += i1;                /*    x    x     -x   */
                i1 += i2;               /*    x    0     -x   */
                i1 += i2;               /*    x    -x    -x   */
                i2 += i;                /*    x    -x    0    */
                i2 += i;                /*    x    -x    x    */
                i += i1;                /*    0    -x    x    */
                i += i1;                /*    -x   -x    x    */
                i1 += i2;               /*    -x   0     x    */
                i1 += i2;               /*    -x   x     x    */
                i2 += i;                /*    -x   x     0    */
                i2 += i;                /*    -x   x     -x   */
                i += i1;                /*    0    x     -x   */
                /*
                 * Note that at loop end, i1 = -i2, which is as we
                 * started. Thus, the values in the loop are stable.
                 */
        }
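As a sanity check on the loop above, the following is a minimal sketch, not the reporter's actual benchmark source. It wraps the posted loop body in a function and compares it with a folded form in which each doubled statement becomes a single add of a value shifted left by one, which is the shape of the seven `add` instructions in the listings later in this message. The names `struct add_state`, `add_int_posted` and `add_int_folded`, and the register mapping in the comments (r0 = i, r3 = i1, r2 = i2, r1 = n), are assumptions inferred from those listings; the report does not state them.

```c
#include <assert.h>

/* Hypothetical result type: lets a caller inspect all three values. */
struct add_state { int i, i1, i2; };

/* The loop body as posted, with the initial values from the report. */
struct add_state add_int_posted(int loop_cnt)
{
    struct add_state s;
    int n, i = 0, i1 = 3, i2 = -3;

    for (n = loop_cnt; n > 0; n--) {
        i += i1;
        i1 += i2;  i1 += i2;
        i2 += i;   i2 += i;
        i += i1;   i += i1;
        i1 += i2;  i1 += i2;
        i2 += i;   i2 += i;
        i += i1;
    }
    s.i = i;  s.i1 = i1;  s.i2 = i2;
    return s;
}

/* Same computation with each doubled statement folded into one add.
   The instruction in each comment is the assumed counterpart in the
   listings (register assignment inferred, not confirmed). */
struct add_state add_int_folded(int loop_cnt)
{
    struct add_state s;
    int n, i = 0, i1 = 3, i2 = -3;

    for (n = loop_cnt; n > 0; n--) {   /* subs r1, r1, #1 ... bne   */
        i  += i1;                      /* add  r0, r0, r3           */
        i1 += 2 * i2;                  /* add  r3, r3, r2, lsl #1   */
        i2 += 2 * i;                   /* add  r2, r2, r0, lsl #1   */
        i  += 2 * i1;                  /* add  r0, r0, r3, lsl #1   */
        i1 += 2 * i2;                  /* add  r3, r3, r2, lsl #1   */
        i2 += 2 * i;                   /* add  r2, r2, r0, lsl #1   */
        i  += i1;                      /* add  r0, r3, r0           */
    }
    s.i = i;  s.i1 = i1;  s.i2 = i2;
    return s;
}
```

With the initial values from the report, both functions should return i=0, i1=3, i2=-3 for any non-negative loop_cnt, so the loop count affects only the run time and not the result.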
I compiled this C code with gcc-4.4.2 and gcc-4.5.1, and the two compilers
generate different binary code for the loop.
gcc-4.4.2:
 284:    e0800003     add    r0, r0, r3
 288:    e2511001     subs    r1, r1, #1    ; 0x1
 28c:    e0833082     add    r3, r3, r2, lsl #1
 290:    e0822080     add    r2, r2, r0, lsl #1
 294:    e0800083     add    r0, r0, r3, lsl #1
 298:    e0833082     add    r3, r3, r2, lsl #1
 29c:    e0822080     add    r2, r2, r0, lsl #1
 2a0:    e0830000     add    r0, r3, r0
 2a4:    1afffff6     bne    284 <add_int+0x4c>

gcc-4.5.1:
 138:    e0800003     add    r0, r0, r3
 13c:    e0833082     add    r3, r3, r2, lsl #1
 140:    e0822080     add    r2, r2, r0, lsl #1
 144:    e2511001     subs    r1, r1, #1
 148:    e0800083     add    r0, r0, r3, lsl #1
 14c:    e0833082     add    r3, r3, r2, lsl #1
 150:    e0822080     add    r2, r2, r0, lsl #1
 154:    e0830000     add    r0, r3, r0
 158:    1afffff6     bne    138 <add_int+0x4c>

As you can see, the only difference is the position of “subs    r1, r1, #1”,
yet this leads to a large difference in performance: the gcc-4.5.1 code reaches
only 80% of the performance of the gcc-4.4.2 code.
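
A measurement like the one described could be reproduced along the following lines. This is a minimal sketch under stated assumptions, not the reporter's harness: the signature of `add_int` (only the symbol name appears in the listings), the helper `time_add_int`, and the use of the standard `clock()` function are all illustrative choices. The compiler options and the timing method behind the reported figure are not given in the report.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical signature. The initial values are passed in as
   parameters so the compiler cannot treat them as constants here. */
int add_int(int loop_cnt, int i, int i1, int i2)
{
    int n;

    for (n = loop_cnt; n > 0; n--) {
        i += i1;
        i1 += i2;  i1 += i2;
        i2 += i;   i2 += i;
        i += i1;   i += i1;
        i1 += i2;  i1 += i2;
        i2 += i;   i2 += i;
        i += i1;
    }
    /* Use all three values so that none of the adds is dead code. */
    return i + i1 + i2;
}

/* Returns the CPU time in seconds spent in one call to add_int with
   the initial values from the report; stores the sum in *result. */
double time_add_int(int loop_cnt, int *result)
{
    clock_t start, end;

    start = clock();
    *result = add_int(loop_cnt, 0, 3, -3);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_add_int(5000000, &r)` in binaries built by each compiler would give two times to compare, with `r` expected to be 0 in both. For a meaningful comparison, `add_int` would probably need to live in its own translation unit so that it is not inlined or folded into the caller.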



* [Bug target/49057] scheduling difference of subs cause 80% performance difference
  2011-05-19  7:22 [Bug rtl-optimization/49057] New: benchmark of gcc. a piece of loop code compiled by gcc-4.5.1 is slower compiled by gcc-4.4.2 when run on cortex-a9 kun.he at mediatek dot com
@ 2011-07-23 23:46 ` pinskia at gcc dot gnu.org
  2011-07-24 14:27 ` rearnsha at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2011-07-23 23:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49057

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


* [Bug target/49057] scheduling difference of subs cause 80% performance difference
  2011-05-19  7:22 [Bug rtl-optimization/49057] New: benchmark of gcc. a piece of loop code compiled by gcc-4.5.1 is slower compiled by gcc-4.4.2 when run on cortex-a9 kun.he at mediatek dot com
  2011-07-23 23:46 ` [Bug target/49057] scheduling difference of subs cause 80% performance difference pinskia at gcc dot gnu.org
@ 2011-07-24 14:27 ` rearnsha at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2011-07-24 14:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49057

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2011.07.24 14:27:01
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> 2011-07-24 14:27:01 UTC ---
You haven't said what CPU this is for, what options you used when compiling,
and you haven't provided a complete testcase.

Are you *absolutely* sure this is the only difference? Because I find that hard
to believe.  More likely is that the loop has a different alignment, or there
is some other, secondary, issue that you've exposed.
