public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/49057] New: benchmark of gcc. a piece of loop code compiled by gcc-4.5.1 is slower compiled by gcc-4.4.2 when run on cortex-a9.
@ 2011-05-19 7:22 kun.he at mediatek dot com
2011-07-23 23:46 ` [Bug target/49057] scheduling difference of subs cause 80% performance difference pinskia at gcc dot gnu.org
2011-07-24 14:27 ` rearnsha at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: kun.he at mediatek dot com @ 2011-05-19 7:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49057
Summary: benchmark of gcc. a piece of loop code compiled by
gcc-4.5.1 is slower compiled by gcc-4.4.2 when run on
cortex-a9.
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: kun.he@mediatek.com
The following C code performs an "integer add" test. The variables n, i, i1,
i2 and loop_cnt are all of type 'int'. The initial values are
loop_cnt=5000000, i=0, i1=3 and i2=-3.
for (n = loop_cnt; n > 0; n--) {   /*  0   x  -x  - initial value */
    i  += i1;                      /*  x   x  -x */
    i1 += i2;                      /*  x   0  -x */
    i1 += i2;                      /*  x  -x  -x */
    i2 += i;                       /*  x  -x   0 */
    i2 += i;                       /*  x  -x   x */
    i  += i1;                      /*  0  -x   x */
    i  += i1;                      /* -x  -x   x */
    i1 += i2;                      /* -x   0   x */
    i1 += i2;                      /* -x   x   x */
    i2 += i;                       /* -x   x   0 */
    i2 += i;                       /* -x   x  -x */
    i  += i1;                      /*  0   x  -x */
    /*
     * Note that at loop end, i1 = -i2,
     * which is as we started. Thus,
     * the values in the loop are stable.
     */
}
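For reference, the loop can be wrapped into a self-contained function, sketched here under assumptions. The name add_int is taken from the symbol shown in the disassembly; the signature, the struct add_state and the helper add_int_is_stable are hypothetical and are not from the original benchmark source.

```c
/* Final values of the three accumulators (hypothetical helper type). */
struct add_state {
    int i, i1, i2;
};

/* Runs the benchmark loop loop_cnt times and returns the final values.
 * The signature is an assumption; only the loop body is from the report. */
static struct add_state add_int(int loop_cnt, int i, int i1, int i2)
{
    struct add_state s;
    int n;

    for (n = loop_cnt; n > 0; n--) {
        i  += i1;
        i1 += i2;
        i1 += i2;
        i2 += i;
        i2 += i;
        i  += i1;
        i  += i1;
        i1 += i2;
        i1 += i2;
        i2 += i;
        i2 += i;
        i  += i1;
    }

    s.i = i;
    s.i1 = i1;
    s.i2 = i2;
    return s;
}

/* Returns 1 if the loop leaves the reported initial values (0, 3, -3)
 * unchanged after loop_cnt iterations, 0 otherwise. */
static int add_int_is_stable(int loop_cnt)
{
    struct add_state s = add_int(loop_cnt, 0, 3, -3);
    return s.i == 0 && s.i1 == 3 && s.i2 == -3;
}
```

Calling add_int_is_stable(5000000) checks the property stated in the loop comment: every iteration returns the three variables to their starting values, so the loop never overflows however many times it runs.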
I compiled this C code with gcc-4.4.2 and with gcc-4.5.1, and the two
compilers generate different binary code.
gcc-4.4.2:
284: e0800003 add r0, r0, r3
288: e2511001 subs r1, r1, #1 ; 0x1
28c: e0833082 add r3, r3, r2, lsl #1
290: e0822080 add r2, r2, r0, lsl #1
294: e0800083 add r0, r0, r3, lsl #1
298: e0833082 add r3, r3, r2, lsl #1
29c: e0822080 add r2, r2, r0, lsl #1
2a0: e0830000 add r0, r3, r0
2a4: 1afffff6 bne 284 <add_int+0x4c>
gcc-4.5.1:
138: e0800003 add r0, r0, r3
13c: e0833082 add r3, r3, r2, lsl #1
140: e0822080 add r2, r2, r0, lsl #1
144: e2511001 subs r1, r1, #1
148: e0800083 add r0, r0, r3, lsl #1
14c: e0833082 add r3, r3, r2, lsl #1
150: e0822080 add r2, r2, r0, lsl #1
154: e0830000 add r0, r3, r0
158: 1afffff6 bne 138 <add_int+0x4c>
As you can see, the only difference is the position of "subs r1, r1, #1",
yet this difference leads to a large difference in performance: the gcc-4.5.1
code reaches only 80% of the performance of the gcc-4.4.2 code.
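Apart from the subs, both listings contain the same seven add instructions, each pair of repeated source additions being folded into one add with an "lsl #1" operand. The sketch below models that folding in C to check that it matches the source loop. The register mapping (r0 = i, r3 = i1, r2 = i2, r1 = n) is read off the listings; the function names are hypothetical, and "lsl #1" is modelled as a multiplication by 2.

```c
/* One loop iteration exactly as written in the C source (12 additions). */
static void step_source(int *i, int *i1, int *i2)
{
    *i  += *i1;
    *i1 += *i2;  *i1 += *i2;
    *i2 += *i;   *i2 += *i;
    *i  += *i1;  *i  += *i1;
    *i1 += *i2;  *i1 += *i2;
    *i2 += *i;   *i2 += *i;
    *i  += *i1;
}

/* One iteration as in the listings, with r0 = i, r3 = i1, r2 = i2.
 * "x, lsl #1" is modelled as 2 * x; the counter r1 is left out. */
static void step_folded(int *r0, int *r3, int *r2)
{
    *r0 = *r0 + *r3;        /* add r0, r0, r3          */
    *r3 = *r3 + 2 * *r2;    /* add r3, r3, r2, lsl #1  */
    *r2 = *r2 + 2 * *r0;    /* add r2, r2, r0, lsl #1  */
    *r0 = *r0 + 2 * *r3;    /* add r0, r0, r3, lsl #1  */
    *r3 = *r3 + 2 * *r2;    /* add r3, r3, r2, lsl #1  */
    *r2 = *r2 + 2 * *r0;    /* add r2, r2, r0, lsl #1  */
    *r0 = *r3 + *r0;        /* add r0, r3, r0          */
}

/* Compares one iteration of both forms for every start value in
 * [-range, range]^3. Returns 1 if they always agree, 0 otherwise.
 * Keep range small so the intermediate sums cannot overflow. */
static int folded_matches_source(int range)
{
    int a, b, c;

    for (a = -range; a <= range; a++)
        for (b = -range; b <= range; b++)
            for (c = -range; c <= range; c++) {
                int i = a, i1 = b, i2 = c;
                int r0 = a, r3 = b, r2 = c;

                step_source(&i, &i1, &i2);
                step_folded(&r0, &r3, &r2);
                if (i != r0 || i1 != r3 || i2 != r2)
                    return 0;
            }
    return 1;
}
```

This only confirms that the two listings compute the same values as the source; it says nothing about timing, which depends on where the subs is scheduled relative to the dependent adds and the branch.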
* [Bug target/49057] scheduling difference of subs cause 80% performance difference
From: pinskia at gcc dot gnu.org @ 2011-07-23 23:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49057
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
               What|Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
* [Bug target/49057] scheduling difference of subs cause 80% performance difference
From: rearnsha at gcc dot gnu.org @ 2011-07-24 14:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49057
Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:
               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2011.07.24 14:27:01
     Ever Confirmed|0                           |1
--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> 2011-07-24 14:27:01 UTC ---
You haven't said what CPU this is for, what options you used when compiling,
and you haven't provided a complete testcase.
Are you *absolutely* sure this is the only difference? I find that hard
to believe. More likely, the loop has a different alignment, or there
is some other, secondary issue that you've exposed.
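As a rough way to look at the alignment question, the loop-head addresses from the two listings (0x284 for gcc-4.4.2, 0x138 for gcc-4.5.1) can be compared against a few power-of-two boundaries. This is only a sketch under assumptions: the addresses may be object-file offsets rather than final load addresses, the choice of 8, 16 and 32 bytes is illustrative, and the helper and constant names are hypothetical.

```c
#include <stdio.h>

/* Loop-head addresses as printed in the two listings. */
#define LOOP_HEAD_GCC_442 0x284u
#define LOOP_HEAD_GCC_451 0x138u

/* Byte offset of addr inside a block of `align` bytes.
 * align must be a power of two. */
static unsigned offset_in_block(unsigned addr, unsigned align)
{
    return addr & (align - 1u);
}

/* Prints the offset of both loop heads for 8-, 16- and 32-byte blocks. */
static void print_loop_alignment(void)
{
    unsigned align;

    for (align = 8u; align <= 32u; align *= 2u)
        printf("align %2u: gcc-4.4.2 offset %2u, gcc-4.5.1 offset %2u\n",
               align,
               offset_in_block(LOOP_HEAD_GCC_442, align),
               offset_in_block(LOOP_HEAD_GCC_451, align));
}
```

If the offsets differ at the granularity that matters for instruction fetch on the target, alignment cannot be ruled out as a contributing factor from the listings alone.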