public inbox for gcc-bugs@sourceware.org
* [Bug target/29953] New: [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
@ 2006-11-23 10:14 nbkolchin at gmail dot com
2006-11-23 10:15 ` [Bug target/29953] " nbkolchin at gmail dot com
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: nbkolchin at gmail dot com @ 2006-11-23 10:14 UTC (permalink / raw)
To: gcc-bugs
GCC 4.1.1 (probably all 4.* versions; 4.3.0-svn was also tested) uses cmp/eq
instead of "dt" in loops. This leads to a ~20% performance decrease.
Technically, the loop-processing algorithm is completely different between
versions.
Example (sources attached):
CFLAGS="-m4 -O3 -fomit-frame-pointer"
gcc 3.4.4:
----------------------------
.LFB2:
        mov.l   .L11,r3
        mov     #0,r0
        mov.l   .L12,r2
.L5:
        mov.l   @r3+,r1         ! !!!
        dt      r2              ! !!!
        bf/s    .L5
        add     r1,r0
        rts
        nop
.L13:
        .align 2
.L11:
        .long   -1946157056
.L12:
        .long   1000000
-----------------------------
gcc 4.1.1:
-----------------------------
.LFB2:
        mov.l   .L8,r2
        mov     #0,r0
        mov.l   .L9,r3
.L2:
        mov.l   @r2+,r1         ! !!!
        cmp/eq  r3,r2           ! !!!
        bf/s    .L2
        add     r1,r0
        rts
        nop
.L10:
        .align 2
.L8:
        .long   -1946157056
.L9:
        .long   -1942157056
-----------------------------
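The attached test.cpp is not reproduced in this thread, so the following is only a hypothetical sketch of the two loop shapes seen in the listings above. Read as an unsigned 32-bit value, -1946157056 is 0x8C000000, and the end address in the 4.1.1 listing lies 4,000,000 bytes further on, which matches 1,000,000 four-byte words. The function names and the unsigned element type are assumptions made for illustration. The first function ends the loop by comparing the moving pointer against an end address (the cmp/eq shape); the second counts a separate variable down to zero (the dt shape).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the attached testcase; names are illustrative. */

/* Shape of the GCC 4.1.1 loop: the exit test compares the post-incremented
   pointer with a precomputed end address, so the test depends on the
   pointer update. */
uint32_t sum_ptr_compare(const uint32_t *p, size_t n)
{
    const uint32_t *end = p + n;
    uint32_t sum = 0;
    if (n == 0)
        return 0;
    do {
        sum += *p++;
    } while (p != end);
    return sum;
}

/* Shape of the GCC 3.4.4 loop: an independent counter is decremented and
   tested against zero, so the test does not depend on the pointer. */
uint32_t sum_count_down(const uint32_t *p, size_t n)
{
    uint32_t sum = 0;
    if (n == 0)
        return 0;
    do {
        sum += *p++;
    } while (--n != 0);
    return sum;
}
```

Both functions return the same sum for any n. Writing the source in the second form does not by itself guarantee that a given compiler version emits dt; the sketch only shows the two exit tests side by side.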
P.S. We are porting an application from gcc 3.4 to gcc 4.1 and see about a 60%
performance decrease, so this is probably just the first report. :(
--
Summary: [SH-4] Perfomance regression in loops. cmp/eq used
instead of dt
Product: gcc
Version: 4.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: nbkolchin at gmail dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: sh-rtemself
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29953
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: nbkolchin at gmail dot com @ 2006-11-23 10:15 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from nbkolchin at gmail dot com 2006-11-23 10:15 -------
Created an attachment (id=12671)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12671&action=view)
test.cpp
Testcase
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: mano at roarinelk dot homelinux dot net @ 2007-03-28 10:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from mano at roarinelk dot homelinux dot net 2007-03-28 11:50 -------
gcc 4.1.2 and 3.4.6 for Linux, as well as 3.3.5 and 2.95 for QNX, also generate
similar code without the dt instruction. How come your 3.4.4 is
so smart?
--
mano at roarinelk dot homelinux dot net changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mano at roarinelk dot
| |homelinux dot net
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: christian dot bruel at st dot com @ 2007-04-03 6:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from christian dot bruel at st dot com 2007-04-03 07:43 -------
Thank you for reporting this.
There is indeed a data dependency on 'r2' introduced by the cmp/eq instruction,
which prevents the mov and the comparison from being executed in parallel, unlike
the dt on the induction variable.
The use of dt seems to be quite sensitive: I checked various versions of sh-gcc
(3.3.x, 3.4.x) and it was difficult to find one using the correct strategy.
I'll look into this issue.
Regards,
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: christian dot bruel at st dot com @ 2007-04-03 14:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from christian dot bruel at st dot com 2007-04-03 15:30 -------
This missed optimisation appears with all counted loops. The GIMPLE IR
produces:
  j = 0;
<D1202>:;
  j = j + 1;
  if (j <= 999)
    {
      goto <D1202>;
    }
Transforming this to (j=1000; j=j-1; if (j)...) will allow the decrement-and-test
pattern to be caught by combine.
Since this transformation needs to know about code selection (and is only
useful if the number of issued instructions is > 1), it seems best to do it in
RTL. I'm thinking of strength_reduce in loop.c, when we optimize bivs.
Question: does it make sense to do this transformation in loop.c? I'm thinking
of strength_reduce.
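As a small illustration of the transformation described above, the two functions below run the counted loop once in the form shown in the GIMPLE dump and once in the inverted form, and count how many times the body executes. The function names and the trips counter are hypothetical and only serve to show that the two forms are equivalent when j is not used inside the loop body.

```c
/* Counted loop as in the GIMPLE dump: j counts up, exit test against 999. */
unsigned trips_counting_up(void)
{
    unsigned trips = 0;
    int j = 0;
    do {
        trips++;            /* stands in for the loop body */
        j = j + 1;
    } while (j <= 999);
    return trips;
}

/* Inverted form: j starts at the trip count and is tested against zero,
   which is the decrement-and-test shape that a dt-style instruction covers. */
unsigned trips_counting_down(void)
{
    unsigned trips = 0;
    int j = 1000;
    do {
        trips++;            /* stands in for the loop body */
        j = j - 1;
    } while (j != 0);
    return trips;
}
```

Both functions execute the body 1000 times. The inversion is only this simple because nothing else reads j; if the body used j, the original variable would have to be kept alongside the new down-counter.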
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: steven at gcc dot gnu dot org @ 2007-04-03 15:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from steven at gcc dot gnu dot org 2007-04-03 16:34 -------
Re. comment #4:
Answer: Go ahead and implement it in loop.c.
If you want to fix it only for GCC 4.1, that is. There is no loop.c in GCC 4.2
and later.
So does it make sense? Depends on what you want to achieve.
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2007-04-03 16:34:17
date| |
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: chrbr at gcc dot gnu dot org @ 2007-05-15 9:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from chrbr at gcc dot gnu dot org 2007-05-15 10:30 -------
I dropped the idea of fixing 4.1 and implemented a -finvert-loops option on
trunk. This option allows a basic induction variable to be decremented instead
of incremented, so that the exit test can be made against 0.
I'm validating a patch on Intel and SH.
--
chrbr at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |chrbr at gcc dot gnu dot org
|dot org |
Status|NEW |ASSIGNED
Last reconfirmed|2007-04-03 16:34:17 |2007-05-15 10:30:36
date| |
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: chrbr at gcc dot gnu dot org @ 2007-06-08 7:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from chrbr at gcc dot gnu dot org 2007-06-08 07:58 -------
Subject: Bug 29953
Author: chrbr
Date: Fri Jun 8 07:58:41 2007
New Revision: 125564
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125564
Log:
PR target/29953
* config/sh/sh.md (doloop_end): New pattern and splitter.
* loop-iv.c (simple_rhs_p): Check for hardware registers.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/loop-iv.c
* [Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
From: chrbr at gcc dot gnu dot org @ 2007-06-08 8:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from chrbr at gcc dot gnu dot org 2007-06-08 08:18 -------
doloop_optimize performs the IV inversion, using the doloop_end insn support in
the machine description.
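To make the idea of IV inversion concrete, here is a conceptual sketch in C. It is not GCC's implementation. It computes the number of iterations of an upward-counting loop before the loop starts, then drives the loop with a counter that is decremented to zero. The names trip_count, sum_doloop and sum_reference are hypothetical, and the sketch assumes a positive step.

```c
#include <stdint.h>

/* Number of iterations of: for (i = start; i < bound; i += step).
   Assumes step > 0; returns 0 for an empty range or a non-positive step. */
uint32_t trip_count(int32_t start, int32_t bound, int32_t step)
{
    if (step <= 0 || start >= bound)
        return 0;
    /* Work in 64 bits so that bound - start and the rounding cannot overflow. */
    uint64_t span = (uint64_t)((int64_t)bound - (int64_t)start);
    return (uint32_t)((span + (uint64_t)step - 1) / (uint64_t)step);
}

/* Reference loop: the exit test compares i against bound (step must be > 0). */
uint32_t sum_reference(int32_t start, int32_t bound, int32_t step)
{
    uint32_t sum = 0;
    for (int64_t i = start; i < bound; i += step)
        sum += (uint32_t)i;
    return sum;
}

/* Inverted loop: the exit test is a decrement of count followed by a
   comparison with zero; i is still advanced because the body uses it. */
uint32_t sum_doloop(int32_t start, int32_t bound, int32_t step)
{
    uint32_t count = trip_count(start, bound, step);
    uint32_t sum = 0;
    int64_t i = start;
    while (count != 0) {
        sum += (uint32_t)i;
        i += step;
        count--;
    }
    return sum;
}
```

For example, with start 0, bound 10 and step 3 the body runs for i = 0, 3, 6, 9, so trip_count gives 4 and both sum functions return 18.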
--
chrbr at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
Target Milestone|--- |4.3.0