public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64
@ 2007-10-27 15:40 tbm at cyrius dot com
2007-10-27 15:48 ` [Bug tree-optimization/33922] " rguenth at gcc dot gnu dot org
` (26 more replies)
0 siblings, 27 replies; 29+ messages in thread
From: tbm at cyrius dot com @ 2007-10-27 15:40 UTC (permalink / raw)
To: gcc-bugs
The following program takes about 30 seconds to compile on IA64 with -O3 and
trunk, but took less than a second with 4.2. On x86_64 it takes about 3
seconds.
(sid)tbm@coconut0:~$ time /usr/lib/gcc-snapshot/bin/gcc -c -O3 slow.c
real 0m32.572s
user 0m18.838s
sys 0m0.049s
(sid)tbm@coconut0:~$ time gcc-4.2 -c -O3 slow.c
real 0m0.696s
user 0m0.062s
sys 0m0.022s
(sid)tbm@coconut0:~$
--
Summary: [4.3 Regression] slow compilation on ia64
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tbm at cyrius dot com
GCC target triplet: ia64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
From: rguenth at gcc dot gnu dot org @ 2007-10-27 15:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2007-10-27 15:48 -------
-ftime-report output please?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 16:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from tbm at cyrius dot com 2007-10-27 16:03 -------
compile times:
20070303 0m25.928s
20070422 0m8.723s
20070515 0m7.345s
20070613 0m8.996s
20070811 0m8.172s
20070916 0m24.503s
20071020 0m34.445s
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 16:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from tbm at cyrius dot com 2007-10-27 16:07 -------
(In reply to comment #1)
> -ftime-report output please?
(sid)tbm@coconut0:~/x$ /usr/lib/gcc-snapshot/bin/gcc -c -O3 -ftime-report
slow.c
Execution times (seconds)
garbage collection : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 1%) wall
0 kB ( 0%) ggc
callgraph construction: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
13 kB ( 0%) ggc
callgraph optimization: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
2 kB ( 0%) ggc
CFG verifier : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
142 kB ( 1%) ggc
register information : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
224 kB ( 2%) ggc
rebuild jump labels : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 1%) wall
0 kB ( 0%) ggc
parser : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
83 kB ( 1%) ggc
tree gimplify : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
14 kB ( 0%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
23 kB ( 0%) ggc
tree CFG cleanup : 0.00 ( 0%) usr 0.00 ( 2%) sys 0.02 ( 0%) wall
1018 kB ( 8%) ggc
tree VRP : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
132 kB ( 1%) ggc
tree reassociation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree PRE : 0.39 ( 2%) usr 0.00 ( 4%) sys 0.41 ( 2%) wall
1052 kB ( 8%) ggc
tree conservative DCE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
predictive commoning : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree SSA to normal : 0.06 ( 0%) usr 0.00 ( 4%) sys 0.06 ( 0%) wall
1010 kB ( 8%) ggc
tree SSA verifier : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
10 kB ( 0%) ggc
tree STMT verifier : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
expand : 0.08 ( 0%) usr 0.00 ( 2%) sys 0.77 ( 3%) wall
1163 kB ( 9%) ggc
jump : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
CSE : 0.03 ( 0%) usr 0.00 ( 2%) sys 0.04 ( 0%) wall
1 kB ( 0%) ggc
dead store elim1 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
129 kB ( 1%) ggc
dead store elim2 : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 1%) wall
267 kB ( 2%) ggc
CPROP 2 : 0.01 ( 0%) usr 0.00 ( 2%) sys 0.01 ( 0%) wall
132 kB ( 1%) ggc
bypass jumps : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
130 kB ( 1%) ggc
CSE 2 : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
1 kB ( 0%) ggc
branch prediction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
combiner : 0.82 ( 4%) usr 0.00 ( 0%) sys 0.91 ( 3%) wall
452 kB ( 3%) ggc
if-conversion : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
352 kB ( 3%) ggc
regmove : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
scheduling : 1.32 ( 7%) usr 0.00 ( 2%) sys 1.55 ( 6%) wall
194 kB ( 1%) ggc
local alloc : 0.14 ( 1%) usr 0.00 ( 0%) sys 0.14 ( 1%) wall
50 kB ( 0%) ggc
global alloc : 0.54 ( 3%) usr 0.00 ( 9%) sys 0.78 ( 3%) wall
2537 kB (19%) ggc
reload CSE regs : 0.18 ( 1%) usr 0.00 ( 0%) sys 0.19 ( 1%) wall
584 kB ( 4%) ggc
load CSE after reload : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
24 kB ( 0%) ggc
rename registers : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
scheduling 2 : 14.45 (78%) usr 0.03 (65%) sys 19.36 (74%) wall
2099 kB (16%) ggc
machine dep reorg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
final : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 1%) wall
0 kB ( 0%) ggc
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
TOTAL : 18.63 0.04 26.28
13034 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 16:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from tbm at cyrius dot com 2007-10-27 16:13 -------
Maybe something for Maxim to look at?
--
tbm at cyrius dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mkuvyrkov at gcc dot gnu dot
| |org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 16:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from tbm at cyrius dot com 2007-10-27 16:44 -------
Oops, I forgot to add the testcase:
typedef enum
{
  ST_TiemanStyle,
}
BrailleDisplay;
static int pendingCommand;
static int currentModifiers;
typedef struct
{
  int (*updateKeys) (BrailleDisplay * brl, int *keyPressed);
}
ProtocolOperations;
static const ProtocolOperations *protocol;
brl_readCommand (BrailleDisplay * brl)
{
  unsigned long int keys;
  int command;
  int keyPressed;
  unsigned char routingKeys[200];
  int routingKeyCount;
  signed char rightVerticalSensor;
  if (pendingCommand != (-1))
    {
      return command;
    }
  if (!protocol->updateKeys (brl, &keyPressed))
    {
      if (rightVerticalSensor >= 0)
        keys |= 1;
      if ((routingKeyCount == 0) && keys)
        {
          if (currentModifiers)
            {
            doChord:switch (keys);
            }
          else
            {
            doCharacter:
              command = 0X2200;
              if (keys & 0X01UL)
                command |= 0001;
              if (keys & 0X02UL)
                command |= 0002;
              if (keys & 0X04UL)
                command |= 0004;
              if (keys & 0X08UL)
                command |= 0010;
              if (keys & 0X10UL)
                command |= 0020;
              if (keys & 0X20UL)
                command |= 0040;
              if (currentModifiers & (0X0010 | 0X0200))
                command |= 0100;
              if (currentModifiers & 0X0040)
                command |= 0200;
              if (currentModifiers & 0X0100)
                command |= 0X020000;
              if (currentModifiers & 0X0400)
                command |= 0X080000;
              if (currentModifiers & 0X0800)
                command |= 0X040000;
            }
          unsigned char key1 = routingKeys[0];
          if (key1 == 0)
            {
            }
          if (key1 == 1)
            if (keys)
              {
                currentModifiers |= 0X0010;
                goto doCharacter;
              }
        }
    }
  return command;
}
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 17:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from tbm at cyrius dot com 2007-10-27 17:50 -------
As a comparison, here is what I get with 20070811:
(sid)tbm@coconut0:~/x$ /usr/lib/gcc-snapshot/bin/gcc -c -O3 -ftime-report
slow.c
Execution times (seconds)
garbage collection : 0.06 ( 2%) usr 0.00 ( 0%) sys 0.43 ( 5%) wall
0 kB ( 0%) ggc
CFG verifier : 0.02 ( 1%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 0.02 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 1%)
wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.01 ( 0%) usr 0.00 ( 2%) sys 0.01 ( 0%) wall
198 kB ( 2%) ggc
register information : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.15 ( 2%) wall
224 kB ( 2%) ggc
parser : 0.00 ( 0%) usr 0.00 ( 8%) sys 0.01 ( 0%) wall
81 kB ( 1%) ggc
tree VRP : 0.01 ( 0%) usr 0.00 ( 3%) sys 0.01 ( 0%) wall
132 kB ( 1%) ggc
tree operand scan : 0.01 ( 0%) usr 0.00 ( 3%) sys 0.01 ( 0%) wall
106 kB ( 1%) ggc
tree PRE : 0.41 (13%) usr 0.00 ( 3%) sys 1.00 (11%) wall
1052 kB ( 9%) ggc
tree SSA to normal : 0.08 ( 3%) usr 0.00 ( 2%) sys 0.32 ( 3%) wall
1023 kB ( 8%) ggc
tree SSA verifier : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
10 kB ( 0%) ggc
tree STMT verifier : 0.04 ( 1%) usr 0.00 ( 0%) sys 0.24 ( 3%) wall
0 kB ( 0%) ggc
expand : 0.02 ( 1%) usr 0.01 (12%) sys 0.03 ( 0%) wall
571 kB ( 5%) ggc
CSE : 0.04 ( 1%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
1 kB ( 0%) ggc
dead code elimination : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
dead store elim2 : 0.04 ( 1%) usr 0.00 ( 0%) sys 0.05 ( 1%) wall
122 kB ( 1%) ggc
CPROP 1 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 1%) wall
97 kB ( 1%) ggc
CPROP 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
131 kB ( 1%) ggc
bypass jumps : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
130 kB ( 1%) ggc
CSE 2 : 0.02 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
1 kB ( 0%) ggc
combiner : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
23 kB ( 0%) ggc
if-conversion : 0.04 ( 1%) usr 0.00 ( 0%) sys 0.16 ( 2%) wall
0 kB ( 0%) ggc
regmove : 0.04 ( 1%) usr 0.00 ( 3%) sys 0.13 ( 1%) wall
0 kB ( 0%) ggc
scheduling : 0.40 (12%) usr 0.00 ( 5%) sys 1.17 (13%) wall
61 kB ( 1%) ggc
local alloc : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.15 ( 2%) wall
162 kB ( 1%) ggc
global alloc : 0.35 (11%) usr 0.01 ( 9%) sys 1.03 (11%) wall
2694 kB (22%) ggc
reload CSE regs : 0.22 ( 7%) usr 0.00 ( 2%) sys 0.67 ( 7%) wall
686 kB ( 6%) ggc
load CSE after reload : 0.07 ( 2%) usr 0.00 ( 2%) sys 0.18 ( 2%) wall
0 kB ( 0%) ggc
rename registers : 0.07 ( 2%) usr 0.00 ( 0%) sys 0.22 ( 2%) wall
3 kB ( 0%) ggc
scheduling 2 : 1.02 (31%) usr 0.01 (11%) sys 2.50 (27%) wall
1192 kB (10%) ggc
machine dep reorg : 0.03 ( 1%) usr 0.00 ( 2%) sys 0.04 ( 0%) wall
1 kB ( 0%) ggc
final : 0.02 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
TOTAL : 3.24 0.06 9.11
12164 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug tree-optimization/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 17:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from tbm at cyrius dot com 2007-10-27 17:52 -------
So scheduling 2 has gone from 2.50 to 19.36 seconds of wall time between
20070811 and 20071020 (both with checking enabled).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
From: pinskia at gcc dot gnu dot org @ 2007-10-27 18:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from pinskia at gcc dot gnu dot org 2007-10-27 18:00 -------
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.
We added this message for a reason; it seems you should try that first.
The release branches default to --enable-checking=release.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
Component|tree-optimization |middle-end
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 18:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from tbm at cyrius dot com 2007-10-27 18:08 -------
(In reply to comment #8)
> > Extra diagnostic checks enabled; compiler may run slowly.
> > Configure with --enable-checking=release to disable checks.
>
> We added this message for a reason; it seems you should try that first.
> The release branches default to --enable-checking=release.
Well, I showed that even with checking enabled the compiler was _much_ faster
2 months ago. But, ok, I'll try with checking disabled too.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
From: pinskia at gmail dot com @ 2007-10-27 18:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from pinskia at gmail dot com 2007-10-27 18:10 -------
Subject: Re: [4.3 Regression] slow compilation on ia64
On 27 Oct 2007 18:08:21 -0000, tbm at cyrius dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> Well, I showed that even with checking enabled the compiler was _much_ faster
> 2 months ago. But, ok, I'll try with checking disabled too.
Well someone (maybe DF) could have added a lot of checking.
-- Pinski
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 18:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from tbm at cyrius dot com 2007-10-27 18:13 -------
(In reply to comment #10)
> Well someone (maybe DF) could have added a lot of checking.
OK, good point.
I'll report my findings in a few hours.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug middle-end/33922] [4.3 Regression] slow compilation on ia64
From: tbm at cyrius dot com @ 2007-10-27 18:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from tbm at cyrius dot com 2007-10-27 18:53 -------
Same results without checking (actually, even slower - is that possible?):
(sid)tbm@coconut0:~/tmp/gcc/gcc-4.3-20071027-r129674-no-checking/gcc$ ./xgcc
-B. -ftime-report -O3 -c ~/slow.c
Execution times (seconds)
df live regs : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
142 kB ( 1%) ggc
register information : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
224 kB ( 2%) ggc
tree VRP : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
132 kB ( 1%) ggc
tree PRE : 0.37 ( 2%) usr 0.00 ( 3%) sys 0.64 ( 1%) wall
1052 kB ( 8%) ggc
tree SSA to normal : 0.06 ( 0%) usr 0.00 ( 3%) sys 0.06 ( 0%) wall
1010 kB ( 8%) ggc
expand : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
1182 kB ( 9%) ggc
CSE : 0.03 ( 0%) usr 0.00 ( 7%) sys 0.14 ( 0%) wall
1 kB ( 0%) ggc
dead store elim2 : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
267 kB ( 2%) ggc
CPROP 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
132 kB ( 1%) ggc
bypass jumps : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
130 kB ( 1%) ggc
CSE 2 : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 1%) wall
0 kB ( 0%) ggc
combiner : 0.81 ( 4%) usr 0.00 ( 3%) sys 1.77 ( 3%) wall
452 kB ( 4%) ggc
if-conversion : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
352 kB ( 3%) ggc
regmove : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
scheduling : 1.34 ( 6%) usr 0.00 ( 0%) sys 3.53 ( 7%) wall
194 kB ( 2%) ggc
local alloc : 0.14 ( 1%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
50 kB ( 0%) ggc
global alloc : 0.53 ( 2%) usr 0.00 ( 3%) sys 0.70 ( 1%) wall
2537 kB (20%) ggc
reload CSE regs : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall
584 kB ( 5%) ggc
load CSE after reload : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
rename registers : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
scheduling 2 : 18.96 (83%) usr 0.02 (66%) sys 43.24 (84%) wall
1970 kB (15%) ggc
final : 0.02 ( 0%) usr 0.00 ( 3%) sys 0.12 ( 0%) wall
0 kB ( 0%) ggc
TOTAL : 22.83 0.03 51.54
12913 kB
--
tbm at cyrius dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: pinskia at gcc dot gnu dot org @ 2007-10-27 18:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from pinskia at gcc dot gnu dot org 2007-10-27 18:59 -------
What happens if you compile with -O3 -fno-tree-vectorize ?
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pinskia at gcc dot gnu dot
| |org
Component|middle-end |rtl-optimization
Summary|[4.3 Regression] slow |[4.3 Regression] slow
|compilation on ia64 |compilation on ia64
| |(postreload scheduling)
Target Milestone|--- |4.3.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: tbm at cyrius dot com @ 2007-10-27 19:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from tbm at cyrius dot com 2007-10-27 19:27 -------
(In reply to comment #13)
> What happens if you compile with -O3 -fno-tree-vectorize ?
It's still slow:
(sid)tbm@coconut0:~/tmp/gcc/gcc-4.3-20071027-r129674-no-checking/gcc$ ./xgcc
-B. -ftime-report -O3 -fno-tree-vectorize -c ~/slow.c
Execution times (seconds)
callgraph construction: 0.00 ( 0%) usr 0.00 ( 2%) sys 0.07 ( 0%) wall
13 kB ( 0%) ggc
callgraph optimization: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
2 kB ( 0%) ggc
df reaching defs : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
0 kB ( 0%) ggc
df live regs : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
142 kB ( 1%) ggc
register information : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
224 kB ( 2%) ggc
parser : 0.00 ( 0%) usr 0.00 ( 1%) sys 0.04 ( 0%) wall
83 kB ( 1%) ggc
inline heuristics : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree gimplify : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
14 kB ( 0%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 1%) sys 0.02 ( 0%) wall
23 kB ( 0%) ggc
tree CFG cleanup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
1018 kB ( 8%) ggc
tree VRP : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
132 kB ( 1%) ggc
tree copy propagation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
24 kB ( 0%) ggc
tree PRE : 0.37 ( 2%) usr 0.00 ( 0%) sys 0.47 ( 1%) wall
1052 kB ( 8%) ggc
tree SSA to normal : 0.06 ( 0%) usr 0.00 ( 1%) sys 0.06 ( 0%) wall
1010 kB ( 8%) ggc
expand : 0.04 ( 0%) usr 0.00 ( 2%) sys 0.37 ( 1%) wall
1182 kB ( 9%) ggc
forward prop : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
2 kB ( 0%) ggc
CSE : 0.03 ( 0%) usr 0.00 ( 1%) sys 0.03 ( 0%) wall
1 kB ( 0%) ggc
dead store elim2 : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
267 kB ( 2%) ggc
CPROP 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
132 kB ( 1%) ggc
bypass jumps : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
130 kB ( 1%) ggc
CSE 2 : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
0 kB ( 0%) ggc
branch prediction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
combiner : 0.82 ( 3%) usr 0.00 ( 0%) sys 1.66 ( 4%) wall
452 kB ( 4%) ggc
if-conversion : 0.02 ( 0%) usr 0.00 ( 1%) sys 0.03 ( 0%) wall
352 kB ( 3%) ggc
regmove : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
scheduling : 1.34 ( 5%) usr 0.00 ( 0%) sys 2.99 ( 7%) wall
194 kB ( 2%) ggc
local alloc : 0.14 ( 1%) usr 0.00 ( 0%) sys 0.34 ( 1%) wall
50 kB ( 0%) ggc
global alloc : 0.53 ( 2%) usr 0.00 ( 1%) sys 1.15 ( 3%) wall
2537 kB (20%) ggc
reload CSE regs : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.36 ( 1%) wall
584 kB ( 5%) ggc
load CSE after reload : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
if-conversion 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
0 kB ( 0%) ggc
rename registers : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
scheduling 2 : 20.44 (84%) usr 0.08 (84%) sys 31.73 (79%) wall
1970 kB (15%) ggc
final : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
TOTAL : 24.34 0.10 40.40
12913 kB
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: jakub at gcc dot gnu dot org @ 2007-10-28 19:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from jakub at gcc dot gnu dot org 2007-10-28 19:10 -------
Compared to 20070803 with -O3 -fno-tree-vectorize, there are now 100 times more
calls to rtx_needs_barrier and 44 times more calls to
safe_group_barrier_needed.
The latter in particular is horribly expensive, e.g. it copies around
401 * sizeof (struct reg_write_state) == 1604 bytes several times.
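For illustration only, here is a minimal sketch of why a "safe" query of this
kind can be costly, assuming it works by snapshotting the whole per-register
state, running the ordinary query, and restoring the snapshot. This is not the
ia64.c source: every name ending in _sketch, the struct fields, and the
bytes_copied counter are hypothetical stand-ins.

```c
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in GCC's ia64 back end.  */
#define NUM_REGS_SKETCH 401

/* Illustrative per-register record; 4 bytes on common ABIs, so the whole
   table is 401 * 4 == 1604 bytes.  */
struct reg_write_state_sketch
{
  unsigned int write_count : 2;
  unsigned int first_pred : 16;
  unsigned int written_by_fp : 1;
};

static struct reg_write_state_sketch rws_sum_sketch[NUM_REGS_SKETCH];

/* Counts bytes copied purely for save/restore, to make the overhead visible.  */
static unsigned long bytes_copied;

/* Placeholder query: reports whether REGNO was already written, and records
   a write as a side effect.  */
static int
barrier_needed_sketch (int regno)
{
  int need = rws_sum_sketch[regno].write_count > 0;
  rws_sum_sketch[regno].write_count = 1;
  return need;
}

/* "Safe" variant: answer the question, then undo every side effect by
   restoring a full snapshot of the table.  Two full-table copies per call.  */
static int
safe_barrier_needed_sketch (int regno)
{
  struct reg_write_state_sketch saved[NUM_REGS_SKETCH];
  int need;

  memcpy (saved, rws_sum_sketch, sizeof saved);
  need = barrier_needed_sketch (regno);
  memcpy (rws_sum_sketch, saved, sizeof saved);
  bytes_copied += 2 * sizeof saved;
  return need;
}
```

In this sketch each safe call moves twice the table size regardless of how
little the query itself touches, so the copy cost scales directly with the
number of calls.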
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: jakub at gcc dot gnu dot org @ 2007-10-28 19:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from jakub at gcc dot gnu dot org 2007-10-28 19:42 -------
I haven't analyzed exactly why there are so many more safe_group_barrier_needed
calls, but they are certainly much more common than direct group_barrier_needed
calls on this testcase (14579701 safe_group_barrier_needed calls,
14604168 group_barrier_needed calls). But if so, the only thing that call
cares about is the return value; all the state is thrown away. From what I see,
the need_barrier retval is ORed together from all the recursive calls. Couldn't
we gain something by just returning 1 immediately whenever one of the recursive
calls returns non-zero?
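The early-return idea can be sketched roughly as follows. This is a toy model
rather than GCC code: struct expr_sketch, both functions, and the nodes_visited
counter are hypothetical, and the early-exit form is only valid under the
assumption stated above, namely that the caller discards whatever state the
skipped recursive calls would have updated.

```c
#include <stddef.h>

/* Hypothetical expression tree; GCC's real RTL is far richer.  */
struct expr_sketch
{
  int conflicts;                    /* nonzero if this node alone needs a barrier */
  size_t n_ops;                     /* number of operands */
  const struct expr_sketch **ops;   /* operand pointers, may be NULL if n_ops == 0 */
};

/* Counts how many nodes each walk touches.  */
static unsigned long nodes_visited;

/* Accumulating form: every operand is visited even after the answer is
   already known to be 1.  */
static int
needs_barrier_accumulate (const struct expr_sketch *x)
{
  int need = x->conflicts != 0;
  nodes_visited++;
  for (size_t i = 0; i < x->n_ops; i++)
    need |= needs_barrier_accumulate (x->ops[i]);
  return need;
}

/* Early-exit form: stop as soon as one sub-result is nonzero.  */
static int
needs_barrier_early_exit (const struct expr_sketch *x)
{
  nodes_visited++;
  if (x->conflicts)
    return 1;
  for (size_t i = 0; i < x->n_ops; i++)
    if (needs_barrier_early_exit (x->ops[i]))
      return 1;
  return 0;
}
```

Both forms return the same value; they differ only in how much of the tree
they walk once a conflict has been found.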
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: maxim at codesourcery dot com @ 2007-10-28 20:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from maxim at codesourcery dot com 2007-10-28 20:04 -------
Subject: Re: [4.3 Regression] slow compilation
on ia64 (postreload scheduling)
jakub at gcc dot gnu dot org wrote:
> ------- Comment #15 from jakub at gcc dot gnu dot org 2007-10-28 19:10 -------
> Compared to 20070803 with -O3 -fno-tree-vectorize there are now 100 times more
> calls to rtx_needs_barrier and 44 times more calls to
> safe_group_barrier_needed.
> E.g. the latter is horribly expensive, e.g. copying around 401 * sizeof (struct
> reg_write_state) == 1604 bytes several times.
The underlying problem is that the list of ready-to-schedule instructions
is now larger than it was before, and the scheduler slows down as the list
grows. There is already a workaround for this problem (limiting the ready
list when it is too large; see PARAM_MAX_SCHED_READY_INSNS), but it doesn't
seem to work well in this case.
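As a rough illustration of why a long ready list hurts and why a cap helps,
here is a toy cost model. It is not the haifa scheduler: pick_one,
schedule_all, and the candidate_checks counter are hypothetical, and the model
simply assumes that each pick examines every candidate it is allowed to
consider.

```c
#include <stddef.h>

/* Total number of candidate examinations performed so far.  */
static unsigned long candidate_checks;

/* Pretend to pick the best of N_READY candidates; the cost is assumed to be
   linear in N_READY.  */
static void
pick_one (size_t n_ready)
{
  candidate_checks += n_ready;
}

/* Schedule N_INSNS instructions that are all ready at once.  MAX_READY caps
   how many candidates are considered per pick (0 means no cap).  Returns the
   total number of candidate examinations.  */
static unsigned long
schedule_all (size_t n_insns, size_t max_ready)
{
  candidate_checks = 0;
  for (size_t remaining = n_insns; remaining > 0; remaining--)
    {
      size_t considered = remaining;
      if (max_ready != 0 && considered > max_ready)
        considered = max_ready;
      pick_one (considered);
    }
  return candidate_checks;
}
```

Under these assumptions, schedule_all (1000, 0) performs 500500 examinations
while schedule_all (1000, 100) performs 95050, so the uncapped cost grows
quadratically with the list size and the capped cost roughly linearly.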
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: jakub at gcc dot gnu dot org @ 2007-10-28 20:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from jakub at gcc dot gnu dot org 2007-10-28 20:20 -------
Created an attachment (id=14429)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14429&action=view)
rws_insn.patch
Just a side note. Maintaining the rws_insn array seems horribly expensive
to me: for each regno only one bit is actually used, just to check one
gcc_assert, and only two regnos are actually checked in some other code. So
memsetting and maintaining a 1604-byte array all the time seems to be
overkill. A bitmap would do just fine or, if we just remove that gcc_assert
when not ENABLE_CHECKING, we need just 2 bits altogether instead of those 1604
bytes.
This doesn't help much on this testcase (as it does not address the algorithmic
issue), but the effect is already noticeable.
scheduling 2 : 10.60 (88%) usr 0.00 ( 0%) sys 10.60 (88%) wall
1970 kB (15%) ggc
went down to
scheduling 2 : 8.99 (86%) usr 0.01 (50%) sys 9.00 (86%) wall
1970 kB (15%) ggc
with this patch and --enable-checking=release, so about 14% speedup in wall
time for the whole compilation of this file.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: jakub at gcc dot gnu dot org @ 2007-10-28 20:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from jakub at gcc dot gnu dot org 2007-10-28 20:42 -------
Another trivial patch that improves speed is:
--- ia64.c (revision 129700)
+++ ia64.c (working copy)
@@ -5310,11 +5310,11 @@ ia64_safe_type (rtx insn)
struct reg_write_state
{
- unsigned int write_count : 2;
- unsigned int first_pred : 16;
- unsigned int written_by_fp : 1;
- unsigned int written_by_and : 1;
- unsigned int written_by_or : 1;
+ unsigned short write_count : 2;
+ unsigned short first_pred : 10;
+ unsigned short written_by_fp : 1;
+ unsigned short written_by_and : 1;
+ unsigned short written_by_or : 1;
};
/* Cumulative info for the current instruction group. */
which cuts the size of the rws_sum and rws_saved arrays in half (from 1604 to
802 bytes). With both patches in I get:
scheduling 2 : 6.86 (82%) usr 0.01 (50%) sys 6.87 (82%) wall
1970 kB (15%) ggc
or a 31% speedup in wall time with both patches together. first_pred is either
0 or PR_REG(0) through PR_REG(63), so it certainly fits into a 10-bit bitfield.
If needed it would even fit into 6 bits (as when pred == 0, write_count will
already be 2 and we could subtract PR_REG(0) from it), but that's still too big
to squeeze it into 1 byte per register.
Even when this bug is fixed for real, both changes IMHO make sense anyway (the
first patch could perhaps use some cleanup, nice macros to hide it or
something).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: jakub at gcc dot gnu dot org @ 2007-10-28 21:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from jakub at gcc dot gnu dot org 2007-10-28 21:11 -------
Actually, we probably don't need to write to the rws_sum array at all when in
safe_group_barrier_needed, and then we wouldn't need to copy it around (save and
restore it) at all.
--- config/ia64/ia64.c~ 2007-10-28 22:00:24.000000000 +0100
+++ config/ia64/ia64.c 2007-10-28 22:04:26.000000000 +0100
@@ -5353,6 +5353,7 @@ static int rtx_needs_barrier (rtx, struc
static void init_insn_group_barriers (void);
static int group_barrier_needed (rtx);
static int safe_group_barrier_needed (rtx);
+static int in_safe_group_barrier;
/* Update *RWS for REGNO, which is being written by the current instruction,
with predicate PRED, and associated register flags in FLAGS. */
@@ -5407,7 +5408,8 @@ rws_access_regno (int regno, struct reg_
{
case 0:
/* The register has not been written yet. */
- rws_update (regno, flags, pred);
+ if (!in_safe_group_barrier)
+ rws_update (regno, flags, pred);
break;
case 1:
@@ -5421,7 +5423,8 @@ rws_access_regno (int regno, struct reg_
;
else if ((rws_sum[regno].first_pred ^ 1) != pred)
need_barrier = 1;
- rws_update (regno, flags, pred);
+ if (!in_safe_group_barrier)
+ rws_update (regno, flags, pred);
break;
case 2:
@@ -5433,8 +5436,11 @@ rws_access_regno (int regno, struct reg_
;
else
need_barrier = 1;
- rws_sum[regno].written_by_and = flags.is_and;
- rws_sum[regno].written_by_or = flags.is_or;
+ if (!in_safe_group_barrier)
+ {
+ rws_sum[regno].written_by_and = flags.is_and;
+ rws_sum[regno].written_by_or = flags.is_or;
+ }
break;
default:
@@ -6099,17 +6105,16 @@ int safe_group_barrier_needed_cnt[5];
static int
safe_group_barrier_needed (rtx insn)
{
- struct reg_write_state rws_saved[NUM_REGS];
int saved_first_instruction;
int t;
- memcpy (rws_saved, rws_sum, NUM_REGS * sizeof *rws_saved);
saved_first_instruction = first_instruction;
+ in_safe_group_barrier = 1;
t = group_barrier_needed (insn);
- memcpy (rws_sum, rws_saved, NUM_REGS * sizeof *rws_saved);
first_instruction = saved_first_instruction;
+ in_safe_group_barrier = 0;
return t;
}
Together with the other patches this gives (everything is an x86_64-linux ->
ia64-linux cross; it would need to be measured on native ia64-linux)
scheduling 2 : 5.20 (78%) usr 0.01 (50%) sys 5.20 (77%) wall
1970 kB (15%) ggc
or ~ 45% speedup on this testcase.
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wilson at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: jakub at gcc dot gnu dot org @ 2007-10-29 8:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from jakub at gcc dot gnu dot org 2007-10-29 08:43 -------
Created an attachment (id=14433)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14433&action=view)
gcc43-ia64-rws-speedups.patch
All 3 patches together, with macros.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: jakub at gcc dot gnu dot org @ 2007-11-01 20:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from jakub at gcc dot gnu dot org 2007-11-01 20:59 -------
The most important cause of the slowdown e.g. compared to 4.2.x is the totally
insane thing -ftree-pre creates though.
For -O3 -fno-tree-vectorize -fdump-tree-all pr33922.c
wc -l shows
2361 pr33922.c.090t.sink
while for -O3 -fno-tree-vectorize -fno-tree-pre -fdump-tree-all pr33922.c
324 pr33922.c.090t.sink
and of course the size of assembly corresponds to this:
11400 pr33922.s # -O3 -fno-tree-vectorize
195 pr33922.s # -O3 -fno-tree-vectorize -fno-tree-pre
-O3 -fno-tree-vectorize -fdump-tree-pre-all dump contains
2081 ^Created.*value lines and all those constants are actually created and
many PHI nodes as well. I believe this might be what nickc was trying to fix
today by adding a limit, but wasn't that limit huge (131072 bits)?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: rguenther at suse dot de @ 2007-11-01 21:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from rguenther at suse dot de 2007-11-01 21:01 -------
Subject: Re: [4.3 Regression] slow compilation on ia64 (postreload scheduling)
On Thu, 1 Nov 2007, jakub at gcc dot gnu dot org wrote:
> ------- Comment #22 from jakub at gcc dot gnu dot org 2007-11-01 20:59 -------
> The most important cause of the slowdown e.g. compared to 4.2.x is the totally
> insane thing -ftree-pre creates though.
> For -O3 -fno-tree-vectorize -fdump-tree-all pr33922.c
> wc -l shows
> 2361 pr33922.c.090t.sink
> while for -O3 -fno-tree-vectorize -fno-tree-pre -fdump-tree-all pr33922.c
> 324 pr33922.c.090t.sink
> and of course the size of assembly corresponds to this:
> 11400 pr33922.s # -O3 -fno-tree-vectorize
> 195 pr33922.s # -O3 -fno-tree-vectorize -fno-tree-pre
>
> -O3 -fno-tree-vectorize -fdump-tree-pre-all dump contains
> 2081 ^Created.*value lines and all those constants are actually created and
> many PHI nodes as well. I believe this might be what nickc was trying to fix
> today by adding a limit, but wasn't that limit huge (131072 bits)?
The limit was to cut off exponential behavior. But yes, PRE (and PPRE even
more so) is known to increase code size. Looks like some better heuristics
are needed.
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: ebotcazou at gcc dot gnu dot org @ 2007-11-03 10:48 UTC (permalink / raw)
To: gcc-bugs
--
ebotcazou at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed date|0000-00-00 00:00:00 |2007-11-03 10:48:04
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: mmitchel at gcc dot gnu dot org @ 2007-11-05 3:10 UTC (permalink / raw)
To: gcc-bugs
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: spop at gcc dot gnu dot org @ 2007-11-05 15:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from spop at gcc dot gnu dot org 2007-11-05 15:42 -------
Subject: Bug 33922
Author: spop
Date: Mon Nov 5 15:42:30 2007
New Revision: 129901
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129901
Log:
2007-11-05 Nick Clifton <nickc@redhat.com>
Sebastian Pop <sebastian.pop@amd.com>
PR tree-optimization/32540
PR tree-optimization/33922
* doc/invoke.texi: Document PARAM_MAX_PARTIAL_ANTIC_LENGTH.
* tree-ssa-pre.c: Include params.h.
(compute_partial_antic_aux): Use PARAM_MAX_PARTIAL_ANTIC_LENGTH
to limit the maximum length of the PA set for a given block.
* Makefile.in: Add a dependency upon params.h for tree-ssa-pre.c
* params.def (PARAM_MAX_PARTIAL_ANTIC_LENGTH): New parameter.
* gcc.dg/tree-ssa/pr32540-1.c: New.
* gcc.dg/tree-ssa/pr32540-2.c: New.
* gcc.dg/tree-ssa/pr33922.c: New.
Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-1.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr33922.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/doc/invoke.texi
trunk/gcc/params.def
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-pre.c
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
* [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling)
From: spop at gcc dot gnu dot org @ 2007-11-05 15:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from spop at gcc dot gnu dot org 2007-11-05 15:44 -------
Fixed.
--
spop at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33922
Thread overview: 29+ messages
2007-10-27 15:40 [Bug tree-optimization/33922] New: [4.3 Regression] slow compilation on ia64 tbm at cyrius dot com
2007-10-27 15:48 ` [Bug tree-optimization/33922] " rguenth at gcc dot gnu dot org
2007-10-27 16:03 ` tbm at cyrius dot com
2007-10-27 16:07 ` tbm at cyrius dot com
2007-10-27 16:14 ` tbm at cyrius dot com
2007-10-27 16:44 ` tbm at cyrius dot com
2007-10-27 17:51 ` tbm at cyrius dot com
2007-10-27 17:52 ` tbm at cyrius dot com
2007-10-27 18:01 ` [Bug middle-end/33922] " pinskia at gcc dot gnu dot org
2007-10-27 18:08 ` tbm at cyrius dot com
2007-10-27 18:10 ` Andrew Pinski
2007-10-27 18:10 ` pinskia at gmail dot com
2007-10-27 18:13 ` tbm at cyrius dot com
2007-10-27 18:53 ` tbm at cyrius dot com
2007-10-27 18:59 ` [Bug rtl-optimization/33922] [4.3 Regression] slow compilation on ia64 (postreload scheduling) pinskia at gcc dot gnu dot org
2007-10-27 19:27 ` tbm at cyrius dot com
2007-10-28 19:11 ` jakub at gcc dot gnu dot org
2007-10-28 19:42 ` jakub at gcc dot gnu dot org
2007-10-28 20:04 ` maxim at codesourcery dot com
2007-10-28 20:21 ` jakub at gcc dot gnu dot org
2007-10-28 20:43 ` jakub at gcc dot gnu dot org
2007-10-28 21:11 ` jakub at gcc dot gnu dot org
2007-10-29 8:43 ` jakub at gcc dot gnu dot org
2007-11-01 20:59 ` jakub at gcc dot gnu dot org
2007-11-01 21:02 ` rguenther at suse dot de
2007-11-03 10:48 ` ebotcazou at gcc dot gnu dot org
2007-11-05 3:10 ` mmitchel at gcc dot gnu dot org
2007-11-05 15:42 ` spop at gcc dot gnu dot org
2007-11-05 15:44 ` spop at gcc dot gnu dot org