* [Bug rtl-optimization/19097] Lots of else ifs take forever to compile
2004-12-21 1:57 [Bug rtl-optimization/19097] New: Lots of else ifs take forever to compile phython at gcc dot gnu dot org
@ 2004-12-21 2:39 ` pinskia at gcc dot gnu dot org
2004-12-21 8:38 ` steven at gcc dot gnu dot org
` (16 subsequent siblings)
17 siblings, 0 replies; 31+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-21 2:39 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-21 02:39 -------
I think DOM is "fixing" the compile-time/memory-hog on the mainline :).
--
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |compile-time-hog, memory-hog
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19097
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Bug rtl-optimization/19097] Lots of else ifs take forever to compile
From: steven at gcc dot gnu dot org @ 2004-12-21 8:38 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-12-21 08:38 -------
This is what I get (with checking enabled) for a C5 with two C4s with
"GNU C version 4.0.0 20041220 (experimental) (x86_64-unknown-linux-gnu)":
$ ./cc1 t.c -O2
a
{GC 14385k -> 11467k}
Analyzing compilation unit
Performing intraprocedural optimizations
Assembling functions:
a
{GC 68010k -> 54166k} {GC 74855k -> 56923k} {GC 91181k -> 71784k} {GC 99265k -> 71726k}
Execution times (seconds)
garbage collection : 0.77 ( 0%) usr 0.01 ( 0%) sys 0.77 ( 0%) wall
callgraph construction: 3.03 ( 2%) usr 0.06 ( 2%) sys 3.09 ( 2%) wall
cfg construction : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
cfg cleanup : 1.55 ( 1%) usr 0.00 ( 0%) sys 1.52 ( 1%) wall
CFG verifier : 3.63 ( 2%) usr 0.04 ( 2%) sys 3.65 ( 2%) wall
trivially dead code : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall
life analysis : 0.84 ( 0%) usr 0.00 ( 0%) sys 0.84 ( 0%) wall
life info update : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.29 ( 0%) wall
alias analysis : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall
register scan : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
rebuild jump labels : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
preprocessing : 0.17 ( 0%) usr 0.07 ( 3%) sys 0.34 ( 0%) wall
lexical analysis : 0.09 ( 0%) usr 0.19 ( 8%) sys 0.26 ( 0%) wall
parser : 0.35 ( 0%) usr 0.10 ( 4%) sys 0.44 ( 0%) wall
tree gimplify : 0.10 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall
tree eh : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall
tree CFG construction : 0.17 ( 0%) usr 0.03 ( 1%) sys 0.20 ( 0%) wall
tree CFG cleanup : 0.85 ( 0%) usr 0.00 ( 0%) sys 0.89 ( 1%) wall
tree find referenced vars: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
tree PTA : 0.19 ( 0%) usr 0.01 ( 0%) sys 0.19 ( 0%) wall
tree alias analysis : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
tree PHI insertion : 0.08 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall
tree SSA rewrite : 0.35 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall
tree SSA other : 0.58 ( 0%) usr 0.17 ( 7%) sys 0.65 ( 0%) wall
tree operand scan : 0.19 ( 0%) usr 0.10 ( 4%) sys 0.38 ( 0%) wall
dominator optimization: 2.12 ( 1%) usr 0.03 ( 1%) sys 2.16 ( 1%) wall
tree CCP : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
tree split crit edges : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
tree PRE : 0.20 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall
tree remove redundant PHIs: 0.23 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
tree forward propagate: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
tree conservative DCE : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
tree aggressive DCE : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
tree DSE : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall
tree loop init : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
tree copy headers : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
tree SSA to normal : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
tree rename SSA copies: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
tree SSA verifier : 1.99 ( 1%) usr 0.00 ( 0%) sys 1.95 ( 1%) wall
tree STMT verifier : 0.43 ( 0%) usr 0.02 ( 1%) sys 0.44 ( 0%) wall
callgraph verifier : 8.76 ( 5%) usr 0.05 ( 2%) sys 8.81 ( 5%) wall
dominance frontiers : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
control dependences : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
expand : 1.09 ( 1%) usr 0.03 ( 1%) sys 1.20 ( 1%) wall
jump : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
CSE : 9.12 ( 5%) usr 0.00 ( 0%) sys 9.12 ( 5%) wall
loop analysis : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall
global CSE : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
CPROP 1 : 55.78 (33%) usr 0.57 (23%) sys 56.38 (32%) wall
PRE : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
CPROP 2 : 28.46 (17%) usr 0.41 (17%) sys 28.91 (17%) wall
bypass jumps : 28.96 (17%) usr 0.40 (16%) sys 29.39 (17%) wall
CSE 2 : 8.25 ( 5%) usr 0.00 ( 0%) sys 8.25 ( 5%) wall
branch prediction : 0.95 ( 1%) usr 0.01 ( 0%) sys 0.95 ( 1%) wall
flow analysis : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
combiner : 0.36 ( 0%) usr 0.01 ( 0%) sys 0.38 ( 0%) wall
if-conversion : 0.45 ( 0%) usr 0.00 ( 0%) sys 0.48 ( 0%) wall
regmove : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
local alloc : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall
global alloc : 0.55 ( 0%) usr 0.02 ( 1%) sys 0.58 ( 0%) wall
reload CSE regs : 3.45 ( 2%) usr 0.00 ( 0%) sys 3.46 ( 2%) wall
flow 2 : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
if-conversion 2 : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
peephole 2 : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
rename registers : 0.93 ( 1%) usr 0.02 ( 1%) sys 0.98 ( 1%) wall
scheduling 2 : 1.07 ( 1%) usr 0.02 ( 1%) sys 1.13 ( 1%) wall
machine dep reorg : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
reorder blocks : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall
shorten branches : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
final : 0.18 ( 0%) usr 0.01 ( 0%) sys 0.19 ( 0%) wall
rest of compilation : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
TOTAL : 171.18 2.43 174.57
My bet: known non-linear behavior in compute_transp. I'm curious why gcse's
"is_too_expensive ()" thinks GCSE is _not_ too expensive for this test case.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed| |1
Last reconfirmed|0000-00-00 00:00:00 |2004-12-21 08:38:15
date| |
* [Bug rtl-optimization/19097] Lots of else ifs take forever to compile
From: steven at gcc dot gnu dot org @ 2004-12-21 8:47 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-12-21 08:47 -------
Hmm no, it's not compute_transp:
CPU: P4 / Xeon with 2 hyper-threads, speed 3194.18 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
343490 22.3269 exp_equiv_p
298238 19.3856 next_set
63817 4.1481 find_avail_set
62787 4.0812 insert_set_in_table
55981 3.6388 cgraph_edge
35531 2.3095 cse_insn
18770 1.2201 cgraph_create_edge
17784 1.1560 for_each_rtx
15956 1.0371 fold_rtx
15164 0.9857 expr_equiv_p
15081 0.9803 hash_rtx
12551 0.8158 validate_value_data
12380 0.8047 get_cse_reg_info
* [Bug rtl-optimization/19097] Lots of else ifs take forever to compile
From: steven at gcc dot gnu dot org @ 2004-12-23 1:43 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-12-23 01:43 -------
This looks like a problem with the hash function for a REG when we
have many implicit sets:
Found 6001 implicit sets
SET hash table (6001 buckets, 6001 entries)
Index 0 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 1 [0x1]))
Index 1 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10000 [0x2710]))
Index 2 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10001 [0x2711]))
Index 3 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10002 [0x2712]))
Index 4 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10003 [0x2713]))
Index 5 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10004 [0x2714]))
Index 6 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10005 [0x2715]))
Index 7 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10006 [0x2716]))
Index 8 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10007 [0x2717]))
Index 9 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10008 [0x2718]))
Index 10 (hash value 58)
(set (reg/v:SI 58 [ b ])
(const_int 10009 [0x2719]))
(etc.)
Needless to say, this results in truly dramatically bad compile-time
behavior of the hash table.
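To make the collision pattern in the dump concrete, here is a toy model in C. It is only a sketch and not the gcse.c code: `struct toy_set`, `hash_by_reg`, `hash_by_reg_and_value`, `longest_chain` and `worst_chain_same_reg` are all hypothetical names, and the assumption that the hash depends only on the destination register is inferred from the dump (hash value 58 for every set of reg 58).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an implicit set "(set (reg R) (const_int V))".  */
struct toy_set { unsigned reg; long value; };

/* Hypothetical hash that looks only at the destination register,
   which is what the dump suggests.  */
static unsigned hash_by_reg(const struct toy_set *s)
{
    return s->reg;
}

/* Hypothetical alternative that also mixes in the constant.  */
static unsigned hash_by_reg_and_value(const struct toy_set *s)
{
    return s->reg * 31u + (unsigned) s->value;
}

/* Length of the longest bucket chain after hashing all n sets
   into nbuckets buckets with the given hash function.  */
static size_t longest_chain(const struct toy_set *sets, size_t n,
                            size_t nbuckets,
                            unsigned (*hash)(const struct toy_set *))
{
    assert(nbuckets > 0);
    size_t *count = calloc(nbuckets, sizeof *count);
    assert(count != NULL);
    size_t worst = 0;
    for (size_t i = 0; i < n; i++) {
        size_t b = hash(&sets[i]) % nbuckets;
        if (++count[b] > worst)
            worst = count[b];
    }
    free(count);
    return worst;
}

/* Build n sets of one register (58) with distinct constants, as in
   the dump, and report the longest chain in an n-bucket table.  */
static size_t worst_chain_same_reg(size_t n,
                                   unsigned (*hash)(const struct toy_set *))
{
    struct toy_set *sets = malloc(n * sizeof *sets);
    assert(sets != NULL);
    for (size_t i = 0; i < n; i++) {
        sets[i].reg = 58;
        sets[i].value = 10000 + (long) i;
    }
    size_t worst = longest_chain(sets, n, n, hash);
    free(sets);
    return worst;
}
```

With 6001 sets and 6001 buckets, `worst_chain_same_reg (6001, hash_by_reg)` puts everything in one chain, while the value-aware variant spreads the entries out. If each insertion has to walk the existing chain, a single chain of 6001 entries costs on the order of 6001 * 6000 / 2, about 18 million, comparisons, which would be consistent with the quadratic behavior seen here.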
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at eyesopen dot com
* [Bug rtl-optimization/19097] [3.4/4.0 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2004-12-24 1:00 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2004-12-24 01:00 -------
We did not have implicit sets in 3.3, so it doesn't show this behavior.
On the other hand, someone could feed RTL to 3.3 where a single register
is set many times, and I'm sure it will show the same behavior. But in
this case, with implicit sets, we could have it for normal code too, for
example an implicit set from a switch condition. So I'm marking this as a
regression from the last release without implicit sets.
The memory issue is different; I'm not sure if/where/why we allocate too
much. It's probably just bad cache behavior that makes this a "memory-hog":
walking a very long linked list that doesn't fit in RAM sends any
machine to swapping hell. I'm not seeing memory consumption grow in a
non-linear way, so I'm removing that keyword.
--
What |Removed |Added
----------------------------------------------------------------------------
Keywords|memory-hog |
Known to fail| |4.0.0
Known to work| |3.3.4
Last reconfirmed|2004-12-21 08:38:15 |2004-12-24 01:00:42
date| |
Summary|Lots of else ifs take |[3.4/4.0 regression]
|forever to compile |Quadratic behavior with many
| |sets for the same register
| |in gcse CPROP
* [Bug rtl-optimization/19097] [3.4/4.0 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: bonzini at gcc dot gnu dot org @ 2004-12-28 18:48 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From bonzini at gcc dot gnu dot org 2004-12-28 18:48 -------
Interestingly, on arm-elf the hog is CSE, because each copy of b is given a
different pseudo.
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: pinskia at gcc dot gnu dot org @ 2005-03-05 19:53 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.1.0
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2005-03-05 23:37 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-03-05 23:37 -------
I don't think this will be fixed for 4.1, unless we kick out implicit
sets from gcse, or all of gcse. The former may be possible if our
const/copy prop at the tree level is good enough, but I wouldn't count
on it.
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: phython at gcc dot gnu dot org @ 2005-06-08 3:32 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From phython at gcc dot gnu dot org 2005-06-08 03:32 -------
The problem with the hash table seems to be fixed in gcc 4.1, but not gcc 3.4
or 4.0. In gcc 4.1 hash_rtx is used for the implicit sets instead of the really
dumb hash_set.
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2005-06-08 9:38 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-06-08 09:38 -------
Can you try and figure out which patch changed this for GCC 4.1? Bonus
points if you can see if backporting that patch gives GCC 4.0 a speed-up,
because in that case you may have something to go to Mark with for the
next GCC 4.0.x (x>1) release ;-)
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: phython at gcc dot gnu dot org @ 2005-06-08 12:44 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From phython at gcc dot gnu dot org 2005-06-08 12:43 -------
Ok, I seem to be wrong, hash_set still seems to be used for implicit sets.
However, the destination registers in gcc 4.1 are all different:
SET hash table (1251 buckets, 1001 entries)
Index 0 (hash value 339)
(set (reg/v:SI 339 [ b ])
(const_int 1 [0x1]))
Index 1 (hash value 342)
(set (reg:SI 342)
(const_int 1000 [0x3e8]))
Index 2 (hash value 344)
(set (reg:SI 344)
(const_int 1001 [0x3e9]))
Index 3 (hash value 346)
(set (reg:SI 346)
(const_int 1002 [0x3ea]))
Index 4 (hash value 348)
(set (reg:SI 348)
(const_int 1003 [0x3eb]))
Index 5 (hash value 350)
(set (reg:SI 350)
(const_int 1004 [0x3ec]))
Index 6 (hash value 352)
(set (reg:SI 352)
(const_int 1005 [0x3ed]))
Index 7 (hash value 354)
...
Index 999 (hash value 1087)
(set (reg:SI 2338)
(const_int 1998 [0x7ce]))
Index 1000 (hash value 1089)
(set (reg:SI 2340)
(const_int 1999 [0x7cf]))
And the time report for -O2 using CL4 on ia64-linux
Execution times (seconds)
garbage collection : 1.98 ( 1%) usr 0.00 ( 0%) sys 1.98 ( 1%) wall 0 kB ( 0%) ggc
dump files : 1.03 ( 0%) usr 0.06 ( 4%) sys 1.08 ( 0%) wall 0 kB ( 0%) ggc
callgraph construction: 4.20 ( 1%) usr 0.01 ( 1%) sys 4.21 ( 1%) wall 12188 kB ( 7%) ggc
cfg construction : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 2131 kB ( 1%) ggc
cfg cleanup : 1.56 ( 0%) usr 0.00 ( 0%) sys 1.56 ( 0%) wall 568 kB ( 0%) ggc
CFG verifier : 7.19 ( 2%) usr 0.02 ( 1%) sys 7.21 ( 2%) wall 0 kB ( 0%) ggc
trivially dead code : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall 0 kB ( 0%) ggc
life analysis : 4.12 ( 1%) usr 0.00 ( 0%) sys 4.12 ( 1%) wall 4688 kB ( 3%) ggc
life info update : 0.83 ( 0%) usr 0.00 ( 0%) sys 0.83 ( 0%) wall 1250 kB ( 1%) ggc
alias analysis : 1.02 ( 0%) usr 0.00 ( 0%) sys 1.02 ( 0%) wall 4096 kB ( 2%) ggc
register scan : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%) wall 0 kB ( 0%) ggc
rebuild jump labels : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
preprocessing : 0.11 ( 0%) usr 0.07 ( 4%) sys 0.20 ( 0%) wall 702 kB ( 0%) ggc
lexical analysis : 0.08 ( 0%) usr 0.14 ( 8%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc
parser : 0.22 ( 0%) usr 0.06 ( 4%) sys 0.28 ( 0%) wall 5672 kB ( 3%) ggc
tree gimplify : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 625 kB ( 0%) ggc
tree eh : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
tree CFG construction : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall 17418 kB (10%) ggc
tree CFG cleanup : 1.64 ( 0%) usr 0.00 ( 0%) sys 1.65 ( 0%) wall 0 kB ( 0%) ggc
tree VRP : 30.28 ( 9%) usr 0.02 ( 1%) sys 30.31 ( 9%) wall 5568 kB ( 3%) ggc
tree copy propagation : 0.60 ( 0%) usr 0.01 ( 1%) sys 0.61 ( 0%) wall 2 kB ( 0%) ggc
tree store copy prop : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
tree find ref. vars : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
tree PTA : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 0 kB ( 0%) ggc
tree alias analysis : 0.29 ( 0%) usr 0.10 ( 6%) sys 0.42 ( 0%) wall 2 kB ( 0%) ggc
tree PHI insertion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree SSA rewrite : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
tree SSA other : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.17 ( 0%) wall 2 kB ( 0%) ggc
tree SSA incremental : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
tree operand scan : 58.50 (18%) usr 0.17 (11%) sys 58.63 (17%) wall 1354 kB ( 1%) ggc
dominator optimization: 1.51 ( 0%) usr 0.00 ( 0%) sys 1.51 ( 0%) wall 10313 kB ( 6%) ggc
tree STORE-CCP : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall 0 kB ( 0%) ggc
tree CCP : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc
tree split crit edges : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree reassociation : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
tree PRE : 0.28 ( 0%) usr 0.01 ( 0%) sys 0.29 ( 0%) wall 0 kB ( 0%) ggc
tree FRE : 0.17 ( 0%) usr 0.02 ( 1%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc
tree code sinking : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall 0 kB ( 0%) ggc
tree linearize phis : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
tree forward propagate: 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
tree conservative DCE : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.29 ( 0%) wall 0 kB ( 0%) ggc
tree aggressive DCE : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 0 kB ( 0%) ggc
tree DSE : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall 0 kB ( 0%) ggc
PHI merge : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree loop init : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
tree copy headers : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
tree SSA uncprop : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
tree SSA to normal : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 2 kB ( 0%) ggc
tree rename SSA copies: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
tree SSA verifier : 2.57 ( 1%) usr 0.00 ( 0%) sys 2.57 ( 1%) wall 0 kB ( 0%) ggc
tree STMT verifier : 7.09 ( 2%) usr 0.07 ( 4%) sys 7.15 ( 2%) wall 0 kB ( 0%) ggc
callgraph verifier : 14.63 ( 4%) usr 0.00 ( 0%) sys 14.63 ( 4%) wall 0 kB ( 0%) ggc
dominance frontiers : 27.54 ( 8%) usr 0.00 ( 0%) sys 27.54 ( 8%) wall 0 kB ( 0%) ggc
control dependences : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
expand : 0.69 ( 0%) usr 0.01 ( 0%) sys 0.70 ( 0%) wall 21941 kB (13%) ggc
jump : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall 0 kB ( 0%) ggc
CSE : 11.83 ( 4%) usr 0.01 ( 0%) sys 11.84 ( 4%) wall 0 kB ( 0%) ggc
loop analysis : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall 3750 kB ( 2%) ggc
global CSE : 0.50 ( 0%) usr 0.08 ( 5%) sys 0.58 ( 0%) wall 0 kB ( 0%) ggc
CPROP 1 : 1.29 ( 0%) usr 0.07 ( 4%) sys 1.36 ( 0%) wall 2656 kB ( 2%) ggc
PRE : 1.65 ( 0%) usr 0.26 (16%) sys 1.91 ( 1%) wall 0 kB ( 0%) ggc
CPROP 2 : 1.33 ( 0%) usr 0.07 ( 4%) sys 1.39 ( 0%) wall 2031 kB ( 1%) ggc
bypass jumps : 1.35 ( 0%) usr 0.10 ( 6%) sys 1.45 ( 0%) wall 2031 kB ( 1%) ggc
CSE 2 : 13.92 ( 4%) usr 0.00 ( 0%) sys 13.92 ( 4%) wall 0 kB ( 0%) ggc
branch prediction : 0.92 ( 0%) usr 0.00 ( 0%) sys 0.92 ( 0%) wall 2187 kB ( 1%) ggc
flow analysis : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
combiner : 0.79 ( 0%) usr 0.00 ( 0%) sys 0.79 ( 0%) wall 1875 kB ( 1%) ggc
if-conversion : 0.52 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall 0 kB ( 0%) ggc
regmove : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall 0 kB ( 0%) ggc
scheduling : 31.43 ( 9%) usr 0.01 ( 1%) sys 31.44 ( 9%) wall 4073 kB ( 2%) ggc
local alloc : 0.61 ( 0%) usr 0.00 ( 0%) sys 0.61 ( 0%) wall 1193 kB ( 1%) ggc
global alloc : 1.57 ( 0%) usr 0.04 ( 3%) sys 1.61 ( 0%) wall 312 kB ( 0%) ggc
reload CSE regs : 1.47 ( 0%) usr 0.00 ( 0%) sys 1.47 ( 0%) wall 1875 kB ( 1%) ggc
flow 2 : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall 3751 kB ( 2%) ggc
if-conversion 2 : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall 0 kB ( 0%) ggc
peephole 2 : 0.35 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 0 kB ( 0%) ggc
rename registers : 18.93 ( 6%) usr 0.05 ( 3%) sys 18.98 ( 6%) wall 0 kB ( 0%) ggc
scheduling 2 : 69.37 (21%) usr 0.12 ( 7%) sys 69.48 (21%) wall 34853 kB (21%) ggc
machine dep reorg : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
reorder blocks : 0.75 ( 0%) usr 0.00 ( 0%) sys 0.75 ( 0%) wall 17522 kB (10%) ggc
shorten branches : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
final : 0.52 ( 0%) usr 0.01 ( 0%) sys 0.52 ( 0%) wall 0 kB ( 0%) ggc
rest of compilation : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall 1 kB ( 0%) ggc
TOTAL : 333.44 1.64 335.08 167877 kB
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: phython at gcc dot gnu dot org @ 2005-08-23 7:29 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From phython at gcc dot gnu dot org 2005-08-23 06:46 -------
Here is an updated time-report with the main time contributors on ppc-linux with
checking disabled.
tree VRP : 75.60 (11%) usr 0.39 ( 0%) sys 89.92 ( 9%) wall 13693 kB ( 6%) ggc
tree operand scan : 256.77 (37%) usr 1.36 ( 2%) sys 287.58 (28%) wall 2122 kB ( 1%) ggc
scheduling : 274.05 (39%) usr 73.82 (89%) sys 381.13 (37%) wall 6251 kB ( 3%) ggc
global alloc : 6.98 ( 1%) usr 3.48 ( 4%) sys 166.93 (16%) wall 5469 kB ( 3%) ggc
global alloc was really slow because it took a lot of memory, more than my
machine had.
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: phython at gcc dot gnu dot org @ 2005-08-25 5:44 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From phython at gcc dot gnu dot org 2005-08-25 05:24 -------
If I use C5(1) with two C4's in C5 then I get a stack overflow in VRP between
find_assert_locations and find_conditional_asserts.
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: phython at gcc dot gnu dot org @ 2005-08-25 6:40 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From phython at gcc dot gnu dot org 2005-08-25 06:03 -------
DOM also eats a metric ton of memory on this testcase. 900MB with C3(1).
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: phython at gcc dot gnu dot org @ 2005-08-25 7:06 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From phython at gcc dot gnu dot org 2005-08-25 07:03 -------
FOO! The exact testcase I have been testing for the last couple of days has been using
#define CL0(a) if (b == a) { foo (); }
so all the work DOM was doing was converting the ifs to else ifs.
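For readers without the attachment, the difference between the two macro forms can be sketched in C as follows. This is an illustration only: `CL0_IF`, `CL0_ELSE_IF`, the `calls` counter and the helper functions are hypothetical names, and the `else if` form is an assumption about what the intended testcase looked like (based on the original bug title), not a copy of it.

```c
#include <assert.h>

static int calls;                 /* counts invocations of foo () */
static void foo (void) { calls++; }

/* Form quoted in the comment above: independent tests.  */
#define CL0_IF(a)      if (b == (a)) { foo (); }
/* Assumed form of the intended testcase: one chained test.  */
#define CL0_ELSE_IF(a) else if (b == (a)) { foo (); }

/* Every comparison is evaluated, even after one has matched.  */
static int run_if_chain (int b)
{
  calls = 0;
  CL0_IF (0) CL0_IF (1) CL0_IF (2) CL0_IF (3)
  return calls;
}

/* Comparisons after the first match are skipped.  */
static int run_else_if_chain (int b)
{
  calls = 0;
  CL0_IF (0) CL0_ELSE_IF (1) CL0_ELSE_IF (2) CL0_ELSE_IF (3)
  return calls;
}

/* Both forms call foo () the same number of times for every b,
   because b cannot equal two different constants at once.  */
static void check_equivalent (int lo, int hi)
{
  for (int b = lo; b <= hi; b++)
    assert (run_if_chain (b) == run_else_if_chain (b));
}
```

Since `foo ()` does not modify `b` here, once one comparison is known to be true the remaining ones are known to be false, so a jump-threading pass can turn the independent `if` form into the chained form. That appears to be the work DOM was doing on the `if`-only testcase.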
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: pinskia at gcc dot gnu dot org @ 2005-09-19 0:14 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-19 00:14 -------
For -O1 on x86_64-pc-linux-gnu:
For 4.1.0:
tree operand scan : 18.28 (50%) usr 0.10 (18%) sys 18.33 (50%) wall 402 kB ( 0%) ggc
That is the same issue as PR 21430
For both 4.0.0 and 4.1.0:
4.0.0:
dominance frontiers : 12.80 (66%) usr 0.00 ( 0%) sys 12.82 (64%) wall
4.1.0:
dominance frontiers : 10.87 (30%) usr 0.00 ( 0%) sys 10.89 (30%) wall 0 kB ( 0%) ggc
--
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |21430
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2005-09-19 0:15 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-09-19 00:15 -------
Another SSA operands cache slowness example...
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |amacleod at redhat dot com
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: amacleod at redhat dot com @ 2005-09-30 14:43 UTC (permalink / raw)
To: gcc-bugs
--
Bug 19097 depends on bug 21430, which changed state.
Bug 21430 Summary: [4.1 Regression] Quadratic behavior with constant initializers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21430
What |Old Value |New Value
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2005-10-16 23:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from steven at gcc dot gnu dot org 2005-10-16 23:20 -------
On AMD64, I now get the following timings:
                            -O1      -O2
3.3 (profilebootstrapped)   46.64    46.90
4.1 (checking=release)      72.82   156.43
In 4.1, the Big Spenders are "dominance frontiers" (41% usr)
and "tree operand scan" (also 41%) for -O1. For -O2 those two
also are big time black holes, and the 3 gcse.c CPROP passes
join them at the top of the profile ("dominance frontiers" 18%,
"tree operand scan" 18%, "CPROP 1" 17%, "CPROP 2" 9%, "bypass
jumps" 10%).
So this is still a regression from GCC 3.3.
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to fail|4.0.0 |4.0.0 4.1.0
Last reconfirmed|2005-10-16 22:35:50 |2005-10-16 23:20:30
date| |
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: bonzini at gcc dot gnu dot org @ 2005-10-18 8:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from bonzini at gcc dot gnu dot org 2005-10-18 08:36 -------
Steven, how does your df.c-based cprop fare on this testcase?
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: amacleod at redhat dot com @ 2005-10-18 12:25 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from amacleod at redhat dot com 2005-10-18 12:25 -------
Created an attachment (id=10017)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10017&action=view)
patch for operand scan
Now that correct_use_link is *only* used for real uses, it is no longer
profitable to try to "shortcut" the search for the owner of a use list. The
shortcut used to look at each previous node as the list was traversed, and check
to see if the stmt was modified. If it wasn't, we knew that node was in the
correct list and wouldn't have to scan all the way back to the owner.
Well, this testcase was spending almost all its time checking for
stmt_modified_p... something like 250,000,000 checks on 50,000 calls.
I've removed the no-longer-useful shortcut, and the results are as follows.
Bootstrapped with no new regressions on i686-pc-linux-gnu.
Andrew
x86-64:
  -O1 on testcase:
    before patch: tree operand scan :  8.00 (36%)
                  TOTAL             : 22.18
    after patch:  tree operand scan :  1.41 ( 9%)
                  TOTAL             : 15.62
  -O2 on testcase:
    before patch: tree operand scan :  7.88 (15%)
                  TOTAL             : 53.08
    after patch:  tree operand scan :  1.42 ( 3%)
                  TOTAL             : 46.94
x86:
  -O1 on testcase:
    before patch: tree operand scan :  2.54 (17%)
                  TOTAL             : 14.60
    after patch:  tree operand scan :  1.01 ( 8%)
                  TOTAL             : 12.51
  -O2 on testcase:
    before patch: tree operand scan :  2.95 ( 8%)
                  TOTAL             : 39.20
    after patch:  tree operand scan :  1.06 ( 3%)
                  TOTAL             : 38.03
Pretty much a wash on the cc1-i and cpgram.cc testcases on both targets.
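A minimal sketch of the trade-off described in this comment, assuming a
simplified model rather than the real tree-ssa-operands.c code. The names
use_node, stmt_is_modified, walk_with_shortcut and walk_to_owner are
hypothetical, and the owner of a use list is modelled as the node whose stmt
is NULL. When every statement on the list is modified, the shortcut never
fires and only adds one check per node visited. The figures quoted above
(about 250,000,000 checks on 50,000 calls) work out to roughly 5,000 checks
per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a statement; only the "modified" flag matters. */
struct stmt {
  bool modified;
};

/* Hypothetical node of a use list. The owner node has stmt == NULL. */
struct use_node {
  struct use_node *prev;
  struct use_node *next;
  struct stmt *stmt;
};

/* Counts how often the "modified" predicate runs (illustration only). */
static unsigned long modified_checks;

static bool stmt_is_modified(const struct stmt *s) {
  modified_checks++;
  return s->modified;
}

/* Walk back with the shortcut: stop at the first node whose statement is
   unmodified, because such a node is trusted to be in the correct list.
   Costs one predicate call per non-owner node visited. */
static const struct use_node *walk_with_shortcut(const struct use_node *use) {
  assert(use != NULL);
  const struct use_node *p = use->prev;
  while (p->stmt != NULL) {
    if (!stmt_is_modified(p->stmt))
      return p;
    p = p->prev;
  }
  return p; /* reached the owner without the shortcut ever firing */
}

/* Walk back without the shortcut: follow prev pointers to the owner. */
static const struct use_node *walk_to_owner(const struct use_node *use) {
  assert(use != NULL);
  const struct use_node *p = use->prev;
  while (p->stmt != NULL)
    p = p->prev;
  return p;
}
```

Both walks visit the same nodes in the worst case; the plain walk simply does
less work per node, which is consistent with the drop in "tree operand scan"
time reported above.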
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2005-10-29 22:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from steven at gcc dot gnu dot org 2005-10-29 22:38 -------
amacleod, are you going to post your patch and/or commit it?
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |WAITING
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: mmitchel at gcc dot gnu dot org @ 2005-10-31 2:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from mmitchel at gcc dot gnu dot org 2005-10-31 02:02 -------
I'm going to leave this as P2, since we've got a proposed patch in Comment #19.
Andrew, do you need a review on that patch? Or, is there any other reason it
hasn't been committed?
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: amacleod at redhat dot com @ 2005-10-31 13:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from amacleod at redhat dot com 2005-10-31 13:33 -------
It will be checked in shortly. I got your OK for this stage last week; I
was merely waiting for the SVN switchover freeze to expire before trying a new
build and getting back to work today.
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: amacleod at redhat dot com @ 2005-10-31 14:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from amacleod at redhat dot com 2005-10-31 14:41 -------
Hmm. This has been committed, but the commit hasn't shown up yet. Perhaps
because I tagged it as a tree-optimization PR, and I now notice that it's marked
as rtl-optimization?
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: dberlin at gcc dot gnu dot org @ 2005-10-31 15:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from dberlin at gcc dot gnu dot org 2005-10-31 15:04 -------
I fixed the bug that was preventing the commit message from being sent to this
bug; it should pop up in a second.
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: amacleod at gcc dot gnu dot org @ 2005-10-31 15:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from amacleod at redhat dot com 2005-10-31 15:19 -------
Subject: Bug 19097
Author: amacleod
Date: Mon Oct 31 13:38:05 2005
New Revision: 106272
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106272
Log:
2005-10-31 Andrew MacLeod <amacleod@redhat.com>
PR tree-optimization/19097
* tree-ssa-operands.c (correct_use_link): Don't look for modified
stmts.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-operands.c
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2005-10-31 17:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from steven at gcc dot gnu dot org 2005-10-31 17:12 -------
Moving back to new, because I don't know if the GCSE CPROP issue with implicit
sets is also already fixed.
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
Last reconfirmed|2005-10-16 23:20:30 |2005-10-31 17:12:20
date| |
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: steven at gcc dot gnu dot org @ 2005-11-08 0:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from steven at gcc dot gnu dot org 2005-11-08 00:18 -------
On AMD64, revision 106596M (the M is for a local loop-invariant.c
patch, nothing special), compiler built with --enable-checking=release:
at -O1:
tree operand scan : 1.50 (10%) usr 0.09 (17%) sys 1.62 (10%) wall
dominance frontiers : 9.09 (60%) usr 0.00 ( 0%) sys 9.20 (58%) wall
TOTAL : 15.05 0.53 15.80
at -O2:
tree VRP : 12.20 (23%) usr 0.03 ( 3%) sys 12.44 (23%) wall
dominance frontiers : 9.17 (18%) usr 0.01 ( 1%) sys 9.30 (17%) wall
CPROP 1 : 8.17 (16%) usr 0.16 (16%) sys 8.44 (16%) wall
CPROP 2 : 5.54 (11%) usr 0.11 (11%) sys 5.72 (11%) wall
bypass jumps : 5.57 (11%) usr 0.11 (11%) sys 5.75 (11%) wall
TOTAL : 52.31 1.00 53.98
For GCC 3.3.5 at -O1 the total time is 26s, and at -O2 it is 31s.
For AMD64 -m32 -march=i686:
at -O1:
tree operand scan : 1.48 (10%) usr 0.09 (18%) sys 1.59 (10%) wall
dominance frontiers : 9.03 (61%) usr 0.00 ( 0%) sys 9.14 (59%) wall
TOTAL : 14.70 0.49 15.39
at -O2:
tree VRP : 11.84 (24%) usr 0.04 ( 4%) sys 12.02 (24%) wall
dominance frontiers : 9.11 (19%) usr 0.02 ( 2%) sys 9.25 (18%) wall
CPROP 1 : 7.54 (15%) usr 0.10 (11%) sys 7.74 (15%) wall
CPROP 2 : 4.99 (10%) usr 0.10 (11%) sys 5.15 (10%) wall
bypass jumps : 4.96 (10%) usr 0.10 (11%) sys 5.12 (10%) wall
TOTAL : 48.97 0.93 50.54
For GCC 3.3.5 at -O1 the total time is 25s, and at -O2 it is 28s.
Compared to my measurements from comment #17, this is good progress. James, do
you think we can close this bug now?
--
* [Bug rtl-optimization/19097] [3.4/4.0/4.1 regression] Quadratic behavior with many sets for the same register in gcse CPROP
From: phython at gcc dot gnu dot org @ 2005-11-08 6:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from phython at gcc dot gnu dot org 2005-11-08 06:48 -------
Steven: How long does 4.0 take to compile this function on your box?
--