public inbox for gcc-bugs@sourceware.org
* [Bug c/110489] New: Slow building virtual.c.i from p11-kit
From: sjames at gcc dot gnu.org @ 2023-06-29 19:35 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489
Bug ID: 110489
Summary: Slow building virtual.c.i from p11-kit
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: compile-time-hog
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: sjames at gcc dot gnu.org
Target Milestone: ---
Created attachment 55430
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55430&action=edit
virtual.c.i.xz
I fear this is a degenerate case, as it's a largish generated file (made during
the build process), but it's noticeable enough for me to raise it anyway.
When building p11-kit, I noticed a handful of files took considerably longer to
build. This is with release checking.
The standout seems to be `virtual.c.i`:
```
$ time gcc -c virtual.c.i -O2 -fPIC
real 0m12.429s
user 0m12.137s
sys 0m0.238s
$ gcc --version
gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -c virtual.c.i -O2 -pipe -fPIC -ftime-report
Time variable usr sys wall GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1326k ( 0%)
phase parsing : 1.24 ( 5%) 1.01 ( 20%) 2.26 ( 7%) 53M ( 9%)
phase lang. deferred : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 96 ( 0%)
phase opt and generate : 24.16 ( 95%) 4.08 ( 80%) 28.74 ( 93%) 570M ( 91%)
phase finalize : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
garbage collection : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 0 ( 0%)
dump files : 1.07 ( 4%) 0.24 ( 5%) 1.58 ( 5%) 0 ( 0%)
callgraph construction : 0.14 ( 1%) 0.01 ( 0%) 0.22 ( 1%) 18M ( 3%)
callgraph optimization : 0.36 ( 1%) 0.12 ( 2%) 0.50 ( 2%) 13k ( 0%)
callgraph functions expansion : 20.14 ( 79%) 3.12 ( 61%) 23.71 ( 76%) 390M ( 62%)
callgraph ipa passes : 3.61 ( 14%) 0.84 ( 17%) 4.48 ( 14%) 132M ( 21%)
ipa function summary : 0.14 ( 1%) 0.04 ( 1%) 0.12 ( 0%) 13M ( 2%)
ipa dead code removal : 0.09 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 ( 0%)
ipa devirtualization : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
ipa cp : 0.14 ( 1%) 0.04 ( 1%) 0.11 ( 0%) 6933k ( 1%)
ipa inlining heuristics : 0.14 ( 1%) 0.02 ( 0%) 0.10 ( 0%) 135k ( 0%)
ipa function splitting : 0.39 ( 2%) 0.06 ( 1%) 0.34 ( 1%) 40M ( 7%)
ipa comdats : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
ipa reference : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
ipa profile : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
ipa pure const : 0.07 ( 0%) 0.02 ( 0%) 0.16 ( 1%) 2416 ( 0%)
ipa icf : 0.14 ( 1%) 0.00 ( 0%) 0.14 ( 0%) 8112 ( 0%)
ipa SRA : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 1116k ( 0%)
ipa free lang data : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
ipa free inline summary : 0.02 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 ( 0%)
ipa modref : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 2874k ( 0%)
cfg construction : 0.02 ( 0%) 0.02 ( 0%) 0.06 ( 0%) 1323k ( 0%)
cfg cleanup : 0.23 ( 1%) 0.06 ( 1%) 0.34 ( 1%) 3108k ( 0%)
trivially dead code : 0.07 ( 0%) 0.02 ( 0%) 0.14 ( 0%) 0 ( 0%)
df scan insns : 0.20 ( 1%) 0.03 ( 1%) 0.25 ( 1%) 282k ( 0%)
df reaching defs : 0.21 ( 1%) 0.03 ( 1%) 0.37 ( 1%) 0 ( 0%)
df live regs : 0.35 ( 1%) 0.08 ( 2%) 0.31 ( 1%) 0 ( 0%)
df live&initialized regs : 0.23 ( 1%) 0.01 ( 0%) 0.17 ( 1%) 0 ( 0%)
df must-initialized regs : 0.06 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
df use-def / def-use chains : 0.10 ( 0%) 0.02 ( 0%) 0.17 ( 1%) 0 ( 0%)
df reg dead/unused notes : 0.40 ( 2%) 0.02 ( 0%) 0.32 ( 1%) 6334k ( 1%)
register information : 0.14 ( 1%) 0.01 ( 0%) 0.06 ( 0%) 0 ( 0%)
alias analysis : 0.46 ( 2%) 0.03 ( 1%) 0.31 ( 1%) 11M ( 2%)
alias stmt walking : 0.13 ( 1%) 0.04 ( 1%) 0.13 ( 0%) 18k ( 0%)
register scan : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 137k ( 0%)
rebuild jump labels : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 0 ( 0%)
preprocessing : 0.32 ( 1%) 0.23 ( 5%) 0.60 ( 2%) 4145k ( 1%)
lexical analysis : 0.48 ( 2%) 0.37 ( 7%) 0.70 ( 2%) 0 ( 0%)
parser (global) : 0.10 ( 0%) 0.07 ( 1%) 0.25 ( 1%) 14M ( 2%)
parser struct body : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 181k ( 0%)
parser function body : 0.34 ( 1%) 0.34 ( 7%) 0.71 ( 2%) 34M ( 6%)
early inlining heuristics : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 50k ( 0%)
inline parameters : 0.16 ( 1%) 0.02 ( 0%) 0.16 ( 1%) 9064k ( 1%)
integration : 0.42 ( 2%) 0.02 ( 0%) 0.44 ( 1%) 17M ( 3%)
tree gimplify : 0.19 ( 1%) 0.05 ( 1%) 0.20 ( 1%) 20M ( 3%)
tree eh : 0.00 ( 0%) 0.01 ( 0%) 0.00 ( 0%) 0 ( 0%)
tree CFG construction : 0.00 ( 0%) 0.01 ( 0%) 0.05 ( 0%) 11M ( 2%)
tree CFG cleanup : 0.34 ( 1%) 0.10 ( 2%) 0.46 ( 1%) 68k ( 0%)
tree tail merge : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 7696 ( 0%)
tree VRP : 0.39 ( 2%) 0.10 ( 2%) 0.58 ( 2%) 14M ( 2%)
tree Early VRP : 0.12 ( 0%) 0.05 ( 1%) 0.22 ( 1%) 6698k ( 1%)
tree copy propagation : 0.06 ( 0%) 0.00 ( 0%) 0.13 ( 0%) 432 ( 0%)
tree PTA : 0.80 ( 3%) 0.18 ( 4%) 1.10 ( 4%) 6475k ( 1%)
tree SSA other : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
tree SSA rewrite : 0.12 ( 0%) 0.02 ( 0%) 0.12 ( 0%) 10M ( 2%)
tree SSA incremental : 0.23 ( 1%) 0.04 ( 1%) 0.19 ( 1%) 3738k ( 1%)
tree operand scan : 0.07 ( 0%) 0.05 ( 1%) 0.16 ( 1%) 14M ( 2%)
dominator optimization : 0.62 ( 2%) 0.15 ( 3%) 0.53 ( 2%) 4427k ( 1%)
backwards jump threading : 0.27 ( 1%) 0.01 ( 0%) 0.31 ( 1%) 38k ( 0%)
tree SRA : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
isolate eroneous paths : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
tree CCP : 0.50 ( 2%) 0.08 ( 2%) 0.50 ( 2%) 696k ( 0%)
tree reassociation : 0.04 ( 0%) 0.02 ( 0%) 0.04 ( 0%) 48 ( 0%)
tree PRE : 0.34 ( 1%) 0.05 ( 1%) 0.35 ( 1%) 8448k ( 1%)
tree FRE : 0.46 ( 2%) 0.06 ( 1%) 0.55 ( 2%) 5354k ( 1%)
tree code sinking : 0.07 ( 0%) 0.02 ( 0%) 0.02 ( 0%) 270k ( 0%)
tree linearize phis : 0.02 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 6316k ( 1%)
tree backward propagate : 0.02 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 ( 0%)
tree forward propagate : 0.12 ( 0%) 0.05 ( 1%) 0.19 ( 1%) 41k ( 0%)
tree phiprop : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
tree conservative DCE : 0.10 ( 0%) 0.02 ( 0%) 0.17 ( 1%) 7296 ( 0%)
tree aggressive DCE : 0.14 ( 1%) 0.04 ( 1%) 0.13 ( 0%) 12M ( 2%)
tree buildin call DCE : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
tree DSE : 0.03 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 15k ( 0%)
PHI merge : 0.01 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 48k ( 0%)
tree loop optimization : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
tree loop invariant motion : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
complete unrolling : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 12k ( 0%)
tree slp vectorization : 0.18 ( 1%) 0.04 ( 1%) 0.23 ( 1%) 12M ( 2%)
tree copy headers : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 7904 ( 0%)
tree SSA uncprop : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 ( 0%)
tree NRV optimization : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 139k ( 0%)
tree switch conversion : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
tree switch lowering : 0.00 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 ( 0%)
gimple CSE sin/cos : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
gimple widening/fma detection : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
tree strlen optimization : 0.07 ( 0%) 0.01 ( 0%) 0.12 ( 0%) 6325k ( 1%)
tree modref : 0.14 ( 1%) 0.04 ( 1%) 0.16 ( 1%) 10M ( 2%)
dominance frontiers : 0.03 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
dominance computation : 0.50 ( 2%) 0.13 ( 3%) 0.82 ( 3%) 0 ( 0%)
control dependences : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
out of ssa : 0.06 ( 0%) 0.02 ( 0%) 0.11 ( 0%) 1061k ( 0%)
expand vars : 0.05 ( 0%) 0.01 ( 0%) 0.03 ( 0%) 2414k ( 0%)
expand : 0.54 ( 2%) 0.09 ( 2%) 0.55 ( 2%) 38M ( 6%)
post expand cleanups : 0.03 ( 0%) 0.02 ( 0%) 0.08 ( 0%) 3084k ( 0%)
lower subreg : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
jump : 0.04 ( 0%) 0.02 ( 0%) 0.03 ( 0%) 0 ( 0%)
forward prop : 0.42 ( 2%) 0.04 ( 1%) 0.50 ( 2%) 226k ( 0%)
CSE : 0.31 ( 1%) 0.03 ( 1%) 0.38 ( 1%) 237k ( 0%)
dead code elimination : 0.13 ( 1%) 0.02 ( 0%) 0.07 ( 0%) 0 ( 0%)
dead store elim1 : 0.12 ( 0%) 0.03 ( 1%) 0.22 ( 1%) 2042k ( 0%)
dead store elim2 : 0.16 ( 1%) 0.01 ( 0%) 0.18 ( 1%) 2333k ( 0%)
loop analysis : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 ( 0%)
loop init : 0.36 ( 1%) 0.10 ( 2%) 0.44 ( 1%) 33M ( 5%)
loop fini : 0.09 ( 0%) 0.04 ( 1%) 0.13 ( 0%) 0 ( 0%)
CPROP : 0.43 ( 2%) 0.08 ( 2%) 0.50 ( 2%) 1842k ( 0%)
PRE : 0.13 ( 1%) 0.04 ( 1%) 0.17 ( 1%) 3816 ( 0%)
CSE 2 : 0.25 ( 1%) 0.04 ( 1%) 0.34 ( 1%) 311k ( 0%)
branch prediction : 0.11 ( 0%) 0.03 ( 1%) 0.07 ( 0%) 623k ( 0%)
combiner : 0.39 ( 2%) 0.04 ( 1%) 0.57 ( 2%) 4948k ( 1%)
if-conversion : 0.09 ( 0%) 0.01 ( 0%) 0.12 ( 0%) 86k ( 0%)
mode switching : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
integrated RA : 1.75 ( 7%) 0.11 ( 2%) 2.10 ( 7%) 147M ( 24%)
LRA non-specific : 0.39 ( 2%) 0.05 ( 1%) 0.50 ( 2%) 391k ( 0%)
LRA virtuals elimination : 0.02 ( 0%) 0.01 ( 0%) 0.06 ( 0%) 112k ( 0%)
LRA reload inheritance : 0.03 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 0 ( 0%)
LRA create live ranges : 0.09 ( 0%) 0.02 ( 0%) 0.07 ( 0%) 8232 ( 0%)
LRA hard reg assignment : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
reload : 0.02 ( 0%) 0.01 ( 0%) 0.05 ( 0%) 141k ( 0%)
reload CSE regs : 0.34 ( 1%) 0.11 ( 2%) 0.49 ( 2%) 3089k ( 0%)
ree : 0.05 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 19k ( 0%)
thread pro- & epilogue : 0.32 ( 1%) 0.01 ( 0%) 0.32 ( 1%) 12M ( 2%)
if-conversion 2 : 0.07 ( 0%) 0.02 ( 0%) 0.04 ( 0%) 0 ( 0%)
combine stack adjustments : 0.01 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 ( 0%)
peephole 2 : 0.21 ( 1%) 0.02 ( 0%) 0.17 ( 1%) 1225k ( 0%)
hard reg cprop : 0.10 ( 0%) 0.01 ( 0%) 0.15 ( 0%) 552 ( 0%)
scheduling 2 : 1.55 ( 6%) 0.14 ( 3%) 1.35 ( 4%) 2653k ( 0%)
machine dep reorg : 0.11 ( 0%) 0.04 ( 1%) 0.12 ( 0%) 0 ( 0%)
reorder blocks : 0.10 ( 0%) 0.02 ( 0%) 0.21 ( 1%) 724k ( 0%)
shorten branches : 0.12 ( 0%) 0.02 ( 0%) 0.14 ( 0%) 0 ( 0%)
reg stack : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 ( 0%)
final : 0.33 ( 1%) 0.07 ( 1%) 0.59 ( 2%) 11M ( 2%)
variable output : 0.02 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 183k ( 0%)
tree if-combine : 0.03 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 80 ( 0%)
if to switch conversion : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 ( 0%)
straight-line strength reduction : 0.05 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 864 ( 0%)
store merging : 0.01 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 728 ( 0%)
initialize rtl : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 12k ( 0%)
address lowering : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 360 ( 0%)
access analysis : 0.12 ( 0%) 0.03 ( 1%) 0.20 ( 1%) 16k ( 0%)
early local passes : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
unaccounted optimizations : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
rest of compilation : 1.76 ( 7%) 0.34 ( 7%) 1.82 ( 6%) 14M ( 2%)
unaccounted post reload : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 ( 0%)
unaccounted late compilation : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 ( 0%)
remove unused locals : 0.08 ( 0%) 0.04 ( 1%) 0.11 ( 0%) 0 ( 0%)
address taken : 0.07 ( 0%) 0.01 ( 0%) 0.15 ( 0%) 0 ( 0%)
rebuild frequencies : 0.04 ( 0%) 0.02 ( 0%) 0.10 ( 0%) 568 ( 0%)
repair loop structures : 0.03 ( 0%) 0.01 ( 0%) 0.02 ( 0%) 112 ( 0%)
TOTAL : 25.40 5.09 31.03 625M
```
* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: pinskia at gcc dot gnu.org @ 2023-06-29 19:46 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The only ones that stick out are:
dump files : 1.07 ( 4%) 0.24 ( 5%) 1.58 ( 5%) 0 ( 0%)
integrated RA : 1.75 ( 7%) 0.11 ( 2%) 2.10 ( 7%) 147M ( 24%)
scheduling 2 : 1.55 ( 6%) 0.14 ( 3%) 1.35 ( 4%) 2653k ( 0%)
Nothing else really sticks out (but they do add up).
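Picking such entries out of a long report can be done mechanically. A minimal C++ sketch, assuming each timevar entry has been joined onto one line of the form `name : usr ( p%) sys ( p%) wall ( p%) GGC ( p%)`; the names `TimevarEntry`, `parse_timevar_line` and `hot_entries` are hypothetical helpers, not part of GCC:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One parsed -ftime-report row (hypothetical helper type, not part of GCC).
struct TimevarEntry {
    std::string name;
    double usr = 0, sys = 0, wall = 0;  // seconds
    int wall_pct = 0;                   // share of total wall-clock time
};

// Parses "name : usr ( p%) sys ( p%) wall ( p%) GGC ( p%)".
// Returns false for lines without that shape (e.g. the header or TOTAL).
bool parse_timevar_line(const std::string& line, TimevarEntry& out) {
    const std::size_t colon = line.find(" : ");
    if (colon == std::string::npos) return false;

    // Trim any padding around the timevar name.
    const std::string raw = line.substr(0, colon);
    const std::size_t first = raw.find_first_not_of(' ');
    if (first == std::string::npos) return false;
    const std::size_t last = raw.find_last_not_of(' ');
    out.name = raw.substr(first, last - first + 1);

    std::istringstream in(line.substr(colon + 3));
    double seconds[3];
    int pcts[3];
    for (int i = 0; i < 3; ++i) {
        char open = 0, percent = 0, close = 0;
        // Each time column looks like "2.10 ( 7%)".
        if (!(in >> seconds[i] >> open >> pcts[i] >> percent >> close)) return false;
        if (open != '(' || percent != '%' || close != ')') return false;
    }
    out.usr = seconds[0];
    out.sys = seconds[1];
    out.wall = seconds[2];
    out.wall_pct = pcts[2];
    return true;
}

// Keeps the rows whose wall-clock share is at least min_pct percent.
std::vector<TimevarEntry> hot_entries(const std::vector<std::string>& lines,
                                      int min_pct) {
    assert(min_pct >= 0 && min_pct <= 100);
    std::vector<TimevarEntry> hot;
    for (const std::string& line : lines) {
        TimevarEntry e;
        if (parse_timevar_line(line, e) && e.wall_pct >= min_pct) hot.push_back(e);
    }
    return hot;
}
```

Applied to the full report, the aggregate `phase ...` and `callgraph ...` rows pass any modest threshold and would need filtering. The column matters too: `scheduling 2` is 6% of user time but only 4% of wall time, so the cut-off is a judgment call rather than a rule.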
* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: pinskia at gcc dot gnu.org @ 2023-06-29 19:54 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So I took a look at the sources: there are very many small functions.
This might be why the "dump files" timevar takes so long, since it is called
for each pass and for each function. Maybe that can be improved.
I suspect the register allocator and scheduler costs come from a small
per-function setup cost which, multiplied by many functions, adds up.
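A rough cost model, in our own notation rather than anything defined by GCC, makes the argument explicit. Let N_fn be the number of functions, N_pass the number of passes, c_dump the dump-file handling cost per pass per function, s_IRA and s_sched2 fixed per-function setup costs for register allocation and scheduling, and W the work that actually scales with instruction count:

```latex
T_{\text{dump files}} \approx N_{\text{fn}} \, N_{\text{pass}} \, c_{\text{dump}}
\qquad
T_{\text{IRA}} + T_{\text{sched2}} \approx N_{\text{fn}} \, (s_{\text{IRA}} + s_{\text{sched2}}) + W
```

For a TU made mostly of tiny functions, W stays small, so the terms linear in N_fn dominate even though no single function is expensive.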
* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: rguenth at gcc dot gnu.org @ 2023-06-30 7:45 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2023-06-30
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Samples: 45K of event 'cycles', Event count (approx.): 51356148788
Overhead Samples Command Shared Object Symbol
2.57% 1169 cc1 libc-2.31.so [.] _int_malloc
1.54% 700 cc1 cc1 [.] bitmap_set_bit
1.52% 692 cc1 libc-2.31.so [.] malloc
1.32% 602 cc1 libc-2.31.so [.] _int_free
1.31% 598 cc1 cc1 [.] record_reg_classes
1.04% 476 cc1 cc1 [.] constrain_operands
0.81% 368 cc1 cc1 [.] solve_constraints
0.79% 360 cc1 cc1 [.] cse_insn
0.78% 357 cc1 cc1 [.] ggc_internal_alloc
0.76% 347 cc1 libc-2.31.so [.] free
0.73% 330 cc1 cc1 [.] statistics_fini_pass
It's pointing at things I've seen multiple times, but I think investigating
why memory allocation is so high up in the profile would be worthwhile. Some
users, like dom_info::dom_init, hit the allocator hard without good reason;
that is only a few allocations per function, but as seen, this testcase has
many functions. Likewise, ipa_sra_summarize_function seems to spend 99% of
its cost in memory allocation. Doing more on-demand initialization might
help here.
I have a patch for the statistics_fini_pass hit.
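One way to apply the on-demand idea to per-function setup, sketched in C++ under stated assumptions: a hypothetical `ScratchPool` stands in for the arrays that an analysis such as dominance initialization might otherwise allocate and free for every function. Nothing is allocated up front, memory is reserved on first use, and the capacity is kept across functions, so a TU of many small functions pays for a handful of allocations rather than one or more per function. The names are illustrative and do not describe GCC's dom_info implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scratch storage shared by every function in a TU.
// Nothing is allocated up front; memory is reserved on first use and kept,
// so later functions of the same or smaller size reuse it.
class ScratchPool {
public:
    // Returns a zero-filled buffer of n elements.
    std::vector<int>& get(std::size_t n) {
        if (buf_.capacity() < n) {
            buf_.reserve(n);  // grow only when the current capacity is too small
            ++allocations_;
        }
        buf_.assign(n, 0);  // fits in the existing capacity
        assert(buf_.size() == n);
        return buf_;
    }

    // Number of times get() had to request more memory.
    std::size_t allocations() const { return allocations_; }

private:
    std::vector<int> buf_;
    std::size_t allocations_ = 0;
};
```

The trade-off is that the largest buffer ever requested stays alive until the pool is destroyed, which is usually acceptable for per-TU scratch data.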
* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: rguenth at gcc dot gnu.org @ 2023-06-30 8:37 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is possibly an opportunity to optimize some of our infrastructure for
the case where we have 3 basic blocks (the minimum: ENTRY, bb2 and EXIT). For
example, dominance info doesn't need to be "computed"; the only special case
to consider is that EXIT may not be reachable.
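For the minimal CFG described here, the dominator tree is fixed: ENTRY dominates bb2, and bb2 dominates EXIT whenever EXIT is reachable. A minimal C++ sketch, assuming a hypothetical adjacency-list `Cfg` that follows GCC's block numbering (0 = ENTRY, 1 = EXIT, real blocks from 2), answers that case directly and falls back to a general algorithm otherwise. The fallback is the iterative Cooper-Harvey-Kennedy algorithm, a textbook method swapped in for illustration, not GCC's own dominance code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG: succs[b] lists the successors of block b. As in GCC's
// numbering, block 0 is ENTRY, block 1 is EXIT and real blocks start at 2.
struct Cfg {
    std::vector<std::vector<int>> succs;
};

constexpr int kEntry = 0, kExit = 1, kUndefined = -1;

// Returns the immediate dominator of every block (ENTRY maps to itself,
// unreachable blocks to kUndefined).
std::vector<int> compute_idoms(const Cfg& cfg) {
    const int n = static_cast<int>(cfg.succs.size());
    assert(n >= 2);  // ENTRY and EXIT always exist
    std::vector<int> idom(n, kUndefined);
    idom[kEntry] = kEntry;

    // Fast path: ENTRY -> bb2 -> EXIT (or bb2 never returns), no self-loop.
    if (n == 3 && cfg.succs[kEntry] == std::vector<int>{2}) {
        bool self_loop = false;
        for (int t : cfg.succs[2]) self_loop = self_loop || t == 2;
        if (!self_loop) {
            idom[2] = kEntry;
            for (int t : cfg.succs[2])
                if (t == kExit) idom[kExit] = 2;  // only when EXIT is reachable
            return idom;
        }
    }

    // General case: iterative algorithm of Cooper, Harvey and Kennedy.
    // 1. Postorder numbering from ENTRY with an explicit DFS stack.
    std::vector<int> post_num(n, -1), postorder;
    std::vector<std::size_t> next_child(n, 0);
    std::vector<bool> visited(n, false);
    std::vector<int> stack{kEntry};
    visited[kEntry] = true;
    while (!stack.empty()) {
        const int b = stack.back();
        if (next_child[b] < cfg.succs[b].size()) {
            const int succ = cfg.succs[b][next_child[b]++];
            if (!visited[succ]) { visited[succ] = true; stack.push_back(succ); }
        } else {
            post_num[b] = static_cast<int>(postorder.size());
            postorder.push_back(b);
            stack.pop_back();
        }
    }

    // 2. Predecessor lists.
    std::vector<std::vector<int>> preds(n);
    for (int b = 0; b < n; ++b)
        for (int succ : cfg.succs[b]) preds[succ].push_back(b);

    // Walks both fingers up the current dominator tree until they meet.
    auto intersect = [&](int a, int b) {
        while (a != b) {
            while (post_num[a] < post_num[b]) a = idom[a];
            while (post_num[b] < post_num[a]) b = idom[b];
        }
        return a;
    };

    // 3. Iterate in reverse postorder until nothing changes.
    for (bool changed = true; changed;) {
        changed = false;
        for (auto it = postorder.rbegin(); it != postorder.rend(); ++it) {
            const int b = *it;
            if (b == kEntry) continue;
            int new_idom = kUndefined;
            for (int p : preds[b]) {
                if (idom[p] == kUndefined) continue;  // not yet processed
                new_idom = new_idom == kUndefined ? p : intersect(p, new_idom);
            }
            if (new_idom != idom[b]) { idom[b] = new_idom; changed = true; }
        }
    }
    return idom;
}
```

In this sketch a single block with a backedge to itself deliberately takes the general path; only the loop-free three-block shape is short-circuited.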
The testcase at hand seems to consist of forwarders. In general, single-BB
functions can be quite common in C++ code as well. A lot of passes
could excuse themselves early too (the next special case is bb2 having a
backedge to itself). There is quite a lot of constant overhead all over the
place, even when we end up doing nothing.
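Letting passes "excuse themselves" amounts to a cheap per-function check that runs before any pass-specific setup. A minimal C++ sketch under that assumption: `FunctionShape`, `Pass` and `run_pipeline` are hypothetical and do not reflect GCC's actual pass-manager interface, and whether a pass is useful on a single straight-line block is simply declared per pass here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical summary of a function's CFG shape.
struct FunctionShape {
    int n_basic_blocks;  // including ENTRY and EXIT
    bool bb2_self_loop;  // bb2 has a backedge to itself
    // ENTRY, bb2 and EXIT with no loop: nothing for most passes to do.
    bool is_trivial() const { return n_basic_blocks == 3 && !bb2_self_loop; }
};

// Hypothetical pass descriptor.
struct Pass {
    std::string name;
    bool useful_on_trivial;  // can it change a single straight-line block?
    std::function<void(const FunctionShape&)> execute;
};

// Runs each pass unless a cheap check shows it has nothing to do, and
// returns how many passes actually executed.
int run_pipeline(const std::vector<Pass>& passes, const FunctionShape& fn) {
    assert(fn.n_basic_blocks >= 2);  // ENTRY and EXIT always exist
    int executed = 0;
    for (const Pass& p : passes) {
        if (fn.is_trivial() && !p.useful_on_trivial) continue;  // skip all setup
        p.execute(fn);
        ++executed;
    }
    return executed;
}
```

A skipped pass then costs one branch instead of its usual allocation and analysis setup; the hard part in a real compiler is establishing, pass by pass, that the shortcut is safe.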
* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: cvs-commit at gcc dot gnu.org @ 2023-06-30 8:38 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:18e5aeaef294428fc8458c2c70a9ac3a537c35d6
commit r14-2209-g18e5aeaef294428fc8458c2c70a9ac3a537c35d6
Author: Richard Biener <rguenther@suse.de>
Date: Fri Jun 30 09:46:48 2023 +0200
middle-end/110489 - avoid useless work on statistics
When we call statistics_fini_pass we unconditionally allocate
the statistics hash and traverse it. When a TU has many small
functions this can take considerable time. The following avoids
this by never allocating the hash from this function.
PR middle-end/110489
* statistics.cc (curr_statistics_hash): Add argument
indicating whether we should allocate the hash.
(statistics_fini_pass): If the hash isn't allocated
only print the summary header.
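The shape of that change can be modelled in a few lines of C++. This is a simplified sketch, not the code in statistics.cc: a hypothetical `StatsRegistry` holds the per-pass table, `curr_hash(bool alloc)` mirrors the new argument described in the ChangeLog, and `fini_pass` prints only a header when nothing was recorded, so functions that record no statistics never allocate or walk a table.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Simplified model of a per-pass statistics table (names are hypothetical).
class StatsRegistry {
public:
    using Table = std::map<std::string, long>;

    // Returns the current table, creating it only if `alloc` is true.
    Table* curr_hash(bool alloc) {
        if (!hash_ && alloc) {
            hash_ = std::make_unique<Table>();
            ++allocations_;
        }
        return hash_.get();
    }

    // Recording an event is the only operation that may allocate.
    void counter_event(const std::string& id, long incr) {
        Table* table = curr_hash(true);
        assert(table != nullptr);
        (*table)[id] += incr;
    }

    // End of a pass: always print a header (text is illustrative), then the
    // counters only if something was recorded.
    void fini_pass(const std::string& pass_name, std::ostream& out) {
        out << "statistics for " << pass_name << ":\n";
        Table* table = curr_hash(false);  // never allocates
        if (!table) return;               // header only, nothing to traverse
        for (const auto& [id, count] : *table) out << "  " << id << ": " << count << '\n';
        hash_.reset();  // in this model, each pass starts with no table
    }

    int allocations() const { return allocations_; }

private:
    std::unique_ptr<Table> hash_;
    int allocations_ = 0;
};
```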