public inbox for gcc-bugs@sourceware.org
* [Bug c/110489] New: Slow building virtual.c.i from p11-kit
From: sjames at gcc dot gnu.org @ 2023-06-29 19:35 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

            Bug ID: 110489
           Summary: Slow building virtual.c.i from p11-kit
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: compile-time-hog
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sjames at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55430
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55430&action=edit
virtual.c.i.xz

I fear this is a degenerate case, as it's a largish generated file (made during
the build process), but it's noticeable enough for me to raise it anyway.

When building p11-kit, I noticed a handful of files took considerably longer to
build. This is with release checking.

The standout seems to be `virtual.c.i`:

```
$ time gcc -c virtual.c.i -O2 -fPIC

real    0m12.429s
user    0m12.137s
sys     0m0.238s

$ gcc --version
gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -c virtual.c.i -O2 -pipe -fPIC -ftime-report
Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  1326k (  0%)
 phase parsing                      :   1.24 (  5%)   1.01 ( 20%)   2.26 (  7%)    53M (  9%)
 phase lang. deferred               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    96  (  0%)
 phase opt and generate             :  24.16 ( 95%)   4.08 ( 80%)  28.74 ( 93%)   570M ( 91%)
 phase finalize                     :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 garbage collection                 :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)     0  (  0%)
 dump files                         :   1.07 (  4%)   0.24 (  5%)   1.58 (  5%)     0  (  0%)
 callgraph construction             :   0.14 (  1%)   0.01 (  0%)   0.22 (  1%)    18M (  3%)
 callgraph optimization             :   0.36 (  1%)   0.12 (  2%)   0.50 (  2%)    13k (  0%)
 callgraph functions expansion      :  20.14 ( 79%)   3.12 ( 61%)  23.71 ( 76%)   390M ( 62%)
 callgraph ipa passes               :   3.61 ( 14%)   0.84 ( 17%)   4.48 ( 14%)   132M ( 21%)
 ipa function summary               :   0.14 (  1%)   0.04 (  1%)   0.12 (  0%)    13M (  2%)
 ipa dead code removal              :   0.09 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 ipa devirtualization               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa cp                             :   0.14 (  1%)   0.04 (  1%)   0.11 (  0%)  6933k (  1%)
 ipa inlining heuristics            :   0.14 (  1%)   0.02 (  0%)   0.10 (  0%)   135k (  0%)
 ipa function splitting             :   0.39 (  2%)   0.06 (  1%)   0.34 (  1%)    40M (  7%)
 ipa comdats                        :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa reference                      :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 ipa profile                        :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 ipa pure const                     :   0.07 (  0%)   0.02 (  0%)   0.16 (  1%)  2416  (  0%)
 ipa icf                            :   0.14 (  1%)   0.00 (  0%)   0.14 (  0%)  8112  (  0%)
 ipa SRA                            :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  1116k (  0%)
 ipa free lang data                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 ipa free inline summary            :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 ipa modref                         :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)  2874k (  0%)
 cfg construction                   :   0.02 (  0%)   0.02 (  0%)   0.06 (  0%)  1323k (  0%)
 cfg cleanup                        :   0.23 (  1%)   0.06 (  1%)   0.34 (  1%)  3108k (  0%)
 trivially dead code                :   0.07 (  0%)   0.02 (  0%)   0.14 (  0%)     0  (  0%)
 df scan insns                      :   0.20 (  1%)   0.03 (  1%)   0.25 (  1%)   282k (  0%)
 df reaching defs                   :   0.21 (  1%)   0.03 (  1%)   0.37 (  1%)     0  (  0%)
 df live regs                       :   0.35 (  1%)   0.08 (  2%)   0.31 (  1%)     0  (  0%)
 df live&initialized regs           :   0.23 (  1%)   0.01 (  0%)   0.17 (  1%)     0  (  0%)
 df must-initialized regs           :   0.06 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 df use-def / def-use chains        :   0.10 (  0%)   0.02 (  0%)   0.17 (  1%)     0  (  0%)
 df reg dead/unused notes           :   0.40 (  2%)   0.02 (  0%)   0.32 (  1%)  6334k (  1%)
 register information               :   0.14 (  1%)   0.01 (  0%)   0.06 (  0%)     0  (  0%)
 alias analysis                     :   0.46 (  2%)   0.03 (  1%)   0.31 (  1%)    11M (  2%)
 alias stmt walking                 :   0.13 (  1%)   0.04 (  1%)   0.13 (  0%)    18k (  0%)
 register scan                      :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)   137k (  0%)
 rebuild jump labels                :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 preprocessing                      :   0.32 (  1%)   0.23 (  5%)   0.60 (  2%)  4145k (  1%)
 lexical analysis                   :   0.48 (  2%)   0.37 (  7%)   0.70 (  2%)     0  (  0%)
 parser (global)                    :   0.10 (  0%)   0.07 (  1%)   0.25 (  1%)    14M (  2%)
 parser struct body                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   181k (  0%)
 parser function body               :   0.34 (  1%)   0.34 (  7%)   0.71 (  2%)    34M (  6%)
 early inlining heuristics          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    50k (  0%)
 inline parameters                  :   0.16 (  1%)   0.02 (  0%)   0.16 (  1%)  9064k (  1%)
 integration                        :   0.42 (  2%)   0.02 (  0%)   0.44 (  1%)    17M (  3%)
 tree gimplify                      :   0.19 (  1%)   0.05 (  1%)   0.20 (  1%)    20M (  3%)
 tree eh                            :   0.00 (  0%)   0.01 (  0%)   0.00 (  0%)     0  (  0%)
 tree CFG construction              :   0.00 (  0%)   0.01 (  0%)   0.05 (  0%)    11M (  2%)
 tree CFG cleanup                   :   0.34 (  1%)   0.10 (  2%)   0.46 (  1%)    68k (  0%)
 tree tail merge                    :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)  7696  (  0%)
 tree VRP                           :   0.39 (  2%)   0.10 (  2%)   0.58 (  2%)    14M (  2%)
 tree Early VRP                     :   0.12 (  0%)   0.05 (  1%)   0.22 (  1%)  6698k (  1%)
 tree copy propagation              :   0.06 (  0%)   0.00 (  0%)   0.13 (  0%)   432  (  0%)
 tree PTA                           :   0.80 (  3%)   0.18 (  4%)   1.10 (  4%)  6475k (  1%)
 tree SSA other                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 tree SSA rewrite                   :   0.12 (  0%)   0.02 (  0%)   0.12 (  0%)    10M (  2%)
 tree SSA incremental               :   0.23 (  1%)   0.04 (  1%)   0.19 (  1%)  3738k (  1%)
 tree operand scan                  :   0.07 (  0%)   0.05 (  1%)   0.16 (  1%)    14M (  2%)
 dominator optimization             :   0.62 (  2%)   0.15 (  3%)   0.53 (  2%)  4427k (  1%)
 backwards jump threading           :   0.27 (  1%)   0.01 (  0%)   0.31 (  1%)    38k (  0%)
 tree SRA                           :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 isolate eroneous paths             :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 tree CCP                           :   0.50 (  2%)   0.08 (  2%)   0.50 (  2%)   696k (  0%)
 tree reassociation                 :   0.04 (  0%)   0.02 (  0%)   0.04 (  0%)    48  (  0%)
 tree PRE                           :   0.34 (  1%)   0.05 (  1%)   0.35 (  1%)  8448k (  1%)
 tree FRE                           :   0.46 (  2%)   0.06 (  1%)   0.55 (  2%)  5354k (  1%)
 tree code sinking                  :   0.07 (  0%)   0.02 (  0%)   0.02 (  0%)   270k (  0%)
 tree linearize phis                :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)  6316k (  1%)
 tree backward propagate            :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 tree forward propagate             :   0.12 (  0%)   0.05 (  1%)   0.19 (  1%)    41k (  0%)
 tree phiprop                       :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree conservative DCE              :   0.10 (  0%)   0.02 (  0%)   0.17 (  1%)  7296  (  0%)
 tree aggressive DCE                :   0.14 (  1%)   0.04 (  1%)   0.13 (  0%)    12M (  2%)
 tree buildin call DCE              :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 tree DSE                           :   0.03 (  0%)   0.00 (  0%)   0.09 (  0%)    15k (  0%)
 PHI merge                          :   0.01 (  0%)   0.01 (  0%)   0.01 (  0%)    48k (  0%)
 tree loop optimization             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree loop invariant motion         :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 complete unrolling                 :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)    12k (  0%)
 tree slp vectorization             :   0.18 (  1%)   0.04 (  1%)   0.23 (  1%)    12M (  2%)
 tree copy headers                  :   0.02 (  0%)   0.00 (  0%)   0.00 (  0%)  7904  (  0%)
 tree SSA uncprop                   :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 tree NRV optimization              :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)   139k (  0%)
 tree switch conversion             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 tree switch lowering               :   0.00 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 gimple CSE sin/cos                 :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 gimple widening/fma detection      :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 tree strlen optimization           :   0.07 (  0%)   0.01 (  0%)   0.12 (  0%)  6325k (  1%)
 tree modref                        :   0.14 (  1%)   0.04 (  1%)   0.16 (  1%)    10M (  2%)
 dominance frontiers                :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 dominance computation              :   0.50 (  2%)   0.13 (  3%)   0.82 (  3%)     0  (  0%)
 control dependences                :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 out of ssa                         :   0.06 (  0%)   0.02 (  0%)   0.11 (  0%)  1061k (  0%)
 expand vars                        :   0.05 (  0%)   0.01 (  0%)   0.03 (  0%)  2414k (  0%)
 expand                             :   0.54 (  2%)   0.09 (  2%)   0.55 (  2%)    38M (  6%)
 post expand cleanups               :   0.03 (  0%)   0.02 (  0%)   0.08 (  0%)  3084k (  0%)
 lower subreg                       :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 jump                               :   0.04 (  0%)   0.02 (  0%)   0.03 (  0%)     0  (  0%)
 forward prop                       :   0.42 (  2%)   0.04 (  1%)   0.50 (  2%)   226k (  0%)
 CSE                                :   0.31 (  1%)   0.03 (  1%)   0.38 (  1%)   237k (  0%)
 dead code elimination              :   0.13 (  1%)   0.02 (  0%)   0.07 (  0%)     0  (  0%)
 dead store elim1                   :   0.12 (  0%)   0.03 (  1%)   0.22 (  1%)  2042k (  0%)
 dead store elim2                   :   0.16 (  1%)   0.01 (  0%)   0.18 (  1%)  2333k (  0%)
 loop analysis                      :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 loop init                          :   0.36 (  1%)   0.10 (  2%)   0.44 (  1%)    33M (  5%)
 loop fini                          :   0.09 (  0%)   0.04 (  1%)   0.13 (  0%)     0  (  0%)
 CPROP                              :   0.43 (  2%)   0.08 (  2%)   0.50 (  2%)  1842k (  0%)
 PRE                                :   0.13 (  1%)   0.04 (  1%)   0.17 (  1%)  3816  (  0%)
 CSE 2                              :   0.25 (  1%)   0.04 (  1%)   0.34 (  1%)   311k (  0%)
 branch prediction                  :   0.11 (  0%)   0.03 (  1%)   0.07 (  0%)   623k (  0%)
 combiner                           :   0.39 (  2%)   0.04 (  1%)   0.57 (  2%)  4948k (  1%)
 if-conversion                      :   0.09 (  0%)   0.01 (  0%)   0.12 (  0%)    86k (  0%)
 mode switching                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 integrated RA                      :   1.75 (  7%)   0.11 (  2%)   2.10 (  7%)   147M ( 24%)
 LRA non-specific                   :   0.39 (  2%)   0.05 (  1%)   0.50 (  2%)   391k (  0%)
 LRA virtuals elimination           :   0.02 (  0%)   0.01 (  0%)   0.06 (  0%)   112k (  0%)
 LRA reload inheritance             :   0.03 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 LRA create live ranges             :   0.09 (  0%)   0.02 (  0%)   0.07 (  0%)  8232  (  0%)
 LRA hard reg assignment            :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 reload                             :   0.02 (  0%)   0.01 (  0%)   0.05 (  0%)   141k (  0%)
 reload CSE regs                    :   0.34 (  1%)   0.11 (  2%)   0.49 (  2%)  3089k (  0%)
 ree                                :   0.05 (  0%)   0.00 (  0%)   0.09 (  0%)    19k (  0%)
 thread pro- & epilogue             :   0.32 (  1%)   0.01 (  0%)   0.32 (  1%)    12M (  2%)
 if-conversion 2                    :   0.07 (  0%)   0.02 (  0%)   0.04 (  0%)     0  (  0%)
 combine stack adjustments          :   0.01 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 peephole 2                         :   0.21 (  1%)   0.02 (  0%)   0.17 (  1%)  1225k (  0%)
 hard reg cprop                     :   0.10 (  0%)   0.01 (  0%)   0.15 (  0%)   552  (  0%)
 scheduling 2                       :   1.55 (  6%)   0.14 (  3%)   1.35 (  4%)  2653k (  0%)
 machine dep reorg                  :   0.11 (  0%)   0.04 (  1%)   0.12 (  0%)     0  (  0%)
 reorder blocks                     :   0.10 (  0%)   0.02 (  0%)   0.21 (  1%)   724k (  0%)
 shorten branches                   :   0.12 (  0%)   0.02 (  0%)   0.14 (  0%)     0  (  0%)
 reg stack                          :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 final                              :   0.33 (  1%)   0.07 (  1%)   0.59 (  2%)    11M (  2%)
 variable output                    :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)   183k (  0%)
 tree if-combine                    :   0.03 (  0%)   0.00 (  0%)   0.00 (  0%)    80  (  0%)
 if to switch conversion            :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 straight-line strength reduction   :   0.05 (  0%)   0.00 (  0%)   0.08 (  0%)   864  (  0%)
 store merging                      :   0.01 (  0%)   0.00 (  0%)   0.06 (  0%)   728  (  0%)
 initialize rtl                     :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    12k (  0%)
 address lowering                   :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)   360  (  0%)
 access analysis                    :   0.12 (  0%)   0.03 (  1%)   0.20 (  1%)    16k (  0%)
 early local passes                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 unaccounted optimizations          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 rest of compilation                :   1.76 (  7%)   0.34 (  7%)   1.82 (  6%)    14M (  2%)
 unaccounted post reload            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 unaccounted late compilation       :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 remove unused locals               :   0.08 (  0%)   0.04 (  1%)   0.11 (  0%)     0  (  0%)
 address taken                      :   0.07 (  0%)   0.01 (  0%)   0.15 (  0%)     0  (  0%)
 rebuild frequencies                :   0.04 (  0%)   0.02 (  0%)   0.10 (  0%)   568  (  0%)
 repair loop structures             :   0.03 (  0%)   0.01 (  0%)   0.02 (  0%)   112  (  0%)
 TOTAL                              :  25.40          5.09         31.03          625M
```

* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: pinskia at gcc dot gnu.org @ 2023-06-29 19:46 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The only ones that stick out are:
 dump files                         :   1.07 (  4%)   0.24 (  5%)   1.58 (  5%)     0  (  0%)
 integrated RA                      :   1.75 (  7%)   0.11 (  2%)   2.10 (  7%)   147M ( 24%)
 scheduling 2                       :   1.55 (  6%)   0.14 (  3%)   1.35 (  4%)  2653k (  0%)


Nothing else really sticks out (but they do add up).

* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: pinskia at gcc dot gnu.org @ 2023-06-29 19:54 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So I took a look at the sources: there are very many small functions.
This might be why the "dump files" timevar takes so long, since it is invoked
for each pass and for each function. Maybe that can be improved.

I suspect the register allocator and scheduler costs are due to a small
per-function setup cost which, multiplied across many functions, adds up.

* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: rguenth at gcc dot gnu.org @ 2023-06-30  7:45 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-06-30
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Samples: 45K of event 'cycles', Event count (approx.): 51356148788              
Overhead       Samples  Command  Shared Object       Symbol                     
   2.57%          1169  cc1      libc-2.31.so        [.] _int_malloc
   1.54%           700  cc1      cc1                 [.] bitmap_set_bit
   1.52%           692  cc1      libc-2.31.so        [.] malloc
   1.32%           602  cc1      libc-2.31.so        [.] _int_free
   1.31%           598  cc1      cc1                 [.] record_reg_classes
   1.04%           476  cc1      cc1                 [.] constrain_operands
   0.81%           368  cc1      cc1                 [.] solve_constraints
   0.79%           360  cc1      cc1                 [.] cse_insn
   0.78%           357  cc1      cc1                 [.] ggc_internal_alloc
   0.76%           347  cc1      libc-2.31.so        [.] free
   0.73%           330  cc1      cc1                 [.] statistics_fini_pass

It's pointing at things I've seen multiple times, but I think investigating
why memory allocation is so high up in the profile would be good.  There
are some users, like dom_info::dom_init, which hit the allocator hard
without good reason; each does only a few allocations per function, but as
seen, this testcase has many functions.  Likewise, ipa_sra_summarize_function
seems to spend 99% of its cost in memory allocation.  Doing more on-demand
initialization might help here.

I have a patch for the statistics_fini_pass hit.

* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: rguenth at gcc dot gnu.org @ 2023-06-30  8:37 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
There may be an opportunity to optimize some of our infrastructure for
the case where we have 3 basic blocks (the minimum: ENTRY, bb2 and EXIT).  For
example, dominance doesn't need to be "computed"; the only special case
to consider is that EXIT is not reachable.

The testcase at hand seems to consist of forwarders.  In general, single-BB
functions can be quite common in C++ code as well.  A lot of passes
could also excuse themselves (the next special case is BB2 having a
backedge to itself).  There is quite a lot of constant overhead all over the place
even when we end up doing nothing.

* [Bug middle-end/110489] Slow building virtual.c.i from p11-kit
From: cvs-commit at gcc dot gnu.org @ 2023-06-30  8:38 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110489

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:18e5aeaef294428fc8458c2c70a9ac3a537c35d6

commit r14-2209-g18e5aeaef294428fc8458c2c70a9ac3a537c35d6
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jun 30 09:46:48 2023 +0200

    middle-end/110489 - avoid useless work on statistics

    When we call statistics_fini_pass we unconditionally allocate
    the statistics hash and traverse it.  When a TU has many small
    functions this can take considerable time.  The following avoids
    this by never allocating the hash from this function.

            PR middle-end/110489
            * statistics.cc (curr_statistics_hash): Add argument
            indicating whether we should allocate the hash.
            (statistics_fini_pass): If the hash isn't allocated
            only print the summary header.
