public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/114261] New: [13/14 Regression] Scheduling takes excessive time (97%)
@ 2024-03-07  1:37 patrick at rivosinc dot com
  2024-03-07  1:38 ` [Bug rtl-optimization/114261] " patrick at rivosinc dot com
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: patrick at rivosinc dot com @ 2024-03-07  1:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

            Bug ID: 114261
           Summary: [13/14 Regression] Scheduling takes excessive time
                    (97%)
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: patrick at rivosinc dot com
  Target Milestone: ---

I recently enabled timeout detection for the RISC-V fuzzer, so I'm not sure how
interesting this is. It seems like something weird is going on in scheduling
that should be short-circuited or bailed out of.

tip-of-tree
> ./bin/riscv64-unknown-linux-gnu-gcc red.c -O1 -fschedule-insns -w -ftime-report

Time variable                                   usr           sys          wall
          GGC
 phase setup                        :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
 2779k (  1%)
 phase parsing                      :   0.72 (  0%)   0.54 ( 38%)   1.35 (  0%)
   36M ( 13%)
 phase opt and generate             : 917.85 (100%)   0.89 ( 62%) 925.29 (100%)
  241M ( 86%)
 phase last asm                     :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)
  343k (  0%)
 garbage collection                 :   0.31 (  0%)   0.01 (  1%)   0.37 (  0%)
    0  (  0%)
 dump files                         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
    0  (  0%)
 callgraph construction             :   0.06 (  0%)   0.00 (  0%)   0.10 (  0%)
 8997k (  3%)
 callgraph optimization             :   0.02 (  0%)   0.01 (  1%)   0.03 (  0%)
   10k (  0%)
 callgraph functions expansion      : 915.19 (100%)   0.76 ( 53%) 922.36 (100%)
  198M ( 71%)
 callgraph ipa passes               :   2.45 (  0%)   0.10 (  7%)   2.68 (  0%)
   19M (  7%)
 ipa function summary               :   0.20 (  0%)   0.00 (  0%)   0.21 (  0%)
 5358k (  2%)
 ipa dead code removal              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 ipa inlining heuristics            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
  192  (  0%)
 ipa pure const                     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
 1080  (  0%)
 ipa modref                         :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
 4336  (  0%)
 cfg construction                   :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
   28k (  0%)
 cfg cleanup                        :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)
 2496  (  0%)
 CFG verifier                       :   1.27 (  0%)   0.01 (  1%)   1.26 (  0%)
    0  (  0%)
 trivially dead code                :   0.11 (  0%)   0.00 (  0%)   0.12 (  0%)
    0  (  0%)
 df scan insns                      :   0.20 (  0%)   0.04 (  3%)   0.26 (  0%)
  528  (  0%)
 df reaching defs                   :   1.03 (  0%)   0.03 (  2%)   1.07 (  0%)
    0  (  0%)
 df live regs                       :   0.75 (  0%)   0.00 (  0%)   0.80 (  0%)
    0  (  0%)
 df live&initialized regs           :   0.22 (  0%)   0.00 (  0%)   0.22 (  0%)
    0  (  0%)
 df use-def / def-use chains        :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)
    0  (  0%)
 df reg dead/unused notes           :   0.66 (  0%)   0.00 (  0%)   0.68 (  0%)
 7041k (  2%)
 register information               :   0.27 (  0%)   0.00 (  0%)   0.28 (  0%)
    0  (  0%)
 alias analysis                     :   0.45 (  0%)   0.01 (  1%)   0.45 (  0%)
   14M (  5%)
 alias stmt walking                 :   0.38 (  0%)   0.03 (  2%)   0.50 (  0%)
 3192  (  0%)
 register scan                      :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)
  586k (  0%)
 rebuild jump labels                :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)
    0  (  0%)
 preprocessing                      :   0.08 (  0%)   0.09 (  6%)   0.24 (  0%)
 1861k (  1%)
 lexical analysis                   :   0.24 (  0%)   0.16 ( 11%)   0.30 (  0%)
    0  (  0%)
 parser (global)                    :   0.18 (  0%)   0.12 (  8%)   0.35 (  0%)
10103k (  4%)
 parser function body               :   0.19 (  0%)   0.17 ( 12%)   0.40 (  0%)
   24M (  9%)
 inline parameters                  :   0.10 (  0%)   0.00 (  0%)   0.07 (  0%)
  128k (  0%)
 tree gimplify                      :   0.04 (  0%)   0.02 (  1%)   0.05 (  0%)
   12M (  5%)
 tree eh                            :   0.00 (  0%)   0.01 (  1%)   0.00 (  0%)
10056  (  0%)
 tree CFG construction              :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
 8660k (  3%)
 tree CFG cleanup                   :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)
   20k (  0%)
 tree copy propagation              :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
  200  (  0%)
 tree PTA                           :   0.36 (  0%)   0.06 (  4%)   0.44 (  0%)
 2128k (  1%)
 tree SSA other                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
    0  (  0%)
 tree SSA rewrite                   :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)
 2079k (  1%)
 tree SSA incremental               :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
  196k (  0%)
 tree operand scan                  :   0.00 (  0%)   0.01 (  1%)   0.02 (  0%)
 3695k (  1%)
 dominator optimization             :   0.42 (  0%)   0.00 (  0%)   0.46 (  0%)
  515k (  0%)
 backwards jump threading           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
 2256  (  0%)
 tree CCP                           :   0.13 (  0%)   0.01 (  1%)   0.12 (  0%)
  751k (  0%)
 tree FRE                           :   0.13 (  0%)   0.00 (  0%)   0.07 (  0%)
  659k (  0%)
 tree code sinking                  :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)
   63k (  0%)
 tree backward propagate            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
    0  (  0%)
 tree forward propagate             :   0.03 (  0%)   0.00 (  0%)   0.05 (  0%)
   62k (  0%)
 tree conservative DCE              :   0.06 (  0%)   0.01 (  1%)   0.06 (  0%)
    0  (  0%)
 tree aggressive DCE                :   0.03 (  0%)   0.01 (  1%)   0.02 (  0%)
    0  (  0%)
 tree DSE                           :   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)
   16k (  0%)
 tree loop invariant motion         :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
 2712  (  0%)
 tree canonical iv                  :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
  234k (  0%)
 complete unrolling                 :   0.05 (  0%)   0.00 (  0%)   0.06 (  0%)
  977k (  0%)
 tree iv optimization               :   0.06 (  0%)   0.00 (  0%)   0.07 (  0%)
 2150k (  1%)
 tree copy headers                  :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%)
  684k (  0%)
 tree SSA verifier                  :   1.39 (  0%)   0.01 (  1%)   1.45 (  0%)
    0  (  0%)
 tree STMT verifier                 :   2.04 (  0%)   0.03 (  2%)   2.14 (  0%)
    0  (  0%)
 tree strlen optimization           :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
  582k (  0%)
 tree modref                        :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)
   11k (  0%)
 callgraph verifier                 :   0.07 (  0%)   0.00 (  0%)   0.10 (  0%)
    0  (  0%)
 dominance computation              :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)
    0  (  0%)
 out of ssa                         :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
  376  (  0%)
 expand vars                        :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
 2102k (  1%)
 expand                             :   0.29 (  0%)   0.05 (  3%)   0.37 (  0%)
   63M ( 22%)
 post expand cleanups               :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
 5184  (  0%)
 lower subreg                       :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 forward prop                       :   0.25 (  0%)   0.04 (  3%)   0.29 (  0%)
 1448k (  1%)
 CSE                                :   0.64 (  0%)   0.01 (  1%)   0.65 (  0%)
 1083k (  0%)
 dead code elimination              :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)
    0  (  0%)
 dead store elim1                   :   0.70 (  0%)   0.01 (  1%)   0.74 (  0%)
 2507k (  1%)
 dead store elim2                   :   0.61 (  0%)   0.00 (  0%)   0.61 (  0%)
 2385k (  1%)
 loop init                          :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)
 1257k (  0%)
 loop invariant motion              :   0.15 (  0%)   0.00 (  0%)   0.13 (  0%)
   23k (  0%)
 loop fini                          :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 auto inc dec                       :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)
  144  (  0%)
 branch prediction                  :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)
   58k (  0%)
 combiner                           :   1.12 (  0%)   0.03 (  2%)   1.17 (  0%)
   22M (  8%)
 scheduling                         : 895.36 ( 97%)   0.19 ( 13%) 901.31 ( 97%)
   15M (  6%)
 integrated RA                      :   2.93 (  0%)   0.21 ( 15%)   3.28 (  0%)
   52M ( 19%)
 LRA non-specific                   :   0.41 (  0%)   0.01 (  1%)   0.44 (  0%)
  218k (  0%)
 LRA virtuals elimination           :   0.11 (  0%)   0.00 (  0%)   0.10 (  0%)
   42k (  0%)
 LRA reload inheritance             :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)
 4816  (  0%)
 LRA create live ranges             :   0.12 (  0%)   0.00 (  0%)   0.12 (  0%)
  411k (  0%)
 LRA hard reg assignment            :   0.10 (  0%)   0.00 (  0%)   0.10 (  0%)
    0  (  0%)
 reload                             :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
  144  (  0%)
 reload CSE regs                    :   0.18 (  0%)   0.00 (  0%)   0.17 (  0%)
 2470k (  1%)
 thread pro- & epilogue             :   0.27 (  0%)   0.00 (  0%)   0.26 (  0%)
   18k (  0%)
 hard reg cprop                     :   0.21 (  0%)   0.00 (  0%)   0.21 (  0%)
   11k (  0%)
 reorder blocks                     :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)
   89k (  0%)
 shorten branches                   :   0.04 (  0%)   0.01 (  1%)   0.04 (  0%)
    0  (  0%)
 final                              :   0.15 (  0%)   0.00 (  0%)   0.15 (  0%)
  385k (  0%)
 straight-line strength reduction   :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
  234k (  0%)
 initialize rtl                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
   17k (  0%)
 access analysis                    :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)
  512  (  0%)
 rest of compilation                :   0.21 (  0%)   0.00 (  0%)   0.24 (  0%)
 1531k (  1%)
 remove unused locals               :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
    0  (  0%)
 address taken                      :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)
    0  (  0%)
 verify loop closed                 :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 verify RTL sharing                 :   2.30 (  0%)   0.02 (  1%)   2.40 (  0%)
    0  (  0%)
 TOTAL                              : 918.66          1.43        926.72       
  280M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

Bisected with a 2 minute timeout. First bad commit: r13-5154-g733a1b777f1

r13-5154-g733a1b777f1
> ./bin/riscv64-unknown-linux-gnu-gcc red.c -O1 -fschedule-insns -w -ftime-report; /work/patrick/notify.sh

Time variable                                   usr           sys          wall
          GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
 1539k (  1%)
 phase parsing                      :   0.61 (  0%)   0.51 ( 35%)   1.12 (  0%)
   36M ( 13%)
 phase opt and generate             :1342.05 (100%)   0.93 ( 64%)1348.25 (100%)
  246M ( 87%)
 phase last asm                     :   0.05 (  0%)   0.01 (  1%)   0.06 (  0%)
   89k (  0%)
 garbage collection                 :   0.39 (  0%)   0.01 (  1%)   0.41 (  0%)
    0  (  0%)
 callgraph construction             :   0.12 (  0%)   0.02 (  1%)   0.09 (  0%)
 9000k (  3%)
 callgraph optimization             :   0.02 (  0%)   0.01 (  1%)   0.02 (  0%)
 4112  (  0%)
 callgraph functions expansion      :1339.07 (100%)   0.81 ( 56%)1345.12 (100%)
  202M ( 71%)
 callgraph ipa passes               :   2.68 (  0%)   0.10 (  7%)   2.81 (  0%)
   21M (  7%)
 ipa function summary               :   0.08 (  0%)   0.01 (  1%)   0.09 (  0%)
 7113k (  2%)
 ipa dead code removal              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 ipa inlining heuristics            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
 2536  (  0%)
 ipa pure const                     :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
 1080  (  0%)
 ipa free inline summary            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 ipa modref                         :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
 6256  (  0%)
 cfg construction                   :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
   28k (  0%)
 cfg cleanup                        :   0.15 (  0%)   0.00 (  0%)   0.14 (  0%)
 2496  (  0%)
 CFG verifier                       :   1.46 (  0%)   0.01 (  1%)   1.48 (  0%)
    0  (  0%)
 trivially dead code                :   0.12 (  0%)   0.00 (  0%)   0.12 (  0%)
    0  (  0%)
 df scan insns                      :   0.18 (  0%)   0.06 (  4%)   0.24 (  0%)
  624  (  0%)
 df reaching defs                   :   1.28 (  0%)   0.02 (  1%)   1.29 (  0%)
    0  (  0%)
 df live regs                       :   0.87 (  0%)   0.00 (  0%)   0.83 (  0%)
    0  (  0%)
 df live&initialized regs           :   0.25 (  0%)   0.00 (  0%)   0.23 (  0%)
    0  (  0%)
 df use-def / def-use chains        :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
    0  (  0%)
 df reg dead/unused notes           :   0.70 (  0%)   0.00 (  0%)   0.69 (  0%)
 6981k (  2%)
 register information               :   0.33 (  0%)   0.00 (  0%)   0.33 (  0%)
    0  (  0%)
 alias analysis                     :   0.46 (  0%)   0.00 (  0%)   0.44 (  0%)
   14M (  5%)
 alias stmt walking                 :   0.54 (  0%)   0.02 (  1%)   0.59 (  0%)
 3192  (  0%)
 register scan                      :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
  591k (  0%)
 rebuild jump labels                :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)
    0  (  0%)
 preprocessing                      :   0.12 (  0%)   0.10 (  7%)   0.40 (  0%)
 1602k (  1%)
 lexical analysis                   :   0.17 (  0%)   0.20 ( 14%)   0.26 (  0%)
    0  (  0%)
 parser (global)                    :   0.16 (  0%)   0.09 (  6%)   0.22 (  0%)
   10M (  4%)
 parser function body               :   0.11 (  0%)   0.11 (  8%)   0.16 (  0%)
   24M (  9%)
 parser inl. func. body             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
   24k (  0%)
 inline parameters                  :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)
  128k (  0%)
 tree gimplify                      :   0.05 (  0%)   0.00 (  0%)   0.08 (  0%)
   12M (  5%)
 tree eh                            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
10056  (  0%)
 tree CFG construction              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
 8669k (  3%)
 tree CFG cleanup                   :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
   25k (  0%)
 tree copy propagation              :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)
  200  (  0%)
 tree PTA                           :   0.40 (  0%)   0.07 (  5%)   0.45 (  0%)
 2127k (  1%)
 tree SSA rewrite                   :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
 2079k (  1%)
 tree SSA incremental               :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)
  200k (  0%)
 tree operand scan                  :   0.03 (  0%)   0.03 (  2%)   0.04 (  0%)
 3691k (  1%)
 dominator optimization             :   0.80 (  0%)   0.00 (  0%)   0.79 (  0%)
 2918k (  1%)
 backwards jump threading           :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
 2560  (  0%)
 tree CCP                           :   0.12 (  0%)   0.00 (  0%)   0.13 (  0%)
  734k (  0%)
 tree reassociation                 :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
   96  (  0%)
 tree FRE                           :   0.11 (  0%)   0.00 (  0%)   0.12 (  0%)
  684k (  0%)
 tree code sinking                  :   0.03 (  0%)   0.00 (  0%)   0.01 (  0%)
   63k (  0%)
 tree linearize phis                :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
   11k (  0%)
 tree forward propagate             :   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)
   67k (  0%)
 tree conservative DCE              :   0.02 (  0%)   0.04 (  3%)   0.06 (  0%)
    0  (  0%)
 tree aggressive DCE                :   0.04 (  0%)   0.00 (  0%)   0.08 (  0%)
   16k (  0%)
 tree DSE                           :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)
    0  (  0%)
 tree loop invariant motion         :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)
 2616  (  0%)
 tree canonical iv                  :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
  236k (  0%)
 complete unrolling                 :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)
  980k (  0%)
 tree iv optimization               :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)
 2196k (  1%)
 tree SSA uncprop                   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)
    0  (  0%)
 tree SSA verifier                  :   1.82 (  0%)   0.00 (  0%)   1.77 (  0%)
    0  (  0%)
 tree STMT verifier                 :   2.38 (  0%)   0.00 (  0%)   2.35 (  0%)
    0  (  0%)
 tree strlen optimization           :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)
  631k (  0%)
 tree modref                        :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)
   12k (  0%)
 callgraph verifier                 :   0.09 (  0%)   0.00 (  0%)   0.11 (  0%)
    0  (  0%)
 dominance computation              :   0.05 (  0%)   0.00 (  0%)   0.03 (  0%)
    0  (  0%)
 out of ssa                         :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)
  480  (  0%)
 expand vars                        :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
 2133k (  1%)
 expand                             :   0.29 (  0%)   0.01 (  1%)   0.31 (  0%)
   61M ( 22%)
 post expand cleanups               :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
 5656  (  0%)
 varconst                           :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)
   88k (  0%)
 lower subreg                       :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 forward prop                       :   0.22 (  0%)   0.02 (  1%)   0.24 (  0%)
 1193k (  0%)
 CSE                                :   0.60 (  0%)   0.01 (  1%)   0.61 (  0%)
  863k (  0%)
 dead code elimination              :   0.08 (  0%)   0.00 (  0%)   0.09 (  0%)
    0  (  0%)
 dead store elim1                   :   0.76 (  0%)   0.00 (  0%)   0.77 (  0%)
 2501k (  1%)
 dead store elim2                   :   0.79 (  0%)   0.00 (  0%)   0.77 (  0%)
 2492k (  1%)
 loop init                          :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)
 1305k (  0%)
 loop invariant motion              :   0.13 (  0%)   0.00 (  0%)   0.18 (  0%)
   34k (  0%)
 loop fini                          :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
    0  (  0%)
 branch prediction                  :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
   60k (  0%)
 combiner                           :   0.94 (  0%)   0.00 (  0%)   0.94 (  0%)
   25M (  9%)
 if-conversion                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
   10k (  0%)
 scheduling                         :1315.49 ( 98%)   0.37 ( 26%)1320.88 ( 98%)
   15M (  5%)
 integrated RA                      :   4.20 (  0%)   0.22 ( 15%)   4.58 (  0%)
   53M ( 19%)
 LRA non-specific                   :   0.60 (  0%)   0.00 (  0%)   0.63 (  0%)
  245k (  0%)
 LRA virtuals elimination           :   0.20 (  0%)   0.00 (  0%)   0.20 (  0%)
   43k (  0%)
 LRA reload inheritance             :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)
 5040  (  0%)
 LRA create live ranges             :   0.19 (  0%)   0.00 (  0%)   0.19 (  0%)
  412k (  0%)
 LRA hard reg assignment            :   0.15 (  0%)   0.00 (  0%)   0.15 (  0%)
    0  (  0%)
 reload                             :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)
  168  (  0%)
 reload CSE regs                    :   0.18 (  0%)   0.00 (  0%)   0.19 (  0%)
 2494k (  1%)
 thread pro- & epilogue             :   0.28 (  0%)   0.00 (  0%)   0.28 (  0%)
   20k (  0%)
 hard reg cprop                     :   0.21 (  0%)   0.00 (  0%)   0.20 (  0%)
   22k (  0%)
 reorder blocks                     :   0.05 (  0%)   0.00 (  0%)   0.03 (  0%)
   99k (  0%)
 shorten branches                   :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)
    0  (  0%)
 final                              :   0.13 (  0%)   0.00 (  0%)   0.15 (  0%)
  624  (  0%)
 straight-line strength reduction   :   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)
  234k (  0%)
 access analysis                    :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)
  416  (  0%)
 rest of compilation                :   0.23 (  0%)   0.01 (  1%)   0.27 (  0%)
 2179k (  1%)
 remove unused locals               :   0.02 (  0%)   0.00 (  0%)   0.04 (  0%)
    0  (  0%)
 address taken                      :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)
    0  (  0%)
 verify RTL sharing                 :   2.54 (  0%)   0.00 (  0%)   2.57 (  0%)
    0  (  0%)
 TOTAL                              :1342.71          1.45       1349.44       
  284M
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

I don't have the tooling to reliably reduce a slow compile issue so I've
attached the unreduced preprocessed file I was testing with. Any tips for
reducing a case like this would be appreciated.

Testcase found via fuzzer.


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)
From: patrick at rivosinc dot com @ 2024-03-07  1:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #1 from Patrick O'Neill <patrick at rivosinc dot com> ---
Created attachment 57640
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57640&action=edit
Raw testcase and headers


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)
From: pinskia at gcc dot gnu.org @ 2024-03-07  1:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.3
                 CC|                            |amonakov at gcc dot gnu.org,
                   |                            |pinskia at gcc dot gnu.org


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)
From: patrick at rivosinc dot com @ 2024-03-07  1:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #2 from Patrick O'Neill <patrick at rivosinc dot com> ---
Created attachment 57641
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57641&action=edit
unreduced preprocessed testcase


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)
From: amonakov at gcc dot gnu.org @ 2024-03-07  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
The first attachment is empty (perhaps you made a non-recursive archive when
you meant to recursively zip a directory).


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)
From: patrick at rivosinc dot com @ 2024-03-07  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

Patrick O'Neill <patrick at rivosinc dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #57640|0                           |1
        is obsolete|                            |

--- Comment #4 from Patrick O'Neill <patrick at rivosinc dot com> ---
Created attachment 57642
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57642&action=edit
Raw testcase and headers

Thanks for catching that, here's the zip file with actual files in it this time
;)


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)
From: law at gcc dot gnu.org @ 2024-03-07 20:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
           Priority|P3                          |P2


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)
From: amonakov at gcc dot gnu.org @ 2024-03-11 16:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkuvyrkov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
It appears sched-deps is O(N*M) given N reg_pending_barriers and M distinct
pseudos in a region (or even a basic block). For instance, on the following
testcase

#define x10(x) x x x x x x x x x x
#define x100(x) x10(x10(x))
#define x10000(x) x100(x100(x))

void f(int);

void g(int *p)
{
#if 1
        x10000(f(*p++);)
#else
        x10000(asm("" :: "r"(*p++));)
#endif
}

gcc -O -fschedule-insns invokes add_dependence 20000 times for each asm/call
after the first. There is a loop

      for (i = 0; i < (unsigned)deps->max_reg; i++)
        {
          struct deps_reg *reg_last = &deps->reg_last[i];
          reg_last->sets = alloc_INSN_LIST (insn, reg_last->sets);
          SET_REGNO_REG_SET (&deps->reg_last_in_use, i);
        }

that registers the insn with reg_pending_barrier != 0 in reg_last->sets of each
pseudo, and then all those reg_last->sets will be inspected on the next
reg_pending_barrier insn.
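
To make the cost concrete, here is a toy model of the behaviour described
above. It is only a sketch and not GCC code: count_visits and its parameters
are hypothetical names, sets_len[r] stands in for the length of
reg_last[r].sets, and it assumes that each barrier insn walks every pseudo's
list, flushes it, and then registers itself in it. The figure of 20000 pseudos
used in the note below is likewise an assumption read off the "20000 times"
observation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy cost model, NOT GCC code.  sets_len[r] models the length of
   reg_last[r].sets for pseudo r.  Returns how many list entries are
   inspected while processing n_barriers barrier insns in a region
   with n_pseudos distinct pseudos.  */
unsigned long long
count_visits (unsigned n_barriers, unsigned n_pseudos)
{
  /* Allocate at least one element so a zero-pseudo call stays valid.  */
  unsigned *sets_len = calloc (n_pseudos ? n_pseudos : 1, sizeof *sets_len);
  unsigned long long visits = 0;

  assert (sets_len != NULL);

  for (unsigned b = 0; b < n_barriers; b++)
    for (unsigned r = 0; r < n_pseudos; r++)
      {
        /* Inspect whatever the previous barrier left in this list
           (assumed to be flushed afterwards).  */
        visits += sets_len[r];
        /* The current barrier then registers itself for this pseudo.  */
        sets_len[r] = 1;
      }

  free (sets_len);
  return visits;
}
```

Under these assumptions count_visits (10000, 20000) returns 199980000, that is
(N - 1) * M entries inspected, so the work grows with the product of the
number of barriers and the number of pseudos.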


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1
From: rguenth at gcc dot gnu.org @ 2024-03-12  7:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Isn't the "scheduling window" limited somehow?  Can we impose an upper bound on
the number of dependences by placing a "virtual barrier" when we hit that
limit?
I don't know the structure of the scheduler or its dependence analysis
framework.

In other places where GCC faces this usual quadratic behavior in dependence
analysis we have such limits and try to cope with it as well as we can, at
worst giving up completely.


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1
From: law at gcc dot gnu.org @ 2024-03-13  3:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Yeah, there are limits on the size of various lists the scheduler
maintains.  This looks independent of those clamps.


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1
  2024-03-07  1:37 [Bug rtl-optimization/114261] New: [13/14 Regression] Scheduling takes excessive time (97%) patrick at rivosinc dot com
                   ` (8 preceding siblings ...)
  2024-03-13  3:09 ` law at gcc dot gnu.org
@ 2024-03-13 13:56 ` amonakov at gcc dot gnu.org
  2024-03-13 14:08 ` rguenth at gcc dot gnu.org
  2024-03-13 15:11 ` amonakov at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: amonakov at gcc dot gnu.org @ 2024-03-13 13:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
If we want to get rid of the compilation time regression sooner rather than
later, I can suggest limiting my change only to functions that call setjmp:

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f..ae23f55274 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -3695,7 +3695,7 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)

       CANT_MOVE (insn) = 1;

-      if (!reload_completed)
+      if (!reload_completed && cfun->calls_setjmp)
        {
          /* Scheduling across calls may increase register pressure by extending
             live ranges of pseudos over the call.  Worse, in presence of setjmp


That way we retain the "correctness fix" part of r13-5154-g733a1b777f1 and keep
the previous status quo on normal functions (the quadratic behavior on asms, as
demonstrated in comment #5, would also remain).


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1
  2024-03-07  1:37 [Bug rtl-optimization/114261] New: [13/14 Regression] Scheduling takes excessive time (97%) patrick at rivosinc dot com
                   ` (9 preceding siblings ...)
  2024-03-13 13:56 ` amonakov at gcc dot gnu.org
@ 2024-03-13 14:08 ` rguenth at gcc dot gnu.org
  2024-03-13 15:11 ` amonakov at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-13 14:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
As far as I understand, the testcase comes from fuzzing and so is not "real",
so I think the proposed "fix" isn't necessary (and it's not a real fix: adding
a setjmp call at the end of the function will bring the slowdown back).
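
To illustrate the point about setjmp: the check proposed in comment #8 tests cfun->calls_setjmp, which describes the function as a whole, so one setjmp call anywhere in the body is enough to re-enable the extra dependencies for every call in it. The function below is a hypothetical stand-in, not the fuzzer testcase; the names step and many_calls_then_setjmp are invented for this sketch.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper standing in for an arbitrary call.  */
static int step(int x)
{
    return x + 1;
}

/* Hypothetical shape of a problem function: a long run of calls
   (only three shown here) followed by a single setjmp.  Because the
   function as a whole calls setjmp, the proposed check would still
   treat all of the calls above it the slow way.  */
int many_calls_then_setjmp(int x)
{
    x = step(x);
    x = step(x);
    x = step(x);
    /* ... imagine thousands more calls here ... */

    if (setjmp(env) != 0)
        return -1;   /* only reached via longjmp, which this sketch never calls */
    return x;
}
```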


* [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1
  2024-03-07  1:37 [Bug rtl-optimization/114261] New: [13/14 Regression] Scheduling takes excessive time (97%) patrick at rivosinc dot com
                   ` (10 preceding siblings ...)
  2024-03-13 14:08 ` rguenth at gcc dot gnu.org
@ 2024-03-13 15:11 ` amonakov at gcc dot gnu.org
  11 siblings, 0 replies; 13+ messages in thread
From: amonakov at gcc dot gnu.org @ 2024-03-13 15:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261

--- Comment #10 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Indeed, but OTOH, according to bug 84402 comment 58, r13-5154 caused a
noticeable hit on gimple-match.cc compilation time:

733a1b777f16cd397b43a242d9c31761f66d3da8 13th January 2023
sched-deps: do not schedule pseudos across calls [PR108117] (Alexander Monakov)
Stage 2: +14%
Stage 3: +9%


In any case, if the proposed band-aid is unnecessary, that's fine with me.


Thread overview: 13+ messages
2024-03-07  1:37 [Bug rtl-optimization/114261] New: [13/14 Regression] Scheduling takes excessive time (97%) patrick at rivosinc dot com
2024-03-07  1:38 ` [Bug rtl-optimization/114261] " patrick at rivosinc dot com
2024-03-07  1:40 ` pinskia at gcc dot gnu.org
2024-03-07  1:40 ` patrick at rivosinc dot com
2024-03-07  7:30 ` amonakov at gcc dot gnu.org
2024-03-07  7:51 ` patrick at rivosinc dot com
2024-03-07 20:36 ` law at gcc dot gnu.org
2024-03-11 16:30 ` amonakov at gcc dot gnu.org
2024-03-12  7:49 ` [Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1 rguenth at gcc dot gnu.org
2024-03-13  3:09 ` law at gcc dot gnu.org
2024-03-13 13:56 ` amonakov at gcc dot gnu.org
2024-03-13 14:08 ` rguenth at gcc dot gnu.org
2024-03-13 15:11 ` amonakov at gcc dot gnu.org
