[Bug tree-optimization/108705] New: Unexpected CPU time usage with LTO in ranger propagation

From: "rimvydas.jas at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108705] New: Unexpected CPU time usage with LTO in ranger propagation
Date: Wed, 08 Feb 2023 00:00:25 +0000
Message-ID: <bug-108705-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108705

            Bug ID: 108705
           Summary: Unexpected CPU time usage with LTO in ranger
                    propagation
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rimvydas.jas at gmail dot com
  Target Milestone: ---

A heavily reduced testcase that still reproduces the problem on trunk
configured with --enable-checking=release.

$ cat hog.f90  # foo() and bar() are in separate units in original case
subroutine bar(n,m,p,s) ! in bar.f90
implicit none
integer :: n,m
real,intent(inout) :: p(n),s(*)
!real,intent(inout) :: p(:),s(:) ! gives slower growth
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
call foo(n,m,p,s)
! ...
!call foo(n,m,p,s)
end subroutine bar

subroutine foo(n,m,p,b) ! in foo.f90
implicit none
integer :: n,m,j
real,intent(inout) :: p(n),b(*)
do j=1,n
  b(m+j-1)=p(j)
enddo
m=m+n ! <---- problematic part
end subroutine foo
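
Since the run time depends on how many times foo() is called, it may help to
regenerate the testcase with an arbitrary number of calls.  The following is a
minimal sketch, not part of the original report: the function name gen_hog is
hypothetical, and it simply prints the same two subroutines as above with the
requested number of call statements.

```shell
# gen_hog N: print the reduced testcase with N consecutive calls to foo().
# (gen_hog is a hypothetical helper name, not something shipped with GCC.)
gen_hog() {
  n_calls=$1
  cat <<'EOF'
subroutine bar(n,m,p,s) ! in bar.f90
implicit none
integer :: n,m
real,intent(inout) :: p(n),s(*)
EOF
  i=0
  while [ "$i" -lt "$n_calls" ]; do
    echo 'call foo(n,m,p,s)'
    i=$((i + 1))
  done
  cat <<'EOF'
end subroutine bar

subroutine foo(n,m,p,b) ! in foo.f90
implicit none
integer :: n,m,j
real,intent(inout) :: p(n),b(*)
do j=1,n
  b(m+j-1)=p(j)
enddo
m=m+n ! <---- problematic part
end subroutine foo
EOF
}

# Example (commented out so the sketch has no side effects):
# gen_hog 15 > hog.f90
```

The generated hog.f90 can then be compiled and timed with the gfortran and
lto1 commands used in this report.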

$ gfortran -Wall -Wextra -O2 -flto -c hog.f90 # mimic ccBemfcW.ltrans23.o
$ lto1 -ftime-report -fchecking=0 hog.o # -fltrans /tmp/ccBemfcW.ltrans23.o
Reading object files: hog.o {GC 2518k}  {heap 1028k}
Reading the symbol table:
Merging declarations: {GC 2520k}  {heap 1028k}
Reading summaries: <odr> {GC 2520k}  {heap 1028k} <profile_estimate> {GC 2520k}
 {heap 1028k} <icf> {GC 2520k}  {heap 1028k} <fnsummary> {GC 2530k}  {heap
1168k} <pure-const> {GC 2530k}  {heap 1168k} <modref> {GC 2532k}  {heap 1168k}
{GC 2532k} 
Merging symbols: {heap 1168k}Reading function bodies:
Performing interprocedural optimizations
 <odr> {heap 1168k} <whole-program> {heap 1168k} <profile_estimate> {heap
1168k} <icf> {heap 1360k} <devirt> {heap 1360k} <cp> {heap 1360k} <cdtor> {heap
1360k} <fnsummary> {heap 1360k} <inline> {heap 1360k} <pure-const> {heap 1360k}
<modref> {heap 1360k} <free-fnsummary> {heap 1360k} <static-var> {heap 1360k}
<single-use> {heap 1360k} <comdats> {heap 1360k}Assembling functions:
 <simdclone> {heap 1360k} foo in:foo_ bar in:bar_
Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)  2583k ( 67%)
 phase opt and generate             :1490.95 (100%)   0.00 (  0%)1491.02 (100%)  1230k ( 32%)
 callgraph functions expansion      :1490.95 (100%)   0.00 (  0%)1491.02 (100%)  1201k ( 31%)
 tree VRP                           :   5.08 (  0%)   0.00 (  0%)   5.07 (  0%)    84k (  2%)
 dominator optimization             :1485.85 (100%)   0.00 (  0%)1485.92 (100%)    16k (  0%)
 tree canonical iv                  :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    71k (  2%)
 tree loop distribution             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   115k (  3%)
 dead store elim1                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  2304  (  0%)
 combiner                           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    42k (  1%)
 initialize rtl                     :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    12k (  0%)
 TOTAL                              :1490.95          0.00       1491.02         3846k

Beyond 10 calls, each additional call multiplies the total time by roughly a
factor of six: 0.21, 1.07, 6.29, 38.74, 238.24, 1490.95, 9424.12, ... seconds
of CPU time.
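
To make the growth rate explicit, the ratio between consecutive timings can be
computed directly from the numbers listed above.  This is only a small sketch:
growth_ratios is a hypothetical helper name, and the input is just the list of
CPU times quoted in this report.

```shell
# growth_ratios: read one timing per line on stdin and print the ratio of
# each value to the previous one (hypothetical helper, for illustration).
growth_ratios() {
  LC_ALL=C awk 'NR > 1 { printf "%.2f\n", $1 / prev } { prev = $1 }'
}

# CPU times (seconds) quoted above, one per additional call to foo().
printf '%s\n' 0.21 1.07 6.29 38.74 238.24 1490.95 9424.12 | growth_ratios
```

Every ratio lies between 5 and 6.5, which suggests exponential rather than
polynomial growth in the number of calls.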

This is not a problem for non-LTO builds, since each object needs to be
compiled only once for all executables in the project.  With LTO, however, any
executable that has the problematic subroutines in its call graph *and* has
such unit pairs in the same ltrans partition has to be compiled from scratch
over and over.  In the original case, the final LTO link of the first
executable with -flto=20 was still compiling the last 3 ltrans partitions
after 25h+ (gcc-12 is fine).

It is quite hard to tell where lto1 gets "stuck" (as in an infinite loop; no
new output gets added to the assembly outputs either).  One has to ps(1) the
full command line, grab /tmp/ccBemfcW.ltrans23.o and manually invoke lto1 to
see where the problematic code units could be.  Also, there is no support for
attributes to work around such problems, e.g.:
!GCC$ ATTRIBUTES noclone,noinline :: foo
while LTO keeps getting better at detecting "strategically placed debug
write() statements".
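
The ps(1) step described above can be scripted.  The sketch below is an
assumption about how one might do it, not an established tool: ltrans_objects
is a hypothetical helper that only filters ps-style lines for arguments that
look like ltrans objects, so they can be copied aside before the link is
interrupted.

```shell
# ltrans_objects: read ps-style lines on stdin and print each distinct
# argument that looks like an ltrans object (e.g. /tmp/ccXXXXXX.ltrans23.o).
# (Hypothetical helper; the exact lto1 command line may differ per setup.)
ltrans_objects() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\.ltrans[0-9]+\.o$/) print $i }' | sort -u
}

# Example (commented out; needs a running LTO link):
# ps -eo pid,etime,args | grep '[l]to1' | ltrans_objects
```

Each printed object can then be passed to lto1 -ftime-report -fchecking=0 by
hand, as shown earlier for hog.o.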
