public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/54394] New: fatigue2 -flto run time regression
@ 2012-08-28 22:32 jamborm at gcc dot gnu.org
  2012-08-29 15:47 ` [Bug middle-end/54394] [4.8 Regression] " dominiq at lps dot ens.fr
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-08-28 22:32 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394

             Bug #: 54394
           Summary: fatigue2 -flto run time regression
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: jamborm@gcc.gnu.org
                CC: rguenth@gcc.gnu.org
              Host: x86_64-linux-gnu
            Target: x86_64-linux-gnu


Revision 190346 caused a large run time regression of the fatigue2
Polyhedron benchmark when run with -Ofast -flto.  On an x86_64-linux
box, the run time went from 150 seconds to 215 seconds, and there is a
similar percentage increase on my i686-linux desktop.
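
For reference, the two x86_64 timings above work out to a slowdown of
roughly 43%.  A minimal sketch of that arithmetic follows; the helper
name slowdown_percent is made up for illustration and is not part of
any benchmark harness.

```c
/* Relative slowdown, in percent, when the run time goes from BEFORE
   seconds to AFTER seconds.  Hypothetical helper for illustration.  */
static double
slowdown_percent (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Calling slowdown_percent (150.0, 215.0) gives a value a little above 43.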

The commit leading to that revision is:

    2012-08-13  Richard Guenther  <rguenther@suse.de>

        * basic-block.h (struct basic_block): Remove loop_depth
        member, move flags and index members next to each other.
        * cfgloop.h (bb_loop_depth): New inline function.
        * cfghooks.c (split_block): Do not set loop_depth.
        (duplicate_block): Likewise.
        * cfgloop.c (flow_loop_nodes_find): Likewise.
        (flow_loops_find): Likewise.
        (add_bb_to_loop): Likewise.
        (remove_bb_from_loops): Likewise.
        * cfgrtl.c (force_nonfallthru_and_redirect): Likewise.
        * gimple-streamer-in.c (input_bb): Do not stream loop_depth.
        * gimple-streamer-out.c (output_bb): Likewise.
        * bt-load.c: Include cfgloop.h.
        (migrate_btr_defs): Use bb_loop_depth.
        * cfg.c (dump_bb_info): Likewise.
        * final.c (compute_alignments): Likewise.
        * ira.c (update_equiv_regs): Likewise.
        * tree-ssa-copy.c (init_copy_prop): Likewise.
        * tree-ssa-dom.c (loop_depth_of_name): Likewise.
        * tree-ssa-forwprop.c: Include cfgloop.h.
        (forward_propagate_addr_expr): Use bb_loop_depth.
        * tree-ssa-pre.c (insert_into_preds_of_block): Likewise.
        * tree-ssa-sink.c (select_best_block): Likewise.
        * ipa-inline-analysis.c: Include cfgloop.h.
        (estimate_function_body_sizes): Use bb_loop_depth.
        * Makefile.in (tree-ssa-forwprop.o): Depend on $(CFGLOOP_H).
        (ipa-inline-analysis.o): Likewise.
        (bt-load.o): Likewise.

        * gcc.dg/tree-prof/update-loopch.c: Adjust.

I believe the patch was not supposed to alter compiler output in any
(significant) way.  However, inlining decisions are different (file 1
is the dump before the patch, file 2 with it):

  In file 1: extra inlining into function MAIN__.2477/17
    Function __computer_time_m_MOD_computer_time/13 inlined 1 times (as opposed to 0 times)
    Function __perdida_m_MOD_perdida/16 inlined 1 times (as opposed to 0 times)

  In file 2: extra inlining into function MAIN__.2477/17
    Function __free_input_MOD_convert_lower_case/9 inlined 1 times (as opposed to 0 times)
    Function __free_input_MOD_convert_lower_case.part.2.2390/62 inlined 1 times (as opposed to 0 times)
    Function __read_input_m_MOD_read_input/12 inlined 1 times (as opposed to 0 times)

  In file 2: extra un-inlined function __perdida_m_MOD_perdida/16
    Callers: 1, Callees: 27, Inlinees: 0

  In file 1: extra un-inlined function __read_input_m_MOD_read_input.constprop.0/122
    Originally a clone of __read_input_m_MOD_read_input/12
    Callers: 1, Callees: 530, Inlinees: 22

At the same time this does not seem to be an LTO issue because the
inline dump of the compilation (as opposed to linking) before the
patch contains lines:

    __perdida_m_MOD_perdida/9 function not considered for inlining
      loop depth: 2 freq:53666 size:21 time: 30 callee size: 0 stack: 0

which the patch changes to:

    __perdida_m_MOD_perdida/9 function not considered for inlining
      loop depth: 0 freq:53666 size:21 time: 30 callee size: 0 stack: 0

The only role LTO plays is that it lets the heuristics inline perdida
as a function called just once.  Loop depth 0 makes the candidate look
not beneficial (cold) even when we know there are no other callers.

Loop depth is zero because, at the time of inlining analysis,
bb->loop_father is NULL.  So it seems we need to compute loops at the
beginning of inline summary generation?
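
To illustrate why a missing loop_father yields depth 0, here is a toy
model.  It assumes only what is stated above: the depth is now derived
on demand from bb->loop_father rather than stored in the block, and a
block with no loop attached reports depth 0.  The toy_* structures and
functions are simplified stand-ins invented for this sketch; they are
not GCC's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for a loop-tree node and a basic block.
   Only the fields needed for the illustration are modelled.  */
struct toy_loop
{
  struct toy_loop *outer;        /* Enclosing loop, NULL for the root.  */
};

struct toy_bb
{
  struct toy_loop *loop_father;  /* NULL until loops are computed.  */
};

/* Number of real loops enclosing LOOP.  The root of the loop tree
   (standing for the whole function body) has depth 0.  */
static int
toy_loop_depth (const struct toy_loop *loop)
{
  int depth = 0;

  assert (loop != NULL);
  while (loop->outer != NULL)
    {
      depth++;
      loop = loop->outer;
    }
  return depth;
}

/* Depth derived on demand from loop_father.  Without loop
   information the block looks as if it were outside any loop.  */
static int
toy_bb_loop_depth (const struct toy_bb *bb)
{
  return bb->loop_father ? toy_loop_depth (bb->loop_father) : 0;
}
```

In this model a block nested in two loops reports depth 2 once its
loop_father points into the loop tree, and depth 0 while loop_father is
still NULL, which matches the change from "loop depth: 2" to
"loop depth: 0" in the dumps above.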



* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
  2012-08-28 22:32 [Bug middle-end/54394] New: fatigue2 -flto run time regression jamborm at gcc dot gnu.org
@ 2012-08-29 15:47 ` dominiq at lps dot ens.fr
  2012-08-29 18:28 ` jamborm at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-08-29 15:47 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|fatigue2 -flto run time     |[4.8 Regression] fatigue2
                   |regression                  |-flto run time regression

--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-08-29 15:46:58 UTC ---
I see something similar on x86_64-apple-darwin10, but with -fwhole-program and
not with -flto. With -Ofast the run time is ~150s both before and after
r190346, and it does not change with -flto.
At r190345 and before, the run time with -fwhole-program is ~100s (and ~56s
with --param max-inline-insns-auto=98 -funroll-loops), while it is ~142s at
r190346 and after.



* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
  2012-08-28 22:32 [Bug middle-end/54394] New: fatigue2 -flto run time regression jamborm at gcc dot gnu.org
  2012-08-29 15:47 ` [Bug middle-end/54394] [4.8 Regression] " dominiq at lps dot ens.fr
@ 2012-08-29 18:28 ` jamborm at gcc dot gnu.org
  2012-08-29 21:31 ` dominiq at lps dot ens.fr
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-08-29 18:28 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
                URL|                            |http://gcc.gnu.org/ml/gcc-p
                   |                            |atches/2012-08/msg01991.htm
                   |                            |l
   Last reconfirmed|                            |2012-08-29
         AssignedTo|unassigned at gcc dot       |jamborm at gcc dot gnu.org
                   |gnu.org                     |
     Ever Confirmed|0                           |1

--- Comment #2 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-29 18:28:41 UTC ---
I have posted a fix to the mailing list:

http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01991.html

You can try whether it fixes your regression too.



* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
  2012-08-28 22:32 [Bug middle-end/54394] New: fatigue2 -flto run time regression jamborm at gcc dot gnu.org
  2012-08-29 15:47 ` [Bug middle-end/54394] [4.8 Regression] " dominiq at lps dot ens.fr
  2012-08-29 18:28 ` jamborm at gcc dot gnu.org
@ 2012-08-29 21:31 ` dominiq at lps dot ens.fr
  2012-08-30 15:33 ` jamborm at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-08-29 21:31 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394

--- Comment #3 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-08-29 21:30:45 UTC ---
> You can try whether it fixes your regression too.

Yes, it does. Thanks.

Did you check if you get the same run time with -flto and -fwhole-program? If
yes, it would probably mean that -flto is not working as it should on darwin.
How could I check that?



* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
  2012-08-28 22:32 [Bug middle-end/54394] New: fatigue2 -flto run time regression jamborm at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2012-08-29 21:31 ` dominiq at lps dot ens.fr
@ 2012-08-30 15:33 ` jamborm at gcc dot gnu.org
  2012-08-31 13:13 ` jamborm at gcc dot gnu.org
  2012-08-31 13:27 ` jamborm at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-08-30 15:33 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394

--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-30 15:32:53 UTC ---
(In reply to comment #3)
> > You can try whether it fixes your regression too.
> 
> Yes, it does. Thanks.

Great, thanks.

> 
> Did you check if you get the same run time with -flto and -fwhole-program? If
> yes, it would probably mean that -flto is not working as it should on darwin.
> How could I check that?

With the patch, -fwhole-program is only 1% slower than -flto on my
i686 desktop (the x86_64 machine is currently running a bootstrap).

-flto can give "better than -fwhole-program" results when you use a
linker plugin.  I do not know the status of that on darwin (whether
it works at all, what minimum ld version you need, etc.), so you will
have to look into that yourself.



* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
  2012-08-28 22:32 [Bug middle-end/54394] New: fatigue2 -flto run time regression jamborm at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2012-08-30 15:33 ` jamborm at gcc dot gnu.org
@ 2012-08-31 13:13 ` jamborm at gcc dot gnu.org
  2012-08-31 13:27 ` jamborm at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-08-31 13:13 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394

--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-31 13:13:09 UTC ---
Author: jamborm
Date: Fri Aug 31 13:13:03 2012
New Revision: 190831

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190831
Log:
2012-08-31  Martin Jambor  <mjambor@suse.cz>

    PR middle-end/54394
    * ipa-inline-analysis.c (estimate_function_body_sizes): Compute
    dominance info and loops whenever optimizing.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline-analysis.c
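
The idea in the log entry above (make loop information available
before the body-size analysis reads loop depths, whenever optimizing)
can be sketched with a toy model.  This is not the committed patch:
every sketch_* name is hypothetical, and "computing loops" is reduced
to setting a flag so that the ordering issue is the only thing shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified block; none of these names are GCC's.  */
struct sketch_block
{
  int nesting;        /* True loop nesting of the block.  */
  bool loops_known;   /* Set once loop information is computed.  */
};

/* Stand-in for computing loop information over a function body.  */
static void
sketch_compute_loops (struct sketch_block *blocks, size_t n)
{
  for (size_t i = 0; i < n; i++)
    blocks[i].loops_known = true;
}

/* Depth as the analysis sees it: 0 unless loop info is available.  */
static int
sketch_visible_depth (const struct sketch_block *bb)
{
  return bb->loops_known ? bb->nesting : 0;
}

/* Largest loop depth the analysis observes.  When OPTIMIZING is set,
   loop information is computed first, so real depths are visible.  */
static int
sketch_max_depth (struct sketch_block *blocks, size_t n, bool optimizing)
{
  int max = 0;

  if (optimizing)
    sketch_compute_loops (blocks, n);

  for (size_t i = 0; i < n; i++)
    {
      int depth = sketch_visible_depth (&blocks[i]);
      if (depth > max)
        max = depth;
    }
  return max;
}
```

With a block whose nesting is 2, sketch_max_depth reports 0 when loop
information is never computed and 2 when it is computed first.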



* [Bug middle-end/54394] [4.8 Regression] fatigue2 -flto run time regression
  2012-08-28 22:32 [Bug middle-end/54394] New: fatigue2 -flto run time regression jamborm at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2012-08-31 13:13 ` jamborm at gcc dot gnu.org
@ 2012-08-31 13:27 ` jamborm at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jamborm at gcc dot gnu.org @ 2012-08-31 13:27 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-31 13:27:08 UTC ---
Fixed.



end of thread, other threads:[~2012-08-31 13:27 UTC | newest]

Thread overview: 7+ messages
2012-08-28 22:32 [Bug middle-end/54394] New: fatigue2 -flto run time regression jamborm at gcc dot gnu.org
2012-08-29 15:47 ` [Bug middle-end/54394] [4.8 Regression] " dominiq at lps dot ens.fr
2012-08-29 18:28 ` jamborm at gcc dot gnu.org
2012-08-29 21:31 ` dominiq at lps dot ens.fr
2012-08-30 15:33 ` jamborm at gcc dot gnu.org
2012-08-31 13:13 ` jamborm at gcc dot gnu.org
2012-08-31 13:27 ` jamborm at gcc dot gnu.org
