[Bug middle-end/54146] New: Very slow compile at -O1 (expand vars)

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars)
@ 2012-07-31 18:33 glisse at gcc dot gnu.org
  2012-07-31 18:38 ` [Bug middle-end/54146] " pinskia at gcc dot gnu.org
                   ` (64 more replies)
  0 siblings, 65 replies; 66+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-07-31 18:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

             Bug #: 54146
           Summary: Very slow compile at -O1 (expand vars)
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: glisse@gcc.gnu.org


Created attachment 27912
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27912
testcase

Hello,

trying to compile the attached program on x86_64, g++ slow.cc -frounding-math
-c -w -O2 fails after about an hour. The memory consumption (as shown by top)
never went above 1.5G. clang compiles the whole thing in 1 minute (well, it
won't compile the preprocessed source attached here, but the original program).

If you remove most of the check<...> lines at then end (keep only the first
one), the program takes about 10s to compile at -O0, and 30s at -O1 (or -O2,
-O3), with 2/3 of the time spent in "expand vars" according to -ftime-report.
The most relevant report I found in bugzilla is PR 38474, but it seems to say
that expand vars should be fast now...


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
@ 2012-07-31 18:38 ` pinskia at gcc dot gnu.org
  2012-07-31 18:48 ` glisse at gcc dot gnu.org
                   ` (63 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-07-31 18:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-07-31 18:38:20 UTC ---
Are you compiling GCC with --enable-checking=release to do the timings?


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
  2012-07-31 18:38 ` [Bug middle-end/54146] " pinskia at gcc dot gnu.org
@ 2012-07-31 18:48 ` glisse at gcc dot gnu.org
  2012-07-31 20:17 ` steven at gcc dot gnu.org
                   ` (62 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-07-31 18:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> 2012-07-31 18:47:48 UTC ---
(In reply to comment #1)
> Are you compiling GCC with --enable-checking=release to do the timings?

I first noticed the issue with the compiler provided by debian (4.7.1), which
has --enable-checking=release. I then checked with a snapshot from 3 days ago
which doesn't have the option, and the results were about the same.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
  2012-07-31 18:38 ` [Bug middle-end/54146] " pinskia at gcc dot gnu.org
  2012-07-31 18:48 ` glisse at gcc dot gnu.org
@ 2012-07-31 20:17 ` steven at gcc dot gnu.org
  2012-07-31 20:34 ` steven at gcc dot gnu.org
                   ` (61 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-31 20:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-07-31
                 CC|                            |steven at gcc dot gnu.org
     Ever Confirmed|0                           |1


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2012-07-31 20:17 ` steven at gcc dot gnu.org
@ 2012-07-31 20:34 ` steven at gcc dot gnu.org
  2012-07-31 22:17 ` steven at gcc dot gnu.org
                   ` (60 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-31 20:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-31 20:33:56 UTC ---
Time is spent in add_scope_conflicts() in this loop:

  FOR_EACH_BB (bb)
    add_scope_conflicts_1 (bb, work, true);


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2012-07-31 20:34 ` steven at gcc dot gnu.org
@ 2012-07-31 22:17 ` steven at gcc dot gnu.org
  2012-07-31 23:00 ` steven at gcc dot gnu.org
                   ` (59 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-31 22:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-31 22:16:58 UTC ---
Created attachment 27915
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27915
Speed up stack var conflict matric computation

The patch speeds up the conflict matrix computation in three ways:

1. Nested EXECUTE_IF_SET_IN_BITMAP is extremely inefficient. The same conflicts
result if one IORs the bitmaps.

2. Self-conflicts should not be recorded (this helps stack_var_conflict_p too).

3. The conflicts matrix can be stored in upper triangular form.

Debugging, bootstrapping, testing, and other activities involved in
contributing this formally are left as an exercise to the reader... ;-)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2012-07-31 22:17 ` steven at gcc dot gnu.org
@ 2012-07-31 23:00 ` steven at gcc dot gnu.org
  2012-08-01 12:22 ` steven at gcc dot gnu.org
                   ` (58 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-31 23:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27915|0                           |1
        is obsolete|                            |

--- Comment #5 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-31 22:59:53 UTC ---
Created attachment 27917
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27917
Slighty less broken version of the previous attachment


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2012-07-31 23:00 ` steven at gcc dot gnu.org
@ 2012-08-01 12:22 ` steven at gcc dot gnu.org
  2012-08-01 14:08 ` steven at gcc dot gnu.org
                   ` (57 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-01 12:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #6 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-01 12:22:29 UTC ---
With my patch applied, and with a couple of lines commented out at the bottom:

//  check<CGAL::Quotient<CGAL::Gmpz> >();
//  check<CGAL::Lazy_exact_nt<CGAL::Gmpq> >();
//  check<CORE::BigInt>();
//  check<CORE::BigRat>();
//  check<CORE::BigFloat>();
//  check<CORE::Expr>();

and with the compiler patched with the patch from comment #5 and configured as:
$ cat configargs.h
/* Generated automatically. */
static const char configuration_arguments[] = "../trunk/configure
--with-mpfr=/opt/cfarm/mpfr-latest --with-gmp=/opt/cfarm/gmp-latest
--with-mpc=/opt/cfarm/mpc-latest --with-isl=/opt/cfarm/isl-latest
--with-cloog=/opt/cfarm/cloog-latest --enable-languages=c,c++ --disable-nls
--disable-libmudflap --disable-libssp --disable-libitm --disable-multilib
--disable-bootstrap --enable-checking=release";
static const char thread_model[] = "posix";

static const struct {
  const char *name, *value;
} configure_default_options[] = { { "cpu", "generic" }, { "arch", "x86-64" } };


I have "TOTAL :8223.99" seconds compile time. Top 10 big spenders:

 inline heuristics       :6393.58 (78%) usr 175699 kB (27%) ggc
 tree loop init          : 270.76 ( 3%) usr 242922 kB (37%) ggc
 if-conversion           : 220.55 ( 3%) usr   3774 kB ( 1%) ggc
 integrated RA           : 199.12 ( 2%) usr 593696 kB (90%) ggc
 reload                  : 128.67 ( 2%) usr  52399 kB ( 8%) ggc
 tree SSA incremental    : 195.26 ( 2%) usr 160581 kB (24%) ggc
 tree SSA rewrite        : 112.87 ( 1%) usr  39761 kB ( 6%) ggc
 df live regs            :  79.90 ( 1%) usr      0 kB ( 0%) ggc
 df live&initialized regs:  77.14 ( 1%) usr      0 kB ( 0%) ggc
 out of ssa              :  63.59 ( 1%) usr    661 kB ( 0%) ggc

The inline heuristics stuff is probably also due to stack-vars handling, I will
look into that.

The loop init stuff is something for Richi.

IRA produces the most garbage, by far, and in fact causes OOM if I do not
comment out those last few lines.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2012-08-01 12:22 ` steven at gcc dot gnu.org
@ 2012-08-01 14:08 ` steven at gcc dot gnu.org
  2012-08-01 14:14 ` steven at gcc dot gnu.org
                   ` (56 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-01 14:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #7 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-01 14:08:01 UTC ---
(In reply to comment #6)
> The inline heuristics stuff is probably also due to stack-vars handling,
> I will look into that.

Turns out this is due to the use of attribute((flatten)) on gebp_kernel.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2012-08-01 14:08 ` steven at gcc dot gnu.org
@ 2012-08-01 14:14 ` steven at gcc dot gnu.org
  2012-08-01 14:25 ` steven at gcc dot gnu.org
                   ` (55 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-01 14:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #8 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-01 14:14:11 UTC ---
With the attribute((flatten)) removed, the full test case compiles in less than
a minute.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2012-08-01 14:14 ` steven at gcc dot gnu.org
@ 2012-08-01 14:25 ` steven at gcc dot gnu.org
  2012-08-01 14:32 ` rguenth at gcc dot gnu.org
                   ` (54 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-01 14:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #9 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-01 14:24:48 UTC ---
clang compiles the test case with attribute((flatten)) because it doesn't
support that attribute (http://llvm.org/bugs/show_bug.cgi?id=7559).

I'm beginning to think this is one of those cases of "Doctor it hurts if I ..."
that should be closed as WONTFIX. Marc, do you know where the use of the
flatten attribute comes from in your code?


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2012-08-01 14:25 ` steven at gcc dot gnu.org
@ 2012-08-01 14:32 ` rguenth at gcc dot gnu.org
  2012-08-01 16:00 ` glisse at gcc dot gnu.org
                   ` (53 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-01 14:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #10 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-01 14:32:15 UTC ---
(In reply to comment #9)
> clang compiles the test case with attribute((flatten)) because it doesn't
> support that attribute (http://llvm.org/bugs/show_bug.cgi?id=7559).
> 
> I'm beginning to think this is one of those cases of "Doctor it hurts if I ..."
> that should be closed as WONTFIX. Marc, do you know where the use of the
> flatten attribute comes from in your code?

indeed "flatten" will override any inlining heuristic that avoids creating
gigant function bodies.  Still eventually improving worst-case performance
of some of our algorithms sounds good, even if it will never fix all issues
that pop up when you for example put flatten on main () ;)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile at -O1 (expand vars)
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2012-08-01 14:32 ` rguenth at gcc dot gnu.org
@ 2012-08-01 16:00 ` glisse at gcc dot gnu.org
  2012-08-02 10:01 ` [Bug tree-optimization/54146] Very slow compile with attribute((flatten)) steven at gcc dot gnu.org
                   ` (52 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-08-01 16:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #11 from Marc Glisse <glisse at gcc dot gnu.org> 2012-08-01 15:59:43 UTC ---
(In reply to comment #9)
> I'm beginning to think this is one of those cases of "Doctor it hurts if I ..."
> that should be closed as WONTFIX.

Indeed, might even call it INVALID. Thanks a lot for the investigation and
sorry for the bad example. Hopefully the patches you wrote are still useful.

> Marc, do you know where the use of the
> flatten attribute comes from in your code?

Comes from the Eigen library, I'll talk to them about it and see if they can
comment. They mostly deal with simple number types (double, float), so the
behavior with heavy types might have been unnoticed.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2012-08-01 16:00 ` glisse at gcc dot gnu.org
@ 2012-08-02 10:01 ` steven at gcc dot gnu.org
  2012-08-02 10:37 ` rguenther at suse dot de
                   ` (51 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-02 10:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |tree-optimization
            Summary|Very slow compile at -O1    |Very slow compile with
                   |(expand vars)               |attribute((flatten))

--- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-02 10:01:03 UTC ---
(In reply to comment #10)
> indeed "flatten" will override any inlining heuristic that avoids creating
> gigant function bodies.  Still eventually improving worst-case performance
> of some of our algorithms sounds good, even if it will never fix all issues
> that pop up when you for example put flatten on main () ;)

Feel free to have a stab at it, if you must.

It would be great if you could have a look at the patch I attached to this bug
report for stack var conflicts (I'd finish it myself, but I won't have time for
it before September or so).

Oh, and document your loop changes, because loop_optimizer_init accounts for
~20% of the compile time (half of it incorrectly accounted to if-conversion,
I'll post a bunch oftimevar fixes this week).

Note that the flattened functions in themselves do not present much of a
problem to the compiler (not at -O1, anyway), so the problem here is not the
by-passing of the inlining heuristics per-se. It is the flattening operation
itself that causes the compiler to explode. You can only see this by adding a
timevar around flatten_function (I used TV_LOCAL_ALLOC for that, but I'll add a
proper timevar for it :-) because the compiler as-is incorrectly accounts the
time for flatten_function to TV_INLINE_HEURISTICS. There must be something
non-linear in the flattening algorithm. I'm still trying to find out where, but
it's kinda hard to debug a problem with a test case that takes literally hours
to compile :-)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2012-08-02 10:01 ` [Bug tree-optimization/54146] Very slow compile with attribute((flatten)) steven at gcc dot gnu.org
@ 2012-08-02 10:37 ` rguenther at suse dot de
  2012-08-03  9:00 ` steven at gcc dot gnu.org
                   ` (50 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenther at suse dot de @ 2012-08-02 10:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> 2012-08-02 10:35:56 UTC ---
On Thu, 2 Aug 2012, steven at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146
> 
> Steven Bosscher <steven at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>           Component|middle-end                  |tree-optimization
>             Summary|Very slow compile at -O1    |Very slow compile with
>                    |(expand vars)               |attribute((flatten))
> 
> --- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-02 10:01:03 UTC ---
> (In reply to comment #10)
> > indeed "flatten" will override any inlining heuristic that avoids creating
> > gigant function bodies.  Still eventually improving worst-case performance
> > of some of our algorithms sounds good, even if it will never fix all issues
> > that pop up when you for example put flatten on main () ;)
> 
> Feel free to have a stab at it, if you must.
> 
> It would be great if you could have a look at the patch I attached to this bug
> report for stack var conflicts (I'd finish it myself, but I won't have time for
> it before September or so).

It looks sensible - I hope Micha will pick it up though, I'll ping him
personally.

> Oh, and document your loop changes, because loop_optimizer_init accounts for
> ~20% of the compile time (half of it incorrectly accounted to if-conversion,
> I'll post a bunch oftimevar fixes this week).

It's on my list of things (for stage3, I really want to push more invasive
cleanups for stage1 ... heh).

> Note that the flattened functions in themselves do not present much of a
> problem to the compiler (not at -O1, anyway), so the problem here is not the
> by-passing of the inlining heuristics per-se. It is the flattening operation
> itself that causes the compiler to explode. You can only see this by adding a
> timevar around flatten_function (I used TV_LOCAL_ALLOC for that, but I'll add a
> proper timevar for it :-) because the compiler as-is incorrectly accounts the
> time for flatten_function to TV_INLINE_HEURISTICS. There must be something
> non-linear in the flattening algorithm. I'm still trying to find out where, but
> it's kinda hard to debug a problem with a test case that takes literally hours
> to compile :-)

I suppose it's the old issue that we update fibheap keys along each
inlining decision - and with flatten there are just very many ... Honza?
It also seems we flatten depth-first (thus inline leafs) instead of
the other way around:

      orig_callee = callee;
      inline_call (e, true, NULL, NULL);
      if (e->callee != orig_callee)
        orig_callee->symbol.aux = (void *) node;
      flatten_function (e->callee, early);
      if (e->callee != orig_callee)
        orig_callee->symbol.aux = NULL;

This means we will materialize all intermediate flattenings in
functions that will not be reclaimed, right?  flattening foo
should inline everything into foo, but not affect remaining
callee bodies.

Richard.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2012-08-02 10:37 ` rguenther at suse dot de
@ 2012-08-03  9:00 ` steven at gcc dot gnu.org
  2012-08-03 14:44 ` matz at gcc dot gnu.org
                   ` (49 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-03  9:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #14 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-03 08:59:29 UTC ---
Created attachment 27930
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27930
Do not inline_merge_summary if called via flatten_function

As I noted in comment #12, the flattening itself is the problem, not the
heuristics.  I have narrowed this down to inline_merge_summary. With the
attached hack, compile time (for only "check<CGAL::Gmpfi>();") drops from 6459s
to 1560s. I hi-jacked the unused timevar TV_LOCAL_ALLOC (from pre-IRA days) to
measure the time spent in inline_call. Without my hack, this is 4801s, and with
the hack it is only 0.37s (!). Do the math: 6459s-4801s=1658s~=1560s.
(NB: These timings are all with my stack_vars patch applied as well.)

With the hack, the remaining compile time is mostly spent in:
 tree loop init                  : 236.16
 integrated RA                   : 215.70
 if-conversion                   : 177.26 (but due to loop_optimizer_init)
 tree SSA incremental            : 154.69
 reload                          : 124.33

I don't understand how inline_merge_summary is supposed to work, so I'm going
to leave that one for Richi and Honza.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2012-08-03  9:00 ` steven at gcc dot gnu.org
@ 2012-08-03 14:44 ` matz at gcc dot gnu.org
  2012-08-05 11:39 ` steven at gcc dot gnu.org
                   ` (48 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: matz at gcc dot gnu.org @ 2012-08-03 14:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #15 from Michael Matz <matz at gcc dot gnu.org> 2012-08-03 14:43:14 UTC ---
Author: matz
Date: Fri Aug  3 14:43:09 2012
New Revision: 190126

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190126
Log:
    PR tree-optimization/54146
    * cfgexpand.c (add_scope_conflicts_1): Use bitmap_ior_into.
    (add_scope_conflicts): Iterate in RPO order.
    (add_stack_protection_conflicts): Iterate over the other triangle.
    (fini_vars_expansion): Clear stack_vars_sorted.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2012-08-03 14:44 ` matz at gcc dot gnu.org
@ 2012-08-05 11:39 ` steven at gcc dot gnu.org
  2012-08-05 18:49 ` steven at gcc dot gnu.org
                   ` (47 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-05 11:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27917|0                           |1
        is obsolete|                            |

--- Comment #16 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-05 11:38:36 UTC ---
Comment on attachment 27917
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27917
Slighty less broken version of the previous attachment

Patch obsolete, similar fix committed as r190126:
http://gcc.gnu.org/viewcvs?view=revision&revision=190126


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2012-08-05 11:39 ` steven at gcc dot gnu.org
@ 2012-08-05 18:49 ` steven at gcc dot gnu.org
  2012-08-05 21:14 ` steven at gcc dot gnu.org
                   ` (46 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-05 18:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #17 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-05 18:48:55 UTC ---
(In reply to comment #14)
>  if-conversion             : 177.26 (but due to loop_optimizer_init)

Hmm, this is not loop_optimizer_init. All time is spent in the two memset calls
in cond_move_process_if_block:

  /* Build a mapping for each block to the value used for each
     register.  */
  max_reg = max_reg_num ();
  size = (max_reg + 1) * sizeof (rtx);
  then_vals = (rtx *) alloca (size);
  else_vals = (rtx *) alloca (size);
  memset (then_vals, 0, size);
  memset (else_vals, 0, size);

There are O(1e6) registers in the test case, and O(1e5) basic blocks. So we end
up memset'ing O(10)MB X O(1e5)times = O(1e6)MB (assuming 64 bits pointers so
that each rtx is O(10) bytes).

I have a few ideas to fix this problem.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2012-08-05 18:49 ` steven at gcc dot gnu.org
@ 2012-08-05 21:14 ` steven at gcc dot gnu.org
  2012-08-06  8:46 ` rguenth at gcc dot gnu.org
                   ` (45 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-05 21:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #18 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-05 21:13:26 UTC ---
Created attachment 27946
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27946
Speed up ifcvt.c:check_cond_move_block

This speeds up the pre-reload if-conversion passes by using a smarter data
structure to keep track of register values.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2012-08-05 21:14 ` steven at gcc dot gnu.org
@ 2012-08-06  8:46 ` rguenth at gcc dot gnu.org
  2012-08-06  9:10 ` stevenb.gcc at gmail dot com
                   ` (44 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-06  8:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #19 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-06 08:45:12 UTC ---
(In reply to comment #17)
> (In reply to comment #14)
> >  if-conversion             : 177.26 (but due to loop_optimizer_init)
> 
> Hmm, this is not loop_optimizer_init. All time is spent in the two memset calls
> in cond_move_process_if_block:
> 
>   /* Build a mapping for each block to the value used for each
>      register.  */
>   max_reg = max_reg_num ();
>   size = (max_reg + 1) * sizeof (rtx);
>   then_vals = (rtx *) alloca (size);
>   else_vals = (rtx *) alloca (size);
>   memset (then_vals, 0, size);
>   memset (else_vals, 0, size);
> 
> There are O(1e6) registers in the test case, and O(1e5) basic blocks. So we end
> up memset'ing O(10)MB X O(1e5)times = O(1e6)MB (assuming 64 bits pointers so
> that each rtx is O(10) bytes).
> 
> I have a few ideas to fix this problem.

Ick, I suppose similar issues exist on the tree level for passes that
think that memory / compile-time usage O(number-of-ssa-names or basic-blocks)
is "ok" (and I suppose it really _is_ ok ...?)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2012-08-06  8:46 ` rguenth at gcc dot gnu.org
@ 2012-08-06  9:10 ` stevenb.gcc at gmail dot com
  2012-08-06 15:36 ` steven at gcc dot gnu.org
                   ` (43 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: stevenb.gcc at gmail dot com @ 2012-08-06  9:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #20 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-06 09:09:02 UTC ---
On Mon, Aug 6, 2012 at 10:45 AM, rguenth at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> Ick, I suppose similar issues exist on the tree level for passes that
> think that memory / compile-time usage O(number-of-ssa-names or basic-blocks)
> is "ok" (and I suppose it really _is_ ok ...?)

I think this is OK, yes. In general, anything linear in some measure
of the function should be OK.
NB, this thing in ifcvt is *not* O(max_reg_num()) but
O(max_reg_num()*n_basic_blocks).

I suspect that in the tree optimizers, a non-linear issue exists for
rewriting into loop-closed SSA form. It looks like it is
O(num_ssa_names*n_basic_blocks). I'm trying to confirm that (in my
evening free time, so don't hold your breath ;-).


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2012-08-06  9:10 ` stevenb.gcc at gmail dot com
@ 2012-08-06 15:36 ` steven at gcc dot gnu.org
  2012-08-06 19:37 ` steven at gcc dot gnu.org
                   ` (42 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-06 15:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #21 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-06 15:36:04 UTC ---
(In reply to comment #20)

Yay, it's always nice to be right the first time when diagnosing a problem.

The tree loop optimizers spend 285s out of 1360s total compile time (with my
flatten hack and ifcvt patch applied) in compute_global_livein. That's 21% of
the total compile time.

I have a few ideas for how to fix this.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2012-08-06 15:36 ` steven at gcc dot gnu.org
@ 2012-08-06 19:37 ` steven at gcc dot gnu.org
  2012-08-06 20:22 ` steven at gcc dot gnu.org
                   ` (41 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-06 19:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #22 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-06 19:36:35 UTC ---
IRA/reload spends a rather significant amount of time here:

  FOR_EACH_BB_REVERSE (bb)
    { 
      bitmap_iterator bi;
      rtx insn;

      CLEAR_REG_SET (live_relevant_regs);
-->   memset (live_subregs_used, 0, max_regno * sizeof (int));

This is running with max_regno=O(1e6) and n_basic_blocks=O(1e5) ...


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2012-08-06 19:37 ` steven at gcc dot gnu.org
@ 2012-08-06 20:22 ` steven at gcc dot gnu.org
  2012-08-06 20:56 ` steven at gcc dot gnu.org
                   ` (40 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-06 20:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #23 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-06 20:22:02 UTC ---
Created attachment 27953
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27953
Be memory friendlier in build_insn_chain

My first ever reload patch! :-)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2012-08-06 20:22 ` steven at gcc dot gnu.org
@ 2012-08-06 20:56 ` steven at gcc dot gnu.org
  2012-08-06 22:43 ` steven at gcc dot gnu.org
                   ` (39 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-06 20:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #24 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-06 20:55:37 UTC ---
(In reply to comment #23)
> Created attachment 27953 [details]

Needs this extra bit:

diff -u ira.c ira.c
--- ira.c       (working copy)
+++ ira.c       (working copy)
@@ -3539,7 +3539,8 @@
   *p = NULL;

   for (i = 0; i < (unsigned int) max_regno; i++)
-    sbitmap_free (live_subregs[i]);
+    if (live_subregs[i] != NULL)
+      sbitmap_free (live_subregs[i]);
   free (live_subregs);
   BITMAP_FREE (live_subregs_used);
   BITMAP_FREE (live_relevant_regs);


With that, reload time goes down to ~14s, down from ~124s.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2012-08-06 20:56 ` steven at gcc dot gnu.org
@ 2012-08-06 22:43 ` steven at gcc dot gnu.org
  2012-08-06 22:59 ` steven at gcc dot gnu.org
                   ` (38 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-06 22:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #25 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-06 22:42:12 UTC ---
(In reply to comment #21)
> The tree loop optimizers spend 285s out of 1360s total compile time (with my
> flatten hack and ifcvt patch applied) in compute_global_livein. That's 21% of
> the total compile time.

Hmm, I've underestimated this problem. It's not a simple matter of speeding up
compute_global_livein with a few caches. I've collected some statistics:
* The largest function has 408254 basic blocks
* The largest number of blocks visited in compute_global_livein is 185939
* compute_global_livein looked at 148056 edges to visit those blocks
* The maximum size of the work list was only 6419

185939 is the number of basic blocks that end up in livein. That is a bitmap,
so most time is spent in traversing bitmap linked lists.

Perhaps this can be fixed with a fibheap, which may improve the access pattern
on the livein bitmap.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2012-08-06 22:43 ` steven at gcc dot gnu.org
@ 2012-08-06 22:59 ` steven at gcc dot gnu.org
  2012-08-07 11:59 ` rguenth at gcc dot gnu.org
                   ` (37 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-06 22:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #26 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-06 22:58:22 UTC ---
(In reply to comment #25)
> 185939 is the number of basic blocks that end up in livein. That is a bitmap,
> so most time is spent in traversing bitmap linked lists.

Oh, and this doesn't happen just once, but several 100 times.
Only to prune the set to (livein & loop_exits) ...


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2012-08-06 22:59 ` steven at gcc dot gnu.org
@ 2012-08-07 11:59 ` rguenth at gcc dot gnu.org
  2012-08-07 19:58 ` steven at gcc dot gnu.org
                   ` (36 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-07 11:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #27 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-07 11:59:07 UTC ---
Martin, can you look at comment #14 and the patch?  I think what we want to
do in flatten_function is before

      inline_call (e, true, NULL, NULL);

reset the edge predicates so that inline_merge_summary becomes very cheap.
Unfortunately that beast seems to have no early out (but instead uses
true_predicate () ...).  Can we speed it up for the case where we just
want "fast" operation rather than precise accounting of sizes/time in the
inlined-to caller?


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2012-08-07 11:59 ` rguenth at gcc dot gnu.org
@ 2012-08-07 19:58 ` steven at gcc dot gnu.org
  2012-08-07 22:28 ` steven at gcc dot gnu.org
                   ` (35 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-07 19:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #28 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-07 19:58:00 UTC ---
To illustrate the rewrite_into_closed_loop_ssa problem, consider this test
case:

extern void use1 (int);
extern void use2 (int);
extern int confuse_loop (void);

void
foo (void)
{
  int i, j, k;

  for (i = 0; i < 100; i++)
    {
      for (j = 0; j < 100; j++)
        {
          k = i + j;
          if (j > 2)
            use1 (k);
          if (confuse_loop ())
            break;
        }
      use2 (k);
    }
}

The only PHI needed for loop-closed SSA is one for k before "use2(k)". 

One problem is that GCC ends up inserting two PHI nodes (one on the exit after
confuse_loop(), which is wasteful if the exit block is empty. So problem 1 is
that too many PHIs are inserted.

The second problem is that with multiple loop exits, a large part of the CFG is
walked in compute_global_livein. The CFG for the test case (compiled with
-fno-tree-ch) looks like this:

             +---<------------ 14 ------<-----+
E           /                                  \          E
N          /                                    \         X
T -> 2 -  9 --> 3 --> 4 --> 5 -> 11 ----------> 7 -> 8 -> I
R              / \         / \                 /          T
Y             /   +-> 10 -+   +-> 6 -> 13 ->--+
              \                   /
               +----<-- 12 --<---+

And the following blocks are visited (with a patch I will attach in a minute --
without the patch even more blocks are visited). Block 3 is the block
containing the def, so the CFG walk ends there.

XXX visiting block 7
XXX visiting block 13
XXX visiting block 6
XXX visiting block 5
XXX visiting block 4
XXX visiting block 10
XXX visiting block 11
;; Created LCSSA PHI: k_4 = PHI <k_14(5)>
;; Created LCSSA PHI: k_8 = PHI <k_14(6)>

;; Resulting GIMPLE function IR:

foo ()
{
  int k;
  int j;
  int i;
  int D.1727;

  <bb 2>:
  goto <bb 9>;

  <bb 12>:

  <bb 3>:
  # j_29 = PHI <j_18(12), 0(9)>
  k_14 = i_28 + j_29;
  if (j_29 > 2)
    goto <bb 4>;
  else
    goto <bb 10>;

  <bb 10>:
  goto <bb 5>;

  <bb 4>:
  use1 (k_14);

  <bb 5>:
  D.1727_17 = confuse_loop ();
  if (D.1727_17 != 0)
    goto <bb 11>;
  else
    goto <bb 6>;

  <bb 11>:
  # k_4 = PHI <k_14(5)>
  goto <bb 7>;

  <bb 6>:
  j_18 = j_29 + 1;
  if (j_18 <= 99)
    goto <bb 12>;
  else
    goto <bb 13>;

  <bb 13>:
  # k_8 = PHI <k_14(6)>

  <bb 7>:
  # k_21 = PHI <k_4(11), k_8(13)>
  use2 (k_21);
  i_20 = i_28 + 1;
  if (i_20 <= 99)
    goto <bb 14>;
  else
    goto <bb 8>;

  <bb 8>:
  return;

  <bb 14>:

  <bb 9>:
  # i_28 = PHI <i_20(14), 0(2)>
  goto <bb 3>;

}


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2012-08-07 19:58 ` steven at gcc dot gnu.org
@ 2012-08-07 22:28 ` steven at gcc dot gnu.org
  2012-08-07 22:37 ` steven at gcc dot gnu.org
                   ` (34 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-07 22:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #29 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-07 22:28:11 UTC ---
Created attachment 27957
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27957
Do not traverse sibling loops

The idea here is to note that for a nested loop we know for sure that the loop
exits into a common loop father must all need PHI nodes, because we have have
loop pre-headers. I'm not sure what the effect of this is on irreducible loops,
and I haven't tested this much yet. Comments sought...


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2012-08-07 22:28 ` steven at gcc dot gnu.org
@ 2012-08-07 22:37 ` steven at gcc dot gnu.org
  2012-08-08  6:28 ` steven at gcc dot gnu.org
                   ` (33 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-07 22:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #30 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-07 22:36:30 UTC ---
> Created attachment 27957 [details]

With the attachment, time spent in rewrite_into_loop_closed_ssa is almost 0
(and that includes the time in the verifier). Compile time for the test case
(still with only "check<CGAL::Gmpfi>();") is now 924s wall time. Top spenders:

integrated RA                   : 224.26 (24%)
tree SSA incremental            :  75.07 ( 8%)
df live&initialized regs        :  69.58 ( 8%)
df live regs                    :  66.19 ( 7%)
remove unused locals            :  53.43 ( 6%)
out of ssa                      :  51.93 ( 6%)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2012-08-07 22:37 ` steven at gcc dot gnu.org
@ 2012-08-08  6:28 ` steven at gcc dot gnu.org
  2012-08-08  6:29 ` steven at gcc dot gnu.org
                   ` (32 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-08  6:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #31 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 06:28:16 UTC ---
Author: steven
Date: Wed Aug  8 06:28:10 2012
New Revision: 190222

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190222
Log:
    PR middle-end/54146
    * ifcvt.c: Include pointer-set.h.
    (cond_move_process_if_block): Change type of then_regs and
    else_regs from alloca'd array to pointer_sets.
    (check_cond_move_block): Update for this change.
    (cond_move_convert_if_block): Likewise.
    * Makefile.in: Fix dependencies for ifcvt.o.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/ifcvt.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug tree-optimization/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2012-08-08  6:28 ` steven at gcc dot gnu.org
@ 2012-08-08  6:29 ` steven at gcc dot gnu.org
  2012-08-08  6:30 ` [Bug middle-end/54146] " steven at gcc dot gnu.org
                   ` (31 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-08  6:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #31 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 06:28:16 UTC ---
Author: steven
Date: Wed Aug  8 06:28:10 2012
New Revision: 190222

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190222
Log:
    PR middle-end/54146
    * ifcvt.c: Include pointer-set.h.
    (cond_move_process_if_block): Change type of then_regs and
    else_regs from alloca'd array to pointer_sets.
    (check_cond_move_block): Update for this change.
    (cond_move_convert_if_block): Likewise.
    * Makefile.in: Fix dependencies for ifcvt.o.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/ifcvt.c

--- Comment #32 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 06:29:16 UTC ---
Author: steven
Date: Wed Aug  8 06:29:12 2012
New Revision: 190223

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190223
Log:
    PR middle-end/54146
    * ira.c (init_live_subregs): Take live_subregs_used as a bitmap.
    (build_insn_chain): Make live_subregs_used a bitmap.
    Use SBITMAP_SIZE to ignore the paradoxical bytes of subregs.
    Use sbitmap_free to free the live_subreg sbitmaps.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2012-08-08  6:30 ` [Bug middle-end/54146] " steven at gcc dot gnu.org
@ 2012-08-08  6:30 ` steven at gcc dot gnu.org
  2012-08-08  8:10 ` rguenth at gcc dot gnu.org
                   ` (29 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-08  6:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end

--- Comment #32 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 06:29:16 UTC ---
Author: steven
Date: Wed Aug  8 06:29:12 2012
New Revision: 190223

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190223
Log:
    PR middle-end/54146
    * ira.c (init_live_subregs): Take live_subregs_used as a bitmap.
    (build_insn_chain): Make live_subregs_used a bitmap.
    Use SBITMAP_SIZE to ignore the paradoxical bytes of subregs.
    Use sbitmap_free to free the live_subreg sbitmaps.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2012-08-08  6:29 ` steven at gcc dot gnu.org
@ 2012-08-08  6:30 ` steven at gcc dot gnu.org
  2012-08-08  6:30 ` steven at gcc dot gnu.org
                   ` (30 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-08  6:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end

--- Comment #31 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 06:28:16 UTC ---
Author: steven
Date: Wed Aug  8 06:28:10 2012
New Revision: 190222

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190222
Log:
    PR middle-end/54146
    * ifcvt.c: Include pointer-set.h.
    (cond_move_process_if_block): Change type of then_regs and
    else_regs from alloca'd array to pointer_sets.
    (check_cond_move_block): Update for this change.
    (cond_move_convert_if_block): Likewise.
    * Makefile.in: Fix dependencies for ifcvt.o.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/ifcvt.c

--- Comment #32 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 06:29:16 UTC ---
Author: steven
Date: Wed Aug  8 06:29:12 2012
New Revision: 190223

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190223
Log:
    PR middle-end/54146
    * ira.c (init_live_subregs): Take live_subregs_used as a bitmap.
    (build_insn_chain): Make live_subregs_used a bitmap.
    Use SBITMAP_SIZE to ignore the paradoxical bytes of subregs.
    Use sbitmap_free to free the live_subreg sbitmaps.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2012-08-08  6:30 ` steven at gcc dot gnu.org
@ 2012-08-08  8:10 ` rguenth at gcc dot gnu.org
  2012-08-08 10:11 ` steven at gcc dot gnu.org
                   ` (28 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-08  8:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #33 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-08 08:09:25 UTC ---
(In reply to comment #29)
> Created attachment 27957 [details]
> Do not traverse sibling loops
> 
> The idea here is to note that for a nested loop we know for sure that the loop
> exits into a common loop father must all need PHI nodes, because we have have
> loop pre-headers. I'm not sure what the effect of this is on irreducible loops,
> and I haven't tested this much yet. Comments sought...

Nice results.  A few comments - if you rely on preheaders then please assert
in rewrite_into_loop_closed_ssa that loops_state_satisfies_p
(LOOPS_HAVE_PREHEADERS).

Irreducible loops are "one" loop to cfgloop and to loop-closed SSA form, so
they should be fine.

What do you mean by "for a nested loop we know for sure that the loop
exits into a common loop father must all need PHI nodes"?  What happens for
nested loops that do not exit into a common loop father?  That is what's
"common" here?  I think what you want to say is that a loop cannot exit
to a child of a sibling of a father, thus:

  for (;;)
    for (;;)
      goto x;
  for (;;)
    x:

cannot happen in the sense that loop detection would not detect the loop
nest as written (but had a single irreducible loop)?

I think you should simply move compute_global_livein to its single use
and make it static.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2012-08-08  8:10 ` rguenth at gcc dot gnu.org
@ 2012-08-08 10:11 ` steven at gcc dot gnu.org
  2012-08-08 11:49 ` rguenth at gcc dot gnu.org
                   ` (27 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-08 10:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #34 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 10:10:46 UTC ---
(In reply to comment #33)
> I think you should simply move compute_global_livein to its single use
> and make it static.

Yes, and I need to add the same smarts there as in find_uses_to_rename_use to
look through loops at the same nesting level, because this test case fails:

extern unsigned use (unsigned);
void foo (void)
{
  unsigned i, j;
  do { i = use (0); } while (i);
  do { j = use (0); } while (j);
  if (i) use (j);
}

It fails in check_loop_closed_ssa_use at -O2 but passes at -O2 -fno-tree-vrp,
so VRP is doing something destroying the initially correct loop-closed SSA
form.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (36 preceding siblings ...)
  2012-08-08 10:11 ` steven at gcc dot gnu.org
@ 2012-08-08 11:49 ` rguenth at gcc dot gnu.org
  2012-08-08 17:40 ` steven at gcc dot gnu.org
                   ` (26 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-08 11:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #35 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-08 11:49:21 UTC ---
(In reply to comment #34)
> (In reply to comment #33)
> > I think you should simply move compute_global_livein to its single use
> > and make it static.
> 
> Yes, and I need to add the same smarts there as in find_uses_to_rename_use to
> look through loops at the same nesting level, because this test case fails:
> 
> extern unsigned use (unsigned);
> void foo (void)
> {
>   unsigned i, j;
>   do { i = use (0); } while (i);
>   do { j = use (0); } while (j);
>   if (i) use (j);
> }
> 
> It fails in check_loop_closed_ssa_use at -O2 but passes at -O2 -fno-tree-vrp,
> so VRP is doing something destroying the initially correct loop-closed SSA
> form.

Jump threading I suppose.  I think you can drop loop-closed SSA form
right before calling vrp_finalize ().


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (37 preceding siblings ...)
  2012-08-08 11:49 ` rguenth at gcc dot gnu.org
@ 2012-08-08 17:40 ` steven at gcc dot gnu.org
  2012-08-10  3:46 ` hubicka at ucw dot cz
                   ` (25 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-08 17:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #36 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-08 17:39:49 UTC ---
Author: steven
Date: Wed Aug  8 17:39:46 2012
New Revision: 190235

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190235
Log:
    PR middle-end/54146
    * gimpify.c (gimplify_body): Only verify_gimple_in_seq with
    checking enabled.
    * tree-ssa-loop-manip.c (add_exit_phis_var): Assert that var is
    a gimple_reg if checking is enabled.
    (find_uses_to_rename_stmt): Only look at non-virtual USE operands.
    * tree-into-ssa (compute_global_livein): Change the worklist
    type from an array to a VEC.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimplify.c
    trunk/gcc/tree-into-ssa.c
    trunk/gcc/tree-ssa-loop-manip.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (38 preceding siblings ...)
  2012-08-08 17:40 ` steven at gcc dot gnu.org
@ 2012-08-10  3:46 ` hubicka at ucw dot cz
  2012-08-10  3:51 ` hubicka at ucw dot cz
                   ` (24 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: hubicka at ucw dot cz @ 2012-08-10  3:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #37 from Jan Hubicka <hubicka at ucw dot cz> 2012-08-10 03:45:31 UTC ---
> 
> I suppose it's the old issue that we update fibheap keys along each
> inlining decision - and with flatten there are just very many ... Honza?

Well, I killed most of updating for mainline and at flattening time the keys
are not
even computed.  So it is probably some other bookeping.  I will profile it.

> It also seems we flatten depth-first (thus inline leafs) instead of
> the other way around:
> 
>       orig_callee = callee;
>       inline_call (e, true, NULL, NULL);
>       if (e->callee != orig_callee)
>         orig_callee->symbol.aux = (void *) node;
>       flatten_function (e->callee, early);
>       if (e->callee != orig_callee)
>         orig_callee->symbol.aux = NULL;
> 
> This means we will materialize all intermediate flattenings in
> functions that will not be reclaimed, right?  flattening foo
> should inline everything into foo, but not affect remaining
> callee bodies.

I do not follow here.  Callee is already the inline clone, so we will
not affect other copies of foo...

Honza
> 
> Richard.
> 
> -- 
> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (39 preceding siblings ...)
  2012-08-10  3:46 ` hubicka at ucw dot cz
@ 2012-08-10  3:51 ` hubicka at ucw dot cz
  2012-08-10  3:54 ` hubicka at ucw dot cz
                   ` (23 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: hubicka at ucw dot cz @ 2012-08-10  3:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #38 from Jan Hubicka <hubicka at ucw dot cz> 2012-08-10 03:50:26 UTC ---
> I don't understand how inline_merge_summary is supposed to work, so I'm going
> to leave that one for Richi and Honza.
Well, it produces inline summary for function that is result of inlininig based
on summaries
of the former functions.  You can't just bypass it for flattening or the
summary
of the function after flatting will be off. Obviously the function does fair
amount of
propagation. I will try to speed it up or limit the bounds. Since the summaries
are of
constant size, we should not explode here.

Honza


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (40 preceding siblings ...)
  2012-08-10  3:51 ` hubicka at ucw dot cz
@ 2012-08-10  3:54 ` hubicka at ucw dot cz
  2012-08-10  4:35 ` hubicka at gcc dot gnu.org
                   ` (22 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: hubicka at ucw dot cz @ 2012-08-10  3:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #39 from Jan Hubicka <hubicka at ucw dot cz> 2012-08-10 03:53:46 UTC ---
> Martin, can you look at comment #14 and the patch?  I think what we want to
> do in flatten_function is before
> 
>       inline_call (e, true, NULL, NULL);
> 
> reset the edge predicates so that inline_merge_summary becomes very cheap.

Well, this will still result in (conservatively) wrong inline summary for
flattened function. I am not sure if "flatten" attribute should laso mean
"do not care about inlining of the flattened function much".
I think we want to handle this generically even if it is harder for
small function inliner to explode here because it does a lot less cascaded
inlining.

> Unfortunately that beast seems to have no early out (but instead uses
> true_predicate () ...).  Can we speed it up for the case where we just
> want "fast" operation rather than precise accounting of sizes/time in the
> inlined-to caller?

I will look.

Honza


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (41 preceding siblings ...)
  2012-08-10  3:54 ` hubicka at ucw dot cz
@ 2012-08-10  4:35 ` hubicka at gcc dot gnu.org
  2012-08-10  5:19 ` hubicka at gcc dot gnu.org
                   ` (21 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-08-10  4:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #40 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-08-10 04:34:54 UTC ---
OK,
my simple ^C profiling shows:
#14 0x000000000091f17f in estimate_edge_size_and_time (e=0x7fffb7192d68,
size=<optimized out>, time=0x7fffc04110b0, prob=10000) at
../../gcc/ipa-inline-analysis.c:2183
#15 0x000000000091f23a in estimate_calls_size_and_time (node=0x7fffb7193af8,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
    known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2274
#16 0x000000000091f2d1 in estimate_calls_size_and_time (node=0x7fffb71939c0,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
    known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#17 0x000000000091f2d1 in estimate_calls_size_and_time (node=0x7fffb7193888,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
    known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#18 0x000000000091f2d1 in estimate_calls_size_and_time (node=0x7fffb7191c30,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
    known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#19 0x000000000091f2d1 in estimate_calls_size_and_time (node=0x7fffb7180c30,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
    known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#20 0x000000000091f2d1 in estimate_calls_size_and_time (node=0x7fffb7147d68,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
    known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#21 0x000000000091f2d1 in estimate_calls_size_and_time (node=0x7fffcbb039c0,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
    known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#22 0x00000000009232a1 in inline_merge_summary (edge=0x7fffb6fba068) at
../../gcc/ipa-inline-analysis.c:2687
#23 0x0000000000ed0cdf in inline_call (e=0x7fffb6fba068,
update_original=<optimized out>, new_edges=0x0, overall_size=0x0) at
../../gcc/ipa-inline-transform.c:246
#24 0x0000000000ece3a9 in flatten_function (node=0x7fffb6fb9ea0, early=1
'\001') at ../../gcc/ipa-inline.c:1605
#25 0x0000000000ece3bf in flatten_function (node=0x7fffb6fb4c30, early=1
'\001') at ../../gcc/ipa-inline.c:1608
#26 0x0000000000ece3bf in flatten_function (node=0x7fffb6f87270, early=1
'\001') at ../../gcc/ipa-inline.c:1608
#27 0x0000000000ece3bf in flatten_function (node=0x7fffcbb039c0, early=1
'\001') at ../../gcc/ipa-inline.c:1608
#28 0x0000000000ece616 in early_inliner () at ../../gcc/ipa-inline.c:1881

so the time is actually not spent in the predicate merging logic, but in
computing overall size/time of the function after inline operation is
performed. This is because of following call in inline_merge_summary:

  estimate_calls_size_and_time (to, &info->size, &info->time,
                                ~(clause_t)(1 << predicate_false_condition),
                                NULL, NULL);

Here we simply recompute the size/time by walking all callees of the function
and since this involve recursive walk of the inlined callees that are many we
hit quadratic time complexity.
Ugly.  One (particularly ugly) workaround would be to make inline_merge_summary
to skip updating after each step of flattening because flattening do not care,
but that is quite a kludge (similar to Steven's patch but with one extra update
on the very end).

Other solution is to work hard enough to update everything incrementally, but
it is not 100% trivial (that is why I simply recompute). 
I actually tought of this case and concluded that if we do so deep inline trees
we will blow up somewhere else.  Quite an achivement that Steven managed to
chase out all the other cases.

Honza


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (42 preceding siblings ...)
  2012-08-10  4:35 ` hubicka at gcc dot gnu.org
@ 2012-08-10  5:19 ` hubicka at gcc dot gnu.org
  2012-08-10  6:33 ` steven at gcc dot gnu.org
                   ` (20 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-08-10  5:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #41 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-08-10 05:18:52 UTC ---
Created attachment 27979
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27979
Path for inliner slowness

Hi,
this is patch I am testing. After some consideration I do not like the
incremental update idea much. It asks for bookkeeping issues and has roundoff
problems so unless it is really necessary I will try to stick with the
recomputation scheme. This is more careful variant of Steven's patch. We skip
the recomputation after each incremental inline and update the info once we are
done with the flattening.

Honza

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (43 preceding siblings ...)
  2012-08-10  5:19 ` hubicka at gcc dot gnu.org
@ 2012-08-10  6:33 ` steven at gcc dot gnu.org
  2012-08-10  7:53 ` hubicka at gcc dot gnu.org
                   ` (19 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-10  6:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #42 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-10 06:32:37 UTC ---
(In reply to comment #40)
> Quite an achivement that Steven managed to
> chase out all the other cases.

Thanks for the compliment :-)

I'm still working on the rewrite_into_loop_closed_ssa slowness. I will have a
patch ready this weekend.

Next: On to -O2 :-)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (44 preceding siblings ...)
  2012-08-10  6:33 ` steven at gcc dot gnu.org
@ 2012-08-10  7:53 ` hubicka at gcc dot gnu.org
  2012-08-14 12:39 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-08-10  7:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #43 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-08-10 07:52:31 UTC ---
Author: hubicka
Date: Fri Aug 10 07:52:23 2012
New Revision: 190283

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190283
Log:

    PR middle-end/54146
    * ipa-inline-transform.c (inline_call): Add UPDATE_OVERALL_SUMMARY
    parameter; honnor it.
    * ipa-inline.c (recursive_inlining): Update call
    of inline_call.
    (inline_small_functions): Likewise.
    (ipa_inline): Likewise.
    (inline_always_inline_functions): Likewise.
    (early_inline_small_functions): Likewise.
    (flatten_function): Do separate update of summary info.
    * ipa-inline.h (inline_update_overall_summary): Declare.
    (inline_call): Update.
    * ipa-inline-analysis.c (inline_merge_summary): Break out
    updating code to ...
    (inline_update_overall_summary): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline-analysis.c
    trunk/gcc/ipa-inline-transform.c
    trunk/gcc/ipa-inline.c
    trunk/gcc/ipa-inline.h


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (45 preceding siblings ...)
  2012-08-10  7:53 ` hubicka at gcc dot gnu.org
@ 2012-08-14 12:39 ` rguenth at gcc dot gnu.org
  2012-08-14 13:21 ` glisse at gcc dot gnu.org
                   ` (17 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-14 12:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #44 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-14 12:38:41 UTC ---
Author: rguenth
Date: Tue Aug 14 12:38:32 2012
New Revision: 190382

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190382
Log:
2012-08-14  Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/54146
    * tree-ssa-pre.c (do_regular_insertion): Use a VEC
    indexed by pred edge index for avail.
    (do_partial_partial_insertion): Likewise.
    (insert_into_preds_of_block): Adjust.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-pre.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (46 preceding siblings ...)
  2012-08-14 12:39 ` rguenth at gcc dot gnu.org
@ 2012-08-14 13:21 ` glisse at gcc dot gnu.org
  2012-08-15 14:34 ` steven at gcc dot gnu.org
                   ` (16 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-08-14 13:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #45 from Marc Glisse <glisse at gcc dot gnu.org> 2012-08-14 13:20:55 UTC ---
(In reply to comment #11)
> > Marc, do you know where the use of the
> > flatten attribute comes from in your code?
> Comes from the Eigen library, I'll talk to them about it and see if they can
> comment. They mostly deal with simple number types (double, float), so the
> behavior with heavy types might have been unnoticed.

For the record:
after doing some benchmarks, they have removed the attribute, it didn't improve
performance anymore (though it probably used to in the past).

I have to say I am extremely impressed by all the improvements you guys have
done based on this bad code (which I realize is now just a handy example of big
code with the flatten attribute high enough in the call chain). If you keep at
it, next time I might not even notice there is something wrong with the code
;-)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (47 preceding siblings ...)
  2012-08-14 13:21 ` glisse at gcc dot gnu.org
@ 2012-08-15 14:34 ` steven at gcc dot gnu.org
  2012-08-15 15:07 ` steven at gcc dot gnu.org
                   ` (15 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-15 14:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27930|0                           |1
        is obsolete|                            |
  Attachment #27946|0                           |1
        is obsolete|                            |
  Attachment #27953|0                           |1
        is obsolete|                            |
  Attachment #27957|0                           |1
        is obsolete|                            |
  Attachment #27979|0                           |1
        is obsolete|                            |

--- Comment #46 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-15 14:33:27 UTC ---
Created attachment 28020
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28020
Faster rewrite_into_loop_closed_ssa

This reduces time spent in rewrite_into_loop_closed_ssa to something too small
to show up in the timevar measurements.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (48 preceding siblings ...)
  2012-08-15 14:34 ` steven at gcc dot gnu.org
@ 2012-08-15 15:07 ` steven at gcc dot gnu.org
  2012-08-16 10:53 ` steven at gcc dot gnu.org
                   ` (14 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-15 15:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #47 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-15 15:07:05 UTC ---
(In reply to comment #46)
> Created attachment 28020 [details]
> Faster rewrite_into_loop_closed_ssa

After this patch, IRA is the only major bottle-neck left, although there are
still a few passes that take more time than I think is reasonable (out-of-SSA
in particular). New top 10 spenders:

 integrated RA           : 227.36 (30%) usr     678602 kB (19%) ggc
 tree SSA incremental    :  66.17 ( 9%) usr     151828 kB ( 4%) ggc
 out of ssa              :  43.76 ( 6%) usr        608 kB ( 0%) ggc
 tree PTA                :  43.06 ( 6%) usr      27466 kB ( 1%) ggc
 df live regs            :  35.03 ( 5%) usr          0 kB ( 0%) ggc
 df live&initialized regs:  34.25 ( 5%) usr          0 kB ( 0%) ggc
 tree SSA rewrite        :  18.37 ( 2%) usr      24888 kB ( 1%) ggc
 remove unused locals    :  17.97 ( 2%) usr          0 kB ( 0%) ggc
 dominance computation   :  15.06 ( 2%) usr          0 kB ( 0%) ggc
 tree CFG cleanup        :  11.79 ( 2%) usr       1123 kB ( 0%) ggc


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (49 preceding siblings ...)
  2012-08-15 15:07 ` steven at gcc dot gnu.org
@ 2012-08-16 10:53 ` steven at gcc dot gnu.org
  2012-08-16 12:11 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-16 10:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #48 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-16 10:52:20 UTC ---
Author: steven
Date: Thu Aug 16 10:52:14 2012
New Revision: 190442

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190442
Log:
    PR middle-end/54146
    * tree-flow.h (compute_global_livein): Remove prototype.
    * tree-into-ssa.c (compute_global_livein): Remove function.
    * tree-ssa-loop-manip.c: Include gimple-pretty-print.h.
    (find_sibling_superloop): New function.
    (compute_live_loop_exits): New function.
    (add_exit_phis_edge): Rename to add_exit_phi.  Do not allow
    inserting a PHI in a block that is not a loop exit for VAR.
    Add dumping if TDF_DETAILS.
    (add_exit_phis_var): Rewrite.
    (add_exit_phis): Update.
    (get_loops_exits): Rewrite to return an array of per-loop exits
    rather than one bitmap with all loop exits.
    (find_uses_to_rename_bb): Ignore virtual PHI nodes.
    (rewrite_into_loop_closed_ssa): Update.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-into-ssa.c
    trunk/gcc/tree-ssa-loop-manip.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (50 preceding siblings ...)
  2012-08-16 10:53 ` steven at gcc dot gnu.org
@ 2012-08-16 12:11 ` rguenth at gcc dot gnu.org
  2012-08-16 13:56 ` stevenb.gcc at gmail dot com
                   ` (12 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-16 12:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |memory-hog

--- Comment #49 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-16 12:10:56 UTC ---
We still use very much memory (full testcase doesn't fit in 4GB ram).  With

...
  check<CGAL::Gmpfr>();
  //check<CGAL::Gmpfi>();
  //check<CGAL::Quotient<CGAL::Gmpz> >();
  //check<CGAL::Lazy_exact_nt<CGAL::Gmpq> >();
  //check<CORE::BigInt>();
  //check<CORE::BigRat>();
  //check<CORE::BigFloat>();
  //check<CORE::Expr>();

I see (--enable-gather-detailed-mem-stats):

Kind                   Nodes      Bytes
---------------------------------------
decls                1116380  181227400
types                 535841   90021288
blocks                222522   17801760
stmts                  45749    2068872
refs                  854577   43178392
exprs                2147881   95117920
constants             140132    4176820
identifiers            73710    6486480
vecs                 2654190  131466648
binfos                 48162    4873432
ssa names             339369   27149520
constructors           26426     634224
random kinds         1885177   73894984
lang_decl kinds       352117   13918872
lang_type kinds        48218    7329136
omp clauses                0          0
---------------------------------------
Total                10490451  699345748

a lot of memory in TREE_VECs for some reason.

GIMPLE statements
Kind                   Stmts      Bytes
---------------------------------------
assignments           316019   30602376
phi nodes              54994   15777472
conditionals           26090    2504640
everything else       237509   23110772
---------------------------------------
Total                 634612   71995260

gimple is lean, so is RTL ;)

Alloc-pool Kind         Elt size  Pools  Allocated (elts)            Peak
(elts)            Leak (elts)
--------------------------------------------------------------------------------------------------------------
live ranges                40        513   19250760(    481269)   10800800(   
270020)          0(         0)
df_scan ref base           56       1026  331010008(   5910893)   14059808(   
251068)          0(         0)
df_scan ref artificial     64       1026   20113600(    314275)    4239872(    
66248)          0(         0)
df_scan ref regular        64       1026   66557568(   1039962)    4543872(    
70998)          0(         0)

are by far the biggest alloc-pool users.

bitmap stats are confusing because they show leaks for bitmaps we free
by releasing their obstack.  DF and PTA bitmaps are largest.

We leak some VECs ...

c-family/c-pragma.c:619 (push_visibility)                24: 0.0%         24   
           1: 0.0%
cp/pt.c:471 (maybe_begin_member_template_process         24: 0.0%         24   
           1: 0.0%
function.c:4513 (push_struct_function)                   40: 0.0%         40   
           1: 0.0%
vec.c:307 (vec_stack_p_reserve_exact_1)                  40: 0.0%         40   
           1: 0.0%
tree-ssa-loop-ivopts.c:3070 (multiplier_allowed_        328: 0.0%        608   
           3: 0.0%
tree-ssa-loop-ivopts.c:3153 (get_address_cost)          328: 0.0%        608   
           3: 0.0%
tree-ssa-sccvn.c:745 (copy_reference_ops_from_re        392: 0.0%     806232   
      102098: 4.6%
cfgloop.h:583 (fel_init)                                480: 0.0%        860   
         106: 0.0%
c-family/c-pragma.c:1246 (c_register_pragma_1)          584: 0.0%        696   
           4: 0.0%
function.c:156 (push_function_context)                  976: 0.0%       1200   
           8: 0.0%
ira.c:3699 (find_moveable_pseudos)                     1240: 0.0%     221128   
         513: 0.0%
passes.c:2188 (execute_one_pass)                       4360: 0.1%     655320   
       16466: 0.7%
tree-ssa-structalias.c:3870 (handle_lhs_call)          9576: 0.2%      18360   
         133: 0.0%
ipa-ref.c:55 (ipa_record_reference)                   60184: 1.1%     327640   
        5813: 0.3%
cfgloop.c:1143 (get_loop_exit_edges)                  73184: 1.3%     157888   
       62221: 2.8%
tree-into-ssa.c:940 (mark_phi_for_rewrite)           153360: 2.7%     164096   
          17: 0.0%
cfgloop.c:1134 (get_loop_exit_edges)                 166592: 3.0%     238712   
       11639: 0.5%
ipa-reference.c:184 (set_reference_optimization_     180248: 3.2%     248064   
          47: 0.0%
tree-into-ssa.c:321 (get_ssa_name_ann)               627448:11.2%     716496   
          14: 0.0%
tree-ssa-sccvn.c:3657 (extract_and_process_scc_f    1246864:22.3%    1291960   
      105903: 4.7%
tree-ssa-loop-im.c:1562 (record_mem_ref_loc)        1292560:23.1%    1392576   
       55465: 2.5%
tree-ssa-loop-im.c:1551 (record_mem_ref_loc)        1771800:31.7%    3141360   
       52717: 2.4%

I'll look at the loop and sccvn parts.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (51 preceding siblings ...)
  2012-08-16 12:11 ` rguenth at gcc dot gnu.org
@ 2012-08-16 13:56 ` stevenb.gcc at gmail dot com
  2012-08-16 14:07 ` rguenther at suse dot de
                   ` (11 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: stevenb.gcc at gmail dot com @ 2012-08-16 13:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #50 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-16 13:55:40 UTC ---
On Thu, Aug 16, 2012 at 2:10 PM, rguenth at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> bitmap stats are confusing because they show leaks for bitmaps we free
> by releasing their obstack.  DF and PTA bitmaps are largest.

The bitmap obstack stats are not very reliable anyway, I think. I've
been using my own stats for this (with a size_t version of
obstack_memory_used:

+static size_t
+obstack_memory_used2 (struct obstack *h)
+{
+  struct _obstack_chunk* lp;
+  size_t nbytes = 0;
+
+  for (lp = h->chunk; lp != 0; lp = lp->prev)
+    {
+      nbytes += (size_t) (lp->limit - (char *) lp);
+    }
+  return nbytes;
+}
+

so that also the "freed" bitmap elements are counted). With that, you
can show that the reg_obstack and bitmap_default_obstack grow up to
several GB that isn't released between passes. An added problem is
that IRA puts its bitmap on its own obstack (as it should, really) but
that means the >3GB of free elements on the reg_obstack and
bitmap_default_obstack remain unused. So on the machine I use for
testing (cfarm gcc17), the memory footprint is reduced by >2GB (~25%)
with this hack:

Index: ira.c
===================================================================
--- ira.c       (revision 190442)
+++ ira.c       (working copy)
@@ -4132,6 +4132,12 @@
   int max_regno_before_ira, ira_max_point_before_emit;
   int rebuild_p;

+  /* There shouldn't be anything on these obstacks.  */
+  bitmap_obstack_release (NULL);
+  bitmap_obstack_initialize (NULL);
+  bitmap_obstack_release (&reg_obstack);
+  bitmap_obstack_initialize (&reg_obstack);
+
   if (flag_caller_saves)
     init_caller_save ();

There is in general a lot of BITMAP_ALLOC(NULL) abuse in the compiler.
I have patches to address the cases in tree-ssa-live.c and dse.c, and
I intend to look at the tree-ssa-ter and cfgexpand cases this weekend.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (52 preceding siblings ...)
  2012-08-16 13:56 ` stevenb.gcc at gmail dot com
@ 2012-08-16 14:07 ` rguenther at suse dot de
  2012-08-16 14:28 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenther at suse dot de @ 2012-08-16 14:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #51 from rguenther at suse dot de <rguenther at suse dot de> 2012-08-16 14:06:06 UTC ---
On Thu, 16 Aug 2012, stevenb.gcc at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146
> 
> --- Comment #50 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-16 13:55:40 UTC ---
> On Thu, Aug 16, 2012 at 2:10 PM, rguenth at gcc dot gnu.org
> <gcc-bugzilla@gcc.gnu.org> wrote:
> > bitmap stats are confusing because they show leaks for bitmaps we free
> > by releasing their obstack.  DF and PTA bitmaps are largest.
> 
> The bitmap obstack stats are not very reliable anyway, I think. I've
> been using my own stats for this (with a size_t version of
> obstack_memory_used:
> 
> +static size_t
> +obstack_memory_used2 (struct obstack *h)
> +{
> +  struct _obstack_chunk* lp;
> +  size_t nbytes = 0;
> +
> +  for (lp = h->chunk; lp != 0; lp = lp->prev)
> +    {
> +      nbytes += (size_t) (lp->limit - (char *) lp);
> +    }
> +  return nbytes;
> +}
> +
> 
> so that also the "freed" bitmap elements are counted). With that, you
> can show that the reg_obstack and bitmap_default_obstack grow up to
> several GB that isn't released between passes. An added problem is

Hum, I thought we release those obstacks after each pass ...


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (53 preceding siblings ...)
  2012-08-16 14:07 ` rguenther at suse dot de
@ 2012-08-16 14:28 ` rguenth at gcc dot gnu.org
  2012-08-16 23:04 ` steven at gcc dot gnu.org
                   ` (9 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-16 14:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #52 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-16 14:27:59 UTC ---
Author: rguenth
Date: Thu Aug 16 14:27:51 2012
New Revision: 190445

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190445
Log:
2012-08-16  Richard Guenther  <rguenther@suse.de>

    PR middle-end/54146
    * tree-ssa-loop-niter.c (find_loop_niter_by_eval): Free the
    exit vector.
    * ipa-pure-const.c (analyze_function): Use FOR_EACH_LOOP_BREAK.
    * cfgloop.h (FOR_EACH_LOOP_BREAK): Fix.
    * tree-ssa-structalias.c (handle_lhs_call): Properly free rhsc.
    * tree-into-ssa.c (get_ssa_name_ann): Allocate info only when
    needed.
    * tree-ssa-loop-im.c (analyze_memory_references): Adjust.
    (tree_ssa_lim_finalize): Free all mem_refs.
    * tree-ssa-sccvn.c (extract_and_process_scc_for_name): Free
    scc when bailing out.
    * modulo-sched.c (sms_schedule): Use FOR_EACH_LOOP_BREAK.
    * ira-build.c (loop_with_complex_edge_p): Free loop exit vector.
    * graphite-sese-to-poly.c (scop_ivs_can_be_represented): Use
    FOR_EACH_LOOP_BREAK.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgloop.h
    trunk/gcc/graphite-sese-to-poly.c
    trunk/gcc/ipa-pure-const.c
    trunk/gcc/ira-build.c
    trunk/gcc/modulo-sched.c
    trunk/gcc/tree-into-ssa.c
    trunk/gcc/tree-ssa-loop-im.c
    trunk/gcc/tree-ssa-loop-niter.c
    trunk/gcc/tree-ssa-sccvn.c
    trunk/gcc/tree-ssa-structalias.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (54 preceding siblings ...)
  2012-08-16 14:28 ` rguenth at gcc dot gnu.org
@ 2012-08-16 23:04 ` steven at gcc dot gnu.org
  2012-08-17  9:43 ` steven at gcc dot gnu.org
                   ` (8 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-16 23:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #28020|0                           |1
        is obsolete|                            |

--- Comment #53 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-16 23:04:04 UTC ---
Created attachment 28039
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28039
More dedicated obstacks

This, together with Richi's leak plugs from earlier today, brings peak memory
down to 5GB. That's 5GB now, down from 9GB this morning. Not bad for a day's
work.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (55 preceding siblings ...)
  2012-08-16 23:04 ` steven at gcc dot gnu.org
@ 2012-08-17  9:43 ` steven at gcc dot gnu.org
  2012-08-20 19:31 ` steven at gcc dot gnu.org
                   ` (7 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-17  9:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #54 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-17 09:42:15 UTC ---
Author: steven
Date: Fri Aug 17 09:42:06 2012
New Revision: 190475

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190475
Log:
    PR middle-end/54146
    * tree-ssa-loop-im.c (lim_bitmap_obstack): New bitmap_obstack.
    (memref_free): Don't free the bitmaps individually here.
    (mem_ref_alloc): Allocate the bitmaps on the new bitmap obstack.
    (analyze_memory_references): Likewise.
    (tree_ssa_lim_initialize): Initialize the new bitmap obstack.
    (tree_ssa_lim_finalize): Release it.
    * dse.c (dse_bitmap_obstack): New bitmap obstack.
    (dse_obstack): New obstack.
    (get_group_info): Allocate the bitmaps on the new bitmap obstack.
    (dse_step0): Allocate the scratch bitmap on reg_obstack.  Initialize
    the new bitmap obstack and normal obstack.  Use XNEWVEC for bb_table.
    (record_store): Allocate regs_set on reg_obstack.
    (dse_step1): Allocate regs_live on reg_obstack.
    (dse_step2_init): Allocate offset_map_n and offset_map_p on the new
    obstack.
    (dse_step3_scan): Allocate bitmaps on the new bitmap obstack.
    (dse_step3): Likewise.
    (dse_confluence_0): Likewise.
    (dse_confluence_n): Likewise.
    (dse_transfer_function): Likewise.
    (dse_step7): Destroy the new obstacks, and everything allocated on
    them, in one big sweep.
    (rest_of_handle_dse): Update.
    * cfgexpand.c (stack_var_bitmap_obstack): New bitmap obstack.
    (add_stack_var_conflict): Allocate bitmaps on it.
    (add_scope_conflicts_1): Likewise.
    (add_scope_conflicts): Likewise.
    (update_alias_info_with_stack_vars): Likewise.
    (init_vars_expansion): Move TREE_USED fiddling expand_used_vars.
    Initialize the new bitmap obstack.
    (fini_vars_expansion): Release it.
    (estimated_stack_frame_size): Use init_vars_expansion to set things up
    and always clean up at the end.
    (expand_used_vars): Do the TREE_USED trickery here.  Always call
    fini_vars_expansion.
    * tree-ssa-live.h (struct tree_live_info_d): Make livein and liveout
    arrays of bitmap_head to avoid one indirection per bitmap access.
    (live_on_entry, live_on_exit, live_var_map, live_merge_and_clear,
    make_live_on_entry): Update.
    * tree-ssa-live.c (partition_view_bitmap): Don't double-free 'used'.
    (liveness_bitmap_obstack): New bitmap obstack.
    (remove_unused_locals): Use it to allocate all bitmaps on.  Update
    for livein/liveout changes in tree-ssa-live.h.
    (delete_tree_live_info): Release the bitmap obstack.
    (loe_visit_block, live_worklist, set_var_live_on_entry,
    calculate_live_on_exit, dump_live_info): Update.
    (calculate_live_ranges): Initialize the bitmap.
    * tree-ssa-ter.c (ter_bitmap_obstack): New bitmap obstack.
    (new_temp_expr_table): Allocate bitmap on it.
    (make_dependent_on_partition, add_to_partition_kill_list,
    add_dependence, process_replaceable): Likewise.
    (find_replaceable_exprs): Initialize and release the new obstack here.
    * df-problems.c (df_lr_add_problem): Allocate persistent bitmap
    for out_of_date_transfer_functions on df_bitmap_obstack.
    (df_live_add_problem): Likewise.
    (df_chain_add_problem): Likewise.
    (df_word_lr_add_problem): Likewise.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/df-problems.c
    trunk/gcc/dse.c
    trunk/gcc/tree-ssa-live.c
    trunk/gcc/tree-ssa-live.h
    trunk/gcc/tree-ssa-loop-im.c
    trunk/gcc/tree-ssa-loop-manip.c
    trunk/gcc/tree-ssa-ter.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (56 preceding siblings ...)
  2012-08-17  9:43 ` steven at gcc dot gnu.org
@ 2012-08-20 19:31 ` steven at gcc dot gnu.org
  2012-08-21  7:56 ` rguenther at suse dot de
                   ` (6 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-20 19:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

--- Comment #55 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-20 19:30:34 UTC ---
The remaining problem areas are all liveness calculation routines that are
essentially inherent quadratic problems: DF liveness, IRA liveness, and
out-of-SSA liveness. 

I think it would be good to deprecate the flatten attribute...


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (57 preceding siblings ...)
  2012-08-20 19:31 ` steven at gcc dot gnu.org
@ 2012-08-21  7:56 ` rguenther at suse dot de
  2012-08-21 13:35 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenther at suse dot de @ 2012-08-21  7:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #56 from rguenther at suse dot de <rguenther at suse dot de> 2012-08-21 07:55:14 UTC ---
On Mon, 20 Aug 2012, steven at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146
> 
> Steven Bosscher <steven at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |RESOLVED
>          Resolution|                            |WONTFIX
> 
> --- Comment #55 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-20 19:30:34 UTC ---
> The remaining problem areas are all liveness calculation routines that are
> essentially inherent quadratic problems: DF liveness, IRA liveness, and
> out-of-SSA liveness. 
> 
> I think it would be good to deprecate the flatten attribute...

It still can be useful I think, if only for creating testcases
with arbitrary large functions ;)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (58 preceding siblings ...)
  2012-08-21  7:56 ` rguenther at suse dot de
@ 2012-08-21 13:35 ` rguenth at gcc dot gnu.org
  2012-08-21 13:57 ` stevenb.gcc at gmail dot com
                   ` (4 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-21 13:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #57 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-21 13:34:28 UTC ---
Author: rguenth
Date: Tue Aug 21 13:34:19 2012
New Revision: 190562

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190562
Log:
2012-08-21  Richard Guenther  <rguenther@suse.de>

    Backport from mainline
    2012-08-16  Richard Guenther  <rguenther@suse.de>

    PR middle-end/54146
    * tree-ssa-loop-niter.c (find_loop_niter_by_eval): Free the
    exit vector.
    * ipa-pure-const.c (analyze_function): Use FOR_EACH_LOOP_BREAK.
    * cfgloop.h (FOR_EACH_LOOP_BREAK): Fix.
    * tree-ssa-structalias.c (handle_lhs_call): Properly free rhsc.
    * tree-ssa-loop-im.c (analyze_memory_references): Adjust.
    (tree_ssa_lim_finalize): Free all mem_refs.
    * tree-ssa-sccvn.c (extract_and_process_scc_for_name): Free
    scc when bailing out.
    * modulo-sched.c (sms_schedule): Use FOR_EACH_LOOP_BREAK.
    * ira-build.c (loop_with_complex_edge_p): Free loop exit vector.
    * graphite-sese-to-poly.c (scop_ivs_can_be_represented): Use
    FOR_EACH_LOOP_BREAK.

    2012-08-17  Richard Guenther  <rguenther@suse.de>

    * tree-sra.c (modify_function): Free redirect_callers vector.
    * ipa-split.c (split_function): Free args_to_pass vector.
    * tree-vect-stmts.c (vectorizable_operation): Do not pre-allocate
    vec_oprnds.
    (new_stmt_vec_info): Do not pre-allocate STMT_VINFO_SAME_ALIGN_REFS.
    * tree-vect-slp.c (vect_free_slp_instance): Free the instance.
    (vect_analyze_slp_instance): Free everything.
    (destroy_bb_vec_info): Free the SLP instances.

    2012-08-17  Richard Guenther  <rguenther@suse.de>

    * params.def (integer-share-limit): Decrease from 256 to 251,
    add rationale.

    2012-08-21  Richard Guenther  <rguenther@suse.de>

    * tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free
    the affine expansion cache.

Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/cfgloop.h
    branches/gcc-4_7-branch/gcc/graphite-sese-to-poly.c
    branches/gcc-4_7-branch/gcc/ipa-pure-const.c
    branches/gcc-4_7-branch/gcc/ipa-split.c
    branches/gcc-4_7-branch/gcc/ira-build.c
    branches/gcc-4_7-branch/gcc/modulo-sched.c
    branches/gcc-4_7-branch/gcc/params.def
    branches/gcc-4_7-branch/gcc/tree-sra.c
    branches/gcc-4_7-branch/gcc/tree-ssa-loop-im.c
    branches/gcc-4_7-branch/gcc/tree-ssa-loop-niter.c
    branches/gcc-4_7-branch/gcc/tree-ssa-sccvn.c
    branches/gcc-4_7-branch/gcc/tree-ssa-structalias.c
    branches/gcc-4_7-branch/gcc/tree-vect-slp.c
    branches/gcc-4_7-branch/gcc/tree-vect-stmts.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (59 preceding siblings ...)
  2012-08-21 13:35 ` rguenth at gcc dot gnu.org
@ 2012-08-21 13:57 ` stevenb.gcc at gmail dot com
  2012-09-02 22:55 ` steven at gcc dot gnu.org
                   ` (3 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: stevenb.gcc at gmail dot com @ 2012-08-21 13:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #58 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-21 13:56:27 UTC ---
FWIW, I think all patches addressing parts of this bug are candidates
for back-porting to release branches. They are all almost trivial.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (60 preceding siblings ...)
  2012-08-21 13:57 ` stevenb.gcc at gmail dot com
@ 2012-09-02 22:55 ` steven at gcc dot gnu.org
  2012-09-04 10:51 ` steven at gcc dot gnu.org
                   ` (2 subsequent siblings)
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-09-02 22:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #59 from Steven Bosscher <steven at gcc dot gnu.org> 2012-09-02 22:54:34 UTC ---
FWIW Martin: SRA blows up this test case's register pressure. Compiling with
SRA enabled takes ~900s, but with -fno-tree-sra compile time almost halves.
There are extremely long live ranges for SA.* variables created by SRA.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (61 preceding siblings ...)
  2012-09-02 22:55 ` steven at gcc dot gnu.org
@ 2012-09-04 10:51 ` steven at gcc dot gnu.org
  2012-09-10 16:05 ` hubicka at ucw dot cz
  2012-10-15 21:32 ` steven at gcc dot gnu.org
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-09-04 10:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #60 from Steven Bosscher <steven at gcc dot gnu.org> 2012-09-04 10:50:47 UTC ---
Another improvement that can be done for the test case, is to assign
pseudo-register numbers such that the density of the DF_LR and DF_LIVE bitmaps
increases.

For the density I used the following definition here:
  size = (# list elts) * (# bits per elt)
  cardinality = (# bits set)
  density = cardinality / size

The average density we achieve for the DF_LR_IN sets of the largest function is
only 12%, and that number is misleading because there are a few elements with
almost all bits set and many elements with only a few bits set. For instance
there is one bitmap with the following statistics:

size = 50
cardinality = 940
density = 14.7%

but the per-element statistics show:

bits/elt    count
1        20
2        19
3        2
4        2
124        7

so excluding the 7 dense elements (124 bits set out of 128), the density is
only 1.3%!

The dense bits are for a bunch of SRA variables, the sparse ones for ivtmp
variables. The ivtmp variables are live in mostly the same blocks as the SRA
variables (>100000 blocks where the variables appear in DF_LR_IN of the block)
so the packing is extremely inefficient. This is the cause for most of the
slowness of DF on the test case: the bitmap chains are large and extremely
sparse.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (62 preceding siblings ...)
  2012-09-04 10:51 ` steven at gcc dot gnu.org
@ 2012-09-10 16:05 ` hubicka at ucw dot cz
  2012-10-15 21:32 ` steven at gcc dot gnu.org
  64 siblings, 0 replies; 66+ messages in thread
From: hubicka at ucw dot cz @ 2012-09-10 16:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #61 from Jan Hubicka <hubicka at ucw dot cz> 2012-09-10 16:04:38 UTC ---
> > I think it would be good to deprecate the flatten attribute...
> 
> It still can be useful I think, if only for creating testcases
> with arbitrary large functions ;)

I think it is generally useful and some of the HPC extensions do contain
flatten
equivalent.  I do not see anything bad in compiler blowing up your machine when
you
explicitely ask him to do so via an attribute ;)

Honza


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Bug middle-end/54146] Very slow compile with attribute((flatten))
  2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
                   ` (63 preceding siblings ...)
  2012-09-10 16:05 ` hubicka at ucw dot cz
@ 2012-10-15 21:32 ` steven at gcc dot gnu.org
  64 siblings, 0 replies; 66+ messages in thread
From: steven at gcc dot gnu.org @ 2012-10-15 21:32 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #62 from Steven Bosscher <steven at gcc dot gnu.org> 2012-10-15 21:32:04 UTC ---
Author: steven
Date: Mon Oct 15 21:31:57 2012
New Revision: 192476

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192476
Log:
    Backport from trunk (r190222):

    PR tree-optimization/54146
    * ifcvt.c: Include pointer-set.h.
    (cond_move_process_if_block): Change type of then_regs and
    else_regs from alloca'd array to pointer_sets.
    (check_cond_move_block): Update for this change.
    (cond_move_convert_if_block): Likewise.
    * Makefile.in: Fix dependencies for ifcvt.o.


Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/Makefile.in
    branches/gcc-4_7-branch/gcc/ifcvt.c


^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2012-10-15 21:32 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-31 18:33 [Bug middle-end/54146] New: Very slow compile at -O1 (expand vars) glisse at gcc dot gnu.org
2012-07-31 18:38 ` [Bug middle-end/54146] " pinskia at gcc dot gnu.org
2012-07-31 18:48 ` glisse at gcc dot gnu.org
2012-07-31 20:17 ` steven at gcc dot gnu.org
2012-07-31 20:34 ` steven at gcc dot gnu.org
2012-07-31 22:17 ` steven at gcc dot gnu.org
2012-07-31 23:00 ` steven at gcc dot gnu.org
2012-08-01 12:22 ` steven at gcc dot gnu.org
2012-08-01 14:08 ` steven at gcc dot gnu.org
2012-08-01 14:14 ` steven at gcc dot gnu.org
2012-08-01 14:25 ` steven at gcc dot gnu.org
2012-08-01 14:32 ` rguenth at gcc dot gnu.org
2012-08-01 16:00 ` glisse at gcc dot gnu.org
2012-08-02 10:01 ` [Bug tree-optimization/54146] Very slow compile with attribute((flatten)) steven at gcc dot gnu.org
2012-08-02 10:37 ` rguenther at suse dot de
2012-08-03  9:00 ` steven at gcc dot gnu.org
2012-08-03 14:44 ` matz at gcc dot gnu.org
2012-08-05 11:39 ` steven at gcc dot gnu.org
2012-08-05 18:49 ` steven at gcc dot gnu.org
2012-08-05 21:14 ` steven at gcc dot gnu.org
2012-08-06  8:46 ` rguenth at gcc dot gnu.org
2012-08-06  9:10 ` stevenb.gcc at gmail dot com
2012-08-06 15:36 ` steven at gcc dot gnu.org
2012-08-06 19:37 ` steven at gcc dot gnu.org
2012-08-06 20:22 ` steven at gcc dot gnu.org
2012-08-06 20:56 ` steven at gcc dot gnu.org
2012-08-06 22:43 ` steven at gcc dot gnu.org
2012-08-06 22:59 ` steven at gcc dot gnu.org
2012-08-07 11:59 ` rguenth at gcc dot gnu.org
2012-08-07 19:58 ` steven at gcc dot gnu.org
2012-08-07 22:28 ` steven at gcc dot gnu.org
2012-08-07 22:37 ` steven at gcc dot gnu.org
2012-08-08  6:28 ` steven at gcc dot gnu.org
2012-08-08  6:29 ` steven at gcc dot gnu.org
2012-08-08  6:30 ` [Bug middle-end/54146] " steven at gcc dot gnu.org
2012-08-08  6:30 ` steven at gcc dot gnu.org
2012-08-08  8:10 ` rguenth at gcc dot gnu.org
2012-08-08 10:11 ` steven at gcc dot gnu.org
2012-08-08 11:49 ` rguenth at gcc dot gnu.org
2012-08-08 17:40 ` steven at gcc dot gnu.org
2012-08-10  3:46 ` hubicka at ucw dot cz
2012-08-10  3:51 ` hubicka at ucw dot cz
2012-08-10  3:54 ` hubicka at ucw dot cz
2012-08-10  4:35 ` hubicka at gcc dot gnu.org
2012-08-10  5:19 ` hubicka at gcc dot gnu.org
2012-08-10  6:33 ` steven at gcc dot gnu.org
2012-08-10  7:53 ` hubicka at gcc dot gnu.org
2012-08-14 12:39 ` rguenth at gcc dot gnu.org
2012-08-14 13:21 ` glisse at gcc dot gnu.org
2012-08-15 14:34 ` steven at gcc dot gnu.org
2012-08-15 15:07 ` steven at gcc dot gnu.org
2012-08-16 10:53 ` steven at gcc dot gnu.org
2012-08-16 12:11 ` rguenth at gcc dot gnu.org
2012-08-16 13:56 ` stevenb.gcc at gmail dot com
2012-08-16 14:07 ` rguenther at suse dot de
2012-08-16 14:28 ` rguenth at gcc dot gnu.org
2012-08-16 23:04 ` steven at gcc dot gnu.org
2012-08-17  9:43 ` steven at gcc dot gnu.org
2012-08-20 19:31 ` steven at gcc dot gnu.org
2012-08-21  7:56 ` rguenther at suse dot de
2012-08-21 13:35 ` rguenth at gcc dot gnu.org
2012-08-21 13:57 ` stevenb.gcc at gmail dot com
2012-09-02 22:55 ` steven at gcc dot gnu.org
2012-09-04 10:51 ` steven at gcc dot gnu.org
2012-09-10 16:05 ` hubicka at ucw dot cz
2012-10-15 21:32 ` steven at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).