public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* PR c++/5504: Optimization breaks wei-ku-1 from blitz
@ 2002-03-25 21:41 Diego Novillo
  2002-04-03 18:04 ` Jason Merrill
  2002-04-05 15:08 ` Jason Merrill
  0 siblings, 2 replies; 8+ messages in thread
From: Diego Novillo @ 2002-03-25 21:41 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc, gcc-prs

Jason,

Although the PR indicates that g++ ICEs, the program actually
compiles.  However, it takes about 7 hours and cc1plus grows to
~650 Mb.

The problem seems to be in the expansion of EH cleanups.  I've
been comparing gcc 3.0.4 with the 3.1 branch.  This is what I
found so far:

- Both compilers seem to generate roughly the same number of
  trees.  They both grow to about 47 Mb right before the call to
  expand_stmt() in semantics.c:expand_body().

- However, when expand_stmt() is called to expand the body of
  main(), the 3.1 compiler grows to 415Mb, while the 3.0.4.
  compiler grows to about 53 Mb.

- The difference is in the calls to expand_eh_region_end_cleanup()
  The 3.0.4 compiler calls it 76 times.  The 3.1 compiler calls
  it 59,212 times.
  
  Not surprisingly, we spend no more than a couple of minutes
  generating the initial RTL, and 7 hours trying to optimize it.
  We seem to be generating mostly fluff because the final .s file
  doesn't look overly large (a couple of Mb, IIRC).

- The program seems to compile fine with -fno-exceptions.

Jakub tracked the problem to your destructor cleanup patch from
Dec01:

http://gcc.gnu.org/ml/gcc-patches/2001-12/msg01532.html

I don't know enough C++ to determine whether this is to be
expected or not.  The test case is rather large, so it may well
be that 3.0.4 was not doing the right thing.

Any ideas on what to try next?


Thanks.  Diego.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PR c++/5504: Optimization breaks wei-ku-1 from blitz
  2002-03-25 21:41 PR c++/5504: Optimization breaks wei-ku-1 from blitz Diego Novillo
@ 2002-04-03 18:04 ` Jason Merrill
  2002-04-03 19:23   ` Diego Novillo
  2002-04-05 15:08 ` Jason Merrill
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Merrill @ 2002-04-03 18:04 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc, gcc-prs

Interesting.  I'll take over this PR, if you don't mind.

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PR c++/5504: Optimization breaks wei-ku-1 from blitz
  2002-04-03 18:04 ` Jason Merrill
@ 2002-04-03 19:23   ` Diego Novillo
  0 siblings, 0 replies; 8+ messages in thread
From: Diego Novillo @ 2002-04-03 19:23 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc, gcc-prs

On Wed, 2002-04-03 at 20:57, Jason Merrill wrote:
> Interesting.  I'll take over this PR, if you don't mind.
> 
Sure.  Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PR c++/5504: Optimization breaks wei-ku-1 from blitz
  2002-03-25 21:41 PR c++/5504: Optimization breaks wei-ku-1 from blitz Diego Novillo
  2002-04-03 18:04 ` Jason Merrill
@ 2002-04-05 15:08 ` Jason Merrill
  2002-04-05 16:14   ` Jason Merrill
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Jason Merrill @ 2002-04-05 15:08 UTC (permalink / raw)
  To: Richard Henderson, Diego Novillo; +Cc: gcc, gcc-gnats, Jason Merrill

This seems to be a problem with combinatorial explosion of EH regions.

The explosion itself is a necessary consequence of the use of expression
templates, which produces a lot of little recursive functions inlined into 
one another, ideally optimizing out to very simple code.  In this case, since
IndexPlaceholder has a destructor, all of the expression template classes
parameterized on it need to call it.  As the expression (A = 11111 + ...)
grows, the classes involved get more and more complicated.

We used to handle cleanups in destructors by manually calling the base and
member destructors from finish_function.  My patch changed this to use
normal cleanups for destroying bases and members, in order to get proper
semantics if the destructor itself or one of the subobject destructors
throw; previously we would have skipped the other subobject destructors.
This change is necessary for correctness.

But as a result, we now have a lot more EH regions than we used to.
Counting calls to expand_eh_region_end_cleanup, I see: With one term, 67;
two, 379; three, 1657; four, 6779.  It's not entirely clear to me that this
explosion is due to the nature of expression templates, but I think it is.
And the vast majority of these aren't needed; the function ends up with 43
entries in the action table.

For the four term expression, the maximal expression template class looks
like

_bz_ArrayExpr
 <blitz::_bz_ArrayExprOp
  <blitz::_bz_ArrayExpr
   <blitz::_bz_ArrayExprOp 
    <blitz::_bz_ArrayExpr 
     <blitz::_bz_ArrayExprOp 
      <blitz::_bz_ArrayExprConstant<int>, 
       blitz::_bz_ArrayExpr 
        <blitz::_bz_ArrayExprOp  
         <blitz::_bz_ArrayExprConstant<int>,  
          blitz::IndexPlaceholder<0>,  
          blitz::Multiply<int, int> > >,  
       blitz::Add<int, int> > >, 
     blitz::_bz_ArrayExpr
      <blitz::_bz_ArrayExprOp
       <blitz::_bz_ArrayExprConstant<int>,
        blitz::IndexPlaceholder<1>,
        blitz::Multiply<int, int> > >,
     blitz::Add<int, int> > >,
   blitz::_bz_ArrayExpr
    <blitz::_bz_ArrayExprOp
     <blitz::_bz_ArrayExprConstant<int>,
      blitz::IndexPlaceholder<2>,
      blitz::Multiply<int, int> > >,
   blitz::Add<int, int> > >

which itself involves 15 destructor calls, for the IndexPlaceholder members
and all the types that (perhaps indirectly) contain one.  Each of these is
expanded twice, once for the normal flow case, and once for the EH case.
If fully inlined, destroying one temporary of this type adds 30 exception
regions to the list.  And such temporaries are created every time the
expression object is passed by value, as it uniformly is.

Unfortunately, the backend doesn't seem to be prepared to handle this
number of EH regions; many of the data structures can only be searched
linearly.  For the four-term case, gprof shows that we're spending the vast
majority of our time traversing them:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 37.35    211.09   211.09   110832     1.90     2.32  maybe_remove_eh_handler
 26.59    361.36   150.27   344409     0.44     0.44  in_expr_list_p
 10.00    417.89    56.53   305399     0.19     0.19  next_nonnote_insn
  6.59    455.11    37.22    28896     1.29     1.29  remove_exception_handler_label
...

where the time in in_expr_list_p comes from the call in can_delete_label_p
to see if a label is in exception_handler_labels.

with -O0 -finline, it looks a bit different:

 28.70    106.27   106.27   100111     1.06     1.30  maybe_remove_eh_handler
 20.71    182.94    76.67   306819     0.25     0.25  in_expr_list_p
 12.38    228.77    45.83    92573     0.50     0.50  pop_temp_slots
  7.12    255.12    26.35    56221     0.47     0.47  free_temp_slots
  5.03    273.73    18.61    28872     0.64     0.64  remove_exception_handler_label

Here again we're running into linear traversals of lists that got larger
than we ever expected, but this time it's the list of temp slots.

I'm not sure what to do about this.  We need to support this sort of code;
this is a very common C++ programming idiom.  Obvious ways to improve
performance would be:

  Adjust the EH data structures so that we can do more efficient searches in
  them.

  Try to avoid creating EH regions that will just be deleted again.  If
  nothing in the region can throw, we can discard it at expand_eh_region_end
  time.

For 3.1, one option would be to go back to doing destructor cleanups only
on the normal flow path; this would break EH semantics again for this case,
but that would not be a regression and I don't think I've ever seen a bug
report about it in past releases.

Thoughts?

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: c++/5504: Optimization breaks wei-ku-1 from blitz
  2002-04-05 15:08 ` Jason Merrill
@ 2002-04-05 16:14   ` Jason Merrill
  2002-04-06 15:36   ` PR " Mark Mitchell
  2002-04-09  0:20   ` Richard Henderson
  2 siblings, 0 replies; 8+ messages in thread
From: Jason Merrill @ 2002-04-05 16:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Diego Novillo, gcc, gcc-gnats

>>>>> "Jason" == Jason Merrill <jason@redhat.com> writes:

> For 3.1, one option would be to go back to doing destructor cleanups only
> on the normal flow path; this would break EH semantics again for this case,
> but that would not be a regression and I don't think I've ever seen a bug
> report about it in past releases.

I take it back; PR c++/411 is for this bug.  We may still want to do this,
though.

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PR c++/5504: Optimization breaks wei-ku-1 from blitz
  2002-04-05 15:08 ` Jason Merrill
  2002-04-05 16:14   ` Jason Merrill
@ 2002-04-06 15:36   ` Mark Mitchell
  2002-04-06 23:38     ` Richard Henderson
  2002-04-09  0:20   ` Richard Henderson
  2 siblings, 1 reply; 8+ messages in thread
From: Mark Mitchell @ 2002-04-06 15:36 UTC (permalink / raw)
  To: Jason Merrill, Richard Henderson, Diego Novillo; +Cc: gcc, gcc-gnats

> For 3.1, one option would be to go back to doing destructor cleanups only
> on the normal flow path; this would break EH semantics again for this
> case, but that would not be a regression and I don't think I've ever seen
> a bug report about it in past releases.

I think this is the most practical solution under the circumstances.

At some point, we clearly need to explore the other alternatives you
suggest.

Thanks,

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PR c++/5504: Optimization breaks wei-ku-1 from blitz
  2002-04-06 15:36   ` PR " Mark Mitchell
@ 2002-04-06 23:38     ` Richard Henderson
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2002-04-06 23:38 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jason Merrill, Diego Novillo, gcc, gcc-gnats

On Sat, Apr 06, 2002 at 02:48:54PM -0800, Mark Mitchell wrote:
> I think this is the most practical solution under the circumstances.

Hold off a bit.  I don't think either of the solutions
that Jason suggests are hard to implement.  I'd certainly
prefer a solution that doesn't re-introduce bugs...


r~

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PR c++/5504: Optimization breaks wei-ku-1 from blitz
  2002-04-05 15:08 ` Jason Merrill
  2002-04-05 16:14   ` Jason Merrill
  2002-04-06 15:36   ` PR " Mark Mitchell
@ 2002-04-09  0:20   ` Richard Henderson
  2 siblings, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2002-04-09  0:20 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Diego Novillo, gcc, mark

On Fri, Apr 05, 2002 at 11:34:40PM +0100, Jason Merrill wrote:
> Unfortunately, the backend doesn't seem to be prepared to handle this
> number of EH regions; many of the data structures can only be searched
> linearly.  For the four-term case, gprof shows that we're spending the vast
> majority of our time traversing them:
> 
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  37.35    211.09   211.09   110832     1.90     2.32  maybe_remove_eh_handler
>  26.59    361.36   150.27   344409     0.44     0.44  in_expr_list_p
>  10.00    417.89    56.53   305399     0.19     0.19  next_nonnote_insn
>   6.59    455.11    37.22    28896     1.29     1.29  remove_exception_handler_label
[...]
> I'm not sure what to do about this.  We need to support this sort of code;
> this is a very common C++ programming idiom.  Obvious ways to improve
> performance would be:
>
> [1] Adjust the EH data structures so that we can do more efficient
>     searches in them.
> [2] Try to avoid creating EH regions that will just be deleted again.
>     If nothing in the region can throw, we can discard it at
>     expand_eh_region_end time.

It turns out that [2] is annoyingly difficult.  FIXUP regions created
by expand_cleanups want to insert code into the *parent* of a CLEANUP
region, after that region has already been finalized by
expand_eh_region_end.  I.e. if we simply fail to create the cleanup
region based on the fact that it'll never be used, we'll not have all
the data structures needed to resolve the fixup region.  :-(

Doing [1] is within our power.  I found a whole series of quadratic
(or worse) operations here -- not all of them in the EH code.

I'm currently testing a patch that brings the original test case to

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
  5.14      7.34     7.34  2662100     0.00     0.00  walk_tree
  4.57     13.87     6.53      701     0.01     0.01  fixup_var_refs_insns
  4.30     20.01     6.14 34516896     0.00     0.00  ggc_alloc
  3.20     24.58     4.57 62692969     0.00     0.00  statement_code_p
  2.96     28.81     4.23  7488030     0.00     0.00  fixup_var_refs_1
  2.70     32.66     3.85 77455616     0.00     0.00  add_insn

 expand                : 207.49 (59%) usr   8.39 (68%) sys 216.25 (59%) wall
 parser                :  38.15 (11%) usr   0.77 ( 6%) sys  38.52 (11%) wall
 garbage collection    :  29.99 ( 9%) usr   1.31 (11%) sys  31.30 ( 9%) wall
 cfg construction      :  18.96 ( 5%) usr   0.46 ( 4%) sys  19.56 ( 5%) wall
 cfg cleanup           :  16.27 ( 5%) usr   0.04 ( 0%) sys  16.23 ( 4%) wall
 TOTAL                 : 351.17            12.35           363.83

I.e. we're down below 6 minutes compile time instead of 7 hours.

Unfortunately, it still consumes unacceptable amounts of memory.
Somewhere near 1.2GB peak.  I think I can improve this by eliding
some of the *code* associated with cleanup regions, even though
I cannot elide the cleanup region data structure itself.

I'll test the patch I have overnight and post if it succeeds.


r~

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2002-04-09  7:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-03-25 21:41 PR c++/5504: Optimization breaks wei-ku-1 from blitz Diego Novillo
2002-04-03 18:04 ` Jason Merrill
2002-04-03 19:23   ` Diego Novillo
2002-04-05 15:08 ` Jason Merrill
2002-04-05 16:14   ` Jason Merrill
2002-04-06 15:36   ` PR " Mark Mitchell
2002-04-06 23:38     ` Richard Henderson
2002-04-09  0:20   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).