public inbox for gcc@gcc.gnu.org
* Unit at a time C++ again
@ 2003-06-18 13:53 Jan Hubicka
  2003-06-18 23:31 ` Mark Mitchell
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2003-06-18 13:53 UTC (permalink / raw)
  To: mark, gcc

Mark,
after some frustration I got back to unit-at-a-time, and it appears
difficult to get there without making quite a lot of intrusive modifications at
once.  I would like to discuss with you whether it is possible to get
unit-at-a-time working in the gcc-3.4 timeframe (that would be nice, as it
would make it easier to modify the inlining heuristics to avoid some dead ends
we are seeing now).

As discussed last time, we both agree that we are shooting for a scheme where
the frontend expands all functions/data structures into generic tree form (gimple
in post-3.5) and the unit-at-a-time code takes care of building the callgraph and
the graph of references and of removing unreachable nodes.

My current code, as implemented for C/ObjC, does not touch the data structures
and does just the callgraph part.  My original plan was to first convert all
frontends to the callgraph and continue with the data structures next.  The API
is as follows:

  once frontend is finished with function, it calls:
    cgraph_finalize_function (decl, body)
  once frontend is finished with the compilation unit it calls:
    cgraph_finalize_compilation_unit ()
  and once we want to output functions it calls:
    cgraph_expand_functions ()

At the moment the data structures are output directly via varasm, which calls
   cgraph_mark_needed_node (node, needed)
for each function whose reference is output.  As an incremental step I would like
to implement the data-structure part, but the way data structures are output
seems to be very frontend dependent, so it appears to be difficult.  I would like
to do it later if it is not essential for getting the callgraph part used by C++.

cgraph_expand_functions then examines the entry points to the program (externally
visible functions, functions marked by varasm) and lowers the representation of
all such reachable bodies.  I expect gimplification to sit there, but at the
moment I simply call back into the frontend via
    lang_hooks.callgraph.lower_function
This function is supposed to make all calls and references to functions in the
body explicit, and it may further call cgraph_mark_needed_node for functions
needed by this function.  The callgraph is constructed for the examined functions,
and reachable functions are marked as needed and lowered by the same process.
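
The flow described above can be sketched as a small toy model.  This is only
an illustration, not GCC's cgraph code: the ToyCallgraph class, its method
names and the build_example helper are all hypothetical, chosen to mirror the
finalize / mark-needed / lower-on-demand steps.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these names are GCC's real API.
class ToyCallgraph {
public:
  // Analogue of finalizing a function: record its body (here just the list
  // of callees) and treat externally visible functions as entry points.
  void finalize_function(const std::string &name,
                         std::vector<std::string> callees,
                         bool externally_visible) {
    bodies_[name] = std::move(callees);
    if (externally_visible)
      mark_needed(name);
  }

  // Analogue of marking a node needed, e.g. because data references it.
  void mark_needed(const std::string &name) {
    if (needed_.insert(name).second)
      worklist_.push_back(name);
  }

  // Lower every needed body; lowering makes the calls explicit and marks
  // the callees needed in turn.  Returns the functions that were lowered.
  std::vector<std::string> lower_reachable() {
    std::vector<std::string> lowered;
    while (!worklist_.empty()) {
      std::string fn = worklist_.back();
      worklist_.pop_back();
      auto it = bodies_.find(fn);
      if (it == bodies_.end())
        continue;  // no body in this unit (external function)
      lowered.push_back(fn);
      for (const std::string &callee : it->second)
        mark_needed(callee);
    }
    return lowered;
  }

  bool is_needed(const std::string &name) const {
    return needed_.count(name) != 0;
  }

private:
  std::map<std::string, std::vector<std::string>> bodies_;
  std::set<std::string> needed_;
  std::vector<std::string> worklist_;
};

// Example unit: main calls helper; unused_static is never referenced.
inline ToyCallgraph build_example() {
  ToyCallgraph g;
  g.finalize_function("main", {"helper"}, true);
  g.finalize_function("helper", {"printf"}, false);
  g.finalize_function("unused_static", {"helper"}, false);
  return g;
}
```

In this sketch only main and helper get lowered; unused_static is finalized
but never becomes needed, which is the effect the reachability walk is after.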

Finally, once the needed functions are lowered, unit-at-a-time optimizations are
performed and
    lang_hooks.callgraph.expand_function
is called.  On tree-SSA I expect this to be the middle-end gimple
optimizer/expander, but at the moment it is slightly frontend specific.

Now the question is how to map the C++ frontend onto it.  My original plan was as follows:
  - map expand_body to cgraph_finalize_function
  - use lang_hooks.callgraph.lower_function to output virtual tables/produce
    static initializers for given function

The first difficulty I ran into is that expand_body often defers a function and
later we change DECL_PUBLIC and similar attributes.  My cgraph code expects that
once a function is finalized, its declaration does not change, so it gets confused.

Why is this done?  Is this really needed?
To avoid problems with this I decided to keep the function deferring mechanism
and finalize the functions at the end of compilation.

The patch I sent does not do exactly that - instead it preserves the current loop
that determines which functions are needed and finalizes only those.  This is
because to get the proper flags I needed to call import_export_decl, which
appeared to be unsafe to call for functions I won't output at the end.
Perhaps I can avoid this and simply walk the whole queue and finalize all functions.

Another problem is the code to output static initializers/virtual tables.  At
the moment I kept the code that iterates over the functions needed, and I simply
inject the TREE_SYMBOL_REFERENCED flag from lower_function.

This is ugly, as we discussed earlier.  We concluded that it would be better to
just examine what functions are further needed by this and expand them later.  I
did run into a problem here with static initializers - I apparently cannot
expand the function that statically initializes something until I output the data
itself, but I don't want to do that.

What do you think about an incremental step here where we simply output the static
initializers and virtual tables needed by the function?  Is it possible to do
so with a simple recursive walk of the function body, noticing what tables/data are
referenced?  (Then we won't need the loop going through all the vtables
checking TREE_SYMBOL_REFERENCED.)

What do you think about a solution that keeps the array for deferring
functions, but actually calls cgraph_finalize_function on them all at the end of
the compilation unit?  Then output the vtables/static initializers referenced by
static data (using a loop similar to the current one, but this won't need to
iterate, right?) and finally, in lang_hooks.callgraph.lower_function, walk the
function body and output the vtables/static initializers needed by the function
that are not already output.  Will this introduce new function bodies?

Honza

* Re: Unit at a time C++ again
  2003-06-18 13:53 Unit at a time C++ again Jan Hubicka
@ 2003-06-18 23:31 ` Mark Mitchell
  2003-06-19  8:36   ` Jan Hubicka
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Mitchell @ 2003-06-18 23:31 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

On Wed, 2003-06-18 at 05:26, Jan Hubicka wrote:
> Mark,
> after some frustration I got back to the unit-at-a-time and it appears
> difficult to get there without doing quite a lot of intrusive modifications at
> once, so I would like to discuss with you a plan whether it is possible to get
> unit-at-a-time working with gcc-3.4 timeframe (it would be nice to do as it
> would be easier to modify inlining heuristics to avoid some dead ends we are
> seeing now).

As I said before, I don't think we should consider this work until it
handles both template instantiation and data emission.  Otherwise, we're
getting far less than halfway there; the places where we're seeing big
problems relate to the interaction between all of these things.

(I'd also prefer that the bits that make registration calls into cgraph
not happen until the end of the translation unit.  There's no reason
they should have to be done as we go.)

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-18 23:31 ` Mark Mitchell
@ 2003-06-19  8:36   ` Jan Hubicka
  2003-06-19 10:47     ` Mark Mitchell
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2003-06-19  8:36 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, gcc

> On Wed, 2003-06-18 at 05:26, Jan Hubicka wrote:
> > Mark,
> > after some frustration I got back to the unit-at-a-time and it appears
> > difficult to get there without doing quite a lot of intrusive modifications at
> > once, so I would like to discuss with you a plan whether it is possible to get
> > unit-at-a-time working with gcc-3.4 timeframe (it would be nice to do as it
> > would be easier to modify inlining heuristics to avoid some dead ends we are
> > seeing now).
> 
> As I said before, I don't think we should consider this work until it
> handles both template instantiation and data emission.  Otherwise, we're

OK, dealing with template instantiation should not be too difficult (I
think I can simply instantiate all templates with the code I have
right now).  Dealing with at least the needed portion of the data structures
should not be impossible, but I would first like to understand what
exactly it buys us...

> getting far less than halfway there; the places where we're seeing big
> problems relate to the interaction between all of these things.

... what kind of problems do you have in mind?

> 
> (I'd also prefer that the bits that make registration calls into cgraph
> not happen until the end of the translation unit.  There's no reason
> they should have to be done as we go.)

I am doing that now - I put them into the deferred_functions array and register
all of them at the end of the compilation unit; after that, instantiated
templates and friends go directly.

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-19  8:36   ` Jan Hubicka
@ 2003-06-19 10:47     ` Mark Mitchell
  2003-06-19 11:48       ` Jan Hubicka
  2003-06-19 17:27       ` Joe Buck
  0 siblings, 2 replies; 11+ messages in thread
From: Mark Mitchell @ 2003-06-19 10:47 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

On Wed, 2003-06-18 at 23:19, Jan Hubicka wrote:
> > On Wed, 2003-06-18 at 05:26, Jan Hubicka wrote:
> > > Mark,
> > > after some frustration I got back to the unit-at-a-time and it appears
> > > difficult to get there without doing quite a lot of intrusive modifications at
> > > once, so I would like to discuss with you a plan whether it is possible to get
> > > unit-at-a-time working with gcc-3.4 timeframe (it would be nice to do as it
> > > would be easier to modify inlining heuristics to avoid some dead ends we are
> > > seeing now).
> > 
> > As I said before, I don't think we should consider this work until it
> > handles both template instantiation and data emission.  Otherwise, we're
> 
> OK, dealing with template instantiation should not be too difficult (I
> think I can simply instantiate all templates with the code I am having
> right now).  Dealing with at least needed portion of the datastructures
> should not be impossible, but I would like first to understand what
> exactly it does buy to us...

The template instantiation part of the issue is pretty clear: in big C++
programs many (if not most) of the functions are template
instantiations.  Trying to make decisions before template instantiation
is not worth it.

Data structures are also important because they can trigger template
instantiations.  A simple example is:

  template <typename T> void f(T) {}
  
  void (*fp)(int) = &f<int>;

Often, that kind of thing results in the instantiations of lots of
additional functions, as one function calls another and so forth.
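
A slightly longer variant of the example above shows the chain reaction.  The
helpers g and twice are hypothetical additions, and f is changed to return a
value so the effect is observable; only the shape of f and fp comes from the
example itself.

```cpp
#include <cassert>

// Hypothetical helpers, added only to show one instantiation pulling in others.
template <typename T> T twice(T x) { return x + x; }
template <typename T> T g(T x) { return twice(x) + 1; }
template <typename T> T f(T x) { return g(x); }

// No function in this unit calls f<int>.  The initializer of this variable
// alone requires f<int>, which in turn requires g<int> and twice<int>.
int (*fp)(int) = &f<int>;
```

Here three function bodies exist only because a data initializer took an
address, which is why the data has to be visible to the reachability analysis.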

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-19 10:47     ` Mark Mitchell
@ 2003-06-19 11:48       ` Jan Hubicka
  2003-06-19 14:52         ` Mark Mitchell
  2003-06-19 17:27       ` Joe Buck
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2003-06-19 11:48 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, gcc

Hi,
> > OK, dealing with template instantiation should not be too difficult (I
> > think I can simply instantiate all templates with the code I am having
> > right now).  Dealing with at least needed portion of the datastructures
> > should not be impossible, but I would like first to understand what
> > exactly it does buy to us...
> 
> The template instantiation part of the issue is pretty clear: in big C++
> programs many (if not most) of the functions are template
> instantiations.  Trying to make decisions before template instantiation
> is not worth it.

This is what I am partly afraid of - forcing all instantiations to
actually happen (and not only the reachable ones) seems to be a
problem, as we produce a lot of unneeded trees.
Perhaps I can somehow just instantiate the template data structures
without producing the actual methods, and produce the methods during the
function lowering process (which happens only for reachable functions),
but I am not sure what kind of complications I can hit with this.

Also note that I am not making any decisions before template
instantiation is done - I am just walking the call dependency graph,
expanding any reachable node I visit and getting templates instantiated as
needed.
> 
> Data structures are also important because they can trigger template
> instantiations.  A simple example is:
> 
>   template <typename T> void f(T) {}
>   
>   void (*fp)(int) = &f<int>;
> 
> Often, that kind of thing results in the instantiations of lots of
> additional functions, as one function calls another and so forth.

Yes, this is clear to me; what I don't understand is why you
consider the split scheme, where reachable data structures are produced by the
frontend and reachable functions are maintained by the backend, a major problem.

I am now working on adding a variable graph companion to my callgraph code
(for C/ObjC this seems to be pretty easy).  This will hopefully get the
code closer to what we are shooting for.  For C++ there seems to be a
problem with virtual tables, as these are not output only via
assemble_variable and have dependencies between them that the middle end does
not see.

Since I don't want to give up the idea of doing the transition in multiple
steps instead of one big one that will likely cause unexpected surprises,
perhaps we can attack the C++ problem from the other way around -
finalize all built data structures and keep the middle end in the business of
doing reachability analysis...
I will at least try to do this step in my local tree before trying to
put it all together, and let's see how far I can get...

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-19 11:48       ` Jan Hubicka
@ 2003-06-19 14:52         ` Mark Mitchell
  2003-06-19 15:08           ` Jan Hubicka
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Mitchell @ 2003-06-19 14:52 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

On Thu, 2003-06-19 at 01:47, Jan Hubicka wrote:
> Hi,
> > > OK, dealing with template instantiation should not be too difficult (I
> > > think I can simply instantiate all templates with the code I am having
> > > right now).  Dealing with at least needed portion of the datastructures
> > > should not be impossible, but I would like first to understand what
> > > exactly it does buy to us...
> > 
> > The template instantiation part of the issue is pretty clear: in big C++
> > programs many (if not most) of the functions are template
> > instantiations.  Trying to make decisions before template instantiation
> > is not worth it.
> 
> This is what I am partly afraid of - forcing all instantiations to
> actually happen (and not only doing the reachable ones) seems to be
> problem as we produce a lot of unneeded trees.

You don't have a choice in the matter. :-)

The C++ standard says which classes/functions are instantiated based on
a particular input program, and our template instantiation model
dictates that you instantiate all of the needed ones in every
translation unit that needs them.

> Yes, this is clear to me, however what I don't understand why you
> consider the split scheme where reachable datastructures are produced by
> frontend and reachable functions maintained by backend major problem.

Because in order to make good heuristic decisions about inlining you
really want *everything* available to the inliner.

Furthermore, I want to avoid one of Zack's "incomplete transitions." 
There's a need to develop code incrementally, but there's no reason we
should adopt it in the compiler incrementally.  It would be better if
you would do this work on a branch, make it work, and check in the
branch.
 
> I am now working on adding variable graph companion to my callgraph code
> (for C/objC this seems to be pretty easy).  This will hopefully get the
> code closer to what we are shooting for.  For C++ there seems to be
> problem with virtual tables as these are not output only via
> assemble_variable and do have dependencies in between middle end does
> not see.

I don't see any calls to assemble_variable in the C++ front end. 
AFAICT, the vtables are emitted via rest_of_decl_compilation like other
variables.  If that's not true, it should be fixed.

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-19 14:52         ` Mark Mitchell
@ 2003-06-19 15:08           ` Jan Hubicka
  2003-06-19 15:17             ` Mark Mitchell
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2003-06-19 15:08 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, gcc

> On Thu, 2003-06-19 at 01:47, Jan Hubicka wrote:
> > Hi,
> > > > OK, dealing with template instantiation should not be too difficult (I
> > > > think I can simply instantiate all templates with the code I am having
> > > > right now).  Dealing with at least needed portion of the datastructures
> > > > should not be impossible, but I would like first to understand what
> > > > exactly it does buy to us...
> > > 
> > > The template instantiation part of the issue is pretty clear: in big C++
> > > programs many (if not most) of the functions are template
> > > instantiations.  Trying to make decisions before template instantiation
> > > is not worth it.
> > 
> > This is what I am partly afraid of - forcing all instantiations to
> > actually happen (and not only doing the reachable ones) seems to be
> > problem as we produce a lot of unneeded trees.
> 
> You don't have a choice in the matter. :-)
> 
> The C++ standard says which classes/functions are instantiated based on
> a particular input program, and our template instantiation model
> dictates that you instantiate all of the needed ones in every
> translation unit that needs them.

Hmm, I don't follow.  If the C++ standard dictates what classes/functions are
instantiated and we do exactly that, how exactly do you propose to
change the current code?
> 
> > Yes, this is clear to me, however what I don't understand why you
> > consider the split scheme where reachable datastructures are produced by
> > frontend and reachable functions maintained by backend major problem.
> 
> Because in order to make good heuristic decisions about inlining you
> really want *everything* available to the inliner.

It looks like we are misunderstanding each other.

I have had everything available to the inliner since the first
incarnation of the patch (this is the point of unit-at-a-time mode :)
- I have a code lowering pass that figures out what functions need to
be expanded and outputs the variables needed by these functions without
actually expanding the functions.  Once all functions are known I do
unit-at-a-time optimization.

> > I am now working on adding variable graph companion to my callgraph code
> > (for C/objC this seems to be pretty easy).  This will hopefully get the
> > code closer to what we are shooting for.  For C++ there seems to be
> > problem with virtual tables as these are not output only via
> > assemble_variable and do have dependencies in between middle end does
> > not see.
> 
> I don't see any calls to assemble_variable in the C++ front end. 
> AFAICT, the vtables are emitted via rest_of_decl_compilation like other
> variables.  If that's not true, it should be fixed.

I was referring to the following code:
static void
output_vtable_inherit (tree vars)
{
  tree parent;
  rtx child_rtx, parent_rtx;

  child_rtx = XEXP (DECL_RTL (vars), 0);	  /* strip the mem ref  */

  parent = binfo_for_vtable (vars);

  if (parent == TYPE_BINFO (DECL_CONTEXT (vars)))
    parent_rtx = const0_rtx;
  else if (parent)
    {
      parent = get_vtbl_decl_for_binfo (TYPE_BINFO (BINFO_TYPE (parent)));
      parent_rtx = XEXP (DECL_RTL (parent), 0);  /* strip the mem ref  */
    }
  else
    abort ();

  assemble_vtable_inherit (child_rtx, parent_rtx);
}
That does bypass rest_of_decl_compilation.

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-19 15:08           ` Jan Hubicka
@ 2003-06-19 15:17             ` Mark Mitchell
  2003-06-19 16:06               ` Jan Hubicka
  2003-06-19 21:59               ` Jan Hubicka
  0 siblings, 2 replies; 11+ messages in thread
From: Mark Mitchell @ 2003-06-19 15:17 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc

> > The C++ standard says which classes/functions are instantiated based on
> > a particular input program, and our template instantiation model
> > dictates that you instantiate all of the needed ones in every
> > translation unit that needs them.
> 
> Hmm, I don't follow.  If C++ standard dictates what classes/functions are
> instantiated and we do exactly that, how exactly do you propose to
> change the current code?

I don't.  But I want to make sure that all of this instantiation takes
place before any heuristic decisions about inlining are made and before
decisions about variable emission are made.

One reason for the latter is that sometimes, if you can do enough
optimization, you no longer need to emit the variables.

If we're going to unit-at-a-time, go all the way.  Wait until the entire
file is processed, all the templates are instantiated, etc., before
writing out *anything*.

> I was referring to the following code:
> static void
> output_vtable_inherit (tree vars)
> {

I don't even know what that code does, I'm afraid. :-)

It's for some special stuff that tries to help the linker discard
unneeded vtables, I think.  It should be postponed until the vtable
actually gets emitted, via some kind of hook.

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-19 15:17             ` Mark Mitchell
@ 2003-06-19 16:06               ` Jan Hubicka
  2003-06-19 21:59               ` Jan Hubicka
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Hubicka @ 2003-06-19 16:06 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, gcc

> > > The C++ standard says which classes/functions are instantiated based on
> > > a particular input program, and our template instantiation model
> > > dictates that you instantiate all of the needed ones in every
> > > translation unit that needs them.
> > 
> > Hmm, I don't follow.  If C++ standard dictates what classes/functions are
> > instantiated and we do exactly that, how exactly do you propose to
> > change the current code?
> 
> I don't.  But I want to make sure that all of this instantiation takes
> place before any heuristic decisions about inlining are made

This part is done by my patch - I have the complete callgraph of
reachable functions before I start any optimizations, and no new
functions are inserted into it during the optimization.  On the other
hand, functions can be removed.

> and before decisions about variable emission are made.

This part is not, but I have a patch halfway done that can delay the emission
of any variables.  It is stupid at the moment, but the interfaces are the
important thing right now, I guess.

> 
> One reason for the latter is that sometimes, if you can do enough
> optimization, you no longer need to emit the variables.

We don't do any such optimization at the moment (it must be done at the tree
level during the unit-at-a-time analysis, so this part probably should
wait for tree-SSA), but I am trying to keep this in mind.
To work well, it will need to create a dependency graph that mixes
variables and functions, so that when a variable requiring a function
stops being needed we actually recognize it.  This is not done right now
but can be added with no further modifications to the frontends.
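
A minimal sketch of such a mixed graph, assuming a single node namespace for
variables and functions.  The MixedGraph type and the node names are
hypothetical and only illustrate the idea; this is not the planned
implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: one graph whose nodes are both variables and functions.
struct MixedGraph {
  // node -> nodes it references (calls, address-of, reads)
  std::map<std::string, std::vector<std::string>> refs;
  // entry points that are always kept
  std::set<std::string> roots;

  // Plain worklist reachability from the roots.
  std::set<std::string> reachable() const {
    std::set<std::string> seen;
    std::vector<std::string> work(roots.begin(), roots.end());
    while (!work.empty()) {
      std::string n = work.back();
      work.pop_back();
      if (!seen.insert(n).second)
        continue;  // already visited
      auto it = refs.find(n);
      if (it == refs.end())
        continue;  // leaf or external node
      for (const std::string &m : it->second)
        work.push_back(m);
    }
    return seen;
  }
};

inline MixedGraph example_graph() {
  MixedGraph g;
  g.refs["main"] = {"table"};      // function main reads variable table
  g.refs["table"] = {"callback"};  // table's initializer holds &callback
  g.refs["callback"] = {};
  g.roots = {"main"};
  return g;
}
```

If an optimization removes main's only use of table, recomputing reachable()
drops both table and callback, which is the case described above.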

> 
> If we're going to unit-at-a-time, go all the way.  Wait until the entire
> file is processed, all the templates are instantiated, etc., before
> writing out *anything*.

OK, so you don't propose to completely rewrite finish_file, template
instantiation and all the others?  That sounds promising :)

At the moment I am letting through all static/global variables
output by the frontend.  This is changed by the patch I am working on, so the
only things that get through are those assembled directly (not via
rest_of_decl_compilation or cgraph_finalize_function).
> 
> > I was referring to the following code:
> > static void
> > output_vtable_inherit (tree vars)
> > {
> 
> I don't even know what that code does, I'm afraid. :-)

Hmm, me neither - this is my first journey to the C++ frontend after all :)
> 
> It's for some special stuff that tries to help the linker discard
> unneeded vtables, I think.  It should be postponed until the vtable
> actually gets emitted, via some kind of hook.

Yes, I was considering adding a hook that is called once a given variable
is scheduled to be assembled, just as I already have a hook for functions
scheduled to be optimized.
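
Roughly, the idea could look like the following toy sketch.  ToyVarPool and
its members are hypothetical names used only for illustration; the point is
just that the hook fires for variables that really get emitted and never for
the ones dropped as unneeded.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy variable pool (hypothetical names, not GCC's varpool).
class ToyVarPool {
public:
  using Hook = std::function<void(const std::string &)>;

  explicit ToyVarPool(Hook prepare) : prepare_(std::move(prepare)) {}

  // Record a variable; nothing is written out yet.
  void finalize(const std::string &name, bool needed) {
    pending_.push_back({name, needed});
  }

  // Called once at the end of the unit: run the hook right before each
  // needed variable is emitted, and skip the unneeded ones entirely.
  std::vector<std::string> assemble_pending() {
    std::vector<std::string> emitted;
    for (const Entry &e : pending_) {
      if (!e.needed)
        continue;
      prepare_(e.name);  // frontend-specific preparation
      emitted.push_back(e.name);
    }
    pending_.clear();
    return emitted;
  }

private:
  struct Entry {
    std::string name;
    bool needed;
  };
  Hook prepare_;
  std::vector<Entry> pending_;
};
```

With this shape, frontend-specific work such as the vtable inheritance
directive would run from the hook, only when the vtable is actually emitted.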

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery, LLC
> mark@codesourcery.com

* Re: Unit at a time C++ again
  2003-06-19 10:47     ` Mark Mitchell
  2003-06-19 11:48       ` Jan Hubicka
@ 2003-06-19 17:27       ` Joe Buck
  1 sibling, 0 replies; 11+ messages in thread
From: Joe Buck @ 2003-06-19 17:27 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Jan Hubicka, gcc

On Wed, Jun 18, 2003 at 11:32:39PM -0700, Mark Mitchell wrote:
> The template instantiation part of the issue is pretty clear: in big C++
> programs many (if not most) of the functions are template
> instantiations.  Trying to make decisions before template instantiation
> is not worth it.

It's possible that there could be exceptions: for some templates, it
can be determined by inspection that an inlined call is always cheaper
than a non-inlined call, regardless of the values of the type parameters
(because the template function or member function returns a trivial
expression), meaning that even with -Os inlining pays.

The STL has hundreds of such templates.

On the other hand, if the templates are going to be instantiated anyway,
it might still be best to ignore this.
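
As an illustration of the kind of template meant here, consider the sketch
below.  Wrapper and read_twice are hypothetical names, not taken from any
library; for every T the body of get() is a single member access, so the
judgment that inlining pays can be made before instantiation.

```cpp
#include <cassert>

// Hypothetical wrapper: get() returns a trivial expression for any T.
template <typename T>
class Wrapper {
public:
  explicit Wrapper(T value) : value_(value) {}
  // Trivial accessor: inlining it is unlikely to be larger than a call.
  const T &get() const { return value_; }

private:
  T value_;
};

// A caller that would otherwise pay for two out-of-line calls.
template <typename T>
T read_twice(const Wrapper<T> &w) {
  return w.get() + w.get();
}
```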

* Re: Unit at a time C++ again
  2003-06-19 15:17             ` Mark Mitchell
  2003-06-19 16:06               ` Jan Hubicka
@ 2003-06-19 21:59               ` Jan Hubicka
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Hubicka @ 2003-06-19 21:59 UTC (permalink / raw)
  To: Mark Mitchell, gcc-patches; +Cc: Jan Hubicka, gcc

> > > The C++ standard says which classes/functions are instantiated based on
> > > a particular input program, and our template instantiation model
> > > dictates that you instantiate all of the needed ones in every
> > > translation unit that needs them.
> > 
> > Hmm, I don't follow.  If C++ standard dictates what classes/functions are
> > instantiated and we do exactly that, how exactly do you propose to
> > change the current code?
> 
> I don't.  But I want to make sure that all of this instantiation takes
> place before any heuristic decisions about inlining are made and before
> decisions about variable emission are made.
> 
> One reason for the latter is that sometimes, if you can do enough
> optimization, you no longer need to emit the variables.
> 
> If we're going to unit-at-a-time, go all the way.  Wait until the entire
> file is processed, all the templates are instantiated, etc., before
> writing out *anything*.
> 
> > I was referring to the following code:
> > static void
> > output_vtable_inherit (tree vars)
> > {
> 
> I don't even know what that code does, I'm afraid. :-)
> 
> It's for some special stuff that tries to help the linker discard
> unneeded vtables, I think.  It should be postponed until the vtable
> actually gets emitted, via some kind of hook.

Mark,
this is yet another unit-at-a-time patch :)  This time it works by
modifying DECL_NEEDED to work similarly to flag_syntax_only, as discussed
earlier.  It relies on cgraphunit and varpool to eliminate the
unneeded data again.

This appears to work in most cases, except for static initializers that
are reachable from the constructors and so are not eliminated.  We
probably will want to teach the middle end about the initializers later.

It passes the majority of the testsuite when forced to be enabled at -O2,
except for:

FAIL: g++.dg/ext/pretty1.C scan-assembler top level
FAIL: g++.dg/ext/pretty2.C (test for excess errors)
Excess errors:
: undefined reference to `__PRETTY_FUNCTION__'
: undefined reference to `__PRETTY_FUNCTION__'
: undefined reference to `__PRETTY_FUNCTION__'

I am not at all sure how the pretty function works and why I broke it :( It
appears to be a latent problem when deferring functions, but I am not sure about
that.

Another problem appears to be the testcase:
FAIL: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*VTT for Multivv3.*0
PASS: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*vtable for Multiss2.*vtable for Base2
PASS: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*vtable for Multivs1.*vtable for Base2
FAIL: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*VTT for Multivs1.*vtable for Base2
PASS: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*vtable for Multisv0.*vtable for Side0
FAIL: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*VTT for Multisv0.*vtable for Side0
PASS: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*vtable for Side0.*0
PASS: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*vtable for VbasedA.*0
FAIL: g++.dg/opt/vtgc1.C scan-assembler-dem .vtable_inherit[    ]*VTT for VbasedA.*0

The reason for the failure appears to be the fact that the virtual table is not
needed and thus is not output, so perhaps the testcase is wrong.

I've also bootstrapped/regtested the patch.  Does it look better now?

I will try to do more testing (benchmarking) tomorrow.  I would like to
get it into a form applicable to mainline (perhaps disabled by default
until we rework the inlining heuristics and such, so that it actually brings
measurable speedups).

Honza

Fri Jun 20 01:27:00 CEST 2003  Jan Hubicka  <jh@suse.cz>
	* cp-lang.c (LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION,
	LANG_HOOKS_PREPARE_ASSEMBLE_VARIABLE): Set macros.
	* cp-tree.h (DECL_NEEDED): In unit-at-a-time do the same trick
	as for TREE_USED.
	(prepare_assemble_variable, really_expand_body): Declare.
	* decl.c (cp_finish_decl): Flush out the declaration in
	unit-at-a-time.
	* decl2.c: Include bitmap.h and cgraph.h
	(output_vtable_inherit): Rename to ...
	(prepare_assemble_variable): ... this; check for operand being vtable.
	(finish_file): Deal with unit-at-a-time
	* rtti.c (emit_tinfo_decl): Emit all tinfos in unit-at-a-time.
	* semantics.c (really_expand_body): Break out from ...
	(expand_body): ... this one.
diff -Nrc3p ../cp.old/cp-lang.c cp/cp-lang.c
*** ../cp.old/cp-lang.c	Sat Jun  7 15:03:48 2003
--- cp/cp-lang.c	Thu Jun 19 22:17:17 2003
*************** static bool cp_var_mod_type_p (tree);
*** 145,150 ****
--- 145,156 ----
  #undef LANG_HOOKS_EXPR_SIZE
  #define LANG_HOOKS_EXPR_SIZE cp_expr_size
  
+ #undef LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION
+ #define LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION really_expand_body
+ 
+ #undef LANG_HOOKS_PREPARE_ASSEMBLE_VARIABLE 
+ #define LANG_HOOKS_PREPARE_ASSEMBLE_VARIABLE prepare_assemble_variable
+ 
  #undef LANG_HOOKS_MAKE_TYPE
  #define LANG_HOOKS_MAKE_TYPE cxx_make_type
  #undef LANG_HOOKS_TYPE_FOR_MODE
diff -Nrc3p ../cp.old/cp-tree.h cp/cp-tree.h
*** ../cp.old/cp-tree.h	Sat Jun  7 15:03:48 2003
--- cp/cp-tree.h	Thu Jun 19 22:17:51 2003
*************** struct lang_decl GTY(())
*** 1745,1751 ****
    ((at_eof && TREE_PUBLIC (DECL) && !DECL_COMDAT (DECL))	\
     || (DECL_ASSEMBLER_NAME_SET_P (DECL)				\
         && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (DECL)))	\
!    || (flag_syntax_only && TREE_USED (DECL)))
  
  /* For a FUNCTION_DECL or a VAR_DECL, the language linkage for the
     declaration.  Some entities (like a member function in a local
--- 1745,1751 ----
    ((at_eof && TREE_PUBLIC (DECL) && !DECL_COMDAT (DECL))	\
     || (DECL_ASSEMBLER_NAME_SET_P (DECL)				\
         && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (DECL)))	\
!    || (((flag_syntax_only || flag_unit_at_a_time) && TREE_USED (DECL))))
  
  /* For a FUNCTION_DECL or a VAR_DECL, the language linkage for the
     declaration.  Some entities (like a member function in a local
*************** extern tree build_artificial_parm (tree,
*** 3791,3796 ****
--- 3791,3797 ----
  extern tree get_guard (tree);
  extern tree get_guard_cond (tree);
  extern tree set_guard (tree);
+ extern void prepare_assemble_variable (tree);
  
  extern void cp_error_at		(const char *msgid, ...);
  extern void cp_warning_at	(const char *msgid, ...);
*************** extern void clear_out_block             
*** 4146,4151 ****
--- 4147,4153 ----
  extern tree begin_global_stmt_expr              (void);
  extern tree finish_global_stmt_expr             (tree);
  extern tree check_template_template_default_arg (tree);
+ extern void really_expand_body			(tree);
  
  /* in tree.c */
  extern void lang_check_failed			(const char *, int,
diff -Nrc3p ../cp.old/decl.c cp/decl.c
*** ../cp.old/decl.c	Sun Jun  8 11:42:41 2003
--- cp/decl.c	Fri Jun 20 01:04:35 2003
*************** cp_finish_decl (tree decl, tree init, tr
*** 8204,8209 ****
--- 8204,8214 ----
  	  if (TREE_STATIC (decl))
  	    expand_static_init (decl, init);
  	}
+       if (TREE_CODE (decl) == VAR_DECL
+ 	  && (TREE_STATIC (decl) || DECL_EXTERNAL (decl))
+ 	  && !DECL_DEFER_OUTPUT (decl)
+ 	  && flag_unit_at_a_time)
+ 	rest_of_decl_compilation (decl, NULL, 0, 0);
      finish_end0:
  
        /* Undo call to `pushclass' that was done in `start_decl'
diff -Nrc3p ../cp.old/decl2.c cp/decl2.c
*** ../cp.old/decl2.c	Thu May 22 01:27:50 2003
--- cp/decl2.c	Thu Jun 19 23:41:28 2003
*************** Boston, MA 02111-1307, USA.  */
*** 46,51 ****
--- 46,53 ----
  #include "cpplib.h"
  #include "target.h"
  #include "c-common.h"
+ #include "cgraph.h"
+ #include "bitmap.h"
  extern cpp_reader *parse_in;
  
  /* This structure contains information about the initializations
*************** static void add_using_namespace (tree, t
*** 66,72 ****
  static cxx_binding *ambiguous_decl (tree, cxx_binding *, cxx_binding *, int);
  static tree build_anon_union_vars (tree);
  static bool acceptable_java_type (tree);
- static void output_vtable_inherit (tree);
  static tree start_objects (int, int);
  static void finish_objects (int, int, tree);
  static tree merge_functions (tree, tree);
--- 68,73 ----
*************** import_export_class (tree ctype)
*** 1604,1614 ****
  /* We need to describe to the assembler the relationship between
     a vtable and the vtable of the parent class.  */
  
! static void
! output_vtable_inherit (tree vars)
  {
    tree parent;
    rtx child_rtx, parent_rtx;
  
    child_rtx = XEXP (DECL_RTL (vars), 0);	  /* strip the mem ref  */
  
--- 1605,1628 ----
  /* We need to describe to the assembler the relationship between
     a vtable and the vtable of the parent class.  */
  
! void
! prepare_assemble_variable (tree vars)
  {
    tree parent;
    rtx child_rtx, parent_rtx;
+   const char *type_name;
+ 
+   /* Recognize virtual tables.  */
+   if (!flag_vtable_gc
+       || TREE_CODE (TREE_TYPE (vars)) != ARRAY_TYPE
+       || TREE_TYPE (TREE_TYPE (vars)) != TREE_TYPE (vtbl_type_node))
+     return;
+ 
+   type_name = IDENTIFIER_POINTER
+ 	        (DECL_NAME (TYPE_NAME (TREE_TYPE (TREE_TYPE (vars)))));
+ 
+   if (strcmp (VTBL_PTR_TYPE, type_name))
+     return;
  
    child_rtx = XEXP (DECL_RTL (vars), 0);	  /* strip the mem ref  */
  
*************** maybe_emit_vtables (tree ctype)
*** 1706,1714 ****
  
        rest_of_decl_compilation (vtbl, NULL, 1, 1);
  
-       if (flag_vtable_gc)
- 	output_vtable_inherit (vtbl);
- 
        /* Because we're only doing syntax-checking, we'll never end up
  	 actually marking the variable as written.  */
        if (flag_syntax_only)
--- 1720,1725 ----
*************** finish_file ()
*** 2549,2554 ****
--- 2560,2566 ----
    size_t i;
    location_t locus;
    unsigned ssdf_count = 0;
+   bitmap_head fn_finalized;
  
    locus = input_location;
    at_eof = 1;
*************** finish_file ()
*** 2557,2562 ****
--- 2569,2576 ----
    if (! global_bindings_p () || current_class_type || decl_namespace_list)
      return;
  
+   bitmap_initialize (&fn_finalized, 0);
+ 
    if (pch_file)
      c_common_write_pch ();
  
*************** finish_file ()
*** 2746,2752 ****
  	     instantiation "static", which will result in errors about
  	     the use of undefined functions if there is no body for
  	     the function.  */
! 	  if (!DECL_SAVED_TREE (decl))
  	    continue;
  
  	  import_export_decl (decl);
--- 2760,2766 ----
  	     instantiation "static", which will result in errors about
  	     the use of undefined functions if there is no body for
  	     the function.  */
! 	  if (!DECL_SAVED_TREE (decl) || bitmap_bit_p (&fn_finalized, i))
  	    continue;
  
  	  import_export_decl (decl);
*************** finish_file ()
*** 2786,2791 ****
--- 2800,2806 ----
  	      expand_body (decl);
  	      /* Undo the damage done by finish_function.  */
  	      DECL_EXTERNAL (decl) = 0;
+ 	      DECL_DEFER_OUTPUT (decl) = 0;
  	      DECL_NOT_REALLY_EXTERN (decl) = saved_not_really_extern;
  	      /* If we're compiling -fsyntax-only pretend that this
  		 function has been written out so that we don't try to
*************** finish_file ()
*** 2793,2798 ****
--- 2808,2814 ----
  	      if (flag_syntax_only)
  		TREE_ASM_WRITTEN (decl) = 1;
  	      reconsider = true;
+ 	      bitmap_set_bit (&fn_finalized, i);
  	    }
  	}
  
*************** finish_file ()
*** 2869,2874 ****
--- 2885,2896 ----
       linkage now.  */
    pop_lang_context ();
  
+   if (flag_unit_at_a_time)
+     {
+       cgraph_finalize_compilation_unit ();
+       cgraph_optimize ();
+     }
+ 
    /* Now, issue warnings about static, but not defined, functions,
       etc., and emit debugging information.  */
    walk_namespaces (wrapup_globals_for_namespace, /*data=*/&reconsider);
*************** finish_file ()
*** 2899,2904 ****
--- 2921,2928 ----
        dump_time_statistics ();
      }
    input_location = locus;
+ 
+   bitmap_clear (&fn_finalized);
  }
  
  /* T is the parse tree for an expression.  Return the expression after
diff -Nrc3p ../cp.old/rtti.c cp/rtti.c
*** ../cp.old/rtti.c	Sun May 18 00:21:35 2003
--- cp/rtti.c	Fri Jun 20 01:10:58 2003
*************** emit_tinfo_decl (tree decl)
*** 1447,1453 ****
    my_friendly_assert (unemitted_tinfo_decl_p (decl), 20030307); 
    
    import_export_tinfo (decl, type, in_library);
!   if (DECL_REALLY_EXTERN (decl) || !DECL_NEEDED_P (decl))
      return false;
  
    if (!doing_runtime && in_library)
--- 1449,1455 ----
    my_friendly_assert (unemitted_tinfo_decl_p (decl), 20030307); 
    
    import_export_tinfo (decl, type, in_library);
!   if (DECL_REALLY_EXTERN (decl) || (!DECL_NEEDED_P (decl) && !flag_unit_at_a_time))
      return false;
  
    if (!doing_runtime && in_library)
diff -Nrc3p ../cp.old/semantics.c cp/semantics.c
*** ../cp.old/semantics.c	Sun May 18 11:42:09 2003
--- cp/semantics.c	Thu Jun 19 21:22:27 2003
***************
*** 41,46 ****
--- 41,47 ----
  #include "output.h"
  #include "timevar.h"
  #include "debug.h"
+ #include "cgraph.h"
  
  /* There routines provide a modular interface to perform many parsing
     operations.  They may therefore be used during actual parsing, or
*************** emit_associated_thunks (fn)
*** 2380,2453 ****
  /* Generate RTL for FN.  */
  
  void
! expand_body (fn)
       tree fn;
  {
    location_t saved_loc;
    tree saved_function;
  
!   /* When the parser calls us after finishing the body of a template
!      function, we don't really want to expand the body.  When we're
!      processing an in-class definition of an inline function,
!      PROCESSING_TEMPLATE_DECL will no longer be set here, so we have
!      to look at the function itself.  */
!   if (processing_template_decl
!       || (DECL_LANG_SPECIFIC (fn) 
! 	  && DECL_TEMPLATE_INFO (fn)
! 	  && uses_template_parms (DECL_TI_ARGS (fn))))
!     {
!       /* Normally, collection only occurs in rest_of_compilation.  So,
! 	 if we don't collect here, we never collect junk generated
! 	 during the processing of templates until we hit a
! 	 non-template function.  */
!       ggc_collect ();
!       return;
!     }
! 
!   /* Replace AGGR_INIT_EXPRs with appropriate CALL_EXPRs.  */
!   walk_tree_without_duplicates (&DECL_SAVED_TREE (fn),
! 				simplify_aggr_init_exprs_r,
! 				NULL);
! 
!   /* If this is a constructor or destructor body, we have to clone
!      it.  */
!   if (maybe_clone_body (fn))
!     {
!       /* We don't want to process FN again, so pretend we've written
! 	 it out, even though we haven't.  */
!       TREE_ASM_WRITTEN (fn) = 1;
!       return;
!     }
! 
!   /* There's no reason to do any of the work here if we're only doing
!      semantic analysis; this code just generates RTL.  */
!   if (flag_syntax_only)
!     return;
! 
!   /* If possible, avoid generating RTL for this function.  Instead,
!      just record it as an inline function, and wait until end-of-file
!      to decide whether to write it out or not.  */
!   if (/* We have to generate RTL if it's not an inline function.  */
!       (DECL_INLINE (fn) || DECL_COMDAT (fn))
!       /* Or if we have to emit code for inline functions anyhow.  */
!       && !flag_keep_inline_functions
!       /* Or if we actually have a reference to the function.  */
!       && !DECL_NEEDED_P (fn))
!     {
!       /* Set DECL_EXTERNAL so that assemble_external will be called as
! 	 necessary.  We'll clear it again in finish_file.  */
!       if (!DECL_EXTERNAL (fn))
! 	{
! 	  DECL_NOT_REALLY_EXTERN (fn) = 1;
! 	  DECL_EXTERNAL (fn) = 1;
! 	}
!       /* Remember this function.  In finish_file we'll decide if
! 	 we actually need to write this function out.  */
!       defer_fn (fn);
!       /* Let the back-end know that this function exists.  */
!       (*debug_hooks->deferred_inline_function) (fn);
!       return;
!     }
  
    /* Compute the appropriate object-file linkage for inline
       functions.  */
--- 2381,2394 ----
  /* Generate RTL for FN.  */
  
  void
! really_expand_body (fn)
       tree fn;
  {
    location_t saved_loc;
    tree saved_function;
  
!   if (flag_unit_at_a_time && !cgraph_global_info_ready)
!     abort ();
  
    /* Compute the appropriate object-file linkage for inline
       functions.  */
*************** expand_body (fn)
*** 2519,2524 ****
--- 2460,2561 ----
    emit_associated_thunks (fn);
  }
  
+ /* Generate RTL for FN.  */
+ 
+ void
+ expand_body (fn)
+      tree fn;
+ {
+   /* When the parser calls us after finishing the body of a template
+      function, we don't really want to expand the body.  When we're
+      processing an in-class definition of an inline function,
+      PROCESSING_TEMPLATE_DECL will no longer be set here, so we have
+      to look at the function itself.  */
+   if (processing_template_decl
+       || (DECL_LANG_SPECIFIC (fn) 
+ 	  && DECL_TEMPLATE_INFO (fn)
+ 	  && uses_template_parms (DECL_TI_ARGS (fn))))
+     {
+       /* Normally, collection only occurs in rest_of_compilation.  So,
+ 	 if we don't collect here, we never collect junk generated
+ 	 during the processing of templates until we hit a
+ 	 non-template function.  */
+       ggc_collect ();
+       return;
+     }
+ 
+   /* Replace AGGR_INIT_EXPRs with appropriate CALL_EXPRs.  */
+   walk_tree_without_duplicates (&DECL_SAVED_TREE (fn),
+ 				simplify_aggr_init_exprs_r,
+ 				NULL);
+ 
+   /* If this is a constructor or destructor body, we have to clone
+      it.  */
+   if (maybe_clone_body (fn))
+     {
+       /* We don't want to process FN again, so pretend we've written
+ 	 it out, even though we haven't.  */
+       TREE_ASM_WRITTEN (fn) = 1;
+       return;
+     }
+ 
+   /* There's no reason to do any of the work here if we're only doing
+      semantic analysis; this code just generates RTL.  */
+   if (flag_syntax_only)
+     return;
+ 
+   if (flag_unit_at_a_time && cgraph_global_info_ready)
+     abort ();
+ 
+   if (flag_unit_at_a_time && !cgraph_global_info_ready)
+     {
+       if (at_eof)
+ 	cgraph_finalize_function (fn, DECL_SAVED_TREE (fn));
+       else
+ 	{
+ 	  if (!DECL_EXTERNAL (fn))
+ 	    {
+ 	      DECL_NOT_REALLY_EXTERN (fn) = 1;
+ 	      DECL_EXTERNAL (fn) = 1;
+ 	    }
+ 	  /* Remember this function.  In finish_file we'll decide if
+ 	     we actually need to write this function out.  */
+ 	  defer_fn (fn);
+ 	  /* Let the back-end know that this function exists.  */
+ 	  (*debug_hooks->deferred_inline_function) (fn);
+ 	}
+       return;
+     }
+ 
+ 
+   /* If possible, avoid generating RTL for this function.  Instead,
+      just record it as an inline function, and wait until end-of-file
+      to decide whether to write it out or not.  */
+   if (/* We have to generate RTL if it's not an inline function.  */
+       (DECL_INLINE (fn) || DECL_COMDAT (fn))
+       /* Or if we have to emit code for inline functions anyhow.  */
+       && !flag_keep_inline_functions
+       /* Or if we actually have a reference to the function.  */
+       && !DECL_NEEDED_P (fn))
+     {
+       /* Set DECL_EXTERNAL so that assemble_external will be called as
+ 	 necessary.  We'll clear it again in finish_file.  */
+       if (!DECL_EXTERNAL (fn))
+ 	{
+ 	  DECL_NOT_REALLY_EXTERN (fn) = 1;
+ 	  DECL_EXTERNAL (fn) = 1;
+ 	}
+       /* Remember this function.  In finish_file we'll decide if
+ 	 we actually need to write this function out.  */
+       defer_fn (fn);
+       /* Let the back-end know that this function exists.  */
+       (*debug_hooks->deferred_inline_function) (fn);
+       return;
+     }
+ 
+   really_expand_body (fn);
+ }
+ 
  /* Helper function for walk_tree, used by finish_function to override all
     the RETURN_STMTs and pertinent CLEANUP_STMTs for the named return
     value optimization.  */


Thread overview: 11+ messages
2003-06-18 13:53 Unit at a time C++ again Jan Hubicka
2003-06-18 23:31 ` Mark Mitchell
2003-06-19  8:36   ` Jan Hubicka
2003-06-19 10:47     ` Mark Mitchell
2003-06-19 11:48       ` Jan Hubicka
2003-06-19 14:52         ` Mark Mitchell
2003-06-19 15:08           ` Jan Hubicka
2003-06-19 15:17             ` Mark Mitchell
2003-06-19 16:06               ` Jan Hubicka
2003-06-19 21:59               ` Jan Hubicka
2003-06-19 17:27       ` Joe Buck
