* [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
@ 2008-11-11 17:31 Rafael Espindola
2008-11-11 20:23 ` Jan Hubicka
2008-11-11 22:54 ` Diego Novillo
0 siblings, 2 replies; 18+ messages in thread
From: Rafael Espindola @ 2008-11-11 17:31 UTC (permalink / raw)
To: gcc-patches; +Cc: Diego Novillo
[-- Attachment #1: Type: text/plain, Size: 1899 bytes --]
We have to move the call execute_all_ipa_transforms to cgraphunit.c,
since we want to run the transformations and then regenerate the
summaries.
Doing so shows a problem with pass_set_nothrow_function_flags. Setting
the NOTHROW bit of a function F invalidates the eh regions of any
function that call F. This is not currently visible since
execute_fixup_cfg gets called by other passes before the problem is
detected. Ian pointed out that the correct fix is to make
pass_set_nothrow_function_flag an IPA pass. The attached patches use 2
different approaches to avoid the problem.
Patch 1 adds a direct call to execute_fixup_cfg in
tree_rest_of_compilation. For reasons I still don't understand this
causes a regression on the following test
---------------------------------------------------------------
struct _Tune {
explicit _Tune() {
}
};
template<class _PoolTp>
static void _S_get_pool() {
static _Tune _M_options;
}
class __mt_alloc {
public:
~__mt_alloc() throw() {
}
};
void deallocate() {
_S_get_pool<int>();
delete (int *)0;
}
int main() {
__mt_alloc a;
deallocate();
deallocate();
}
---------------------------------------------------------------
cc1plus fails with "internal compiler error: in calc_dfs_tree, at
dominance.c:394". Converting _S_get_pool to a non-template function
solves the problem, so there is something really strange going on.
The second patch takes the simpler approach of just disabling the
pass. It is currently bootstrapping, but I have checked that it
doesn't show that regression.
So, which one is your favorite? :-) Assuming disabling the pass has no
regressions, should I commit that? And if it has? Should I debug the
above problem?
Cheers,
--
Rafael Avila de Espindola
Google | Gordon House | Barrow Street | Dublin 4 | Ireland
Registered in Dublin, Ireland | Registration Number: 368047
[-- Attachment #2: ipa1.patch --]
[-- Type: text/x-diff, Size: 3453 bytes --]
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 01cae4e..fe74fc1 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1078,7 +1078,46 @@ cgraph_inline_p (struct cgraph_edge *e, cgraph_inline_failed_t *reason)
return !e->inline_failed;
}
+/* Execute all ipa transforms in all functions. */
+static void
+cgraph_execute_all_ipa_transforms (void)
+{
+ struct cgraph_node *node;
+ struct cgraph_node **order = XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
+ int order_pos, new_order_pos = 0;
+ int i;
+
+ order_pos = cgraph_postorder (order);
+ gcc_assert (order_pos == cgraph_n_nodes);
+
+ /* Garbage collector may remove inline clones we eliminate during
+ optimization. So we must be sure to not reference them. */
+ for (i = 0; i < order_pos; i++)
+ {
+ if (order[i]->output)
+ order[new_order_pos++] = order[i];
+ }
+
+ gimple_register_cfg_hooks ();
+
+ for (i = new_order_pos - 1; i >= 0; i--)
+ {
+ tree fndecl;
+ node = order[i];
+ fndecl = node->decl;
+
+ bitmap_obstack_initialize (NULL);
+ current_function_decl = fndecl;
+ set_cfun (DECL_STRUCT_FUNCTION (fndecl));
+
+ execute_all_ipa_transforms ();
+
+ set_cfun (NULL);
+ current_function_decl = NULL;
+ bitmap_obstack_release (NULL);
+ }
+}
/* Expand all functions that must be output.
@@ -1320,6 +1359,8 @@ cgraph_optimize (void)
cgraph_mark_functions_to_output ();
+ cgraph_execute_all_ipa_transforms ();
+
cgraph_state = CGRAPH_STATE_EXPANSION;
if (!flag_toplevel_reorder)
cgraph_output_in_order ();
diff --git a/gcc/passes.c b/gcc/passes.c
index ed433da..15603e1 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1244,20 +1244,26 @@ execute_one_ipa_transform_pass (struct cgraph_node *node,
void
execute_all_ipa_transforms (void)
{
- if (cfun && cfun->ipa_transforms_to_apply)
- {
- unsigned int i;
- struct cgraph_node *node = cgraph_node (current_function_decl);
-
- for (i = 0; i < VEC_length (ipa_opt_pass, cfun->ipa_transforms_to_apply);
- i++)
- execute_one_ipa_transform_pass (node,
- VEC_index (ipa_opt_pass,
- cfun->ipa_transforms_to_apply,
- i));
- VEC_free (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply);
- cfun->ipa_transforms_to_apply = NULL;
- }
+ unsigned int i;
+ struct cgraph_node *node;
+
+ gcc_assert(cfun);
+ if (!cfun->ipa_transforms_to_apply)
+ return;
+
+ node = cgraph_node (current_function_decl);
+
+ gcc_assert (node->output);
+
+ for (i = 0; i < VEC_length (ipa_opt_pass, cfun->ipa_transforms_to_apply);
+ i++)
+ execute_one_ipa_transform_pass (node,
+ VEC_index (ipa_opt_pass,
+ cfun->ipa_transforms_to_apply,
+ i));
+
+ VEC_free (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply);
+ cfun->ipa_transforms_to_apply = NULL;
}
/* Execute PASS. */
diff --git a/gcc/tree-optimize.c b/gcc/tree-optimize.c
index 18ccf21..2732847 100644
--- a/gcc/tree-optimize.c
+++ b/gcc/tree-optimize.c
@@ -415,8 +415,9 @@ tree_rest_of_compilation (tree fndecl)
bitmap_obstack_initialize (®_obstack); /* FIXME, only at RTL generation*/
-
- execute_all_ipa_transforms ();
+ /* FIXME lto. pass_set_nothrow_function_flags might have corrupted our
+ eh regions. It should be rewritten as an IPA pass. */
+ execute_fixup_cfg ();
/* Perform all tree transforms and optimizations. */
execute_pass_list (all_passes);
[-- Attachment #3: ipa2.patch --]
[-- Type: text/x-diff, Size: 3704 bytes --]
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 01cae4e..fe74fc1 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1078,7 +1078,46 @@ cgraph_inline_p (struct cgraph_edge *e, cgraph_inline_failed_t *reason)
return !e->inline_failed;
}
+/* Execute all ipa transforms in all functions. */
+static void
+cgraph_execute_all_ipa_transforms (void)
+{
+ struct cgraph_node *node;
+ struct cgraph_node **order = XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
+ int order_pos, new_order_pos = 0;
+ int i;
+
+ order_pos = cgraph_postorder (order);
+ gcc_assert (order_pos == cgraph_n_nodes);
+
+ /* Garbage collector may remove inline clones we eliminate during
+ optimization. So we must be sure to not reference them. */
+ for (i = 0; i < order_pos; i++)
+ {
+ if (order[i]->output)
+ order[new_order_pos++] = order[i];
+ }
+
+ gimple_register_cfg_hooks ();
+
+ for (i = new_order_pos - 1; i >= 0; i--)
+ {
+ tree fndecl;
+ node = order[i];
+ fndecl = node->decl;
+
+ bitmap_obstack_initialize (NULL);
+ current_function_decl = fndecl;
+ set_cfun (DECL_STRUCT_FUNCTION (fndecl));
+
+ execute_all_ipa_transforms ();
+
+ set_cfun (NULL);
+ current_function_decl = NULL;
+ bitmap_obstack_release (NULL);
+ }
+}
/* Expand all functions that must be output.
@@ -1320,6 +1359,8 @@ cgraph_optimize (void)
cgraph_mark_functions_to_output ();
+ cgraph_execute_all_ipa_transforms ();
+
cgraph_state = CGRAPH_STATE_EXPANSION;
if (!flag_toplevel_reorder)
cgraph_output_in_order ();
diff --git a/gcc/passes.c b/gcc/passes.c
index ed433da..50c47b6 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -823,7 +823,8 @@ init_optimization_passes (void)
NEXT_PASS (pass_split_for_shorten_branches);
NEXT_PASS (pass_convert_to_eh_region_ranges);
NEXT_PASS (pass_shorten_branches);
- NEXT_PASS (pass_set_nothrow_function_flags);
+ /* FIXME lto: should be an IPA pass. */
+ /* NEXT_PASS (pass_set_nothrow_function_flags);*/
NEXT_PASS (pass_final);
}
NEXT_PASS (pass_df_finish);
@@ -1244,20 +1245,26 @@ execute_one_ipa_transform_pass (struct cgraph_node *node,
void
execute_all_ipa_transforms (void)
{
- if (cfun && cfun->ipa_transforms_to_apply)
- {
- unsigned int i;
- struct cgraph_node *node = cgraph_node (current_function_decl);
-
- for (i = 0; i < VEC_length (ipa_opt_pass, cfun->ipa_transforms_to_apply);
- i++)
- execute_one_ipa_transform_pass (node,
- VEC_index (ipa_opt_pass,
- cfun->ipa_transforms_to_apply,
- i));
- VEC_free (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply);
- cfun->ipa_transforms_to_apply = NULL;
- }
+ unsigned int i;
+ struct cgraph_node *node;
+
+ gcc_assert(cfun);
+ if (!cfun->ipa_transforms_to_apply)
+ return;
+
+ node = cgraph_node (current_function_decl);
+
+ gcc_assert (node->output);
+
+ for (i = 0; i < VEC_length (ipa_opt_pass, cfun->ipa_transforms_to_apply);
+ i++)
+ execute_one_ipa_transform_pass (node,
+ VEC_index (ipa_opt_pass,
+ cfun->ipa_transforms_to_apply,
+ i));
+
+ VEC_free (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply);
+ cfun->ipa_transforms_to_apply = NULL;
}
/* Execute PASS. */
diff --git a/gcc/tree-optimize.c b/gcc/tree-optimize.c
index 18ccf21..a5def35 100644
--- a/gcc/tree-optimize.c
+++ b/gcc/tree-optimize.c
@@ -415,9 +415,6 @@ tree_rest_of_compilation (tree fndecl)
bitmap_obstack_initialize (®_obstack); /* FIXME, only at RTL generation*/
-
- execute_all_ipa_transforms ();
-
/* Perform all tree transforms and optimizations. */
execute_pass_list (all_passes);
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 17:31 [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c Rafael Espindola
@ 2008-11-11 20:23 ` Jan Hubicka
2008-11-11 22:49 ` Rafael Espindola
2008-11-11 23:11 ` Diego Novillo
2008-11-11 22:54 ` Diego Novillo
1 sibling, 2 replies; 18+ messages in thread
From: Jan Hubicka @ 2008-11-11 20:23 UTC (permalink / raw)
To: Rafael Espindola; +Cc: gcc-patches, Diego Novillo
Hi,
we definitly should have nothrow IPA pass same as we do for const/pure
call discovery. However in current implementation
pass_set_nothrow_functoin_flags is RTL pass done at late compilation
and pass_execute_fixup_cfg exists there precisely to fixup function
bodies after this pass being executed on callees (we compile in
topological order) and also to cleanup after other passes, such as
pure/const discovery that does not update themselves.
(today pure/const passes probably could do the job in their transform
hooks, but on the other hand it is practical to couple all the cleanup
work to single pass)
How this picture change in LTO? I would expect that RTL compilation
queue is pretty much unchanged.
Honza
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 20:23 ` Jan Hubicka
@ 2008-11-11 22:49 ` Rafael Espindola
2008-11-11 22:53 ` Jan Hubicka
2008-11-11 23:11 ` Diego Novillo
1 sibling, 1 reply; 18+ messages in thread
From: Rafael Espindola @ 2008-11-11 22:49 UTC (permalink / raw)
To: Jan Hubicka; +Cc: gcc-patches, Diego Novillo
> How this picture change in LTO? I would expect that RTL compilation
> queue is pretty much unchanged.
For each function it is. I am just reordering some passes, and because
of that I had to add a call fixup_cfg. It introduced that strange
regression I mentioned. No idea why.
> Honza
>
Cheers,
--
Rafael Avila de Espindola
Google | Gordon House | Barrow Street | Dublin 4 | Ireland
Registered in Dublin, Ireland | Registration Number: 368047
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 22:49 ` Rafael Espindola
@ 2008-11-11 22:53 ` Jan Hubicka
0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2008-11-11 22:53 UTC (permalink / raw)
To: Rafael Espindola; +Cc: Jan Hubicka, gcc-patches, Diego Novillo
> > How this picture change in LTO? I would expect that RTL compilation
> > queue is pretty much unchanged.
>
> For each function it is. I am just reordering some passes, and because
> of that I had to add a call fixup_cfg. It introduced that strange
> regression I mentioned. No idea why.
Well, if you reorder the passes, you always need fixup_cfg to be first
in the block of local passes containing the RTL queue. That way you
should not get any CFG verification failures.
I don't see how resonably you can end up needing 2 fixup_cfg's unless
you are adding new block of local passes somewhere in the queue.
Or simply disable the pass for time being if you want...
Honza
>
> > Honza
> >
>
>
> Cheers,
> --
> Rafael Avila de Espindola
>
> Google | Gordon House | Barrow Street | Dublin 4 | Ireland
> Registered in Dublin, Ireland | Registration Number: 368047
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 17:31 [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c Rafael Espindola
2008-11-11 20:23 ` Jan Hubicka
@ 2008-11-11 22:54 ` Diego Novillo
2008-11-11 23:14 ` Jan Hubicka
1 sibling, 1 reply; 18+ messages in thread
From: Diego Novillo @ 2008-11-11 22:54 UTC (permalink / raw)
To: Rafael Espindola; +Cc: gcc-patches
2008/11/11 Rafael Espindola <espindola@google.com>:
> So, which one is your favorite? :-) Assuming disabling the pass has no
> regressions, should I commit that?
Yes, thanks. Converting pass_set_nothrow_function_flag into an IPA
pass is the right approach. In fact, I like Jan's idea of bundling it
with the const/pure pass. However, that can wait. Could you open a
PR for it?
> And if it has? Should I debug the above problem?
No, it would be too much hassle for no gain. The pass must go away.
Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 20:23 ` Jan Hubicka
2008-11-11 22:49 ` Rafael Espindola
@ 2008-11-11 23:11 ` Diego Novillo
2008-11-11 23:18 ` Jan Hubicka
1 sibling, 1 reply; 18+ messages in thread
From: Diego Novillo @ 2008-11-11 23:11 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Rafael Espindola, gcc-patches
2008/11/11 Jan Hubicka <hubicka@ucw.cz>:
> How this picture change in LTO? I would expect that RTL compilation
> queue is pretty much unchanged.
Yes, the RTL passes should not change. There may be exceptions like
this one where there are hidden dependencies that prevent us from
reorganizing the IPA passes. I hope we don't find many.
Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 22:54 ` Diego Novillo
@ 2008-11-11 23:14 ` Jan Hubicka
0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2008-11-11 23:14 UTC (permalink / raw)
To: Diego Novillo; +Cc: Rafael Espindola, gcc-patches
> 2008/11/11 Rafael Espindola <espindola@google.com>:
>
> > So, which one is your favorite? :-) Assuming disabling the pass has no
> > regressions, should I commit that?
>
> Yes, thanks. Converting pass_set_nothrow_function_flag into an IPA
> pass is the right approach. In fact, I like Jan's idea of bundling it
> with the const/pure pass. However, that can wait. Could you open a
> PR for it?
Well, things seems a bit more complicated here. It seems to make sense
to do the pass both locally and at IPA level. It probably should be
moved to gimple though.
I was just looking into inliner heuristics to make it more scalable. On
C++ beasts like tramp3d or botan we miss a lot of early optimization
oppurtunities because we don't know if functions are const/pure/nothrow.
So I am thinking about scheduling simple local pass into early
optimizations that will look at current function body and set the flags.
(we also need alias analysis in early stages and do some function call
regularization). Those should make early optimizers a lot more
effective and make inlining heuristics (and IPA in general) fed with a
lot less garbage. Current trick of ignoring load/stores for program
size metrics has many sick side effects and for tramp3d we tend to
handle functions containing over 9000 loads+stores as being cheap to
inline (and indeed we optimize those 9000 statement later, but it is all
unnecesarily expensive and unsafe in a way that on different code base
where loads/stores do not optimize out we will end up inlining very
large constructors even at -Os)
We tend to simplify function bodies a lot in our late optimization
passes quite likely turning them to nothrow that in turn affect size of
dwarf2out tables quite a lot, so still doing this late nothrow discovery
seems to make sense.
So I guess ideally we should have constpure pass handling nothrow too
and having option to work either locally or doing IPA propagation and do
it on all three stages to be able to clean up the abstraction penalties
effectivly... Don't seem to be compilation time overkill here, but I
guess little experimentation will be needed.
Honza
>
>
> > And if it has? Should I debug the above problem?
>
> No, it would be too much hassle for no gain. The pass must go away.
>
>
> Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 23:11 ` Diego Novillo
@ 2008-11-11 23:18 ` Jan Hubicka
2008-11-11 23:41 ` Rafael Espindola
2008-11-11 23:53 ` Diego Novillo
0 siblings, 2 replies; 18+ messages in thread
From: Jan Hubicka @ 2008-11-11 23:18 UTC (permalink / raw)
To: Diego Novillo; +Cc: Jan Hubicka, Rafael Espindola, gcc-patches
> 2008/11/11 Jan Hubicka <hubicka@ucw.cz>:
>
> > How this picture change in LTO? I would expect that RTL compilation
> > queue is pretty much unchanged.
>
> Yes, the RTL passes should not change. There may be exceptions like
> this one where there are hidden dependencies that prevent us from
> reorganizing the IPA passes. I hope we don't find many.
How you reorganize the IPA passes? It seems that in current
organization of compiler we realy should stay with one block of early
local passes followed by all the IPA/LTO followed by one block of late
passes + RTL. In this scheme we should never need more than one
fixup_cfg on the beggining of last block.
Honza
>
>
> Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 23:18 ` Jan Hubicka
@ 2008-11-11 23:41 ` Rafael Espindola
2008-11-11 23:59 ` Jan Hubicka
2008-11-11 23:53 ` Diego Novillo
1 sibling, 1 reply; 18+ messages in thread
From: Rafael Espindola @ 2008-11-11 23:41 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Diego Novillo, gcc-patches
> How you reorganize the IPA passes? It seems that in current
> organization of compiler we realy should stay with one block of early
> local passes followed by all the IPA/LTO followed by one block of late
> passes + RTL. In this scheme we should never need more than one
> fixup_cfg on the beggining of last block.
I am trying to run the inliner in lgen to remove the extern inline
functions before writing IL to disk.
Cheers,
--
Rafael Avila de Espindola
Google | Gordon House | Barrow Street | Dublin 4 | Ireland
Registered in Dublin, Ireland | Registration Number: 368047
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 23:18 ` Jan Hubicka
2008-11-11 23:41 ` Rafael Espindola
@ 2008-11-11 23:53 ` Diego Novillo
1 sibling, 0 replies; 18+ messages in thread
From: Diego Novillo @ 2008-11-11 23:53 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Rafael Espindola, gcc-patches, Doug Kwan
On Tue, Nov 11, 2008 at 18:12, Jan Hubicka <hubicka@ucw.cz> wrote:
> How you reorganize the IPA passes? It seems that in current
> organization of compiler we realy should stay with one block of early
> local passes followed by all the IPA/LTO followed by one block of late
> passes + RTL. In this scheme we should never need more than one
> fixup_cfg on the beggining of last block.
Yes, that scheme hasn't changed.
So far, it has been mostly a separation of IPA passes so that the LTO
generation happens as a separate set of passes. When generating LTO
information (lgen), we want to run the inliner before writing out the
IL so that extern inline functions get tossed out. So that was simply
a matter of adding another meta pass (all_lto_gen_passes) that runs
after all_lto_ipa_passes.
There is some more cleanup that Doug is planning so that we can use
the pass manager when running lto1 in WPA mode. Doug was going to
work on that in the next few days.
Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 23:41 ` Rafael Espindola
@ 2008-11-11 23:59 ` Jan Hubicka
2008-11-12 0:01 ` Diego Novillo
0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2008-11-11 23:59 UTC (permalink / raw)
To: Rafael Espindola; +Cc: Jan Hubicka, Diego Novillo, gcc-patches
> > How you reorganize the IPA passes? It seems that in current
> > organization of compiler we realy should stay with one block of early
> > local passes followed by all the IPA/LTO followed by one block of late
> > passes + RTL. In this scheme we should never need more than one
> > fixup_cfg on the beggining of last block.
>
> I am trying to run the inliner in lgen to remove the extern inline
> functions before writing IL to disk.
This should be realitvely easy to do if you schedule one IPA_INLINE pass
followed by local passes queue containing only fixup_cfg (fixup_cfg is
cleaning up after inlining too) before you cgraph output. The local
passes will make passsmanager to apply inliner's transformations and you
should end up with consistent cfgs.
In general I am not opposed of doing IPA passes at compilation time to
optimize what we get into lto object files. Problem of early inlining
is that it does not have idea of the whole program size and thus can't
do too informed decisions compared to the IPA inliner. Often when A
calls B and B calls C it is not clear whether A->B should be inlined or
B->C or both. If B and C lie in different unit then A we could prevent
the more useful inliing. We already have early inliner handling the
code size reducing (and thus safe) cases.
It seems that most correct approach would be to turn extern inline
functions into static functions before LTO and have pass that will
redirect the calls of noninlined bodies to the non-extern inline body
(or external call) after inlining. This way we don't lose
optimization/information.
Honza
>
> Cheers,
> --
> Rafael Avila de Espindola
>
> Google | Gordon House | Barrow Street | Dublin 4 | Ireland
> Registered in Dublin, Ireland | Registration Number: 368047
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-11 23:59 ` Jan Hubicka
@ 2008-11-12 0:01 ` Diego Novillo
2008-11-12 0:04 ` Jan Hubicka
0 siblings, 1 reply; 18+ messages in thread
From: Diego Novillo @ 2008-11-12 0:01 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Rafael Espindola, gcc-patches
On Tue, Nov 11, 2008 at 18:52, Jan Hubicka <hubicka@ucw.cz> wrote:
> It seems that most correct approach would be to turn extern inline
> functions into static functions before LTO and have pass that will
> redirect the calls of noninlined bodies to the non-extern inline body
> (or external call) after inlining. This way we don't lose
> optimization/information.
Isn't this what the inliner does already?
Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-12 0:01 ` Diego Novillo
@ 2008-11-12 0:04 ` Jan Hubicka
2008-11-12 0:57 ` Jan Hubicka
0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2008-11-12 0:04 UTC (permalink / raw)
To: Diego Novillo; +Cc: Jan Hubicka, Rafael Espindola, gcc-patches
> On Tue, Nov 11, 2008 at 18:52, Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > It seems that most correct approach would be to turn extern inline
> > functions into static functions before LTO and have pass that will
> > redirect the calls of noninlined bodies to the non-extern inline body
> > (or external call) after inlining. This way we don't lose
> > optimization/information.
>
> Isn't this what the inliner does already?
No, currently we handle extern inline in quite hackish way. If there is
only extern inline function than we handle is as normal inline with the
difference that body is removed afterwards.
If there is more than one occurence of extern inline (in --combine) or
both extern inline and offline variant (this can be in single C unit),
then we simply disable inlining and panic.
This is all because of DECL sharing, I was never able to get multiple
variants of same function out of C and C++ frontend.
Honza
>
>
> Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-12 0:04 ` Jan Hubicka
@ 2008-11-12 0:57 ` Jan Hubicka
2008-11-12 2:02 ` Diego Novillo
0 siblings, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2008-11-12 0:57 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Diego Novillo, Rafael Espindola, gcc-patches
> > On Tue, Nov 11, 2008 at 18:52, Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > It seems that most correct approach would be to turn extern inline
> > > functions into static functions before LTO and have pass that will
> > > redirect the calls of noninlined bodies to the non-extern inline body
> > > (or external call) after inlining. This way we don't lose
> > > optimization/information.
> >
> > Isn't this what the inliner does already?
>
> No, currently we handle extern inline in quite hackish way. If there is
> only extern inline function than we handle is as normal inline with the
> difference that body is removed afterwards.
> If there is more than one occurence of extern inline (in --combine) or
> both extern inline and offline variant (this can be in single C unit),
> then we simply disable inlining and panic.
> This is all because of DECL sharing, I was never able to get multiple
> variants of same function out of C and C++ frontend.
... also extern inline functions remain extern for most of IPA stuff as
well as --combine merging that is quite suboptimal.
I guess we don't need to care about --combine, but the case where both
extern inline and offline is in one unit seems quite important
implementation quality bug to me.
Honza
>
> Honza
> >
> >
> > Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-12 0:57 ` Jan Hubicka
@ 2008-11-12 2:02 ` Diego Novillo
2008-11-12 10:01 ` Rafael Espindola
2008-11-12 16:35 ` Jan Hubicka
0 siblings, 2 replies; 18+ messages in thread
From: Diego Novillo @ 2008-11-12 2:02 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Jan Hubicka, Rafael Espindola, gcc-patches
On Tue, Nov 11, 2008 at 19:00, Jan Hubicka <jh@suse.cz> wrote:
> ... also extern inline functions remain extern for most of IPA stuff as
> well as --combine merging that is quite suboptimal.
> I guess we don't need to care about --combine, but the case where both
> extern inline and offline is in one unit seems quite important
> implementation quality bug to me.
OK, do you think you could implement this solution to extern inlines?
For now, running the inliner early will get rid of the immediate
problem we are having. If you don't have a lot of time, could you
send an outline of what needs to be done?
Thanks. Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-12 2:02 ` Diego Novillo
@ 2008-11-12 10:01 ` Rafael Espindola
2008-11-12 16:02 ` Jan Hubicka
2008-11-12 16:35 ` Jan Hubicka
1 sibling, 1 reply; 18+ messages in thread
From: Rafael Espindola @ 2008-11-12 10:01 UTC (permalink / raw)
To: Diego Novillo; +Cc: Jan Hubicka, Jan Hubicka, gcc-patches
> OK, do you think you could implement this solution to extern inlines?
> For now, running the inliner early will get rid of the immediate
> problem we are having. If you don't have a lot of time, could you
> send an outline of what needs to be done?
I think I still prefer to run the inliner and drop the extern inline
functions. The option of fully transferring then to wpa would be
harder, since now the compiler would see more function bodies then the
linker and would need decide what to do with them. I am afraid that
converting extern inline into static functions would break some code
that has unreasonable expectations about a function defined in another
file being called.
Doing an early inline (and possibly other optimizations) also has the
benefit of reducing the size of the IL that is written to disk.
> Thanks. Diego.
>
Cheers,
--
Rafael Avila de Espindola
Google | Gordon House | Barrow Street | Dublin 4 | Ireland
Registered in Dublin, Ireland | Registration Number: 368047
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-12 10:01 ` Rafael Espindola
@ 2008-11-12 16:02 ` Jan Hubicka
0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2008-11-12 16:02 UTC (permalink / raw)
To: Rafael Espindola; +Cc: Diego Novillo, Jan Hubicka, Jan Hubicka, gcc-patches
> > OK, do you think you could implement this solution to extern inlines?
> > For now, running the inliner early will get rid of the immediate
> > problem we are having. If you don't have a lot of time, could you
> > send an outline of what needs to be done?
>
> I think I still prefer to run the inliner and drop the extern inline
> functions. The option of fully transferring then to wpa would be
> harder, since now the compiler would see more function bodies then the
> linker and would need decide what to do with them. I am afraid that
> converting extern inline into static functions would break some code
> that has unreasonable expectations about a function defined in another
> file being called.
Well, the scheme that I think would be ideal should work here.
Basically we should
1) Make C and C++ froends to keep both copies of function of same name
(extern inline and non-extern inline body) if available.
2) Teach LTO to output the extern inlines as static functions and to
direct all calls to extern inline function variant
3) Teach inliner to redirect all the remaining calls to the externally
visible body. We aready do work on removing out of line extern inlines
so this can be bundled there.
This should have precisely the intended semantics of extern inline
eliminating the current bug that extern inlines are ignored with
--combine and if offline body is present.
This (except for the LTO bits) is something on my TODO for years, but
I never got around implementing 1) correctly. 3) is very easy to do...
>
> Doing an early inline (and possibly other optimizations) also has the
> benefit of reducing the size of the IL that is written to disk.
We already do this kind of early inline during the einline pass.
Einline pass is basically doing to work that pays back in both size and
speed and work that will be always done (always_inline and flatten
attribute processing), while full inliner is here to trade size for
speed based on knowledge of whole unit.
So you should not need to reorder passes here at all, just you will lose
some of extern inline oppurtunities (I think C frontend is making extern
inline always inline so early inliner will pick them except for weird
recursive cases where topological order does not allows them. In C++ I
think they are not always inline so early inliner will pick only very
small ones. If you need immediate hack around, perhaps making C++ ones
to disregard inline limits too would work).
Honza
>
> > Thanks. Diego.
> >
>
> Cheers,
> --
> Rafael Avila de Espindola
>
> Google | Gordon House | Barrow Street | Dublin 4 | Ireland
> Registered in Dublin, Ireland | Registration Number: 368047
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c
2008-11-12 2:02 ` Diego Novillo
2008-11-12 10:01 ` Rafael Espindola
@ 2008-11-12 16:35 ` Jan Hubicka
1 sibling, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2008-11-12 16:35 UTC (permalink / raw)
To: Diego Novillo; +Cc: Jan Hubicka, Jan Hubicka, Rafael Espindola, gcc-patches
> On Tue, Nov 11, 2008 at 19:00, Jan Hubicka <jh@suse.cz> wrote:
>
> > ... also extern inline functions remain extern for most of IPA stuff as
> > well as --combine merging that is quite suboptimal.
> > I guess we don't need to care about --combine, but the case where both
> > extern inline and offline is in one unit seems quite important
> > implementation quality bug to me.
>
> OK, do you think you could implement this solution to extern inlines?
It is for a long time on my TODO. Do you think there is someone with
enough of frontend knowledge to make frontend to give me both variants
of functions? I can definitly then make callgraph to turn extern
inlines to static functions at the end of unit and do rest of magic.
Honza
> For now, running the inliner early will get rid of the immediate
> problem we are having. If you don't have a lot of time, could you
> send an outline of what needs to be done?
>
>
> Thanks. Diego.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2008-11-12 15:52 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-11 17:31 [lto][patch] Move the call to execute_all_ipa_transforms to cgraphunit.c Rafael Espindola
2008-11-11 20:23 ` Jan Hubicka
2008-11-11 22:49 ` Rafael Espindola
2008-11-11 22:53 ` Jan Hubicka
2008-11-11 23:11 ` Diego Novillo
2008-11-11 23:18 ` Jan Hubicka
2008-11-11 23:41 ` Rafael Espindola
2008-11-11 23:59 ` Jan Hubicka
2008-11-12 0:01 ` Diego Novillo
2008-11-12 0:04 ` Jan Hubicka
2008-11-12 0:57 ` Jan Hubicka
2008-11-12 2:02 ` Diego Novillo
2008-11-12 10:01 ` Rafael Espindola
2008-11-12 16:02 ` Jan Hubicka
2008-11-12 16:35 ` Jan Hubicka
2008-11-11 23:53 ` Diego Novillo
2008-11-11 22:54 ` Diego Novillo
2008-11-11 23:14 ` Jan Hubicka
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).