public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214)
@ 2019-01-17 13:35 Jakub Jelinek
  2019-01-17 14:07 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Jelinek @ 2019-01-17 13:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi!

As the following testcases show, we are unable to share stack slots
for (large) variables from inline functions if something in those inline
functions can throw externally.

The issue is that the clobbers we emit even on the EH paths
are usually removed by ehcleanup1, which tries to make functions
smaller before inlining; if the only reason to throw internally and then
rethrow externally is clobber stmts, that is considered too high a cost
and we optimize them away.

If such functions are inlined into a caller that has EH landing pads
active around the inlined calls, though, those landing pads are where
the local variables from the different inline instances are all considered
to conflict, because there is no clobber on that path.

The following patch fixes this by adding clobbers at the start of an EH
landing pad in the function we inline into whenever a throw that was
external before inlining becomes internal after it (i.e. when we add an edge
from a newly added bb of the inlined body to a bb that existed in the caller
before inlining).

If we throw externally from a function (even if it is an inline function),
all local variables of that function are out of scope.

I've bootstrapped/regtested an earlier version of this patch which just had
another basic_block member, id->eh_landing_pad_bb, used to assert that there
is at most one such landing pad (that version survived x86_64-linux and
i686-linux bootstrap/regtest).  Ok for trunk if this patch passes
bootstrap/regtest too?

2019-01-17  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/86214
	* tree-inline.h (struct copy_body_data): Add
	add_clobbers_to_eh_landing_pads member.
	* tree-inline.c (add_clobbers_to_eh_landing_pad): New function.
	(copy_edges_for_bb): Call it if EH edge destination is <
	id->add_clobbers_to_eh_landing_pads.  Fix a comment typo.
	(expand_call_inline): Set id->add_clobbers_to_eh_landing_pads
	if flag_stack_reuse != SR_NONE and clear it afterwards.

	* g++.dg/opt/pr86214-1.C: New test.
	* g++.dg/opt/pr86214-2.C: New test.

--- gcc/tree-inline.h.jj	2019-01-17 13:19:56.678539594 +0100
+++ gcc/tree-inline.h	2019-01-17 14:22:56.138496295 +0100
@@ -155,6 +155,12 @@ struct copy_body_data
   /* A list of addressable local variables remapped into the caller
      when inlining a call within an OpenMP SIMD-on-SIMT loop.  */
   vec<tree> *dst_simt_vars;
+
+  /* If clobbers for local variables from the inline function
+     that need to live in memory should be added to EH landing pads
+     outside of the inlined function, this should be the number
+     of basic blocks in the caller before inlining.  Zero otherwise.  */
+  int add_clobbers_to_eh_landing_pads;
 };
 
 /* Weights of constructions for estimate_num_insns.  */
--- gcc/tree-inline.c.jj	2019-01-17 13:19:56.668539756 +0100
+++ gcc/tree-inline.c	2019-01-17 14:23:42.364746520 +0100
@@ -2190,6 +2190,40 @@ update_ssa_across_abnormal_edges (basic_
       }
 }
 
+/* Insert clobbers for automatic variables of inlined ID->src_fn
+   function at the start of basic block BB.  */
+
+static void
+add_clobbers_to_eh_landing_pad (basic_block bb, copy_body_data *id)
+{
+  tree var;
+  unsigned int i;
+  FOR_EACH_VEC_SAFE_ELT (id->src_cfun->local_decls, i, var)
+    if (VAR_P (var)
+	&& !DECL_HARD_REGISTER (var)
+	&& !TREE_THIS_VOLATILE (var)
+	&& !DECL_HAS_VALUE_EXPR_P (var)
+	&& !is_gimple_reg (var)
+	&& auto_var_in_fn_p (var, id->src_fn))
+      {
+	tree *t = id->decl_map->get (var);
+	if (!t)
+	  continue;
+	tree new_var = *t;
+	if (VAR_P (new_var)
+	    && !DECL_HARD_REGISTER (new_var)
+	    && !TREE_THIS_VOLATILE (new_var)
+	    && !DECL_HAS_VALUE_EXPR_P (new_var)
+	    && !is_gimple_reg (new_var)
+	    && auto_var_in_fn_p (new_var, id->dst_fn))
+	  {
+	    gimple_stmt_iterator gsi = gsi_after_labels (bb);
+	    tree clobber = build_clobber (TREE_TYPE (new_var));
+	    gimple *clobber_stmt = gimple_build_assign (new_var, clobber);
+	    gsi_insert_before (&gsi, clobber_stmt, GSI_NEW_STMT);
+	  }
+      }
+}
 
 /* Copy edges from BB into its copy constructed earlier, scale profile
    accordingly.  Edges will be taken care of later.  Assume aux
@@ -2232,7 +2266,7 @@ copy_edges_for_bb (basic_block bb, profi
   if (bb->index == ENTRY_BLOCK || bb->index == EXIT_BLOCK)
     return false;
 
-  /* When doing function splitting, we must decreate count of the return block
+  /* When doing function splitting, we must decrease count of the return block
      which was previously reachable by block we did not copy.  */
   if (single_succ_p (bb) && single_succ_edge (bb)->dest->index == EXIT_BLOCK)
     FOR_EACH_EDGE (old_edge, ei, bb->preds)
@@ -2317,8 +2351,16 @@ copy_edges_for_bb (basic_block bb, profi
 	      e->probability = old_edge->probability;
 	    
           FOR_EACH_EDGE (e, ei, copy_stmt_bb->succs)
-	    if ((e->flags & EDGE_EH) && !e->probability.initialized_p ())
-	      e->probability = profile_probability::never ();
+	    if (e->flags & EDGE_EH)
+	      {
+		if (!e->probability.initialized_p ())
+		  e->probability = profile_probability::never ();
+		if (e->dest->index < id->add_clobbers_to_eh_landing_pads)
+		  {
+		    add_clobbers_to_eh_landing_pad (e->dest, id);
+		    id->add_clobbers_to_eh_landing_pads = 0;
+		  }
+	      }
         }
 
 
@@ -4565,6 +4607,8 @@ expand_call_inline (basic_block bb, gimp
   id->decl_map = new hash_map<tree, tree>;
   dst = id->debug_map;
   id->debug_map = NULL;
+  if (flag_stack_reuse != SR_NONE)
+    id->add_clobbers_to_eh_landing_pads = last_basic_block_for_fn (cfun);
 
   /* Record the function we are about to inline.  */
   id->src_fn = fn;
@@ -4872,6 +4916,7 @@ expand_call_inline (basic_block bb, gimp
     }
 
   id->assign_stmts.release ();
+  id->add_clobbers_to_eh_landing_pads = 0;
 
   /* Output the inlining info for this abstract function, since it has been
      inlined.  If we don't do this now, we can lose the information about the
--- gcc/testsuite/g++.dg/opt/pr86214-1.C.jj	2019-01-17 14:22:38.083789139 +0100
+++ gcc/testsuite/g++.dg/opt/pr86214-1.C	2019-01-17 14:22:38.083789139 +0100
@@ -0,0 +1,30 @@
+// PR tree-optimization/86214
+// { dg-do compile }
+// { dg-options "-O2 -Wstack-usage=15000" }
+
+typedef __SIZE_TYPE__ size_t;
+struct A { A (); ~A (); int a; void qux (const char *); };
+int bar (char *);
+
+static inline A
+foo ()
+{
+  char b[8192];
+  int x = bar (b);
+  A s;
+  if (x > 0 && (size_t) x < sizeof b)
+    s.qux (b);
+  return s;
+}
+
+void
+baz ()	// { dg-bogus "stack usage is" }
+{
+  A a;
+  char c[1024];
+  bar (c);
+  foo (); foo (); foo (); foo (); foo ();
+  foo (); foo (); foo (); foo (); foo ();
+  foo (); foo (); foo (); foo (); foo ();
+  foo (); foo (); foo (); foo (); foo ();
+}
--- gcc/testsuite/g++.dg/opt/pr86214-2.C.jj	2019-01-17 14:22:38.083789139 +0100
+++ gcc/testsuite/g++.dg/opt/pr86214-2.C	2019-01-17 14:22:38.083789139 +0100
@@ -0,0 +1,28 @@
+// PR tree-optimization/86214
+// { dg-do compile }
+// { dg-options "-O2 -Wstack-usage=15000" }
+
+typedef __SIZE_TYPE__ size_t;
+struct A { A (); ~A (); int a; void qux (const char *); };
+int bar (char *);
+
+static inline __attribute__((always_inline)) A
+foo ()
+{
+  char b[8192];
+  int x = bar (b);
+  A s;
+  if (x > 0 && (size_t) x < sizeof b)
+    s.qux (b);
+  return s;
+}
+
+void
+baz ()	// { dg-bogus "stack usage is" }
+{
+  A a;
+  foo (); foo (); foo (); foo (); foo ();
+  foo (); foo (); foo (); foo (); foo ();
+  foo (); foo (); foo (); foo (); foo ();
+  foo (); foo (); foo (); foo (); foo ();
+}

	Jakub


* Re: [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214)
  2019-01-17 13:35 [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214) Jakub Jelinek
@ 2019-01-17 14:07 ` Richard Biener
  2019-01-17 14:43   ` Jakub Jelinek
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2019-01-17 14:07 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Thu, 17 Jan 2019, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcases show, we are unable to share stack slots
> for (large) variables from inline functions if something in those inline
> functions can throw externally.
> 
> The issue is that the clobbers we have even in the EH paths
> are usually removed by ehcleanup1 which attempts to make the functions
> smaller before inlining; if the only reason to throw internally and then
> rethrow externally are clobber stmts, that is too high cost and so we
> optimize them away.
> 
> If such functions are inlined into a function that has some EH landing pads
> active for the inlined functions though, those EH landing pads are where
> the local variables are all considered to conflict between different
> inlines, because there is no clobber on that path.
> 
> The following patch fixes it by adding clobbers at the start of the EH
> landing pad in the inlining function's caller if it was external throw
> before and now is internal throw (i.e. when we add an edge from newly added
> bb in the inlining to bb in the function from before inlining).
> 
> If we throw externally from a function (even if it is inline function), all
> local variables in that function are out of scope.
> 
> I've bootstrapped/regtested an earlier version of this patch which just had
> another basic_block member id->eh_landing_pad_bb used to assert that there
> is at most one such landing pad (which survived x86_64-linux and i686-linux
> bootstrap/regtest), ok for trunk if this patch passes bootstrap/regtest too?
> 
> 2019-01-17  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/86214
> 	* tree-inline.h (struct copy_body_data): Add
> 	add_clobbers_to_eh_landing_pads member.
> 	* tree-inline.c (add_clobbers_to_eh_landing_pad): New function.
> 	(copy_edges_for_bb): Call it if EH edge destination is <
> 	id->add_clobbers_to_eh_landing_pads.  Fix a comment typo.
> 	(expand_call_inline): Set id->add_clobbers_to_eh_landing_pads
> 	if flag_stack_reuse != SR_NONE and clear it afterwards.
> 
> 	* g++.dg/opt/pr86214-1.C: New test.
> 	* g++.dg/opt/pr86214-2.C: New test.
> 
> --- gcc/tree-inline.h.jj	2019-01-17 13:19:56.678539594 +0100
> +++ gcc/tree-inline.h	2019-01-17 14:22:56.138496295 +0100
> @@ -155,6 +155,12 @@ struct copy_body_data
>    /* A list of addressable local variables remapped into the caller
>       when inlining a call within an OpenMP SIMD-on-SIMT loop.  */
>    vec<tree> *dst_simt_vars;
> +
> +  /* If clobbers for local variables from the inline function
> +     that need to live in memory should be added to EH landing pads
> +     outside of the inlined function, this should be the number
> +     of basic blocks in the caller before inlining.  Zero otherwise.  */
> +  int add_clobbers_to_eh_landing_pads;
>  };
>  
>  /* Weights of constructions for estimate_num_insns.  */
> --- gcc/tree-inline.c.jj	2019-01-17 13:19:56.668539756 +0100
> +++ gcc/tree-inline.c	2019-01-17 14:23:42.364746520 +0100
> @@ -2190,6 +2190,40 @@ update_ssa_across_abnormal_edges (basic_
>        }
>  }
>  
> +/* Insert clobbers for automatic variables of inlined ID->src_fn
> +   function at the start of basic block BB.  */
> +
> +static void
> +add_clobbers_to_eh_landing_pad (basic_block bb, copy_body_data *id)
> +{
> +  tree var;
> +  unsigned int i;
> +  FOR_EACH_VEC_SAFE_ELT (id->src_cfun->local_decls, i, var)
> +    if (VAR_P (var)
> +	&& !DECL_HARD_REGISTER (var)
> +	&& !TREE_THIS_VOLATILE (var)
> +	&& !DECL_HAS_VALUE_EXPR_P (var)
> +	&& !is_gimple_reg (var)
> +	&& auto_var_in_fn_p (var, id->src_fn))
> +      {
> +	tree *t = id->decl_map->get (var);
> +	if (!t)
> +	  continue;
> +	tree new_var = *t;
> +	if (VAR_P (new_var)
> +	    && !DECL_HARD_REGISTER (new_var)
> +	    && !TREE_THIS_VOLATILE (new_var)
> +	    && !DECL_HAS_VALUE_EXPR_P (new_var)
> +	    && !is_gimple_reg (new_var)
> +	    && auto_var_in_fn_p (new_var, id->dst_fn))
> +	  {
> +	    gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +	    tree clobber = build_clobber (TREE_TYPE (new_var));
> +	    gimple *clobber_stmt = gimple_build_assign (new_var, clobber);
> +	    gsi_insert_before (&gsi, clobber_stmt, GSI_NEW_STMT);
> +	  }
> +      }

So we do not care to optimize this to only clobber the vars that
appear live over the EH edge?

> +}
>  
>  /* Copy edges from BB into its copy constructed earlier, scale profile
>     accordingly.  Edges will be taken care of later.  Assume aux
> @@ -2232,7 +2266,7 @@ copy_edges_for_bb (basic_block bb, profi
>    if (bb->index == ENTRY_BLOCK || bb->index == EXIT_BLOCK)
>      return false;
>  
> -  /* When doing function splitting, we must decreate count of the return block
> +  /* When doing function splitting, we must decrease count of the return block
>       which was previously reachable by block we did not copy.  */
>    if (single_succ_p (bb) && single_succ_edge (bb)->dest->index == EXIT_BLOCK)
>      FOR_EACH_EDGE (old_edge, ei, bb->preds)
> @@ -2317,8 +2351,16 @@ copy_edges_for_bb (basic_block bb, profi
>  	      e->probability = old_edge->probability;
>  	    
>            FOR_EACH_EDGE (e, ei, copy_stmt_bb->succs)
> -	    if ((e->flags & EDGE_EH) && !e->probability.initialized_p ())
> -	      e->probability = profile_probability::never ();
> +	    if (e->flags & EDGE_EH)
> +	      {
> +		if (!e->probability.initialized_p ())
> +		  e->probability = profile_probability::never ();
> +		if (e->dest->index < id->add_clobbers_to_eh_landing_pads)
> +		  {

Why don't we need to watch out for EDGE_COUNT (e->dest->preds) != 1?
Is it that we just don't care?

> +		    add_clobbers_to_eh_landing_pad (e->dest, id);
> +		    id->add_clobbers_to_eh_landing_pads = 0;
> +		  }
> +	      }
>          }
>  
>  
> @@ -4565,6 +4607,8 @@ expand_call_inline (basic_block bb, gimp
>    id->decl_map = new hash_map<tree, tree>;
>    dst = id->debug_map;
>    id->debug_map = NULL;
> +  if (flag_stack_reuse != SR_NONE)
> +    id->add_clobbers_to_eh_landing_pads = last_basic_block_for_fn (cfun);
>  
>    /* Record the function we are about to inline.  */
>    id->src_fn = fn;
> @@ -4872,6 +4916,7 @@ expand_call_inline (basic_block bb, gimp
>      }
>  
>    id->assign_stmts.release ();
> +  id->add_clobbers_to_eh_landing_pads = 0;
>  
>    /* Output the inlining info for this abstract function, since it has been
>       inlined.  If we don't do this now, we can lose the information about the
> --- gcc/testsuite/g++.dg/opt/pr86214-1.C.jj	2019-01-17 14:22:38.083789139 +0100
> +++ gcc/testsuite/g++.dg/opt/pr86214-1.C	2019-01-17 14:22:38.083789139 +0100
> @@ -0,0 +1,30 @@
> +// PR tree-optimization/86214
> +// { dg-do compile }
> +// { dg-options "-O2 -Wstack-usage=15000" }
> +
> +typedef __SIZE_TYPE__ size_t;
> +struct A { A (); ~A (); int a; void qux (const char *); };
> +int bar (char *);
> +
> +static inline A
> +foo ()
> +{
> +  char b[8192];
> +  int x = bar (b);
> +  A s;
> +  if (x > 0 && (size_t) x < sizeof b)
> +    s.qux (b);
> +  return s;
> +}
> +
> +void
> +baz ()	// { dg-bogus "stack usage is" }
> +{
> +  A a;
> +  char c[1024];
> +  bar (c);
> +  foo (); foo (); foo (); foo (); foo ();
> +  foo (); foo (); foo (); foo (); foo ();
> +  foo (); foo (); foo (); foo (); foo ();
> +  foo (); foo (); foo (); foo (); foo ();
> +}
> --- gcc/testsuite/g++.dg/opt/pr86214-2.C.jj	2019-01-17 14:22:38.083789139 +0100
> +++ gcc/testsuite/g++.dg/opt/pr86214-2.C	2019-01-17 14:22:38.083789139 +0100
> @@ -0,0 +1,28 @@
> +// PR tree-optimization/86214
> +// { dg-do compile }
> +// { dg-options "-O2 -Wstack-usage=15000" }
> +
> +typedef __SIZE_TYPE__ size_t;
> +struct A { A (); ~A (); int a; void qux (const char *); };
> +int bar (char *);
> +
> +static inline __attribute__((always_inline)) A
> +foo ()
> +{
> +  char b[8192];
> +  int x = bar (b);
> +  A s;
> +  if (x > 0 && (size_t) x < sizeof b)
> +    s.qux (b);
> +  return s;
> +}
> +
> +void
> +baz ()	// { dg-bogus "stack usage is" }
> +{
> +  A a;
> +  foo (); foo (); foo (); foo (); foo ();
> +  foo (); foo (); foo (); foo (); foo ();
> +  foo (); foo (); foo (); foo (); foo ();
> +  foo (); foo (); foo (); foo (); foo ();
> +}
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214)
  2019-01-17 14:07 ` Richard Biener
@ 2019-01-17 14:43   ` Jakub Jelinek
  2019-01-18  9:07     ` Jakub Jelinek
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Jelinek @ 2019-01-17 14:43 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Thu, Jan 17, 2019 at 03:06:57PM +0100, Richard Biener wrote:
> > +	tree new_var = *t;
> > +	if (VAR_P (new_var)
> > +	    && !DECL_HARD_REGISTER (new_var)
> > +	    && !TREE_THIS_VOLATILE (new_var)
> > +	    && !DECL_HAS_VALUE_EXPR_P (new_var)
> > +	    && !is_gimple_reg (new_var)
> > +	    && auto_var_in_fn_p (new_var, id->dst_fn))
> > +	  {
> > +	    gimple_stmt_iterator gsi = gsi_after_labels (bb);
> > +	    tree clobber = build_clobber (TREE_TYPE (new_var));
> > +	    gimple *clobber_stmt = gimple_build_assign (new_var, clobber);
> > +	    gsi_insert_before (&gsi, clobber_stmt, GSI_NEW_STMT);
> > +	  }
> > +      }
> 
> So we do not care to optimize this to only clobber the vars that
> are appear live over the EH edge?

Wouldn't that be quite expensive (especially at that spot in the inliner)?
I could surely defer it (at the spot in copy_edges_for_bb just remember
which bb it was and handle it before destroying the decl_map in
expand_call_inline), but:
1) are the extra CLOBBERs that big a deal?
2) if they are, shouldn't we have a pass that does it generically after IPA
   and removes all CLOBBERs for vars already known dead, whether they come
   from inlining or anything else (we have many in the IL already that are
   dead for other reasons, e.g. a variable whose destructor is inlined
   and has one CLOBBER in itself, but which has another CLOBBER in the caller
   too when the var goes out of scope); tree DSE is able to remove CLOBBERs
   for the same var if they are adjacent; in either case, what would be the
   algorithm for that?  Something like the add_scope_conflicts algorithm,
   just without building any conflicts: propagate the var-is-live bitmaps
   and, when the propagation terminates, go through all bbs and remove any
   clobber of a var that isn't live at that point (see the sketch below)?
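
Purely as an illustration of that idea, here is a minimal, self-contained
sketch; Stmt and Block are hypothetical stand-ins for GIMPLE statements and
basic blocks, not GCC types, and only the dataflow is shown: a forward
may-live propagation where a reference makes a var live and a clobber kills
it, followed by a walk that marks clobbers of vars that cannot be live there.

/* Not GCC code: Stmt and Block are hypothetical stand-ins for GIMPLE
   statements and basic blocks, a var is just an id.  */
#include <set>
#include <vector>

enum class Kind { use, clobber };   /* "use" = any real reference to the var.  */
struct Stmt { Kind kind; int var; bool dead = false; };
struct Block { std::vector<Stmt> stmts; std::vector<int> preds; };

void
remove_dead_clobbers (std::vector<Block> &cfg)
{
  std::vector<std::set<int>> live_out (cfg.size ());
  bool changed = true;
  while (changed)   /* Iterate the forward may-live problem to a fixpoint.  */
    {
      changed = false;
      for (unsigned b = 0; b < cfg.size (); ++b)
        {
          std::set<int> live;   /* live-in = union of the preds' live-out.  */
          for (int p : cfg[b].preds)
            live.insert (live_out[p].begin (), live_out[p].end ());
          for (const Stmt &s : cfg[b].stmts)
            if (s.kind == Kind::use)
              live.insert (s.var);   /* A reference makes the var live.  */
            else
              live.erase (s.var);    /* A clobber ends its lifetime.  */
          if (live != live_out[b])
            {
              live_out[b] = live;
              changed = true;
            }
        }
    }
  /* Second walk: a clobber of a var that is not live here is redundant.  */
  for (unsigned b = 0; b < cfg.size (); ++b)
    {
      std::set<int> live;
      for (int p : cfg[b].preds)
        live.insert (live_out[p].begin (), live_out[p].end ());
      for (Stmt &s : cfg[b].stmts)
        if (s.kind == Kind::use)
          live.insert (s.var);
        else
          {
            if (!live.count (s.var))
              s.dead = true;   /* Mark the redundant clobber for removal.  */
            live.erase (s.var);
          }
    }
}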

> > @@ -2317,8 +2351,16 @@ copy_edges_for_bb (basic_block bb, profi
> >  	      e->probability = old_edge->probability;
> >  	    
> >            FOR_EACH_EDGE (e, ei, copy_stmt_bb->succs)
> > -	    if ((e->flags & EDGE_EH) && !e->probability.initialized_p ())
> > -	      e->probability = profile_probability::never ();
> > +	    if (e->flags & EDGE_EH)
> > +	      {
> > +		if (!e->probability.initialized_p ())
> > +		  e->probability = profile_probability::never ();
> > +		if (e->dest->index < id->add_clobbers_to_eh_landing_pads)
> > +		  {
> 
> Why don't we need to watch out for EDGE_COUNT (e->dest->preds) != 1?

That is the usual case.

> Is it that we just don't care?

That was the intention.  If, say, the EH pad has 4 predecessors,
3 of them from 3 different inline functions and the last one from something
else, then the alternatives to what the patch does (i.e. have clobbers
for the variables from all those 3 inline functions; the EH pad is initially
in the caller, so the vars from those inline functions can't magically
still be live there) would be either to do nothing, keep this PR open and
let inlining increase stack sizes much more than it expects to in its
estimations, or to split those EH pads, have a special EH pad for one inline
function with that set of clobbers, another one for another, etc.,
all branching to the same spot.  But that means worse code generation unless
we optimize it away (and either we optimize it by throwing the CLOBBERs
out and we are back to this PR, or we optimize it by moving all the
CLOBBERs into the same EH pad, essentially sinking the CLOBBERs to the
common fallthru, which is what this patch does).

Note, analysis of which variables are dead could be useful e.g. for the
tail-merge pass, e.g. on:
int bar (int n);
void baz (char *x, int n);

int
foo (int x, int y)
{
  switch (x)
    {
    case 1:
      {
        char buf[8192];
        baz (buf, 4);
        if (y != 5)
	  return 4;
	bar (2);
	bar (3);
	bar (4);
	bar (5);
	bar (6);
	bar (7);
	bar (9);
	break;
      }
    case 2:
      {
        char buf[8192];
        baz (buf, 124);
        if (y != 7)
	  return 19;
	bar (2);
	break;
      }
    case 3:
      {
        char buf[8192];
        baz (buf, 51);
        if (y != 19)
	  return 26;
	bar (2);
	bar (3);
	bar (4);
	bar (5);
	bar (6);
	bar (7);
	bar (9);
	break;
      }
    default:
      {
        char buf[8192];
        baz (buf, 18);
        if (y != 21)
	  return 28;
	bar (2);
	bar (3);
	bar (4);
	bar (5);
	bar (6);
	bar (7);
	bar (9);
	break;
      }
    }
  bar (8);
  return 7;
}

we don't optimize this at -O2, but do with -O2 -fstack-reuse=none, when
we don't have the CLOBBERs around.  The tail-merge pass could run this
analysis and find out that, while the basic blocks under consideration have
different sets of clobbers, the vars clobbered only in the other sets are
dead in the bb not having those clobbers, so it should be possible to union
the clobber sets and tail-merge because of that.

	Jakub


* Re: [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214)
  2019-01-17 14:43   ` Jakub Jelinek
@ 2019-01-18  9:07     ` Jakub Jelinek
  2019-01-18  9:23       ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Jelinek @ 2019-01-18  9:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Thu, Jan 17, 2019 at 03:43:05PM +0100, Jakub Jelinek wrote:
> > So we do not care to optimize this to only clobber the vars that
> > are appear live over the EH edge?
> 
> Wouldn't that be quite expensive (especially at that spot in the inliner)?
> I could surely defer that (at the spot in copy_edges_for_bb just remember
> which bb was it and handle it before destroying the decl_map in
> expand_call_inline, but:
> 1) are the extra CLOBBERs that big a deal?
> 2) if they are, shouldn't we have a pass that does it generically after IPA
>    and removes all CLOBBERs for vars already known dead, whether they come
>    from inlining or whatever else (we have many in the IL already that are
>    dead for other reasons, e.g. a variable where the destructor is inlined
>    and has one CLOBBER in itself, but have another CLOBBER in the caller too
>    when the var goes out of scope); tree DSE is able to remove CLOBBERs for
>    the same var if they are adjacent; in either case, what would be the
>    algorithm for that?  Something like add_scope_conflicts algorithm, just
>    not build any conflicts, but just propagate the var is live bitmaps
>    and when the propagation terminates, go through all bbs and if it sees
>    a clobber on a var that isn't live at that point, remove the clobber?

That said, I think it would also be doable in the inliner if you prefer to
do it there for now, and for GCC 10 other passes could use it for other
purposes.

I'd do it in copy_cfg_body before we destroy the ->aux fields mapping src_fn
bbs to dst_fn blocks: pre_and_rev_post_order_compute_fn (id->src_cfun, ...)
plus a previously computed bitmap of interesting decls to track liveness of
(for later uses, if the bitmap is NULL it could track all VAR_DECLs that
need to live in memory), then do the liveness propagation similarly to what
add_scope_conflicts does on the src_cfun bbs, and in the end just look at
the srcs of EDGE_EH edges that are >= last and read off from the bitmaps
which vars are live at the end of those bbs.

	Jakub


* Re: [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214)
  2019-01-18  9:07     ` Jakub Jelinek
@ 2019-01-18  9:23       ` Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2019-01-18  9:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Fri, 18 Jan 2019, Jakub Jelinek wrote:

> On Thu, Jan 17, 2019 at 03:43:05PM +0100, Jakub Jelinek wrote:
> > > So we do not care to optimize this to only clobber the vars that
> > > are appear live over the EH edge?
> > 
> > Wouldn't that be quite expensive (especially at that spot in the inliner)?
> > I could surely defer that (at the spot in copy_edges_for_bb just remember
> > which bb was it and handle it before destroying the decl_map in
> > expand_call_inline, but:
> > 1) are the extra CLOBBERs that big a deal?

No idea.  Can you see how many we add (and, for comparison, how
many were there before) when, say, building libstdc++?  (bootstrap
uses -fno-exceptions so that's not a good test, maybe some other
(smaller) C++ application?)

> > 2) if they are, shouldn't we have a pass that does it generically after IPA
> >    and removes all CLOBBERs for vars already known dead, whether they come
> >    from inlining or whatever else (we have many in the IL already that are
> >    dead for other reasons, e.g. a variable where the destructor is inlined
> >    and has one CLOBBER in itself, but have another CLOBBER in the caller too
> >    when the var goes out of scope); tree DSE is able to remove CLOBBERs for
> >    the same var if they are adjacent; in either case, what would be the
> >    algorithm for that?  Something like add_scope_conflicts algorithm, just
> >    not build any conflicts, but just propagate the var is live bitmaps
> >    and when the propagation terminates, go through all bbs and if it sees
> >    a clobber on a var that isn't live at that point, remove the clobber?
> 
> That said, I think it would be doable also in the inliner if you prefer to
> do it there for now, and for GCC10 other passes could use that for other
> purposes.
> 
> I'd do it in copy_cfg_body before we destroy the ->aux fields mapping src_fn
> bbs to dst_fn blocks, pre_and_rev_post_order_compute_fn (id->src_cfun, ...)
> + have a bitmap previously computed of interesting decls to track liveness
> of (for later uses if the bitmap is NULL it could track all VAR_DECLs that
> need to live in memory) and do the liveness propagation similarly to what
> add_scope_conflicts does on the src_cfun bbs, then in the end just look at
> srcs of EDGE_EH edges that are >= last and see what vars are live at the end
> of those bbs from the bitmaps.

I wonder if instead of tracking interesting vars we can compute
local live-in/out during actual BB copying and then just iterate
the global problem for the part going into the interesting EH
edges?

OTOH doing all this smells like a source of quadraticness so I hope
it won't be necessary or we can instead use some heuristics to
prune the set of vars to add clobbers for (can we somehow use
BLOCK_VARs of the throws?) if the numbers say it looks necessary.

But yes, adding them all during inlining and then having a way
to prune them later would be another option.

I don't want to be too clever for GCC 9 since we're quite late, so
another option is to limit the number of clobbers generated
(just generate them for the top N vars sorted by size, prioritizing
variable-sized ones?).
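
Purely as an illustration of that last option, a tiny sketch of such a
top-N pruning heuristic might look like this (var_info is a hypothetical
stand-in, not a GCC type):

/* Not GCC code: var_info is a hypothetical stand-in for an inlined
   variable.  Variable-sized vars sort first, then larger fixed-size
   vars, and only the first N would get a clobber in the landing pad.  */
#include <algorithm>
#include <vector>

struct var_info { bool variable_sized; unsigned long size; };

std::vector<var_info>
pick_clobber_candidates (std::vector<var_info> vars, unsigned n)
{
  std::sort (vars.begin (), vars.end (),
             [] (const var_info &a, const var_info &b)
             {
               if (a.variable_sized != b.variable_sized)
                 return a.variable_sized;   /* Variable-sized first.  */
               return a.size > b.size;      /* Then by size, descending.  */
             });
  if (vars.size () > n)
    vars.resize (n);   /* Keep only the top N.  */
  return vars;
}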

Richard.


