public inbox for gcc-patches@gcc.gnu.org
* RFA: Speedup expand_used_vars by 30 times (PR38474)
@ 2012-05-27  3:04 Michael Matz
  2012-05-29  9:50 ` Richard Guenther
  2012-07-02 20:59 ` RFA: Speedup expand_used_vars by 30 times (PR38474) Mike Stump
  0 siblings, 2 replies; 8+ messages in thread
From: Michael Matz @ 2012-05-27  3:04 UTC (permalink / raw)
  To: gcc-patches

Hi,

[for certain test cases :-) ]

the temp slot cleanups I just sent were actually motivated by PR38474.  
It exposes many slownesses in the compiler, but at -O0 the only remaining 
one is the expand phase.  Expanding variables is one of the two culprits (on my 
machine here, with -O0 -g f951): 30 seconds for the reduced testcase.  
Expanding the statements themselves then needs 5.5 seconds and is the 
other.  Overall 77 seconds compile time.

The problem with expanding variables in this case is the 
add_alias_set_conflicts routine.  It walks all NxN/2 stack variable pairs, 
checks whether pair (i,j) can share the same stack slot from a type 
perspective (canshare(i,j)) and adds a conflict if not.  Before asking that 
question it doesn't check whether pair (i,j) already conflicts for other 
reasons.

Now, we could simply add the conflicts(i,j) test in that loop, before 
asking canshare(i,j) and adding a new conflict.  But adding conflicts 
comes with a cost, so we can do better.  The answer to canshare(i,j) 
depends only on the components already merged into [i] and the type of j.  
By carefully choosing a type to represent all components of [i] and 
updating it in union_stack_vars we can ask canshare(i,j) right 
before we actually want to merge j into [i].  That's the least amount of 
work possible as long as we want the same results.
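The scheme can be sketched with a toy model (hypothetical names, not GCC's real data structures; an alias set here is just an int, 0 standing in for alias set zero, and the volatile/nonconflicting_type subtlety is omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct var
{
  int alias_set;   /* 0 plays the role of alias set zero: conflicts with all */
  size_t rep;      /* this variable's partition representative */
};

/* Two objects must conflict if their sets are equal or one is zero.  */
static bool
can_share (int a, int b)
{
  return a == 0 || b == 0 || a == b;
}

/* The more specific of two must-conflicting sets is the nonzero one.  */
static int
more_specific (int a, int b)
{
  return a ? a : b;
}

/* Merge J's partition into I's and update the representative's slot
   type, mirroring what union_stack_vars does in the patch.  */
static void
union_vars (struct var *vars, size_t n, size_t i, size_t j)
{
  size_t ri = vars[i].rep, rj = vars[j].rep;
  vars[ri].alias_set = more_specific (vars[ri].alias_set,
                                      vars[rj].alias_set);
  for (size_t k = 0; k < n; ++k)
    if (vars[k].rep == rj)
      vars[k].rep = ri;
}

/* Partitioning loop: canshare is consulted only for pairs that are
   actually about to be merged, not for all NxN/2 pairs up front.  */
static void
partition (struct var *vars, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    for (size_t j = i + 1; j < n; ++j)
      {
        if (vars[i].rep == vars[j].rep)
          continue;
        if (!can_share (vars[vars[i].rep].alias_set,
                        vars[vars[j].rep].alias_set))
          continue;  /* previously: an explicit conflict bit */
        union_vars (vars, n, i, j);
      }
}
```

The point of the slot type (here the representative's alias_set) is that it summarizes everything already merged into the partition, so a single can_share call at merge time gives the same answer as checking j against every member.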

This change brings the 30 seconds for var expansion down to 0.8 seconds.

The other change in function.c further improves the temp slot machinery.  
While expanding, free_temp_slots is called after each statement, and if it 
is able to free a slot it needs to update the RTX->slot mapping (a 
htab_t).  The freeing of slots was done incorrectly: it overwrote the slot 
in question with NULL, but it should have used htab_clear_slot.  This 
meant that the hashtable kept growing and growing, even with all elements 
empty, because the size statistics weren't correct.  Additionally it 
broke any hash chain that strode across the nullified slot.
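The chain-breaking failure mode is easy to reproduce with a toy open-addressed table (illustrative code, not libiberty's htab; keys 1 and 9 deliberately collide modulo the table size):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIZE 8

/* Tiny linear-probing table over positive ints: 0 means empty,
   -1 means deleted (a tombstone).  The failure mode is the same as in
   the real htab: overwriting a slot with "empty" instead of clearing it
   properly breaks any probe chain striding that slot, and the table's
   fill statistics go stale.  */

static size_t hash (int key) { return (size_t) key % SIZE; }

static void
insert (int *tab, size_t *n_elems, int key)
{
  size_t i = hash (key);
  while (tab[i] > 0)           /* occupied: probe onwards */
    i = (i + 1) % SIZE;
  tab[i] = key;
  ++*n_elems;
}

static bool
lookup (const int *tab, int key)
{
  size_t i = hash (key);
  while (tab[i] != 0)          /* stop only at a truly empty slot */
    {
      if (tab[i] == key)
        return true;
      i = (i + 1) % SIZE;
    }
  return false;
}

/* The buggy deletion: mark the slot empty and leave the count alone.  */
static void
delete_buggy (int *tab, int key)
{
  size_t i = hash (key);
  while (tab[i] != key)
    i = (i + 1) % SIZE;
  tab[i] = 0;                  /* breaks chains striding slot i */
}

/* The fixed deletion: leave a tombstone and update the count, which is
   what htab_clear_slot amounts to for libiberty hash tables.  */
static void
delete_fixed (int *tab, size_t *n_elems, int key)
{
  size_t i = hash (key);
  while (tab[i] != key)
    i = (i + 1) % SIZE;
  tab[i] = -1;                 /* probe chains stay intact */
  --*n_elems;
}
```

With the buggy deletion, removing key 1 makes key 9 unfindable, since 9's probe chain starts at 1's (now "empty") home slot; with the fixed deletion both the chain and the element count stay correct.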

And then a micro-optimization: if we know there aren't any slots in use at 
all (which happens often in normal circumstances) we can just as well use 
htab_empty instead of traversing the hash table looking for the right slots.
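A minimal sketch of that guard (hypothetical stand-ins for n_temp_slots_in_use and temp_slot_address_table, not the real function.c code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64

struct entry { int addr; int in_use; };

struct table
{
  struct entry slots[TABLE_SIZE];
  size_t n_entries;   /* entries currently present in the table */
  size_t n_in_use;    /* plays the role of n_temp_slots_in_use */
};

/* Slow path: visit every entry and drop the dead ones, like
   htab_traverse with remove_unused_temp_slot_addresses_1.  */
static void
remove_unused_slow (struct table *t)
{
  for (size_t i = 0; i < TABLE_SIZE; ++i)
    if (t->slots[i].addr && !t->slots[i].in_use)
      {
        t->slots[i].addr = 0;
        t->n_entries--;
      }
}

/* If no slot is live, everything in the table is dead, so wipe it
   wholesale (the htab_empty analogue) instead of traversing.  */
static void
remove_unused (struct table *t)
{
  if (t->n_in_use)
    remove_unused_slow (t);
  else
    {
      memset (t->slots, 0, sizeof t->slots);
      t->n_entries = 0;
    }
}
```

The counter is cheap to maintain (increment on allocation, decrement on free), and since free_temp_slots runs after every statement, the fast path pays off on the common case where nothing is live.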

This brings the expansion time (i.e. for expanding the statements) down 
from 5.5 to 2.8 seconds.  All changes together reduce the compile time 
for the reduced testcase from ~ 77 seconds to ~ 44.

A variant of this regstrapped on x86_64-linux, all languages+Ada+objc++, 
but after some changes I'm redoing that regstrap.  Okay for trunk if that 
passes?


Ciao,
Michael.
-----------------------
	PR middle-end/38474
	* cfgexpand.c (struct stack_var): Add slot_type member.
	(add_stack_var): Initialize it.
	(add_alias_set_conflicts): Remove.
	(merge_stack_vars_p, more_specific_type): New functions.
	(nonconflicting_type): New static data.
	(union_stack_vars): Use more_specific_type to update slot_type.
	(partition_stack_vars): Call merge_stack_vars_p ...
	(expand_used_vars): ... instead of add_alias_set_conflicts.
	(fini_vars_expansion): Clear nonconflicting_type.
	* function.c (n_temp_slots_in_use): New static data.
	(make_slot_available, assign_stack_temp_for_type): Update it.
	(init_temp_slots): Zero it.
	(remove_unused_temp_slot_addresses): Use it for quicker removal.
	(remove_unused_temp_slot_addresses_1): Use htab_clear_slot.

Index: cfgexpand.c
===================================================================
--- cfgexpand.c.orig	2012-05-27 01:33:55.000000000 +0200
+++ cfgexpand.c	2012-05-27 04:37:18.000000000 +0200
@@ -177,6 +177,11 @@ struct stack_var
   /* The next stack variable in the partition, or EOC.  */
   size_t next;
 
+  /* The most specific type of all variables merged with the slot
+     if this is a representative.  Initially this is simply the
+     TREE_TYPE for the variable.  */
+  tree slot_type;
+
   /* The numbers of conflicting stack variables.  */
   bitmap conflicts;
 };
@@ -285,6 +290,7 @@ add_stack_var (tree decl)
   /* All variables are initially in their own partition.  */
   v->representative = stack_vars_num;
   v->next = EOC;
+  v->slot_type = TREE_TYPE (decl);
 
   /* All variables initially conflict with no other.  */
   v->conflicts = NULL;
@@ -353,45 +359,40 @@ aggregate_contains_union_type (tree type
   return false;
 }
 
-/* A subroutine of expand_used_vars.  If two variables X and Y have alias
-   sets that do not conflict, then do add a conflict for these variables
-   in the interference graph.  We also need to make sure to add conflicts
-   for union containing structures.  Else RTL alias analysis comes along
-   and due to type based aliasing rules decides that for two overlapping
-   union temporaries { short s; int i; } accesses to the same mem through
-   different types may not alias and happily reorders stores across
-   life-time boundaries of the temporaries (See PR25654).  */
+/* A subroutine of partition_stack_vars.  If two variables I and J have alias
+   sets that do not conflict, then we cannot put them into the same stack
+   slot.  This is equivalent to adding conflicts for the two variables.
+   Return false in that case, otherwise return true.
+
+   We also need to make sure to not merge union containing structures.  Else
+   RTL alias analysis comes along and due to type based aliasing rules decides
+   that for two overlapping union temporaries { short s; int i; } accesses to
+   the same mem through different types may not alias and happily reorders
+   stores across life-time boundaries of the temporaries (See PR25654).  */
 
-static void
-add_alias_set_conflicts (void)
+static bool
+merge_stack_vars_p (size_t i, size_t j)
 {
-  size_t i, j, n = stack_vars_num;
-
-  for (i = 0; i < n; ++i)
-    {
-      tree type_i = TREE_TYPE (stack_vars[i].decl);
-      bool aggr_i = AGGREGATE_TYPE_P (type_i);
-      bool contains_union;
-
-      contains_union = aggregate_contains_union_type (type_i);
-      for (j = 0; j < i; ++j)
-	{
-	  tree type_j = TREE_TYPE (stack_vars[j].decl);
-	  bool aggr_j = AGGREGATE_TYPE_P (type_j);
-	  if (aggr_i != aggr_j
-	      /* Either the objects conflict by means of type based
-		 aliasing rules, or we need to add a conflict.  */
-	      || !objects_must_conflict_p (type_i, type_j)
-	      /* In case the types do not conflict ensure that access
-		 to elements will conflict.  In case of unions we have
-		 to be careful as type based aliasing rules may say
-		 access to the same memory does not conflict.  So play
-		 safe and add a conflict in this case when
-                 -fstrict-aliasing is used.  */
-              || (contains_union && flag_strict_aliasing))
-	    add_stack_var_conflict (i, j);
-	}
-    }
+  tree type_i = stack_vars[i].slot_type;
+  bool aggr_i = AGGREGATE_TYPE_P (type_i);
+  tree type_j = stack_vars[j].slot_type;
+  bool aggr_j = AGGREGATE_TYPE_P (type_j);
+
+  if (aggr_i != aggr_j
+      /* In case the types do not conflict ensure that access
+	 to elements will conflict.  In case of unions we have
+	 to be careful as type based aliasing rules may say
+	 access to the same memory does not conflict.  So play
+	 safe and don't merge in this case when
+	 -fstrict-aliasing is used.  */
+      || (flag_strict_aliasing
+	  && (aggregate_contains_union_type (type_i)
+	      || aggregate_contains_union_type (type_j)))
+      /* Either the objects conflict by means of type based
+	 aliasing rules, or we cannot merge them.  */
+      || !objects_must_conflict_p (type_i, type_j))
+    return false;
+  return true;
 }
 
 /* Callback for walk_stmt_ops.  If OP is a decl touched by add_stack_var
@@ -736,6 +737,68 @@ update_alias_info_with_stack_vars (void)
     }
 }
 
+/* Helper type for special situation in more_specific_type.  */
+static tree nonconflicting_type;
+
+/* Given two types T1 and T2, which must fulfill
+   objects_must_conflict_p (T1, T2), return the more specific of the two.
+
+   More specific is the one that must-conflicts with fewer types
+   than the other.  E.g. given two types with alias sets 0 and 1,
+   the latter is the more specific one.  Currently two types are only
+   objects_must_conflict_p if either both have the same alias set,
+   or one has alias set zero.  We return the other.  */
+
+static tree
+more_specific_type (tree t1, tree t2)
+{
+
+  /* See also the implementation of objects_must_conflict_p.  */
+  alias_set_type set1, set2;
+
+  /* If they are the same none is more specific.  */
+  if (t1 == t2)
+    return t1;
+
+  /* If one alias set is zero, the other is at least as specific.  */
+  set1 = get_alias_set (t1);
+  if (!set1)
+    return t2;
+  set2 = get_alias_set (t2);
+  if (!set2)
+    return t1;
+
+  if (set1 == set2)
+    return t1;
+
+  /* This can only happen if both types were volatile, as only then
+     objects_must_conflict_p answers true without looking at the alias
+     sets.  We can't return one of the two given types:
+
+     Suppose we have sets 3v and 4v (i.e. two non-conflicting sets,
+     but both volatile, hence objects_must_conflict_p is true).  Now we
+     choose one (say 3v) as the new slot type.  Suppose another variable
+     as union candidate has alias set 3 (not volatile), it could be
+     merged into this slot (3v and 3 must conflict).  Except that it
+     couldn't; because one component of this slot (4v) doesn't must-conflict
+     with set 3.
+
+     So instead we return a (new) type that is the most specific one
+     there is, namely one that doesn't conflict with anything else,
+     except the set 0.  This effectively switches off merging anything
+     else into this slot, except alias set 0 and volatile things.
+
+     It doesn't matter which type we choose, it's not used for any
+     code generation.  */
+
+  if (!nonconflicting_type)
+    {
+      nonconflicting_type = build_distinct_type_copy (integer_type_node);
+      TYPE_ALIAS_SET (nonconflicting_type) = new_alias_set ();
+    }
+  return nonconflicting_type;
+}
+
 /* A subroutine of partition_stack_vars.  The UNION portion of a UNION/FIND
    partitioning algorithm.  Partitions A and B are known to be non-conflicting.
    Merge them into a single partition A.  */
@@ -757,6 +820,10 @@ union_stack_vars (size_t a, size_t b)
   if (stack_vars[a].alignb < stack_vars[b].alignb)
     stack_vars[a].alignb = stack_vars[b].alignb;
 
+  /* Give the slot a type reflecting all its components aliasing-wise.  */
+  stack_vars[a].slot_type = more_specific_type (stack_vars[a].slot_type,
+						stack_vars[b].slot_type);
+
   /* Update the interference graph and merge the conflicts.  */
   if (vb->conflicts)
     {
@@ -825,6 +892,11 @@ partition_stack_vars (void)
 	      != (jalign * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT))
 	    continue;
 
+	  /* Due to the way alias sets work, no variables with non-conflicting
+	     alias sets may be assigned the same address.  */
+	  if (!merge_stack_vars_p (i, j))
+	    continue;
+
 	  /* UNION the objects, placing J at OFFSET.  */
 	  union_stack_vars (i, j);
 	}
@@ -1471,6 +1543,7 @@ fini_vars_expansion (void)
   stack_vars_alloc = stack_vars_num = 0;
   pointer_map_destroy (decl_to_stack_part);
   decl_to_stack_part = NULL;
+  nonconflicting_type = NULL;
 }
 
 /* Make a fair guess for the size of the stack frame of the function
@@ -1626,10 +1699,6 @@ expand_used_vars (void)
   if (stack_vars_num > 0)
     {
       add_scope_conflicts ();
-      /* Due to the way alias sets work, no variables with non-conflicting
-	 alias sets may be assigned the same address.  Add conflicts to
-	 reflect this.  */
-      add_alias_set_conflicts ();
 
       /* If stack protection is enabled, we don't share space between
 	 vulnerable data and non-vulnerable data.  */
Index: function.c
===================================================================
--- function.c.orig	2012-05-27 03:22:10.000000000 +0200
+++ function.c	2012-05-27 03:22:11.000000000 +0200
@@ -572,6 +572,7 @@ struct GTY(()) temp_slot {
 /* A table of addresses that represent a stack slot.  The table is a mapping
    from address RTXen to a temp slot.  */
 static GTY((param_is(struct temp_slot_address_entry))) htab_t temp_slot_address_table;
+static size_t n_temp_slots_in_use;
 
 /* Entry for the above hash table.  */
 struct GTY(()) temp_slot_address_entry {
@@ -648,6 +649,7 @@ make_slot_available (struct temp_slot *t
   insert_slot_to_list (temp, &avail_temp_slots);
   temp->in_use = 0;
   temp->level = -1;
+  n_temp_slots_in_use--;
 }
 
 /* Compute the hash value for an address -> temp slot mapping.
@@ -700,7 +702,7 @@ remove_unused_temp_slot_addresses_1 (voi
   const struct temp_slot_address_entry *t;
   t = (const struct temp_slot_address_entry *) *slot;
   if (! t->temp_slot->in_use)
-    *slot = NULL;
+    htab_clear_slot (temp_slot_address_table, slot);
   return 1;
 }
 
@@ -708,9 +710,13 @@ remove_unused_temp_slot_addresses_1 (voi
 static void
 remove_unused_temp_slot_addresses (void)
 {
-  htab_traverse (temp_slot_address_table,
-		 remove_unused_temp_slot_addresses_1,
-		 NULL);
+  /* Use quicker clearing if there aren't any active temp slots.  */
+  if (n_temp_slots_in_use)
+    htab_traverse (temp_slot_address_table,
+		   remove_unused_temp_slot_addresses_1,
+		   NULL);
+  else
+    htab_empty (temp_slot_address_table);
 }
 
 /* Find the temp slot corresponding to the object at address X.  */
@@ -902,6 +908,7 @@ assign_stack_temp_for_type (enum machine
   p->in_use = 1;
   p->type = type;
   p->level = temp_slot_level;
+  n_temp_slots_in_use++;
 
   pp = temp_slots_at_level (p->level);
   insert_slot_to_list (p, pp);
@@ -1213,6 +1220,7 @@ init_temp_slots (void)
   avail_temp_slots = 0;
   used_temp_slots = 0;
   temp_slot_level = 0;
+  n_temp_slots_in_use = 0;
 
   /* Set up the table to map addresses to temp slots.  */
   if (! temp_slot_address_table)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFA: Speedup expand_used_vars by 30 times (PR38474)
  2012-05-27  3:04 RFA: Speedup expand_used_vars by 30 times (PR38474) Michael Matz
@ 2012-05-29  9:50 ` Richard Guenther
  2012-05-29 11:59   ` Michael Matz
  2012-06-06 12:44   ` RF[CA]: Don't restrict stack slot sharing Michael Matz
  2012-07-02 20:59 ` RFA: Speedup expand_used_vars by 30 times (PR38474) Mike Stump
  1 sibling, 2 replies; 8+ messages in thread
From: Richard Guenther @ 2012-05-29  9:50 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On Sun, May 27, 2012 at 5:03 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> [for certain test cases :-) ]
>
> the temp slot cleanups I just sent were actually motivated by PR38474.
> It exposes many slownesses in the compiler, but at -O0 the only remaining
> one is the expand phase.  Expanding variables is one of the two culprits (on my
> machine here, with -O0 -g f951): 30 seconds for the reduced testcase.
> Expanding the statements themselves then needs 5.5 seconds and is the
> other.  Overall 77 seconds compile time.
>
> The problem with expanding variables in this case is the
> add_alias_set_conflicts routine.  It walks all NxN/2 stack variable pairs,
> checks whether pair (i,j) can share the same stack slot from a type perspective
> (canshare(i,j)) and adds a conflict if not.  Before asking that
> question it doesn't check whether pair (i,j) already conflicts for other
> reasons.
>
> Now, we could simply add the conflicts(i,j) test in that loop, before
> asking canshare(i,j) and adding a new conflict.  But adding conflicts
> comes with a cost, so we can do better.  The answer to canshare(i,j)
> depends only on the components already merged into [i] and the type of j.
> By carefully choosing a type to represent all components of [i] and
> updating it in union_stack_vars we can ask canshare(i,j) right
> before we actually want to merge j into [i].  That's the least amount of
> work possible as long as we want the same results.
>
> This change brings the 30 seconds for var expansion down to 0.8 seconds.
>
> The other change in function.c further improves the temp slot machinery.
> While expanding, free_temp_slots is called after each statement, and if it
> is able to free a slot it needs to update the RTX->slot mapping (a
> htab_t).  The freeing of slots was done incorrectly: it overwrote the slot
> in question with NULL, but it should have used htab_clear_slot.

That looks like a bugfix applicable to branches?

> This
> meant that the hashtable kept growing and growing, even with all elements
> empty, because the size statistics weren't correct.  Additionally it
> broke any hash chain that strode across the nullified slot.
>
> And then a micro-optimization: if we know there aren't any slots in use at
> all (which happens often in normal circumstances) we can just as well use
> htab_empty instead of traversing the hash table looking for the right slots.
>
> This brings the expansion time (i.e. for expanding the statements) down
> from 5.5 to 2.8 seconds.  All changes together reduce the compile time
> for the reduced testcase from ~ 77 seconds to ~ 44.
>
> A variant of this regstrapped on x86_64-linux, all languages+Ada+objc++,
> but after some changes I'm redoing that regstrap.  Okay for trunk if that
> passes?

The volatile handling is very odd - the objects_must_conflict_p code basically
says that two volatile vars may always share stack slots?!  What's the reasoning
for this, or handling volatiles special in any way?  I'd rather remove
this fishy
code (and if at all, ensure that volatile automatics are never coalesced with
any other stack slot).

Thanks,
Richard.

> [... rest of changelog and patch quoted unchanged; see the original message above ...]


* Re: RFA: Speedup expand_used_vars by 30 times (PR38474)
  2012-05-29  9:50 ` Richard Guenther
@ 2012-05-29 11:59   ` Michael Matz
  2012-06-06 12:44   ` RF[CA]: Don't restrict stack slot sharing Michael Matz
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Matz @ 2012-05-29 11:59 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


Hi,

On Tue, 29 May 2012, Richard Guenther wrote:

> > The other change in function.c further improves the temp slot 
> > machinery.  While expanding, free_temp_slots is called after each 
> > statement, and if it is able to free a slot it needs to update the 
> > RTX->slot mapping (a htab_t).  The freeing of slots was done 
> > incorrectly: it overwrote the slot in question with NULL, but it 
> > should have used htab_clear_slot.
> 
> That looks like a bugfix applicable to branches?

Perhaps, though the effect of this bug is only worse code generation.  
When the chains are broken due to the bug, some RTX->slot mappings aren't 
found anymore, but the code deals with that conservatively.

> > A variant of this regstrapped on x86_64-linux, all 
> > languages+Ada+objc++, but after some changes I'm redoing that 
> > regstrap.  Okay for trunk if that passes?
> 
> The volatile handling is very odd - the objects_must_conflict_p code 
> basically says that two volatile vars may always share stack slots?!  

That's correct.  Basically anything can share a stack slot as long as the 
accesses already conflict for reasons other than address-based 
considerations.  Two volatile accesses always conflict and hence will never 
be swapped (and such swapping is what creates the problems with stack slot 
sharing).

> What's the reasoning for this, or handling volatiles special in any way?  
> I'd rather remove this fishy code (and if at all, ensure that volatile 
> automatics are never coalesced with any other stack slot).

Volatility is no reason for not sharing stack slots; it's not fishy code.


Ciao,
Michael.


* RF[CA]: Don't restrict stack slot sharing
  2012-05-29  9:50 ` Richard Guenther
  2012-05-29 11:59   ` Michael Matz
@ 2012-06-06 12:44   ` Michael Matz
  2012-06-06 15:16     ` Richard Guenther
  1 sibling, 1 reply; 8+ messages in thread
From: Michael Matz @ 2012-06-06 12:44 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

Hi,

On Tue, 29 May 2012, Richard Guenther wrote:

> > [... patch about making conflict handling for stack slot sharing 
> > faster...]
> 
> The volatile handling is very odd - the objects_must_conflict_p code 
> basically says that two volatile vars may always share stack slots?!  
> What's the reasoning for this, or handling volatiles special in any way?  
> I'd rather remove this fishy code (and if at all, ensure that volatile 
> automatics are never coalesced with any other stack slot).

After some pondering and experiments I propose something even more 
aggressive: disable the restriction on stack slot sharing altogether.  The 
current code tries to avoid sharing slots for decls whose alias sets don't 
conflict.  The thought behind that is that such accesses could be 
reordered by any transformation (on the basis of non-conflicting alias 
sets), which is a problem if those accesses actually go to the same 
memory.

Now, since this code was invented some TLC has gone into the RTL 
disambiguators.  In particular write_dependence_1 (and hence anti_ and 
output_dependence) don't use TBAA anymore (see below for a caveat).  
true_dependence_1 still does, and it's correct to do so.

As TBAA isn't used in the dependence testers that could create problems 
(namely by swapping a write-write or a read-write pair), we also don't 
need to restrict stack sharing based on TBAA.  We will still happily 
reorder a write-read pair (i.e. read-after-write), but that's okay, 
because for this to have a bad effect the initial code must have been 
invalid already (in our memory model a read must be done with a type 
matching the last write to that memory area).

I verified this with some tweaks to the scheduler on x86(-64).  I disabled 
the stack slot sharing restrictions, forced a first scheduling pass with 
-O2, and implemented a shuffle mode that reorders everything it can:

Index: common/config/i386/i386-common.c
===================================================================
--- common/config/i386/i386-common.c    (revision 187959)
+++ common/config/i386/i386-common.c    (working copy)
@@ -618,7 +618,8 @@ static const struct default_options ix86
     { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
     /* Turn off -fschedule-insns by default.  It tends to make the
        problem with not enough registers even worse.  */
-    { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
+    /*{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },*/
+    { OPT_LEVELS_2_PLUS, OPT_fschedule_insns, NULL, 1 },

 #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
     SUBTARGET_OPTIMIZATION_OPTIONS,
Index: haifa-sched.c
===================================================================
--- haifa-sched.c       (revision 187959)
+++ haifa-sched.c       (working copy)
@@ -2436,6 +2436,9 @@ rank_for_schedule (const void *x, const
   /* Make sure that priority of TMP and TMP2 are initialized.  */
   gcc_assert (INSN_PRIORITY_KNOWN (tmp) && INSN_PRIORITY_KNOWN (tmp2));

+  if (!reload_completed)
+    return INSN_LUID (tmp2) - INSN_LUID (tmp);
+
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
       int diff;

The only regression from this change on a regstrap on x86_64-linux, with 
and without -m32, is gcc.target/i386/pr49095.c, which already fails with 
just -fschedule-insns and no patches (unexpected input to the peephole 
patterns disables the optimization that this testcase checks via asm 
string matching).

The caveat: there's a pre-existing problem in the RTL disambiguators where 
nonoverlapping_component_refs_p still uses a sort of TBAA (component-ref 
paths, not based on alias sets).  This problem wasn't worked around by the 
stack sharing restrictions anyway, so it's untouched by this proposal.

What I propose is hence the below patch.

(Btw, what about adding this shuffle schedule mode as a general testing 
means, perhaps under an uck-me flag?)

Regstrapped this patch (all languages+Ada) on x86_64-linux, with and 
without the above scheduler hacks, no regressions (without the scheduler 
hacks).


Ciao,
Michael.
--------------------------
	PR middle-end/38474
	* cfgexpand.c (add_alias_set_conflicts): Remove.
	(expand_used_vars): Don't call it.
	(fini_vars_expansion): Clear nonconflicting_type.
	* function.c (n_temp_slots_in_use): New static data.
	(make_slot_available, assign_stack_temp_for_type): Update it.
	(init_temp_slots): Zero it.
	(remove_unused_temp_slot_addresses): Use it for quicker removal.
	(remove_unused_temp_slot_addresses_1): Use htab_clear_slot.

Index: cfgexpand.c
===================================================================
--- cfgexpand.c.orig	2012-05-29 16:24:46.000000000 +0200
+++ cfgexpand.c	2012-06-06 14:30:33.000000000 +0200
@@ -353,47 +353,6 @@ aggregate_contains_union_type (tree type
   return false;
 }
 
-/* A subroutine of expand_used_vars.  If two variables X and Y have alias
-   sets that do not conflict, then do add a conflict for these variables
-   in the interference graph.  We also need to make sure to add conflicts
-   for union containing structures.  Else RTL alias analysis comes along
-   and due to type based aliasing rules decides that for two overlapping
-   union temporaries { short s; int i; } accesses to the same mem through
-   different types may not alias and happily reorders stores across
-   life-time boundaries of the temporaries (See PR25654).  */
-
-static void
-add_alias_set_conflicts (void)
-{
-  size_t i, j, n = stack_vars_num;
-
-  for (i = 0; i < n; ++i)
-    {
-      tree type_i = TREE_TYPE (stack_vars[i].decl);
-      bool aggr_i = AGGREGATE_TYPE_P (type_i);
-      bool contains_union;
-
-      contains_union = aggregate_contains_union_type (type_i);
-      for (j = 0; j < i; ++j)
-	{
-	  tree type_j = TREE_TYPE (stack_vars[j].decl);
-	  bool aggr_j = AGGREGATE_TYPE_P (type_j);
-	  if (aggr_i != aggr_j
-	      /* Either the objects conflict by means of type based
-		 aliasing rules, or we need to add a conflict.  */
-	      || !objects_must_conflict_p (type_i, type_j)
-	      /* In case the types do not conflict ensure that access
-		 to elements will conflict.  In case of unions we have
-		 to be careful as type based aliasing rules may say
-		 access to the same memory does not conflict.  So play
-		 safe and add a conflict in this case when
-                 -fstrict-aliasing is used.  */
-              || (contains_union && flag_strict_aliasing))
-	    add_stack_var_conflict (i, j);
-	}
-    }
-}
-
 /* Callback for walk_stmt_ops.  If OP is a decl touched by add_stack_var
    enter its partition number into bitmap DATA.  */
 
@@ -1626,10 +1585,6 @@ expand_used_vars (void)
   if (stack_vars_num > 0)
     {
       add_scope_conflicts ();
-      /* Due to the way alias sets work, no variables with non-conflicting
-	 alias sets may be assigned the same address.  Add conflicts to
-	 reflect this.  */
-      add_alias_set_conflicts ();
 
       /* If stack protection is enabled, we don't share space between
 	 vulnerable data and non-vulnerable data.  */
Index: function.c
===================================================================
--- function.c.orig	2012-05-29 16:42:31.000000000 +0200
+++ function.c	2012-05-29 16:45:33.000000000 +0200
@@ -572,6 +572,7 @@ struct GTY(()) temp_slot {
 /* A table of addresses that represent a stack slot.  The table is a mapping
    from address RTXen to a temp slot.  */
 static GTY((param_is(struct temp_slot_address_entry))) htab_t temp_slot_address_table;
+static size_t n_temp_slots_in_use;
 
 /* Entry for the above hash table.  */
 struct GTY(()) temp_slot_address_entry {
@@ -648,6 +649,7 @@ make_slot_available (struct temp_slot *t
   insert_slot_to_list (temp, &avail_temp_slots);
   temp->in_use = 0;
   temp->level = -1;
+  n_temp_slots_in_use--;
 }
 
 /* Compute the hash value for an address -> temp slot mapping.
@@ -700,7 +702,7 @@ remove_unused_temp_slot_addresses_1 (voi
   const struct temp_slot_address_entry *t;
   t = (const struct temp_slot_address_entry *) *slot;
   if (! t->temp_slot->in_use)
-    *slot = NULL;
+    htab_clear_slot (temp_slot_address_table, slot);
   return 1;
 }
 
@@ -708,9 +710,13 @@ remove_unused_temp_slot_addresses_1 (voi
 static void
 remove_unused_temp_slot_addresses (void)
 {
-  htab_traverse (temp_slot_address_table,
-		 remove_unused_temp_slot_addresses_1,
-		 NULL);
+  /* Use quicker clearing if there aren't any active temp slots.  */
+  if (n_temp_slots_in_use)
+    htab_traverse (temp_slot_address_table,
+		   remove_unused_temp_slot_addresses_1,
+		   NULL);
+  else
+    htab_empty (temp_slot_address_table);
 }
 
 /* Find the temp slot corresponding to the object at address X.  */
@@ -902,6 +908,7 @@ assign_stack_temp_for_type (enum machine
   p->in_use = 1;
   p->type = type;
   p->level = temp_slot_level;
+  n_temp_slots_in_use++;
 
   pp = temp_slots_at_level (p->level);
   insert_slot_to_list (p, pp);
@@ -1213,6 +1220,7 @@ init_temp_slots (void)
   avail_temp_slots = 0;
   used_temp_slots = 0;
   temp_slot_level = 0;
+  n_temp_slots_in_use = 0;
 
   /* Set up the table to map addresses to temp slots.  */
   if (! temp_slot_address_table)


* Re: RF[CA]: Don't restrict stack slot sharing
  2012-06-06 12:44   ` RF[CA]: Don't restrict stack slot sharing Michael Matz
@ 2012-06-06 15:16     ` Richard Guenther
  2012-06-15 15:11       ` Michael Matz
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Guenther @ 2012-06-06 15:16 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On Wed, Jun 6, 2012 at 2:43 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 29 May 2012, Richard Guenther wrote:
>
>> > [... patch about making conflict handling for stack slot sharing
>> > faster...]
>>
>> The volatile handling is very odd - the objects_must_conflict_p code
>> basically says that two volatile vars may always share stack slots?!
>> What's the reasoning for this, or handling volatiles special in any way?
>> I'd rather remove this fishy code (and if at all, ensure that volatile
>> automatics are never coalesced with any other stack slot).
>
> After some pondering and tries I propose something even more aggressive:
> disable the restriction on stack slot sharing altogether.  The current
> code tries to avoid sharing slots for decls that don't conflict in their
> alias sets.  The thought behind that is that such accesses could be
> reordered by any transformation (on the basis of non-conflicting alias
> sets), which is a problem if those accesses actually go to the same
> memory.
>
> Now, since this code was invented some TLC went into the RTL
> disambiguators.  In particular write_dependence_1 (and hence anti_ and
> output_dependence) don't use TBAA anymore (see below for a caveat).
> true_dependence_1 still does, and it's correct to do so.
>
> As TBAA isn't used in the dependence testers that could create problems
> (namely swapping a write-write or a read-write pair) we also don't need to
> restrict the stack sharing based on TBAA.  We still will happily reorder a
> write-read pair (i.e. read-after-write), but that's okay because for this
> to have a bad effect the initial code must have been invalid already (in
> our mem model a read must be done with a matching type like the last write
> to that memory area).
>
> I verified this with some tweaks on the scheduler on x86(-64).  I disabled
> the stack slot sharing restrictions, then I forced a first scheduling pass
> with -O2, and I implemented a shuffle mode that reorders everything it
> can:
>
> Index: common/config/i386/i386-common.c
> ===================================================================
> --- common/config/i386/i386-common.c    (revision 187959)
> +++ common/config/i386/i386-common.c    (working copy)
> @@ -618,7 +618,8 @@ static const struct default_options ix86
>     { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
>     /* Turn off -fschedule-insns by default.  It tends to make the
>        problem with not enough registers even worse.  */
> -    { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> +    /*{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },*/
> +    { OPT_LEVELS_2_PLUS, OPT_fschedule_insns, NULL, 1 },
>
>  #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
>     SUBTARGET_OPTIMIZATION_OPTIONS,
> Index: haifa-sched.c
> ===================================================================
> --- haifa-sched.c       (revision 187959)
> +++ haifa-sched.c       (working copy)
> @@ -2436,6 +2436,9 @@ rank_for_schedule (const void *x, const
>   /* Make sure that priority of TMP and TMP2 are initialized.  */
>   gcc_assert (INSN_PRIORITY_KNOWN (tmp) && INSN_PRIORITY_KNOWN (tmp2));
>
> +  if (!reload_completed)
> +    return INSN_LUID (tmp2) - INSN_LUID (tmp);
> +
>   if (sched_pressure != SCHED_PRESSURE_NONE)
>     {
>       int diff;
>
> The only regression with this change on a regstrap on x86_64-linux with
> and without -m32 is gcc.target/i386/pr49095.c, which already fails just
> with -fschedule-insns and no patches (unexpected input to the peephole
> patterns disable the optimization that is checked in this testcase via asm
> string matching).
>
> The caveat: there's a pre-existing problem in the RTL disambiguators where
> nonoverlapping_component_refs_p still uses a sort of TBAA (component_ref
> paths), not based on alias sets.  This problem wasn't worked around with
> the stack sharing restrictions anyway, so it's untouched by this proposal.
>
> What I propose is hence the below patch.
>
> (Btw, what about adding this shuffle schedule mode as general testing
> mean, perhaps under an uck-me flag?)
>
> Regstrapped this patch (all languages+Ada) on x86_64-linux, with and
> without the above scheduler hacks, no regressions (without the scheduler
> hacks).

The n_temp_slots_in_use part is ok.

The rest is also a good idea, and indeed the middle-end type-based
memory-model makes sharing slots always possible (modulo bugs,
like the nonoverlapping_component_refs_p code - which should simply
be removed).

Thus, ok for the rest, too, after waiting a while for others to chime in.

Thanks,
Richard.

>
> Ciao,
> Michael.
> --------------------------
>        PR middle-end/38474
>        * cfgexpand.c (add_alias_set_conflicts): Remove.
>        (expand_used_vars): Don't call it.
>        (fini_vars_expansion): Clear nonconflicting_type.
>        * function.c (n_temp_slots_in_use): New static data.
>        (make_slot_available, assign_stack_temp_for_type): Update it.
>        (init_temp_slots): Zero it.
>        (remove_unused_temp_slot_addresses): Use it for quicker removal.
>        (remove_unused_temp_slot_addresses_1): Use htab_clear_slot.
>
> Index: cfgexpand.c
> ===================================================================
> --- cfgexpand.c.orig    2012-05-29 16:24:46.000000000 +0200
> +++ cfgexpand.c 2012-06-06 14:30:33.000000000 +0200
> @@ -353,47 +353,6 @@ aggregate_contains_union_type (tree type
>   return false;
>  }
>
> -/* A subroutine of expand_used_vars.  If two variables X and Y have alias
> -   sets that do not conflict, then do add a conflict for these variables
> -   in the interference graph.  We also need to make sure to add conflicts
> -   for union containing structures.  Else RTL alias analysis comes along
> -   and due to type based aliasing rules decides that for two overlapping
> -   union temporaries { short s; int i; } accesses to the same mem through
> -   different types may not alias and happily reorders stores across
> -   life-time boundaries of the temporaries (See PR25654).  */
> -
> -static void
> -add_alias_set_conflicts (void)
> -{
> -  size_t i, j, n = stack_vars_num;
> -
> -  for (i = 0; i < n; ++i)
> -    {
> -      tree type_i = TREE_TYPE (stack_vars[i].decl);
> -      bool aggr_i = AGGREGATE_TYPE_P (type_i);
> -      bool contains_union;
> -
> -      contains_union = aggregate_contains_union_type (type_i);
> -      for (j = 0; j < i; ++j)
> -       {
> -         tree type_j = TREE_TYPE (stack_vars[j].decl);
> -         bool aggr_j = AGGREGATE_TYPE_P (type_j);
> -         if (aggr_i != aggr_j
> -             /* Either the objects conflict by means of type based
> -                aliasing rules, or we need to add a conflict.  */
> -             || !objects_must_conflict_p (type_i, type_j)
> -             /* In case the types do not conflict ensure that access
> -                to elements will conflict.  In case of unions we have
> -                to be careful as type based aliasing rules may say
> -                access to the same memory does not conflict.  So play
> -                safe and add a conflict in this case when
> -                 -fstrict-aliasing is used.  */
> -              || (contains_union && flag_strict_aliasing))
> -           add_stack_var_conflict (i, j);
> -       }
> -    }
> -}
> -
>  /* Callback for walk_stmt_ops.  If OP is a decl touched by add_stack_var
>    enter its partition number into bitmap DATA.  */
>
> @@ -1626,10 +1585,6 @@ expand_used_vars (void)
>   if (stack_vars_num > 0)
>     {
>       add_scope_conflicts ();
> -      /* Due to the way alias sets work, no variables with non-conflicting
> -        alias sets may be assigned the same address.  Add conflicts to
> -        reflect this.  */
> -      add_alias_set_conflicts ();
>
>       /* If stack protection is enabled, we don't share space between
>         vulnerable data and non-vulnerable data.  */
> Index: function.c
> ===================================================================
> --- function.c.orig     2012-05-29 16:42:31.000000000 +0200
> +++ function.c  2012-05-29 16:45:33.000000000 +0200
> @@ -572,6 +572,7 @@ struct GTY(()) temp_slot {
>  /* A table of addresses that represent a stack slot.  The table is a mapping
>    from address RTXen to a temp slot.  */
>  static GTY((param_is(struct temp_slot_address_entry))) htab_t temp_slot_address_table;
> +static size_t n_temp_slots_in_use;
>
>  /* Entry for the above hash table.  */
>  struct GTY(()) temp_slot_address_entry {
> @@ -648,6 +649,7 @@ make_slot_available (struct temp_slot *t
>   insert_slot_to_list (temp, &avail_temp_slots);
>   temp->in_use = 0;
>   temp->level = -1;
> +  n_temp_slots_in_use--;
>  }
>
>  /* Compute the hash value for an address -> temp slot mapping.
> @@ -700,7 +702,7 @@ remove_unused_temp_slot_addresses_1 (voi
>   const struct temp_slot_address_entry *t;
>   t = (const struct temp_slot_address_entry *) *slot;
>   if (! t->temp_slot->in_use)
> -    *slot = NULL;
> +    htab_clear_slot (temp_slot_address_table, slot);
>   return 1;
>  }
>
> @@ -708,9 +710,13 @@ remove_unused_temp_slot_addresses_1 (voi
>  static void
>  remove_unused_temp_slot_addresses (void)
>  {
> -  htab_traverse (temp_slot_address_table,
> -                remove_unused_temp_slot_addresses_1,
> -                NULL);
> +  /* Use quicker clearing if there aren't any active temp slots.  */
> +  if (n_temp_slots_in_use)
> +    htab_traverse (temp_slot_address_table,
> +                  remove_unused_temp_slot_addresses_1,
> +                  NULL);
> +  else
> +    htab_empty (temp_slot_address_table);
>  }
>
>  /* Find the temp slot corresponding to the object at address X.  */
> @@ -902,6 +908,7 @@ assign_stack_temp_for_type (enum machine
>   p->in_use = 1;
>   p->type = type;
>   p->level = temp_slot_level;
> +  n_temp_slots_in_use++;
>
>   pp = temp_slots_at_level (p->level);
>   insert_slot_to_list (p, pp);
> @@ -1213,6 +1220,7 @@ init_temp_slots (void)
>   avail_temp_slots = 0;
>   used_temp_slots = 0;
>   temp_slot_level = 0;
> +  n_temp_slots_in_use = 0;
>
>   /* Set up the table to map addresses to temp slots.  */
>   if (! temp_slot_address_table)


* Re: RF[CA]: Don't restrict stack slot sharing
  2012-06-06 15:16     ` Richard Guenther
@ 2012-06-15 15:11       ` Michael Matz
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Matz @ 2012-06-15 15:11 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

Hi,

On Wed, 6 Jun 2012, Richard Guenther wrote:

> > Regstrapped this patch (all languages+Ada) on x86_64-linux, with and 
> > without the above scheduler hacks, no regressions (without the 
> > scheduler hacks).
> 
> The n_temp_slots_in_use part is ok.
> 
> The rest is also a good idea, and indeed the middle-end type-based 
> memory-model makes sharing slots always possible (modulo bugs, like the 
> nonoverlapping_component_refs_p code - which should simply be removed).
> 
> Thus, ok for the rest, too, after waiting a while for others to chime 
> in.

Now finally in as r188667.


Ciao,
Michael.


* Re: RFA: Speedup expand_used_vars by 30 times (PR38474)
  2012-05-27  3:04 RFA: Speedup expand_used_vars by 30 times (PR38474) Michael Matz
  2012-05-29  9:50 ` Richard Guenther
@ 2012-07-02 20:59 ` Mike Stump
  2012-07-03 11:42   ` Michael Matz
  1 sibling, 1 reply; 8+ messages in thread
From: Mike Stump @ 2012-07-02 20:59 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On May 26, 2012, at 8:03 PM, Michael Matz wrote:
> -----------------------
> 	PR middle-end/38474
> 	* cfgexpand.c (struct stack_var): Add slot_type member.
> 	(add_stack_var): Initialize it.
> 	(add_alias_set_conflicts): Remove.
> 	(merge_stack_vars_p, more_specific_type): New functions.
> 	(nonconflicting_type): New static data.
> 	(union_stack_vars): Use more_specific_type to update slot_type.
> 	(partition_stack_vars): Call merge_stack_vars_p ...

So this patch seems to have killed 'RUNTESTFLAGS=dg.exp=class_array_3.f03' check-fortran for me.  Any insight?  Unfortunately, I think the bug manifests in generated code in the Fortran runtime libraries, so I need to go track down the code.

Also, git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@188667 seems not to match this patch, nor any other patch I can find, though it is the closest match.


* Re: RFA: Speedup expand_used_vars by 30 times (PR38474)
  2012-07-02 20:59 ` RFA: Speedup expand_used_vars by 30 times (PR38474) Mike Stump
@ 2012-07-03 11:42   ` Michael Matz
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Matz @ 2012-07-03 11:42 UTC (permalink / raw)
  To: Mike Stump; +Cc: gcc-patches

Hi,

On Mon, 2 Jul 2012, Mike Stump wrote:

> On May 26, 2012, at 8:03 PM, Michael Matz wrote:
> > -----------------------
> > 	PR middle-end/38474
> > 	* cfgexpand.c (struct stack_var): Add slot_type member.
> > 	(add_stack_var): Initialize it.
> > 	(add_alias_set_conflicts): Remove.
> > 	(merge_stack_vars_p, more_specific_type): New functions.
> > 	(nonconflicting_type): New static data.
> > 	(union_stack_vars): Use more_specific_type to update slot_type.
> > 	(partition_stack_vars): Call merge_stack_vars_p ...
> 
> So this patch seems to have killed 
> 'RUNTESTFLAGS=dg.exp=class_array_3.f03' check-fortran for me.  Any 
> insight?

Hmm, no.  Works for me just fine :-/  So presumably something is shared on 
the stack, and accesses are then reordered so that both entities are live 
at the same time.  If you find out which file is miscompiled you can look 
at the -fdump-rtl-expand-details dumps for entries like:
  Partition 1: size 32 align 16
        dest
  Partition 0: size 32 align 16
        src

If something is shared, multiple decls will be mentioned on those lines.  
(Ignore the lines mentioning Partition without a ':'; those are the 
out-of-ssa partitions, not the stack slots.)

> Unfortunately the bug I think manifests as generated code in 
> the fortran runtime libraries, so I need to go track down the code.
> 
> Also, git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@188667 seems to 
> not match this patch, nor any other patch I can find, but it is the 
> closest match.

The patch that went in is 
  http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00367.html


Ciao,
Michael.


end of thread, other threads:[~2012-07-03 11:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-27  3:04 RFA: Speedup expand_used_vars by 30 times (PR38474) Michael Matz
2012-05-29  9:50 ` Richard Guenther
2012-05-29 11:59   ` Michael Matz
2012-06-06 12:44   ` RF[CA]: Don't restrict stack slot sharing Michael Matz
2012-06-06 15:16     ` Richard Guenther
2012-06-15 15:11       ` Michael Matz
2012-07-02 20:59 ` RFA: Speedup expand_used_vars by 30 times (PR38474) Mike Stump
2012-07-03 11:42   ` Michael Matz
