public inbox for gcc-patches@gcc.gnu.org
* [PR tree-optimization/52558]: RFC: questions on store data race
@ 2012-04-12 22:12 Aldy Hernandez
  2012-04-13  8:46 ` Richard Guenther
  2012-04-13 23:23 ` Boehm, Hans
  0 siblings, 2 replies; 22+ messages in thread
From: Aldy Hernandez @ 2012-04-12 22:12 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 2385 bytes --]

Here we have a testcase that affects both the C++ memory model and 
transactional memory.

[Hans, this is caused by the same problem that is causing the 
speculative register promotion issue you and Torvald pointed me at].

In the following testcase (adapted from the PR), the loop invariant 
motion pass caches a pre-existing value for g_2, and then performs a 
store to g_2 on every path, causing a store data race:

int g_1 = 1;
int g_2 = 0;

int func_1(void)
{
   int l;
   for (l = 0; l < 1234; l++)
   {
     if (g_1)
       return l;
     else
       g_2 = 0;	<-- Store to g_2 should only happen if !g_1
   }
   return 999;
}

This gets transformed into something like:

	g_2_lsm = g_2;
	if (g_1) {
		g_2 = g_2_lsm;	// boo! hiss!
		return 0;
	} else {
		g_2_lsm = 0;
		g_2 = g_2_lsm;
	}

The spurious write to g_2 could cause a data race.
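
To make the hazard concrete, here is a hypothetical second thread 
(purely illustrative, not part of the PR testcase) that is race-free 
against the original source but not against the transformed code:

   /* This thread only touches g_2 while g_1 is nonzero, so it never
      races with func_1 as written (func_1 stores g_2 only when g_1 is
      zero).  After the transform, func_1 also stores the cached value
      of g_2 when g_1 is nonzero, and that store can overwrite this
      thread's update.  */
   void other_thread (void)
   {
     if (g_1)
       g_2 = 42;
   }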

Andrew has suggested verifying that the store to g_2 was actually 
different than on entry, and letting PHI copy propagation optimize the 
redundant comparisons.  Like this:

	g_2_lsm = g_2;
	if (g_1) {
		if (g_2_lsm != g_2)	// hopefully optimized away
			g_2 = g_2_lsm;	// hopefully optimized away
		return 0;
	} else {
		g_2_lsm = 0;
		if (g_2_lsm != g_2)	// hopefully optimized away
			g_2 = g_2_lsm;
	}

...which PHI copy propagation and/or friends should be able to optimize.
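
For illustration, a hand-written sketch of the hoped-for result (not 
actual compiler output): in the then-branch g_2_lsm still holds the 
value just loaded from g_2, so that guard folds to false and both the 
compare and the store disappear; only the path that really stored keeps 
a (guarded) store.

	g_2_lsm = g_2;
	if (g_1) {
		/* g_2_lsm == g_2 here; compare and store fold away.  */
		return 0;
	} else {
		g_2_lsm = 0;
		if (g_2_lsm != g_2)	/* only this path really stored */
			g_2 = g_2_lsm;
	}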

For that matter, regardless of the memory model, the proposed solution 
would improve the existing code by removing the extra store (in this 
contrived test case anyhow).

Anyone see a problem with this approach (see attached patch)?

Assuming the unlikely scenario that everyone likes this :), I have two 
problems that I would like answered.  But feel free to ignore if the 
approach is a no go.

1. No pass can figure out the equality (or inequality) of g_2_lsm and 
g_2.  In the PR, Richard mentions that both FRE/PRE can figure this out, 
but they are not run after store motion.  DOM should also be able to, 
but apparently does not :(.  Tips?

2. The GIMPLE_CONDs I create in this patch are currently causing 
problems with complex floats/doubles.  do_jump is complaining that it 
can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) 
to compare complex values since complex lowering has already happened 
(??). Is there a more canonical way of creating a GIMPLE_COND that may 
contain complex floats at this stage?

Thanks folks.

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 3287 bytes --]

	* tree-ssa-loop-im.c (execute_sm): Guard store motion with a
	conditional.
	* opts.c (finish_options): Do not allow store or load data races
	with -fgnu-tm.

Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 186108)
+++ tree-ssa-loop-im.c	(working copy)
@@ -1999,8 +1999,59 @@ execute_sm (struct loop *loop, VEC (edge
 
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
     {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+      basic_block new_bb, then_bb, old_dest;
+      edge then_old_edge;
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+      tree t1;
+
+      if (PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+	{
+	  store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	  gsi_insert_on_edge (ex, store);
+	}
+      else
+	{
+	  old_dest = ex->dest;
+	  new_bb = split_edge (ex);
+	  then_bb = create_empty_bb (new_bb);
+	  if (current_loops && new_bb->loop_father)
+	    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+	  gsi = gsi_start_bb (new_bb);
+	  t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);
+	  stmt = gimple_build_assign (t1, unshare_expr (ref->mem));
+	  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+	  stmt = gimple_build_cond (NE_EXPR, t1, tmp_var,
+				    NULL_TREE, NULL_TREE);
+	  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+	  gsi = gsi_start_bb (then_bb);
+	  store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	  gsi_insert_after (&gsi, store, GSI_CONTINUE_LINKING);
+
+	  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+	  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+	  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+	  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+	  for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+	      gimple phi = gsi_stmt (gsi);
+	      unsigned i;
+
+	      for (i = 0; i < gimple_phi_num_args (phi); i++)
+		if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+		  {
+		    tree arg = gimple_phi_arg_def (phi, i);
+		    add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+		    update_stmt (phi);
+		  }
+	    }
+	  /* Remove the original fall through edge.  This was the
+	     single_succ_edge (new_bb).  */
+	  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+	}
     }
 }
 
Index: opts.c
===================================================================
--- opts.c	(revision 186108)
+++ opts.c	(working copy)
@@ -663,8 +663,16 @@ finish_options (struct gcc_options *opts
       opts->x_flag_toplevel_reorder = 0;
     }
 
-  if (opts->x_flag_tm && opts->x_flag_non_call_exceptions)
-    sorry ("transactional memory is not supported with non-call exceptions");
+  if (opts->x_flag_tm)
+    {
+      if (opts->x_flag_non_call_exceptions)
+	sorry ("transactional memory is not supported with non-call exceptions");
+
+      set_param_value ("allow-load-data-races", 0,
+		       opts->x_param_values, opts_set->x_param_values);
+      set_param_value ("allow-store-data-races", 0,
+		       opts->x_param_values, opts_set->x_param_values);
+    }
 
   /* -Wmissing-noreturn is alias for -Wsuggest-attribute=noreturn.  */
   if (opts->x_warn_missing_noreturn)


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-12 22:12 [PR tree-optimization/52558]: RFC: questions on store data race Aldy Hernandez
@ 2012-04-13  8:46 ` Richard Guenther
  2012-04-13 23:32   ` Boehm, Hans
  2012-04-24 17:43   ` Aldy Hernandez
  2012-04-13 23:23 ` Boehm, Hans
  1 sibling, 2 replies; 22+ messages in thread
From: Richard Guenther @ 2012-04-13  8:46 UTC (permalink / raw)
  To: Aldy Hernandez
  Cc: Richard Guenther, Andrew MacLeod, Boehm, Hans, gcc-patches,
	Torvald Riegel

On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
> Here we have a testcase that affects both the C++ memory model and
> transactional memory.
>
> [Hans, this is caused by the same problem that is causing the speculative
> register promotion issue you and Torvald pointed me at].
>
> In the following testcase (adapted from the PR), the loop invariant motion
> pass caches a pre-existing value for g_2, and then performs a store to g_2
> on every path, causing a store data race:
>
> int g_1 = 1;
> int g_2 = 0;
>
> int func_1(void)
> {
>  int l;
>  for (l = 0; l < 1234; l++)
>  {
>    if (g_1)
>      return l;
>    else
>      g_2 = 0;  <-- Store to g_2 should only happen if !g_1
>  }
>  return 999;
> }
>
> This gets transformed into something like:
>
>        g_2_lsm = g_2;
>        if (g_1) {
>                g_2 = g_2_lsm;  // boo! hiss!
>                return 0;
>        } else {
>                g_2_lsm = 0;
>                g_2 = g_2_lsm;
>        }
>
> The spurious write to g_2 could cause a data race.
>
> Andrew has suggested verifying that the store to g_2 was actually different
> than on entry, and letting PHI copy propagation optimize the redundant
> comparisons.  Like this:
>
>        g_2_lsm = g_2;
>        if (g_1) {
>                if (g_2_lsm != g_2)     // hopefully optimized away
>                        g_2 = g_2_lsm;  // hopefully optimized away
>                return 0;
>        } else {
>                g_2_lsm = 0;
>                if (g_2_lsm != g_2)     // hopefully optimized away
>                        g_2 = g_2_lsm;
>        }
>
> ...which PHI copy propagation and/or friends should be able to optimize.
>
> For that matter, regardless of the memory model, the proposed solution would
> improve the existing code by removing the extra store (in this contrived
> test case anyhow).
>
> Anyone see a problem with this approach (see attached patch)?

Can we _remove_ a store we perceive as redundant (with a single-threaded
view) with the memory model?  If so, then this sounds plausible (minus
the optimization that if the variable is written to unconditionally we avoid
this comparison -- see the places where we check whether we can apply
store motion to possibly trapping locations).

A similar effect could be obtained by keeping a flag whether we entered the
path that stored the value before the transform.  Thus, do

  lsm = g2;  // technically not needed, if you also handle loads
  flag = false;
  for (...)
    {
       if (g1)
         {
            if (flag)
              g2 = lsm;
         }
       else
         {
            lsm = 0;
            flag = true;
         }
    }
  if (flag)
    g2 = lsm;

that would allow to extend this to cover the cases where the access may
eventually trap (if you omit the load before the loop).

Not sure which is more efficient (you can eventually combine flag variables
for different store locations, combining the if-changed compare is not so easy
I guess).
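
For illustration, a sketch (hypothetical names, not from the patch) of 
how two promoted stores that happen on the same paths inside the loop 
could share a single flag:

  int g1, g2, g3;

  void func (int n)
  {
    int lsm_g2, lsm_g3, flag = 0, i;

    for (i = 0; i < n; i++)
      if (!g1)
        {
          lsm_g2 = 0;     /* both locations stored on the same path,  */
          lsm_g3 = 1;     /* so one flag can guard both sunk stores   */
          flag = 1;
        }
    if (flag)
      {
        g2 = lsm_g2;
        g3 = lsm_g3;
      }
  }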

> Assuming the unlikely scenario that everyone likes this :), I have two
> problems that I would like answered.  But feel free to ignore if the
> approach is a no go.
>
> 1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2.
>  In the PR, Richard mentions that both FRE/PRE can figure this out, but they
> are not run after store motion.  DOM should also be able to, but apparently
> does not :(.  Tips?

Well.  Schedule FRE after loop opts ...

DOM is not prepared to handle loads/stores with differing VOPs - it was never
updated really after the alias-improvements merge.

> 2. The GIMPLE_CONDs I create in this patch are currently causing problems
> with complex floats/doubles.  do_jump is complaining that it can't compare
> them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex
> values since complex lowering has already happened (??). Is there a more
> canonical way of creating a GIMPLE_COND that may contain complex floats at
> this stage?

As you are handling redundant stores you want a bitwise comparison anyway,
but yes, complex compares need to be lowered.  But as said, you want a
bitwise comparison, not a value-comparison.  You can try using

  if (VIEW_CONVERT_EXPR <int_type_for_mode (mode_for_size (...))> (lsm_tmp_reg)
      != VIEW_CONVERT_EXPR <....> (orig_value))
    ...

that can of course result in really awful codegen (think of fp - gp register
moves, or even stack spills).  Which maybe hints at that the flag approach
would be better in some cases?  (you need to cache the original value
in a register anyway, the only thing you save is updating it when you would
update the flag value)

Maybe you can combine both, eventually dependent on a target hook
can_compare_bitwise (mode) that would tell you if the target can efficiently
compare two registers in mode for bit equality.

Few comments on the patch itself.

+         new_bb = split_edge (ex);
+         then_bb = create_empty_bb (new_bb);
+         if (current_loops && new_bb->loop_father)
+           add_bb_to_loop (then_bb, new_bb->loop_father);

+         gsi = gsi_start_bb (new_bb);
+         t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);

Hmm, why do you need to re-load the value?  Don't you have both
the value as it was loaded before the loop and the new value (the lsm tmp reg)
already?  Why not simply remember the SSA name used?  Ah, because
store motion also uses make_rename_temp.  If you don't want to change
that (which would be good but inconvenient because of the required PHI
insertions), I'd change the code that performs the load in the pre-header to do

  SSA_NAME = ref->mem;
  rename-lsm-tmp = SSA_NAME;

so you can remember SSA_NAME and re-use it here.  That probably solves
your optimization issue, too.

+         stmt = gimple_build_assign (t1, unshare_expr (ref->mem));
+         gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+         stmt = gimple_build_cond (NE_EXPR, t1, tmp_var,
+                                   NULL_TREE, NULL_TREE);

As I mentioned above you need to do a bitwise comparison (for example
we do not want to miss a store with -0.0 when the original value was 0.0,
but -0.0 == 0.0).  Thus you need to create two new temporaries
here to do the conversion (if it isn't already an integer mode).
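
A tiny standalone illustration of the point (it assumes IEEE-754 
doubles and is not related to the patch itself): the values compare 
equal, but the bit patterns differ, and the bit pattern is what decides 
whether the sunk store may be skipped:

  #include <stdio.h>
  #include <string.h>
  #include <stdint.h>

  int main (void)
  {
    double a = 0.0, b = -0.0;
    uint64_t ba, bb;

    memcpy (&ba, &a, sizeof ba);
    memcpy (&bb, &b, sizeof bb);

    /* Value comparison: equal, so an if-changed guard using != would
       wrongly skip the store of -0.0 over 0.0.  */
    printf ("values equal: %d\n", a == b);   /* prints 1 */
    /* Bitwise comparison: the sign bit differs.  */
    printf ("bits equal:   %d\n", ba == bb); /* prints 0 */
    return 0;
  }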

+         for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi);
gsi_next (&gsi))
+           {
+             gimple phi = gsi_stmt (gsi);
+             unsigned i;
+
+             for (i = 0; i < gimple_phi_num_args (phi); i++)
+               if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+                 {
+                   tree arg = gimple_phi_arg_def (phi, i);
+                   add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+                   update_stmt (phi);
+                 }
+           }

Hmm.  This is of course doing unnecessary work in case there are multiple
moved stores.  In fact splitting the exit edge is only necessary if there are
more than one predecessor.  Otherwise you can simply split the exit block
after inserting the new conditional after its labels.

+  if (opts->x_flag_tm)
+    {
+      if (opts->x_flag_non_call_exceptions)
+       sorry ("transactional memory is not supported with non-call
exceptions");
+
+      set_param_value ("allow-load-data-races", 0,
+                      opts->x_param_values, opts_set->x_param_values);
+      set_param_value ("allow-store-data-races", 0,
+                      opts->x_param_values, opts_set->x_param_values);

Unless the user explicitely set those params?  Which means using params
wasn't a very good idea ... ideally you'd diagnose an error when the user
would mix -fgnu-tm and -fallow-store-data-races.  So - can we remove
allow-load-data-races (we are never going to implement that) and make
allow-store-data-races a proper -f flag please?

+    }

And that should be a separate patch

It would be nice if you could think about LIM/SM for possibly trapping
loads/stores
while you are doing this work.

Thanks,
Richard.

>
> Thanks folks.


* RE: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-12 22:12 [PR tree-optimization/52558]: RFC: questions on store data race Aldy Hernandez
  2012-04-13  8:46 ` Richard Guenther
@ 2012-04-13 23:23 ` Boehm, Hans
  2012-04-17 15:04   ` Aldy Hernandez
  1 sibling, 1 reply; 22+ messages in thread
From: Boehm, Hans @ 2012-04-13 23:23 UTC (permalink / raw)
  To: Aldy Hernandez, Richard Guenther
  Cc: Andrew MacLeod, gcc-patches, Torvald Riegel



> -----Original Message-----
> From: Aldy Hernandez [mailto:aldyh@redhat.com]
> Sent: Thursday, April 12, 2012 3:12 PM
> To: Richard Guenther
> Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel
> Subject: [PR tree-optimization/52558]: RFC: questions on store data
> race
> 
> Here we have a testcase that affects both the C++ memory model and
> transactional memory.
> 
> [Hans, this is caused by the same problem that is causing the
> speculative register promotion issue you and Torvald pointed me at].
> 
> In the following testcase (adapted from the PR), the loop invariant
> motion pass caches a pre-existing value for g_2, and then performs a
> store to g_2 on every path, causing a store data race:
> 
> int g_1 = 1;
> int g_2 = 0;
> 
> int func_1(void)
> {
>    int l;
>    for (l = 0; l < 1234; l++)
>    {
>      if (g_1)
>        return l;
>      else
>        g_2 = 0;	<-- Store to g_2 should only happen if !g_1
>    }
>    return 999;
> }
> 
> This gets transformed into something like:
> 
> 	g_2_lsm = g_2;
> 	if (g_1) {
> 		g_2 = g_2_lsm;	// boo! hiss!
> 		return 0;
> 	} else {
> 		g_2_lsm = 0;
> 		g_2 = g_2_lsm;
> 	}
> 
> The spurious write to g_2 could cause a data race.
Presumably the g_2_lsm = g_2 is actually outside the loop?

Why does the second g_2 = g_2_lsm; get introduced?  I would have expected it before the return.  Was the example just over-abbreviated?

Other than that, this sounds right to me.  So does Richard's flag-based version, which is the approach I would have originally expected.  But that clearly costs you a register.  It would be interesting to see how they compare.

Hans

> 
> Andrew has suggested verifying that the store to g_2 was actually
> different than on entry, and letting PHI copy propagation optimize the
> redundant comparisons.  Like this:
> 
> 	g_2_lsm = g_2;
> 	if (g_1) {
> 		if (g_2_lsm != g_2)	// hopefully optimized away
> 			g_2 = g_2_lsm;	// hopefully optimized away
> 		return 0;
> 	} else {
> 		g_2_lsm = 0;
> 		if (g_2_lsm != g_2)	// hopefully optimized away
> 			g_2 = g_2_lsm;
> 	}
> 
> ...which PHI copy propagation and/or friends should be able to
> optimize.
> 
> For that matter, regardless of the memory model, the proposed solution
> would improve the existing code by removing the extra store (in this
> contrived test case anyhow).
> 
> Anyone see a problem with this approach (see attached patch)?
> 
> Assuming the unlikely scenario that everyone likes this :), I have two
> problems that I would like answered.  But feel free to ignore if the
> approach is a no go.
> 
> 1. No pass can figure out the equality (or inequality) of g_2_lsm and
> g_2.  In the PR, Richard mentions that both FRE/PRE can figure this
> out, but they are not run after store motion.  DOM should also be able
> to, but apparently does not :(.  Tips?
> 
> 2. The GIMPLE_CONDs I create in this patch are currently causing
> problems with complex floats/doubles.  do_jump is complaining that it
> can't compare them, so I assume it is too late (in tree-ssa-loop-im.c)
> to compare complex values since complex lowering has already happened
> (??). Is there a more canonical way of creating a GIMPLE_COND that may
> contain complex floats at this stage?
> 
> Thanks folks.


* RE: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-13  8:46 ` Richard Guenther
@ 2012-04-13 23:32   ` Boehm, Hans
  2012-04-24 17:43   ` Aldy Hernandez
  1 sibling, 0 replies; 22+ messages in thread
From: Boehm, Hans @ 2012-04-13 23:32 UTC (permalink / raw)
  To: Richard Guenther, Aldy Hernandez
  Cc: Richard Guenther, Andrew MacLeod, gcc-patches, Torvald Riegel

> From: Richard Guenther [mailto:richard.guenther@gmail.com]
> Can we _remove_ a store we perceive as redundant (with a single-threaded
> view) with the memory model?
Generally yes, so long as synchronization operations are either treated conservatively as completely opaque, or are treated correctly in the "single-threaded view".  If there is no synchronization between the original store and the redundant one, then the redundant one changes things only if another thread writes to the same variable in between.  That constitutes a data race, which invokes undefined behavior.  The general rule is that any sequentially correct transformation is OK between synchronization operations, so long as you don't store to anything you weren't supposed to modify, and the state at the next synchronization point is what would have been expected.  You can sometimes treat synchronizations more aggressively, but the conservative treatment should always be safe.

Hans


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-13 23:23 ` Boehm, Hans
@ 2012-04-17 15:04   ` Aldy Hernandez
  0 siblings, 0 replies; 22+ messages in thread
From: Aldy Hernandez @ 2012-04-17 15:04 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: Richard Guenther, Andrew MacLeod, gcc-patches, Torvald Riegel

On 04/13/12 18:22, Boehm, Hans wrote:
>
>
>> -----Original Message-----
>> From: Aldy Hernandez [mailto:aldyh@redhat.com]
>> Sent: Thursday, April 12, 2012 3:12 PM
>> To: Richard Guenther
>> Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel
>> Subject: [PR tree-optimization/52558]: RFC: questions on store data
>> race
>>
>> Here we have a testcase that affects both the C++ memory model and
>> transactional memory.
>>
>> [Hans, this is caused by the same problem that is causing the
>> speculative register promotion issue you and Torvald pointed me at].
>>

>> In the following testcase (adapted from the PR), the loop invariant
>> motion pass caches a pre-existing value for g_2, and then performs a
>> store to g_2 on every path, causing a store data race:
>>
>> int g_1 = 1;
>> int g_2 = 0;
>>
>> int func_1(void)
>> {
>>     int l;
>>     for (l = 0; l < 1234; l++)
>>     {
>>       if (g_1)
>>         return l;
>>       else
>>         g_2 = 0;	<-- Store to g_2 should only happen if !g_1
>>     }
>>     return 999;
>> }
>>
>> This gets transformed into something like:
>>
>> 	g_2_lsm = g_2;
>> 	if (g_1) {
>> 		g_2 = g_2_lsm;	// boo! hiss!
>> 		return 0;
>> 	} else {
>> 		g_2_lsm = 0;
>> 		g_2 = g_2_lsm;
>> 	}
>>
>> The spurious write to g_2 could cause a data race.
> Presumably the g_2_lsm = g_2 is actually outside the loop?
>
> Why does the second g_2 = g_2_lsm; get introduced?  I would have expected it before the return.  Was the example just over-abbreviated?

There is some abbreviation going on :).  To be exact, this is what -O2 
currently produces for the lim1 pass.

<bb 2>:
   pretmp.4_1 = g_1;
   g_2_lsm.6_12 = g_2;

<bb 3>:
   # l_13 = PHI <l_6(5), 0(2)>
   # g_2_lsm.6_10 = PHI <g_2_lsm.6_11(5), g_2_lsm.6_12(2)>
   g_1.0_4 = pretmp.4_1;
   if (g_1.0_4 != 0)
     goto <bb 7>;
   else
     goto <bb 4>;

<bb 4>:
   g_2_lsm.6_11 = 0;
   l_6 = l_13 + 1;
   if (l_6 != 1234)
     goto <bb 5>;
   else
     goto <bb 8>;

<bb 8>:
   # g_2_lsm.6_18 = PHI <g_2_lsm.6_11(4)>
   g_2 = g_2_lsm.6_18;
   goto <bb 6>;

<bb 5>:
   goto <bb 3>;

<bb 7>:
   # g_2_lsm.6_17 = PHI <g_2_lsm.6_10(3)>
   # l_19 = PHI <l_13(3)>
   g_2 = g_2_lsm.6_17;

<bb 6>:
   # l_2 = PHI <l_19(7), 999(8)>
   return l_2;

So yes, there seems to be another write to g_2 inside the else, but 
probably because we have merged some basic blocks along the way.

>
> Other than that, this sounds right to me.  So does Richard's flag-based version, which is the approach I would have originally expected.  But that clearly costs you a register.  It would be interesting to see how they compare.

I am working on the flag based approach.

Thanks to both of you.


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-13  8:46 ` Richard Guenther
  2012-04-13 23:32   ` Boehm, Hans
@ 2012-04-24 17:43   ` Aldy Hernandez
  2012-04-25 11:45     ` Richard Guenther
  1 sibling, 1 reply; 22+ messages in thread
From: Aldy Hernandez @ 2012-04-24 17:43 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Guenther, Andrew MacLeod, Boehm, Hans, gcc-patches,
	Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 8381 bytes --]

On 04/13/12 03:46, Richard Guenther wrote:
> On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>  wrote:

Richard.  Thanks so much for reviewing and providing an alternative 
approach, which AFAICT provides superior results.

> A similar effect could be obtained by keeping a flag whether we entered the
> path that stored the value before the transform.  Thus, do
>
>    lsm = g2;  // technically not needed, if you also handle loads
>    flag = false;
>    for (...)
>      {
>         if (g1)
>           {
>              if (flag)
>                g2 = lsm;
>           }
>         else
>           {
>              lsm = 0;
>              flag = true;
>           }
>      }
>    if (flag)
>      g2 = lsm;

I have implemented this in the attached patch, and it seems to be 
generating better code than my other if-changed approach.  It also 
avoids generating unnecessary loads that may trap.

For the original testcase:

int g_1 = 1;
int g_2 = 0;

int func_1(void)
{
  int l;
  for (l = 0; l < 1234; l++)
  {
    if (g_1)
      return l;
    else
      g_2 = 0;
  }
  return 999;
}

After all optimizations are done, we are now generating the following 
for the flags approach:

new:
func_1:
         movl    g_1(%rip), %edx
         xorl    %eax, %eax
         testl   %edx, %edx
         je      .L5
         rep
         ret
.L5:
         movl    $0, g_2(%rip)
         movw    $999, %ax
         ret

Which is significantly better than the unmodified trunk of:

func_1:
         movl    g_1(%rip), %edx
         movl    g_2(%rip), %eax
         testl   %edx, %edx
         je      .L5
         movl    %eax, g_2(%rip)
         xorl    %eax, %eax
         ret
.L5:
         movl    $0, g_2(%rip)
         movl    $999, %eax
         ret

And in other less contrived cases, we generate correct code that avoids 
writes on code paths that did not originally have them.  For example, in 
Hans' register promotion example:

int count;

struct obj {
     int data;
     struct obj *next;
} *q;

void func()
{
   struct obj *p;
   for (p = q; p; p = p->next)
     	if (p->data > 0)
		count++;
}

Under the new memory model we should avoid promoting "count" to a 
register and unilaterally writing to it upon exiting the loop.

With the attached patch, we now generate:

func:
.LFB0:
         movq    q(%rip), %rax
         xorl    %ecx, %ecx
         movl    count(%rip), %edx
         testq   %rax, %rax
         je      .L1
.L9:
         movl    (%rax), %esi
         testl   %esi, %esi
         jle     .L3
         addl    $1, %edx
         movl    $1, %ecx
.L3:
         movq    8(%rax), %rax
         testq   %rax, %rax
         jne     .L9
         testb   %cl, %cl
         jne     .L12
.L1:
         rep
         ret
.L12:
         movl    %edx, count(%rip)
         ret

Which is as expected.

I don't understand your suggestion of:

 >    lsm = g2;  // technically not needed, if you also handle loads
> that would allow to extend this to cover the cases where the access may
> eventually trap (if you omit the load before the loop).

Can't I just remove the lsm=g2?  What's this about also handling loads?

>
> Not sure which is more efficient (you can eventually combine flag variables
> for different store locations, combining the if-changed compare is not so easy
> I guess).

So far, I see better code generated with the flags approach, so...

>> 1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2.
>>   In the PR, Richard mentions that both FRE/PRE can figure this out, but they
>> are not run after store motion.  DOM should also be able to, but apparently
>> does not :(.  Tips?
>
> Well.  Schedule FRE after loop opts ...
>
> DOM is not prepared to handle loads/stores with differing VOPs - it was never
> updated really after the alias-improvements merge.
>
>> 2. The GIMPLE_CONDs I create in this patch are currently causing problems
>> with complex floats/doubles.  do_jump is complaining that it can't compare
>> them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex
>> values since complex lowering has already happened (??). Is there a more
>> canonical way of creating a GIMPLE_COND that may contain complex floats at
>> this stage?
>
> As you are handling redundant stores you want a bitwise comparison anyway,
> but yes, complex compares need to be lowered.  But as said, you want a
> bitwise comparison, not a value-comparison.  You can try using

I can ignore all this.  I have implemented both alternatives, with a 
target hook as suggested, but I'm thinking of removing it altogether and 
just leaving the flags approach.

> Few comments on the patch itself.
>
> +         new_bb = split_edge (ex);
> +         then_bb = create_empty_bb (new_bb);
>> +         if (current_loops && new_bb->loop_father)
> +           add_bb_to_loop (then_bb, new_bb->loop_father);
>
> +         gsi = gsi_start_bb (new_bb);
> +         t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);
>
> Hmm, why do you need to re-load the value?  Don't you have both
> the value as it was loaded before the loop and the new value (the lsm tmp reg)
> already?  Why not simply remember the SSA name used?  Ah, because
> store motion also uses make_rename_temp.  If you don't want to change
> that (which would be good but inconvenient because of the required PHI
> insertions), I'd change the code that performs the load in the pre-header to do
>
>    SSA_NAME = ref->mem;
>    rename-lsm-tmp = SSA_NAME;
>
> so you can remember SSA_NAME and re-use it here.  That probably solves
> your optimization issue, too.

Fixed.

> +         for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi);
> gsi_next (&gsi))
> +           {
> +             gimple phi = gsi_stmt (gsi);
> +             unsigned i;
> +
>> +             for (i = 0; i < gimple_phi_num_args (phi); i++)
> +               if (gimple_phi_arg_edge (phi, i)->src == new_bb)
> +                 {
> +                   tree arg = gimple_phi_arg_def (phi, i);
> +                   add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
> +                   update_stmt (phi);
> +                 }
> +           }
>
> Hmm.  This is of course doing unnecessary work in case there are multiple
> moved stores.  In fact splitting the exit edge is only necessary if there are
> more than one predecessor.  Otherwise you can simply split the exit block
> after inserting the new conditional after its labels.

Is this still the case with the current patch?  If so, I'm a bit 
confused as to what you want here.

>
> +  if (opts->x_flag_tm)
> +    {
> +      if (opts->x_flag_non_call_exceptions)
> +       sorry ("transactional memory is not supported with non-call
> exceptions");
> +
> +      set_param_value ("allow-load-data-races", 0,
> +                      opts->x_param_values, opts_set->x_param_values);
> +      set_param_value ("allow-store-data-races", 0,
> +                      opts->x_param_values, opts_set->x_param_values);
>
> Unless the user explicitely set those params?  Which means using params
> wasn't a very good idea ... ideally you'd diagnose an error when the user
> would mix -fgnu-tm and -fallow-store-data-races.  So - can we remove
> allow-load-data-races (we are never going to implement that) and make
> allow-store-data-races a proper -f flag please?

Sorry, that was not meant for public consumption.  I have rewritten this 
to take the param value as is, and in the case of flag_tm, to only 
restrict the optimization if the loop is inside an actual transaction 
(see the changes to trans-mem.c, because I know you'll hate 
bb_in_transaction() :-)).

>
> +    }
>
> And that should be a separate patch

I can certainly reimplement --param=allow-{load,store}-data-races as 
proper -f flags.  I don't care either way.  But I remember Ian Taylor 
having a different opinion.  If y'all agree, I can implement whatever.

> It would be nice if you could think about LIM/SM for possibly trapping
> loads/stores
> while you are doing this work.

We are no longer adding additional trapping loads (as with the 
if-changed approach).  Is this something else you'd like me to 
concentrate on?

Please take a look at the attached patch.  I tested the flags 
alternative on a full bootstrap, and there's only one regression which I 
will look at, but I'd like to know if the current WIP is aligned with 
what you had in mind.

Thanks again.
Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 13936 bytes --]

	* trans-mem.c (bb_in_transaction_1): New.
	(bb_in_transaction): New.
	* gimple.h (bb_in_transaction): Protoize.
	* Makefile.in (tree-ssa-loop-im.o): Depend on TARGET_H.
	* doc/tm.texi: Regenerate.
	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
	(execute_sm_if_changed_flag): New.
	(execute_sm_if_changed_flag_set): New.
	(execute_sm): Do not generate data races unless requested.
	* targhooks.c (default_can_compare_bitwise_p): New.
	* targhooks.h (default_can_compare_bitwise_p): Protoize.
	* target.def (can_compare_bitwise_p): New hook.
	* tree-ssa-loop-im.c (execute_sm): Guard store motion with a
	conditional.
	* opts.c (finish_options): Do not allow store or load data races
	with -fgnu-tm.

Index: trans-mem.c
===================================================================
--- trans-mem.c	(revision 186108)
+++ trans-mem.c	(working copy)
@@ -135,6 +135,47 @@
 */
 
 \f
+/* Helper function for bb_in_transaction.  Returns true if the first
+   statement in BB is in a transaction.  If the BB does not have any
+   statements, return -1.  */
+
+static int
+bb_in_transaction_1 (basic_block bb)
+{
+  gimple t;
+
+  t = gimple_seq_first_stmt (bb_seq (bb));
+  if (t)
+    return gimple_in_transaction (t);
+  return -1;
+}
+
+/* Return true if BB is in a transaction.  This can only be called
+   after transaction bits have been calculated with
+   compute_transaction_bits().  */
+
+bool
+bb_in_transaction (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+  int res;
+
+  res = bb_in_transaction_1 (bb);
+  if (res != -1)
+    return (bool) res;
+
+  /* ?? Perhaps we should cache this somewhere in the BB, or are
+     multiple levels of empty BB's common. ??  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      int res = bb_in_transaction_1 (e->src);
+      if (res != -1)
+	return (bool) res;
+    }
+  return false;
+}
+
 /* Return the attributes we want to examine for X, or NULL if it's not
    something we examine.  We look at function types, but allow pointers
    to function types and function decls and peek through.  */
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 186108)
+++ tree-ssa-loop-im.c	(working copy)
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.
 #include "tree-affine.h"
 #include "pointer-set.h"
 #include "tree-ssa-propagate.h"
+#include "target.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -52,7 +53,7 @@ along with GCC; see the file COPYING3.
 	 }
      }
 
-   Where COND and INV are is invariants, but evaluating INV may trap or be
+   Where COND and INV are invariants, but evaluating INV may trap or be
    invalid from some other reason if !COND.  This may be transformed to
 
    if (cond)
@@ -1956,6 +1957,161 @@ get_lsm_tmp_name (tree ref, unsigned n)
   return lsm_tmp_name;
 }
 
+/* Helper function for execute_sm.  Emit code to store TMP_VAR into
+   MEM along edge EX.
+
+   The store is only done if MEM has changed.  We do this so no
+   changes to MEM occur on code paths that did not originally store
+   into it.
+
+   The common case for execute_sm will transform:
+
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         MEM = TMP_VAR;
+     }
+
+   into:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     MEM = lsm;
+
+  This function will either generate when FLAG is present:
+
+     lsm = MEM;
+     flag = false;
+     ...
+     if (foo)
+       stuff;
+     else {
+       lsm = TMP_VAR;
+       flag = true;
+     }
+     if (flag)		<--
+       MEM = lsm;	<--
+
+  or the following when FLAG is not set:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     if (lsm != MEM)	<--
+       MEM = lsm;	<--
+*/
+
+static void
+execute_sm_if_changed (edge ex, tree mem, tree orig_mem_ssa, tree tmp_var,
+		       tree flag)
+{
+  basic_block new_bb, then_bb, old_dest = ex->dest;
+  bool single_exit_p = single_pred_p (ex->dest);
+  edge then_old_edge;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+
+  if (single_exit_p)
+    ex = split_block_after_labels (old_dest);
+
+  old_dest = ex->dest;
+  new_bb = split_edge (ex);
+  then_bb = create_empty_bb (new_bb);
+  if (current_loops && new_bb->loop_father)
+    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+  gsi = gsi_start_bb (new_bb);
+  if (flag)
+    stmt = gimple_build_cond (NE_EXPR, flag, boolean_false_node,
+			      NULL_TREE, NULL_TREE);
+  else
+    stmt = gimple_build_cond (NE_EXPR, orig_mem_ssa, tmp_var,
+			      NULL_TREE, NULL_TREE);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  gsi = gsi_start_bb (then_bb);
+  /* Insert actual store.  */
+  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+  if (!single_exit_p)
+    for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple phi = gsi_stmt (gsi);
+	unsigned i;
+
+	for (i = 0; i < gimple_phi_num_args (phi); i++)
+	  if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, i);
+	      add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+	      update_stmt (phi);
+	    }
+      }
+  /* Remove the original fall through edge.  This was the
+     single_succ_edge (new_bb).  */
+  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+}
+
+/* Helper function for execute_sm.  On every path that sets REF, set
+   an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
+  unsigned i;
+  mem_ref_loc_p loc;
+  tree flag;
+  VEC (mem_ref_loc_p, heap) *locs = NULL;
+  char *str = get_lsm_tmp_name (ref->mem, ~0);
+
+  lsm_tmp_name_add ("_flag");
+  flag = make_rename_temp (boolean_type_node, str);
+  get_all_locs_in_loop (loop, ref, &locs);
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_start_bb (gimple_bb (loc->stmt));
+      for (; !gsi_end_p (gsi); gsi_next (&gsi))
+	if (gsi_stmt (gsi) == loc->stmt)
+	  {
+	    stmt = gimple_build_assign (flag, boolean_true_node);
+	    gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+	    break;
+	  }
+    }
+  VEC_free (mem_ref_loc_p, heap, locs);
+  return flag;
+}
+
+/* Store motion types for execute_sm below.  */
+enum store_type
+  {
+    /* Traditional single-thread.  */
+    STORE_SINGLE_THREAD,
+    /* Store only if value changed.  */
+    STORE_IF_CHANGED,
+    /* Store only if flag changed.  */
+    STORE_IF_CHANGED_FLAG
+  };
+
 /* Executes store motion of memory reference REF from LOOP.
    Exits from the LOOP are stored in EXITS.  The initialization of the
    temporary variable is put to the preheader of the loop, and assignments
@@ -1964,12 +2120,13 @@ get_lsm_tmp_name (tree ref, unsigned n)
 static void
 execute_sm (struct loop *loop, VEC (edge, heap) *exits, mem_ref_p ref)
 {
-  tree tmp_var;
+  tree tmp_var, mem_ssa, store_flag;
   unsigned i;
-  gimple load, store;
+  gimple load;
   struct fmt_data fmt_data;
-  edge ex;
+  edge ex, latch_edge;
   struct lim_aux_data *lim_data;
+  enum store_type store;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1978,6 +2135,8 @@ execute_sm (struct loop *loop, VEC (edge
       fprintf (dump_file, " from loop %d\n", loop->num);
     }
 
+  mem_ssa = create_tmp_reg (TREE_TYPE (unshare_expr (ref->mem)), NULL);
+  mem_ssa = make_ssa_name (mem_ssa, NULL);
   tmp_var = make_rename_temp (TREE_TYPE (ref->mem),
 			      get_lsm_tmp_name (ref->mem, ~0));
 
@@ -1985,22 +2144,66 @@ execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
+  if ((flag_tm && bb_in_transaction (loop_preheader_edge (loop)->src))
+      || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+    {
+      if (targetm.can_compare_bitwise_p (TYPE_MODE (TREE_TYPE (ref->mem))))
+	store = STORE_IF_CHANGED;
+      else
+	store = STORE_IF_CHANGED_FLAG;
+    }
+  else
+    store = STORE_SINGLE_THREAD;
+
+#if 1
+  /* ?? testing ?? */
+  store = STORE_IF_CHANGED_FLAG;
+#endif
+
+  if (store == STORE_IF_CHANGED_FLAG)
+    store_flag = execute_sm_if_changed_flag_set (loop, ref);
+
   rewrite_mem_refs (loop, ref, tmp_var);
 
-  /* Emit the load & stores.  */
-  load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
+
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+  load = gimple_build_assign (mem_ssa, unshare_expr (ref->mem));
   lim_data = init_lim_data (load);
   lim_data->max_loop = loop;
   lim_data->tgt_loop = loop;
+  gsi_insert_on_edge (latch_edge, load);
+  load = gimple_build_assign (tmp_var, mem_ssa);
+  lim_data = init_lim_data (load);
+  lim_data->max_loop = loop;
+  lim_data->tgt_loop = loop;
+  gsi_insert_on_edge (latch_edge, load);
+  if (store == STORE_IF_CHANGED_FLAG)
+    {
+      load = gimple_build_assign (store_flag, boolean_false_node);
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
+    }
 
-  /* Put this into the latch, so that we are sure it will be processed after
-     all dependencies.  */
-  gsi_insert_on_edge (loop_latch_edge (loop), load);
-
+  /* Sink the store to every exit from the loop.  */
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
     {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+      if (store == STORE_SINGLE_THREAD)
+	{
+	  gimple store;
+	  store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	  gsi_insert_on_edge (ex, store);
+	}
+      else if (store == STORE_IF_CHANGED)
+	execute_sm_if_changed (ex, ref->mem, mem_ssa, tmp_var, NULL);
+      else if (store == STORE_IF_CHANGED_FLAG)
+	execute_sm_if_changed (ex, ref->mem, mem_ssa, tmp_var,
+			       store_flag);
+      else
+	gcc_unreachable ();
     }
 }
 
Index: target.def
===================================================================
--- target.def	(revision 186108)
+++ target.def	(working copy)
@@ -2667,6 +2667,13 @@ DEFHOOK
  enum unwind_info_type, (void),
  default_debug_unwind_info)
 
+DEFHOOK
+(can_compare_bitwise_p,
+ "This hook should return true if the target can efficiently compare two\
+ registers in the given mode for bit equality.",
+ bool, (enum machine_mode),
+ default_can_compare_bitwise_p)
+
 DEFHOOKPOD
 (atomic_test_and_set_trueval,
  "This value should be set if the result written by\
Index: gimple.h
===================================================================
--- gimple.h	(revision 186108)
+++ gimple.h	(working copy)
@@ -1122,6 +1122,7 @@ extern tree omp_reduction_init (tree, tr
 /* In trans-mem.c.  */
 extern void diagnose_tm_safe_errors (tree);
 extern void compute_transaction_bits (void);
+extern bool bb_in_transaction (basic_block);
 
 /* In tree-nested.c.  */
 extern void lower_nested_functions (tree);
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 186108)
+++ Makefile.in	(working copy)
@@ -2532,7 +2532,7 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c
    $(PARAMS_H) output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
    $(TREE_DUMP_H) $(TREE_PASS_H) $(FLAGS_H) $(BASIC_BLOCK_H) \
    pointer-set.h tree-affine.h tree-ssa-propagate.h gimple-pretty-print.h \
-   tree-pretty-print.h
+   tree-pretty-print.h $(TARGET_H)
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 186108)
+++ doc/tm.texi	(working copy)
@@ -9471,6 +9471,10 @@ how you define @code{DWARF2_FRAME_INFO}.
 @end defmac
 
 @deftypefn {Target Hook} {enum unwind_info_type} TARGET_DEBUG_UNWIND_INFO (void)
+
+@deftypefn {Target Hook} bool TARGET_CAN_COMPARE_BITWISE_P (enum @var{machine_mode})
+This hook should return true if the target can efficiently compare two registers in the given mode for bit equality.
+@end deftypefn
 This hook defines the mechanism that will be used for describing frame
 unwind information to the debugger.  Normally the hook will return
 @code{UI_DWARF2} if DWARF 2 debug information is enabled, and
Index: targhooks.c
===================================================================
--- targhooks.c	(revision 186108)
+++ targhooks.c	(working copy)
@@ -1331,6 +1331,15 @@ default_debug_unwind_info (void)
   return UI_NONE;
 }
 
+/* This hook returns true if the target can efficiently compare
+   two registers in the given mode for bit equality.  */
+
+bool
+default_can_compare_bitwise_p (enum machine_mode mode)
+{
+  return SCALAR_INT_MODE_P (mode);
+}
+
 /* To be used by targets where reg_raw_mode doesn't return the right
    mode for registers used in apply_builtin_return and apply_builtin_arg.  */
 
Index: targhooks.h
===================================================================
--- targhooks.h	(revision 186108)
+++ targhooks.h	(working copy)
@@ -168,6 +168,8 @@ extern unsigned char default_class_max_n
 
 extern enum unwind_info_type default_debug_unwind_info (void);
 
+extern bool default_can_compare_bitwise_p (enum machine_mode);
+
 extern int default_label_align_after_barrier_max_skip (rtx);
 extern int default_loop_align_max_skip (rtx);
 extern int default_label_align_max_skip (rtx);


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-24 17:43   ` Aldy Hernandez
@ 2012-04-25 11:45     ` Richard Guenther
  2012-04-25 22:55       ` Aldy Hernandez
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Guenther @ 2012-04-25 11:45 UTC (permalink / raw)
  To: Aldy Hernandez
  Cc: Richard Guenther, Andrew MacLeod, Boehm, Hans, gcc-patches,
	Torvald Riegel

On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 04/13/12 03:46, Richard Guenther wrote:
>>
>> On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>  wrote:
>
>
> Richard.  Thanks so much for reviewing and providing an alternative
> approach, which AFAICT provides superior results.
>
>
>> A similar effect could be obtained by keeping a flag whether we entered
>> the
>> path that stored the value before the transform.  Thus, do
>>
>>   lsm = g2;  // technically not needed, if you also handle loads
>>   flag = false;
>>   for (...)
>>     {
>>        if (g1)
>>          {
>>             if (flag)
>>               g2 = lsm;
>>          }
>>        else
>>          {
>>             lsm = 0;
>>             flag = true;
>>          }
>>     }
>>   if (flag)
>>     g2 = lsm;
>
>
> I have implemented this in the attached patch, and it seems to be generating
> better code than my other if-changed approach.  It also avoids generating
> unnecessary loads that may trap.
>
> For the original testcase:
>
>
> int g_1 = 1;
> int g_2 = 0;
>
> int func_1(void)
> {
>  int l;
>  for (l = 0; l < 1234; l++)
>  {
>   if (g_1)
>     return l;
>   else
>     g_2 = 0;
>  }
>  return 999;
> }
>
> After all optimizations are done, we are now generating the following for
> the flags approach:
>
> new:
> func_1:
>        movl    g_1(%rip), %edx
>        xorl    %eax, %eax
>        testl   %edx, %edx
>        je      .L5
>        rep
>        ret
> .L5:
>        movl    $0, g_2(%rip)
>        movw    $999, %ax
>        ret
>
> Which is significantly better than the unmodified trunk of:
>
> func_1:
>        movl    g_1(%rip), %edx
>        movl    g_2(%rip), %eax
>        testl   %edx, %edx
>        je      .L5
>        movl    %eax, g_2(%rip)
>        xorl    %eax, %eax
>        ret
> .L5:
>        movl    $0, g_2(%rip)
>        movl    $999, %eax
>        ret
>
> And in other less contrived cases, we generate correct code that avoids
> writes on code paths that did not have it.  For example, in Hans register
> promotion example:
>
> int count;
>
> struct obj {
>    int data;
>    struct obj *next;
> } *q;
>
> void func()
> {
>  struct obj *p;
>  for (p = q; p; p = p->next)
>        if (p->data > 0)
>                count++;
> }
>
> Under the new memory model we should avoid promoting "count" to a register
> and unilaterally writing to it upon exiting the loop.
>
> With the attached patch, we now generate:
>
> func:
> .LFB0:
>        movq    q(%rip), %rax
>        xorl    %ecx, %ecx
>        movl    count(%rip), %edx
>        testq   %rax, %rax
>        je      .L1
> .L9:
>        movl    (%rax), %esi
>        testl   %esi, %esi
>        jle     .L3
>        addl    $1, %edx
>        movl    $1, %ecx
> .L3:
>        movq    8(%rax), %rax
>        testq   %rax, %rax
>        jne     .L9
>        testb   %cl, %cl
>        jne     .L12
> .L1:
>        rep
>        ret
> .L12:
>        movl    %edx, count(%rip)
>        ret
>
> Which is as expected.
>
> I don't understand your suggestion of:
>
>
>>    lsm = g2;  // technically not needed, if you also handle loads
>>
>> that would allow to extend this to cover the cases where the access may
>>
>> eventually trap (if you omit the load before the loop).
>
>
> Can't I just remove the lsm=g2?  What's this about also handling loads?
>
>
>>
>> Not sure which is more efficient (you can eventually combine flag
>> variables
>> for different store locations, combining the if-changed compare is not so
>> easy
>> I guess).
>
>
> So far, I see better code generated with the flags approach, so...
>
>
>>> 1. No pass can figure out the equality (or inequality) of g_2_lsm and
>>> g_2.
>>>  In the PR, Richard mentions that both FRE/PRE can figure this out, but
>>> they
>>> are not run after store motion.  DOM should also be able to, but
>>> apparently
>>> does not :(.  Tips?
>>
>>
>> Well.  Schedule FRE after loop opts ...
>>
>> DOM is not prepared to handle loads/stores with differing VOPs - it was
>> never
>> updated really after the alias-improvements merge.
>>
>>> 2. The GIMPLE_CONDs I create in this patch are currently causing problems
>>> with complex floats/doubles.  do_jump is complaining that it can't
>>> compare
>>> them, so I assume it is too late (in tree-ssa-loop-im.c) to compare
>>> complex
>>> values since complex lowering has already happened (??). Is there a more
>>> canonical way of creating a GIMPLE_COND that may contain complex floats
>>> at
>>> this stage?
>>
>>
>> As you are handling redundant stores you want a bitwise comparison anyway,
>> but yes, complex compares need to be lowered.  But as said, you want a
>> bitwise comparison, not a value-comparison.  You can try using
>
>
> I can ignore all this.  I have implemented both alternatives, with a target
> hook as suggested, but I'm thinking of removing it altogether and just
> leaving the flags approach.
>
>> Few comments on the patch itself.
>>
>> +         new_bb = split_edge (ex);
>> +         then_bb = create_empty_bb (new_bb);
>> +         if (current_loops && new_bb->loop_father)
>>
>> +           add_bb_to_loop (then_bb, new_bb->loop_father);
>>
>> +         gsi = gsi_start_bb (new_bb);
>> +         t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);
>>
>> Hmm, why do you need to re-load the value?  Don't you have both
>> the value as it was loaded before the loop and the new value (the lsm tmp
>> reg)
>> already?  Why not simply remember the SSA name used?  Ah, because
>> store motion also uses make_rename_temp.  If you don't want to change
>> that (which would be good but inconvenient because of the required PHI
>> insertions), I'd change the code that performs the load in the pre-header
>> to do
>>
>>   SSA_NAME = ref->mem;
>>   rename-lsm-tmp = SSA_NAME;
>>
>> so you can remember SSA_NAME and re-use it here.  That probably solves
>> your optimization issue, too.
>
>
> Fixed.
>
>
>> +         for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi);
>> gsi_next (&gsi))
>> +           {
>> +             gimple phi = gsi_stmt (gsi);
>> +             unsigned i;
>> +
>> +             for (i = 0; i < gimple_phi_num_args (phi); i++)
>> +               if (gimple_phi_arg_edge (phi, i)->src == new_bb)
>> +                 {
>> +                   tree arg = gimple_phi_arg_def (phi, i);
>> +                   add_phi_arg (phi, arg, then_old_edge,
>> UNKNOWN_LOCATION);
>> +                   update_stmt (phi);
>> +                 }
>> +           }
>>
>> Hmm.  This is of course doing unnecessary work in case there are multiple
>> moved stores.  In fact splitting the exit edge is only necessary if there
>> are
>> more than one predecessor.  Otherwise you can simply split the exit block
>> after inserting the new conditional after its labels.
>
>
> Is this still the case with the current patch?  If so, I'm a bit confused as
> to what you want here.
>
>
>>
>> +  if (opts->x_flag_tm)
>> +    {
>> +      if (opts->x_flag_non_call_exceptions)
>> +       sorry ("transactional memory is not supported with non-call
>> exceptions");
>> +
>> +      set_param_value ("allow-load-data-races", 0,
>> +                      opts->x_param_values, opts_set->x_param_values);
>> +      set_param_value ("allow-store-data-races", 0,
>> +                      opts->x_param_values, opts_set->x_param_values);
>>
>> Unless the user explicitely set those params?  Which means using params
>> wasn't a very good idea ... ideally you'd diagnose an error when the user
>> would mix -fgnu-tm and -fallow-store-data-races.  So - can we remove
>> allow-load-data-races (we are never going to implement that) and make
>> allow-store-data-races a proper -f flag please?
>
>
> Sorry, that was not meant for public consumption.  I have rewritten this to
> either take the param value as is, and in the case of flag_tm, only restrict
> the optimization if the loop is inside an actual transaction (see the
> changes to trans-mem.c, cause I know you'll hate bb_in_transaction() :-)).
>
>
>>
>> +    }
>>
>> And that should be a separate patch
>
>
> I can certainly reimplement --param=allow-{load,store}-data-races as proper
> -f flags.  I don't care either way.  But I remember Ian Taylor having a
> different opinion.  If y'all agree, I can implement whatever.
>
>
>> It would be nice if you could think about LIM/SM for possibly trapping
>> loads/stores
>> while you are doing this work.
>
>
> We are no longer adding additional trapping loads (as with the if-changed
> approach).  Is this something else you'd like me to concentrate on?
>
> Please take a look at the attached patch.  I tested the flags alternative on
> a full bootstrap, and there's only one regression which I will look at, but
> I'd like to know if the current WIP is aligned with what you had in mind.

+  /* ?? Perhaps we should cache this somewhere in the BB, or are
+     multiple levels of empty BB's common. ??  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      int res = bb_in_transaction_1 (e->src);
+      if (res != -1)
+       return (bool) res;
+    }
+  return false;

that's weird code - if the predecessors are not all in (or all not in) a
transaction you return an arbitrary value.  So it seems this is somewhat fragile.

If bb status can be derived from looking at a single stmt why is the
transaction-ness not recorded as a basic-block flag then?  Thus,
why do we have a bit in gimple stmts?

Why can nobody merge a transaction and a non-transaction basic-block
making your predicate above wrong because only the 2nd stmt is
in a transaction?

+  bool single_exit_p = single_pred_p (ex->dest);

that's a strange variable name ;)

+/* Helper function for execute_sm.  On every path that sets REF, set
+   an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
...
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_start_bb (gimple_bb (loc->stmt));
+      for (; !gsi_end_p (gsi); gsi_next (&gsi))
+       if (gsi_stmt (gsi) == loc->stmt)

does not seem to do it on every path but on every REF setter.  And instead
of the innermost loop just use gsi_for_stmt (loc->stmt).

+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+  load = gimple_build_assign (mem_ssa, unshare_expr (ref->mem));
   lim_data = init_lim_data (load);
   lim_data->max_loop = loop;
   lim_data->tgt_loop = loop;
+  gsi_insert_on_edge (latch_edge, load);
+  load = gimple_build_assign (tmp_var, mem_ssa);
+  lim_data = init_lim_data (load);
+  lim_data->max_loop = loop;
+  lim_data->tgt_loop = loop;
+  gsi_insert_on_edge (latch_edge, load);

the 2nd assign is no longer a "load", so I suppose you don't need any lim_data
for it?

+      else if (store == STORE_IF_CHANGED_FLAG)
+       execute_sm_if_changed (ex, ref->mem, mem_ssa, tmp_var,
+                              store_flag);

and this can pass NULL instead of mem_ssa?

Overall this looks reasonable.  With also handling trapping things I meant
that we should be able to apply store-motion to

int a[256];
void foo (int *p)
{
  int i;
  for (i= 0;i<256;i++)
    if (flag)
      *p = a[i];
}

but we cannot, because the transform

  lsm = *p;
  for (i = 0; i < 256; ++i)
    if (flag)
      lsm = a[i];
  *p = lsm;

even when the store is properly conditional the load is not (and you do not
change that).  The unconditional load may trap here.  What I meant to say
is that we do not need the load at all unless there is also a load involved
inside the loop - if we use the flag-based approach.  If there is also a load
inside the loop we'd need to perform the load there (or not handle this
case)
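
With the flag-based approach the transform could instead look roughly
like this (just a sketch; lsm_flag stands for the generated flag
temporary):

  lsm_flag = false;
  for (i = 0; i < 256; ++i)
    if (flag)
      {
        lsm = a[i];
        lsm_flag = true;
      }
  if (lsm_flag)
    *p = lsm;

i.e. no unconditional load of *p is left at all, so nothing can trap
on the paths where !flag.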

But overall this looks nice!

Thanks,
Richard.

> Thanks again.
> Aldy


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-25 11:45     ` Richard Guenther
@ 2012-04-25 22:55       ` Aldy Hernandez
  2012-04-26  9:52         ` Richard Guenther
  0 siblings, 1 reply; 22+ messages in thread
From: Aldy Hernandez @ 2012-04-25 22:55 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Guenther, Andrew MacLeod, Boehm, Hans, gcc-patches,
	Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 4029 bytes --]

On 04/25/12 06:45, Richard Guenther wrote:
> On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>  wrote:
>> On 04/13/12 03:46, Richard Guenther wrote:
>>>
>>> On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>    wrote:

> +  /* ?? Perhaps we should cache this somewhere in the BB, or are
> +     multiple levels of empty BB's common. ??  */
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      int res = bb_in_transaction_1 (e->src);
> +      if (res != -1)
> +       return (bool) res;
> +    }
> +  return false;
>
> that's weird code - if predecessors are not all in or not in a transaction
> you return a random value.  So it seems this is somewhat fragile.
>
> If bb status can be derived from looking at a single stmt why is the
> transaction-ness not recorded as a basic-block flag then?  Thus,
> why do we have a bit in gimple stmts?

How about we get rid of the GIMPLE bit altogether and keep it in the BB 
flags like you mention?  See patch.

> +  bool single_exit_p = single_pred_p (ex->dest);
>
> that's a strange variable name ;)

How about loop_has_only_one_exit? :)

>
> +/* Helper function for execute_sm.  On every path that sets REF, set
> +   an appropriate flag indicating the store.  */
> +
> +static tree
> +execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
> +{
> ...
> +  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
> +    {
> +      gimple_stmt_iterator gsi;
> +      gimple stmt;
> +
> +      gsi = gsi_start_bb (gimple_bb (loc->stmt));
> +      for (; !gsi_end_p (gsi); gsi_next (&gsi))
> +       if (gsi_stmt (gsi) == loc->stmt)
>
> does not seem to do it on every path but on every REF setter.  And instead
> of the innermost loop just use gsi_for_stmt (loc->stmt).

Updated comment.  Fixed.

>
> +  /* Emit the load code into the latch, so that we are sure it will
> +     be processed after all dependencies.  */
> +  latch_edge = loop_latch_edge (loop);
> +  load = gimple_build_assign (mem_ssa, unshare_expr (ref->mem));
>     lim_data = init_lim_data (load);
>     lim_data->max_loop = loop;
>     lim_data->tgt_loop = loop;
> +  gsi_insert_on_edge (latch_edge, load);
> +  load = gimple_build_assign (tmp_var, mem_ssa);
> +  lim_data = init_lim_data (load);
> +  lim_data->max_loop = loop;
> +  lim_data->tgt_loop = loop;
> +  gsi_insert_on_edge (latch_edge, load);
>
> the 2nd assign is no longer a "load", so I suppose you don't need any lim_data
> for it?

Re-did all this.  See patch.

>
> +      else if (store == STORE_IF_CHANGED_FLAG)
> +       execute_sm_if_changed (ex, ref->mem, mem_ssa, tmp_var,
> +                              store_flag);
>
> and this can pass NULL instead of mem_ssa?

Got rid of mem_ssa altogether, as well as the if-changed approach which 
proved inferior to the flags based approach.

> Overall this looks reasonable.  With also handling trapping things I meant
> that we should be able to apply store-motion to
>
> int a[256];
> void foo (int *p)
> {
>    int i;
>    for (i= 0;i<256;i++)
>      if (flag)
>        *p = a[i];
> }
>
> but we cannot, because the transform
>
>    lsm = *p;
>    for (i = 0; i<  256; ++i)
>      if (flag)
>        lsm = a[i];
>    *p = lsm;
>
> even when the store is properly conditional the load is not (and you do not
> change that).  The unconditional load may trap here.  What I meant to say
> is that we do not need the load at all unless there is also a load involved
> inside the loop - if we use the flag-based approach.  If there is also a load
> inside the loop we'd need to perform the load there (or not handle this
> case)

Ahh!  Yes, this sounds very profitable.  Would you mind me doing this in 
a separate patch?  I am already handling loads in this incarnation, so 
this should be straightforward.

Speak of loads, I am keeping the information as an additional bitmap in 
`memory_accesses', as ->refs_in_loop was set for stores as well, so I 
couldn't depend on it.  Let me know if you have another idea.

Mildly tested.  Wadaya think?

Thanks again.
Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 15213 bytes --]

	* gimple.h (gimple_set_in_transaction): Remove.
	(gimple_in_transaction): Look in BB instead.
	(gimple_statement_base): Remove in_transaction field.
	* basic-block.h (enum bb_flags): Add BB_IN_TRANSACTION.
	* trans-mem.c (compute_transaction_bits): Place transaction bit
	information into basic blocks.
	* Makefile.in (tree-ssa-loop-im.o): Depend on TARGET_H.
	* doc/tm.texi: Regenerate.
	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
	(execute_sm_if_changed_flag): New.
	(execute_sm_if_changed_flag_set): New.
	(execute_sm): Do not generate data races unless requested.
	* targhooks.c (default_can_compare_bitwise_p): New.
	* targhooks.h (default_can_compare_bitwise_p): Protoize.
	* target.def (can_compare_bitwise_p): New hook.
	* tree-ssa-loop-im.c (execute_sm): Guard store motion with a
	conditional.
	* opts.c (finish_options): Do not allow store or load data races
	with -fgnu-tm.

Index: trans-mem.c
===================================================================
--- trans-mem.c	(revision 186108)
+++ trans-mem.c	(working copy)
@@ -2449,13 +2449,15 @@ compute_transaction_bits (void)
   struct tm_region *region;
   VEC (basic_block, heap) *queue;
   unsigned int i;
-  gimple_stmt_iterator gsi;
   basic_block bb;
 
   /* ?? Perhaps we need to abstract gate_tm_init further, because we
      certainly don't need it to calculate CDI_DOMINATOR info.  */
   gate_tm_init ();
 
+  FOR_EACH_BB (bb)
+    bb->flags &= ~BB_IN_TRANSACTION;
+
   for (region = all_tm_regions; region; region = region->next)
     {
       queue = get_tm_region_blocks (region->entry_block,
@@ -2464,11 +2466,7 @@ compute_transaction_bits (void)
 				    NULL,
 				    /*stop_at_irr_p=*/true);
       for (i = 0; VEC_iterate (basic_block, queue, i, bb); ++i)
-	for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	  {
-	    gimple stmt = gsi_stmt (gsi);
-	    gimple_set_in_transaction (stmt, true);
-	  }
+	bb->flags |= BB_IN_TRANSACTION;
       VEC_free (basic_block, heap, queue);
     }
 
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 186108)
+++ basic-block.h	(working copy)
@@ -256,7 +256,12 @@ enum bb_flags
      df_set_bb_dirty, but not cleared by df_analyze, so it can be used
      to test whether a block has been modified prior to a df_analyze
      call.  */
-  BB_MODIFIED = 1 << 12
+  BB_MODIFIED = 1 << 12,
+
+  /* Set on blocks that are in a transaction.  This is calculated on
+     demand, and is available after calling
+     compute_transaction_bits().  */
+  BB_IN_TRANSACTION = 1 << 13
 };
 
 /* Dummy flag for convenience in the hot/cold partitioning code.  */
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 186108)
+++ tree-ssa-loop-im.c	(working copy)
@@ -52,7 +52,7 @@ along with GCC; see the file COPYING3.
 	 }
      }
 
-   Where COND and INV are is invariants, but evaluating INV may trap or be
+   Where COND and INV are invariants, but evaluating INV may trap or be
    invalid from some other reason if !COND.  This may be transformed to
 
    if (cond)
@@ -175,6 +175,9 @@ static struct
   /* The set of memory references accessed in each loop.  */
   VEC (bitmap, heap) *refs_in_loop;
 
+  /* The set of memory references read in each loop.  */
+  VEC (bitmap, heap) *reads_in_loop;
+
   /* The set of memory references accessed in each loop, including
      subloops.  */
   VEC (bitmap, heap) *all_refs_in_loop;
@@ -1555,6 +1558,13 @@ record_mem_ref_loc (mem_ref_p ref, struc
 
   VEC_safe_push (mem_ref_loc_p, heap, accs->locs, aref);
   bitmap_set_bit (ril, ref->id);
+  if (gimple_assign_single_p (stmt)
+      && gimple_assign_rhs1 (stmt) == ref->mem)
+    {
+      bitmap reads;
+      reads = VEC_index (bitmap, memory_accesses.reads_in_loop, loop->num);
+      bitmap_set_bit (reads, ref->id);
+    }
 }
 
 /* Marks reference REF as stored in LOOP.  */
@@ -1726,6 +1736,8 @@ analyze_memory_references (void)
   memory_accesses.refs_list = NULL;
   memory_accesses.refs_in_loop = VEC_alloc (bitmap, heap,
 					    number_of_loops ());
+  memory_accesses.reads_in_loop = VEC_alloc (bitmap, heap,
+					     number_of_loops ());
   memory_accesses.all_refs_in_loop = VEC_alloc (bitmap, heap,
 						number_of_loops ());
   memory_accesses.all_refs_stored_in_loop = VEC_alloc (bitmap, heap,
@@ -1736,6 +1748,8 @@ analyze_memory_references (void)
       empty = BITMAP_ALLOC (NULL);
       VEC_quick_push (bitmap, memory_accesses.refs_in_loop, empty);
       empty = BITMAP_ALLOC (NULL);
+      VEC_quick_push (bitmap, memory_accesses.reads_in_loop, empty);
+      empty = BITMAP_ALLOC (NULL);
       VEC_quick_push (bitmap, memory_accesses.all_refs_in_loop, empty);
       empty = BITMAP_ALLOC (NULL);
       VEC_quick_push (bitmap, memory_accesses.all_refs_stored_in_loop, empty);
@@ -1956,6 +1970,133 @@ get_lsm_tmp_name (tree ref, unsigned n)
   return lsm_tmp_name;
 }
 
+/* Helper function for execute_sm.  Emit code to store TMP_VAR into
+   MEM along edge EX.
+
+   The store is only done if MEM has changed.  We do this so no
+   changes to MEM occur on code paths that did not originally store
+   into it.
+
+   The common case for execute_sm will transform:
+
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         MEM = TMP_VAR;
+     }
+
+   into:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     MEM = lsm;
+
+  This function will generate:
+
+     // Note: this load can be avoided unless there is a load already
+     //       present in the loop.
+     lsm = MEM;
+
+     flag = false;
+     ...
+     for (...) {
+       if (foo)
+         stuff;
+       else {
+         lsm = TMP_VAR;
+         flag = true;
+       }
+     }
+     if (flag)		<--
+       MEM = lsm;	<--
+*/
+
+static void
+execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag)
+{
+  basic_block new_bb, then_bb, old_dest = ex->dest;
+  bool loop_has_only_one_exit = single_pred_p (ex->dest);
+  edge then_old_edge;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+
+  if (loop_has_only_one_exit)
+    ex = split_block_after_labels (old_dest);
+
+  old_dest = ex->dest;
+  new_bb = split_edge (ex);
+  then_bb = create_empty_bb (new_bb);
+  if (current_loops && new_bb->loop_father)
+    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+  gsi = gsi_start_bb (new_bb);
+  stmt = gimple_build_cond (NE_EXPR, flag, boolean_false_node,
+			    NULL_TREE, NULL_TREE);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  gsi = gsi_start_bb (then_bb);
+  /* Insert actual store.  */
+  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+  if (!loop_has_only_one_exit)
+    for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple phi = gsi_stmt (gsi);
+	unsigned i;
+
+	for (i = 0; i < gimple_phi_num_args (phi); i++)
+	  if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, i);
+	      add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+	      update_stmt (phi);
+	    }
+      }
+  /* Remove the original fall through edge.  This was the
+     single_succ_edge (new_bb).  */
+  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+}
+
+/* Helper function for execute_sm.  On every location where REF is
+   set, set an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
+  unsigned i;
+  mem_ref_loc_p loc;
+  tree flag;
+  VEC (mem_ref_loc_p, heap) *locs = NULL;
+  char *str = get_lsm_tmp_name (ref->mem, ~0);
+
+  lsm_tmp_name_add ("_flag");
+  flag = make_rename_temp (boolean_type_node, str);
+  get_all_locs_in_loop (loop, ref, &locs);
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_for_stmt (loc->stmt);
+      stmt = gimple_build_assign (flag, boolean_true_node);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+    }
+  VEC_free (mem_ref_loc_p, heap, locs);
+  return flag;
+}
+
 /* Executes store motion of memory reference REF from LOOP.
    Exits from the LOOP are stored in EXITS.  The initialization of the
    temporary variable is put to the preheader of the loop, and assignments
@@ -1964,12 +2105,14 @@ get_lsm_tmp_name (tree ref, unsigned n)
 static void
 execute_sm (struct loop *loop, VEC (edge, heap) *exits, mem_ref_p ref)
 {
-  tree tmp_var;
+  tree tmp_var, store_flag;
   unsigned i;
-  gimple load, store;
+  gimple load;
   struct fmt_data fmt_data;
-  edge ex;
+  edge ex, latch_edge;
   struct lim_aux_data *lim_data;
+  bool multi_threaded_model_p = false;
+  bool ref_read_in_loop_p = false;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1985,23 +2128,55 @@ execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
+  if ((flag_tm && loop_preheader_edge (loop)->src->flags & BB_IN_TRANSACTION)
+      || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+    multi_threaded_model_p = true;
+
+  if (multi_threaded_model_p)
+    store_flag = execute_sm_if_changed_flag_set (loop, ref);
+
   rewrite_mem_refs (loop, ref, tmp_var);
 
-  /* Emit the load & stores.  */
-  load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
-  lim_data = init_lim_data (load);
-  lim_data->max_loop = loop;
-  lim_data->tgt_loop = loop;
-
-  /* Put this into the latch, so that we are sure it will be processed after
-     all dependencies.  */
-  gsi_insert_on_edge (loop_latch_edge (loop), load);
 
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
-    {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+  if (multi_threaded_model_p)
+    {
+      bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
+				loop->num);
+      ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
+    }
+  /* For the multi-threaded variant, we can avoid the load altogether,
+     since the store is predicated by a flag.  However, do the load if it
+     was originally in the loop.  */
+  if (!multi_threaded_model_p || ref_read_in_loop_p)
+    {
+      load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
+    }
+  if (multi_threaded_model_p)
+    {
+      load = gimple_build_assign (store_flag, boolean_false_node);
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
     }
+
+  /* Sink the store to every exit from the loop.  */
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    if (!multi_threaded_model_p)
+      {
+	gimple store;
+	store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	gsi_insert_on_edge (ex, store);
+      }
+    else
+      execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
 /* Hoists memory references MEM_REFS out of LOOP.  EXITS is the list of exit
@@ -2433,6 +2608,10 @@ tree_ssa_lim_finalize (void)
     BITMAP_FREE (b);
   VEC_free (bitmap, heap, memory_accesses.refs_in_loop);
 
+  FOR_EACH_VEC_ELT (bitmap, memory_accesses.reads_in_loop, i, b)
+    BITMAP_FREE (b);
+  VEC_free (bitmap, heap, memory_accesses.reads_in_loop);
+
   FOR_EACH_VEC_ELT (bitmap, memory_accesses.all_refs_in_loop, i, b)
     BITMAP_FREE (b);
   VEC_free (bitmap, heap, memory_accesses.all_refs_in_loop);
Index: testsuite/gcc.dg/pr52558-1.c
===================================================================
--- testsuite/gcc.dg/pr52558-1.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-1.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  for (p = q; p; p = p->next)
+    if (p->data > 0)
+      count++;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-2.c
===================================================================
--- testsuite/gcc.dg/pr52558-2.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-2.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+int g_1 = 1;
+int g_2 = 0;
+
+int func_1(void)
+{
+ int l;
+ for (l = 0; l < 1234; l++)
+ {
+   if (g_1)
+     return l;
+   else
+     g_2 = 0;
+ }
+ return 999;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM.*g_2_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/tm/reg-promotion.c
===================================================================
--- testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
+++ testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  __transaction_atomic {
+    for (p = q; p; p = p->next)
+      if (p->data > 0)
+	count++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: gimple.h
===================================================================
--- gimple.h	(revision 186108)
+++ gimple.h	(working copy)
@@ -305,11 +305,6 @@ struct GTY(()) gimple_statement_base {
   /* Nonzero if this statement contains volatile operands.  */
   unsigned has_volatile_ops 	: 1;
 
-  /* Nonzero if this statement appears inside a transaction.  This bit
-     is calculated on de-mand and has relevant information only after
-     it has been calculated with compute_transaction_bits.  */
-  unsigned in_transaction	: 1;
-
   /* The SUBCODE field can be used for tuple-specific flags for tuples
      that do not require subcodes.  Note that SUBCODE should be at
      least as wide as tree codes, as several tuples store tree codes
@@ -1591,15 +1586,7 @@ gimple_set_has_volatile_ops (gimple stmt
 static inline bool
 gimple_in_transaction (gimple stmt)
 {
-  return stmt->gsbase.in_transaction;
-}
-
-/* Set the IN_TRANSACTION flag to TRANSACTIONP.  */
-
-static inline void
-gimple_set_in_transaction (gimple stmt, bool transactionp)
-{
-  stmt->gsbase.in_transaction = (unsigned) transactionp;
+  return gimple_bb (stmt)->flags & BB_IN_TRANSACTION;
 }
 
 /* Return true if statement STMT may access memory.  */


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-25 22:55       ` Aldy Hernandez
@ 2012-04-26  9:52         ` Richard Guenther
  2012-05-07 23:05           ` Aldy Hernandez
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Guenther @ 2012-04-26  9:52 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5384 bytes --]

On Wed, 25 Apr 2012, Aldy Hernandez wrote:

> On 04/25/12 06:45, Richard Guenther wrote:
> > On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>  wrote:
> > > On 04/13/12 03:46, Richard Guenther wrote:
> > > >
> > > > On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
> > > > wrote:
> 
> > +  /* ?? Perhaps we should cache this somewhere in the BB, or are
> > +     multiple levels of empty BB's common. ??  */
> > +  FOR_EACH_EDGE (e, ei, bb->preds)
> > +    {
> > +      int res = bb_in_transaction_1 (e->src);
> > +      if (res != -1)
> > +       return (bool) res;
> > +    }
> > +  return false;
> >
> > that's weird code - if predecessors are not all in or not in a transaction
> > you return a random value.  So it seems this is somewhat fragile.
> >
> > If bb status can be derived from looking at a single stmt why is the
> > transaction-ness not recorded as a basic-block flag then?  Thus,
> > why do we have a bit in gimple stmts?
> 
> How about we get rid of the GIMPLE bit altogether and keep it in the BB flags
> like you mention?  See patch.

Yeah, that looks good.

> > +  bool single_exit_p = single_pred_p (ex->dest);
> >
> > that's a strange variable name ;)
> 
> How about loop_has_only_one_exit? :)

Fine ;)

> >
> > +/* Helper function for execute_sm.  On every path that sets REF, set
> > +   an appropriate flag indicating the store.  */
> > +
> > +static tree
> > +execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
> > +{
> > ...
> > +  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
> > +    {
> > +      gimple_stmt_iterator gsi;
> > +      gimple stmt;
> > +
> > +      gsi = gsi_start_bb (gimple_bb (loc->stmt));
> > +      for (; !gsi_end_p (gsi); gsi_next (&gsi))
> > +       if (gsi_stmt (gsi) == loc->stmt)
> >
> > does not seem to do it on every path but on every REF setter.  And instead
> > of the innermost loop just use gsi_for_stmt (loc->stmt).
> 
> Updated comment.  Fixed.
> 
> >
> > +  /* Emit the load code into the latch, so that we are sure it will
> > +     be processed after all dependencies.  */
> > +  latch_edge = loop_latch_edge (loop);
> > +  load = gimple_build_assign (mem_ssa, unshare_expr (ref->mem));
> >     lim_data = init_lim_data (load);
> >     lim_data->max_loop = loop;
> >     lim_data->tgt_loop = loop;
> > +  gsi_insert_on_edge (latch_edge, load);
> > +  load = gimple_build_assign (tmp_var, mem_ssa);
> > +  lim_data = init_lim_data (load);
> > +  lim_data->max_loop = loop;
> > +  lim_data->tgt_loop = loop;
> > +  gsi_insert_on_edge (latch_edge, load);
> >
> > the 2nd assign is no longer a "load", so I suppose you don't need any
> > lim_data
> > for it?
> 
> Re-did all this.  See patch.
> 
> >
> > +      else if (store == STORE_IF_CHANGED_FLAG)
> > +       execute_sm_if_changed (ex, ref->mem, mem_ssa, tmp_var,
> > +                              store_flag);
> >
> > and this can pass NULL instead of mem_ssa?
> 
> Got rid of mem_ssa altogether, as well as the if-changed approach which proved
> inferior to the flags based approach.

Good.

> > Overall this looks reasonable.  With also handling trapping things I meant
> > that we should be able to apply store-motion to
> >
> > int a[256];
> > void foo (int *p)
> > {
> >    int i;
> >    for (i= 0;i<256;i++)
> >      if (flag)
> >        *p = a[i];
> > }
> >
> > but we cannot, because the transform
> >
> >    lsm = *p;
> >    for (i = 0; i<  256; ++i)
> >      if (flag)
> >        lsm = a[i];
> >    *p = lsm;
> >
> > even when the store is properly conditional the load is not (and you do not
> > change that).  The unconditional load may trap here.  What I meant to say
> > is that we do not need the load at all unless there is also a load involved
> > inside the loop - if we use the flag-based approach.  If there is also a
> > load
> > inside the loop we'd need to perform the load there (or not handle this
> > case)
> 
> Ahh!  Yes, this sounds very profitable.  Would you mind me doing this in a
> separate patch?  I am already handling loads in this incarnation, so this
> should be straightforward.

No, it's fine to do it separately.

> Speak of loads, I am keeping the information as an additional bitmap in
> `memory_accesses', as ->refs_in_loop was set for stores as well, so I couldn't
> depend on it.  Let me know if you have another idea.

Hmm, refs_in_loop & ~all_refs_stored_in_loop, so instead of

+      bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
+                               loop->num);
+      ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);

ref_read_in_loop_p = bitmap_bit_p (refs, ref->id) && !bitmap_bit_p 
(stores, ref->id);

?  But maybe that doesn't work if a ref is both read and stored to.
Btw, rather than adding a bitmap to memory_accesses I'd rather add
a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
both into one) and a bitmap to struct mem_ref.

> Mildly tested.  Wadaya think?

Looks good overall, minus the detail of remembering refs loaded.

Richard.


> Thanks again.
> Aldy
> 

-- 
Richard Guenther <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-04-26  9:52         ` Richard Guenther
@ 2012-05-07 23:05           ` Aldy Hernandez
  2012-05-08  0:12             ` Andrew MacLeod
                               ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Aldy Hernandez @ 2012-05-07 23:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 4069 bytes --]

Hi.  Sorry for the delay.  There were various tricky hiccups along the 
way to bootstrappability and regression cleanliness...

On 04/26/12 04:51, Richard Guenther wrote:
> On Wed, 25 Apr 2012, Aldy Hernandez wrote:
>
>> On 04/25/12 06:45, Richard Guenther wrote:
>>> On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>   wrote:
>>>> On 04/13/12 03:46, Richard Guenther wrote:
>>>>>
>>>>> On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
>>>>> wrote:

>> Speak of loads, I am keeping the information as an additional bitmap in
>> `memory_accesses', as ->refs_in_loop was set for stores as well, so I couldn't
>> depend on it.  Let me know if you have another idea.
>
> Hmm, refs_in_loop&  ~all_refs_stored_in_loop, so instead of
>
> +      bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
> +                               loop->num);
> +      ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
>
> ref_read_in_loop_p = bitmap_bit_p (refs, ref->id)&&  !bitmap_bit_p
> (stores, ref->id);
>
> ?  But maybe that doesn't work if a ref is both read and stored to.
> Btw, rather than adding a bitmap to memory_accesses I'd rather add
> a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
> both into one) and a bitmap to struct mem_ref.

I have done so as mark_ref_loaded().  I first tried to merge both into a 
generic mark_ref(), to avoid iterating twice, but I was concerned with 
the early loop exit of !bitmap_bit_p(ref->stored, loop->num), not 
knowing if I should exit the loop altogether in the merged case or what. 
  In either case, the code was somewhat convoluted, so I opted for a 
separate clean function.

In scanning the gimple stream for the loads I noticed that instances of 
two component refs (say foo.bar) were not pointer comparable, so I am 
asking the alias oracle with refs_may_alias_p.  It also seemed to make 
more sense wrt memory chunks that may alias.  What do you think?

This patch is far more robust and fully tested.  Two wrenches to 
complicate things though...

First, while inserting the code that writes the _LSM_ temporaries back into 
the original variables, I found a handful of tests where the order of 
the writes mattered (aliased locations, inlined code, etc, IIRC).  [This 
is in loops where multiple stores are moved.]  So iterating and 
inserting on the exit edges caused the stores to be saved in reverse.  I 
am now saving the edges and threading the code, so they happen in the 
correct order:

		if (foo_flag_lsm)
			foo = foo_lsm;
		if (foo2_flag_lsm)
			foo2 = foo2_lsm;
	actual_exit:

This required some edge redirection so we didn't jump to the original 
actual exit prematurely when handling multiple moved stores.  The 
dominator tree needed to be updated, and there is one instance where I 
couldn't find a clever way of saving the order without jumping through 
an inordinate amount of hoops.  It seemed easier to call 
recompute_dominator.

Second, avoiding the load, unless it happens in the loop like you have 
suggested, brought about some bogus unused warnings with the LSM 
temporaries.  What happens is that compute_ssa() creates uses of the LSM 
temporaries that are not yet defined, so we end up with PHI nodes with a 
definition entry (D).  Since we are sure the eventual uses will be gated 
by their corresponding flag, I created phi nodes with an initial value 
of 0 in the preheader of the loops in which they are used.  This solved 
the problem.  I also played with initializing to 0 at the entry block, 
but that seemed a bit too invasive.
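
For illustration, the result at the top of the loop is conceptually
something like the following (a sketch with made-up SSA names; the zero
comes in on the preheader edge and the real value flows in from the
latch):

	# foo_lsm_1 = PHI <0 (preheader), foo_lsm_2 (latch)>

so a path that never executes the guarded store only ever sees the
dummy zero, and the eventual write-back is still gated by the
corresponding foo_lsm_flag.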

Andrew suggested the correct fix was to add a new pass that was able to 
do some ?? flow sensitive data flow analysis ?? that could discover 
these unreachable paths and insert the 0 phis at the start of the blocks 
automatically.  But that seemed like far too much work, considering how 
long it's taken me to get this far ;-).

Bootstrapped and fully tested with multi_threaded_model_p=true 
hard-coded.  This is the revision I'd like to contribute, sans the 
hardcoded value.

What do you think?

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 19040 bytes --]

	* cfg.c (alloc_aux_for_edge): Fix comment.
	(alloc_aux_for_edge): Remove static.
	* gimple.h (gimple_set_in_transaction): Remove.
	(gimple_in_transaction): Look in BB instead.
	(gimple_statement_base): Remove in_transaction field.
	* basic-block.h (enum bb_flags): Add BB_IN_TRANSACTION.
	(alloc_aux_for_edge): Protoize.
	* trans-mem.c (compute_transaction_bits): Place transaction bit
	information into basic blocks.
	* Makefile.in (tree-ssa-loop-im.o): Depend on TARGET_H.
	* doc/tm.texi: Regenerate.
	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
	(execute_sm_if_changed_flag): New.
	(execute_sm_if_changed_flag_set): New.
	(execute_sm): Do not generate data races unless requested.
	(mark_ref): Rename from mark_ref_stored.  Set loaded bit.
	(mem_ref_alloc): Initialize loaded field.
	(memref_free): Free loaded field.
	(tree_ssa_lim_initialize): Call alloc_aux_for_edges.
	(tree_ssa_lim_finalize): Call free_aux_for_edges.
	* targhooks.c (default_can_compare_bitwise_p): New.
	* targhooks.h (default_can_compare_bitwise_p): Protoize.
	* target.def (can_compare_bitwise_p): New hook.

Index: testsuite/gcc.dg/pr52558-2.c
===================================================================
--- testsuite/gcc.dg/pr52558-2.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that g_2 is not written to unless !g_1.  */
+
+int g_1 = 1;
+int g_2 = 0;
+
+int func_1(void)
+{
+ int l;
+ for (l = 0; l < 1234; l++)
+ {
+   if (g_1)
+     return l;
+   else
+     g_2 = 0;
+ }
+ return 999;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM.*g_2_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-1.c
===================================================================
--- testsuite/gcc.dg/pr52558-1.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data > 0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  for (p = q; p; p = p->next)
+    if (p->data > 0)
+      count++;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/tm/reg-promotion.c
===================================================================
--- testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
+++ testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data>0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  __transaction_atomic {
+    for (p = q; p; p = p->next)
+      if (p->data > 0)
+	count++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 187051)
+++ tree-ssa-loop-im.c	(working copy)
@@ -52,7 +52,7 @@ along with GCC; see the file COPYING3.
 	 }
      }
 
-   Where COND and INV are is invariants, but evaluating INV may trap or be
+   Where COND and INV are invariants, but evaluating INV may trap or be
    invalid from some other reason if !COND.  This may be transformed to
 
    if (cond)
@@ -132,6 +132,10 @@ typedef struct mem_ref
   hashval_t hash;		/* Its hash value.  */
   bitmap stored;		/* The set of loops in that this memory location
 				   is stored to.  */
+  bitmap loaded;		/* The set of loops in that this
+				   memory location may be loaded.
+				   This includes aliases of the memory
+				   location.  */
   VEC (mem_ref_locs_p, heap) *accesses_in_loop;
 				/* The locations of the accesses.  Vector
 				   indexed by the loop number.  */
@@ -1487,6 +1491,7 @@ memref_free (void *obj)
   mem_ref_locs_p accs;
 
   BITMAP_FREE (mem->stored);
+  BITMAP_FREE (mem->loaded);
   BITMAP_FREE (mem->indep_loop);
   BITMAP_FREE (mem->dep_loop);
   BITMAP_FREE (mem->indep_ref);
@@ -1510,6 +1515,7 @@ mem_ref_alloc (tree mem, unsigned hash,
   ref->id = id;
   ref->hash = hash;
   ref->stored = BITMAP_ALLOC (NULL);
+  ref->loaded = BITMAP_ALLOC (NULL);
   ref->indep_loop = BITMAP_ALLOC (NULL);
   ref->dep_loop = BITMAP_ALLOC (NULL);
   ref->indep_ref = BITMAP_ALLOC (NULL);
@@ -1569,6 +1575,18 @@ mark_ref_stored (mem_ref_p ref, struct l
     bitmap_set_bit (ref->stored, loop->num);
 }
 
+/* Marks reference REF as loaded in LOOP.  */
+
+static void
+mark_ref_loaded (mem_ref_p ref, struct loop *loop)
+{
+  for (;
+       loop != current_loops->tree_root
+       && !bitmap_bit_p (ref->loaded, loop->num);
+       loop = loop_outer (loop))
+    bitmap_set_bit (ref->loaded, loop->num);
+}
+
 /* Gathers memory references in statement STMT in LOOP, storing the
    information about them in the memory_accesses structure.  Marks
    the vops accessed through unrecognized statements there as
@@ -1626,6 +1644,22 @@ gather_mem_refs_stmt (struct loop *loop,
 	  fprintf (dump_file, "\n");
 	}
     }
+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree t = gimple_assign_rhs1 (stmt);
+      /* ?? For some reason two exact COMPONENT_REFs cannot be
+	 compared with pointer equality, so ask the alias oracle.  */
+      if (ref->mem == t
+	  || ((TREE_CODE (t) == SSA_NAME
+	       || DECL_P (t)
+	       || handled_component_p (t)
+	       || TREE_CODE (t) == MEM_REF
+	       || TREE_CODE (t) == TARGET_MEM_REF)
+	      && refs_may_alias_p (t, ref->mem)))
+	mark_ref_loaded (ref, loop);
+    }
+
   if (is_stored)
     mark_ref_stored (ref, loop);
 
@@ -1956,6 +1990,175 @@ get_lsm_tmp_name (tree ref, unsigned n)
   return lsm_tmp_name;
 }
 
+struct prev_flag_edges {
+  /* Edge to insert new flag comparison code.  */
+  edge append_cond_position;
+
+  /* Edge for fall through from previous flag comparison.  */
+  edge last_cond_fallthru;
+};
+
+/* Helper function for execute_sm.  Emit code to store TMP_VAR into
+   MEM along edge EX.
+
+   The store is only done if MEM has changed.  We do this so no
+   changes to MEM occur on code paths that did not originally store
+   into it.
+
+   The common case for execute_sm will transform:
+
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         MEM = TMP_VAR;
+     }
+
+   into:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     MEM = lsm;
+
+  This function will generate:
+
+     // Note: this load can be avoided unless there is a load already
+     //       present in the loop.
+     lsm = MEM;
+
+     lsm_flag = false;
+     ...
+     for (...) {
+       if (foo)
+         stuff;
+       else {
+         lsm = TMP_VAR;
+         lsm_flag = true;
+       }
+     }
+     if (lsm_flag)	<--
+       MEM = lsm;	<--
+*/
+
+static void
+execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag)
+{
+  basic_block new_bb, then_bb, old_dest;
+  bool loop_has_only_one_exit;
+  edge then_old_edge, orig_ex = ex;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+  struct prev_flag_edges *prev_edges = (struct prev_flag_edges *) ex->aux;
+
+  /* ?? Insert store after previous store if applicable.  See note
+     below.  */
+  if (prev_edges)
+    ex = prev_edges->append_cond_position;
+
+  loop_has_only_one_exit = single_pred_p (ex->dest);
+
+  if (loop_has_only_one_exit)
+    ex = split_block_after_labels (ex->dest);
+
+  old_dest = ex->dest;
+  new_bb = split_edge (ex);
+  then_bb = create_empty_bb (new_bb);
+  if (current_loops && new_bb->loop_father)
+    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+  gsi = gsi_start_bb (new_bb);
+  stmt = gimple_build_cond (NE_EXPR, flag, boolean_false_node,
+			    NULL_TREE, NULL_TREE);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  gsi = gsi_start_bb (then_bb);
+  /* Insert actual store.  */
+  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+
+  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+  if (prev_edges)
+    {
+      basic_block prevbb = prev_edges->last_cond_fallthru->src;
+      redirect_edge_succ (prev_edges->last_cond_fallthru, new_bb);
+      set_immediate_dominator (CDI_DOMINATORS, new_bb, prevbb);
+      set_immediate_dominator (CDI_DOMINATORS, old_dest,
+			       recompute_dominator (CDI_DOMINATORS, old_dest));
+    }
+
+  /* ?? Because stores may alias, they must happen in the exact
+     sequence they originally happened.  Save the position right after
+     the (_lsm) store we just created so we can continue appending after
+     it and maintain the original order.  */
+  {
+    struct prev_flag_edges *p;
+
+    if (orig_ex->aux)
+      orig_ex->aux = NULL;
+    alloc_aux_for_edge (orig_ex, sizeof (struct prev_flag_edges));
+    p = (struct prev_flag_edges *) orig_ex->aux;
+    p->append_cond_position = then_old_edge;
+    p->last_cond_fallthru = find_edge (new_bb, old_dest);
+    orig_ex->aux = (void *) p;
+  }
+
+  if (!loop_has_only_one_exit)
+    for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple phi = gsi_stmt (gsi);
+	unsigned i;
+
+	for (i = 0; i < gimple_phi_num_args (phi); i++)
+	  if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, i);
+	      add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+	      update_stmt (phi);
+	    }
+      }
+  /* Remove the original fall through edge.  This was the
+     single_succ_edge (new_bb).  */
+  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+}
+
+/* Helper function for execute_sm.  On every location where REF is
+   set, set an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
+  unsigned i;
+  mem_ref_loc_p loc;
+  tree flag;
+  VEC (mem_ref_loc_p, heap) *locs = NULL;
+  char *str = get_lsm_tmp_name (ref->mem, ~0);
+
+  lsm_tmp_name_add ("_flag");
+  flag = make_rename_temp (boolean_type_node, str);
+  get_all_locs_in_loop (loop, ref, &locs);
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_for_stmt (loc->stmt);
+      stmt = gimple_build_assign (flag, boolean_true_node);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+    }
+  VEC_free (mem_ref_loc_p, heap, locs);
+  return flag;
+}
+
 /* Executes store motion of memory reference REF from LOOP.
    Exits from the LOOP are stored in EXITS.  The initialization of the
    temporary variable is put to the preheader of the loop, and assignments
@@ -1964,12 +2167,16 @@ get_lsm_tmp_name (tree ref, unsigned n)
 static void
 execute_sm (struct loop *loop, VEC (edge, heap) *exits, mem_ref_p ref)
 {
-  tree tmp_var;
+  tree tmp_var, store_flag;
   unsigned i;
-  gimple load, store;
+  gimple load;
   struct fmt_data fmt_data;
-  edge ex;
+  edge ex, latch_edge;
   struct lim_aux_data *lim_data;
+  bool multi_threaded_model_p = false;
+  /* ?? FIXME TESTING TESTING ?? */
+  multi_threaded_model_p=true;
+  /* ?? FIXME TESTING TESTING ?? */
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1985,23 +2192,69 @@ execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
+  if ((flag_tm && loop_preheader_edge (loop)->src->flags & BB_IN_TRANSACTION)
+      || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+    multi_threaded_model_p = true;
+
+  if (multi_threaded_model_p)
+    store_flag = execute_sm_if_changed_flag_set (loop, ref);
+
   rewrite_mem_refs (loop, ref, tmp_var);
 
-  /* Emit the load & stores.  */
-  load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
-  lim_data = init_lim_data (load);
-  lim_data->max_loop = loop;
-  lim_data->tgt_loop = loop;
-
-  /* Put this into the latch, so that we are sure it will be processed after
-     all dependencies.  */
-  gsi_insert_on_edge (loop_latch_edge (loop), load);
 
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
-    {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+  /* For the multi-threaded variant, we can avoid the load altogether,
+     since the store is predicated by a flag.  However, do the load if
+     it was originally in the loop.  */
+  if (!multi_threaded_model_p || bitmap_bit_p (ref->loaded, loop->num))
+    {
+      load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
+    }
+  else if (multi_threaded_model_p)
+    {
+      /* We know that all uses of tmp_var will be gated by lsm_flag,
+	 but subsequent passes can be confused and think tmp_var is
+	 used before it is defined.  If we avoided the load, create a
+	 PHI node of 0 for the data flow path we know to be
+	 unreachable.
+
+	 ?? Ideally what we'd want is a flow sensitive data flow
+	 analysis pass that can discover the above, and both remove
+	 the load and add the dummy PHI nodes for unreachable
+	 blocks. */
+      gimple phi;
+      edge e = loop_preheader_edge (loop);
+      tree t = make_ssa_name (tmp_var, NULL);
+
+      phi = create_phi_node (t, e->dest);
+      SSA_NAME_DEF_STMT (t) = phi;
+      add_phi_arg (phi, build_zero_cst (TREE_TYPE (t)), e, UNKNOWN_LOCATION);
+    }
+  if (multi_threaded_model_p)
+    {
+      load = gimple_build_assign (store_flag, boolean_false_node);
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
     }
+
+  /* Sink the store to every exit from the loop.  */
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    if (!multi_threaded_model_p)
+      {
+	gimple store;
+	store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	gsi_insert_on_edge (ex, store);
+      }
+    else
+      execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
 /* Hoists memory references MEM_REFS out of LOOP.  EXITS is the list of exit
@@ -2410,6 +2663,8 @@ tree_ssa_lim_initialize (void)
 
   if (flag_tm)
     compute_transaction_bits ();
+
+  alloc_aux_for_edges (0);
 }
 
 /* Cleans up after the invariant motion pass.  */
@@ -2421,6 +2676,8 @@ tree_ssa_lim_finalize (void)
   unsigned i;
   bitmap b;
 
+  free_aux_for_edges ();
+
   FOR_EACH_BB (bb)
     SET_ALWAYS_EXECUTED_IN (bb, NULL);
 
Index: trans-mem.c
===================================================================
--- trans-mem.c	(revision 187051)
+++ trans-mem.c	(working copy)
@@ -2449,13 +2449,15 @@ compute_transaction_bits (void)
   struct tm_region *region;
   VEC (basic_block, heap) *queue;
   unsigned int i;
-  gimple_stmt_iterator gsi;
   basic_block bb;
 
   /* ?? Perhaps we need to abstract gate_tm_init further, because we
      certainly don't need it to calculate CDI_DOMINATOR info.  */
   gate_tm_init ();
 
+  FOR_EACH_BB (bb)
+    bb->flags &= ~BB_IN_TRANSACTION;
+
   for (region = all_tm_regions; region; region = region->next)
     {
       queue = get_tm_region_blocks (region->entry_block,
@@ -2464,11 +2466,7 @@ compute_transaction_bits (void)
 				    NULL,
 				    /*stop_at_irr_p=*/true);
       for (i = 0; VEC_iterate (basic_block, queue, i, bb); ++i)
-	for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	  {
-	    gimple stmt = gsi_stmt (gsi);
-	    gimple_set_in_transaction (stmt, true);
-	  }
+	bb->flags |= BB_IN_TRANSACTION;
       VEC_free (basic_block, heap, queue);
     }
 
Index: gimple.h
===================================================================
--- gimple.h	(revision 187051)
+++ gimple.h	(working copy)
@@ -305,11 +305,6 @@ struct GTY(()) gimple_statement_base {
   /* Nonzero if this statement contains volatile operands.  */
   unsigned has_volatile_ops 	: 1;
 
-  /* Nonzero if this statement appears inside a transaction.  This bit
-     is calculated on de-mand and has relevant information only after
-     it has been calculated with compute_transaction_bits.  */
-  unsigned in_transaction	: 1;
-
   /* The SUBCODE field can be used for tuple-specific flags for tuples
      that do not require subcodes.  Note that SUBCODE should be at
      least as wide as tree codes, as several tuples store tree codes
@@ -1585,15 +1580,7 @@ gimple_set_has_volatile_ops (gimple stmt
 static inline bool
 gimple_in_transaction (gimple stmt)
 {
-  return stmt->gsbase.in_transaction;
-}
-
-/* Set the IN_TRANSACTION flag to TRANSACTIONP.  */
-
-static inline void
-gimple_set_in_transaction (gimple stmt, bool transactionp)
-{
-  stmt->gsbase.in_transaction = (unsigned) transactionp;
+  return gimple_bb (stmt)->flags & BB_IN_TRANSACTION;
 }
 
 /* Return true if statement STMT may access memory.  */
Index: cfg.c
===================================================================
--- cfg.c	(revision 187051)
+++ cfg.c	(working copy)
@@ -814,10 +814,10 @@ free_aux_for_blocks (void)
   clear_aux_for_blocks ();
 }
 
-/* Allocate a memory edge of SIZE as BB->aux.  The obstack must
+/* Allocate a memory edge of SIZE as E->aux.  The obstack must
    be first initialized by alloc_aux_for_edges.  */
 
-static void
+void
 alloc_aux_for_edge (edge e, int size)
 {
   /* Verify that aux field is clear.  */
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 187051)
+++ basic-block.h	(working copy)
@@ -256,7 +256,12 @@ enum bb_flags
      df_set_bb_dirty, but not cleared by df_analyze, so it can be used
      to test whether a block has been modified prior to a df_analyze
      call.  */
-  BB_MODIFIED = 1 << 12
+  BB_MODIFIED = 1 << 12,
+
+  /* Set on blocks that are in a transaction.  This is calculated on
+     demand, and is available after calling
+     compute_transaction_bits().  */
+  BB_IN_TRANSACTION = 1 << 13
 };
 
 /* Dummy flag for convenience in the hot/cold partitioning code.  */
@@ -788,6 +793,7 @@ extern basic_block alloc_block (void);
 extern void alloc_aux_for_blocks (int);
 extern void clear_aux_for_blocks (void);
 extern void free_aux_for_blocks (void);
+extern void alloc_aux_for_edge (edge, int);
 extern void alloc_aux_for_edges (int);
 extern void clear_aux_for_edges (void);
 extern void free_aux_for_edges (void);


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-07 23:05           ` Aldy Hernandez
@ 2012-05-08  0:12             ` Andrew MacLeod
  2012-05-08 19:50               ` Aldy Hernandez
  2012-05-15 14:27             ` PING: " Aldy Hernandez
  2012-05-16 12:54             ` Richard Guenther
  2 siblings, 1 reply; 22+ messages in thread
From: Andrew MacLeod @ 2012-05-08  0:12 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Richard Guenther, Boehm, Hans, gcc-patches, Torvald Riegel

On 05/07/2012 07:04 PM, Aldy Hernandez wrote:
>
>
> Andrew suggested the correct fix was to add a new pass that was able 
> to do some ?? flow sensitive data flow analysis ?? that could discover 
> these unreachable paths and insert the 0 phis at the start of the 
> blocks automatically.  But that seemed like far too much work, 
> considering how long it's taken me to get this far ;-).
>

Wait, no.  I didn't suggest writing an entire generic pass was the 
correct fix for you... I said that this was a technique that a generic 
pass which identified this sort of flow sensitive data flow info could 
use to work within the CFG: simply zeroing out uses of the load in PHI 
nodes on paths it has determined are not actually reachable by that 
value.  You were supposed to integrate just the bits of that technique 
that you required... just replace any uses of your LSM temporary with 0. 
I never intended you should write an entire pass...

Andrew


* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-08  0:12             ` Andrew MacLeod
@ 2012-05-08 19:50               ` Aldy Hernandez
  0 siblings, 0 replies; 22+ messages in thread
From: Aldy Hernandez @ 2012-05-08 19:50 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Richard Guenther, Boehm, Hans, gcc-patches, Torvald Riegel

On 05/07/12 19:11, Andrew MacLeod wrote:
> On 05/07/2012 07:04 PM, Aldy Hernandez wrote:
>>
>>
>> Andrew suggested the correct fix was to add a new pass that was able
>> to do some ?? flow sensitive data flow analysis ?? that could discover
>> these unreachable paths and insert the 0 phis at the start of the
>> blocks automatically. But that seemed like far too much work,
>> considering how long it's taken me to get this far ;-).
>>
>
> Wait, no. I didn't suggest writing an entire generic pass was the
> correct fix for you... I said that this was a technique that a generic
> pass which identified this sort of flow sensitive data flow info could
> use to work within the CFG. Simply zeroing out uses of the load in PHI
> nodes on paths it has determined are not actually reachable by that
> value, and you were supposed to integrate just the bits of that technique
> that you required.. just replace any uses of your LSM temporary with 0.
> I never intended you should write an entire pass...

Ah, well in that case, I've already done that.  Well I don't do it on 
any path, just in the loop header, and things propagate down from there 
quite nicely :).

Aldy


* PING: Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-07 23:05           ` Aldy Hernandez
  2012-05-08  0:12             ` Andrew MacLeod
@ 2012-05-15 14:27             ` Aldy Hernandez
  2012-05-16 12:54             ` Richard Guenther
  2 siblings, 0 replies; 22+ messages in thread
From: Aldy Hernandez @ 2012-05-15 14:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

PING.

> Hi. Sorry for the delay. There were various tricky hiccups along the way
> to bootstrappability and regression cleanliness...
>
> On 04/26/12 04:51, Richard Guenther wrote:
>> On Wed, 25 Apr 2012, Aldy Hernandez wrote:
>>
>>> On 04/25/12 06:45, Richard Guenther wrote:
>>>> On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>
>>>> wrote:
>>>>> On 04/13/12 03:46, Richard Guenther wrote:
>>>>>>
>>>>>> On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
>>>>>> wrote:
>
>>> Speak of loads, I am keeping the information as an additional bitmap in
>>> `memory_accesses', as ->refs_in_loop was set for stores as well, so I
>>> couldn't
>>> depend on it. Let me know if you have another idea.
>>
>> Hmm, refs_in_loop& ~all_refs_stored_in_loop, so instead of
>>
>> + bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
>> + loop->num);
>> + ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
>>
>> ref_read_in_loop_p = bitmap_bit_p (refs, ref->id)&& !bitmap_bit_p
>> (stores, ref->id);
>>
>> ? But maybe that doesn't work if a ref is both read and stored to.
>> Btw, rather than adding a bitmap to memory_accesses I'd rather add
>> a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
>> both into one) and a bitmap to struct mem_ref.
>
> I have done so as mark_ref_loaded(). I first tried to merge both into a
> generic mark_ref(), to avoid iterating twice, but I was concerned with
> the early loop exit of !bitmap_bit_p(ref->stored, loop->num), not
> knowing if I should exit the loop altogether in the merged case or what.
> In either case, the code was somewhat convoluted, so I opted for a
> separate clean function.
>
> In scanning the gimple stream for the loads I noticed that instances of
> two component refs (say foo.bar) were not pointer comparable, so I am
> asking the alias oracle with refs_may_alias_p. It also seemed to make
> more sense wrt memory chunks that may alias. What do you think?
>
> This patch is far more robust and fully tested. Two wrenches to
> complicate things though...
>
> First, while inserting the code that writes the _LSM_ temporaries back into
> the original variables, I found a handful of tests where the order of
> the writes mattered (aliased locations, inlined code, etc, IIRC). [This
> is in loops where multiple stores are moved.] So iterating and inserting
> on the exit edges caused the stores to be saved in reverse. I am now
> saving the edges and threading the code, so they happen in the correct
> order:
>
> 		if (foo_flag_lsm)
> 			foo = foo_lsm;
> 		if (foo2_flag_lsm)
> 			foo2 = foo2_lsm;
> 	actual_exit:
>
> This required some edge redirection so we didn't jump to the original
> actual exit prematurely when handling multiple moved stores. The
> dominator tree needed to be updated, and there is one instance where I
> couldn't find a clever way of saving the order without jumping through
> an inordinate amount of hoops. It seemed easier to call
> recompute_dominator.
>
> Second, avoiding the load, unless it happens in the loop like you have
> suggested, brought about some bogus unused warnings with the LSM
> temporaries. What happens is that compute_ssa() creates uses of the LSM
> temporaries that are not yet defined, so we end up with PHI nodes with a
> definition entry (D). Since we are sure the eventual uses will be gated
> by their corresponding flag, I created phi nodes with an initial value
> of 0 in the preheader of the loops in which they are used. This solved
> the problem. I also played with initializing to 0 at the entry block,
> but that seemed a bit too invasive.
>
> Andrew suggested the correct fix was to add a new pass that was able to
> do some ?? flow sensitive data flow analysis ?? that could discover
> these unreachable paths and insert the 0 phis at the start of the blocks
> automatically. But that seemed like far too much work, considering how
> long it's taken me to get this far ;-).
>
> Bootstrapped and fully tested with multi_threaded_model_p=true
> hard-coded. This is the revision I'd like to contribute, sans the
> hardcoded value.
>
> What do you think?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-07 23:05           ` Aldy Hernandez
  2012-05-08  0:12             ` Andrew MacLeod
  2012-05-15 14:27             ` PING: " Aldy Hernandez
@ 2012-05-16 12:54             ` Richard Guenther
  2012-05-22  2:32               ` Aldy Hernandez
  2 siblings, 1 reply; 22+ messages in thread
From: Richard Guenther @ 2012-05-16 12:54 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

On Mon, 7 May 2012, Aldy Hernandez wrote:

> Hi.  Sorry for the delay.  There were various tricky hiccups along the way to
> bootstrappability and regression cleanliness...
> 
> On 04/26/12 04:51, Richard Guenther wrote:
> > On Wed, 25 Apr 2012, Aldy Hernandez wrote:
> >
> > > On 04/25/12 06:45, Richard Guenther wrote:
> > > > On Tue, Apr 24, 2012 at 7:43 PM, Aldy Hernandez<aldyh@redhat.com>
> > > > wrote:
> > > > > On 04/13/12 03:46, Richard Guenther wrote:
> > > > > >
> > > > > > On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez<aldyh@redhat.com>
> > > > > > wrote:
> 
> > > Speaking of loads, I am keeping the information as an additional bitmap in
> > > `memory_accesses', as ->refs_in_loop was set for stores as well, so I
> > > couldn't
> > > depend on it.  Let me know if you have another idea.
> >
> > Hmm, refs_in_loop & ~all_refs_stored_in_loop, so instead of
> >
> > +      bitmap reads = VEC_index (bitmap, memory_accesses.reads_in_loop,
> > +                               loop->num);
> > +      ref_read_in_loop_p = bitmap_bit_p (reads, ref->id);
> >
> > ref_read_in_loop_p = bitmap_bit_p (refs, ref->id) && !bitmap_bit_p
> > (stores, ref->id);
> >
> > ?  But maybe that doesn't work if a ref is both read and stored to.
> > Btw, rather than adding a bitmap to memory_accesses I'd rather add
> > a mark_ref_loaded corresponding to mark_ref_stored (or rather merge
> > both into one) and a bitmap to struct mem_ref.
> 
> I have done so as mark_ref_loaded().  I first tried to merge both into a
> generic mark_ref(), to avoid iterating twice, but I was concerned with the
> early loop exit of !bitmap_bit_p(ref->stored, loop->num), not knowing if I
> should exit the loop altogether in the merged case or what.  In either case,
> the code was somewhat convoluted, so I opted for a separate clean function.
> 
> In scanning the gimple stream for the loads I noticed that instances of two
> component refs (say foo.bar) were not pointer comparable, so I am asking the
> alias oracle with refs_may_alias_p.  It also seemed to make more sense wrt
> memory chunks that may alias.  What do you think?

No, it asks for equal must aliases (it will replace them after all!).
You should use operand_equal_p to compare reference trees.
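
(For illustration, a minimal sketch of that comparison, reusing the names
from the hunk quoted below, where T is the RHS of the single assignment:

   if (operand_equal_p (t, ref->mem, 0))
     mark_ref_loaded (ref, loop);

operand_equal_p tests structural equality of the two reference trees,
i.e. the must-alias relation wanted here, rather than the may-alias
answer that refs_may_alias_p gives.)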

But why do all the following dance

+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree t = gimple_assign_rhs1 (stmt);
+      /* ?? For some reason two exact COMPONENT_REFs cannot be
+        compared with pointer equality, so ask the alias oracle.  */
+      if (ref->mem == t
+         || ((TREE_CODE (t) == SSA_NAME
+              || DECL_P (t)
+              || handled_component_p (t)
+              || TREE_CODE (t) == MEM_REF
+              || TREE_CODE (t) == TARGET_MEM_REF)
+             && refs_may_alias_p (t, ref->mem)))
+       mark_ref_loaded (ref, loop);
+    }

at all and not just

   if (is_stored)
     mark_ref_stored (ref, loop);
   else
     mark_ref_loaded (ref, loop);

and similarly, in the !mem case, mark the ref loaded unconditionally:

   mark_ref_loaded (ref, loop);
   if (gimple_vdef (stmt))
     mark_ref_stored (ref, loop);

> This patch is far more robust and fully tested.  Two wrenches to complicate
> things though...
> 
> First, while inserting the code writing the _LSM_ temporary writes into the
> original variables, I found a handful of tests where the order of the writes
> mattered (aliased locations, inlined code, etc, IIRC).  [This is in loops
> where multiple stores are moved.]  So iterating and inserting on the exit
> edges caused the stores to be saved in reverse.  I am now saving the edges and
> threading the code, so they happen in the correct order:
> 
> 		if (foo_flag_lsm)
> 			foo = foo_lsm;
> 		if (foo2_flag_lsm)
> 			foo2 = foo2_lsm;
> 	actual_exit:
> 
> This required some edge redirection so we didn't jump to the original actual
> exit prematurely when handling multiple moved stores.  The dominator tree
> needed to be updated, and there is one instance where I couldn't find a clever
> way of saving the order without jumping through an inordinate amount of hoops.
> It seemed easier to call recompute_dominator.
> 
> Second, avoiding the load, unless it happens in the loop like you have
> suggested, brought about some bogus unused warnings with the LSM temporaries.
> What happens is that compute_ssa() creates uses of the LSM temporaries that
> are not yet defined, so we end up with PHI nodes with a definition entry (D).
> Since we are sure the eventual uses will be gated by their corresponding flag,
> I created phi nodes with an initial value of 0 in the preheader of the loops
> in which they are used.  This solved the problem.  I also played with
> initializing to 0 at the entry block, but that seemed a bit too invasive.

Hm.  Btw, see also PR39612 which you could solve with that?

> Andrew suggested the correct fix was to add a new pass that was able to do
> some ?? flow sensitive data flow analysis ?? that could discover these
> unreachable paths and insert the 0 phis at the start of the blocks
> automatically.  But that seemed like far too much work, considering how long
> it's taken me to get this far ;-).
> 
> Bootstrapped and fully tested with multi_threaded_model_p=true hard-coded.
> This is the revision I'd like to contribute, sans the hardcoded value.
> 
> What do you think?

I notice

+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);

inserting code into the latch block is bad - the vectorizer will
be confused by non-empty latches.

You might at first (as you suggested yourself ;)) avoid trying to
tackle the load issue.  The patch is already complex enough, and
a candidate for splitting out is the BB_IN_TRANSACTION change
(hereby pre-approved).

Your ChangeLog mentions changes that are no longer necessary
(the target hook).

I didn't quite get the store order issue - we _do_ preserve store
ordering right now, don't we?  So how come introducing the flags
changes that?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-16 12:54             ` Richard Guenther
@ 2012-05-22  2:32               ` Aldy Hernandez
  2012-05-29 11:14                 ` Richard Guenther
  0 siblings, 1 reply; 22+ messages in thread
From: Aldy Hernandez @ 2012-05-22  2:32 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]

On 05/16/12 07:53, Richard Guenther wrote:
> On Mon, 7 May 2012, Aldy Hernandez wrote:

[Sorry for the delay; I was on vacation.]

I am forgoing the load avoidance code altogether to simplify things. 
Thanks.

> +  /* Emit the load code into the latch, so that we are sure it will
> +     be processed after all dependencies.  */
> +  latch_edge = loop_latch_edge (loop);
>
> inserting code into the latch block is bad - the vectorizer will
> be confused by non-empty latches.

The code as is on trunk inserts the load on the latch.  That's why I 
also inserted the supporting flag checking code there.  Do you want me 
to put the supporting code somewhere else?

> Your ChangeLog mentions changes that are no longer necessary
> (the target hook).

Whoops.  Fixed.

>
> I didn't quite get the store order issue - we _do_ preserve store
> ordering right now, do we?  So how come introducing the flags
> change that?

The current scheme on trunk works because it inserts one load with 
gsi_insert_on_edge() and subsequent ones get appended correctly, whereas 
my patch has to split the edge (which happens at the top of the block), 
so subsequent code inserts happen in reverse order (at the top).  If I 
remember correctly, that is... I can look again and report if it's still 
unclear.

New patch attached.  Tested on x86-64 Linux.  No regressions.

Thanks.
Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 12258 bytes --]

	* cfg.c (alloc_aux_for_edge): Fix comment.
	(alloc_aux_for_edge): Remove static.
	* basic-block.h (alloc_aux_for_edge): Protoize.
	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
	(execute_sm_if_changed_flag): New.
	(execute_sm_if_changed_flag_set): New.
	(execute_sm): Do not generate data races unless requested.
	(tree_ssa_lim_initialize): Call alloc_aux_for_edges.
	(tree_ssa_lim_finalize): Call free_aux_for_edges.

Index: testsuite/gcc.dg/tm/reg-promotion.c
===================================================================
--- testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
+++ testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data>0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  __transaction_atomic {
+    for (p = q; p; p = p->next)
+      if (p->data > 0)
+	count++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-1.c
===================================================================
--- testsuite/gcc.dg/pr52558-1.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data > 0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  for (p = q; p; p = p->next)
+    if (p->data > 0)
+      count++;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-2.c
===================================================================
--- testsuite/gcc.dg/pr52558-2.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that g_2 is not written to unless !g_1.  */
+
+int g_1 = 1;
+int g_2 = 0;
+
+int func_1(void)
+{
+ int l;
+ for (l = 0; l < 1234; l++)
+ {
+   if (g_1)
+     return l;
+   else
+     g_2 = 0;
+ }
+ return 999;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM.*g_2_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 187729)
+++ basic-block.h	(working copy)
@@ -802,6 +802,7 @@ extern basic_block alloc_block (void);
 extern void alloc_aux_for_blocks (int);
 extern void clear_aux_for_blocks (void);
 extern void free_aux_for_blocks (void);
+extern void alloc_aux_for_edge (edge, int);
 extern void alloc_aux_for_edges (int);
 extern void clear_aux_for_edges (void);
 extern void free_aux_for_edges (void);
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 187729)
+++ tree-ssa-loop-im.c	(working copy)
@@ -52,7 +52,7 @@ along with GCC; see the file COPYING3.
 	 }
      }
 
-   Where COND and INV are is invariants, but evaluating INV may trap or be
+   Where COND and INV are invariants, but evaluating INV may trap or be
    invalid from some other reason if !COND.  This may be transformed to
 
    if (cond)
@@ -1626,6 +1626,7 @@ gather_mem_refs_stmt (struct loop *loop,
 	  fprintf (dump_file, "\n");
 	}
     }
+
   if (is_stored)
     mark_ref_stored (ref, loop);
 
@@ -1956,6 +1957,173 @@ get_lsm_tmp_name (tree ref, unsigned n)
   return lsm_tmp_name;
 }
 
+struct prev_flag_edges {
+  /* Edge to insert new flag comparison code.  */
+  edge append_cond_position;
+
+  /* Edge for fall through from previous flag comparison.  */
+  edge last_cond_fallthru;
+};
+
+/* Helper function for execute_sm.  Emit code to store TMP_VAR into
+   MEM along edge EX.
+
+   The store is only done if MEM has changed.  We do this so no
+   changes to MEM occur on code paths that did not originally store
+   into it.
+
+   The common case for execute_sm will transform:
+
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         MEM = TMP_VAR;
+     }
+
+   into:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     MEM = lsm;
+
+  This function will generate:
+
+     lsm = MEM;
+
+     lsm_flag = false;
+     ...
+     for (...) {
+       if (foo)
+         stuff;
+       else {
+         lsm = TMP_VAR;
+         lsm_flag = true;
+       }
+     }
+     if (lsm_flag)	<--
+       MEM = lsm;	<--
+*/
+
+static void
+execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag)
+{
+  basic_block new_bb, then_bb, old_dest;
+  bool loop_has_only_one_exit;
+  edge then_old_edge, orig_ex = ex;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+  struct prev_flag_edges *prev_edges = (struct prev_flag_edges *) ex->aux;
+
+  /* ?? Insert store after previous store if applicable.  See note
+     below.  */
+  if (prev_edges)
+    ex = prev_edges->append_cond_position;
+
+  loop_has_only_one_exit = single_pred_p (ex->dest);
+
+  if (loop_has_only_one_exit)
+    ex = split_block_after_labels (ex->dest);
+
+  old_dest = ex->dest;
+  new_bb = split_edge (ex);
+  then_bb = create_empty_bb (new_bb);
+  if (current_loops && new_bb->loop_father)
+    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+  gsi = gsi_start_bb (new_bb);
+  stmt = gimple_build_cond (NE_EXPR, flag, boolean_false_node,
+			    NULL_TREE, NULL_TREE);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  gsi = gsi_start_bb (then_bb);
+  /* Insert actual store.  */
+  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+
+  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+  if (prev_edges)
+    {
+      basic_block prevbb = prev_edges->last_cond_fallthru->src;
+      redirect_edge_succ (prev_edges->last_cond_fallthru, new_bb);
+      set_immediate_dominator (CDI_DOMINATORS, new_bb, prevbb);
+      set_immediate_dominator (CDI_DOMINATORS, old_dest,
+			       recompute_dominator (CDI_DOMINATORS, old_dest));
+    }
+
+  /* ?? Because stores may alias, they must happen in the exact
+     sequence they originally happened.  Save the position right after
+     the (_lsm) store we just created so we can continue appending after
+     it and maintain the original order.  */
+  {
+    struct prev_flag_edges *p;
+
+    if (orig_ex->aux)
+      orig_ex->aux = NULL;
+    alloc_aux_for_edge (orig_ex, sizeof (struct prev_flag_edges));
+    p = (struct prev_flag_edges *) orig_ex->aux;
+    p->append_cond_position = then_old_edge;
+    p->last_cond_fallthru = find_edge (new_bb, old_dest);
+    orig_ex->aux = (void *) p;
+  }
+
+  if (!loop_has_only_one_exit)
+    for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple phi = gsi_stmt (gsi);
+	unsigned i;
+
+	for (i = 0; i < gimple_phi_num_args (phi); i++)
+	  if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, i);
+	      add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+	      update_stmt (phi);
+	    }
+      }
+  /* Remove the original fall through edge.  This was the
+     single_succ_edge (new_bb).  */
+  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+}
+
+/* Helper function for execute_sm.  On every location where REF is
+   set, set an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
+  unsigned i;
+  mem_ref_loc_p loc;
+  tree flag;
+  VEC (mem_ref_loc_p, heap) *locs = NULL;
+  char *str = get_lsm_tmp_name (ref->mem, ~0);
+
+  lsm_tmp_name_add ("_flag");
+  flag = make_rename_temp (boolean_type_node, str);
+  get_all_locs_in_loop (loop, ref, &locs);
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_for_stmt (loc->stmt);
+      stmt = gimple_build_assign (flag, boolean_true_node);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+    }
+  VEC_free (mem_ref_loc_p, heap, locs);
+  return flag;
+}
+
 /* Executes store motion of memory reference REF from LOOP.
    Exits from the LOOP are stored in EXITS.  The initialization of the
    temporary variable is put to the preheader of the loop, and assignments
@@ -1964,12 +2132,16 @@ get_lsm_tmp_name (tree ref, unsigned n)
 static void
 execute_sm (struct loop *loop, VEC (edge, heap) *exits, mem_ref_p ref)
 {
-  tree tmp_var;
+  tree tmp_var, store_flag;
   unsigned i;
-  gimple load, store;
+  gimple load;
   struct fmt_data fmt_data;
-  edge ex;
+  edge ex, latch_edge;
   struct lim_aux_data *lim_data;
+  bool multi_threaded_model_p = false;
+  /* ?? FIXME TESTING TESTING ?? */
+  multi_threaded_model_p=true;
+  /* ?? FIXME TESTING TESTING ?? */
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1985,23 +2157,47 @@ execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
+  if ((flag_tm && loop_preheader_edge (loop)->src->flags & BB_IN_TRANSACTION)
+      || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+    multi_threaded_model_p = true;
+
+  if (multi_threaded_model_p)
+    store_flag = execute_sm_if_changed_flag_set (loop, ref);
+
   rewrite_mem_refs (loop, ref, tmp_var);
 
-  /* Emit the load & stores.  */
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+
+  /* FIXME/TODO: For the multi-threaded variant, we could avoid this
+     load altogether, since the store is predicated by a flag.  We
+     could, do the load only if it was originally in the loop.  */
   load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
   lim_data = init_lim_data (load);
   lim_data->max_loop = loop;
   lim_data->tgt_loop = loop;
+  gsi_insert_on_edge (latch_edge, load);
 
-  /* Put this into the latch, so that we are sure it will be processed after
-     all dependencies.  */
-  gsi_insert_on_edge (loop_latch_edge (loop), load);
-
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+  if (multi_threaded_model_p)
     {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+      load = gimple_build_assign (store_flag, boolean_false_node);
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
     }
+
+  /* Sink the store to every exit from the loop.  */
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    if (!multi_threaded_model_p)
+      {
+	gimple store;
+	store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	gsi_insert_on_edge (ex, store);
+      }
+    else
+      execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
 /* Hoists memory references MEM_REFS out of LOOP.  EXITS is the list of exit
@@ -2410,6 +2606,8 @@ tree_ssa_lim_initialize (void)
 
   if (flag_tm)
     compute_transaction_bits ();
+
+  alloc_aux_for_edges (0);
 }
 
 /* Cleans up after the invariant motion pass.  */
@@ -2421,6 +2619,8 @@ tree_ssa_lim_finalize (void)
   unsigned i;
   bitmap b;
 
+  free_aux_for_edges ();
+
   FOR_EACH_BB (bb)
     SET_ALWAYS_EXECUTED_IN (bb, NULL);
 
Index: cfg.c
===================================================================
--- cfg.c	(revision 187729)
+++ cfg.c	(working copy)
@@ -814,10 +814,10 @@ free_aux_for_blocks (void)
   clear_aux_for_blocks ();
 }
 
-/* Allocate a memory edge of SIZE as BB->aux.  The obstack must
+/* Allocate a memory edge of SIZE as E->aux.  The obstack must
    be first initialized by alloc_aux_for_edges.  */
 
-static void
+void
 alloc_aux_for_edge (edge e, int size)
 {
   /* Verify that aux field is clear.  */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-22  2:32               ` Aldy Hernandez
@ 2012-05-29 11:14                 ` Richard Guenther
  2012-05-31 19:53                   ` Aldy Hernandez
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Guenther @ 2012-05-29 11:14 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

On Mon, 21 May 2012, Aldy Hernandez wrote:

> On 05/16/12 07:53, Richard Guenther wrote:
> > On Mon, 7 May 2012, Aldy Hernandez wrote:
> 
> [Sorry for the delay; I was on vacation.]
> 
> I am forgoing the load avoidance code altogether to simplify things. Thanks.
> 
> > +  /* Emit the load code into the latch, so that we are sure it will
> > +     be processed after all dependencies.  */
> > +  latch_edge = loop_latch_edge (loop);
> >
> > inserting code into the latch block is bad - the vectorizer will
> > be confused by non-empty latches.
> 
> The code as is on trunk inserts the load on the latch.

Does it?  Ok ...

>  That's why I also
> inserted the supporting flag checking code there.  Do you want me to put the
> supporting code somewhere else?

Technically there isn't a good other place that does not add a
partial redundancy.

> > Your ChangeLog mentions changes that are no longer necessary
> > (the target hook).
> 
> Whoops.  Fixed.
> 
> >
> > I didn't quite get the store order issue - we _do_ preserve store
> > ordering right now, do we?  So how come introducing the flags
> > change that?
> 
> The current scheme on trunk works because it inserts one load with
> gsi_insert_on_edge() and subsequent ones get appended correctly, whereas my
> patch has to split the edge (which happens at the top of the block), so
> subsequent code inserts happen in reverse order (at the top).  If I remember
> correctly, that is... I can look again and report if it's still unclear.

Hmm, the old code (well, and the new one ...) walks stores to move
by walking a bitmap.  I suppose we rely on computing uids in the
original program order here then.

(flag_tm && loop_preheader_edge (loop)->src->flags & BB_IN_TRANSACTION)

can you encapsulate this into a predicate?  Like block_in_transaction ()
that also checks flag_tm?
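
(Roughly, a sketch of the predicate being asked for; compare the gimple.h
hunks later in the thread for what actually went in:

   static inline bool
   block_in_transaction (basic_block bb)
   {
     return flag_tm && (bb->flags & BB_IN_TRANSACTION);
   }

so execute_sm can test block_in_transaction (loop_preheader_edge
(loop)->src) instead of open-coding the flag checks.)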

+  /* ?? FIXME TESTING TESTING ?? */
+  multi_threaded_model_p=true;
+  /* ?? FIXME TESTING TESTING ?? */

that of course needs fixing ;)  (and re-testing)


> New patch attached.  Tested on x86-64 Linux.  No regressions.

Ok with the new predicate in basic-block.h and re-testing without the
fixme above.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-29 11:14                 ` Richard Guenther
@ 2012-05-31 19:53                   ` Aldy Hernandez
  2012-06-01  6:22                     ` Tobias Burnus
  0 siblings, 1 reply; 22+ messages in thread
From: Aldy Hernandez @ 2012-05-31 19:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew MacLeod, Boehm, Hans, gcc-patches, Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 958 bytes --]

On 05/29/12 06:13, Richard Guenther wrote:
> On Mon, 21 May 2012, Aldy Hernandez wrote:
>
>> On 05/16/12 07:53, Richard Guenther wrote:
>>> On Mon, 7 May 2012, Aldy Hernandez wrote:

>
> (flag_tm && loop_preheader_edge (loop)->src->flags & BB_IN_TRANSACTION)
>
> can you encapsulate this into a predicate?  Like block_in_transaction ()
> that also checks flag_tm?

Done... whoops, I forgot to check for flag_tm.  I will move this into the 
predicate after I do another round of testing.  I missed this bit, and 
I've just committed.

>
> +  /* ?? FIXME TESTING TESTING ?? */
> +  multi_threaded_model_p=true;
> +  /* ?? FIXME TESTING TESTING ?? */
>
> that of course needs fixing ;)  (and re-testing)

This was here on purpose, so you'd see how I was testing.  I have 
committed the attached patch, but not before testing with _and_ without the 
above FIXME.

Thanks so much for the review.  I will follow up with the flag_tm 
abstraction.

Closing the PR...

Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 12977 bytes --]

	PR tree-optimization/52558
	* cfg.c (alloc_aux_for_edge): Fix comment.
	(alloc_aux_for_edge): Remove static.
	* basic-block.h (alloc_aux_for_edge): Protoize.
	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
	(execute_sm_if_changed_flag): New.
	(execute_sm_if_changed_flag_set): New.
	(execute_sm): Do not generate data races unless requested.
	(tree_ssa_lim_initialize): Call alloc_aux_for_edges.
	(tree_ssa_lim_finalize): Call free_aux_for_edges.
	* gimple.h (block_in_transaction): New.
	(gimple_in_transaction): Use block_in_transaction.

Index: testsuite/gcc.dg/pr52558-1.c
===================================================================
--- testsuite/gcc.dg/pr52558-1.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data > 0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  for (p = q; p; p = p->next)
+    if (p->data > 0)
+      count++;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-2.c
===================================================================
--- testsuite/gcc.dg/pr52558-2.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that g_2 is not written to unless !g_1.  */
+
+int g_1 = 1;
+int g_2 = 0;
+
+int func_1(void)
+{
+ int l;
+ for (l = 0; l < 1234; l++)
+ {
+   if (g_1)
+     return l;
+   else
+     g_2 = 0;
+ }
+ return 999;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM.*g_2_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/tm/reg-promotion.c
===================================================================
--- testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
+++ testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data>0.  */
+
+int count;
+
+struct obj {
+    int data;
+    struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  __transaction_atomic {
+    for (p = q; p; p = p->next)
+      if (p->data > 0)
+	count++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 188080)
+++ tree-ssa-loop-im.c	(working copy)
@@ -52,7 +52,7 @@ along with GCC; see the file COPYING3.
 	 }
      }
 
-   Where COND and INV are is invariants, but evaluating INV may trap or be
+   Where COND and INV are invariants, but evaluating INV may trap or be
    invalid from some other reason if !COND.  This may be transformed to
 
    if (cond)
@@ -1626,6 +1626,7 @@ gather_mem_refs_stmt (struct loop *loop,
 	  fprintf (dump_file, "\n");
 	}
     }
+
   if (is_stored)
     mark_ref_stored (ref, loop);
 
@@ -1956,6 +1957,173 @@ get_lsm_tmp_name (tree ref, unsigned n)
   return lsm_tmp_name;
 }
 
+struct prev_flag_edges {
+  /* Edge to insert new flag comparison code.  */
+  edge append_cond_position;
+
+  /* Edge for fall through from previous flag comparison.  */
+  edge last_cond_fallthru;
+};
+
+/* Helper function for execute_sm.  Emit code to store TMP_VAR into
+   MEM along edge EX.
+
+   The store is only done if MEM has changed.  We do this so no
+   changes to MEM occur on code paths that did not originally store
+   into it.
+
+   The common case for execute_sm will transform:
+
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         MEM = TMP_VAR;
+     }
+
+   into:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     MEM = lsm;
+
+  This function will generate:
+
+     lsm = MEM;
+
+     lsm_flag = false;
+     ...
+     for (...) {
+       if (foo)
+         stuff;
+       else {
+         lsm = TMP_VAR;
+         lsm_flag = true;
+       }
+     }
+     if (lsm_flag)	<--
+       MEM = lsm;	<--
+*/
+
+static void
+execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag)
+{
+  basic_block new_bb, then_bb, old_dest;
+  bool loop_has_only_one_exit;
+  edge then_old_edge, orig_ex = ex;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+  struct prev_flag_edges *prev_edges = (struct prev_flag_edges *) ex->aux;
+
+  /* ?? Insert store after previous store if applicable.  See note
+     below.  */
+  if (prev_edges)
+    ex = prev_edges->append_cond_position;
+
+  loop_has_only_one_exit = single_pred_p (ex->dest);
+
+  if (loop_has_only_one_exit)
+    ex = split_block_after_labels (ex->dest);
+
+  old_dest = ex->dest;
+  new_bb = split_edge (ex);
+  then_bb = create_empty_bb (new_bb);
+  if (current_loops && new_bb->loop_father)
+    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+  gsi = gsi_start_bb (new_bb);
+  stmt = gimple_build_cond (NE_EXPR, flag, boolean_false_node,
+			    NULL_TREE, NULL_TREE);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  gsi = gsi_start_bb (then_bb);
+  /* Insert actual store.  */
+  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+
+  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+  if (prev_edges)
+    {
+      basic_block prevbb = prev_edges->last_cond_fallthru->src;
+      redirect_edge_succ (prev_edges->last_cond_fallthru, new_bb);
+      set_immediate_dominator (CDI_DOMINATORS, new_bb, prevbb);
+      set_immediate_dominator (CDI_DOMINATORS, old_dest,
+			       recompute_dominator (CDI_DOMINATORS, old_dest));
+    }
+
+  /* ?? Because stores may alias, they must happen in the exact
+     sequence they originally happened.  Save the position right after
+     the (_lsm) store we just created so we can continue appending after
+     it and maintain the original order.  */
+  {
+    struct prev_flag_edges *p;
+
+    if (orig_ex->aux)
+      orig_ex->aux = NULL;
+    alloc_aux_for_edge (orig_ex, sizeof (struct prev_flag_edges));
+    p = (struct prev_flag_edges *) orig_ex->aux;
+    p->append_cond_position = then_old_edge;
+    p->last_cond_fallthru = find_edge (new_bb, old_dest);
+    orig_ex->aux = (void *) p;
+  }
+
+  if (!loop_has_only_one_exit)
+    for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple phi = gsi_stmt (gsi);
+	unsigned i;
+
+	for (i = 0; i < gimple_phi_num_args (phi); i++)
+	  if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, i);
+	      add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+	      update_stmt (phi);
+	    }
+      }
+  /* Remove the original fall through edge.  This was the
+     single_succ_edge (new_bb).  */
+  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+}
+
+/* Helper function for execute_sm.  On every location where REF is
+   set, set an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
+  unsigned i;
+  mem_ref_loc_p loc;
+  tree flag;
+  VEC (mem_ref_loc_p, heap) *locs = NULL;
+  char *str = get_lsm_tmp_name (ref->mem, ~0);
+
+  lsm_tmp_name_add ("_flag");
+  flag = make_rename_temp (boolean_type_node, str);
+  get_all_locs_in_loop (loop, ref, &locs);
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_for_stmt (loc->stmt);
+      stmt = gimple_build_assign (flag, boolean_true_node);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+    }
+  VEC_free (mem_ref_loc_p, heap, locs);
+  return flag;
+}
+
 /* Executes store motion of memory reference REF from LOOP.
    Exits from the LOOP are stored in EXITS.  The initialization of the
    temporary variable is put to the preheader of the loop, and assignments
@@ -1964,12 +2132,13 @@ get_lsm_tmp_name (tree ref, unsigned n)
 static void
 execute_sm (struct loop *loop, VEC (edge, heap) *exits, mem_ref_p ref)
 {
-  tree tmp_var;
+  tree tmp_var, store_flag;
   unsigned i;
-  gimple load, store;
+  gimple load;
   struct fmt_data fmt_data;
-  edge ex;
+  edge ex, latch_edge;
   struct lim_aux_data *lim_data;
+  bool multi_threaded_model_p = false;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1985,23 +2154,47 @@ execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
+  if ((flag_tm && block_in_transaction (loop_preheader_edge (loop)->src))
+      || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+    multi_threaded_model_p = true;
+
+  if (multi_threaded_model_p)
+    store_flag = execute_sm_if_changed_flag_set (loop, ref);
+
   rewrite_mem_refs (loop, ref, tmp_var);
 
-  /* Emit the load & stores.  */
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+
+  /* FIXME/TODO: For the multi-threaded variant, we could avoid this
+     load altogether, since the store is predicated by a flag.  We
+     could, do the load only if it was originally in the loop.  */
   load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
   lim_data = init_lim_data (load);
   lim_data->max_loop = loop;
   lim_data->tgt_loop = loop;
+  gsi_insert_on_edge (latch_edge, load);
 
-  /* Put this into the latch, so that we are sure it will be processed after
-     all dependencies.  */
-  gsi_insert_on_edge (loop_latch_edge (loop), load);
-
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+  if (multi_threaded_model_p)
     {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+      load = gimple_build_assign (store_flag, boolean_false_node);
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
     }
+
+  /* Sink the store to every exit from the loop.  */
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    if (!multi_threaded_model_p)
+      {
+	gimple store;
+	store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	gsi_insert_on_edge (ex, store);
+      }
+    else
+      execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
 /* Hoists memory references MEM_REFS out of LOOP.  EXITS is the list of exit
@@ -2410,6 +2603,8 @@ tree_ssa_lim_initialize (void)
 
   if (flag_tm)
     compute_transaction_bits ();
+
+  alloc_aux_for_edges (0);
 }
 
 /* Cleans up after the invariant motion pass.  */
@@ -2421,6 +2616,8 @@ tree_ssa_lim_finalize (void)
   unsigned i;
   bitmap b;
 
+  free_aux_for_edges ();
+
   FOR_EACH_BB (bb)
     SET_ALWAYS_EXECUTED_IN (bb, NULL);
 
Index: gimple.h
===================================================================
--- gimple.h	(revision 188080)
+++ gimple.h	(working copy)
@@ -1588,12 +1588,20 @@ gimple_set_has_volatile_ops (gimple stmt
     stmt->gsbase.has_volatile_ops = (unsigned) volatilep;
 }
 
+/* Return true if BB is in a transaction.  */
+
+static inline bool
+block_in_transaction (basic_block bb)
+{
+  return bb->flags & BB_IN_TRANSACTION;
+}
+
 /* Return true if STMT is in a transaction.  */
 
 static inline bool
 gimple_in_transaction (gimple stmt)
 {
-  return gimple_bb (stmt)->flags & BB_IN_TRANSACTION;
+  return block_in_transaction (gimple_bb (stmt));
 }
 
 /* Return true if statement STMT may access memory.  */
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 188080)
+++ basic-block.h	(working copy)
@@ -802,6 +802,7 @@ extern basic_block alloc_block (void);
 extern void alloc_aux_for_blocks (int);
 extern void clear_aux_for_blocks (void);
 extern void free_aux_for_blocks (void);
+extern void alloc_aux_for_edge (edge, int);
 extern void alloc_aux_for_edges (int);
 extern void clear_aux_for_edges (void);
 extern void free_aux_for_edges (void);
Index: cfg.c
===================================================================
--- cfg.c	(revision 188080)
+++ cfg.c	(working copy)
@@ -814,10 +814,10 @@ free_aux_for_blocks (void)
   clear_aux_for_blocks ();
 }
 
-/* Allocate a memory edge of SIZE as BB->aux.  The obstack must
+/* Allocate a memory edge of SIZE as E->aux.  The obstack must
    be first initialized by alloc_aux_for_edges.  */
 
-static void
+void
 alloc_aux_for_edge (edge e, int size)
 {
   /* Verify that aux field is clear.  */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-05-31 19:53                   ` Aldy Hernandez
@ 2012-06-01  6:22                     ` Tobias Burnus
  2012-06-01 15:11                       ` Aldy Hernandez
  0 siblings, 1 reply; 22+ messages in thread
From: Tobias Burnus @ 2012-06-01  6:22 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

Aldy Hernandez wrote:
> 	PR tree-optimization/52558
> 	* cfg.c (alloc_aux_for_edge): Fix comment.
> 	(alloc_aux_for_edge): Remove static.
> 	* basic-block.h (alloc_aux_for_edge): Protoize.
> 	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
> 	(execute_sm_if_changed_flag): New.
> 	(execute_sm_if_changed_flag_set): New.
> 	(execute_sm): Do not generate data races unless requested.
> 	(tree_ssa_lim_initialize): Call alloc_aux_for_edges.
> 	(tree_ssa_lim_finalize): Call free_aux_for_edges.
> 	* gimple.h (block_in_transaction): New.
> 	(gimple_in_transaction): Use block_in_transaction.

> Index: gimple.h
> ===================================================================
> --- gimple.h	(revision 188080)
> +++ gimple.h	(working copy)

> +/* Return true if BB is in a transaction.  */
> +
> +static inline bool
> +block_in_transaction (basic_block bb)
> +{
> +  return bb->flags & BB_IN_TRANSACTION;

The compiler does not seem to like the latter very much and thus warns 
for every file which includes gimple.h:

gcc/gimple.h: In function 'block_in_transaction':
gcc/gimple.h:1596:20: warning: overflow in implicit constant conversion 
[-Woverflow]
    return bb->flags & BB_IN_TRANSACTION;
                     ^

Tobias

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-06-01  6:22                     ` Tobias Burnus
@ 2012-06-01 15:11                       ` Aldy Hernandez
  2012-06-01 16:03                         ` Aldy Hernandez
  0 siblings, 1 reply; 22+ messages in thread
From: Aldy Hernandez @ 2012-06-01 15:11 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches

On 06/01/12 01:22, Tobias Burnus wrote:

> gcc/gimple.h: In function 'block_in_transaction':
> gcc/gimple.h:1596:20: warning: overflow in implicit constant conversion
> [-Woverflow]
> return bb->flags & BB_IN_TRANSACTION;
> ^

Is this still the case with the code currently in mainline:

   return flag_tm && bb->flags & BB_IN_TRANSACTION;

??

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-06-01 15:11                       ` Aldy Hernandez
@ 2012-06-01 16:03                         ` Aldy Hernandez
  2012-06-15 12:05                           ` Eric Botcazou
  0 siblings, 1 reply; 22+ messages in thread
From: Aldy Hernandez @ 2012-06-01 16:03 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches

On 06/01/12 10:11, Aldy Hernandez wrote:
> On 06/01/12 01:22, Tobias Burnus wrote:
>
>> gcc/gimple.h: In function 'block_in_transaction':
>> gcc/gimple.h:1596:20: warning: overflow in implicit constant conversion
>> [-Woverflow]
>> return bb->flags & BB_IN_TRANSACTION;
>> ^
>
> Is this still the case with the code currently in mainline:
>
> return flag_tm && bb->flags & BB_IN_TRANSACTION;
>
> ??

Whoops, I forgot to commit that last patch.  Check now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-06-01 16:03                         ` Aldy Hernandez
@ 2012-06-15 12:05                           ` Eric Botcazou
  2012-06-15 12:40                             ` Aldy Hernandez
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Botcazou @ 2012-06-15 12:05 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches, Tobias Burnus

> Whoops, I forgot to commit that last patch.  Check now.

The warning is there on the 4.7 branch now.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PR tree-optimization/52558]: RFC: questions on store data race
  2012-06-15 12:05                           ` Eric Botcazou
@ 2012-06-15 12:40                             ` Aldy Hernandez
  0 siblings, 0 replies; 22+ messages in thread
From: Aldy Hernandez @ 2012-06-15 12:40 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Tobias Burnus

[-- Attachment #1: Type: text/plain, Size: 292 bytes --]

On 06/15/12 06:40, Eric Botcazou wrote:
>> Whoops, I forgot to commit that last patch.  Check now.
>
> The warning is there on the 4.7 branch now.
>

Arghhh, that's the second time.  I wonder why the warning doesn't show 
up on my bootstraps.

Anyway, committed the attached patch to branch.

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 1159 bytes --]

	Backport from mainline:
	2012-05-31  Aldy Hernandez  <aldyh@redhat.com>
	* tree-ssa-loop-im.c (execute_sm): Do not check flag_tm.
	* gimple.h (block_in_transaction): Check for flag_tm.

Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 188631)
+++ tree-ssa-loop-im.c	(working copy)
@@ -2154,7 +2154,7 @@ execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
-  if ((flag_tm && block_in_transaction (loop_preheader_edge (loop)->src))
+  if (block_in_transaction (loop_preheader_edge (loop)->src)
       || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
     multi_threaded_model_p = true;
 
Index: gimple.h
===================================================================
--- gimple.h	(revision 188631)
+++ gimple.h	(working copy)
@@ -1587,7 +1587,7 @@ gimple_set_has_volatile_ops (gimple stmt
 static inline bool
 block_in_transaction (basic_block bb)
 {
-  return bb->flags & BB_IN_TRANSACTION;
+  return flag_tm && bb->flags & BB_IN_TRANSACTION;
 }
 
 /* Return true if STMT is in a transaction.  */

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-06-15 12:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-12 22:12 [PR tree-optimization/52558]: RFC: questions on store data race Aldy Hernandez
2012-04-13  8:46 ` Richard Guenther
2012-04-13 23:32   ` Boehm, Hans
2012-04-24 17:43   ` Aldy Hernandez
2012-04-25 11:45     ` Richard Guenther
2012-04-25 22:55       ` Aldy Hernandez
2012-04-26  9:52         ` Richard Guenther
2012-05-07 23:05           ` Aldy Hernandez
2012-05-08  0:12             ` Andrew MacLeod
2012-05-08 19:50               ` Aldy Hernandez
2012-05-15 14:27             ` PING: " Aldy Hernandez
2012-05-16 12:54             ` Richard Guenther
2012-05-22  2:32               ` Aldy Hernandez
2012-05-29 11:14                 ` Richard Guenther
2012-05-31 19:53                   ` Aldy Hernandez
2012-06-01  6:22                     ` Tobias Burnus
2012-06-01 15:11                       ` Aldy Hernandez
2012-06-01 16:03                         ` Aldy Hernandez
2012-06-15 12:05                           ` Eric Botcazou
2012-06-15 12:40                             ` Aldy Hernandez
2012-04-13 23:23 ` Boehm, Hans
2012-04-17 15:04   ` Aldy Hernandez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).