public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [C11-atomic] test invalid hoist across and acquire load
  2012-03-21 18:26 [C11-atomic] test invalid hoist across and acquire load Aldy Hernandez
@ 2012-03-21 18:08 ` Andrew MacLeod
  2012-03-23 14:39   ` Aldy Hernandez
  2012-03-21 18:17 ` Andrew MacLeod
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew MacLeod @ 2012-03-21 18:08 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Torvald Riegel, gcc-patches

On 03/21/2012 01:35 PM, Aldy Hernandez wrote:
> In the test below, we cannot cache either [x] or [y] neither before 
> the load of flag1 nor the load of flag2.  This is because the 
> corresponding store/release can flush a different value of x or y:
>
> +  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
> +    i = x + y;
> +
> +  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
> +    a = 10;
> +  j = x + y;
>

Actually, does it need to be that complicated?

can't you simply have the "other_thread" process monotonically increase 
x by 1 every cycle?

then if the load is hoisted and commoned,  
simulate_thread_final_verify() can simply check that if  i == j,  it 
knows that x was loaded as a common value and reused when calculating 
j.   with the other thread increasing x eveyr sycle, they should never 
be the same value.

Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C11-atomic] test invalid hoist across and acquire load
  2012-03-21 18:26 [C11-atomic] test invalid hoist across and acquire load Aldy Hernandez
  2012-03-21 18:08 ` Andrew MacLeod
@ 2012-03-21 18:17 ` Andrew MacLeod
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew MacLeod @ 2012-03-21 18:17 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Torvald Riegel, gcc-patches

On 03/21/2012 01:35 PM, Aldy Hernandez wrote:
>
> The pass at fault here is the combine stack adjustment RTL pass.  I 
> have not looked into why this is happening, but I wanted to get this 
> test into the branch lest we forget about it.
>
> Is this OK for the branch?  Is my understanding correct?

Fine for the C11-atomic branch..  we'll accumulate all the failing tests 
there for now... and then address them when we go memory-model 
compliance hunting...

Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [C11-atomic] test invalid hoist across and acquire load
@ 2012-03-21 18:26 Aldy Hernandez
  2012-03-21 18:08 ` Andrew MacLeod
  2012-03-21 18:17 ` Andrew MacLeod
  0 siblings, 2 replies; 6+ messages in thread
From: Aldy Hernandez @ 2012-03-21 18:26 UTC (permalink / raw)
  To: Andrew MacLeod, Torvald Riegel, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1333 bytes --]

In the test below, we cannot cache either [x] or [y] neither before the 
load of flag1 nor the load of flag2.  This is because the corresponding 
store/release can flush a different value of x or y:

+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+    i = x + y;
+
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+    a = 10;
+  j = x + y;

For example, on x86-64, we are hoisting "x" and "y" before the load of 
flag2:

         movl    flag1(%rip), %eax
         movl    x(%rip), %edx		<-- hoist of X
         testl   %eax, %eax
         movl    y(%rip), %eax		<-- hoist of Y
         je      .L2
         leal    (%edx,%eax), %ecx
         movl    %ecx, i(%rip)
.L2:
         movl    flag2(%rip), %ecx
         testl   %ecx, %ecx
         je      .L3
         movl    $10, a(%rip)
.L3:
         addl    %edx, %eax		<-- x/y may have changed by the
					    acquire of flag2.
         movl    %eax, j(%rip)
         ret

(For that matter, we are also hoisting "x" before the actual test of 
flag1 as well, but I believe this is allowed since flag1 has already 
been loaded.)

The pass at fault here is the combine stack adjustment RTL pass.  I have 
not looked into why this is happening, but I wanted to get this test 
into the branch lest we forget about it.

Is this OK for the branch?  Is my understanding correct?

Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 2677 bytes --]


Index: atomic-hoist-1.c
===================================================================
--- atomic-hoist-1.c	(revision 0)
+++ atomic-hoist-1.c	(revision 0)
@@ -0,0 +1,96 @@
+/* { dg-do link } */
+/* { dg-require-effective-target sync_int_long } */
+/* { dg-final { simulate-thread } } */
+
+/* Test that a hoist is not performed across an acquire barrier.  */
+
+#include <stdio.h>
+#include "simulate-thread.h"
+
+int flag1=1, flag2=1;
+
+unsigned int x=1, y=2, i=0x1234, j=0x5678, a;
+
+/* These two tables are random numbers such that there are no two
+   pairs between the both tables that yield the same sum.  */
+
+unsigned int table1[16] = {
+  24747, 19512, 3692, 25985,
+  25079, 24, 3310, 22073,
+  4026, 25641, 35240, 35542,
+  24783, 17378, 12184, 23755
+};
+
+unsigned int table2[16] = {
+  2467, 37461, 14064, 36460,
+  46434, 8387, 42174, 36763,
+  49205, 48759, 10526, 3446,
+  14035, 2195, 6798, 38782
+};
+
+int table_cycle_size = 16;
+
+/* At each instruction, get a new X and Y from the tables to later
+   verify that we have not reused a value incorrectly.  */
+void simulate_thread_other_threads ()
+{
+  static int current = 0;
+
+  if (++current >= table_cycle_size)
+    current = 0;
+  x = table1[current];
+  y = table2[current];
+}
+
+/* Return true if error, otherwise 0.  */
+int verify_result ()
+{
+  /* [i] should not equal [j], because that would mean that we hoisted
+     [x] and [y] instead of loading them again.  */
+  int fail = i == j;
+  if (fail)
+    printf("FAIL: i (%u) should not equal j (%u)\n", i, j);
+  return fail;
+}
+
+int simulate_thread_step_verify ()
+{
+  return verify_result ();
+}
+
+int simulate_thread_final_verify ()
+{
+  return verify_result ();
+}
+
+__attribute__((noinline))
+void simulate_thread_main()
+{
+  /* The values of x or y should not be hoisted across reads of
+     flag[12].
+
+     For example, when the second load below synchronizes with another
+     thread, the synchronization is with a release, and that release
+     may cause a stored value of x/y to be flushed and become visible.
+     So, for this case, it is incorrect for CSE/CSA/and-others to
+     hoist x or y above the load of flag2.  */
+
+  /* Execute loads with value changing at various cyclic values.  */
+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+    i = x + y;
+ 
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+    a = 10;
+  j = x + y;
+
+  /* Since x and y have been changing at each instruction above, i and j
+     should be different.  If they are the same, we have hoisted
+     something incorrectly.  */
+}
+
+main()
+{
+  simulate_thread_main ();
+  simulate_thread_done ();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C11-atomic] test invalid hoist across and acquire load
  2012-03-21 18:08 ` Andrew MacLeod
@ 2012-03-23 14:39   ` Aldy Hernandez
  2012-03-23 14:57     ` Andrew MacLeod
  0 siblings, 1 reply; 6+ messages in thread
From: Aldy Hernandez @ 2012-03-23 14:39 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Torvald Riegel, gcc-patches

On 03/21/12 12:54, Andrew MacLeod wrote:
> On 03/21/2012 01:35 PM, Aldy Hernandez wrote:
>> In the test below, we cannot cache either [x] or [y] neither before
>> the load of flag1 nor the load of flag2. This is because the
>> corresponding store/release can flush a different value of x or y:
>>
>> + if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
>> + i = x + y;
>> +
>> + if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
>> + a = 10;
>> + j = x + y;
>>
>
> Actually, does it need to be that complicated?
>
> can't you simply have the "other_thread" process monotonically increase
> x by 1 every cycle?
>
> then if the load is hoisted and commoned, simulate_thread_final_verify()
> can simply check that if i == j, it knows that x was loaded as a common
> value and reused when calculating j. with the other thread increasing x
> eveyr sycle, they should never be the same value.

Hmmm, for this particular case I know CSA is commoning x + y, but what 
if another combination of passes hoists only y and leaves x alone.  It 
would be nice to test that y isn't hoisted independently of x.  Would it 
not, or do you only want to test this particular behavior?

Aldy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C11-atomic] test invalid hoist across and acquire load
  2012-03-23 14:39   ` Aldy Hernandez
@ 2012-03-23 14:57     ` Andrew MacLeod
  2012-03-23 15:39       ` Aldy Hernandez
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew MacLeod @ 2012-03-23 14:57 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Torvald Riegel, gcc-patches

On 03/23/2012 10:39 AM, Aldy Hernandez wrote:
> On 03/21/12 12:54, Andrew MacLeod wrote:
>> On 03/21/2012 01:35 PM, Aldy Hernandez wrote:
>>> In the test below, we cannot cache either [x] or [y] neither before
>>> the load of flag1 nor the load of flag2. This is because the
>>> corresponding store/release can flush a different value of x or y:
>>>
>>> + if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
>>> + i = x + y;
>>> +
>>> + if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
>>> + a = 10;
>>> + j = x + y;
>>>
>>
>> Actually, does it need to be that complicated?
>>
>> can't you simply have the "other_thread" process monotonically increase
>> x by 1 every cycle?
>>
>> then if the load is hoisted and commoned, simulate_thread_final_verify()
>> can simply check that if i == j, it knows that x was loaded as a common
>> value and reused when calculating j. with the other thread increasing x
>> eveyr sycle, they should never be the same value.
>
> Hmmm, for this particular case I know CSA is commoning x + y, but what 
> if another combination of passes hoists only y and leaves x alone.  It 
> would be nice to test that y isn't hoisted independently of x.  Would 
> it not, or do you only want to test this particular behavior?
>
so enter it as 2 testcases, one increasing x and one increasing y, or 
better yet  set it up so that this function is called twice from 
simulate_main, with other_process() increasing x the first time and 
increasing y the second time...  or something like that.

Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [C11-atomic] test invalid hoist across and acquire load
  2012-03-23 14:57     ` Andrew MacLeod
@ 2012-03-23 15:39       ` Aldy Hernandez
  0 siblings, 0 replies; 6+ messages in thread
From: Aldy Hernandez @ 2012-03-23 15:39 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Torvald Riegel, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]

After much pondering and talking with both you and Torvald, it has been 
determined that the test at hand is technically allowed to hoist the 
value of x/y because the standard guarantees that the code below is data 
race free:

	if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
		i = x + y;
	if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
		a = 10;
	j = x + y;

So, since j=x+y is an unconditional load of x/y, we can assume there are 
no other threads writing to x/y.

However...

Depending on such undefined behaviors is liable to cause confusion and 
frustration with our ourselves and our users, so it is best to limit 
these optimizations across atomics altogether.  It's best to err on the 
side of caution than overly optimize across confusing data races.  As we 
become better versed in optimizing across atomics, we can relax things 
and perhaps make optimizations more aggressive.  But for now, let's mark 
this as a must fix, and make sure hoists are not allowed across acquire 
operations.

I have detailed the problem in the testcase.

 > so enter it as 2 testcases, one increasing x and one increasing y, or
 > better yet set it up so that this function is called twice from
 > simulate_main, with other_process() increasing x the first time and
 > increasing y the second time... or something like that.
 >
 > Andrew

Two testcases?  Now you just want to see me work more :).

Implemented as loop.

OK for branch?

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 2698 bytes --]

Index: atomic-hoist-1.c
===================================================================
--- atomic-hoist-1.c	(revision 0)
+++ atomic-hoist-1.c	(revision 0)
@@ -0,0 +1,89 @@
+/* { dg-do link } */
+/* { dg-require-effective-target sync_int_long } */
+/* { dg-final { simulate-thread } } */
+
+/* Test that a hoist is not performed across an acquire barrier.  */
+
+#include <stdio.h>
+#include "simulate-thread.h"
+
+int iteration = 0;
+int flag1=1, flag2=1;
+unsigned int x=1, y=2, i=0x1234, j=0x5678, a;
+
+
+/* At each instruction, get a new X or Y to later verify that we have
+   not reused a value incorrectly.  */
+void simulate_thread_other_threads ()
+{
+  if (iteration == 0)
+    x++;
+  else
+    y++;
+}
+
+/* Return true if error, otherwise 0.  */
+int verify_result ()
+{
+  /* [i] should not equal [j], because that would mean that we hoisted
+     [x] or [y] instead of loading them again.  */
+  int fail = i == j;
+  if (fail)
+    printf("FAIL: i (%u) should not equal j (%u)\n", i, j);
+  return fail;
+}
+
+int simulate_thread_step_verify ()
+{
+  return verify_result ();
+}
+
+int simulate_thread_final_verify ()
+{
+  return verify_result ();
+}
+
+__attribute__((noinline))
+void simulate_thread_main()
+{
+  for (; iteration < 2; ++iteration)
+    {
+      /* The values of x or y should not be hoisted across reads of
+	 flag[12].
+
+	 For example, when the second load below synchronizes with
+	 another thread, the synchronization is with a release, and
+	 that release may cause a stored value of x/y to be flushed
+	 and become visible.  So, for this case, it is incorrect for
+	 CSE/CSA/and-others to hoist x or y above the load of
+	 flag2.  */
+      if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+	i = x + y;
+      if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+	a = 10;
+      /* NOTE: According to the standard we can assume that the
+	 testcase is data race free, so if there is an unconditional
+	 load of x+y here at j=x+y, there should not be any other
+	 thread writing to x or y if we are indeed data race free.
+
+	 This means that we are technically free to hoist x/y.
+	 However, since depending on these undefined behaviors is
+	 liable to get many confused, it is best to be conservative
+	 with optimizations on atomics, hence the current test.  As
+	 we become better versed in optimizations across atomics, we
+	 can relax the optimizations a bit.  */
+      j = x + y;
+
+      /* Since x or y have been changing at each instruction above, i
+	 and j should be different.  If they are the same, we have
+	 hoisted something incorrectly.  */
+    }
+
+}
+
+main()
+{
+  simulate_thread_main ();
+  simulate_thread_done ();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-03-23 15:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-21 18:26 [C11-atomic] test invalid hoist across and acquire load Aldy Hernandez
2012-03-21 18:08 ` Andrew MacLeod
2012-03-23 14:39   ` Aldy Hernandez
2012-03-23 14:57     ` Andrew MacLeod
2012-03-23 15:39       ` Aldy Hernandez
2012-03-21 18:17 ` Andrew MacLeod

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).