public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
* [Fortran, Patch] Memory sync after coarray image control statements and assignment
@ 2015-12-06 21:02 Alessandro Fanfarillo
  2015-12-07  7:20 ` Tobias Burnus
  0 siblings, 1 reply; 15+ messages in thread
From: Alessandro Fanfarillo @ 2015-12-06 21:02 UTC (permalink / raw)
  To: gfortran, gcc-patches, opencoarrays

[-- Attachment #1: Type: text/plain, Size: 779 bytes --]

Dear all,

currently, a coarray assignment in a program composed by a single
segment (without any sync statements) produces wrong results.
Furthermore, a coarray code
compiled with an optimization flag higher that -O0 may produce wrong
results. The patch (re)introduces a __sync_synchronize() after coarray
image control statements (sync all, sync images, critical, locks and
events) and get/put.

The attached patch fixes the problems reported by Deepak in the
following discussion:
https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html.

Built and regtested on x86_64-pc-linux-gnu.


PS: Adding the __sync_synchronize() after get/put and sync statements
fix all the problems reported by Deepak. I do the same also for locks,
critical and events. Suggestions are warmly welcome.

[-- Attachment #2: patch_sync.diff --]
[-- Type: text/plain, Size: 4792 bytes --]

commit 75c93d3085116748115c8f69ad5ad58f4ad9369c
Author: Alessandro Fanfarillo <fanfarillo@ing.uniroma2.it>
Date:   Sun Dec 6 18:50:51 2015 +0100

    Introducing __sync_synchronize() after image control statements, send and get

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index ba176a1..b450fae 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,13 @@
+2015-12-06  Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>
+
+	* trans.c (gfc_allocate_using_lib,gfc_deallocate_with_status):
+	Introducing __sync_synchronize() after image control statements.
+	* trans-stmt.c (gfc_trans_sync,gfc_trans_event_post_wait,
+	gfc_trans_lock_unlock): Ditto.
+	* trans-intrinsic.c (gfc_conv_intrinsic_caf_get,
+	conv_caf_send): Introducing __sync_synchronize() after send,
+	get and sendget.
+
 2015-12-05  Paul Thomas  <pault@gcc.gnu.org>
 
 	PR fortran/68676
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..07795ca 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1222,6 +1222,12 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   se->expr = res_var;
   if (array_expr->ts.type == BT_CHARACTER)
     se->string_length = argse.string_length;
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+  tmp = build_call_expr_loc (input_location, tmp, 0);
+  gfc_add_expr_to_block (&se->pre, tmp);
+
 }
 
 
@@ -1390,6 +1396,12 @@ conv_caf_send (gfc_code *code) {
   gfc_add_expr_to_block (&block, tmp);
   gfc_add_block_to_block (&block, &lhs_se.post);
   gfc_add_block_to_block (&block, &rhs_se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+  tmp = build_call_expr_loc (input_location, tmp, 0);
+  gfc_add_expr_to_block (&block, tmp);
+
   return gfc_finish_block (&block);
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 3df483a..b1de0f5 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -818,6 +818,11 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
 				   errmsg, errmsg_len);
       gfc_add_expr_to_block (&se.pre, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+      tmp = build_call_expr_loc (input_location, tmp, 0);
+      gfc_add_expr_to_block (&se.pre, tmp);
+
       if (stat2 != NULL_TREE)
 	gfc_add_modify (&se.pre, stat2,
 			fold_convert (TREE_TYPE (stat2), stat));
@@ -995,6 +1000,11 @@ gfc_trans_event_post_wait (gfc_code *code, gfc_exec_op op)
 			       errmsg, errmsg_len);
   gfc_add_expr_to_block (&se.pre, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+  tmp = build_call_expr_loc (input_location, tmp, 0);
+  gfc_add_expr_to_block (&se.pre, tmp);
+
   if (stat2 != NULL_TREE)
     gfc_add_modify (&se.pre, stat2, fold_convert (TREE_TYPE (stat2), stat));
 
@@ -1080,6 +1090,15 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 			       fold_convert (integer_type_node, images));
     }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+     image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+      tmp = build_call_expr_loc (input_location, tmp, 0);
+      gfc_add_expr_to_block (&se.pre, tmp);
+    }
+  
   if (flag_coarray != GFC_FCOARRAY_LIB)
     {
       /* Set STAT to zero.  */
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 001db41..e7803bd 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -746,6 +746,11 @@ gfc_allocate_using_lib (stmtblock_t * block, tree pointer, tree size,
 			 TREE_TYPE (pointer), pointer,
 			 fold_convert ( TREE_TYPE (pointer), tmp));
   gfc_add_expr_to_block (block, tmp);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+  tmp = build_call_expr_loc (input_location, tmp, 0);
+  gfc_add_expr_to_block (block, tmp);
 }
 
 
@@ -1356,6 +1361,11 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	     token, pstat, errmsg, errlen);
       gfc_add_expr_to_block (&non_null, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = builtin_decl_explicit (BUILT_IN_SYNC_SYNCHRONIZE);
+      tmp = build_call_expr_loc (input_location, tmp, 0);
+      gfc_add_expr_to_block (&non_null, tmp);
+
       if (status != NULL_TREE)
 	{
 	  tree stat = build_fold_indirect_ref_loc (input_location, status);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-06 21:02 [Fortran, Patch] Memory sync after coarray image control statements and assignment Alessandro Fanfarillo
@ 2015-12-07  7:20 ` Tobias Burnus
  2015-12-07  9:16   ` Alessandro Fanfarillo
  2015-12-07 10:06   ` Tobias Burnus
  0 siblings, 2 replies; 15+ messages in thread
From: Tobias Burnus @ 2015-12-07  7:20 UTC (permalink / raw)
  To: Alessandro Fanfarillo, gfortran, gcc-patches, opencoarrays

Dear Alessandro, dear all,

Alessandro Fanfarillo wrote:
> currently, a coarray assignment in a program composed by a single
> segment (without any sync statements) produces wrong results.

Always - or only with optimization?

> Furthermore, a coarray code
> compiled with an optimization flag higher that -O0 may produce wrong
> results. The patch (re)introduces a __sync_synchronize() after coarray
> image control statements (sync all, sync images, critical, locks and
> events) and get/put.

I wonder whether using

__asm__ __volatile__ ("":::"memory");

would be sufficient as it has a way lower overhead than 
__sync_synchronize().


That would be something like:

   r = build_stmt (input_location, ASM_EXPR, string,
                   output_operands, input_operands,
                   clobbers, labels);
   ASM_VOLATILE_P (r) = 1;

with string = "", output_operands = NULL_TREE, input_operands = 
NULL_TREE, clobbers = "memory" and labels = NULL_TREE.  (Except that 
string+clobbers are trees and not char[].)

Cheers,

Tobias

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-07  7:20 ` Tobias Burnus
@ 2015-12-07  9:16   ` Alessandro Fanfarillo
  2015-12-07 10:06   ` Tobias Burnus
  1 sibling, 0 replies; 15+ messages in thread
From: Alessandro Fanfarillo @ 2015-12-07  9:16 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gfortran, gcc-patches, opencoarrays

Hi,

2015-12-07 8:20 GMT+01:00 Tobias Burnus <burnus@net-b.de>:
> Always - or only with optimization?
>

Only with optimization.

> I wonder whether using
>
> __asm__ __volatile__ ("":::"memory");
>
> would be sufficient as it has a way lower overhead than
> __sync_synchronize().
>
>
> That would be something like:
>
>   r = build_stmt (input_location, ASM_EXPR, string,
>                   output_operands, input_operands,
>                   clobbers, labels);
>   ASM_VOLATILE_P (r) = 1;
>
> with string = "", output_operands = NULL_TREE, input_operands = NULL_TREE,
> clobbers = "memory" and labels = NULL_TREE.  (Except that string+clobbers
> are trees and not char[].)
>

I'm going to try it. Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-07  7:20 ` Tobias Burnus
  2015-12-07  9:16   ` Alessandro Fanfarillo
@ 2015-12-07 10:06   ` Tobias Burnus
  2015-12-07 14:09     ` Matthew Wahab
  2015-12-07 14:48     ` Alessandro Fanfarillo
  1 sibling, 2 replies; 15+ messages in thread
From: Tobias Burnus @ 2015-12-07 10:06 UTC (permalink / raw)
  To: fortran, gcc-patches, Alessandro Fanfarillo, opencoarrays

[-- Attachment #1: Type: text/plain, Size: 442 bytes --]

I wrote:
> I wonder whether using
>
> __asm__ __volatile__ ("":::"memory");
>
> would be sufficient as it has a way lower overhead than 
> __sync_synchronize().

Namely, something like the attached patch.

Regarding the original patch submission: Is there a reason that you didn't
include the test case of Deepak from https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html
It should work as -fcoarray=lib -lcaf_single "dg-do run" test.

Tobias

[-- Attachment #2: caf.diff --]
[-- Type: text/x-diff, Size: 5211 bytes --]

 trans-intrinsic.c |   18 ++++++++++++++++++
 trans-stmt.c      |   29 +++++++++++++++++++++++++++++
 trans.c           |   16 ++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..04ba3ea 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1222,6 +1222,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   se->expr = res_var;
   if (array_expr->ts.type == BT_CHARACTER)
     se->string_length = argse.string_length;
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se->pre, tmp);
+
 }
 
 
@@ -1390,6 +1399,15 @@ conv_caf_send (gfc_code *code) {
   gfc_add_expr_to_block (&block, tmp);
   gfc_add_block_to_block (&block, &lhs_se.post);
   gfc_add_block_to_block (&block, &rhs_se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&block, tmp);
+
   return gfc_finish_block (&block);
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 3df483a..b7e1faa 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -818,6 +818,15 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
 				   errmsg, errmsg_len);
       gfc_add_expr_to_block (&se.pre, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&se.pre, tmp);
+
       if (stat2 != NULL_TREE)
 	gfc_add_modify (&se.pre, stat2,
 			fold_convert (TREE_TYPE (stat2), stat));
@@ -995,6 +1004,14 @@ gfc_trans_event_post_wait (gfc_code *code, gfc_exec_op op)
 			       errmsg, errmsg_len);
   gfc_add_expr_to_block (&se.pre, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se.pre, tmp);
+
   if (stat2 != NULL_TREE)
     gfc_add_modify (&se.pre, stat2, fold_convert (TREE_TYPE (stat2), stat));
 
@@ -1080,6 +1097,18 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 			       fold_convert (integer_type_node, images));
     }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+     image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&se.pre, tmp);
+    }
+
   if (flag_coarray != GFC_FCOARRAY_LIB)
     {
       /* Set STAT to zero.  */
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 001db41..1993743 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -746,6 +746,14 @@ gfc_allocate_using_lib (stmtblock_t * block, tree pointer, tree size,
 			 TREE_TYPE (pointer), pointer,
 			 fold_convert ( TREE_TYPE (pointer), tmp));
   gfc_add_expr_to_block (block, tmp);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (block, tmp);
 }
 
 
@@ -1356,6 +1364,14 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	     token, pstat, errmsg, errlen);
       gfc_add_expr_to_block (&non_null, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&non_null, tmp);
+
       if (status != NULL_TREE)
 	{
 	  tree stat = build_fold_indirect_ref_loc (input_location, status);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-07 10:06   ` Tobias Burnus
@ 2015-12-07 14:09     ` Matthew Wahab
  2015-12-08  9:26       ` Tobias Burnus
  2015-12-07 14:48     ` Alessandro Fanfarillo
  1 sibling, 1 reply; 15+ messages in thread
From: Matthew Wahab @ 2015-12-07 14:09 UTC (permalink / raw)
  To: Tobias Burnus, fortran, gcc-patches, Alessandro Fanfarillo

On 07/12/15 10:06, Tobias Burnus wrote:
> I wrote:
>> I wonder whether using
>>
>> __asm__ __volatile__ ("":::"memory");
>>
>> would be sufficient as it has a way lower overhead than
>> __sync_synchronize().
>
> Namely, something like the attached patch.
>
> Regarding the original patch submission: Is there a reason that you didn't
> include the test case of Deepak from https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html
> It should work as -fcoarray=lib -lcaf_single "dg-do run" test.
>
> Tobias
>

I don't know anything about Fortran or coarrays and I'm curious whether this affects 
architectures with weak memory models. Is the barrier only needed to stop reordering 
by the compiler or is does it also need to stop reordering by the hardware?

Matthew


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-07 10:06   ` Tobias Burnus
  2015-12-07 14:09     ` Matthew Wahab
@ 2015-12-07 14:48     ` Alessandro Fanfarillo
  2015-12-08 10:01       ` Tobias Burnus
  1 sibling, 1 reply; 15+ messages in thread
From: Alessandro Fanfarillo @ 2015-12-07 14:48 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gfortran, gcc-patches, opencoarrays

[-- Attachment #1: Type: text/plain, Size: 625 bytes --]

Your patch fixes the issues. In attachment patch, test case and changelog.

Thanks!

2015-12-07 11:06 GMT+01:00 Tobias Burnus <tobias.burnus@physik.fu-berlin.de>:
> I wrote:
>> I wonder whether using
>>
>> __asm__ __volatile__ ("":::"memory");
>>
>> would be sufficient as it has a way lower overhead than
>> __sync_synchronize().
>
> Namely, something like the attached patch.
>
> Regarding the original patch submission: Is there a reason that you didn't
> include the test case of Deepak from https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html
> It should work as -fcoarray=lib -lcaf_single "dg-do run" test.
>
> Tobias

[-- Attachment #2: caf_sync_2.diff --]
[-- Type: text/plain, Size: 5889 bytes --]

commit 69e650945454491bbaf81513a1eed10b5b6b0f45
Author: Alessandro Fanfarillo <fanfarillo@ing.uniroma2.it>
Date:   Mon Dec 7 15:46:10 2015 +0100

    Introducing __asm__ __volatile__ (:::memory) after image control statements, send and get

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..25ff311 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1222,6 +1222,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   se->expr = res_var;
   if (array_expr->ts.type == BT_CHARACTER)
     se->string_length = argse.string_length;
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se->pre, tmp);
+
 }
 
 
@@ -1390,6 +1399,15 @@ conv_caf_send (gfc_code *code) {
   gfc_add_expr_to_block (&block, tmp);
   gfc_add_block_to_block (&block, &lhs_se.post);
   gfc_add_block_to_block (&block, &rhs_se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&block, tmp);
+
   return gfc_finish_block (&block);
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 3df483a..b7e1faa 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -818,6 +818,15 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
 				   errmsg, errmsg_len);
       gfc_add_expr_to_block (&se.pre, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&se.pre, tmp);
+
       if (stat2 != NULL_TREE)
 	gfc_add_modify (&se.pre, stat2,
 			fold_convert (TREE_TYPE (stat2), stat));
@@ -995,6 +1004,14 @@ gfc_trans_event_post_wait (gfc_code *code, gfc_exec_op op)
 			       errmsg, errmsg_len);
   gfc_add_expr_to_block (&se.pre, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se.pre, tmp);
+
   if (stat2 != NULL_TREE)
     gfc_add_modify (&se.pre, stat2, fold_convert (TREE_TYPE (stat2), stat));
 
@@ -1080,6 +1097,18 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 			       fold_convert (integer_type_node, images));
     }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+     image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&se.pre, tmp);
+    }
+
   if (flag_coarray != GFC_FCOARRAY_LIB)
     {
       /* Set STAT to zero.  */
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 001db41..1993743 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -746,6 +746,14 @@ gfc_allocate_using_lib (stmtblock_t * block, tree pointer, tree size,
 			 TREE_TYPE (pointer), pointer,
 			 fold_convert ( TREE_TYPE (pointer), tmp));
   gfc_add_expr_to_block (block, tmp);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (block, tmp);
 }
 
 
@@ -1356,6 +1364,14 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	     token, pstat, errmsg, errlen);
       gfc_add_expr_to_block (&non_null, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&non_null, tmp);
+
       if (status != NULL_TREE)
 	{
 	  tree stat = build_fold_indirect_ref_loc (input_location, status);
diff --git a/gcc/testsuite/gfortran.dg/coarray_40.f90 b/gcc/testsuite/gfortran.dg/coarray_40.f90
new file mode 100644
index 0000000..84af50e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray_40.f90
@@ -0,0 +1,25 @@
+! { dg-do run }
+! { dg-options "-fcoarray=lib -lcaf_single" }
+!
+! Run-time test for memory consistency
+!
+! Contributed by Deepak Eachempati
+
+program cp_bug
+    implicit none
+    integer :: v1, v2, u[*]
+    integer :: me
+
+    me = this_image()
+
+    u = 0
+    v1 = 10
+
+    v1 = u[me]
+
+    ! v2 should get value in u (0)
+    v2 = v1
+
+    if(v2 /= u) call abort()
+
+end program

[-- Attachment #3: patch_changelog.diff --]
[-- Type: text/plain, Size: 1354 bytes --]

commit 3c36054cbeb23353d9fd81a9101861c812af35d5
Author: Alessandro Fanfarillo <fanfarillo@ing.uniroma2.it>
Date:   Mon Dec 7 15:15:45 2015 +0100

    Introducing __asm__ __volatile__ (:::memory) after image control statements, send and get

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index ba176a1..3f0cef0 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,15 @@
+2015-12-07  Tobias Burnus  <burnus@net-b.de>
+	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>
+
+	* trans.c (gfc_allocate_using_lib,gfc_deallocate_with_status):
+	Introducing __asm__ __volatile__ ("":::"memory")
+	after image control statements.
+	* trans-stmt.c 	(gfc_trans_sync,gfc_trans_event_post_wait)
+	(gfc_trans_lock_unlock): Ditto.
+	* trans-intrinsic.c (gfc_conv_intrinsic_caf_get,
+	conv_caf_send): Introducing __asm__ __volatile__ ("":::"memory")
+	after send, get and sendget.
+
 2015-12-05  Paul Thomas  <pault@gcc.gnu.org>
 
 	PR fortran/68676
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 188ed2b..0ac8836 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-12-07  Tobias Burnus  <burnus@net-b.de>
+	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>
+
+	* gfortran.dg/coarray_40.f90: New.
+
 2015-12-07  Kirill Yukhin  <kirill.yukhin@intel.com>
 
 	PR target/68627

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-07 14:09     ` Matthew Wahab
@ 2015-12-08  9:26       ` Tobias Burnus
  2015-12-09 14:47         ` Matthew Wahab
  0 siblings, 1 reply; 15+ messages in thread
From: Tobias Burnus @ 2015-12-08  9:26 UTC (permalink / raw)
  To: Matthew Wahab; +Cc: fortran, gcc-patches, Alessandro Fanfarillo

On Mon, Dec 07, 2015 at 02:09:22PM +0000, Matthew Wahab wrote:
> >>I wonder whether using
> >>__asm__ __volatile__ ("":::"memory");
> >>would be sufficient as it has a way lower overhead than
> >>__sync_synchronize().
>
> I don't know anything about Fortran or coarrays and I'm curious
> whether this affects architectures with weak memory models. Is the
> barrier only needed to stop reordering by the compiler or is does it
> also need to stop reordering by the hardware?

Short answer: I think no mfence is needed as either the communication
is local (to the thread/process) - in which case the hardware will act
correctly - or the communication is remote (different thread, process,
communication to different computer via interlink [ethernet, infiniband,
...]); and in the later case, the communication library has to deal with
it.

However, I think that except for SYNC using
  __asm__ __volatile__ ("":::"memory");
is the wrong solution - even though it might work as band aid.


 * * *


To gether some suggestions, here is how coarrays work. They are based on:
everything is a local variable - except it is a coarray. And all communication
is explicit - but is often single sideded.  That scheme works well (also)
with distributed memory and is similar to MPI (Message Passing Interface).


Using

integer :: var[*]

one declares a coarray scalar variable. Accessing it as

  var = 5
  tmp = var

acts on the variable on the current 'image' (process). Using

  var[idx] = 5   ! Set 'var' in image 'idx' to '5'
  tmp = var[idx] ! Obtain value of 'var' on image 'idx'

one accesses that variable on a remote image - except if 'idx' matches
the current image; in that case, it acts on the local variable.


In one segment (which ends at explicit or implicit synchronizations,
e.g. using SYNC ALL): If the value of a is changed - either locally
or by a remote image - then only that image is permitted to 'read' the
value (or to change it as that would be another possibility for race
conditions).


Very often, coarray variables are arrays and heavily accessed as local
varable in hot loops. But on the other hand, the value of the variable
can be changed by external means - up to having hardware support to
write into the memory from another computer.


Thus, there two cases were the local view might fail:

 (a) when the variable has been changed in a previous segment by a
     remote process, e.g.
       var = 5   ! assue 'var' is a coarray
       sync all  ! end of segment 
       !         ! var is changed by a remote image
       sync all  ! end of segment 
       tmp = var
     where "var" might or might not have the value 5. Or more fancy
     without locks and global barriers:
       type(event_type) :: var_is_set[*]
       var = 5
       event post(have_set_var[remote_idx])
         ! Remote waits for event, sets out 'var' to 42 and
         ! then posts an event that it is set
       event wait(have_set_var)
       tmp = var
    where one tells the remote process that "var = 5" has been set
    and waits until it has set the local variable "var".


 (b) when the variable is changed on the same image by two means, e.g.
       var = 5
       var[this_image()] = 42
       tmp = var


The communication maps to function calls like:
  __gfortran_caf_send(var_token, idx, 5)    // var[idx] = 5
  __gfortran_caf_get (tmp, var_token, idx)  // tmp = var[idx]
[Real commands, see
https://gcc.gnu.org/onlinedocs/gfortran/index.html#toc_Coarray-Programming]


It is assumed that the library called takes care of the hardware part,
i.e. locking, mfence etc. - such that in the compiler itself, one only
has to deal with restricting compiler optimizations to the places where
it is permitted.


A coarray comes into existance via:

  var = _gfortran_caf_register (size, &var_token);

similar to malloc - and var_token identifies the coarray; what the
content is, is left to the library. - For the good or worse,
_gfortran_caf_register doesn't tell the compiler that it has clobbered
"var" as in a way the pointer address escaped. Nor is "var" marked as
volatile.


Coming back to item (a): After a segment has ended (SYNC ALL, EVENT WAIT
or similar), the compiler can no longer assume that coarray variables
have the same value. Those variables can be hidden as in
    call foo()
    sync all
    call foo()
where 'foo' doesn't know about the 'sync all' and the caller doesn't know
whether 'foo' has accesses to coarrays (and whether it accesses them). I
think
  __asm__ __volatile__ ("":::"memory");
should be sufficient in that case, assuming that the communication library
takes are of making all changes available (mfence, __sync_synchronize or
whatever).


The other question is item (b): Here, one has an alias problem within the
segment - but the alias only rarely happens. Code of the form
  var[idx] = ...
usually does not access the local memory; it happens only in corner cases
like:
  do i = 1, num_images
    var[i] = 5
  end do
which sets 'var' to 5 on all images - including the local image. This will
produce code like (full example below):

  *var = 5
  _gfortran_caf_send(var_token, idx, 42);
  tmp = *var

where the frontend knows that _gfortran_caf_send might modify 'var' iff idx
matches the local image.  The question is only how to tell this the compiler.

Using __asm__ __volatile__ ("":::"memory"); seems to be appropriate, even
though something more tailored would be nice, something like:
  __asm__ __volatile__ ("": : "+rm" (*var) : "")
but I am not sure whether that's valid and works - additionally, it assumes
that one reads "*var" (i.e. that it is defined before), which is not required.
(No idea whether -Wundefined would trigger.)

I there some better way to express this?

Cheers,

Tobias

PS: Full example:

integer :: tmp
integer :: var[*]

var = 5
var[this_image()] = 42
tmp = var
end

This will generate a static constructor, run at program startup:

_caf_init.1 ()
{                           //  v-- allocate 4 byte
  var = _gfortran_caf_register (4, 0, &caf_token.0, 0B, 0B, 0);
}

and the (main) program code (slightly trimmed):

  static void * restrict caf_token.0;
  static integer(kind=4) * restrict var;
  void _caf_init.1 (void);

  *var = 4;
  
  desc.3.data = 42;
  _gfortran_caf_send (caf_token.0, 0B /* offset */ var,
                      _gfortran_caf_this_image (0), &desc.2, 0B, &desc.3, 4, 4, 0);
  __asm__ __volatile__("":::"memory");  // new
  tmp = *var;

The problem is that in that case the compiler does not know that
"_gfortran_caf_send (caf_token.0," can modify "*var".

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-07 14:48     ` Alessandro Fanfarillo
@ 2015-12-08 10:01       ` Tobias Burnus
  2015-12-08 14:21         ` Alessandro Fanfarillo
  0 siblings, 1 reply; 15+ messages in thread
From: Tobias Burnus @ 2015-12-08 10:01 UTC (permalink / raw)
  To: Alessandro Fanfarillo; +Cc: gfortran, gcc-patches, opencoarrays

Dear Alessandro, dear all,

On Mon, Dec 07, 2015 at 03:48:17PM +0100, Alessandro Fanfarillo wrote:
> Your patch fixes the issues. In attachment patch, test case and changelog.

Regarding the ChangeLog: Please include the added lines, only, and not the
change as patch. gcc/testsuite/ChangeLog changes too often such that a patch
won't apply.


Regarding the patch, I wonder where the memory synchronization makes sense
and where it is not required. (cf. also my email to Matthew in this thread,
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00828.html)


I think it should be after all image control statements (8.5.1 in
http://j3-fortran.org/doc/year/15/15-007r2.pdf):
SYNC ALL, SYNC IMAGES, SYNC MEMORY, ALLOCATE, DEALLOCATE,
CRITICAL ... END CRITICAL, EVENT POST, EVENT WAIT, LOCK, UNLOCK,
MOVE_ALLOC.

Thus:
- SYNC ..., ALLOCATE/DEALLOCATE: I think those are all handled by the
  current patch
- MOVE_ALLOC: This one should be handled via the internal (de)allocate
  handling (please check!)
- EVENT WAIT, CRITICAL and LOCK: Obtaining a lock or receiving an event
  implies that quite likely some other process has changed something
  before. For those, the assembler statement really has to be added.
- EVENT POST, UNLOCK and END CRITICAL: While those are image control
  statements, I do not see how a remote image could modify a value in
  a well defined way, which is guaranteed to be available after that
  statement - but might not yet be available already at the previous
  segment (i.e. the previous image control statement).

Hence: I think you should update the patch to also handle 
EVENT WAIT, CRITICAL and LOCK - and to check MOVE_ALLOC.



Additionally, we need to handle the alias issue of:
  var = 5
  var[this_image()] = 42
  tmp = var

Both _gfortran_caf_send and _gfortran_caf_sendget can modify the
value of a variable; thus, calling the assembler after the function
makes sense.


However, _gfortran_caf_get does not modify the associated variable;
adding the assembler statement *after* _gfortran_caf_get. The
question is, however, whether one needs to take care of 'flushing'
the variable before the _gfortran_caf_get:
   var = 7
   ...
   var = 5
   tmp = var[this_image()]
   result = var + tmp
Here, one needs to prevent the compiler of keeping "5" only in a
register or moving "var = 5" after the _gfortran_caf_get call.


Thus, you have to move the assembler statement before the library
call in _gfortran_caf_get - and add another one at the beginning
of _gfortran_caf_sendget.

(For send/get, might might come up with something better than
""::"memory", but for now, it should do.)

Cheers,

Tobias

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-08 10:01       ` Tobias Burnus
@ 2015-12-08 14:21         ` Alessandro Fanfarillo
  2015-12-09  7:23           ` Tobias Burnus
  0 siblings, 1 reply; 15+ messages in thread
From: Alessandro Fanfarillo @ 2015-12-08 14:21 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gfortran, gcc-patches, opencoarrays

[-- Attachment #1: Type: text/plain, Size: 3116 bytes --]

Hi,

in attachment the new patch. I also checked the behavior with
move_alloc: it synchronizes right after the deregistration of the
destination.
I also noticed that __asm__ __volatile__ ("":::"memory") is called
before sync all and not after. It shouldn't be a problem, right?


2015-12-08 11:01 GMT+01:00 Tobias Burnus <tobias.burnus@physik.fu-berlin.de>:
> Dear Alessandro, dear all,
>
> On Mon, Dec 07, 2015 at 03:48:17PM +0100, Alessandro Fanfarillo wrote:
>> Your patch fixes the issues. In attachment patch, test case and changelog.
>
> Regarding the ChangeLog: Please include the added lines, only, and not the
> change as patch. gcc/testsuite/ChangeLog changes too often such that a patch
> won't apply.
>
>
> Regarding the patch, I wonder where the memory synchronization makes sense
> and where it is not required. (cf. also my email to Matthew in this thread,
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00828.html)
>
>
> I think it should be after all image control statements (8.5.1 in
> http://j3-fortran.org/doc/year/15/15-007r2.pdf):
> SYNC ALL, SYNC IMAGES, SYNC MEMORY, ALLOCATE, DEALLOCATE,
> CRITICAL ... END CRITICAL, EVENT POST, EVENT WAIT, LOCK, UNLOCK,
> MOVE_ALLOC.
>
> Thus:
> - SYNC ..., ALLOCATE/DEALLOCATE: I think those are all handled by the
>   current patch
> - MOVE_ALLOC: This one should be handled via the internal (de)allocate
>   handling (please check!)
> - EVENT WAIT, CRITICAL and LOCK: Obtaining a lock or receiving an event
>   implies that quite likely some other process has changed something
>   before. For those, the assembler statement really has to be added.
> - EVENT POST, UNLOCK and END CRITICAL: While those are image control
>   statements, I do not see how a remote image could modify a value in
>   a well defined way, which is guaranteed to be available after that
>   statement - but might not yet be available already at the previous
>   segment (i.e. the previous image control statement).
>
> Hence: I think you should update the patch to also handle
> EVENT WAIT, CRITICAL and LOCK - and to check MOVE_ALLOC.
>
>
>
> Additionally, we need to handle the alias issue of:
>   var = 5
>   var[this_image()] = 42
>   tmp = var
>
> Both _gfortran_caf_send and _gfortran_caf_sendget can modify the
> value of a variable; thus, calling the assembler after the function
> makes sense.
>
>
> However, _gfortran_caf_get does not modify the associated variable;
> adding the assembler statement *after* _gfortran_caf_get. The
> question is, however, whether one needs to take care of 'flushing'
> the variable before the _gfortran_caf_get:
>    var = 7
>    ...
>    var = 5
>    tmp = var[this_image()]
>    result = var + tmp
> Here, one needs to prevent the compiler of keeping "5" only in a
> register or moving "var = 5" after the _gfortran_caf_get call.
>
>
> Thus, you have to move the assembler statement before the library
> call in _gfortran_caf_get - and add another one at the beginning
> of _gfortran_caf_sendget.
>
> (For send/get, might might come up with something better than
> ""::"memory", but for now, it should do.)
>
> Cheers,
>
> Tobias

[-- Attachment #2: caf_sync_3.diff --]
[-- Type: text/plain, Size: 9177 bytes --]

2015-12-08  Tobias Burnus  <burnus@net-b.de>
	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>

	* trans.c (gfc_allocate_using_lib,gfc_deallocate_with_status):
	Introducing __asm__ __volatile__ ("":::"memory")
	after image control statements.
	* trans-stmt.c 	(gfc_trans_sync, gfc_trans_event_post_wait,
	gfc_trans_lock_unlock, gfc_trans_critical): Ditto.
	* trans-intrinsic.c (gfc_conv_intrinsic_caf_get,
	conv_caf_send): Introducing __asm__ __volatile__ ("":::"memory")
	after send, before get and around sendget.

2015-12-08  Tobias Burnus  <burnus@net-b.de>
	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>

	* gfortran.dg/coarray_40.f90: New.

commit 6cdffc285931eb604d4c900d77fe60b96d604556
Author: Alessandro Fanfarillo <fanfarillo@ing.uniroma2.it>
Date:   Tue Dec 8 14:58:20 2015 +0100

    Introducing __asm__ __volatile__ (:::memory) after image control statements, send, get and sendget

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..0e4fcc5 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1211,6 +1211,14 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   if (lhs == NULL_TREE)
     may_require_tmp = boolean_false_node;
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se->pre, tmp);
+
   tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_get, 9,
 			     token, offset, image_index, argse.expr, vec,
 			     dst_var, kind, lhs_kind, may_require_tmp);
@@ -1222,6 +1230,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   se->expr = res_var;
   if (array_expr->ts.type == BT_CHARACTER)
     se->string_length = argse.string_length;
+
+  /* It guarantees memory consistency within the same segment */
+  /* tmp = gfc_build_string_const (strlen ("memory")+1, "memory"), */
+  /* tmp = build5_loc (input_location, ASM_EXPR, void_type_node, */
+  /* 		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE, */
+  /* 		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE); */
+  /* ASM_VOLATILE_P (tmp) = 1; */
+  /* gfc_add_expr_to_block (&se->pre, tmp); */
+
 }
 
 
@@ -1375,6 +1392,14 @@ conv_caf_send (gfc_code *code) {
     {
       tree rhs_token, rhs_offset, rhs_image_index;
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&block, tmp);
+
       caf_decl = gfc_get_tree_for_caf_expr (rhs_expr);
       if (TREE_CODE (TREE_TYPE (caf_decl)) == REFERENCE_TYPE)
 	caf_decl = build_fold_indirect_ref_loc (input_location, caf_decl);
@@ -1390,6 +1415,15 @@ conv_caf_send (gfc_code *code) {
   gfc_add_expr_to_block (&block, tmp);
   gfc_add_block_to_block (&block, &lhs_se.post);
   gfc_add_block_to_block (&block, &rhs_se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&block, tmp);
+
   return gfc_finish_block (&block);
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 3df483a..0075ae4 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -818,6 +818,15 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
 				   errmsg, errmsg_len);
       gfc_add_expr_to_block (&se.pre, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&se.pre, tmp);
+
       if (stat2 != NULL_TREE)
 	gfc_add_modify (&se.pre, stat2,
 			fold_convert (TREE_TYPE (stat2), stat));
@@ -995,6 +1004,14 @@ gfc_trans_event_post_wait (gfc_code *code, gfc_exec_op op)
 			       errmsg, errmsg_len);
   gfc_add_expr_to_block (&se.pre, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se.pre, tmp);
+
   if (stat2 != NULL_TREE)
     gfc_add_modify (&se.pre, stat2, fold_convert (TREE_TYPE (stat2), stat));
 
@@ -1080,6 +1097,18 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 			       fold_convert (integer_type_node, images));
     }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+     image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&se.pre, tmp);
+    }
+
   if (flag_coarray != GFC_FCOARRAY_LIB)
     {
       /* Set STAT to zero.  */
@@ -1401,6 +1430,15 @@ gfc_trans_critical (gfc_code *code)
       gfc_add_expr_to_block (&block, tmp);
     }
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+    tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		      gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		      tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+
+  gfc_add_expr_to_block (&block, tmp);
+
   tmp = gfc_trans_code (code->block->next);
   gfc_add_expr_to_block (&block, tmp);
 
@@ -1413,6 +1451,14 @@ gfc_trans_critical (gfc_code *code)
       gfc_add_expr_to_block (&block, tmp);
     }
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&block, tmp);
 
   return gfc_finish_block (&block);
 }
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 001db41..1993743 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -746,6 +746,14 @@ gfc_allocate_using_lib (stmtblock_t * block, tree pointer, tree size,
 			 TREE_TYPE (pointer), pointer,
 			 fold_convert ( TREE_TYPE (pointer), tmp));
   gfc_add_expr_to_block (block, tmp);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (block, tmp);
 }
 
 
@@ -1356,6 +1364,14 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	     token, pstat, errmsg, errlen);
       gfc_add_expr_to_block (&non_null, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&non_null, tmp);
+
       if (status != NULL_TREE)
 	{
 	  tree stat = build_fold_indirect_ref_loc (input_location, status);
diff --git a/gcc/testsuite/gfortran.dg/coarray_40.f90 b/gcc/testsuite/gfortran.dg/coarray_40.f90
new file mode 100644
index 0000000..84af50e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray_40.f90
@@ -0,0 +1,25 @@
+! { dg-do run }
+! { dg-options "-fcoarray=lib -lcaf_single" }
+!
+! Run-time test for memory consistency
+!
+! Contributed by Deepak Eachempati
+
+program cp_bug
+    implicit none
+    integer :: v1, v2, u[*]
+    integer :: me
+
+    me = this_image()
+
+    u = 0
+    v1 = 10
+
+    v1 = u[me]
+
+    ! v2 should get value in u (0)
+    v2 = v1
+
+    if(v2 /= u) call abort()
+
+end program

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-08 14:21         ` Alessandro Fanfarillo
@ 2015-12-09  7:23           ` Tobias Burnus
  2015-12-09 16:08             ` Alessandro Fanfarillo
  0 siblings, 1 reply; 15+ messages in thread
From: Tobias Burnus @ 2015-12-09  7:23 UTC (permalink / raw)
  To: Alessandro Fanfarillo, Tobias Burnus; +Cc: gfortran, gcc-patches, opencoarrays

Alessandro Fanfarillo wrote:
> in attachment the new patch. I also checked the behavior with
> move_alloc: it synchronizes right after the deregistration of the
> destination.
> I also noticed that __asm__ __volatile__ ("":::"memory") is called
> before sync all and not after. It shouldn't be a problem, right?

The patch looks mostly good to me, except:

> @@ -1222,6 +1230,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
>     se->expr = res_var;
>     if (array_expr->ts.type == BT_CHARACTER)
>       se->string_length = argse.string_length;
> +
> +  /* It guarantees memory consistency within the same segment */
> +  /* tmp = gfc_build_string_const (strlen ("memory")+1, "memory"), */
> +  /* tmp = build5_loc (input_location, ASM_EXPR, void_type_node, */
> +  /* 		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE, */
> +  /* 		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE); */
> +  /* ASM_VOLATILE_P (tmp) = 1; */
> +  /* gfc_add_expr_to_block (&se->pre, tmp); */
> +
>   }

Do not add out-commented code. Please remove.


>  gfc_trans_critical (gfc_code *code)
>  {
>    stmtblock_t block;
>    tree tmp, token = NULL_TREE;
>
>    gfc_start_block (&block);
>
>    if (flag_coarray == GFC_FCOARRAY_LIB)
>      {
>        token = gfc_get_symbol_decl (code->resolved_sym);
>        token = GFC_TYPE_ARRAY_CAF_TOKEN (TREE_TYPE (token));
>        tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_lock, 7,
>                                  token, integer_zero_node, 
> integer_one_node,
>                                  null_pointer_node, null_pointer_node,
>                                  null_pointer_node, integer_zero_node);
>        gfc_add_expr_to_block (&block, tmp);
>      }
>
> +  /* It guarantees memory consistency within the same segment */
> +  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
> +    tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
> +                     gfc_build_string_const (1, ""), NULL_TREE, 
> NULL_TREE,
> +                     tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
> +  ASM_VOLATILE_P (tmp) = 1;
> +
> +  gfc_add_expr_to_block (&block, tmp);
> +
>    tmp = gfc_trans_code (code->block->next);
>    gfc_add_expr_to_block (&block, tmp);
>
>    if (flag_coarray == GFC_FCOARRAY_LIB)
>      {
>        tmp = build_call_expr_loc (input_location, 
> gfor_fndecl_caf_unlock, 6,
>                                  token, integer_zero_node, 
> integer_one_node,
>                                  null_pointer_node, null_pointer_node,
>                                  integer_zero_node);
>        gfc_add_expr_to_block (&block, tmp);
>      }
>
> +      /* It guarantees memory consistency within the same segment */
> +      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
> +      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
> +                       gfc_build_string_const (1, ""), NULL_TREE, 
> NULL_TREE,
> +                       tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
> +      ASM_VOLATILE_P (tmp) = 1;
> +
> +      gfc_add_expr_to_block (&block, tmp);
>
>    return gfc_finish_block (&block);
>  }

Please move the two new code additions into the associated "if 
(flag_coarray == GFC_FCOARRAY_LIB)" blocks - and keep an eye on the 
indentation: for the current code location, the second block is wrongly 
idented.


With the two issues fixed: LGTM.

Cheers,

Tobias

PS: I assume you can still commit yourself. I am asking because Steve 
did commit the recent EVENT patch for you.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-08  9:26       ` Tobias Burnus
@ 2015-12-09 14:47         ` Matthew Wahab
  0 siblings, 0 replies; 15+ messages in thread
From: Matthew Wahab @ 2015-12-09 14:47 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: fortran, gcc-patches, Alessandro Fanfarillo

On 08/12/15 09:25, Tobias Burnus wrote:
> On Mon, Dec 07, 2015 at 02:09:22PM +0000, Matthew Wahab wrote:
>>>> I wonder whether using
>>>> __asm__ __volatile__ ("":::"memory");
>>>> would be sufficient as it has a way lower overhead than
>>>> __sync_synchronize().
>>
>> I don't know anything about Fortran or coarrays and I'm curious
>> whether this affects architectures with weak memory models. Is the
>> barrier only needed to stop reordering by the compiler or is does it
>> also need to stop reordering by the hardware?
>
> Short answer: I think no mfence is needed as either the communication
> is local (to the thread/process) - in which case the hardware will act
> correctly - or the communication is remote (different thread, process,
> communication to different computer via interlink [ethernet, infiniband,
> ...]); and in the later case, the communication library has to deal with
> it.

Thanks for explaining this, it made things clear. Based on your description, I agree 
that hardware reordering shouldn't be a problem.

> and the (main) program code (slightly trimmed):
>
>    static void * restrict caf_token.0;
>    static integer(kind=4) * restrict var;
>    void _caf_init.1 (void);
>
>    *var = 4;
>
>    desc.3.data = 42;
>    _gfortran_caf_send (caf_token.0, 0B /* offset */ var,
>                        _gfortran_caf_this_image (0), &desc.2, 0B, &desc.3, 4, 4, 0);
>    __asm__ __volatile__("":::"memory");  // new
>    tmp = *var;
>
> The problem is that in that case the compiler does not know that
> "_gfortran_caf_send (caf_token.0," can modify "*var".
>

Is the restrict attribute on var correct? From what you say, it sounds like *var 
could be accessed through other pointers (assuming restrict has the same meaning as 
in C).

Matthew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-09  7:23           ` Tobias Burnus
@ 2015-12-09 16:08             ` Alessandro Fanfarillo
  2015-12-09 22:16               ` Tobias Burnus
  0 siblings, 1 reply; 15+ messages in thread
From: Alessandro Fanfarillo @ 2015-12-09 16:08 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Tobias Burnus, gfortran, gcc-patches, opencoarrays

[-- Attachment #1: Type: text/plain, Size: 3861 bytes --]

Done.

I have permission for contributing but I don't have write permission
on the repository.

2015-12-09 8:23 GMT+01:00 Tobias Burnus <burnus@net-b.de>:
> Alessandro Fanfarillo wrote:
>>
>> in attachment the new patch. I also checked the behavior with
>> move_alloc: it synchronizes right after the deregistration of the
>> destination.
>> I also noticed that __asm__ __volatile__ ("":::"memory") is called
>> before sync all and not after. It shouldn't be a problem, right?
>
>
> The patch looks mostly good to me, except:
>
>> @@ -1222,6 +1230,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr
>> *expr, tree lhs, tree lhs_kind,
>>     se->expr = res_var;
>>     if (array_expr->ts.type == BT_CHARACTER)
>>       se->string_length = argse.string_length;
>> +
>> +  /* It guarantees memory consistency within the same segment */
>> +  /* tmp = gfc_build_string_const (strlen ("memory")+1, "memory"), */
>> +  /* tmp = build5_loc (input_location, ASM_EXPR, void_type_node, */
>> +  /*               gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
>> */
>> +  /*               tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE); */
>> +  /* ASM_VOLATILE_P (tmp) = 1; */
>> +  /* gfc_add_expr_to_block (&se->pre, tmp); */
>> +
>>   }
>
>
> Do not add out-commented code. Please remove.
>
>
>>  gfc_trans_critical (gfc_code *code)
>>  {
>>    stmtblock_t block;
>>    tree tmp, token = NULL_TREE;
>>
>>    gfc_start_block (&block);
>>
>>    if (flag_coarray == GFC_FCOARRAY_LIB)
>>      {
>>        token = gfc_get_symbol_decl (code->resolved_sym);
>>        token = GFC_TYPE_ARRAY_CAF_TOKEN (TREE_TYPE (token));
>>        tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_lock, 7,
>>                                  token, integer_zero_node,
>> integer_one_node,
>>                                  null_pointer_node, null_pointer_node,
>>                                  null_pointer_node, integer_zero_node);
>>        gfc_add_expr_to_block (&block, tmp);
>>      }
>>
>> +  /* It guarantees memory consistency within the same segment */
>> +  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
>> +    tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
>> +                     gfc_build_string_const (1, ""), NULL_TREE,
>> NULL_TREE,
>> +                     tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
>> +  ASM_VOLATILE_P (tmp) = 1;
>> +
>> +  gfc_add_expr_to_block (&block, tmp);
>> +
>>    tmp = gfc_trans_code (code->block->next);
>>    gfc_add_expr_to_block (&block, tmp);
>>
>>    if (flag_coarray == GFC_FCOARRAY_LIB)
>>      {
>>        tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_unlock,
>> 6,
>>                                  token, integer_zero_node,
>> integer_one_node,
>>                                  null_pointer_node, null_pointer_node,
>>                                  integer_zero_node);
>>        gfc_add_expr_to_block (&block, tmp);
>>      }
>>
>> +      /* It guarantees memory consistency within the same segment */
>> +      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
>> +      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
>> +                       gfc_build_string_const (1, ""), NULL_TREE,
>> NULL_TREE,
>> +                       tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
>> +      ASM_VOLATILE_P (tmp) = 1;
>> +
>> +      gfc_add_expr_to_block (&block, tmp);
>>
>>    return gfc_finish_block (&block);
>>  }
>
>
> Please move the two new code additions into the associated "if (flag_coarray
> == GFC_FCOARRAY_LIB)" blocks - and keep an eye on the indentation: for the
> current code location, the second block is wrongly idented.
>
>
> With the two issues fixed: LGTM.
>
> Cheers,
>
> Tobias
>
> PS: I assume you can still commit yourself. I am asking because Steve did
> commit the recent EVENT patch for you.

[-- Attachment #2: caf_sync_4.diff --]
[-- Type: text/plain, Size: 8681 bytes --]

2015-12-09  Tobias Burnus  <burnus@net-b.de>
	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>

	* trans.c (gfc_allocate_using_lib,gfc_deallocate_with_status):
	Introducing __asm__ __volatile__ ("":::"memory")
	after image control statements.
	* trans-stmt.c 	(gfc_trans_sync, gfc_trans_event_post_wait,
	gfc_trans_lock_unlock, gfc_trans_critical): Ditto.
	* trans-intrinsic.c (gfc_conv_intrinsic_caf_get,
	conv_caf_send): Introducing __asm__ __volatile__ ("":::"memory")
	after send, before get and around sendget.

2015-12-09  Tobias Burnus  <burnus@net-b.de>
	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>

	* gfortran.dg/coarray_40.f90: New.

commit 96975d401c4b66c0362d853bf6b604cda171552b
Author: Alessandro Fanfarillo <fanfarillo@ing.uniroma2.it>
Date:   Wed Dec 9 17:05:28 2015 +0100

    Introducing __asm__ __volatile__ (:::memory) after image control statements, send, get and sendget

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..31bad35 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1211,6 +1211,14 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   if (lhs == NULL_TREE)
     may_require_tmp = boolean_false_node;
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se->pre, tmp);
+
   tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_get, 9,
 			     token, offset, image_index, argse.expr, vec,
 			     dst_var, kind, lhs_kind, may_require_tmp);
@@ -1375,6 +1383,14 @@ conv_caf_send (gfc_code *code) {
     {
       tree rhs_token, rhs_offset, rhs_image_index;
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&block, tmp);
+
       caf_decl = gfc_get_tree_for_caf_expr (rhs_expr);
       if (TREE_CODE (TREE_TYPE (caf_decl)) == REFERENCE_TYPE)
 	caf_decl = build_fold_indirect_ref_loc (input_location, caf_decl);
@@ -1390,6 +1406,15 @@ conv_caf_send (gfc_code *code) {
   gfc_add_expr_to_block (&block, tmp);
   gfc_add_block_to_block (&block, &lhs_se.post);
   gfc_add_block_to_block (&block, &rhs_se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&block, tmp);
+
   return gfc_finish_block (&block);
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 3df483a..72416d4 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -818,6 +818,15 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
 				   errmsg, errmsg_len);
       gfc_add_expr_to_block (&se.pre, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&se.pre, tmp);
+
       if (stat2 != NULL_TREE)
 	gfc_add_modify (&se.pre, stat2,
 			fold_convert (TREE_TYPE (stat2), stat));
@@ -995,6 +1004,14 @@ gfc_trans_event_post_wait (gfc_code *code, gfc_exec_op op)
 			       errmsg, errmsg_len);
   gfc_add_expr_to_block (&se.pre, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se.pre, tmp);
+
   if (stat2 != NULL_TREE)
     gfc_add_modify (&se.pre, stat2, fold_convert (TREE_TYPE (stat2), stat));
 
@@ -1080,6 +1097,18 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 			       fold_convert (integer_type_node, images));
     }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+     image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&se.pre, tmp);
+    }
+
   if (flag_coarray != GFC_FCOARRAY_LIB)
     {
       /* Set STAT to zero.  */
@@ -1399,6 +1428,17 @@ gfc_trans_critical (gfc_code *code)
 				 null_pointer_node, null_pointer_node,
 				 null_pointer_node, integer_zero_node);
       gfc_add_expr_to_block (&block, tmp);
+
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+  
+      gfc_add_expr_to_block (&block, tmp);
     }
 
   tmp = gfc_trans_code (code->block->next);
@@ -1411,8 +1451,18 @@ gfc_trans_critical (gfc_code *code)
 				 null_pointer_node, null_pointer_node,
 				 integer_zero_node);
       gfc_add_expr_to_block (&block, tmp);
-    }
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&block, tmp);
+    }
 
   return gfc_finish_block (&block);
 }
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 001db41..1993743 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -746,6 +746,14 @@ gfc_allocate_using_lib (stmtblock_t * block, tree pointer, tree size,
 			 TREE_TYPE (pointer), pointer,
 			 fold_convert ( TREE_TYPE (pointer), tmp));
   gfc_add_expr_to_block (block, tmp);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		    gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		    tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (block, tmp);
 }
 
 
@@ -1356,6 +1364,14 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	     token, pstat, errmsg, errlen);
       gfc_add_expr_to_block (&non_null, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+      tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&non_null, tmp);
+
       if (status != NULL_TREE)
 	{
 	  tree stat = build_fold_indirect_ref_loc (input_location, status);
diff --git a/gcc/testsuite/gfortran.dg/coarray_40.f90 b/gcc/testsuite/gfortran.dg/coarray_40.f90
new file mode 100644
index 0000000..84af50e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray_40.f90
@@ -0,0 +1,25 @@
+! { dg-do run }
+! { dg-options "-fcoarray=lib -lcaf_single" }
+!
+! Run-time test for memory consistency
+!
+! Contributed by Deepak Eachempati
+
+program cp_bug
+    implicit none
+    integer :: v1, v2, u[*]
+    integer :: me
+
+    me = this_image()
+
+    u = 0
+    v1 = 10
+
+    v1 = u[me]
+
+    ! v2 should get value in u (0)
+    v2 = v1
+
+    if(v2 /= u) call abort()
+
+end program

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-09 16:08             ` Alessandro Fanfarillo
@ 2015-12-09 22:16               ` Tobias Burnus
  2015-12-10  8:44                 ` Alessandro Fanfarillo
  0 siblings, 1 reply; 15+ messages in thread
From: Tobias Burnus @ 2015-12-09 22:16 UTC (permalink / raw)
  To: Alessandro Fanfarillo; +Cc: Tobias Burnus, gfortran, gcc-patches, opencoarrays

Alessandro Fanfarillo wrote:
> Done.
Thanks. Committed as r231476.

Do we need to do anything about GCC 5 or is this only a GCC 6 issue?

> I have permission for contributing but I don't have write permission
> on the repository.

That can be changed: Simply fill out the form and list me (burnus (at] 
gcc.gnu.org) as sponsor: https://sourceware.org/cgi-bin/pdw/ps_form.cgi 
– and see https://gcc.gnu.org/svnwrite.html

Tobias

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
  2015-12-09 22:16               ` Tobias Burnus
@ 2015-12-10  8:44                 ` Alessandro Fanfarillo
       [not found]                   ` <20151210090345.GB6991@physik.fu-berlin.de>
  0 siblings, 1 reply; 15+ messages in thread
From: Alessandro Fanfarillo @ 2015-12-10  8:44 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Tobias Burnus, gfortran, gcc-patches, opencoarrays

2015-12-09 23:16 GMT+01:00 Tobias Burnus <burnus@net-b.de>:
> Thanks. Committed as r231476.

Thanks.

>
> Do we need to do anything about GCC 5 or is this only a GCC 6 issue?
>

Yes, the patch should be applied to GCC 5 too.

> That can be changed: Simply fill out the form and list me (burnus (at]
> gcc.gnu.org) as sponsor: https://sourceware.org/cgi-bin/pdw/ps_form.cgi –
> and see https://gcc.gnu.org/svnwrite.html

Done. Thanks.

>
> Tobias

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
       [not found]                     ` <CAHqFgjVhx6pcVFEaCT8=9P+vkR=SP-mohZwr+B315N0_41OtKA@mail.gmail.com>
@ 2015-12-14 19:14                       ` Tobias Burnus
  0 siblings, 0 replies; 15+ messages in thread
From: Tobias Burnus @ 2015-12-14 19:14 UTC (permalink / raw)
  To: gcc-patches, gfortran; +Cc: Tobias Burnus

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

Alessandro Fanfarillo wrote:
> In attachment the patch for gcc5-branch.

Commited as Rev. 231626.

Tobias

> 2015-12-10 10:03 GMT+01:00 Tobias Burnus <tobias.burnus@physik.fu-berlin.de>:
>> Hi Alessandro (off list),
>>
>> On Thu, Dec 10, 2015 at 09:44:16AM +0100, Alessandro Fanfarillo wrote:
>>> Yes, the patch should be applied to GCC 5 too.
>> Can you create a patch? Requires a rediff plus removing the bits which
>> do not exist on GCC 5 - like events.


[-- Attachment #2: commit.diff --]
[-- Type: text/x-patch, Size: 8536 bytes --]

Index: gcc/fortran/ChangeLog
===================================================================
--- gcc/fortran/ChangeLog	(Revision 231625)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,19 @@
+2015-12-09  Tobias Burnus  <burnus@net-b.de>
+	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>
+
+	Backport from mainline.
+	2015-12-09  Tobias Burnus  <burnus@net-b.de>
+	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>
+
+	* trans.c (gfc_allocate_using_lib,gfc_deallocate_with_status):
+	Introducing __asm__ __volatile__ ("":::"memory")
+	after image control statements.
+	* trans-stmt.c 	(gfc_trans_sync, gfc_trans_event_post_wait,
+	gfc_trans_lock_unlock, gfc_trans_critical): Ditto.
+	* trans-intrinsic.c (gfc_conv_intrinsic_caf_get,
+	conv_caf_send): Introducing __asm__ __volatile__ ("":::"memory")
+	after send, before get and around sendget.
+
 2015-12-04  Release Manager
 
 	* GCC 5.3.0 released.
Index: gcc/fortran/trans-intrinsic.c
===================================================================
--- gcc/fortran/trans-intrinsic.c	(Revision 231625)
+++ gcc/fortran/trans-intrinsic.c	(Arbeitskopie)
@@ -1221,12 +1221,22 @@
   /* No overlap possible as we have generated a temporary.  */
   if (lhs == NULL_TREE)
     may_require_tmp = boolean_false_node;
+  
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+    tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		      gfc_build_string_const (1, ""),
+		      NULL_TREE, NULL_TREE,
+		      tree_cons (NULL_TREE, tmp, NULL_TREE),
+		      NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&se->pre, tmp);
 
   tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_get, 9,
 			     token, offset, image_index, argse.expr, vec,
 			     dst_var, kind, lhs_kind, may_require_tmp);
   gfc_add_expr_to_block (&se->pre, tmp);
-
+  
   if (se->ss)
     gfc_advance_se_ss_chain (se);
 
@@ -1386,6 +1396,16 @@
     {
       tree rhs_token, rhs_offset, rhs_image_index;
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&block, tmp);
+
       caf_decl = gfc_get_tree_for_caf_expr (rhs_expr);
       if (TREE_CODE (TREE_TYPE (caf_decl)) == REFERENCE_TYPE)
 	caf_decl = build_fold_indirect_ref_loc (input_location, caf_decl);
@@ -1401,6 +1421,17 @@
   gfc_add_expr_to_block (&block, tmp);
   gfc_add_block_to_block (&block, &lhs_se.post);
   gfc_add_block_to_block (&block, &rhs_se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+    tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		      gfc_build_string_const (1, ""),
+		      NULL_TREE, NULL_TREE,
+		      tree_cons (NULL_TREE, tmp, NULL_TREE),
+		      NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (&block, tmp);
+
   return gfc_finish_block (&block);
 }
 
Index: gcc/fortran/trans-stmt.c
===================================================================
--- gcc/fortran/trans-stmt.c	(Revision 231625)
+++ gcc/fortran/trans-stmt.c	(Arbeitskopie)
@@ -829,6 +829,17 @@
 				   errmsg, errmsg_len);
       gfc_add_expr_to_block (&se.pre, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&se.pre, tmp);
+
       if (stat2 != NULL_TREE)
 	gfc_add_modify (&se.pre, stat2,
 			fold_convert (TREE_TYPE (stat2), stat));
@@ -931,6 +942,20 @@
 			       fold_convert (integer_type_node, images));
     }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+     image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+    {
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&se.pre, tmp);
+    }
+
   if (flag_coarray != GFC_FCOARRAY_LIB)
     {
       /* Set STAT to zero.  */
@@ -1250,6 +1275,17 @@
 				 null_pointer_node, null_pointer_node,
 				 null_pointer_node, integer_zero_node);
       gfc_add_expr_to_block (&block, tmp);
+
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&block, tmp);
     }
 
   tmp = gfc_trans_code (code->block->next);
@@ -1262,9 +1298,19 @@
 				 null_pointer_node, null_pointer_node,
 				 integer_zero_node);
       gfc_add_expr_to_block (&block, tmp);
+
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+
+      gfc_add_expr_to_block (&block, tmp);
     }
 
-
   return gfc_finish_block (&block);
 }
 
Index: gcc/fortran/trans.c
===================================================================
--- gcc/fortran/trans.c	(Revision 231625)
+++ gcc/fortran/trans.c	(Arbeitskopie)
@@ -739,6 +739,16 @@
 			 TREE_TYPE (pointer), pointer,
 			 fold_convert ( TREE_TYPE (pointer), tmp));
   gfc_add_expr_to_block (block, tmp);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+    tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		      gfc_build_string_const (1, ""),
+		      NULL_TREE, NULL_TREE,
+		      tree_cons (NULL_TREE, tmp, NULL_TREE),
+		      NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (block, tmp);
 }
 
 
@@ -1360,6 +1370,16 @@
 	     token, pstat, errmsg, errlen);
       gfc_add_expr_to_block (&non_null, tmp);
 
+      /* It guarantees memory consistency within the same segment */
+      tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+	tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			  gfc_build_string_const (1, ""),
+			  NULL_TREE, NULL_TREE,
+			  tree_cons (NULL_TREE, tmp, NULL_TREE),
+			  NULL_TREE);
+      ASM_VOLATILE_P (tmp) = 1;
+      gfc_add_expr_to_block (&non_null, tmp);
+
       if (status != NULL_TREE)
 	{
 	  tree stat = build_fold_indirect_ref_loc (input_location, status);
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog	(Revision 231625)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,12 @@
+2015-12-09  Tobias Burnus  <burnus@net-b.de>
+	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>
+
+	Backport from mainline.
+	2015-12-09  Tobias Burnus  <burnus@net-b.de>
+	    Alessandro Fanfarillo <fanfarillo.gcc@gmail.com>
+
+	* gfortran.dg/coarray_40.f90: New.
+
 2015-12-14  Martin Jambor  <mjambor@suse.cz>
 
 	PR ipa/66616
Index: gcc/testsuite/gfortran.dg/coarray_40.f90
===================================================================
--- gcc/testsuite/gfortran.dg/coarray_40.f90	(nicht existent)
+++ gcc/testsuite/gfortran.dg/coarray_40.f90	(Arbeitskopie)
@@ -0,0 +1,25 @@
+! { dg-do run }
+! { dg-options "-fcoarray=lib -lcaf_single" }
+!
+! Run-time test for memory consistency
+!
+! Contributed by Deepak Eachempati
+
+program cp_bug
+    implicit none
+    integer :: v1, v2, u[*]
+    integer :: me
+
+    me = this_image()
+
+    u = 0
+    v1 = 10
+
+    v1 = u[me]
+
+    ! v2 should get value in u (0)
+    v2 = v1
+
+    if(v2 /= u) call abort()
+
+end program


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-12-14 19:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-06 21:02 [Fortran, Patch] Memory sync after coarray image control statements and assignment Alessandro Fanfarillo
2015-12-07  7:20 ` Tobias Burnus
2015-12-07  9:16   ` Alessandro Fanfarillo
2015-12-07 10:06   ` Tobias Burnus
2015-12-07 14:09     ` Matthew Wahab
2015-12-08  9:26       ` Tobias Burnus
2015-12-09 14:47         ` Matthew Wahab
2015-12-07 14:48     ` Alessandro Fanfarillo
2015-12-08 10:01       ` Tobias Burnus
2015-12-08 14:21         ` Alessandro Fanfarillo
2015-12-09  7:23           ` Tobias Burnus
2015-12-09 16:08             ` Alessandro Fanfarillo
2015-12-09 22:16               ` Tobias Burnus
2015-12-10  8:44                 ` Alessandro Fanfarillo
     [not found]                   ` <20151210090345.GB6991@physik.fu-berlin.de>
     [not found]                     ` <CAHqFgjVhx6pcVFEaCT8=9P+vkR=SP-mohZwr+B315N0_41OtKA@mail.gmail.com>
2015-12-14 19:14                       ` Tobias Burnus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).