I wrote:
> I wonder whether using
>
>   __asm__ __volatile__ ("":::"memory");
>
> would be sufficient as it has a way lower overhead than
> __sync_synchronize().

Namely, something like the attached patch.

Regarding the original patch submission: Is there a reason that you
didn't include the test case of Deepak from
https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html

It should work as -fcoarray=lib -lcaf_single "dg-do run" test.

Tobias
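
P.S. For readers comparing the two barriers, here is a minimal C sketch of the difference. It is only an illustration, not the attached patch: the macro names, the variable `shared_flag`, and both helper functions are hypothetical. The empty asm with a "memory" clobber emits no instruction and only stops the compiler from caching or reordering memory accesses across it, while __sync_synchronize() additionally issues a full hardware memory barrier. Whether the weaker form is sufficient for the coarray case is the question above, not something this sketch settles.

```c
/* Compiler-only barrier: generates no machine instruction.  The "memory"
   clobber tells GCC that memory may have changed, so values cached in
   registers must be reloaded and memory accesses are not moved across it.  */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

/* Full barrier: GCC builtin that also orders the accesses in hardware.  */
#define FULL_BARRIER() __sync_synchronize ()

/* Hypothetical shared variable, for illustration only.  */
static int shared_flag;

/* Store, then reload after a compiler-only barrier.  */
int
store_load_compiler_barrier (int value)
{
  shared_flag = value;
  COMPILER_BARRIER ();   /* the load below must really read memory */
  return shared_flag;
}

/* Same sequence with the heavier full memory barrier.  */
int
store_load_full_barrier (int value)
{
  shared_flag = value;
  FULL_BARRIER ();       /* store is ordered before later accesses */
  return shared_flag;
}
```

In a single-threaded run both helpers return the value just stored; the difference only shows up in the generated code (no fence instruction for the first one) and in what other threads or images are guaranteed to observe.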