public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Patch for PR libgomp/37938, IA64 specific.
@ 2008-11-04 16:51 Steve Ellcey
  2008-11-04 17:22 ` Jakub Jelinek
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-04 16:51 UTC (permalink / raw)
  To: gcc-patches

Some libgomp tests were failing on IA64 Linux (but not HP-UX), the
problem was that Linux uses the sync_lock_test_and_set instruction and
this did not have a memory barrier in it to make sure that loads and
stores were completed before the sync_lock_test_and_set instruction was
executed.  This patch changes sync_lock_test_and_set to a define_expand
that has a memory barrier followed by the original xchg instruction (now
renamed to sync_lock_test_and_set_xchg).

Tested on IA64 Linux and HP-UX with no regressions.

I'll wait a day before checking the patch in to give people a chance to
look it over.

Steve Ellcey
sje@cup.hp.com



2008-11-04  Steve Ellcey  <sje@cup.hp.com>

	PR libgomp/37938
	* config/ia64/sync.md (sync_lock_test_and_set_xchg<mode>): New name
	for original sync_lock_test_and_set<mode>.
	(sync_lock_test_and_set<mode>): Create define_expand version.


Index: config/ia64/sync.md
===================================================================
--- config/ia64/sync.md	(revision 141574)
+++ config/ia64/sync.md	(working copy)
@@ -159,7 +159,20 @@ (define_insn "cmpxchg_rel_di"
   "cmpxchg8.rel %0 = %1, %3, %2"
   [(set_attr "itanium_class" "sem")])
 
-(define_insn "sync_lock_test_and_set<mode>"
+(define_expand "sync_lock_test_and_set<mode>"
+  [(set (match_operand:IMODE 0 "gr_register_operand" "")
+        (match_operand:IMODE 1 "not_postinc_memory_operand" ""))
+   (set (match_dup 1)
+        (match_operand:IMODE 2 "gr_register_operand" ""))]
+  ""
+{
+  emit_insn (gen_memory_barrier ());
+  emit_insn (gen_sync_lock_test_and_set_xchg<mode> (operands[0], operands[1],
+	     operands[2]));
+  DONE;
+})
+
+(define_insn "sync_lock_test_and_set_xchg<mode>"
   [(set (match_operand:IMODE 0 "gr_register_operand" "=r")
         (match_operand:IMODE 1 "not_postinc_memory_operand" "+S"))
    (set (match_dup 1)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-04 16:51 Patch for PR libgomp/37938, IA64 specific Steve Ellcey
@ 2008-11-04 17:22 ` Jakub Jelinek
  2008-11-06 17:14   ` Steve Ellcey
  2008-11-06 17:30   ` Richard Henderson
  0 siblings, 2 replies; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-04 17:22 UTC (permalink / raw)
  To: Steve Ellcey, Richard Henderson, hjl; +Cc: gcc-patches

On Tue, Nov 04, 2008 at 08:50:28AM -0800, Steve Ellcey wrote:
> Some libgomp tests were failing on IA64 Linux (but not HP-UX), the
> problem was that Linux uses the sync_lock_test_and_set instruction and
> this did not have a memory barrier in it to make sure that loads and
> stores were completed before the sync_lock_test_and_set instruction was
> executed.  This patch changes sync_lock_test_and_set to a define_expand
> that has a memory barrier followed by the original xchg instruction (now
> renamed to sync_lock_test_and_set_xchg).

I think having a memory barrier there matches our documentation:

"In most cases, these builtins are considered a "full barrier".  That
is, no memory operand will be moved across the operation, either
forward or backward.  Further, instructions will be issued as necessary
to prevent the processor from speculating loads across the operation
and from queuing stores after the operation."

and what other ports do as well.  Shouldn't
sync_lock_release<mode> get the same treatment on ia64 though?
E.g. ppc has gen_lwsync in sync_lock_release and the generic implementation
is:
  /* Otherwise we can implement this operation by emitting a barrier
     followed by a store of zero.  */
  expand_builtin_synchronize ();
  emit_move_insn (mem, val);
(wonder why the generic one isn't sufficient for ia64).

When we are talking about __sync* builtins, H.J., does icc imply a barrier
in these?

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-04 17:22 ` Jakub Jelinek
@ 2008-11-06 17:14   ` Steve Ellcey
  2008-11-06 17:42     ` H.J. Lu
  2008-11-06 17:30   ` Richard Henderson
  1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-06 17:14 UTC (permalink / raw)
  To: jakub; +Cc: rth, hjl, gcc-patches


On Tue, 2008-11-04 at 18:17 +0100, Jakub Jelinek wrote:

> Shouldn't sync_lock_release<mode> get the same treatment on ia64 though?

I think you are right.  I have added a memory_barrier call in
sync_lock_release<mode> to my patch and retested.  Here is the current
patch I intend to check in:

2008-11-06  Steve Ellcey  <sje@cup.hp.com>

	PR libgomp/37938
	* config/ia64/sync.md (sync_lock_test_and_set_xchg<mode>): New name
	for original sync_lock_test_and_set<mode>.
	(sync_lock_test_and_set<mode>): Create define_expand version.
	(sync_lock_release<mode>): Add memory_barrier.


Index: config/ia64/sync.md
===================================================================
--- config/ia64/sync.md	(revision 141635)
+++ config/ia64/sync.md	(working copy)
@@ -159,7 +159,20 @@ (define_insn "cmpxchg_rel_di"
   "cmpxchg8.rel %0 = %1, %3, %2"
   [(set_attr "itanium_class" "sem")])
 
-(define_insn "sync_lock_test_and_set<mode>"
+(define_expand "sync_lock_test_and_set<mode>"
+  [(set (match_operand:IMODE 0 "gr_register_operand" "")
+        (match_operand:IMODE 1 "not_postinc_memory_operand" ""))
+   (set (match_dup 1)
+        (match_operand:IMODE 2 "gr_register_operand" ""))]
+  ""
+{
+  emit_insn (gen_memory_barrier ());
+  emit_insn (gen_sync_lock_test_and_set_xchg<mode> (operands[0], operands[1],
+	     operands[2]));
+  DONE;
+})
+
+(define_insn "sync_lock_test_and_set_xchg<mode>"
   [(set (match_operand:IMODE 0 "gr_register_operand" "=r")
         (match_operand:IMODE 1 "not_postinc_memory_operand" "+S"))
    (set (match_dup 1)
@@ -173,5 +186,6 @@ (define_expand "sync_lock_release<mode>"
 	(match_operand:IMODE 1 "gr_reg_or_0_operand" ""))]
   ""
 {
+  emit_insn (gen_memory_barrier ());
   gcc_assert (MEM_VOLATILE_P (operands[0]));
 })

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-04 17:22 ` Jakub Jelinek
  2008-11-06 17:14   ` Steve Ellcey
@ 2008-11-06 17:30   ` Richard Henderson
  1 sibling, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2008-11-06 17:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Steve Ellcey, hjl, gcc-patches

Jakub Jelinek wrote:
> and what other ports do as well.  Shouldn't
> sync_lock_release<mode> get the same treatment on ia64 though?

No, lock_release has release semantics, exactly matching
the ia64 semantics of a volatile store.


r~

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 17:14   ` Steve Ellcey
@ 2008-11-06 17:42     ` H.J. Lu
  2008-11-06 18:02       ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-06 17:42 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: jakub, rth, hjl, gcc-patches

On Thu, Nov 6, 2008 at 9:03 AM, Steve Ellcey <sje@cup.hp.com> wrote:
>
> On Tue, 2008-11-04 at 18:17 +0100, Jakub Jelinek wrote:
>
>> Shouldn't sync_lock_release<mode> get the same treatment on ia64 though?
>
> I think you are right.  I have added a memory_barrier call in
> sync_lock_release<mode> to my patch and retested.  Here is the current
> patch I intend to check in:
>
> 2008-11-06  Steve Ellcey  <sje@cup.hp.com>
>
>        PR libgomp/37938
>        * config/ia64/sync.md (sync_lock_test_and_set_xchg<mode>): New name
>        for original sync_lock_test_and_set<mode>.
>        (sync_lock_test_and_set<mode>): Create define_expand version.
>        (sync_lock_release<mode>): Add memory_barrier.
>
>
> Index: config/ia64/sync.md
> ===================================================================
> --- config/ia64/sync.md (revision 141635)
> +++ config/ia64/sync.md (working copy)
> @@ -159,7 +159,20 @@ (define_insn "cmpxchg_rel_di"
>   "cmpxchg8.rel %0 = %1, %3, %2"
>   [(set_attr "itanium_class" "sem")])
>
> -(define_insn "sync_lock_test_and_set<mode>"
> +(define_expand "sync_lock_test_and_set<mode>"
> +  [(set (match_operand:IMODE 0 "gr_register_operand" "")
> +        (match_operand:IMODE 1 "not_postinc_memory_operand" ""))
> +   (set (match_dup 1)
> +        (match_operand:IMODE 2 "gr_register_operand" ""))]
> +  ""
> +{
> +  emit_insn (gen_memory_barrier ());
> +  emit_insn (gen_sync_lock_test_and_set_xchg<mode> (operands[0], operands[1],
> +            operands[2]));
> +  DONE;
> +})
> +
> +(define_insn "sync_lock_test_and_set_xchg<mode>"
>   [(set (match_operand:IMODE 0 "gr_register_operand" "=r")
>         (match_operand:IMODE 1 "not_postinc_memory_operand" "+S"))
>    (set (match_dup 1)
> @@ -173,5 +186,6 @@ (define_expand "sync_lock_release<mode>"
>        (match_operand:IMODE 1 "gr_reg_or_0_operand" ""))]
>   ""
>  {
> +  emit_insn (gen_memory_barrier ());
>   gcc_assert (MEM_VOLATILE_P (operands[0]));
>  })
>

I checked icc 11.0. It doesn't generate "mf" for __sync_lock_test_and_set.

--
#include <ia64intrin.h>

int
foo (int *p, int x)
{
  return __sync_lock_test_and_set (p, x);
}
--

generates:

foo:
	.prologue
	.body
	.mib
	xchg4 r8 = [r32], r33
	nop 0
	br.ret.sptk.many b0
	.endp foo#

xchg4 is performed with acquire semantics, i.e., the memory read/write is made
visible prior to all subsequent data memory access. If mf makes a difference,
that means we may access the memory with release or unordered semantics
before. If I have to guess, we may have an unordered read somewhere.

-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 17:42     ` H.J. Lu
@ 2008-11-06 18:02       ` H.J. Lu
  2008-11-06 19:07         ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-06 18:02 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: jakub, rth, gcc-patches

On Thu, Nov 6, 2008 at 9:29 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Nov 6, 2008 at 9:03 AM, Steve Ellcey <sje@cup.hp.com> wrote:
>>
>> On Tue, 2008-11-04 at 18:17 +0100, Jakub Jelinek wrote:
>>
>>> Shouldn't sync_lock_release<mode> get the same treatment on ia64 though?
>>
>> I think you are right.  I have added a memory_barrier call in
>> sync_lock_release<mode> to my patch and retested.  Here is the current
>> patch I intend to check in:
>>
>> 2008-11-06  Steve Ellcey  <sje@cup.hp.com>
>>
>>        PR libgomp/37938
>>        * config/ia64/sync.md (sync_lock_test_and_set_xchg<mode>): New name
>>        for original sync_lock_test_and_set<mode>.
>>        (sync_lock_test_and_set<mode>): Create define_expand version.
>>        (sync_lock_release<mode>): Add memory_barrier.
>>
>>
>> Index: config/ia64/sync.md
>> ===================================================================
>> --- config/ia64/sync.md (revision 141635)
>> +++ config/ia64/sync.md (working copy)
>> @@ -159,7 +159,20 @@ (define_insn "cmpxchg_rel_di"
>>   "cmpxchg8.rel %0 = %1, %3, %2"
>>   [(set_attr "itanium_class" "sem")])
>>
>> -(define_insn "sync_lock_test_and_set<mode>"
>> +(define_expand "sync_lock_test_and_set<mode>"
>> +  [(set (match_operand:IMODE 0 "gr_register_operand" "")
>> +        (match_operand:IMODE 1 "not_postinc_memory_operand" ""))
>> +   (set (match_dup 1)
>> +        (match_operand:IMODE 2 "gr_register_operand" ""))]
>> +  ""
>> +{
>> +  emit_insn (gen_memory_barrier ());
>> +  emit_insn (gen_sync_lock_test_and_set_xchg<mode> (operands[0], operands[1],
>> +            operands[2]));
>> +  DONE;
>> +})
>> +
>> +(define_insn "sync_lock_test_and_set_xchg<mode>"
>>   [(set (match_operand:IMODE 0 "gr_register_operand" "=r")
>>         (match_operand:IMODE 1 "not_postinc_memory_operand" "+S"))
>>    (set (match_dup 1)
>> @@ -173,5 +186,6 @@ (define_expand "sync_lock_release<mode>"
>>        (match_operand:IMODE 1 "gr_reg_or_0_operand" ""))]
>>   ""
>>  {
>> +  emit_insn (gen_memory_barrier ());
>>   gcc_assert (MEM_VOLATILE_P (operands[0]));
>>  })
>>
>
> I checked icc 11.0. It doesn't generate "mf" for __sync_lock_test_and_set.
>
> --
> #include <ia64intrin.h>
>
> int
> foo (int *p, int x)
> {
>  return __sync_lock_test_and_set (p, x);
> }
> --
>
> generates:
>
> foo:
>        .prologue
>        .body
>        .mib
>        xchg4 r8 = [r32], r33
>        nop 0
>        br.ret.sptk.many b0
>        .endp foo#
>
> xchg4 is performed with acquire semantics, i.e., the memory read/write is made
> visible prior to all subsequent data memory access. If mf makes a difference,
> that means we may access the memory with release or unordered semantics
> before. If I have to guess, we may have an unordered read somewhere.
>
> --
> H.J.
>

Shouldn't we use __sync_lock_test_and_set  to initialize mutux?


H.J.
--- libgomp/config/linux/mutex.h.sync	2006-11-18 06:22:18.000000000 -0800
+++ libgomp/config/linux/mutex.h	2008-11-06 09:50:46.000000000 -0800
@@ -38,7 +38,7 @@ typedef int gomp_mutex_t;

 static inline void gomp_mutex_init (gomp_mutex_t *mutex)
 {
-  *mutex = 0;
+  __sync_lock_test_and_set (mutex, 0);
 }

 extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);


-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 18:02       ` H.J. Lu
@ 2008-11-06 19:07         ` Steve Ellcey
  2008-11-06 19:52           ` H.J. Lu
  2008-11-06 21:35           ` Jakub Jelinek
  0 siblings, 2 replies; 38+ messages in thread
From: Steve Ellcey @ 2008-11-06 19:07 UTC (permalink / raw)
  To: H.J. Lu; +Cc: jakub, rth, gcc-patches


On Thu, 2008-11-06 at 09:56 -0800, H.J. Lu wrote:

> Shouldn't we use __sync_lock_test_and_set  to initialize mutux?
> 
> 
> H.J.
> --- libgomp/config/linux/mutex.h.sync	2006-11-18 06:22:18.000000000 -0800
> +++ libgomp/config/linux/mutex.h	2008-11-06 09:50:46.000000000 -0800
> @@ -38,7 +38,7 @@ typedef int gomp_mutex_t;
> 
>  static inline void gomp_mutex_init (gomp_mutex_t *mutex)
>  {
> -  *mutex = 0;
> +  __sync_lock_test_and_set (mutex, 0);
>  }
> 
>  extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);

I was wondering about the initialization but this patch didn't seem to
fix the test case in PR 37938 when I added it and then removed my
patches.

Steve Ellcey
sje@cup.hp.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 19:07         ` Steve Ellcey
@ 2008-11-06 19:52           ` H.J. Lu
  2008-11-06 20:34             ` H.J. Lu
  2008-11-06 21:35           ` Jakub Jelinek
  1 sibling, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-06 19:52 UTC (permalink / raw)
  To: sje; +Cc: jakub, rth, gcc-patches

On Thu, Nov 6, 2008 at 10:33 AM, Steve Ellcey <sje@cup.hp.com> wrote:
>
> On Thu, 2008-11-06 at 09:56 -0800, H.J. Lu wrote:
>
>> Shouldn't we use __sync_lock_test_and_set  to initialize mutux?
>>
>>
>> H.J.
>> --- libgomp/config/linux/mutex.h.sync 2006-11-18 06:22:18.000000000 -0800
>> +++ libgomp/config/linux/mutex.h      2008-11-06 09:50:46.000000000 -0800
>> @@ -38,7 +38,7 @@ typedef int gomp_mutex_t;
>>
>>  static inline void gomp_mutex_init (gomp_mutex_t *mutex)
>>  {
>> -  *mutex = 0;
>> +  __sync_lock_test_and_set (mutex, 0);
>>  }
>>
>>  extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
>
> I was wondering about the initialization but this patch didn't seem to
> fix the test case in PR 37938 when I added it and then removed my
> patches.
>

We have to make sure that both read and write to mutex are done with
proper memory ordering. I don't think this is IA64 specific.

We can add a memory barrier before _sync_lock_test_and_set (). But I
don't think we should add a memory barrier in _sync_lock_test_and_set.

-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 19:52           ` H.J. Lu
@ 2008-11-06 20:34             ` H.J. Lu
  2008-11-06 21:05               ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-06 20:34 UTC (permalink / raw)
  To: sje; +Cc: jakub, rth, gcc-patches

On Thu, Nov 6, 2008 at 11:06 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Nov 6, 2008 at 10:33 AM, Steve Ellcey <sje@cup.hp.com> wrote:
>>
>> On Thu, 2008-11-06 at 09:56 -0800, H.J. Lu wrote:
>>
>>> Shouldn't we use __sync_lock_test_and_set  to initialize mutux?
>>>
>>>
>>> H.J.
>>> --- libgomp/config/linux/mutex.h.sync 2006-11-18 06:22:18.000000000 -0800
>>> +++ libgomp/config/linux/mutex.h      2008-11-06 09:50:46.000000000 -0800
>>> @@ -38,7 +38,7 @@ typedef int gomp_mutex_t;
>>>
>>>  static inline void gomp_mutex_init (gomp_mutex_t *mutex)
>>>  {
>>> -  *mutex = 0;
>>> +  __sync_lock_test_and_set (mutex, 0);
>>>  }
>>>
>>>  extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
>>
>> I was wondering about the initialization but this patch didn't seem to
>> fix the test case in PR 37938 when I added it and then removed my
>> patches.
>>
>
> We have to make sure that both read and write to mutex are done with
> proper memory ordering. I don't think this is IA64 specific.
>
> We can add a memory barrier before _sync_lock_test_and_set (). But I
> don't think we should add a memory barrier in _sync_lock_test_and_set.
>

There are

libgomp/critical.c:static gomp_mutex_t default_lock;
libgomp/critical.c:static gomp_mutex_t create_lock_lock;
libgomp/critical.c:  gomp_mutex_t *plock;
libgomp/critical.c:      && sizeof (gomp_mutex_t) <= sizeof (void *)
libgomp/critical.c:      && __alignof (gomp_mutex_t) <= sizeof (void *))
libgomp/critical.c:    plock = (gomp_mutex_t *)pptr;
libgomp/critical.c:	  gomp_mutex_t *nlock = gomp_malloc (sizeof (gomp_mutex_t));
libgomp/critical.c:	      plock = gomp_malloc (sizeof (gomp_mutex_t));
libgomp/critical.c:  gomp_mutex_t *plock;
libgomp/critical.c:      && sizeof (gomp_mutex_t) <= sizeof (void *)
libgomp/critical.c:      && __alignof (gomp_mutex_t) <= sizeof (void *))
libgomp/critical.c:    plock = (gomp_mutex_t *)pptr;
libgomp/critical.c:static gomp_mutex_t atomic_lock;

Those reads look suspicious.

-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 20:34             ` H.J. Lu
@ 2008-11-06 21:05               ` Steve Ellcey
  2008-11-06 21:10                 ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-06 21:05 UTC (permalink / raw)
  To: H.J. Lu; +Cc: jakub, rth, gcc-patches


On Thu, 2008-11-06 at 11:09 -0800, H.J. Lu wrote:

> There are
> 
> libgomp/critical.c:static gomp_mutex_t default_lock;
> libgomp/critical.c:static gomp_mutex_t create_lock_lock;
> libgomp/critical.c:  gomp_mutex_t *plock;
> libgomp/critical.c:      && sizeof (gomp_mutex_t) <= sizeof (void *)
> libgomp/critical.c:      && __alignof (gomp_mutex_t) <= sizeof (void *))
> libgomp/critical.c:    plock = (gomp_mutex_t *)pptr;
> libgomp/critical.c:	  gomp_mutex_t *nlock = gomp_malloc (sizeof (gomp_mutex_t));
> libgomp/critical.c:	      plock = gomp_malloc (sizeof (gomp_mutex_t));
> libgomp/critical.c:  gomp_mutex_t *plock;
> libgomp/critical.c:      && sizeof (gomp_mutex_t) <= sizeof (void *)
> libgomp/critical.c:      && __alignof (gomp_mutex_t) <= sizeof (void *))
> libgomp/critical.c:    plock = (gomp_mutex_t *)pptr;
> libgomp/critical.c:static gomp_mutex_t atomic_lock;
> 
> Those reads look suspicious.

I am not sure what reads you mean.  Do you mean the 'plock =
(gomp_mutex_t *) pptr' statements? The sizeof and __alignof uses are not
reads, right?  But pptr is a pointer to mutex and not a mutex itself, so
reading pptr shouldn't be a problem.

Steve Ellcey
sje@cup.hp.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 21:05               ` Steve Ellcey
@ 2008-11-06 21:10                 ` H.J. Lu
  0 siblings, 0 replies; 38+ messages in thread
From: H.J. Lu @ 2008-11-06 21:10 UTC (permalink / raw)
  To: sje; +Cc: jakub, rth, gcc-patches

On Thu, Nov 6, 2008 at 12:32 PM, Steve Ellcey <sje@cup.hp.com> wrote:
>
> On Thu, 2008-11-06 at 11:09 -0800, H.J. Lu wrote:
>
>> There are
>>
>> libgomp/critical.c:static gomp_mutex_t default_lock;
>> libgomp/critical.c:static gomp_mutex_t create_lock_lock;
>> libgomp/critical.c:  gomp_mutex_t *plock;
>> libgomp/critical.c:      && sizeof (gomp_mutex_t) <= sizeof (void *)
>> libgomp/critical.c:      && __alignof (gomp_mutex_t) <= sizeof (void *))
>> libgomp/critical.c:    plock = (gomp_mutex_t *)pptr;
>> libgomp/critical.c:     gomp_mutex_t *nlock = gomp_malloc (sizeof (gomp_mutex_t));
>> libgomp/critical.c:         plock = gomp_malloc (sizeof (gomp_mutex_t));
>> libgomp/critical.c:  gomp_mutex_t *plock;
>> libgomp/critical.c:      && sizeof (gomp_mutex_t) <= sizeof (void *)
>> libgomp/critical.c:      && __alignof (gomp_mutex_t) <= sizeof (void *))
>> libgomp/critical.c:    plock = (gomp_mutex_t *)pptr;
>> libgomp/critical.c:static gomp_mutex_t atomic_lock;
>>
>> Those reads look suspicious.
>
> I am not sure what reads you mean.  Do you mean the 'plock =
> (gomp_mutex_t *) pptr' statements? The sizeof and __alignof uses are not
> reads, right?  But pptr is a pointer to mutex and not a mutex itself, so
> reading pptr shouldn't be a problem.
>

You are right. I think we need to double check all codes in

#ifdef HAVE_SYNC_BUILTINS
...
#endif

to make sure that they have proper memory ordering.

-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 19:07         ` Steve Ellcey
  2008-11-06 19:52           ` H.J. Lu
@ 2008-11-06 21:35           ` Jakub Jelinek
  2008-11-06 21:38             ` H.J. Lu
  1 sibling, 1 reply; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-06 21:35 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: H.J. Lu, rth, gcc-patches

On Thu, Nov 06, 2008 at 10:33:11AM -0800, Steve Ellcey wrote:
> On Thu, 2008-11-06 at 09:56 -0800, H.J. Lu wrote:
> 
> > Shouldn't we use __sync_lock_test_and_set  to initialize mutux?
> > 
> > 
> > H.J.
> > --- libgomp/config/linux/mutex.h.sync	2006-11-18 06:22:18.000000000 -0800
> > +++ libgomp/config/linux/mutex.h	2008-11-06 09:50:46.000000000 -0800
> > @@ -38,7 +38,7 @@ typedef int gomp_mutex_t;
> > 
> >  static inline void gomp_mutex_init (gomp_mutex_t *mutex)
> >  {
> > -  *mutex = 0;
> > +  __sync_lock_test_and_set (mutex, 0);
> >  }
> > 
> >  extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
> 
> I was wondering about the initialization but this patch didn't seem to
> fix the test case in PR 37938 when I added it and then removed my
> patches.

Why would mutex initialization need it?  A mutex is always initialized
uncontended, so other threads either don't exist at that point, or are
prevented from trying to look it, usually by some other mutex, barrier etc.

I guess the problem is that the __sync_* builtins don't have explicit
acquire vs. release semantics variants.  We need to use
__sync_bool_compare_and_swap for locking the mutex, and I believe on ia64
__sync_bool_compare_and_swap has release, not acquire semantics, which is
desirable for mutex lock, and for unlocking the mutex we use
__sync_lock_test_and_set (which has acquire semantics?), but want release
semantics, right?  Putting __sync_synchronize into gomp_mutex_unlock
would mean slowing down other targets, for which already
__sync_lock_test_and_set (and __sync_bool_compare_and_swap) act as a full
barrier.  So I'm afraid ia64 needs to define config/linux/ia64/mutex.[ch]
which uses __asm instead of __sync builtins.

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 21:35           ` Jakub Jelinek
@ 2008-11-06 21:38             ` H.J. Lu
  2008-11-06 21:59               ` Jakub Jelinek
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-06 21:38 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Steve Ellcey, rth, gcc-patches

On Thu, Nov 6, 2008 at 1:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Nov 06, 2008 at 10:33:11AM -0800, Steve Ellcey wrote:
>> On Thu, 2008-11-06 at 09:56 -0800, H.J. Lu wrote:
>>
>> > Shouldn't we use __sync_lock_test_and_set  to initialize mutux?
>> >
>> >
>> > H.J.
>> > --- libgomp/config/linux/mutex.h.sync       2006-11-18 06:22:18.000000000 -0800
>> > +++ libgomp/config/linux/mutex.h    2008-11-06 09:50:46.000000000 -0800
>> > @@ -38,7 +38,7 @@ typedef int gomp_mutex_t;
>> >
>> >  static inline void gomp_mutex_init (gomp_mutex_t *mutex)
>> >  {
>> > -  *mutex = 0;
>> > +  __sync_lock_test_and_set (mutex, 0);
>> >  }
>> >
>> >  extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
>>
>> I was wondering about the initialization but this patch didn't seem to
>> fix the test case in PR 37938 when I added it and then removed my
>> patches.
>
> Why would mutex initialization need it?  A mutex is always initialized
> uncontended, so other threads either don't exist at that point, or are
> prevented from trying to look it, usually by some other mutex, barrier etc.
>
> I guess the problem is that the __sync_* builtins don't have explicit
> acquire vs. release semantics variants.  We need to use
> __sync_bool_compare_and_swap for locking the mutex, and I believe on ia64
> __sync_bool_compare_and_swap has release, not acquire semantics, which is
> desirable for mutex lock, and for unlocking the mutex we use
> __sync_lock_test_and_set (which has acquire semantics?), but want release
> semantics, right?  Putting __sync_synchronize into gomp_mutex_unlock
> would mean slowing down other targets, for which already
> __sync_lock_test_and_set (and __sync_bool_compare_and_swap) act as a full
> barrier.  So I'm afraid ia64 needs to define config/linux/ia64/mutex.[ch]
> which uses __asm instead of __sync builtins.
>

According to ia64 psABI, ia64 _sync_bool_compare_and_swap should
have full barrier.  __sync_lock_test_and_set  has acquire barrier.


-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 21:38             ` H.J. Lu
@ 2008-11-06 21:59               ` Jakub Jelinek
  2008-11-06 22:15                 ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-06 21:59 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Steve Ellcey, rth, gcc-patches

On Thu, Nov 06, 2008 at 01:09:44PM -0800, H.J. Lu wrote:
> According to ia64 psABI, ia64 _sync_bool_compare_and_swap should
> have full barrier.  __sync_lock_test_and_set  has acquire barrier.

Ok, then we have just one problematic __sync_lock_test_and_set
in whole libgomp.

Would the following (untested) patch cure it?  Or we can use
__sync_synchronize (); val = __sync_lock_test_and_set (mutex, 0);
if we don't mind mf + xchg4.acq.

--- libgomp/config/linux/ia64/mutex.h.jj	2008-11-06 22:19:11.000000000 +0100
+++ libgomp/config/linux/ia64/mutex.h	2008-11-06 22:22:11.000000000 +0100
@@ -0,0 +1,66 @@
+/* Copyright (C) 2005, 2008 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published by
+   the Free Software Foundation; either version 2.1 of the License, or
+   (at your option) any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for
+   more details.
+
+   You should have received a copy of the GNU Lesser General Public License 
+   along with libgomp; see the file COPYING.LIB.  If not, write to the
+   Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you link this library with other files, some
+   of which are compiled with GCC, to produce an executable, this library
+   does not by itself cause the resulting executable to be covered by the
+   GNU General Public License.  This exception does not however invalidate
+   any other reasons why the executable file might be covered by the GNU
+   General Public License.  */
+
+/* This is a Linux specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and the futex syscall.  */
+
+#ifndef GOMP_MUTEX_H
+#define GOMP_MUTEX_H 1
+
+typedef int gomp_mutex_t;
+
+#define GOMP_MUTEX_INIT_0 1
+
+static inline void gomp_mutex_init (gomp_mutex_t *mutex)
+{
+  *mutex = 0;
+}
+
+extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
+static inline void gomp_mutex_lock (gomp_mutex_t *mutex)
+{
+  if (!__sync_bool_compare_and_swap (mutex, 0, 1))
+    gomp_mutex_lock_slow (mutex);
+}
+
+extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
+static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
+{
+  /* On ia64, __sync_lock_test_and_set has acquire semantics,
+     while release semantics is needed for gomp_mutex_unlock.  */
+  int val;
+  __asm__ __volatile__ ("xchg4.rel %0 = [%1], r0;;"
+			: "=r" (val) : "r" (mutex) : "memory");
+  if (__builtin_expect (val > 1, 0))
+    gomp_mutex_unlock_slow (mutex);
+}
+
+static inline void gomp_mutex_destroy (gomp_mutex_t *mutex)
+{
+}
+
+#endif /* GOMP_MUTEX_H */


	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 21:59               ` Jakub Jelinek
@ 2008-11-06 22:15                 ` H.J. Lu
  2008-11-06 22:53                   ` Steve Ellcey
  2008-11-06 23:21                   ` Steve Ellcey
  0 siblings, 2 replies; 38+ messages in thread
From: H.J. Lu @ 2008-11-06 22:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Steve Ellcey, rth, gcc-patches

On Thu, Nov 6, 2008 at 1:31 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Nov 06, 2008 at 01:09:44PM -0800, H.J. Lu wrote:
>> According to ia64 psABI, ia64 _sync_bool_compare_and_swap should
>> have full barrier.  __sync_lock_test_and_set  has acquire barrier.
>
> Ok, then we have just one problematic __sync_lock_test_and_set
> in whole libgomp.
>
> Would the following (untested) patch cure it?  Or we can use
> __sync_synchronize (); val = __sync_lock_test_and_set (mutex, 0);
> if we don't mind mf + xchg4.acq.
>

> +extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
> +static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
> +{
> +  /* On ia64, __sync_lock_test_and_set has acquire semantics,
> +     while release semantics is needed for gomp_mutex_unlock.  */
> +  int val;
> +  __asm__ __volatile__ ("xchg4.rel %0 = [%1], r0;;"
> +                       : "=r" (val) : "r" (mutex) : "memory");
> +  if (__builtin_expect (val > 1, 0))
> +    gomp_mutex_unlock_slow (mutex);
> +}
> +

There is no xchg4.rel.


-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 22:15                 ` H.J. Lu
@ 2008-11-06 22:53                   ` Steve Ellcey
  2008-11-06 23:21                   ` Steve Ellcey
  1 sibling, 0 replies; 38+ messages in thread
From: Steve Ellcey @ 2008-11-06 22:53 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, rth, gcc-patches


On Thu, 2008-11-06 at 13:58 -0800, H.J. Lu wrote:
> On Thu, Nov 6, 2008 at 1:31 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Thu, Nov 06, 2008 at 01:09:44PM -0800, H.J. Lu wrote:
> >> According to ia64 psABI, ia64 _sync_bool_compare_and_swap should
> >> have full barrier.  __sync_lock_test_and_set  has acquire barrier.
> >
> > Ok, then we have just one problematic __sync_lock_test_and_set
> > in whole libgomp.
> >
> > Would the following (untested) patch cure it?  Or we can use
> > __sync_synchronize (); val = __sync_lock_test_and_set (mutex, 0);
> > if we don't mind mf + xchg4.acq.
> >
> 
> > +extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
> > +static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
> > +{
> > +  /* On ia64, __sync_lock_test_and_set has acquire semantics,
> > +     while release semantics is needed for gomp_mutex_unlock.  */
> > +  int val;
> > +  __asm__ __volatile__ ("xchg4.rel %0 = [%1], r0;;"
> > +                       : "=r" (val) : "r" (mutex) : "memory");
> > +  if (__builtin_expect (val > 1, 0))
> > +    gomp_mutex_unlock_slow (mutex);
> > +}
> > +
> 
> There is no xchg4.rel.

So that leaves us with calling __sync_synchronize, which is where I
started except that instead of doing the mf inside of the
__sync_lock_test_and_set we do it outside with a seperate call to
__sync_synchronize.

Steve Ellcey
sje@cup.hp.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 22:15                 ` H.J. Lu
  2008-11-06 22:53                   ` Steve Ellcey
@ 2008-11-06 23:21                   ` Steve Ellcey
  2008-11-06 23:48                     ` Jakub Jelinek
  1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-06 23:21 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: rth, gcc-patches, H.J. Lu


On Thu, Nov 6, 2008 at 1:31 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Nov 06, 2008 at 01:09:44PM -0800, H.J. Lu wrote:
>> According to ia64 psABI, ia64 _sync_bool_compare_and_swap should
>> have full barrier.  __sync_lock_test_and_set  has acquire barrier.
>
> Ok, then we have just one problematic __sync_lock_test_and_set
> in whole libgomp.
>
> Would the following (untested) patch cure it?  Or we can use
> __sync_synchronize (); val = __sync_lock_test_and_set (mutex, 0);
> if we don't mind mf + xchg4.acq.

So I put a call to __sync_synchronize into gomp_mutex_unlock and that
seems to work fine, my only complaint is that this requires us to have
an ia64 specific version of mutex.h.  Now if ia64 was doing something
wrong, I would understand that.  But the IA64 __sync_lock_test_and_set
instruction has acquire semantics which is what is documented in
extend.texi.  If gomp_mutex_unlock needs something more then that
shouldn't the standard linux version of gomp_mutex_unlock be changed to
call __sync_synchronize and/or use something else instead of
__sync_lock_test_and_set?

Steve Ellcey
sje@cup.hp.com


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 23:21                   ` Steve Ellcey
@ 2008-11-06 23:48                     ` Jakub Jelinek
  2008-11-07  0:00                       ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-06 23:48 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: rth, gcc-patches, H.J. Lu

On Thu, Nov 06, 2008 at 02:44:55PM -0800, Steve Ellcey wrote:
> So I put a call to __sync_synchronize into gomp_mutex_unlock and that
> seems to work fine, my only complaint is that this requires us to have
> an ia64 specific version of mutex.h.  Now if ia64 was doing something
> wrong, I would understand that.  But the IA64 __sync_lock_test_and_set
> instruction has acquire semantics which is what is documented in
> extend.texi.  If gomp_mutex_unlock needs something more then that
> shouldn't the standard linux version of gomp_mutex_unlock be changed to
> call __sync_synchronize and/or use something else instead of
> __sync_lock_test_and_set?

This is libgomp internal header and something that has to be as fast as
possible, it is inlined in many places inside of libgomp.
IMHO if we put a big comment in the generic header that it relies on 
release semantics (or full barrier) of the builtin and that targets which
don't offer this for __sync_lock_test_and_set should #include "ia64/mutex.h"
in their config/linux/<arch>/mutex.h, it is acceptable.
From my brief look at the targets that are currently supported by libgomp's
config/linux:

Doesn't care or only tiny bit worse code: x86
Worse code: powerpc, s390, sparc, alpha, mips   
Needs __sync_synchronize: ia64

I really think that using __sync_synchronize in config/linux/mutex.h and
have all targets but ia64 override this header is worse than having
just special config/linux/ia64/mutex.h.

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-06 23:48                     ` Jakub Jelinek
@ 2008-11-07  0:00                       ` Steve Ellcey
  2008-11-07  0:04                         ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-07  0:00 UTC (permalink / raw)
  To: jakub; +Cc: rth, gcc-patches, hjl.tools

On Fri, 2008-11-07 at 00:24 +0100, Jakub Jelinek wrote:

I really think that using __sync_synchronize in config/linux/mutex.h and
> have all targets but ia64 override this header is worse than having
> just special config/linux/ia64/mutex.h.
> 
> 	Jakub

Yes, but I felt I had to protest at least a little.  I will test this
libgomp patch overnight to verify that it works OK and has no
regressions.  The only change from this mutex.h and the standard one is
the call to __sync_synchronize in gomp_mutex_unlock (and the comments
for that routine).

Steve Ellcey
sje@cup.hp.com


2008-11-06  Steve Ellcey  <sje@cup.hp.com>

	PR libgomp/37938
	* config/linux/ia64/mutex.h: New.

Index: config/linux/ia64/mutex.h
===================================================================
--- config/linux/ia64/mutex.h	(revision 0)
+++ config/linux/ia64/mutex.h	(revision 0)
@@ -0,0 +1,70 @@
+/* Copyright (C) 2005 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published by
+   the Free Software Foundation; either version 2.1 of the License, or
+   (at your option) any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for
+   more details.
+
+   You should have received a copy of the GNU Lesser General Public License 
+   along with libgomp; see the file COPYING.LIB.  If not, write to the
+   Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you link this library with other files, some
+   of which are compiled with GCC, to produce an executable, this library
+   does not by itself cause the resulting executable to be covered by the
+   GNU General Public License.  This exception does not however invalidate
+   any other reasons why the executable file might be covered by the GNU
+   General Public License.  */
+
+/* This is a Linux specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and the futex syscall.  */
+
+#ifndef GOMP_MUTEX_H
+#define GOMP_MUTEX_H 1
+
+typedef int gomp_mutex_t;
+
+#define GOMP_MUTEX_INIT_0 1
+
+static inline void gomp_mutex_init (gomp_mutex_t *mutex)
+{
+  *mutex = 0;
+}
+
+extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
+static inline void gomp_mutex_lock (gomp_mutex_t *mutex)
+{
+  if (!__sync_bool_compare_and_swap (mutex, 0, 1))
+    gomp_mutex_lock_slow (mutex);
+}
+
+extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
+
+/* gomp_mutex_unlock needs release semantics and __sync_lock_test_and_set
+   only guarantees acquire semantics so we need a __sync_synchronize call
+   on IA64.  */
+
+static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
+{
+  int val;
+  __sync_synchronize ();
+  val = __sync_lock_test_and_set (mutex, 0);
+  if (__builtin_expect (val > 1, 0))
+    gomp_mutex_unlock_slow (mutex);
+}
+
+static inline void gomp_mutex_destroy (gomp_mutex_t *mutex)
+{
+}
+
+#endif /* GOMP_MUTEX_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07  0:00                       ` Steve Ellcey
@ 2008-11-07  0:04                         ` H.J. Lu
  2008-11-07  0:17                           ` H.J. Lu
  2008-11-07  0:20                           ` Jakub Jelinek
  0 siblings, 2 replies; 38+ messages in thread
From: H.J. Lu @ 2008-11-07  0:04 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: jakub, rth, gcc-patches

On Thu, Nov 6, 2008 at 3:47 PM, Steve Ellcey <sje@cup.hp.com> wrote:
> On Fri, 2008-11-07 at 00:24 +0100, Jakub Jelinek wrote:
>
> I really think that using __sync_synchronize in config/linux/mutex.h and
>> have all targets but ia64 override this header is worse than having
>> just special config/linux/ia64/mutex.h.
>>
>>       Jakub
>
> Yes, but I felt I had to protest at least a little.  I will test this
> libgomp patch overnight to verify that it works OK and has no
> regressions.  The only change from this mutex.h and the standard one is
> the call to __sync_synchronize in gomp_mutex_unlock (and the comments
> for that routine).
>

FWIW, ia64 has __sync_lock_release.


-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07  0:04                         ` H.J. Lu
@ 2008-11-07  0:17                           ` H.J. Lu
  2008-11-07  3:40                             ` Jakub Jelinek
  2008-11-07  0:20                           ` Jakub Jelinek
  1 sibling, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-07  0:17 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: jakub, rth, gcc-patches

On Thu, Nov 6, 2008 at 3:57 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Nov 6, 2008 at 3:47 PM, Steve Ellcey <sje@cup.hp.com> wrote:
>> On Fri, 2008-11-07 at 00:24 +0100, Jakub Jelinek wrote:
>>
>> I really think that using __sync_synchronize in config/linux/mutex.h and
>>> have all targets but ia64 override this header is worse than having
>>> just special config/linux/ia64/mutex.h.
>>>
>>>       Jakub
>>
>> Yes, but I felt I had to protest at least a little.  I will test this
>> libgomp patch overnight to verify that it works OK and has no
>> regressions.  The only change from this mutex.h and the standard one is
>> the call to __sync_synchronize in gomp_mutex_unlock (and the comments
>> for that routine).
>>
>
> FWIW, ia64 has __sync_lock_release.
>

Will this work

static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
{
  int val = *mutex;
  __sync_lock_release (mutx);
 if (__builtin_expect (val > 1, 0))
    gomp_mutex_unlock_slow (mutex);
}



-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07  0:04                         ` H.J. Lu
  2008-11-07  0:17                           ` H.J. Lu
@ 2008-11-07  0:20                           ` Jakub Jelinek
  1 sibling, 0 replies; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-07  0:20 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Steve Ellcey, rth, gcc-patches

On Thu, Nov 06, 2008 at 03:57:04PM -0800, H.J. Lu wrote:
> On Thu, Nov 6, 2008 at 3:47 PM, Steve Ellcey <sje@cup.hp.com> wrote:
> > On Fri, 2008-11-07 at 00:24 +0100, Jakub Jelinek wrote:
> >
> > I really think that using __sync_synchronize in config/linux/mutex.h and
> >> have all targets but ia64 override this header is worse than having
> >> just special config/linux/ia64/mutex.h.
> >>
> >>       Jakub
> >
> > Yes, but I felt I had to protest at least a little.  I will test this
> > libgomp patch overnight to verify that it works OK and has no
> > regressions.  The only change from this mutex.h and the standard one is
> > the call to __sync_synchronize in gomp_mutex_unlock (and the comments
> > for that routine).
> >
> 
> FWIW, ia64 has __sync_lock_release.

Sure, but we need to (atomically) read the previous value before storing 0
to find out if futex_wake is needed or not.

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07  0:17                           ` H.J. Lu
@ 2008-11-07  3:40                             ` Jakub Jelinek
  2008-11-07  6:30                               ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-07  3:40 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Steve Ellcey, rth, gcc-patches

On Thu, Nov 06, 2008 at 04:00:12PM -0800, H.J. Lu wrote:
> Will this work
> 
> static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
> {
>   int val = *mutex;
>   __sync_lock_release (mutx);
>  if (__builtin_expect (val > 1, 0))
>     gomp_mutex_unlock_slow (mutex);
> }

No, that's not atomic.  Consider if this task reads 1 from *mutex
and sleeps in between the read and __sync_lock_release.
Another task in the mean time calls gomp_mutex_lock, which atomically
replaces 1 with 2 in memory and then calls gomp_mutex_lock_slow
and futex_wait in there.  Then the first task wakes up, __sync_lock_release
releases the mutex, but as val is 1, doesn't futex_wake it.  The
second task might wait forever.

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07  3:40                             ` Jakub Jelinek
@ 2008-11-07  6:30                               ` H.J. Lu
  2008-11-07 18:47                                 ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-07  6:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Steve Ellcey, rth, gcc-patches

On Thu, Nov 6, 2008 at 4:15 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Nov 06, 2008 at 04:00:12PM -0800, H.J. Lu wrote:
>> Will this work
>>
>> static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
>> {
>>   int val = *mutex;
>>   __sync_lock_release (mutx);
>>  if (__builtin_expect (val > 1, 0))
>>     gomp_mutex_unlock_slow (mutex);
>> }
>
> No, that's not atomic.  Consider if this task reads 1 from *mutex
> and sleeps in between the read and __sync_lock_release.
> Another task in the mean time calls gomp_mutex_lock, which atomically
> replaces 1 with 2 in memory and then calls gomp_mutex_lock_slow
> and futex_wait in there.  Then the first task wakes up, __sync_lock_release
> releases the mutex, but as val is 1, doesn't futex_wake it.  The
> second task might wait forever.
>

static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
{
  int val = __sync_lock_test_and_set (mutex, 0);
  if (__builtin_expect (val > 1, 0))
    gomp_mutex_unlock_slow (mutex);
}

should work as long as mutex isn't updated with release or
unordered semantics prior to this function call. Who else
have updated mutux? What semantics are they using?

-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07  6:30                               ` H.J. Lu
@ 2008-11-07 18:47                                 ` Steve Ellcey
  2008-11-07 19:34                                   ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-07 18:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, rth, gcc-patches


On Thu, 2008-11-06 at 17:12 -0800, H.J. Lu wrote:

> static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
> {
>   int val = __sync_lock_test_and_set (mutex, 0);
>   if (__builtin_expect (val > 1, 0))
>     gomp_mutex_unlock_slow (mutex);
> }
> 
> should work as long as mutex isn't updated with release or
> unordered semantics prior to this function call. Who else
> have updated mutux? What semantics are they using?

Isn't the lock done with release semantics? gomp_mutex_lock
calls __sync_bool_compare_and_swap and sync_compare_and_swap<mode> in
sync.md generates a cmpxchg.rel instruction.

Steve Ellcey
sje@cup.hp.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07 18:47                                 ` Steve Ellcey
@ 2008-11-07 19:34                                   ` H.J. Lu
  2008-11-07 21:18                                     ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-07 19:34 UTC (permalink / raw)
  To: sje; +Cc: Jakub Jelinek, rth, gcc-patches

On Fri, Nov 7, 2008 at 10:35 AM, Steve Ellcey <sje@cup.hp.com> wrote:
>
> On Thu, 2008-11-06 at 17:12 -0800, H.J. Lu wrote:
>
>> static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
>> {
>>   int val = __sync_lock_test_and_set (mutex, 0);
>>   if (__builtin_expect (val > 1, 0))
>>     gomp_mutex_unlock_slow (mutex);
>> }
>>
>> should work as long as mutex isn't updated with release or
>> unordered semantics prior to this function call. Who else
>> have updated mutux? What semantics are they using?
>
> Isn't the lock done with release semantics? gomp_mutex_lock
> calls __sync_bool_compare_and_swap and sync_compare_and_swap<mode> in
> sync.md generates a cmpxchg.rel instruction.
>

According to ia64 psABI, __sync_bool_compare_and_swap and
sync_compare_and_swap<mode>
should have full barrier. That means they should use cmpxchg.acq,
which is icc 11.0 generates.

-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07 19:34                                   ` H.J. Lu
@ 2008-11-07 21:18                                     ` H.J. Lu
  2008-11-07 21:28                                       ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-07 21:18 UTC (permalink / raw)
  To: sje; +Cc: Jakub Jelinek, rth, gcc-patches

On Fri, Nov 7, 2008 at 10:57 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Nov 7, 2008 at 10:35 AM, Steve Ellcey <sje@cup.hp.com> wrote:
>>
>> On Thu, 2008-11-06 at 17:12 -0800, H.J. Lu wrote:
>>
>>> static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
>>> {
>>>   int val = __sync_lock_test_and_set (mutex, 0);
>>>   if (__builtin_expect (val > 1, 0))
>>>     gomp_mutex_unlock_slow (mutex);
>>> }
>>>
>>> should work as long as mutex isn't updated with release or
>>> unordered semantics prior to this function call. Who else
>>> have updated mutux? What semantics are they using?
>>
>> Isn't the lock done with release semantics? gomp_mutex_lock
>> calls __sync_bool_compare_and_swap and sync_compare_and_swap<mode> in
>> sync.md generates a cmpxchg.rel instruction.
>>
>
> According to ia64 psABI, __sync_bool_compare_and_swap and
> sync_compare_and_swap<mode>
> should have full barrier. That means they should use cmpxchg.acq,
> which is icc 11.0 generates.
>

I think cmpxchg.rel is typo:


(define_insn "cmpxchg_rel_<mode>"
  [(set (match_operand:DI 0 "gr_register_operand" "=r")
        (zero_extend:DI
          (match_operand:I124MODE 1 "not_postinc_memory_operand" "+S")))
   (set (match_dup 1)
        (unspec:I124MODE
          [(match_dup 1)
           (match_operand:DI 2 "ar_ccv_reg_operand" "")
           (match_operand:I124MODE 3 "gr_register_operand" "r")]
          UNSPEC_CMPXCHG_ACQ))]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ""
  "cmpxchg<modesuffix>.rel %0 = %1, %3, %2"
  [(set_attr "itanium_class" "sem")])

(define_insn "cmpxchg_rel_di"
  [(set (match_operand:DI 0 "gr_register_operand" "=r")
        (match_operand:DI 1 "not_postinc_memory_operand" "+S"))
   (set (match_dup 1)
        (unspec:DI [(match_dup 1)
                    (match_operand:DI 2 "ar_ccv_reg_operand" "")
                    (match_operand:DI 3 "gr_register_operand" "r")]
                   UNSPEC_CMPXCHG_ACQ))]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ""
  "cmpxchg8.rel %0 = %1, %3, %2"
  [(set_attr "itanium_class" "sem")])


-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07 21:18                                     ` H.J. Lu
@ 2008-11-07 21:28                                       ` H.J. Lu
  2008-11-10 17:32                                         ` Steve Ellcey
  2008-11-12 14:33                                         ` Jakub Jelinek
  0 siblings, 2 replies; 38+ messages in thread
From: H.J. Lu @ 2008-11-07 21:28 UTC (permalink / raw)
  To: sje; +Cc: Jakub Jelinek, rth, gcc-patches

On Fri, Nov 7, 2008 at 11:30 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Nov 7, 2008 at 10:57 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Nov 7, 2008 at 10:35 AM, Steve Ellcey <sje@cup.hp.com> wrote:
>>>
>>> On Thu, 2008-11-06 at 17:12 -0800, H.J. Lu wrote:
>>>
>>>> static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
>>>> {
>>>>   int val = __sync_lock_test_and_set (mutex, 0);
>>>>   if (__builtin_expect (val > 1, 0))
>>>>     gomp_mutex_unlock_slow (mutex);
>>>> }
>>>>
>>>> should work as long as mutex isn't updated with release or
>>>> unordered semantics prior to this function call. Who else
>>>> have updated mutux? What semantics are they using?
>>>
>>> Isn't the lock done with release semantics? gomp_mutex_lock
>>> calls __sync_bool_compare_and_swap and sync_compare_and_swap<mode> in
>>> sync.md generates a cmpxchg.rel instruction.
>>>
>>
>> According to ia64 psABI, __sync_bool_compare_and_swap and
>> sync_compare_and_swap<mode>
>> should have full barrier. That means they should use cmpxchg.acq,
>> which is icc 11.0 generates.
>>
>
> I think cmpxchg.rel is typo:
>
>
> (define_insn "cmpxchg_rel_<mode>"
>  [(set (match_operand:DI 0 "gr_register_operand" "=r")
>        (zero_extend:DI
>          (match_operand:I124MODE 1 "not_postinc_memory_operand" "+S")))
>   (set (match_dup 1)
>        (unspec:I124MODE
>          [(match_dup 1)
>           (match_operand:DI 2 "ar_ccv_reg_operand" "")
>           (match_operand:I124MODE 3 "gr_register_operand" "r")]
>          UNSPEC_CMPXCHG_ACQ))]
>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  ""
>  "cmpxchg<modesuffix>.rel %0 = %1, %3, %2"
>  [(set_attr "itanium_class" "sem")])
>
> (define_insn "cmpxchg_rel_di"
>  [(set (match_operand:DI 0 "gr_register_operand" "=r")
>        (match_operand:DI 1 "not_postinc_memory_operand" "+S"))
>   (set (match_dup 1)
>        (unspec:DI [(match_dup 1)
>                    (match_operand:DI 2 "ar_ccv_reg_operand" "")
>                    (match_operand:DI 3 "gr_register_operand" "r")]
>                   UNSPEC_CMPXCHG_ACQ))]
>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  ""
>  "cmpxchg8.rel %0 = %1, %3, %2"
>  [(set_attr "itanium_class" "sem")])
>

I am testing this patch. There is no need for memory barrier with acq semantics.


H.J.
---
--- ./sync.md.sync      2007-08-23 09:22:02.000000000 -0700
+++ ./sync.md   2008-11-07 11:29:22.000000000 -0800
@@ -125,15 +125,14 @@
   if (GET_MODE (dst) != DImode)
     dst = gen_reg_rtx (DImode);

-  emit_insn (gen_memory_barrier ());
-  emit_insn (gen_cmpxchg_rel_<mode> (dst, operands[1], ccv, operands[3]));
+  emit_insn (gen_cmpxchg_acq_<mode> (dst, operands[1], ccv, operands[3]));

   if (dst != operands[0])
     emit_move_insn (operands[0], gen_lowpart (<MODE>mode, dst));
   DONE;
 })

-(define_insn "cmpxchg_rel_<mode>"
+(define_insn "cmpxchg_acq_<mode>"
   [(set (match_operand:DI 0 "gr_register_operand" "=r")
 	(zero_extend:DI
 	  (match_operand:I124MODE 1 "not_postinc_memory_operand" "+S")))
@@ -144,10 +143,10 @@
 	   (match_operand:I124MODE 3 "gr_register_operand" "r")]
 	  UNSPEC_CMPXCHG_ACQ))]
   ""
-  "cmpxchg<modesuffix>.rel %0 = %1, %3, %2"
+  "cmpxchg<modesuffix>.acq %0 = %1, %3, %2"
   [(set_attr "itanium_class" "sem")])

-(define_insn "cmpxchg_rel_di"
+(define_insn "cmpxchg_acq_di"
   [(set (match_operand:DI 0 "gr_register_operand" "=r")
 	(match_operand:DI 1 "not_postinc_memory_operand" "+S"))
    (set (match_dup 1)
@@ -156,7 +155,7 @@
 		    (match_operand:DI 3 "gr_register_operand" "r")]
 		   UNSPEC_CMPXCHG_ACQ))]
   ""
-  "cmpxchg8.rel %0 = %1, %3, %2"
+  "cmpxchg8.acq %0 = %1, %3, %2"
   [(set_attr "itanium_class" "sem")])

 (define_insn "sync_lock_test_and_set<mode>"



-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07 21:28                                       ` H.J. Lu
@ 2008-11-10 17:32                                         ` Steve Ellcey
  2008-11-12 14:33                                         ` Jakub Jelinek
  1 sibling, 0 replies; 38+ messages in thread
From: Steve Ellcey @ 2008-11-10 17:32 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, rth, gcc-patches


On Fri, 2008-11-07 at 11:33 -0800, H.J. Lu wrote:

> I am testing this patch. There is no need for memory barrier with acq semantics.
> 
> 
> H.J.

H.J.  I tried your patch and still had problems.  I think the change to
use acquire semantics is right, but it doesn't fix the problem.  In
looking at it I wonder about the release side.  In
libgomp/config/linux.h we have:

extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
static inline void gomp_mutex_lock (gomp_mutex_t *mutex)
{
  if (!__sync_bool_compare_and_swap (mutex, 0, 1))
    gomp_mutex_lock_slow (mutex);
}

extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
{
  int val = __sync_lock_test_and_set (mutex, 0);
  if (__builtin_expect (val > 1, 0))
    gomp_mutex_unlock_slow (mutex);
}

Now your change will make sure that the __sync_bool_compare_and_swap
builtin is done with aquire semantics (which seems right) but the unlock
part is done with __sync_lock_test_and_set and this does an xchg on IA64
which, in your earlier mail, you said also had acquire semantics.  Since
we are unlocking here don't we need release semantics?

Steve Ellcey
sje@cup.hp.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-07 21:28                                       ` H.J. Lu
  2008-11-10 17:32                                         ` Steve Ellcey
@ 2008-11-12 14:33                                         ` Jakub Jelinek
  2008-11-12 15:48                                           ` H.J. Lu
  1 sibling, 1 reply; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-12 14:33 UTC (permalink / raw)
  To: H.J. Lu; +Cc: sje, rth, gcc-patches

On Fri, Nov 07, 2008 at 11:33:26AM -0800, H.J. Lu wrote:
> I am testing this patch. There is no need for memory barrier with acq semantics.

Isn't acquire semantics of cmpxchg.acq weaker than fence semantics that
__sync_* builtins other than __sync_lock_{test_and_set,release} are supposed
to have?  E.g. http://refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf
documents __sync_{bool,val}_compare_and_swap as Full barrier.  I think
in libgomp, glibc and many other places __sync_{bool,val}_compare_and_swap
is used in places where release semantics or fence semantics is needed.

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-12 14:33                                         ` Jakub Jelinek
@ 2008-11-12 15:48                                           ` H.J. Lu
  2008-11-12 15:52                                             ` H.J. Lu
  0 siblings, 1 reply; 38+ messages in thread
From: H.J. Lu @ 2008-11-12 15:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: sje, rth, gcc-patches

On Wed, Nov 12, 2008 at 6:15 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 07, 2008 at 11:33:26AM -0800, H.J. Lu wrote:
>> I am testing this patch. There is no need for memory barrier with acq semantics.
>
> Isn't acquire semantics of cmpxchg.acq weaker than fence semantics that
> __sync_* builtins other than __sync_lock_{test_and_set,release} are supposed
> to have?  E.g. http://refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf
> documents __sync_{bool,val}_compare_and_swap as Full barrier.  I think
> in libgomp, glibc and many other places __sync_{bool,val}_compare_and_swap
> is used in places where release semantics or fence semantics is needed.
>
>        Jakub
>

Here is the the ordering interactions between memory accesses with
different ordering
semantics. "O" indicates that the first and second reference are
performed in order with
respect to each other. A "-" indicates that no ordering is implied.

                                                    Second Reference
  First Reference
                                    Fence               Acquire
   Release          Unordered
 fence                               O                         O
           O                           O
acquire                            O                         O
         O                           O
release                            O                         –
          O                            –
unordered                       O                         –
       O                           –

We need to make sure that we never hit "-".

-- 
H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-12 15:48                                           ` H.J. Lu
@ 2008-11-12 15:52                                             ` H.J. Lu
  2008-11-12 16:35                                               ` Steve Ellcey
  2008-11-12 22:15                                               ` Jakub Jelinek
  0 siblings, 2 replies; 38+ messages in thread
From: H.J. Lu @ 2008-11-12 15:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: sje, rth, gcc-patches

On Wed, Nov 12, 2008 at 07:42:22AM -0800, H.J. Lu wrote:
> On Wed, Nov 12, 2008 at 6:15 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Nov 07, 2008 at 11:33:26AM -0800, H.J. Lu wrote:
> >> I am testing this patch. There is no need for memory barrier with acq semantics.
> >
> > Isn't acquire semantics of cmpxchg.acq weaker than fence semantics that
> > __sync_* builtins other than __sync_lock_{test_and_set,release} are supposed
> > to have?  E.g. http://refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf
> > documents __sync_{bool,val}_compare_and_swap as Full barrier.  I think
> > in libgomp, glibc and many other places __sync_{bool,val}_compare_and_swap
> > is used in places where release semantics or fence semantics is needed.
> >
> >        Jakub
> >
> 
> Here is the the ordering interactions between memory accesses with
> different ordering
> semantics. "O" indicates that the first and second reference are
> performed in order with
> respect to each other. A "-" indicates that no ordering is implied.
> 
>                                                     Second Reference
>   First Reference
>                                     Fence               Acquire
>    Release          Unordered
>  fence                               O                         O
>            O                           O
> acquire                            O                         O
>          O                           O
> release                            O                         –
>           O                            –
> unordered                       O                         –
>        O                           –
> 
> We need to make sure that we never hit "-".
> 

This one is more readable:

                   Seccond Reference
First Reference
             Fence   Acquire  Release  Unordered
fence          O         O        O         O
acquire        O         O        O         O
release        O         –        O         –
unordered      O         –        O         –


H.J.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-12 15:52                                             ` H.J. Lu
@ 2008-11-12 16:35                                               ` Steve Ellcey
  2008-11-12 23:10                                                 ` Jakub Jelinek
  2008-11-12 22:15                                               ` Jakub Jelinek
  1 sibling, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-12 16:35 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, rth, gcc-patches


On Wed, 2008-11-12 at 07:47 -0800, H.J. Lu wrote:

> This one is more readable:
> 
>                    Seccond Reference
> First Reference
>              Fence   Acquire  Release  Unordered
> fence          O         O        O         O
> acquire        O         O        O         O
> release        O         –        O         –
> unordered      O         –        O         –
> 
> 
> H.J.

The thing that seems a bit confusing to me is that we use
__sync_bool_compare_and_swap to lock but __sync_lock_test_and_set to
unlock in the gomp mutex lock/unlock routines.  If we wanted the most
efficient mutex lock/unlock it would seem that we would want to use
sync_lock_test_and_set for the mutex set (acquire semantics) and
sync_lock_release (release semantics) for the mutex release.  I think
the reason we don't do this is we want to get the old value of the mutex
when releasing it in order to see if it is 1 or 2 in order to see if we
should wake another process up with futex.  On IA64 there is no atomic
exchange instruction with release semantics.  There is compare and
exchange with acquire or release semantics and an exchange with acquire
semantics, but no exchange with release semantics.

Steve Ellcey
sje@cup.hp.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-12 15:52                                             ` H.J. Lu
  2008-11-12 16:35                                               ` Steve Ellcey
@ 2008-11-12 22:15                                               ` Jakub Jelinek
  1 sibling, 0 replies; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-12 22:15 UTC (permalink / raw)
  To: H.J. Lu; +Cc: sje, rth, gcc-patches

On Wed, Nov 12, 2008 at 07:47:40AM -0800, H.J. Lu wrote:
> This one is more readable:
> 
>                    Second Reference
> First Reference
>              Fence   Acquire  Release  Unordered
> fence          O         O        O         O
> acquire        O         O        O         O
> release        O         –        O         –
> unordered      O         –        O         –

So, if you take the mf out of __sync_{bool,val}_compare_and_swap
and use cmpxchg*.acq, if you do:
  mem1 = 6;
  __sync_bool_compare_and_swap (&mem2, 4, 5);
mem2 store might be visible before mem1 store, which is wrong,
as __sync_bool_compare_and_swap is documented to be a full barrier.

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-12 16:35                                               ` Steve Ellcey
@ 2008-11-12 23:10                                                 ` Jakub Jelinek
  2008-11-13 18:51                                                   ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-12 23:10 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: H.J. Lu, rth, gcc-patches

On Wed, Nov 12, 2008 at 08:00:30AM -0800, Steve Ellcey wrote:
> The thing that seems a bit confusing to me is that we use
> __sync_bool_compare_and_swap to lock but __sync_lock_test_and_set to
> unlock in the gomp mutex lock/unlock routines.  If we wanted the most
> efficient mutex lock/unlock it would seem that we would want to use
> sync_lock_test_and_set for the mutex set (acquire semantics) and
> sync_lock_release (release semantics) for the mutex release.  I think

Only if we needed just unlocked/locked state.  But we want 3 to avoid
unnecessary trips into the kernel.

> the reason we don't do this is we want to get the old value of the mutex
> when releasing it in order to see if it is 1 or 2 in order to see if we
> should wake another process up with futex.  On IA64 there is no atomic
> exchange instruction with release semantics.  There is compare and
> exchange with acquire or release semantics and an exchange with acquire
> semantics, but no exchange with release semantics.

I think just using
  __sync_synchronize ();
  int val = __sync_lock_test_and_set (mutex, 0);
  if (__builtin_expect (val > 1, 0))
    gomp_mutex_unlock_slow (mutex);
in ia64/mutex.h (gomp_mutex_unlock) will be the fastest, as cmpxchg*.rel
would need an extra memory load and a loop if it failed.

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-12 23:10                                                 ` Jakub Jelinek
@ 2008-11-13 18:51                                                   ` Steve Ellcey
  2008-11-13 21:20                                                     ` Jakub Jelinek
  0 siblings, 1 reply; 38+ messages in thread
From: Steve Ellcey @ 2008-11-13 18:51 UTC (permalink / raw)
  To: jakub; +Cc: hjl.tools, rth, gcc-patches

> I think just using
>   __sync_synchronize ();
>   int val = __sync_lock_test_and_set (mutex, 0);
>   if (__builtin_expect (val > 1, 0))
>     gomp_mutex_unlock_slow (mutex);
> in ia64/mutex.h (gomp_mutex_unlock) will be the fastest, as cmpxchg*.rel
> would need an extra memory load and a loop if it failed.
> 
>         Jakub

OK, here is the patch I tested last night.  It fixes the bug and caused
no regressions.  The only difference between this mutex.h and the regular
one is the call to __sync_synchronize in gomp_mutex_unlock.  Should I go
ahead and check it in?

Steve Ellcey
sje@cup.hp.com



2008-11-13  Steve Ellcey  <sje@cup.hp.com>

	* config/linux/ia64/mutex.h: New.


Index: config/linux/ia64/mutex.h
===================================================================
--- config/linux/ia64/mutex.h	(revision 0)
+++ config/linux/ia64/mutex.h	(revision 0)
@@ -0,0 +1,68 @@
+/* Copyright (C) 2005 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published by
+   the Free Software Foundation; either version 2.1 of the License, or
+   (at your option) any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for
+   more details.
+
+   You should have received a copy of the GNU Lesser General Public License 
+   along with libgomp; see the file COPYING.LIB.  If not, write to the
+   Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+   MA 02110-1301, USA.  */
+
+/* As a special exception, if you link this library with other files, some
+   of which are compiled with GCC, to produce an executable, this library
+   does not by itself cause the resulting executable to be covered by the
+   GNU General Public License.  This exception does not however invalidate
+   any other reasons why the executable file might be covered by the GNU
+   General Public License.  */
+
+/* This is a Linux specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and the futex syscall.  */
+
+#ifndef GOMP_MUTEX_H
+#define GOMP_MUTEX_H 1
+
+typedef int gomp_mutex_t;
+
+#define GOMP_MUTEX_INIT_0 1
+
+static inline void gomp_mutex_init (gomp_mutex_t *mutex)
+{
+  *mutex = 0;
+}
+
+extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex);
+static inline void gomp_mutex_lock (gomp_mutex_t *mutex)
+{
+  if (!__sync_bool_compare_and_swap (mutex, 0, 1))
+    gomp_mutex_lock_slow (mutex);
+}
+
+extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
+
+/* IA64 needs a __sync_synchronize call before  __sync_lock_test_and_set
+   because  __sync_lock_test_and_set is not a full memory fence.  */
+static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
+{
+  int val;
+  __sync_synchronize ();
+  val = __sync_lock_test_and_set (mutex, 0);
+  if (__builtin_expect (val > 1, 0))
+    gomp_mutex_unlock_slow (mutex);
+}
+
+static inline void gomp_mutex_destroy (gomp_mutex_t *mutex)
+{
+}
+
+#endif /* GOMP_MUTEX_H */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-13 18:51                                                   ` Steve Ellcey
@ 2008-11-13 21:20                                                     ` Jakub Jelinek
  2008-11-13 21:39                                                       ` Steve Ellcey
  0 siblings, 1 reply; 38+ messages in thread
From: Jakub Jelinek @ 2008-11-13 21:20 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: hjl.tools, rth, gcc-patches

On Thu, Nov 13, 2008 at 09:54:19AM -0800, Steve Ellcey wrote:
> OK, here is the patch I tested last night.  It fixes the bug and caused
> no regressions.  The only difference between this mutex.h and the regular
> one is the call to __sync_synchronize in gomp_mutex_unlock.  Should I go
> ahead and check it in?

Yes please.  Just minor comments inline.

> 2008-11-13  Steve Ellcey  <sje@cup.hp.com>
> 

Add here:
	PR libgomp/37938
> 	* config/linux/ia64/mutex.h: New.

> --- config/linux/ia64/mutex.h	(revision 0)
> +++ config/linux/ia64/mutex.h	(revision 0)
> @@ -0,0 +1,68 @@
> +/* Copyright (C) 2005 Free Software Foundation, Inc.

And here `, 2008'.

> +/* IA64 needs a __sync_synchronize call before  __sync_lock_test_and_set
> +   because  __sync_lock_test_and_set is not a full memory fence.  */

Why 2 spaces between __sync?  One should be enough.  Also, shouldn't
there be a comma before because?

	Jakub

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Patch for PR libgomp/37938, IA64 specific.
  2008-11-13 21:20                                                     ` Jakub Jelinek
@ 2008-11-13 21:39                                                       ` Steve Ellcey
  0 siblings, 0 replies; 38+ messages in thread
From: Steve Ellcey @ 2008-11-13 21:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: hjl.tools, rth, gcc-patches


On Thu, 2008-11-13 at 21:45 +0100, Jakub Jelinek wrote:

> Yes please.  Just minor comments inline.

I add the PR number and the 2008 copyright year, I also got rid
of the extra spaces in the comment but I didn't add the comma,
I don't think it is needed.  It's checked in now.

Steve Ellcey
sje@cup.hp.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2008-11-13 21:10 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-04 16:51 Patch for PR libgomp/37938, IA64 specific Steve Ellcey
2008-11-04 17:22 ` Jakub Jelinek
2008-11-06 17:14   ` Steve Ellcey
2008-11-06 17:42     ` H.J. Lu
2008-11-06 18:02       ` H.J. Lu
2008-11-06 19:07         ` Steve Ellcey
2008-11-06 19:52           ` H.J. Lu
2008-11-06 20:34             ` H.J. Lu
2008-11-06 21:05               ` Steve Ellcey
2008-11-06 21:10                 ` H.J. Lu
2008-11-06 21:35           ` Jakub Jelinek
2008-11-06 21:38             ` H.J. Lu
2008-11-06 21:59               ` Jakub Jelinek
2008-11-06 22:15                 ` H.J. Lu
2008-11-06 22:53                   ` Steve Ellcey
2008-11-06 23:21                   ` Steve Ellcey
2008-11-06 23:48                     ` Jakub Jelinek
2008-11-07  0:00                       ` Steve Ellcey
2008-11-07  0:04                         ` H.J. Lu
2008-11-07  0:17                           ` H.J. Lu
2008-11-07  3:40                             ` Jakub Jelinek
2008-11-07  6:30                               ` H.J. Lu
2008-11-07 18:47                                 ` Steve Ellcey
2008-11-07 19:34                                   ` H.J. Lu
2008-11-07 21:18                                     ` H.J. Lu
2008-11-07 21:28                                       ` H.J. Lu
2008-11-10 17:32                                         ` Steve Ellcey
2008-11-12 14:33                                         ` Jakub Jelinek
2008-11-12 15:48                                           ` H.J. Lu
2008-11-12 15:52                                             ` H.J. Lu
2008-11-12 16:35                                               ` Steve Ellcey
2008-11-12 23:10                                                 ` Jakub Jelinek
2008-11-13 18:51                                                   ` Steve Ellcey
2008-11-13 21:20                                                     ` Jakub Jelinek
2008-11-13 21:39                                                       ` Steve Ellcey
2008-11-12 22:15                                               ` Jakub Jelinek
2008-11-07  0:20                           ` Jakub Jelinek
2008-11-06 17:30   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).