public inbox for gcc-prs@sourceware.org
* libstdc++/3584: arm-specific atomic operations not atomic
@ 2001-07-06 2:56 robin.farine
0 siblings, 0 replies; 8+ messages in thread
From: robin.farine @ 2001-07-06 2:56 UTC (permalink / raw)
To: gcc-gnats; +Cc: acnrf
>Number: 3584
>Category: libstdc++
>Synopsis: arm-specific atomic operations not atomic
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jul 06 02:56:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Robin Farine
>Release: gcc version 3.0
>Organization:
>Environment:
build/host: Debian GNU/Linux
target: arm-unknown-elf
>Description:
Routines such as atomic_add() that read a memory location,
apply an operation to the value read, use the swp
instruction to update the memory location, and swp again if
the value read the second time does not equal the initial
value, introduce a race condition. The attachment
illustrates such a case.
>How-To-Repeat:
>Fix:
For single CPU systems, disable thread scheduling during the
operation.
>Release-Note:
>Audit-Trail:
>Unformatted:
----gnatsweb-attachment----
Content-Type: text/plain; name="arm-atomic-add.txt"
Content-Disposition: inline; filename="arm-atomic-add.txt"
The atomic_add() for ARM (with instruction numbering):
static inline void
__attribute__ ((__unused__))
__atomic_add (volatile _Atomic_word *__mem, int __val)
{
  _Atomic_word __tmp, __tmp2, __tmp3;
  __asm__ __volatile__ (
    "\n"
    "0:\t"
1   "ldr   %0, [%3]     \n\t"
2   "add   %1, %0, %4   \n\t"
3   "swp   %2, %1, [%3] \n\t"
4   "cmp   %0, %2       \n\t"
5   "swpne %1, %2, [%3] \n\t"
6   "bne   0b           \n\t"
7   ""
    : "=&r"(__tmp), "=&r"(__tmp2), "=&r"(__tmp3)
    : "r" (__mem), "r"(__val)
    : "cc", "memory");
}
Example of a sequence that produces an invalid result (*__mem == 2 instead of 4 at
the end). A line in the T1 or T2 column shows the register values as that thread
is about to execute the numbered instruction:

T1 (%0 %1 %2)            T2 (%0 %1 %2)            *__mem
1 (x x x) __val=1                                   0
2 (0 x x)
3 (0 1 x) <-- reschedule
                         1 (x x x) __val=1
                         2 (0 x x)
                         3 (0 1 x)
                         4 (0 1 0)                  1
                         5 (0 1 0)
                         6 (0 1 0)
                         7 (0 1 0)
                         ... <-- reschedule
4 (0 1 1)                                           1
5 (0 1 1) <-- reschedule
                         1 (x x x) __val=2
                         2 (1 x x)
                         3 (1 3 x)
                         4 (1 3 1)                  3
                         5 (1 3 1)
                         6 (1 3 1)
                         7 (1 3 1)
                         .... <-- reschedule
6 (0 3 1)                                           1
1 (0 3 1)
2 (1 3 1)
3 (1 2 1)
4 (1 2 1)                                           2
5 (1 2 1)
6 (1 2 1)
7 (1 2 1)
....
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: libstdc++/3584: arm-specific atomic operations not atomic
@ 2002-10-03 9:17 rearnsha
0 siblings, 0 replies; 8+ messages in thread
From: rearnsha @ 2002-10-03 9:17 UTC (permalink / raw)
To: acnrf, gcc-bugs, gcc-prs, nobody, rearnsha, robin.farine
Synopsis: arm-specific atomic operations not atomic
State-Changed-From-To: feedback->closed
State-Changed-By: rearnsha
State-Changed-When: Thu Oct 3 09:17:26 2002
State-Changed-Why:
I'm closing this because there doesn't seem to be much more that can be
done other than reverting to the generic (no thread safety) model, which
is what I've just done.
See http://gcc.gnu.org/ml/gcc-patches/2002-10/msg00157.html
for further details on the problem with the old code.
If thread support is added to the generic model, then we can make use of that on ARM too.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=3584
* Re: libstdc++/3584: arm-specific atomic operations not atomic
@ 2002-05-31 18:09 pme
0 siblings, 0 replies; 8+ messages in thread
From: pme @ 2002-05-31 18:09 UTC (permalink / raw)
To: acnrf, gcc-bugs, gcc-prs, nobody, rearnsha, robin.farine
Synopsis: arm-specific atomic operations not atomic
State-Changed-From-To: open->feedback
State-Changed-By: pme
State-Changed-When: Fri May 31 18:05:20 2002
State-Changed-Why:
What's the status of this PR? How are things for 3.1
and 3.2?
> I'm not up on what threading models are usable with
> libstdc++-v3, so I'll have to leave that side of
> things to the c++ library maintainers.
We simply grab whatever thread model has been configured
for GCC, as represented by the various gcc/gthr-*.h files.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=3584
* Re: libstdc++/3584: arm-specific atomic operations not atomic
@ 2001-07-12 7:26 Richard Earnshaw
0 siblings, 0 replies; 8+ messages in thread
From: Richard Earnshaw @ 2001-07-12 7:26 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR libstdc++/3584; it has been noted by GNATS.
From: Richard Earnshaw <rearnsha@arm.com>
To: Robin Farine <acnrf@dial.eunet.ch>
Cc: Richard.Earnshaw@arm.com, robin.farine@terminus.org, gcc-gnats@gcc.gnu.org
Subject: Re: libstdc++/3584: arm-specific atomic operations not atomic
Date: Thu, 12 Jul 2001 15:23:57 +0100
acnrf@dial.eunet.ch said:
> Besides the fact that the label "0:" should *precede* the "mov r2, 1"
> instruction, this version too would allow starvation of a low-priority
> thread. I suppose that a multi-threads and multi-processors safe
> implementation of atomic_add() would look more like this:
I think we should forget multi-processor versions for the moment. SWP
semantics don't work well with these, since SWP memory locations need to
be placed in uncached/unbuffered pages to have well-defined
multi-processor semantics.
I'm not up on what threading models are usable with libstdc++-v3, so I'll
have to leave that side of things to the c++ library maintainers.
* Re: libstdc++/3584: arm-specific atomic operations not atomic
@ 2001-07-12 7:16 Robin Farine
0 siblings, 0 replies; 8+ messages in thread
From: Robin Farine @ 2001-07-12 7:16 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR libstdc++/3584; it has been noted by GNATS.
From: Robin Farine <acnrf@dial.eunet.ch>
To: Richard.Earnshaw@arm.com
Cc: robin.farine@terminus.org, gcc-gnats@gcc.gnu.org
Subject: Re: libstdc++/3584: arm-specific atomic operations not atomic
Date: 12 Jul 2001 16:08:17 +0200
Robin Farine <rfarine@halftrack.hq.acn-group.ch> writes:
[...]
>
> @ r0 = address (struct {int lock, int value}), r1 = increment.
> ... @ prolog
> mov r2, 1
> 0:
> swp r3, r2, [r0]
> teq r3, #0
> beq 1f @ got the lock
> mov lr, pc
> ldr pc, .thread_yield
> b 0b
> .thread_yield: .word thread_yield
>
> 1:
> @ We now have the lock
> ldr r2, [r0, #4]
> add r2, r2, r1
> str r2, [r0, #4]
> str r3, [r0] @ Release the lock.
Besides the fact that the label "0:" should *precede* the "mov r2, 1"
instruction, this version too would allow starvation of a low-priority thread.
I suppose that a thread- and multiprocessor-safe implementation of
atomic_add() would look more like this:
void
atomic_add(volatile _Atomic_word *__mem, int __val)
{
  int __lock;
  int __tmp = 1;

  thread_suspend_scheduler();
  __asm__ __volatile__ (
    "\n"
    /* spin while another CPU holds the lock */
    "0:\t"
    "swp %0, %1, [%2]   \n\t"
    "teq %0, #0         \n\t"
    "bne 0b             \n\t"
    /* we now have the lock */
    "ldr %1, [%2, #4]   \n\t"
    "add %1, %1, %3     \n\t"
    "str %1, [%2, #4]   \n\t"
    /* release the lock */
    "str %0, [%2]       \n\t"
    ""
    : "=&r"(__lock), "+&r"(__tmp)  /* __tmp is read (the lock token 1) and written */
    : "r" (__mem), "r"(__val)
    : "cc", "memory");
  thread_resume_scheduler();
}
This routine relies on the structured version of _Atomic_word and on nestable
primitives to suspend/resume thread scheduling, which the POSIX threads API
doesn't provide, if I remember correctly ...
* Re: libstdc++/3584: arm-specific atomic operations not atomic
@ 2001-07-11 11:06 Richard Earnshaw
0 siblings, 0 replies; 8+ messages in thread
From: Richard Earnshaw @ 2001-07-11 11:06 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR libstdc++/3584; it has been noted by GNATS.
From: Richard Earnshaw <rearnsha@arm.com>
To: Robin Farine <acnrf@dial.eunet.ch>
Cc: Richard.Earnshaw@arm.com, robin.farine@terminus.org, gcc-gnats@gcc.gnu.org
Subject: Re: libstdc++/3584: arm-specific atomic operations not atomic
Date: Wed, 11 Jul 2001 19:04:22 +0100
> This certainly works correctly, but it would introduce another problem.
> If another thread already holds the lock, the current thread would just
> spin hopelessly until the thread that holds the lock gets the CPU and
> releases it. This could result in a deadlock if the spinning thread has
> a higher priority than the lock owner.
>
> With the swap instruction (in contrast to test-and-set ;-)), I think
> that implementations of such routines have to depend on the threading
> model to explicitly release the CPU when they fail to get the lock.
> Something like:
Correct. I was forgetting that the load-locked/store-conditional
operations wouldn't suffer from this problem. That makes the sequence
particularly nasty, especially when called from thumb code... :-(
R.
* Re: libstdc++/3584: arm-specific atomic operations not atomic
@ 2001-07-11 10:36 Robin Farine
0 siblings, 0 replies; 8+ messages in thread
From: Robin Farine @ 2001-07-11 10:36 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR libstdc++/3584; it has been noted by GNATS.
From: Robin Farine <acnrf@dial.eunet.ch>
To: Richard.Earnshaw@arm.com
Cc: robin.farine@terminus.org, gcc-gnats@gcc.gnu.org
Subject: Re: libstdc++/3584: arm-specific atomic operations not atomic
Date: 11 Jul 2001 19:34:59 +0200
Richard Earnshaw <rearnsha@arm.com> writes:
[...]
> The code for this variant would be:
> @ r0 = address (struct {int lock, int value}), r1 = increment.
> mov r2, #INT_MIN
> 0:
> swp r3, r2, [r0]
> cmp r3, #INT_MIN
> beq 0b @ already locked
> @ We now have the lock
> ldr r2, [r0, #4]
> add r2, r2, r1
> str r2, [r0, #4]
> str r3, [r0] @ Release the lock.
> mov r2
This certainly works correctly, but it would introduce another problem. If
another thread already holds the lock, the current thread would just spin
hopelessly until the thread that holds the lock gets the CPU and releases
it. This could result in a deadlock if the spinning thread has a higher
priority than the lock owner.
With the swap instruction (in contrast to test-and-set ;-)), I think that
implementations of such routines have to depend on the threading model to
explicitly release the CPU when they fail to get the lock. Something like:
@ r0 = address (struct {int lock, int value}), r1 = increment.
... @ prolog
mov r2, 1
0:
swp r3, r2, [r0]
teq r3, #0
beq 1f @ got the lock
mov lr, pc
ldr pc, .thread_yield
b 0b
.thread_yield: .word thread_yield
1:
@ We now have the lock
ldr r2, [r0, #4]
add r2, r2, r1
str r2, [r0, #4]
str r3, [r0] @ Release the lock.
mov r2
* Re: libstdc++/3584: arm-specific atomic operations not atomic
@ 2001-07-11 8:46 Richard Earnshaw
0 siblings, 0 replies; 8+ messages in thread
From: Richard Earnshaw @ 2001-07-11 8:46 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR libstdc++/3584; it has been noted by GNATS.
From: Richard Earnshaw <rearnsha@arm.com>
To: robin.farine@terminus.org
Cc: gcc-gnats@gcc.gnu.org, acnrf@dial.eunet.ch, Richard.Earnshaw@arm.com
Subject: Re: libstdc++/3584: arm-specific atomic operations not atomic
Date: Wed, 11 Jul 2001 16:38:26 +0100
robin.farine@terminus.org said:
> Routines such as atomic_add() that read a memory location, apply an
> operation to the value read, use the swp instruction to update the
> memory location, and swp again if the value read the second time does
> not equal the initial value, introduce a race condition.
I agree. I can think of two possible solutions; which is best depends on
the potential uses of these operations.
The first one reserves a value (say INT_MIN, chosen because it is a simple
immediate constant on the ARM, and a relatively unlikely value in normal
signed-arithmetic operations); if the atomic memory location ever contains
this value, then another atomic operation is in progress on the memory
location, and the requester must spin until that value is not already
stored there.
The code for atomic add would then be:
@ r0 = address, r1 = increment.
    mov   r2, #INT_MIN
0:
    swp   r3, r2, [r0]
    cmp   r3, #INT_MIN
    beq   0b                @ already locked.
    @ We now have the lock
    add   r3, r3, r1
    cmp   r3, #INT_MIN      @ Paranoia -- INT_MIN is reserved.
    addeq r3, r3, #1
    str   r3, [r0]          @ save, and release the lock.
Similar sequences could be written for the other operations.
The second avoids reserving a value, but requires a larger object for the
"lock". It would also require special code to initialize and read any
value from an _Atomic_word object. This would break the current
include/bits/basic_string.h implementation, which contains
  _Atomic_word _M_references;

  bool
  _M_is_leaked() const
  { return _M_references < 0; }

  bool
  _M_is_shared() const
  { return _M_references > 0; }
  ...
I guess we could fix this by making _Atomic_word a class which defined
these operations.
The code for this variant would be:
@ r0 = address (struct {int lock, int value}), r1 = increment.
    mov   r2, #INT_MIN
0:
    swp   r3, r2, [r0]
    cmp   r3, #INT_MIN
    beq   0b                @ already locked
    @ We now have the lock
    ldr   r2, [r0, #4]
    add   r2, r2, r1
    str   r2, [r0, #4]
    str   r3, [r0]          @ Release the lock.
    mov r2
end of thread, other threads:[~2002-10-03 16:17 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-06 2:56 libstdc++/3584: arm-specific atomic operations not atomic robin.farine
2001-07-11 8:46 Richard Earnshaw
2001-07-11 10:36 Robin Farine
2001-07-11 11:06 Richard Earnshaw
2001-07-12 7:16 Robin Farine
2001-07-12 7:26 Richard Earnshaw
2002-05-31 18:09 pme
2002-10-03 9:17 rearnsha