public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Remove sparcv8 support
@ 2016-10-20 19:47 Adhemerval Zanella
  2016-10-20 20:56 ` David Miller
  2016-10-21  9:02 ` Andreas Larsson
  0 siblings, 2 replies; 32+ messages in thread
From: Adhemerval Zanella @ 2016-10-20 19:47 UTC (permalink / raw)
  To: GNU C Library

Hi all,

The sparcv8 build is broken since GLIBC 2.23 due the new pthread 
barrier implementation [1] and since then there is no thread or 
interest on fixing it (Torvald has suggested some options on 
2.23 release thread).  It won't help with both new pthread rdlock 
and cond implementation, although I would expect that it relies 
on same atomic primitive that was not present for pthread barrier.

AFAIK, recent commercial sparc chips from Oracle all supports
sparcv9.  The only somewhat recent sparc chip with just sparcv8
support is LEON4, which I really doubt it cares for glibc support.

So I propose to set sparcv9 as the minimum supported sparc32
architecture and remove all the old sparcv8 code on glibc.

Any thoughs?

[1] commit b02840bacdefde318d2ad2f920e50785b9b25d69

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-20 19:47 Remove sparcv8 support Adhemerval Zanella
@ 2016-10-20 20:56 ` David Miller
  2016-10-21  9:02 ` Andreas Larsson
  1 sibling, 0 replies; 32+ messages in thread
From: David Miller @ 2016-10-20 20:56 UTC (permalink / raw)
  To: adhemerval.zanella; +Cc: libc-alpha

From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Thu, 20 Oct 2016 17:47:30 -0200

> The sparcv8 build is broken since GLIBC 2.23 due the new pthread 
> barrier implementation [1] and since then there is no thread or 
> interest on fixing it (Torvald has suggested some options on 
> 2.23 release thread).

It's not lack of interest, it's lack of time.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-20 19:47 Remove sparcv8 support Adhemerval Zanella
  2016-10-20 20:56 ` David Miller
@ 2016-10-21  9:02 ` Andreas Larsson
  2016-10-21 13:13   ` Adhemerval Zanella
  2016-10-24 17:25   ` Torvald Riegel
  1 sibling, 2 replies; 32+ messages in thread
From: Andreas Larsson @ 2016-10-21  9:02 UTC (permalink / raw)
  To: Adhemerval Zanella, GNU C Library; +Cc: David Miller

On 2016-10-20 21:47, Adhemerval Zanella wrote:
> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
> barrier implementation [1] and since then there is no thread or
> interest on fixing it (Torvald has suggested some options on
> 2.23 release thread).  It won't help with both new pthread rdlock
> and cond implementation, although I would expect that it relies
> on same atomic primitive that was not present for pthread barrier.
>
> AFAIK, recent commercial sparc chips from Oracle all supports
> sparcv9.  The only somewhat recent sparc chip with just sparcv8
> support is LEON4, which I really doubt it cares for glibc support.

Hi!

We do care about GLIBC support for many different LEON3 and LEON4 
systems. GLIBC support for sparcv8 is important for us and it is 
important for our customers. Both LEON3 and LEON4 are continuously used 
in new hardware designs.

We are not always using the latest version of GLIBC (the latest step we 
took was to GLIBC 2.20), so unfortunately we missed this issue. We will 
look into what the extent of the missing support is. Any pointers are 
most welcome.

Do you have a link to the suggested options on the 2.23 release thread? 
I dug around a bit in the archives, but did not find it.

(As a side note, most of the recent LEON3 and LEON4 chips have CAS 
instruction support, but pure sparcv8 support is of course the baseline.)

Best regards,

Andreas Larsson
Software Engineer
Cobham Gaisler

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-21  9:02 ` Andreas Larsson
@ 2016-10-21 13:13   ` Adhemerval Zanella
  2016-10-21 15:03     ` David Miller
  2016-10-24 17:25   ` Torvald Riegel
  1 sibling, 1 reply; 32+ messages in thread
From: Adhemerval Zanella @ 2016-10-21 13:13 UTC (permalink / raw)
  To: Andreas Larsson, GNU C Library; +Cc: David Miller



On 21/10/2016 06:59, Andreas Larsson wrote:
> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>> barrier implementation [1] and since then there is no thread or
>> interest on fixing it (Torvald has suggested some options on
>> 2.23 release thread).  It won't help with both new pthread rdlock
>> and cond implementation, although I would expect that it relies
>> on same atomic primitive that was not present for pthread barrier.
>>
>> AFAIK, recent commercial sparc chips from Oracle all supports
>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>> support is LEON4, which I really doubt it cares for glibc support.
> 
> Hi!
> 
> We do care about GLIBC support for many different LEON3 and LEON4 systems. GLIBC support for sparcv8 is important for us and it is important for our customers. Both LEON3 and LEON4 are continuously used in new hardware designs.
> 
> We are not always using the latest version of GLIBC (the latest step we took was to GLIBC 2.20), so unfortunately we missed this issue. We will look into what the extent of the missing support is. Any pointers are most welcome.
> 
> Do you have a link to the suggested options on the 2.23 release thread? I dug around a bit in the archives, but did not find it.
> 
> (As a side note, most of the recent LEON3 and LEON4 chips have CAS instruction support, but pure sparcv8 support is of course the baseline.)

I am glad I could get some attention.  At least for glibc, building
with '-mcpu=leon3' will enable both sparcv9 and fpu implied folders,
so I think we do not have problem here (unless glibc preconfigure and
config.guess is wrongly assuming leon3 as sparcv9 compatible for
glibc implementations).

Now the current problem for pre sparc-v9 is on 'New pthread_barrier 
algorithm to fulfill barrier destruction requirements.' (commit id 
b02840bacdefde318d2ad2f920e50785b9b25d69) Torvald's added a default
sparc32 pthread_barrier_wait.c file to just throw an build error.

This is because new algorithm uses atomic_compare_exchange_weak_release and
for pre-v9 sparc32 it will have to use a lock embedded into the barrier
(as for previous implementation). Ideally we would like to do in a way that
can be embedded into the generic code so that you don't have to maintain 
sparc-specific files.

I am not sure which will pre-v9 sparc specific constraints, but my understanding
is you will need to extend the 'struct pthread_barrier' so 'current_round'
will be at least 64-bits so you can use '__v7_compare_and_exchange_val_acq'
(I am also not sure about alignment requirement).

Which get back to initial proposal: do we really care to continue support
sparc variants without proper CAS implementation? For each new NPTL or other
implementation we will need to take pre-sparcv8 in consideration and
I feel that current sem_xxx implementation is still lacking the latest 
fixes.

If you still care for such support, we need to set a track date for
completion.  It has been 2 releases that we ship broken pre-sparcv8 support
and see no point on continue to do so.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-21 13:13   ` Adhemerval Zanella
@ 2016-10-21 15:03     ` David Miller
  2016-10-24 17:14       ` Torvald Riegel
  0 siblings, 1 reply; 32+ messages in thread
From: David Miller @ 2016-10-21 15:03 UTC (permalink / raw)
  To: adhemerval.zanella; +Cc: andreas, libc-alpha

From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Fri, 21 Oct 2016 11:13:09 -0200

> This is because new algorithm uses atomic_compare_exchange_weak_release and
> for pre-v9 sparc32 it will have to use a lock embedded into the barrier
> (as for previous implementation). Ideally we would like to do in a way that
> can be embedded into the generic code so that you don't have to maintain 
> sparc-specific files.

I would not put that into a generic implementation, such a scheme
deadlocks in the presence of signals.

And yes the existing sparc v7 locking deadlocks this way as well and
has done so since day one, many of the threaded test cases in glibc
timeout and fail for this very reason.

The only long term solution is to do what ARM and others have done
which is provide an atomic primitive in the Linux kernel which
executes via a system call or similar which will eliminate the signal
based deadlocks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-21 15:03     ` David Miller
@ 2016-10-24 17:14       ` Torvald Riegel
  0 siblings, 0 replies; 32+ messages in thread
From: Torvald Riegel @ 2016-10-24 17:14 UTC (permalink / raw)
  To: David Miller; +Cc: adhemerval.zanella, andreas, libc-alpha

On Fri, 2016-10-21 at 11:02 -0400, David Miller wrote:
> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Date: Fri, 21 Oct 2016 11:13:09 -0200
> 
> > This is because new algorithm uses atomic_compare_exchange_weak_release and
> > for pre-v9 sparc32 it will have to use a lock embedded into the barrier
> > (as for previous implementation). Ideally we would like to do in a way that
> > can be embedded into the generic code so that you don't have to maintain 
> > sparc-specific files.
> 
> I would not put that into a generic implementation, such a scheme
> deadlocks in the presence of signals.
> 
> And yes the existing sparc v7 locking deadlocks this way as well and
> has done so since day one, many of the threaded test cases in glibc
> timeout and fail for this very reason.
> 
> The only long term solution is to do what ARM and others have done
> which is provide an atomic primitive in the Linux kernel which
> executes via a system call or similar which will eliminate the signal
> based deadlocks.

I think this is reasonable.  Should we stop supporting pre-v9 sparc32
until such a kernel-side solution is in place (and we can test whether
it is present)?

If the specific arch one is building for has CAS, we could also continue
to support it with that as a prerequisite.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-21  9:02 ` Andreas Larsson
  2016-10-21 13:13   ` Adhemerval Zanella
@ 2016-10-24 17:25   ` Torvald Riegel
  2016-10-24 17:43     ` Adhemerval Zanella
  2016-10-25 14:34     ` Andreas Larsson
  1 sibling, 2 replies; 32+ messages in thread
From: Torvald Riegel @ 2016-10-24 17:25 UTC (permalink / raw)
  To: Andreas Larsson; +Cc: Adhemerval Zanella, GNU C Library, David Miller

On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
> On 2016-10-20 21:47, Adhemerval Zanella wrote:
> > The sparcv8 build is broken since GLIBC 2.23 due the new pthread
> > barrier implementation [1] and since then there is no thread or
> > interest on fixing it (Torvald has suggested some options on
> > 2.23 release thread).  It won't help with both new pthread rdlock
> > and cond implementation, although I would expect that it relies
> > on same atomic primitive that was not present for pthread barrier.
> >
> > AFAIK, recent commercial sparc chips from Oracle all supports
> > sparcv9.  The only somewhat recent sparc chip with just sparcv8
> > support is LEON4, which I really doubt it cares for glibc support.
> 
> Hi!
> 
> We do care about GLIBC support for many different LEON3 and LEON4 
> systems. GLIBC support for sparcv8 is important for us and it is 
> important for our customers. Both LEON3 and LEON4 are continuously used 
> in new hardware designs.

If you do care about it, it would be nice if you could (help) maintain
sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
very least early during the freeze of each release).  This ensures that
you won't get surprises such as this one, when nobody else is spending
resources on it.

> We are not always using the latest version of GLIBC (the latest step we 
> took was to GLIBC 2.20), so unfortunately we missed this issue. We will 
> look into what the extent of the missing support is. Any pointers are 
> most welcome.
> 
> Do you have a link to the suggested options on the 2.23 release thread? 
> I dug around a bit in the archives, but did not find it.
> 
> (As a side note, most of the recent LEON3 and LEON4 chips have CAS 
> instruction support, but pure sparcv8 support is of course the baseline.)

Yes, the lack of CAS is the major problem I am aware of.  If the chips
you mention do support CAS, then a patch that adds support for the
CAS-based atomic operations in glibc would fix the barrier problem
(because the generic barrier should just work).  The patch would also
have to add configure bits or whatever would be appropriate so that
glibc can figure out whether it is supposed to be run on a sparcv8 with
or without CAS.

What about stopping support for plain sparcv8, and keeping to support
sparcv8+CAS provided that we have a (group of) maintainer(s) for the
latter that can tend to the minimal responsibilities of an arch
maintainer and has the time to do that?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-24 17:25   ` Torvald Riegel
@ 2016-10-24 17:43     ` Adhemerval Zanella
  2016-10-25 14:34       ` Andreas Larsson
  2016-10-25 14:34     ` Andreas Larsson
  1 sibling, 1 reply; 32+ messages in thread
From: Adhemerval Zanella @ 2016-10-24 17:43 UTC (permalink / raw)
  To: Torvald Riegel, Andreas Larsson; +Cc: GNU C Library, David Miller



On 24/10/2016 15:25, Torvald Riegel wrote:
> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>>> barrier implementation [1] and since then there is no thread or
>>> interest on fixing it (Torvald has suggested some options on
>>> 2.23 release thread).  It won't help with both new pthread rdlock
>>> and cond implementation, although I would expect that it relies
>>> on same atomic primitive that was not present for pthread barrier.
>>>
>>> AFAIK, recent commercial sparc chips from Oracle all supports
>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>>> support is LEON4, which I really doubt it cares for glibc support.
>>
>> Hi!
>>
>> We do care about GLIBC support for many different LEON3 and LEON4 
>> systems. GLIBC support for sparcv8 is important for us and it is 
>> important for our customers. Both LEON3 and LEON4 are continuously used 
>> in new hardware designs.
> 
> If you do care about it, it would be nice if you could (help) maintain
> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
> very least early during the freeze of each release).  This ensures that
> you won't get surprises such as this one, when nobody else is spending
> resources on it.
> 
>> We are not always using the latest version of GLIBC (the latest step we 
>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will 
>> look into what the extent of the missing support is. Any pointers are 
>> most welcome.
>>
>> Do you have a link to the suggested options on the 2.23 release thread? 
>> I dug around a bit in the archives, but did not find it.
>>
>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS 
>> instruction support, but pure sparcv8 support is of course the baseline.)
> 
> Yes, the lack of CAS is the major problem I am aware of.  If the chips
> you mention do support CAS, then a patch that adds support for the
> CAS-based atomic operations in glibc would fix the barrier problem
> (because the generic barrier should just work).  The patch would also
> have to add configure bits or whatever would be appropriate so that
> glibc can figure out whether it is supposed to be run on a sparcv8 with
> or without CAS.
> 
> What about stopping support for plain sparcv8, and keeping to support
> sparcv8+CAS provided that we have a (group of) maintainer(s) for the
> latter that can tend to the minimal responsibilities of an arch
> maintainer and has the time to do that?

At least the build for sparcv9-linux-gnu with -mcpu=leon3 finishes, 
although I am not sure if it correctly runs on leon processors.
And I seconded Tovarld's suggestion about stop maintaining plain
sparcv8 and set sparcv8+CAS as the base supported sparc32.

As pointed out by David Miller, correct support for plain sparcv8
could really only be provided with kernel supported.  And when
it lands on kernel side, it should work effortlessly with a 
sparcv8 + cas glibc build. 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-24 17:43     ` Adhemerval Zanella
@ 2016-10-25 14:34       ` Andreas Larsson
  2016-10-25 14:45         ` Adhemerval Zanella
  0 siblings, 1 reply; 32+ messages in thread
From: Andreas Larsson @ 2016-10-25 14:34 UTC (permalink / raw)
  To: Adhemerval Zanella, Torvald Riegel; +Cc: GNU C Library, David Miller, software

On 2016-10-24 19:42, Adhemerval Zanella wrote:
>
>
> On 24/10/2016 15:25, Torvald Riegel wrote:
>> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
>>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>>>> barrier implementation [1] and since then there is no thread or
>>>> interest on fixing it (Torvald has suggested some options on
>>>> 2.23 release thread).  It won't help with both new pthread rdlock
>>>> and cond implementation, although I would expect that it relies
>>>> on same atomic primitive that was not present for pthread barrier.
>>>>
>>>> AFAIK, recent commercial sparc chips from Oracle all supports
>>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>>>> support is LEON4, which I really doubt it cares for glibc support.
>>>
>>> Hi!
>>>
>>> We do care about GLIBC support for many different LEON3 and LEON4
>>> systems. GLIBC support for sparcv8 is important for us and it is
>>> important for our customers. Both LEON3 and LEON4 are continuously used
>>> in new hardware designs.
>>
>> If you do care about it, it would be nice if you could (help) maintain
>> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
>> very least early during the freeze of each release).  This ensures that
>> you won't get surprises such as this one, when nobody else is spending
>> resources on it.
>>
>>> We are not always using the latest version of GLIBC (the latest step we
>>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
>>> look into what the extent of the missing support is. Any pointers are
>>> most welcome.
>>>
>>> Do you have a link to the suggested options on the 2.23 release thread?
>>> I dug around a bit in the archives, but did not find it.
>>>
>>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
>>> instruction support, but pure sparcv8 support is of course the baseline.)
>>
>> Yes, the lack of CAS is the major problem I am aware of.  If the chips
>> you mention do support CAS, then a patch that adds support for the
>> CAS-based atomic operations in glibc would fix the barrier problem
>> (because the generic barrier should just work).  The patch would also
>> have to add configure bits or whatever would be appropriate so that
>> glibc can figure out whether it is supposed to be run on a sparcv8 with
>> or without CAS.
>>
>> What about stopping support for plain sparcv8, and keeping to support
>> sparcv8+CAS provided that we have a (group of) maintainer(s) for the
>> latter that can tend to the minimal responsibilities of an arch
>> maintainer and has the time to do that?
>
> At least the build for sparcv9-linux-gnu with -mcpu=leon3 finishes,
> although I am not sure if it correctly runs on leon processors.
> And I seconded Tovarld's suggestion about stop maintaining plain
> sparcv8 and set sparcv8+CAS as the base supported sparc32.

I have mixed feelings about this, but it is certainly better than
throwing out sparcv8 outright.

> As pointed out by David Miller, correct support for plain sparcv8
> could really only be provided with kernel supported.  And when
> it lands on kernel side, it should work effortlessly with a
> sparcv8 + cas glibc build.

What do you mean by "work effortlessly with a sparcv8 + cas glibc
build"?

Best regards,
Andreas Larsson

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-24 17:25   ` Torvald Riegel
  2016-10-24 17:43     ` Adhemerval Zanella
@ 2016-10-25 14:34     ` Andreas Larsson
  2016-10-25 16:22       ` Torvald Riegel
  1 sibling, 1 reply; 32+ messages in thread
From: Andreas Larsson @ 2016-10-25 14:34 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: Adhemerval Zanella, GNU C Library, David Miller, software

On 2016-10-24 19:25, Torvald Riegel wrote:
> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>>> barrier implementation [1] and since then there is no thread or
>>> interest on fixing it (Torvald has suggested some options on
>>> 2.23 release thread).  It won't help with both new pthread rdlock
>>> and cond implementation, although I would expect that it relies
>>> on same atomic primitive that was not present for pthread barrier.
>>>
>>> AFAIK, recent commercial sparc chips from Oracle all supports
>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>>> support is LEON4, which I really doubt it cares for glibc support.
>>
>> Hi!
>>
>> We do care about GLIBC support for many different LEON3 and LEON4
>> systems. GLIBC support for sparcv8 is important for us and it is
>> important for our customers. Both LEON3 and LEON4 are continuously used
>> in new hardware designs.
>
> If you do care about it, it would be nice if you could (help) maintain
> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
> very least early during the freeze of each release).  This ensures that
> you won't get surprises such as this one, when nobody else is spending
> resources on it.

Yes, it is apparent that we need to keep up better to avoid problems
like this.

>> We are not always using the latest version of GLIBC (the latest step we
>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
>> look into what the extent of the missing support is. Any pointers are
>> most welcome.
>>
>> Do you have a link to the suggested options on the 2.23 release thread?
>> I dug around a bit in the archives, but did not find it.
>>
>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
>> instruction support, but pure sparcv8 support is of course the baseline.)
>
> Yes, the lack of CAS is the major problem I am aware of.  If the chips
> you mention do support CAS, then a patch that adds support for the
> CAS-based atomic operations in glibc would fix the barrier problem
> (because the generic barrier should just work).  The patch would also
> have to add configure bits or whatever would be appropriate so that
> glibc can figure out whether it is supposed to be run on a sparcv8 with
> or without CAS.

Perhaps not the kosher way to do it (happy to get feedback if some
other method should be used), but changing
sysdeps/sparc/sparc32/pthread_barrier_wait.c to:

#if defined(__GCC_ATOMIC_INT_LOCK_FREE) && (__GCC_ATOMIC_INT_LOCK_FREE > 1)
#include <nptl/pthread_barrier_wait.c>
#else
#error No support for pthread barriers on pre-v9 sparc.
#endif

and fixing missing undefs for sparc32 for sendmsg and recvmsg
(sparc32 was not adjusted in commit abf29edd4a3918)

--- a/sysdeps/unix/sysv/linux/sparc/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/sparc/kernel-features.h
@@ -32,8 +32,10 @@
  #include_next <kernel-features.h>

  /* 32-bit SPARC kernels do not support
-   futex_atomic_cmpxchg_inatomic.  */
+   futex_atomic_cmpxchg_inatomic or sendmsg/recvmsg.  */
  #if !defined __arch64__ && !defined __sparc_v9__
  # undef __ASSUME_REQUEUE_PI
  # undef __ASSUME_SET_ROBUST_LIST
+# undef __ASSUME_SENDMSG_SYSCALL
+# undef __ASSUME_RECVMSG_SYSCALL
  #endif

made me able to cross-compile glibc 2.24 using gcc 4.9.4 and
-mcpu=leon3, boot with a buildroot based system and run cross-compiled
nptl/tst-barrier[1234] without failures. I will continue with building
and run the rest of the test framework, especially tst-barrier5.

Best regards,
Andreas Larsson

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-25 14:34       ` Andreas Larsson
@ 2016-10-25 14:45         ` Adhemerval Zanella
  2016-10-26 14:46           ` Andreas Larsson
  0 siblings, 1 reply; 32+ messages in thread
From: Adhemerval Zanella @ 2016-10-25 14:45 UTC (permalink / raw)
  To: Andreas Larsson; +Cc: Torvald Riegel, GNU C Library, David Miller, software



> On 25 Oct 2016, at 12:34, Andreas Larsson <andreas@gaisler.com> wrote:
> 
>> On 2016-10-24 19:42, Adhemerval Zanella wrote:
>> 
>> 
>>> On 24/10/2016 15:25, Torvald Riegel wrote:
>>>> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
>>>>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>>>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>>>>> barrier implementation [1] and since then there is no thread or
>>>>> interest on fixing it (Torvald has suggested some options on
>>>>> 2.23 release thread).  It won't help with both new pthread rdlock
>>>>> and cond implementation, although I would expect that it relies
>>>>> on same atomic primitive that was not present for pthread barrier.
>>>>> 
>>>>> AFAIK, recent commercial sparc chips from Oracle all supports
>>>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>>>>> support is LEON4, which I really doubt it cares for glibc support.
>>>> 
>>>> Hi!
>>>> 
>>>> We do care about GLIBC support for many different LEON3 and LEON4
>>>> systems. GLIBC support for sparcv8 is important for us and it is
>>>> important for our customers. Both LEON3 and LEON4 are continuously used
>>>> in new hardware designs.
>>> 
>>> If you do care about it, it would be nice if you could (help) maintain
>>> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
>>> very least early during the freeze of each release).  This ensures that
>>> you won't get surprises such as this one, when nobody else is spending
>>> resources on it.
>>> 
>>>> We are not always using the latest version of GLIBC (the latest step we
>>>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
>>>> look into what the extent of the missing support is. Any pointers are
>>>> most welcome.
>>>> 
>>>> Do you have a link to the suggested options on the 2.23 release thread?
>>>> I dug around a bit in the archives, but did not find it.
>>>> 
>>>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
>>>> instruction support, but pure sparcv8 support is of course the baseline.)
>>> 
>>> Yes, the lack of CAS is the major problem I am aware of.  If the chips
>>> you mention do support CAS, then a patch that adds support for the
>>> CAS-based atomic operations in glibc would fix the barrier problem
>>> (because the generic barrier should just work).  The patch would also
>>> have to add configure bits or whatever would be appropriate so that
>>> glibc can figure out whether it is supposed to be run on a sparcv8 with
>>> or without CAS.
>>> 
>>> What about stopping support for plain sparcv8, and keeping to support
>>> sparcv8+CAS provided that we have a (group of) maintainer(s) for the
>>> latter that can tend to the minimal responsibilities of an arch
>>> maintainer and has the time to do that?
>> 
>> At least the build for sparcv9-linux-gnu with -mcpu=leon3 finishes,
>> although I am not sure if it correctly runs on leon processors.
>> And I seconded Tovarld's suggestion about stop maintaining plain
>> sparcv8 and set sparcv8+CAS as the base supported sparc32.
> 
> I have mixed feelings about this, but it is certainly better than
> throwing out sparcv8 outright.

>> As pointed out by David Miller, correct support for plain sparcv8
>> could really only be provided with kernel supported.  And when
>> it lands on kernel side, it should work effortlessly with a
>> sparcv8 + cas glibc build.
> 
> What do you mean by "work effortlessly with a sparcv8 + cas glibc
> build"?

Meaning that even if underlying hardware does not support correct CAS, kernel emulation will provide it and thus a default GLIBC sparc32 build will work regardless.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-25 14:34     ` Andreas Larsson
@ 2016-10-25 16:22       ` Torvald Riegel
  0 siblings, 0 replies; 32+ messages in thread
From: Torvald Riegel @ 2016-10-25 16:22 UTC (permalink / raw)
  To: Andreas Larsson; +Cc: Adhemerval Zanella, GNU C Library, David Miller, software

On Tue, 2016-10-25 at 16:33 +0200, Andreas Larsson wrote:
> On 2016-10-24 19:25, Torvald Riegel wrote:
> > On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
> >> We are not always using the latest version of GLIBC (the latest step we
> >> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
> >> look into what the extent of the missing support is. Any pointers are
> >> most welcome.
> >>
> >> Do you have a link to the suggested options on the 2.23 release thread?
> >> I dug around a bit in the archives, but did not find it.
> >>
> >> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
> >> instruction support, but pure sparcv8 support is of course the baseline.)
> >
> > Yes, the lack of CAS is the major problem I am aware of.  If the chips
> > you mention do support CAS, then a patch that adds support for the
> > CAS-based atomic operations in glibc would fix the barrier problem
> > (because the generic barrier should just work).  The patch would also
> > have to add configure bits or whatever would be appropriate so that
> > glibc can figure out whether it is supposed to be run on a sparcv8 with
> > or without CAS.
> 
> Perhaps not the kosher way to do it (happy to get feedback if some
> other method should be used), but changing
> sysdeps/sparc/sparc32/pthread_barrier_wait.c to:
> 
> #if defined(__GCC_ATOMIC_INT_LOCK_FREE) && (__GCC_ATOMIC_INT_LOCK_FREE > 1)
> #include <nptl/pthread_barrier_wait.c>
> #else
> #error No support for pthread barriers on pre-v9 sparc.
> #endif

The sparc-specific barriers should just go away. atomic-machine.h for
sparc v8 should ensure that it provides the atomic operations that are
needed (and which work on process-shared uses too).

I don't have a real preference for when in the build process we should
check whether a real CAS is provided by the HW (eg, at configure time or
when trying to use atomic operations).

I'd suggest to also look at whether you really need the custom spinlock
on sparc; the generic ones should be just as good (if not, it would be
good to have a comment in the custom files explaining why we need
those).  Also, is the barrier in
sysdeps/sparc/sparc64/pthread_spin_trylock.S at the right position wrt.
to the actual acquisiton of the lock? (OTOH, I think we assume TSO
anyway, so a misplaced acquire or release MO fence is harmless).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-25 14:45         ` Adhemerval Zanella
@ 2016-10-26 14:46           ` Andreas Larsson
  2016-10-26 18:03             ` Adhemerval Zanella
  2016-10-27 10:38             ` Torvald Riegel
  0 siblings, 2 replies; 32+ messages in thread
From: Andreas Larsson @ 2016-10-26 14:46 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Torvald Riegel, GNU C Library, David Miller, software

On 2016-10-25 16:44, Adhemerval Zanella wrote:
>
>
>> On 25 Oct 2016, at 12:34, Andreas Larsson <andreas@gaisler.com> wrote:
>>
>>> On 2016-10-24 19:42, Adhemerval Zanella wrote:
>>>
>>>
>>>> On 24/10/2016 15:25, Torvald Riegel wrote:
>>>>> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
>>>>>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>>>>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>>>>>> barrier implementation [1] and since then there is no thread or
>>>>>> interest on fixing it (Torvald has suggested some options on
>>>>>> 2.23 release thread).  It won't help with both new pthread rdlock
>>>>>> and cond implementation, although I would expect that it relies
>>>>>> on same atomic primitive that was not present for pthread barrier.
>>>>>>
>>>>>> AFAIK, recent commercial sparc chips from Oracle all supports
>>>>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>>>>>> support is LEON4, which I really doubt it cares for glibc support.
>>>>>
>>>>> Hi!
>>>>>
>>>>> We do care about GLIBC support for many different LEON3 and LEON4
>>>>> systems. GLIBC support for sparcv8 is important for us and it is
>>>>> important for our customers. Both LEON3 and LEON4 are continuously used
>>>>> in new hardware designs.
>>>>
>>>> If you do care about it, it would be nice if you could (help) maintain
>>>> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
>>>> very least early during the freeze of each release).  This ensures that
>>>> you won't get surprises such as this one, when nobody else is spending
>>>> resources on it.
>>>>
>>>>> We are not always using the latest version of GLIBC (the latest step we
>>>>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
>>>>> look into what the extent of the missing support is. Any pointers are
>>>>> most welcome.
>>>>>
>>>>> Do you have a link to the suggested options on the 2.23 release thread?
>>>>> I dug around a bit in the archives, but did not find it.
>>>>>
>>>>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
>>>>> instruction support, but pure sparcv8 support is of course the baseline.)
>>>>
>>>> Yes, the lack of CAS is the major problem I am aware of.  If the chips
>>>> you mention do support CAS, then a patch that adds support for the
>>>> CAS-based atomic operations in glibc would fix the barrier problem
>>>> (because the generic barrier should just work).  The patch would also
>>>> have to add configure bits or whatever would be appropriate so that
>>>> glibc can figure out whether it is supposed to be run on a sparcv8 with
>>>> or without CAS.
>>>>
>>>> What about stopping support for plain sparcv8, and keeping to support
>>>> sparcv8+CAS provided that we have a (group of) maintainer(s) for the
>>>> latter that can tend to the minimal responsibilities of an arch
>>>> maintainer and has the time to do that?
>>>
>>> At least the build for sparcv9-linux-gnu with -mcpu=leon3 finishes,
>>> although I am not sure if it correctly runs on leon processors.
>>> And I seconded Tovarld's suggestion about stop maintaining plain
>>> sparcv8 and set sparcv8+CAS as the base supported sparc32.
>>
>> I have mixed feelings about this, but it is certainly better than
>> throwing out sparcv8 outright.
>
>>> As pointed out by David Miller, correct support for plain sparcv8
>>> could really only be provided with kernel supported.  And when
>>> it lands on kernel side, it should work effortlessly with a
>>> sparcv8 + cas glibc build.
>>
>> What do you mean by "work effortlessly with a sparcv8 + cas glibc
>> build"?
>
> Meaning that even if underlying hardware does not support correct CAS,
> kernel emulation will provide it and thus a default GLIBC sparc32 build
> will work regardless.

I am not sure it is as simple as that. Even if the kernel makes sure
that an emulated CAS is atomic against another emulated CAS, it would
not guarantee atomicity against a plain store instruction on a different
CPU, right? For the emulated CAS to work on an SMP system I would think
the atomic_store_relaxed and atomic_store_release functions would also
need to be handled by the kernel, locking the write out when the CAS is
emulated, to keep the interaction linearizable.

-- 
Best regards,
Andreas Larsson

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-26 14:46           ` Andreas Larsson
@ 2016-10-26 18:03             ` Adhemerval Zanella
  2016-10-26 18:47               ` David Miller
  2016-10-27 10:38             ` Torvald Riegel
  1 sibling, 1 reply; 32+ messages in thread
From: Adhemerval Zanella @ 2016-10-26 18:03 UTC (permalink / raw)
  To: Andreas Larsson; +Cc: Torvald Riegel, GNU C Library, David Miller, software



On 26/10/2016 12:45, Andreas Larsson wrote:
> On 2016-10-25 16:44, Adhemerval Zanella wrote:
>>
>>
>>> On 25 Oct 2016, at 12:34, Andreas Larsson <andreas@gaisler.com> wrote:
>>>
>>>> On 2016-10-24 19:42, Adhemerval Zanella wrote:
>>>>
>>>>
>>>>> On 24/10/2016 15:25, Torvald Riegel wrote:
>>>>>> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
>>>>>>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>>>>>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>>>>>>> barrier implementation [1] and since then there is no thread or
>>>>>>> interest on fixing it (Torvald has suggested some options on
>>>>>>> 2.23 release thread).  It won't help with both new pthread rdlock
>>>>>>> and cond implementation, although I would expect that it relies
>>>>>>> on same atomic primitive that was not present for pthread barrier.
>>>>>>>
>>>>>>> AFAIK, recent commercial sparc chips from Oracle all supports
>>>>>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>>>>>>> support is LEON4, which I really doubt it cares for glibc support.
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> We do care about GLIBC support for many different LEON3 and LEON4
>>>>>> systems. GLIBC support for sparcv8 is important for us and it is
>>>>>> important for our customers. Both LEON3 and LEON4 are continuously used
>>>>>> in new hardware designs.
>>>>>
>>>>> If you do care about it, it would be nice if you could (help) maintain
>>>>> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
>>>>> very least early during the freeze of each release).  This ensures that
>>>>> you won't get surprises such as this one, when nobody else is spending
>>>>> resources on it.
>>>>>
>>>>>> We are not always using the latest version of GLIBC (the latest step we
>>>>>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
>>>>>> look into what the extent of the missing support is. Any pointers are
>>>>>> most welcome.
>>>>>>
>>>>>> Do you have a link to the suggested options on the 2.23 release thread?
>>>>>> I dug around a bit in the archives, but did not find it.
>>>>>>
>>>>>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
>>>>>> instruction support, but pure sparcv8 support is of course the baseline.)
>>>>>
>>>>> Yes, the lack of CAS is the major problem I am aware of.  If the chips
>>>>> you mention do support CAS, then a patch that adds support for the
>>>>> CAS-based atomic operations in glibc would fix the barrier problem
>>>>> (because the generic barrier should just work).  The patch would also
>>>>> have to add configure bits or whatever would be appropriate so that
>>>>> glibc can figure out whether it is supposed to be run on a sparcv8 with
>>>>> or without CAS.
>>>>>
>>>>> What about stopping support for plain sparcv8, and keeping to support
>>>>> sparcv8+CAS provided that we have a (group of) maintainer(s) for the
>>>>> latter that can tend to the minimal responsibilities of an arch
>>>>> maintainer and has the time to do that?
>>>>
>>>> At least the build for sparcv9-linux-gnu with -mcpu=leon3 finishes,
>>>> although I am not sure if it correctly runs on leon processors.
>>>> And I seconded Tovarld's suggestion about stop maintaining plain
>>>> sparcv8 and set sparcv8+CAS as the base supported sparc32.
>>>
>>> I have mixed feelings about this, but it is certainly better than
>>> throwing out sparcv8 outright.
>>
>>>> As pointed out by David Miller, correct support for plain sparcv8
>>>> could really only be provided with kernel supported.  And when
>>>> it lands on kernel side, it should work effortlessly with a
>>>> sparcv8 + cas glibc build.
>>>
>>> What do you mean by "work effortlessly with a sparcv8 + cas glibc
>>> build"?
>>
>> Meaning that even if underlying hardware does not support correct CAS,
>> kernel emulation will provide it and thus a default GLIBC sparc32 build
>> will work regardless.
> 
> I am not sure it is as simple as that. Even if the kernel makes sure
> that an emulated CAS is atomic against another emulated CAS, it would
> not guarantee atomicity against a plain store instruction on a different
> CPU, right? For the emulated CAS to work on an SMP system I would think
> the atomic_store_relaxed and atomic_store_release functions would also
> need to be handled by the kernel, locking the write out when the CAS is
> emulated, to keep the interaction linearizable.
> 

I would expect kernel to emulate all the define atomic operation defined
in ISA to provide correct atomic semantic. I am not really sure how
feasible it would be, but the idea is from library standpoint running
on a machine with a emulated atomic provided by kernel is semantic
equal to running on a machine with hardware provided atomic.

And I think it would be not feasible to keep pushing for C11 atomics
on glibc if we can not guarantee it. 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-26 18:03             ` Adhemerval Zanella
@ 2016-10-26 18:47               ` David Miller
  2016-10-26 19:39                 ` Adhemerval Zanella
  2016-10-27 10:54                 ` Torvald Riegel
  0 siblings, 2 replies; 32+ messages in thread
From: David Miller @ 2016-10-26 18:47 UTC (permalink / raw)
  To: adhemerval.zanella; +Cc: andreas, triegel, libc-alpha, software

From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Wed, 26 Oct 2016 16:02:50 -0200

>> I am not sure it is as simple as that. Even if the kernel makes sure
>> that an emulated CAS is atomic against another emulated CAS, it would
>> not guarantee atomicity against a plain store instruction on a different
>> CPU, right? For the emulated CAS to work on an SMP system I would think
>> the atomic_store_relaxed and atomic_store_release functions would also
>> need to be handled by the kernel, locking the write out when the CAS is
>> emulated, to keep the interaction linearizable.
>> 
> 
> I would expect kernel to emulate all the define atomic operation defined
> in ISA to provide correct atomic semantic. I am not really sure how
> feasible it would be, but the idea is from library standpoint running
> on a machine with a emulated atomic provided by kernel is semantic
> equal to running on a machine with hardware provided atomic.
> 
> And I think it would be not feasible to keep pushing for C11 atomics
> on glibc if we can not guarantee it. 

Plain stores would semantically not be allowed on such a shared value
anyways.

If atomicity is required, then nobody should do direct stores.  Direct
stores are unchecked and non-atomic.  Whether the kernel implements
the CAS or the cpu does it directly has no bearing on this issue.

All entities should always use the CAS operation to modify such values.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-26 18:47               ` David Miller
@ 2016-10-26 19:39                 ` Adhemerval Zanella
  2016-10-27 10:54                 ` Torvald Riegel
  1 sibling, 0 replies; 32+ messages in thread
From: Adhemerval Zanella @ 2016-10-26 19:39 UTC (permalink / raw)
  To: David Miller; +Cc: andreas, triegel, libc-alpha, software



On 26/10/2016 16:47, David Miller wrote:
> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Date: Wed, 26 Oct 2016 16:02:50 -0200
> 
>>> I am not sure it is as simple as that. Even if the kernel makes sure
>>> that an emulated CAS is atomic against another emulated CAS, it would
>>> not guarantee atomicity against a plain store instruction on a different
>>> CPU, right? For the emulated CAS to work on an SMP system I would think
>>> the atomic_store_relaxed and atomic_store_release functions would also
>>> need to be handled by the kernel, locking the write out when the CAS is
>>> emulated, to keep the interaction linearizable.
>>>
>>
>> I would expect kernel to emulate all the define atomic operation defined
>> in ISA to provide correct atomic semantic. I am not really sure how
>> feasible it would be, but the idea is from library standpoint running
>> on a machine with a emulated atomic provided by kernel is semantic
>> equal to running on a machine with hardware provided atomic.
>>
>> And I think it would be not feasible to keep pushing for C11 atomics
>> on glibc if we can not guarantee it. 
> 
> Plain stores would semantically not be allowed on such a shared value
> anyways.
> 
> If atomicity is required, then nobody should do direct stores.  Direct
> stores are unchecked and non-atomic.  Whether the kernel implements
> the CAS or the cpu does it directly has no bearing on this issue.
> 
> All entities should always use the CAS operation to modify such values.

And neither it is the idea for current glibc concurrency as described 
in the wiki [1].  Since the idea is using C11 atomic memory model
and semantic, plain store on atomic variables is not supported.

What I would expect is for sparc to have semantically correct
atomic_{load,store}_* operation, either with custom implementation or
even based on supported CAS operation.

[1] https://sourceware.org/glibc/wiki/Concurrency

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-26 14:46           ` Andreas Larsson
  2016-10-26 18:03             ` Adhemerval Zanella
@ 2016-10-27 10:38             ` Torvald Riegel
  2016-11-01 15:27               ` Andreas Larsson
  1 sibling, 1 reply; 32+ messages in thread
From: Torvald Riegel @ 2016-10-27 10:38 UTC (permalink / raw)
  To: Andreas Larsson; +Cc: Adhemerval Zanella, GNU C Library, David Miller, software

On Wed, 2016-10-26 at 16:45 +0200, Andreas Larsson wrote:
> On 2016-10-25 16:44, Adhemerval Zanella wrote:
> >
> >
> >> On 25 Oct 2016, at 12:34, Andreas Larsson <andreas@gaisler.com> wrote:
> >>
> >>> On 2016-10-24 19:42, Adhemerval Zanella wrote:
> >>>
> >>>
> >>>> On 24/10/2016 15:25, Torvald Riegel wrote:
> >>>>> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
> >>>>>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
> >>>>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
> >>>>>> barrier implementation [1] and since then there is no thread or
> >>>>>> interest on fixing it (Torvald has suggested some options on
> >>>>>> 2.23 release thread).  It won't help with both new pthread rdlock
> >>>>>> and cond implementation, although I would expect that it relies
> >>>>>> on same atomic primitive that was not present for pthread barrier.
> >>>>>>
> >>>>>> AFAIK, recent commercial sparc chips from Oracle all supports
> >>>>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
> >>>>>> support is LEON4, which I really doubt it cares for glibc support.
> >>>>>
> >>>>> Hi!
> >>>>>
> >>>>> We do care about GLIBC support for many different LEON3 and LEON4
> >>>>> systems. GLIBC support for sparcv8 is important for us and it is
> >>>>> important for our customers. Both LEON3 and LEON4 are continuously used
> >>>>> in new hardware designs.
> >>>>
> >>>> If you do care about it, it would be nice if you could (help) maintain
> >>>> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
> >>>> very least early during the freeze of each release).  This ensures that
> >>>> you won't get surprises such as this one, when nobody else is spending
> >>>> resources on it.
> >>>>
> >>>>> We are not always using the latest version of GLIBC (the latest step we
> >>>>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
> >>>>> look into what the extent of the missing support is. Any pointers are
> >>>>> most welcome.
> >>>>>
> >>>>> Do you have a link to the suggested options on the 2.23 release thread?
> >>>>> I dug around a bit in the archives, but did not find it.
> >>>>>
> >>>>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
> >>>>> instruction support, but pure sparcv8 support is of course the baseline.)
> >>>>
> >>>> Yes, the lack of CAS is the major problem I am aware of.  If the chips
> >>>> you mention do support CAS, then a patch that adds support for the
> >>>> CAS-based atomic operations in glibc would fix the barrier problem
> >>>> (because the generic barrier should just work).  The patch would also
> >>>> have to add configure bits or whatever would be appropriate so that
> >>>> glibc can figure out whether it is supposed to be run on a sparcv8 with
> >>>> or without CAS.
> >>>>
> >>>> What about stopping support for plain sparcv8, and keeping to support
> >>>> sparcv8+CAS provided that we have a (group of) maintainer(s) for the
> >>>> latter that can tend to the minimal responsibilities of an arch
> >>>> maintainer and has the time to do that?
> >>>
> >>> At least the build for sparcv9-linux-gnu with -mcpu=leon3 finishes,
> >>> although I am not sure if it correctly runs on leon processors.
> >>> And I seconded Tovarld's suggestion about stop maintaining plain
> >>> sparcv8 and set sparcv8+CAS as the base supported sparc32.
> >>
> >> I have mixed feelings about this, but it is certainly better than
> >> throwing out sparcv8 outright.
> >
> >>> As pointed out by David Miller, correct support for plain sparcv8
> >>> could really only be provided with kernel supported.  And when
> >>> it lands on kernel side, it should work effortlessly with a
> >>> sparcv8 + cas glibc build.
> >>
> >> What do you mean by "work effortlessly with a sparcv8 + cas glibc
> >> build"?
> >
> > Meaning that even if underlying hardware does not support correct CAS,
> > kernel emulation will provide it and thus a default GLIBC sparc32 build
> > will work regardless.
> 
> I am not sure it is as simple as that. Even if the kernel makes sure
> that an emulated CAS is atomic against another emulated CAS, it would
> not guarantee atomicity against a plain store instruction on a different
> CPU, right? For the emulated CAS to work on an SMP system I would think
> the atomic_store_relaxed and atomic_store_release functions would also
> need to be handled by the kernel, locking the write out when the CAS is
> emulated, to keep the interaction linearizable.

Is there still recent sparcv8 hardware that has no native CAS but
multiple CPU cores?  I think we've used the kernel emulation only on
non-multi-core systems so far.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-26 18:47               ` David Miller
  2016-10-26 19:39                 ` Adhemerval Zanella
@ 2016-10-27 10:54                 ` Torvald Riegel
  2016-10-27 14:36                   ` Carlos O'Donell
  1 sibling, 1 reply; 32+ messages in thread
From: Torvald Riegel @ 2016-10-27 10:54 UTC (permalink / raw)
  To: David Miller; +Cc: adhemerval.zanella, andreas, libc-alpha, software

On Wed, 2016-10-26 at 14:47 -0400, David Miller wrote:
> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Date: Wed, 26 Oct 2016 16:02:50 -0200
> 
> >> I am not sure it is as simple as that. Even if the kernel makes sure
> >> that an emulated CAS is atomic against another emulated CAS, it would
> >> not guarantee atomicity against a plain store instruction on a different
> >> CPU, right? For the emulated CAS to work on an SMP system I would think
> >> the atomic_store_relaxed and atomic_store_release functions would also
> >> need to be handled by the kernel, locking the write out when the CAS is
> >> emulated, to keep the interaction linearizable.
> >> 
> > 
> > I would expect kernel to emulate all the define atomic operation defined
> > in ISA to provide correct atomic semantic. I am not really sure how
> > feasible it would be, but the idea is from library standpoint running
> > on a machine with a emulated atomic provided by kernel is semantic
> > equal to running on a machine with hardware provided atomic.
> > 
> > And I think it would be not feasible to keep pushing for C11 atomics
> > on glibc if we can not guarantee it. 
> 
> Plain stores would semantically not be allowed on such a shared value
> anyways.
> 
> If atomicity is required, then nobody should do direct stores.  Direct
> stores are unchecked and non-atomic.  Whether the kernel implements
> the CAS or the cpu does it directly has no bearing on this issue.
> 
> All entities should always use the CAS operation to modify such values.

I'm not quite sure what you're trying to say, so I'll make a few general
comments that will hopefully help clarify.

It is true that we do want to use the C11 memory model throughout glibc,
which means a data-race-freedom requirement for glibc code, which in
turn means having to use atomic operations (ie, atomic_* ()) whenever
there would be a data race (as defined by C11) otherwise.

The implementation of atomic_*() in glibc is an exception to that rule,
in that on some systems we may know that in a controlled environment
(eg, function not inlined or volatile used), the compiler will generate
code for a plain store/load in the implementation of an atomic_*()
function that is equivalent to a relaxed MO atomic store/load (including
effects of fences).
This makes concurrent relaxed MO loads/stores work.

If we also have a non-multi-core system, entering the kernel emulation
for CAS then stops all other execution on the system, so the CAS
emulation in the kernel is atomic.
If instead we have a multi-core system, either the kernel would have to
temporarily stop all other cores while emulating the CAS, or all
atomic_*() would have to use the kernel.  Which of these two options is
better is hard to say upfront.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-27 10:54                 ` Torvald Riegel
@ 2016-10-27 14:36                   ` Carlos O'Donell
  2016-11-07 16:38                     ` David Miller
  0 siblings, 1 reply; 32+ messages in thread
From: Carlos O'Donell @ 2016-10-27 14:36 UTC (permalink / raw)
  To: Torvald Riegel, David Miller
  Cc: adhemerval.zanella, andreas, libc-alpha, software

On 10/27/2016 06:52 AM, Torvald Riegel wrote:
> On Wed, 2016-10-26 at 14:47 -0400, David Miller wrote:
>> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>> Date: Wed, 26 Oct 2016 16:02:50 -0200
>>
>>>> I am not sure it is as simple as that. Even if the kernel makes sure
>>>> that an emulated CAS is atomic against another emulated CAS, it would
>>>> not guarantee atomicity against a plain store instruction on a different
>>>> CPU, right? For the emulated CAS to work on an SMP system I would think
>>>> the atomic_store_relaxed and atomic_store_release functions would also
>>>> need to be handled by the kernel, locking the write out when the CAS is
>>>> emulated, to keep the interaction linearizable.
>>>>
>>>
>>> I would expect kernel to emulate all the define atomic operation defined
>>> in ISA to provide correct atomic semantic. I am not really sure how
>>> feasible it would be, but the idea is from library standpoint running
>>> on a machine with a emulated atomic provided by kernel is semantic
>>> equal to running on a machine with hardware provided atomic.
>>>
>>> And I think it would be not feasible to keep pushing for C11 atomics
>>> on glibc if we can not guarantee it. 
>>
>> Plain stores would semantically not be allowed on such a shared value
>> anyways.
>>
>> If atomicity is required, then nobody should do direct stores.  Direct
>> stores are unchecked and non-atomic.  Whether the kernel implements
>> the CAS or the cpu does it directly has no bearing on this issue.
>>
>> All entities should always use the CAS operation to modify such values.
> 
> I'm not quite sure what you're trying to say, so I'll make a few general
> comments that will hopefully help clarify.
> 
> It is true that we do want to use the C11 memory model throughout glibc,
> which means a data-race-freedom requirement for glibc code, which in
> turn means having to use atomic operations (ie, atomic_* ()) whenever
> there would be a data race (as defined by C11) otherwise.
> 
> The implementation of atomic_*() in glibc is an exception to that rule,
> in that on some systems we may know that in a controlled environment
> (eg, function not inlined or volatile used), the compiler will generate
> code for a plain store/load in the implementation of an atomic_*()
> function that is equivalent to a relaxed MO atomic store/load (including
> effects of fences).
> This makes concurrent relaxed MO loads/stores work.
> 
> If we also have a non-multi-core system, entering the kernel emulation
> for CAS then stops all other execution on the system, so the CAS
> emulation in the kernel is atomic.
> If instead we have a multi-core system, either the kernel would have to
> temporarily stop all other cores while emulating the CAS, or all
> atomic_*() would have to use the kernel.  Which of these two options is
> better is hard to say upfront.
 
Agreed.

For hppa we have similar problems we I'm moving everything over to
emulated CAS from initialization to accesses e.g. atomic_exchange_*.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-27 10:38             ` Torvald Riegel
@ 2016-11-01 15:27               ` Andreas Larsson
  0 siblings, 0 replies; 32+ messages in thread
From: Andreas Larsson @ 2016-11-01 15:27 UTC (permalink / raw)
  To: Torvald Riegel; +Cc: Adhemerval Zanella, GNU C Library, David Miller, software

On 2016-10-27 12:38, Torvald Riegel wrote:
> On Wed, 2016-10-26 at 16:45 +0200, Andreas Larsson wrote:
>> On 2016-10-25 16:44, Adhemerval Zanella wrote:
>>>
>>>
>>>> On 25 Oct 2016, at 12:34, Andreas Larsson <andreas@gaisler.com> wrote:
>>>>
>>>>> On 2016-10-24 19:42, Adhemerval Zanella wrote:
>>>>>
>>>>>
>>>>>> On 24/10/2016 15:25, Torvald Riegel wrote:
>>>>>>> On Fri, 2016-10-21 at 10:59 +0200, Andreas Larsson wrote:
>>>>>>>> On 2016-10-20 21:47, Adhemerval Zanella wrote:
>>>>>>>> The sparcv8 build is broken since GLIBC 2.23 due the new pthread
>>>>>>>> barrier implementation [1] and since then there is no thread or
>>>>>>>> interest on fixing it (Torvald has suggested some options on
>>>>>>>> 2.23 release thread).  It won't help with both new pthread rdlock
>>>>>>>> and cond implementation, although I would expect that it relies
>>>>>>>> on same atomic primitive that was not present for pthread barrier.
>>>>>>>>
>>>>>>>> AFAIK, recent commercial sparc chips from Oracle all supports
>>>>>>>> sparcv9.  The only somewhat recent sparc chip with just sparcv8
>>>>>>>> support is LEON4, which I really doubt it cares for glibc support.
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> We do care about GLIBC support for many different LEON3 and LEON4
>>>>>>> systems. GLIBC support for sparcv8 is important for us and it is
>>>>>>> important for our customers. Both LEON3 and LEON4 are continuously used
>>>>>>> in new hardware designs.
>>>>>>
>>>>>> If you do care about it, it would be nice if you could (help) maintain
>>>>>> sparcv8 (e.g., regularly testing most recent glibc on sparcv8, at the
>>>>>> very least early during the freeze of each release).  This ensures that
>>>>>> you won't get surprises such as this one, when nobody else is spending
>>>>>> resources on it.
>>>>>>
>>>>>>> We are not always using the latest version of GLIBC (the latest step we
>>>>>>> took was to GLIBC 2.20), so unfortunately we missed this issue. We will
>>>>>>> look into what the extent of the missing support is. Any pointers are
>>>>>>> most welcome.
>>>>>>>
>>>>>>> Do you have a link to the suggested options on the 2.23 release thread?
>>>>>>> I dug around a bit in the archives, but did not find it.
>>>>>>>
>>>>>>> (As a side note, most of the recent LEON3 and LEON4 chips have CAS
>>>>>>> instruction support, but pure sparcv8 support is of course the baseline.)
>>>>>>
>>>>>> Yes, the lack of CAS is the major problem I am aware of.  If the chips
>>>>>> you mention do support CAS, then a patch that adds support for the
>>>>>> CAS-based atomic operations in glibc would fix the barrier problem
>>>>>> (because the generic barrier should just work).  The patch would also
>>>>>> have to add configure bits or whatever would be appropriate so that
>>>>>> glibc can figure out whether it is supposed to be run on a sparcv8 with
>>>>>> or without CAS.
>>>>>>
>>>>>> What about stopping support for plain sparcv8, and keeping to support
>>>>>> sparcv8+CAS provided that we have a (group of) maintainer(s) for the
>>>>>> latter that can tend to the minimal responsibilities of an arch
>>>>>> maintainer and has the time to do that?
>>>>>
>>>>> At least the build for sparcv9-linux-gnu with -mcpu=leon3 finishes,
>>>>> although I am not sure if it correctly runs on leon processors.
>>>>> And I seconded Tovarld's suggestion about stop maintaining plain
>>>>> sparcv8 and set sparcv8+CAS as the base supported sparc32.
>>>>
>>>> I have mixed feelings about this, but it is certainly better than
>>>> throwing out sparcv8 outright.
>>>
>>>>> As pointed out by David Miller, correct support for plain sparcv8
>>>>> could really only be provided with kernel supported.  And when
>>>>> it lands on kernel side, it should work effortlessly with a
>>>>> sparcv8 + cas glibc build.
>>>>
>>>> What do you mean by "work effortlessly with a sparcv8 + cas glibc
>>>> build"?
>>>
>>> Meaning that even if underlying hardware does not support correct CAS,
>>> kernel emulation will provide it and thus a default GLIBC sparc32 build
>>> will work regardless.
>>
>> I am not sure it is as simple as that. Even if the kernel makes sure
>> that an emulated CAS is atomic against another emulated CAS, it would
>> not guarantee atomicity against a plain store instruction on a different
>> CPU, right? For the emulated CAS to work on an SMP system I would think
>> the atomic_store_relaxed and atomic_store_release functions would also
>> need to be handled by the kernel, locking the write out when the CAS is
>> emulated, to keep the interaction linearizable.
>
> Is there still recent sparcv8 hardware that has no native CAS but
> multiple CPU cores?  I think we've used the kernel emulation only on
> non-multi-core systems so far.

There are no LEON3 or LEON4 multiprocessor chips that I am aware of that 
lacks CAS. I cannot rule out that some customer has instantiated such a 
design though.

Best regards,
Andreas Larsson

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-10-27 14:36                   ` Carlos O'Donell
@ 2016-11-07 16:38                     ` David Miller
  2016-11-07 21:21                       ` Sam Ravnborg
  2016-11-09 17:08                       ` Torvald Riegel
  0 siblings, 2 replies; 32+ messages in thread
From: David Miller @ 2016-11-07 16:38 UTC (permalink / raw)
  To: carlos; +Cc: triegel, adhemerval.zanella, andreas, libc-alpha, software

[-- Attachment #1: Type: Text/Plain, Size: 897 bytes --]


So the following attached is what I started playing around with this
weekend.

It implements software trap "0x23" to perform a CAS operations, the
operands are expected in registers %o0, %o1, and %o2.

Since it was easiest to test I implemented this first on sparc64 which
just executes the CAS instruction directly.  I'll start working on the
32-bit part in the background.

The capability will be advertised via the mask returned by the "get
kernel features" system call.  We could check this early in the
crt'ish code and cache the value in a variable which the atomics can
check.

Another kernel side change I have to do is advertise the LEON CAS
availability in the _dl_hwcaps so that we can use the LEON CAS in
glibc when available.

The first patch is the kernel side, and the second is the glibc side.
The whole NPTL testsuite passes for the plain 32-bit sparc target with
these changes.

[-- Attachment #2: casemul_kernel.diff --]
[-- Type: Text/X-Patch, Size: 3367 bytes --]

From fa1cad39df7318cdb46baea5774c340322cd74f2 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Mon, 7 Nov 2016 08:27:05 -0800
Subject: [PATCH] sparc64: Add CAS emulation trap.

Older 32-bit sparc cpus (other than LEON) lack a CAS instruction, so
we need to provide some kind of helper infrastructure in the kernel
to emulate it.

This is the first part which firstly defines the basic infrastructure
and the simplest implementation, which is to just directly execute the
instruction on sparc64.

We make use of the window fill/spill fault unwind facilities to make
this as simple as possible.  When we take a full TSB miss, we check if
the trap level is greater than one, and if so unwind the trap to one
of the final 3 instructions of the interrupted trap handler's block.
Which of the three to use is based upon whether this is a real fault,
an unaligned access, or a data access exception (ie. bus error).

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/uapi/asm/unistd.h | 1 +
 arch/sparc/kernel/Makefile           | 1 +
 arch/sparc/kernel/sys_sparc_64.c     | 2 +-
 arch/sparc/kernel/ttable_64.S        | 3 ++-
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/include/uapi/asm/unistd.h b/arch/sparc/include/uapi/asm/unistd.h
index 36eee81..0725911 100644
--- a/arch/sparc/include/uapi/asm/unistd.h
+++ b/arch/sparc/include/uapi/asm/unistd.h
@@ -430,6 +430,7 @@
 
 /* Bitmask values returned from kern_features system call.  */
 #define KERN_FEATURE_MIXED_MODE_STACK	0x00000001
+#define KERN_FEATURE_CAS_EMUL		0x00000002
 
 #ifdef __32bit_syscall_numbers__
 /* Sparc 32-bit only has the "setresuid32", "getresuid32" variants,
diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile
index fa3c02d..1166638 100644
--- a/arch/sparc/kernel/Makefile
+++ b/arch/sparc/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_perf_event.o := -pg
 CFLAGS_REMOVE_pcr.o := -pg
 endif
 
+obj-$(CONFIG_SPARC64)   += casemul.o
 obj-$(CONFIG_SPARC64)   += urtt_fill.o
 obj-$(CONFIG_SPARC32)   += entry.o wof.o wuf.o
 obj-$(CONFIG_SPARC32)   += etrap_32.o
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index fe8b8ee..d55e8b8 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -643,5 +643,5 @@ SYSCALL_DEFINE5(rt_sigaction, int, sig, const struct sigaction __user *, act,
 
 asmlinkage long sys_kern_features(void)
 {
-	return KERN_FEATURE_MIXED_MODE_STACK;
+	return KERN_FEATURE_MIXED_MODE_STACK | KERN_FEATURE_CAS_EMUL;
 }
diff --git a/arch/sparc/kernel/ttable_64.S b/arch/sparc/kernel/ttable_64.S
index c6dfdaa..3364019 100644
--- a/arch/sparc/kernel/ttable_64.S
+++ b/arch/sparc/kernel/ttable_64.S
@@ -147,7 +147,8 @@ tl0_resv11e:	TRAP_UTRAP(UT_TRAP_INSTRUCTION_30,0x11e) TRAP_UTRAP(UT_TRAP_INSTRUC
 tl0_getcc:	GETCC_TRAP
 tl0_setcc:	SETCC_TRAP
 tl0_getpsr:	TRAP(do_getpsr)
-tl0_resv123:	BTRAP(0x123) BTRAP(0x124) BTRAP(0x125) BTRAP(0x126) BTRAP(0x127)
+tl0_cas:	TRAP_NOSAVE(emulate_cas)
+tl0_resv124:	BTRAP(0x124) BTRAP(0x125) BTRAP(0x126) BTRAP(0x127)
 tl0_resv128:	BTRAP(0x128) BTRAP(0x129) BTRAP(0x12a) BTRAP(0x12b) BTRAP(0x12c)
 tl0_resv12d:	BTRAP(0x12d) BTRAP(0x12e) BTRAP(0x12f) BTRAP(0x130) BTRAP(0x131)
 tl0_resv132:	BTRAP(0x132) BTRAP(0x133) BTRAP(0x134) BTRAP(0x135) BTRAP(0x136)
-- 
2.1.2.532.g19b5d50


[-- Attachment #3: casemul_glibc.diff --]
[-- Type: Text/X-Patch, Size: 24469 bytes --]

From 9e4a9f69dd74c47a7d84c1233164acfae7602a9f Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Sun, 6 Nov 2016 19:01:31 -0800
Subject: [PATCH] sparc: On 32-bit, provide atomics via kernel assist.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 sysdeps/sparc/sparc32/atomic-machine.h             | 313 ++-------------------
 sysdeps/sparc/sparc32/pthread_barrier_wait.c       |   1 -
 sysdeps/sparc/sparc32/sem_post.c                   |  82 ------
 sysdeps/sparc/sparc32/sem_waitcommon.c             | 146 ----------
 .../sparc/sparc32/sparcv9/pthread_barrier_wait.c   |   1 -
 sysdeps/sparc/sparc32/sparcv9/sem_post.c           |   1 -
 sysdeps/sparc/sparc32/sparcv9/sem_waitcommon.c     |   1 -
 .../unix/sysv/linux/sparc/sparc32/atomic-machine.h |  35 +++
 .../linux/sparc/sparc32/sparcv9/atomic-machine.h   |   1 +
 9 files changed, 55 insertions(+), 526 deletions(-)
 delete mode 100644 sysdeps/sparc/sparc32/pthread_barrier_wait.c
 delete mode 100644 sysdeps/sparc/sparc32/sem_post.c
 delete mode 100644 sysdeps/sparc/sparc32/sem_waitcommon.c
 delete mode 100644 sysdeps/sparc/sparc32/sparcv9/pthread_barrier_wait.c
 delete mode 100644 sysdeps/sparc/sparc32/sparcv9/sem_post.c
 delete mode 100644 sysdeps/sparc/sparc32/sparcv9/sem_waitcommon.c
 create mode 100644 sysdeps/unix/sysv/linux/sparc/sparc32/atomic-machine.h
 create mode 100644 sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/atomic-machine.h

diff --git a/sysdeps/sparc/sparc32/atomic-machine.h b/sysdeps/sparc/sparc32/atomic-machine.h
index d6e68f9..ceac729 100644
--- a/sysdeps/sparc/sparc32/atomic-machine.h
+++ b/sysdeps/sparc/sparc32/atomic-machine.h
@@ -50,311 +50,36 @@ typedef uintmax_t uatomic_max_t;
 #define __HAVE_64B_ATOMICS 0
 #define USE_ATOMIC_COMPILER_BUILTINS 0
 
+#define __arch_compare_and_exchange_val_8_acq(mem, newval, oldval) \
+  (abort (), (__typeof (*mem)) 0)
 
-/* We have no compare and swap, just test and set.
-   The following implementation contends on 64 global locks
-   per library and assumes no variable will be accessed using atomic.h
-   macros from two different libraries.  */
+#define __arch_compare_and_exchange_val_16_acq(mem, newval, oldval) \
+  (abort (), (__typeof (*mem)) 0)
 
-__make_section_unallocated
-  (".gnu.linkonce.b.__sparc32_atomic_locks, \"aw\", %nobits");
+# define __arch_compare_and_exchange_val_32_acq(mem, newval, oldval) \
+  __sparc_assisted_compare_and_exchange_val_32_acq ((mem), (newval), (oldval))
 
-volatile unsigned char __sparc32_atomic_locks[64]
-  __attribute__ ((nocommon, section (".gnu.linkonce.b.__sparc32_atomic_locks"
-				     __sec_comment),
-		  visibility ("hidden")));
+#define __arch_compare_and_exchange_val_64_acq(mem, newval, oldval) \
+  (abort (), (__typeof (*mem)) 0)
 
-#define __sparc32_atomic_do_lock(addr) \
-  do								      \
-    {								      \
-      unsigned int __old_lock;					      \
-      unsigned int __idx = (((long) addr >> 2) ^ ((long) addr >> 12)) \
-			   & 63;				      \
-      do							      \
-	__asm __volatile ("ldstub %1, %0"			      \
-			  : "=r" (__old_lock),			      \
-			    "=m" (__sparc32_atomic_locks[__idx])      \
-			  : "m" (__sparc32_atomic_locks[__idx])	      \
-			  : "memory");				      \
-      while (__old_lock);					      \
-    }								      \
-  while (0)
+#define atomic_compare_and_exchange_val_24_acq(mem, newval, oldval) \
+  atomic_compare_and_exchange_val_acq (mem, newval, oldval)
 
-#define __sparc32_atomic_do_unlock(addr) \
-  do								      \
-    {								      \
-      __sparc32_atomic_locks[(((long) addr >> 2)		      \
-			      ^ ((long) addr >> 12)) & 63] = 0;	      \
-      __asm __volatile ("" ::: "memory");			      \
-    }								      \
-  while (0)
-
-#define __sparc32_atomic_do_lock24(addr) \
-  do								      \
-    {								      \
-      unsigned int __old_lock;					      \
-      do							      \
-	__asm __volatile ("ldstub %1, %0"			      \
-			  : "=r" (__old_lock), "=m" (*(addr))	      \
-			  : "m" (*(addr))			      \
-			  : "memory");				      \
-      while (__old_lock);					      \
-    }								      \
-  while (0)
-
-#define __sparc32_atomic_do_unlock24(addr) \
-  do								      \
-    {								      \
-      __asm __volatile ("" ::: "memory");			      \
-      *(char *) (addr) = 0;					      \
-    }								      \
-  while (0)
-
-
-#ifndef SHARED
-# define __v9_compare_and_exchange_val_32_acq(mem, newval, oldval) \
-({union { __typeof (oldval) a; uint32_t v; } oldval_arg = { .a = (oldval) };  \
-  union { __typeof (newval) a; uint32_t v; } newval_arg = { .a = (newval) };  \
-  register uint32_t __acev_tmp __asm ("%g6");			              \
-  register __typeof (mem) __acev_mem __asm ("%g1") = (mem);		      \
-  register uint32_t __acev_oldval __asm ("%g5");		              \
-  __acev_tmp = newval_arg.v;						      \
-  __acev_oldval = oldval_arg.v;						      \
-  /* .word 0xcde05005 is cas [%g1], %g5, %g6.  Can't use cas here though,     \
-     because as will then mark the object file as V8+ arch.  */		      \
-  __asm __volatile (".word 0xcde05005"					      \
-		    : "+r" (__acev_tmp), "=m" (*__acev_mem)		      \
-		    : "r" (__acev_oldval), "m" (*__acev_mem),		      \
-		      "r" (__acev_mem) : "memory");			      \
-  (__typeof (oldval)) __acev_tmp; })
-#endif
-
-/* The only basic operation needed is compare and exchange.  */
-#define __v7_compare_and_exchange_val_acq(mem, newval, oldval) \
-  ({ __typeof (mem) __acev_memp = (mem);			      \
-     __typeof (*mem) __acev_ret;				      \
-     __typeof (*mem) __acev_newval = (newval);			      \
-								      \
-     __sparc32_atomic_do_lock (__acev_memp);			      \
-     __acev_ret = *__acev_memp;					      \
-     if (__acev_ret == (oldval))				      \
-       *__acev_memp = __acev_newval;				      \
-     __sparc32_atomic_do_unlock (__acev_memp);			      \
-     __acev_ret; })
-
-#define __v7_compare_and_exchange_bool_acq(mem, newval, oldval) \
-  ({ __typeof (mem) __aceb_memp = (mem);			      \
-     int __aceb_ret;						      \
-     __typeof (*mem) __aceb_newval = (newval);			      \
-								      \
-     __sparc32_atomic_do_lock (__aceb_memp);			      \
-     __aceb_ret = 0;						      \
-     if (*__aceb_memp == (oldval))				      \
-       *__aceb_memp = __aceb_newval;				      \
-     else							      \
-       __aceb_ret = 1;						      \
-     __sparc32_atomic_do_unlock (__aceb_memp);			      \
-     __aceb_ret; })
-
-#define __v7_exchange_acq(mem, newval) \
-  ({ __typeof (mem) __acev_memp = (mem);			      \
-     __typeof (*mem) __acev_ret;				      \
-     __typeof (*mem) __acev_newval = (newval);			      \
-								      \
-     __sparc32_atomic_do_lock (__acev_memp);			      \
-     __acev_ret = *__acev_memp;					      \
-     *__acev_memp = __acev_newval;				      \
-     __sparc32_atomic_do_unlock (__acev_memp);			      \
-     __acev_ret; })
-
-#define __v7_exchange_and_add(mem, value) \
-  ({ __typeof (mem) __acev_memp = (mem);			      \
-     __typeof (*mem) __acev_ret;				      \
-								      \
-     __sparc32_atomic_do_lock (__acev_memp);			      \
-     __acev_ret = *__acev_memp;					      \
-     *__acev_memp = __acev_ret + (value);			      \
-     __sparc32_atomic_do_unlock (__acev_memp);			      \
-     __acev_ret; })
-
-/* Special versions, which guarantee that top 8 bits of all values
-   are cleared and use those bits as the ldstub lock.  */
-#define __v7_compare_and_exchange_val_24_acq(mem, newval, oldval) \
-  ({ __typeof (mem) __acev_memp = (mem);			      \
-     __typeof (*mem) __acev_ret;				      \
-     __typeof (*mem) __acev_newval = (newval);			      \
-								      \
-     __sparc32_atomic_do_lock24 (__acev_memp);			      \
-     __acev_ret = *__acev_memp & 0xffffff;			      \
-     if (__acev_ret == (oldval))				      \
-       *__acev_memp = __acev_newval;				      \
-     else							      \
-       __sparc32_atomic_do_unlock24 (__acev_memp);		      \
-     __asm __volatile ("" ::: "memory");			      \
-     __acev_ret; })
-
-#define __v7_exchange_24_rel(mem, newval) \
-  ({ __typeof (mem) __acev_memp = (mem);			      \
-     __typeof (*mem) __acev_ret;				      \
-     __typeof (*mem) __acev_newval = (newval);			      \
-								      \
-     __sparc32_atomic_do_lock24 (__acev_memp);			      \
-     __acev_ret = *__acev_memp & 0xffffff;			      \
-     *__acev_memp = __acev_newval;				      \
-     __asm __volatile ("" ::: "memory");			      \
-     __acev_ret; })
-
-#ifdef SHARED
-
-/* When dynamically linked, we assume pre-v9 libraries are only ever
-   used on pre-v9 CPU.  */
-# define __atomic_is_v9 0
-
-# define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
-  __v7_compare_and_exchange_val_acq (mem, newval, oldval)
-
-# define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
-  __v7_compare_and_exchange_bool_acq (mem, newval, oldval)
-
-# define atomic_exchange_acq(mem, newval) \
-  __v7_exchange_acq (mem, newval)
-
-# define atomic_exchange_and_add(mem, value) \
-  __v7_exchange_and_add (mem, value)
-
-# define atomic_compare_and_exchange_val_24_acq(mem, newval, oldval) \
-  ({								      \
-     if (sizeof (*mem) != 4)					      \
-       abort ();						      \
-     __v7_compare_and_exchange_val_24_acq (mem, newval, oldval); })
-
-# define atomic_exchange_24_rel(mem, newval) \
-  ({								      \
-     if (sizeof (*mem) != 4)					      \
-       abort ();						      \
-     __v7_exchange_24_rel (mem, newval); })
+#define atomic_exchange_24_rel(mem, newval) \
+  atomic_exchange_rel (mem, newval)
 
 # define atomic_full_barrier() __asm ("" ::: "memory")
 # define atomic_read_barrier() atomic_full_barrier ()
 # define atomic_write_barrier() atomic_full_barrier ()
 
-#else
-
-/* In libc.a/libpthread.a etc. we don't know if we'll be run on
-   pre-v9 or v9 CPU.  To be interoperable with dynamically linked
-   apps on v9 CPUs e.g. with process shared primitives, use cas insn
-   on v9 CPUs and ldstub on pre-v9.  */
-
-extern uint64_t _dl_hwcap __attribute__((weak));
-# define __atomic_is_v9 \
-  (__builtin_expect (&_dl_hwcap != 0, 1) \
-   && __builtin_expect (_dl_hwcap & HWCAP_SPARC_V9, HWCAP_SPARC_V9))
-
-# define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
-  ({								      \
-     __typeof (*mem) __acev_wret;				      \
-     if (sizeof (*mem) != 4)					      \
-       abort ();						      \
-     if (__atomic_is_v9)					      \
-       __acev_wret						      \
-	 = __v9_compare_and_exchange_val_32_acq (mem, newval, oldval);\
-     else							      \
-       __acev_wret						      \
-	 = __v7_compare_and_exchange_val_acq (mem, newval, oldval);   \
-     __acev_wret; })
-
-# define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
-  ({								      \
-     int __acev_wret;						      \
-     if (sizeof (*mem) != 4)					      \
-       abort ();						      \
-     if (__atomic_is_v9)					      \
-       {							      \
-	 __typeof (oldval) __acev_woldval = (oldval);		      \
-	 __acev_wret						      \
-	   = __v9_compare_and_exchange_val_32_acq (mem, newval,	      \
-						   __acev_woldval)    \
-	     != __acev_woldval;					      \
-       }							      \
-     else							      \
-       __acev_wret						      \
-	 = __v7_compare_and_exchange_bool_acq (mem, newval, oldval);  \
-     __acev_wret; })
-
-# define atomic_exchange_rel(mem, newval) \
-  ({								      \
-     __typeof (*mem) __acev_wret;				      \
-     if (sizeof (*mem) != 4)					      \
-       abort ();						      \
-     if (__atomic_is_v9)					      \
-       {							      \
-	 __typeof (mem) __acev_wmemp = (mem);			      \
-	 __typeof (*(mem)) __acev_wval = (newval);		      \
-	 do							      \
-	   __acev_wret = *__acev_wmemp;				      \
-	 while (__builtin_expect				      \
-		  (__v9_compare_and_exchange_val_32_acq (__acev_wmemp,\
-							 __acev_wval, \
-							 __acev_wret) \
-		   != __acev_wret, 0));				      \
-       }							      \
-     else							      \
-       __acev_wret = __v7_exchange_acq (mem, newval);		      \
-     __acev_wret; })
-
-# define atomic_compare_and_exchange_val_24_acq(mem, newval, oldval) \
-  ({								      \
-     __typeof (*mem) __acev_wret;				      \
-     if (sizeof (*mem) != 4)					      \
-       abort ();						      \
-     if (__atomic_is_v9)					      \
-       __acev_wret						      \
-	 = __v9_compare_and_exchange_val_32_acq (mem, newval, oldval);\
-     else							      \
-       __acev_wret						      \
-	 = __v7_compare_and_exchange_val_24_acq (mem, newval, oldval);\
-     __acev_wret; })
-
-# define atomic_exchange_24_rel(mem, newval) \
-  ({								      \
-     __typeof (*mem) __acev_w24ret;				      \
-     if (sizeof (*mem) != 4)					      \
-       abort ();						      \
-     if (__atomic_is_v9)					      \
-       __acev_w24ret = atomic_exchange_rel (mem, newval);	      \
-     else							      \
-       __acev_w24ret = __v7_exchange_24_rel (mem, newval);	      \
-     __acev_w24ret; })
-
-#define atomic_full_barrier()						\
-  do {									\
-     if (__atomic_is_v9)						\
-       /* membar #LoadLoad | #LoadStore | #StoreLoad | #StoreStore */	\
-       __asm __volatile (".word 0x8143e00f" : : : "memory");		\
-     else								\
-       __asm __volatile ("" : : : "memory");				\
-  } while (0)
-
-#define atomic_read_barrier()						\
-  do {									\
-     if (__atomic_is_v9)						\
-       /* membar #LoadLoad | #LoadStore */				\
-       __asm __volatile (".word 0x8143e005" : : : "memory");		\
-     else								\
-       __asm __volatile ("" : : : "memory");				\
-  } while (0)
-
-#define atomic_write_barrier()						\
-  do {									\
-     if (__atomic_is_v9)						\
-       /* membar  #LoadStore | #StoreStore */				\
-       __asm __volatile (".word 0x8143e00c" : : : "memory");		\
-     else								\
-       __asm __volatile ("" : : : "memory");				\
-  } while (0)
+void __sparc_link_error (void);
 
+/* An OS-specific atomic-machine.h file will define this macro if
+   the OS can provide something.  If not, we'll fail to build
+   with a compiler that doesn't supply the operation.  */
+#ifndef __sparc_assisted_compare_and_exchange_val_32_acq
+# define __sparc_assisted_compare_and_exchange_val_32_acq(mem, newval, oldval) \
+  ({ __sparc_link_error (); oldval; })
 #endif
 
-#include <sysdep.h>
-
 #endif	/* atomic-machine.h */
diff --git a/sysdeps/sparc/sparc32/pthread_barrier_wait.c b/sysdeps/sparc/sparc32/pthread_barrier_wait.c
deleted file mode 100644
index e5ef911..0000000
--- a/sysdeps/sparc/sparc32/pthread_barrier_wait.c
+++ /dev/null
@@ -1 +0,0 @@
-#error No support for pthread barriers on pre-v9 sparc.
diff --git a/sysdeps/sparc/sparc32/sem_post.c b/sysdeps/sparc/sparc32/sem_post.c
deleted file mode 100644
index 415a3d5..0000000
--- a/sysdeps/sparc/sparc32/sem_post.c
+++ /dev/null
@@ -1,82 +0,0 @@
-/* sem_post -- post to a POSIX semaphore.  Generic futex-using version.
-   Copyright (C) 2003-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Jakub Jelinek <jakub@redhat.com>, 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <atomic.h>
-#include <errno.h>
-#include <sysdep.h>
-#include <lowlevellock.h>
-#include <internaltypes.h>
-#include <semaphore.h>
-#include <futex-internal.h>
-
-#include <shlib-compat.h>
-
-
-/* See sem_wait for an explanation of the algorithm.  */
-int
-__new_sem_post (sem_t *sem)
-{
-  struct new_sem *isem = (struct new_sem *) sem;
-  int private = isem->private;
-  unsigned int v;
-
-  __sparc32_atomic_do_lock24 (&isem->pad);
-
-  v = isem->value;
-  if ((v >> SEM_VALUE_SHIFT) == SEM_VALUE_MAX)
-    {
-      __sparc32_atomic_do_unlock24 (&isem->pad);
-
-      __set_errno (EOVERFLOW);
-      return -1;
-    }
-  isem->value = v + (1 << SEM_VALUE_SHIFT);
-
-  __sparc32_atomic_do_unlock24 (&isem->pad);
-
-  if ((v & SEM_NWAITERS_MASK) != 0)
-    futex_wake (&isem->value, 1, private);
-
-  return 0;
-}
-versioned_symbol (libpthread, __new_sem_post, sem_post, GLIBC_2_1);
-
-
-#if SHLIB_COMPAT (libpthread, GLIBC_2_0, GLIBC_2_1)
-int
-attribute_compat_text_section
-__old_sem_post (sem_t *sem)
-{
-  int *futex = (int *) sem;
-
-  /* We must need to synchronize with consumers of this token, so the atomic
-     increment must have release MO semantics.  */
-  atomic_write_barrier ();
-  (void) atomic_increment_val (futex);
-  /* We always have to assume it is a shared semaphore.  */
-  int err = lll_futex_wake (futex, 1, LLL_SHARED);
-  if (__builtin_expect (err, 0) < 0)
-    {
-      __set_errno (-err);
-      return -1;
-    }
-  return 0;
-}
-compat_symbol (libpthread, __old_sem_post, sem_post, GLIBC_2_0);
-#endif
diff --git a/sysdeps/sparc/sparc32/sem_waitcommon.c b/sysdeps/sparc/sparc32/sem_waitcommon.c
deleted file mode 100644
index 5340f57..0000000
--- a/sysdeps/sparc/sparc32/sem_waitcommon.c
+++ /dev/null
@@ -1,146 +0,0 @@
-/* sem_waitcommon -- wait on a semaphore, shared code.
-   Copyright (C) 2003-2016 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Paul Mackerras <paulus@au.ibm.com>, 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <errno.h>
-#include <sysdep.h>
-#include <futex-internal.h>
-#include <internaltypes.h>
-#include <semaphore.h>
-#include <sys/time.h>
-
-#include <pthreadP.h>
-#include <shlib-compat.h>
-#include <atomic.h>
-
-
-static void
-__sem_wait_32_finish (struct new_sem *sem);
-
-static void
-__sem_wait_cleanup (void *arg)
-{
-  struct new_sem *sem = (struct new_sem *) arg;
-
-  __sem_wait_32_finish (sem);
-}
-
-/* Wait until at least one token is available, possibly with a timeout.
-   This is in a separate function in order to make sure gcc
-   puts the call site into an exception region, and thus the
-   cleanups get properly run.  TODO still necessary?  Other futex_wait
-   users don't seem to need it.  */
-static int
-__attribute__ ((noinline))
-do_futex_wait (struct new_sem *sem, const struct timespec *abstime)
-{
-  int err;
-
-  err = futex_abstimed_wait_cancelable (&sem->value, SEM_NWAITERS_MASK,
-					abstime, sem->private);
-
-  return err;
-}
-
-/* Fast path: Try to grab a token without blocking.  */
-static int
-__new_sem_wait_fast (struct new_sem *sem, int definitive_result)
-{
-  unsigned int v;
-  int ret = 0;
-
-  __sparc32_atomic_do_lock24(&sem->pad);
-
-  v = sem->value;
-  if ((v >> SEM_VALUE_SHIFT) == 0)
-    ret = -1;
-  else
-    sem->value = v - (1 << SEM_VALUE_SHIFT);
-
-  __sparc32_atomic_do_unlock24(&sem->pad);
-
-  return ret;
-}
-
-/* Slow path that blocks.  */
-static int
-__attribute__ ((noinline))
-__new_sem_wait_slow (struct new_sem *sem, const struct timespec *abstime)
-{
-  unsigned int v;
-  int err = 0;
-
-  __sparc32_atomic_do_lock24(&sem->pad);
-
-  sem->nwaiters++;
-
-  pthread_cleanup_push (__sem_wait_cleanup, sem);
-
-  /* Wait for a token to be available.  Retry until we can grab one.  */
-  v = sem->value;
-  do
-    {
-      if (!(v & SEM_NWAITERS_MASK))
-	sem->value = v | SEM_NWAITERS_MASK;
-
-      /* If there is no token, wait.  */
-      if ((v >> SEM_VALUE_SHIFT) == 0)
-	{
-	  __sparc32_atomic_do_unlock24(&sem->pad);
-
-	  err = do_futex_wait(sem, abstime);
-	  if (err == ETIMEDOUT || err == EINTR)
-	    {
-	      __set_errno (err);
-	      err = -1;
-	      goto error;
-	    }
-	  err = 0;
-
-	  __sparc32_atomic_do_lock24(&sem->pad);
-
-	  /* We blocked, so there might be a token now.  */
-	  v = sem->value;
-	}
-    }
-  /* If there is no token, we must not try to grab one.  */
-  while ((v >> SEM_VALUE_SHIFT) == 0);
-
-  sem->value = v - (1 << SEM_VALUE_SHIFT);
-
-  __sparc32_atomic_do_unlock24(&sem->pad);
-
-error:
-  pthread_cleanup_pop (0);
-
-  __sem_wait_32_finish (sem);
-
-  return err;
-}
-
-/* Stop being a registered waiter (non-64b-atomics code only).  */
-static void
-__sem_wait_32_finish (struct new_sem *sem)
-{
-  __sparc32_atomic_do_lock24(&sem->pad);
-
-  if (--sem->nwaiters == 0)
-    sem->value &= ~SEM_NWAITERS_MASK;
-
-  __sparc32_atomic_do_unlock24(&sem->pad);
-}
diff --git a/sysdeps/sparc/sparc32/sparcv9/pthread_barrier_wait.c b/sysdeps/sparc/sparc32/sparcv9/pthread_barrier_wait.c
deleted file mode 100644
index 246c8d4..0000000
--- a/sysdeps/sparc/sparc32/sparcv9/pthread_barrier_wait.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <nptl/pthread_barrier_wait.c>
diff --git a/sysdeps/sparc/sparc32/sparcv9/sem_post.c b/sysdeps/sparc/sparc32/sparcv9/sem_post.c
deleted file mode 100644
index 6a2813c..0000000
--- a/sysdeps/sparc/sparc32/sparcv9/sem_post.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <nptl/sem_post.c>
diff --git a/sysdeps/sparc/sparc32/sparcv9/sem_waitcommon.c b/sysdeps/sparc/sparc32/sparcv9/sem_waitcommon.c
deleted file mode 100644
index d4a1395..0000000
--- a/sysdeps/sparc/sparc32/sparcv9/sem_waitcommon.c
+++ /dev/null
@@ -1 +0,0 @@
-#include <nptl/sem_waitcommon.c>
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/atomic-machine.h b/sysdeps/unix/sysv/linux/sparc/sparc32/atomic-machine.h
new file mode 100644
index 0000000..4bb8aa4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/atomic-machine.h
@@ -0,0 +1,35 @@
+/* Atomic operations.  SPARC/Linux version.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+
+#define __sparc_assisted_compare_and_exchange_val_32_acq(mem, newval, oldval)\
+({union { __typeof (oldval) a; uint32_t v; } oldval_arg = { .a = (oldval) }; \
+  union { __typeof (newval) a; uint32_t v; } newval_arg = { .a = (newval) }; \
+  register uint32_t __acev_tmp __asm ("%o2");			             \
+  register __typeof (mem) __acev_mem __asm ("%o0") = (mem);		     \
+  register uint32_t __acev_oldval __asm ("%o1");		             \
+  __acev_tmp = newval_arg.v;						     \
+  __acev_oldval = oldval_arg.v;						     \
+  __asm __volatile ("ta 0x23"					             \
+		    : "+r" (__acev_tmp), "=m" (*__acev_mem)		     \
+		    : "r" (__acev_oldval), "m" (*__acev_mem),		     \
+		      "r" (__acev_mem) : "memory");			     \
+  (__typeof (oldval)) __acev_tmp; })
+
+#include <sysdeps/sparc/sparc32/atomic-machine.h>
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/atomic-machine.h b/sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/atomic-machine.h
new file mode 100644
index 0000000..c5cf630
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/atomic-machine.h
@@ -0,0 +1 @@
+#include <sysdeps/sparc/sparc32/sparcv9/atomic-machine.h>
-- 
2.1.2.532.g19b5d50


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-07 16:38                     ` David Miller
@ 2016-11-07 21:21                       ` Sam Ravnborg
  2016-11-08  1:06                         ` David Miller
  2016-11-09 17:08                       ` Torvald Riegel
  1 sibling, 1 reply; 32+ messages in thread
From: Sam Ravnborg @ 2016-11-07 21:21 UTC (permalink / raw)
  To: David Miller
  Cc: carlos, triegel, adhemerval.zanella, andreas, libc-alpha, software

On Mon, Nov 07, 2016 at 11:38:25AM -0500, David Miller wrote:
> 
> So the following attached is what I started playing around with this
> weekend.
> 
> It implements software trap "0x23" to perform a CAS operations, the
> operands are expected in registers %o0, %o1, and %o2.
> 
> Since it was easiest to test I implemented this first on sparc64 which
> just executes the CAS instruction directly.  I'll start working on the
> 32-bit part in the background.
> 
> The capability will be advertised via the mask returned by the "get
> kernel features" system call.  We could check this early in the
> crt'ish code and cache the value in a variable which the atomics can
> check.
> 
> Another kernel side change I have to do is advertise the LEON CAS
> availability in the _dl_hwcaps so that we can use the LEON CAS in
> glibc when available.
> 
> The first patch is the kernel side, and the second is the glibc side.
> The whole NPTL testsuite passes for the plain 32-bit sparc target with
> these changes.

Glad that you found some time to look into this!


> >From fa1cad39df7318cdb46baea5774c340322cd74f2 Mon Sep 17 00:00:00 2001
> From: "David S. Miller" <davem@davemloft.net>
> Date: Mon, 7 Nov 2016 08:27:05 -0800
> Subject: [PATCH] sparc64: Add CAS emulation trap.
> 
> Older 32-bit sparc cpus (other than LEON) lack a CAS instruction, so
> we need to provide some kind of helper infrastructure in the kernel
> to emulate it.
> 
> This is the first part which firstly defines the basic infrastructure
> and the simplest implementation, which is to just directly execute the
> instruction on sparc64.
> 
> We make use of the window fill/spill fault unwind facilities to make
> this as simple as possible.  When we take a full TSB miss, we check if
> the trap level is greater than one, and if so unwind the trap to one
> of the final 3 instructions of the interrupted trap handler's block.
> Which of the three to use is based upon whether this is a real fault,
> an unaligned access, or a data access exception (ie. bus error).
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  arch/sparc/include/uapi/asm/unistd.h | 1 +
>  arch/sparc/kernel/Makefile           | 1 +
>  arch/sparc/kernel/sys_sparc_64.c     | 2 +-
>  arch/sparc/kernel/ttable_64.S        | 3 ++-
>  4 files changed, 5 insertions(+), 2 deletions(-)

casemul.S is missing.
So all the fun kernel stuf was not included in the patch...

	Sam

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-07 21:21                       ` Sam Ravnborg
@ 2016-11-08  1:06                         ` David Miller
  2016-11-09  5:49                           ` Sam Ravnborg
  0 siblings, 1 reply; 32+ messages in thread
From: David Miller @ 2016-11-08  1:06 UTC (permalink / raw)
  To: sam; +Cc: carlos, triegel, adhemerval.zanella, andreas, libc-alpha, software

[-- Attachment #1: Type: Text/Plain, Size: 289 bytes --]

From: Sam Ravnborg <sam@ravnborg.org>
Date: Mon, 7 Nov 2016 22:20:50 +0100

> casemul.S is missing.
> So all the fun kernel stuf was not included in the patch...

Ugh, someone asked me about this but asked me privately so I only
sent the corrected patch to them privately :-/

Here it is:

[-- Attachment #2: casemul_kernel.diff --]
[-- Type: Text/X-Patch, Size: 5115 bytes --]

From acdd40e01a98e2c4bf477d5d66c183716e7562c5 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Mon, 7 Nov 2016 08:27:05 -0800
Subject: [PATCH] sparc64: Add CAS emulation trap.

Older 32-bit sparc cpus (other than LEON) lack a CAS instruction, so
we need to provide some kind of helper infrastructure in the kernel
to emulate it.

This is the first part which firstly defines the basic infrastructure
and the simplest implementation, which is to just directly execute the
instruction on sparc64.

We make use of the window fill/spill fault unwind facilities to make
this as simple as possible.  When we take a full TSB miss, we check if
the trap level is greater than one, and if so unwind the trap to one
of the final 3 instructions of the interrupted trap handler's block.
Which of the three to use is based upon whether this is a real fault,
an unaligned access, or a data access exception (ie. bus error).

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/uapi/asm/unistd.h |  1 +
 arch/sparc/kernel/Makefile           |  1 +
 arch/sparc/kernel/casemul.S          | 66 ++++++++++++++++++++++++++++++++++++
 arch/sparc/kernel/sys_sparc_64.c     |  2 +-
 arch/sparc/kernel/ttable_64.S        |  3 +-
 5 files changed, 71 insertions(+), 2 deletions(-)
 create mode 100644 arch/sparc/kernel/casemul.S

diff --git a/arch/sparc/include/uapi/asm/unistd.h b/arch/sparc/include/uapi/asm/unistd.h
index 36eee81..0725911 100644
--- a/arch/sparc/include/uapi/asm/unistd.h
+++ b/arch/sparc/include/uapi/asm/unistd.h
@@ -430,6 +430,7 @@
 
 /* Bitmask values returned from kern_features system call.  */
 #define KERN_FEATURE_MIXED_MODE_STACK	0x00000001
+#define KERN_FEATURE_CAS_EMUL		0x00000002
 
 #ifdef __32bit_syscall_numbers__
 /* Sparc 32-bit only has the "setresuid32", "getresuid32" variants,
diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile
index fa3c02d..1166638 100644
--- a/arch/sparc/kernel/Makefile
+++ b/arch/sparc/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_perf_event.o := -pg
 CFLAGS_REMOVE_pcr.o := -pg
 endif
 
+obj-$(CONFIG_SPARC64)   += casemul.o
 obj-$(CONFIG_SPARC64)   += urtt_fill.o
 obj-$(CONFIG_SPARC32)   += entry.o wof.o wuf.o
 obj-$(CONFIG_SPARC32)   += etrap_32.o
diff --git a/arch/sparc/kernel/casemul.S b/arch/sparc/kernel/casemul.S
new file mode 100644
index 0000000..237221f
--- /dev/null
+++ b/arch/sparc/kernel/casemul.S
@@ -0,0 +1,66 @@
+#include <asm/asi.h>
+#include <asm/thread_info.h>
+#include <asm/trap_block.h>
+#include <asm/ptrace.h>
+#include <asm/head.h>
+
+	.text
+	.align		128
+	.globl		emulate_cas
+	.type		emulate_cas,#function
+emulate_cas:
+	casa		[%o0] ASI_AIUP, %o1, %o2
+	done
+	nop; nop; nop; nop; nop; nop;
+	nop; nop; nop; nop; nop; nop; nop; nop
+	nop; nop; nop; nop; nop; nop; nop; nop
+	nop; nop; nop; nop; nop;
+	ba,a,pt		%xcc, 3f
+	ba,a,pt		%xcc, 2f
+	ba,a,pt		%xcc, 1f
+	.size		emulate_cas,.-emulate_cas
+
+	/* Fault */
+1:	TRAP_LOAD_THREAD_REG(%g6, %g1)
+	stb		%g4, [%g6 + TI_FAULT_CODE]
+	stx		%g5, [%g6 + TI_FAULT_ADDR]
+	ba,pt		%xcc, etrap
+	 rd		%pc, %g7
+	call		do_sparc64_fault
+	 add		%sp, PTREGS_OFF, %o0
+	ba,a,pt		%xcc, rtrap
+
+	/* Memory address unaligned */
+2:	ba,pt	%xcc, etrap
+	 rd	%pc, %g7
+	sethi	%hi(tlb_type), %g1
+	lduw	[%g1 + %lo(tlb_type)], %g1
+	cmp	%g1, 3
+	bne,pt	%icc, 1f
+	 add	%sp, PTREGS_OFF, %o0
+	mov	%l4, %o2
+	call	sun4v_do_mna
+	 mov	%l5, %o1
+	ba,a,pt	%xcc, rtrap
+1:	mov	%l4, %o1
+	mov	%l5, %o2
+	call	mem_address_unaligned
+	 nop
+	ba,a,pt	%xcc, rtrap
+
+	/* Data access exception */
+3:	ba,pt	%xcc, etrap
+	 rd	%pc, %g7
+	sethi	%hi(tlb_type), %g1
+	mov	%l4, %o1
+	lduw	[%g1 + %lo(tlb_type)], %g1
+	mov	%l5, %o2
+	cmp	%g1, 3
+	bne,pt	%icc, 1f
+	 add	%sp, PTREGS_OFF, %o0
+	call	sun4v_data_access_exception
+	 nop
+	ba,a,pt	%xcc, rtrap
+1:	call	spitfire_data_access_exception
+	 nop
+	ba,a,pt	%xcc, rtrap
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index fe8b8ee..d55e8b8 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -643,5 +643,5 @@ SYSCALL_DEFINE5(rt_sigaction, int, sig, const struct sigaction __user *, act,
 
 asmlinkage long sys_kern_features(void)
 {
-	return KERN_FEATURE_MIXED_MODE_STACK;
+	return KERN_FEATURE_MIXED_MODE_STACK | KERN_FEATURE_CAS_EMUL;
 }
diff --git a/arch/sparc/kernel/ttable_64.S b/arch/sparc/kernel/ttable_64.S
index c6dfdaa..3364019 100644
--- a/arch/sparc/kernel/ttable_64.S
+++ b/arch/sparc/kernel/ttable_64.S
@@ -147,7 +147,8 @@ tl0_resv11e:	TRAP_UTRAP(UT_TRAP_INSTRUCTION_30,0x11e) TRAP_UTRAP(UT_TRAP_INSTRUC
 tl0_getcc:	GETCC_TRAP
 tl0_setcc:	SETCC_TRAP
 tl0_getpsr:	TRAP(do_getpsr)
-tl0_resv123:	BTRAP(0x123) BTRAP(0x124) BTRAP(0x125) BTRAP(0x126) BTRAP(0x127)
+tl0_cas:	TRAP_NOSAVE(emulate_cas)
+tl0_resv124:	BTRAP(0x124) BTRAP(0x125) BTRAP(0x126) BTRAP(0x127)
 tl0_resv128:	BTRAP(0x128) BTRAP(0x129) BTRAP(0x12a) BTRAP(0x12b) BTRAP(0x12c)
 tl0_resv12d:	BTRAP(0x12d) BTRAP(0x12e) BTRAP(0x12f) BTRAP(0x130) BTRAP(0x131)
 tl0_resv132:	BTRAP(0x132) BTRAP(0x133) BTRAP(0x134) BTRAP(0x135) BTRAP(0x136)
-- 
2.1.2.532.g19b5d50


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-08  1:06                         ` David Miller
@ 2016-11-09  5:49                           ` Sam Ravnborg
  2016-11-10 23:33                             ` David Miller
  0 siblings, 1 reply; 32+ messages in thread
From: Sam Ravnborg @ 2016-11-09  5:49 UTC (permalink / raw)
  To: David Miller
  Cc: carlos, triegel, adhemerval.zanella, andreas, libc-alpha, software

Hi Dave.

> diff --git a/arch/sparc/kernel/casemul.S b/arch/sparc/kernel/casemul.S
> new file mode 100644
> index 0000000..237221f
> --- /dev/null
> +++ b/arch/sparc/kernel/casemul.S
> @@ -0,0 +1,66 @@
> +#include <asm/asi.h>
> +#include <asm/thread_info.h>
> +#include <asm/trap_block.h>
> +#include <asm/ptrace.h>
> +#include <asm/head.h>
> +
> +	.text
> +	.align		128
> +	.globl		emulate_cas
> +	.type		emulate_cas,#function
> +emulate_cas:
ENTRY(emulate_cas)

> +	casa		[%o0] ASI_AIUP, %o1, %o2
> +	done
> +	nop; nop; nop; nop; nop; nop;
> +	nop; nop; nop; nop; nop; nop; nop; nop
> +	nop; nop; nop; nop; nop; nop; nop; nop
> +	nop; nop; nop; nop; nop;
> +	ba,a,pt		%xcc, 3f
> +	ba,a,pt		%xcc, 2f
> +	ba,a,pt		%xcc, 1f
> +	.size		emulate_cas,.-emulate_cas
ENDPROC()

Did not (yet) look at the details of emulate_cas.

	Sam

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-07 16:38                     ` David Miller
  2016-11-07 21:21                       ` Sam Ravnborg
@ 2016-11-09 17:08                       ` Torvald Riegel
  2016-11-09 17:16                         ` David Miller
  1 sibling, 1 reply; 32+ messages in thread
From: Torvald Riegel @ 2016-11-09 17:08 UTC (permalink / raw)
  To: David Miller; +Cc: carlos, adhemerval.zanella, andreas, libc-alpha, software

On Mon, 2016-11-07 at 11:38 -0500, David Miller wrote:
> 
> So the following attached is what I started playing around with this
> weekend.
> 
> It implements software trap "0x23" to perform a CAS operations, the
> operands are expected in registers %o0, %o1, and %o2.
> 
> Since it was easiest to test I implemented this first on sparc64 which
> just executes the CAS instruction directly.  I'll start working on the
> 32-bit part in the background.
> 
> The capability will be advertised via the mask returned by the "get
> kernel features" system call.  We could check this early in the
> crt'ish code and cache the value in a variable which the atomics can
> check.
> 
> Another kernel side change I have to do is advertise the LEON CAS
> availability in the _dl_hwcaps so that we can use the LEON CAS in
> glibc when available.
> 
> The first patch is the kernel side, and the second is the glibc side.
> The whole NPTL testsuite passes for the plain 32-bit sparc target with
> these changes.

What approach are you going to use in the kernel to emulate the CAS if
the hardware doesn't offer one?  If you are not stopping all threads,
then there could be concurrent stores to the same memory location
targeted by the CAS; to make such stores atomic wrt. the CAS, you would
need to implement atomic stores in glibc to also use the kernel (eg, to
do a CAS).
I didn't see this in the glibc patch you sent, so I thought I'd ask.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-09 17:08                       ` Torvald Riegel
@ 2016-11-09 17:16                         ` David Miller
  2016-11-10  5:05                           ` Torvald Riegel
  2016-11-10 16:41                           ` Chris Metcalf
  0 siblings, 2 replies; 32+ messages in thread
From: David Miller @ 2016-11-09 17:16 UTC (permalink / raw)
  To: triegel; +Cc: carlos, adhemerval.zanella, andreas, libc-alpha, software

From: Torvald Riegel <triegel@redhat.com>
Date: Wed, 09 Nov 2016 09:08:15 -0800

> What approach are you going to use in the kernel to emulate the CAS if
> the hardware doesn't offer one?  If you are not stopping all threads,
> then there could be concurrent stores to the same memory location
> targeted by the CAS; to make such stores atomic wrt. the CAS, you would
> need to implement atomic stores in glibc to also use the kernel (eg, to
> do a CAS).

I keep hearing about this case, but as long as the CAS is atomic what
is the difference between the store being synchronized in some way
or not?

I think the ordering allowed for gives the same set of legal results.

In any possible case either the CAS "wins" or the async store "wins"
and that determines the final result written.  All combinations are
legal outcomes even with a hardware CAS implementation.

I really don't think such asynchronous stores are legal, nor should
the be explicitly accomodated in the CAS emulation support.  Either
the value is maintained in an atomic manner, or it is not.  And if it
is, updates must use CAS.  Straight stores are only legal on the
initialization of the word before any CAS code paths can get to the
value.

I cannot think of any sane setup that can allow async stores
intermixed with CAS updates.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-09 17:16                         ` David Miller
@ 2016-11-10  5:05                           ` Torvald Riegel
  2016-11-10 16:41                           ` Chris Metcalf
  1 sibling, 0 replies; 32+ messages in thread
From: Torvald Riegel @ 2016-11-10  5:05 UTC (permalink / raw)
  To: David Miller; +Cc: carlos, adhemerval.zanella, andreas, libc-alpha, software

On Wed, 2016-11-09 at 12:15 -0500, David Miller wrote:
> From: Torvald Riegel <triegel@redhat.com>
> Date: Wed, 09 Nov 2016 09:08:15 -0800
> 
> > What approach are you going to use in the kernel to emulate the CAS if
> > the hardware doesn't offer one?  If you are not stopping all threads,
> > then there could be concurrent stores to the same memory location
> > targeted by the CAS; to make such stores atomic wrt. the CAS, you would
> > need to implement atomic stores in glibc to also use the kernel (eg, to
> > do a CAS).
> 
> I keep hearing about this case, but as long as the CAS is atomic what
> is the difference between the store being synchronized in some way
> or not?
> 
> I think the ordering allowed for gives the same set of legal results.
> 
> In any possible case either the CAS "wins" or the async store "wins"
> and that determines the final result written.  All combinations are
> legal outcomes even with a hardware CAS implementation.

See this example, a is initially 0:

Thread 1:
atomic_store_relaxed (&a, 1);
r = atomic_load_relaxed (&a);

Thread 2:
exp = 0;
atomic_compare_exchange_weak_relaxed (&a, &exp, 2); // succeeds

r should never equal 2.  But if the CAS is not atomic wrt. the store by
Thread 1, then the CAS can load 0, then Thread 1's store comes in, and
then Thread 2's CAS stores 2 because it thought the value of a would be
the expected value of 0.

> I really don't think such asynchronous stores are legal, nor should
> the be explicitly accomodated in the CAS emulation support.  Either
> the value is maintained in an atomic manner, or it is not.  And if it
> is, updates must use CAS.

Yes, the implementation of atomic_store_* in glibc must use the CAS
emulation.  We do not care about plain stores because we consider them
data races in the context of the C11 model.  However, we still have
quite a few cases of plain stores that should be atomic stores in glibc;
so we might have a few problems until we've converted all of those.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-09 17:16                         ` David Miller
  2016-11-10  5:05                           ` Torvald Riegel
@ 2016-11-10 16:41                           ` Chris Metcalf
  2016-11-10 17:08                             ` Torvald Riegel
  1 sibling, 1 reply; 32+ messages in thread
From: Chris Metcalf @ 2016-11-10 16:41 UTC (permalink / raw)
  To: David Miller, triegel
  Cc: carlos, adhemerval.zanella, andreas, libc-alpha, software

On 11/9/2016 12:15 PM, David Miller wrote:
> From: Torvald Riegel <triegel@redhat.com>
> Date: Wed, 09 Nov 2016 09:08:15 -0800
>
>> What approach are you going to use in the kernel to emulate the CAS if
>> the hardware doesn't offer one?  If you are not stopping all threads,
>> then there could be concurrent stores to the same memory location
>> targeted by the CAS; to make such stores atomic wrt. the CAS, you would
>> need to implement atomic stores in glibc to also use the kernel (eg, to
>> do a CAS).
> I keep hearing about this case, but as long as the CAS is atomic what
> is the difference between the store being synchronized in some way
> or not?
>
> I think the ordering allowed for gives the same set of legal results.
>
> In any possible case either the CAS "wins" or the async store "wins"
> and that determines the final result written.  All combinations are
> legal outcomes even with a hardware CAS implementation.

That's not actually true.  Suppose you have an initial zero value, and you race
with a store of 2 and a kernel CAS from 0 to 1.  The legal output is only 2:
either the store hit first and the CAS failed, or the CAS hit first and succeeded,
then was overwritten by the 2.  But if the kernel CAS starts first and loads the
zero, then the store hits and sets the value to 2, the CAS will still decide it was
successful and write the 1, thus leaving the value illegally set to 1.

> I really don't think such asynchronous stores are legal, nor should
> the be explicitly accomodated in the CAS emulation support.  Either
> the value is maintained in an atomic manner, or it is not.  And if it
> is, updates must use CAS.  Straight stores are only legal on the
> initialization of the word before any CAS code paths can get to the
> value.
>
> I cannot think of any sane setup that can allow async stores
> intermixed with CAS updates.

So despite arguing above that mixing CAS and asynchronous store is safe,
here you are arguing that you shouldn't do it?  In any case yes, I think you
have come to the right conclusion, and you shouldn't do it.

If you're interested, I have some optimized code for the tilepro architecture to
handle this in arch/tile.  In kernel/intvec_32.S, the intvec_\vecname macro
does a fastpath check for negative syscalls and calls out to sys_cmpxchg, which
does some optimized work to figure out how to provide optimized atomics.
We actually support both 32 and 64-bit cmpxchg, as well as an "atomic_update"
that does (*mem & mask) + added, giving obvious implementations for
atomic_exchange, atomic_exchange_and_add, atomic_and_val, and atomic_or_val
(see glibc's sysdeps/tile/tilepro/atomic-machine.h).  There's some very hairy
stuff designed to handle the case of faulting with a bad user address here, since
we haven't set up the kernel stack yet.  But it works, and it's quite fast
(about 50 cycles to do the fast syscall).

We also hook into the same logic to support a more extended set of in-kernel
atomic operations; see arch/tile/lib/atomic*32* for that stuff.

The underlying locking is done by hashing into a lock table based on the low bits
of the address, which lets us support process-shared as well as process-private,
although it does mean that if multiple processes start up roughly
simultaneously and all try to lock the same process-private futex, they contend
with each other since they're using the same VA.  Oh well; we didn't come up
with a better solution that had good uncontended performance, but perhaps
there are better solutions to the hash function.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-10 16:41                           ` Chris Metcalf
@ 2016-11-10 17:08                             ` Torvald Riegel
  2016-11-10 18:22                               ` Chris Metcalf
  0 siblings, 1 reply; 32+ messages in thread
From: Torvald Riegel @ 2016-11-10 17:08 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: David Miller, carlos, adhemerval.zanella, andreas, libc-alpha, software

On Thu, 2016-11-10 at 11:41 -0500, Chris Metcalf wrote:
> On 11/9/2016 12:15 PM, David Miller wrote:
> > From: Torvald Riegel <triegel@redhat.com>
> > Date: Wed, 09 Nov 2016 09:08:15 -0800
> >
> >> What approach are you going to use in the kernel to emulate the CAS if
> >> the hardware doesn't offer one?  If you are not stopping all threads,
> >> then there could be concurrent stores to the same memory location
> >> targeted by the CAS; to make such stores atomic wrt. the CAS, you would
> >> need to implement atomic stores in glibc to also use the kernel (eg, to
> >> do a CAS).
> > I keep hearing about this case, but as long as the CAS is atomic what
> > is the difference between the store being synchronized in some way
> > or not?
> >
> > I think the ordering allowed for gives the same set of legal results.
> >
> > In any possible case either the CAS "wins" or the async store "wins"
> > and that determines the final result written.  All combinations are
> > legal outcomes even with a hardware CAS implementation.
> 
> That's not actually true.  Suppose you have an initial zero value, and you race
> with a store of 2 and a kernel CAS from 0 to 1.  The legal output is only 2:
> either the store hit first and the CAS failed, or the CAS hit first and succeeded,
> then was overwritten by the 2.  But if the kernel CAS starts first and loads the
> zero, then the store hits and sets the value to 2, the CAS will still decide it was
> successful and write the 1, thus leaving the value illegally set to 1.

Looking at tile's atomic-machine.h files again, it seems we're not
actually enforcing that atomic stores are atomic wrt. the CAS
implementation in the kernel.
The default implementation for atomic_store_relaxed in include/atomic.h
does a plain memory store instead of falling back to exchange.  This is
the right approach by default, I think, because that's what
pre-C11-concurrency code in glibc does (ie, there's no abstraction for
an atomic store at all, and plain memory accesses are used).

However, if we emulate CAS with locks or such in the kernel, atomic
stores need to synchronize with the CAS.  This would mean that all archs
such as tile or sparc that do that have to define atomic_store_relaxed
to fix this (at least for code converted to using C11 atomics, all
nonconverted code might still do the wrong thing).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-10 17:08                             ` Torvald Riegel
@ 2016-11-10 18:22                               ` Chris Metcalf
  2016-11-10 23:38                                 ` Torvald Riegel
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Metcalf @ 2016-11-10 18:22 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: David Miller, carlos, adhemerval.zanella, andreas, libc-alpha, software

On 11/10/2016 12:08 PM, Torvald Riegel wrote:
> Looking at tile's atomic-machine.h files again, it seems we're not
> actually enforcing that atomic stores are atomic wrt. the CAS
> implementation in the kernel.
> The default implementation for atomic_store_relaxed in include/atomic.h
> does a plain memory store instead of falling back to exchange.  This is
> the right approach by default, I think, because that's what
> pre-C11-concurrency code in glibc does (ie, there's no abstraction for
> an atomic store at all, and plain memory accesses are used).
>
> However, if we emulate CAS with locks or such in the kernel, atomic
> stores need to synchronize with the CAS.  This would mean that all archs
> such as tile or sparc that do that have to define atomic_store_relaxed
> to fix this (at least for code converted to using C11 atomics, all
> nonconverted code might still do the wrong thing).

Note that our mainstream tilegx architecture has full atomic support, so
this is only applicable to the older tilepro architecture.

2016-11-10  Chris Metcalf  <cmetcalf@mellanox.com>

     * sysdeps/tile/tilepro/atomic-machine.h (atomic_store_relaxed)
     (atomic_store_acquire): Provide tilepro-specific implementations.

diff --git a/sysdeps/tile/tilepro/atomic-machine.h b/sysdeps/tile/tilepro/atomic-machine.h
index 702e17d77db7..5365929c940a 100644
--- a/sysdeps/tile/tilepro/atomic-machine.h
+++ b/sysdeps/tile/tilepro/atomic-machine.h
@@ -83,6 +83,16 @@ int __atomic_update_32 (volatile int *mem, int mask, int addend)
    ({ __typeof (mask) __att1_v = (mask);                 \
      __atomic_update ((mem), ~__att1_v, __att1_v); })

+/*
+ * We must use the kernel atomics for atomic_store, since otherwise an
+ * unsynchronized store could become visible after another core's
+ * kernel-atomic implementation had read the memory word in question,
+ * but before it had written the updated value to it, which would
+ * cause the unsynchronized store to be lost.
+ */
+#define atomic_store_relaxed(mem, val) atomic_exchange_acq (mem, val)
+#define atomic_store_release(mem, val) atomic_exchange_rel (mem, val)
+
  #include <sysdeps/tile/atomic-machine.h>

  #endif /* atomic-machine.h */

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-09  5:49                           ` Sam Ravnborg
@ 2016-11-10 23:33                             ` David Miller
  0 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2016-11-10 23:33 UTC (permalink / raw)
  To: sam; +Cc: carlos, triegel, adhemerval.zanella, andreas, libc-alpha, software

From: Sam Ravnborg <sam@ravnborg.org>
Date: Wed, 9 Nov 2016 06:49:43 +0100

>> +	.text
>> +	.align		128
>> +	.globl		emulate_cas
>> +	.type		emulate_cas,#function
>> +emulate_cas:
> ENTRY(emulate_cas)
> 
 ...
> ENDPROC()
> 
> Did not (yet) look at the details of emulate_cas.

Thanks Sam, I'll make that change.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Remove sparcv8 support
  2016-11-10 18:22                               ` Chris Metcalf
@ 2016-11-10 23:38                                 ` Torvald Riegel
  0 siblings, 0 replies; 32+ messages in thread
From: Torvald Riegel @ 2016-11-10 23:38 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: David Miller, carlos, adhemerval.zanella, andreas, libc-alpha, software

On Thu, 2016-11-10 at 13:22 -0500, Chris Metcalf wrote:
> On 11/10/2016 12:08 PM, Torvald Riegel wrote:
> > Looking at tile's atomic-machine.h files again, it seems we're not
> > actually enforcing that atomic stores are atomic wrt. the CAS
> > implementation in the kernel.
> > The default implementation for atomic_store_relaxed in include/atomic.h
> > does a plain memory store instead of falling back to exchange.  This is
> > the right approach by default, I think, because that's what
> > pre-C11-concurrency code in glibc does (ie, there's no abstraction for
> > an atomic store at all, and plain memory accesses are used).
> >
> > However, if we emulate CAS with locks or such in the kernel, atomic
> > stores need to synchronize with the CAS.  This would mean that all archs
> > such as tile or sparc that do that have to define atomic_store_relaxed
> > to fix this (at least for code converted to using C11 atomics, all
> > nonconverted code might still do the wrong thing).
> 
> Note that our mainstream tilegx architecture has full atomic support, so
> this is only applicable to the older tilepro architecture.

LGTM, thanks.


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-11-10 23:38 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-20 19:47 Remove sparcv8 support Adhemerval Zanella
2016-10-20 20:56 ` David Miller
2016-10-21  9:02 ` Andreas Larsson
2016-10-21 13:13   ` Adhemerval Zanella
2016-10-21 15:03     ` David Miller
2016-10-24 17:14       ` Torvald Riegel
2016-10-24 17:25   ` Torvald Riegel
2016-10-24 17:43     ` Adhemerval Zanella
2016-10-25 14:34       ` Andreas Larsson
2016-10-25 14:45         ` Adhemerval Zanella
2016-10-26 14:46           ` Andreas Larsson
2016-10-26 18:03             ` Adhemerval Zanella
2016-10-26 18:47               ` David Miller
2016-10-26 19:39                 ` Adhemerval Zanella
2016-10-27 10:54                 ` Torvald Riegel
2016-10-27 14:36                   ` Carlos O'Donell
2016-11-07 16:38                     ` David Miller
2016-11-07 21:21                       ` Sam Ravnborg
2016-11-08  1:06                         ` David Miller
2016-11-09  5:49                           ` Sam Ravnborg
2016-11-10 23:33                             ` David Miller
2016-11-09 17:08                       ` Torvald Riegel
2016-11-09 17:16                         ` David Miller
2016-11-10  5:05                           ` Torvald Riegel
2016-11-10 16:41                           ` Chris Metcalf
2016-11-10 17:08                             ` Torvald Riegel
2016-11-10 18:22                               ` Chris Metcalf
2016-11-10 23:38                                 ` Torvald Riegel
2016-10-27 10:38             ` Torvald Riegel
2016-11-01 15:27               ` Andreas Larsson
2016-10-25 14:34     ` Andreas Larsson
2016-10-25 16:22       ` Torvald Riegel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).