public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] nvptx: Add support for subword compare-and-swap
@ 2020-06-15 20:28 Kwok Cheung Yeung
  2020-06-23 16:44 ` Thomas Schwinge
  2020-07-01 14:28 ` Tom de Vries
  0 siblings, 2 replies; 24+ messages in thread
From: Kwok Cheung Yeung @ 2020-06-15 20:28 UTC (permalink / raw)
  To: GCC Patches, tdevries, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

Hello

This patch adds support on nvptx for __sync_val_compare_and_swap operations on 
1- and 2-byte values. The implementation is a straight copy of the version for 
AMD GCN.

I have added a new libgomp test that exercises the new operation. I have also 
verified that the new code does not cause any regressions on the nvptx 
offloading tests, and that the new test passes with both nvptx and amdgcn as 
offload targets.

Okay for master and OG10?

Kwok


[-- Attachment #2: nvptx_subword.patch --]
[-- Type: text/plain, Size: 4829 bytes --]

commit 7c3a9c23ba9f5b8fe953aa5492ae75617f2444a3
Author: Kwok Cheung Yeung <kcy@codesourcery.com>
Date:   Mon Jun 15 12:34:55 2020 -0700

    nvptx: Add support for subword compare-and-swap
    
    2020-06-15  Kwok Cheung Yeung  <kcy@codesourcery.com>
    
    	libgcc/
    	* config/nvptx/atomic.c: New.
    	* config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.
    
    	libgomp/
    	* testsuite/libgomp.c-c++-common/reduction-16.c: New.

diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
new file mode 100644
index 0000000..4becbd2
--- /dev/null
+++ b/libgcc/config/nvptx/atomic.c
@@ -0,0 +1,59 @@
+/* NVPTX atomic operations
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdbool.h>
+
+#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE)			     \
+									     \
+TYPE									     \
+__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)     \
+{									     \
+  unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL);  \
+  int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8;			     \
+  unsigned int valmask = (1 << (SIZE * 8)) - 1;				     \
+  unsigned int wordmask = ~(valmask << shift);				     \
+  unsigned int oldword = *wordptr;					     \
+  for (;;)								     \
+    {									     \
+      TYPE prevval = (oldword >> shift) & valmask;			     \
+      if (__builtin_expect (prevval != oldval, 0))			     \
+	return prevval;							     \
+      unsigned int newword = oldword & wordmask;			     \
+      newword |= ((unsigned int) newval) << shift;			     \
+      unsigned int prevword						     \
+	  = __sync_val_compare_and_swap_4 (wordptr, oldword, newword);	     \
+      if (__builtin_expect (prevword == oldword, 1))			     \
+	return oldval;							     \
+      oldword = prevword;						     \
+    }									     \
+}									     \
+									     \
+bool									     \
+__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)    \
+{									     \
+  return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \
+}
+
+__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1)
+__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2)
+
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index c4d20c9..ede0bf0 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -1,5 +1,6 @@
 LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
-	$(srcdir)/config/nvptx/mgomp.c
+	$(srcdir)/config/nvptx/mgomp.c \
+	$(srcdir)/config/nvptx/atomic.c
 
 LIB2ADDEH=
 LIB2FUNCS_EXCLUDE=__main
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
new file mode 100644
index 0000000..951e522
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+#define N 512
+
+#define GENERATE_TEST(T)	\
+int test_##T (void)		\
+{				\
+  T a[N], res = 0;		\
+				\
+  for (int i = 0; i < N; ++i)	\
+    a[i] = i & 1;		\
+				\
+_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res || a[i];		\
+				\
+  /* res should be non-zero.  */\
+  if (!res)			\
+    return 1;			\
+				\
+_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res && a[i];		\
+				\
+  /* res should be zero.  */	\
+  return res;			\
+}
+
+GENERATE_TEST(char)
+GENERATE_TEST(short)
+GENERATE_TEST(int)
+GENERATE_TEST(long)
+
+int main(void)
+{
+  if (test_char ())
+    abort ();
+  if (test_short ())
+    abort ();
+  if (test_int ())
+    abort ();
+  if (test_long ())
+    abort ();
+}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-06-15 20:28 [PATCH] nvptx: Add support for subword compare-and-swap Kwok Cheung Yeung
@ 2020-06-23 16:44 ` Thomas Schwinge
  2020-06-23 16:51   ` Jakub Jelinek
                     ` (2 more replies)
  2020-07-01 14:28 ` Tom de Vries
  1 sibling, 3 replies; 24+ messages in thread
From: Thomas Schwinge @ 2020-06-23 16:44 UTC (permalink / raw)
  To: Kwok Cheung Yeung, Tom de Vries; +Cc: gcc-patches, Jakub Jelinek

Hi!

On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
> This patch adds support on nvptx for __sync_val_compare_and_swap operations on
> 1- and 2-byte values.

Is this a thorough review that these are the only functions missing, or
did you just implement what you found missing for some test case you've
been looking into?  Other architectures' similar libgcc files seem to be
defining more of such related functions.

Per the PTX 3.1 manual that I looked into, I see for CAS it supports:
'.b32', '.b64'.  We've got:

    $ build-gcc-offload-nvptx-none/gcc/xgcc -Bbuild-gcc-offload-nvptx-none/gcc -dM -E -x c /dev/null | grep -i compare.and.swap
    #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
    #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1

..., so that would match the PTX 3.1 manual.  GCC also seems to know
about 16-byte ("'.b128'", which doesn't exist in PTX).  A quick run of
your testcase with 'GENERATE_TEST(__int128)' seems to just work, but I
haven't verified why/how.

> The implementation is a straight copy of the version for
> AMD GCN.

(Should thus be generalized?  That can be done later, as far as I'm
concerned -- there already seems to be quite some code duplication in
libgcc.)

I have not verified the algorithm.

It seems a bit unfortunate to have such a thing outlined in a separate
function, given we're talking about performance-critical code here?  Even
more so for GCN, where there's no JIT compiler that can inline it later,
as it's the case for nvptx?

You have verified that GCC itself shouldn't/can't synthesize such
replacement code, inline?

The GCN/nvptx libgcc code actually seems simple enough so that GCC could
synthesize that internally -- or is there a reason not to?  (Just
curious.)

I see 'gcc/doc/generic.texi', "OMP_ATOMIC" state:

    The gimplifier tries
    three alternative code generation strategies.  Whenever possible,
    an atomic update built-in is used.  If that fails, a
    compare-and-swap loop is attempted.  If that also fails, a
    regular critical section around the expression is used.

..., and I see 'gcc/omp-expand.c:expand_omp_atomic_pipeline' synthesize
that "compare-and-swap loop" code, which looks vaguely similar to your
libgcc implementation.

..., or maybe even 'gcc/builtins.c:expand_builtin_compare_and_swap'
etc. could be doing such things?

..., and 'gcc/optabs.c:expand_compare_and_swap_loop' etc. also look
similar/related?

I clearly don't know the history behind all of that.  :-|

> I have added a new libgomp test that exercises the new operation.

Ideally we should also have nvptx target compile-time test, and scan the
PTX assembler code generated -- but we don't have that for a lot of
things, so this is probably OK without, too?

> I have also
> verified that the new code does not cause any regressions on the nvptx
> offloading tests, and that the new test passes with both nvptx and amdgcn as
> offload targets.
>
> Okay for master and OG10?

Given that this solves an actual problem, and we have precedence with the
GCN target doing the same thing, this seems OK (but I can't formally
approve).


Grüße
 Thomas


> commit 7c3a9c23ba9f5b8fe953aa5492ae75617f2444a3
> Author: Kwok Cheung Yeung <kcy@codesourcery.com>
> Date:   Mon Jun 15 12:34:55 2020 -0700
>
>     nvptx: Add support for subword compare-and-swap
>
>     2020-06-15  Kwok Cheung Yeung  <kcy@codesourcery.com>
>
>       libgcc/
>       * config/nvptx/atomic.c: New.
>       * config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.
>
>       libgomp/
>       * testsuite/libgomp.c-c++-common/reduction-16.c: New.
>
> diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
> new file mode 100644
> index 0000000..4becbd2
> --- /dev/null
> +++ b/libgcc/config/nvptx/atomic.c
> @@ -0,0 +1,59 @@
> +/* NVPTX atomic operations
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   Contributed by Mentor Graphics.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <stdbool.h>
> +
> +#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE)                       \
> +                                                                          \
> +TYPE                                                                      \
> +__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)     \
> +{                                                                         \
> +  unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL);  \
> +  int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8;                        \
> +  unsigned int valmask = (1 << (SIZE * 8)) - 1;                                   \
> +  unsigned int wordmask = ~(valmask << shift);                                    \
> +  unsigned int oldword = *wordptr;                                        \
> +  for (;;)                                                                \
> +    {                                                                             \
> +      TYPE prevval = (oldword >> shift) & valmask;                        \
> +      if (__builtin_expect (prevval != oldval, 0))                        \
> +     return prevval;                                                      \
> +      unsigned int newword = oldword & wordmask;                          \
> +      newword |= ((unsigned int) newval) << shift;                        \
> +      unsigned int prevword                                               \
> +       = __sync_val_compare_and_swap_4 (wordptr, oldword, newword);       \
> +      if (__builtin_expect (prevword == oldword, 1))                      \
> +     return oldval;                                                       \
> +      oldword = prevword;                                                 \
> +    }                                                                             \
> +}                                                                         \
> +                                                                          \
> +bool                                                                      \
> +__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)    \
> +{                                                                         \
> +  return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \
> +}
> +
> +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1)
> +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2)
> +
> diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
> index c4d20c9..ede0bf0 100644
> --- a/libgcc/config/nvptx/t-nvptx
> +++ b/libgcc/config/nvptx/t-nvptx
> @@ -1,5 +1,6 @@
>  LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
> -     $(srcdir)/config/nvptx/mgomp.c
> +     $(srcdir)/config/nvptx/mgomp.c \
> +     $(srcdir)/config/nvptx/atomic.c
>
>  LIB2ADDEH=
>  LIB2FUNCS_EXCLUDE=__main
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> new file mode 100644
> index 0000000..951e522
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +
> +#include <stdlib.h>
> +
> +#define N 512
> +
> +#define GENERATE_TEST(T)     \
> +int test_##T (void)          \
> +{                            \
> +  T a[N], res = 0;           \
> +                             \
> +  for (int i = 0; i < N; ++i)        \
> +    a[i] = i & 1;            \
> +                             \
> +_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \
> +  for (int i = 0; i < N; ++i)        \
> +    res = res || a[i];               \
> +                             \
> +  /* res should be non-zero.  */\
> +  if (!res)                  \
> +    return 1;                        \
> +                             \
> +_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \
> +  for (int i = 0; i < N; ++i)        \
> +    res = res && a[i];               \
> +                             \
> +  /* res should be zero.  */ \
> +  return res;                        \
> +}
> +
> +GENERATE_TEST(char)
> +GENERATE_TEST(short)
> +GENERATE_TEST(int)
> +GENERATE_TEST(long)
> +
> +int main(void)
> +{
> +  if (test_char ())
> +    abort ();
> +  if (test_short ())
> +    abort ();
> +  if (test_int ())
> +    abort ();
> +  if (test_long ())
> +    abort ();
> +}
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-06-23 16:44 ` Thomas Schwinge
@ 2020-06-23 16:51   ` Jakub Jelinek
  2020-06-30 16:35     ` Kwok Cheung Yeung
  2020-06-24 11:13   ` Kwok Cheung Yeung
  2020-06-30 14:37   ` Tom de Vries
  2 siblings, 1 reply; 24+ messages in thread
From: Jakub Jelinek @ 2020-06-23 16:51 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Kwok Cheung Yeung, Tom de Vries, gcc-patches

On Tue, Jun 23, 2020 at 06:44:26PM +0200, Thomas Schwinge wrote:
> On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
> > This patch adds support on nvptx for __sync_val_compare_and_swap operations on
> > 1- and 2-byte values.
> 
> Is this a thorough review that these are the only functions missing, or
> did you just implement what you found missing for some test case you've
> been looking into?  Other architectures' similar libgcc files seem to be
> defining more of such related functions.
> It seems a bit unfortunate to have such a thing outlined in a separate
> function, given we're talking about performance-critical code here?  Even
> more so for GCN, where there's no JIT compiler that can inline it later,
> as it's the case for nvptx?

I think this should really be handled by the backend inline, like many other
targets do it when they only support 32-bit+ and not 8/16-bit atomics.
See e.g. sparc backend.

	Jakub


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-06-23 16:44 ` Thomas Schwinge
  2020-06-23 16:51   ` Jakub Jelinek
@ 2020-06-24 11:13   ` Kwok Cheung Yeung
  2020-06-30 14:37   ` Tom de Vries
  2 siblings, 0 replies; 24+ messages in thread
From: Kwok Cheung Yeung @ 2020-06-24 11:13 UTC (permalink / raw)
  To: Thomas Schwinge, Tom de Vries; +Cc: gcc-patches, Jakub Jelinek

On 23/06/2020 5:44 pm, Thomas Schwinge wrote:
> Hi!
> 
> On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
>> This patch adds support on nvptx for __sync_val_compare_and_swap operations on
>> 1- and 2-byte values.
> 
> Is this a thorough review that these are the only functions missing, or
> did you just implement what you found missing for some test case you've
> been looking into?  Other architectures' similar libgcc files seem to be
> defining more of such related functions.
> 

This was to fix a particular test case that used logical reduction operators on 
subword values. Support for other operators (e.g. arithmetic add) would need 
additional support.

Other architectures also define atomic operations for subwords in libgcc (e.g. 
in config/arm/linux-atomic.c for Arm, config/riscv/atomic.c for RISC-V, in 
config/m68k/linux-atomic.c for Motorola 68000s etc.).

>> The implementation is a straight copy of the version for
>> AMD GCN.
> 
> (Should thus be generalized?  That can be done later, as far as I'm
> concerned -- there already seems to be quite some code duplication in
> libgcc.)
> 
> I have not verified the algorithm.
> 
> It seems a bit unfortunate to have such a thing outlined in a separate
> function, given we're talking about performance-critical code here?  Even
> more so for GCN, where there's no JIT compiler that can inline it later,
> as it's the case for nvptx?
> 
> You have verified that GCC itself shouldn't/can't synthesize such
> replacement code, inline?
> 
> The GCN/nvptx libgcc code actually seems simple enough so that GCC could
> synthesize that internally -- or is there a reason not to?  (Just
> curious.)
>

Without the additional __sync_val_compare_and_swap_* functions, the testcase 
fails with:

unresolved symbol __sync_val_compare_and_swap_2

when the accelerator compiler is run.

The algorithm is generic and fairly simple, so it is more that GCC doesn't, 
rather than shouldn't or can't. i.e. No one has implemented it as a fallback in 
lieu of native support so far.

> I see 'gcc/doc/generic.texi', "OMP_ATOMIC" state:
> 
>      The gimplifier tries
>      three alternative code generation strategies.  Whenever possible,
>      an atomic update built-in is used.  If that fails, a
>      compare-and-swap loop is attempted.  If that also fails, a
>      regular critical section around the expression is used.
> 
> ..., and I see 'gcc/omp-expand.c:expand_omp_atomic_pipeline' synthesize
> that "compare-and-swap loop" code, which looks vaguely similar to your
> libgcc implementation.
> 
 > ..., and 'gcc/optabs.c:expand_compare_and_swap_loop' etc. also look
 > similar/related?

These create loops that repeatedly do the compare-and-swap operation until it 
succeeds. They cannot work to express a smaller compare-and-swap in terms of a 
larger one since they lack the necessary shifts and masking.

The exit condition is also different - since we are implementing 
compare-and-swap, the loop can exit even if the swap fails. The loop exits if 
the subword read from memory is different from the expected value (i.e. the 
compare-and-swap fails) or if the word containing the subword is equal to the 
expected value (i.e. it succeeds). The loop is there to cope with the case where 
the compare-and-swap fails due to part of the larger word changing that is not 
part of the subword that we are interested in.

> ..., or maybe even 'gcc/builtins.c:expand_builtin_compare_and_swap'
> etc. could be doing such things?
> 

This delegates to expand_atomic_compare_and_swap (in optabs.c), which would be 
the logical place to synthesize the sub-word loop. All it does ATM is look for 
atomic_compare_and_swap op handlers, then for sync_compare_and_swap handlers, 
then library functions for __sync_val_compare_and_swap.

Kwok

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-06-23 16:44 ` Thomas Schwinge
  2020-06-23 16:51   ` Jakub Jelinek
  2020-06-24 11:13   ` Kwok Cheung Yeung
@ 2020-06-30 14:37   ` Tom de Vries
  2 siblings, 0 replies; 24+ messages in thread
From: Tom de Vries @ 2020-06-30 14:37 UTC (permalink / raw)
  To: Thomas Schwinge, Kwok Cheung Yeung; +Cc: gcc-patches, Jakub Jelinek

On 6/23/20 6:44 PM, Thomas Schwinge wrote:
> Per the PTX 3.1 manual that I looked into, I see for CAS it supports:
> '.b32', '.b64'.  We've got:
> 
>     $ build-gcc-offload-nvptx-none/gcc/xgcc -Bbuild-gcc-offload-nvptx-none/gcc -dM -E -x c /dev/null | grep -i compare.and.swap
>     #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
>     #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
> 
> ..., so that would match the PTX 3.1 manual.  GCC also seems to know
> about 16-byte ("'.b128'", which doesn't exist in PTX).  A quick run of
> your testcase with 'GENERATE_TEST(__int128)' seems to just work, but I
> haven't verified why/how.

In this case, we hit the critical section strategy and use
expand_omp_atomic_mutex.

Thanks,
- Tom



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-06-23 16:51   ` Jakub Jelinek
@ 2020-06-30 16:35     ` Kwok Cheung Yeung
  0 siblings, 0 replies; 24+ messages in thread
From: Kwok Cheung Yeung @ 2020-06-30 16:35 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge; +Cc: Tom de Vries, gcc-patches

On 23/06/2020 5:51 pm, Jakub Jelinek wrote:
> On Tue, Jun 23, 2020 at 06:44:26PM +0200, Thomas Schwinge wrote:
>> On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
>>> This patch adds support on nvptx for __sync_val_compare_and_swap operations on
>>> 1- and 2-byte values.
>>
>> Is this a thorough review that these are the only functions missing, or
>> did you just implement what you found missing for some test case you've
>> been looking into?  Other architectures' similar libgcc files seem to be
>> defining more of such related functions.
>> It seems a bit unfortunate to have such a thing outlined in a separate
>> function, given we're talking about performance-critical code here?  Even
>> more so for GCN, where there's no JIT compiler that can inline it later,
>> as it's the case for nvptx?
> 
> I think this should really be handled by the backend inline, like many other
> targets do it when they only support 32-bit+ and not 8/16-bit atomics.
> See e.g. sparc backend.

I can see both approaches being used - e.g. Sparc, Alpha, and RS6000 expand the 
subword atomics into loops in the backend. However, there are various 
linux-atomic.c files in the subdirectories of libgcc/config/ (e.g. for Arm, 
PA-RISC, m68k etc.), which use a system-call to do the 32-bit compare-and-swap, 
then express smaller operations in terms of that. TilePro, AMDGCN and RISC-V 
have atomic.c files that do the subword compare-and-swap in terms of a larger 
native compare-and-swap.

I don't know - is it worth doing this in the backend on this architecture? One 
thing to keep in mind is that nvptx is a virtual instruction set, so the JIT 
compiler would likely inline the libgcc library calls anyway.

Thanks

Kwok

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-06-15 20:28 [PATCH] nvptx: Add support for subword compare-and-swap Kwok Cheung Yeung
  2020-06-23 16:44 ` Thomas Schwinge
@ 2020-07-01 14:28 ` Tom de Vries
  2020-07-15 19:08   ` Kwok Cheung Yeung
  2020-07-20 13:19   ` Kwok Cheung Yeung
  1 sibling, 2 replies; 24+ messages in thread
From: Tom de Vries @ 2020-07-01 14:28 UTC (permalink / raw)
  To: Kwok Cheung Yeung, GCC Patches; +Cc: Thomas Schwinge, Jakub Jelinek

On 6/15/20 10:28 PM, Kwok Cheung Yeung wrote:
> Hello
> 
> This patch adds support on nvptx for __sync_val_compare_and_swap
> operations on 1- and 2-byte values. The implementation is a straight
> copy of the version for AMD GCN.

Actually it also adds support for __sync_bool_compare_and_swap, be sure
to mention this in the commit log.

> I have added a new libgomp test that exercises the new operation. I have
> also verified that the new code does not cause any regressions on the
> nvptx offloading tests, and that the new test passes with both nvptx and
> amdgcn as offload targets.
> 

AFAICT, that excercises __sync_val_compare_and_swap, but not
__sync_bool_compare_and_swap.

You've covered offloading, but now consider the nvptx standalone target.

I was surprised to find only one generic test-case exercising char/short
__sync_val_compare_and_swap:
...
$ find gcc/testsuite/ -type f | xargs grep -l sync_char_short | xargs
grep sync_val
gcc/testsuite/gcc.dg/pr49696.c:  __sync_val_compare_and_swap (x, 1, 0);
...
and that's not a run test.

So, I think gcc needs a copy of (some of) the
gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target
sync_char_short.

However, since this patch only adds partial support, we cannot enable
sync_char_short for nvptx yet.  So, if you stick to partial support, you
should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which
ideally could be an include of a generic test-case that is active for
sync_char_short only, with mention that it can be removed once
sync_char_short is enabled for nvptx).

> Okay for master and OG10?
> 

I looked at the implementation, and it looks ok to me, though I think we
need to make explicit in a comment what the assumptions are:
- that we have read and write access to the entire word, and
- that the word is not volatile.

As for the discussion about libgcc vs backend implementation, I think
it's good to have both.  In particular, a backend implementation might
be able to take advantage of the address space and issue an atom.shared.cas.

But since the current patch fixes something, I don't want to hold it
until a backend implementation is done.

So, config/nvptx part okay for master, provided:
- nvptx test-case added
- nvptx standalone target testing done
- assumption comment added to implementation.
This bit can be committed as a separate patch, without the oacc test-case.

As for the oacc test-case, you could add the __int128 bit, perhaps along
the lines of how things are done in
libgomp/testsuite/libgomp.c++/target-8.C ?

[ FWIW, it would be nice if the choice between critical section and
atomic builtin was postponed till after the host/offload compiler split.
 Then the nvptx compiler would just fall back on the critical section,
and the test-case would have worked out of the box, and there would be
no need for the target to pretend to support unsupported atomic operators.
Conversely, we could upgrade the omp variables from char/short to int
somehow to make sure the assumptions for using non-native subword
__sync_val_compare_and_swap are met. ]

Thanks,
- Tom

> Kwok
> 
> 
> nvptx_subword.patch
> 
> commit 7c3a9c23ba9f5b8fe953aa5492ae75617f2444a3
> Author: Kwok Cheung Yeung <kcy@codesourcery.com>
> Date:   Mon Jun 15 12:34:55 2020 -0700
> 
>     nvptx: Add support for subword compare-and-swap
>     
>     2020-06-15  Kwok Cheung Yeung  <kcy@codesourcery.com>
>     
>     	libgcc/
>     	* config/nvptx/atomic.c: New.
>     	* config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.
>     
>     	libgomp/
>     	* testsuite/libgomp.c-c++-common/reduction-16.c: New.
> 
> diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
> new file mode 100644
> index 0000000..4becbd2
> --- /dev/null
> +++ b/libgcc/config/nvptx/atomic.c
> @@ -0,0 +1,59 @@
> +/* NVPTX atomic operations
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   Contributed by Mentor Graphics.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <stdbool.h>
> +
> +#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE)			     \
> +									     \
> +TYPE									     \
> +__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)     \
> +{									     \
> +  unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL);  \
> +  int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8;			     \
> +  unsigned int valmask = (1 << (SIZE * 8)) - 1;				     \
> +  unsigned int wordmask = ~(valmask << shift);				     \
> +  unsigned int oldword = *wordptr;					     \
> +  for (;;)								     \
> +    {									     \
> +      TYPE prevval = (oldword >> shift) & valmask;			     \
> +      if (__builtin_expect (prevval != oldval, 0))			     \
> +	return prevval;							     \
> +      unsigned int newword = oldword & wordmask;			     \
> +      newword |= ((unsigned int) newval) << shift;			     \
> +      unsigned int prevword						     \
> +	  = __sync_val_compare_and_swap_4 (wordptr, oldword, newword);	     \
> +      if (__builtin_expect (prevword == oldword, 1))			     \
> +	return oldval;							     \
> +      oldword = prevword;						     \
> +    }									     \
> +}									     \
> +									     \
> +bool									     \
> +__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)    \
> +{									     \
> +  return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \
> +}
> +
> +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1)
> +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2)
> +
> diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
> index c4d20c9..ede0bf0 100644
> --- a/libgcc/config/nvptx/t-nvptx
> +++ b/libgcc/config/nvptx/t-nvptx
> @@ -1,5 +1,6 @@
>  LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
> -	$(srcdir)/config/nvptx/mgomp.c
> +	$(srcdir)/config/nvptx/mgomp.c \
> +	$(srcdir)/config/nvptx/atomic.c
>  
>  LIB2ADDEH=
>  LIB2FUNCS_EXCLUDE=__main
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> new file mode 100644
> index 0000000..951e522
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +
> +#include <stdlib.h>
> +
> +#define N 512
> +
> +#define GENERATE_TEST(T)	\
> +int test_##T (void)		\
> +{				\
> +  T a[N], res = 0;		\
> +				\
> +  for (int i = 0; i < N; ++i)	\
> +    a[i] = i & 1;		\
> +				\
> +_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \
> +  for (int i = 0; i < N; ++i)	\
> +    res = res || a[i];		\
> +				\
> +  /* res should be non-zero.  */\
> +  if (!res)			\
> +    return 1;			\
> +				\
> +_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \
> +  for (int i = 0; i < N; ++i)	\
> +    res = res && a[i];		\
> +				\
> +  /* res should be zero.  */	\
> +  return res;			\
> +}
> +
> +GENERATE_TEST(char)
> +GENERATE_TEST(short)
> +GENERATE_TEST(int)
> +GENERATE_TEST(long)
> +
> +int main(void)
> +{
> +  if (test_char ())
> +    abort ();
> +  if (test_short ())
> +    abort ();
> +  if (test_int ())
> +    abort ();
> +  if (test_long ())
> +    abort ();
> +}
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-07-01 14:28 ` Tom de Vries
@ 2020-07-15 19:08   ` Kwok Cheung Yeung
  2020-07-20 13:19   ` Kwok Cheung Yeung
  1 sibling, 0 replies; 24+ messages in thread
From: Kwok Cheung Yeung @ 2020-07-15 19:08 UTC (permalink / raw)
  To: Tom de Vries, GCC Patches; +Cc: Thomas Schwinge, Jakub Jelinek

On 01/07/2020 3:28 pm, Tom de Vries wrote:
> I looked at the implementation, and it looks ok to me, though I think we
> need to make explicit in a comment what the assumptions are:
> - that we have read and write access to the entire word, and

Is there a situation where an 8/16-bit portion of memory is R/W but the 32-bit 
word containing it is not on this architecture? Something like memory-mapped I/O 
perhaps?

> - that the word is not volatile.

I don't think that non-volatility matters in this case - indeed, the whole point 
of using an atomic primitive is that the memory being accessed can change at any 
time :-). There is an initial read of the word before the loop, then one read 
per iteration of the loop using __sync_val_compare_and_swap_4 (which by 
definition will always access the memory). If any part of the initially read 
value changes by the time it gets to __sync_val_compare_and_swap_4, then the 
compare will fail and the loop will continue onto the next iteration.

Thanks

Kwok

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-07-01 14:28 ` Tom de Vries
  2020-07-15 19:08   ` Kwok Cheung Yeung
@ 2020-07-20 13:19   ` Kwok Cheung Yeung
  2020-08-04 14:56     ` [PING] " Kwok Cheung Yeung
  2020-08-13  9:27     ` Tom de Vries
  1 sibling, 2 replies; 24+ messages in thread
From: Kwok Cheung Yeung @ 2020-07-20 13:19 UTC (permalink / raw)
  To: Tom de Vries, GCC Patches; +Cc: Thomas Schwinge, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1681 bytes --]

On 01/07/2020 3:28 pm, Tom de Vries wrote:
> So, I think gcc needs a copy of (some of) the
> gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target
> sync_char_short.
> 
> However, since this patch only adds partial support, we cannot enable
> sync_char_short for nvptx yet.  So, if you stick to partial support, you
> should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which
> ideally could be an include of a generic test-case that is active for
> sync_char_short only, with mention that it can be removed once
> sync_char_short is enabled for nvptx).
> 

I have added gcc.target/nvptx/sync.c, which is a version of ia64-sync-3.c 
extended to test chars and shorts too. I kept the original int and long tests 
because sync_int_long isn't indicated as being supported on nvptx either.

> I looked at the implementation, and it looks ok to me, though I think we
> need to make explicit in a comment what the assumptions are:
> - that we have read and write access to the entire word, and
> - that the word is not volatile.
> 

I've added some extra comments in the implementation. Like I said previously, 
the loop accounts for the larger word being volatile.

> As for the oacc test-case, you could add the __int128 bit, perhaps along
> the lines of how things are done in
> libgomp/testsuite/libgomp.c++/target-8.C ?
> 

I've added a extra test for __int128 types in my libgomp testcase that runs if 
128-bit types are supported.

I've tested that there are no regressions with the patch on standalone nvptx, 
and that the new reduction-16.c testcase passes with both nvptx and AMD GCN 
offloading.

Is this version okay for master and og10?

Thanks

Kwok

[-- Attachment #2: nvptx_subword_sync.patch --]
[-- Type: text/plain, Size: 10309 bytes --]

commit 4661232905d55a4bc1354cb717b2e5d950d215af
Author: Kwok Cheung Yeung <kcy@codesourcery.com>
Date:   Thu Jul 16 12:00:24 2020 -0700

    nvptx: Add support for subword compare-and-swap
    
    This adds support for __sync_val_compare_and_swap and
    __sync_bool_compare_and_swap for 1-byte and 2-byte long
    values, which are not natively supported on nvptx.
    
    2020-07-16  Kwok Cheung Yeung  <kcy@codesourcery.com>
    
    	libgcc/
    	* config/nvptx/atomic.c: New.
    	* config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.
    
    	gcc/testsuite/
    	* gcc.target/nvptx/sync.c: New.
    
    	libgomp/
    	* testsuite/libgomp.c-c++-common/reduction-16.c: New.

diff --git a/gcc/testsuite/gcc.target/nvptx/sync.c b/gcc/testsuite/gcc.target/nvptx/sync.c
new file mode 100644
index 0000000..a573824
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sync.c
@@ -0,0 +1,143 @@
+/* { dg-do run } */
+
+/* Test basic functionality of the intrinsics.  */
+
+/* This is a copy of gcc.dg/ia64-sync-2.c, extended to test 8-bit and 16-bit
+   values as well.  */
+
+/* Ideally this test should require sync_char_short and sync_int_long, but we
+   only support a subset at the moment.  */
+
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+static char AC[4];
+static char init_qi[4] = { -30,-30,-50,-50 };
+static char test_qi[4] = { -115,-115,25,25 };
+
+static void
+do_qi (void)
+{
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AC+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AC+2, AC[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AC+3, AC[3], 25) != 1)
+    abort ();
+}
+
+static short AS[4];
+static short init_hi[4] = { -30,-30,-50,-50 };
+static short test_hi[4] = { -115,-115,25,25 };
+
+static void
+do_hi (void)
+{
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AS+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AS+2, AS[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AS+3, AS[3], 25) != 1)
+    abort ();
+}
+
+static int AI[4];
+static int init_si[4] = { -30,-30,-50,-50 };
+static int test_si[4] = { -115,-115,25,25 };
+
+static void
+do_si (void)
+{
+  if (__sync_val_compare_and_swap(AI+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AI+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AI+2, AI[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AI+2, AI[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+3, AI[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AI+3, AI[3], 25) != 1)
+    abort ();
+}
+
+static long AL[4];
+static long init_di[4] = { -30,-30,-50,-50 };
+static long test_di[4] = { -115,-115,25,25 };
+
+static void
+do_di (void)
+{
+  if (__sync_val_compare_and_swap(AL+0, -30, -115) != -30)
+    abort ();
+  if (__sync_val_compare_and_swap(AL+0, -30, -115) != -115)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+1, -30, -115) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+1, -30, -115) != 0)
+    abort ();
+
+  if (__sync_val_compare_and_swap(AL+2, AL[2], 25) != -50)
+    abort ();
+  if (__sync_val_compare_and_swap(AL+2, AL[2], 25) != 25)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+3, AL[3], 25) != 1)
+    abort ();
+  if (__sync_bool_compare_and_swap(AL+3, AL[3], 25) != 1)
+    abort ();
+}
+
+int main()
+{
+  memcpy(AC, init_qi, sizeof(init_qi));
+  memcpy(AS, init_hi, sizeof(init_hi));
+  memcpy(AI, init_si, sizeof(init_si));
+  memcpy(AL, init_di, sizeof(init_di));
+
+  do_qi ();
+  do_hi ();
+  do_si ();
+  do_di ();
+
+  if (memcmp (AC, test_qi, sizeof(test_qi)))
+    abort ();
+  if (memcmp (AS, test_hi, sizeof(test_hi)))
+    abort ();
+  if (memcmp (AI, test_si, sizeof(test_si)))
+    abort ();
+  if (memcmp (AL, test_di, sizeof(test_di)))
+    abort ();
+
+  return 0;
+}
diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
new file mode 100644
index 0000000..25a34fb
--- /dev/null
+++ b/libgcc/config/nvptx/atomic.c
@@ -0,0 +1,70 @@
+/* NVPTX atomic operations
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdbool.h>
+
+/* Implement __sync_val_compare_and_swap and __sync_bool_compare_and_swap
+   for 1 and 2-byte values (which are not natively supported) in terms of
+   __sync_val_compare_and_swap for 4-byte values (which is supported).
+   This assumes that the contents of the word surrounding the subword
+   value that we are interested in are accessible as well (which should
+   normally be the case).  */
+
+#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE)			     \
+									     \
+TYPE									     \
+__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)     \
+{									     \
+  unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL);  \
+  int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8;			     \
+  unsigned int valmask = (1 << (SIZE * 8)) - 1;				     \
+  unsigned int wordmask = ~(valmask << shift);				     \
+  unsigned int oldword = *wordptr;					     \
+  for (;;)								     \
+    {									     \
+      TYPE prevval = (oldword >> shift) & valmask;			     \
+      /* Exit if the subword value previously read from memory is not */     \
+      /* equal to the expected value OLDVAL.  */			     \
+      if (__builtin_expect (prevval != oldval, 0))			     \
+	return prevval;							     \
+      unsigned int newword = oldword & wordmask;			     \
+      newword |= ((unsigned int) newval) << shift;			     \
+      unsigned int prevword						     \
+	  = __sync_val_compare_and_swap_4 (wordptr, oldword, newword);	     \
+      /* Exit only if the compare-and-swap succeeds on the whole word */     \
+      /* (i.e. the contents of *WORDPTR have not changed since the last */   \
+      /* memory read).  */						     \
+      if (__builtin_expect (prevword == oldword, 1))			     \
+	return oldval;							     \
+      oldword = prevword;						     \
+    }									     \
+}									     \
+									     \
+bool									     \
+__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)    \
+{									     \
+  return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \
+}
+
+__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1)
+__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2)
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index c4d20c9..ede0bf0 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -1,5 +1,6 @@
 LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
-	$(srcdir)/config/nvptx/mgomp.c
+	$(srcdir)/config/nvptx/mgomp.c \
+	$(srcdir)/config/nvptx/atomic.c
 
 LIB2ADDEH=
 LIB2FUNCS_EXCLUDE=__main
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
new file mode 100644
index 0000000..d0e82b0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+#define N 512
+
+#define GENERATE_TEST(T)	\
+int test_##T (void)		\
+{				\
+  T a[N], res = 0;		\
+				\
+  for (int i = 0; i < N; ++i)	\
+    a[i] = i & 1;		\
+				\
+_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res || a[i];		\
+				\
+  /* res should be non-zero.  */\
+  if (!res)			\
+    return 1;			\
+				\
+_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res && a[i];		\
+				\
+  /* res should be zero.  */	\
+  return res;			\
+}
+
+GENERATE_TEST(char)
+GENERATE_TEST(short)
+GENERATE_TEST(int)
+GENERATE_TEST(long)
+#ifdef __SIZEOF_INT128__
+GENERATE_TEST(__int128)
+#endif
+
+int main(void)
+{
+  if (test_char ())
+    abort ();
+  if (test_short ())
+    abort ();
+  if (test_int ())
+    abort ();
+  if (test_long ())
+    abort ();
+#ifdef __SIZEOF_INT128__
+  if (test___int128 ())
+    abort ();
+#endif
+}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PING] Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-07-20 13:19   ` Kwok Cheung Yeung
@ 2020-08-04 14:56     ` Kwok Cheung Yeung
  2020-08-13  9:27     ` Tom de Vries
  1 sibling, 0 replies; 24+ messages in thread
From: Kwok Cheung Yeung @ 2020-08-04 14:56 UTC (permalink / raw)
  To: Tom de Vries, GCC Patches; +Cc: Thomas Schwinge, Jakub Jelinek, Catherine Moore

Hello

I posted a revised patchset about two weeks ago at:

https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550291.html

Are you able to take a look at it?

Thanks

Kwok

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] nvptx: Add support for subword compare-and-swap
  2020-07-20 13:19   ` Kwok Cheung Yeung
  2020-08-04 14:56     ` [PING] " Kwok Cheung Yeung
@ 2020-08-13  9:27     ` Tom de Vries
  2020-09-01 11:41       ` [patch][nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap) Tobias Burnus
  1 sibling, 1 reply; 24+ messages in thread
From: Tom de Vries @ 2020-08-13  9:27 UTC (permalink / raw)
  To: Kwok Cheung Yeung, GCC Patches; +Cc: Thomas Schwinge, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2278 bytes --]

On 7/20/20 3:19 PM, Kwok Cheung Yeung wrote:
> On 01/07/2020 3:28 pm, Tom de Vries wrote:
>> So, I think gcc needs a copy of (some of) the
>> gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target
>> sync_char_short.
>>
>> However, since this patch only adds partial support, we cannot enable
>> sync_char_short for nvptx yet.  So, if you stick to partial support, you
>> should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which
>> ideally could be an include of a generic test-case that is active for
>> sync_char_short only, with mention that it can be removed once
>> sync_char_short is enabled for nvptx).
>>
> 
> I have added gcc.target/nvptx/sync.c, which is a version of
> ia64-sync-3.c extended to test chars and shorts too.

I've:
- added the short/char part of that as gcc.dg/ia64-async-5.c
- included the sync_int_long ones of gcc.dg/ia64-async-* in
  gcc.target/nvptx
- did the same for gcc.dg/ia64-async-5.c as part of this patch

> I kept the original
> int and long tests because sync_int_long isn't indicated as being
> supported on nvptx either.
> 

Yep, I proposed a patch to enable that:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551842.html .

>> I looked at the implementation, and it looks ok to me, though I think we
>> need to make explicit in a comment what the assumptions are:
>> - that we have read and write access to the entire word, and
>> - that the word is not volatile.
>>
> 
> I've added some extra comments in the implementation. Like I said
> previously, the loop accounts for the larger word being volatile.
> 

Right, I known what that loop is intending to do. The loop though may
manifest worst-case as a hang, so I've mentioned that in the comment.

>> As for the oacc test-case, you could add the __int128 bit, perhaps along
>> the lines of how things are done in
>> libgomp/testsuite/libgomp.c++/target-8.C ?
>>
> 
> I've added a extra test for __int128 types in my libgomp testcase that
> runs if 128-bit types are supported.
> 
> I've tested that there are no regressions with the patch on standalone
> nvptx, and that the new reduction-16.c testcase passes with both nvptx
> and AMD GCN offloading.
> 
> Is this version okay for master and og10?
> 

Pushed as attached to master.

Thanks,
- Tom


[-- Attachment #2: 0001-nvptx-Add-support-for-subword-compare-and-swap.patch --]
[-- Type: text/x-patch, Size: 6593 bytes --]

nvptx: Add support for subword compare-and-swap

This adds support for __sync_val_compare_and_swap and
__sync_bool_compare_and_swap for 1-byte and 2-byte long
values, which are not natively supported on nvptx.

Build and reg-tested on nvptx.
Build and reg-tested libgomp on x86_64 with nvptx accelerator.

2020-07-16  Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgcc/
	* config/nvptx/atomic.c: New.
	* config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.

	gcc/testsuite/
	* gcc.target/nvptx/sync-5.c: New.

	libgomp/
	* testsuite/libgomp.c-c++-common/reduction-16.c: New.

---
 gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c       |  2 +
 libgcc/config/nvptx/atomic.c                       | 73 ++++++++++++++++++++++
 libgcc/config/nvptx/t-nvptx                        |  3 +-
 .../testsuite/libgomp.c-c++-common/reduction-16.c  | 53 ++++++++++++++++
 4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c b/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c
new file mode 100644
index 00000000000..ec40f2ca7a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/ia64-sync-5.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "../../gcc.dg/ia64-sync-5.c"
diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
new file mode 100644
index 00000000000..e1ea078692a
--- /dev/null
+++ b/libgcc/config/nvptx/atomic.c
@@ -0,0 +1,73 @@
+/* NVPTX atomic operations
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdbool.h>
+
+/* Implement __sync_val_compare_and_swap and __sync_bool_compare_and_swap
+   for 1 and 2-byte values (which are not natively supported) in terms of
+   __sync_val_compare_and_swap for 4-byte values (which is supported).
+   This assumes that the contents of the word surrounding the subword
+   value that we are interested in are accessible as well (which should
+   normally be the case).  Note that if the contents of the word surrounding
+   the subword changes between the __sync_val_compare_and_swap_4 and the
+   preceeding load of oldword, while the subword does not, the implementation
+   loops, which may manifest worst-case as a hang.  */
+
+#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE)			     \
+									     \
+TYPE									     \
+__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)     \
+{									     \
+  unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL);  \
+  int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8;			     \
+  unsigned int valmask = (1 << (SIZE * 8)) - 1;				     \
+  unsigned int wordmask = ~(valmask << shift);				     \
+  unsigned int oldword = *wordptr;					     \
+  for (;;)								     \
+    {									     \
+      TYPE prevval = (oldword >> shift) & valmask;			     \
+      /* Exit if the subword value previously read from memory is not */     \
+      /* equal to the expected value OLDVAL.  */			     \
+      if (__builtin_expect (prevval != oldval, 0))			     \
+	return prevval;							     \
+      unsigned int newword = oldword & wordmask;			     \
+      newword |= ((unsigned int) newval) << shift;			     \
+      unsigned int prevword						     \
+	  = __sync_val_compare_and_swap_4 (wordptr, oldword, newword);	     \
+      /* Exit only if the compare-and-swap succeeds on the whole word */     \
+      /* (i.e. the contents of *WORDPTR have not changed since the last */   \
+      /* memory read).  */						     \
+      if (__builtin_expect (prevword == oldword, 1))			     \
+	return oldval;							     \
+      oldword = prevword;						     \
+    }									     \
+}									     \
+									     \
+bool									     \
+__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)    \
+{									     \
+  return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \
+}
+
+__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1)
+__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2)
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index c4d20c94cbb..ede0bf0f87d 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -1,5 +1,6 @@
 LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
-	$(srcdir)/config/nvptx/mgomp.c
+	$(srcdir)/config/nvptx/mgomp.c \
+	$(srcdir)/config/nvptx/atomic.c
 
 LIB2ADDEH=
 LIB2FUNCS_EXCLUDE=__main
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
new file mode 100644
index 00000000000..d0e82b04790
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+#define N 512
+
+#define GENERATE_TEST(T)	\
+int test_##T (void)		\
+{				\
+  T a[N], res = 0;		\
+				\
+  for (int i = 0; i < N; ++i)	\
+    a[i] = i & 1;		\
+				\
+_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res || a[i];		\
+				\
+  /* res should be non-zero.  */\
+  if (!res)			\
+    return 1;			\
+				\
+_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res && a[i];		\
+				\
+  /* res should be zero.  */	\
+  return res;			\
+}
+
+GENERATE_TEST(char)
+GENERATE_TEST(short)
+GENERATE_TEST(int)
+GENERATE_TEST(long)
+#ifdef __SIZEOF_INT128__
+GENERATE_TEST(__int128)
+#endif
+
+int main(void)
+{
+  if (test_char ())
+    abort ();
+  if (test_short ())
+    abort ();
+  if (test_int ())
+    abort ();
+  if (test_long ())
+    abort ();
+#ifdef __SIZEOF_INT128__
+  if (test___int128 ())
+    abort ();
+#endif
+}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [patch][nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap)
  2020-08-13  9:27     ` Tom de Vries
@ 2020-09-01 11:41       ` Tobias Burnus
  2020-09-01 12:58         ` Tom de Vries
  0 siblings, 1 reply; 24+ messages in thread
From: Tobias Burnus @ 2020-09-01 11:41 UTC (permalink / raw)
  To: Tom de Vries, Kwok Cheung Yeung, GCC Patches
  Cc: Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 908 bytes --]

Hi Tom, hello all,

it turned out that the testcase fails on PowerPC (but not x86_64)
as the nvptx lto complains: unresolved symbol __sync_val_compare_and_swap_16

The testcase uses int128 – and that's the culprit, but I have no idea
why it only fails with PowerPC and not with x86-64.

Unless someone sees a good way to implement __sync_val_compare_and_swap_16,
I propose to XFAIL it for now.

Namely: Split-off the int128 testcase into a new file and run it only
if not nvptx-offloading on PowerPC. If on nvptx+PowerPC, link that testcase
as well – but XFAIL it.

OK? Or do you/does anyone have a better idea?

Tobias

PS: Should I open a PR for the missing __sync_val_compare_and_swap_16?

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: swap-xfail.diff --]
[-- Type: text/x-patch, Size: 3510 bytes --]

libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16

On PowerPC, only, __sync_val_compare_and_swap_16 is generated for the
testcase – which nvptx currently does not support.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/reduction-16.c: Move int128 test to ...
	* testsuite/libgomp.c-c++-common/reduction-17.c: ... this new test.
	* testsuite/libgomp.c-c++-common/reduction-17a.c: New test.

 .../testsuite/libgomp.c-c++-common/reduction-16.c  |  8 +----
 .../testsuite/libgomp.c-c++-common/reduction-17.c  | 41 ++++++++++++++++++++++
 .../testsuite/libgomp.c-c++-common/reduction-17a.c | 11 ++++++
 3 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
index d0e82b04790..02ec4cd36b8 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* See also reduction-17.c for an int128 testcase.  */
 
 #include <stdlib.h>
 
@@ -32,9 +33,6 @@ GENERATE_TEST(char)
 GENERATE_TEST(short)
 GENERATE_TEST(int)
 GENERATE_TEST(long)
-#ifdef __SIZEOF_INT128__
-GENERATE_TEST(__int128)
-#endif
 
 int main(void)
 {
@@ -46,8 +44,4 @@ int main(void)
     abort ();
   if (test_long ())
     abort ();
-#ifdef __SIZEOF_INT128__
-  if (test___int128 ())
-    abort ();
-#endif
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-17.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-17.c
new file mode 100644
index 00000000000..44dabf6e139
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-17.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { ! powerpc*-*-linux* } || { ! offload_target_nvptx } } } */
+/* { dg-require-effective-target int128 } */
+
+/* See also reduction-17a.c for powerpc*-*-linux* + offload_target_nvptx.  */
+/* See also reduction-16.c for a char/short/int/long testcase.  */
+
+#include <stdlib.h>
+
+#define N 512
+
+#define GENERATE_TEST(T)	\
+int test_##T (void)		\
+{				\
+  T a[N], res = 0;		\
+				\
+  for (int i = 0; i < N; ++i)	\
+    a[i] = i & 1;		\
+				\
+_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res || a[i];		\
+				\
+  /* res should be non-zero.  */\
+  if (!res)			\
+    return 1;			\
+				\
+_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \
+  for (int i = 0; i < N; ++i)	\
+    res = res && a[i];		\
+				\
+  /* res should be zero.  */	\
+  return res;			\
+}
+
+GENERATE_TEST(__int128)
+
+int main(void)
+{
+  if (test___int128 ())
+    abort ();
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-17a.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-17a.c
new file mode 100644
index 00000000000..1e6a7bc990f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-17a.c
@@ -0,0 +1,11 @@
+/* Duplicates reduction-17.c as it fails on PowerPC with nvptx offload
+   due to 'unresolved symbol __sync_val_compare_and_swap_16' on the nvptx side.  */
+
+/* { dg-do link { target { powerpc*-*-linux* } && { offload_target_nvptx } } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-xfail-if "__sync_val_compare_and_swap_16" { *-*-* } } */
+
+/* See also reduction-17a.c for either not powerpc*-*-linux* or not offload_target_nvptx.  */
+/* See also reduction-16.c for a char/short/int/long testcase.  */
+
+#include "reduction-17.c"

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch][nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap)
  2020-09-01 11:41       ` [patch][nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap) Tobias Burnus
@ 2020-09-01 12:58         ` Tom de Vries
  2020-09-02  7:56           ` Tom de Vries
  0 siblings, 1 reply; 24+ messages in thread
From: Tom de Vries @ 2020-09-01 12:58 UTC (permalink / raw)
  To: Tobias Burnus, Kwok Cheung Yeung, GCC Patches
  Cc: Jakub Jelinek, Thomas Schwinge, Segher Boessenkool

On 9/1/20 1:41 PM, Tobias Burnus wrote:
> Hi Tom, hello all,
> 
> it turned out that the testcase fails on PowerPC (but not x86_64)
> as the nvptx lto complains: unresolved symbol
> __sync_val_compare_and_swap_16
> 
> The testcase uses int128 – and that's the culprit, but I have no idea
> why it only fails with PowerPC and not with x86-64.
> 

Well, I'm guessing the explanation is here in omp-expand.c:
...
/* Expand an GIMPLE_OMP_ATOMIC statement.  We try to expand

   using expand_omp_atomic_fetch_op.  If it failed, we try to

   call expand_omp_atomic_pipeline, and if it fails too, the

   ultimate fallback is wrapping the operation in a mutex

   (expand_omp_atomic_mutex).  REGION is the atomic region built

   by build_omp_regions_1().  */

static void
expand_omp_atomic (struct omp_region *region)
...

In the x86_64 case, when doing:
...
$ gcc-11 src/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
-fdump-tree-all-details -fopenmp
...
we get:
...
  <bb 33> :
  D.3382 = .omp_data_i->res;
  GOMP_atomic_start ();
  D.3383 = MEM[(__int128 * {ref-all})D.3382];

  <bb 34> :
  D.3384 = (_Bool) D.3383;
  if (D.3384 != 0)
    goto <bb 35>; [INV]
  else
    goto <bb 37>; [INV]

  <bb 38> :
  MEM[(__int128 * {ref-all})D.3382] = iftmp.80;
  GOMP_atomic_end ();
...
which means we're triggering the "expand_omp_atomic_mutex" case for x86_64.

Apparently we're triggering the "expand_omp_atomic_pipeline" for powerpc.

> Unless someone sees a good way to implement __sync_val_compare_and_swap_16,

Hmm, one could implement it in the compiler using calls to
GOMP_atomic_start/GOMP_atomic_end, but it feels somewhat hacky.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [patch][nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap)
  2020-09-01 12:58         ` Tom de Vries
@ 2020-09-02  7:56           ` Tom de Vries
  2020-09-02 10:22             ` [RFC][nvptx, libgomp] Add 128-bit atomic support Tom de Vries
  0 siblings, 1 reply; 24+ messages in thread
From: Tom de Vries @ 2020-09-02  7:56 UTC (permalink / raw)
  To: Tobias Burnus, Kwok Cheung Yeung, GCC Patches
  Cc: Jakub Jelinek, Thomas Schwinge, Segher Boessenkool

On 9/1/20 2:58 PM, Tom de Vries wrote:
> On 9/1/20 1:41 PM, Tobias Burnus wrote:
>> Hi Tom, hello all,
>>
>> it turned out that the testcase fails on PowerPC (but not x86_64)
>> as the nvptx lto complains: unresolved symbol
>> __sync_val_compare_and_swap_16
>>
>> The testcase uses int128 – and that's the culprit, but I have no idea
>> why it only fails with PowerPC and not with x86-64.
>>
> 

Reproduced on x86_64 using trigger patch:
...
$ git diff
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index ed17bb00205..eccedac192f 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -153,9 +153,15 @@
     (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
TARGET_SSE))")
    ])

+ (define_mode_iterator ATOMIC2
+    [QI HI SI
+     (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
TARGET_SSE))")
+    TI
+    ])
+
 (define_expand "atomic_load<mode>"
-  [(set (match_operand:ATOMIC 0 "nonimmediate_operand")
-       (unspec:ATOMIC [(match_operand:ATOMIC 1 "memory_operand")
+  [(set (match_operand:ATOMIC2 0 "nonimmediate_operand")
+       (unspec:ATOMIC2 [(match_operand:ATOMIC2 1 "memory_operand")
                        (match_operand:SI 2 "const_int_operand")]
                       UNSPEC_LDA))]
   ""
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
index d0e82b04790..62b0e032c33 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-additional-options "-mcx16" } */

 #include <stdlib.h>

...

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-02  7:56           ` Tom de Vries
@ 2020-09-02 10:22             ` Tom de Vries
  2020-09-02 10:44               ` Jakub Jelinek
  0 siblings, 1 reply; 24+ messages in thread
From: Tom de Vries @ 2020-09-02 10:22 UTC (permalink / raw)
  To: Tobias Burnus, GCC Patches, Jakub Jelinek
  Cc: Kwok Cheung Yeung, Thomas Schwinge, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 2114 bytes --]

[ was: Re: [patch][nvptx] libgomp: Split testcase in order to XFAIL
__sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for
subword compare-and-swap) ]

On 9/2/20 9:56 AM, Tom de Vries wrote:
> On 9/1/20 2:58 PM, Tom de Vries wrote:
>> On 9/1/20 1:41 PM, Tobias Burnus wrote:
>>> Hi Tom, hello all,
>>>
>>> it turned out that the testcase fails on PowerPC (but not x86_64)
>>> as the nvptx lto complains: unresolved symbol
>>> __sync_val_compare_and_swap_16
>>>
>>> The testcase uses int128 – and that's the culprit, but I have no idea
>>> why it only fails with PowerPC and not with x86-64.
>>>
>>
> 
> Reproduced on x86_64 using trigger patch:
> ...
> $ git diff
> diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
> index ed17bb00205..eccedac192f 100644
> --- a/gcc/config/i386/sync.md
> +++ b/gcc/config/i386/sync.md
> @@ -153,9 +153,15 @@
>      (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
> TARGET_SSE))")
>     ])
> 
> + (define_mode_iterator ATOMIC2
> +    [QI HI SI
> +     (DI "TARGET_64BIT || (TARGET_CMPXCHG8B && (TARGET_80387 ||
> TARGET_SSE))")
> +    TI
> +    ])
> +
>  (define_expand "atomic_load<mode>"
> -  [(set (match_operand:ATOMIC 0 "nonimmediate_operand")
> -       (unspec:ATOMIC [(match_operand:ATOMIC 1 "memory_operand")
> +  [(set (match_operand:ATOMIC2 0 "nonimmediate_operand")
> +       (unspec:ATOMIC2 [(match_operand:ATOMIC2 1 "memory_operand")
>                         (match_operand:SI 2 "const_int_operand")]
>                        UNSPEC_LDA))]
>    ""
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> index d0e82b04790..62b0e032c33 100644
> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> @@ -1,4 +1,5 @@
>  /* { dg-do run } */
> +/* { dg-additional-options "-mcx16" } */
> 
>  #include <stdlib.h>
> 
> ...
> 

And test-case passes on x86_64 with this patch (obviously, in
combination with trigger patch above).

Jakub, WDYT?

Tobias, can you try on powerpc?

Thanks,
- Tom


[-- Attachment #2: 0001-nvptx-libgomp-Add-128-bit-atomic-support.patch --]
[-- Type: text/x-patch, Size: 1182 bytes --]

[nvptx, libgomp] Add 128-bit atomic support

---
 libgomp/config/nvptx/atomic.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/libgomp/config/nvptx/atomic.c b/libgomp/config/nvptx/atomic.c
new file mode 100644
index 00000000000..49a6d350827
--- /dev/null
+++ b/libgomp/config/nvptx/atomic.c
@@ -0,0 +1,34 @@
+#include <stdbool.h>
+
+#include "../../atomic.c"
+
+unsigned __int128
+__sync_val_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval,
+				unsigned __int128 newval)
+{
+  volatile unsigned __int128 *ptr = vptr;
+  GOMP_atomic_start ();
+  unsigned __int128 val = *ptr;
+  if (val == oldval)
+    *ptr = newval;
+  GOMP_atomic_end ();
+  return val;
+}
+
+bool
+__sync_bool_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval,
+				 unsigned __int128 newval)
+{
+  return __sync_val_compare_and_swap_16 (vptr, oldval, newval) == oldval;
+}
+
+unsigned __int128
+__atomic_load_16 (const volatile void *vptr,
+		  int memorder __attribute__((unused)))
+{
+  const volatile unsigned __int128 *ptr = vptr;
+  GOMP_atomic_start ();
+  unsigned __int128 val = *ptr;
+  GOMP_atomic_end ();
+  return val;
+}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-02 10:22             ` [RFC][nvptx, libgomp] Add 128-bit atomic support Tom de Vries
@ 2020-09-02 10:44               ` Jakub Jelinek
  2020-09-02 11:30                 ` Tobias Burnus
  2020-09-02 11:48                 ` Tom de Vries
  0 siblings, 2 replies; 24+ messages in thread
From: Jakub Jelinek @ 2020-09-02 10:44 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Tobias Burnus, GCC Patches, Kwok Cheung Yeung, Thomas Schwinge,
	Segher Boessenkool

On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
> And test-case passes on x86_64 with this patch (obviously, in
> combination with trigger patch above).
> 
> Jakub, WDYT?

I guess the normal answer would be use libatomic, but it isn't ported for
nvptx.
I guess at least temporarily this is ok, though I'm wondering why
you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and
__atomic_*.

What happens if you try
unsigned __int128 v;
#pragma omp declare target (v)
int
main ()
{
  #pragma omp target
  {
    __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
    __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
    unsigned __int128v exp = 2;
    __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
  }
}
etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)?
I guess better not to throw everything into one test, because not every
target supports them all (e.g. I think x86_64 doesn't really do 128-bit
atomic loads because the cmpxchg16b insn are not appropriate for .rodata
locations).

	Jakub


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-02 10:44               ` Jakub Jelinek
@ 2020-09-02 11:30                 ` Tobias Burnus
  2020-09-02 11:48                 ` Tom de Vries
  1 sibling, 0 replies; 24+ messages in thread
From: Tobias Burnus @ 2020-09-02 11:30 UTC (permalink / raw)
  To: Jakub Jelinek, Tom de Vries
  Cc: GCC Patches, Kwok Cheung Yeung, Thomas Schwinge, Segher Boessenkool

On 9/2/20 12:22 PM, Tom de Vries wrote:

> Tobias, can you try on powerpc?

Testcase now compiles and runs w/o error message.

On 9/2/20 12:44 PM, Jakub Jelinek wrote:

> I guess the normal answer would be use libatomic, but it isn't ported for
> nvptx.
> I guess at least temporarily this is ok,though I'm wondering why
> you need __sync_*_16 rather than __atomic_*_16, or perhaps both __sync_* and
> __atomic_*.
>
> What happens if you try
> unsigned __int128 v;

...

I had to change "unsigned __int128" and "unsigned __int128v" to
"__uint128_t" and "expected" to "exp". Result without offloading
configured on x86-64-gnu-linux:

aotmic.c:(.text+0x84): undefined reference to `__atomic_fetch_add_16'
/usr/bin/ld: aotmic.c:(.text+0xa3): undefined reference to `__atomic_fetch_add_16'
/usr/bin/ld: aotmic.c:(.text+0xda): undefined reference to `__atomic_compare_exchange_16'

And on PowerPC with nvptx (without the RFC patch):

atomic.c: In function 'main._omp_fn.0':
atomic.c:6:11: internal compiler error: in write_fn_proto, at config/nvptx/nvptx.c:913
     6 |   #pragma omp target
       |           ^

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-02 10:44               ` Jakub Jelinek
  2020-09-02 11:30                 ` Tobias Burnus
@ 2020-09-02 11:48                 ` Tom de Vries
  2020-09-11 14:24                   ` Tom de Vries
  1 sibling, 1 reply; 24+ messages in thread
From: Tom de Vries @ 2020-09-02 11:48 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Tobias Burnus, GCC Patches, Kwok Cheung Yeung, Thomas Schwinge,
	Segher Boessenkool

On 9/2/20 12:44 PM, Jakub Jelinek wrote:
> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
>> And test-case passes on x86_64 with this patch (obviously, in
>> combination with trigger patch above).
>>
>> Jakub, WDYT?
> 
> I guess the normal answer would be use libatomic, but it isn't ported for
> nvptx.

Ah, I was not aware of that one, filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that.

> I guess at least temporarily this is ok, though I'm wondering why
> you need __sync_*_16 rather than __atomic_*_16, 

That's what omp-expand.c uses in expand_omp_atomic_pipeline:
BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N .

Thanks,
- Tom

> or perhaps both __sync_* and
> __atomic_*.
> 
> What happens if you try
> unsigned __int128 v;
> #pragma omp declare target (v)
> int
> main ()
> {
>   #pragma omp target
>   {
>     __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
>     __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
>     unsigned __int128v exp = 2;
>     __atomic_compare_exchange_n (&v, &expected, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
>   }
> }
> etc. (see some gcc.dg/atomic* tests, ditto for __sync_*)?
> I guess better not to throw everything into one test, because not every
> target supports them all (e.g. I think x86_64 doesn't really do 128-bit
> atomic loads because the cmpxchg16b insn are not appropriate for .rodata
> locations).
> 
> 	Jakub
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-02 11:48                 ` Tom de Vries
@ 2020-09-11 14:24                   ` Tom de Vries
  2020-09-11 14:25                     ` Tom de Vries
  2020-09-11 14:37                     ` Jakub Jelinek
  0 siblings, 2 replies; 24+ messages in thread
From: Tom de Vries @ 2020-09-11 14:24 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Tobias Burnus, GCC Patches, Thomas Schwinge, Segher Boessenkool

On 9/2/20 1:48 PM, Tom de Vries wrote:
> On 9/2/20 12:44 PM, Jakub Jelinek wrote:
>> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
>>> And test-case passes on x86_64 with this patch (obviously, in
>>> combination with trigger patch above).
>>>
>>> Jakub, WDYT?
>>
>> I guess the normal answer would be use libatomic, but it isn't ported for
>> nvptx.
> 
> Ah, I was not aware of that one, filed
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that.
> 
>> I guess at least temporarily this is ok, though I'm wondering why
>> you need __sync_*_16 rather than __atomic_*_16, 
> 
> That's what omp-expand.c uses in expand_omp_atomic_pipeline:
> BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N .
> 

I've got an updated version of this patch. It:
- no longer supplies the __atomic_load_16, since that's now handled by
  libatomic
- the __sync_val_compare_and_swap now uses __atomic_compare_and_swap,
  which also falls back on libatomic.

I'm currently retesting.

Any comments?

Otherwise, I'll commit on Monday.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-11 14:24                   ` Tom de Vries
@ 2020-09-11 14:25                     ` Tom de Vries
  2020-09-11 14:48                       ` Andrew Stubbs
  2020-09-11 14:37                     ` Jakub Jelinek
  1 sibling, 1 reply; 24+ messages in thread
From: Tom de Vries @ 2020-09-11 14:25 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Tobias Burnus, GCC Patches, Thomas Schwinge, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 1159 bytes --]

[ Fixing ENOPATCH. ]

On 9/11/20 4:24 PM, Tom de Vries wrote:
> On 9/2/20 1:48 PM, Tom de Vries wrote:
>> On 9/2/20 12:44 PM, Jakub Jelinek wrote:
>>> On Wed, Sep 02, 2020 at 12:22:28PM +0200, Tom de Vries wrote:
>>>> And test-case passes on x86_64 with this patch (obviously, in
>>>> combination with trigger patch above).
>>>>
>>>> Jakub, WDYT?
>>>
>>> I guess the normal answer would be use libatomic, but it isn't ported for
>>> nvptx.
>>
>> Ah, I was not aware of that one, filed
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 to look into that.
>>
>>> I guess at least temporarily this is ok, though I'm wondering why
>>> you need __sync_*_16 rather than __atomic_*_16, 
>>
>> That's what omp-expand.c uses in expand_omp_atomic_pipeline:
>> BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_N .
>>
> 
> I've got an updated version of this patch. It:
> - no longer supplies the __atomic_load_16, since that's now handled by
>   libatomic
> - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap,
>   which also falls back on libatomic.
> 
> I'm currently retesting.
> 
> Any comments?
> 
> Otherwise, I'll commit on Monday.
> 
> Thanks,
> - Tom
> 

[-- Attachment #2: 0001-libgomp-nvptx-Add-__sync_compare_and_swap_16.patch --]
[-- Type: text/x-patch, Size: 2536 bytes --]

[libgomp, nvptx] Add __sync_compare_and_swap_16

As reported here
( https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553070.html  ),
when running test-case libgomp.c-c++-common/reduction-16.c for powerpc host
with nvptx accelerator, we run into:
...
unresolved symbol __sync_val_compare_and_swap_16
...

I can reproduce the problem on x86_64 with a trigger patch that:
- initializes ix86_isa_flags2 to TARGET_ISA2_CX16
- enables define_expand "atomic_load<mode>" in gcc/config/i386/sync.md
  for TImode

The problem is that omp-expand.c generates atomic builtin calls based on
checks whether those are supported on the host, which forces the target to
support these, even though those checks fail for the accelerator target.

Fix this by:
- adding a __sync_val_compare_and_swap_16 in libgomp for nvptx,
  which falls back onto libatomic's __atomic_compare_and_swap_16
- adding -foffload=-latomic in the test-case

Tested libgomp on x86_64-linux with nvptx accelerator.

Tested libgomp with trigger patch on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

	* config/nvptx/atomic.c: New file.  Add
	__sync_val_compare_and_swap_16.

---
 libgomp/config/nvptx/atomic.c                         | 18 ++++++++++++++++++
 libgomp/testsuite/libgomp.c-c++-common/reduction-16.c |  1 +
 2 files changed, 19 insertions(+)

diff --git a/libgomp/config/nvptx/atomic.c b/libgomp/config/nvptx/atomic.c
new file mode 100644
index 00000000000..2e13b09b497
--- /dev/null
+++ b/libgomp/config/nvptx/atomic.c
@@ -0,0 +1,18 @@
+#include "libgomp.h"
+
+#include "../../atomic.c"
+
+/* Implement __sync_val_compare_and_swap_16, to support offloading from hosts
+   that support this builtin.  Fallback on libatomic.  This can be removed
+   once omp-expand starts using __atomic_compare_exchange_n instead.  */
+
+unsigned __int128
+__sync_val_compare_and_swap_16 (volatile void *vptr, unsigned __int128 oldval,
+				unsigned __int128 newval)
+{
+  volatile __int128 *ptr = vptr;
+  __int128 expected = oldval;
+  __atomic_compare_exchange_n (ptr, &expected, newval, false,
+			       MEMMODEL_SEQ_CST, MEMMODEL_SEQ_CST);
+  return expected;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
index d0e82b04790..cfa8853f18b 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-additional-options "-foffload=-latomic" } */
 
 #include <stdlib.h>
 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-11 14:24                   ` Tom de Vries
  2020-09-11 14:25                     ` Tom de Vries
@ 2020-09-11 14:37                     ` Jakub Jelinek
  1 sibling, 0 replies; 24+ messages in thread
From: Jakub Jelinek @ 2020-09-11 14:37 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Tobias Burnus, GCC Patches, Thomas Schwinge, Segher Boessenkool

On Fri, Sep 11, 2020 at 04:24:42PM +0200, Tom de Vries wrote:
> I've got an updated version of this patch. It:
> - no longer supplies the __atomic_load_16, since that's now handled by
>   libatomic
> - the __sync_val_compare_and_swap now uses __atomic_compare_and_swap,
>   which also falls back on libatomic.
> 
> I'm currently retesting.
> 
> Any comments?
> 
> Otherwise, I'll commit on Monday.

If some functions are now in libatomic, do we expect users to know that and
pass -foffload=-latomic to link, or will mkoffload or whatever do that
automatically?  If the latter, will e.g. libgomp testsuite ensure that
during testing the library can be found even non-installed, if the former,
will libgomp testsuite add it for the respective testcases that need it,
perhaps under special options?

	Jakub


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-11 14:25                     ` Tom de Vries
@ 2020-09-11 14:48                       ` Andrew Stubbs
  2020-09-11 15:03                         ` tdevries
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Stubbs @ 2020-09-11 14:48 UTC (permalink / raw)
  To: Tom de Vries, Jakub Jelinek
  Cc: Tobias Burnus, GCC Patches, Thomas Schwinge, Segher Boessenkool

On 11/09/2020 15:25, Tom de Vries wrote:
> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> 
> @@ -1,4 +1,5 @@
> 
> /*·{·dg-do·run·}·*/
> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/

This will probably break amdgcn, where libatomic does not exist.

Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-11 14:48                       ` Andrew Stubbs
@ 2020-09-11 15:03                         ` tdevries
  2020-09-11 15:29                           ` Tobias Burnus
  0 siblings, 1 reply; 24+ messages in thread
From: tdevries @ 2020-09-11 15:03 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Jakub Jelinek, Tobias Burnus, GCC Patches, Thomas Schwinge

On 2020-09-11 16:48, Andrew Stubbs wrote:
> On 11/09/2020 15:25, Tom de Vries wrote:
>> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>> 
>> @@ -1,4 +1,5 @@
>> 
>> /*·{·dg-do·run·}·*/
>> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/
> 
> This will probably break amdgcn, where libatomic does not exist.
> 

It looks like the customary way to handle that is to use 
offload_target_nvptx.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][nvptx, libgomp] Add 128-bit atomic support
  2020-09-11 15:03                         ` tdevries
@ 2020-09-11 15:29                           ` Tobias Burnus
  0 siblings, 0 replies; 24+ messages in thread
From: Tobias Burnus @ 2020-09-11 15:29 UTC (permalink / raw)
  To: tdevries, Andrew Stubbs
  Cc: Jakub Jelinek, Tobias Burnus, GCC Patches, Thomas Schwinge

On 9/11/20 5:03 PM, tdevries wrote:

> On 2020-09-11 16:48, Andrew Stubbs wrote:
>> On 11/09/2020 15:25, Tom de Vries wrote:
>>> --- a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>>> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
>>>
>>> @@ -1,4 +1,5 @@
>>>
>>> /*·{·dg-do·run·}·*/
>>> +/*·{·dg-additional-options·"-foffload=-latomic"·}·*/
>>
>> This will probably break amdgcn, where libatomic does not exist.
>>
> It looks like the customary way to handle that is to use
> offload_target_nvptx.

Or   { target { powerpc*-*-* } }  ?

For some (known) reasons, the __sync_val_compare_and_swap_16 is
produced for powerpc but not for x86-64.

I could imagine that GCN is affected in the same way as nvptx,
except that AMD's ROC is currently not supported for PowerPC,
if I understand it correctly. If FAIL start to occur in some
CPU/GPU combinations, it can be still revisited.

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-09-11 15:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-15 20:28 [PATCH] nvptx: Add support for subword compare-and-swap Kwok Cheung Yeung
2020-06-23 16:44 ` Thomas Schwinge
2020-06-23 16:51   ` Jakub Jelinek
2020-06-30 16:35     ` Kwok Cheung Yeung
2020-06-24 11:13   ` Kwok Cheung Yeung
2020-06-30 14:37   ` Tom de Vries
2020-07-01 14:28 ` Tom de Vries
2020-07-15 19:08   ` Kwok Cheung Yeung
2020-07-20 13:19   ` Kwok Cheung Yeung
2020-08-04 14:56     ` [PING] " Kwok Cheung Yeung
2020-08-13  9:27     ` Tom de Vries
2020-09-01 11:41       ` [patch][nvptx] libgomp: Split testcase in order to XFAIL __sync_val_compare_and_swap_16 (was: [PATCH] nvptx: Add support for subword compare-and-swap) Tobias Burnus
2020-09-01 12:58         ` Tom de Vries
2020-09-02  7:56           ` Tom de Vries
2020-09-02 10:22             ` [RFC][nvptx, libgomp] Add 128-bit atomic support Tom de Vries
2020-09-02 10:44               ` Jakub Jelinek
2020-09-02 11:30                 ` Tobias Burnus
2020-09-02 11:48                 ` Tom de Vries
2020-09-11 14:24                   ` Tom de Vries
2020-09-11 14:25                     ` Tom de Vries
2020-09-11 14:48                       ` Andrew Stubbs
2020-09-11 15:03                         ` tdevries
2020-09-11 15:29                           ` Tobias Burnus
2020-09-11 14:37                     ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).