From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by sourceware.org (Postfix) with ESMTPS id 164AE3857007 for ; Wed, 1 Jul 2020 14:28:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 164AE3857007 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tdevries@suse.de X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 01955ACA7; Wed, 1 Jul 2020 14:28:07 +0000 (UTC) Subject: Re: [PATCH] nvptx: Add support for subword compare-and-swap To: Kwok Cheung Yeung , GCC Patches References: <3cbe58e6-b427-ff0d-b7d1-8723aac82b18@codesourcery.com> From: Tom de Vries Autocrypt: addr=tdevries@suse.de; keydata= xsBNBF0ltCcBCADDhsUnMMdEXiHFfqJdXeRvgqSEUxLCy/pHek88ALuFnPTICTwkf4g7uSR7 HvOFUoUyu8oP5mNb4VZHy3Xy8KRZGaQuaOHNhZAT1xaVo6kxjswUi3vYgGJhFMiLuIHdApoc u5f7UbV+egYVxmkvVLSqsVD4pUgHeSoAcIlm3blZ1sDKviJCwaHxDQkVmSsGXImaAU+ViJ5l CwkvyiiIifWD2SoOuFexZyZ7RUddLosgsO0npVUYbl6dEMq2a5ijGF6/rBs1m3nAoIgpXk6P TCKlSWVW6OCneTaKM5C387972qREtiArTakRQIpvDJuiR2soGfdeJ6igGA1FZjU+IsM5ABEB AAHNH1RvbSBkZSBWcmllcyA8dGRldnJpZXNAc3VzZS5kZT7CwKsEEwEIAD4WIQSsnSe5hKbL MK1mGmjuhV2rbOJEoAUCXSW0JwIbAwUJA8JnAAULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAh CRDuhV2rbOJEoBYhBKydJ7mEpsswrWYaaO6FXats4kSgc48H/Ra2lq5p3dHsrlQLqM7N68Fo eRDf3PMevXyMlrCYDGLVncQwMw3O/AkousktXKQ42DPJh65zoXB22yUt8m0g12xkLax98KFJ 5NyUloa6HflLl+wQL/uZjIdNUQaHQLw3HKwRMVi4l0/Jh/TygYG1Dtm8I4o708JS4y8GQxoQ UL0z1OM9hyM3gI2WVTTyprsBHy2EjMOu/2Xpod95pF8f90zBLajy6qXEnxlcsqreMaqmkzKn 3KTZpWRxNAS/IH3FbGQ+3RpWkNGSJpwfEMVCeyK5a1n7yt1podd1ajY5mA1jcaUmGppqx827 8TqyteNe1B/pbiUt2L/WhnTgW1NC1QDOwE0EXSW0JwEIAM99H34Bu4MKM7HDJVt864MXbx7B 1M93wVlpJ7Uq+XDFD0A0hIal028j+h6jA6bhzWto4RUfDl/9mn1StngNVFovvwtfzbamp6+W pKHZm9X5YvlIwCx131kTxCNDcF+/adRW4n8CU3pZWYmNVqhMUiPLxElA6QhXTtVBh1RkjCZQ Kmbd1szvcOfaD8s+tJABJzNZsmO2hVuFwkDrRN8Jgrh92a+yHQPd9+RybW2l7sJv26nkUH5Z 5s84P6894ebgimcprJdAkjJTgprl1nhgvptU5M9Uv85Pferoh2groQEAtRPlCGrZ2/2qVNe9 XJfSYbiyedvApWcJs5DOByTaKkcAEQEAAcLAkwQYAQgAJhYhBKydJ7mEpsswrWYaaO6FXats 4kSgBQJdJbQnAhsMBQkDwmcAACEJEO6FXats4kSgFiEErJ0nuYSmyzCtZhpo7oVdq2ziRKD3 twf7BAQBZ8TqR812zKAD7biOnWIJ0McV72PFBxmLIHp24UVe0ZogtYMxSWKLg3csh0yLVwc7 H3vldzJ9AoK3Qxp0Q6K/rDOeUy3HMqewQGcqrsRRh0NXDIQk5CgSrZslPe47qIbe3O7ik/MC q31FNIAQJPmKXX25B115MMzkSKlv4udfx7KdyxHrTSkwWZArLQiEZj5KG4cCKhIoMygPTA3U yGaIvI/BGOtHZ7bEBVUCFDFfOWJ26IOCoPnSVUvKPEOH9dv+sNy7jyBsP5QxeTqwxC/1ZtNS DUCSFQjqA6bEGwM22dP8OUY6SC94x1G81A9/xbtm9LQxKm0EiDH8KBMLfQ== Cc: Thomas Schwinge , Jakub Jelinek Message-ID: <7d824fcc-cacf-6ad3-578d-1dc71243ed8a@suse.de> Date: Wed, 1 Jul 2020 16:28:05 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0 MIME-Version: 1.0 In-Reply-To: <3cbe58e6-b427-ff0d-b7d1-8723aac82b18@codesourcery.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Jul 2020 14:28:11 -0000 On 6/15/20 10:28 PM, Kwok Cheung Yeung wrote: > Hello > > This patch adds support on nvptx for __sync_val_compare_and_swap > operations on 1- and 2-byte values. The implementation is a straight > copy of the version for AMD GCN. Actually it also adds support for __sync_bool_compare_and_swap, be sure to mention this in the commit log. > I have added a new libgomp test that exercises the new operation. I have > also verified that the new code does not cause any regressions on the > nvptx offloading tests, and that the new test passes with both nvptx and > amdgcn as offload targets. > AFAICT, that excercises __sync_val_compare_and_swap, but not __sync_bool_compare_and_swap. You've covered offloading, but now consider the nvptx standalone target. I was surprised to find only one generic test-case exercising char/short __sync_val_compare_and_swap: ... $ find gcc/testsuite/ -type f | xargs grep -l sync_char_short | xargs grep sync_val gcc/testsuite/gcc.dg/pr49696.c: __sync_val_compare_and_swap (x, 1, 0); ... and that's not a run test. So, I think gcc needs a copy of (some of) the gcc/testsuite/gcc.dg/ia64-sync-*.c tests for effective target sync_char_short. However, since this patch only adds partial support, we cannot enable sync_char_short for nvptx yet. So, if you stick to partial support, you should add a char/short copy of ia64-sync-3.c to gcc.target/nvptx (which ideally could be an include of a generic test-case that is active for sync_char_short only, with mention that it can be removed once sync_char_short is enabled for nvptx). > Okay for master and OG10? > I looked at the implementation, and it looks ok to me, though I think we need to make explicit in a comment what the assumptions are: - that we have read and write access to the entire word, and - that the word is not volatile. As for the discussion about libgcc vs backend implementation, I think it's good to have both. In particular, a backend implementation might be able to take advantage of the address space and issue an atom.shared.cas. But since the current patch fixes something, I don't want to hold it until a backend implementation is done. So, config/nvptx part okay for master, provided: - nvptx test-case added - nvptx standalone target testing done - assumption comment added to implementation. This bit can be committed as a separate patch, without the oacc test-case. As for the oacc test-case, you could add the __int128 bit, perhaps along the lines of how things are done in libgomp/testsuite/libgomp.c++/target-8.C ? [ FWIW, it would be nice if the choice between critical section and atomic builtin was postponed till after the host/offload compiler split. Then the nvptx compiler would just fall back on the critical section, and the test-case would have worked out of the box, and there would be no need for the target to pretend to support unsupported atomic operators. Conversely, we could upgrade the omp variables from char/short to int somehow to make sure the assumptions for using non-native subword __sync_val_compare_and_swap are met. ] Thanks, - Tom > Kwok > > > nvptx_subword.patch > > commit 7c3a9c23ba9f5b8fe953aa5492ae75617f2444a3 > Author: Kwok Cheung Yeung > Date: Mon Jun 15 12:34:55 2020 -0700 > > nvptx: Add support for subword compare-and-swap > > 2020-06-15 Kwok Cheung Yeung > > libgcc/ > * config/nvptx/atomic.c: New. > * config/nvptx/t-nvptx (LIB2ADD): Add atomic.c. > > libgomp/ > * testsuite/libgomp.c-c++-common/reduction-16.c: New. > > diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c > new file mode 100644 > index 0000000..4becbd2 > --- /dev/null > +++ b/libgcc/config/nvptx/atomic.c > @@ -0,0 +1,59 @@ > +/* NVPTX atomic operations > + Copyright (C) 2020 Free Software Foundation, Inc. > + Contributed by Mentor Graphics. > + > + This file is free software; you can redistribute it and/or modify it > + under the terms of the GNU General Public License as published by the > + Free Software Foundation; either version 3, or (at your option) any > + later version. > + > + This file is distributed in the hope that it will be useful, but > + WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + General Public License for more details. > + > + Under Section 7 of GPL version 3, you are granted additional > + permissions described in the GCC Runtime Library Exception, version > + 3.1, as published by the Free Software Foundation. > + > + You should have received a copy of the GNU General Public License and > + a copy of the GCC Runtime Library Exception along with this program; > + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > + . */ > + > +#include > + > +#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE) \ > + \ > +TYPE \ > +__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval) \ > +{ \ > + unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL); \ > + int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8; \ > + unsigned int valmask = (1 << (SIZE * 8)) - 1; \ > + unsigned int wordmask = ~(valmask << shift); \ > + unsigned int oldword = *wordptr; \ > + for (;;) \ > + { \ > + TYPE prevval = (oldword >> shift) & valmask; \ > + if (__builtin_expect (prevval != oldval, 0)) \ > + return prevval; \ > + unsigned int newword = oldword & wordmask; \ > + newword |= ((unsigned int) newval) << shift; \ > + unsigned int prevword \ > + = __sync_val_compare_and_swap_4 (wordptr, oldword, newword); \ > + if (__builtin_expect (prevword == oldword, 1)) \ > + return oldval; \ > + oldword = prevword; \ > + } \ > +} \ > + \ > +bool \ > +__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval) \ > +{ \ > + return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; \ > +} > + > +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1) > +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2) > + > diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx > index c4d20c9..ede0bf0 100644 > --- a/libgcc/config/nvptx/t-nvptx > +++ b/libgcc/config/nvptx/t-nvptx > @@ -1,5 +1,6 @@ > LIB2ADD=$(srcdir)/config/nvptx/reduction.c \ > - $(srcdir)/config/nvptx/mgomp.c > + $(srcdir)/config/nvptx/mgomp.c \ > + $(srcdir)/config/nvptx/atomic.c > > LIB2ADDEH= > LIB2FUNCS_EXCLUDE=__main > diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c > new file mode 100644 > index 0000000..951e522 > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c > @@ -0,0 +1,46 @@ > +/* { dg-do run } */ > + > +#include > + > +#define N 512 > + > +#define GENERATE_TEST(T) \ > +int test_##T (void) \ > +{ \ > + T a[N], res = 0; \ > + \ > + for (int i = 0; i < N; ++i) \ > + a[i] = i & 1; \ > + \ > +_Pragma("omp target teams distribute reduction(||:res) defaultmap(tofrom:scalar)") \ > + for (int i = 0; i < N; ++i) \ > + res = res || a[i]; \ > + \ > + /* res should be non-zero. */\ > + if (!res) \ > + return 1; \ > + \ > +_Pragma("omp target teams distribute reduction(&&:res) defaultmap(tofrom:scalar)") \ > + for (int i = 0; i < N; ++i) \ > + res = res && a[i]; \ > + \ > + /* res should be zero. */ \ > + return res; \ > +} > + > +GENERATE_TEST(char) > +GENERATE_TEST(short) > +GENERATE_TEST(int) > +GENERATE_TEST(long) > + > +int main(void) > +{ > + if (test_char ()) > + abort (); > + if (test_short ()) > + abort (); > + if (test_int ()) > + abort (); > + if (test_long ()) > + abort (); > +} >