From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 9556C393C88C for ; Wed, 24 Jun 2020 11:14:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 9556C393C88C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Kwok_Yeung@mentor.com IronPort-SDR: KmgakRgWH+e0W1GiUYnlOpvM7TTX2/sHeOLUaMzX7aExMcVtM3JXqpUznpl55bHdZVNxHcaljH vmuU2y8B2WdC8icyyfv0Jb1pV1AUnFJ+GhzAqjTUEftYhAbEFSNodTha0Rb/+4TWgrf0bnvEyc RAVmtYqLAuzbv4umozZ4yjbvCWLIxgAzN4XIQl6hF1pFTkITX1alUqXCkJnuuIGziedKRWaDw/ aFc9juW37yc36EsPVzGrIyXu8hYQzQKtQkZ/Qwt42rQ1uO/1JEdMN+3Ht54SF6ugB8/+yI7sZg TBQ= X-IronPort-AV: E=Sophos;i="5.75,275,1589270400"; d="scan'208";a="52279846" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 24 Jun 2020 03:14:00 -0800 IronPort-SDR: 6vu1szuYm7KqLQPLeQ5HGoSqxR6mjA9RGllKRP2TKeD5FnFDKDSu4M5zu2P3I+Yvp5/YI1Lb4Z PLd9+rrig1+fDxgw4tXVvJoTyCs+2ngZb34Wzku+odWJbe7+b0631cm5n9+0n7ssIx4BBQRnzx oeUADihU5BEygRVIeF0eWlF0isXUv+EdBAkOysJvVHJA6GoRoUBMrPJRUSGqWF1dYJvgEfaA30 uAb3b2eKTTSXVX37Bj7fWvVbRO1vqkeQ+8uXZzeVgftdJ3lzwnZwYHDd3z5YldBVOhPmmbFj3b Fos= From: Kwok Cheung Yeung Subject: Re: [PATCH] nvptx: Add support for subword compare-and-swap To: Thomas Schwinge , Tom de Vries CC: , Jakub Jelinek References: <3cbe58e6-b427-ff0d-b7d1-8723aac82b18@codesourcery.com> <87o8p920lx.fsf@euler.schwinge.homeip.net> Message-ID: <04d81d3b-c292-4b1c-0b42-df5e2b8227f3@codesourcery.com> Date: Wed, 24 Jun 2020 12:13:27 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0 MIME-Version: 1.0 In-Reply-To: <87o8p920lx.fsf@euler.schwinge.homeip.net> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) X-Spam-Status: No, score=-8.7 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Jun 2020 11:14:03 -0000 On 23/06/2020 5:44 pm, Thomas Schwinge wrote: > Hi! > > On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung wrote: >> This patch adds support on nvptx for __sync_val_compare_and_swap operations on >> 1- and 2-byte values. > > Is this a thorough review that these are the only functions missing, or > did you just implement what you found missing for some test case you've > been looking into? Other architectures' similar libgcc files seem to be > defining more of such related functions. > This was to fix a particular test case that used logical reduction operators on subword values. Support for other operators (e.g. arithmetic add) would need additional support. Other architectures also define atomic operations for subwords in libgcc (e.g. in config/arm/linux-atomic.c for Arm, config/riscv/atomic.c for RISC-V, in config/m68k/linux-atomic.c for Motorola 68000s etc.). >> The implementation is a straight copy of the version for >> AMD GCN. > > (Should thus be generalized? That can be done later, as far as I'm > concerned -- there already seems to be quite some code duplication in > libgcc.) > > I have not verified the algorithm. > > It seems a bit unfortunate to have such a thing outlined in a separate > function, given we're talking about performance-critical code here? Even > more so for GCN, where there's no JIT compiler that can inline it later, > as it's the case for nvptx? > > You have verified that GCC itself shouldn't/can't synthesize such > replacement code, inline? > > The GCN/nvptx libgcc code actually seems simple enough so that GCC could > synthesize that internally -- or is there a reason not to? (Just > curious.) > Without the additional __sync_val_compare_and_swap_* functions, the testcase fails with: unresolved symbol __sync_val_compare_and_swap_2 when the accelerator compiler is run. The algorithm is generic and fairly simple, so it is more that GCC doesn't, rather than shouldn't or can't. i.e. No one has implemented it as a fallback in lieu of native support so far. > I see 'gcc/doc/generic.texi', "OMP_ATOMIC" state: > > The gimplifier tries > three alternative code generation strategies. Whenever possible, > an atomic update built-in is used. If that fails, a > compare-and-swap loop is attempted. If that also fails, a > regular critical section around the expression is used. > > ..., and I see 'gcc/omp-expand.c:expand_omp_atomic_pipeline' synthesize > that "compare-and-swap loop" code, which looks vaguely similar to your > libgcc implementation. > > ..., and 'gcc/optabs.c:expand_compare_and_swap_loop' etc. also look > similar/related? These create loops that repeatedly do the compare-and-swap operation until it succeeds. They cannot work to express a smaller compare-and-swap in terms of a larger one since they lack the necessary shifts and masking. The exit condition is also different - since we are implementing compare-and-swap, the loop can exit even if the swap fails. The loop exits if the subword read from memory is different from the expected value (i.e. the compare-and-swap fails) or if the word containing the subword is equal to the expected value (i.e. it succeeds). The loop is there to cope with the case where the compare-and-swap fails due to part of the larger word changing that is not part of the subword that we are interested in. > ..., or maybe even 'gcc/builtins.c:expand_builtin_compare_and_swap' > etc. could be doing such things? > This delegates to expand_atomic_compare_and_swap (in optabs.c), which would be the logical place to synthesize the sub-word loop. All it does ATM is look for atomic_compare_and_swap op handlers, then for sync_compare_and_swap handlers, then library functions for __sync_val_compare_and_swap. Kwok