From: Xi Ruoyao
To: Jinyang He, Chenghua Xu, Lulu Cheng
Cc: Weining Lu, Xing Li, yala, Peng Fan, gcc-patches@gcc.gnu.org, Huang Pei
Date: Thu, 17 Nov 2022 11:38:55 +0800
Subject: Re: [PATCH] LoongArch: Fix atomic_exchange make comparison and may jump out
Message-ID: <2b83052845d53312f6d5af2953162cfa693b6538.camel@xry111.site>
References: <20221115130328.15413-1-hejinyang@loongson.cn> <8039c23568889fe85afbe6940ed625448cf6cd56.camel@xry111.site> <1dd9ace0-a83f-c530-2d65-5f762e0cc81e@loongson.cn> <0390618e9d9e74eb2ea22ae8a934cbc37cd483a7.camel@xry111.site>

On Thu, 2022-11-17 at 10:55 +0800, Jinyang He wrote:
> On 2022/11/17 9:39 AM, Jinyang He wrote:
>
> > On 2022/11/16 7:46 PM, Xi Ruoyao wrote:
> >
> > > On Wed, 2022-11-16 at 10:11 +0800, Jinyang He wrote:
> > >
> > > > > > +  return "%G6\\n\\t"
> > > > > > +        "1:\\n\\t"
> > > > > > +        "ll.\\t%0,%1\\n\\t"
> > > > > > +        "and\\t%7,%0,%z3\\n\\t"
> > > > > > +        "or%i5\\t%7,%7,%5\\n\\t"
> > > > > > +        "sc.\\t%7,%1\\n\\t"
> > > > > > +        "beqz\\t%7,1b\\n\\t";
> > > > > Do we need a "dbar 0x700" after beqz?
> > > > >
> > > > > /* snip */
> > > > That's worth discussing. Actually I don't see any dbar hint definition
> > > > like 0x700 in the manual right now.
> > > > Besides, I think what should be provided here is a relaxed version. And
> > > > whether the barrier exists or not depends on the specific
> > > > memory_order.
> > > It's not related to memory order; it's a workaround for a hardware issue.
> > > Jiaxun told me (via LKML):
> > >
> > >     I had checked with Loongson guys and they confirmed that the
> > >     workaround still needs to be applied to latest 3A4000 processors,
> > >     including 3A4000 for MIPS and 3A5000 for LoongArch.
> > >
> > >     Though, the reason behind the workaround varies with the evaluation
> > >     of their uArch, for GS464V based core, barrier is required as the
> > >     uArch design allows regular load to be reordered after an atomic
> > >     linked load, and that would break assumption of compiler atomic
> > >     constraints.
> >
> > That certainly seems to be needed, but before or after? It's beyond my
> > knowledge, so I'm cc'ing huangpei@loongson.cn for help.
>
>
> Pei told me that ll-sc currently works as follows:
>
> uArch like:
>    ll -> (ll.dbar ll.ld_atomic)
>    sc -> (sc.dbar sc.st_atomic)
>
> exchange:
> ll.dbar
> <---------------------------+
> ll.ld_atomic $rd            |
> ...(no jmp)                 |
> sc.dbar                     |
> sc.st_atomic $rd            |
> ld $rj -can-not-emit-at-----+
>
> The load of $rj cannot be emitted between ll.dbar and ll.ld_atomic
> because the sc.dbar acts as a barrier for it.
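
For reference, the two operations compared in this thread can be
written with GCC's __atomic builtins. This is only a sketch: the names
swap_byte and cas_word are made up for illustration, and the byte-sized
operand is an assumption, chosen because the masking in the template
quoted above suggests a sub-word exchange. The comments restate the loop
shapes as the diagrams in this thread draw them; they are not compiler
output.

```c
#include <stdbool.h>

/* Unconditional exchange. Per the "exchange" diagram, the retry loop
   runs from ll to sc with no branch out in between ("no jmp").
   swap_byte is a hypothetical helper name. */
unsigned char swap_byte(unsigned char *p, unsigned char desired)
{
    /* Returns the previous value of *p and stores desired. */
    return __atomic_exchange_n(p, desired, __ATOMIC_RELAXED);
}

/* Compare-and-exchange. When *p differs from *expected, the operation
   gives up without storing; this is the early exit ("jmp") that the
   thread says may need the extra dbar.
   cas_word is a hypothetical helper name. */
bool cas_word(int *p, int *expected, int desired)
{
    /* Strong variant (weak = false); on failure the current value of
       *p is written back into *expected. */
    return __atomic_compare_exchange_n(p, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
}
```

Calling cas_word with a stale *expected returns false and leaves *p
unchanged; that failure return is the path that leaves the ll-sc
sequence early.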
>
>
> compare and exchange:
> ll.dbar
> <-----------------------+
> ll.ld_atomic $rd        |
> ...(jmp) ---------------+------+
> sc.dbar                 |      |
> sc.st_atomic $rd        |      |
>                         |   <--+
> ld $rj -may-emit-at-----+
>
> Jumping out of the ll-sc sequence may let the load of $rj be emitted
> between ll.dbar and ll.ld_atomic.
>
>
> Thus, exchange does not need dbar.
>
>
> >
> >
> > >
> > > Without these dbar instructions I got random test failures in the GCC
> > > libgomp test suite.
>
> Which test suite?

I mean that when we didn't use dbar 0x700 for compare-and-exchange
(during the early development stage of GCC for LoongArch), I observed
these failures.

So we do need an additional dbar for compare-and-exchange, but do not
need it for a bare atomic exchange?

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University