Subject: Re: [PATCH] LoongArch: Fix atomic_exchange make comparison and may jump out
From: Jinyang He
To: Xi Ruoyao, Chenghua Xu, Lulu Cheng
Cc: Weining Lu, Xing Li, yala, Peng Fan, gcc-patches@gcc.gnu.org, Huang Pei
References: <20221115130328.15413-1-hejinyang@loongson.cn>
 <8039c23568889fe85afbe6940ed625448cf6cd56.camel@xry111.site>
 <1dd9ace0-a83f-c530-2d65-5f762e0cc81e@loongson.cn>
 <0390618e9d9e74eb2ea22ae8a934cbc37cd483a7.camel@xry111.site>
Date: Thu, 17 Nov 2022 10:55:50 +0800

On 2022/11/17 9:39 AM, Jinyang He wrote:
> On 2022/11/16 7:46 PM, Xi Ruoyao wrote:
>
>> On Wed, 2022-11-16 at 10:11 +0800, Jinyang He wrote:
>>
>>>>> +  return "%G6\\n\\t"
>>>>> +        "1:\\n\\t"
>>>>> +        "ll.\\t%0,%1\\n\\t"
>>>>> +        "and\\t%7,%0,%z3\\n\\t"
>>>>> +        "or%i5\\t%7,%7,%5\\n\\t"
>>>>> +        "sc.\\t%7,%1\\n\\t"
>>>>> +        "beqz\\t%7,1b\\n\\t";
>>>> Do we need a "dbar 0x700" after beqz?
>>>>
>>>> /* snip */
>>> That's worth discussing. Actually I don't see any dbar hint definition
>>> like 0x700 in the manual right now.
>>> Besides, I think what should be provided here is a relaxed version. And
>>> whether the barrier exists or not depends on the specific
>>> memory_order.
>> It's not related to memory order, but for a hardware issue workaround.
>> Jiaxun told me (via LKML):
>>
>>     I had checked with Loongson guys and they confirmed that the
>>     workaround still needs to be applied to latest 3A4000 processors,
>>     including 3A4000 for MIPS and 3A5000 for LoongArch.
>>
>>     Though, the reason behind the workaround varies with the evaluation
>>     of their uArch, for GS464V based core, barrier is required as the
>>     uArch design allows regular load to be reordered after an atomic
>>     linked load, and that would break assumption of compiler atomic
>>     constraints.
>
> That certainly seems to be needed, but before or after. It's beyond my
> recognition and cc huangpei@loongson.cn for help.

Pei told me that ll-sc currently works as follows. At the uArch level,
each instruction is split into a barrier part and an access part:

  ll -> (ll.dbar ll.ld_atomic)
  sc -> (sc.dbar sc.st_atomic)

exchange:

ll.dbar <-------------------+
ll.ld_atomic $rd            |
...(no jmp)                 |
sc.dbar                     |
sc.st_atomic $rd            |
ld $rj -can-not-emit-at-----+

The load of $rj cannot be issued between ll.dbar and ll.ld_atomic,
because the sc.dbar acts as a barrier in front of it.

compare and exchange:

ll.dbar <---------------+
ll.ld_atomic $rd        |
...(jmp) ---------------+------+
sc.dbar                 |      |
sc.st_atomic $rd        |      |
                        |   <--+
ld $rj -may-emit-at-----+

Jumping out of the ll-sc sequence skips the sc.dbar, so the load of $rj
may be issued between ll.dbar and ll.ld_atomic. Thus, exchange does not
need the dbar.

>
>
>>
>> Without these dbar instructions I'd got random test failures in GCC
>> libgomp test suite.

Which test suite?

>>
>> We use a non-zero hint here because it is treated exactly same as zero
>> in 3A5000, and the future LoongArch processors can fix the issue and
>> ignore the dbar 0x700 instruction.
> Thanks, it's a nice workaround.
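
For readers following the two diagrams, here is a minimal portable C11
sketch of the control-flow difference between a sub-word exchange and a
sub-word compare-and-exchange built on a 32-bit word, using the same
mask-and-merge step as the "and"/"or" pair in the quoted pattern. It is
only a model: the helper names byte_exchange and byte_compare_exchange
are hypothetical, the CAS loop stands in for the ll/sc loop, and nothing
here says which instructions or dbar hints GCC actually emits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: exchange one byte inside an aligned 32-bit word.
   The only way out of the loop is a successful store, which models the
   "exchange" diagram (no jump out of the sequence). */
uint8_t byte_exchange(_Atomic uint32_t *word, unsigned shift, uint8_t val)
{
    assert(shift < 32 && shift % 8 == 0);
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;
    do {
        /* Clear the target byte, then merge in the new value. */
        desired = (old & ~mask) | ((uint32_t)val << shift);
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, desired,
                 memory_order_seq_cst, memory_order_relaxed));
    return (uint8_t)((old & mask) >> shift);
}

/* Hypothetical helper: compare-and-exchange one byte. The early return
   on a mismatch models the "(jmp)" edge of the second diagram: the
   function leaves the loop without ever reaching the store. */
bool byte_compare_exchange(_Atomic uint32_t *word, unsigned shift,
                           uint8_t *expected, uint8_t val)
{
    assert(shift < 32 && shift % 8 == 0);
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    for (;;) {
        uint8_t cur = (uint8_t)((old & mask) >> shift);
        if (cur != *expected) {
            *expected = cur;   /* report the value actually seen */
            return false;      /* jump out, no store performed */
        }
        uint32_t desired = (old & ~mask) | ((uint32_t)val << shift);
        if (atomic_compare_exchange_weak_explicit(
                word, &old, desired,
                memory_order_seq_cst, memory_order_relaxed))
            return true;
        /* On failure, old has been refreshed; retry with the new value. */
    }
}
```

Calling byte_exchange(&w, 8, 0xAA) on a word holding 0x11223344 returns
the previous byte 0x33 and leaves 0x1122AA44 in the word. A later
byte_compare_exchange with a wrong expected value returns false through
the early exit and leaves the word untouched.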