From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 5293C3857838 for ; Wed, 23 Nov 2022 02:12:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5293C3857838 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [120.244.241.110]) by gateway (Coremail) with SMTP id _____8AxDOuMgX1jZyYAAA--.340S3; Wed, 23 Nov 2022 10:12:29 +0800 (CST) Received: from [192.168.1.102] (unknown [120.244.241.110]) by localhost.localdomain (Coremail) with SMTP id AQAAf8BxoOKLgX1ji3YYAA--.63701S3; Wed, 23 Nov 2022 10:12:27 +0800 (CST) Content-Type: multipart/alternative; boundary="------------GXCh7IsBvluTEO7RCcEkU0ta" Message-ID: <6a16b95e-3622-5c51-bb23-2885611f86ca@loongson.cn> Date: Wed, 23 Nov 2022 10:12:27 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: [PATCH v4] LoongArch: Optimize immediate load. 
Content-Language: en-US
To: Xi Ruoyao , gcc-patches@gcc.gnu.org
Cc: i@xen0n.name, xuchenghua@loongson.cn
References: <20221117095909.2896386-1-chenglulu@loongson.cn> <32b7624b24d1f48805d4c777ebde1380fd3d1596.camel@xry111.site>
From: chenglulu
In-Reply-To: <32b7624b24d1f48805d4c777ebde1380fd3d1596.camel@xry111.site>
List-Id:

This is a multi-part message in MIME format.
--------------GXCh7IsBvluTEO7RCcEkU0ta
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2022/11/23 00:44, Xi Ruoyao wrote:
>> While I still can't fully understand the immediate load issue and how
>> this patch fix it, I've tested this patch (alongside the prefetch
>> instruction patch) with bootstrap-ubsan.  And the compiled result of
>> imm-load1.c seems OK.
> And it's doing correct thing for Glibc "improved generic string
> functions" patch, producing some really tight loop now.
>

While debugging, I found that hoisting the immediate load instructions
out of the loop is done by the loop2_invariant pass. One of the
conditions for hoisting is that the destination register cannot be used
more than once. Before the change, the sequence looked like this:

(insn 12 11 13 3 (set (reg:DI 90)
        (const_int 16842752 [0x1010000])) "test.c":13:12 discrim 1 131 {*movdi_64bit}
     (nil))
(insn 13 12 14 3 (set (reg:DI 91)
        (ior:DI (reg:DI 90)
            (const_int 257 [0x101]))) "test.c":13:12 discrim 1 88 {iordi3}
     (expr_list:REG_DEAD (reg:DI 90)
        (expr_list:REG_EQUAL (const_int 16843009 [0x1010101])
            (nil))))
(insn 14 13 15 3 (set (reg:DI 91)
        (ior:DI (zero_extend:DI (subreg:SI (reg:DI 91) 0))
            (const_int 282578783305728 [0x1010100000000]))) "test.c":13:12 discrim 1 150 {lu32i_d}
     (expr_list:REG_EQUAL (const_int 282578800148737 [0x1010101010101])
        (nil)))
(insn 15 14 17 3 (set (reg:DI 91)
        (ior:DI (and:DI (reg:DI 91)
                (const_int 4503599627370495 [0xfffffffffffff]))
            (const_int 72057594037927936 [0x100000000000000]))) "test.c":13:12 discrim 1 151 {lu52i_d}
     (expr_list:REG_EQUAL (const_int 72340172838076673 [0x101010101010101])
        (nil)))

Therefore the last two instructions (insns 14 and 15, which both read
and overwrite reg 91) do not meet the hoisting conditions.
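
To make the register reuse in the dump easier to see, here is a minimal
sketch in C of what the four insns compute. The helper names
(model_lu32i_d, model_lu52i_d, build_constant) are made up for
illustration and are not GCC functions; they only mirror the RTL
expressions shown in the dump, and the intermediate values are the
REG_EQUAL notes from it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the {lu32i_d} pattern in the dump:
   (ior (zero_extend (subreg:SI reg 0)) imm).  */
static uint64_t model_lu32i_d(uint64_t reg, uint64_t imm)
{
    return (uint64_t)(uint32_t)reg | imm;
}

/* Hypothetical model of the {lu52i_d} pattern in the dump:
   (ior (and reg 0xfffffffffffff) imm).  */
static uint64_t model_lu52i_d(uint64_t reg, uint64_t imm)
{
    return (reg & 0xfffffffffffffULL) | imm;
}

/* Replays insns 12..15 and counts how often "reg 91" is written.  */
static uint64_t build_constant(int *writes_to_reg91)
{
    uint64_t r90, r91;
    int writes = 0;

    r90 = 0x1010000ULL;                              /* insn 12 */

    r91 = r90 | 0x101ULL;                            /* insn 13 */
    writes++;
    assert(r91 == 0x1010101ULL);                     /* REG_EQUAL note */

    r91 = model_lu32i_d(r91, 0x1010100000000ULL);    /* insn 14 */
    writes++;
    assert(r91 == 0x1010101010101ULL);               /* REG_EQUAL note */

    r91 = model_lu52i_d(r91, 0x100000000000000ULL);  /* insn 15 */
    writes++;
    assert(r91 == 0x101010101010101ULL);             /* REG_EQUAL note */

    *writes_to_reg91 = writes;
    return r91;
}
```

Calling build_constant() from a small test program yields the final
constant 0x0101010101010101 with reg 91 written three times, while
reg 90 is written only once. Insns 14 and 15 each take reg 91 as both
input and output, which is the reuse described above.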

But because of how our instructions are implemented, I now defer
splitting the immediate until after loop2_invariant has run, so this
problem is avoided.

--------------GXCh7IsBvluTEO7RCcEkU0ta--