From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from loongson.cn (mail.loongson.cn [114.242.206.163])
 by sourceware.org (Postfix) with ESMTP id 7A2113858D37
 for ; Mon, 10 Oct 2022 01:39:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7A2113858D37
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn
Received: from [10.20.4.52] (unknown [10.20.4.52])
 by localhost.localdomain (Coremail) with SMTP id AQAAf8Cx72vsd0NjhK4pAA--.17296S2;
 Mon, 10 Oct 2022 09:39:56 +0800 (CST)
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
To: Richard Henderson, Xi Ruoyao, Adhemerval Zanella Netto,
 "dengjianbo@loongson.cn"
Cc: xuchenghua, "i.swmail", libc-alpha, joseph, caiyinyu
References: <403f78f0-55d9-48cf-c62a-4a0462a76987@loongson.cn>
 <2022091910031722091613@loongson.cn>
 <0172d70e-e939-31d4-bcd8-b47f274f97d9@linaro.org>
 <9cbcd3541c903aaba8038237befee5e3720d144e.camel@xry111.site>
 <1fec4245-9eb4-108d-722e-ba36a1df0023@linaro.org>
 <8411c465e01de9608633f8b1fd2d82d3ef16f001.camel@xry111.site>
 <1679af30-ee17-3016-1bd3-192f744ad8ef@linaro.org>
 <6afb2b9136ff6c96d5b729340427f59e24ebf268.camel@xry111.site>
From: Lulu Cheng
Message-ID: <0e06281a-6746-0324-24af-16e1fa3966ba@loongson.cn>
Date: Mon, 10 Oct 2022 09:39:56 +0800
User-Agent: Mozilla/5.0 (X11; Linux mips64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: multipart/mixed; boundary="------------949DD160BD626BBE857FE094"
Content-Language: en-US
X-CM-TRANSID:AQAAf8Cx72vsd0NjhK4pAA--.17296S2
X-CM-SenderInfo: xfkh0wpoxo3qxorr0wxvrqhubq/
X-Spam-Status: No, score=-14.5 required=5.0
 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_STATUS,NICE_REPLY_A,SPF_HELO_PASS,SPF_PASS,TXREP
 autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

This is a multi-part message in MIME format.
--------------949DD160BD626BBE857FE094
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2022/9/29 at 3:18 AM, Richard Henderson wrote:
> On 9/28/22 09:42, Xi Ruoyao wrote:
>>> There is nothing in string-maskoff.h that the compiler should not be
>>> able to produce itself from the generic version.  Having a brief look,
>>> the compiler simply needs to be improved to unify two current AND
>>> patterns (which is an existing bug) and add the additional case for
>>> bstrins.d.
>>
>> Added GCC LoongArch port maintainer into Cc:.
>>
>> It's actually more complicated.  Without the inline assembly in
>> repeat_bytes(), the compiler does not hoist the 4-instruction 64-bit
>> immediate load sequence out of a loop for "some reason I don't know
>> yet".
>
> Oh that's interesting.
> I suspect that adding a REG_EQUAL note for each (or perhaps just the
> last) insn emitted by loongarch_move_integer would fix that.
>
>
I have addressed this problem; the modification is in the attached patch.
However, in some cases the immediate load takes two more instructions than
the original implementation. I'm still working on that.

--------------949DD160BD626BBE857FE094
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-LoongArch-Optimize-immediate-load.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment;
 filename="0001-LoongArch-Optimize-immediate-load.patch"

>From 093865cc12334ed3f2db42e3c7b19b1d7ef4559a Mon Sep 17 00:00:00 2001
From: Lulu Cheng
Date: Sun, 9 Oct 2022 17:54:38 +0800
Subject: [PATCH] LoongArch: Optimize immediate load.

Fix the repeated immediate loads described in
https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (struct loongarch_integer_op): Add the
	member curr_value to the structure to represent the result of the
	immediate count of the load instruction at each step.
	(loongarch_build_integer): Assign a value to the member curr_value.
	(loongarch_move_integer): Optimize immediate load.
---
 gcc/config/loongarch/loongarch.cc | 57 ++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 70918d41860..38d822bcd49 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -154,7 +154,11 @@ enum loongarch_load_imm_method
 struct loongarch_integer_op
 {
   enum rtx_code code;
+  /* Current Immediate Count The immediate count of the load instruction.  */
   HOST_WIDE_INT value;
+  /* Represent the result of the immediate count of the load instruction at
+     each step.  */
+  HOST_WIDE_INT curr_value;
   enum loongarch_load_imm_method method;
 };
 
@@ -1523,24 +1527,27 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
         {
           /* The value of the lower 32 bit be loaded with one instruction.
              lu12i.w.  */
-          codes[0].code = UNKNOWN;
-          codes[0].method = METHOD_NORMAL;
-          codes[0].value = low_part;
+          codes[cost].code = UNKNOWN;
+          codes[cost].method = METHOD_NORMAL;
+          codes[cost].value = low_part;
+          codes[cost].curr_value = low_part;
           cost++;
         }
       else
         {
           /* lu12i.w + ior.  */
-          codes[0].code = UNKNOWN;
-          codes[0].method = METHOD_NORMAL;
-          codes[0].value = low_part & ~(IMM_REACH - 1);
+          codes[cost].code = UNKNOWN;
+          codes[cost].method = METHOD_NORMAL;
+          codes[cost].value = low_part & ~(IMM_REACH - 1);
+          codes[cost].curr_value = codes[cost].value;
           cost++;
           HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
           if (iorv != 0)
             {
-              codes[1].code = IOR;
-              codes[1].method = METHOD_NORMAL;
-              codes[1].value = iorv;
+              codes[cost].code = IOR;
+              codes[cost].method = METHOD_NORMAL;
+              codes[cost].value = iorv;
+              codes[cost].curr_value = low_part;
               cost++;
             }
         }
@@ -1563,11 +1570,15 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
         {
           codes[cost].method = METHOD_LU52I;
           codes[cost].value = value & LU52I_B;
+          codes[cost].curr_value = codes[cost].value | (codes[cost-1].curr_value &
+                                                        0xfffffffffffff);
           return cost + 1;
         }
 
       codes[cost].method = METHOD_LU32I;
       codes[cost].value = (value & LU32I_B) | (sign51 ? LU52I_B : 0);
+      codes[cost].curr_value = codes[cost].value | (codes[cost-1].curr_value &
+                                                    0xffffffff);
       cost++;
 
       /* Determine whether the 52-61 bits are sign-extended from the low order,
@@ -1576,6 +1587,8 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
         {
           codes[cost].method = METHOD_LU52I;
           codes[cost].value = value & LU52I_B;
+          codes[cost].curr_value = codes[cost].value | (codes[cost-1].curr_value &
+                                                        0xfffffffffffff);
           cost++;
         }
     }
@@ -2959,29 +2972,27 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
       else
         x = force_reg (mode, x);
 
+      set_unique_reg_note (get_last_insn (), REG_EQUAL, GEN_INT (codes[i-1].curr_value));
+
       switch (codes[i].method)
         {
         case METHOD_NORMAL:
+          /* mov or ior.  */
           x = gen_rtx_fmt_ee (codes[i].code, mode, x,
                               GEN_INT (codes[i].value));
           break;
         case METHOD_LU32I:
-          emit_insn (
-            gen_rtx_SET (x,
-                         gen_rtx_IOR (DImode,
-                                      gen_rtx_ZERO_EXTEND (
-                                        DImode, gen_rtx_SUBREG (SImode, x, 0)),
-                                      GEN_INT (codes[i].value))));
+          gcc_assert (mode == DImode);
+          /* lu32i_d */
+          x = gen_rtx_IOR (mode, gen_rtx_ZERO_EXTEND (mode,
+                                              gen_rtx_SUBREG (SImode, x, 0)),
+                           GEN_INT (codes[i].value));
           break;
         case METHOD_LU52I:
-          emit_insn (gen_lu52i_d (x, x, GEN_INT (0xfffffffffffff),
-                                  GEN_INT (codes[i].value)));
-          break;
-        case METHOD_INSV:
-          emit_insn (
-            gen_rtx_SET (gen_rtx_ZERO_EXTRACT (DImode, x, GEN_INT (20),
-                                               GEN_INT (32)),
-                         gen_rtx_REG (DImode, 0)));
+          gcc_assert (mode == DImode);
+          /*lu52i_d*/
+          x = gen_rtx_IOR (mode, gen_rtx_AND (mode, x, GEN_INT (0xfffffffffffff)),
+                           GEN_INT (codes[i].value));
           break;
         default:
           gcc_unreachable ();
-- 
2.31.1

--------------949DD160BD626BBE857FE094--
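
For readers following the thread, here is a minimal standalone sketch of the running value that the patch's curr_value field appears to track, and that loongarch_move_integer attaches as a REG_EQUAL note. It is not GCC code. The names sext32 and load_imm64_steps are hypothetical, and the sketch assumes the full four-instruction sequence (lu12i.w, ori, lu32i.d, lu52i.d) is always emitted, whereas loongarch_build_integer skips steps that are not needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): sign-extend the low 32 bits of x. */
static uint64_t sext32(uint64_t x)
{
    x &= 0xffffffffull;
    return (x ^ 0x80000000ull) - 0x80000000ull;
}

/* Record the register contents after each step of an assumed
   lu12i.w / ori / lu32i.d / lu52i.d sequence that loads `value`.
   curr[] plays the role of the patch's curr_value; returns the step count. */
static int load_imm64_steps(uint64_t value, uint64_t curr[4])
{
    int n = 0;

    /* lu12i.w: bits 31:12 of the constant, sign-extended to 64 bits. */
    curr[n] = sext32(value & 0xfffff000ull);
    n++;

    /* ori: OR in bits 11:0 (zero-extended immediate). */
    curr[n] = curr[n - 1] | (value & 0xfffull);
    n++;

    /* lu32i.d: bits 51:32 come from the constant, bits 63:52 are the sign
       of bit 51, and the low 32 bits of the previous result are kept. */
    uint64_t hi = value & 0x000fffff00000000ull;
    if ((value >> 51) & 1)
        hi |= 0xfff0000000000000ull;
    curr[n] = hi | (curr[n - 1] & 0xffffffffull);
    n++;

    /* lu52i.d: bits 63:52 come from the constant, the low 52 bits are kept. */
    curr[n] = (value & 0xfff0000000000000ull)
              | (curr[n - 1] & 0x000fffffffffffffull);
    n++;

    /* After the last step the register must hold the requested constant. */
    assert(curr[n - 1] == value);
    return n;
}
```

Calling load_imm64_steps with 0x0101010101010101 (the kind of constant repeat_bytes() builds) fills curr[] with the intermediate value after each instruction. Each entry is a compile-time constant, which is why it can be recorded in a note for the optimizers.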