From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine
To: Xi Ruoyao, gcc-patches@gcc.gnu.org
Cc: i@xen0n.name, xuchenghua@loongson.cn
References: <20231212064754.6623-1-xry111@xry111.site>
From: chenglulu
Date: Thu, 21 Dec 2023 20:00:28 +0800

Sorry, I've been busy with something else for the past two days.
I don't think there's anything wrong with the code, but I need to test SPEC first. :-)

On 2023/12/21 at 7:56 PM, Xi Ruoyao wrote:
> Ping :).
>
> On Tue, 2023-12-12 at 14:47 +0800, Xi Ruoyao wrote:
>> The problem with peephole2 is that it uses a naive sliding-window algorithm
>> and misses many cases.  For example:
>>
>>     float a[10000];
>>     float t() { return a[0] + a[8000]; }
>>
>> is compiled to:
>>
>>     la.local    $r13,a
>>     la.local    $r12,a+32768
>>     fld.s       $f1,$r13,0
>>     fld.s       $f0,$r12,-768
>>     fadd.s      $f0,$f1,$f0
>>
>> by trunk.  But as we've explained in r14-4851, the following would be
>> better with -mexplicit-relocs=auto:
>>
>>     pcalau12i   $r13,%pc_hi20(a)
>>     pcalau12i   $r12,%pc_hi20(a+32000)
>>     fld.s       $f1,$r13,%pc_lo12(a)
>>     fld.s       $f0,$r12,%pc_lo12(a+32000)
>>     fadd.s      $f0,$f1,$f0
>>
>> However, the sliding-window algorithm just won't detect the pcalau12i/fld
>> pair to be optimized.  Using a define_insn_and_split in the combine pass
>> works around the issue.
>>
>> gcc/ChangeLog:
>>
>> * config/loongarch/loongarch.md:
>> (simple_load): New
>> define_insn_and_split.
>> (simple_load_off): Likewise.
>> (simple_load_ext): Likewise.
>> (simple_load_offext):
>> Likewise.
>> (simple_store): Likewise.
>> (simple_store_off): Likewise.
>> (define_peephole2): Remove la.local/[f]ld peepholes.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
>> New test.
>> ---
>>
>> Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?
>>
>>  gcc/config/loongarch/loongarch.md             | 165 +++++++++---------
>>  ...explicit-relocs-auto-single-load-store-2.c |  11 ++
>>  2 files changed, 98 insertions(+), 78 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c
>>
>> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
>> index 7b26d15aa4e..4009de408fb 100644
>> --- a/gcc/config/loongarch/loongarch.md
>> +++ b/gcc/config/loongarch/loongarch.md
>> @@ -4033,101 +4033,110 @@ (define_insn "loongarch_crcc_w__w"
>>  ;;
>>  ;; And if the pseudo op cannot be relaxed, we'll get a worse result (with
>>  ;; 3 instructions).
>> -(define_peephole2
>> -  [(set (match_operand:P 0 "register_operand")
>> - (match_operand:P 1 "symbolic_pcrel_operand"))
>> -   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
>> - (mem:LD_AT_LEAST_32_BIT (match_dup 0)))]
>> -  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> -   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
>> -   && (peep2_reg_dead_p (2, operands[0]) \
>> -       || REGNO (operands[0]) == REGNO (operands[2]))"
>> -  [(set (match_dup 2)
>> - (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1))))]
>> +(define_insn_and_split "simple_load"
>> +  [(set (match_operand:LD_AT_LEAST_32_BIT 0 "register_operand" "=r,f")
>> + (mem:LD_AT_LEAST_32_BIT
>> +   (match_operand:P 1 "symbolic_pcrel_operand" "")))]
>> +  "loongarch_pre_reload_split () \
>> +   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> +   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
>> +  "#"
>> +  ""
>> +  [(set (match_dup 0)
>> + (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 2) (match_dup 1))))]
>>    {
>> -    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
>> +    operands[2] = gen_reg_rtx (Pmode);
>> +    emit_insn (gen_pcalau12i_gr (operands[2], operands[1]));
>>    })
>>
>> -(define_peephole2
>> -  [(set (match_operand:P 0 "register_operand")
>> - (match_operand:P 1 "symbolic_pcrel_operand"))
>> -   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
>> - (mem:LD_AT_LEAST_32_BIT (plus (match_dup 0)
>> - (match_operand 3 "const_int_operand"))))]
>> -  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> -   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
>> -   && (peep2_reg_dead_p (2, operands[0]) \
>> -       || REGNO (operands[0]) == REGNO (operands[2]))"
>> -  [(set (match_dup 2)
>> - (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1))))]
>> +(define_insn_and_split "simple_load_off"
>> +  [(set (match_operand:LD_AT_LEAST_32_BIT 0 "register_operand" "=r,f")
>> + (mem:LD_AT_LEAST_32_BIT
>> +   (plus (match_operand:P 1 "symbolic_pcrel_operand" "")
>> + (match_operand 2 "const_int_operand" ""))))]
>> +  "loongarch_pre_reload_split () \
>> +   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> +   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
>> +  "#"
>> +  ""
>> +  [(set (match_dup 0)
>> + (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 2) (match_dup 1))))]
>>    {
>> -    operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
>> -    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
>> +    HOST_WIDE_INT offset = INTVAL (operands[2]);
>> +    operands[2] = gen_reg_rtx (Pmode);
>> +    operands[1] = plus_constant (Pmode, operands[1], offset);
>> +    emit_insn (gen_pcalau12i_gr (operands[2], operands[1]));
>>    })
>>
>> -(define_peephole2
>> -  [(set (match_operand:P 0 "register_operand")
>> - (match_operand:P 1 "symbolic_pcrel_operand"))
>> -   (set (match_operand:GPR 2 "register_operand")
>> - (any_extend:GPR (mem:SUBDI (match_dup 0))))]
>> -  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> -   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
>> -   && (peep2_reg_dead_p (2, operands[0]) \
>> -       || REGNO (operands[0]) == REGNO (operands[2]))"
>> -  [(set (match_dup 2)
>> - (any_extend:GPR (mem:SUBDI (lo_sum:P (match_dup 0)
>> -      (match_dup 1)))))]
>> +(define_insn_and_split "simple_load_ext"
>> +  [(set (match_operand:GPR 0 "register_operand" "=r")
>> + (any_extend:GPR
>> +   (mem:SUBDI (match_operand:P 1 "symbolic_pcrel_operand" ""))))]
>> +  "loongarch_pre_reload_split () \
>> +   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> +   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
>> +  "#"
>> +  ""
>> +  [(set (match_dup 0)
>> + (any_extend:GPR
>> +   (mem:SUBDI (lo_sum:P (match_dup 2) (match_dup 1)))))]
>>    {
>> -    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
>> +    operands[2] = gen_reg_rtx (Pmode);
>> +    emit_insn (gen_pcalau12i_gr (operands[2], operands[1]));
>>    })
>>
>> -(define_peephole2
>> -  [(set (match_operand:P 0 "register_operand")
>> - (match_operand:P 1 "symbolic_pcrel_operand"))
>> -   (set (match_operand:GPR 2 "register_operand")
>> +(define_insn_and_split
>> +  "simple_load_off_ext"
>> +  [(set (match_operand:GPR 0 "register_operand" "=r")
>> + (any_extend:GPR
>> +   (mem:SUBDI
>> +     (plus (match_operand:P 1 "symbolic_pcrel_operand" "")
>> +   (match_operand 2 "const_int_operand" "")))))]
>> +  "loongarch_pre_reload_split () \
>> +   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> +   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
>> +  "#"
>> +  ""
>> +  [(set (match_dup 0)
>>   (any_extend:GPR
>> -   (mem:SUBDI (plus (match_dup 0)
>> -    (match_operand 3 "const_int_operand")))))]
>> -  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> -   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
>> -   && (peep2_reg_dead_p (2, operands[0]) \
>> -       || REGNO (operands[0]) == REGNO (operands[2]))"
>> -  [(set (match_dup 2)
>> - (any_extend:GPR (mem:SUBDI (lo_sum:P (match_dup 0)
>> -      (match_dup 1)))))]
>> +   (mem:SUBDI (lo_sum:P (match_dup 2) (match_dup 1)))))]
>>    {
>> -    operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
>> -    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
>> +    HOST_WIDE_INT offset = INTVAL (operands[2]);
>> +    operands[2] = gen_reg_rtx (Pmode);
>> +    operands[1] = plus_constant (Pmode, operands[1], offset);
>> +    emit_insn (gen_pcalau12i_gr (operands[2], operands[1]));
>>    })
>>
>> -(define_peephole2
>> -  [(set (match_operand:P 0 "register_operand")
>> - (match_operand:P 1 "symbolic_pcrel_operand"))
>> -   (set (mem:ST_ANY (match_dup 0))
>> - (match_operand:ST_ANY 2 "register_operand"))]
>> -  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> -   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
>> -   && (peep2_reg_dead_p (2, operands[0])) \
>> -   && REGNO (operands[0]) != REGNO (operands[2])"
>> -  [(set (mem:ST_ANY (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))]
>> +(define_insn_and_split "simple_store"
>> +  [(set (mem:ST_ANY (match_operand:P 0 "symbolic_pcrel_operand"))
>> + (match_operand:ST_ANY 1 "register_operand" "r,f"))]
>> +  "loongarch_pre_reload_split () \
>> +   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> +   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
>> +  "#"
>> +  ""
>> +  [(set (mem:ST_ANY (lo_sum:P (match_dup 2) (match_dup 0))) (match_dup 1))]
>>    {
>> -    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
>> +    operands[2] = gen_reg_rtx (Pmode);
>> +    emit_insn (gen_pcalau12i_gr (operands[2], operands[0]));
>>    })
>>
>> -(define_peephole2
>> -  [(set (match_operand:P 0 "register_operand")
>> - (match_operand:P 1 "symbolic_pcrel_operand"))
>> -   (set (mem:ST_ANY (plus (match_dup 0)
>> -   (match_operand 3 "const_int_operand")))
>> - (match_operand:ST_ANY 2 "register_operand"))]
>> -  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> -   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
>> -   && (peep2_reg_dead_p (2, operands[0])) \
>> -   && REGNO (operands[0]) != REGNO (operands[2])"
>> -  [(set (mem:ST_ANY (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))]
>> +(define_insn_and_split "simple_store_off"
>> +  [(set (mem:ST_ANY
>> +   (plus (match_operand:P 0 "symbolic_pcrel_operand" "")
>> + (match_operand 1 "const_int_operand" "")))
>> + (match_operand:ST_ANY 2 "register_operand" "r,f"))]
>> +  "loongarch_pre_reload_split () \
>> +   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
>> +   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
>> +  "#"
>> +  ""
>> +  [(set (mem:ST_ANY (lo_sum:P (match_dup 1) (match_dup 0))) (match_dup 2))]
>>    {
>> -    operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
>> -    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
>> +    HOST_WIDE_INT offset = INTVAL (operands[1]);
>> +    operands[1] = gen_reg_rtx (Pmode);
>> +    operands[0] = plus_constant (Pmode, operands[0], offset);
>> +    emit_insn (gen_pcalau12i_gr (operands[1], operands[0]));
>>    })
>>
>>  ;; Synchronization instructions.
>> diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c
>> new file mode 100644
>> index 00000000000..42cb966d1e0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mexplicit-relocs=auto" } */
>> +
>> +float a[8001];
>> +float
>> +t (void)
>> +{
>> +  return a[0] + a[8000];
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "la.local" } } */
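
For readers following the thread, the limitation described in the quoted patch description can be illustrated with a toy model. This is only a minimal sketch and not GCC code: `struct insn`, `count_window_pairs`, `count_def_use_pairs`, and the register numbering are hypothetical names invented for the illustration, and it assumes that the window only ever looks at two adjacent instructions. The real peephole2 and combine passes are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds: LA puts a symbol address into a register,
   LD loads through the address held in a register.  */
enum kind { LA, LD };

struct insn {
  enum kind kind;
  int dest;  /* register written by the instruction */
  int base;  /* for LD: register holding the address; -1 for LA */
};

/* The four address/load instructions from the example in the patch
   description, in the order shown there.  Register numbers 100 and up
   stand in for floating-point registers (an arbitrary choice here).  */
const struct insn example[] = {
  { LA, 13, -1 },   /* la.local $r13,a        */
  { LA, 12, -1 },   /* la.local $r12,a+32768  */
  { LD, 101, 13 },  /* fld.s    $f1,$r13,0    */
  { LD, 100, 12 },  /* fld.s    $f0,$r12,-768 */
};

/* Adjacent-only matching: an LA counts only when the very next
   instruction is an LD through the register it just set.  */
int count_window_pairs(const struct insn *insns, size_t n)
{
  assert(insns != NULL || n == 0);
  int found = 0;
  for (size_t i = 0; i + 1 < n; i++)
    if (insns[i].kind == LA && insns[i + 1].kind == LD
        && insns[i + 1].base == insns[i].dest)
      found++;
  return found;
}

/* Def-use matching: an LA counts when its result is read by exactly one
   later LD before the register is overwritten, wherever that LD sits.  */
int count_def_use_pairs(const struct insn *insns, size_t n)
{
  assert(insns != NULL || n == 0);
  int found = 0;
  for (size_t i = 0; i < n; i++) {
    if (insns[i].kind != LA)
      continue;
    int uses = 0;
    for (size_t j = i + 1; j < n; j++) {
      if (insns[j].kind == LD && insns[j].base == insns[i].dest)
        uses++;
      if (insns[j].dest == insns[i].dest)
        break;  /* the address register is redefined here */
    }
    if (uses == 1)
      found++;
  }
  return found;
}
```

On the four-instruction sequence above, `count_window_pairs (example, 4)` returns 0 because neither `la.local` is immediately followed by its own load, while `count_def_use_pairs (example, 4)` returns 2. That is the gap the patch description attributes to the sliding window.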