From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=42Pv=34=loongson.cn=chenglulu@sourceware.org>
Received: from loongson.cn (mail.loongson.cn [114.242.206.163])
	by sourceware.org (Postfix) with ESMTP id 72C5C3858D3C
	for <gcc-patches@gcc.gnu.org>; Mon, 28 Nov 2022 02:46:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 72C5C3858D3C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn
Received: from loongson.cn (unknown [10.20.4.52])
	by gateway (Coremail) with SMTP id _____8Cxruv9IIRju4sBAA--.3837S3;
	Mon, 28 Nov 2022 10:46:22 +0800 (CST)
Received: from [10.20.4.52] (unknown [10.20.4.52])
	by localhost.localdomain (Coremail) with SMTP id AQAAf8Cxn+D7IIRjU9QcAA--.8231S2;
	Mon, 28 Nov 2022 10:46:19 +0800 (CST)
Subject: Re: [pushed][PATCH v4] LoongArch: Optimize immediate load.
To: Xi Ruoyao <xry111@xry111.site>, gcc-patches@gcc.gnu.org
Cc: i@xen0n.name, xuchenghua@loongson.cn
References: <20221117095909.2896386-1-chenglulu@loongson.cn>
 <f466da699ce6de53ff66965fcecfc938e9f2b2d7.camel@xry111.site>
 <32b7624b24d1f48805d4c777ebde1380fd3d1596.camel@xry111.site>
From: Lulu Cheng <chenglulu@loongson.cn>
Message-ID: <85e70380-399e-41dc-127f-bd442d371e85@loongson.cn>
Date: Mon, 28 Nov 2022 10:46:19 +0800
User-Agent: Mozilla/5.0 (X11; Linux mips64; rv:68.0) Gecko/20100101
 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To: <32b7624b24d1f48805d4c777ebde1380fd3d1596.camel@xry111.site>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
List-Id: <gcc-patches.gcc.gnu.org>

Pushed r13-4315.

On 2022/11/23 at 12:44 AM, Xi Ruoyao wrote:
> On Tue, 2022-11-22 at 22:03 +0800, Xi Ruoyao via Gcc-patches wrote:
>> While I still can't fully understand the immediate load issue and how
>> this patch fixes it, I've tested this patch (alongside the prefetch
>> instruction patch) with bootstrap-ubsan, and the compiled result of
>> imm-load1.c seems OK.
> And it's doing the right thing for the Glibc "improved generic string
> functions" patch, producing some really tight loops now.
>
>> On Thu, 2022-11-17 at 17:59 +0800, Lulu Cheng wrote:
>>> v1 -> v2:
>>> 1. Change the code format.
>>> 2. Fix bugs in the code.
>>>
>>> v2 -> v3:
>>> Rewrite a piece of code that relied on undefined behavior.
>>>
>>> v3 -> v4:
>>> Move the immediate decomposition from the expand pass to the split
>>> pass.
>>>
>>> Both the regression tests and SPEC CPU2006 passed.
>>>
>>> The problem reported at the link below was that the four immediate
>>> load instructions were not moved out of the loop. With this patch, as
>>> in the test case, the four immediate load instructions are generated
>>> outside the loop.
>>> (
>>> https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html
>>> )
>>>
>>> --------------------------------------------------------------------
>>> The loop2_invariant pass hoists instructions whose results do not
>>> change inside a loop out of that loop. If the immediate is decomposed
>>> during the expand pass, some of the resulting instructions no longer
>>> meet the conditions for hoisting, so the immediate decomposition is
>>> moved to the split pass.
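
To make "immediate decomposition" concrete, here is a minimal C sketch of
how a 64-bit constant can be rebuilt from four pieces. The helper names
(sign_extend, lu12i_w, ori, lu32i_d, lu52i_d, load_imm64) are made up for
this illustration and only model the LoongArch instructions of the same
names as they are commonly described; this is not code from the patch or
from GCC, and it always uses all four steps even when fewer would do.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of X to 64 bits (0 < BITS < 64).  */
static uint64_t
sign_extend (uint64_t x, unsigned bits)
{
  uint64_t sign = UINT64_C (1) << (bits - 1);
  uint64_t mask = (sign << 1) - 1;
  return ((x & mask) ^ sign) - sign;
}

/* Model of lu12i.w: place SI20 in bits 12..31, sign-extend from bit 31.  */
uint64_t
lu12i_w (uint32_t si20)
{
  assert (si20 < (UINT32_C (1) << 20));
  return sign_extend ((uint64_t) si20 << 12, 32);
}

/* Model of ori: OR a zero-extended 12-bit immediate into RJ.  */
uint64_t
ori (uint64_t rj, uint32_t ui12)
{
  assert (ui12 < (UINT32_C (1) << 12));
  return rj | ui12;
}

/* Model of lu32i.d: keep bits 0..31 of RD, set bits 32..51 from SI20,
   and fill bits 52..63 with copies of bit 19 of SI20.  */
uint64_t
lu32i_d (uint64_t rd, uint32_t si20)
{
  return (sign_extend (si20, 20) << 32) | (rd & UINT64_C (0xffffffff));
}

/* Model of lu52i.d: keep bits 0..51 of RJ, set bits 52..63 from SI12.  */
uint64_t
lu52i_d (uint64_t rj, uint32_t si12)
{
  return ((uint64_t) (si12 & 0xfff) << 52) | (rj & UINT64_C (0xfffffffffffff));
}

/* Rebuild V from its four chunks, lowest bits first.  */
uint64_t
load_imm64 (uint64_t v)
{
  uint64_t r = lu12i_w ((uint32_t) ((v >> 12) & 0xfffff));
  r = ori (r, (uint32_t) (v & 0xfff));
  r = lu32i_d (r, (uint32_t) ((v >> 32) & 0xfffff));
  r = lu52i_d (r, (uint32_t) ((v >> 52) & 0xfff));
  return r;
}
```

Each call in load_imm64 depends on the result of the previous one, which
is why a constant split this way early shows up as a chain of dependent
instructions rather than as a single constant move.
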
>>>
>>> gcc/ChangeLog:
>>>
>>>          * config/loongarch/loongarch.cc (enum loongarch_load_imm_method):
>>>          Remove the member METHOD_INSV that is not currently used.
>>>          (struct loongarch_integer_op): Define a new member curr_value,
>>>          that records the value of the number stored in the destination
>>>          register immediately after the current instruction has run.
>>>          (loongarch_build_integer): Assign a value to the curr_value
>>>          member variable.
>>>          (loongarch_move_integer): Add information for the immediate
>>>          load instruction.
>>>          * config/loongarch/loongarch.md (*movdi_32bit): Redefine as
>>>          define_insn_and_split.
>>>          (*movdi_64bit): Likewise.
>>>          (*movsi_internal): Likewise.
>>>          (*movhi_internal): Likewise.
>>>          * config/loongarch/predicates.md (move_operand): Return true
>>>          for any CONST_INT, to ensure that the immediate number is not
>>>          decomposed during the expand pass.
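
As a rough illustration of what curr_value records, the sketch below
plans the load steps for a constant and stores the register contents
after each step. The names step_kind, imm_step and plan_imm_load are
hypothetical and the logic is a simplification written for this note; it
is not the actual loongarch_build_integer and may choose steps
differently from GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct loongarch_integer_op.  */
enum step_kind
{
  STEP_SINGLE,   /* One instruction with a 12-bit immediate.  */
  STEP_LU12I_W,  /* Bits 12..31.  */
  STEP_ORI,      /* Bits 0..11.  */
  STEP_LU32I_D,  /* Bits 32..51.  */
  STEP_LU52I_D   /* Bits 52..63.  */
};

struct imm_step
{
  enum step_kind kind;
  uint64_t value;      /* Bits contributed by this step.  */
  uint64_t curr_value; /* Register contents right after this step.  */
};

#define LOW12  UINT64_C (0x0000000000000fff)
#define MID20  UINT64_C (0x000fffff00000000)
#define HIGH12 UINT64_C (0xfff0000000000000)

/* Fill STEPS (room for 4 entries) and return the number of steps.  */
size_t
plan_imm_load (uint64_t value, struct imm_step *steps)
{
  size_t n = 0;
  /* Low 32 bits of VALUE, sign-extended to 64 bits.  */
  uint64_t low = ((value & UINT64_C (0xffffffff)) ^ UINT64_C (0x80000000))
                 - UINT64_C (0x80000000);

  if (low + 2048 < 4096 || low < 4096)
    /* Fits a signed or an unsigned 12-bit immediate.  */
    steps[n++] = (struct imm_step) { STEP_SINGLE, low, low };
  else
    {
      steps[n++] = (struct imm_step) { STEP_LU12I_W, low & ~LOW12,
                                       low & ~LOW12 };
      if (low & LOW12)
        steps[n++] = (struct imm_step) { STEP_ORI, low & LOW12, low };
    }

  uint64_t curr = low;
  if ((curr ^ value) & MID20)
    {
      /* Replace bits 32..51; bits 52..63 become copies of bit 51.  */
      uint64_t sign51 = UINT64_C (1) << 51;
      curr = ((value & ~HIGH12) ^ sign51) - sign51;
      steps[n++] = (struct imm_step) { STEP_LU32I_D, value & MID20, curr };
    }
  if (curr != value)
    {
      /* Replace bits 52..63 and keep the low 52 bits.  */
      curr = (value & HIGH12) | (curr & ~HIGH12);
      steps[n++] = (struct imm_step) { STEP_LU52I_D, value & HIGH12, curr };
    }

  /* The last recorded curr_value is always the full constant.  */
  assert (n <= 4 && curr == value);
  return n;
}
```

For 0x0101010101010101, the constant used in imm-load1.c, this sketch
produces four steps, and the curr_value of the last step is the full
constant.
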
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>          * gcc.target/loongarch/imm-load.c: New test.
>>>          * gcc.target/loongarch/imm-load1.c: New test.
>>> ---
>>>   gcc/config/loongarch/loongarch.cc             | 62 ++++++++++---------
>>>   gcc/config/loongarch/loongarch.md             | 44 +++++++++++--
>>>   gcc/config/loongarch/predicates.md            |  2 +-
>>>   gcc/testsuite/gcc.target/loongarch/imm-load.c | 10 +++
>>>   .../gcc.target/loongarch/imm-load1.c          | 26 ++++++++
>>>   5 files changed, 110 insertions(+), 34 deletions(-)
>>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c
>>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load1.c
>>>
>>> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
>>> index 8ee32c90573..9e0d6c7c3ea 100644
>>> --- a/gcc/config/loongarch/loongarch.cc
>>> +++ b/gcc/config/loongarch/loongarch.cc
>>> @@ -139,22 +139,21 @@ struct loongarch_address_info
>>>   
>>>      METHOD_LU52I:
>>>        Load 52-63 bit of the immediate number.
>>> -
>>> -   METHOD_INSV:
>>> -     immediate like 0xfff00000fffffxxx
>>> -   */
>>> +*/
>>>   enum loongarch_load_imm_method
>>>   {
>>>     METHOD_NORMAL,
>>>     METHOD_LU32I,
>>> -  METHOD_LU52I,
>>> -  METHOD_INSV
>>> +  METHOD_LU52I
>>>   };
>>>   
>>>   struct loongarch_integer_op
>>>   {
>>>     enum rtx_code code;
>>>     HOST_WIDE_INT value;
>>> +  /* Represent the result of the immediate count of the load instruction at
>>> +     each step.  */
>>> +  HOST_WIDE_INT curr_value;
>>>     enum loongarch_load_imm_method method;
>>>   };
>>>   
>>> @@ -1475,24 +1474,27 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
>>>       {
>>>         /* The value of the lower 32 bit be loaded with one instruction.
>>>           lu12i.w.  */
>>> -      codes[0].code = UNKNOWN;
>>> -      codes[0].method = METHOD_NORMAL;
>>> -      codes[0].value = low_part;
>>> +      codes[cost].code = UNKNOWN;
>>> +      codes[cost].method = METHOD_NORMAL;
>>> +      codes[cost].value = low_part;
>>> +      codes[cost].curr_value = low_part;
>>>         cost++;
>>>       }
>>>     else
>>>       {
>>>         /* lu12i.w + ior.  */
>>> -      codes[0].code = UNKNOWN;
>>> -      codes[0].method = METHOD_NORMAL;
>>> -      codes[0].value = low_part & ~(IMM_REACH - 1);
>>> +      codes[cost].code = UNKNOWN;
>>> +      codes[cost].method = METHOD_NORMAL;
>>> +      codes[cost].value = low_part & ~(IMM_REACH - 1);
>>> +      codes[cost].curr_value = codes[cost].value;
>>>         cost++;
>>>         HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
>>>         if (iorv != 0)
>>>          {
>>> -         codes[1].code = IOR;
>>> -         codes[1].method = METHOD_NORMAL;
>>> -         codes[1].value = iorv;
>>> +         codes[cost].code = IOR;
>>> +         codes[cost].method = METHOD_NORMAL;
>>> +         codes[cost].value = iorv;
>>> +         codes[cost].curr_value = low_part;
>>>            cost++;
>>>          }
>>>       }
>>> @@ -1515,11 +1517,14 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
>>>          {
>>>            codes[cost].method = METHOD_LU52I;
>>>            codes[cost].value = value & LU52I_B;
>>> +         codes[cost].curr_value = value;
>>>            return cost + 1;
>>>          }
>>>   
>>>         codes[cost].method = METHOD_LU32I;
>>>         codes[cost].value = (value & LU32I_B) | (sign51 ? LU52I_B : 0);
>>> +      codes[cost].curr_value = (value & 0xfffffffffffff)
>>> +       | (sign51 ? LU52I_B : 0);
>>>         cost++;
>>>   
>>>         /* Determine whether the 52-61 bits are sign-extended from the low order,
>>> @@ -1528,6 +1533,7 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
>>>          {
>>>            codes[cost].method = METHOD_LU52I;
>>>            codes[cost].value = value & LU52I_B;
>>> +         codes[cost].curr_value = value;
>>>            cost++;
>>>          }
>>>       }
>>> @@ -2911,6 +2917,9 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
>>>         else
>>>          x = force_reg (mode, x);
>>>   
>>> +      set_unique_reg_note (get_last_insn (), REG_EQUAL,
>>> +                          GEN_INT (codes[i-1].curr_value));
>>> +
>>>         switch (codes[i].method)
>>>          {
>>>          case METHOD_NORMAL:
>>> @@ -2918,22 +2927,17 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
>>>                                GEN_INT (codes[i].value));
>>>            break;
>>>          case METHOD_LU32I:
>>> -         emit_insn (
>>> -           gen_rtx_SET (x,
>>> -                        gen_rtx_IOR (DImode,
>>> -                                     gen_rtx_ZERO_EXTEND (
>>> -                                       DImode, gen_rtx_SUBREG (SImode, x, 0)),
>>> -                                     GEN_INT (codes[i].value))));
>>> +         gcc_assert (mode == DImode);
>>> +         x = gen_rtx_IOR (DImode,
>>> +                          gen_rtx_ZERO_EXTEND (DImode,
>>> +                                               gen_rtx_SUBREG (SImode, x, 0)),
>>> +                          GEN_INT (codes[i].value));
>>>            break;
>>>          case METHOD_LU52I:
>>> -         emit_insn (gen_lu52i_d (x, x, GEN_INT (0xfffffffffffff),
>>> -                                 GEN_INT (codes[i].value)));
>>> -         break;
>>> -       case METHOD_INSV:
>>> -         emit_insn (
>>> -           gen_rtx_SET (gen_rtx_ZERO_EXTRACT (DImode, x, GEN_INT (20),
>>> -                                              GEN_INT (32)),
>>> -                        gen_rtx_REG (DImode, 0)));
>>> +         gcc_assert (mode == DImode);
>>> +         x = gen_rtx_IOR (DImode,
>>> +                          gen_rtx_AND (DImode, x, GEN_INT (0xfffffffffffff)),
>>> +                          GEN_INT (codes[i].value));
>>>            break;
>>>          default:
>>>            gcc_unreachable ();
>>> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
>>> index 2fda5381904..f61db66d535 100644
>>> --- a/gcc/config/loongarch/loongarch.md
>>> +++ b/gcc/config/loongarch/loongarch.md
>>> @@ -1718,23 +1718,41 @@ (define_expand "movdi"
>>>       DONE;
>>>   })
>>>   
>>> -(define_insn "*movdi_32bit"
>>> +(define_insn_and_split "*movdi_32bit"
>>>     [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m")
>>>          (match_operand:DI 1 "move_operand" "r,i,w,r,*J*r,*m,*f,*f"))]
>>>     "!TARGET_64BIT
>>>      && (register_operand (operands[0], DImode)
>>>          || reg_or_0_operand (operands[1], DImode))"
>>>     { return loongarch_output_move (operands[0], operands[1]); }
>>> +  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
>>> +  (operands[0]))"
>>> +  [(const_int 0)]
>>> +  "
>>> +{
>>> +  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
>>> +  DONE;
>>> +}
>>> +  "
>>>     [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore")
>>>      (set_attr "mode" "DI")])
>>>   
>>> -(define_insn "*movdi_64bit"
>>> +(define_insn_and_split "*movdi_64bit"
>>>     [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m")
>>>          (match_operand:DI 1 "move_operand" "r,Yd,w,rJ,*r*J,*m,*f,*f"))]
>>>     "TARGET_64BIT
>>>      && (register_operand (operands[0], DImode)
>>>          || reg_or_0_operand (operands[1], DImode))"
>>>     { return loongarch_output_move (operands[0], operands[1]); }
>>> +  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
>>> +  (operands[0]))"
>>> +  [(const_int 0)]
>>> +  "
>>> +{
>>> +  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
>>> +  DONE;
>>> +}
>>> +  "
>>>     [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore")
>>>      (set_attr "mode" "DI")])
>>>   
>>> @@ -1749,12 +1767,21 @@ (define_expand "movsi"
>>>       DONE;
>>>   })
>>>   
>>> -(define_insn "*movsi_internal"
>>> +(define_insn_and_split "*movsi_internal"
>>>     [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m,*r,*z")
>>>          (match_operand:SI 1 "move_operand" "r,Yd,w,rJ,*r*J,*m,*f,*f,*z,*r"))]
>>>     "(register_operand (operands[0], SImode)
>>>       || reg_or_0_operand (operands[1], SImode))"
>>>     { return loongarch_output_move (operands[0], operands[1]); }
>>> +  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
>>> +  (operands[0]))"
>>> +  [(const_int 0)]
>>> +  "
>>> +{
>>> +  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
>>> +  DONE;
>>> +}
>>> +  "
>>>     [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore,mftg,mgtf")
>>>      (set_attr "mode" "SI")])
>>>   
>>> @@ -1774,12 +1801,21 @@ (define_expand "movhi"
>>>       DONE;
>>>   })
>>>   
>>> -(define_insn "*movhi_internal"
>>> +(define_insn_and_split "*movhi_internal"
>>>     [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,r,m,r,k")
>>>          (match_operand:HI 1 "move_operand" "r,Yd,I,m,rJ,k,rJ"))]
>>>     "(register_operand (operands[0], HImode)
>>>          || reg_or_0_operand (operands[1], HImode))"
>>>     { return loongarch_output_move (operands[0], operands[1]); }
>>> +  "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
>>> +  (operands[0]))"
>>> +  [(const_int 0)]
>>> +  "
>>> +{
>>> +  loongarch_move_integer (operands[0], operands[0], INTVAL (operands[1]));
>>> +  DONE;
>>> +}
>>> +  "
>>>     [(set_attr "move_type" "move,const,const,load,store,load,store")
>>>      (set_attr "mode" "HI")])
>>>   
>>> diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
>>> index 8bd0c1376c9..58c3dc2261c 100644
>>> --- a/gcc/config/loongarch/predicates.md
>>> +++ b/gcc/config/loongarch/predicates.md
>>> @@ -226,7 +226,7 @@ (define_predicate "move_operand"
>>>     switch (GET_CODE (op))
>>>       {
>>>       case CONST_INT:
>>> -      return !splittable_const_int_operand (op, mode);
>>> +      return true;
>>>   
>>>       case CONST:
>>>       case SYMBOL_REF:
>>> diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load.c b/gcc/testsuite/gcc.target/loongarch/imm-load.c
>>> new file mode 100644
>>> index 00000000000..c04ca33996f
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/loongarch/imm-load.c
>>> @@ -0,0 +1,10 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mabi=lp64d -O2 -fdump-rtl-split1" } */
>>> +
>>> +long int
>>> +test (void)
>>> +{
>>> +  return 0x1234567890abcdef;
>>> +}
>>> +/* { dg-final { scan-rtl-dump-times "scanning new insn with uid" 6 "split1" } } */
>>> +
>>> diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load1.c b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
>>> new file mode 100644
>>> index 00000000000..2ff02971239
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
>>> @@ -0,0 +1,26 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mabi=lp64d -O2" } */
>>> +/* { dg-final { scan-assembler "test:.*lu52i\.d.*\n\taddi\.w.*\n\.L2:" } } */
>>> +
>>> +
>>> +extern long long b[10];
>>> +static inline long long
>>> +repeat_bytes (void)
>>> +{
>>> +  long long r = 0x0101010101010101;
>>> +
>>> +  return r;
>>> +}
>>> +
>>> +static inline long long
>>> +highbit_mask (long long m)
>>> +{
>>> +  return m & repeat_bytes ();
>>> +}
>>> +
>>> +void test(long long *a)
>>> +{
>>> +  for (int i = 0; i < 10; i++)
>>> +    b[i] = highbit_mask (a[i]);
>>> +
>>> +}