public inbox for gcc-patches@gcc.gnu.org
* [PATCH v3] LoongArch: Optimize immediate load.
@ 2022-11-01 12:04 Lulu Cheng
  2022-11-04  2:22 ` Xi Ruoyao
  0 siblings, 1 reply; 5+ messages in thread
From: Lulu Cheng @ 2022-11-01 12:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, xuchenghua, Lulu Cheng

v1 -> v2:
1. Change the code format.
2. Fix bugs in the code.

v2 -> v3:
Modify a code implementation that relied on undefined behavior.

Both regression tests and spec2006 passed.

The problem reported at the link below is that the four immediate load
instructions are not moved out of the loop. This patch fixes that: as in the
test case, the four immediate load instructions are now generated outside the
loop.
(https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html)
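
For readers unfamiliar with the sequence, the following is a minimal sketch of
how the worst-case four-instruction load (lu12i.w, ori, lu32i.d, lu52i.d)
builds a 64-bit constant. It models the instruction semantics in plain C; the
helper names (sext, lu12i_w, ori, lu32i_d, lu52i_d, load_imm64) are
illustrative and are not part of the patch or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of V to 64 bits (illustrative helper).  */
static uint64_t sext(uint64_t v, unsigned bits)
{
    uint64_t sign = UINT64_C(1) << (bits - 1);
    v &= (sign << 1) - 1;
    return (v ^ sign) - sign;
}

/* lu12i.w: place a 20-bit immediate in bits 12-31, sign-extend from bit 31.  */
static uint64_t lu12i_w(uint32_t si20)
{
    return sext((uint64_t)(si20 & 0xfffff) << 12, 32);
}

/* ori: OR a zero-extended 12-bit immediate into the register.  */
static uint64_t ori(uint64_t rj, uint32_t ui12)
{
    return rj | (ui12 & 0xfff);
}

/* lu32i.d: keep bits 0-31, replace bits 32-63 with a sign-extended
   20-bit immediate.  */
static uint64_t lu32i_d(uint64_t rd, uint32_t si20)
{
    return (sext(si20, 20) << 32) | (rd & UINT64_C(0xffffffff));
}

/* lu52i.d: keep bits 0-51, replace bits 52-63 with a 12-bit immediate.  */
static uint64_t lu52i_d(uint64_t rj, uint32_t si12)
{
    return ((uint64_t)(si12 & 0xfff) << 52) | (rj & UINT64_C(0xfffffffffffff));
}

/* Worst-case four-instruction sequence for an arbitrary 64-bit constant.  */
static uint64_t load_imm64(uint64_t value)
{
    uint64_t r = lu12i_w((uint32_t)((value >> 12) & 0xfffff));
    r = ori(r, (uint32_t)(value & 0xfff));
    r = lu32i_d(r, (uint32_t)((value >> 32) & 0xfffff));
    r = lu52i_d(r, (uint32_t)((value >> 52) & 0xfff));
    return r;
}

/* Self-check using the constant from the new test case.  */
void check_load_imm64(void)
{
    assert(load_imm64(UINT64_C(0x0101010101010101))
           == UINT64_C(0x0101010101010101));
}
```

Calling check_load_imm64 () confirms that the four steps reproduce the
constant used in imm-load.c. The sketch always emits all four steps, whereas
loongarch_build_integer skips the ones it does not need.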



--------------------------------------------------------------------
Fix an issue where the compiler would not move the four instructions that
load a 64-bit immediate out of the loop.

gcc/ChangeLog:

	* config/loongarch/constraints.md (x): New constraint.
	* config/loongarch/loongarch.cc (enum loongarch_load_imm_method):
	Add METHOD_LD_HI32 to load bits 32 to 63 of the immediate.
	(struct loongarch_integer_op): Define a new member curr_value
	that records the value held in the destination register
	immediately after the current instruction has run.
	(LARCH_MAX_INTEGER_OPS): Define this macro as 3.
	(LU32I_B): Move to loongarch.h.
	(LU52I_B): Likewise.
	(loongarch_build_integer): Add a method to load bits 32 to 63
	of the immediate.
	(loongarch_move_integer): Likewise.
	(loongarch_print_operand_reloc): Update comment.
	* config/loongarch/loongarch.h (LU32I_B): Move from loongarch.cc.
	(LU52I_B): Likewise.
	(HWIT_UC_0xFFFFFFFF): New macro.
	(HI32_OPERAND): New macro.
	* config/loongarch/loongarch.md (load_hi32): New template.
	* config/loongarch/predicates.md (const_hi32_operand): New
	predicate matching an immediate whose low 32 bits are zero.
	(hi32_mask_operand): New predicate matching the mask 0xffffffff
	used when loading bits 32 to 63.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/imm-load.c: New test.
---
 gcc/config/loongarch/constraints.md           |   7 +-
 gcc/config/loongarch/loongarch.cc             | 105 +++++++++++-------
 gcc/config/loongarch/loongarch.h              |   9 ++
 gcc/config/loongarch/loongarch.md             |  34 ++++++
 gcc/config/loongarch/predicates.md            |   8 ++
 gcc/testsuite/gcc.target/loongarch/imm-load.c |  25 +++++
 6 files changed, 148 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index 43cb7b5f0f5..1dcf09ce5eb 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -46,7 +46,7 @@
 ;; "u" "A signed 52bit constant and low 32-bit is zero (for logic instructions)"
 ;; "v" "A signed 64-bit constant and low 44-bit is zero (for logic instructions)."
 ;; "w" "Matches any valid memory."
-;; "x" <-----unused
+;; "x" "A signed 64-bit constant and low 32-bit is zero (for logic instructions)."
 ;; "y" <-----unused
 ;; "z" FCC_REGS
 ;; "A" <-----unused
@@ -139,6 +139,11 @@ (define_constraint "v"
   (and (match_code "const_int")
        (match_test "LU52I_OPERAND (ival)")))
 
+(define_constraint "x"
+  "A signed 64-bit constant and low 32-bit is zero (for logic instructions)."
+  (and (match_code "const_int")
+       (match_test "HI32_OPERAND (ival)")))
+
 (define_register_constraint "z" "FCC_REGS"
   "A floating-point condition code register.")
 
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index f54c233f90c..28c05c2a193 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -139,6 +139,9 @@ struct loongarch_address_info
    METHOD_LU52I:
      Load 52-63 bit of the immediate number.
 
+   METHOD_LD_HI32:
+     Load 32-63 bit of the immediate number.
+
    METHOD_INSV:
      immediate like 0xfff00000fffffxxx
    */
@@ -147,20 +150,26 @@ enum loongarch_load_imm_method
   METHOD_NORMAL,
   METHOD_LU32I,
   METHOD_LU52I,
+  METHOD_LD_HI32,
   METHOD_INSV
 };
 
 struct loongarch_integer_op
 {
   enum rtx_code code;
+  /* The immediate loaded by the current instruction.  */
   HOST_WIDE_INT value;
+  /* The value held in the destination register after this step of the
+     load sequence.  */
+  HOST_WIDE_INT curr_value;
   enum loongarch_load_imm_method method;
 };
 
 /* The largest number of operations needed to load an integer constant.
-   The worst accepted case for 64-bit constants is LU12I.W,LU32I.D,LU52I.D,ORI
-   or LU12I.W,LU32I.D,LU52I.D,ADDI.D DECL_ASSEMBLER_NAME.  */
-#define LARCH_MAX_INTEGER_OPS 4
+   The worst accepted case for 64-bit constants is LU12I.W,
+   LOAD_HI32(LU32I.D,LU52I.D),ORI or LU12I.W,LOAD_HI32(LU32I.D,LU52I.D),
+   ADDI.D DECL_ASSEMBLER_NAME.  */
+#define LARCH_MAX_INTEGER_OPS 3
 
 /* Arrays that map GCC register numbers to debugger register numbers.  */
 int loongarch_dwarf_regno[FIRST_PSEUDO_REGISTER];
@@ -1454,9 +1463,6 @@ loongarch_expand_epilogue (bool sibcall_p)
     emit_jump_insn (gen_simple_return_internal (ra));
 }
 
-#define LU32I_B (0xfffffULL << 32)
-#define LU52I_B (0xfffULL << 52)
-
 /* Fill CODES with a sequence of rtl operations to load VALUE.
    Return the number of operations needed.  */
 
@@ -1474,24 +1480,27 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
     {
       /* The value of the lower 32 bit be loaded with one instruction.
 	 lu12i.w.  */
-      codes[0].code = UNKNOWN;
-      codes[0].method = METHOD_NORMAL;
-      codes[0].value = low_part;
+      codes[cost].code = UNKNOWN;
+      codes[cost].method = METHOD_NORMAL;
+      codes[cost].value = low_part;
+      codes[cost].curr_value = low_part;
       cost++;
     }
   else
     {
       /* lu12i.w + ior.  */
-      codes[0].code = UNKNOWN;
-      codes[0].method = METHOD_NORMAL;
-      codes[0].value = low_part & ~(IMM_REACH - 1);
+      codes[cost].code = UNKNOWN;
+      codes[cost].method = METHOD_NORMAL;
+      codes[cost].value = low_part & ~(IMM_REACH - 1);
+      codes[cost].curr_value = codes[cost].value;
       cost++;
       HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
       if (iorv != 0)
 	{
-	  codes[1].code = IOR;
-	  codes[1].method = METHOD_NORMAL;
-	  codes[1].value = iorv;
+	  codes[cost].code = IOR;
+	  codes[cost].method = METHOD_NORMAL;
+	  codes[cost].value = iorv;
+	  codes[cost].curr_value = low_part;
 	  cost++;
 	}
     }
@@ -1514,23 +1523,34 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
 	{
 	  codes[cost].method = METHOD_LU52I;
 	  codes[cost].value = value & LU52I_B;
+	  codes[cost].curr_value = codes[cost].value
+	    | (codes[cost-1].curr_value & 0xfffffffffffff);
 	  return cost + 1;
 	}
 
-      codes[cost].method = METHOD_LU32I;
-      codes[cost].value = (value & LU32I_B) | (sign51 ? LU52I_B : 0);
-      cost++;
-
-      /* Determine whether the 52-61 bits are sign-extended from the low order,
-	 and if not, load the 52-61 bits.  */
-      if (!lu52i[(value & (HOST_WIDE_INT_1U << 51)) >> 51])
+      if (lu52i[sign51])
 	{
-	  codes[cost].method = METHOD_LU52I;
-	  codes[cost].value = value & LU52I_B;
+	  /* Determine whether the 52-63 bits are sign-extended from the low
+	     order.  If so, the 52-63 bits of the immediate number do not need
+	     to be loaded.  */
+	  codes[cost].method = METHOD_LU32I;
+	  codes[cost].value = (value & LU32I_B) | (sign51 ? LU52I_B : 0);
+	  codes[cost].curr_value = codes[cost].value
+	    | (codes[cost-1].curr_value & 0xffffffff);
+	  cost++;
+	}
+      else
+	{
+	  /* If the high 32 bits of the 64-bit immediate need two separate
+	     instructions to load, the pseudo-instruction load_hi32 is used
+	     to load them.  */
+	  codes[cost].method = METHOD_LD_HI32;
+	  codes[cost].value = value & 0xffffffff00000000;
+	  codes[cost].curr_value = codes[cost].value
+	    | (codes[cost-1].curr_value & 0xffffffff);
 	  cost++;
 	}
     }
-
   gcc_assert (cost <= LARCH_MAX_INTEGER_OPS);
 
   return cost;
@@ -2910,29 +2930,36 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
       else
 	x = force_reg (mode, x);
 
+      set_unique_reg_note (get_last_insn (), REG_EQUAL,
+			   GEN_INT (codes[i-1].curr_value));
+
       switch (codes[i].method)
 	{
 	case METHOD_NORMAL:
+	  /* mov or ior.  */
 	  x = gen_rtx_fmt_ee (codes[i].code, mode, x,
 			      GEN_INT (codes[i].value));
 	  break;
 	case METHOD_LU32I:
-	  emit_insn (
-	    gen_rtx_SET (x,
-			 gen_rtx_IOR (DImode,
-				      gen_rtx_ZERO_EXTEND (
-					DImode, gen_rtx_SUBREG (SImode, x, 0)),
-				      GEN_INT (codes[i].value))));
+	  gcc_assert (mode == DImode);
+	  /* lu32i_d */
+	  x = gen_rtx_IOR (mode, gen_rtx_ZERO_EXTEND (mode,
+						gen_rtx_SUBREG (SImode, x, 0)),
+			   GEN_INT (codes[i].value));
 	  break;
 	case METHOD_LU52I:
-	  emit_insn (gen_lu52i_d (x, x, GEN_INT (0xfffffffffffff),
-				  GEN_INT (codes[i].value)));
+	  gcc_assert (mode == DImode);
+	  /* lu52i_d */
+	  x = gen_rtx_IOR (mode, gen_rtx_AND (mode, x,
+					      GEN_INT (0xfffffffffffff)),
+			   GEN_INT (codes[i].value));
 	  break;
-	case METHOD_INSV:
-	  emit_insn (
-	    gen_rtx_SET (gen_rtx_ZERO_EXTRACT (DImode, x, GEN_INT (20),
-					       GEN_INT (32)),
-			 gen_rtx_REG (DImode, 0)));
+	case METHOD_LD_HI32:
+	  /* Load the high 32 bits of the immediate number.  */
+	  gcc_assert (mode == DImode);
+	  /* load_hi32 */
+	  x = gen_rtx_IOR (mode, gen_rtx_AND (mode, x, GEN_INT (0xffffffff)),
+			   GEN_INT (codes[i].value));
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -4890,7 +4917,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
    'd'	Print CONST_INT OP in decimal.
    'F'	Print the FPU branch condition for comparison OP.
    'G'	Print a DBAR insn if the memory model requires a release.
-   'H'  Print address 52-61bit relocation associated with OP.
+   'H'  Print address 52-63bit relocation associated with OP.
    'h'  Print the high-part relocation associated with OP.
    'i'	Print i if the operand is not a register.
    'L'  Print the low-part relocation associated with OP.
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f4a9c329fef..9190591e1a1 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -564,6 +564,9 @@ enum reg_class
 #define IMM_REACH (HOST_WIDE_INT_1 << IMM_BITS)
 #define HWIT_1U HOST_WIDE_INT_1U
 
+#define LU32I_B (0xfffffULL << 32)
+#define LU52I_B (0xfffULL << 52)
+
 /* True if VALUE is an unsigned 6-bit number.  */
 
 #define UIMM6_OPERAND(VALUE) (((VALUE) & ~(unsigned HOST_WIDE_INT) 0x3f) == 0)
@@ -605,6 +608,12 @@ enum reg_class
 #define LU52I_OPERAND(VALUE) \
   (((VALUE) | (HWIT_UC_0xFFF << 52)) == (HWIT_UC_0xFFF << 52))
 
+/* True if VALUE can be loaded into a register using load_hi32.  */
+
+#define HWIT_UC_0xFFFFFFFF HOST_WIDE_INT_UC(0xffffffff)
+#define HI32_OPERAND(VALUE) \
+  (((VALUE) | (HWIT_UC_0xFFFFFFFF << 32)) == (HWIT_UC_0xFFFFFFFF << 32))
+
 /* Return a value X with the low 12 bits clear, and such that
    VALUE - X is a signed 12-bit value.  */
 
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 214b14bddd3..c52f5f35ff1 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1882,6 +1882,39 @@ (define_expand "mov<mode>cc"
   DONE;
 })
 
+
+;; Load an immediate into bits 32-63 of the source register.
+(define_insn_and_split "load_hi32"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(ior:DI
+	  (and:DI (match_operand:DI 1 "register_operand" "0")
+		  (match_operand 2 "hi32_mask_operand"))
+	(match_operand 3 "const_hi32_operand" "x")))]
+  "TARGET_64BIT"
+  "#"
+  ""
+  [(set (match_dup 0)
+	(ior:DI
+	  (zero_extend:DI
+	    (subreg:SI (match_dup 1) 0))
+	  (match_dup 4)))
+   (set (match_dup 0)
+	(ior:DI
+	  (and:DI (match_dup 0)
+		  (match_dup 6))
+	  (match_dup 5)))]
+{
+  HOST_WIDE_INT value = INTVAL (operands[3]);
+  int sign51 = (value & (HWIT_1U << 51)) >> 51;
+
+  operands[4] = GEN_INT ((value & LU32I_B) | (sign51 ? LU52I_B : 0));
+  operands[5] = GEN_INT (value & LU52I_B);
+  operands[6] = GEN_INT (0xfffffffffffff);
+}
+  [(set_attr "insn_count" "2")])
+
+;; Load an immediate into bits 32-51 of the source register,
+;; sign-extending it into the high bits.
 (define_insn "lu32i_d"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(ior:DI
@@ -1893,6 +1926,7 @@ (define_insn "lu32i_d"
   [(set_attr "type" "arith")
    (set_attr "mode" "DI")])
 
+;; Load an immediate into bits 52-63 of the source register.
 (define_insn "lu52i_d"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(ior:DI
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index 8bd0c1376c9..29d81ff0250 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -35,6 +35,10 @@ (define_predicate "const_lu52i_operand"
   (and (match_code "const_int")
        (match_test "LU52I_OPERAND (INTVAL (op))")))
 
+(define_predicate "const_hi32_operand"
+  (and (match_code "const_int")
+       (match_test "HI32_OPERAND (INTVAL (op))")))
+
 (define_predicate "const_arith_operand"
   (and (match_code "const_int")
        (match_test "IMM12_OPERAND (INTVAL (op))")))
@@ -103,6 +107,10 @@ (define_predicate "lu52i_mask_operand"
   (and (match_code "const_int")
        (match_test "UINTVAL (op) == 0xfffffffffffff")))
 
+(define_predicate "hi32_mask_operand"
+  (and (match_code "const_int")
+       (match_test "UINTVAL (op) == 0xffffffff")))
+
 (define_predicate "low_bitmask_operand"
   (and (match_code "const_int")
        (match_test "low_bitmask_len (mode, INTVAL (op)) > 12")))
diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load.c b/gcc/testsuite/gcc.target/loongarch/imm-load.c
new file mode 100644
index 00000000000..91ceb33d058
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/imm-load.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -O2 -fdump-rtl-loop2_invariant" } */
+
+extern long long b[10];
+static inline long long
+repeat_bytes (void)
+{
+  long long r = 0x0101010101010101;
+
+  return r;
+}
+
+static inline long long
+highbit_mask (long long m)
+{
+  return m & repeat_bytes ();
+}
+
+void test(long long *a)
+{
+  for (int i = 0; i < 10; i++)
+    b[i] = highbit_mask (a[i]);
+
+}
+/* { dg-final { scan-rtl-dump-times "moved without introducing a new temporary register" 4 "loop2_invariant" } } */
-- 
2.31.1



* Re: [PATCH v3] LoongArch: Optimize immediate load.
  2022-11-01 12:04 [PATCH v3] LoongArch: Optimize immediate load Lulu Cheng
@ 2022-11-04  2:22 ` Xi Ruoyao
  2022-11-04  2:33   ` Lulu Cheng
  0 siblings, 1 reply; 5+ messages in thread
From: Xi Ruoyao @ 2022-11-04  2:22 UTC (permalink / raw)
  To: Lulu Cheng, gcc-patches; +Cc: i, xuchenghua

On Tue, 2022-11-01 at 20:04 +0800, Lulu Cheng wrote:
> gcc/ChangeLog:
> 
>         * config/loongarch/constraints.md (x): New constraint.
>         * config/loongarch/loongarch.cc (struct loongarch_address_info):
>         Adds a method to load the immediate 32 to 64 bit field.
>         (struct loongarch_integer_op): Define a new member curr_value,
>         that records the value of the number stored in the destination
>         register immediately after the current instruction has run.
>         (LARCH_MAX_INTEGER_OPS): Define this macro as 3.
>         (LU32I_B): Move to the loongarch.h.
>         (LU52I_B): Likewise.
>         (loongarch_build_integer): Adds a method to load the immediate
>         32 to 63 bits.
>         (loongarch_move_integer): Likewise.

We need to mention "call set_unique_reg_note" here because it seems to be
the key to resolving the issue.

Otherwise LGTM.

>         (loongarch_print_operand_reloc): Modifying comment information.
>         * config/loongarch/loongarch.h (LU32I_B): Move from loongarch.cc.
>         (LU52I_B): Likewise.
>         (HWIT_UC_0xFFFFFFFF): New macro.
>         (HI32_OPERAND): New macro.
>         * config/loongarch/loongarch.md (load_hi32): New template.
>         * config/loongarch/predicates.md (const_hi32_operand): Determines
>         whether the value is an immediate number that has a value of only
>         the higher 32 bits.
>         (hi32_mask_operand): Immediately counts the mask of 32 to 61 bits.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3] LoongArch: Optimize immediate load.
  2022-11-04  2:22 ` Xi Ruoyao
@ 2022-11-04  2:33   ` Lulu Cheng
  2022-11-04  2:56     ` Xi Ruoyao
  0 siblings, 1 reply; 5+ messages in thread
From: Lulu Cheng @ 2022-11-04  2:33 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2022/11/4 at 10:22 AM, Xi Ruoyao wrote:
> On Tue, 2022-11-01 at 20:04 +0800, Lulu Cheng wrote:
>> gcc/ChangeLog:
>>
>>          * config/loongarch/constraints.md (x): New constraint.
>>          * config/loongarch/loongarch.cc (struct loongarch_address_info):
>>          Adds a method to load the immediate 32 to 64 bit field.
>>          (struct loongarch_integer_op): Define a new member curr_value,
>>          that records the value of the number stored in the destination
>>          register immediately after the current instruction has run.
>>          (LARCH_MAX_INTEGER_OPS): Define this macro as 3.
>>          (LU32I_B): Move to the loongarch.h.
>>          (LU52I_B): Likewise.
>>          (loongarch_build_integer): Adds a method to load the immediate
>>          32 to 63 bits.
>>          (loongarch_move_integer): Likewise.
> We need to mention "call set_unique_reg_note" here because it seems the
> key to resolve the issue.

During debugging, I found that the problem arises because the source register
and the destination register of the lu32i.d instruction are the same. As a
result, during the loop2_invariant pass, the destination register of lu32i.d
is used twice, so the instructions after it are not moved out of the loop.

Therefore, I combined lu32i.d and lu52i.d into one template, which avoids
using the same register twice. The template is not split into two
instructions until after the loop2_invariant pass has run. So I don't think
"set_unique_reg_note" plays a decisive role in this optimization.
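
As a rough illustration of why the combined template is safe to split later,
the sketch below models load_hi32 and the two steps it is split into (the
lu32i.d form followed by the lu52i.d form) in plain C, using the operand
computations from the define_insn_and_split in the patch. The function names
are hypothetical, and the sketch checks only the arithmetic; it says nothing
about how loop2_invariant treats the insns.

```c
#include <assert.h>
#include <stdint.h>

#define LU32I_B (UINT64_C(0xfffff) << 32)
#define LU52I_B (UINT64_C(0xfff) << 52)

/* Mirrors HI32_OPERAND: true if the low 32 bits of VALUE are zero.  */
static int hi32_operand(uint64_t value)
{
    return (value | (UINT64_C(0xffffffff) << 32))
           == (UINT64_C(0xffffffff) << 32);
}

/* The combined pattern: (ior (and x 0xffffffff) hi).  */
static uint64_t load_hi32(uint64_t x, uint64_t hi)
{
    return (x & UINT64_C(0xffffffff)) | hi;
}

/* The two steps the pattern is split into.  */
static uint64_t load_hi32_split(uint64_t x, uint64_t hi)
{
    int sign51 = (int)((hi >> 51) & 1);
    uint64_t op4 = (hi & LU32I_B) | (sign51 ? LU52I_B : 0);
    uint64_t op5 = hi & LU52I_B;

    /* lu32i.d form: (ior (zero_extend (subreg:SI x)) op4).  */
    uint64_t r = (uint64_t)(uint32_t)x | op4;

    /* lu52i.d form: (ior (and r 0xfffffffffffff) op5).  */
    return (r & UINT64_C(0xfffffffffffff)) | op5;
}

/* Self-check with the high half of the constant from imm-load.c.  */
void check_load_hi32(void)
{
    uint64_t low = UINT64_C(0x01010101);
    uint64_t hi = UINT64_C(0x0101010100000000);

    assert(hi32_operand(hi));
    assert(load_hi32(low, hi) == load_hi32_split(low, hi));
}
```

For any hi accepted by hi32_operand, the two functions agree, since
(hi & LU32I_B) | (hi & LU52I_B) covers exactly bits 32 to 63.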

>
> Otherwise LGTM.
>
>>          (loongarch_print_operand_reloc): Modifying comment information.
>>          * config/loongarch/loongarch.h (LU32I_B): Move from loongarch.cc.
>>          (LU52I_B): Likewise.
>>          (HWIT_UC_0xFFFFFFFF): New macro.
>>          (HI32_OPERAND): New macro.
>>          * config/loongarch/loongarch.md (load_hi32): New template.
>>          * config/loongarch/predicates.md (const_hi32_operand): Determines
>>          whether the value is an immediate number that has a value of only
>>          the higher 32 bits.
>>          (hi32_mask_operand): Immediately counts the mask of 32 to 61 bits.



* Re: [PATCH v3] LoongArch: Optimize immediate load.
  2022-11-04  2:33   ` Lulu Cheng
@ 2022-11-04  2:56     ` Xi Ruoyao
  2022-11-04  3:07       ` Lulu Cheng
  0 siblings, 1 reply; 5+ messages in thread
From: Xi Ruoyao @ 2022-11-04  2:56 UTC (permalink / raw)
  To: Lulu Cheng, gcc-patches; +Cc: i, xuchenghua

On Fri, 2022-11-04 at 10:33 +0800, Lulu Cheng wrote:
> 
> On 2022/11/4 at 10:22 AM, Xi Ruoyao wrote:
> > On Tue, 2022-11-01 at 20:04 +0800, Lulu Cheng wrote:
> > > gcc/ChangeLog:
> > > 
> > >          * config/loongarch/constraints.md (x): New constraint.
> > >          * config/loongarch/loongarch.cc (struct loongarch_address_info):
> > >          Adds a method to load the immediate 32 to 64 bit field.
> > >          (struct loongarch_integer_op): Define a new member curr_value,
> > >          that records the value of the number stored in the destination
> > >          register immediately after the current instruction has run.
> > >          (LARCH_MAX_INTEGER_OPS): Define this macro as 3.
> > >          (LU32I_B): Move to the loongarch.h.
> > >          (LU52I_B): Likewise.
> > >          (loongarch_build_integer): Adds a method to load the immediate
> > >          32 to 63 bits.
> > >          (loongarch_move_integer): Likewise.
> > We need to mention "call set_unique_reg_note" here because it seems the
> > key to resolve the issue.
> 
> During debugging, I found the problem because the source register and 
> destination register of the lu32i.d instruction are the same. As a
> result, during loop2_invariant pass, the destination register of
> lu32i.d is used twice, so the instructions after this instruction will
> not be brought out of the loop. Therefore, I combined lu32i.d and
> lu52i.d into one template, which avoids the situation that the same
> register is used twice. It is not split into two instructions until
> loop2_invariant has been optimized. So I don't think
> "set_unique_reg_note" plays a decisive role in this optimization.

It's better to mention this logic in the commit message then, so that
others don't misunderstand it as I did.

Again the code change LGTM and I've tested it with
--with-build-config=bootstrap-ubsan.

> > 
> > Otherwise LGTM.
> > 
> > >          (loongarch_print_operand_reloc): Modifying comment information.
> > >          * config/loongarch/loongarch.h (LU32I_B): Move from loongarch.cc.
> > >          (LU52I_B): Likewise.
> > >          (HWIT_UC_0xFFFFFFFF): New macro.
> > >          (HI32_OPERAND): New macro.
> > >          * config/loongarch/loongarch.md (load_hi32): New template.
> > >          * config/loongarch/predicates.md (const_hi32_operand): Determines
> > >          whether the value is an immediate number that has a value of only
> > >          the higher 32 bits.
> > >          (hi32_mask_operand): Immediately counts the mask of 32 to 61 bits.
> 

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3] LoongArch: Optimize immediate load.
  2022-11-04  2:56     ` Xi Ruoyao
@ 2022-11-04  3:07       ` Lulu Cheng
  0 siblings, 0 replies; 5+ messages in thread
From: Lulu Cheng @ 2022-11-04  3:07 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2022/11/4 at 10:56 AM, Xi Ruoyao wrote:
> On Fri, 2022-11-04 at 10:33 +0800, Lulu Cheng wrote:
>> On 2022/11/4 at 10:22 AM, Xi Ruoyao wrote:
>>> On Tue, 2022-11-01 at 20:04 +0800, Lulu Cheng wrote:
>>>> gcc/ChangeLog:
>>>>
>>>>           * config/loongarch/constraints.md (x): New constraint.
>>>>           * config/loongarch/loongarch.cc (struct loongarch_address_info):
>>>>           Adds a method to load the immediate 32 to 64 bit field.
>>>>           (struct loongarch_integer_op): Define a new member curr_value,
>>>>           that records the value of the number stored in the destination
>>>>           register immediately after the current instruction has run.
>>>>           (LARCH_MAX_INTEGER_OPS): Define this macro as 3.
>>>>           (LU32I_B): Move to the loongarch.h.
>>>>           (LU52I_B): Likewise.
>>>>           (loongarch_build_integer): Adds a method to load the immediate
>>>>           32 to 63 bits.
>>>>           (loongarch_move_integer): Likewise.
>>> We need to mention "call set_unique_reg_note" here because it seems the
>>> key to resolve the issue.
>> During debugging, I found the problem because the source register and
>> destination register of the lu32i.d instruction are the same. As a
>> result, during loop2_invariant pass, the destination register of
>> lu32i.d is used twice, so the instructions after this instruction will
>> not be brought out of the loop. Therefore, I combined lu32i.d and
>> lu52i.d into one template, which avoids the situation that the same
>> register is used twice. It is not split into two instructions until
>> loop2_invariant has been optimized. So I don't think
>> "set_unique_reg_note" plays a decisive role in this optimization.
> It's better to mention this logic in the commit message then, to prevent
> others from misunderstandings like mine.
>
> Again the code change LGTM and I've tested it with
> --with-build-config=bootstrap-ubsan.
>
Ok, I'll describe this logic.

Thanks!

>>> Otherwise LGTM.
>>>
>>>>           (loongarch_print_operand_reloc): Modifying comment information.
>>>>           * config/loongarch/loongarch.h (LU32I_B): Move from loongarch.cc.
>>>>           (LU52I_B): Likewise.
>>>>           (HWIT_UC_0xFFFFFFFF): New macro.
>>>>           (HI32_OPERAND): New macro.
>>>>           * config/loongarch/loongarch.md (load_hi32): New template.
>>>>           * config/loongarch/predicates.md (const_hi32_operand): Determines
>>>>           whether the value is an immediate number that has a value of only
>>>>           the higher 32 bits.
>>>>           (hi32_mask_operand): Immediately counts the mask of 32 to 61 bits.


