[PATCH 0/2] loongarch: improve code generation for integer division

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 0/2] loongarch: improve code generation for integer division
@ 2022-07-07  2:23 Xi Ruoyao
  2022-07-07  2:26 ` [PATCH 1/2] loongarch: add alternatives for idiv insns to improve code generation Xi Ruoyao
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Xi Ruoyao @ 2022-07-07  2:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: Lulu Cheng, Chenghua Xu, Wang Xuerui

We were generating some unnecessary instructions for integer division.
These two patches improve code generation so that

    template <class T> T div(T a, T b) { return a / b; }

compiles into a single division instruction (plus a return instruction,
of course), as expected, for T in {int32_t, uint32_t, int64_t}.
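
For reference, the test case above can be written out as a small translation unit with explicit instantiations, so that each case gets its own symbol when the output of -O2 -S is inspected.  This is only a sketch: the name div_generic is an assumption (chosen to avoid clashing with div from the C library), and the mnemonics in the comments are what this cover letter leads one to expect, not verified output.

```cpp
#include <cassert>
#include <cstdint>

// Same body as the template in the cover letter; only the name differs
// (div_generic is a hypothetical name used for this sketch).
template <class T> T div_generic(T a, T b) { return a / b; }

// Explicit instantiations, one per type named in the cover letter.
// Expected on LA64 after this series (not verified here):
//   int32_t -> div.w,  uint32_t -> div.wu,  int64_t -> div.d
template int32_t div_generic<int32_t>(int32_t, int32_t);
template uint32_t div_generic<uint32_t>(uint32_t, uint32_t);
template int64_t div_generic<int64_t>(int64_t, int64_t);

// Sanity check of the C++ semantics (truncation toward zero).
inline void check_div_generic()
{
    assert(div_generic<int32_t>(-7, 2) == -3);
    assert(div_generic<uint32_t>(7u, 2u) == 3u);
    assert(div_generic<int64_t>(INT64_C(-9000000000), 3)
           == INT64_C(-3000000000));
}
```

Calling check_div_generic() only confirms the source-level semantics; whether each instantiation is a single instruction has to be checked in the generated assembly.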

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (2):
  loongarch: add alternatives for idiv insns to improve code generation
  loongarch: avoid unnecessary sign-extend after 32-bit division

 gcc/config/loongarch/loongarch-protos.h    |  1 +
 gcc/config/loongarch/loongarch.cc          |  2 +-
 gcc/config/loongarch/loongarch.md          | 34 ++++++++++++++++------
 gcc/testsuite/gcc.target/loongarch/div-1.c |  9 ++++++
 gcc/testsuite/gcc.target/loongarch/div-2.c |  9 ++++++
 gcc/testsuite/gcc.target/loongarch/div-3.c |  9 ++++++
 gcc/testsuite/gcc.target/loongarch/div-4.c |  9 ++++++
 7 files changed, 63 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c

-- 
2.37.0


* [PATCH 1/2] loongarch: add alternatives for idiv insns to improve code generation
  2022-07-07  2:23 [PATCH 0/2] loongarch: improve code generation for integer division Xi Ruoyao
@ 2022-07-07  2:26 ` Xi Ruoyao
  2022-07-07  2:29 ` [PATCH 2/2] loongarch: avoid unnecessary sign-extend after 32-bit division Xi Ruoyao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Xi Ruoyao @ 2022-07-07  2:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Lulu Cheng, Chenghua Xu, Wang Xuerui

Currently, in the description of the LoongArch integer division
instructions, the output is marked as earlyclobber ('&').  This is
necessary when loongarch_check_zero_div_p() returns true, because
clobbering operand 2 (the divisor) would make the check for a zero
divisor impossible.
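
The earlyclobber requirement can be illustrated with a toy model.  The sketch below assumes that the zero check is emitted after the division instruction and re-reads the divisor register, which is what the paragraph above implies; Regs and div_then_check are hypothetical names, not GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file indexed by register number (hypothetical model).
struct Regs { int64_t r[8]; };

// Models "div.d rd, rj, rk" followed by a zero check that re-reads rk.
// Returns true if the check would trap, i.e. it sees a zero divisor.
// The caller must not pass INT64_MIN / -1, which is undefined in C++.
bool div_then_check(Regs &regs, int rd, int rj, int rk)
{
    int64_t dividend = regs.r[rj];
    int64_t divisor = regs.r[rk];
    // The real result for a zero divisor is not modeled; write 0.
    regs.r[rd] = divisor != 0 ? dividend / divisor : 0;
    // The check runs after the division, so it sees rk as it is now.
    return regs.r[rk] == 0;
}
```

With r4 = 3 and r5 = 5, div_then_check(regs, 5, 4, 5) overwrites the divisor with the quotient 0 and reports a trap that should not happen, while div_then_check(regs, 4, 4, 5) is harmless because only the dividend is overwritten.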

But for -mno-check-zero-division (the default of GCC >= 12.2 for
optimized code), the output does not need to be earlyclobber at all.
And operand 1 is only read before the output is clobbered.  So we make
three alternatives for an idiv instruction:

* (=r,r,r): For -mno-check-zero-division.
* (=&r,r,r): For -mcheck-zero-division.
* (=&r,0,r): For -mcheck-zero-division, to explicitly allow patterns
  like "div.d $a0, $a0, $a1".
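
The selection among the three alternatives comes down to the match_test expression used in the diff below.  A minimal sketch of that predicate follows; alternative_enabled is a hypothetical name, and its check_zero_div parameter stands in for the return value of loongarch_check_zero_div_p().

```cpp
#include <cassert>

// Mirrors "!!which_alternative == loongarch_check_zero_div_p()":
// alternative 0 is (=r,r,r), alternatives 1 and 2 are the
// earlyclobber ones, (=&r,r,r) and (=&r,0,r).
constexpr bool alternative_enabled(int which_alternative, bool check_zero_div)
{
    return !!which_alternative == check_zero_div;
}

// -mno-check-zero-division: only the plain alternative is usable.
static_assert(alternative_enabled(0, false), "alt 0 on without check");
static_assert(!alternative_enabled(1, false), "alt 1 off without check");
static_assert(!alternative_enabled(2, false), "alt 2 off without check");

// -mcheck-zero-division: only the earlyclobber alternatives are usable.
static_assert(!alternative_enabled(0, true), "alt 0 off with check");
static_assert(alternative_enabled(1, true), "alt 1 on with check");
static_assert(alternative_enabled(2, true), "alt 2 on with check");
```

The double negation collapses alternatives 1 and 2 to the same boolean, so one comparison covers all three cases.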

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_check_zero_div_p):
	Remove static, for use in the machine description file.
	* config/loongarch/loongarch-protos.h
	(loongarch_check_zero_div_p): Add prototype.
	* config/loongarch/loongarch.md (enabled): New attr.
	(*<optab><mode>3): Add (=r,r,r) and (=&r,0,r) alternatives for
	idiv.  Conditionally enable the alternatives using
	loongarch_check_zero_div_p.
	(<optab>di3_fake): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/div-1.c: New test.
	* gcc.target/loongarch/div-2.c: New test.
	* gcc.target/loongarch/div-3.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h    |  1 +
 gcc/config/loongarch/loongarch.cc          |  2 +-
 gcc/config/loongarch/loongarch.md          | 28 +++++++++++++++-------
 gcc/testsuite/gcc.target/loongarch/div-1.c |  9 +++++++
 gcc/testsuite/gcc.target/loongarch/div-2.c |  9 +++++++
 gcc/testsuite/gcc.target/loongarch/div-3.c |  9 +++++++
 6 files changed, 49 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index 2144c2421ed..2287fd3763c 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -130,6 +130,7 @@ extern bool loongarch_symbol_binds_local_p (const_rtx);
 extern const char *current_section_name (void);
 extern unsigned int current_section_flags (void);
 extern bool loongarch_use_ins_ext_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+extern bool loongarch_check_zero_div_p (void);
 
 union loongarch_gen_fn_ptrs
 {
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index d72b256df51..bc56282c9a7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2104,7 +2104,7 @@ loongarch_load_store_insns (rtx mem, rtx_insn *insn)
 
 /* Return true if we need to trap on division by zero.  */
 
-static bool
+bool
 loongarch_check_zero_div_p (void)
 {
   /* if -m[no-]check-zero-division is given explicitly.  */
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index d3c809e25f3..b002eb2ac22 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -110,6 +110,8 @@ (define_constants
 ;;
 ;; ....................
 
+(define_attr "enabled" "no,yes" (const_string "yes"))
+
 (define_attr "got" "unset,load"
   (const_string "unset"))
 
@@ -763,26 +765,36 @@ (define_expand "<optab><mode>3"
 })
 
 (define_insn "*<optab><mode>3"
-  [(set (match_operand:GPR 0 "register_operand" "=&r")
-	(any_div:GPR (match_operand:GPR 1 "register_operand" "r")
-		     (match_operand:GPR 2 "register_operand" "r")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r,&r,&r")
+	(any_div:GPR (match_operand:GPR 1 "register_operand" "r,r,0")
+		     (match_operand:GPR 2 "register_operand" "r,r,r")))]
   ""
 {
   return loongarch_output_division ("<insn>.<d><u>\t%0,%1,%2", operands);
 }
   [(set_attr "type" "idiv")
-   (set_attr "mode" "<MODE>")])
+   (set_attr "mode" "<MODE>")
+   (set (attr "enabled")
+      (if_then_else
+	(match_test "!!which_alternative == loongarch_check_zero_div_p()")
+	(const_string "yes")
+	(const_string "no")))])
 
 (define_insn "<optab>di3_fake"
-  [(set (match_operand:SI 0 "register_operand" "=&r")
-	(any_div:SI (match_operand:DI 1 "register_operand" "r")
-		    (match_operand:DI 2 "register_operand" "r")))]
+  [(set (match_operand:SI 0 "register_operand" "=r,&r,&r")
+	(any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
+		    (match_operand:DI 2 "register_operand" "r,r,r")))]
   ""
 {
   return loongarch_output_division ("<insn>.w<u>\t%0,%1,%2", operands);
 }
   [(set_attr "type" "idiv")
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "SI")
+   (set (attr "enabled")
+      (if_then_else
+	(match_test "!!which_alternative == loongarch_check_zero_div_p()")
+	(const_string "yes")
+	(const_string "no")))])
 
 ;; Floating point multiply accumulate instructions.
 
diff --git a/gcc/testsuite/gcc.target/loongarch/div-1.c b/gcc/testsuite/gcc.target/loongarch/div-1.c
new file mode 100644
index 00000000000..b1683f8535f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcheck-zero-division" } */
+/* { dg-final { scan-assembler "div.\[wd\]\t\\\$r4,\\\$r4,\\\$r5" } } */
+
+long
+div(long a, long b)
+{
+  return a / b;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/div-2.c b/gcc/testsuite/gcc.target/loongarch/div-2.c
new file mode 100644
index 00000000000..4c2beb5b930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-check-zero-division" } */
+/* { dg-final { scan-assembler "div.\[wd\]\t\\\$r4,\\\$r5,\\\$r4" } } */
+
+long
+div(long a, long b)
+{
+  return b / a;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/div-3.c b/gcc/testsuite/gcc.target/loongarch/div-3.c
new file mode 100644
index 00000000000..d25969263f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcheck-zero-division" } */
+/* { dg-final { scan-assembler-not "div.\[wd\]\t\\\$r4,\\\$r5,\\\$r4" } } */
+
+long
+div(long a, long b)
+{
+  return b / a;
+}
-- 
2.37.0




* [PATCH 2/2] loongarch: avoid unnecessary sign-extend after 32-bit division
  2022-07-07  2:23 [PATCH 0/2] loongarch: improve code generation for integer division Xi Ruoyao
  2022-07-07  2:26 ` [PATCH 1/2] loongarch: add alternatives for idiv insns to improve code generation Xi Ruoyao
@ 2022-07-07  2:29 ` Xi Ruoyao
  2022-07-08  1:23 ` [PATCH 0/2] loongarch: improve code generation for integer division Lulu Cheng
  2022-07-10  2:20 ` Lulu Cheng
  3 siblings, 0 replies; 6+ messages in thread
From: Xi Ruoyao @ 2022-07-07  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: Lulu Cheng, Chenghua Xu, Wang Xuerui

Like add.w/sub.w/mul.w, div.w/mod.w/div.wu/mod.wu also sign-extend the
output on LA64.  But LoongArch v1.00 mandates that the inputs of 32-bit
division be sign-extended, so we have to expand 32-bit division into
RTL sequences.

We defined div.w/mod.w/div.wu/mod.wu as (DI, DI) -> SI instructions.
This definition does not express the fact that these instructions
store the result as a sign-extended value in a 64-bit GR.  As a result,
the compiler emitted unnecessary sign-extend operations.  For example:

    int div(int a, int b) { return a / b; }

was compiled to:

    div.w  $r4, $r4, $r5
    slli.w $r4, $r4, 0    # this is unnecessary
    jr     $r1

To remove this unnecessary operation, we change the division
instructions to (DI, DI) -> DI and describe the sign-extend behavior
explicitly in the RTL template.  In the expander for 32-bit division we
then use simplify_gen_subreg to extract the lower 32 bits.
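
Why the extra instruction is redundant can be shown with a small model.  This is a sketch under stated assumptions: sext32 models "slli.w rd, rj, 0" as a sign-extension of the low 32 bits, div_w models the signed 32-bit division with a sign-extended 64-bit result as described above, and both names are hypothetical.  The narrowing cast in sext32 relies on modular conversion, which GCC implements and C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 32 bits of a 64-bit register value.
int64_t sext32(int64_t x)
{
    return static_cast<int32_t>(static_cast<uint32_t>(x));
}

// Model of div.w on LA64: divide the low 32 bits as signed values and
// write the quotient back sign-extended to 64 bits.  The inputs are
// assumed to be sign-extended already.  Callers must avoid rk == 0 and
// INT32_MIN / -1, which are undefined in C++.
int64_t div_w(int64_t rj, int64_t rk)
{
    int32_t q = static_cast<int32_t>(rj) / static_cast<int32_t>(rk);
    return static_cast<int64_t>(q);
}

// Applying the sign-extension again changes nothing.
inline void check_sext_is_redundant()
{
    assert(sext32(div_w(-7, 2)) == div_w(-7, 2));
    assert(sext32(div_w(100, -3)) == div_w(100, -3));
}
```

Since sext32 is idempotent on an already sign-extended value, describing the extension in the RTL template gives the compiler enough information to drop the second one.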

gcc/ChangeLog:

	* config/loongarch/loongarch.md (<any_div>di3_fake): Describe
	the sign-extend of result in the RTL template.
	(<any_div><mode>3): Adjust for <any_div>di3_fake change.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/div-4.c: New test.
---
 gcc/config/loongarch/loongarch.md          | 12 ++++++++----
 gcc/testsuite/gcc.target/loongarch/div-4.c |  9 +++++++++
 2 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index b002eb2ac22..0202f73efae 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -752,6 +752,7 @@ (define_expand "<optab><mode>3"
   {
     rtx reg1 = gen_reg_rtx (DImode);
     rtx reg2 = gen_reg_rtx (DImode);
+    rtx rd = gen_reg_rtx (DImode);
 
     operands[1] = gen_rtx_SIGN_EXTEND (word_mode, operands[1]);
     operands[2] = gen_rtx_SIGN_EXTEND (word_mode, operands[2]);
@@ -759,7 +760,9 @@ (define_expand "<optab><mode>3"
     emit_insn (gen_rtx_SET (reg1, operands[1]));
     emit_insn (gen_rtx_SET (reg2, operands[2]));
 
-    emit_insn (gen_<optab>di3_fake (operands[0], reg1, reg2));
+    emit_insn (gen_<optab>di3_fake (rd, reg1, reg2));
+    emit_insn (gen_rtx_SET (operands[0],
+			    simplify_gen_subreg (SImode, rd, DImode, 0)));
     DONE;
   }
 })
@@ -781,9 +784,10 @@ (define_insn "*<optab><mode>3"
 	(const_string "no")))])
 
 (define_insn "<optab>di3_fake"
-  [(set (match_operand:SI 0 "register_operand" "=r,&r,&r")
-	(any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
-		    (match_operand:DI 2 "register_operand" "r,r,r")))]
+  [(set (match_operand:DI 0 "register_operand" "=r,&r,&r")
+	(sign_extend:DI
+	  (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
+		      (match_operand:DI 2 "register_operand" "r,r,r"))))]
   ""
 {
   return loongarch_output_division ("<insn>.w<u>\t%0,%1,%2", operands);
diff --git a/gcc/testsuite/gcc.target/loongarch/div-4.c b/gcc/testsuite/gcc.target/loongarch/div-4.c
new file mode 100644
index 00000000000..a52f87d6caf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-4.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "slli" } } */
+
+int
+div(int a, int b)
+{
+  return a / b;
+}
-- 
2.37.0




* Re: [PATCH 0/2] loongarch: improve code generation for integer division
  2022-07-07  2:23 [PATCH 0/2] loongarch: improve code generation for integer division Xi Ruoyao
  2022-07-07  2:26 ` [PATCH 1/2] loongarch: add alternatives for idiv insns to improve code generation Xi Ruoyao
  2022-07-07  2:29 ` [PATCH 2/2] loongarch: avoid unnecessary sign-extend after 32-bit division Xi Ruoyao
@ 2022-07-08  1:23 ` Lulu Cheng
  2022-07-10  2:20 ` Lulu Cheng
  3 siblings, 0 replies; 6+ messages in thread
From: Lulu Cheng @ 2022-07-08  1:23 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: Chenghua Xu, Wang Xuerui


On 2022/7/7 at 10:23 AM, Xi Ruoyao wrote:
> We were generating some unnecessary instructions for integer division.
> These two patches improve the code generation to compile
>
>      template <class T> T div(T a, T b) { return a / b; }
>
> into a single division instruction (along with a return instruction of
> course) as we expected for T in {int32_t, uint32_t, int64_t}.
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
> Xi Ruoyao (2):
>    loongarch: add alternatives for idiv insns to improve code generation
>    loongarch: avoid unnecessary sign-extend after 32-bit division
>
>   gcc/config/loongarch/loongarch-protos.h    |  1 +
>   gcc/config/loongarch/loongarch.cc          |  2 +-
>   gcc/config/loongarch/loongarch.md          | 34 ++++++++++++++++------
>   gcc/testsuite/gcc.target/loongarch/div-1.c |  9 ++++++
>   gcc/testsuite/gcc.target/loongarch/div-2.c |  9 ++++++
>   gcc/testsuite/gcc.target/loongarch/div-3.c |  9 ++++++
>   gcc/testsuite/gcc.target/loongarch/div-4.c |  9 ++++++
>   7 files changed, 63 insertions(+), 10 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c
>
I am running the SPEC tests; they should be done today or tomorrow.



* Re: [PATCH 0/2] loongarch: improve code generation for integer division
  2022-07-07  2:23 [PATCH 0/2] loongarch: improve code generation for integer division Xi Ruoyao
                   ` (2 preceding siblings ...)
  2022-07-08  1:23 ` [PATCH 0/2] loongarch: improve code generation for integer division Lulu Cheng
@ 2022-07-10  2:20 ` Lulu Cheng
  2022-07-10  3:44   ` Xi Ruoyao
  3 siblings, 1 reply; 6+ messages in thread
From: Lulu Cheng @ 2022-07-10  2:20 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: Chenghua Xu, Wang Xuerui


On 2022/7/7 at 10:23 AM, Xi Ruoyao wrote:
> We were generating some unnecessary instructions for integer division.
> These two patches improve the code generation to compile
>
>      template <class T> T div(T a, T b) { return a / b; }
>
> into a single division instruction (along with a return instruction of
> course) as we expected for T in {int32_t, uint32_t, int64_t}.
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
> Xi Ruoyao (2):
>    loongarch: add alternatives for idiv insns to improve code generation
>    loongarch: avoid unnecessary sign-extend after 32-bit division
>
>   gcc/config/loongarch/loongarch-protos.h    |  1 +
>   gcc/config/loongarch/loongarch.cc          |  2 +-
>   gcc/config/loongarch/loongarch.md          | 34 ++++++++++++++++------
>   gcc/testsuite/gcc.target/loongarch/div-1.c |  9 ++++++
>   gcc/testsuite/gcc.target/loongarch/div-2.c |  9 ++++++
>   gcc/testsuite/gcc.target/loongarch/div-3.c |  9 ++++++
>   gcc/testsuite/gcc.target/loongarch/div-4.c |  9 ++++++
>   7 files changed, 63 insertions(+), 10 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c
>
LGTM and spec has been tested.

Thanks!



* Re: [PATCH 0/2] loongarch: improve code generation for integer division
  2022-07-10  2:20 ` Lulu Cheng
@ 2022-07-10  3:44   ` Xi Ruoyao
  0 siblings, 0 replies; 6+ messages in thread
From: Xi Ruoyao @ 2022-07-10  3:44 UTC (permalink / raw)
  To: Lulu Cheng, gcc-patches; +Cc: Chenghua Xu, Wang Xuerui

On Sun, 2022-07-10 at 10:20 +0800, Lulu Cheng wrote:
> 
> On 2022/7/7 at 10:23 AM, Xi Ruoyao wrote:
> > We were generating some unnecessary instructions for integer
> > division.
> > These two patches improve the code generation to compile
> > 
> >      template <class T> T div(T a, T b) { return a / b; }
> > 
> > into a single division instruction (along with a return instruction
> > of
> > course) as we expected for T in {int32_t, uint32_t, int64_t}.
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > 
> > Xi Ruoyao (2):
> >    loongarch: add alternatives for idiv insns to improve code
> > generation
> >    loongarch: avoid unnecessary sign-extend after 32-bit division
> > 
> >   gcc/config/loongarch/loongarch-protos.h    |  1 +
> >   gcc/config/loongarch/loongarch.cc          |  2 +-
> >   gcc/config/loongarch/loongarch.md          | 34 ++++++++++++++++--
> > ----
> >   gcc/testsuite/gcc.target/loongarch/div-1.c |  9 ++++++
> >   gcc/testsuite/gcc.target/loongarch/div-2.c |  9 ++++++
> >   gcc/testsuite/gcc.target/loongarch/div-3.c |  9 ++++++
> >   gcc/testsuite/gcc.target/loongarch/div-4.c |  9 ++++++
> >   7 files changed, 63 insertions(+), 10 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c
> > 
> LGTM and spec has been tested.
> 
> Thanks!

Pushed r13-1592 and r13-1593.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
