Re: [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore<ANYF:mode>4

public inbox for gcc-patches@gcc.gnu.org

* Re: [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore<ANYF:mode>4
       [not found] <1225b53f-aef6-ab79-c861-bc6c0eb87778@loongson.cn>
@ 2023-12-14  7:44 ` Jiahao Xu
  2023-12-14  8:18   ` Xi Ruoyao
  0 siblings, 1 reply; 3+ messages in thread
From: Jiahao Xu @ 2023-12-14  7:44 UTC (permalink / raw)
  To: gcc-patches, xry111; +Cc: chenglulu, i, xuchenghua


The implementation of this patch has some issues. When I compile 521.wrf 
with -Ofast -mlasx -flto -muse-movcf2gr, it results in an ICE:
during RTL pass: reload
module_mp_fast_sbm.fppized.f90: In function 'fast_sbm.constprop':
module_mp_fast_sbm.fppized.f90:1369:25: internal compiler error: maximum number of generated reload insns per insn achieved (90)
  1369 |   END SUBROUTINE FAST_SBM
       |                         ^
0x1207221bf lra_constraints(bool)
         ../../gcc/gcc/lra-constraints.cc:5429
0x120705a3f lra(_IO_FILE*, int)
         ../../gcc/gcc/lra.cc:2442
0x1206ac93f do_reload
         ../../gcc/gcc/ira.cc:5973
0x1206ac93f execute
         ../../gcc/gcc/ira.cc:6161



* Re: [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore<ANYF:mode>4
  2023-12-14  7:44 ` [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore<ANYF:mode>4 Jiahao Xu
@ 2023-12-14  8:18   ` Xi Ruoyao
  0 siblings, 0 replies; 3+ messages in thread
From: Xi Ruoyao @ 2023-12-14  8:18 UTC (permalink / raw)
  To: Jiahao Xu, gcc-patches; +Cc: chenglulu, i, xuchenghua

On Thu, 2023-12-14 at 15:44 +0800, Jiahao Xu wrote:

> The implementation of this patch has some issues. When I compile 521.wrf with -Ofast -mlasx -flto -muse-movcf2gr, it results in an ICE:

Indeed, creating FCCmode pseudos without a complete movfcc
implementation is buggy.

This patch needs a complete rework.

>  during RTL pass: reload
>  module_mp_fast_sbm.fppized.f90: In function 'fast_sbm.constprop':
>  module_mp_fast_sbm.fppized.f90:1369:25: internal compiler error: maximum number of generated reload insns per insn achieved (90)
>   1369 |   END SUBROUTINE FAST_SBM
>        |                         ^
>  0x1207221bf lra_constraints(bool)
>          ../../gcc/gcc/lra-constraints.cc:5429
>  0x120705a3f lra(_IO_FILE*, int)
>          ../../gcc/gcc/lra.cc:2442
>  0x1206ac93f do_reload
>          ../../gcc/gcc/ira.cc:5973
>  0x1206ac93f execute
>          ../../gcc/gcc/ira.cc:6161

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore<ANYF:mode>4
@ 2023-12-13 14:20 Xi Ruoyao
  0 siblings, 0 replies; 3+ messages in thread
From: Xi Ruoyao @ 2023-12-13 14:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

We used a branch to load floating-point comparison results into GPR.
This is very slow when the branch is not predictable.

Use the movcf2gr instruction to implement cstore<ANYF:mode>4 if movcf2gr
is fast enough.
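
For illustration only, here is a source-level sketch of the pattern this
affects.  The helper names are hypothetical and not part of the patch, and
the compiler picks the lowering regardless of which spelling is used; the
two functions merely mirror the branch-based sequence and the flag-copy
sequence:

```cpp
#include <cstddef>

// Branchy spelling: models selecting 0 or 1 with a conditional branch.
int greater_with_branch (float a, float b)
{
  if (a > b)
    return 1;
  return 0;
}

// Flag spelling: models copying the comparison result straight into an
// integer register, with no control flow involved.
int greater_as_flag (float a, float b)
{
  return static_cast<int> (a > b);
}

// Each iteration stores one data-dependent comparison result.  With
// unpredictable inputs, a branch here is hard to predict.
std::size_t count_greater (const float *a, const float *b, std::size_t n)
{
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    count += greater_as_flag (a[i], b[i]);
  return count;
}
```

Both spellings return the same value for every input, so the choice between
the two instruction sequences is purely a matter of speed.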

gcc/ChangeLog:

	* config/loongarch/genopts/loongarch.opt.in (muse-movcf2gr): New
	option.
	* config/loongarch/loongarch.opt: Regenerate.
	* config/loongarch/loongarch-tune.h
	(loongarch_rtx_cost_data::movcf2gr): New field.
	(loongarch_rtx_cost_data::movcf2gr_): New method.
	(loongarch_rtx_cost_data::use_movcf2gr): New method.
	(simple_insn_cost): Declare.
	* config/loongarch/loongarch-def.cc
	(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Set movcf2gr
	to COSTS_N_INSNS (7).
	(loongarch_cpu_rtx_cost_data): Set movcf2gr to COSTS_N_INSNS (1)
	for LA664.
	(loongarch_rtx_cost_optimize_size): Set movcf2gr to
	COSTS_N_INSNS (1) + 1.
	(simple_insn_cost): Define and initialize to COSTS_N_INSNS (1).
	* doc/invoke.texi (-muse-movcf2gr): Document the new option.
	* config/loongarch/predicates.md (loongarch_fcmp_operator): New
	predicate.
	* config/loongarch/loongarch.md (movcf2gr<GPR:mode>): New
	define_insn.
	(cstore<ANYF:mode>4): New define_expand.
	* config/loongarch/loongarch.cc
	(loongarch_option_override_internal): Set the default of
	-muse-movcf2gr based on -mtune=.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/movcf2gr.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu (twice, with
BOOT_CFLAGS and {C,CXX}FLAGS_FOR_TARGET set to "-O2 -muse-movcf2gr" and
"-O2 -mno-use-movcf2gr").  Ok for trunk?

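To make the default selection concrete, here is a minimal standalone sketch
of the use_movcf2gr () test with the cost values from this patch.  The
cost_model struct and the costs_n_insns helper are hypothetical stand-ins
for loongarch_rtx_cost_data and COSTS_N_INSNS (which GCC defines as
(N) * 4), and the size-tuning entry assumes branch_cost keeps its default
of 6:

```cpp
// Stand-in for GCC's COSTS_N_INSNS macro, which expands to (N) * 4.
constexpr int costs_n_insns (int n) { return n * 4; }

constexpr int simple_insn_cost = costs_n_insns (1);

// Hypothetical reduction of loongarch_rtx_cost_data to the two fields
// that the heuristic reads.
struct cost_model
{
  unsigned short movcf2gr;
  unsigned short branch_cost;

  // Same comparison as use_movcf2gr () in the patch.
  constexpr bool use_movcf2gr () const
  {
    return movcf2gr <= simple_insn_cost * 2 + branch_cost;
  }
};

// Default tuning: movcf2gr costs COSTS_N_INSNS (7), branch_cost is 6.
constexpr cost_model generic_tune = { costs_n_insns (7), 6 };
// LA664 tuning: movcf2gr costs COSTS_N_INSNS (1).
constexpr cost_model la664_tune = { costs_n_insns (1), 6 };
// Size tuning: COSTS_N_INSNS (1) + 1; branch_cost of 6 is an assumption.
constexpr cost_model size_tune = { costs_n_insns (1) + 1, 6 };

static_assert (!generic_tune.use_movcf2gr (), "28 <= 14 is false");
static_assert (la664_tune.use_movcf2gr (), "4 <= 14 is true");
static_assert (size_tune.use_movcf2gr (), "5 <= 14 is true");
```

Under these assumptions only the LA664 and size tunings enable the
instruction by default, which is why the new test passes -mtune=la664.
Note that the comparison adds branch_cost as a raw number to values scaled
by COSTS_N_INSNS.
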
 gcc/config/loongarch/genopts/loongarch.opt.in |  4 +++
 gcc/config/loongarch/loongarch-def.cc         | 12 +++++--
 gcc/config/loongarch/loongarch-tune.h         | 14 ++++++++
 gcc/config/loongarch/loongarch.cc             |  3 ++
 gcc/config/loongarch/loongarch.md             | 36 +++++++++++++++++++
 gcc/config/loongarch/loongarch.opt            |  4 +++
 gcc/config/loongarch/predicates.md            |  4 +++
 gcc/doc/invoke.texi                           |  8 +++++
 gcc/testsuite/gcc.target/loongarch/movcf2gr.c |  9 +++++
 9 files changed, 91 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/movcf2gr.c

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index c3848d02fd3..a87915d9b5a 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -245,6 +245,10 @@ mpass-mrelax-to-as
 Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
 Pass -mrelax or -mno-relax option to the assembler.
 
+muse-movcf2gr
+Target Var(loongarch_use_movcf2gr) Init(M_OPT_UNSET)
+Emit the movcf2gr instruction.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index 4a8885e8343..6da085d375e 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -36,6 +36,8 @@ using array_tune = array<T, N_TUNE_TYPES>;
 template <class T>
 using array_arch = array<T, N_ARCH_TYPES>;
 
+const int simple_insn_cost = COSTS_N_INSNS (1);
+
 /* CPU property tables.  */
 array_tune<const char *> loongarch_cpu_strings = array_tune<const char *> ()
   .set (CPU_NATIVE, STR_CPU_NATIVE)
@@ -101,15 +103,18 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
     int_mult_di (COSTS_N_INSNS (4)),
     int_div_si (COSTS_N_INSNS (5)),
     int_div_di (COSTS_N_INSNS (5)),
+    movcf2gr (COSTS_N_INSNS (7)),
     branch_cost (6),
     memory_latency (4) {}
 
 /* The following properties cannot be looked up directly using "cpucfg".
  So it is necessary to provide a default value for "unknown native"
  tune targets (i.e. -mtune=native while PRID does not correspond to
- any known "-mtune" type).  Currently all numbers are default.  */
+ any known "-mtune" type).  */
 array_tune<loongarch_rtx_cost_data> loongarch_cpu_rtx_cost_data =
-  array_tune<loongarch_rtx_cost_data> ();
+  array_tune<loongarch_rtx_cost_data> ()
+    .set (CPU_LA664,
+	  loongarch_rtx_cost_data ().movcf2gr_ (COSTS_N_INSNS (1)));
 
 /* RTX costs to use when optimizing for size.
    We use a value slightly larger than COSTS_N_INSNS (1) for all of them
@@ -125,7 +130,8 @@ const loongarch_rtx_cost_data loongarch_rtx_cost_optimize_size =
     .int_mult_si_ (COST_COMPLEX_INSN)
     .int_mult_di_ (COST_COMPLEX_INSN)
     .int_div_si_ (COST_COMPLEX_INSN)
-    .int_div_di_ (COST_COMPLEX_INSN);
+    .int_div_di_ (COST_COMPLEX_INSN)
+    .movcf2gr_ (COST_COMPLEX_INSN);
 
 array_tune<int> loongarch_cpu_issue_rate = array_tune<int> ()
   .set (CPU_NATIVE, 4)
diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h
index 4aa01c54c08..7f478e009cd 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "loongarch-def-array.h"
 
+extern const int simple_insn_cost;
+
 /* RTX costs of various operations on the different architectures.  */
 struct loongarch_rtx_cost_data
 {
@@ -35,6 +37,7 @@ struct loongarch_rtx_cost_data
   unsigned short int_mult_di;
   unsigned short int_div_si;
   unsigned short int_div_di;
+  unsigned short movcf2gr;
   unsigned short branch_cost;
   unsigned short memory_latency;
 
@@ -95,6 +98,12 @@ struct loongarch_rtx_cost_data
     return *this;
   }
 
+  loongarch_rtx_cost_data movcf2gr_ (unsigned short _movcf2gr)
+  {
+    movcf2gr = _movcf2gr;
+    return *this;
+  }
+
   loongarch_rtx_cost_data branch_cost_ (unsigned short _branch_cost)
   {
     branch_cost = _branch_cost;
@@ -107,6 +116,11 @@ struct loongarch_rtx_cost_data
     return *this;
   }
 
+  bool use_movcf2gr () const
+  {
+    /* Use movcf2gr if it costs no more than two li.w and a branch.  */
+    return movcf2gr <= simple_insn_cost * 2 + branch_cost;
+  }
 };
 
 /* Costs to use when optimizing for size.  */
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 390e3206a17..35e84964eb7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7528,6 +7528,9 @@ loongarch_option_override_internal (struct gcc_options *opts,
   else
     loongarch_cost = &loongarch_cpu_rtx_cost_data[la_target.cpu_tune];
 
+  if (loongarch_use_movcf2gr == M_OPT_UNSET)
+    loongarch_use_movcf2gr = loongarch_cost->use_movcf2gr ();
+
   /* If the user hasn't specified a branch cost, use the processor's
      default.  */
   if (loongarch_branch_cost == 0)
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index a5d0dcd65fe..de3015c923b 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3169,6 +3169,42 @@ (define_insn "s<code>_<ANYF:mode>_using_FCCmode"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FCC")])
 
+(define_insn "movcf2gr<GPR:mode>"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+	(if_then_else:GPR (ne (match_operand:FCC 1 "register_operand" "z")
+			      (const_int 0))
+			  (const_int 1)
+			  (const_int 0)))]
+  "TARGET_HARD_FLOAT && loongarch_use_movcf2gr"
+  "movcf2gr\t%0,%1"
+  [(set_attr "type" "move")
+   (set_attr "mode" "FCC")])
+
+(define_expand "cstore<ANYF:mode>4"
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operator:SI 1 "loongarch_fcmp_operator"
+	  [(match_operand:ANYF 2 "register_operand")
+	   (match_operand:ANYF 3 "register_operand")]))]
+  "loongarch_use_movcf2gr"
+  {
+    rtx fcc = gen_reg_rtx (FCCmode);
+    rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), FCCmode,
+			      operands[2], operands[3]);
+
+    emit_insn (gen_rtx_SET (fcc, cmp));
+    if (TARGET_64BIT)
+      {
+	rtx gpr = gen_reg_rtx (DImode);
+	emit_insn (gen_movcf2grdi (gpr, fcc));
+	emit_insn (gen_rtx_SET (operands[0],
+				lowpart_subreg (SImode, gpr, DImode)));
+      }
+    else
+      emit_insn (gen_movcf2grsi (operands[0], fcc));
+
+    DONE;
+  })
+
 \f
 ;;
 ;;  ....................
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 61d25130ea9..b553eaa34e7 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -253,6 +253,10 @@ mpass-mrelax-to-as
 Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
 Pass -mrelax or -mno-relax option to the assembler.
 
+muse-movcf2gr
+Target Var(loongarch_use_movcf2gr) Init(M_OPT_UNSET)
+Emit the movcf2gr instruction.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index 9e9ce58cb53..83fea08315c 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -590,6 +590,10 @@ (define_predicate "order_operator"
 (define_predicate "loongarch_cstore_operator"
   (match_code "ne,eq,gt,gtu,ge,geu,lt,ltu,le,leu"))
 
+(define_predicate "loongarch_fcmp_operator"
+  (match_code
+    "unordered,uneq,unlt,unle,eq,lt,le,ordered,ltgt,ne,ge,gt,unge,ungt"))
+
 (define_predicate "small_data_pattern"
   (and (match_code "set,parallel,unspec,unspec_volatile,prefetch")
        (match_test "loongarch_small_data_pattern_p (op)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1f26f80d26c..1f79a888627 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26811,6 +26811,14 @@ Enable the approximation for vectorized reciprocal square root.
 So, for example, @option{-mrecip=all,!sqrt} enables
 all of the reciprocal approximations, except for scalar square root.
 
+@item -muse-movcf2gr
+@itemx -mno-use-movcf2gr
+Use (do not use) the @code{movcf2gr} instruction.  The default
+depends on the @option{-mtune=} setting:
+@option{-muse-movcf2gr} if tuning for a microarchitecture where the
+@code{movcf2gr} instruction is faster than a @code{bceqz} or @code{bcnez}
+branch setting a GPR to 0 or 1; @option{-mno-use-movcf2gr} otherwise.
+
 @item loongarch-vect-unroll-limit
 The vectorizer will use available tuning information to determine whether it
 would be beneficial to unroll the main vectorized loop and by how much.  This
diff --git a/gcc/testsuite/gcc.target/loongarch/movcf2gr.c b/gcc/testsuite/gcc.target/loongarch/movcf2gr.c
new file mode 100644
index 00000000000..d27c393b5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/movcf2gr.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mtune=la664 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "movcf2gr\t\\\$r4,\\\$fcc" } } */
+
+int
+t (float a, float b)
+{
+  return a > b;
+}
-- 
2.43.0


