[PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.
@ 2023-12-06  7:04 Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions Jiahao Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Jiahao Xu @ 2023-12-06  7:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

LoongArch V1.1 adds support for approximate instructions, which are utilized along with additional
Newton-Raphson steps implement single precision floating-point division, square root and reciprocal
square root operations for better throughput.

The patches are modifications made based on the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html

Jiahao Xu (5):
  LoongArch: Add support for LoongArch V1.1 approximate instructions.
  LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
    instructions.
  LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
  LoongArch: New options -mrecip and -mrecip= with ffast-math.
  LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf
    when -mrecip is enabled.

 gcc/config/loongarch/genopts/isa-evolution.in |   1 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
 gcc/config/loongarch/larchintrin.h            |  38 +++
 gcc/config/loongarch/lasx.md                  |  89 ++++++-
 gcc/config/loongarch/lasxintrin.h             |  34 +++
 gcc/config/loongarch/loongarch-builtins.cc    |  66 +++++
 gcc/config/loongarch/loongarch-c.cc           |   3 +
 gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
 gcc/config/loongarch/loongarch-def.cc         |   3 +-
 gcc/config/loongarch/loongarch-protos.h       |   2 +
 gcc/config/loongarch/loongarch-str.h          |   1 +
 gcc/config/loongarch/loongarch.cc             | 252 +++++++++++++++++-
 gcc/config/loongarch/loongarch.h              |  18 ++
 gcc/config/loongarch/loongarch.md             | 104 ++++++--
 gcc/config/loongarch/loongarch.opt            |  15 ++
 gcc/config/loongarch/lsx.md                   |  89 ++++++-
 gcc/config/loongarch/lsxintrin.h              |  34 +++
 gcc/config/loongarch/predicates.md            |   8 +
 gcc/doc/extend.texi                           |  35 +++
 gcc/doc/invoke.texi                           |  54 ++++
 gcc/testsuite/gcc.target/loongarch/divf.c     |  10 +
 .../loongarch/larch-frecipe-builtin.c         |  28 ++
 .../gcc.target/loongarch/recip-divf.c         |   9 +
 .../gcc.target/loongarch/recip-sqrtf.c        |  23 ++
 gcc/testsuite/gcc.target/loongarch/sqrtf.c    |  24 ++
 .../loongarch/vector/lasx/lasx-divf.c         |  13 +
 .../vector/lasx/lasx-frecipe-builtin.c        |  30 +++
 .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
 .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
 .../loongarch/vector/lasx/lasx-recip.c        |  24 ++
 .../loongarch/vector/lasx/lasx-rsqrt.c        |  26 ++
 .../loongarch/vector/lasx/lasx-sqrtf.c        |  29 ++
 .../loongarch/vector/lsx/lsx-divf.c           |  13 +
 .../vector/lsx/lsx-frecipe-builtin.c          |  30 +++
 .../loongarch/vector/lsx/lsx-recip-divf.c     |  12 +
 .../loongarch/vector/lsx/lsx-recip-sqrtf.c    |  28 ++
 .../loongarch/vector/lsx/lsx-recip.c          |  24 ++
 .../loongarch/vector/lsx/lsx-rsqrt.c          |  26 ++
 .../loongarch/vector/lsx/lsx-sqrtf.c          |  29 ++
 39 files changed, 1234 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions.
  2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
@ 2023-12-06  7:04 ` Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions Jiahao Xu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jiahao Xu @ 2023-12-06  7:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

This patch adds define_insn/builtins/intrinsics for these instructions, and add option
-mfrecipe to control instruction generation.

gcc/ChangeLog:

	* config/loongarch/genopts/isa-evolution.in (fecipe): Add.
	* config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.
	(__frecipe_d): Ditto.
	(__frsqrte_s): Ditto.
	(__frsqrte_d): Ditto.
	* config/loongarch/lasx.md (lasx_xvfrecipe_<flasxfmt>): New insn pattern.
	(lasx_xvfrsqrte_<flasxfmt>): Ditto.
	* config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic.
	(__lasx_xvfrecipe_d): Ditto.
	(__lasx_xvfrsqrte_s): Ditto.
	(__lasx_xvfrsqrte_d): Ditto.
	* config/loongarch/loongarch-builtins.cc (AVAIL_ALL): Add predicates.
	(LSX_EXT_BUILTIN): New macro.
	(LASX_EXT_BUILTIN): Ditto.
	* config/loongarch/loongarch-cpucfg-map.h: Regenerate.
	* config/loongarch/loongarch-c.cc: Add builtin macro "__loongarch_frecipe".
	* config/loongarch/loongarch-def.cc: Regenerate.
	* config/loongarch/loongarch-str.h (OPTSTR_FRECIPE): Regenerate.
	* config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump status for TARGET_FRECIPE.
	* config/loongarch/loongarch.md (loongarch_frecipe_<fmt>): New insn pattern.
	(loongarch_frsqrte_<fmt>): Ditto.
	* config/loongarch/loongarch.opt: Regenerate.
	* config/loongarch/lsx.md (lsx_vfrecipe_<flsxfmt>): New insn pattern.
	(lsx_vfrsqrte_<flsxfmt>): Ditto.
	* config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic.
	(__lsx_vfrecipe_d): Ditto.
	(__lsx_vfrsqrte_s): Ditto.
	(__lsx_vfrsqrte_d): Ditto.
	* doc/extend.texi: Add documentation for LoongArch new builtins and intrinsics.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/larch-frecipe-builtin.c: New test.
	* gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c: New test.

diff --git a/gcc/config/loongarch/genopts/isa-evolution.in b/gcc/config/loongarch/genopts/isa-evolution.in
index a6bc3f87f20..11a198b649f 100644
--- a/gcc/config/loongarch/genopts/isa-evolution.in
+++ b/gcc/config/loongarch/genopts/isa-evolution.in
@@ -1,3 +1,4 @@
+2	25	frecipe		Support frecipe.{s/d} and frsqrte.{s/d} instructions.
 2	26	div32		Support div.w[u] and mod.w[u] instructions with inputs not sign-extended.
 2	27	lam-bh		Support am{swap/add}[_db].{b/h} instructions.
 2	28	lamcas		Support amcas[_db].{b/h/w/d} instructions.
diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h
index e571ed27b37..bb1cda831eb 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -333,6 +333,44 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2)
 }
 #endif
 
+#ifdef __loongarch_frecipe
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  SF, SF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frecipe_s (float _1)
+{
+  __builtin_loongarch_frecipe_s ((float) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  DF, DF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frecipe_d (double _1)
+{
+  __builtin_loongarch_frecipe_d ((double) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  SF, SF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frsqrte_s (float _1)
+{
+  __builtin_loongarch_frsqrte_s ((float) _1);
+}
+
+/* Assembly instruction format: fd, fj.  */
+/* Data types in instruction templates:  DF, DF.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__frsqrte_d (double _1)
+{
+  __builtin_loongarch_frsqrte_d ((double) _1);
+}
+#endif
+
 /* Assembly instruction format:	ui15.  */
 /* Data types in instruction templates:  USI.  */
 #define __dbar(/*ui15*/ _1) __builtin_loongarch_dbar ((_1))
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 116b30c0774..f6e5208a6f1 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -40,8 +40,10 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVFCVTL
   UNSPEC_LASX_XVFLOGB
   UNSPEC_LASX_XVFRECIP
+  UNSPEC_LASX_XVFRECIPE
   UNSPEC_LASX_XVFRINT
   UNSPEC_LASX_XVFRSQRT
+  UNSPEC_LASX_XVFRSQRTE
   UNSPEC_LASX_XVFCMP_SAF
   UNSPEC_LASX_XVFCMP_SEQ
   UNSPEC_LASX_XVFCMP_SLE
@@ -1633,6 +1635,17 @@ (define_insn "lasx_xvfrecip_<flasxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
+;; Approximate Reciprocal Instructions.
+
+(define_insn "lasx_xvfrecipe_<flasxfmt>"
+  [(set (match_operand:FLASX 0 "register_operand" "=f")
+    (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+		  UNSPEC_LASX_XVFRECIPE))]
+  "ISA_HAS_LASX && TARGET_FRECIPE"
+  "xvfrecipe.<flasxfmt>\t%u0,%u1"
+  [(set_attr "type" "simd_fdiv")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "lasx_xvfrsqrt_<flasxfmt>"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
 	(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
@@ -1642,6 +1655,17 @@ (define_insn "lasx_xvfrsqrt_<flasxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
+;; Approximate Reciprocal Square Root Instructions.
+
+(define_insn "lasx_xvfrsqrte_<flasxfmt>"
+  [(set (match_operand:FLASX 0 "register_operand" "=f")
+    (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+		  UNSPEC_LASX_XVFRSQRTE))]
+  "ISA_HAS_LASX && TARGET_FRECIPE"
+  "xvfrsqrte.<flasxfmt>\t%u0,%u1"
+  [(set_attr "type" "simd_fdiv")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "lasx_xvftint_u_<ilasxfmt_u>_<flasxfmt>"
   [(set (match_operand:<VIMODE256> 0 "register_operand" "=f")
 	(unspec:<VIMODE256> [(match_operand:FLASX 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/lasxintrin.h b/gcc/config/loongarch/lasxintrin.h
index 7bce2c757f1..5e65e76e74c 100644
--- a/gcc/config/loongarch/lasxintrin.h
+++ b/gcc/config/loongarch/lasxintrin.h
@@ -2399,6 +2399,40 @@ __m256d __lasx_xvfrecip_d (__m256d _1)
   return (__m256d)__builtin_lasx_xvfrecip_d ((v4f64)_1);
 }
 
+#if defined(__loongarch_frecipe)
+/* Assembly instruction format: xd, xj.  */
+/* Data types in instruction templates:  V8SF, V8SF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m256 __lasx_xvfrecipe_s (__m256 _1)
+{
+  return (__m256)__builtin_lasx_xvfrecipe_s ((v8f32)_1);
+}
+
+/* Assembly instruction format: xd, xj.  */
+/* Data types in instruction templates:  V4DF, V4DF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m256d __lasx_xvfrecipe_d (__m256d _1)
+{
+  return (__m256d)__builtin_lasx_xvfrecipe_d ((v4f64)_1);
+}
+
+/* Assembly instruction format: xd, xj.  */
+/* Data types in instruction templates:  V8SF, V8SF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m256 __lasx_xvfrsqrte_s (__m256 _1)
+{
+  return (__m256)__builtin_lasx_xvfrsqrte_s ((v8f32)_1);
+}
+
+/* Assembly instruction format: xd, xj.  */
+/* Data types in instruction templates:  V4DF, V4DF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m256d __lasx_xvfrsqrte_d (__m256d _1)
+{
+  return (__m256d)__builtin_lasx_xvfrsqrte_d ((v4f64)_1);
+}
+#endif
+
 /* Assembly instruction format:	xd, xj.  */
 /* Data types in instruction templates:  V8SF, V8SF.  */
 extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index 5d037ab7f10..507fc953c72 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -120,6 +120,9 @@ struct loongarch_builtin_description
 AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI)
 AVAIL_ALL (lsx, ISA_HAS_LSX)
 AVAIL_ALL (lasx, ISA_HAS_LASX)
+AVAIL_ALL (frecipe, TARGET_FRECIPE && TARGET_HARD_FLOAT_ABI)
+AVAIL_ALL (lsx_frecipe, ISA_HAS_LSX && TARGET_FRECIPE)
+AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 
 /* Construct a loongarch_builtin_description from the given arguments.
 
@@ -164,6 +167,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
     "__builtin_lsx_" #INSN,  LARCH_BUILTIN_DIRECT,			\
     FUNCTION_TYPE, loongarch_builtin_avail_lsx }
 
+ /* Define an LSX LARCH_BUILTIN_DIRECT function __builtin_lsx_<INSN>
+    for instruction CODE_FOR_lsx_<INSN>.  FUNCTION_TYPE is a builtin_description
+    field. AVAIL is the name of the availability predicate, without the leading
+    loongarch_builtin_avail_.  */
+#define LSX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL)                     \
+  { CODE_FOR_lsx_ ## INSN,                                              \
+    "__builtin_lsx_" #INSN,  LARCH_BUILTIN_DIRECT,                      \
+    FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL }
+
 
 /* Define an LSX LARCH_BUILTIN_LSX_TEST_BRANCH function __builtin_lsx_<INSN>
    for instruction CODE_FOR_lsx_<INSN>.  FUNCTION_TYPE is a builtin_description
@@ -189,6 +201,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
     "__builtin_lasx_" #INSN,  LARCH_BUILTIN_LASX,			\
     FUNCTION_TYPE, loongarch_builtin_avail_lasx }
 
+/* Define an LASX LARCH_BUILTIN_DIRECT function __builtin_lasx_<INSN>
+   for instruction CODE_FOR_lasx_<INSN>.  FUNCTION_TYPE is a builtin_description
+   field. AVAIL is the name of the availability predicate, without the leading
+   loongarch_builtin_avail_.  */
+#define LASX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL)                    \
+  { CODE_FOR_lasx_ ## INSN,                                             \
+    "__builtin_lasx_" #INSN,  LARCH_BUILTIN_LASX,                       \
+    FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL }
+
 /* Define an LASX LARCH_BUILTIN_DIRECT_NO_TARGET function __builtin_lasx_<INSN>
    for instruction CODE_FOR_lasx_<INSN>.  FUNCTION_TYPE is a builtin_description
    field.  */
@@ -804,6 +825,27 @@ static const struct loongarch_builtin_description loongarch_builtins[] = {
   DIRECT_NO_TARGET_BUILTIN (syscall, LARCH_VOID_FTYPE_USI, default),
   DIRECT_NO_TARGET_BUILTIN (break, LARCH_VOID_FTYPE_USI, default),
 
+  /* Built-in functions for frecipe.{s/d} and frsqrte.{s/d}.  */
+
+  DIRECT_BUILTIN (frecipe_s, LARCH_SF_FTYPE_SF, frecipe),
+  DIRECT_BUILTIN (frecipe_d, LARCH_DF_FTYPE_DF, frecipe),
+  DIRECT_BUILTIN (frsqrte_s, LARCH_SF_FTYPE_SF, frecipe),
+  DIRECT_BUILTIN (frsqrte_d, LARCH_DF_FTYPE_DF, frecipe),
+
+  /* Built-in functions for new LSX instructions.  */
+
+  LSX_EXT_BUILTIN (vfrecipe_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe),
+  LSX_EXT_BUILTIN (vfrecipe_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe),
+  LSX_EXT_BUILTIN (vfrsqrte_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe),
+  LSX_EXT_BUILTIN (vfrsqrte_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe),
+
+  /* Built-in functions for new LASX instructions.  */
+
+  LASX_EXT_BUILTIN (xvfrecipe_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe),
+  LASX_EXT_BUILTIN (xvfrecipe_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe),
+  LASX_EXT_BUILTIN (xvfrsqrte_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe),
+  LASX_EXT_BUILTIN (xvfrsqrte_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe),
+
   /* Built-in functions for LSX.  */
   LSX_BUILTIN (vsll_b, LARCH_V16QI_FTYPE_V16QI_V16QI),
   LSX_BUILTIN (vsll_h, LARCH_V8HI_FTYPE_V8HI_V8HI),
diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc
index fbc33a10351..44f52245c78 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -102,6 +102,9 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else
     builtin_define ("__loongarch_frlen=0");
 
+  if (TARGET_HARD_FLOAT && TARGET_FRECIPE)
+    builtin_define ("__loongarch_frecipe");
+
   if (ISA_HAS_LSX)
     {
       builtin_define ("__loongarch_simd");
diff --git a/gcc/config/loongarch/loongarch-cpucfg-map.h b/gcc/config/loongarch/loongarch-cpucfg-map.h
index 02ff1671255..148333c249c 100644
--- a/gcc/config/loongarch/loongarch-cpucfg-map.h
+++ b/gcc/config/loongarch/loongarch-cpucfg-map.h
@@ -29,6 +29,7 @@ static constexpr struct {
   unsigned int cpucfg_bit;
   HOST_WIDE_INT isa_evolution_bit;
 } cpucfg_map[] = {
+  { 2, 1u << 25, OPTION_MASK_ISA_FRECIPE },
   { 2, 1u << 26, OPTION_MASK_ISA_DIV32 },
   { 2, 1u << 27, OPTION_MASK_ISA_LAM_BH },
   { 2, 1u << 28, OPTION_MASK_ISA_LAMCAS },
diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index bc6997e45b5..c41804a180e 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -60,7 +60,8 @@ array_arch<loongarch_isa> loongarch_cpu_default_isa =
 	    .fpu_ (ISA_EXT_FPU64)
 	    .simd_ (ISA_EXT_SIMD_LASX)
 	    .evolution_ (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA
-		    | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS));
+			 | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS
+			 | OPTION_MASK_ISA_FRECIPE));
 
 static inline loongarch_cache la464_cache ()
 {
diff --git a/gcc/config/loongarch/loongarch-str.h b/gcc/config/loongarch/loongarch-str.h
index 7c78d1443d5..4d1bfd675e8 100644
--- a/gcc/config/loongarch/loongarch-str.h
+++ b/gcc/config/loongarch/loongarch-str.h
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
 #define STR_EXPLICIT_RELOCS_NONE "none"
 #define STR_EXPLICIT_RELOCS_ALWAYS "always"
 
+#define OPTSTR_FRECIPE	"frecipe"
 #define OPTSTR_DIV32	"div32"
 #define OPTSTR_LAM_BH	"lam-bh"
 #define OPTSTR_LAMCAS	"lamcas"
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3545e66a10e..57a20bec8a4 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -11503,6 +11503,7 @@ loongarch_asm_code_end (void)
 	       loongarch_cpu_strings [la_target.cpu_tune]);
       fprintf (asm_out_file, "%s Base ISA: %s\n", ASM_COMMENT_START,
 	       loongarch_isa_base_strings [la_target.isa.base]);
+      DUMP_FEATURE (TARGET_FRECIPE);
       DUMP_FEATURE (TARGET_DIV32);
       DUMP_FEATURE (TARGET_LAM_BH);
       DUMP_FEATURE (TARGET_LAMCAS);
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 7a101dd64b7..07beede8892 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -59,6 +59,12 @@ (define_c_enum "unspec" [
   ;; Stack tie
   UNSPEC_TIE
 
+  ;; RSQRT
+  UNSPEC_RSQRTE
+
+  ;; RECIP
+  UNSPEC_RECIPE
+
   ;; CRC
   UNSPEC_CRC
   UNSPEC_CRCC
@@ -220,6 +226,7 @@ (define_attr "qword_mode" "no,yes"
 ;; fmadd	floating point multiply-add
 ;; fdiv		floating point divide
 ;; frdiv	floating point reciprocal divide
+;; frecipe      floating point approximate reciprocal
 ;; fabs		floating point absolute value
 ;; flogb	floating point exponent extract
 ;; fneg		floating point negation
@@ -229,6 +236,7 @@ (define_attr "qword_mode" "no,yes"
 ;; fscaleb	floating point scale
 ;; fsqrt	floating point square root
 ;; frsqrt       floating point reciprocal square root
+;; frsqrte      floating point approximate reciprocal square root
 ;; multi	multiword sequence (or user asm statements)
 ;; atomic	atomic memory update instruction
 ;; syncloop	memory atomic operation implemented as a sync loop
@@ -238,8 +246,8 @@ (define_attr "type"
   "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore,
    prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical,
    shift,slt,signext,clz,trap,imul,idiv,move,
-   fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,flogb,fneg,fcmp,fcopysign,fcvt,
-   fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost,
+   fmove,fadd,fmul,fmadd,fdiv,frdiv,frecipe,fabs,flogb,fneg,fcmp,fcopysign,fcvt,
+   fscaleb,fsqrt,frsqrt,frsqrte,accext,accmod,multi,atomic,syncloop,nop,ghost,
    simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd,
    simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp,
    simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill,
@@ -908,6 +916,18 @@ (define_insn "*recip<mode>3"
   [(set_attr "type" "frdiv")
    (set_attr "mode" "<UNITMODE>")])
 
+;; Approximate Reciprocal Instructions.
+
+(define_insn "loongarch_frecipe_<fmt>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+    (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
+	     UNSPEC_RECIPE))]
+  "TARGET_FRECIPE"
+  "frecipe.<fmt>\t%0,%1"
+  [(set_attr "type" "frecipe")
+   (set_attr "mode" "<UNITMODE>")
+   (set_attr "insn_count" "1")])
+
 ;; Integer division and modulus.
 (define_expand "<optab><mode>3"
   [(set (match_operand:GPR 0 "register_operand")
@@ -1133,6 +1153,17 @@ (define_insn "*rsqrt<mode>b"
   [(set_attr "type" "frsqrt")
    (set_attr "mode" "<UNITMODE>")
    (set_attr "insn_count" "1")])
+
+;; Approximate Reciprocal Square Root Instructions.
+
+(define_insn "loongarch_frsqrte_<fmt>"
+  [(set (match_operand:ANYF 0 "register_operand" "=f")
+    (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
+		 UNSPEC_RSQRTE))]
+  "TARGET_FRECIPE"
+  "frsqrte.<fmt>\t%0,%1"
+  [(set_attr "type" "frsqrte")
+   (set_attr "mode" "<UNITMODE>")])
 \f
 ;;
 ;;  ....................
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 41e6424e861..cdd59ae4fcf 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -260,6 +260,10 @@ default value is 4.
 Variable
 HOST_WIDE_INT isa_evolution = 0
 
+mfrecipe
+Target Mask(ISA_FRECIPE) Var(isa_evolution)
+Support frecipe.{s/d} and frsqrte.{s/d} instructions.
+
 mdiv32
 Target Mask(ISA_DIV32) Var(isa_evolution)
 Support div.w[u] and mod.w[u] instructions with inputs not sign-extended.
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 23239993404..e2393aed139 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -42,8 +42,10 @@ (define_c_enum "unspec" [
   UNSPEC_LSX_VFCVTL
   UNSPEC_LSX_VFLOGB
   UNSPEC_LSX_VFRECIP
+  UNSPEC_LSX_VFRECIPE
   UNSPEC_LSX_VFRINT
   UNSPEC_LSX_VFRSQRT
+  UNSPEC_LSX_VFRSQRTE
   UNSPEC_LSX_VFCMP_SAF
   UNSPEC_LSX_VFCMP_SEQ
   UNSPEC_LSX_VFCMP_SLE
@@ -1546,6 +1548,17 @@ (define_insn "lsx_vfrecip_<flsxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
+;; Approximate Reciprocal Instructions.
+
+(define_insn "lsx_vfrecipe_<flsxfmt>"
+  [(set (match_operand:FLSX 0 "register_operand" "=f")
+    (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
+		 UNSPEC_LSX_VFRECIPE))]
+  "ISA_HAS_LSX && TARGET_FRECIPE"
+  "vfrecipe.<flsxfmt>\t%w0,%w1"
+  [(set_attr "type" "simd_fdiv")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "lsx_vfrsqrt_<flsxfmt>"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
 	(unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
@@ -1555,6 +1568,17 @@ (define_insn "lsx_vfrsqrt_<flsxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
+;; Approximate Reciprocal Square Root Instructions.
+
+(define_insn "lsx_vfrsqrte_<flsxfmt>"
+  [(set (match_operand:FLSX 0 "register_operand" "=f")
+    (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
+		 UNSPEC_LSX_VFRSQRTE))]
+  "ISA_HAS_LSX && TARGET_FRECIPE"
+  "vfrsqrte.<flsxfmt>\t%w0,%w1"
+  [(set_attr "type" "simd_fdiv")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "lsx_vftint_u_<ilsxfmt_u>_<flsxfmt>"
   [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
 	(unspec:<VIMODE> [(match_operand:FLSX 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h
index 29553c093fa..57a6fc40a8f 100644
--- a/gcc/config/loongarch/lsxintrin.h
+++ b/gcc/config/loongarch/lsxintrin.h
@@ -2480,6 +2480,40 @@ __m128d __lsx_vfrecip_d (__m128d _1)
   return (__m128d)__builtin_lsx_vfrecip_d ((v2f64)_1);
 }
 
+#if defined(__loongarch_frecipe)
+/* Assembly instruction format: vd, vj.  */
+/* Data types in instruction templates:  V4SF, V4SF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m128 __lsx_vfrecipe_s (__m128 _1)
+{
+  return (__m128)__builtin_lsx_vfrecipe_s ((v4f32)_1);
+}
+
+/* Assembly instruction format: vd, vj.  */
+/* Data types in instruction templates:  V2DF, V2DF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m128d __lsx_vfrecipe_d (__m128d _1)
+{
+  return (__m128d)__builtin_lsx_vfrecipe_d ((v2f64)_1);
+}
+
+/* Assembly instruction format: vd, vj.  */
+/* Data types in instruction templates:  V4SF, V4SF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m128 __lsx_vfrsqrte_s (__m128 _1)
+{
+  return (__m128)__builtin_lsx_vfrsqrte_s ((v4f32)_1);
+}
+
+/* Assembly instruction format: vd, vj.  */
+/* Data types in instruction templates:  V2DF, V2DF.  */
+extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__m128d __lsx_vfrsqrte_d (__m128d _1)
+{
+  return (__m128d)__builtin_lsx_vfrsqrte_d ((v2f64)_1);
+}
+#endif
+
 /* Assembly instruction format:	vd, vj.  */
 /* Data types in instruction templates:  V4SF, V4SF.  */
 extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 32ae15e1d5b..98c6d320fbe 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17027,6 +17027,14 @@ The intrinsics provided are listed below:
     void __builtin_loongarch_break (imm0_32767)
 @end smallexample
 
+These instrisic functions are available by using @option{-mfrecipe}.
+@smallexample
+    float __builtin_loongarch_frecipe_s (float);
+    double  __builtin_loongarch_frecipe_d (double);
+    float __builtin_loongarch_frsqrte_s (float);
+    double  __builtin_loongarch_frsqrte_d (double);
+@end smallexample
+
 @emph{Note:}Since the control register is divided into 32-bit and 64-bit,
 but the access instruction is not distinguished. So GCC renames the control
 instructions when implementing intrinsics.
@@ -17099,6 +17107,15 @@ function you need to include @code{larchintrin.h}.
     void __break (imm0_32767)
 @end smallexample
 
+These instrisic functions are available by including @code{larchintrin.h} and
+using @option{-mfrecipe}.
+@smallexample
+    float __frecipe_s (float);
+    double __frecipe_d (double);
+    float __frsqrte_s (float);
+    double __frsqrte_d (double);
+@end smallexample
+
 Additional built-in functions are available for LoongArch family
 processors to efficiently use 128-bit floating-point (__float128)
 values.
@@ -17939,6 +17956,15 @@ __m128i __lsx_vxori_b (__m128i, imm0_255);
 __m128i __lsx_vxor_v (__m128i, __m128i);
 @end smallexample
 
+These instrisic functions are available by including @code{lsxintrin.h} and
+using @option{-mfrecipe} and @option{-mlsx}.
+@smallexample
+__m128d __lsx_vfrecipe_d (__m128d);
+__m128 __lsx_vfrecipe_s (__m128);
+__m128d __lsx_vfrsqrte_d (__m128d);
+__m128 __lsx_vfrsqrte_s (__m128);
+@end smallexample
+
 @node LoongArch ASX Vector Intrinsics
 @subsection LoongArch ASX Vector Intrinsics
 
@@ -18778,6 +18804,15 @@ __m256i __lasx_xvxori_b (__m256i, imm0_255);
 __m256i __lasx_xvxor_v (__m256i, __m256i);
 @end smallexample
 
+These instrisic functions are available by including @code{lasxintrin.h} and
+using @option{-mfrecipe} and @option{-mlasx}.
+@smallexample
+__m256d __lasx_xvfrecipe_d (__m256d);
+__m256 __lasx_xvfrecipe_s (__m256);
+__m256d __lasx_xvfrsqrte_d (__m256d);
+__m256 __lasx_xvfrsqrte_s (__m256);
+@end smallexample
+
 @node MIPS DSP Built-in Functions
 @subsection MIPS DSP Built-in Functions
 
diff --git a/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
new file mode 100644
index 00000000000..b9329f34676
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
@@ -0,0 +1,28 @@
+/* Test builtins for frecipe.{s/d} and frsqrte.{s/d} instructions */
+/* { dg-do compile } */
+/* { dg-options "-mfrecipe" } */
+/* { dg-final { scan-assembler-times "test_frecipe_s:.*frecipe\\.s.*test_frecipe_s" 1 } } */
+/* { dg-final { scan-assembler-times "test_frecipe_d:.*frecipe\\.d.*test_frecipe_d" 1 } } */
+/* { dg-final { scan-assembler-times "test_frsqrte_s:.*frsqrte\\.s.*test_frsqrte_s" 1 } } */
+/* { dg-final { scan-assembler-times "test_frsqrte_d:.*frsqrte\\.d.*test_frsqrte_d" 1 } } */
+
+float
+test_frecipe_s (float _1)
+{
+  return __builtin_loongarch_frecipe_s (_1);
+}
+double
+test_frecipe_d (double _1)
+{
+  return __builtin_loongarch_frecipe_d (_1);
+}
+float
+test_frsqrte_s (float _1)
+{
+  return __builtin_loongarch_frsqrte_s (_1);
+}
+double
+test_frsqrte_d (double _1)
+{
+  return __builtin_loongarch_frsqrte_d (_1);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
new file mode 100644
index 00000000000..522535b45a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
@@ -0,0 +1,30 @@
+/* Test builtins for xvfrecipe.{s/d} and xvfrsqrte.{s/d} instructions */
+/* { dg-do compile } */
+/* { dg-options "-mlasx -mfrecipe" } */
+/* { dg-final { scan-assembler-times "lasx_xvfrecipe_s:.*xvfrecipe\\.s.*lasx_xvfrecipe_s" 1 } } */
+/* { dg-final { scan-assembler-times "lasx_xvfrecipe_d:.*xvfrecipe\\.d.*lasx_xvfrecipe_d" 1 } } */
+/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_s:.*xvfrsqrte\\.s.*lasx_xvfrsqrte_s" 1 } } */
+/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_d:.*xvfrsqrte\\.d.*lasx_xvfrsqrte_d" 1 } } */
+
+#include <lasxintrin.h>
+
+v8f32
+__lasx_xvfrecipe_s (v8f32 _1)
+{
+  return __builtin_lasx_xvfrecipe_s (_1);
+}
+v4f64
+__lasx_xvfrecipe_d (v4f64 _1)
+{
+  return __builtin_lasx_xvfrecipe_d (_1);
+}
+v8f32
+__lasx_xvfrsqrte_s (v8f32 _1)
+{
+  return __builtin_lasx_xvfrsqrte_s (_1);
+}
+v4f64
+__lasx_xvfrsqrte_d (v4f64 _1)
+{
+  return __builtin_lasx_xvfrsqrte_d (_1);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
new file mode 100644
index 00000000000..4ad0cb0ffd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
@@ -0,0 +1,30 @@
+/* Test builtins for vfrecipe.{s/d} and vfrsqrte.{s/d} instructions */
+/* { dg-do compile } */
+/* { dg-options "-mlsx -mfrecipe" } */
+/* { dg-final { scan-assembler-times "lsx_vfrecipe_s:.*vfrecipe\\.s.*lsx_vfrecipe_s" 1 } } */
+/* { dg-final { scan-assembler-times "lsx_vfrecipe_d:.*vfrecipe\\.d.*lsx_vfrecipe_d" 1 } } */
+/* { dg-final { scan-assembler-times "lsx_vfrsqrte_s:.*vfrsqrte\\.s.*lsx_vfrsqrte_s" 1 } } */
+/* { dg-final { scan-assembler-times "lsx_vfrsqrte_d:.*vfrsqrte\\.d.*lsx_vfrsqrte_d" 1 } } */
+
+#include <lsxintrin.h>
+
+v4f32
+__lsx_vfrecipe_s (v4f32 _1)
+{
+  return __builtin_lsx_vfrecipe_s (_1);
+}
+v2f64
+__lsx_vfrecipe_d (v2f64 _1)
+{
+  return __builtin_lsx_vfrecipe_d (_1);
+}
+v4f32
+__lsx_vfrsqrte_s (v4f32 _1)
+{
+  return __builtin_lsx_vfrsqrte_s (_1);
+}
+v2f64
+__lsx_vfrsqrte_d (v2f64 _1)
+{
+  return __builtin_lsx_vfrsqrte_d (_1);
+}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.
  2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions Jiahao Xu
@ 2023-12-06  7:04 ` Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions Jiahao Xu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jiahao Xu @ 2023-12-06  7:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt<mode>2 to align with standard
pattern name. Define function use_rsqrt_p to decide when to use rsqrt optab.

gcc/ChangeLog:

	* config/loongarch/lasx.md (lasx_xvfrsqrt_<flasxfmt>): Renamed to ..
	(rsqrt<mode>2): .. this.
	* config/loongarch/loongarch-builtins.cc
	(CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name.
	(CODE_FOR_lsx_vfrsqrt_s): Ditto.
	(CODE_FOR_lasx_xvfrsqrt_d): Ditto.
	(CODE_FOR_lasx_xvfrsqrt_s): Ditto.
	* config/loongarch/loongarch.cc (use_rsqrt_p): New function.
	(loongarch_optab_supported_p): Ditto.
	(TARGET_OPTAB_SUPPORTED_P): New hook.
	* config/loongarch/loongarch.md (*rsqrt<mode>a): Remove.
	(*rsqrt<mode>2): New insn pattern.
	(*rsqrt<mode>b): Remove.
	* config/loongarch/lsx.md (lsx_vfrsqrt_<flsxfmt>): Renamed to ..
	(rsqrt<mode>2): .. this.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-rsqrt.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-rsqrt.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index f6e5208a6f1..c8edc1bfd76 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1646,10 +1646,10 @@ (define_insn "lasx_xvfrecipe_<flasxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lasx_xvfrsqrt_<flasxfmt>"
+(define_insn "rsqrt<mode>2"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
-	(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
-		      UNSPEC_LASX_XVFRSQRT))]
+    (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+		  UNSPEC_LASX_XVFRSQRT))]
   "ISA_HAS_LASX"
   "xvfrsqrt.<flasxfmt>\t%u0,%u1"
   [(set_attr "type" "simd_fdiv")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index 507fc953c72..ba8686d4ceb 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -500,6 +500,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lsx_vssrlrn_bu_h CODE_FOR_lsx_vssrlrn_u_bu_h
 #define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w
 #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d
+#define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2
+#define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2
 
 /* LoongArch ASX define CODE_FOR_lasx_mxxx */
 #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3
@@ -776,6 +778,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lasx_xvsat_hu CODE_FOR_lasx_xvsat_u_hu
 #define CODE_FOR_lasx_xvsat_wu CODE_FOR_lasx_xvsat_u_wu
 #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du
+#define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2
+#define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2
 
 static const struct loongarch_builtin_description loongarch_builtins[] = {
 #define LARCH_MOVFCSR2GR 0
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 57a20bec8a4..96a4b846f2d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -11487,6 +11487,30 @@ loongarch_builtin_support_vector_misalignment (machine_mode mode,
 						      is_packed);
 }
 
+static bool
+use_rsqrt_p (void)
+{
+  return (flag_finite_math_only
+	  && !flag_trapping_math
+	  && flag_unsafe_math_optimizations);
+}
+
+/* Implement the TARGET_OPTAB_SUPPORTED_P hook.  */
+
+static bool
+loongarch_optab_supported_p (int op, machine_mode, machine_mode,
+			     optimization_type opt_type)
+{
+  switch (op)
+    {
+    case rsqrt_optab:
+      return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
+
+    default:
+      return true;
+    }
+}
+
 /* If -fverbose-asm, dump some info for debugging.  */
 static void
 loongarch_asm_code_end (void)
@@ -11625,6 +11649,9 @@ loongarch_asm_code_end (void)
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary
 
+#undef TARGET_OPTAB_SUPPORTED_P
+#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p
+
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P loongarch_vector_mode_supported_p
 
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 07beede8892..fd154b02e48 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -60,6 +60,7 @@ (define_c_enum "unspec" [
   UNSPEC_TIE
 
   ;; RSQRT
+  UNSPEC_RSQRT
   UNSPEC_RSQRTE
 
   ;; RECIP
@@ -1134,25 +1135,14 @@ (define_insn "sqrt<mode>2"
    (set_attr "mode" "<UNITMODE>")
    (set_attr "insn_count" "1")])
 
-(define_insn "*rsqrt<mode>a"
+(define_insn "*rsqrt<mode>2"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
-	(div:ANYF (match_operand:ANYF 1 "const_1_operand" "")
-		  (sqrt:ANYF (match_operand:ANYF 2 "register_operand" "f"))))]
-  "flag_unsafe_math_optimizations"
-  "frsqrt.<fmt>\t%0,%2"
-  [(set_attr "type" "frsqrt")
-   (set_attr "mode" "<UNITMODE>")
-   (set_attr "insn_count" "1")])
-
-(define_insn "*rsqrt<mode>b"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
-	(sqrt:ANYF (div:ANYF (match_operand:ANYF 1 "const_1_operand" "")
-			     (match_operand:ANYF 2 "register_operand" "f"))))]
-  "flag_unsafe_math_optimizations"
-  "frsqrt.<fmt>\t%0,%2"
+    (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
+	     UNSPEC_RSQRT))]
+  "TARGET_HARD_FLOAT"
+  "frsqrt.<fmt>\t%0,%1"
   [(set_attr "type" "frsqrt")
-   (set_attr "mode" "<UNITMODE>")
-   (set_attr "insn_count" "1")])
+   (set_attr "mode" "<UNITMODE>")])
 
 ;; Approximate Reciprocal Square Root Instructions.
 
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index e2393aed139..aeae1b1a622 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1559,10 +1559,10 @@ (define_insn "lsx_vfrecipe_<flsxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lsx_vfrsqrt_<flsxfmt>"
+(define_insn "rsqrt<mode>2"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
-	(unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRSQRT))]
+    (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
+		 UNSPEC_LSX_VFRSQRT))]
   "ISA_HAS_LSX"
   "vfrsqrt.<flsxfmt>\t%w0,%w1"
   [(set_attr "type" "simd_fdiv")
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
new file mode 100644
index 00000000000..24316944d4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx -ffast-math" } */
+/* { dg-final { scan-assembler "xvfrsqrt.s" } } */
+/* { dg-final { scan-assembler "xvfrsqrt.d" } } */
+
+extern float sqrtf (float);
+
+float a[8], b[8];
+
+void
+foo1(void)
+{
+  for (int i = 0; i < 8; i++)
+    a[i] = 1 / sqrtf (b[i]);
+}
+
+extern double sqrt (double);
+
+double da[4], db[4];
+
+void
+foo2(void)
+{
+  for (int i = 0; i < 4; i++)
+    da[i] = 1 / sqrt (db[i]);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
new file mode 100644
index 00000000000..519cc47644c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx -ffast-math" } */
+/* { dg-final { scan-assembler "vfrsqrt.s" } } */
+/* { dg-final { scan-assembler "vfrsqrt.d" } } */
+
+extern float sqrtf (float);
+
+float a[4], b[4];
+
+void
+foo1(void)
+{
+  for (int i = 0; i < 4; i++)
+    a[i] = 1 / sqrtf (b[i]);
+}
+
+extern double sqrt (double);
+
+double da[2], db[2];
+
+void
+foo2(void)
+{
+  for (int i = 0; i < 2; i++)
+    da[i] = 1 / sqrt (db[i]);
+}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
  2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions Jiahao Xu
@ 2023-12-06  7:04 ` Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math Jiahao Xu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jiahao Xu @ 2023-12-06  7:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

Redefine pattern for [x]vfrecip instructions use rtx code instead of unspec, and enable
[x]vfrecip instructions to be generated during auto-vectorization.

gcc/ChangeLog:

	* config/loongarch/lasx.md (lasx_xvfrecip_<flasxfmt>): Renamed to ..
	(recip<mode>3): .. this.
	* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): Redefine
	to new pattern name.
	(CODE_FOR_lsx_vfrecip_s): Ditto.
	(CODE_FOR_lasx_xvfrecip_d): Ditto.
	(CODE_FOR_lasx_xvfrecip_s): Ditto.
	(loongarch_expand_builtin_direct): For the vector recip instructions, construct a
	temporary parameter const1_vector.
	* config/loongarch/lsx.md (lsx_vfrecip_<flsxfmt>): Renamed to ..
	(recip<mode>3): .. this.
	* config/loongarch/predicates.md (const_vector_1_operand): New predicate.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index c8edc1bfd76..e4310c4523d 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1626,12 +1626,12 @@ (define_insn "lasx_xvfmina_<flasxfmt>"
   [(set_attr "type" "simd_fminmax")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lasx_xvfrecip_<flasxfmt>"
+(define_insn "recip<mode>3"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
-	(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
-		      UNSPEC_LASX_XVFRECIP))]
+       (div:FLASX (match_operand:FLASX 1 "const_vector_1_operand" "")
+		  (match_operand:FLASX 2 "register_operand" "f")))]
   "ISA_HAS_LASX"
-  "xvfrecip.<flasxfmt>\t%u0,%u1"
+  "xvfrecip.<flasxfmt>\t%u0,%u2"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index ba8686d4ceb..c77394176db 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -502,6 +502,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d
 #define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2
 #define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2
+#define CODE_FOR_lsx_vfrecip_d CODE_FOR_recipv2df3
+#define CODE_FOR_lsx_vfrecip_s CODE_FOR_recipv4sf3
 
 /* LoongArch ASX define CODE_FOR_lasx_mxxx */
 #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3
@@ -780,6 +782,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE)
 #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du
 #define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2
 #define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2
+#define CODE_FOR_lasx_xvfrecip_d CODE_FOR_recipv4df3
+#define CODE_FOR_lasx_xvfrecip_s CODE_FOR_recipv8sf3
 
 static const struct loongarch_builtin_description loongarch_builtins[] = {
 #define LARCH_MOVFCSR2GR 0
@@ -3024,6 +3028,22 @@ loongarch_expand_builtin_direct (enum insn_code icode, rtx target, tree exp,
   if (has_target_p)
     create_output_operand (&ops[opno++], target, TYPE_MODE (TREE_TYPE (exp)));
 
+  /* For the vector reciprocal instructions, we need to construct a temporary
+     parameter const1_vector.  */
+  switch (icode)
+    {
+    case CODE_FOR_recipv8sf3:
+    case CODE_FOR_recipv4df3:
+    case CODE_FOR_recipv4sf3:
+    case CODE_FOR_recipv2df3:
+      loongarch_prepare_builtin_arg (&ops[2], exp, 0);
+      create_input_operand (&ops[1], CONST1_RTX (ops[0].mode), ops[0].mode);
+      return loongarch_expand_builtin_insn (icode, 3, ops, has_target_p);
+
+    default:
+      break;
+    }
+
   /* Map the arguments to the other operands.  */
   gcc_assert (opno + call_expr_nargs (exp)
 	      == insn_data[icode].n_generator_args);
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index aeae1b1a622..06402e3b353 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1539,12 +1539,12 @@ (define_insn "lsx_vfmina_<flsxfmt>"
   [(set_attr "type" "simd_fminmax")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lsx_vfrecip_<flsxfmt>"
+(define_insn "recip<mode>3"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
-	(unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRECIP))]
+       (div:FLSX (match_operand:FLSX 1 "const_vector_1_operand" "")
+		 (match_operand:FLSX 2 "register_operand" "f")))]
   "ISA_HAS_LSX"
-  "vfrecip.<flsxfmt>\t%w0,%w1"
+  "vfrecip.<flsxfmt>\t%w0,%w2"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index d02e846cb12..f7796da10b2 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -227,6 +227,10 @@ (define_predicate "const_1_operand"
   (and (match_code "const_int,const_wide_int,const_double,const_vector")
        (match_test "op == CONST1_RTX (GET_MODE (op))")))
 
+(define_predicate "const_vector_1_operand"
+  (and (match_code "const_vector")
+       (match_test "op == CONST1_RTX (GET_MODE (op))")))
+
 (define_predicate "reg_or_1_operand"
   (ior (match_operand 0 "const_1_operand")
        (match_operand 0 "register_operand")))
-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.
  2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
                   ` (2 preceding siblings ...)
  2023-12-06  7:04 ` [PATCH v3 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions Jiahao Xu
@ 2023-12-06  7:04 ` Jiahao Xu
  2023-12-06  7:04 ` [PATCH v3 5/5] LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf when -mrecip is enabled Jiahao Xu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jiahao Xu @ 2023-12-06  7:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal
instructions and approximate reciprocal square root instructions with additional
Newton-Raphson steps to implement single precision floating-point division, square
root and reciprocal square root operations, for a better performance.

gcc/ChangeLog:

        * config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable.
        (-mrecip, -mrecip): New options.
        * config/loongarch/lasx.md (div<mode>3): New expander.
        (*div<mode>3): Rename.
        (sqrt<mode>2): New expander.
        (*sqrt<mode>2): Rename.
        (rsqrt<mode>2): New expander.
        * config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype.
        (loongarch_emit_swdivsf): Ditto.
        * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set
        recip_mask for -mrecip and -mrecip= options.
        (loongarch_emit_swrsqrtsf): New function.
        (loongarch_emit_swdivsf): Ditto.
        * config/loongarch/loongarch.h (RECIP_MASK_NONE, RECIP_MASK_DIV, RECIP_MASK_SQRT
        RECIP_MASK_RSQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, RECIP_MASK_VEC_RSQRT
        RECIP_MASK_ALL): New bitmasks.
        (TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_RSQRT, TARGET_RECIP_VEC_DIV
        TARGET_RECIP_VEC_SQRT, TARGET_RECIP_VEC_RSQRT): New tests.
        * config/loongarch/loongarch.md (sqrt<mode>2): New expander.
        (*sqrt<mode>2): Rename.
        (rsqrt<mode>2): New expander.
        * config/loongarch/loongarch.opt (recip_mask): New variable.
        (-mrecip, -mrecip): New options.
        * config/loongarch/lsx.md (div<mode>3): New expander.
        (*div<mode>3): Rename.
        (sqrt<mode>2): New expander.
        (*sqrt<mode>2): Rename.
        (rsqrt<mode>2): New expander.
        * config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate.
        * doc/invoke.texi (LoongArch Options): Document new options.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/divf.c: New test.
	* gcc.target/loongarch/recip-divf.c: New test.
	* gcc.target/loongarch/recip-sqrtf.c: New test.
	* gcc.target/loongarch/sqrtf.c: New test.
	* gcc.target/loongarch/vector/lasx/lasx-divf.c: New test.
	* gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test.
	* gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test.
	* gcc.target/loongarch/vector/lasx/lasx-recip.c: New test.
	* gcc.target/loongarch/vector/lasx/lasx-sqrtf.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-divf.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-recip.c: New test.
	* gcc.target/loongarch/vector/lsx/lsx-sqrtf.c: New test.

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 483b185b059..c3848d02fd3 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -23,6 +23,9 @@ config/loongarch/loongarch-opts.h
 HeaderInclude
 config/loongarch/loongarch-str.h
 
+TargetVariable
+unsigned int recip_mask = 0
+
 ; ISA related options
 ;; Base ISA
 Enum
@@ -194,6 +197,14 @@ mexplicit-relocs
 Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
 Use %reloc() assembly operators (for backward compatibility).
 
+mrecip
+Target RejectNegative Var(loongarch_recip)
+Generate approximate reciprocal divide and square root for better throughput.
+
+mrecip=
+Target RejectNegative Joined Var(loongarch_recip_name)
+Control generation of reciprocal estimates.
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e4310c4523d..f6f2feedbb3 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1194,7 +1194,25 @@ (define_insn "mul<mode>3"
   [(set_attr "type" "simd_fmul")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "div<mode>3"
+(define_expand "div<mode>3"
+  [(set (match_operand:FLASX 0 "register_operand")
+    (div:FLASX (match_operand:FLASX 1 "reg_or_vecotr_1_operand")
+	       (match_operand:FLASX 2 "register_operand")))]
+  "ISA_HAS_LASX"
+{
+  if (<MODE>mode == V8SFmode
+    && TARGET_RECIP_VEC_DIV
+    && optimize_insn_for_speed_p ()
+    && flag_finite_math_only && !flag_trapping_math
+    && flag_unsafe_math_optimizations)
+  {
+    loongarch_emit_swdivsf (operands[0], operands[1],
+	operands[2], V8SFmode);
+    DONE;
+  }
+})
+
+(define_insn "*div<mode>3"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
 	(div:FLASX (match_operand:FLASX 1 "register_operand" "f")
 		   (match_operand:FLASX 2 "register_operand" "f")))]
@@ -1223,7 +1241,23 @@ (define_insn "fnma<mode>4"
   [(set_attr "type" "simd_fmadd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:FLASX 0 "register_operand")
+    (sqrt:FLASX (match_operand:FLASX 1 "register_operand")))]
+  "ISA_HAS_LASX"
+{
+  if (<MODE>mode == V8SFmode
+      && TARGET_RECIP_VEC_SQRT
+      && flag_unsafe_math_optimizations
+      && optimize_insn_for_speed_p ()
+      && flag_finite_math_only && !flag_trapping_math)
+    {
+      loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 0);
+      DONE;
+    }
+})
+
+(define_insn "*sqrt<mode>2"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
 	(sqrt:FLASX (match_operand:FLASX 1 "register_operand" "f")))]
   "ISA_HAS_LASX"
@@ -1646,7 +1680,20 @@ (define_insn "lasx_xvfrecipe_<flasxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt<mode>2"
+(define_expand "rsqrt<mode>2"
+  [(set (match_operand:FLASX 0 "register_operand" "=f")
+    (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
+	     UNSPEC_LASX_XVFRSQRT))]
+  "ISA_HAS_LASX"
+ {
+   if (<MODE>mode == V8SFmode && TARGET_RECIP_VEC_RSQRT)
+     {
+       loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 1);
+       DONE;
+     }
+})
+
+(define_insn "*rsqrt<mode>2"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
     (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
 		  UNSPEC_LASX_XVFRSQRT))]
diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index cb8fc36b086..f2ff93b5e10 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -220,5 +220,7 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int);
 extern tree loongarch_build_builtin_va_list (void);
 
 extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool);
+extern void loongarch_emit_swrsqrtsf (rtx, rtx, machine_mode, bool);
+extern void loongarch_emit_swdivsf (rtx, rtx, rtx, machine_mode);
 extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 96a4b846f2d..2c06edcff92 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7547,6 +7547,71 @@ loongarch_option_override_internal (struct gcc_options *opts,
 
   /* Function to allocate machine-dependent function status.  */
   init_machine_status = &loongarch_init_machine_status;
+
+  /* -mrecip options.  */
+  static struct
+    {
+      const char *string;	    /* option name.  */
+      unsigned int mask;	    /* mask bits to set.  */
+    }
+  const recip_options[] = {
+	{ "all",       RECIP_MASK_ALL },
+	{ "none",      RECIP_MASK_NONE },
+	{ "div",       RECIP_MASK_DIV },
+	{ "sqrt",      RECIP_MASK_SQRT },
+	{ "rsqrt",     RECIP_MASK_RSQRT },
+	{ "vec-div",   RECIP_MASK_VEC_DIV },
+	{ "vec-sqrt",  RECIP_MASK_VEC_SQRT },
+	{ "vec-rsqrt", RECIP_MASK_VEC_RSQRT },
+  };
+
+  if (loongarch_recip_name)
+    {
+      char *p = ASTRDUP (loongarch_recip_name);
+      char *q;
+      unsigned int mask, i;
+      bool invert;
+
+      while ((q = strtok (p, ",")) != NULL)
+	{
+	  p = NULL;
+	  if (*q == '!')
+	    {
+	      invert = true;
+	      q++;
+	    }
+	  else
+	    invert = false;
+
+	  if (!strcmp (q, "default"))
+	    mask = RECIP_MASK_ALL;
+	  else
+	    {
+	      for (i = 0; i < ARRAY_SIZE (recip_options); i++)
+		if (!strcmp (q, recip_options[i].string))
+		  {
+		    mask = recip_options[i].mask;
+		    break;
+		  }
+
+	      if (i == ARRAY_SIZE (recip_options))
+		{
+		  error ("unknown option for %<-mrecip=%s%>", q);
+		  invert = false;
+		  mask = RECIP_MASK_NONE;
+		}
+	    }
+
+	  if (invert)
+	    recip_mask &= ~mask;
+	  else
+	    recip_mask |= mask;
+	}
+    }
+  if (loongarch_recip)
+    recip_mask |= RECIP_MASK_ALL;
+  if (!TARGET_FRECIPE)
+    recip_mask = RECIP_MASK_NONE;
 }
 
 
@@ -11470,6 +11535,126 @@ loongarch_build_signbit_mask (machine_mode mode, bool vect, bool invert)
   return force_reg (vec_mode, v);
 }
 
+/* Use rsqrte instruction and Newton-Rhapson to compute the approximation of
+   a single precision floating point [reciprocal] square root.  */
+
+void loongarch_emit_swrsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
+{
+  rtx x0, e0, e1, e2, mhalf, monehalf;
+  REAL_VALUE_TYPE r;
+  int unspec;
+
+  x0 = gen_reg_rtx (mode);
+  e0 = gen_reg_rtx (mode);
+  e1 = gen_reg_rtx (mode);
+  e2 = gen_reg_rtx (mode);
+
+  real_arithmetic (&r, ABS_EXPR, &dconsthalf, NULL);
+  mhalf = const_double_from_real_value (r, SFmode);
+
+  real_arithmetic (&r, PLUS_EXPR, &dconsthalf, &dconst1);
+  monehalf = const_double_from_real_value (r, SFmode);
+  unspec = UNSPEC_RSQRTE;
+
+  if (VECTOR_MODE_P (mode))
+    {
+      mhalf = loongarch_build_const_vector (mode, true, mhalf);
+      monehalf = loongarch_build_const_vector (mode, true, monehalf);
+      unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRSQRTE
+					  : UNSPEC_LSX_VFRSQRTE;
+    }
+
+  /* rsqrt(a) =  rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a))
+     sqrt(a)  =  a * rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a))  */
+
+  a = force_reg (mode, a);
+
+  /* x0 = rsqrt(a) estimate.  */
+  emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
+					      unspec)));
+
+  /* If (a == 0.0) Filter out infinity to prevent NaN for sqrt(0.0).  */
+  if (!recip)
+    {
+      rtx zero = force_reg (mode, CONST0_RTX (mode));
+
+      if (VECTOR_MODE_P (mode))
+	{
+	  machine_mode imode = related_int_vector_mode (mode).require ();
+	  rtx mask = gen_reg_rtx (imode);
+	  emit_insn (gen_rtx_SET (mask, gen_rtx_NE (imode, a, zero)));
+	  emit_insn (gen_rtx_SET (x0, gen_rtx_AND (mode, x0,
+						   gen_lowpart (mode, mask))));
+	}
+      else
+	{
+	  rtx target = emit_conditional_move (x0, { GT, a, zero, mode },
+					      x0, zero, mode, 0);
+	  if (target != x0)
+	    emit_move_insn (x0, target);
+	}
+    }
+
+  /* e0 = x0 * a  */
+  emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a)));
+  /* e1 = e0 * x0  */
+  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0)));
+
+  /* e2 = 1.5 - e1 * 0.5  */
+  mhalf = force_reg (mode, mhalf);
+  monehalf = force_reg (mode, monehalf);
+  emit_insn (gen_rtx_SET (e2, gen_rtx_FMA (mode,
+					   gen_rtx_NEG (mode, e1),
+							mhalf, monehalf)));
+
+  if (recip)
+    /* res = e2 * x0  */
+    emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, x0, e2)));
+  else
+    /* res = e2 * e0  */
+    emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e2, e0)));
+}
+
+/* Use recipe instruction and Newton-Rhapson to compute the approximation of
+   a single precision floating point divide.  */
+
+void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode)
+{
+  rtx x0, e0, mtwo;
+  REAL_VALUE_TYPE r;
+  x0 = gen_reg_rtx (mode);
+  e0 = gen_reg_rtx (mode);
+  int unspec = UNSPEC_RECIPE;
+
+  real_arithmetic (&r, ABS_EXPR, &dconst2, NULL);
+  mtwo = const_double_from_real_value (r, SFmode);
+
+  if (VECTOR_MODE_P (mode))
+    {
+      mtwo = loongarch_build_const_vector (mode, true, mtwo);
+      unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRECIPE
+					  : UNSPEC_LSX_VFRECIPE;
+    }
+
+  mtwo = force_reg (mode, mtwo);
+
+  /* a / b = a * recipe(b) * (2.0 - b * recipe(b))  */
+
+  /* x0 = 1./b estimate.  */
+  emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
+					      unspec)));
+  /* 2.0 - b * x0  */
+  emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode,
+					   gen_rtx_NEG (mode, b), x0, mtwo)));
+
+  /* x0 = a * x0  */
+  if (a != CONST1_RTX (mode))
+    emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0)));
+
+  /* res = e0 * x0  */
+  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+}
+
 static bool
 loongarch_builtin_support_vector_misalignment (machine_mode mode,
 					       const_tree type,
@@ -11665,6 +11850,9 @@ loongarch_asm_code_end (void)
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
   loongarch_autovectorize_vector_modes
 
+#undef TARGET_OPTAB_SUPPORTED_P
+#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS loongarch_init_builtins
 #undef TARGET_BUILTIN_DECL
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index fa8a3f5582f..f1350b6048f 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -702,6 +702,24 @@ enum reg_class
    && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT		\
        || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT))
 
+#define RECIP_MASK_NONE         0x00
+#define RECIP_MASK_DIV          0x01
+#define RECIP_MASK_SQRT         0x02
+#define RECIP_MASK_RSQRT        0x04
+#define RECIP_MASK_VEC_DIV      0x08
+#define RECIP_MASK_VEC_SQRT     0x10
+#define RECIP_MASK_VEC_RSQRT    0x20
+#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
+			| RECIP_MASK_RSQRT | RECIP_MASK_VEC_SQRT \
+			| RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_RSQRT)
+
+#define TARGET_RECIP_DIV        ((recip_mask & RECIP_MASK_DIV) != 0 || TARGET_uARCH_LA664)
+#define TARGET_RECIP_SQRT       ((recip_mask & RECIP_MASK_SQRT) != 0 || TARGET_uARCH_LA664)
+#define TARGET_RECIP_RSQRT      ((recip_mask & RECIP_MASK_RSQRT) != 0 || TARGET_uARCH_LA664)
+#define TARGET_RECIP_VEC_DIV    ((recip_mask & RECIP_MASK_VEC_DIV) != 0 || TARGET_uARCH_LA664)
+#define TARGET_RECIP_VEC_SQRT   ((recip_mask & RECIP_MASK_VEC_SQRT) != 0 || TARGET_uARCH_LA664)
+#define TARGET_RECIP_VEC_RSQRT  ((recip_mask & RECIP_MASK_VEC_RSQRT) != 0 || TARGET_uARCH_LA664)
+
 /* 1 if N is a possible register number for function argument passing.
    We have no FP argument registers when soft-float.  */
 
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index fd154b02e48..1a10b809e3c 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -893,9 +893,21 @@ (define_peephole
 ;; Float division and modulus.
 (define_expand "div<mode>3"
   [(set (match_operand:ANYF 0 "register_operand")
-	(div:ANYF (match_operand:ANYF 1 "reg_or_1_operand")
-		  (match_operand:ANYF 2 "register_operand")))]
-  "")
+    (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand")
+	      (match_operand:ANYF 2 "register_operand")))]
+  ""
+{
+  if (<MODE>mode == SFmode
+    && TARGET_RECIP_DIV
+    && optimize_insn_for_speed_p ()
+    && flag_finite_math_only && !flag_trapping_math
+    && flag_unsafe_math_optimizations)
+  {
+    loongarch_emit_swdivsf (operands[0], operands[1],
+	operands[2], SFmode);
+    DONE;
+  }
+})
 
 (define_insn "*div<mode>3"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
@@ -1126,7 +1138,23 @@ (define_insn "*fnma<mode>4"
 ;;
 ;;  ....................
 
-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:ANYF 0 "register_operand")
+    (sqrt:ANYF (match_operand:ANYF 1 "register_operand")))]
+  ""
+ {
+  if (<MODE>mode == SFmode
+      && TARGET_RECIP_SQRT
+      && flag_unsafe_math_optimizations
+      && !optimize_insn_for_size_p ()
+      && flag_finite_math_only && !flag_trapping_math)
+    {
+      loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 0);
+      DONE;
+    }
+ })
+
+(define_insn "*sqrt<mode>2"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
 	(sqrt:ANYF (match_operand:ANYF 1 "register_operand" "f")))]
   ""
@@ -1135,6 +1163,19 @@ (define_insn "sqrt<mode>2"
    (set_attr "mode" "<UNITMODE>")
    (set_attr "insn_count" "1")])
 
+(define_expand "rsqrt<mode>2"
+  [(set (match_operand:ANYF 0 "register_operand")
+    (unspec:ANYF [(match_operand:ANYF 1 "register_operand")]
+	   UNSPEC_RSQRT))]
+  "TARGET_HARD_FLOAT"
+{
+   if (<MODE>mode == SFmode && TARGET_RECIP_RSQRT)
+     {
+       loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 1);
+       DONE;
+     }
+})
+
 (define_insn "*rsqrt<mode>2"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
     (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index cdd59ae4fcf..61d25130ea9 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -31,6 +31,9 @@ config/loongarch/loongarch-opts.h
 HeaderInclude
 config/loongarch/loongarch-str.h
 
+TargetVariable
+unsigned int recip_mask = 0
+
 ; ISA related options
 ;; Base ISA
 Enum
@@ -202,6 +205,14 @@ mexplicit-relocs
 Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
 Use %reloc() assembly operators (for backward compatibility).
 
+mrecip
+Target RejectNegative Var(loongarch_recip)
+Generate approximate reciprocal divide and square root for better throughput.
+
+mrecip=
+Target RejectNegative Joined Var(loongarch_recip_name)
+Control generation of reciprocal estimates.
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 06402e3b353..55810041d39 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1083,7 +1083,25 @@ (define_insn "mul<mode>3"
   [(set_attr "type" "simd_fmul")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "div<mode>3"
+(define_expand "div<mode>3"
+  [(set (match_operand:FLSX 0 "register_operand")
+    (div:FLSX (match_operand:FLSX 1 "reg_or_vecotr_1_operand")
+	      (match_operand:FLSX 2 "register_operand")))]
+  "ISA_HAS_LSX"
+{
+  if (<MODE>mode == V4SFmode
+    && TARGET_RECIP_VEC_DIV
+    && optimize_insn_for_speed_p ()
+    && flag_finite_math_only && !flag_trapping_math
+    && flag_unsafe_math_optimizations)
+  {
+    loongarch_emit_swdivsf (operands[0], operands[1],
+	operands[2], V4SFmode);
+    DONE;
+  }
+})
+
+(define_insn "*div<mode>3"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
 	(div:FLSX (match_operand:FLSX 1 "register_operand" "f")
 		  (match_operand:FLSX 2 "register_operand" "f")))]
@@ -1112,7 +1130,23 @@ (define_insn "fnma<mode>4"
   [(set_attr "type" "simd_fmadd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:FLSX 0 "register_operand")
+    (sqrt:FLSX (match_operand:FLSX 1 "register_operand")))]
+  "ISA_HAS_LSX"
+{
+  if (<MODE>mode == V4SFmode
+      && TARGET_RECIP_VEC_SQRT
+      && flag_unsafe_math_optimizations
+      && optimize_insn_for_speed_p ()
+      && flag_finite_math_only && !flag_trapping_math)
+    {
+      loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 0);
+      DONE;
+    }
+})
+
+(define_insn "*sqrt<mode>2"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
 	(sqrt:FLSX (match_operand:FLSX 1 "register_operand" "f")))]
   "ISA_HAS_LSX"
@@ -1559,7 +1593,20 @@ (define_insn "lsx_vfrecipe_<flsxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "rsqrt<mode>2"
+(define_expand "rsqrt<mode>2"
+  [(set (match_operand:FLSX 0 "register_operand" "=f")
+    (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
+	     UNSPEC_LSX_VFRSQRT))]
+ "ISA_HAS_LSX"
+{
+ if (<MODE>mode == V4SFmode && TARGET_RECIP_VEC_RSQRT)
+   {
+     loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 1);
+     DONE;
+   }
+})
+
+(define_insn "*rsqrt<mode>2"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
     (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
 		 UNSPEC_LSX_VFRSQRT))]
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index f7796da10b2..9e9ce58cb53 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
   (ior (match_operand 0 "const_1_operand")
        (match_operand 0 "register_operand")))
 
+(define_predicate "reg_or_vecotr_1_operand"
+  (ior (match_operand 0 "const_vector_1_operand")
+       (match_operand 0 "register_operand")))
+
 ;; These are used in vec_merge, hence accept bitmask as const_int.
 (define_predicate "const_exp_2_operand"
   (and (match_code "const_int")
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6fe63b5f999..bb83edbcb8f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1051,6 +1051,7 @@ Objective-C and Objective-C++ Dialects}.
 -mexplicit-relocs=@var{style} -mexplicit-relocs -mno-explicit-relocs
 -mdirect-extern-access -mno-direct-extern-access
 -mcmodel=@var{code-model} -mrelax -mpass-mrelax-to-as}
+-mrecip  -mrecip=@var{opt}
 
 @emph{M32R/D Options}
 @gccoptlist{-m32r2  -m32rx  -m32r
@@ -26598,6 +26599,59 @@ detecting corresponding assembler support:
 This option is mostly useful for debugging, or interoperation with
 assemblers different from the build-time one.
 
+@opindex mrecip
+@item -mrecip
+This option enables use of the reciprocal estimate and reciprocal square
+root estimate instructions with additional Newton-Raphson steps to increase
+precision instead of doing a divide or square root and divide for
+floating-point arguments.
+These instructions are generated only when @option{-funsafe-math-optimizations}
+is enabled together with @option{-ffinite-math-only} and
+@option{-fno-trapping-math}.
+This option is off by default. Before you can use this option, you must sure the
+target CPU supports frecipe and frsqrte instructions.
+Note that while the throughput of the sequence is higher than the throughput of
+the non-reciprocal instruction, the precision of the sequence can be decreased
+by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
+
+@opindex mrecip=opt
+@item -mrecip=@var{opt}
+This option controls which reciprocal estimate instructions
+may be used.  @var{opt} is a comma-separated list of options, which may
+be preceded by a @samp{!} to invert the option:
+
+@table @samp
+@item all
+Enable all estimate instructions.
+
+@item default
+Enable the default instructions, equivalent to @option{-mrecip}.
+
+@item none
+Disable all estimate instructions, equivalent to @option{-mno-recip}.
+
+@item div
+Enable the approximation for scalar division.
+
+@item vec-div
+Enable the approximation for vectorized division.
+
+@item sqrt
+Enable the approximation for scalar square root.
+
+@item vec-sqrt
+Enable the approximation for vectorized square root.
+
+@item rsqrt
+Enable the approximation for scalar reciprocal square root.
+
+@item vec-rsqrt
+Enable the approximation for vectorized reciprocal square root.
+@end table
+
+So, for example, @option{-mrecip=all,!sqrt} enables
+all of the reciprocal approximations, except for scalar square root.
+
 @item loongarch-vect-unroll-limit
 The vectorizer will use available tuning information to determine whether it
 would be beneficial to unroll the main vectorized loop and by how much.  This
diff --git a/gcc/testsuite/gcc.target/loongarch/divf.c b/gcc/testsuite/gcc.target/loongarch/divf.c
new file mode 100644
index 00000000000..6c831817c9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/divf.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe -fno-unsafe-math-optimizations" } */
+/* { dg-final { scan-assembler "fdiv.s" } } */
+/* { dg-final { scan-assembler-not "frecipe.s" } } */
+
+float
+foo(float a, float b)
+{
+  return a / b;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/recip-divf.c b/gcc/testsuite/gcc.target/loongarch/recip-divf.c
new file mode 100644
index 00000000000..db5e3e48888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/recip-divf.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe" } */
+/* { dg-final { scan-assembler "frecipe.s" } } */
+
+float
+foo(float a, float b)
+{
+  return a / b;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
new file mode 100644
index 00000000000..7f45db6cdea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe" } */
+/* { dg-final { scan-assembler-times "frsqrte.s" 3 } } */
+
+extern float sqrtf (float);
+
+float
+foo1 (float a, float b)
+{
+  return a/sqrtf(b);
+}
+
+float
+foo2 (float a, float b)
+{
+  return sqrtf(a/b);
+}
+
+float
+foo3 (float a)
+{
+  return sqrtf(a);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/sqrtf.c b/gcc/testsuite/gcc.target/loongarch/sqrtf.c
new file mode 100644
index 00000000000..c2720faac7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/sqrtf.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe -fno-unsafe-math-optimizations" } */
+/* { dg-final { scan-assembler-times "fsqrt.s" 3 } } */
+/* { dg-final { scan-assembler-not "frsqrte.s" } } */
+
+extern float sqrtf (float);
+
+float
+foo1 (float a, float b)
+{
+  return a/sqrtf(b);
+}
+
+float
+foo2 (float a, float b)
+{
+  return sqrtf(a/b);
+}
+
+float
+foo3 (float a)
+{
+  return sqrtf(a);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
new file mode 100644
index 00000000000..748a82200d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mrecip -mlasx -mfrecipe -fno-unsafe-math-optimizations" } */
+/* { dg-final { scan-assembler "xvfdiv.s" } } */
+/* { dg-final { scan-assembler-not "xvfrecipe.s" } } */
+
+float a[8],b[8],c[8];
+
+void 
+foo ()
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = a[i] / b[i];
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
new file mode 100644
index 00000000000..6532756f07d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mlasx -mfrecipe" } */
+/* { dg-final { scan-assembler "xvfrecipe.s" } } */
+
+float a[8],b[8],c[8];
+
+void
+foo ()
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = a[i] / b[i];
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
new file mode 100644
index 00000000000..a623dff8f27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mlasx -mfrecipe" } */
+/* { dg-final { scan-assembler-times "xvfrsqrte.s" 3 } } */
+
+float a[8], b[8], c[8];
+
+extern float sqrtf (float);
+
+void
+foo1 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = a[i] / sqrtf (b[i]);
+}
+
+void
+foo2 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = sqrtf (a[i] / b[i]);
+}
+
+void
+foo3 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = sqrtf (a[i]);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
new file mode 100644
index 00000000000..083c868406b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx -fno-vect-cost-model" } */
+/* { dg-final { scan-assembler "xvfrecip.s" } } */
+/* { dg-final { scan-assembler "xvfrecip.d" } } */
+/* { dg-final { scan-assembler-not "xvfdiv.s" } } */
+/* { dg-final { scan-assembler-not "xvfdiv.d" } } */
+
+float a[8], b[8];
+
+void 
+foo1(void)
+{
+  for (int i = 0; i < 8; i++)
+    a[i] = 1 / (b[i]);
+}
+
+double da[4], db[4];
+
+void
+foo2(void)
+{
+  for (int i = 0; i < 4; i++)
+    da[i] = 1 / (db[i]);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
new file mode 100644
index 00000000000..a005a38865d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fno-unsafe-math-optimizations  -mrecip -mlasx -mfrecipe" } */
+/* { dg-final { scan-assembler-times "xvfsqrt.s" 3 } } */
+/* { dg-final { scan-assembler-not "xvfrsqrte.s" } } */
+
+float a[8], b[8], c[8];
+
+extern float sqrtf (float);
+
+void
+foo1 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = a[i] / sqrtf (b[i]);
+}
+
+void
+foo2 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = sqrtf (a[i] / b[i]);
+}
+
+void
+foo3 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = sqrtf (a[i]);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
new file mode 100644
index 00000000000..1219b1ef842
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe -fno-unsafe-math-optimizations" } */
+/* { dg-final { scan-assembler "vfdiv.s" } } */
+/* { dg-final { scan-assembler-not "vfrecipe.s" } } */
+
+float a[4],b[4],c[4];
+
+void
+foo ()
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = a[i] / b[i];
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
new file mode 100644
index 00000000000..edbe8d9098f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe" } */
+/* { dg-final { scan-assembler "vfrecipe.s" } } */
+
+float a[4],b[4],c[4];
+
+void
+foo ()
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = a[i] / b[i];
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
new file mode 100644
index 00000000000..d356f915eb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe" } */
+/* { dg-final { scan-assembler-times "vfrsqrte.s" 3 } } */
+
+float a[4], b[4], c[4];
+
+extern float sqrtf (float);
+
+void
+foo1 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = a[i] / sqrtf (b[i]);
+}
+
+void
+foo2 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = sqrtf (a[i] / b[i]);
+}
+
+void
+foo3 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = sqrtf (a[i]);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
new file mode 100644
index 00000000000..c4d6af4db93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { scan-assembler "vfrecip.s" } } */
+/* { dg-final { scan-assembler "vfrecip.d" } } */
+/* { dg-final { scan-assembler-not "vfdiv.s" } } */
+/* { dg-final { scan-assembler-not "vfdiv.d" } } */
+
+float a[4], b[4];
+
+void
+foo1(void)
+{
+  for (int i = 0; i < 4; i++)
+    a[i] = 1 / (b[i]);
+}
+
+double da[2], db[2];
+
+void
+foo2(void)
+{
+  for (int i = 0; i < 2; i++)
+    da[i] = 1 / (db[i]);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c
new file mode 100644
index 00000000000..3ff6570a67a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe -fno-unsafe-math-optimizations" } */
+/* { dg-final { scan-assembler-times "vfsqrt.s" 3 } } */
+/* { dg-final { scan-assembler-not "vfrsqrte.s" } } */
+
+float a[4], b[4], c[4];
+
+extern float sqrtf (float);
+
+void
+foo1 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = a[i] / sqrtf (b[i]);
+}
+
+void
+foo2 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = sqrtf (a[i] / b[i]);
+}
+
+void
+foo3 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = sqrtf (a[i]);
+}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 5/5] LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf when -mrecip is enabled.
  2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
                   ` (3 preceding siblings ...)
  2023-12-06  7:04 ` [PATCH v3 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math Jiahao Xu
@ 2023-12-06  7:04 ` Jiahao Xu
  2023-12-06  8:25 ` [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
  2023-12-08  8:31 ` [pushed][PATCH " chenglulu
  6 siblings, 0 replies; 8+ messages in thread
From: Jiahao Xu @ 2023-12-06  7:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number
of generated instructions is close to or exceeds the maximum issue instructions per cycle of the
LoongArch, so vectorized loop unrolling is not performed on them.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_vector_costs::determine_suggested_unroll_factor):
	If m_has_recip is true, uf return 1.
	(loongarch_vector_costs::add_stmt_cost): Detect the use of approximate instruction sequence.

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 2c06edcff92..0ca60e15ced 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3974,7 +3974,9 @@ protected:
   /* Reduction factor for suggesting unroll factor.  */
   unsigned m_reduc_factor = 0;
   /* True if the loop contains an average operation. */
-  bool m_has_avg =false;
+  bool m_has_avg = false;
+  /* True if the loop uses approximation instruction sequence.  */
+  bool m_has_recip = false;
 };
 
 /* Implement TARGET_VECTORIZE_CREATE_COSTS.  */
@@ -4021,7 +4023,7 @@ loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
 
-  if (m_has_avg)
+  if (m_has_avg || m_has_recip)
     return 1;
 
   /* Don't unroll if it's specified explicitly not to be unrolled.  */
@@ -4081,6 +4083,36 @@ loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 	}
     }
 
+  combined_fn cfn;
+  if (kind == vector_stmt
+      && stmt_info
+      && stmt_info->stmt)
+    {
+      /* Detect the use of approximate instruction sequence.  */
+      if ((TARGET_RECIP_VEC_SQRT || TARGET_RECIP_VEC_RSQRT)
+	  && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST)
+	switch (cfn)
+	  {
+	  case CFN_BUILT_IN_SQRTF:
+	    m_has_recip = true;
+	  default:
+	    break;
+	  }
+      else if (TARGET_RECIP_VEC_DIV
+	       && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
+	{
+	  machine_mode mode = TYPE_MODE (vectype);
+	  switch (gimple_assign_rhs_code (stmt_info->stmt))
+	    {
+	    case RDIV_EXPR:
+	      if (GET_MODE_INNER (mode) == SFmode)
+		m_has_recip = true;
+	    default:
+	      break;
+	    }
+	}
+    }
+
   return retval;
 }
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.
  2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
                   ` (4 preceding siblings ...)
  2023-12-06  7:04 ` [PATCH v3 5/5] LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf when -mrecip is enabled Jiahao Xu
@ 2023-12-06  8:25 ` Jiahao Xu
  2023-12-08  8:31 ` [pushed][PATCH " chenglulu
  6 siblings, 0 replies; 8+ messages in thread
From: Jiahao Xu @ 2023-12-06  8:25 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua


在 2023/12/6 下午3:04, Jiahao Xu 写道:
> LoongArch V1.1 adds support for approximate instructions, which are utilized along with additional
> Newton-Raphson steps implement single precision floating-point division, square root and reciprocal
> square root operations for better throughput.
>
> The patches are modifications made based on the patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html

The changes in version 3 compared to the previous version include:

*Enable -mfrecipe when using -march=la664.
*Implement builtin functions for frecipe and frsqrte instructions and 
introduce a new builtin macro "__loongarch_frecipe".
*Add corresponding test cases for the implemented builtin functions.
*Update the usage for the new intrinsic functions and builtin functions 
in extend.texi.
*Add reverse tests for scenarios where the -mrecip option is not enabled.
> Jiahao Xu (5):
>    LoongArch: Add support for LoongArch V1.1 approximate instructions.
>    LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
>      instructions.
>    LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
>    LoongArch: New options -mrecip and -mrecip= with ffast-math.
>    LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf
>      when -mrecip is enabled.
>
>   gcc/config/loongarch/genopts/isa-evolution.in |   1 +
>   gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
>   gcc/config/loongarch/larchintrin.h            |  38 +++
>   gcc/config/loongarch/lasx.md                  |  89 ++++++-
>   gcc/config/loongarch/lasxintrin.h             |  34 +++
>   gcc/config/loongarch/loongarch-builtins.cc    |  66 +++++
>   gcc/config/loongarch/loongarch-c.cc           |   3 +
>   gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
>   gcc/config/loongarch/loongarch-def.cc         |   3 +-
>   gcc/config/loongarch/loongarch-protos.h       |   2 +
>   gcc/config/loongarch/loongarch-str.h          |   1 +
>   gcc/config/loongarch/loongarch.cc             | 252 +++++++++++++++++-
>   gcc/config/loongarch/loongarch.h              |  18 ++
>   gcc/config/loongarch/loongarch.md             | 104 ++++++--
>   gcc/config/loongarch/loongarch.opt            |  15 ++
>   gcc/config/loongarch/lsx.md                   |  89 ++++++-
>   gcc/config/loongarch/lsxintrin.h              |  34 +++
>   gcc/config/loongarch/predicates.md            |   8 +
>   gcc/doc/extend.texi                           |  35 +++
>   gcc/doc/invoke.texi                           |  54 ++++
>   gcc/testsuite/gcc.target/loongarch/divf.c     |  10 +
>   .../loongarch/larch-frecipe-builtin.c         |  28 ++
>   .../gcc.target/loongarch/recip-divf.c         |   9 +
>   .../gcc.target/loongarch/recip-sqrtf.c        |  23 ++
>   gcc/testsuite/gcc.target/loongarch/sqrtf.c    |  24 ++
>   .../loongarch/vector/lasx/lasx-divf.c         |  13 +
>   .../vector/lasx/lasx-frecipe-builtin.c        |  30 +++
>   .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
>   .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
>   .../loongarch/vector/lasx/lasx-recip.c        |  24 ++
>   .../loongarch/vector/lasx/lasx-rsqrt.c        |  26 ++
>   .../loongarch/vector/lasx/lasx-sqrtf.c        |  29 ++
>   .../loongarch/vector/lsx/lsx-divf.c           |  13 +
>   .../vector/lsx/lsx-frecipe-builtin.c          |  30 +++
>   .../loongarch/vector/lsx/lsx-recip-divf.c     |  12 +
>   .../loongarch/vector/lsx/lsx-recip-sqrtf.c    |  28 ++
>   .../loongarch/vector/lsx/lsx-recip.c          |  24 ++
>   .../loongarch/vector/lsx/lsx-rsqrt.c          |  26 ++
>   .../loongarch/vector/lsx/lsx-sqrtf.c          |  29 ++
>   39 files changed, 1234 insertions(+), 42 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pushed][PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.
  2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
                   ` (5 preceding siblings ...)
  2023-12-06  8:25 ` [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
@ 2023-12-08  8:31 ` chenglulu
  6 siblings, 0 replies; 8+ messages in thread
From: chenglulu @ 2023-12-08  8:31 UTC (permalink / raw)
  To: Jiahao Xu, gcc-patches; +Cc: xry111, i, xuchenghua

Pushed to r14-6311...r14-6315.

在 2023/12/6 下午3:04, Jiahao Xu 写道:
> LoongArch V1.1 adds support for approximate instructions, which are utilized along with additional
> Newton-Raphson steps implement single precision floating-point division, square root and reciprocal
> square root operations for better throughput.
>
> The patches are modifications made based on the patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html
>
> Jiahao Xu (5):
>    LoongArch: Add support for LoongArch V1.1 approximate instructions.
>    LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
>      instructions.
>    LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
>    LoongArch: New options -mrecip and -mrecip= with ffast-math.
>    LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf
>      when -mrecip is enabled.
>
>   gcc/config/loongarch/genopts/isa-evolution.in |   1 +
>   gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
>   gcc/config/loongarch/larchintrin.h            |  38 +++
>   gcc/config/loongarch/lasx.md                  |  89 ++++++-
>   gcc/config/loongarch/lasxintrin.h             |  34 +++
>   gcc/config/loongarch/loongarch-builtins.cc    |  66 +++++
>   gcc/config/loongarch/loongarch-c.cc           |   3 +
>   gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
>   gcc/config/loongarch/loongarch-def.cc         |   3 +-
>   gcc/config/loongarch/loongarch-protos.h       |   2 +
>   gcc/config/loongarch/loongarch-str.h          |   1 +
>   gcc/config/loongarch/loongarch.cc             | 252 +++++++++++++++++-
>   gcc/config/loongarch/loongarch.h              |  18 ++
>   gcc/config/loongarch/loongarch.md             | 104 ++++++--
>   gcc/config/loongarch/loongarch.opt            |  15 ++
>   gcc/config/loongarch/lsx.md                   |  89 ++++++-
>   gcc/config/loongarch/lsxintrin.h              |  34 +++
>   gcc/config/loongarch/predicates.md            |   8 +
>   gcc/doc/extend.texi                           |  35 +++
>   gcc/doc/invoke.texi                           |  54 ++++
>   gcc/testsuite/gcc.target/loongarch/divf.c     |  10 +
>   .../loongarch/larch-frecipe-builtin.c         |  28 ++
>   .../gcc.target/loongarch/recip-divf.c         |   9 +
>   .../gcc.target/loongarch/recip-sqrtf.c        |  23 ++
>   gcc/testsuite/gcc.target/loongarch/sqrtf.c    |  24 ++
>   .../loongarch/vector/lasx/lasx-divf.c         |  13 +
>   .../vector/lasx/lasx-frecipe-builtin.c        |  30 +++
>   .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
>   .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
>   .../loongarch/vector/lasx/lasx-recip.c        |  24 ++
>   .../loongarch/vector/lasx/lasx-rsqrt.c        |  26 ++
>   .../loongarch/vector/lasx/lasx-sqrtf.c        |  29 ++
>   .../loongarch/vector/lsx/lsx-divf.c           |  13 +
>   .../vector/lsx/lsx-frecipe-builtin.c          |  30 +++
>   .../loongarch/vector/lsx/lsx-recip-divf.c     |  12 +
>   .../loongarch/vector/lsx/lsx-recip-sqrtf.c    |  28 ++
>   .../loongarch/vector/lsx/lsx-recip.c          |  24 ++
>   .../loongarch/vector/lsx/lsx-rsqrt.c          |  26 ++
>   .../loongarch/vector/lsx/lsx-sqrtf.c          |  29 ++
>   39 files changed, 1234 insertions(+), 42 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-12-08  8:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-06  7:04 [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
2023-12-06  7:04 ` [PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions Jiahao Xu
2023-12-06  7:04 ` [PATCH v3 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions Jiahao Xu
2023-12-06  7:04 ` [PATCH v3 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions Jiahao Xu
2023-12-06  7:04 ` [PATCH v3 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math Jiahao Xu
2023-12-06  7:04 ` [PATCH v3 5/5] LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf when -mrecip is enabled Jiahao Xu
2023-12-06  8:25 ` [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations Jiahao Xu
2023-12-08  8:31 ` [pushed][PATCH " chenglulu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).