[PATCH v3 0/5] LoongArch: SIMD fixes and optimizations

public inbox for gcc-patches@gcc.gnu.org

* [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations
@ 2023-11-20  0:47 Xi Ruoyao
  2023-11-20  0:47 ` [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578] Xi Ruoyao
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-20  0:47 UTC
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

The [1/5] patch is the PR112578 fix posted at
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
It has been changed to remove the nearbyint pattern, because nearbyint
must not raise FE_INEXACT even with -ffp-int-builtin-inexact.
As the other patches depend on the simd.md file introduced by this
patch, it is sent as the first patch of the series.

As many LASX instructions differ from the corresponding LSX
instructions only in operand length, create a simd.md file to contain the
RTX templates that LSX and LASX can share.  This makes the code cleaner
and easier to maintain.

The [2/5] and [3/5] patches make the vector multiply highpart and rotate
shift operations available for GNU vectors and auto-vectorization.

The [4/5] patch is a simple code cleanup, with no functional change.

The [5/5] patch uses LSX for FP scalar rounding operations if LSX is
available and -ffp-int-builtin-inexact is in effect.  We do this because
the base FP ISA does not have such instructions.  Using LSX is overkill,
but still much faster than calling libc functions.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (5):
  LoongArch: Fix usage of LSX and LASX frint/ftint instructions
    [PR112578]
  LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
    instructions
  LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
    shift
  LoongArch: Remove lrint_allow_inexact
  LoongArch: Use LSX for scalar FP rounding with explicit rounding mode

 gcc/config/loongarch/lasx.md                  | 283 -----------------
 gcc/config/loongarch/loongarch-builtins.cc    |  52 ++--
 gcc/config/loongarch/loongarch.md             |  12 +-
 gcc/config/loongarch/lsx.md                   | 293 ------------------
 gcc/config/loongarch/simd.md                  | 268 ++++++++++++++++
 .../loongarch/vect-frint-no-inexact.c         |  48 +++
 .../loongarch/vect-frint-scalar-no-inexact.c  |  23 ++
 .../gcc.target/loongarch/vect-frint-scalar.c  |  43 +++
 .../gcc.target/loongarch/vect-frint.c         |  85 +++++
 .../loongarch/vect-ftint-no-inexact.c         |  44 +++
 .../gcc.target/loongarch/vect-ftint.c         |  83 +++++
 gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
 .../gcc.target/loongarch/vect-rotr.c          |  36 +++
 13 files changed, 701 insertions(+), 605 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

-- 
2.42.1


* [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
@ 2023-11-20  0:47 ` Xi Ruoyao
  2023-11-23  6:35   ` chenglulu
  2023-11-20  0:47 ` [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions Xi Ruoyao
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-20  0:47 UTC
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

The usage of LSX and LASX frint/ftint instructions had some problems:

1. These instructions raise FE_INEXACT, which is not allowed with
   -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
   (the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without an explicit rounding mode was used
   for roundM2.  This is incorrect because roundM2 is defined as
   "rounding operand 1 to the *nearest* integer, rounding away from zero
   in the event of a tie".  We actually don't have such an instruction.
   Our frintrne instruction is roundevenM2 (unfortunately, this is not
   documented).
3. These define_insn's are written in a way that makes them hard to
   modify.

So I removed these patterns from lsx.md and lasx.md, created a "simd.md"
file, and re-added them together with the corresponding expanders there.
The advantage of the simd.md file is that we don't need to duplicate the
RTL template (once in lsx.md and once in lasx.md).

gcc/ChangeLog:

	PR target/112578
	* config/loongarch/lsx.md (UNSPEC_LSX_VFTINT_S,
	UNSPEC_LSX_VFTINTRNE, UNSPEC_LSX_VFTINTRP,
	UNSPEC_LSX_VFTINTRM, UNSPEC_LSX_VFRINTRNE_S,
	UNSPEC_LSX_VFRINTRNE_D, UNSPEC_LSX_VFRINTRZ_S,
	UNSPEC_LSX_VFRINTRZ_D, UNSPEC_LSX_VFRINTRP_S,
	UNSPEC_LSX_VFRINTRP_D, UNSPEC_LSX_VFRINTRM_S,
	UNSPEC_LSX_VFRINTRM_D): Remove.
	(ILSX, FLSX): Move into ...
	(VIMODE): Move into ...
	(FRINT_S, FRINT_D): Remove.
	(frint_pattern_s, frint_pattern_d, frint_suffix): Remove.
	(lsx_vfrint_<flsxfmt>, lsx_vftint_s_<ilsxfmt>_<flsxfmt>,
	lsx_vftintrne_w_s, lsx_vftintrne_l_d, lsx_vftintrp_w_s,
	lsx_vftintrp_l_d, lsx_vftintrm_w_s, lsx_vftintrm_l_d,
	lsx_vfrintrne_s, lsx_vfrintrne_d, lsx_vfrintrz_s,
	lsx_vfrintrz_d, lsx_vfrintrp_s, lsx_vfrintrp_d,
	lsx_vfrintrm_s, lsx_vfrintrm_d,
	<FRINT_S:frint_pattern_s>v4sf2,
	<FRINT_D:frint_pattern_d>v2df2, round<mode>2,
	fix_trunc<mode>2): Remove.
	* config/loongarch/lasx.md: Likewise.
	* config/loongarch/simd.md: New file.
	(ILSX, ILASX, FLSX, FLASX, VIMODE): ... here.
	(IVEC, FVEC): New mode iterators.
	(VIMODE): ... here.  Extend it to work for all LSX/LASX vector
	modes.
	(x, wu, simd_isa, WVEC, vimode, simdfmt, simdifmt_for_f,
	elebits): New mode attributes.
	(UNSPEC_SIMD_FRINTRP, UNSPEC_SIMD_FRINTRZ, UNSPEC_SIMD_FRINT,
	UNSPEC_SIMD_FRINTRM, UNSPEC_SIMD_FRINTRNE): New unspecs.
	(SIMD_FRINT): New int iterator.
	(simd_frint_rounding, simd_frint_pattern): New int attributes.
	(<simd_isa>_<x>vfrint<simd_frint_rounding>_<simdfmt>): New
	define_insn template for frint instructions.
	(<simd_isa>_<x>vftint<simd_frint_rounding>_<simdifmt_for_f>_<simdfmt>):
	Likewise, but for ftint instructions.
	(<simd_frint_pattern><mode>2): New define_expand with
	flag_fp_int_builtin_inexact checked.
	(l<simd_frint_pattern><mode><vimode>2): Likewise.
	(ftrunc<mode>2): New define_expand.  It does not require
	flag_fp_int_builtin_inexact.
	(fix_trunc<mode><vimode>2): New define_insn_and_split.  It does
	not require flag_fp_int_builtin_inexact.
	(include): Add lsx.md and lasx.md.
	* config/loongarch/loongarch.md (include): Include simd.md,
	instead of including lsx.md and lasx.md directly.
	* config/loongarch/loongarch-builtins.cc
	(CODE_FOR_lsx_vftint_w_s, CODE_FOR_lsx_vftint_l_d,
	CODE_FOR_lasx_xvftint_w_s, CODE_FOR_lasx_xvftint_l_d):
	Remove.

gcc/testsuite/ChangeLog:

	PR target/112578
	* gcc.target/loongarch/vect-frint.c: New test.
	* gcc.target/loongarch/vect-frint-no-inexact.c: New test.
	* gcc.target/loongarch/vect-ftint.c: New test.
	* gcc.target/loongarch/vect-ftint-no-inexact.c: New test.
---
 gcc/config/loongarch/lasx.md                  | 239 -----------------
 gcc/config/loongarch/loongarch-builtins.cc    |   4 -
 gcc/config/loongarch/loongarch.md             |   7 +-
 gcc/config/loongarch/lsx.md                   | 243 ------------------
 gcc/config/loongarch/simd.md                  | 194 ++++++++++++++
 .../loongarch/vect-frint-no-inexact.c         |  48 ++++
 .../gcc.target/loongarch/vect-frint.c         |  85 ++++++
 .../loongarch/vect-ftint-no-inexact.c         |  44 ++++
 .../gcc.target/loongarch/vect-ftint.c         |  83 ++++++
 9 files changed, 456 insertions(+), 491 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 2e11f061202..d4a56c307c4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -53,7 +53,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVFCMP_SULT
   UNSPEC_LASX_XVFCMP_SUN
   UNSPEC_LASX_XVFCMP_SUNE
-  UNSPEC_LASX_XVFTINT_S
   UNSPEC_LASX_XVFTINT_U
   UNSPEC_LASX_XVCLO
   UNSPEC_LASX_XVSAT_S
@@ -92,12 +91,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVEXTRINS
   UNSPEC_LASX_XVMSKLTZ
   UNSPEC_LASX_XVSIGNCOV
-  UNSPEC_LASX_XVFTINTRNE_W_S
-  UNSPEC_LASX_XVFTINTRNE_L_D
-  UNSPEC_LASX_XVFTINTRP_W_S
-  UNSPEC_LASX_XVFTINTRP_L_D
-  UNSPEC_LASX_XVFTINTRM_W_S
-  UNSPEC_LASX_XVFTINTRM_L_D
   UNSPEC_LASX_XVFTINT_W_D
   UNSPEC_LASX_XVFFINT_S_L
   UNSPEC_LASX_XVFTINTRZ_W_D
@@ -116,14 +109,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVFTINTRML_L_S
   UNSPEC_LASX_XVFTINTRNEL_L_S
   UNSPEC_LASX_XVFTINTRNEH_L_S
-  UNSPEC_LASX_XVFRINTRNE_S
-  UNSPEC_LASX_XVFRINTRNE_D
-  UNSPEC_LASX_XVFRINTRZ_S
-  UNSPEC_LASX_XVFRINTRZ_D
-  UNSPEC_LASX_XVFRINTRP_S
-  UNSPEC_LASX_XVFRINTRP_D
-  UNSPEC_LASX_XVFRINTRM_S
-  UNSPEC_LASX_XVFRINTRM_D
   UNSPEC_LASX_XVREPLVE0_Q
   UNSPEC_LASX_XVPERM_W
   UNSPEC_LASX_XVPERMI_Q
@@ -206,9 +191,6 @@ (define_mode_iterator LASX_WD [V4DI V4DF V8SI V8SF])
 ;; Only used for copy256_{u,s}.w.
 (define_mode_iterator LASX_W    [V8SI V8SF])
 
-;; Only integer modes in LASX.
-(define_mode_iterator ILASX [V4DI V8SI V16HI V32QI])
-
 ;; As ILASX but excludes V32QI.
 (define_mode_iterator ILASX_DWH [V4DI V8SI V16HI])
 
@@ -224,9 +206,6 @@ (define_mode_iterator ILASX_DW  [V4DI V8SI])
 ;; Only integer modes smaller than a word.
 (define_mode_iterator ILASX_HB  [V16HI V32QI])
 
-;; Only floating-point modes in LASX.
-(define_mode_iterator FLASX  [V4DF V8SF])
-
 ;; Only used for immediate set shuffle elements instruction.
 (define_mode_iterator LASX_WHB_W [V8SI V16HI V32QI V8SF])
 
@@ -500,37 +479,6 @@ (define_mode_attr lasxfmt_wd
    (V16HI "w")
    (V32QI "w")])
 
-(define_int_iterator FRINT256_S [UNSPEC_LASX_XVFRINTRP_S
-			       UNSPEC_LASX_XVFRINTRZ_S
-			       UNSPEC_LASX_XVFRINT
-			       UNSPEC_LASX_XVFRINTRM_S])
-
-(define_int_iterator FRINT256_D [UNSPEC_LASX_XVFRINTRP_D
-			       UNSPEC_LASX_XVFRINTRZ_D
-			       UNSPEC_LASX_XVFRINT
-			       UNSPEC_LASX_XVFRINTRM_D])
-
-(define_int_attr frint256_pattern_s
-  [(UNSPEC_LASX_XVFRINTRP_S  "ceil")
-   (UNSPEC_LASX_XVFRINTRZ_S  "btrunc")
-   (UNSPEC_LASX_XVFRINT	     "rint")
-   (UNSPEC_LASX_XVFRINTRM_S  "floor")])
-
-(define_int_attr frint256_pattern_d
-  [(UNSPEC_LASX_XVFRINTRP_D  "ceil")
-   (UNSPEC_LASX_XVFRINTRZ_D  "btrunc")
-   (UNSPEC_LASX_XVFRINT	     "rint")
-   (UNSPEC_LASX_XVFRINTRM_D  "floor")])
-
-(define_int_attr frint256_suffix
-  [(UNSPEC_LASX_XVFRINTRP_S  "rp")
-   (UNSPEC_LASX_XVFRINTRP_D  "rp")
-   (UNSPEC_LASX_XVFRINTRZ_S  "rz")
-   (UNSPEC_LASX_XVFRINTRZ_D  "rz")
-   (UNSPEC_LASX_XVFRINT	     "")
-   (UNSPEC_LASX_XVFRINTRM_S  "rm")
-   (UNSPEC_LASX_XVFRINTRM_D  "rm")])
-
 (define_expand "vec_init<mode><unitmode>"
   [(match_operand:LASX 0 "register_operand")
    (match_operand:LASX 1 "")]
@@ -1688,15 +1636,6 @@ (define_insn "lasx_xvfrecip_<flasxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lasx_xvfrint_<flasxfmt>"
-  [(set (match_operand:FLASX 0 "register_operand" "=f")
-	(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
-		      UNSPEC_LASX_XVFRINT))]
-  "ISA_HAS_LASX"
-  "xvfrint.<flasxfmt>\t%u0,%u1"
-  [(set_attr "type" "simd_fcvt")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lasx_xvfrsqrt_<flasxfmt>"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
 	(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
@@ -1706,16 +1645,6 @@ (define_insn "lasx_xvfrsqrt_<flasxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lasx_xvftint_s_<ilasxfmt>_<flasxfmt>"
-  [(set (match_operand:<VIMODE256> 0 "register_operand" "=f")
-	(unspec:<VIMODE256> [(match_operand:FLASX 1 "register_operand" "f")]
-			    UNSPEC_LASX_XVFTINT_S))]
-  "ISA_HAS_LASX"
-  "xvftint.<ilasxfmt>.<flasxfmt>\t%u0,%u1"
-  [(set_attr "type" "simd_fcvt")
-   (set_attr "cnv_mode" "<FINTCNV256_2>")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lasx_xvftint_u_<ilasxfmt_u>_<flasxfmt>"
   [(set (match_operand:<VIMODE256> 0 "register_operand" "=f")
 	(unspec:<VIMODE256> [(match_operand:FLASX 1 "register_operand" "f")]
@@ -1726,18 +1655,6 @@ (define_insn "lasx_xvftint_u_<ilasxfmt_u>_<flasxfmt>"
    (set_attr "cnv_mode" "<FINTCNV256_2>")
    (set_attr "mode" "<MODE>")])
 
-
-
-(define_insn "fix_trunc<FLASX:mode><mode256_i>2"
-  [(set (match_operand:<VIMODE256> 0 "register_operand" "=f")
-	(fix:<VIMODE256> (match_operand:FLASX 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvftintrz.<ilasxfmt>.<flasxfmt>\t%u0,%u1"
-  [(set_attr "type" "simd_fcvt")
-   (set_attr "cnv_mode" "<FINTCNV256_2>")
-   (set_attr "mode" "<MODE>")])
-
-
 (define_insn "fixuns_trunc<FLASX:mode><mode256_i>2"
   [(set (match_operand:<VIMODE256> 0 "register_operand" "=f")
 	(unsigned_fix:<VIMODE256> (match_operand:FLASX 1 "register_operand" "f")))]
@@ -3245,60 +3162,6 @@ (define_insn "xvfnmadd<mode>4_nmadd4"
   [(set_attr "type" "simd_fmadd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lasx_xvftintrne_w_s"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-	(unspec:V8SI [(match_operand:V8SF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFTINTRNE_W_S))]
-  "ISA_HAS_LASX"
-  "xvftintrne.w.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvftintrne_l_d"
-  [(set (match_operand:V4DI 0 "register_operand" "=f")
-	(unspec:V4DI [(match_operand:V4DF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFTINTRNE_L_D))]
-  "ISA_HAS_LASX"
-  "xvftintrne.l.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "lasx_xvftintrp_w_s"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-	(unspec:V8SI [(match_operand:V8SF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFTINTRP_W_S))]
-  "ISA_HAS_LASX"
-  "xvftintrp.w.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvftintrp_l_d"
-  [(set (match_operand:V4DI 0 "register_operand" "=f")
-	(unspec:V4DI [(match_operand:V4DF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFTINTRP_L_D))]
-  "ISA_HAS_LASX"
-  "xvftintrp.l.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "lasx_xvftintrm_w_s"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-	(unspec:V8SI [(match_operand:V8SF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFTINTRM_W_S))]
-  "ISA_HAS_LASX"
-  "xvftintrm.w.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvftintrm_l_d"
-  [(set (match_operand:V4DI 0 "register_operand" "=f")
-	(unspec:V4DI [(match_operand:V4DF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFTINTRM_L_D))]
-  "ISA_HAS_LASX"
-  "xvftintrm.l.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
 (define_insn "lasx_xvftint_w_d"
   [(set (match_operand:V8SI 0 "register_operand" "=f")
 	(unspec:V8SI [(match_operand:V4DF 1 "register_operand" "f")
@@ -3467,108 +3330,6 @@ (define_insn "lasx_xvftintrnel_l_s"
   [(set_attr "type" "simd_shift")
    (set_attr "mode" "V8SF")])
 
-(define_insn "lasx_xvfrintrne_s"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-	(unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRNE_S))]
-  "ISA_HAS_LASX"
-  "xvfrintrne.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvfrintrne_d"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-	(unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRNE_D))]
-  "ISA_HAS_LASX"
-  "xvfrintrne.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "lasx_xvfrintrz_s"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-	(unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRZ_S))]
-  "ISA_HAS_LASX"
-  "xvfrintrz.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvfrintrz_d"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-	(unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRZ_D))]
-  "ISA_HAS_LASX"
-  "xvfrintrz.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "lasx_xvfrintrp_s"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-	(unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRP_S))]
-  "ISA_HAS_LASX"
-  "xvfrintrp.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvfrintrp_d"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-	(unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRP_D))]
-  "ISA_HAS_LASX"
-  "xvfrintrp.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "lasx_xvfrintrm_s"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-	(unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRM_S))]
-  "ISA_HAS_LASX"
-  "xvfrintrm.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "lasx_xvfrintrm_d"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-	(unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")]
-		     UNSPEC_LASX_XVFRINTRM_D))]
-  "ISA_HAS_LASX"
-  "xvfrintrm.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
-;; Vector versions of the floating-point frint patterns.
-;; Expands to btrunc, ceil, floor, rint.
-(define_insn "<FRINT256_S:frint256_pattern_s>v8sf2"
- [(set (match_operand:V8SF 0 "register_operand" "=f")
-	(unspec:V8SF [(match_operand:V8SF 1 "register_operand" "f")]
-			 FRINT256_S))]
-  "ISA_HAS_LASX"
-  "xvfrint<FRINT256_S:frint256_suffix>.s\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "<FRINT256_D:frint256_pattern_d>v4df2"
- [(set (match_operand:V4DF 0 "register_operand" "=f")
-	(unspec:V4DF [(match_operand:V4DF 1 "register_operand" "f")]
-			 FRINT256_D))]
-  "ISA_HAS_LASX"
-  "xvfrint<FRINT256_D:frint256_suffix>.d\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4DF")])
-
-;; Expands to round.
-(define_insn "round<mode>2"
- [(set (match_operand:FLASX 0 "register_operand" "=f")
-	(unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")]
-			 UNSPEC_LASX_XVFRINT))]
-  "ISA_HAS_LASX"
-  "xvfrint.<flasxfmt>\t%u0,%u1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "<MODE>")])
-
 ;; Offset load and broadcast
 (define_expand "lasx_xvldrepl_<lasxfmt_f>"
   [(match_operand:LASX 0 "register_operand")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index db02aacdc3f..cbd833aa283 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -419,8 +419,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vabsd_hu CODE_FOR_lsx_vabsd_u_hu
 #define CODE_FOR_lsx_vabsd_wu CODE_FOR_lsx_vabsd_u_wu
 #define CODE_FOR_lsx_vabsd_du CODE_FOR_lsx_vabsd_u_du
-#define CODE_FOR_lsx_vftint_w_s CODE_FOR_lsx_vftint_s_w_s
-#define CODE_FOR_lsx_vftint_l_d CODE_FOR_lsx_vftint_s_l_d
 #define CODE_FOR_lsx_vftint_wu_s CODE_FOR_lsx_vftint_u_wu_s
 #define CODE_FOR_lsx_vftint_lu_d CODE_FOR_lsx_vftint_u_lu_d
 #define CODE_FOR_lsx_vandn_v CODE_FOR_vandnv16qi3
@@ -725,8 +723,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lasx_xvssrlrn_bu_h CODE_FOR_lasx_xvssrlrn_u_bu_h
 #define CODE_FOR_lasx_xvssrlrn_hu_w CODE_FOR_lasx_xvssrlrn_u_hu_w
 #define CODE_FOR_lasx_xvssrlrn_wu_d CODE_FOR_lasx_xvssrlrn_u_wu_d
-#define CODE_FOR_lasx_xvftint_w_s CODE_FOR_lasx_xvftint_s_w_s
-#define CODE_FOR_lasx_xvftint_l_d CODE_FOR_lasx_xvftint_s_l_d
 #define CODE_FOR_lasx_xvftint_wu_s CODE_FOR_lasx_xvftint_u_wu_s
 #define CODE_FOR_lasx_xvftint_lu_d CODE_FOR_lasx_xvftint_u_lu_d
 #define CODE_FOR_lasx_xvsllwil_h_b CODE_FOR_lasx_xvsllwil_s_h_b
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index cd4ed495697..78ed63f2132 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4026,11 +4026,8 @@ (define_peephole2
 (include "generic.md")
 (include "la464.md")
 
-; The LoongArch SX Instructions.
-(include "lsx.md")
-
-; The LoongArch ASX Instructions.
-(include "lasx.md")
+; The LoongArch SIMD Instructions.
+(include "simd.md")
 
 (define_c_enum "unspec" [
   UNSPEC_ADDRESS_FIRST
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 5e8d8d74b43..c1c3719e383 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -55,7 +55,6 @@ (define_c_enum "unspec" [
   UNSPEC_LSX_VFCMP_SULT
   UNSPEC_LSX_VFCMP_SUN
   UNSPEC_LSX_VFCMP_SUNE
-  UNSPEC_LSX_VFTINT_S
   UNSPEC_LSX_VFTINT_U
   UNSPEC_LSX_VSAT_S
   UNSPEC_LSX_VSAT_U
@@ -89,9 +88,6 @@ (define_c_enum "unspec" [
   UNSPEC_LSX_VEXTRINS
   UNSPEC_LSX_VMSKLTZ
   UNSPEC_LSX_VSIGNCOV
-  UNSPEC_LSX_VFTINTRNE
-  UNSPEC_LSX_VFTINTRP
-  UNSPEC_LSX_VFTINTRM
   UNSPEC_LSX_VFTINT_W_D
   UNSPEC_LSX_VFFINT_S_L
   UNSPEC_LSX_VFTINTRZ_W_D
@@ -110,14 +106,6 @@ (define_c_enum "unspec" [
   UNSPEC_LSX_VFTINTRNEL_L_S
   UNSPEC_LSX_VFTINTRNEH_L_S
   UNSPEC_LSX_VFTINTH_L_H
-  UNSPEC_LSX_VFRINTRNE_S
-  UNSPEC_LSX_VFRINTRNE_D
-  UNSPEC_LSX_VFRINTRZ_S
-  UNSPEC_LSX_VFRINTRZ_D
-  UNSPEC_LSX_VFRINTRP_S
-  UNSPEC_LSX_VFRINTRP_D
-  UNSPEC_LSX_VFRINTRM_S
-  UNSPEC_LSX_VFRINTRM_D
   UNSPEC_LSX_VSSRARN_S
   UNSPEC_LSX_VSSRARN_U
   UNSPEC_LSX_VSSRLN_U
@@ -221,9 +209,6 @@ (define_mode_iterator LSX_D    [V2DI V2DF])
 ;; Only used for copy_{u,s}.w and vilvh.
 (define_mode_iterator LSX_W    [V4SI V4SF])
 
-;; Only integer modes.
-(define_mode_iterator ILSX     [V2DI V4SI V8HI V16QI])
-
 ;; As ILSX but excludes V16QI.
 (define_mode_iterator ILSX_DWH [V2DI V4SI V8HI])
 
@@ -242,21 +227,9 @@ (define_mode_iterator ILSX_HB  [V8HI V16QI])
 ;;;; Only integer modes for fixed-point madd_q/maddr_q.
 ;;(define_mode_iterator ILSX_WH  [V4SI V8HI])
 
-;; Only floating-point modes.
-(define_mode_iterator FLSX     [V2DF V4SF])
-
 ;; Only used for immediate set shuffle elements instruction.
 (define_mode_iterator LSX_WHB_W [V4SI V8HI V16QI V4SF])
 
-;; The attribute gives the integer vector mode with same size.
-(define_mode_attr VIMODE
-  [(V2DF "V2DI")
-   (V4SF "V4SI")
-   (V2DI "V2DI")
-   (V4SI "V4SI")
-   (V8HI "V8HI")
-   (V16QI "V16QI")])
-
 ;; The attribute gives half modes for vector modes.
 (define_mode_attr VHMODE
   [(V8HI "V16QI")
@@ -400,38 +373,6 @@ (define_mode_attr bitimm
    (V4SI  "uimm5")
    (V2DI  "uimm6")])
 
-
-(define_int_iterator FRINT_S [UNSPEC_LSX_VFRINTRP_S
-			    UNSPEC_LSX_VFRINTRZ_S
-			    UNSPEC_LSX_VFRINT
-			    UNSPEC_LSX_VFRINTRM_S])
-
-(define_int_iterator FRINT_D [UNSPEC_LSX_VFRINTRP_D
-			    UNSPEC_LSX_VFRINTRZ_D
-			    UNSPEC_LSX_VFRINT
-			    UNSPEC_LSX_VFRINTRM_D])
-
-(define_int_attr frint_pattern_s
-  [(UNSPEC_LSX_VFRINTRP_S  "ceil")
-   (UNSPEC_LSX_VFRINTRZ_S  "btrunc")
-   (UNSPEC_LSX_VFRINT	   "rint")
-   (UNSPEC_LSX_VFRINTRM_S  "floor")])
-
-(define_int_attr frint_pattern_d
-  [(UNSPEC_LSX_VFRINTRP_D  "ceil")
-   (UNSPEC_LSX_VFRINTRZ_D  "btrunc")
-   (UNSPEC_LSX_VFRINT	   "rint")
-   (UNSPEC_LSX_VFRINTRM_D  "floor")])
-
-(define_int_attr frint_suffix
-  [(UNSPEC_LSX_VFRINTRP_S  "rp")
-   (UNSPEC_LSX_VFRINTRP_D  "rp")
-   (UNSPEC_LSX_VFRINTRZ_S  "rz")
-   (UNSPEC_LSX_VFRINTRZ_D  "rz")
-   (UNSPEC_LSX_VFRINT	   "")
-   (UNSPEC_LSX_VFRINTRM_S  "rm")
-   (UNSPEC_LSX_VFRINTRM_D  "rm")])
-
 (define_expand "vec_init<mode><unitmode>"
   [(match_operand:LSX 0 "register_operand")
    (match_operand:LSX 1 "")]
@@ -1616,15 +1557,6 @@ (define_insn "lsx_vfrecip_<flsxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lsx_vfrint_<flsxfmt>"
-  [(set (match_operand:FLSX 0 "register_operand" "=f")
-	(unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINT))]
-  "ISA_HAS_LSX"
-  "vfrint.<flsxfmt>\t%w0,%w1"
-  [(set_attr "type" "simd_fcvt")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lsx_vfrsqrt_<flsxfmt>"
   [(set (match_operand:FLSX 0 "register_operand" "=f")
 	(unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
@@ -1634,16 +1566,6 @@ (define_insn "lsx_vfrsqrt_<flsxfmt>"
   [(set_attr "type" "simd_fdiv")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lsx_vftint_s_<ilsxfmt>_<flsxfmt>"
-  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
-	(unspec:<VIMODE> [(match_operand:FLSX 1 "register_operand" "f")]
-			 UNSPEC_LSX_VFTINT_S))]
-  "ISA_HAS_LSX"
-  "vftint.<ilsxfmt>.<flsxfmt>\t%w0,%w1"
-  [(set_attr "type" "simd_fcvt")
-   (set_attr "cnv_mode" "<FINTCNV_2>")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lsx_vftint_u_<ilsxfmt_u>_<flsxfmt>"
   [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
 	(unspec:<VIMODE> [(match_operand:FLSX 1 "register_operand" "f")]
@@ -1654,15 +1576,6 @@ (define_insn "lsx_vftint_u_<ilsxfmt_u>_<flsxfmt>"
    (set_attr "cnv_mode" "<FINTCNV_2>")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "fix_trunc<FLSX:mode><mode_i>2"
-  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
-	(fix:<VIMODE> (match_operand:FLSX 1 "register_operand" "f")))]
-  "ISA_HAS_LSX"
-  "vftintrz.<ilsxfmt>.<flsxfmt>\t%w0,%w1"
-  [(set_attr "type" "simd_fcvt")
-   (set_attr "cnv_mode" "<FINTCNV_2>")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "fixuns_trunc<FLSX:mode><mode_i>2"
   [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
 	(unsigned_fix:<VIMODE> (match_operand:FLSX 1 "register_operand" "f")))]
@@ -2965,60 +2878,6 @@ (define_insn "vfnmadd<mode>4_nmadd4"
   [(set_attr "type" "simd_fmadd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lsx_vftintrne_w_s"
-  [(set (match_operand:V4SI 0 "register_operand" "=f")
-	(unspec:V4SI [(match_operand:V4SF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFTINTRNE))]
-  "ISA_HAS_LSX"
-  "vftintrne.w.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vftintrne_l_d"
-  [(set (match_operand:V2DI 0 "register_operand" "=f")
-	(unspec:V2DI [(match_operand:V2DF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFTINTRNE))]
-  "ISA_HAS_LSX"
-  "vftintrne.l.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
-(define_insn "lsx_vftintrp_w_s"
-  [(set (match_operand:V4SI 0 "register_operand" "=f")
-	(unspec:V4SI [(match_operand:V4SF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFTINTRP))]
-  "ISA_HAS_LSX"
-  "vftintrp.w.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vftintrp_l_d"
-  [(set (match_operand:V2DI 0 "register_operand" "=f")
-	(unspec:V2DI [(match_operand:V2DF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFTINTRP))]
-  "ISA_HAS_LSX"
-  "vftintrp.l.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
-(define_insn "lsx_vftintrm_w_s"
-  [(set (match_operand:V4SI 0 "register_operand" "=f")
-	(unspec:V4SI [(match_operand:V4SF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFTINTRM))]
-  "ISA_HAS_LSX"
-  "vftintrm.w.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vftintrm_l_d"
-  [(set (match_operand:V2DI 0 "register_operand" "=f")
-	(unspec:V2DI [(match_operand:V2DF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFTINTRM))]
-  "ISA_HAS_LSX"
-  "vftintrm.l.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
 (define_insn "lsx_vftint_w_d"
   [(set (match_operand:V4SI 0 "register_operand" "=f")
 	(unspec:V4SI [(match_operand:V2DF 1 "register_operand" "f")
@@ -3187,108 +3046,6 @@ (define_insn "lsx_vftintrnel_l_s"
   [(set_attr "type" "simd_shift")
    (set_attr "mode" "V4SF")])
 
-(define_insn "lsx_vfrintrne_s"
-  [(set (match_operand:V4SF 0 "register_operand" "=f")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRNE_S))]
-  "ISA_HAS_LSX"
-  "vfrintrne.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vfrintrne_d"
-  [(set (match_operand:V2DF 0 "register_operand" "=f")
-	(unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRNE_D))]
-  "ISA_HAS_LSX"
-  "vfrintrne.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
-(define_insn "lsx_vfrintrz_s"
-  [(set (match_operand:V4SF 0 "register_operand" "=f")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRZ_S))]
-  "ISA_HAS_LSX"
-  "vfrintrz.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vfrintrz_d"
-  [(set (match_operand:V2DF 0 "register_operand" "=f")
-	(unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRZ_D))]
-  "ISA_HAS_LSX"
-  "vfrintrz.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
-(define_insn "lsx_vfrintrp_s"
-  [(set (match_operand:V4SF 0 "register_operand" "=f")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRP_S))]
-  "ISA_HAS_LSX"
-  "vfrintrp.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vfrintrp_d"
-  [(set (match_operand:V2DF 0 "register_operand" "=f")
-	(unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRP_D))]
-  "ISA_HAS_LSX"
-  "vfrintrp.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
-(define_insn "lsx_vfrintrm_s"
-  [(set (match_operand:V4SF 0 "register_operand" "=f")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRM_S))]
-  "ISA_HAS_LSX"
-  "vfrintrm.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vfrintrm_d"
-  [(set (match_operand:V2DF 0 "register_operand" "=f")
-	(unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")]
-		     UNSPEC_LSX_VFRINTRM_D))]
-  "ISA_HAS_LSX"
-  "vfrintrm.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
-;; Vector versions of the floating-point frint patterns.
-;; Expands to btrunc, ceil, floor, rint.
-(define_insn "<FRINT_S:frint_pattern_s>v4sf2"
- [(set (match_operand:V4SF 0 "register_operand" "=f")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "f")]
-			 FRINT_S))]
-  "ISA_HAS_LSX"
-  "vfrint<FRINT_S:frint_suffix>.s\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "<FRINT_D:frint_pattern_d>v2df2"
- [(set (match_operand:V2DF 0 "register_operand" "=f")
-	(unspec:V2DF [(match_operand:V2DF 1 "register_operand" "f")]
-			 FRINT_D))]
-  "ISA_HAS_LSX"
-  "vfrint<FRINT_D:frint_suffix>.d\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "V2DF")])
-
-;; Expands to round.
-(define_insn "round<mode>2"
- [(set (match_operand:FLSX 0 "register_operand" "=f")
-	(unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")]
-			 UNSPEC_LSX_VFRINT))]
-  "ISA_HAS_LSX"
-  "vfrint.<flsxfrint>\t%w0,%w1"
-  [(set_attr "type" "simd_shift")
-   (set_attr "mode" "<MODE>")])
-
 ;; Offset load and broadcast
 (define_expand "lsx_vldrepl_<lsxfmt_f>"
   [(match_operand:LSX 0 "register_operand")
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
new file mode 100644
index 00000000000..f371e201127
--- /dev/null
+++ b/gcc/config/loongarch/simd.md
@@ -0,0 +1,194 @@
+;; Integer modes supported by LSX.
+(define_mode_iterator ILSX    [V2DI V4SI V8HI V16QI])
+
+;; Integer modes supported by LASX.
+(define_mode_iterator ILASX   [V4DI V8SI V16HI V32QI])
+
+;; FP modes supported by LSX
+(define_mode_iterator FLSX    [V2DF V4SF])
+
+;; FP modes supported by LASX
+(define_mode_iterator FLASX   [V4DF V8SF])
+
+;; All integer modes available
+(define_mode_iterator IVEC    [(ILSX "ISA_HAS_LSX") (ILASX "ISA_HAS_LASX")])
+
+;; All FP modes available
+(define_mode_iterator FVEC    [(FLSX "ISA_HAS_LSX") (FLASX "ISA_HAS_LASX")])
+
+;; Mnemonic prefix, "x" for LASX modes.
+(define_mode_attr x [(V2DI "") (V4SI "") (V8HI "") (V16QI "")
+		     (V2DF "") (V4SF "")
+		     (V4DI "x") (V8SI "x") (V16HI "x") (V32QI "x")
+		     (V4DF "x") (V8SF "x")])
+
+;; Modifier for vector register, "w" for LSX modes, "u" for LASX modes.
+(define_mode_attr wu [(V2DI "w") (V4SI "w") (V8HI "w") (V16QI "w")
+		      (V2DF "w") (V4SF "w")
+		      (V4DI "u") (V8SI "u") (V16HI "u") (V32QI "u")
+		      (V4DF "u") (V8SF "u")])
+
+;; define_insn name prefix, "lsx" or "lasx"
+(define_mode_attr simd_isa
+  [(V2DI "lsx") (V4SI "lsx") (V8HI "lsx") (V16QI "lsx")
+   (V2DF "lsx") (V4SF "lsx")
+   (V4DI "lasx") (V8SI "lasx") (V16HI "lasx") (V32QI "lasx")
+   (V4DF "lasx") (V8SF "lasx")])
+
+;; Widen integer modes for intermediate values in RTX pattern.
+(define_mode_attr WVEC [(V2DI "V2TI") (V4DI "V4TI")
+			(V4SI "V4DI") (V8SI "V8DI")
+			(V8HI "V8SI") (V16HI "V16SI")
+			(V16QI "V16HI") (V32QI "V32HI")])
+
+;; Integer vector modes with the same length and unit size as a mode.
+(define_mode_attr VIMODE [(V2DI "V2DI") (V4SI "V4SI")
+			  (V8HI "V8HI") (V16QI "V16QI")
+			  (V2DF "V2DI") (V4SF "V4SI")
+			  (V4DI "V4DI") (V8SI "V8SI")
+			  (V16HI "V16HI") (V32QI "V32QI")
+			  (V4DF "V4DI") (V8SF "V8SI")])
+
+;; Lower-case version.
+(define_mode_attr vimode [(V2DF "v2di") (V4SF "v4si")
+			  (V4DF "v4di") (V8SF "v8si")])
+
+;; Suffix for LSX or LASX instructions.
+(define_mode_attr simdfmt [(V2DF "d") (V4DF "d")
+			   (V4SF "s") (V8SF "s")
+			   (V2DI "d") (V4DI "d")
+			   (V4SI "w") (V8SI "w")
+			   (V8HI "h") (V16HI "h")
+			   (V16QI "b") (V32QI "b")])
+
+;; Suffix for integer mode in LSX or LASX instructions with FP input but
+;; integer output.
+(define_mode_attr simdifmt_for_f [(V2DF "l") (V4DF "l")
+				  (V4SF "w") (V8SF "w")])
+
+;; Size of vector elements in bits.
+(define_mode_attr elmbits [(V2DI "64") (V4DI "64")
+			   (V4SI "32") (V8SI "32")
+			   (V8HI "16") (V16HI "16")
+			   (V16QI "8") (V32QI "8")])
+
+;; =======================================================================
+;; For many LASX instructions, the only difference from the LSX
+;; counterpart is the length of vector operands.  Describe these LSX/LASX
+;; instructions here so we can avoid duplicating logic.
+;; =======================================================================
+
+;;
+;; FP vector rounding instructions
+;;
+
+(define_c_enum "unspec"
+  [UNSPEC_SIMD_FRINTRP
+   UNSPEC_SIMD_FRINTRZ
+   UNSPEC_SIMD_FRINT
+   UNSPEC_SIMD_FRINTRM
+   UNSPEC_SIMD_FRINTRNE])
+
+(define_int_iterator SIMD_FRINT
+  [UNSPEC_SIMD_FRINTRP
+   UNSPEC_SIMD_FRINTRZ
+   UNSPEC_SIMD_FRINT
+   UNSPEC_SIMD_FRINTRM
+   UNSPEC_SIMD_FRINTRNE])
+
+(define_int_attr simd_frint_rounding
+  [(UNSPEC_SIMD_FRINTRP		"rp")
+   (UNSPEC_SIMD_FRINTRZ		"rz")
+   (UNSPEC_SIMD_FRINT		"")
+   (UNSPEC_SIMD_FRINTRM		"rm")
+   (UNSPEC_SIMD_FRINTRNE	"rne")])
+
+;; All these, but rint, are controlled by -ffp-int-builtin-inexact.
+;; Note: nearbyint is NOT allowed to raise FE_INEXACT even if
+;; -ffp-int-builtin-inexact, but rint is ALLOWED to raise it even if
+;; -fno-fp-int-builtin-inexact.
+(define_int_attr simd_frint_pattern
+  [(UNSPEC_SIMD_FRINTRP		"ceil")
+   (UNSPEC_SIMD_FRINTRZ		"btrunc")
+   (UNSPEC_SIMD_FRINT		"rint")
+   (UNSPEC_SIMD_FRINTRNE	"roundeven")
+   (UNSPEC_SIMD_FRINTRM		"floor")])
+
+;; <x>vfrint.{/rp/rz/rm}
+(define_insn "<simd_isa>_<x>vfrint<simd_frint_rounding>_<simdfmt>"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+	(unspec:FVEC [(match_operand:FVEC 1 "register_operand" "f")]
+		     SIMD_FRINT))]
+  ""
+  "<x>vfrint<simd_frint_rounding>.<simdfmt>\t%<wu>0,%<wu>1"
+  [(set_attr "type" "simd_fcvt")
+   (set_attr "mode" "<MODE>")])
+
+;; Expand the standard-named patterns to <x>vfrint instructions if
+;; raising inexact exception is allowed.
+
+(define_expand "<simd_frint_pattern><mode>2"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+	(unspec:FVEC [(match_operand:FVEC 1 "register_operand" "f")]
+		     SIMD_FRINT))]
+   "<SIMD_FRINT> == UNSPEC_SIMD_FRINT ||
+    flag_fp_int_builtin_inexact ||
+    !flag_trapping_math")
+
+;; ftrunc is like btrunc, but it's allowed to raise inexact exception
+;; even if -fno-fp-int-builtin-inexact.
+(define_expand "ftrunc<mode>2"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+	(unspec:FVEC [(match_operand:FVEC 1 "register_operand" "f")]
+		     UNSPEC_SIMD_FRINTRZ))]
+  "")
+
+;; <x>vftint.{/rp/rz/rm}
+(define_insn
+  "<simd_isa>_<x>vftint<simd_frint_rounding>_<simdifmt_for_f>_<simdfmt>"
+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
+	(fix:<VIMODE>
+	  (unspec:FVEC [(match_operand:FVEC 1 "register_operand" "f")]
+		       SIMD_FRINT)))]
+  ""
+  "<x>vftint<simd_frint_rounding>.<simdifmt_for_f>.<simdfmt>\t%<wu>0,%<wu>1"
+  [(set_attr "type" "simd_fcvt")
+   (set_attr "mode" "<MODE>")])
+
+;; Expand the standard-named patterns to <x>vftint instructions if
+;; raising inexact exception is allowed.
+
+(define_expand "l<simd_frint_pattern><mode><vimode>2"
+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
+	(fix:<VIMODE>
+	  (unspec:FVEC [(match_operand:FVEC 1 "register_operand" "f")]
+		       SIMD_FRINT)))]
+   "<SIMD_FRINT> == UNSPEC_SIMD_FRINT ||
+    flag_fp_int_builtin_inexact ||
+    !flag_trapping_math")
+
+;; fix_trunc is allowed to raise inexact exception even if
+;; -fno-fp-int-builtin-inexact.  Because the middle end tries to match
+;; (FIX x) and it does not know (FIX (UNSPEC_SIMD_FRINTRZ x)), we need
+;; to use define_insn_and_split instead of define_expand (expanders are
+;; not considered during matching).
+(define_insn_and_split "fix_trunc<mode><vimode>2"
+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
+	(fix:<VIMODE> (match_operand:FVEC 1 "register_operand" "f")))]
+  ""
+  "#"
+  ""
+  [(const_int 0)]
+  {
+    emit_insn (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
+      operands[0], operands[1]));
+    DONE;
+  }
+  [(set_attr "type" "simd_fcvt")
+   (set_attr "mode" "<MODE>")])
+
+; The LoongArch SX Instructions.
+(include "lsx.md")
+
+; The LoongArch ASX Instructions.
+(include "lasx.md")
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
new file mode 100644
index 00000000000..7bbaf1fba5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d -mdouble-float -fno-math-errno -fno-fp-int-builtin-inexact -mlasx" } */
+
+#include "vect-frint.c"
+
+/* ceil */
+/* { dg-final { scan-assembler "bl\t%plt\\(ceil\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(ceilf\\)" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrp\.s" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrp\.d" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrp\.s" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrp\.d" } } */
+
+/* floor */
+/* { dg-final { scan-assembler "bl\t%plt\\(floor\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(floorf\\)" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrm\.s" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrm\.d" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrm\.s" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrm\.d" } } */
+
+/* nearbyint + rint: Only rint is allowed */
+/* { dg-final { scan-assembler "bl\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(nearbyintf\\)" } } */
+/* { dg-final { scan-assembler-times "\tvfrint\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\tvfrint\.d" 1 } } */
+/* { dg-final { scan-assembler-times "\txvfrint\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\txvfrint\.d" 1 } } */
+
+/* round: we don't have a corresponding instruction */
+/* { dg-final { scan-assembler "bl\t%plt\\(round\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundf\\)" } } */
+
+/* roundeven */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundeven\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundevenf\\)" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrne\.s" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrne\.d" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrne\.s" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrne\.d" } } */
+
+/* trunc */
+/* { dg-final { scan-assembler "bl\t%plt\\(trunc\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(truncf\\)" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrz\.s" } } */
+/* { dg-final { scan-assembler-not "\tvfrintrz\.d" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrz\.s" } } */
+/* { dg-final { scan-assembler-not "\txvfrintrz\.d" } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint.c b/gcc/testsuite/gcc.target/loongarch/vect-frint.c
new file mode 100644
index 00000000000..6bf211e7e98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d -mdouble-float -fno-math-errno -ffp-int-builtin-inexact -mlasx" } */
+
+float out_x[8];
+double out_y[4];
+
+float x[8];
+double y[4];
+
+#define TEST(op, N, func) \
+void \
+test_##op##_##N##_##func () \
+{ \
+  for (int i = 0; i < N; i++) \
+    out_##op[i] = __builtin_##func (op[i]); \
+}
+
+TEST(x, 4, ceilf);
+TEST(x, 4, floorf);
+TEST(x, 4, nearbyintf);
+TEST(x, 4, rintf);
+TEST(x, 4, roundf);
+TEST(x, 4, roundevenf);
+TEST(x, 4, truncf);
+
+TEST(x, 8, ceilf);
+TEST(x, 8, floorf);
+TEST(x, 8, nearbyintf);
+TEST(x, 8, rintf);
+TEST(x, 8, roundf);
+TEST(x, 8, roundevenf);
+TEST(x, 8, truncf);
+
+TEST(y, 2, ceil);
+TEST(y, 2, floor);
+TEST(y, 2, nearbyint);
+TEST(y, 2, rint);
+TEST(y, 2, round);
+TEST(y, 2, roundeven);
+TEST(y, 2, trunc);
+
+TEST(y, 4, ceil);
+TEST(y, 4, floor);
+TEST(y, 4, nearbyint);
+TEST(y, 4, rint);
+TEST(y, 4, round);
+TEST(y, 4, roundeven);
+TEST(y, 4, trunc);
+
+/* ceil */
+/* { dg-final { scan-assembler "\tvfrintrp\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrp\.d" } } */
+/* { dg-final { scan-assembler "\txvfrintrp\.s" } } */
+/* { dg-final { scan-assembler "\txvfrintrp\.d" } } */
+
+/* floor */
+/* { dg-final { scan-assembler "\tvfrintrm\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.d" } } */
+/* { dg-final { scan-assembler "\txvfrintrm\.s" } } */
+/* { dg-final { scan-assembler "\txvfrintrm\.d" } } */
+
+/* rint and nearbyint
+   nearbyint has been disallowed to raise FE_INEXACT for decades.  */
+/* { dg-final { scan-assembler-times "\tvfrint\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\tvfrint\.d" 1 } } */
+/* { dg-final { scan-assembler-times "\txvfrint\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\txvfrint\.d" 1 } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(nearbyintf\\)" } } */
+
+/* round: we don't have a corresponding instruction */
+/* { dg-final { scan-assembler "bl\t%plt\\(round\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundf\\)" } } */
+
+/* roundeven */
+/* { dg-final { scan-assembler "\tvfrintrne\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.d" } } */
+/* { dg-final { scan-assembler "\txvfrintrne\.s" } } */
+/* { dg-final { scan-assembler "\txvfrintrne\.d" } } */
+
+/* trunc */
+/* { dg-final { scan-assembler "\tvfrintrz\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.d" } } */
+/* { dg-final { scan-assembler "\txvfrintrz\.s" } } */
+/* { dg-final { scan-assembler "\txvfrintrz\.d" } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c b/gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
new file mode 100644
index 00000000000..83d268099ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d -mdouble-float -fno-math-errno -fno-fp-int-builtin-inexact -mlasx" } */
+
+#include "vect-ftint.c"
+
+/* ceil */
+/* { dg-final { scan-assembler "bl\t%plt\\(ceil\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(ceilf\\)" } } */
+/* { dg-final { scan-assembler-not "\tvftintrp\.w\.s" } } */
+/* { dg-final { scan-assembler-not "\tvftintrp\.l\.d" } } */
+/* { dg-final { scan-assembler-not "\txvftintrp\.w\.s" } } */
+/* { dg-final { scan-assembler-not "\txvftintrp\.l\.d" } } */
+
+/* floor */
+/* { dg-final { scan-assembler "bl\t%plt\\(floor\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(floorf\\)" } } */
+/* { dg-final { scan-assembler-not "\tvftintrm\.w\.s" } } */
+/* { dg-final { scan-assembler-not "\tvftintrm\.l\.d" } } */
+/* { dg-final { scan-assembler-not "\txvftintrm\.w\.s" } } */
+/* { dg-final { scan-assembler-not "\txvftintrm\.l\.d" } } */
+
+/* nearbyint + rint */
+/* { dg-final { scan-assembler "bl\t%plt\\(floor\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(floorf\\)" } } */
+/* { dg-final { scan-assembler-times "\tvftint\.w\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\tvftint\.l\.d" 1 } } */
+/* { dg-final { scan-assembler-times "\txvftint\.w\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\txvftint\.l\.d" 1 } } */
+
+/* round: we don't have a corresponding instruction */
+/* { dg-final { scan-assembler "bl\t%plt\\(lround\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundf\\)" } } */
+
+/* roundeven */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundeven\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundevenf\\)" } } */
+/* { dg-final { scan-assembler-not "\tvftintrne\.w\.s" } } */
+/* { dg-final { scan-assembler-not "\tvftintrne\.l\.d" } } */
+/* { dg-final { scan-assembler-not "\txvftintrne\.w\.s" } } */
+/* { dg-final { scan-assembler-not "\txvftintrne\.l\.d" } } */
+
+/* trunc: XFAIL due to PR 107723 */
+/* { dg-final { scan-assembler "bl\t%plt\\(trunc\\)" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(truncf\\)" } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-ftint.c b/gcc/testsuite/gcc.target/loongarch/vect-ftint.c
new file mode 100644
index 00000000000..c4962ed1774
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-ftint.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d -mdouble-float -fno-math-errno -ffp-int-builtin-inexact -mlasx" } */
+
+int out_x[8];
+long out_y[4];
+
+float x[8];
+double y[4];
+
+#define TEST(op, N, func) \
+void \
+test_##op##_##N##_##func () \
+{ \
+  for (int i = 0; i < N; i++) \
+    out_##op[i] = __builtin_##func (op[i]); \
+}
+
+TEST(x, 4, ceilf);
+TEST(x, 4, floorf);
+TEST(x, 4, nearbyintf);
+TEST(x, 4, rintf);
+TEST(x, 4, roundf);
+TEST(x, 4, roundevenf);
+TEST(x, 4, truncf);
+
+TEST(x, 8, ceilf);
+TEST(x, 8, floorf);
+TEST(x, 8, nearbyintf);
+TEST(x, 8, rintf);
+TEST(x, 8, roundf);
+TEST(x, 8, roundevenf);
+TEST(x, 8, truncf);
+
+TEST(y, 2, ceil);
+TEST(y, 2, floor);
+TEST(y, 2, nearbyint);
+TEST(y, 2, rint);
+TEST(y, 2, round);
+TEST(y, 2, roundeven);
+TEST(y, 2, trunc);
+
+TEST(y, 4, ceil);
+TEST(y, 4, floor);
+TEST(y, 4, nearbyint);
+TEST(y, 4, rint);
+TEST(y, 4, round);
+TEST(y, 4, roundeven);
+TEST(y, 4, trunc);
+
+/* ceil */
+/* { dg-final { scan-assembler "\tvftintrp\.w\.s" } } */
+/* { dg-final { scan-assembler "\tvftintrp\.l\.d" } } */
+/* { dg-final { scan-assembler "\txvftintrp\.w\.s" } } */
+/* { dg-final { scan-assembler "\txvftintrp\.l\.d" } } */
+
+/* floor */
+/* { dg-final { scan-assembler "\tvftintrm\.w\.s" } } */
+/* { dg-final { scan-assembler "\tvftintrm\.l\.d" } } */
+/* { dg-final { scan-assembler "\txvftintrm\.w\.s" } } */
+/* { dg-final { scan-assembler "\txvftintrm\.l\.d" } } */
+
+/* rint and nearbyint
+   nearbyint has been disallowed to raise FE_INEXACT for decades.  */
+/* { dg-final { scan-assembler-times "\tvftint\.w\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\tvftint\.l\.d" 1 } } */
+/* { dg-final { scan-assembler-times "\txvftint\.w\.s" 1 } } */
+/* { dg-final { scan-assembler-times "\txvftint\.l\.d" 1 } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(nearbyintf\\)" } } */
+
+/* round: we don't have a corresponding instruction */
+/* { dg-final { scan-assembler "bl\t%plt\\(lround\\)" } } */
+/* { dg-final { scan-assembler "bl\t%plt\\(roundf\\)" } } */
+
+/* roundeven */
+/* { dg-final { scan-assembler "\tvftintrne\.w\.s" } } */
+/* { dg-final { scan-assembler "\tvftintrne\.l\.d" } } */
+/* { dg-final { scan-assembler "\txvftintrne\.w\.s" } } */
+/* { dg-final { scan-assembler "\txvftintrne\.l\.d" } } */
+
+/* trunc */
+/* { dg-final { scan-assembler-not "bl\t%plt\\(trunc\\)" } } */
+/* { dg-final { scan-assembler-not "bl\t%plt\\(truncf\\)" } } */
-- 
2.42.1



* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-20  0:47 ` [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578] Xi Ruoyao
@ 2023-11-23  6:35   ` chenglulu
  2023-11-23  7:11     ` Xi Ruoyao
  0 siblings, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-23  6:35 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2023/11/20 at 8:47 AM, Xi Ruoyao wrote:
> The usage of LSX and LASX frint/ftint instructions had some problems:
>
> 1. These instructions raise FE_INEXACT, which is not allowed with
>     -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
>     (the only exceptions are rint, lrint, and llrint).
> 2. The "frint" instruction without explicit rounding mode is used for
>     roundM2, this is incorrect because roundM2 is defined "rounding
>     operand 1 to the *nearest* integer, rounding away from zero in the
>     event of a tie".  We actually don't have such an instruction.  Our
>     frintrne instruction is roundevenM2 (unfortunately, this is not
>     documented).
> 3. These define_insn's are written in a way not so easy to hack.
>
> So I removed these instructions and created a "simd.md" file, then added
> them and the corresponding expanders there.  The advantage of the
> simd.md file is we don't need to duplicate the RTL template twice (in
> lsx.md and lasx.md).
/* snip */
> +;; fix_trunc is allowed to raise inexact exception even if
> +;; -fno-fp-int-builtin-inexact.  Because the middle end tries to match
> +;; (FIX x) and it does not know (FIX (UNSPEC_SIMD_FRINTRZ x)), we need
> +;; to use define_insn_and_split instead of define_expand (expanders are
> +;; not considered during matching).

Hi,

  I don’t quite understand this part. Is it because a define_insn would
duplicate the above implementation, so define_insn_and_split is used?


Thanks.

> +(define_insn_and_split "fix_trunc<mode><vimode>2"
> +  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> +	(fix:<VIMODE> (match_operand:FVEC 1 "register_operand" "f")))]
> +  ""
> +  "#"
> +  ""
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
> +      operands[0], operands[1]));
> +    DONE;
> +  }
> +  [(set_attr "type" "simd_fcvt")
> +   (set_attr "mode" "<MODE>")])





* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23  6:35   ` chenglulu
@ 2023-11-23  7:11     ` Xi Ruoyao
  2023-11-23  7:31       ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-23  7:11 UTC (permalink / raw)
  To: chenglulu, gcc-patches; +Cc: i, xuchenghua

On Thu, 2023-11-23 at 14:35 +0800, chenglulu wrote:
> Hi,
> 
>   I don’t quite understand this part. Is it because define_insn would be 
> duplicated with the above implementation,
> 
> so define_insn_and_split is used?

Yes, but if you think duplicating the above implementation is better I
can dup it as well (as it's just a single line).

(I wrote it as a define_expand but it didn't work, then I modified it to
define_insn_and_split).


> > +(define_insn_and_split "fix_trunc<mode><vimode>2"
> > +  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> > +	(fix:<VIMODE> (match_operand:FVEC 1 "register_operand" "f")))]
> > +  ""
> > +  "#"
> > +  ""
> > +  [(const_int 0)]
> > +  {
> > +    emit_insn (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
> > +      operands[0], operands[1]));
> > +    DONE;
> > +  }
> > +  [(set_attr "type" "simd_fcvt")
> > +   (set_attr "mode" "<MODE>")])

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23  7:11     ` Xi Ruoyao
@ 2023-11-23  7:31       ` chenglulu
  2023-11-23  8:13         ` chenglulu
  2023-11-23  8:54         ` Xi Ruoyao
  0 siblings, 2 replies; 33+ messages in thread
From: chenglulu @ 2023-11-23  7:31 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2023/11/23 at 3:11 PM, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 14:35 +0800, chenglulu wrote:
>> Hi,
>>
>>    I don’t quite understand this part. Is it because define_insn would be
>> duplicated with the above implementation,
>>
>> so define_insn_and_split is used?
> Yes, but if you think duplicating the above implementation is better I
> can dup it as well (as it's just a single line).
>
> (I wrote it as a define_expand but it didn't work, then I modified it to
> define_insn_and_split).
>
I just thought it was weird when I was looking at the code.

I modified this code to use define_expand:

     (define_expand "fix_trunc<mode><vimode>2"
       [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
             (fix:<VIMODE> (match_operand:FVEC 1 "register_operand" "f")))]
       ""
       {
         emit_insn (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
           operands[0], operands[1]));
         DONE;
       })

Here are my test cases:

     typedef float __attribute__ ((mode (SF))) float_t;
     typedef int __attribute__ ((mode (SI))) int_t;

     extern int_t v[4];
     void
     lt_fixdfsi (float_t *x)
     {
       /* Convert four floats to ints; small enough to vectorize with LSX.  */
       for (int i = 0; i < 4; i++)
         v[i] = x[i];
     }

This still achieves the desired effect, generating the following 
assembly code:

lt_fixdfsi:
.LFB0 = .
     .cfi_startproc

     or    $r13,$r4,$r0     # 16    [c=4 l=4]  *movdi_64bit/0
     la.global    $r12,v     # 8    [c=4 l=12]  *movdi_64bit/1
     vld    $vr0,$r13,0     # 6    [c=12 l=4]  movv4sf_lsx/1
     vftintrz.w.s    $vr0,$vr0     # 7    [c=12 l=4] lsx_vftintrz_w_s
     vst    $vr0,$r12,0     # 9    [c=4 l=4]  movv4si_lsx/2

So I don't know if I'm getting it right?:-(

>>> +(define_insn_and_split "fix_trunc<mode><vimode>2"
>>> +  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
>>> +	(fix:<VIMODE> (match_operand:FVEC 1 "register_operand" "f")))]
>>> +  ""
>>> +  "#"
>>> +  ""
>>> +  [(const_int 0)]
>>> +  {
>>> +    emit_insn (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
>>> +      operands[0], operands[1]));
>>> +    DONE;
>>> +  }
>>> +  [(set_attr "type" "simd_fcvt")
>>> +   (set_attr "mode" "<MODE>")])



* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23  7:31       ` chenglulu
@ 2023-11-23  8:13         ` chenglulu
  2023-11-23  9:02           ` Xi Ruoyao
  2023-11-23  8:54         ` Xi Ruoyao
  1 sibling, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-23  8:13 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2023/11/23 at 3:31 PM, chenglulu wrote:
>
> On 2023/11/23 at 3:11 PM, Xi Ruoyao wrote:
>> On Thu, 2023-11-23 at 14:35 +0800, chenglulu wrote:
>>> Hi,
>>>
>>>    I don’t quite understand this part. Is it because define_insn 
>>> would be
>>> duplicated with the above implementation,
>>>
>>> so define_insn_and_split is used?
>> Yes, but if you think duplicating the above implementation is better I
>> can dup it as well (as it's just a single line).
>>
>> (I wrote it as a define_expand but it didn't work, then I modified it to
>> define_insn_and_split).
>>
> I just thought it was weird when I was looking at the code.
>
> I modified this code to use define_expand:
>
>     (define_expand "fix_trunc<mode><vimode>2"
>       [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
>             (fix:<VIMODE> (match_operand:FVEC 1 "register_operand" 
> "f")))]
>       ""
>       {
>         emit_insn 
> (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
>           operands[0], operands[1]));
>         DONE;
>       }
>       [(set_attr "type" "simd_fcvt")
>        (set_attr "mode" "<MODE>")])
>
> Here are my test cases:
>
>     typedef float __attribute__ ((mode (SF))) float_t;
>     typedef int __attribute__ ((mode (SI))) int_t;
>
>     extern int_t v[4];
>     int_t
>     lt_fixdfsi (float_t *x)
>     {
>
>       for (int i=0;i<4;i++)
>         v[i] = x[i];
>     }
>
> This still achieves the desired effect, generating the following 
> assembly code:
>
> lt_fixdfsi:
> .LFB0 = .
>     .cfi_startproc
>
>     or    $r13,$r4,$r0     # 16    [c=4 l=4]  *movdi_64bit/0
>     la.global    $r12,v     # 8    [c=4 l=12]  *movdi_64bit/1
>     vld    $vr0,$r13,0     # 6    [c=12 l=4]  movv4sf_lsx/1
>     vftintrz.w.s    $vr0,$vr0     # 7    [c=12 l=4] lsx_vftintrz_w_s
>     vst    $vr0,$r12,0     # 9    [c=4 l=4]  movv4si_lsx/2
>
> So I don't know if I'm getting it right?:-(
>
The fix_truncv4sfv4si2 template is indeed called when debugging with gdb.

So I think we can use define_expand here.

>>>> +(define_insn_and_split "fix_trunc<mode><vimode>2"
>>>> +  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
>>>> +    (fix:<VIMODE> (match_operand:FVEC 1 "register_operand" "f")))]
>>>> +  ""
>>>> +  "#"
>>>> +  ""
>>>> +  [(const_int 0)]
>>>> +  {
>>>> +    emit_insn 
>>>> (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
>>>> +      operands[0], operands[1]));
>>>> +    DONE;
>>>> +  }
>>>> +  [(set_attr "type" "simd_fcvt")
>>>> +   (set_attr "mode" "<MODE>")])



* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23  8:13         ` chenglulu
@ 2023-11-23  9:02           ` Xi Ruoyao
  2023-11-23  9:12             ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-23  9:02 UTC (permalink / raw)
  To: chenglulu, gcc-patches; +Cc: i, xuchenghua

On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> The fix_truncv4sfv4si2 template is indeed called when debugging with
> gdb.
> 
> So I think we can use define_expand here.

The problem is cases where we want to combine an rint call with float-
to-int conversion:

float x[4];
int y[4];

void test()
{
	for (int i = 0; i < 4; i++)
		y[i] = __builtin_rintf(x[i]);
}

With define_expand we get "vfrint + vftintrz", but with
define_insn_and_split we get a single "vftint".

Arguably the generic code should try to handle this (PR86609), but
comment 1 in that PR is "not sure if that's a good idea in general", so
we can do this in a target-specific way.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23  9:02           ` Xi Ruoyao
@ 2023-11-23  9:12             ` chenglulu
  2023-11-23 10:12               ` Xi Ruoyao
  0 siblings, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-23  9:12 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2023/11/23 at 5:02 PM, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
>> The fix_truncv4sfv4si2 template is indeed called when debugging with
>> gdb.
>>
>> So I think we can use define_expand here.
> The problem is cases where we want to combine an rint call with float-
> to-int conversion:
>
> float x[4];
> int y[4];
>
> void test()
> {
> 	for (int i = 0; i < 4; i++)
> 		y[i] = __builtin_rintf(x[i]);
> }
>
> With define_expand we get "vfrint + vftintrz", but with define_insn we
> get a single "vftint".
>
> Arguably the generic code should try to handle this (PR86609), but it's
> "not sure if that's a good idea in general" (comment 1 in the PR) so we
> can do this in a target-specific way.
>
I tried compiling with -Ofast and found that a single vftint was
generated; the difference already shows up in the .006t.gimple dump.

With -O2, __builtin_rintf is generated, but with -Ofast,
__builtin_irintf is generated.



* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23  9:12             ` chenglulu
@ 2023-11-23 10:12               ` Xi Ruoyao
  2023-11-23 12:06                 ` Xi Ruoyao
  2023-11-23 18:03                 ` Joseph Myers
  0 siblings, 2 replies; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-23 10:12 UTC (permalink / raw)
  To: chenglulu, gcc-patches, Uros Bizjak, Joseph Myers; +Cc: i, xuchenghua

On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote:
> 
> On 2023/11/23 at 5:02 PM, Xi Ruoyao wrote:
> > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> > > The fix_truncv4sfv4si2 template is indeed called when debugging with
> > > gdb.
> > > 
> > > So I think we can use define_expand here.
> > The problem is cases where we want to combine an rint call with float-
> > to-int conversion:
> > 
> > float x[4];
> > int y[4];
> > 
> > void test()
> > {
> > 	for (int i = 0; i < 4; i++)
> > 		y[i] = __builtin_rintf(x[i]);
> > }
> > 
> > With define_expand we get "vfrint + vftintrz", but with define_insn we
> > get a single "vftint".
> > 
> > Arguably the generic code should try to handle this (PR86609), but it's
> > "not sure if that's a good idea in general" (comment 1 in the PR) so we
> > can do this in a target-specific way.
> > 
> I tried to use Ofast to compile, and found that a vftint was generated, 
> and at.006t.gimple appeared.
> 
> If O2 was compiled, __builtin_rintf would be generated, but Ofast would 
> generate __builtin_irintf

Indeed...  It seems the FE will only generate __builtin_irintf with
-fno-math-errno -funsafe-math-optimizations.

But I cannot see why this is necessary (at least for us): the rintf
function does not set errno at all, and to me using vftint.w.s here is
safe: if the rounded result can be represented as a 32-bit int,
obviously there is no issue;  otherwise, per C23 section F.4 we should
raise FE_INVALID and produce unspecified result.  It seems our ftint.w.s
instruction has the required semantics.

+Uros and Joseph for some comment about the expected behavior of
(int)rintf(x).

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23 10:12               ` Xi Ruoyao
@ 2023-11-23 12:06                 ` Xi Ruoyao
  2023-11-23 18:03                 ` Joseph Myers
  1 sibling, 0 replies; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-23 12:06 UTC (permalink / raw)
  To: chenglulu, gcc-patches, Uros Bizjak, Joseph Myers; +Cc: i, xuchenghua

On Thu, 2023-11-23 at 18:12 +0800, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 17:12 +0800, chenglulu wrote:
> > 
> > On 2023/11/23 at 5:02 PM, Xi Ruoyao wrote:
> > > On Thu, 2023-11-23 at 16:13 +0800, chenglulu wrote:
> > > > The fix_truncv4sfv4si2 template is indeed called when debugging with
> > > > gdb.
> > > > 
> > > > So I think we can use define_expand here.
> > > The problem is cases where we want to combine an rint call with float-
> > > to-int conversion:
> > > 
> > > float x[4];
> > > int y[4];
> > > 
> > > void test()
> > > {
> > > 	for (int i = 0; i < 4; i++)
> > > 		y[i] = __builtin_rintf(x[i]);
> > > }
> > > 
> > > With define_expand we get "vfrint + vftintrz", but with define_insn we
> > > get a single "vftint".
> > > 
> > > Arguably the generic code should try to handle this (PR86609), but it's
> > > "not sure if that's a good idea in general" (comment 1 in the PR) so we
> > > can do this in a target-specific way.
> > > 
> > I tried to use Ofast to compile, and found that a vftint was generated, 
> > and at.006t.gimple appeared.
> > 
> > If O2 was compiled, __builtin_rintf would be generated, but Ofast would 
> > generate __builtin_irintf
> 
> Indeed...  It seems the FE will only generate __builtin_irintf when -
> fno-math-errno -funsafe-math-optimizations.
> 
> But I cannot see why this is necessary (at least for us): the rintf
> function does not set errno at all, and to me using vftint.w.s here is
> safe: if the rounded result can be represented as a 32-bit int,
> obviously there is no issue;  otherwise, per C23 section F.4 we should
> raise FE_INVALID and produce unspecified result.  It seems our ftint.w.s
> instruction has the required semantics.
> 
> +Uros and Joseph for some comment about the expected behavior of
> (int)rintf(x).

I've spent some time reading the code and got some results.

-fno-math-errno is required to prevent converting (int)rintf(x) into a
call to the *external* function irintf(x).  The problem is that rintf
never sets errno, but irintf may set errno; this was PR 61876.  However,
this does not prevent us from using ftint.w.s, because this instruction
does not set errno.

For -funsafe-math-optimizations, there seems to be a logic error in
convert_to_integer_1:

  /* Convert e.g. (long)round(d) -> lround(d).  */
  /* If we're converting to char, we may encounter differing behavior
     between converting from double->char vs double->long->char.
     We're in "undefined" territory but we prefer to be conservative,
     so only proceed in "unsafe" math mode.  */
  if (optimize
      && (flag_unsafe_math_optimizations
          || (long_integer_type_node
              && outprec >= TYPE_PRECISION (long_integer_type_node))))

But shouldn't we compare against integer_type_node here, as we're
handling __builtin_irint etc., whose output is int (not long), in this
block?

Anyway, neither constraint applies to our ftint.w.s instruction.
IMO the second constraint is a target-independent bug which should
be fixed.  The first constraint must remain, but it only exists to
prevent mistakenly using an external irint (which may set errno), not
the ftint.w.s instruction (which does not even know about errno).  So we
should use the target-specific way, i.e. a define_insn, to ensure the
optimization even with -fmath-errno.
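
To make the effect of the quoted guard concrete, here is a minimal C
model of it.  This is only a sketch, not GCC code: the function and
parameter names are hypothetical, precisions are plain bit counts, and
the null check on long_integer_type_node is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the guard quoted from convert_to_integer_1.
   outprec:   precision in bits of the integer type converted to
   long_prec: precision in bits of long on the target  */
static bool guard_as_quoted(bool optimize, bool unsafe_math,
                            int outprec, int long_prec)
{
    return optimize && (unsafe_math || outprec >= long_prec);
}

/* Variant that compares against the precision of int instead, which is
   what the question above suggests for the irint case.  */
static bool guard_vs_int(bool optimize, bool unsafe_math,
                         int outprec, int int_prec)
{
    return optimize && (unsafe_math || outprec >= int_prec);
}

/* Usage example for an LP64 target: int is 32 bits, long is 64 bits.  */
static void guard_self_check(void)
{
    /* (int)rintf(x) without -funsafe-math-optimizations: rejected.  */
    assert(!guard_as_quoted(true, false, 32, 64));
    /* The same conversion with -funsafe-math-optimizations: accepted.  */
    assert(guard_as_quoted(true, true, 32, 64));
    /* (long)rint(x) is accepted even without the flag.  */
    assert(guard_as_quoted(true, false, 64, 64));
    /* Comparing against int would accept (int)rintf(x) as well.  */
    assert(guard_vs_int(true, false, 32, 32));
}
```

Under this model the int case on LP64 is only reachable with
-funsafe-math-optimizations, which matches the -O2 vs. -Ofast difference
observed earlier in the thread.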

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23 10:12               ` Xi Ruoyao
  2023-11-23 12:06                 ` Xi Ruoyao
@ 2023-11-23 18:03                 ` Joseph Myers
  2023-11-24  2:39                   ` Xi Ruoyao
  1 sibling, 1 reply; 33+ messages in thread
From: Joseph Myers @ 2023-11-23 18:03 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: chenglulu, gcc-patches, Uros Bizjak, i, xuchenghua

On Thu, 23 Nov 2023, Xi Ruoyao wrote:

> Indeed...  It seems the FE will only generate __builtin_irintf when -
> fno-math-errno -funsafe-math-optimizations.
> 
> But I cannot see why this is necessary (at least for us): the rintf
> function does not set errno at all, and to me using vftint.w.s here is
> safe: if the rounded result can be represented as a 32-bit int,
> obviously there is no issue;  otherwise, per C23 section F.4 we should
> raise FE_INVALID and produce unspecified result.  It seems our ftint.w.s
> instruction has the required semantics.

The rint functions indeed don't set errno (don't have domain or range 
errors, at least if you ignore the option for signaling NaN arguments to 
be domain errors - which is in TS 18661-1, but not what glibc does, and 
not in C23).

The lrint / llrint functions should logically set errno for a domain error 
in the cases where they raise "invalid".  That they don't in glibc is 
glibc bug 6798.  And __builtin_irint functions can fall back to calling 
lrint functions if not expanded inline.  So the potential for errno 
setting explains why -fno-math-errno is required.

I don't see an obvious need for -funsafe-math-optimizations here - but at 
least -fno-trapping-math should be needed in some cases.  That's because 
rint raises "inexact" for noninteger argument - but lrint doesn't raise 
"inexact" when raising "invalid".  So if, for example, long is 32-bit and 
the floating type in use is double, calling rint for a noninteger argument 
too large for long, and then converting to a 32-bit signed integer type 
(long or int), raises both "inexact" and "invalid" - but a direct call to 
lrint raises just "invalid".

There are plausible arguments for both -fno-math-errno and 
-fno-trapping-math as defaults, or at least a subset of -fno-trapping-math 
(such as allowing code transformations that don't raise new exceptions but 
might lose some exceptions).

The comment in convert.cc about why -funsafe-math-optimizations is 
required says:

  /* Convert e.g. (long)round(d) -> lround(d).  */
  /* If we're converting to char, we may encounter differing behavior
     between converting from double->char vs double->long->char.
     We're in "undefined" territory but we prefer to be conservative,
     so only proceed in "unsafe" math mode.  */
  if (optimize
      && (flag_unsafe_math_optimizations
          || (long_integer_type_node
              && outprec >= TYPE_PRECISION (long_integer_type_node))))

I think that we should not try to guarantee particular results for such 
out-of-range conversions (unspecified value plus "invalid" given Annex F, 
or undefined in the base standard).

There are various known bugs in Bugzilla for cases where a single 
computation in the abstract machine that produces not-fully-specified 
results gets duplicated and the generated code ends up executing both 
duplicates, with different results from both, while other parts of the 
code expect the single computation in the abstract machine to have a 
single result, so assume the two values generated are the same when they 
are not.  But I wouldn't expect those bugs to be addressed by guaranteeing 
that a given result, maybe matching hardware, is given globally from any 
such computation; it would seem more likely to address them by somehow 
preventing a single computation in the abstract machine from producing 
multiple different results in the actual execution, when it's a 
computation (such as a possibly out-of-range floating conversion to 
integer) for which the results are not fully specified.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23 18:03                 ` Joseph Myers
@ 2023-11-24  2:39                   ` Xi Ruoyao
  2023-11-24  8:01                     ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-24  2:39 UTC (permalink / raw)
  To: Joseph Myers, chenglulu; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua

On Thu, 2023-11-23 at 18:03 +0000, Joseph Myers wrote:
> The rint functions indeed don't set errno (don't have domain or range 
> errors, at least if you ignore the option for signaling NaNs arguments to 
> be domain errors - which is in TS 18661-1, but not what glibc does, and 
> not in C23).
> 
> The lrint / llrint functions should logically set errno for a domain error 
> in the cases where they raise "invalid".  That they don't in glibc is 
> glibc bug 6798.  And __builtin_irint functions can fall back to calling 
> lrint functions if not expanded inline.  So the potential for errno 
> setting explains why -fno-math-errno is required.

I agree.  But this does not prevent optimizing a vectorized
"(int) rintf(x[i])" into a vftint.w.s instruction, which does not set
errno.

> I don't see an obvious need for -funsafe-math-optimizations here - but at 
> least -fno-trapping-math should be needed in some cases.  That's because 
> rint raises "inexact" for noninteger argument - but lrint doesn't raise 
> "inexact" when raising "invalid".  So if, for example, long is 32-bit and 
> the floating type in use is double, calling rint for a noninteger argument 
> too large for long, and then converting to a 32-bit signed integer type 
> (long or int), raises both "inexact" and "invalid" - but a direct call to 
> lrint raises such "invalid".

Interesting...  But for (i32)rintf(x) it's impossible to have a
non-integer value outside [-2147483648, 2147483648) except NaN and
+-Inf, and likewise for (i64)rint(x).  So using the vftint.w.s and
vftint.l.d instructions should still be fine.  We also have a
vftint.w.d instruction, but it's only used as an intrinsic as of now,
and my patch does not attempt to use it.

Lulu: so my conclusion is an (i32)rintf -> irintf transformation is
indeed "unsafe" generally, but a machine-specific transformation to
vftint.w.s is fine and we should use the define_insn to do it.  Do you
agree?
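
The range argument above rests on float spacing: once the magnitude
reaches 2^23, every float is already an integer, so any non-integer
float lies well inside the 32-bit int range.  A minimal C sketch of
that, assuming IEEE 754 single precision (FLT_MANT_DIG == 24); the
helper names are hypothetical:

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* 2^(FLT_MANT_DIG - 1), i.e. 2^23 for IEEE single precision.  From this
   magnitude on, adjacent floats are at least 1 apart.  */
static float integer_threshold(void)
{
    float t = 1.0f;
    for (int i = 1; i < FLT_MANT_DIG; i++)
        t *= 2.0f;
    return t;
}

/* True if x has no fractional part.  Only valid for finite x within the
   range of long long.  */
static bool is_integer_valued(float x)
{
    return (float)(long long)x == x;
}

static void range_self_check(void)
{
    /* The threshold is far below 2^31, the end of the int range.  */
    assert(integer_threshold() == 8388608.0f);
    assert(integer_threshold() < 2147483648.0f);
    /* Just below the threshold a fractional float still exists...  */
    assert(!is_integer_valued(8388607.5f));
    /* ...but at or above it every float is an integer.  */
    assert(is_integer_valued(8388609.0f));
    assert(is_integer_valued(-2147483648.0f));
}
```

The same reasoning applies to double and a 64-bit integer, where the
threshold is 2^52 and the integer range ends at 2^63.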

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-24  2:39                   ` Xi Ruoyao
@ 2023-11-24  8:01                     ` chenglulu
  2023-11-24  8:26                       ` Xi Ruoyao
  0 siblings, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-24  8:01 UTC (permalink / raw)
  To: Xi Ruoyao, Joseph Myers; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua


On 2023/11/24 at 10:39 AM, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 18:03 +0000, Joseph Myers wrote:
>> The rint functions indeed don't set errno (don't have domain or range
>> errors, at least if you ignore the option for signaling NaNs arguments to
>> be domain errors - which is in TS 18661-1, but not what glibc does, and
>> not in C23).
>>
>> The lrint / llrint functions should logically set errno for a domain error
>> in the cases where they raise "invalid".  That they don't in glibc is
>> glibc bug 6798.  And __builtin_irint functions can fall back to calling
>> lrint functions if not expanded inline.  So the potential for errno
>> setting explains why -fno-math-errno is required.
> I agree.  But this is not preventing optimizing vectorized "(int)
> rintf(x[i])" into a vftint.w.s instruction which does not set errno.

I'm sorry. Please forgive my stupidity. I don't see a description in
TS 18661-1 that explicitly says rint does not set errno.

I only saw lrint llrint in n2310 with this description:

F7.12.9.5

"The lrint and llrint functions round their argument to the nearest
integer value, rounding according to the current rounding direction.
If the rounded value is outside the range of the return type, the
numeric result is unspecified and a domain error or range error may
occur."

I don't know if I'm right?



>
>> I don't see an obvious need for -funsafe-math-optimizations here - but at
>> least -fno-trapping-math should be needed in some cases.  That's because
>> rint raises "inexact" for noninteger argument - but lrint doesn't raise
>> "inexact" when raising "invalid".  So if, for example, long is 32-bit and
>> the floating type in use is double, calling rint for a noninteger argument
>> too large for long, and then converting to a 32-bit signed integer type
>> (long or int), raises both "inexact" and "invalid" - but a direct call to
>> lrint raises such "invalid".
> Interesting...  But for (i32)rintf(x) it's impossible to have a non-
> integer value out of [-2147483648, 2147483648) except NaN and +-Inf,
> likewise for (i64)rint(x).  So using vftint.w.s and vftint.l.d
> instructions should still be fine.  We also have a vftint.w.d
> instruction but it's only used as an intrinsic as at now, and my patch
> does not attempt to use it.
>
> Lulu: so my conclusion is an (i32)rintf -> irintf transformation is
> indeed "unsafe" generally, but a machine-specific transformation to
> vftint.w.s is fine and we should use the define_insn to do it.  Do you
> agree?
>



* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-24  8:01                     ` chenglulu
@ 2023-11-24  8:26                       ` Xi Ruoyao
  2023-11-24  8:36                         ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-24  8:26 UTC (permalink / raw)
  To: chenglulu, Joseph Myers; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua

On Fri, 2023-11-24 at 16:01 +0800, chenglulu wrote:
> I only saw lrint llrint in n2310 with this description:
> 
> F7.12.9.5
> 
> "The lrint and llrint functions round their argument to the nearest 
> integer value, rounding
> according to the current rounding direction. If the rounded value is 
> outside the range of the return
> type, the numeric result is unspecified and a domain error or range 
> error may occur."
> 
> I don't know if I'm right?

There's an explanation in the linux man-page for lrint:

       SUSv2 and POSIX.1‐2001 contain text about overflow (which might set er‐
       rno to ERANGE, or raise an FE_OVERFLOW exception).   In  practice,  the
       result  cannot  overflow on any current machine, so this error‐handling
       stuff is just nonsense.  (More precisely, overflow can happen only when
       the maximum value of the exponent is smaller than the  number  of  man‐
       tissa bits.  For the IEEE‐754 standard 32‐bit and 64‐bit floating‐point
       numbers  the maximum value of the exponent is 127 (respectively, 1023),
       and the number of mantissa bits including the implicit bit is  24  (re‐
       spectively, 53).)

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-24  8:26                       ` Xi Ruoyao
@ 2023-11-24  8:36                         ` chenglulu
  2023-11-24  8:42                           ` Xi Ruoyao
  0 siblings, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-24  8:36 UTC (permalink / raw)
  To: Xi Ruoyao, Joseph Myers; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua


On 2023/11/24 at 4:26 PM, Xi Ruoyao wrote:
> On Fri, 2023-11-24 at 16:01 +0800, chenglulu wrote:
>> I only saw lrint llrint in n2310 with this description:
>>
>> F7.12.9.5
>>
>> "The lrint and llrint functions round their argument to the nearest
>> integer value, rounding
>> according to the current rounding direction. If the rounded value is
>> outside the range of the return
>> type, the numeric result is unspecified and a domain error or range
>> error may occur."
>>
>> I don't know if I'm right?
> There's an explanation in the linux man-page for lrint:
>
>         SUSv2 and POSIX.1‐2001 contain text about overflow (which might set er‐
>         rno to ERANGE, or raise an FE_OVERFLOW exception).   In  practice,  the
>         result  cannot  overflow on any current machine, so this error‐handling
>         stuff is just nonsense.  (More precisely, overflow can happen only when
>         the maximum value of the exponent is smaller than the  number  of  man‐
>         tissa bits.  For the IEEE‐754 standard 32‐bit and 64‐bit floating‐point
>         numbers  the maximum value of the exponent is 127 (respectively, 1023),
>         and the number of mantissa bits including the implicit bit is  24  (re‐
>         spectively, 53).)
>
This is the description of rint, rintf and rintl in the Linux man page. :-(



* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-24  8:36                         ` chenglulu
@ 2023-11-24  8:42                           ` Xi Ruoyao
  2023-11-24  9:46                             ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-24  8:42 UTC (permalink / raw)
  To: chenglulu, Joseph Myers; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua

On Fri, 2023-11-24 at 16:36 +0800, chenglulu wrote:
> 
> On 2023/11/24 at 4:26 PM, Xi Ruoyao wrote:
> > On Fri, 2023-11-24 at 16:01 +0800, chenglulu wrote:
> > > I only saw lrint llrint in n2310 with this description:
> > > 
> > > F7.12.9.5
> > > 
> > > "The lrint and llrint functions round their argument to the nearest
> > > integer value, rounding
> > > according to the current rounding direction. If the rounded value is
> > > outside the range of the return
> > > type, the numeric result is unspecified and a domain error or range
> > > error may occur."
> > > 
> > > I don't know if I'm right?
> > There's an explanation in the linux man-page for lrint:
> > 
> >         SUSv2 and POSIX.1‐2001 contain text about overflow (which might set er‐
> >         rno to ERANGE, or raise an FE_OVERFLOW exception).   In  practice,  the
> >         result  cannot  overflow on any current machine, so this error‐handling
> >         stuff is just nonsense.  (More precisely, overflow can happen only when
> >         the maximum value of the exponent is smaller than the  number  of  man‐
> >         tissa bits.  For the IEEE‐754 standard 32‐bit and 64‐bit floating‐point
> >         numbers  the maximum value of the exponent is 127 (respectively, 1023),
> >         and the number of mantissa bits including the implicit bit is  24  (re‐
> >         spectively, 53).)
> > 
> This is the description of rint rintf rintl  in the linux man-page.:-(

Phew, I misread the message.

Yes, for lrint we assume it may set errno.  For example:

long x[4];
double y[4];

void test()
{
	for (int i = 0; i < 4; i++)
		x[i] = __builtin_lrint(y[i]);
}

We produce a loop calling lrint with -O2 -mlasx:

.L2:
	fldx.d	$f0,$r26,$r23
	bl	%plt(lrint)
	stx.d	$r4,$r25,$r23
	addi.d	$r23,$r23,8
	bne	$r23,$r24,.L2

because using xvftint.l.d may miss an errno from the libc.  xvftint.l.d
is only emitted with -O2 -mlasx -fno-math-errno.

But for

long x[4];
double y[4];

void test()
{
	for (int i = 0; i < 4; i++)
		x[i] = (long) __builtin_rint(y[i]);
}

we know rint does not set errno, and converting a double to long does
not set errno, so using xvftint.l.d is correct.

Conversely, we cannot transform it into the first example, because that
may cause errno to be mistakenly set when the libc sets errno for
lrint.  That's why the generic code only transforms (int)rintf -> irintf
or (long)rint -> lrint with -ffast-math.

But this limitation does not apply to the xvftint.l.d instruction (as
xvftint.l.d is just an instruction and it does not know about errno at
all).

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-24  8:42                           ` Xi Ruoyao
@ 2023-11-24  9:46                             ` chenglulu
  2023-11-24 10:30                               ` Xi Ruoyao
  0 siblings, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-24  9:46 UTC (permalink / raw)
  To: Xi Ruoyao, Joseph Myers; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua


On 2023/11/24 at 4:42 PM, Xi Ruoyao wrote:
> On Fri, 2023-11-24 at 16:36 +0800, chenglulu wrote:
>> On 2023/11/24 at 4:26 PM, Xi Ruoyao wrote:
>>> On Fri, 2023-11-24 at 16:01 +0800, chenglulu wrote:
>>>> I only saw lrint llrint in n2310 with this description:
>>>>
>>>> F7.12.9.5
>>>>
>>>> "The lrint and llrint functions round their argument to the nearest
>>>> integer value, rounding
>>>> according to the current rounding direction. If the rounded value is
>>>> outside the range of the return
>>>> type, the numeric result is unspecified and a domain error or range
>>>> error may occur."
>>>>
>>>> I don't know if I'm right?
>>> There's an explanation in the linux man-page for lrint:
>>>
>>>          SUSv2 and POSIX.1‐2001 contain text about overflow (which might set er‐
>>>          rno to ERANGE, or raise an FE_OVERFLOW exception).   In  practice,  the
>>>          result  cannot  overflow on any current machine, so this error‐handling
>>>          stuff is just nonsense.  (More precisely, overflow can happen only when
>>>          the maximum value of the exponent is smaller than the  number  of  man‐
>>>          tissa bits.  For the IEEE‐754 standard 32‐bit and 64‐bit floating‐point
>>>          numbers  the maximum value of the exponent is 127 (respectively, 1023),
>>>          and the number of mantissa bits including the implicit bit is  24  (re‐
>>>          spectively, 53).)
>>>
>> This is the description of rint rintf rintl  in the linux man-page.:-(
> Phew, I misread the message.
>
> Yes, for lrint we assume it may set errno.  For example:
>
> long x[4];
> double y[4];
>
> void test()
> {
> 	for (int i = 0; i < 4; i++)
> 		x[i] = __builtin_lrint(y[i]);
> }
>
> We produce a loop calling lrint with -O2 -mlasx:
>
> .L2:
> 	fldx.d	$f0,$r26,$r23
> 	bl	%plt(lrint)
> 	stx.d	$r4,$r25,$r23
> 	addi.d	$r23,$r23,8
> 	bne	$r23,$r24,.L2
>
> because using xvftint.l.d may miss an errno from the libc.  Only with -
> O2 -mlasx -fno-math-errno xvftint.l.d is emitted.
>
> But for
>
> long x[4];
> double y[4];
>
> void test()
> {
> 	for (int i = 0; i < 4; i++)
> 		x[i] = (long) __builtin_rint(y[i]);
> }
>
> we know rint does not set errno, and converting a double to long does
> not set errno, so using xvftint.l.d is correct.
>
> On the contrary, we cannot optimize it to the first example because it
> may cause an errno to be mistakenly set when the libc sets errno for
> lrint.  That's why the generic code only transforms (int)rintf -> irintf
> or (long)rint -> lrint when -ffast-math.
>
> But this limitation does not apply for the xvftint.l.d instruction (as
> xvftint.l.d is just an instruction and it does not know errno at all).
>
Yeah, I know what you mean. That is, as long as our handling of errno
and the exception flag bits is unchanged before and after the
optimization, the optimization is fine.

So I agree with your optimization.

It's just that I'm confused: the description of rint in n2310, and
Joseph's email, both say that rint will not set errno, but the Linux man
page says "which might set errno to ERANGE".

The two sources seem to say opposite things about how rint and lrint
handle errno.




* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-24  9:46                             ` chenglulu
@ 2023-11-24 10:30                               ` Xi Ruoyao
  2023-11-24 14:59                                 ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-24 10:30 UTC (permalink / raw)
  To: chenglulu, Joseph Myers; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua

On Fri, 2023-11-24 at 17:46 +0800, chenglulu wrote:
> It's just that I'm confused that the description of rint in n2310, 
> including Joseph's email,
> all say that rint will not set errno, but linux-man says "which might 
> set errno to ERANGE" .

Annex F has:

The C floating types match the IEC 60559 formats as follows:
- The float type matches the IEC 60559 single format.
- The double type matches the IEC 60559 double format.

With these constraints rint and rintf just cannot have a range or domain
error.  So Annex F does not say rint may set errno.  Linux-man says:

ERRORS
       No errors occur.  POSIX.1‐2001 documents a range error  for  overflows,
       but see NOTES.

That's because POSIX does not mandate that float/double match the IEC
60559 formats, so a range error may happen with some strange
floating-point formats.  The NOTES section says "this won't happen for
IEC 60559 formats".
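
The rule from the man page NOTES quoted earlier in the thread (overflow
needs a maximum exponent smaller than the number of mantissa bits) can
be checked against <float.h>.  This is only a sketch with hypothetical
helper names, assuming float and double are the IEC 60559 formats.
Note that FLT_MAX_EXP and DBL_MAX_EXP are one larger than the exponents
the man page cites, because <float.h> normalizes the significand to
[0.5, 1).

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Rule from the man page NOTES: rounding to an integral value can only
   overflow if the largest exponent is smaller than the number of
   mantissa bits.  */
static bool rint_may_overflow(int max_exponent, int mantissa_bits)
{
    return max_exponent < mantissa_bits;
}

static bool float_rint_may_overflow(void)
{
    /* FLT_MAX_EXP - 1 is the exponent of the largest float (127).  */
    return rint_may_overflow(FLT_MAX_EXP - 1, FLT_MANT_DIG);
}

static bool double_rint_may_overflow(void)
{
    /* DBL_MAX_EXP - 1 is the exponent of the largest double (1023).  */
    return rint_may_overflow(DBL_MAX_EXP - 1, DBL_MANT_DIG);
}

static void overflow_self_check(void)
{
    assert(FLT_MANT_DIG == 24 && FLT_MAX_EXP - 1 == 127);
    assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP - 1 == 1023);
    assert(!float_rint_may_overflow());
    assert(!double_rint_may_overflow());
    /* A made-up format with a tiny exponent range would be affected.  */
    assert(rint_may_overflow(10, 24));
}
```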

For lrint, N2310 says:

If the rounded value is outside the range of the return type, the
numeric result is unspecified and a domain error or range error may
occur.

So EDOM or ERANGE may be set.  In contrast, the man page says:

       The following errors can occur:

       Domain error: x is a NaN or infinite, or the rounded value is too large
              An invalid floating‐point exception (FE_INVALID) is raised.

       These functions do not set errno.

The last paragraph is Glibc-specific, and it's considered a Glibc bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=6798).

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-24 10:30                               ` Xi Ruoyao
@ 2023-11-24 14:59                                 ` chenglulu
  0 siblings, 0 replies; 33+ messages in thread
From: chenglulu @ 2023-11-24 14:59 UTC (permalink / raw)
  To: Xi Ruoyao, Joseph Myers; +Cc: gcc-patches, Uros Bizjak, i, xuchenghua


On 2023/11/24 at 6:30 PM, Xi Ruoyao wrote:
> On Fri, 2023-11-24 at 17:46 +0800, chenglulu wrote:
>> It's just that I'm confused that the description of rint in n2310,
>> including Joseph's email,
>> all say that rint will not set errno, but linux-man says "which might
>> set errno to ERANGE" .
> Annex F has:
>
> The C floating types match the IEC 60559 formats as follows:
> - The float type matches the IEC 60559 single format.
> - The double type matches the IEC 60559 double format.
>
> With these constraints rint and rintf just cannot have a range or domain
> error.  So Annex F does not say rint may set errno.  Linux-man says:
>
> ERRORS
>         No errors occur.  POSIX.1‐2001 documents a range error  for  overflows,
>         but see NOTES.
>
> It's because POSIX does not mandates float/double to match the IEC 60559
> formats, thus a range error may happen with some strange floating point
> formats.  The NOTES says "this won't happen for IEC 60559 formats".
>
> For lrint, N2310 says:
>
> If the rounded value is outside the range of the return type, the
> numeric result is unspecified and a domain error or range error may
> occur.
>
> So a EDOM or ERANGE may be set.  On the contrary, the man page says:
>
>         The following errors can occur:
>
>         Domain error: x is a NaN or infinite, or the rounded value is too large
>                An invalid floating‐point exception (FE_INVALID) is raised.
>
>         These functions do not set errno.
>
> The last paragraph is Glibc-specific, and it's considered a Glibc bug
> (https://sourceware.org/bugzilla/show_bug.cgi?id=6798).
>
Ok, the question is clear, I have no objection.



* Re: [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]
  2023-11-23  7:31       ` chenglulu
  2023-11-23  8:13         ` chenglulu
@ 2023-11-23  8:54         ` Xi Ruoyao
  1 sibling, 0 replies; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-23  8:54 UTC (permalink / raw)
  To: chenglulu, gcc-patches; +Cc: i, xuchenghua

On Thu, 2023-11-23 at 15:31 +0800, chenglulu wrote:
> I modified this code to use define_expand:
> 
>      (define_expand "fix_trunc<mode><vimode>2"
>        [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
>              (fix:<VIMODE> (match_operand:FVEC 1 "register_operand" "f")))]
>        ""
>        {
>          emit_insn (gen_<simd_isa>_<x>vftintrz_<simdifmt_for_f>_<simdfmt> (
>            operands[0], operands[1]));
>          DONE;
>        }
>        [(set_attr "type" "simd_fcvt")
>         (set_attr "mode" "<MODE>")])

For

float x[4];
int y[4];

void test()
{
	for (int i = 0; i < 4; i++)
		y[i] = __builtin_rintf(x[i]);
}

it produces

	la.local	$r12,.LANCHOR0
	vld	$vr0,$r12,0
	vfrint.s	$vr0,$vr0
	vftintrz.w.s	$vr0,$vr0
	vst	$vr0,$r12,16
	jr	$r1

But with a define_insn or define_insn_and_split:

	la.local	$r12,.LANCHOR0
	vld	$vr0,$r12,0
	vftint.w.s	$vr0,$vr0
	vst	$vr0,$r12,16
	jr	$r1

(Our scalar code also generates sub-optimal frint.s-ftintxx.w.s
sequences.  I guess we should fix the scalar code later as well.)

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions
  2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
  2023-11-20  0:47 ` [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578] Xi Ruoyao
@ 2023-11-20  0:47 ` Xi Ruoyao
  2023-11-23 12:08   ` chenglulu
  2023-11-20  0:47 ` [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift Xi Ruoyao
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-20  0:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

Remove unnecessary UNSPECs and make the muh instructions usable with
GNU vectors or auto vectorization.

gcc/ChangeLog:

	* config/loongarch/simd.md (muh): New code attribute mapping
	any_extend to smul_highpart or umul_highpart.
	(<su>mul<mode>3_highpart): New define_insn.
	* config/loongarch/lsx.md (UNSPEC_LSX_VMUH_S): Remove.
	(UNSPEC_LSX_VMUH_U): Remove.
	(lsx_vmuh_s_<lsxfmt>): Remove.
	(lsx_vmuh_u_<lsxfmt>): Remove.
	* config/loongarch/lasx.md (UNSPEC_LASX_XVMUH_S): Remove.
	(UNSPEC_LASX_XVMUH_U): Remove.
	(lasx_xvmuh_s_<lasxfmt>): Remove.
	(lasx_xvmuh_u_<lasxfmt>): Remove.
	* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmuh_b):
	Redefine to standard pattern name.
	(CODE_FOR_lsx_vmuh_h): Likewise.
	(CODE_FOR_lsx_vmuh_w): Likewise.
	(CODE_FOR_lsx_vmuh_d): Likewise.
	(CODE_FOR_lsx_vmuh_bu): Likewise.
	(CODE_FOR_lsx_vmuh_hu): Likewise.
	(CODE_FOR_lsx_vmuh_wu): Likewise.
	(CODE_FOR_lsx_vmuh_du): Likewise.
	(CODE_FOR_lasx_xvmuh_b): Likewise.
	(CODE_FOR_lasx_xvmuh_h): Likewise.
	(CODE_FOR_lasx_xvmuh_w): Likewise.
	(CODE_FOR_lasx_xvmuh_d): Likewise.
	(CODE_FOR_lasx_xvmuh_bu): Likewise.
	(CODE_FOR_lasx_xvmuh_hu): Likewise.
	(CODE_FOR_lasx_xvmuh_wu): Likewise.
	(CODE_FOR_lasx_xvmuh_du): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vect-muh.c: New test.
---
 gcc/config/loongarch/lasx.md                  | 22 ------------
 gcc/config/loongarch/loongarch-builtins.cc    | 32 ++++++++---------
 gcc/config/loongarch/lsx.md                   | 22 ------------
 gcc/config/loongarch/simd.md                  | 16 +++++++++
 gcc/testsuite/gcc.target/loongarch/vect-muh.c | 36 +++++++++++++++++++
 5 files changed, 68 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index d4a56c307c4..023a023b44e 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -68,8 +68,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_BRANCH
   UNSPEC_LASX_BRANCH_V
 
-  UNSPEC_LASX_XVMUH_S
-  UNSPEC_LASX_XVMUH_U
   UNSPEC_LASX_MXVEXTW_U
   UNSPEC_LASX_XVSLLWIL_S
   UNSPEC_LASX_XVSLLWIL_U
@@ -2823,26 +2821,6 @@ (define_insn "neg<mode>2"
   [(set_attr "type" "simd_logic")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lasx_xvmuh_s_<lasxfmt>"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-	(unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-		       (match_operand:ILASX 2 "register_operand" "f")]
-		      UNSPEC_LASX_XVMUH_S))]
-  "ISA_HAS_LASX"
-  "xvmuh.<lasxfmt>\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "lasx_xvmuh_u_<lasxfmt_u>"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-	(unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-		       (match_operand:ILASX 2 "register_operand" "f")]
-		      UNSPEC_LASX_XVMUH_U))]
-  "ISA_HAS_LASX"
-  "xvmuh.<lasxfmt_u>\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lasx_xvsllwil_s_<dlasxfmt>_<lasxfmt>"
   [(set (match_operand:<VDMODE256> 0 "register_operand" "=f")
 	(unspec:<VDMODE256> [(match_operand:ILASX_WHB 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index cbd833aa283..a6fcc1c731e 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -319,6 +319,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vmod_hu CODE_FOR_umodv8hi3
 #define CODE_FOR_lsx_vmod_wu CODE_FOR_umodv4si3
 #define CODE_FOR_lsx_vmod_du CODE_FOR_umodv2di3
+#define CODE_FOR_lsx_vmuh_b CODE_FOR_smulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_h CODE_FOR_smulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_w CODE_FOR_smulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_d CODE_FOR_smulv2di3_highpart
+#define CODE_FOR_lsx_vmuh_bu CODE_FOR_umulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_hu CODE_FOR_umulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_wu CODE_FOR_umulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_du CODE_FOR_umulv2di3_highpart
 #define CODE_FOR_lsx_vmul_b CODE_FOR_mulv16qi3
 #define CODE_FOR_lsx_vmul_h CODE_FOR_mulv8hi3
 #define CODE_FOR_lsx_vmul_w CODE_FOR_mulv4si3
@@ -439,14 +447,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vfnmsub_s CODE_FOR_vfnmsubv4sf4_nmsub4
 #define CODE_FOR_lsx_vfnmsub_d CODE_FOR_vfnmsubv2df4_nmsub4
 
-#define CODE_FOR_lsx_vmuh_b CODE_FOR_lsx_vmuh_s_b
-#define CODE_FOR_lsx_vmuh_h CODE_FOR_lsx_vmuh_s_h
-#define CODE_FOR_lsx_vmuh_w CODE_FOR_lsx_vmuh_s_w
-#define CODE_FOR_lsx_vmuh_d CODE_FOR_lsx_vmuh_s_d
-#define CODE_FOR_lsx_vmuh_bu CODE_FOR_lsx_vmuh_u_bu
-#define CODE_FOR_lsx_vmuh_hu CODE_FOR_lsx_vmuh_u_hu
-#define CODE_FOR_lsx_vmuh_wu CODE_FOR_lsx_vmuh_u_wu
-#define CODE_FOR_lsx_vmuh_du CODE_FOR_lsx_vmuh_u_du
 #define CODE_FOR_lsx_vsllwil_h_b CODE_FOR_lsx_vsllwil_s_h_b
 #define CODE_FOR_lsx_vsllwil_w_h CODE_FOR_lsx_vsllwil_s_w_h
 #define CODE_FOR_lsx_vsllwil_d_w CODE_FOR_lsx_vsllwil_s_d_w
@@ -588,6 +588,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lasx_xvmul_h CODE_FOR_mulv16hi3
 #define CODE_FOR_lasx_xvmul_w CODE_FOR_mulv8si3
 #define CODE_FOR_lasx_xvmul_d CODE_FOR_mulv4di3
+#define CODE_FOR_lasx_xvmuh_b CODE_FOR_smulv32qi3_highpart
+#define CODE_FOR_lasx_xvmuh_h CODE_FOR_smulv16hi3_highpart
+#define CODE_FOR_lasx_xvmuh_w CODE_FOR_smulv8si3_highpart
+#define CODE_FOR_lasx_xvmuh_d CODE_FOR_smulv4di3_highpart
+#define CODE_FOR_lasx_xvmuh_bu CODE_FOR_umulv32qi3_highpart
+#define CODE_FOR_lasx_xvmuh_hu CODE_FOR_umulv16hi3_highpart
+#define CODE_FOR_lasx_xvmuh_wu CODE_FOR_umulv8si3_highpart
+#define CODE_FOR_lasx_xvmuh_du CODE_FOR_umulv4di3_highpart
 #define CODE_FOR_lasx_xvclz_b CODE_FOR_clzv32qi2
 #define CODE_FOR_lasx_xvclz_h CODE_FOR_clzv16hi2
 #define CODE_FOR_lasx_xvclz_w CODE_FOR_clzv8si2
@@ -697,14 +705,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lasx_xvavgr_hu CODE_FOR_lasx_xvavgr_u_hu
 #define CODE_FOR_lasx_xvavgr_wu CODE_FOR_lasx_xvavgr_u_wu
 #define CODE_FOR_lasx_xvavgr_du CODE_FOR_lasx_xvavgr_u_du
-#define CODE_FOR_lasx_xvmuh_b CODE_FOR_lasx_xvmuh_s_b
-#define CODE_FOR_lasx_xvmuh_h CODE_FOR_lasx_xvmuh_s_h
-#define CODE_FOR_lasx_xvmuh_w CODE_FOR_lasx_xvmuh_s_w
-#define CODE_FOR_lasx_xvmuh_d CODE_FOR_lasx_xvmuh_s_d
-#define CODE_FOR_lasx_xvmuh_bu CODE_FOR_lasx_xvmuh_u_bu
-#define CODE_FOR_lasx_xvmuh_hu CODE_FOR_lasx_xvmuh_u_hu
-#define CODE_FOR_lasx_xvmuh_wu CODE_FOR_lasx_xvmuh_u_wu
-#define CODE_FOR_lasx_xvmuh_du CODE_FOR_lasx_xvmuh_u_du
 #define CODE_FOR_lasx_xvssran_b_h CODE_FOR_lasx_xvssran_s_b_h
 #define CODE_FOR_lasx_xvssran_h_w CODE_FOR_lasx_xvssran_s_h_w
 #define CODE_FOR_lasx_xvssran_w_d CODE_FOR_lasx_xvssran_s_w_d
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index c1c3719e383..537afaf9625 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -64,8 +64,6 @@ (define_c_enum "unspec" [
   UNSPEC_LSX_VSRLR
   UNSPEC_LSX_VSRLRI
   UNSPEC_LSX_VSHUF
-  UNSPEC_LSX_VMUH_S
-  UNSPEC_LSX_VMUH_U
   UNSPEC_LSX_VEXTW_S
   UNSPEC_LSX_VEXTW_U
   UNSPEC_LSX_VSLLWIL_S
@@ -2506,26 +2504,6 @@ (define_insn "vneg<mode>2"
   [(set_attr "type" "simd_logic")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lsx_vmuh_s_<lsxfmt>"
-  [(set (match_operand:ILSX 0 "register_operand" "=f")
-	(unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
-		      (match_operand:ILSX 2 "register_operand" "f")]
-		     UNSPEC_LSX_VMUH_S))]
-  "ISA_HAS_LSX"
-  "vmuh.<lsxfmt>\t%w0,%w1,%w2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "lsx_vmuh_u_<lsxfmt_u>"
-  [(set (match_operand:ILSX 0 "register_operand" "=f")
-	(unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
-		      (match_operand:ILSX 2 "register_operand" "f")]
-		     UNSPEC_LSX_VMUH_U))]
-  "ISA_HAS_LSX"
-  "vmuh.<lsxfmt_u>\t%w0,%w1,%w2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lsx_vextw_s_d"
   [(set (match_operand:V2DI 0 "register_operand" "=f")
 	(unspec:V2DI [(match_operand:V4SI 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index f371e201127..79324183233 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -187,6 +187,22 @@ (define_insn_and_split "fix_trunc<mode><vimode>2"
   [(set_attr "type" "simd_fcvt")
    (set_attr "mode" "<MODE>")])
 
+;; <x>vmuh.{b/h/w/d}
+
+(define_code_attr muh
+  [(sign_extend "smul_highpart")
+   (zero_extend "umul_highpart")])
+
+(define_insn "<su>mul<mode>3_highpart"
+  [(set (match_operand:IVEC 0 "register_operand" "=f")
+	(<muh>:IVEC (match_operand:IVEC 1 "register_operand" "f")
+		    (match_operand:IVEC 2 "register_operand" "f")))
+   (any_extend (const_int 0))]
+  ""
+  "<x>vmuh.<simdfmt><u>\t%<wu>0,%<wu>1,%<wu>2"
+  [(set_attr "type" "simd_int_arith")
+   (set_attr "mode" "<MODE>")])
+
 ; The LoongArch SX Instructions.
 (include "lsx.md")
 
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-muh.c b/gcc/testsuite/gcc.target/loongarch/vect-muh.c
new file mode 100644
index 00000000000..a788840b23c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-muh.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-mlasx -O3" } */
+/* { dg-final { scan-assembler "\tvmuh\.w\t" } } */
+/* { dg-final { scan-assembler "\tvmuh\.wu\t" } } */
+/* { dg-final { scan-assembler "\txvmuh\.w\t" } } */
+/* { dg-final { scan-assembler "\txvmuh\.wu\t" } } */
+
+int a[8], b[8], c[8];
+
+void
+test1 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = ((long)a[i] * (long)b[i]) >> 32;
+}
+
+void
+test2 (void)
+{
+  for (int i = 0; i < 4; i++)
+    c[i] = ((long)(unsigned)a[i] * (long)(unsigned)b[i]) >> 32;
+}
+
+void
+test3 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = ((long)a[i] * (long)b[i]) >> 32;
+}
+
+void
+test4 (void)
+{
+  for (int i = 0; i < 8; i++)
+    c[i] = ((long)(unsigned)a[i] * (long)(unsigned)b[i]) >> 32;
+}
-- 
2.42.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions
  2023-11-20  0:47 ` [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions Xi Ruoyao
@ 2023-11-23 12:08   ` chenglulu
  0 siblings, 0 replies; 33+ messages in thread
From: chenglulu @ 2023-11-23 12:08 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua

LGTM.

Thanks!

On 2023/11/20 8:47 AM, Xi Ruoyao wrote:
> Remove unnecessary UNSPECs and make the [x]vmuh instructions useful with
> GNU vectors and auto vectorization.
>
> gcc/ChangeLog:
>
> 	* config/loongarch/simd.md (muh): New code attribute mapping
> 	any_extend to smul_highpart or umul_highpart.
> 	(<su>mul<mode>3_highpart): New define_insn.
> 	* config/loongarch/lsx.md (UNSPEC_LSX_VMUH_S): Remove.
> 	(UNSPEC_LSX_VMUH_U): Remove.
> 	(lsx_vmuh_s_<lsxfmt>): Remove.
> 	(lsx_vmuh_u_<lsxfmt_u>): Remove.
> 	* config/loongarch/lasx.md (UNSPEC_LASX_XVMUH_S): Remove.
> 	(UNSPEC_LASX_XVMUH_U): Remove.
> 	(lasx_xvmuh_s_<lasxfmt>): Remove.
> 	(lasx_xvmuh_u_<lasxfmt_u>): Remove.
> 	* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmuh_b):
> 	Redefine to standard pattern name.
> 	(CODE_FOR_lsx_vmuh_h): Likewise.
> 	(CODE_FOR_lsx_vmuh_w): Likewise.
> 	(CODE_FOR_lsx_vmuh_d): Likewise.
> 	(CODE_FOR_lsx_vmuh_bu): Likewise.
> 	(CODE_FOR_lsx_vmuh_hu): Likewise.
> 	(CODE_FOR_lsx_vmuh_wu): Likewise.
> 	(CODE_FOR_lsx_vmuh_du): Likewise.
> 	(CODE_FOR_lasx_xvmuh_b): Likewise.
> 	(CODE_FOR_lasx_xvmuh_h): Likewise.
> 	(CODE_FOR_lasx_xvmuh_w): Likewise.
> 	(CODE_FOR_lasx_xvmuh_d): Likewise.
> 	(CODE_FOR_lasx_xvmuh_bu): Likewise.
> 	(CODE_FOR_lasx_xvmuh_hu): Likewise.
> 	(CODE_FOR_lasx_xvmuh_wu): Likewise.
> 	(CODE_FOR_lasx_xvmuh_du): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/loongarch/vect-muh.c: New test.
> ---
>   gcc/config/loongarch/lasx.md                  | 22 ------------
>   gcc/config/loongarch/loongarch-builtins.cc    | 32 ++++++++---------
>   gcc/config/loongarch/lsx.md                   | 22 ------------
>   gcc/config/loongarch/simd.md                  | 16 +++++++++
>   gcc/testsuite/gcc.target/loongarch/vect-muh.c | 36 +++++++++++++++++++
>   5 files changed, 68 insertions(+), 60 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
>
> diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
> index d4a56c307c4..023a023b44e 100644
> --- a/gcc/config/loongarch/lasx.md
> +++ b/gcc/config/loongarch/lasx.md
> @@ -68,8 +68,6 @@ (define_c_enum "unspec" [
>     UNSPEC_LASX_BRANCH
>     UNSPEC_LASX_BRANCH_V
>   
> -  UNSPEC_LASX_XVMUH_S
> -  UNSPEC_LASX_XVMUH_U
>     UNSPEC_LASX_MXVEXTW_U
>     UNSPEC_LASX_XVSLLWIL_S
>     UNSPEC_LASX_XVSLLWIL_U
> @@ -2823,26 +2821,6 @@ (define_insn "neg<mode>2"
>     [(set_attr "type" "simd_logic")
>      (set_attr "mode" "<MODE>")])
>   
> -(define_insn "lasx_xvmuh_s_<lasxfmt>"
> -  [(set (match_operand:ILASX 0 "register_operand" "=f")
> -	(unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
> -		       (match_operand:ILASX 2 "register_operand" "f")]
> -		      UNSPEC_LASX_XVMUH_S))]
> -  "ISA_HAS_LASX"
> -  "xvmuh.<lasxfmt>\t%u0,%u1,%u2"
> -  [(set_attr "type" "simd_int_arith")
> -   (set_attr "mode" "<MODE>")])
> -
> -(define_insn "lasx_xvmuh_u_<lasxfmt_u>"
> -  [(set (match_operand:ILASX 0 "register_operand" "=f")
> -	(unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
> -		       (match_operand:ILASX 2 "register_operand" "f")]
> -		      UNSPEC_LASX_XVMUH_U))]
> -  "ISA_HAS_LASX"
> -  "xvmuh.<lasxfmt_u>\t%u0,%u1,%u2"
> -  [(set_attr "type" "simd_int_arith")
> -   (set_attr "mode" "<MODE>")])
> -
>   (define_insn "lasx_xvsllwil_s_<dlasxfmt>_<lasxfmt>"
>     [(set (match_operand:<VDMODE256> 0 "register_operand" "=f")
>   	(unspec:<VDMODE256> [(match_operand:ILASX_WHB 1 "register_operand" "f")
> diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
> index cbd833aa283..a6fcc1c731e 100644
> --- a/gcc/config/loongarch/loongarch-builtins.cc
> +++ b/gcc/config/loongarch/loongarch-builtins.cc
> @@ -319,6 +319,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
>   #define CODE_FOR_lsx_vmod_hu CODE_FOR_umodv8hi3
>   #define CODE_FOR_lsx_vmod_wu CODE_FOR_umodv4si3
>   #define CODE_FOR_lsx_vmod_du CODE_FOR_umodv2di3
> +#define CODE_FOR_lsx_vmuh_b CODE_FOR_smulv16qi3_highpart
> +#define CODE_FOR_lsx_vmuh_h CODE_FOR_smulv8hi3_highpart
> +#define CODE_FOR_lsx_vmuh_w CODE_FOR_smulv4si3_highpart
> +#define CODE_FOR_lsx_vmuh_d CODE_FOR_smulv2di3_highpart
> +#define CODE_FOR_lsx_vmuh_bu CODE_FOR_umulv16qi3_highpart
> +#define CODE_FOR_lsx_vmuh_hu CODE_FOR_umulv8hi3_highpart
> +#define CODE_FOR_lsx_vmuh_wu CODE_FOR_umulv4si3_highpart
> +#define CODE_FOR_lsx_vmuh_du CODE_FOR_umulv2di3_highpart
>   #define CODE_FOR_lsx_vmul_b CODE_FOR_mulv16qi3
>   #define CODE_FOR_lsx_vmul_h CODE_FOR_mulv8hi3
>   #define CODE_FOR_lsx_vmul_w CODE_FOR_mulv4si3
> @@ -439,14 +447,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
>   #define CODE_FOR_lsx_vfnmsub_s CODE_FOR_vfnmsubv4sf4_nmsub4
>   #define CODE_FOR_lsx_vfnmsub_d CODE_FOR_vfnmsubv2df4_nmsub4
>   
> -#define CODE_FOR_lsx_vmuh_b CODE_FOR_lsx_vmuh_s_b
> -#define CODE_FOR_lsx_vmuh_h CODE_FOR_lsx_vmuh_s_h
> -#define CODE_FOR_lsx_vmuh_w CODE_FOR_lsx_vmuh_s_w
> -#define CODE_FOR_lsx_vmuh_d CODE_FOR_lsx_vmuh_s_d
> -#define CODE_FOR_lsx_vmuh_bu CODE_FOR_lsx_vmuh_u_bu
> -#define CODE_FOR_lsx_vmuh_hu CODE_FOR_lsx_vmuh_u_hu
> -#define CODE_FOR_lsx_vmuh_wu CODE_FOR_lsx_vmuh_u_wu
> -#define CODE_FOR_lsx_vmuh_du CODE_FOR_lsx_vmuh_u_du
>   #define CODE_FOR_lsx_vsllwil_h_b CODE_FOR_lsx_vsllwil_s_h_b
>   #define CODE_FOR_lsx_vsllwil_w_h CODE_FOR_lsx_vsllwil_s_w_h
>   #define CODE_FOR_lsx_vsllwil_d_w CODE_FOR_lsx_vsllwil_s_d_w
> @@ -588,6 +588,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
>   #define CODE_FOR_lasx_xvmul_h CODE_FOR_mulv16hi3
>   #define CODE_FOR_lasx_xvmul_w CODE_FOR_mulv8si3
>   #define CODE_FOR_lasx_xvmul_d CODE_FOR_mulv4di3
> +#define CODE_FOR_lasx_xvmuh_b CODE_FOR_smulv32qi3_highpart
> +#define CODE_FOR_lasx_xvmuh_h CODE_FOR_smulv16hi3_highpart
> +#define CODE_FOR_lasx_xvmuh_w CODE_FOR_smulv8si3_highpart
> +#define CODE_FOR_lasx_xvmuh_d CODE_FOR_smulv4di3_highpart
> +#define CODE_FOR_lasx_xvmuh_bu CODE_FOR_umulv32qi3_highpart
> +#define CODE_FOR_lasx_xvmuh_hu CODE_FOR_umulv16hi3_highpart
> +#define CODE_FOR_lasx_xvmuh_wu CODE_FOR_umulv8si3_highpart
> +#define CODE_FOR_lasx_xvmuh_du CODE_FOR_umulv4di3_highpart
>   #define CODE_FOR_lasx_xvclz_b CODE_FOR_clzv32qi2
>   #define CODE_FOR_lasx_xvclz_h CODE_FOR_clzv16hi2
>   #define CODE_FOR_lasx_xvclz_w CODE_FOR_clzv8si2
> @@ -697,14 +705,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
>   #define CODE_FOR_lasx_xvavgr_hu CODE_FOR_lasx_xvavgr_u_hu
>   #define CODE_FOR_lasx_xvavgr_wu CODE_FOR_lasx_xvavgr_u_wu
>   #define CODE_FOR_lasx_xvavgr_du CODE_FOR_lasx_xvavgr_u_du
> -#define CODE_FOR_lasx_xvmuh_b CODE_FOR_lasx_xvmuh_s_b
> -#define CODE_FOR_lasx_xvmuh_h CODE_FOR_lasx_xvmuh_s_h
> -#define CODE_FOR_lasx_xvmuh_w CODE_FOR_lasx_xvmuh_s_w
> -#define CODE_FOR_lasx_xvmuh_d CODE_FOR_lasx_xvmuh_s_d
> -#define CODE_FOR_lasx_xvmuh_bu CODE_FOR_lasx_xvmuh_u_bu
> -#define CODE_FOR_lasx_xvmuh_hu CODE_FOR_lasx_xvmuh_u_hu
> -#define CODE_FOR_lasx_xvmuh_wu CODE_FOR_lasx_xvmuh_u_wu
> -#define CODE_FOR_lasx_xvmuh_du CODE_FOR_lasx_xvmuh_u_du
>   #define CODE_FOR_lasx_xvssran_b_h CODE_FOR_lasx_xvssran_s_b_h
>   #define CODE_FOR_lasx_xvssran_h_w CODE_FOR_lasx_xvssran_s_h_w
>   #define CODE_FOR_lasx_xvssran_w_d CODE_FOR_lasx_xvssran_s_w_d
> diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
> index c1c3719e383..537afaf9625 100644
> --- a/gcc/config/loongarch/lsx.md
> +++ b/gcc/config/loongarch/lsx.md
> @@ -64,8 +64,6 @@ (define_c_enum "unspec" [
>     UNSPEC_LSX_VSRLR
>     UNSPEC_LSX_VSRLRI
>     UNSPEC_LSX_VSHUF
> -  UNSPEC_LSX_VMUH_S
> -  UNSPEC_LSX_VMUH_U
>     UNSPEC_LSX_VEXTW_S
>     UNSPEC_LSX_VEXTW_U
>     UNSPEC_LSX_VSLLWIL_S
> @@ -2506,26 +2504,6 @@ (define_insn "vneg<mode>2"
>     [(set_attr "type" "simd_logic")
>      (set_attr "mode" "<MODE>")])
>   
> -(define_insn "lsx_vmuh_s_<lsxfmt>"
> -  [(set (match_operand:ILSX 0 "register_operand" "=f")
> -	(unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
> -		      (match_operand:ILSX 2 "register_operand" "f")]
> -		     UNSPEC_LSX_VMUH_S))]
> -  "ISA_HAS_LSX"
> -  "vmuh.<lsxfmt>\t%w0,%w1,%w2"
> -  [(set_attr "type" "simd_int_arith")
> -   (set_attr "mode" "<MODE>")])
> -
> -(define_insn "lsx_vmuh_u_<lsxfmt_u>"
> -  [(set (match_operand:ILSX 0 "register_operand" "=f")
> -	(unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
> -		      (match_operand:ILSX 2 "register_operand" "f")]
> -		     UNSPEC_LSX_VMUH_U))]
> -  "ISA_HAS_LSX"
> -  "vmuh.<lsxfmt_u>\t%w0,%w1,%w2"
> -  [(set_attr "type" "simd_int_arith")
> -   (set_attr "mode" "<MODE>")])
> -
>   (define_insn "lsx_vextw_s_d"
>     [(set (match_operand:V2DI 0 "register_operand" "=f")
>   	(unspec:V2DI [(match_operand:V4SI 1 "register_operand" "f")]
> diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
> index f371e201127..79324183233 100644
> --- a/gcc/config/loongarch/simd.md
> +++ b/gcc/config/loongarch/simd.md
> @@ -187,6 +187,22 @@ (define_insn_and_split "fix_trunc<mode><vimode>2"
>     [(set_attr "type" "simd_fcvt")
>      (set_attr "mode" "<MODE>")])
>   
> +;; <x>vmuh.{b/h/w/d}
> +
> +(define_code_attr muh
> +  [(sign_extend "smul_highpart")
> +   (zero_extend "umul_highpart")])
> +
> +(define_insn "<su>mul<mode>3_highpart"
> +  [(set (match_operand:IVEC 0 "register_operand" "=f")
> +	(<muh>:IVEC (match_operand:IVEC 1 "register_operand" "f")
> +		    (match_operand:IVEC 2 "register_operand" "f")))
> +   (any_extend (const_int 0))]
> +  ""
> +  "<x>vmuh.<simdfmt><u>\t%<wu>0,%<wu>1,%<wu>2"
> +  [(set_attr "type" "simd_int_arith")
> +   (set_attr "mode" "<MODE>")])
> +
>   ; The LoongArch SX Instructions.
>   (include "lsx.md")
>   
> diff --git a/gcc/testsuite/gcc.target/loongarch/vect-muh.c b/gcc/testsuite/gcc.target/loongarch/vect-muh.c
> new file mode 100644
> index 00000000000..a788840b23c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/vect-muh.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mlasx -O3" } */
> +/* { dg-final { scan-assembler "\tvmuh\.w\t" } } */
> +/* { dg-final { scan-assembler "\tvmuh\.wu\t" } } */
> +/* { dg-final { scan-assembler "\txvmuh\.w\t" } } */
> +/* { dg-final { scan-assembler "\txvmuh\.wu\t" } } */
> +
> +int a[8], b[8], c[8];
> +
> +void
> +test1 (void)
> +{
> +  for (int i = 0; i < 4; i++)
> +    c[i] = ((long)a[i] * (long)b[i]) >> 32;
> +}
> +
> +void
> +test2 (void)
> +{
> +  for (int i = 0; i < 4; i++)
> +    c[i] = ((long)(unsigned)a[i] * (long)(unsigned)b[i]) >> 32;
> +}
> +
> +void
> +test3 (void)
> +{
> +  for (int i = 0; i < 8; i++)
> +    c[i] = ((long)a[i] * (long)b[i]) >> 32;
> +}
> +
> +void
> +test4 (void)
> +{
> +  for (int i = 0; i < 8; i++)
> +    c[i] = ((long)(unsigned)a[i] * (long)(unsigned)b[i]) >> 32;
> +}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift
  2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
  2023-11-20  0:47 ` [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578] Xi Ruoyao
  2023-11-20  0:47 ` [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions Xi Ruoyao
@ 2023-11-20  0:47 ` Xi Ruoyao
  2023-11-23  8:42   ` chenglulu
  2023-11-20  0:47 ` [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact Xi Ruoyao
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-20  0:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

Remove unnecessary UNSPECs and make the [x]vrotr[i] instructions useful
with GNU vectors and auto vectorization.
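
For reference, rotatert is the RTX code for a rotate right, which the new
patterns use for each lane of [x]vrotr[i].  A minimal scalar sketch of one
32-bit lane follows.  The name rotr32 is hypothetical, and reducing the
count modulo 32 is an assumption of this sketch that keeps the C code
well-defined; it is not a statement about the hardware.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane: rotate X right by N
   bits.  The count is reduced modulo 32 so that neither shift is by
   32 or more, which would be undefined behavior in C.  */
static inline uint32_t
rotr32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}
```

The variable-count loops in vect-rotr.c use the shorter form
a >> b | a << (32 - b), which is only well-defined for counts from 1 to
31; the masked form above also covers a count of 0.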

gcc/ChangeLog:

	* config/loongarch/lsx.md (bitimm): Move to ...
	(UNSPEC_LSX_VROTR): Remove.
	(lsx_vrotr_<lsxfmt>): Remove.
	(lsx_vrotri_<lsxfmt>): Remove.
	* config/loongarch/lasx.md (UNSPEC_LASX_XVROTR): Remove.
	(lasx_xvrotr_<lasxfmt>): Remove.
	(lasx_xvrotri_<lasxfmt>): Remove.
	* config/loongarch/simd.md (bitimm): ... here.  Expand it to
	cover LASX modes.
	(vrotr<mode>3): New define_insn.
	(rotr<mode>3): New define_insn.
	* config/loongarch/loongarch-builtins.cc:
	(CODE_FOR_lsx_vrotr_b): Use standard pattern name.
	(CODE_FOR_lsx_vrotr_h): Likewise.
	(CODE_FOR_lsx_vrotr_w): Likewise.
	(CODE_FOR_lsx_vrotr_d): Likewise.
	(CODE_FOR_lasx_xvrotr_b): Likewise.
	(CODE_FOR_lasx_xvrotr_h): Likewise.
	(CODE_FOR_lasx_xvrotr_w): Likewise.
	(CODE_FOR_lasx_xvrotr_d): Likewise.
	(CODE_FOR_lsx_vrotri_b): Define to standard pattern name.
	(CODE_FOR_lsx_vrotri_h): Likewise.
	(CODE_FOR_lsx_vrotri_w): Likewise.
	(CODE_FOR_lsx_vrotri_d): Likewise.
	(CODE_FOR_lasx_xvrotri_b): Likewise.
	(CODE_FOR_lasx_xvrotri_h): Likewise.
	(CODE_FOR_lasx_xvrotri_w): Likewise.
	(CODE_FOR_lasx_xvrotri_d): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vect-rotr.c: New test.
---
 gcc/config/loongarch/lasx.md                  | 22 ------------
 gcc/config/loongarch/loongarch-builtins.cc    | 16 +++++++++
 gcc/config/loongarch/lsx.md                   | 28 ---------------
 gcc/config/loongarch/simd.md                  | 29 +++++++++++++++
 .../gcc.target/loongarch/vect-rotr.c          | 36 +++++++++++++++++++
 5 files changed, 81 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 023a023b44e..116b30c0774 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -138,7 +138,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVHSUBW_Q_D
   UNSPEC_LASX_XVHADDW_QU_DU
   UNSPEC_LASX_XVHSUBW_QU_DU
-  UNSPEC_LASX_XVROTR
   UNSPEC_LASX_XVADD_Q
   UNSPEC_LASX_XVSUB_Q
   UNSPEC_LASX_XVREPLVE
@@ -4232,18 +4231,6 @@ (define_insn "lasx_xvhsubw_qu_du"
   [(set_attr "type" "simd_int_arith")
    (set_attr "mode" "V4DI")])
 
-;;XVROTR.B   XVROTR.H   XVROTR.W   XVROTR.D
-;;TODO-478
-(define_insn "lasx_xvrotr_<lasxfmt>"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-	(unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-		       (match_operand:ILASX 2 "register_operand" "f")]
-		      UNSPEC_LASX_XVROTR))]
-  "ISA_HAS_LASX"
-  "xvrotr.<lasxfmt>\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "<MODE>")])
-
 ;;XVADD.Q
 ;;TODO2
 (define_insn "lasx_xvadd_q"
@@ -4426,15 +4413,6 @@ (define_insn "lasx_xvexth_qu_du"
   [(set_attr "type" "simd_fcvt")
    (set_attr "mode" "V4DI")])
 
-(define_insn "lasx_xvrotri_<lasxfmt>"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-	(rotatert:ILASX (match_operand:ILASX 1 "register_operand" "f")
-		       (match_operand 2 "const_<bitimm256>_operand" "")))]
-  "ISA_HAS_LASX"
-  "xvrotri.<lasxfmt>\t%u0,%u1,%2"
-  [(set_attr "type" "simd_shf")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lasx_xvextl_q_d"
   [(set (match_operand:V4DI 0 "register_operand" "=f")
 	(unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index a6fcc1c731e..5d037ab7f10 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -369,6 +369,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vsrli_h CODE_FOR_vlshrv8hi3
 #define CODE_FOR_lsx_vsrli_w CODE_FOR_vlshrv4si3
 #define CODE_FOR_lsx_vsrli_d CODE_FOR_vlshrv2di3
+#define CODE_FOR_lsx_vrotr_b CODE_FOR_vrotrv16qi3
+#define CODE_FOR_lsx_vrotr_h CODE_FOR_vrotrv8hi3
+#define CODE_FOR_lsx_vrotr_w CODE_FOR_vrotrv4si3
+#define CODE_FOR_lsx_vrotr_d CODE_FOR_vrotrv2di3
+#define CODE_FOR_lsx_vrotri_b CODE_FOR_rotrv16qi3
+#define CODE_FOR_lsx_vrotri_h CODE_FOR_rotrv8hi3
+#define CODE_FOR_lsx_vrotri_w CODE_FOR_rotrv4si3
+#define CODE_FOR_lsx_vrotri_d CODE_FOR_rotrv2di3
 #define CODE_FOR_lsx_vsub_b CODE_FOR_subv16qi3
 #define CODE_FOR_lsx_vsub_h CODE_FOR_subv8hi3
 #define CODE_FOR_lsx_vsub_w CODE_FOR_subv4si3
@@ -634,6 +642,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lasx_xvsrli_h CODE_FOR_vlshrv16hi3
 #define CODE_FOR_lasx_xvsrli_w CODE_FOR_vlshrv8si3
 #define CODE_FOR_lasx_xvsrli_d CODE_FOR_vlshrv4di3
+#define CODE_FOR_lasx_xvrotr_b CODE_FOR_vrotrv32qi3
+#define CODE_FOR_lasx_xvrotr_h CODE_FOR_vrotrv16hi3
+#define CODE_FOR_lasx_xvrotr_w CODE_FOR_vrotrv8si3
+#define CODE_FOR_lasx_xvrotr_d CODE_FOR_vrotrv4di3
+#define CODE_FOR_lasx_xvrotri_b CODE_FOR_rotrv32qi3
+#define CODE_FOR_lasx_xvrotri_h CODE_FOR_rotrv16hi3
+#define CODE_FOR_lasx_xvrotri_w CODE_FOR_rotrv8si3
+#define CODE_FOR_lasx_xvrotri_d CODE_FOR_rotrv4di3
 #define CODE_FOR_lasx_xvsub_b CODE_FOR_subv32qi3
 #define CODE_FOR_lasx_xvsub_h CODE_FOR_subv16hi3
 #define CODE_FOR_lasx_xvsub_w CODE_FOR_subv8si3
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 537afaf9625..23239993404 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -141,7 +141,6 @@ (define_c_enum "unspec" [
   UNSPEC_LSX_VMADDWOD
   UNSPEC_LSX_VMADDWOD2
   UNSPEC_LSX_VMADDWOD3
-  UNSPEC_LSX_VROTR
   UNSPEC_LSX_VADD_Q
   UNSPEC_LSX_VSUB_Q
   UNSPEC_LSX_VEXTH_Q_D
@@ -363,14 +362,6 @@ (define_mode_attr bitmask
    (V8HI "exp_8")
    (V16QI "exp_16")])
 
-;; This attribute is used to form an immediate operand constraint using
-;; "const_<bitimm>_operand".
-(define_mode_attr bitimm
-  [(V16QI "uimm3")
-   (V8HI  "uimm4")
-   (V4SI  "uimm5")
-   (V2DI  "uimm6")])
-
 (define_expand "vec_init<mode><unitmode>"
   [(match_operand:LSX 0 "register_operand")
    (match_operand:LSX 1 "")]
@@ -4152,16 +4143,6 @@ (define_insn "lsx_vmaddwod_q_du_d"
   [(set_attr "type" "simd_int_arith")
    (set_attr "mode" "V2DI")])
 
-(define_insn "lsx_vrotr_<lsxfmt>"
-  [(set (match_operand:ILSX 0 "register_operand" "=f")
-	(unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
-		      (match_operand:ILSX 2 "register_operand" "f")]
-		     UNSPEC_LSX_VROTR))]
-  "ISA_HAS_LSX"
-  "vrotr.<lsxfmt>\t%w0,%w1,%w2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lsx_vadd_q"
   [(set (match_operand:V2DI 0 "register_operand" "=f")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")
@@ -4255,15 +4236,6 @@ (define_insn "lsx_vexth_qu_du"
   [(set_attr "type" "simd_fcvt")
    (set_attr "mode" "V2DI")])
 
-(define_insn "lsx_vrotri_<lsxfmt>"
-  [(set (match_operand:ILSX 0 "register_operand" "=f")
-	(rotatert:ILSX (match_operand:ILSX 1 "register_operand" "f")
-		      (match_operand 2 "const_<bitimm>_operand" "")))]
-  "ISA_HAS_LSX"
-  "vrotri.<lsxfmt>\t%w0,%w1,%2"
-  [(set_attr "type" "simd_shf")
-   (set_attr "mode" "<MODE>")])
-
 (define_insn "lsx_vextl_q_d"
   [(set (match_operand:V2DI 0 "register_operand" "=f")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 79324183233..6937477e3df 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -72,6 +72,13 @@ (define_mode_attr elmbits [(V2DI "64") (V4DI "64")
 			   (V8HI "16") (V16HI "16")
 			   (V16QI "8") (V32QI "8")])
 
+;; This attribute is used to form an immediate operand constraint using
+;; "const_<bitimm>_operand".
+(define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
+			  (V8HI  "uimm4") (V16HI "uimm4")
+			  (V4SI  "uimm5") (V8SI "uimm5")
+			  (V2DI  "uimm6") (V4DI "uimm6")])
+
 ;; =======================================================================
 ;; For many LASX instructions, the only difference of it from the LSX
 ;; counterpart is the length of vector operands.  Describe these LSX/LASX
@@ -203,6 +210,28 @@ (define_insn "<su>mul<mode>3_highpart"
   [(set_attr "type" "simd_int_arith")
    (set_attr "mode" "<MODE>")])
 
+;; <x>vrotr.{b/h/w/d}
+
+(define_insn "vrotr<mode>3"
+  [(set (match_operand:IVEC 0 "register_operand" "=f")
+	(rotatert:IVEC (match_operand:IVEC 1 "register_operand" "f")
+		       (match_operand:IVEC 2 "register_operand" "f")))]
+  ""
+  "<x>vrotr.<simdfmt>\t%<wu>0,%<wu>1,%<wu>2"
+  [(set_attr "type" "simd_int_arith")
+   (set_attr "mode" "<MODE>")])
+
+;; <x>vrotri.{b/h/w/d}
+
+(define_insn "rotr<mode>3"
+  [(set (match_operand:IVEC 0 "register_operand" "=f")
+	(rotatert:IVEC (match_operand:IVEC 1 "register_operand" "f")
+		       (match_operand:SI 2 "const_<bitimm>_operand")))]
+  ""
+  "<x>vrotri.<simdfmt>\t%<wu>0,%<wu>1,%2"
+  [(set_attr "type" "simd_int_arith")
+   (set_attr "mode" "<MODE>")])
+
 ; The LoongArch SX Instructions.
 (include "lsx.md")
 
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-rotr.c b/gcc/testsuite/gcc.target/loongarch/vect-rotr.c
new file mode 100644
index 00000000000..733c36334ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-rotr.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx" } */
+/* { dg-final { scan-assembler "\tvrotr\.w\t" } } */
+/* { dg-final { scan-assembler "\txvrotr\.w\t" } } */
+/* { dg-final { scan-assembler "\tvrotri\.w\t\[^\n\]*7\n" } } */
+/* { dg-final { scan-assembler "\txvrotri\.w\t\[^\n\]*7\n" } } */
+
+unsigned int a[8], b[8];
+
+void
+test1 (void)
+{
+  for (int i = 0; i < 4; i++)
+    a[i] = a[i] >> b[i] | a[i] << (32 - b[i]);
+}
+
+void
+test2 (void)
+{
+  for (int i = 0; i < 8; i++)
+    a[i] = a[i] >> b[i] | a[i] << (32 - b[i]);
+}
+
+void
+test3 (void)
+{
+  for (int i = 0; i < 4; i++)
+    a[i] = a[i] >> 7 | a[i] << 25;
+}
+
+void
+test4 (void)
+{
+  for (int i = 0; i < 8; i++)
+    a[i] = a[i] >> 7 | a[i] << 25;
+}
-- 
2.42.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift
  2023-11-20  0:47 ` [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift Xi Ruoyao
@ 2023-11-23  8:42   ` chenglulu
  0 siblings, 0 replies; 33+ messages in thread
From: chenglulu @ 2023-11-23  8:42 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua

LGTM.

Thanks.

On 2023/11/20 8:47 AM, Xi Ruoyao wrote:
> Remove unnecessary UNSPECs and make the [x]vrotr[i] instructions useful
> with GNU vectors and auto vectorization.
>
> gcc/ChangeLog:
>
> 	* config/loongarch/lsx.md (bitimm): Move to ...
> 	(UNSPEC_LSX_VROTR): Remove.
> 	(lsx_vrotr_<lsxfmt>): Remove.
> 	(lsx_vrotri_<lsxfmt>): Remove.
> 	* config/loongarch/lasx.md (UNSPEC_LASX_XVROTR): Remove.
> 	(lasx_xvrotr_<lasxfmt>): Remove.
> 	(lasx_xvrotri_<lasxfmt>): Remove.
> 	* config/loongarch/simd.md (bitimm): ... here.  Expand it to
> 	cover LASX modes.
> 	(vrotr<mode>3): New define_insn.
> 	(rotr<mode>3): New define_insn.
> 	* config/loongarch/loongarch-builtins.cc:
> 	(CODE_FOR_lsx_vrotr_b): Use standard pattern name.
> 	(CODE_FOR_lsx_vrotr_h): Likewise.
> 	(CODE_FOR_lsx_vrotr_w): Likewise.
> 	(CODE_FOR_lsx_vrotr_d): Likewise.
> 	(CODE_FOR_lasx_xvrotr_b): Likewise.
> 	(CODE_FOR_lasx_xvrotr_h): Likewise.
> 	(CODE_FOR_lasx_xvrotr_w): Likewise.
> 	(CODE_FOR_lasx_xvrotr_d): Likewise.
> 	(CODE_FOR_lsx_vrotri_b): Define to standard pattern name.
> 	(CODE_FOR_lsx_vrotri_h): Likewise.
> 	(CODE_FOR_lsx_vrotri_w): Likewise.
> 	(CODE_FOR_lsx_vrotri_d): Likewise.
> 	(CODE_FOR_lasx_xvrotri_b): Likewise.
> 	(CODE_FOR_lasx_xvrotri_h): Likewise.
> 	(CODE_FOR_lasx_xvrotri_w): Likewise.
> 	(CODE_FOR_lasx_xvrotri_d): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/loongarch/vect-rotr.c: New test.
> ---
>   gcc/config/loongarch/lasx.md                  | 22 ------------
>   gcc/config/loongarch/loongarch-builtins.cc    | 16 +++++++++
>   gcc/config/loongarch/lsx.md                   | 28 ---------------
>   gcc/config/loongarch/simd.md                  | 29 +++++++++++++++
>   .../gcc.target/loongarch/vect-rotr.c          | 36 +++++++++++++++++++
>   5 files changed, 81 insertions(+), 50 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c
>
> diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
> index 023a023b44e..116b30c0774 100644
> --- a/gcc/config/loongarch/lasx.md
> +++ b/gcc/config/loongarch/lasx.md
> @@ -138,7 +138,6 @@ (define_c_enum "unspec" [
>     UNSPEC_LASX_XVHSUBW_Q_D
>     UNSPEC_LASX_XVHADDW_QU_DU
>     UNSPEC_LASX_XVHSUBW_QU_DU
> -  UNSPEC_LASX_XVROTR
>     UNSPEC_LASX_XVADD_Q
>     UNSPEC_LASX_XVSUB_Q
>     UNSPEC_LASX_XVREPLVE
> @@ -4232,18 +4231,6 @@ (define_insn "lasx_xvhsubw_qu_du"
>     [(set_attr "type" "simd_int_arith")
>      (set_attr "mode" "V4DI")])
>   
> -;;XVROTR.B   XVROTR.H   XVROTR.W   XVROTR.D
> -;;TODO-478
> -(define_insn "lasx_xvrotr_<lasxfmt>"
> -  [(set (match_operand:ILASX 0 "register_operand" "=f")
> -	(unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
> -		       (match_operand:ILASX 2 "register_operand" "f")]
> -		      UNSPEC_LASX_XVROTR))]
> -  "ISA_HAS_LASX"
> -  "xvrotr.<lasxfmt>\t%u0,%u1,%u2"
> -  [(set_attr "type" "simd_int_arith")
> -   (set_attr "mode" "<MODE>")])
> -
>   ;;XVADD.Q
>   ;;TODO2
>   (define_insn "lasx_xvadd_q"
> @@ -4426,15 +4413,6 @@ (define_insn "lasx_xvexth_qu_du"
>     [(set_attr "type" "simd_fcvt")
>      (set_attr "mode" "V4DI")])
>   
> -(define_insn "lasx_xvrotri_<lasxfmt>"
> -  [(set (match_operand:ILASX 0 "register_operand" "=f")
> -	(rotatert:ILASX (match_operand:ILASX 1 "register_operand" "f")
> -		       (match_operand 2 "const_<bitimm256>_operand" "")))]
> -  "ISA_HAS_LASX"
> -  "xvrotri.<lasxfmt>\t%u0,%u1,%2"
> -  [(set_attr "type" "simd_shf")
> -   (set_attr "mode" "<MODE>")])
> -
>   (define_insn "lasx_xvextl_q_d"
>     [(set (match_operand:V4DI 0 "register_operand" "=f")
>   	(unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")]
> diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
> index a6fcc1c731e..5d037ab7f10 100644
> --- a/gcc/config/loongarch/loongarch-builtins.cc
> +++ b/gcc/config/loongarch/loongarch-builtins.cc
> @@ -369,6 +369,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
>   #define CODE_FOR_lsx_vsrli_h CODE_FOR_vlshrv8hi3
>   #define CODE_FOR_lsx_vsrli_w CODE_FOR_vlshrv4si3
>   #define CODE_FOR_lsx_vsrli_d CODE_FOR_vlshrv2di3
> +#define CODE_FOR_lsx_vrotr_b CODE_FOR_vrotrv16qi3
> +#define CODE_FOR_lsx_vrotr_h CODE_FOR_vrotrv8hi3
> +#define CODE_FOR_lsx_vrotr_w CODE_FOR_vrotrv4si3
> +#define CODE_FOR_lsx_vrotr_d CODE_FOR_vrotrv2di3
> +#define CODE_FOR_lsx_vrotri_b CODE_FOR_rotrv16qi3
> +#define CODE_FOR_lsx_vrotri_h CODE_FOR_rotrv8hi3
> +#define CODE_FOR_lsx_vrotri_w CODE_FOR_rotrv4si3
> +#define CODE_FOR_lsx_vrotri_d CODE_FOR_rotrv2di3
>   #define CODE_FOR_lsx_vsub_b CODE_FOR_subv16qi3
>   #define CODE_FOR_lsx_vsub_h CODE_FOR_subv8hi3
>   #define CODE_FOR_lsx_vsub_w CODE_FOR_subv4si3
> @@ -634,6 +642,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
>   #define CODE_FOR_lasx_xvsrli_h CODE_FOR_vlshrv16hi3
>   #define CODE_FOR_lasx_xvsrli_w CODE_FOR_vlshrv8si3
>   #define CODE_FOR_lasx_xvsrli_d CODE_FOR_vlshrv4di3
> +#define CODE_FOR_lasx_xvrotr_b CODE_FOR_vrotrv32qi3
> +#define CODE_FOR_lasx_xvrotr_h CODE_FOR_vrotrv16hi3
> +#define CODE_FOR_lasx_xvrotr_w CODE_FOR_vrotrv8si3
> +#define CODE_FOR_lasx_xvrotr_d CODE_FOR_vrotrv4di3
> +#define CODE_FOR_lasx_xvrotri_b CODE_FOR_rotrv32qi3
> +#define CODE_FOR_lasx_xvrotri_h CODE_FOR_rotrv16hi3
> +#define CODE_FOR_lasx_xvrotri_w CODE_FOR_rotrv8si3
> +#define CODE_FOR_lasx_xvrotri_d CODE_FOR_rotrv4di3
>   #define CODE_FOR_lasx_xvsub_b CODE_FOR_subv32qi3
>   #define CODE_FOR_lasx_xvsub_h CODE_FOR_subv16hi3
>   #define CODE_FOR_lasx_xvsub_w CODE_FOR_subv8si3
> diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
> index 537afaf9625..23239993404 100644
> --- a/gcc/config/loongarch/lsx.md
> +++ b/gcc/config/loongarch/lsx.md
> @@ -141,7 +141,6 @@ (define_c_enum "unspec" [
>     UNSPEC_LSX_VMADDWOD
>     UNSPEC_LSX_VMADDWOD2
>     UNSPEC_LSX_VMADDWOD3
> -  UNSPEC_LSX_VROTR
>     UNSPEC_LSX_VADD_Q
>     UNSPEC_LSX_VSUB_Q
>     UNSPEC_LSX_VEXTH_Q_D
> @@ -363,14 +362,6 @@ (define_mode_attr bitmask
>      (V8HI "exp_8")
>      (V16QI "exp_16")])
>   
> -;; This attribute is used to form an immediate operand constraint using
> -;; "const_<bitimm>_operand".
> -(define_mode_attr bitimm
> -  [(V16QI "uimm3")
> -   (V8HI  "uimm4")
> -   (V4SI  "uimm5")
> -   (V2DI  "uimm6")])
> -
>   (define_expand "vec_init<mode><unitmode>"
>     [(match_operand:LSX 0 "register_operand")
>      (match_operand:LSX 1 "")]
> @@ -4152,16 +4143,6 @@ (define_insn "lsx_vmaddwod_q_du_d"
>     [(set_attr "type" "simd_int_arith")
>      (set_attr "mode" "V2DI")])
>   
> -(define_insn "lsx_vrotr_<lsxfmt>"
> -  [(set (match_operand:ILSX 0 "register_operand" "=f")
> -	(unspec:ILSX [(match_operand:ILSX 1 "register_operand" "f")
> -		      (match_operand:ILSX 2 "register_operand" "f")]
> -		     UNSPEC_LSX_VROTR))]
> -  "ISA_HAS_LSX"
> -  "vrotr.<lsxfmt>\t%w0,%w1,%w2"
> -  [(set_attr "type" "simd_int_arith")
> -   (set_attr "mode" "<MODE>")])
> -
>   (define_insn "lsx_vadd_q"
>     [(set (match_operand:V2DI 0 "register_operand" "=f")
>   	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")
> @@ -4255,15 +4236,6 @@ (define_insn "lsx_vexth_qu_du"
>     [(set_attr "type" "simd_fcvt")
>      (set_attr "mode" "V2DI")])
>   
> -(define_insn "lsx_vrotri_<lsxfmt>"
> -  [(set (match_operand:ILSX 0 "register_operand" "=f")
> -	(rotatert:ILSX (match_operand:ILSX 1 "register_operand" "f")
> -		      (match_operand 2 "const_<bitimm>_operand" "")))]
> -  "ISA_HAS_LSX"
> -  "vrotri.<lsxfmt>\t%w0,%w1,%2"
> -  [(set_attr "type" "simd_shf")
> -   (set_attr "mode" "<MODE>")])
> -
>   (define_insn "lsx_vextl_q_d"
>     [(set (match_operand:V2DI 0 "register_operand" "=f")
>   	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")]
> diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
> index 79324183233..6937477e3df 100644
> --- a/gcc/config/loongarch/simd.md
> +++ b/gcc/config/loongarch/simd.md
> @@ -72,6 +72,13 @@ (define_mode_attr elmbits [(V2DI "64") (V4DI "64")
>   			   (V8HI "16") (V16HI "16")
>   			   (V16QI "8") (V32QI "8")])
>   
> +;; This attribute is used to form an immediate operand constraint using
> +;; "const_<bitimm>_operand".
> +(define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
> +			  (V8HI  "uimm4") (V16HI "uimm4")
> +			  (V4SI  "uimm5") (V8SI "uimm5")
> +			  (V2DI  "uimm6") (V4DI "uimm6")])
> +
>   ;; =======================================================================
>   ;; For many LASX instructions, the only difference of it from the LSX
>   ;; counterpart is the length of vector operands.  Describe these LSX/LASX
> @@ -203,6 +210,28 @@ (define_insn "<su>mul<mode>3_highpart"
>     [(set_attr "type" "simd_int_arith")
>      (set_attr "mode" "<MODE>")])
>   
> +;; <x>vrotr.{b/h/w/d}
> +
> +(define_insn "vrotr<mode>3"
> +  [(set (match_operand:IVEC 0 "register_operand" "=f")
> +	(rotatert:IVEC (match_operand:IVEC 1 "register_operand" "f")
> +		       (match_operand:IVEC 2 "register_operand" "f")))]
> +  ""
> +  "<x>vrotr.<simdfmt>\t%<wu>0,%<wu>1,%<wu>2"
> +  [(set_attr "type" "simd_int_arith")
> +   (set_attr "mode" "<MODE>")])
> +
> +;; <x>vrotri.{b/h/w/d}
> +
> +(define_insn "rotr<mode>3"
> +  [(set (match_operand:IVEC 0 "register_operand" "=f")
> +	(rotatert:IVEC (match_operand:IVEC 1 "register_operand" "f")
> +		       (match_operand:SI 2 "const_<bitimm>_operand")))]
> +  ""
> +  "<x>vrotri.<simdfmt>\t%<wu>0,%<wu>1,%2";
> +  [(set_attr "type" "simd_int_arith")
> +   (set_attr "mode" "<MODE>")])
> +
>   ; The LoongArch SX Instructions.
>   (include "lsx.md")
>   
> diff --git a/gcc/testsuite/gcc.target/loongarch/vect-rotr.c b/gcc/testsuite/gcc.target/loongarch/vect-rotr.c
> new file mode 100644
> index 00000000000..733c36334ce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/vect-rotr.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mlasx" } */
> +/* { dg-final { scan-assembler "\tvrotr\.w\t" } } */
> +/* { dg-final { scan-assembler "\txvrotr\.w\t" } } */
> +/* { dg-final { scan-assembler "\tvrotri\.w\t\[^\n\]*7\n" } } */
> +/* { dg-final { scan-assembler "\txvrotri\.w\t\[^\n\]*7\n" } } */
> +
> +unsigned int a[8], b[8];
> +
> +void
> +test1 (void)
> +{
> +  for (int i = 0; i < 4; i++)
> +    a[i] = a[i] >> b[i] | a[i] << (32 - b[i]);
> +}
> +
> +void
> +test2 (void)
> +{
> +  for (int i = 0; i < 8; i++)
> +    a[i] = a[i] >> b[i] | a[i] << (32 - b[i]);
> +}
> +
> +void
> +test3 (void)
> +{
> +  for (int i = 0; i < 4; i++)
> +    a[i] = a[i] >> 7 | a[i] << 25;
> +}
> +
> +void
> +test4 (void)
> +{
> +  for (int i = 0; i < 8; i++)
> +    a[i] = a[i] >> 7 | a[i] << 25;
> +}
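
A side note on the rotate idiom used in the test above: `a >> n | a << (32 - n)` shifts a 32-bit value by 32 when n is 0, which is undefined in C.  That does not matter for a compile-only test, but code that actually runs may want a form that is defined for every count.  A minimal sketch follows; rotr32, rotr32_array and rotr32_self_check are hypothetical helpers, not part of the patch, and the thread does not show whether the vectorizer still turns this masked form into [x]vrotr.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: rotate x right by n bits.
   Masking both shift counts keeps every shift in the range 0..31, so the
   result is defined for any n (the count is taken modulo 32).  */
static uint32_t
rotr32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Element-wise rotate of a[] by the counts in b[], mirroring the loop
   shape of test1/test2 in the new test.  */
static void
rotr32_array (uint32_t *a, const unsigned int *b, int len)
{
  for (int i = 0; i < len; i++)
    a[i] = rotr32 (a[i], b[i]);
}

/* Small self-check for the helpers above.  */
static void
rotr32_self_check (void)
{
  uint32_t v[4] = { 0x80u, 1u, 0x12345678u, 0xffffffffu };
  const unsigned int n[4] = { 7, 1, 0, 13 };

  rotr32_array (v, n, 4);
  assert (v[0] == 1u);           /* 0x80 rotated right by 7 */
  assert (v[1] == 0x80000000u);  /* the low bit wraps to the top */
  assert (v[2] == 0x12345678u);  /* a count of 0 leaves the value alone */
  assert (v[3] == 0xffffffffu);  /* all-ones is invariant under rotation */
}
```

Calling rotr32_self_check () from any main function runs the checks.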


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact
  2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
                   ` (2 preceding siblings ...)
  2023-11-20  0:47 ` [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift Xi Ruoyao
@ 2023-11-20  0:47 ` Xi Ruoyao
  2023-11-23  8:23   ` chenglulu
  2023-11-20  0:47 ` [PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode Xi Ruoyao
  2023-11-29  7:12 ` Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
  5 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-20  0:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

No functional change, just a cleanup.

gcc/ChangeLog:

	* config/loongarch/loongarch.md (lrint_allow_inexact): Remove.
	(<lrint_pattern><ANYF:mode><ANYFI:mode>2): Check if <LRINT>
	== UNSPEC_FTINT instead of <lrint_allow_inexact>.
---
 gcc/config/loongarch/loongarch.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 78ed63f2132..1e019815451 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
 (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
 				    (UNSPEC_FTINTRM "rm")
 				    (UNSPEC_FTINTRP "rp")])
-(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
-				      (UNSPEC_FTINTRM "0")
-				      (UNSPEC_FTINTRP "0")])
 
 ;; Iterator and attributes for bytepick.d
 (define_int_iterator bytepick_w_ashift_amount [8 16 24])
@@ -2384,7 +2381,7 @@ (define_insn "<lrint_pattern><ANYF:mode><ANYFI:mode>2"
 	(unspec:ANYFI [(match_operand:ANYF 1 "register_operand" "f")]
 		      LRINT))]
   "TARGET_HARD_FLOAT &&
-   (<lrint_allow_inexact>
+   (<LRINT> == UNSPEC_FTINT
     || flag_fp_int_builtin_inexact
     || !flag_trapping_math)"
   "ftint<lrint_submenmonic>.<ANYFI:ifmt>.<ANYF:fmt> %0,%1"
-- 
2.42.1
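
To see why this is a pure cleanup, the old and the new insn condition can be compared exhaustively.  Below is a minimal C sketch of that comparison.  It assumes made-up enum values for the three unspecs, leaves out the TARGET_HARD_FLOAT factor common to both forms, and uses hypothetical function names; it only models the boolean logic, not how the machine description is actually processed.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the three codes in the LRINT iterator.  The numeric
   values are arbitrary here and are not those used by GCC.  */
enum lrint_unspec
{
  UNSPEC_FTINT,
  UNSPEC_FTINTRM,
  UNSPEC_FTINTRP,
  LRINT_COUNT
};

/* Old condition: look up a per-unspec "1"/"0" attribute, as the removed
   lrint_allow_inexact attribute did.  */
static bool
old_cond (enum lrint_unspec u, bool fp_int_builtin_inexact,
          bool trapping_math)
{
  static const bool allow_inexact[LRINT_COUNT] = {
    [UNSPEC_FTINT] = true,
    [UNSPEC_FTINTRM] = false,
    [UNSPEC_FTINTRP] = false,
  };
  return allow_inexact[u] || fp_int_builtin_inexact || !trapping_math;
}

/* New condition: compare the iterator value directly.  */
static bool
new_cond (enum lrint_unspec u, bool fp_int_builtin_inexact,
          bool trapping_math)
{
  return u == UNSPEC_FTINT || fp_int_builtin_inexact || !trapping_math;
}

/* Returns true when both forms agree on all 3 * 2 * 2 inputs.  */
static bool
conds_agree (void)
{
  for (int u = 0; u < LRINT_COUNT; u++)
    for (int inexact = 0; inexact <= 1; inexact++)
      for (int trapping = 0; trapping <= 1; trapping++)
        if (old_cond ((enum lrint_unspec) u, inexact, trapping)
            != new_cond ((enum lrint_unspec) u, inexact, trapping))
          return false;
  return true;
}

/* Small self-check for the comparison above.  */
static void
cond_self_check (void)
{
  assert (conds_agree ());
}
```

The only case where the pattern is disabled is a directed-rounding unspec with -fno-fp-int-builtin-inexact and trapping math enabled, in both forms.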


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact
  2023-11-20  0:47 ` [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact Xi Ruoyao
@ 2023-11-23  8:23   ` chenglulu
  2023-11-23  8:58     ` Xi Ruoyao
  0 siblings, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-23  8:23 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua

I tested it and it was fine. I never knew this could be used like this.

Thank you!

On 2023/11/20 8:47 AM, Xi Ruoyao wrote:
> No functional change, just a cleanup.
>
> gcc/ChangeLog:
>
> 	* config/loongarch/loongarch.md (lrint_allow_inexact): Remove.
> 	(<lrint_pattern><ANYF:mode><ANYFI:mode>2): Check if <LRINT>
> 	== UNSPEC_FTINT instead of <lrint_allow_inexact>.
> ---
>   gcc/config/loongarch/loongarch.md | 5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index 78ed63f2132..1e019815451 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
>   (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
>   				    (UNSPEC_FTINTRM "rm")
>   				    (UNSPEC_FTINTRP "rp")])
> -(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
> -				      (UNSPEC_FTINTRM "0")
> -				      (UNSPEC_FTINTRP "0")])
>   
>   ;; Iterator and attributes for bytepick.d
>   (define_int_iterator bytepick_w_ashift_amount [8 16 24])
> @@ -2384,7 +2381,7 @@ (define_insn "<lrint_pattern><ANYF:mode><ANYFI:mode>2"
>   	(unspec:ANYFI [(match_operand:ANYF 1 "register_operand" "f")]
>   		      LRINT))]
>     "TARGET_HARD_FLOAT &&
> -   (<lrint_allow_inexact>
> +   (<LRINT> == UNSPEC_FTINT
>       || flag_fp_int_builtin_inexact
>       || !flag_trapping_math)"
>     "ftint<lrint_submenmonic>.<ANYFI:ifmt>.<ANYF:fmt> %0,%1"


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact
  2023-11-23  8:23   ` chenglulu
@ 2023-11-23  8:58     ` Xi Ruoyao
  2023-11-23  9:14       ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-23  8:58 UTC (permalink / raw)
  To: chenglulu, gcc-patches; +Cc: i, xuchenghua

On Thu, 2023-11-23 at 16:23 +0800, chenglulu wrote:
> I tested it and it was fine. I never knew this could be used like
> this.

I remember trying this when I wrote r13-3920, but it failed.  Maybe
something has been improved in the machine description parser, or perhaps
I just did something stupid at the time...

> Thank you!
> 
> On 2023/11/20 8:47 AM, Xi Ruoyao wrote:
> > No functional change, just a cleanup.
> > 
> > gcc/ChangeLog:
> > 
> > 	* config/loongarch/loongarch.md (lrint_allow_inexact):
> > Remove.
> > 	(<lrint_pattern><ANYF:mode><ANYFI:mode>2): Check if <LRINT>
> > 	== UNSPEC_FTINT instead of <lrint_allow_inexact>.
> > ---
> >   gcc/config/loongarch/loongarch.md | 5 +----
> >   1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.md
> > b/gcc/config/loongarch/loongarch.md
> > index 78ed63f2132..1e019815451 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT
> > "lrint")
> >   (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
> >   				    (UNSPEC_FTINTRM "rm")
> >   				    (UNSPEC_FTINTRP "rp")])
> > -(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
> > -				      (UNSPEC_FTINTRM "0")
> > -				      (UNSPEC_FTINTRP "0")])
> >   
> >   ;; Iterator and attributes for bytepick.d
> >   (define_int_iterator bytepick_w_ashift_amount [8 16 24])
> > @@ -2384,7 +2381,7 @@ (define_insn
> > "<lrint_pattern><ANYF:mode><ANYFI:mode>2"
> >   	(unspec:ANYFI [(match_operand:ANYF 1 "register_operand"
> > "f")]
> >   		      LRINT))]
> >     "TARGET_HARD_FLOAT &&
> > -   (<lrint_allow_inexact>
> > +   (<LRINT> == UNSPEC_FTINT
> >       || flag_fp_int_builtin_inexact
> >       || !flag_trapping_math)"
> >     "ftint<lrint_submenmonic>.<ANYFI:ifmt>.<ANYF:fmt> %0,%1"
> 

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact
  2023-11-23  8:58     ` Xi Ruoyao
@ 2023-11-23  9:14       ` chenglulu
  2023-11-23 12:24         ` Xi Ruoyao
  0 siblings, 1 reply; 33+ messages in thread
From: chenglulu @ 2023-11-23  9:14 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2023/11/23 4:58 PM, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 16:23 +0800, chenglulu wrote:
>> I tested it and it was fine. I never knew this could be used like
>> this.
> I remember trying this when I wrote r13-3920, but it failed.  Maybe
> something has been improved in the machine description parser, or perhaps
> I just did something stupid at the time...

But I think this is a really cool implementation!

When I look at this code and compare it to our scalar implementation, it
seems that our scalar implementation still lacks an "lround".

>
>> Thank you!
>>
>> On 2023/11/20 8:47 AM, Xi Ruoyao wrote:
>>> No functional change, just a cleanup.
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/loongarch/loongarch.md (lrint_allow_inexact):
>>> Remove.
>>> 	(<lrint_pattern><ANYF:mode><ANYFI:mode>2): Check if <LRINT>
>>> 	== UNSPEC_FTINT instead of <lrint_allow_inexact>.
>>> ---
>>>    gcc/config/loongarch/loongarch.md | 5 +----
>>>    1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/gcc/config/loongarch/loongarch.md
>>> b/gcc/config/loongarch/loongarch.md
>>> index 78ed63f2132..1e019815451 100644
>>> --- a/gcc/config/loongarch/loongarch.md
>>> +++ b/gcc/config/loongarch/loongarch.md
>>> @@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT
>>> "lrint")
>>>    (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
>>>    				    (UNSPEC_FTINTRM "rm")
>>>    				    (UNSPEC_FTINTRP "rp")])
>>> -(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
>>> -				      (UNSPEC_FTINTRM "0")
>>> -				      (UNSPEC_FTINTRP "0")])
>>>    
>>>    ;; Iterator and attributes for bytepick.d
>>>    (define_int_iterator bytepick_w_ashift_amount [8 16 24])
>>> @@ -2384,7 +2381,7 @@ (define_insn
>>> "<lrint_pattern><ANYF:mode><ANYFI:mode>2"
>>>    	(unspec:ANYFI [(match_operand:ANYF 1 "register_operand"
>>> "f")]
>>>    		      LRINT))]
>>>      "TARGET_HARD_FLOAT &&
>>> -   (<lrint_allow_inexact>
>>> +   (<LRINT> == UNSPEC_FTINT
>>>        || flag_fp_int_builtin_inexact
>>>        || !flag_trapping_math)"
>>>      "ftint<lrint_submenmonic>.<ANYFI:ifmt>.<ANYF:fmt> %0,%1"


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact
  2023-11-23  9:14       ` chenglulu
@ 2023-11-23 12:24         ` Xi Ruoyao
  2023-11-23 14:39           ` chenglulu
  0 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-23 12:24 UTC (permalink / raw)
  To: chenglulu, gcc-patches; +Cc: i, xuchenghua

On Thu, 2023-11-23 at 17:14 +0800, chenglulu wrote:
> When I look at this code and compare it to our scalar implementation, it
> seems that our scalar implementation still lacks an "lround".

Should be "lroundeven".  We don't have an instruction for lround :(.

I tried this but it does not work:

-(define_int_iterator LRINT [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP])
+(define_int_iterator LRINT
+  [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP UNSPEC_FTINTRNE])
 (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
 				(UNSPEC_FTINTRM "lfloor")
-				(UNSPEC_FTINTRP "lceil")])
+				(UNSPEC_FTINTRP "lceil")
+				(UNSPEC_FTINTRNE "lroundeven")])
 (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
 				    (UNSPEC_FTINTRM "rm")
-				    (UNSPEC_FTINTRP "rp")])
+				    (UNSPEC_FTINTRP "rp")
+				    (UNSPEC_FTINTRNE "rne")])

The problem is "lroundevenMN2" is not a standard pattern name.  The SIMD
version of ftintrne in patch 1 only works because we are expanding
"roundevenM2" (it's a standard pattern name) to UNSPEC_SIMD_FRINTRNE,
and then a define_insn can match (fix (UNSPEC_SIMD_FRINTRNE op)).  But
for non-SIMD we don't have roundevenM2.
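
For reference, the difference between the two rounding behaviours named above (lround rounds halfway cases away from zero, roundeven rounds them to the even neighbour) can be shown with a small C sketch.  The helper names are hypothetical, the sketch assumes the input fits in a long, and it avoids libm on purpose so that it stays self-contained:

```c
#include <assert.h>

/* Round to nearest, halfway cases away from zero (the lround rule).
   Assumes x is finite and fits in a long.  */
static long
round_half_away (double x)
{
  long t = (long) x;            /* truncate toward zero */
  double frac = x - (double) t; /* exact fractional part, in (-1, 1) */

  if (frac >= 0.5)
    return t + 1;
  if (frac <= -0.5)
    return t - 1;
  return t;
}

/* Round to nearest, halfway cases to the even neighbour (the roundeven
   rule, which is what ftintrne implements).  Same assumptions.  */
static long
round_half_even (double x)
{
  long t = (long) x;
  double frac = x - (double) t;

  if (frac > 0.5 || (frac == 0.5 && (t & 1)))
    return t + 1;
  if (frac < -0.5 || (frac == -0.5 && (t & 1)))
    return t - 1;
  return t;
}

/* Small self-check: the two rules differ only on exact halves.  */
static void
rounding_self_check (void)
{
  assert (round_half_away (2.5) == 3);
  assert (round_half_even (2.5) == 2);
  assert (round_half_away (3.5) == 4);
  assert (round_half_even (3.5) == 4);
  assert (round_half_away (2.25) == round_half_even (2.25));
}
```

So a pattern built on ftintrne can only serve a round-to-even builtin; it cannot implement lround.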

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact
  2023-11-23 12:24         ` Xi Ruoyao
@ 2023-11-23 14:39           ` chenglulu
  0 siblings, 0 replies; 33+ messages in thread
From: chenglulu @ 2023-11-23 14:39 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2023/11/23 8:24 PM, Xi Ruoyao wrote:
> On Thu, 2023-11-23 at 17:14 +0800, chenglulu wrote:
>> When I look at this code and compare it to our scalar implementation, it
>> seems that our scalar implementation still lacks an "lround".
> Should be "lroundeven".  We don't have an instruction for lround :(.
>
> I tried this but it does not work:
>
> -(define_int_iterator LRINT [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP])
> +(define_int_iterator LRINT
> +  [UNSPEC_FTINT UNSPEC_FTINTRM UNSPEC_FTINTRP UNSPEC_FTINTRNE])
>   (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
>   				(UNSPEC_FTINTRM "lfloor")
> -				(UNSPEC_FTINTRP "lceil")])
> +				(UNSPEC_FTINTRP "lceil")
> +				(UNSPEC_FTINTRNE "lroundeven")])
>   (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
>   				    (UNSPEC_FTINTRM "rm")
> -				    (UNSPEC_FTINTRP "rp")])
> +				    (UNSPEC_FTINTRP "rp")
> +				    (UNSPEC_FTINTRNE "rne")])
>
> The problem is "lroundevenMN2" is not a standard pattern name.  The SIMD
> version of ftintrne in patch 1 only works because we are expanding
> "roundevenM2" (it's a standard pattern name) to UNSPEC_SIMD_FRINTRNE,
> and then a define_insn can match (fix (UNSPEC_SIMD_FRINTRNE op)).  But
> for non-SIMD we don't have roundevenM2.
>
Okay, I understand. That is a bit unfortunate.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode
  2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
                   ` (3 preceding siblings ...)
  2023-11-20  0:47 ` [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact Xi Ruoyao
@ 2023-11-20  0:47 ` Xi Ruoyao
  2023-11-29  7:12 ` Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
  5 siblings, 0 replies; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-20  0:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua, Xi Ruoyao

The LoongArch base FP ISA only has the frint.{s/d} instruction, which
reads the global rounding mode.  Use LSX for rounding with an explicit
rounding mode even if the operand is scalar.  This looks like a waste of
CPU power, but it is still much faster than calling the library function.

gcc/ChangeLog:

	* config/loongarch/simd.md (LSX_SCALAR_FRINT): New int iterator.
	(VLSX_FOR_FMODE): New mode attribute.
	(<simd_frint_pattern><mode>2): New expander,
	expanding to vreplvei.{w/d} + vfrint{rp/rz/rm/rne}.{s/d}.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vect-frint-scalar.c: New test.
	* gcc.target/loongarch/vect-frint-scalar-no-inexact.c: New test.
---
 gcc/config/loongarch/simd.md                  | 29 +++++++++++++
 .../loongarch/vect-frint-scalar-no-inexact.c  | 23 ++++++++++
 .../gcc.target/loongarch/vect-frint-scalar.c  | 43 +++++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 6937477e3df..e592de49aa0 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -150,6 +150,35 @@ (define_expand "ftrunc<mode>2"
 		     UNSPEC_SIMD_FRINTRZ))]
   "")
 
+;; Use LSX for scalar ceil/floor/trunc/roundeven when -mlsx and -ffp-int-
+;; builtin-inexact.  The base FP instruction set lacks these operations.
+;; Yes we are wasting 50% or even 75% of the CPU horsepower, but it's still
+;; much faster than calling a libc function: on LA464 and LA664 there is a
+;; 3x ~ 5x speed up.
+;;
+;; Note that a vreplvei instruction is needed or we'll also operate on the
+;; junk in high bits of the vector register and produce random FP exceptions.
+
+(define_int_iterator LSX_SCALAR_FRINT
+  [UNSPEC_SIMD_FRINTRP
+   UNSPEC_SIMD_FRINTRZ
+   UNSPEC_SIMD_FRINTRM
+   UNSPEC_SIMD_FRINTRNE])
+
+(define_mode_attr VLSX_FOR_FMODE [(DF "V2DF") (SF "V4SF")])
+
+(define_expand "<simd_frint_pattern><mode>2"
+  [(set (match_dup 2)
+     (vec_duplicate:<VLSX_FOR_FMODE>
+       (match_operand:ANYF 1 "register_operand")))
+   (set (match_dup 2)
+	(unspec:<VLSX_FOR_FMODE> [(match_dup 2)] LSX_SCALAR_FRINT))
+   (set (match_operand:ANYF 0 "register_operand")
+	(vec_select:ANYF (match_dup 2) (parallel [(const_int 0)])))
+   (clobber (match_scratch:<VLSX_FOR_FMODE> 3))]
+  "ISA_HAS_LSX && (flag_fp_int_builtin_inexact || !flag_trapping_math)"
+  "operands[2] = gen_reg_rtx (<VLSX_FOR_FMODE>mode);")
+
 ;; <x>vftint.{/rp/rz/rm}
 (define_insn
   "<simd_isa>_<x>vftint<simd_frint_rounding>_<simdifmt_for_f>_<simdfmt>"
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
new file mode 100644
index 00000000000..002e3b92df7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx -fno-fp-int-builtin-inexact" } */
+
+#include "vect-frint-scalar.c"
+
+/* cannot use LSX for these with -fno-fp-int-builtin-inexact,
+   call library function.  */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceil\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceilf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floor\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floorf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(trunc\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(truncf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundeven\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundevenf\\)" } } */
+
+/* nearbyint has not been allowed to raise FE_INEXACT for decades */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyintf\\)" } } */
+
+/* rint should just use basic FP operation */
+/* { dg-final { scan-assembler "\tfrint\.s" } } */
+/* { dg-final { scan-assembler "\tfrint\.d" } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
new file mode 100644
index 00000000000..c7cb40be7d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx" } */
+
+#define test(func, suffix) \
+__typeof__ (1.##suffix) \
+_##func##suffix (__typeof__ (1.##suffix) x) \
+{ \
+  return __builtin_##func##suffix (x); \
+}
+
+test (ceil, f)
+test (ceil, )
+test (floor, f)
+test (floor, )
+test (trunc, f)
+test (trunc, )
+test (roundeven, f)
+test (roundeven, )
+test (nearbyint, f)
+test (nearbyint, )
+test (rint, f)
+test (rint, )
+
+/* { dg-final { scan-assembler "\tvfrintrp\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrp\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.d" } } */
+
+/* must do vreplvei first */
+/* { dg-final { scan-assembler-times "\tvreplvei\.w\t\\\$vr0,\\\$vr0,0" 4 } } */
+/* { dg-final { scan-assembler-times "\tvreplvei\.d\t\\\$vr0,\\\$vr0,0" 4 } } */
+
+/* nearbyint has not been allowed to raise FE_INEXACT for decades */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyintf\\)" } } */
+
+/* rint should just use basic FP operation */
+/* { dg-final { scan-assembler "\tfrint\.s" } } */
+/* { dg-final { scan-assembler "\tfrint\.d" } } */
-- 
2.42.1
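
A note on the `test` macro in vect-frint-scalar.c: pasting `1.` with the suffix yields the literal 1.f for `f` and the literal 1. for an empty suffix, so `__typeof__ (1.##suffix)` picks float or double to match the builtin being wrapped.  The sketch below uses the same trick in isolation.  DEFINE_HALVE, halve, halvef and halve_self_check are hypothetical names, the function body is a stand-in that avoids any libm call, and `__typeof__` is a GNU extension as in the test itself.

```c
#include <assert.h>

/* Same token-pasting trick as the test macro: "1." pasted with "f" is
   the float literal 1.f; pasted with an empty argument it stays the
   double literal 1.  */
#define DEFINE_HALVE(suffix) \
static __typeof__ (1.##suffix) \
halve##suffix (__typeof__ (1.##suffix) x) \
{ \
  return x / 2; \
}

DEFINE_HALVE (f)  /* defines: float halvef (float x) */
DEFINE_HALVE ()   /* defines: double halve (double x) */

/* Small self-check: the return types follow the suffix.  */
static void
halve_self_check (void)
{
  assert (sizeof (halvef (1.0f)) == sizeof (float));
  assert (sizeof (halve (1.0)) == sizeof (double));
  assert (halvef (3.0f) == 1.5f);
  assert (halve (3.0) == 1.5);
}
```

In the test proper the body calls `__builtin_##func##suffix`, so one macro covers both the float and the double variant of each rounding function.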


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations
  2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
                   ` (4 preceding siblings ...)
  2023-11-20  0:47 ` [PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode Xi Ruoyao
@ 2023-11-29  7:12 ` Xi Ruoyao
  2023-11-29  7:45   ` chenglulu
  5 siblings, 1 reply; 33+ messages in thread
From: Xi Ruoyao @ 2023-11-29  7:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: chenglulu, i, xuchenghua

On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote:
> The [1/5] patch is the PR112578 fix at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
> It has been changed to remove the nearbyint pattern (because nearbyint
> should not raise FE_INEXACT even if -ffp-int-builtin-inexact).
> As the other patches depend on the simd.md file introduced by this patch,
> it is sent as the first of this series.
> 
> As many LASX instructions are only differentiated from the corresponding
> LSX instruction with operand length, create simd.md file to contain the
> RTX templates sharable by LSX and LASX.  This makes the code cleaner and
> easier to maintain.
> 
> The [2/5] and [3/5] patches make vector product highpart and rotate
> shift operations usable with GNU vectors and auto vectorization.
> 
> The [4/5] patch is a simple code cleanup, with no function change.
> 
> The [5/5] patch uses LSX for FP scalar rounding operations if LSX is
> available and -ffp-int-builtin-inexact is in effect.  We do this because
> the base FP ISA does not have such instructions.  Using LSX is overkill,
> but still much faster than calling libc functions.
> 
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Pushed r14-5950 .. r14-5954 with minor changes: an FSF copyright
disclaimer was added to simd.md in the 1st patch, and an unused
match_scratch was removed from <simd_frint_pattern><mode>2 in the 5th
patch.

> Xi Ruoyao (5):
>   LoongArch: Fix usage of LSX and LASX frint/ftint instructions
>     [PR112578]
>   LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
>     instructions
>   LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
>     shift
>   LoongArch: Remove lrint_allow_inexact
>   LoongArch: Use LSX for scalar FP rounding with explicit rounding mode
> 
>  gcc/config/loongarch/lasx.md                  | 283 -----------------
>  gcc/config/loongarch/loongarch-builtins.cc    |  52 ++--
>  gcc/config/loongarch/loongarch.md             |  12 +-
>  gcc/config/loongarch/lsx.md                   | 293 ------------------
>  gcc/config/loongarch/simd.md                  | 268 ++++++++++++++++
>  .../loongarch/vect-frint-no-inexact.c         |  48 +++
>  .../loongarch/vect-frint-scalar-no-inexact.c  |  23 ++
>  .../gcc.target/loongarch/vect-frint-scalar.c  |  43 +++
>  .../gcc.target/loongarch/vect-frint.c         |  85 +++++
>  .../loongarch/vect-ftint-no-inexact.c         |  44 +++
>  .../gcc.target/loongarch/vect-ftint.c         |  83 +++++
>  gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
>  .../gcc.target/loongarch/vect-rotr.c          |  36 +++
>  13 files changed, 701 insertions(+), 605 deletions(-)
>  create mode 100644 gcc/config/loongarch/simd.md
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations
  2023-11-29  7:12 ` Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
@ 2023-11-29  7:45   ` chenglulu
  0 siblings, 0 replies; 33+ messages in thread
From: chenglulu @ 2023-11-29  7:45 UTC (permalink / raw)
  To: Xi Ruoyao, gcc-patches; +Cc: i, xuchenghua


On 2023/11/29 3:12 PM, Xi Ruoyao wrote:
> On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote:
>> The [1/5] patch is the PR112578 fix at
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
>> It has been changed to remove the nearbyint pattern (because nearbyint
>> should not raise FE_INEXACT even if -ffp-int-builtin-inexact).
>> As the other patches depend on the simd.md file introduced by this patch,
>> it is sent as the first of this series.
>>
>> As many LASX instructions differ from the corresponding LSX instructions
>> only in operand length, create a simd.md file to contain the
>> RTX templates sharable by LSX and LASX.  This makes the code cleaner and
>> easier to maintain.
>>
>> The [2/5] and [3/5] patches make vector product highpart and rotate
>> shift operations available for GNU vectors and auto-vectorization.
>>
>> The [4/5] patch is a simple code cleanup, with no functional change.
>>
>> The [5/5] patch uses LSX for FP scalar rounding operations if LSX is
>> available and -ffp-int-builtin-inexact is in effect.  We do this because
>> the base FP ISA does not have such instructions.  Using LSX is overkill,
>> but still much faster than calling libc functions.
>>
>> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> Pushed r14-5950 .. r14-5954 with minor changes: an FSF copyright
> disclaimer is added to simd.md in the 1st patch, and an unused
> match_scratch is removed from <simd_frint_pattern><mode>2 in the 5th
> patch.
>
Thank you very much!:-)
>> Xi Ruoyao (5):
>>    LoongArch: Fix usage of LSX and LASX frint/ftint instructions
>>      [PR112578]
>>    LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
>>      instructions
>>    LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
>>      shift
>>    LoongArch: Remove lrint_allow_inexact
>>    LoongArch: Use LSX for scalar FP rounding with explicit rounding mode
>>
>>   gcc/config/loongarch/lasx.md                  | 283 -----------------
>>   gcc/config/loongarch/loongarch-builtins.cc    |  52 ++--
>>   gcc/config/loongarch/loongarch.md             |  12 +-
>>   gcc/config/loongarch/lsx.md                   | 293 ------------------
>>   gcc/config/loongarch/simd.md                  | 268 ++++++++++++++++
>>   .../loongarch/vect-frint-no-inexact.c         |  48 +++
>>   .../loongarch/vect-frint-scalar-no-inexact.c  |  23 ++
>>   .../gcc.target/loongarch/vect-frint-scalar.c  |  43 +++
>>   .../gcc.target/loongarch/vect-frint.c         |  85 +++++
>>   .../loongarch/vect-ftint-no-inexact.c         |  44 +++
>>   .../gcc.target/loongarch/vect-ftint.c         |  83 +++++
>>   gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
>>   .../gcc.target/loongarch/vect-rotr.c          |  36 +++
>>   13 files changed, 701 insertions(+), 605 deletions(-)
>>   create mode 100644 gcc/config/loongarch/simd.md
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c
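
For readers unfamiliar with the "product highpart" and "rotate shift" operations mentioned for patches [2/5] and [3/5], here is a minimal C sketch of the kind of scalar loops an auto-vectorizer can recognize as such. It is not taken from vect-muh.c or vect-rotr.c (their contents are not shown in this message); the function names mulh_u32 and rotr_u32 are hypothetical, and whether GCC actually emits LSX/LASX instructions for these loops depends on the compiler version and options.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the high 32 bits of each unsigned
   32x32 -> 64-bit product ("multiply highpart").  */
void
mulh_u32 (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint32_t) (((uint64_t) a[i] * b[i]) >> 32);
}

/* Hypothetical helper: rotate each element right by r bits.  The
   masking keeps both shift counts in [0, 31], so r == 0 is well
   defined.  */
void
rotr_u32 (uint32_t *dst, const uint32_t *a, unsigned r, size_t n)
{
  r &= 31;
  for (size_t i = 0; i < n; i++)
    dst[i] = (a[i] >> r) | (a[i] << ((32 - r) & 31));
}
```

Both functions are plain portable C, so they can be compared against whatever the compiler generates for a given target without relying on any LoongArch-specific builtin.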


Thread overview: 33+ messages
2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
2023-11-20  0:47 ` [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578] Xi Ruoyao
2023-11-23  6:35   ` chenglulu
2023-11-23  7:11     ` Xi Ruoyao
2023-11-23  7:31       ` chenglulu
2023-11-23  8:13         ` chenglulu
2023-11-23  9:02           ` Xi Ruoyao
2023-11-23  9:12             ` chenglulu
2023-11-23 10:12               ` Xi Ruoyao
2023-11-23 12:06                 ` Xi Ruoyao
2023-11-23 18:03                 ` Joseph Myers
2023-11-24  2:39                   ` Xi Ruoyao
2023-11-24  8:01                     ` chenglulu
2023-11-24  8:26                       ` Xi Ruoyao
2023-11-24  8:36                         ` chenglulu
2023-11-24  8:42                           ` Xi Ruoyao
2023-11-24  9:46                             ` chenglulu
2023-11-24 10:30                               ` Xi Ruoyao
2023-11-24 14:59                                 ` chenglulu
2023-11-23  8:54         ` Xi Ruoyao
2023-11-20  0:47 ` [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions Xi Ruoyao
2023-11-23 12:08   ` chenglulu
2023-11-20  0:47 ` [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift Xi Ruoyao
2023-11-23  8:42   ` chenglulu
2023-11-20  0:47 ` [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact Xi Ruoyao
2023-11-23  8:23   ` chenglulu
2023-11-23  8:58     ` Xi Ruoyao
2023-11-23  9:14       ` chenglulu
2023-11-23 12:24         ` Xi Ruoyao
2023-11-23 14:39           ` chenglulu
2023-11-20  0:47 ` [PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode Xi Ruoyao
2023-11-29  7:12 ` Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
2023-11-29  7:45   ` chenglulu
