[PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Xi Ruoyao <xry111@xry111.site>
To: gcc-patches@gcc.gnu.org
Cc: chenglulu <chenglulu@loongson.cn>,
	i@xen0n.name, xuchenghua@loongson.cn,
	Xi Ruoyao <xry111@xry111.site>
Subject: [PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode
Date: Mon, 20 Nov 2023 08:47:28 +0800	[thread overview]
Message-ID: <20231120004728.205167-6-xry111@xry111.site> (raw)
In-Reply-To: <20231120004728.205167-1-xry111@xry111.site>

In LoongArch FP base ISA there is only the frint.{s/d} instruction which
reads the global rounding mode.  Utilize LSX for explicit rounding mode
even if the operand is scalar.  It seems wasting the CPU power, but
still much faster than calling the library function.

gcc/ChangeLog:

	* config/loongarch/simd.md (LSX_SCALAR_FRINT): New int iterator.
	(VLSX_FOR_FMODE): New mode attribute.
	(<simd_for_scalar_frint_pattern><mode>2): New expander,
	expanding to vreplvei.{w/d} + frint{rp/rz/rm/rne}.{s.d}.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vect-frint-scalar.c: New test.
	* gcc.target/loongarch/vect-frint-scalar-no-inexact.c: New test.
---
 gcc/config/loongarch/simd.md                  | 29 +++++++++++++
 .../loongarch/vect-frint-scalar-no-inexact.c  | 23 ++++++++++
 .../gcc.target/loongarch/vect-frint-scalar.c  | 43 +++++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 6937477e3df..e592de49aa0 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -150,6 +150,35 @@ (define_expand "ftrunc<mode>2"
 		     UNSPEC_SIMD_FRINTRZ))]
   "")
 
+;; Use LSX for scalar ceil/floor/trunc/roundeven when -mlsx and -ffp-int-
+;; builtin-inexact.  The base FP instruction set lacks these operations.
+;; Yes we are wasting 50% or even 75% of the CPU horsepower, but it's still
+;; much faster than calling a libc function: on LA464 and LA664 there is a
+;; 3x ~ 5x speed up.
+;;
+;; Note that a vreplvei instruction is needed or we'll also operate on the
+;; junk in high bits of the vector register and produce random FP exceptions.
+
+(define_int_iterator LSX_SCALAR_FRINT
+  [UNSPEC_SIMD_FRINTRP
+   UNSPEC_SIMD_FRINTRZ
+   UNSPEC_SIMD_FRINTRM
+   UNSPEC_SIMD_FRINTRNE])
+
+(define_mode_attr VLSX_FOR_FMODE [(DF "V2DF") (SF "V4SF")])
+
+(define_expand "<simd_frint_pattern><mode>2"
+  [(set (match_dup 2)
+     (vec_duplicate:<VLSX_FOR_FMODE>
+       (match_operand:ANYF 1 "register_operand")))
+   (set (match_dup 2)
+	(unspec:<VLSX_FOR_FMODE> [(match_dup 2)] LSX_SCALAR_FRINT))
+   (set (match_operand:ANYF 0 "register_operand")
+	(vec_select:ANYF (match_dup 2) (parallel [(const_int 0)])))
+   (clobber (match_scratch:<VLSX_FOR_FMODE> 3))]
+  "ISA_HAS_LSX && (flag_fp_int_builtin_inexact || !flag_trapping_math)"
+  "operands[2] = gen_reg_rtx (<VLSX_FOR_FMODE>mode);")
+
 ;; <x>vftint.{/rp/rz/rm}
 (define_insn
   "<simd_isa>_<x>vftint<simd_frint_rounding>_<simdifmt_for_f>_<simdfmt>"
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
new file mode 100644
index 00000000000..002e3b92df7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx -fno-fp-int-builtin-inexact" } */
+
+#include "vect-frint-scalar.c"
+
+/* cannot use LSX for these with -fno-fp-int-builtin-inexact,
+   call library function.  */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceil\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceilf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floor\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floorf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(trunc\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(truncf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundeven\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundevenf\\)" } } */
+
+/* nearbyint is not allowed to rasie FE_INEXACT for decades */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyintf\\)" } } */
+
+/* rint should just use basic FP operation */
+/* { dg-final { scan-assembler "\tfrint\.s" } } */
+/* { dg-final { scan-assembler "\tfrint\.d" } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
new file mode 100644
index 00000000000..c7cb40be7d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx" } */
+
+#define test(func, suffix) \
+__typeof__ (1.##suffix) \
+_##func##suffix (__typeof__ (1.##suffix) x) \
+{ \
+  return __builtin_##func##suffix (x); \
+}
+
+test (ceil, f)
+test (ceil, )
+test (floor, f)
+test (floor, )
+test (trunc, f)
+test (trunc, )
+test (roundeven, f)
+test (roundeven, )
+test (nearbyint, f)
+test (nearbyint, )
+test (rint, f)
+test (rint, )
+
+/* { dg-final { scan-assembler "\tvfrintrp\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrp\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.d" } } */
+
+/* must do vreplvei first */
+/* { dg-final { scan-assembler-times "\tvreplvei\.w\t\\\$vr0,\\\$vr0,0" 4 } } */
+/* { dg-final { scan-assembler-times "\tvreplvei\.d\t\\\$vr0,\\\$vr0,0" 4 } } */
+
+/* nearbyint is not allowed to rasie FE_INEXACT for decades */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyintf\\)" } } */
+
+/* rint should just use basic FP operation */
+/* { dg-final { scan-assembler "\tfrint\.s" } } */
+/* { dg-final { scan-assembler "\tfrint\.d" } } */
-- 
2.42.1

next prev parent reply	other threads:[~2023-11-20  0:48 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-20  0:47 [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
2023-11-20  0:47 ` [PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578] Xi Ruoyao
2023-11-23  6:35   ` chenglulu
2023-11-23  7:11     ` Xi Ruoyao
2023-11-23  7:31       ` chenglulu
2023-11-23  8:13         ` chenglulu
2023-11-23  9:02           ` Xi Ruoyao
2023-11-23  9:12             ` chenglulu
2023-11-23 10:12               ` Xi Ruoyao
2023-11-23 12:06                 ` Xi Ruoyao
2023-11-23 18:03                 ` Joseph Myers
2023-11-24  2:39                   ` Xi Ruoyao
2023-11-24  8:01                     ` chenglulu
2023-11-24  8:26                       ` Xi Ruoyao
2023-11-24  8:36                         ` chenglulu
2023-11-24  8:42                           ` Xi Ruoyao
2023-11-24  9:46                             ` chenglulu
2023-11-24 10:30                               ` Xi Ruoyao
2023-11-24 14:59                                 ` chenglulu
2023-11-23  8:54         ` Xi Ruoyao
2023-11-20  0:47 ` [PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions Xi Ruoyao
2023-11-23 12:08   ` chenglulu
2023-11-20  0:47 ` [PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift Xi Ruoyao
2023-11-23  8:42   ` chenglulu
2023-11-20  0:47 ` [PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact Xi Ruoyao
2023-11-23  8:23   ` chenglulu
2023-11-23  8:58     ` Xi Ruoyao
2023-11-23  9:14       ` chenglulu
2023-11-23 12:24         ` Xi Ruoyao
2023-11-23 14:39           ` chenglulu
2023-11-20  0:47 ` Xi Ruoyao [this message]
2023-11-29  7:12 ` Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations Xi Ruoyao
2023-11-29  7:45   ` chenglulu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231120004728.205167-6-xry111@xry111.site \
    --to=xry111@xry111.site \
    --cc=chenglulu@loongson.cn \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=i@xen0n.name \
    --cc=xuchenghua@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).