From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <chenglulu@sourceware.org>
Received: by sourceware.org (Postfix, from userid 7877)
	id 0B7613858C60; Fri, 26 Jan 2024 08:13:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0B7613858C60
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1706256791;
	bh=IEZt2zbdvk2sbM7ML+klBrZdDSF97KRDFQWIDVNIeak=;
	h=From:To:Subject:Date:From;
	b=VUQbAi4e7BDCS57V54+HswuOtzahspRMtf5VOWQH84TJaIFF0uNxPe7Y4rOckiKHH
	 o+wpNPRAChEMsxmbvbhzMJOTFx+Dka2JnqEfFN3lqW6SdZwm9NyTqV3wj6yvJYsJzf
	 lTyqFOSxHa/PUhMHVfK80WgTG48ywVf0kf5UDGc4=
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: LuluCheng <chenglulu@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r14-8444] LoongArch: Optimize implementation of single-precision
 floating-point approximate division.
X-Act-Checkin: gcc
X-Git-Author: Li Wei <liwei@loongson.cn>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: bfd6b36f08021f023e0e9223f5aea315b74a5c56
X-Git-Newrev: 58a27738c57fbf73c367b85537b714c29e0d5825
Message-Id: <20240126081311.0B7613858C60@sourceware.org>
Date: Fri, 26 Jan 2024 08:13:10 +0000 (GMT)
List-Id: <gcc-cvs.sourceware.org>

https://gcc.gnu.org/g:58a27738c57fbf73c367b85537b714c29e0d5825

commit r14-8444-g58a27738c57fbf73c367b85537b714c29e0d5825
Author: Li Wei <liwei@loongson.cn>
Date:   Wed Jan 24 17:44:17 2024 +0800

    LoongArch: Optimize implementation of single-precision floating-point approximate division.
    
    We found that in SPEC CPU 2017 521.wrf, some loop-invariant code generated
    for single-precision floating-point approximate division was not hoisted out
    of its loop. The cause is that the implementation of single-precision
    approximate division rewrote the pseudo-register holding an intermediate
    result, so the loop2_invariant pass could not recognize those computations
    as invariants. To fix this, each intermediate result is now stored in a new
    pseudo-register, which keeps the read-write dependencies intact and lets the
    loop2_invariant pass recognize them as loop invariants.
    After this optimization, the dynamic instruction count of 521.wrf is reduced
    by 0.18% (1716612948501 -> 1713471771364).
    
    gcc/ChangeLog:
    
            * config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/loongarch/invariant-recip.c: New test.

Diff:
---
 gcc/config/loongarch/loongarch.cc                  | 19 +++++++++----
 .../gcc.target/loongarch/invariant-recip.c         | 33 ++++++++++++++++++++++
 2 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index dba1252c8f71..b494040d165d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10847,16 +10847,23 @@ void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode)
   /* x0 = 1./b estimate.  */
   emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
 					      unspec)));
-  /* 2.0 - b * x0  */
+  /* e0 = 2.0 - b * x0.  */
   emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode,
 					   gen_rtx_NEG (mode, b), x0, mtwo)));
 
-  /* x0 = a * x0  */
   if (a != CONST1_RTX (mode))
-    emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0)));
-
-  /* res = e0 * x0  */
-  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+    {
+      rtx e1 = gen_reg_rtx (mode);
+      /* e1 = a * x0.  */
+      emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, a, x0)));
+      /* res = e0 * e1.  */
+      emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, e1)));
+    }
+  else
+    {
+      /* res = e0 * x0.  */
+      emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+    }
 }
 
 static bool
diff --git a/gcc/testsuite/gcc.target/loongarch/invariant-recip.c b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
new file mode 100644
index 000000000000..2f64f6ed5e50
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=loongarch64 -mabi=lp64d -mrecip -mfrecipe -fdump-rtl-loop2_invariant " } */
+/* { dg-final { scan-rtl-dump "Decided to move dependent invariant" "loop2_invariant" } } */
+
+void
+nislfv_rain_plm (int im, int km, float dzl[im][km], float rql[im][km],
+                 float dt)
+{
+  int i, k;
+  float con1, decfl;
+  float dz[km], qn[km], wi[km + 1];
+
+  for (i = 0; i < im; i++)
+    {
+      for (k = 0; k < km; k++)
+        {
+          dz[k] = dzl[i][k];
+        }
+      con1 = 0.05;
+      for (k = km - 1; k >= 0; k--)
+        {
+          decfl = (wi[k + 1] - wi[k]) * dt / dz[k];
+          if (decfl > con1)
+            {
+              wi[k] = wi[k + 1] - con1 * dz[k] / dt;
+            }
+        }
+      for (k = 0; k < km; k++)
+        {
+          rql[i][k] = qn[k];
+        }
+    }
+}