[PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns
@ 2022-06-09 13:44 Maciej W. Rozycki
  2022-06-23 13:39 ` Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2022-06-09 13:44 UTC
  To: gcc-patches; +Cc: Andrew Waterman, Jim Wilson, Kito Cheng, Palmer Dabbelt

We have unordered FP comparisons implemented as RTL insns that produce 
multiple machine instructions.  Such RTL insns are hard to match with a 
processor pipeline description and additionally there is a redundant 
SNEZ instruction produced on the result of these comparisons even though 
the FLT.fmt and FLE.fmt machine instructions already produce either 0 or 
1, e.g.:

long
flt (double x, double y)
{
  return __builtin_isless (x, y);
}

with `-O2 -fno-finite-math-only -fno-signaling-nans' gets compiled to:

	.globl	flt
	.type	flt, @function
flt:
	frflags	a5
	flt.d	a0,fa0,fa1
	fsflags	a5
	snez	a0,a0
	ret
	.size	flt, .-flt

because the middle end cannot see through the UNSPEC operation in terms 
of which unordered FP comparisons have been defined.

These instructions are only produced via an expander already, so change 
the expander to emit individual RTL insns for each machine instruction 
in the ultimate sequence produced rather than deferring to a single RTL 
insn producing the whole sequence at once.
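To make the intended run-time behaviour concrete, here is a rough C model 
(a hypothetical sketch, not GCC code) of the sequence the expander emits 
for `__builtin_isless' on double operands.  It assumes the RISC-V F/D 
semantics whereby FLT.D raises NV for any NaN operand while FEQ.D raises 
it for signalling NaNs only; `struct hart', `model_flt', `model_feq' and 
`quiet_lt' are made-up names, and operands are passed as raw binary64 
bit patterns so that signalling NaNs survive unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Invalid-operation bit of the RISC-V fflags CSR (NV is bit 4).  */
#define FFLAG_NV 0x10u

/* Hypothetical machine state: only the fflags CSR is modelled.  */
struct hart
{
  uint32_t fflags;
};

static_assert (sizeof (double) == sizeof (uint64_t), "binary64 expected");

static bool
is_nan64 (uint64_t bits)
{
  return (bits & 0x7ff0000000000000ull) == 0x7ff0000000000000ull
         && (bits & 0x000fffffffffffffull) != 0;
}

/* A signalling NaN has the quiet bit (mantissa MSB, bit 51) clear.  */
static bool
is_snan64 (uint64_t bits)
{
  return is_nan64 (bits) && (bits & 0x0008000000000000ull) == 0;
}

static double
as_double (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* FLT.D: signalling comparison, NV is raised for any NaN operand.  */
static int
model_flt (struct hart *h, uint64_t a, uint64_t b)
{
  if (is_nan64 (a) || is_nan64 (b))
    {
      h->fflags |= FFLAG_NV;
      return 0;
    }
  return as_double (a) < as_double (b);
}

/* FEQ.D: quiet comparison, NV is raised for signalling NaNs only.  */
static int
model_feq (struct hart *h, uint64_t a, uint64_t b)
{
  if (is_snan64 (a) || is_snan64 (b))
    h->fflags |= FFLAG_NV;
  if (is_nan64 (a) || is_nan64 (b))
    return 0;
  return as_double (a) == as_double (b);
}

/* frflags t; flt.d rd,a,b; fsflags t; plus feq.d zero,a,b with HONOR_SNANS.
   RD is already 0 or 1, which is why a trailing SNEZ is redundant.  */
static int
quiet_lt (struct hart *h, uint64_t a, uint64_t b, bool honor_snans)
{
  uint32_t saved = h->fflags;   /* frflags */
  int rd = model_flt (h, a, b); /* flt.d */
  h->fflags = saved;            /* fsflags */
  if (honor_snans)
    (void) model_feq (h, a, b); /* feq.d zero,a,b */
  return rd;
}
```

With a quiet NaN operand the saved flags come back unchanged; with a 
signalling NaN, NV ends up set only when the trailing FEQ.D is present, 
which is what the new *-snan.c tests scan for.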

	gcc/
	* config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant.
	(QUIET_PATTERN): New int attribute.
	(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended 
	RTL insns entirely within the preparation statements.
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove 
	insns.
	(*riscv_fsnvsnan<mode>2): New insn.

	gcc/testsuite/
	* gcc.target/riscv/fle-ieee.c: New test.
	* gcc.target/riscv/fle-snan.c: New test.
	* gcc.target/riscv/fle.c: New test.
	* gcc.target/riscv/flef-ieee.c: New test.
	* gcc.target/riscv/flef-snan.c: New test.
	* gcc.target/riscv/flef.c: New test.
	* gcc.target/riscv/flt-ieee.c: New test.
	* gcc.target/riscv/flt-snan.c: New test.
	* gcc.target/riscv/flt.c: New test.
	* gcc.target/riscv/fltf-ieee.c: New test.
	* gcc.target/riscv/fltf-snan.c: New test.
	* gcc.target/riscv/fltf.c: New test.
---
Hi,

 I think it is a step in the right direction; ultimately, however, we 
ought to tell GCC about the IEEE exception flags, so that the compiler 
can track data dependencies and we do not have to resort to UNSPECs, 
which the compiler cannot see through.  E.g. for a piece of code 
like:

long
fltlt (double x, double y, double z)
{
  return __builtin_isless (x, y) + __builtin_isless (x, z);
}

(using an addition here for clarity because for a logical operation even 
more horror is produced) we get:

	.globl	fltlt
	.type	fltlt, @function
fltlt:
	frflags	a5	# 8	[c=4 l=4]  riscv_frflags
	flt.d	a0,fa0,fa1	# 9	[c=4 l=4]  *cstoredfdi4
	fsflags	a5	# 10	[c=0 l=4]  riscv_fsflags
	frflags	a4	# 16	[c=4 l=4]  riscv_frflags
	flt.d	a5,fa0,fa2	# 17	[c=4 l=4]  *cstoredfdi4
	fsflags	a4	# 18	[c=0 l=4]  riscv_fsflags
	addw	a0,a0,a5	# 30	[c=8 l=4]  *addsi3_extended/0
	ret		# 40	[c=0 l=4]  simple_return
	.size	fltlt, .-fltlt

where the middle FSFLAGS/FRFLAGS pair makes no sense of course and is a 
waste of both space and cycles.
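Why the middle pair is pure waste can be checked with a small C model 
(hypothetical, with made-up names, ignoring everything except the fflags 
state): as long as nothing between the first FSFLAGS and the second 
FRFLAGS reads or changes the flags, restoring and immediately re-saving 
them is a no-op, so a single save/restore bracket around both FLT.D 
instructions yields the same results and the same final flags.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define FFLAG_NV 0x10u  /* RISC-V fflags invalid-operation bit */

/* FLT.D as seen by fflags: any NaN operand raises NV and yields 0.  */
static int
flt (uint32_t *fflags, double a, double b)
{
  if (isnan (a) || isnan (b))
    {
      *fflags |= FFLAG_NV;
      return 0;
    }
  return a < b;
}

/* As currently generated: one frflags/fsflags bracket per comparison.  */
static int
two_brackets (uint32_t *fflags, double x, double y, double z)
{
  uint32_t t1 = *fflags;        /* frflags a5 */
  int r1 = flt (fflags, x, y);
  *fflags = t1;                 /* fsflags a5 */
  uint32_t t2 = *fflags;        /* frflags a4: reads back t1 */
  int r2 = flt (fflags, x, z);
  *fflags = t2;                 /* fsflags a4 */
  return r1 + r2;
}

/* The same with the middle fsflags/frflags pair dropped.  */
static int
one_bracket (uint32_t *fflags, double x, double y, double z)
{
  uint32_t t = *fflags;         /* frflags */
  int r1 = flt (fflags, x, y);
  int r2 = flt (fflags, x, z);
  *fflags = t;                  /* fsflags */
  return r1 + r2;
}

/* Compare both forms over a few operands and initial flag states.  */
static void
check_equivalence (void)
{
  const double v[] = { -0.5, 1.0, 2.0, NAN };
  const uint32_t init[] = { 0u, 0x01u /* NX */, FFLAG_NV };
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      for (int k = 0; k < 4; k++)
        for (int m = 0; m < 3; m++)
          {
            uint32_t f1 = init[m], f2 = init[m];
            assert (two_brackets (&f1, v[i], v[j], v[k])
                    == one_bracket (&f2, v[i], v[j], v[k]));
            assert (f1 == init[m] && f2 == init[m]);
          }
}
```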

 I'm still running some benchmarks to see whether the use of 
UNSPEC_VOLATILEs makes any noticeable performance difference, but I 
suspect it does not, as the compiler could not do much about the 
original multiple-instruction single RTL insns anyway.

 No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc 
testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for 
glibc testsuite.")).  OK to apply?

  Maciej
---
 gcc/config/riscv/riscv.md                  |   67 +++++++++++++++--------------
 gcc/testsuite/gcc.target/riscv/fle-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef.c      |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf.c      |   12 +++++
 13 files changed, 179 insertions(+), 32 deletions(-)

gcc-riscv-fcmp-split.diff
Index: gcc/gcc/config/riscv/riscv.md
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -57,6 +57,7 @@
   ;; Floating-point unspecs.
   UNSPECV_FRFLAGS
   UNSPECV_FSFLAGS
+  UNSPECV_FSNVSNAN
 
   ;; Interrupt handler instructions.
   UNSPECV_MRET
@@ -360,6 +361,7 @@
 ;; Iterator and attributes for quiet comparisons.
 (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
 (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
+(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
 
 ;; This code iterator allows signed and unsigned widening multiplications
 ;; to use the same template.
@@ -2326,39 +2328,31 @@
    (set_attr "mode" "<UNITMODE>")])
 
 (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4"
-   [(parallel [(set (match_operand:X      0 "register_operand")
-		    (unspec:X
-		     [(match_operand:ANYF 1 "register_operand")
-		      (match_operand:ANYF 2 "register_operand")]
-		     QUIET_COMPARISON))
-	       (clobber (match_scratch:X 3))])]
-  "TARGET_HARD_FLOAT")
-
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default"
-   [(set (match_operand:X      0 "register_operand" "=r")
-	 (unspec:X
-	  [(match_operand:ANYF 1 "register_operand" " f")
-	   (match_operand:ANYF 2 "register_operand" " f")]
-	  QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)"
-  "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 12))])
+   [(set (match_operand:X               0 "register_operand")
+	 (unspec:X [(match_operand:ANYF 1 "register_operand")
+		    (match_operand:ANYF 2 "register_operand")]
+		   QUIET_COMPARISON))]
+  "TARGET_HARD_FLOAT"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2);
+  rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx),
+					 UNSPECV_FRFLAGS);
+  rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp),
+					 UNSPECV_FSFLAGS);
 
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan"
-   [(set (match_operand:X      0 "register_operand" "=r")
-	 (unspec:X
-	  [(match_operand:ANYF 1 "register_operand" " f")
-	   (match_operand:ANYF 2 "register_operand" " f")]
-	  QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)"
-  "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 16))])
+  emit_insn (gen_rtx_SET (tmp, frflags));
+  emit_insn (gen_rtx_SET (op0, cmp));
+  emit_insn (fsflags);
+  if (HONOR_SNANS (<ANYF:MODE>mode))
+    emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode,
+					gen_rtvec (2, op1, op2),
+					UNSPECV_FSNVSNAN));
+  DONE;
+})
 
 (define_insn "*seq_zero_<X:mode><GPR:mode>"
   [(set (match_operand:GPR       0 "register_operand" "=r")
@@ -2766,6 +2760,15 @@
   "TARGET_HARD_FLOAT"
   "fsflags\t%0")
 
+(define_insn "*riscv_fsnvsnan<mode>2"
+  [(unspec_volatile [(match_operand:ANYF 0 "register_operand")
+		     (match_operand:ANYF 1 "register_operand")]
+		    UNSPECV_FSNVSNAN)]
+  "TARGET_HARD_FLOAT"
+  "feq.<fmt>\tzero,%0,%1"
+  [(set_attr "type" "fcmp")
+   (set_attr "mode" "<UNITMODE>")])
+
 (define_insn "riscv_mret"
   [(return)
    (unspec_volatile [(const_int 0)] UNSPECV_MRET)]
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */


* Re: [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns
  2022-06-09 13:44 [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns Maciej W. Rozycki
@ 2022-06-23 13:39 ` Maciej W. Rozycki
  2022-06-23 16:44   ` Kito Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2022-06-23 13:39 UTC
  To: gcc-patches; +Cc: Andrew Waterman, Jim Wilson, Kito Cheng, Palmer Dabbelt

On Thu, 9 Jun 2022, Maciej W. Rozycki wrote:

>  I'm still running some benchmarks to see whether the use of 
> UNSPEC_VOLATILEs makes any noticeable performance difference, but I 
> suspect it does not, as the compiler could not do much about the 
> original multiple-instruction single RTL insns anyway.

 This has now finally completed.  I used SPECfp 2006 built at `-O3' and 
statically linked, which needs ~33 hours per run with the HiFive Unmatched 
board at its standard 1196MHz clock rate.  Here are the results, merged 
by hand from the original reports:

               Base     Base     Base     Base   Est Base  Est Base  Est Base   
Benchmarks     Ref.   Run Time Run Time Run Time   Ratio     Ratio     Ratio  
                       (base)  (length) (split)   (base)   (length)   (split)  
------------- ------  -------  -------  -------  --------  --------  -------- 
410.bwaves     13590    10353    10396    10370    1.31      1.31      1.31   
416.gamess     19580     9080     9410     9284    2.16      2.08      2.11   
433.milc        9180     5465     5475     5610    1.68      1.68      1.64   
434.zeusmp      9100     5773     5767     5761    1.58      1.58      1.58   
435.gromacs     7140     3605     3561     3545    1.98      2.00      2.01   
436.cactusADM  11950     7779     7658     7680    1.54      1.56      1.56   
437.leslie3d    9400    10280    10697    10274    0.914     0.879     0.915  
444.namd        8020     3141     3120     3129    2.55      2.57      2.56   
447.dealII     11440     3459     3490     3495    3.31      3.28      3.27   
450.soplex      8340     4698     4899     4781    1.78      1.70      1.74   
453.povray      5320     1953     1922     1916    2.72      2.77      2.78   
454.calculix    8250     4844     4857     4821    1.70      1.70      1.71   
459.GemsFDTD   10610     8703     8957     9028    1.22      1.18      1.18   
465.tonto       9840     4585     4539     4620    2.15      2.17      2.13   
470.lbm        13740    10172    10945    10789    1.35      1.26      1.27   
481.wrf        11170     8516     8646     8584    1.31      1.29      1.30   
482.sphinx3    19490     9240     9267     9280    2.11      2.10      2.10   
==============================================================================

The execution time reference (second column) is for a Sun Ultra Enterprise 
2 system from 1997, based on a 296MHz UltraSPARC II CPU.  Times are given 
in seconds (lower is better) and the ratios are calculated relative to 
the reference (higher is better).
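As a quick sanity check of how the ratio columns relate to the times, 
each ratio is simply the reference time divided by the measured run time. 
The C snippet below (an illustration using a few rows copied from the 
table, not SPEC tooling; `spec_ratio' and `print_ratios' are made-up 
names) reproduces the printed figures when rounded to three significant 
digits.

```c
#include <assert.h>
#include <stdio.h>

struct spec_row
{
  const char *name;
  double ref, base, length, split;      /* seconds */
};

/* Ratio relative to the reference machine: higher is better.  */
static double
spec_ratio (double ref_s, double run_s)
{
  assert (run_s > 0.0);
  return ref_s / run_s;
}

static const struct spec_row rows[] = {
  { "416.gamess",   19580,  9080,  9410,  9284 },
  { "437.leslie3d",  9400, 10280, 10697, 10274 },
  { "470.lbm",      13740, 10172, 10945, 10789 },
};

/* Print the base/length/split ratios with three significant digits.  */
void
print_ratios (void)
{
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    printf ("%-13s %6.3g %6.3g %6.3g\n", rows[i].name,
            spec_ratio (rows[i].ref, rows[i].base),
            spec_ratio (rows[i].ref, rows[i].length),
            spec_ratio (rows[i].ref, rows[i].split));
}
```

For these rows it yields 2.16/2.08/2.11, 0.914/0.879/0.915 and 
1.35/1.26/1.27, matching the table.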

In the table above "base" results are with upstream master as at commit 
7b98910406b5 ("c++: ICE with template NEW_EXPR [PR105803]").  Then "length" 
results are with commit 72b185189f91 ("RISC-V: Reset the length to the 
default of 4 for FP comparisons") applied on top, as it does make changes 
to code produced even at `-O3' (where size matters less than speed), e.g.:

    46b2c:	8d01b787          	fld	fa5,-1840(gp) # 7760a8 <__SDATA_BEGIN__+0xd0>
-   46b30:	66f4b027          	fsd	fa5,1632(s1)
-   46b34:	a029                	j	46b3e <gciinp_+0x124>
-   46b36:	8c01b787          	fld	fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0>
-   46b3a:	66f4b027          	fsd	fa5,1632(s1)
-   46b3e:	00ab67b7          	lui	a5,0xab6
-   46b42:	0a07b707          	fld	fa4,160(a5) # ab60a0 <runopt_>
-   46b46:	8d81b787          	fld	fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
-   46b4a:	a2f727d3          	feq.d	a5,fa4,fa5
-   46b4e:	18079fe3          	bnez	a5,474ec <gciinp_+0xad2>
-   46b52:	00afd7b7          	lui	a5,0xafd
-   46b56:	4607a703          	lw	a4,1120(a5) # afd460 <symtry_+0x47340>
-   46b5a:	4785                	li	a5,1
-   46b5c:	18f708e3          	beq	a4,a5,474ec <gciinp_+0xad2>
-   46b60:	00aaeab7          	lui	s5,0xaae
-   46b64:	d70a8a93          	addi	s5,s5,-656 # aadd70 <infoa_>
-   46b68:	008aa783          	lw	a5,8(s5)
-   46b6c:	8301b707          	fld	fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30>
-   46b70:	37fd                	addiw	a5,a5,-1
+   46b30:	00ab67b7          	lui	a5,0xab6
+   46b34:	0a07b707          	fld	fa4,160(a5) # ab60a0 <runopt_>
+   46b38:	66f4b027          	fsd	fa5,1632(s1)
+   46b3c:	8d81b787          	fld	fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
+   46b40:	a2f727d3          	feq.d	a5,fa4,fa5
+   46b44:	c39d                	beqz	a5,46b6a <gciinp_+0x150>
+   46b46:	8901b787          	fld	fa5,-1904(gp) # 776068 <__SDATA_BEGIN__+0x90>
+   46b4a:	66f4b027          	fsd	fa5,1632(s1)
+   46b4e:	a02d                	j	46b78 <gciinp_+0x15e>
+   46b50:	8c01b787          	fld	fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0>
+   46b54:	66f4b027          	fsd	fa5,1632(s1)
+   46b58:	00ab67b7          	lui	a5,0xab6
+   46b5c:	0a07b707          	fld	fa4,160(a5) # ab60a0 <runopt_>
+   46b60:	8d81b787          	fld	fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
+   46b64:	a2f727d3          	feq.d	a5,fa4,fa5
+   46b68:	fff9                	bnez	a5,46b46 <gciinp_+0x12c>
+   46b6a:	00afd7b7          	lui	a5,0xafd
+   46b6e:	4607a703          	lw	a4,1120(a5) # afd460 <symtry_+0x47340>
+   46b72:	4785                	li	a5,1
+   46b74:	fcf709e3          	beq	a4,a5,46b46 <gciinp_+0x12c>
+   46b78:	00aaeab7          	lui	s5,0xaae
+   46b7c:	d70a8a93          	addi	s5,s5,-656 # aadd70 <infoa_>
+   46b80:	008aa783          	lw	a5,8(s5)
+   46b84:	8301b707          	fld	fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30>
+   46b88:	37fd                	addiw	a5,a5,-1

And finally "split" is with this patch also applied, changing code in 
places as well, e.g.:

@@ -4873598,13 +4873598,13 @@
   5f5744:	87bf70ef          	jal	ra,5ecfbe <_gfortrani_internal_error>
   5f5748:	8281b407          	fld	fs0,-2008(gp) # 776000 <__SDATA_BEGIN__+0x28>
   5f574c:	221c                	fld	fa5,0(a2)
-  5f574e:	0079a7b7          	lui	a5,0x79a
-  5f5752:	ac22                	fsd	fs0,24(sp)
-  5f5754:	a83e                	fsd	fa5,16(sp)
-  5f5756:	27c2                	fld	fa5,16(sp)
-  5f5758:	4907b707          	fld	fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8>
-  5f575c:	22f7a7d3          	fabs.d	fa5,fa5
-  5f5760:	00102773          	frflags	a4
+  5f574e:	ac22                	fsd	fs0,24(sp)
+  5f5750:	a83e                	fsd	fa5,16(sp)
+  5f5752:	27c2                	fld	fa5,16(sp)
+  5f5754:	22f7a7d3          	fabs.d	fa5,fa5
+  5f5758:	00102773          	frflags	a4
+  5f575c:	0079a7b7          	lui	a5,0x79a
+  5f5760:	4907b707          	fld	fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8>
   5f5764:	a2e787d3          	fle.d	a5,fa5,fa4
   5f5768:	00171073          	fsflags	a4
   5f576c:	2c078363          	beqz	a5,5f5a32 <determine_en_precision+0x328>

or:

@@ -4909410,9 +4909410,9 @@
   60eb8a:	a2f696d3          	flt.d	a3,fa3,fa5
   60eb8e:	00161073          	fsflags	a2
   60eb92:	ee81                	bnez	a3,60ebaa <__hypot+0x58>
-  60eb94:	f20707d3          	fmv.d.x	fa5,a4
-  60eb98:	22f7a7d3          	fabs.d	fa5,fa5
-  60eb9c:	00102673          	frflags	a2
+  60eb94:	00102673          	frflags	a2
+  60eb98:	f20707d3          	fmv.d.x	fa5,a4
+  60eb9c:	22f7a7d3          	fabs.d	fa5,fa5
   60eba0:	a2f696d3          	flt.d	a3,fa3,fa5
   60eba4:	00161073          	fsflags	a2
   60eba8:	c29d                	beqz	a3,60ebce <__hypot+0x7c>

(so no arithmetic FP instructions appear to be scheduled between FRFLAGS 
and FSFLAGS, though it's not clear to me how the compiler knows it is not 
allowed to do it) or finally:

-   66204:	52cd754b          	fnmsub.d	fa0,fs10,fa2,fa0
-   66208:	40157553          	fcvt.s.d	fa0,fa0
-   6620c:	a0e517d3          	flt.s	a5,fa0,fa4
-   66210:	58079263          	bnez	a5,66794 <do_cg+0xd20>
-   66214:	00102773          	frflags	a4
-   66218:	a0e517d3          	flt.s	a5,fa0,fa4
-   6621c:	00171073          	fsflags	a4
-   66220:	220793e3          	bnez	a5,66c46 <do_cg+0x11d2>
-   66224:	580576d3          	fsqrt.s	fa3,fa0
+   6620c:	52cd754b          	fnmsub.d	fa0,fs10,fa2,fa0
+   66210:	40157553          	fcvt.s.d	fa0,fa0
+   66214:	a0e517d3          	flt.s	a5,fa0,fa4
+   66218:	58079063          	bnez	a5,66798 <do_cg+0xd1c>
+   6621c:	00102773          	frflags	a4
+   66220:	00171073          	fsflags	a4
+   66224:	220793e3          	bnez	a5,66c4a <do_cg+0x11ce>
+   66228:	580576d3          	fsqrt.s	fa3,fa0

(at least removing a redundant FLT.S instruction, although this doesn't 
seem optimal anyway, as there appears to be no way for the second BNEZ 
branch ever to be taken, but I gather that's an unfortunate consequence 
of the volatility of `riscv_frflags'/`riscv_fsflags' RTL insns) and I was 
able to spot a place where an FMV.D instruction has been removed too, 
indicating better register allocation.

 Results quoted above seem to suggest that in some cases a performance 
regression has resulted from the change, but that may not necessarily be 
the case given that the benchmarks have been run on a live, if lightly 
loaded, Linux system.  Obtaining the standard three samples would require 
~4.5 days per SPECfp 2006 iteration, or almost a fortnight in total.

 Therefore I chose to rerun only one of the worst offenders and the 
results are as follows:

                                  Estimated  
                Base     Base       Base     
Benchmarks      Ref.   Run Time     Ratio    
-------------- ------  ---------  ---------  
416.gamess      19580       9138       2.14 S
416.gamess      19580       9498       2.06 S
416.gamess      19580       9478       2.07 *

corresponding to the "split" result earlier on.  So the variation between 
runs is similar to the supposed loss of performance and therefore I think 
we do not need to be concerned.  If there's anything that we're missing, 
it's the tracking of IEEE exception flags, as I previously mentioned.
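The argument can be put in numbers with a tiny C helper (illustrative 
only; `pct_change' and `spread_pct' are made-up names): the 416.gamess 
"split" run was about 2.2% slower than "base" (9284s vs 9080s), whereas 
the three reruns of the "split" build spread from 9138s to 9498s, i.e. 
about 3.9%.

```c
#include <assert.h>
#include <stddef.h>

/* Relative change from FROM to TO, in percent.  */
static double
pct_change (double from, double to)
{
  assert (from != 0.0);
  return (to - from) / from * 100.0;
}

/* Spread of N run times, (max - min) / min, in percent.  */
static double
spread_pct (const double *t, size_t n)
{
  assert (n > 0);
  double lo = t[0], hi = t[0];
  for (size_t i = 1; i < n; i++)
    {
      if (t[i] < lo)
        lo = t[i];
      if (t[i] > hi)
        hi = t[i];
    }
  return pct_change (lo, hi);
}
```

So the base-to-split gap is smaller than the spread seen between 
repeated runs of a single configuration.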

 I did not run benchmarking for `-fsignaling-nans'.  Relative figures are 
expected to be similar as the only difference is the presence of a FEQ.fmt 
instruction following FSFLAGS.  I've spotted this anomaly however:

-   3c670:	5a057553          	fsqrt.d	fa0,fa0
-   3c674:	f20006d3          	fmv.d.x	fa3,zero
-   3c678:	8d01b787          	fld	fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0>
-   3c67c:	0ad57553          	fsub.d	fa0,fa0,fa3
-   3c680:	a2a797d3          	flt.d	a5,fa5,fa0
-   3c684:	e3cd                	bnez	a5,3c726 <_ZN9ResultSet5checkEv+0xe6>
-   3c686:	8d81b787          	fld	fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8>
-   3c68a:	a2f517d3          	flt.d	a5,fa0,fa5
-   3c68e:	efc1                	bnez	a5,3c726 <_ZN9ResultSet5checkEv+0xe6>
-   3c690:	3578                	fld	fa4,232(a0)
-   3c692:	317c                	fld	fa5,224(a0)
-   3c694:	3968                	fld	fa0,240(a0)
-   3c696:	12e77753          	fmul.d	fa4,fa4,fa4
-   3c69a:	72f7f7c3          	fmadd.d	fa5,fa5,fa5,fa4
-   3c69e:	7aa57543          	fmadd.d	fa0,fa0,fa0,fa5
-   3c6a2:	00102773          	frflags	a4
-   3c6a6:	a2d517d3          	flt.d	a5,fa0,fa3
-   3c6aa:	00171073          	fsflags	a4
-   3c6ae:	a2d52053          	feq.d	zero,fa0,fa3
-   3c6b2:	efc9                	bnez	a5,3c74c <_ZN9ResultSet5checkEv+0x10c>
-   3c6b4:	5a057553          	fsqrt.d	fa0,fa0
+   3c66e:	5a057553          	fsqrt.d	fa0,fa0
+   3c672:	f20006d3          	fmv.d.x	fa3,zero
+   3c676:	8d01b787          	fld	fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0>
+   3c67a:	0ad57553          	fsub.d	fa0,fa0,fa3
+   3c67e:	a2a797d3          	flt.d	a5,fa5,fa0
+   3c682:	e7cd                	bnez	a5,3c72c <_ZN9ResultSet5checkEv+0xee>
+   3c684:	8d81b787          	fld	fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8>
+   3c688:	a2f517d3          	flt.d	a5,fa0,fa5
+   3c68c:	e3c5                	bnez	a5,3c72c <_ZN9ResultSet5checkEv+0xee>
+   3c68e:	3578                	fld	fa4,232(a0)
+   3c690:	317c                	fld	fa5,224(a0)
+   3c692:	3968                	fld	fa0,240(a0)
+   3c694:	12e77753          	fmul.d	fa4,fa4,fa4
+   3c698:	72f7f7c3          	fmadd.d	fa5,fa5,fa5,fa4
+   3c69c:	7aa57543          	fmadd.d	fa0,fa0,fa0,fa5
+   3c6a0:	00102773          	frflags	a4
+   3c6a4:	a2d517d3          	flt.d	a5,fa0,fa3
+   3c6a8:	00171073          	fsflags	a4
+   3c6ac:	f20007d3          	fmv.d.x	fa5,zero
+   3c6b0:	a2f52053          	feq.d	zero,fa0,fa5
+   3c6b4:	efd9                	bnez	a5,3c752 <_ZN9ResultSet5checkEv+0x114>
+   3c6b6:	5a057553          	fsqrt.d	fa0,fa0

where the compiler for some reason cannot realise it already has the value 
of 0.0 available in fa3 and instead uses an extra move to fa5 for the 
final FEQ.D.

>  No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc 
> testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for 
> glibc testsuite.")).  OK to apply?

 Any comments on the change, anyone?

  Maciej


* Re: [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns
  2022-06-23 13:39 ` Maciej W. Rozycki
@ 2022-06-23 16:44   ` Kito Cheng
  2022-07-04 14:12     ` [PATCH v2] " Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Kito Cheng @ 2022-06-23 16:44 UTC
  To: Maciej W. Rozycki; +Cc: GCC Patches, Andrew Waterman

[-- Attachment #1: Type: text/plain, Size: 3376 bytes --]

Hi Maciej:

Thanks for detail analysis and performance number report, I am concern
about this patch might let compiler schedule the fsflags/frflags with
other floating point instructions, and the major issue is we didn't
model fflags right in GCC as you mentioned in the first email.

So I think we should model this properly before we split these patterns;
I guess that would be a fair amount of work:
1. Add fflags to the hard register list.
2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop
(operands...))) to those floating-point operations which might change
fflags.
3. Rewrite the riscv_frflags and riscv_fsflags patterns with proper RTL:
(set (reg) (reg fflags)) and (set (reg fflags) (reg)).
4. Then split the *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and
*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan patterns.
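The effect of steps 1-3 on instruction ordering can be illustrated with 
a generic dependence check (a toy C sketch with hypothetical names, not 
GCC's scheduler): once FLT.D is recorded as clobbering fflags and 
FRFLAGS/FSFLAGS as reading/writing it, the comparison can no longer be 
swapped with either of them, while instructions that leave fflags alone, 
such as FABS.D, remain free to move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource bits; fflags is treated like any other register.  */
enum
{
  R_FFLAGS = 1u << 0,
  R_A2 = 1u << 1,
  R_A3 = 1u << 2,
  R_FA3 = 1u << 3,
  R_FA5 = 1u << 4
};

struct insn
{
  const char *text;
  uint32_t reads, writes;
};

/* Adjacent A and B may be swapped only without RAW, WAR or WAW conflicts.  */
static bool
can_swap (const struct insn *a, const struct insn *b)
{
  return (a->writes & (b->reads | b->writes)) == 0
         && (b->writes & a->reads) == 0;
}

static const struct insn frflags = { "frflags a2", R_FFLAGS, R_A2 };
static const struct insn fsflags = { "fsflags a2", R_A2, R_FFLAGS };
static const struct insn fabs_d = { "fabs.d fa5,fa5", R_FA5, R_FA5 };
/* FLT.D without and with the (clobber (reg fflags)) of step 2.  */
static const struct insn flt_plain
  = { "flt.d a3,fa3,fa5", R_FA3 | R_FA5, R_A3 };
static const struct insn flt_clobber
  = { "flt.d a3,fa3,fa5", R_FA3 | R_FA5, R_A3 | R_FFLAGS };

static void
check_examples (void)
{
  /* Without fflags in the model nothing pins FLT.D inside the bracket.  */
  assert (can_swap (&frflags, &flt_plain));
  assert (can_swap (&flt_plain, &fsflags));
  /* With the clobber, both moves are rejected.  */
  assert (!can_swap (&frflags, &flt_clobber));
  assert (!can_swap (&flt_clobber, &fsflags));
  /* Instructions that leave fflags alone may still move freely.  */
  assert (can_swap (&fabs_d, &frflags));
}
```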

However, I am not sure about the code generation impact of step 2,
especially on the combine pass. Would you be interested in giving it a
try?

I also hacked up part of this approach (steps 1, 3 and 4) and got the
following result for "__builtin_isless (x, y) + __builtin_isless (x, z)":

fltlt:
       frflags a4      # 8     [c=4 l=4]  riscv_frflags
       flt.d   a5,fa0,fa1      # 14    [c=4 l=4]  *cstoredfdi4
       flt.d   a0,fa0,fa2      # 17    [c=4 l=4]  *cstoredfdi4
       fsflags a4      # 18    [c=4 l=4]  riscv_fsflags
       add     a0,a0,a5        # 30    [c=4 l=4]  adddi3/0
       ret             # 40    [c=0 l=4]  simple_return

Verbose version:
fltlt:
#(insn 8 5 9 (set (reg:SI 14 a4 [88])
#        (reg:SI 66 fflags)) "x.c":5:10 258 {riscv_frflags}
#     (expr_list:REG_DEAD (reg:SI 66 fflags)
#        (nil)))
       frflags a4      # 8     [c=4 l=4]  riscv_frflags
#(insn 14 11 15 (parallel [
#            (set (reg:DI 15 a5 [90])
#                (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81])
#                    (reg:DF 43 fa1 [101])))
#            (clobber:SI (reg:SI 66 fflags))
#        ]) "x.c":5:10 197 {*cstoredfdi4}
#     (expr_list:REG_DEAD (reg:DF 43 fa1 [101])
#        (expr_list:REG_UNUSED (reg:SI 66 fflags)
#            (nil))))
       flt.d   a5,fa0,fa1      # 14    [c=4 l=4]  *cstoredfdi4
#(insn 17 15 18 (parallel [
#            (set (reg:DI 10 a0 [94])
#                (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81])
#                    (reg:DF 44 fa2 [102])))
#            (clobber:SI (reg:SI 66 fflags))
#        ]) "x.c":5:36 197 {*cstoredfdi4}
#     (expr_list:REG_DEAD (reg:DF 44 fa2 [102])
#        (expr_list:REG_DEAD (reg/v:DF 42 fa0 [orig:81 x ] [81])
#            (expr_list:REG_UNUSED (reg:SI 66 fflags)
#                (nil)))))
       flt.d   a0,fa0,fa2      # 17    [c=4 l=4]  *cstoredfdi4
#(insn 18 17 19 (set (reg:SI 66 fflags)
#        (reg:SI 14 a4 [88])) "x.c":5:36 259 {riscv_fsflags}
#     (expr_list:REG_DEAD (reg:SI 14 a4 [88])
#        (nil)))
       fsflags a4      # 18    [c=4 l=4]  riscv_fsflags
#(insn 30 25 31 (set (reg/i:DI 10 a0)
#        (plus:DI (reg:DI 10 a0 [94])
#            (reg:DI 15 a5 [90]))) "x.c":6:1 4 {adddi3}
#     (expr_list:REG_DEAD (reg:DI 15 a5 [90])
#        (nil)))
       add     a0,a0,a5        # 30    [c=4 l=4]  adddi3/0
#(jump_insn 40 39 41 (simple_return) "x.c":6:1 244 {simple_return}
#     (nil)
# -> simple_return)
       ret             # 40    [c=0 l=4]  simple_return
----

But this hack adds an extra use of fflags to prevent FFLAGS from getting
CSEed; patch attached.
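One plausible reading of why the extra use is needed: once fflags is a real hard register, dataflow treats FSFLAGS like any other register store, and a store that nothing reads afterwards is dead. The toy model below applies that rule to the sequence above. `struct insn` and `fflags_write_is_dead` are hypothetical names for illustration, not GCC internals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of an insn stream in which fflags is an
   ordinary hard register.  Not GCC's data structures.  */
struct insn
{
  const char *name;
  bool reads_fflags;   /* e.g. frflags, or a USE of fflags */
  bool writes_fflags;  /* e.g. fsflags, or the CLOBBER in flt.d */
};

/* Return true if the fflags write done by SEQ[I] can never be observed:
   a later insn overwrites fflags before anything reads it, or the end
   of the stream is reached without any read.  */
static bool
fflags_write_is_dead (const struct insn *seq, size_t n, size_t i)
{
  for (size_t k = i + 1; k < n; k++)
    {
      if (seq[k].reads_fflags)
        return false;   /* the value is observed: the write is live */
      if (seq[k].writes_fflags)
        return true;    /* overwritten first: the write is dead */
    }
  return true;          /* nothing after it reads fflags */
}
```

Without a trailing USE the restoring FSFLAGS is reported dead, and with one it is live. Presumably the same rule explains why only one FSFLAGS survives in the `fltlt' output: once the second FRFLAGS has been replaced by the value already known to be in fflags, the first restore is overwritten by the FLT.D clobber before anything reads it.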

[-- Attachment #2: 0001-model-fflags.patch --]
[-- Type: text/x-patch, Size: 6664 bytes --]

From 1116422bb5a69d8361f5c5bc334a122fecbaa306 Mon Sep 17 00:00:00 2001
From: Kito Cheng <kito.cheng@sifive.com>
Date: Fri, 24 Jun 2022 00:41:55 +0800
Subject: [PATCH] model fflags

---
 gcc/config/riscv/riscv.h  | 12 +++++------
 gcc/config/riscv/riscv.md | 42 ++++++++++++++++++++-------------------
 gcc/function.cc           |  1 +
 3 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6f7f4d3fbdc..c4e6efe9885 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -291,7 +291,7 @@ ASM_MISA_SPEC
 	- ARG_POINTER_REGNUM
 	- FRAME_POINTER_REGNUM */
 
-#define FIRST_PSEUDO_REGISTER 66
+#define FIRST_PSEUDO_REGISTER 67
 
 /* x0, sp, gp, and tp are fixed.  */
 
@@ -303,7 +303,7 @@ ASM_MISA_SPEC
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,			\
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,			\
   /* Others.  */							\
-  1, 1									\
+  1, 1, 1								\
 }
 
 /* a0-a7, t0-t6, fa0-fa7, and ft0-ft11 are volatile across calls.
@@ -317,7 +317,7 @@ ASM_MISA_SPEC
   1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,			\
   1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,			\
   /* Others.  */							\
-  1, 1									\
+  1, 1, 1,								\
 }
 
 /* Select a register mode required for caller save of hard regno REGNO.
@@ -472,7 +472,7 @@ enum reg_class
   { 0xffffffff, 0x00000000, 0x00000000 },	/* GR_REGS */		\
   { 0x00000000, 0xffffffff, 0x00000000 },	/* FP_REGS */		\
   { 0x00000000, 0x00000000, 0x00000003 },	/* FRAME_REGS */	\
-  { 0xffffffff, 0xffffffff, 0x00000003 }	/* ALL_REGS */		\
+  { 0xffffffff, 0xffffffff, 0x00000007 }	/* ALL_REGS */		\
 }
 
 /* A C expression whose value is a register class containing hard
@@ -514,7 +514,7 @@ enum reg_class
   40, 41, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,			\
   /* None of the remaining classes have defined call-saved		\
      registers.  */							\
-  64, 65								\
+  64, 65, 66								\
 }
 
 /* True if VALUE is a signed 12-bit number.  */
@@ -777,7 +777,7 @@ typedef struct {
   "fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5",	\
   "fa6", "fa7", "fs2", "fs3", "fs4", "fs5", "fs6", "fs7",	\
   "fs8", "fs9", "fs10","fs11","ft8", "ft9", "ft10","ft11",	\
-  "arg", "frame", }
+  "arg", "frame", "fflags" }
 
 #define ADDITIONAL_REGISTER_NAMES					\
 {									\
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4f59bb99cf5..9039cc73e4d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -47,6 +47,7 @@ (define_c_enum "unspec" [
 
   ;; Stack tie
   UNSPEC_TIE
+  UNSPEC_FSNVSNAN
 ])
 
 (define_c_enum "unspecv" [
@@ -57,7 +58,6 @@ (define_c_enum "unspecv" [
   ;; Floating-point unspecs.
   UNSPECV_FRFLAGS
   UNSPECV_FSFLAGS
-  UNSPECV_FSNVSNAN
 
   ;; Interrupt handler instructions.
   UNSPECV_MRET
@@ -99,6 +99,7 @@ (define_constants
    (S9_REGNUM			25)
    (S10_REGNUM			26)
    (S11_REGNUM			27)
+   (FFLAGS_REGNUM		66)
 
    (NORMAL_RETURN		0)
    (SIBCALL_RETURN		1)
@@ -2300,7 +2301,8 @@ (define_expand "cstore<mode>4"
   [(set (match_operand:SI 0 "register_operand")
 	(match_operator:SI 1 "order_operator"
 	    [(match_operand:GPR 2 "register_operand")
-	     (match_operand:GPR 3 "nonmemory_operand")]))]
+	     (match_operand:GPR 3 "nonmemory_operand")]))
+   (clobber (reg:SI FFLAGS_REGNUM))]
   ""
 {
   riscv_expand_int_scc (operands[0], GET_CODE (operands[1]), operands[2],
@@ -2312,7 +2314,8 @@ (define_expand "cstore<mode>4"
   [(set (match_operand:SI 0 "register_operand")
 	(match_operator:SI 1 "fp_scc_comparison"
 	     [(match_operand:ANYF 2 "register_operand")
-	      (match_operand:ANYF 3 "register_operand")]))]
+	      (match_operand:ANYF 3 "register_operand")]))
+   (clobber (reg:SI FFLAGS_REGNUM))]
   "TARGET_HARD_FLOAT"
 {
   riscv_expand_float_scc (operands[0], GET_CODE (operands[1]), operands[2],
@@ -2324,7 +2327,8 @@ (define_insn "*cstore<ANYF:mode><X:mode>4"
    [(set (match_operand:X         0 "register_operand" "=r")
 	 (match_operator:X 1 "fp_native_comparison"
 	     [(match_operand:ANYF 2 "register_operand" " f")
-	      (match_operand:ANYF 3 "register_operand" " f")]))]
+	      (match_operand:ANYF 3 "register_operand" " f")]))
+   (clobber (reg:SI FFLAGS_REGNUM))]
   "TARGET_HARD_FLOAT"
   "f%C1.<fmt>\t%0,%2,%3"
   [(set_attr "type" "fcmp")
@@ -2342,18 +2346,15 @@ (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4"
   rtx op2 = operands[2];
   rtx tmp = gen_reg_rtx (SImode);
   rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2);
-  rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx),
-					 UNSPECV_FRFLAGS);
-  rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp),
-					 UNSPECV_FSFLAGS);
-
-  emit_insn (gen_rtx_SET (tmp, frflags));
-  emit_insn (gen_rtx_SET (op0, cmp));
-  emit_insn (fsflags);
+
+  emit_insn (gen_riscv_frflags (tmp));
+rtx set = gen_rtx_SET (op0, cmp);
+rtx clobber =  gen_rtx_CLOBBER(SImode, gen_rtx_REG(SImode, FFLAGS_REGNUM));
+  rtvec vec = gen_rtvec (2, set, clobber);
+  emit_insn (gen_rtx_PARALLEL(VOIDmode, vec));
+  emit_insn (gen_riscv_fsflags(tmp));
   if (HONOR_SNANS (<ANYF:MODE>mode))
-    emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode,
-					gen_rtvec (2, op1, op2),
-					UNSPECV_FSNVSNAN));
+    emit_insn (gen_riscv_fsnvsnan<ANYF:mode>2 (op1, op2));
   DONE;
 })
 
@@ -2754,19 +2755,20 @@ (define_insn "gpr_restore_return"
 
 (define_insn "riscv_frflags"
   [(set (match_operand:SI 0 "register_operand" "=r")
-	(unspec_volatile [(const_int 0)] UNSPECV_FRFLAGS))]
+	(reg:SI FFLAGS_REGNUM))]
   "TARGET_HARD_FLOAT"
   "frflags\t%0")
 
 (define_insn "riscv_fsflags"
-  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSFLAGS)]
+  [(set (reg:SI FFLAGS_REGNUM) (match_operand:SI 0 "csr_operand" "rK"))]
   "TARGET_HARD_FLOAT"
   "fsflags\t%0")
 
-(define_insn "*riscv_fsnvsnan<mode>2"
-  [(unspec_volatile [(match_operand:ANYF 0 "register_operand")
+(define_insn "riscv_fsnvsnan<mode>2"
+  [(set (reg:SI FFLAGS_REGNUM)
+        (unspec [(match_operand:ANYF 0 "register_operand")
 		     (match_operand:ANYF 1 "register_operand")]
-		    UNSPECV_FSNVSNAN)]
+		    UNSPEC_FSNVSNAN))]
   "TARGET_HARD_FLOAT"
   "feq.<fmt>\tzero,%0,%1"
   [(set_attr "type" "fcmp")
diff --git a/gcc/function.cc b/gcc/function.cc
index ad0096a43ef..cb0d743558f 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -5590,6 +5590,7 @@ expand_function_end (void)
      sh mach_dep_reorg) that still try and compute their own lifetime info
      instead of using the general framework.  */
   use_return_register ();
+  emit_insn (gen_rtx_USE(SImode, gen_rtx_REG(SImode,66)));
 }
 
 rtx
-- 
2.34.0



* [PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns
  2022-06-23 16:44   ` Kito Cheng
@ 2022-07-04 14:12     ` Maciej W. Rozycki
  2022-07-18 15:42       ` [PING][PATCH " Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2022-07-04 14:12 UTC (permalink / raw)
  To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman

We have unordered FP comparisons implemented as RTL insns that produce 
multiple machine instructions.  Such RTL insns are hard to match with a 
processor pipeline description and additionally there is a redundant 
SNEZ instruction produced on the result of these comparisons even though 
the FLT.fmt and FLE.fmt machine instructions already produce either 0 or 
1, e.g.:

long
flt (double x, double y)
{
  return __builtin_isless (x, y);
}

with `-O2 -fno-finite-math-only -ftrapping-math -fno-signaling-nans' 
gets compiled to:

	.globl	flt
	.type	flt, @function
flt:
	frflags	a5
	flt.d	a0,fa0,fa1
	fsflags	a5
	snez	a0,a0
	ret
	.size	flt, .-flt

because the middle end can't see through the UNSPEC operation unordered 
FP comparisons have been defined in terms of.

These instructions are only produced via an expander already, so change 
the expander to emit individual RTL insns for each machine instruction 
in the ultimate sequence produced rather than deferring to a 
single RTL insn producing the whole sequence at once.
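The sequence the expander now emits can be modelled in software to show why it behaves as a quiet comparison. FLT.D raises NV for any NaN operand, and FSFLAGS then restores the saved flags. With HONOR_SNANS, a trailing FEQ.D (which raises NV only for a signalling NaN) puts back the one exception a quiet comparison is still required to raise. This is a minimal sketch that assumes the RISC-V NV bit (0x10) and the IEEE 754-2008 quiet-bit convention. The `fpu` struct and the helper names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FFLAGS_NV 0x10u  /* invalid-operation bit of the fflags CSR */

/* Hypothetical software model of the accrued FP exception flags.  */
struct fpu { uint32_t fflags; };

/* IEEE 754-2008: a NaN whose most significant fraction bit is clear
   is signalling.  */
static bool
is_snan (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return isnan (x) && !(bits & (UINT64_C (1) << 51));
}

static uint32_t frflags (struct fpu *f) { return f->fflags; }
static void fsflags (struct fpu *f, uint32_t v) { f->fflags = v; }

/* FLT.D is a signalling comparison: NV for any NaN operand.  */
static long
flt_d (struct fpu *f, double a, double b)
{
  if (isnan (a) || isnan (b))
    {
      f->fflags |= FFLAGS_NV;
      return 0;
    }
  return a < b;
}

/* `feq.d zero,a,b': quiet, so NV only for a signalling NaN, and the
   result is discarded.  */
static void
feq_d_zero (struct fpu *f, double a, double b)
{
  if (is_snan (a) || is_snan (b))
    f->fflags |= FFLAGS_NV;
}

/* frflags tmp; flt.d; fsflags tmp; [feq.d zero].  */
static long
quiet_lt (struct fpu *f, double a, double b, bool honor_snans)
{
  uint32_t tmp = frflags (f);
  long r = flt_d (f, a, b);
  fsflags (f, tmp);
  if (honor_snans)
    feq_d_zero (f, a, b);
  return r;
}
```

Because FLT.D already yields 0 or 1, the result needs no SNEZ once each step is a separate, transparent operation.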

	gcc/
	* config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant.
	(QUIET_PATTERN): New int attribute.
	(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended 
	RTL insns entirely within the preparation statements.
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove 
	insns.
	(*riscv_fsnvsnan<mode>2): New insn.

	gcc/testsuite/
	* gcc.target/riscv/fle-ieee.c: New test.
	* gcc.target/riscv/fle-snan.c: New test.
	* gcc.target/riscv/fle.c: New test.
	* gcc.target/riscv/flef-ieee.c: New test.
	* gcc.target/riscv/flef-snan.c: New test.
	* gcc.target/riscv/flef.c: New test.
	* gcc.target/riscv/flt-ieee.c: New test.
	* gcc.target/riscv/flt-snan.c: New test.
	* gcc.target/riscv/flt.c: New test.
	* gcc.target/riscv/fltf-ieee.c: New test.
	* gcc.target/riscv/fltf-snan.c: New test.
	* gcc.target/riscv/fltf.c: New test.
---
Hi Kito,

 Thank you for your review.

> Thanks for detail analysis and performance number report, I am concern
> about this patch might let compiler schedule the fsflags/frflags with
> other floating point instructions, and the major issue is we didn't
> model fflags right in GCC as you mentioned in the first email.

 I have now looked through various places and we're good.

 First of all the C language standard has this:

"F.8.1 Environment management

"IEC 60559 requires that floating-point operations implicitly raise 
floating-point exception status flags, and that rounding control modes can 
be set explicitly to affect result values of floating-point operations. 
When the state for the FENV_ACCESS pragma (defined in <fenv.h>) is "on", 
these changes to the floating-point state are treated as side effects 
which respect sequence points."

We don't actually support the FENV_ACCESS pragma, however we provide for 
having FP environment support in the C library (e.g. `riscv_getflags' and 
`riscv_setflags' among others in the RISC-V port of glibc):

"  * 'The default state for the 'FENV_ACCESS' pragma (C99 and C11
     7.6.1).'

"    This pragma is not implemented, but the default is to "off" unless
     '-frounding-math' is used in which case it is "on"."

(I find this misleading, because my interpretation of our documentation 
and code is that the default is "on" whenever `-frounding-math' and 
`-ftrapping-math' are both active at the same time; arguably the text is 
nevertheless correct, because `-ftrapping-math' is on by default, so it 
doesn't have to "be used", and it's `-fno-trapping-math' that has to "be 
unused" for us to get the effect that FENV_ACCESS "on" would have, should 
we implement it).

 Now `riscv_getflags' and `riscv_setflags' use `asm volatile' to peek or 
poke at IEEE exception flags, so for these pieces of code to work the 
compiler has to make sure not to reorder volatile instructions around 
trapping instructions and that is handled in `can_move_insns_across':

	  if (may_trap_or_fault_p (PATTERN (insn))
	      && (trapping_insns_in_across
		  || other_branch_live != NULL
		  || volatile_insn_p (PATTERN (insn))))
	    break;

(cf.: <https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00254.html>, and 
commit c6d851b95a7b).  And we consider FP arithmetic instructions trapping 
under `-ftrapping-math' in `may_trap_p_1':

    case NEG:
    case ABS:
    case SUBREG:
    case VEC_MERGE:
    case VEC_SELECT:
    case VEC_CONCAT:
    case VEC_DUPLICATE:
      /* These operations don't trap even with floating point.  */
      break;

    default:
      /* Any floating arithmetic may trap.  */
      if (FLOAT_MODE_P (GET_MODE (x)) && flag_trapping_math)
	return 1;

(other relevant operations are handled elsewhere with this switch 
statement).  This is consistent with my observations where only FLD or 
FABS.D (neither trapping) get reordered across FRFLAGS (volatile by means 
of `unspec_volatile').
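The two checks quoted above can be combined into a deliberately simplified model. FP operations count as trapping under `-ftrapping-math', except for cases such as NEG and ABS, and a trapping insn is not reordered against a volatile one such as FRFLAGS. The `op` enum, `toy_may_trap` and `toy_can_swap` are hypothetical names. This sketch captures the summary rule only, not the exact logic of `may_trap_p_1' or `can_move_insns_across'.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified operation kinds.  */
enum op { OP_LOAD, OP_NEG, OP_ABS, OP_PLUS, OP_LT, OP_UNSPEC_VOLATILE };

struct toy_insn
{
  enum op op;
  bool fp_operands;   /* operates on floating-point values */
};

/* Loosely after may_trap_p_1: NEG and ABS never trap, and any other
   operation on FP values may trap when trapping math is enabled.
   Loads are treated as non-trapping, as for a known-valid stack slot.  */
static bool
toy_may_trap (const struct toy_insn *i, bool flag_trapping_math)
{
  switch (i->op)
    {
    case OP_NEG:
    case OP_ABS:
    case OP_LOAD:
    case OP_UNSPEC_VOLATILE:
      return false;
    default:
      return i->fp_operands && flag_trapping_math;
    }
}

static bool
toy_is_volatile (const struct toy_insn *i)
{
  return i->op == OP_UNSPEC_VOLATILE;
}

/* Adjacent insns may be swapped unless both are volatile, or one is
   volatile and the other may trap.  */
static bool
toy_can_swap (const struct toy_insn *a, const struct toy_insn *b,
              bool flag_trapping_math)
{
  bool a_vol = toy_is_volatile (a), b_vol = toy_is_volatile (b);

  if (a_vol && b_vol)
    return false;
  if (a_vol && toy_may_trap (b, flag_trapping_math))
    return false;
  if (b_vol && toy_may_trap (a, flag_trapping_math))
    return false;
  return true;
}
```

In this model FLD and FABS.D may move across FRFLAGS, matching the observation above. FLT.D may not, unless trapping math is disabled.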

 However following the observations above I chose to update the test cases 
to better reflect how we control IEEE exception handling and use 
`-fno-trapping-math' rather than `-ffinite-math-only' to disable the use 
of unordered comparison operations.

> So I think we should model this right before we split that, I guess
> that would be a bunch of work:
> 1. Add fflags to the hard register list.
> 2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop
> (operands...))) to those floating point operations which might change
> fflags
> 3. Rewrite riscv_frflags and riscv_fsflags pattern by right RTL
> pattern: (set (reg) (reg fflags)) and (set (reg fflags) (reg)).
> 4. Then split *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and
> *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan pattern.
> 
> However I am not sure about the code gen impact of 2, especially the
> impact to the combine pass, not sure if you are interested to give a
> try?

 I think we can look into it long-term as a further optimisation, but to 
solve the problem of insn scheduling at hand my current proposal seems 
adequate enough and I suspect adding full data dependency tracking for the 
IEEE flags could be a rabbit hole (ISTM after all no other target does 
that, and I smell there's been a reason for that).

 FAOD if at all I'd envision doing such tracking individually for each
exception flag, following "Accumulating CSRs" listings from Section 14.3 
"Source and Destination Register Listings" in the unprivileged ISA spec.
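Per-flag tracking might look roughly like this. The five fflags bits are accumulated by ORing, so two instructions that only accumulate flags need not be ordered against each other. A reader (FRFLAGS) or an overwriting writer (FSFLAGS), however, must stay ordered against any instruction that touches an overlapping bit. The bit masks follow the fflags layout in the unprivileged ISA spec. The `flag_effects` struct and the ordering rule are hypothetical illustrations.

```c
#include <stdbool.h>
#include <stdint.h>

/* fflags bit layout (RISC-V unprivileged ISA, F extension).  */
enum
{
  FF_NX = 1u << 0,  /* inexact */
  FF_UF = 1u << 1,  /* underflow */
  FF_OF = 1u << 2,  /* overflow */
  FF_DZ = 1u << 3,  /* divide by zero */
  FF_NV = 1u << 4,  /* invalid operation */
  FF_ALL = FF_NX | FF_UF | FF_OF | FF_DZ | FF_NV
};

/* Hypothetical per-insn summary of effects on fflags.  */
struct flag_effects
{
  uint32_t reads;        /* bits whose value the insn observes */
  uint32_t accumulates;  /* bits the insn may OR in */
  uint32_t overwrites;   /* bits the insn replaces outright */
};

/* Two insns must keep their relative order if either observes or
   replaces a bit that the other writes.  Pure accumulations commute,
   because OR is commutative and idempotent.  */
static bool
must_stay_ordered (struct flag_effects a, struct flag_effects b)
{
  uint32_t a_writes = a.accumulates | a.overwrites;
  uint32_t b_writes = b.accumulates | b.overwrites;

  return ((a.reads & b_writes) | (b.reads & a_writes)
          | (a.overwrites & b_writes) | (b.overwrites & a_writes)) != 0;
}
```

For example, FDIV.D (which may raise any flag) and FLT.D (NV only) need no ordering between them under this rule. A hypothetical reader of DZ alone would also be independent of FLT.D.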

 While re-reviewing the code I spotted that I had previously missed operand 
constraints for the new `*riscv_fsnvsnan<mode>2' insn, so I have added 
them now.

 I have verified that the new test cases still pass with the update in 
place, and I have rerun full `-fsignaling-nans' regression testing for the 
constraint fix.  OK to apply?

  Maciej

Changes from v1:

- Add operand constraints for the new `*riscv_fsnvsnan<mode>2' insn.

- In test cases use `-fno-trapping-math' rather than `-ffinite-math-only'; 
  consequently force `-fno-finite-math-only' so that any defaults do not 
  interfere with the results expected.
---
 gcc/config/riscv/riscv.md                  |   67 +++++++++++++++--------------
 gcc/testsuite/gcc.target/riscv/fle-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef.c      |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf.c      |   12 +++++
 13 files changed, 179 insertions(+), 32 deletions(-)

gcc-riscv-fcmp-split.diff
Index: gcc/gcc/config/riscv/riscv.md
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -57,6 +57,7 @@
   ;; Floating-point unspecs.
   UNSPECV_FRFLAGS
   UNSPECV_FSFLAGS
+  UNSPECV_FSNVSNAN
 
   ;; Interrupt handler instructions.
   UNSPECV_MRET
@@ -360,6 +361,7 @@
 ;; Iterator and attributes for quiet comparisons.
 (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
 (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
+(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
 
 ;; This code iterator allows signed and unsigned widening multiplications
 ;; to use the same template.
@@ -2326,39 +2328,31 @@
    (set_attr "mode" "<UNITMODE>")])
 
 (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4"
-   [(parallel [(set (match_operand:X      0 "register_operand")
-		    (unspec:X
-		     [(match_operand:ANYF 1 "register_operand")
-		      (match_operand:ANYF 2 "register_operand")]
-		     QUIET_COMPARISON))
-	       (clobber (match_scratch:X 3))])]
-  "TARGET_HARD_FLOAT")
-
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default"
-   [(set (match_operand:X      0 "register_operand" "=r")
-	 (unspec:X
-	  [(match_operand:ANYF 1 "register_operand" " f")
-	   (match_operand:ANYF 2 "register_operand" " f")]
-	  QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)"
-  "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 12))])
+   [(set (match_operand:X               0 "register_operand")
+	 (unspec:X [(match_operand:ANYF 1 "register_operand")
+		    (match_operand:ANYF 2 "register_operand")]
+		   QUIET_COMPARISON))]
+  "TARGET_HARD_FLOAT"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2);
+  rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx),
+					 UNSPECV_FRFLAGS);
+  rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp),
+					 UNSPECV_FSFLAGS);
 
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan"
-   [(set (match_operand:X      0 "register_operand" "=r")
-	 (unspec:X
-	  [(match_operand:ANYF 1 "register_operand" " f")
-	   (match_operand:ANYF 2 "register_operand" " f")]
-	  QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)"
-  "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 16))])
+  emit_insn (gen_rtx_SET (tmp, frflags));
+  emit_insn (gen_rtx_SET (op0, cmp));
+  emit_insn (fsflags);
+  if (HONOR_SNANS (<ANYF:MODE>mode))
+    emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode,
+					gen_rtvec (2, op1, op2),
+					UNSPECV_FSNVSNAN));
+  DONE;
+})
 
 (define_insn "*seq_zero_<X:mode><GPR:mode>"
   [(set (match_operand:GPR       0 "register_operand" "=r")
@@ -2766,6 +2760,15 @@
   "TARGET_HARD_FLOAT"
   "fsflags\t%0")
 
+(define_insn "*riscv_fsnvsnan<mode>2"
+  [(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f")
+		     (match_operand:ANYF 1 "register_operand" "f")]
+		    UNSPECV_FSNVSNAN)]
+  "TARGET_HARD_FLOAT"
+  "feq.<fmt>\tzero,%0,%1"
+  [(set_attr "type" "fcmp")
+   (set_attr "mode" "<UNITMODE>")])
+
 (define_insn "riscv_mret"
   [(return)
    (unspec_volatile [(const_int 0)] UNSPECV_MRET)]
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */


* [PING][PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns
  2022-07-04 14:12     ` [PATCH v2] " Maciej W. Rozycki
@ 2022-07-18 15:42       ` Maciej W. Rozycki
  2022-07-27 10:40         ` Kito Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2022-07-18 15:42 UTC (permalink / raw)
  To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman

On Mon, 4 Jul 2022, Maciej W. Rozycki wrote:

> These instructions are only produced via an expander already, so change 
> the expander to emit individual RTL insns for each machine instruction 
> in the ultimate sequence produced rather than deferring to a 
> single RTL insn producing the whole sequence at once.

 Ping for:

<https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597767.html>

  Maciej


* Re: [PING][PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns
  2022-07-18 15:42       ` [PING][PATCH " Maciej W. Rozycki
@ 2022-07-27 10:40         ` Kito Cheng
  2022-07-28 13:20           ` Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Kito Cheng @ 2022-07-27 10:40 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: GCC Patches, Andrew Waterman

Hi Maciej:

I am convinced that this is OK for now.  I agree modeling fflags would be
a rabbit hole: I tried to build a full GNU toolchain with my quick patch
and saw many ICEs while building the libraries, so that should definitely
be a long-term optimization project.

I am also wondering whether we should default to -fno-trapping-math for
RISC-V, because RISC-V doesn't trap on any floating-point operations;
however, I think that would be another topic.

so you got my LGTM :)

On Mon, Jul 18, 2022 at 11:43 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> On Mon, 4 Jul 2022, Maciej W. Rozycki wrote:
>
> > These instructions are only produced via an expander already, so change
> > the expander to emit individual RTL insns for each machine instruction
> > in the ultimate sequence produced rather than deferring to a
> > single RTL insn producing the whole sequence at once.
>
>  Ping for:
>
> <https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597767.html>
>
>   Maciej


* Re: [PING][PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns
  2022-07-27 10:40         ` Kito Cheng
@ 2022-07-28 13:20           ` Maciej W. Rozycki
  0 siblings, 0 replies; 7+ messages in thread
From: Maciej W. Rozycki @ 2022-07-28 13:20 UTC (permalink / raw)
  To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman

Hi Kito,

> I am convinced that this is OK for now.  I agree modeling fflags would be
> a rabbit hole: I tried to build a full GNU toolchain with my quick patch
> and saw many ICEs while building the libraries, so that should definitely
> be a long-term optimization project.
> 
> I am also wondering whether we should default to -fno-trapping-math for
> RISC-V, because RISC-V doesn't trap on any floating-point operations;
> however, I think that would be another topic.

 No need to do anything for RISC-V I believe, as Richard has mentioned 
that's been his plan for GCC 13 compiler-wide; see the discussion here:
<https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598589.html> for 
further details.

> so you got my LGTM :)

 I have applied this change now then, thank you for your review.

  Maciej



Thread overview: 7+ messages
2022-06-09 13:44 [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns Maciej W. Rozycki
2022-06-23 13:39 ` Maciej W. Rozycki
2022-06-23 16:44   ` Kito Cheng
2022-07-04 14:12     ` [PATCH v2] " Maciej W. Rozycki
2022-07-18 15:42       ` [PING][PATCH " Maciej W. Rozycki
2022-07-27 10:40         ` Kito Cheng
2022-07-28 13:20           ` Maciej W. Rozycki
