From: "Maciej W. Rozycki" <macro@embecosm.com>
To: Kito Cheng <kito.cheng@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Andrew Waterman <andrew@sifive.com>
Subject: [PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns
Date: Mon, 4 Jul 2022 15:12:08 +0100 (BST) [thread overview]
Message-ID: <alpine.DEB.2.20.2207011936120.10833@tpp.orcam.me.uk> (raw)
In-Reply-To: <CA+yXCZBLavnLVv=AVYQiSnoa38u5i2OwHSEg1fxJa0kF2e8x6Q@mail.gmail.com>
We have unordered FP comparisons implemented as RTL insns that produce
multiple machine instructions. Such RTL insns are hard to match with a
processor pipeline description and additionally there is a redundant
SNEZ instruction produced on the result of these comparisons even though
the FLT.fmt and FLE.fmt machine instructions already produce either 0 or
1, e.g.:
long
flt (double x, double y)
{
return __builtin_isless (x, y);
}
with `-O2 -fno-finite-math-only -ftrapping-math -fno-signaling-nans'
gets compiled to:
.globl flt
.type flt, @function
flt:
frflags a5
flt.d a0,fa0,fa1
fsflags a5
snez a0,a0
ret
.size flt, .-flt
because the middle end can't see through the UNSPEC operation unordered
FP comparisons have been defined in terms of.
These instructions are only produced via an expander already, so change
the expander to emit individual RTL insns for each machine instruction
in the ultimate ultimate sequence produced rather than deferring to a
single RTL insn producing the whole sequence at once.
gcc/
* config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant.
(QUIET_PATTERN): New int attribute.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended
RTL insns entirely within the preparation statements.
(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove
insns.
(*riscv_fsnvsnan<mode>2): New insn.
gcc/testsuite/
* gcc.target/riscv/fle-ieee.c: New test.
* gcc.target/riscv/fle-snan.c: New test.
* gcc.target/riscv/fle.c: New test.
* gcc.target/riscv/flef-ieee.c: New test.
* gcc.target/riscv/flef-snan.c: New test.
* gcc.target/riscv/flef.c: New test.
* gcc.target/riscv/flt-ieee.c: New test.
* gcc.target/riscv/flt-snan.c: New test.
* gcc.target/riscv/flt.c: New test.
* gcc.target/riscv/fltf-ieee.c: New test.
* gcc.target/riscv/fltf-snan.c: New test.
* gcc.target/riscv/fltf.c: New test.
---
Hi Kito,
Thank you for your review.
> Thanks for detail analysis and performance number report, I am concern
> about this patch might let compiler schedule the fsflags/frflags with
> other floating point instructions, and the major issue is we didn't
> model fflags right in GCC as you mentioned in the first email.
I have now looked through various places and we're good.
First of all the C language standard has this:
"F.8.1 Environment management
"IEC 60559 requires that floating-point operations implicitly raise
floating-point exception status flags, and that rounding control modes can
be set explicitly to affect result values of floating-point operations.
When the state for the FENV_ACCESS pragma (defined in <fenv.h>) is "on",
these changes to the floating-point state are treated as side effects
which respect sequence points."
We don't actually support the FENV_ACCESS pragma, however we provide for
having FP environment support in the C library (e.g. `riscv_getflags' and
`riscv_setflags' among others is the RISC-V port of glibc):
" * 'The default state for the 'FENV_ACCESS' pragma (C99 and C11
7.6.1).'
" This pragma is not implemented, but the default is to "off" unless
'-frounding-math' is used in which case it is "on"."
(I find this misleading, because my interpretation of our documentation
and code is that the default is "on" whenever `-frounding-math' and
`-ftrapping-math' are active both at a time; arguably the text is however
correct, because `-ftrapping-math' is on by default, so it doesn't have to
"be used" and it's `-fno-trapping-math' that has to "be unused" for the
effect to be in place of what FENV_ACCESS "on" would do should we
implement it).
Now `riscv_getflags' and `riscv_setflags' use `asm volatile' to peek or
poke at IEEE exception flags, so for these pieces of code to work the
compiler has to make sure not to reorder volatile instructions around
trapping instructions and that is handled in `can_move_insns_across':
if (may_trap_or_fault_p (PATTERN (insn))
&& (trapping_insns_in_across
|| other_branch_live != NULL
|| volatile_insn_p (PATTERN (insn))))
break;
(cf.: <https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00254.html>, and
commit c6d851b95a7b). And we consider FP arithmetic instructions trapping
under `-ftrapping-math' in `may_trap_p_1':
case NEG:
case ABS:
case SUBREG:
case VEC_MERGE:
case VEC_SELECT:
case VEC_CONCAT:
case VEC_DUPLICATE:
/* These operations don't trap even with floating point. */
break;
default:
/* Any floating arithmetic may trap. */
if (FLOAT_MODE_P (GET_MODE (x)) && flag_trapping_math)
return 1;
(other relevant operations are handled elsewhere with this switch
statement). This is consistent with my observations where only FLD or
FABS.D (neither trapping) get reordered across FRFLAGS (volatile by means
of `unspec_volatile').
However following the observations above I chose to update the test cases
to better reflect how we control IEEE exception handling and use
`-fno-trapping-math' rather than `-ffinite-math-only' to disable the use
of unordered comparison operations.
> So I think we should model this right before we split that, I guess
> that would be a bunch of work:
> 1. Add fflags to the hard register list.
> 2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop
> (operands...))) to those floating point operations which might change
> fflags
> 3. Rewrite riscv_frflags and riscv_fsflags pattern by right RTL
> pattern: (set (reg) (reg fflags)) and (set (reg fflags) (reg)).
> 4. Then split *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and
> *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan pattern.
>
> However I am not sure about the code gen impact of 2, especially the
> impact to the combine pass, not sure if you are interested to give a
> try?
I think we can look into it long-term as a further optimisation, but to
solve the problem of insn scheduling at hand my current proposal seems
adequate enough and I suspect adding full data dependency tracking for the
IEEE flags could be a rabbit hole (ISTM after all no other target does
that, and I smell there's been a reason for that).
FAOD if at all I'd envision doing such tracking individually for each
exception flag, following "Accumulating CSRs" listings from Section 14.3
"Source and Destination Register Listings" in the unprivileged ISA spec.
While re-reviewing code I have spotted I previously missed operand
constraints for the new `*riscv_fsnvsnan<mode>2' insn, so I have added
them now.
I have verified that the new test cases still pass with the update in
place, and I have rerun full `-fsignaling-nans' regression testing for the
constraint fix. OK to apply?
Maciej
Changes from v1:
- Add operand constraints for the new `*riscv_fsnvsnan<mode>2' insn.
- In test cases use `-fno-trapping-math' rather than `-ffinite-math-only';
consequently force `-fno-finite-math-only' so that any defaults do not
interfere with the results expected.
---
gcc/config/riscv/riscv.md | 67 +++++++++++++++--------------
gcc/testsuite/gcc.target/riscv/fle-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fle-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fle.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flef-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flef-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flef.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flt-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flt-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flt.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fltf-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fltf.c | 12 +++++
13 files changed, 179 insertions(+), 32 deletions(-)
gcc-riscv-fcmp-split.diff
Index: gcc/gcc/config/riscv/riscv.md
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -57,6 +57,7 @@
;; Floating-point unspecs.
UNSPECV_FRFLAGS
UNSPECV_FSFLAGS
+ UNSPECV_FSNVSNAN
;; Interrupt handler instructions.
UNSPECV_MRET
@@ -360,6 +361,7 @@
;; Iterator and attributes for quiet comparisons.
(define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
(define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
+(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
;; This code iterator allows signed and unsigned widening multiplications
;; to use the same template.
@@ -2326,39 +2328,31 @@
(set_attr "mode" "<UNITMODE>")])
(define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4"
- [(parallel [(set (match_operand:X 0 "register_operand")
- (unspec:X
- [(match_operand:ANYF 1 "register_operand")
- (match_operand:ANYF 2 "register_operand")]
- QUIET_COMPARISON))
- (clobber (match_scratch:X 3))])]
- "TARGET_HARD_FLOAT")
-
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default"
- [(set (match_operand:X 0 "register_operand" "=r")
- (unspec:X
- [(match_operand:ANYF 1 "register_operand" " f")
- (match_operand:ANYF 2 "register_operand" " f")]
- QUIET_COMPARISON))
- (clobber (match_scratch:X 3 "=&r"))]
- "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)"
- "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3"
- [(set_attr "type" "fcmp")
- (set_attr "mode" "<UNITMODE>")
- (set (attr "length") (const_int 12))])
+ [(set (match_operand:X 0 "register_operand")
+ (unspec:X [(match_operand:ANYF 1 "register_operand")
+ (match_operand:ANYF 2 "register_operand")]
+ QUIET_COMPARISON))]
+ "TARGET_HARD_FLOAT"
+{
+ rtx op0 = operands[0];
+ rtx op1 = operands[1];
+ rtx op2 = operands[2];
+ rtx tmp = gen_reg_rtx (SImode);
+ rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2);
+ rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx),
+ UNSPECV_FRFLAGS);
+ rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp),
+ UNSPECV_FSFLAGS);
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan"
- [(set (match_operand:X 0 "register_operand" "=r")
- (unspec:X
- [(match_operand:ANYF 1 "register_operand" " f")
- (match_operand:ANYF 2 "register_operand" " f")]
- QUIET_COMPARISON))
- (clobber (match_scratch:X 3 "=&r"))]
- "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)"
- "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2"
- [(set_attr "type" "fcmp")
- (set_attr "mode" "<UNITMODE>")
- (set (attr "length") (const_int 16))])
+ emit_insn (gen_rtx_SET (tmp, frflags));
+ emit_insn (gen_rtx_SET (op0, cmp));
+ emit_insn (fsflags);
+ if (HONOR_SNANS (<ANYF:MODE>mode))
+ emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode,
+ gen_rtvec (2, op1, op2),
+ UNSPECV_FSNVSNAN));
+ DONE;
+})
(define_insn "*seq_zero_<X:mode><GPR:mode>"
[(set (match_operand:GPR 0 "register_operand" "=r")
@@ -2766,6 +2760,15 @@
"TARGET_HARD_FLOAT"
"fsflags\t%0")
+(define_insn "*riscv_fsnvsnan<mode>2"
+ [(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f")
+ (match_operand:ANYF 1 "register_operand" "f")]
+ UNSPECV_FSNVSNAN)]
+ "TARGET_HARD_FLOAT"
+ "feq.<fmt>\tzero,%0,%1"
+ [(set_attr "type" "fcmp")
+ (set_attr "mode" "<UNITMODE>")])
+
(define_insn "riscv_mret"
[(return)
(unspec_volatile [(const_int 0)] UNSPECV_MRET)]
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+fle (double x, double y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+fle (double x, double y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+fle (double x, double y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+flef (float x, float y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+flef (float x, float y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+flef (float x, float y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+flt (double x, double y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+flt (double x, double y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+flt (double x, double y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
next prev parent reply other threads:[~2022-07-04 14:12 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-09 13:44 [PATCH] " Maciej W. Rozycki
2022-06-23 13:39 ` Maciej W. Rozycki
2022-06-23 16:44 ` Kito Cheng
2022-07-04 14:12 ` Maciej W. Rozycki [this message]
2022-07-18 15:42 ` [PING][PATCH v2] " Maciej W. Rozycki
2022-07-27 10:40 ` Kito Cheng
2022-07-28 13:20 ` Maciej W. Rozycki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.20.2207011936120.10833@tpp.orcam.me.uk \
--to=macro@embecosm.com \
--cc=andrew@sifive.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=kito.cheng@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).