* [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns @ 2022-06-09 13:44 Maciej W. Rozycki 2022-06-23 13:39 ` Maciej W. Rozycki 0 siblings, 1 reply; 7+ messages in thread From: Maciej W. Rozycki @ 2022-06-09 13:44 UTC (permalink / raw) To: gcc-patches; +Cc: Andrew Waterman, Jim Wilson, Kito Cheng, Palmer Dabbelt We have unordered FP comparisons implemented as RTL insns that produce multiple machine instructions. Such RTL insns are hard to match with a processor pipeline description and additionally there is a redundant SNEZ instruction produced on the result of these comparisons even though the FLT.fmt and FLE.fmt machine instructions already produce either 0 or 1, e.g.: long flt (double x, double y) { return __builtin_isless (x, y); } with `-O2 -fno-finite-math-only -fno-signaling-nans' gets compiled to: .globl flt .type flt, @function flt: frflags a5 flt.d a0,fa0,fa1 fsflags a5 snez a0,a0 ret .size flt, .-flt because the middle end can't see through the UNSPEC operation unordered FP comparisons have been defined in terms of. These instructions are only produced via an expander already, so change the expander to emit individual RTL insns for each machine instruction in the ultimate ultimate sequence produced rather than deferring to a single RTL insn producing the whole sequence at once. gcc/ * config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant. (QUIET_PATTERN): New int attribute. (f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended RTL insns entirely within the preparation statements. (*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default) (*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove insns. (*riscv_fsnvsnan<mode>2): New insn. gcc/testsuite/ * gcc.target/riscv/fle-ieee.c: New test. * gcc.target/riscv/fle-snan.c: New test. * gcc.target/riscv/fle.c: New test. * gcc.target/riscv/flef-ieee.c: New test. * gcc.target/riscv/flef-snan.c: New test. * gcc.target/riscv/flef.c: New test. * gcc.target/riscv/flt-ieee.c: New test. * gcc.target/riscv/flt-snan.c: New test. * gcc.target/riscv/flt.c: New test. * gcc.target/riscv/fltf-ieee.c: New test. * gcc.target/riscv/fltf-snan.c: New test. * gcc.target/riscv/fltf.c: New test. --- Hi, I think it is a step in the right direction, however ultimately I think we ought to actually tell GCC about the IEEE exception flags, so that the compiler can track data dependencies and we do not have to resort to UNSPECs which the compiler cannot see through. E.g. for a piece of code like: long fltlt (double x, double y, double z) { return __builtin_isless (x, y) + __builtin_isless (x, z); } (using an addition here for clarity because for a logical operation even more horror is produced) we get: .globl fltlt .type fltlt, @function fltlt: frflags a5 # 8 [c=4 l=4] riscv_frflags flt.d a0,fa0,fa1 # 9 [c=4 l=4] *cstoredfdi4 fsflags a5 # 10 [c=0 l=4] riscv_fsflags frflags a4 # 16 [c=4 l=4] riscv_frflags flt.d a5,fa0,fa2 # 17 [c=4 l=4] *cstoredfdi4 fsflags a4 # 18 [c=0 l=4] riscv_fsflags addw a0,a0,a5 # 30 [c=8 l=4] *addsi3_extended/0 ret # 40 [c=0 l=4] simple_return .size fltlt, .-fltlt where the middle FSFLAGS/FRFLAGS pair makes no sense of course and is a waste of both space and cycles. I'm yet running some benchmarking to see if the use of UNSPEC_VOLATILEs makes any noticeable performance difference, but I suspect it does not as the compiler could not do much about the original multiple-instruction single RTL insns anyway. No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for glibc testsuite.")). OK to apply? Maciej --- gcc/config/riscv/riscv.md | 67 +++++++++++++++-------------- gcc/testsuite/gcc.target/riscv/fle-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fle-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fle.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf.c | 12 +++++ 13 files changed, 179 insertions(+), 32 deletions(-) gcc-riscv-fcmp-split.diff Index: gcc/gcc/config/riscv/riscv.md =================================================================== --- gcc.orig/gcc/config/riscv/riscv.md +++ gcc/gcc/config/riscv/riscv.md @@ -57,6 +57,7 @@ ;; Floating-point unspecs. UNSPECV_FRFLAGS UNSPECV_FSFLAGS + UNSPECV_FSNVSNAN ;; Interrupt handler instructions. UNSPECV_MRET @@ -360,6 +361,7 @@ ;; Iterator and attributes for quiet comparisons. (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET]) (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")]) +(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")]) ;; This code iterator allows signed and unsigned widening multiplications ;; to use the same template. @@ -2326,39 +2328,31 @@ (set_attr "mode" "<UNITMODE>")]) (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4" - [(parallel [(set (match_operand:X 0 "register_operand") - (unspec:X - [(match_operand:ANYF 1 "register_operand") - (match_operand:ANYF 2 "register_operand")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3))])] - "TARGET_HARD_FLOAT") - -(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default" - [(set (match_operand:X 0 "register_operand" "=r") - (unspec:X - [(match_operand:ANYF 1 "register_operand" " f") - (match_operand:ANYF 2 "register_operand" " f")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3 "=&r"))] - "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)" - "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3" - [(set_attr "type" "fcmp") - (set_attr "mode" "<UNITMODE>") - (set (attr "length") (const_int 12))]) + [(set (match_operand:X 0 "register_operand") + (unspec:X [(match_operand:ANYF 1 "register_operand") + (match_operand:ANYF 2 "register_operand")] + QUIET_COMPARISON))] + "TARGET_HARD_FLOAT" +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx tmp = gen_reg_rtx (SImode); + rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2); + rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx), + UNSPECV_FRFLAGS); + rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp), + UNSPECV_FSFLAGS); -(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan" - [(set (match_operand:X 0 "register_operand" "=r") - (unspec:X - [(match_operand:ANYF 1 "register_operand" " f") - (match_operand:ANYF 2 "register_operand" " f")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3 "=&r"))] - "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)" - "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2" - [(set_attr "type" "fcmp") - (set_attr "mode" "<UNITMODE>") - (set (attr "length") (const_int 16))]) + emit_insn (gen_rtx_SET (tmp, frflags)); + emit_insn (gen_rtx_SET (op0, cmp)); + emit_insn (fsflags); + if (HONOR_SNANS (<ANYF:MODE>mode)) + emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode, + gen_rtvec (2, op1, op2), + UNSPECV_FSNVSNAN)); + DONE; +}) (define_insn "*seq_zero_<X:mode><GPR:mode>" [(set (match_operand:GPR 0 "register_operand" "=r") @@ -2766,6 +2760,15 @@ "TARGET_HARD_FLOAT" "fsflags\t%0") +(define_insn "*riscv_fsnvsnan<mode>2" + [(unspec_volatile [(match_operand:ANYF 0 "register_operand") + (match_operand:ANYF 1 "register_operand")] + UNSPECV_FSNVSNAN)] + "TARGET_HARD_FLOAT" + "feq.<fmt>\tzero,%0,%1" + [(set_attr "type" "fcmp") + (set_attr "mode" "<UNITMODE>")]) + (define_insn "riscv_mret" [(return) (unspec_volatile [(const_int 0)] UNSPECV_MRET)] Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-ffinite-math-only" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns 2022-06-09 13:44 [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns Maciej W. Rozycki @ 2022-06-23 13:39 ` Maciej W. Rozycki 2022-06-23 16:44 ` Kito Cheng 0 siblings, 1 reply; 7+ messages in thread From: Maciej W. Rozycki @ 2022-06-23 13:39 UTC (permalink / raw) To: gcc-patches; +Cc: Andrew Waterman, Jim Wilson, Kito Cheng, Palmer Dabbelt On Thu, 9 Jun 2022, Maciej W. Rozycki wrote: > I'm yet running some benchmarking to see if the use of UNSPEC_VOLATILEs > makes any noticeable performance difference, but I suspect it does not as > the compiler could not do much about the original multiple-instruction > single RTL insns anyway. This has now finally completed. I used SPECfp 2006 built at `-O3' and statically linked, which needs ~33 hours per run with the HiFive Unmatched board at its standard 1196MHz clock rate. Here are the results merged by hand from original reports: Base Base Base Base Est Base Est Base Est Base Benchmarks Ref. Run Time Run Time Run Time Ratio Ratio Ratio (base) (length) (split) (base) (length) (split) ------------- ------ ------- ------- ------- -------- -------- -------- 410.bwaves 13590 10353 10396 10370 1.31 1.31 1.31 416.gamess 19580 9080 9410 9284 2.16 2.08 2.11 433.milc 9180 5465 5475 5610 1.68 1.68 1.64 434.zeusmp 9100 5773 5767 5761 1.58 1.58 1.58 435.gromacs 7140 3605 3561 3545 1.98 2.00 2.01 436.cactusADM 11950 7779 7658 7680 1.54 1.56 1.56 437.leslie3d 9400 10280 10697 10274 0.914 0.879 0.915 444.namd 8020 3141 3120 3129 2.55 2.57 2.56 447.dealII 11440 3459 3490 3495 3.31 3.28 3.27 450.soplex 8340 4698 4899 4781 1.78 1.70 1.74 453.povray 5320 1953 1922 1916 2.72 2.77 2.78 454.calculix 8250 4844 4857 4821 1.70 1.70 1.71 459.GemsFDTD 10610 8703 8957 9028 1.22 1.18 1.18 465.tonto 9840 4585 4539 4620 2.15 2.17 2.13 470.lbm 13740 10172 10945 10789 1.35 1.26 1.27 481.wrf 11170 8516 8646 8584 1.31 1.29 1.30 482.sphinx3 19490 9240 9267 9280 2.11 2.10 2.10 ============================================================================== The execution time reference (second column) is for a Sun Ultra Enterprise 2 system from 1997, based on a 296MHz UltraSPARC II CPU, times are given in seconds (lower is better) and the ratios calculated are in relation to the reference (higher is better). In the table above "base" results are with upstream master as at commit 7b98910406b5 ("c++: ICE with template NEW_EXPR [PR105803]". Then "length" results are with commit 72b185189f91 ("RISC-V: Reset the length to the default of 4 for FP comparisons") applied on top, as it does make changes to code produced even at `-O3' (where size matters less than speed), e.g.: 46b2c: 8d01b787 fld fa5,-1840(gp) # 7760a8 <__SDATA_BEGIN__+0xd0> - 46b30: 66f4b027 fsd fa5,1632(s1) - 46b34: a029 j 46b3e <gciinp_+0x124> - 46b36: 8c01b787 fld fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0> - 46b3a: 66f4b027 fsd fa5,1632(s1) - 46b3e: 00ab67b7 lui a5,0xab6 - 46b42: 0a07b707 fld fa4,160(a5) # ab60a0 <runopt_> - 46b46: 8d81b787 fld fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8> - 46b4a: a2f727d3 feq.d a5,fa4,fa5 - 46b4e: 18079fe3 bnez a5,474ec <gciinp_+0xad2> - 46b52: 00afd7b7 lui a5,0xafd - 46b56: 4607a703 lw a4,1120(a5) # afd460 <symtry_+0x47340> - 46b5a: 4785 li a5,1 - 46b5c: 18f708e3 beq a4,a5,474ec <gciinp_+0xad2> - 46b60: 00aaeab7 lui s5,0xaae - 46b64: d70a8a93 addi s5,s5,-656 # aadd70 <infoa_> - 46b68: 008aa783 lw a5,8(s5) - 46b6c: 8301b707 fld fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30> - 46b70: 37fd addiw a5,a5,-1 + 46b30: 00ab67b7 lui a5,0xab6 + 46b34: 0a07b707 fld fa4,160(a5) # ab60a0 <runopt_> + 46b38: 66f4b027 fsd fa5,1632(s1) + 46b3c: 8d81b787 fld fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8> + 46b40: a2f727d3 feq.d a5,fa4,fa5 + 46b44: c39d beqz a5,46b6a <gciinp_+0x150> + 46b46: 8901b787 fld fa5,-1904(gp) # 776068 <__SDATA_BEGIN__+0x90> + 46b4a: 66f4b027 fsd fa5,1632(s1) + 46b4e: a02d j 46b78 <gciinp_+0x15e> + 46b50: 8c01b787 fld fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0> + 46b54: 66f4b027 fsd fa5,1632(s1) + 46b58: 00ab67b7 lui a5,0xab6 + 46b5c: 0a07b707 fld fa4,160(a5) # ab60a0 <runopt_> + 46b60: 8d81b787 fld fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8> + 46b64: a2f727d3 feq.d a5,fa4,fa5 + 46b68: fff9 bnez a5,46b46 <gciinp_+0x12c> + 46b6a: 00afd7b7 lui a5,0xafd + 46b6e: 4607a703 lw a4,1120(a5) # afd460 <symtry_+0x47340> + 46b72: 4785 li a5,1 + 46b74: fcf709e3 beq a4,a5,46b46 <gciinp_+0x12c> + 46b78: 00aaeab7 lui s5,0xaae + 46b7c: d70a8a93 addi s5,s5,-656 # aadd70 <infoa_> + 46b80: 008aa783 lw a5,8(s5) + 46b84: 8301b707 fld fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30> + 46b88: 37fd addiw a5,a5,-1 And finally "split" is with this patch also applied, changing code in places as well, e.g.: @@ -4873598,13 +4873598,13 @@ 5f5744: 87bf70ef jal ra,5ecfbe <_gfortrani_internal_error> 5f5748: 8281b407 fld fs0,-2008(gp) # 776000 <__SDATA_BEGIN__+0x28> 5f574c: 221c fld fa5,0(a2) - 5f574e: 0079a7b7 lui a5,0x79a - 5f5752: ac22 fsd fs0,24(sp) - 5f5754: a83e fsd fa5,16(sp) - 5f5756: 27c2 fld fa5,16(sp) - 5f5758: 4907b707 fld fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8> - 5f575c: 22f7a7d3 fabs.d fa5,fa5 - 5f5760: 00102773 frflags a4 + 5f574e: ac22 fsd fs0,24(sp) + 5f5750: a83e fsd fa5,16(sp) + 5f5752: 27c2 fld fa5,16(sp) + 5f5754: 22f7a7d3 fabs.d fa5,fa5 + 5f5758: 00102773 frflags a4 + 5f575c: 0079a7b7 lui a5,0x79a + 5f5760: 4907b707 fld fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8> 5f5764: a2e787d3 fle.d a5,fa5,fa4 5f5768: 00171073 fsflags a4 5f576c: 2c078363 beqz a5,5f5a32 <determine_en_precision+0x328> or: @@ -4909410,9 +4909410,9 @@ 60eb8a: a2f696d3 flt.d a3,fa3,fa5 60eb8e: 00161073 fsflags a2 60eb92: ee81 bnez a3,60ebaa <__hypot+0x58> - 60eb94: f20707d3 fmv.d.x fa5,a4 - 60eb98: 22f7a7d3 fabs.d fa5,fa5 - 60eb9c: 00102673 frflags a2 + 60eb94: 00102673 frflags a2 + 60eb98: f20707d3 fmv.d.x fa5,a4 + 60eb9c: 22f7a7d3 fabs.d fa5,fa5 60eba0: a2f696d3 flt.d a3,fa3,fa5 60eba4: 00161073 fsflags a2 60eba8: c29d beqz a3,60ebce <__hypot+0x7c> (so no arithmetic FP instructions appear to be scheduled between FSFLAGS and FRFLAGS, though it's not clear to me how the compiler knows it is not allowed do it) or finally: - 66204: 52cd754b fnmsub.d fa0,fs10,fa2,fa0 - 66208: 40157553 fcvt.s.d fa0,fa0 - 6620c: a0e517d3 flt.s a5,fa0,fa4 - 66210: 58079263 bnez a5,66794 <do_cg+0xd20> - 66214: 00102773 frflags a4 - 66218: a0e517d3 flt.s a5,fa0,fa4 - 6621c: 00171073 fsflags a4 - 66220: 220793e3 bnez a5,66c46 <do_cg+0x11d2> - 66224: 580576d3 fsqrt.s fa3,fa0 + 6620c: 52cd754b fnmsub.d fa0,fs10,fa2,fa0 + 66210: 40157553 fcvt.s.d fa0,fa0 + 66214: a0e517d3 flt.s a5,fa0,fa4 + 66218: 58079063 bnez a5,66798 <do_cg+0xd1c> + 6621c: 00102773 frflags a4 + 66220: 00171073 fsflags a4 + 66224: 220793e3 bnez a5,66c4a <do_cg+0x11ce> + 66228: 580576d3 fsqrt.s fa3,fa0 (at least removing a redundant FLT.S instruction, although this doesn't seem optimal anyway as there appears no way for the second BNEZ branch to be ever taken, but I gather that's an unfortunate consequence of the volatility of `riscv_frflags'/`riscv_fsflags' RTL insns) and I was able to spot a place where an FMV.D instruction has been removed too, indicating a better register allocation. Results quoted above seem to suggest that in some cases a performance regression has resulted from the change, but that may not necessarily be the case given that the benchmarks have been run on a live even if lightly loaded Linux system. Obtaining standard three samples would require ~4.5 days per SPECfp 2006 iteration or almost a fortnight total. Therefore I chose to rerun only one of the worst offenders and the results are as follows: Estimated Base Base Base Benchmarks Ref. Run Time Ratio -------------- ------ --------- --------- 416.gamess 19580 9138 2.14 S 416.gamess 19580 9498 2.06 S 416.gamess 19580 9478 2.07 * corresponding to the "split" result earlier on. So the variation between runs is similar to the supposed loss of performance and therefore I think we do not need to be concerned. If there's anything that we're missing, it's the tracking of IEEE exception flags, as I previously mentioned. I did not run benchmarking for `-fsignaling-nans'. Relative figures are expected to be similar as the only difference is the presence of a FEQ.fmt instruction following FSFLAGS. I've spotted this anomaly however: - 3c670: 5a057553 fsqrt.d fa0,fa0 - 3c674: f20006d3 fmv.d.x fa3,zero - 3c678: 8d01b787 fld fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0> - 3c67c: 0ad57553 fsub.d fa0,fa0,fa3 - 3c680: a2a797d3 flt.d a5,fa5,fa0 - 3c684: e3cd bnez a5,3c726 <_ZN9ResultSet5checkEv+0xe6> - 3c686: 8d81b787 fld fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8> - 3c68a: a2f517d3 flt.d a5,fa0,fa5 - 3c68e: efc1 bnez a5,3c726 <_ZN9ResultSet5checkEv+0xe6> - 3c690: 3578 fld fa4,232(a0) - 3c692: 317c fld fa5,224(a0) - 3c694: 3968 fld fa0,240(a0) - 3c696: 12e77753 fmul.d fa4,fa4,fa4 - 3c69a: 72f7f7c3 fmadd.d fa5,fa5,fa5,fa4 - 3c69e: 7aa57543 fmadd.d fa0,fa0,fa0,fa5 - 3c6a2: 00102773 frflags a4 - 3c6a6: a2d517d3 flt.d a5,fa0,fa3 - 3c6aa: 00171073 fsflags a4 - 3c6ae: a2d52053 feq.d zero,fa0,fa3 - 3c6b2: efc9 bnez a5,3c74c <_ZN9ResultSet5checkEv+0x10c> - 3c6b4: 5a057553 fsqrt.d fa0,fa0 + 3c66e: 5a057553 fsqrt.d fa0,fa0 + 3c672: f20006d3 fmv.d.x fa3,zero + 3c676: 8d01b787 fld fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0> + 3c67a: 0ad57553 fsub.d fa0,fa0,fa3 + 3c67e: a2a797d3 flt.d a5,fa5,fa0 + 3c682: e7cd bnez a5,3c72c <_ZN9ResultSet5checkEv+0xee> + 3c684: 8d81b787 fld fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8> + 3c688: a2f517d3 flt.d a5,fa0,fa5 + 3c68c: e3c5 bnez a5,3c72c <_ZN9ResultSet5checkEv+0xee> + 3c68e: 3578 fld fa4,232(a0) + 3c690: 317c fld fa5,224(a0) + 3c692: 3968 fld fa0,240(a0) + 3c694: 12e77753 fmul.d fa4,fa4,fa4 + 3c698: 72f7f7c3 fmadd.d fa5,fa5,fa5,fa4 + 3c69c: 7aa57543 fmadd.d fa0,fa0,fa0,fa5 + 3c6a0: 00102773 frflags a4 + 3c6a4: a2d517d3 flt.d a5,fa0,fa3 + 3c6a8: 00171073 fsflags a4 + 3c6ac: f20007d3 fmv.d.x fa5,zero + 3c6b0: a2f52053 feq.d zero,fa0,fa5 + 3c6b4: efd9 bnez a5,3c752 <_ZN9ResultSet5checkEv+0x114> + 3c6b6: 5a057553 fsqrt.d fa0,fa0 where the compiler for some reason cannot realise it already has the value of 0.0 available in fa3 and instead uses an extra move to fa5 for the final FEQ.D. > No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc > testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for > glibc testsuite.")). OK to apply? Any comments on the change, anyone? Maciej ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns 2022-06-23 13:39 ` Maciej W. Rozycki @ 2022-06-23 16:44 ` Kito Cheng 2022-07-04 14:12 ` [PATCH v2] " Maciej W. Rozycki 0 siblings, 1 reply; 7+ messages in thread From: Kito Cheng @ 2022-06-23 16:44 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: GCC Patches, Andrew Waterman [-- Attachment #1: Type: text/plain, Size: 3376 bytes --] Hi Maciej: Thanks for detail analysis and performance number report, I am concern about this patch might let compiler schedule the fsflags/frflags with other floating point instructions, and the major issue is we didn't model fflags right in GCC as you mentioned in the first email. So I think we should model this right before we split that, I guess that would be a bunch of work: 1. Add fflags to the hard register list. 2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop (operands...))) to those floating point operations which might change fflags 3. Rewrite riscv_frflags and riscv_fsflags pattern by right RTL pattern: (set (reg) (reg fflags)) and (set (reg fflags) (reg)). 4. Then split *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan pattern. However I am not sure about the code gen impact of 2, especially the impact to the combine pass, not sure if you are interested to give a try? And, I did some hack for part of this approach (1+3+4) got following result for "__builtin_isless (x, y) + __builtin_isless (x, z)": fltlt: frflags a4 # 8 [c=4 l=4] riscv_frflags flt.d a5,fa0,fa1 # 14 [c=4 l=4] *cstoredfdi4 flt.d a0,fa0,fa2 # 17 [c=4 l=4] *cstoredfdi4 fsflags a4 # 18 [c=4 l=4] riscv_fsflags add a0,a0,a5 # 30 [c=4 l=4] adddi3/0 ret # 40 [c=0 l=4] simple_return Verbose version: fltlt: #(insn 8 5 9 (set (reg:SI 14 a4 [88]) # (reg:SI 66 fflags)) "x.c":5:10 258 {riscv_frflags} # (expr_list:REG_DEAD (reg:SI 66 fflags) # (nil))) frflags a4 # 8 [c=4 l=4] riscv_frflags #(insn 14 11 15 (parallel [ # (set (reg:DI 15 a5 [90]) # (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81]) # (reg:DF 43 fa1 [101]))) # (clobber:SI (reg:SI 66 fflags)) # ]) "x.c":5:10 197 {*cstoredfdi4} # (expr_list:REG_DEAD (reg:DF 43 fa1 [101]) # (expr_list:REG_UNUSED (reg:SI 66 fflags) # (nil)))) flt.d a5,fa0,fa1 # 14 [c=4 l=4] *cstoredfdi4 #(insn 17 15 18 (parallel [ # (set (reg:DI 10 a0 [94]) # (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81]) # (reg:DF 44 fa2 [102]))) # (clobber:SI (reg:SI 66 fflags)) # ]) "x.c":5:36 197 {*cstoredfdi4} # (expr_list:REG_DEAD (reg:DF 44 fa2 [102]) # (expr_list:REG_DEAD (reg/v:DF 42 fa0 [orig:81 x ] [81]) # (expr_list:REG_UNUSED (reg:SI 66 fflags) # (nil))))) flt.d a0,fa0,fa2 # 17 [c=4 l=4] *cstoredfdi4 #(insn 18 17 19 (set (reg:SI 66 fflags) # (reg:SI 14 a4 [88])) "x.c":5:36 259 {riscv_fsflags} # (expr_list:REG_DEAD (reg:SI 14 a4 [88]) # (nil))) fsflags a4 # 18 [c=4 l=4] riscv_fsflags #(insn 30 25 31 (set (reg/i:DI 10 a0) # (plus:DI (reg:DI 10 a0 [94]) # (reg:DI 15 a5 [90]))) "x.c":6:1 4 {adddi3} # (expr_list:REG_DEAD (reg:DI 15 a5 [90]) # (nil))) add a0,a0,a5 # 30 [c=4 l=4] adddi3/0 #(jump_insn 40 39 41 (simple_return) "x.c":6:1 244 {simple_return} # (nil) # -> simple_return) ret # 40 [c=0 l=4] simple_return ---- But this hack add an extra use of fflags to prevent FFLAGS getting CSEed, patch attached. [-- Attachment #2: 0001-model-fflags.patch --] [-- Type: text/x-patch, Size: 6664 bytes --] From 1116422bb5a69d8361f5c5bc334a122fecbaa306 Mon Sep 17 00:00:00 2001 From: Kito Cheng <kito.cheng@sifive.com> Date: Fri, 24 Jun 2022 00:41:55 +0800 Subject: [PATCH] model fflags --- gcc/config/riscv/riscv.h | 12 +++++------ gcc/config/riscv/riscv.md | 42 ++++++++++++++++++++------------------- gcc/function.cc | 1 + 3 files changed, 29 insertions(+), 26 deletions(-) diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 6f7f4d3fbdc..c4e6efe9885 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -291,7 +291,7 @@ ASM_MISA_SPEC - ARG_POINTER_REGNUM - FRAME_POINTER_REGNUM */ -#define FIRST_PSEUDO_REGISTER 66 +#define FIRST_PSEUDO_REGISTER 67 /* x0, sp, gp, and tp are fixed. */ @@ -303,7 +303,7 @@ ASM_MISA_SPEC 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ /* Others. */ \ - 1, 1 \ + 1, 1, 1 \ } /* a0-a7, t0-t6, fa0-fa7, and ft0-ft11 are volatile across calls. @@ -317,7 +317,7 @@ ASM_MISA_SPEC 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, \ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, \ /* Others. */ \ - 1, 1 \ + 1, 1, 1, \ } /* Select a register mode required for caller save of hard regno REGNO. @@ -472,7 +472,7 @@ enum reg_class { 0xffffffff, 0x00000000, 0x00000000 }, /* GR_REGS */ \ { 0x00000000, 0xffffffff, 0x00000000 }, /* FP_REGS */ \ { 0x00000000, 0x00000000, 0x00000003 }, /* FRAME_REGS */ \ - { 0xffffffff, 0xffffffff, 0x00000003 } /* ALL_REGS */ \ + { 0xffffffff, 0xffffffff, 0x00000007 } /* ALL_REGS */ \ } /* A C expression whose value is a register class containing hard @@ -514,7 +514,7 @@ enum reg_class 40, 41, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, \ /* None of the remaining classes have defined call-saved \ registers. */ \ - 64, 65 \ + 64, 65, 66 \ } /* True if VALUE is a signed 12-bit number. */ @@ -777,7 +777,7 @@ typedef struct { "fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5", \ "fa6", "fa7", "fs2", "fs3", "fs4", "fs5", "fs6", "fs7", \ "fs8", "fs9", "fs10","fs11","ft8", "ft9", "ft10","ft11", \ - "arg", "frame", } + "arg", "frame", "fflags" } #define ADDITIONAL_REGISTER_NAMES \ { \ diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 4f59bb99cf5..9039cc73e4d 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -47,6 +47,7 @@ (define_c_enum "unspec" [ ;; Stack tie UNSPEC_TIE + UNSPEC_FSNVSNAN ]) (define_c_enum "unspecv" [ @@ -57,7 +58,6 @@ (define_c_enum "unspecv" [ ;; Floating-point unspecs. UNSPECV_FRFLAGS UNSPECV_FSFLAGS - UNSPECV_FSNVSNAN ;; Interrupt handler instructions. UNSPECV_MRET @@ -99,6 +99,7 @@ (define_constants (S9_REGNUM 25) (S10_REGNUM 26) (S11_REGNUM 27) + (FFLAGS_REGNUM 66) (NORMAL_RETURN 0) (SIBCALL_RETURN 1) @@ -2300,7 +2301,8 @@ (define_expand "cstore<mode>4" [(set (match_operand:SI 0 "register_operand") (match_operator:SI 1 "order_operator" [(match_operand:GPR 2 "register_operand") - (match_operand:GPR 3 "nonmemory_operand")]))] + (match_operand:GPR 3 "nonmemory_operand")])) + (clobber (reg:SI FFLAGS_REGNUM))] "" { riscv_expand_int_scc (operands[0], GET_CODE (operands[1]), operands[2], @@ -2312,7 +2314,8 @@ (define_expand "cstore<mode>4" [(set (match_operand:SI 0 "register_operand") (match_operator:SI 1 "fp_scc_comparison" [(match_operand:ANYF 2 "register_operand") - (match_operand:ANYF 3 "register_operand")]))] + (match_operand:ANYF 3 "register_operand")])) + (clobber (reg:SI FFLAGS_REGNUM))] "TARGET_HARD_FLOAT" { riscv_expand_float_scc (operands[0], GET_CODE (operands[1]), operands[2], @@ -2324,7 +2327,8 @@ (define_insn "*cstore<ANYF:mode><X:mode>4" [(set (match_operand:X 0 "register_operand" "=r") (match_operator:X 1 "fp_native_comparison" [(match_operand:ANYF 2 "register_operand" " f") - (match_operand:ANYF 3 "register_operand" " f")]))] + (match_operand:ANYF 3 "register_operand" " f")])) + (clobber (reg:SI FFLAGS_REGNUM))] "TARGET_HARD_FLOAT" "f%C1.<fmt>\t%0,%2,%3" [(set_attr "type" "fcmp") @@ -2342,18 +2346,15 @@ (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4" rtx op2 = operands[2]; rtx tmp = gen_reg_rtx (SImode); rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2); - rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx), - UNSPECV_FRFLAGS); - rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp), - UNSPECV_FSFLAGS); - - emit_insn (gen_rtx_SET (tmp, frflags)); - emit_insn (gen_rtx_SET (op0, cmp)); - emit_insn (fsflags); + + emit_insn (gen_riscv_frflags (tmp)); +rtx set = gen_rtx_SET (op0, cmp); +rtx clobber = gen_rtx_CLOBBER(SImode, gen_rtx_REG(SImode, FFLAGS_REGNUM)); + rtvec vec = gen_rtvec (2, set, clobber); + emit_insn (gen_rtx_PARALLEL(VOIDmode, vec)); + emit_insn (gen_riscv_fsflags(tmp)); if (HONOR_SNANS (<ANYF:MODE>mode)) - emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode, - gen_rtvec (2, op1, op2), - UNSPECV_FSNVSNAN)); + emit_insn (gen_riscv_fsnvsnan<ANYF:mode>2 (op1, op2)); DONE; }) @@ -2754,19 +2755,20 @@ (define_insn "gpr_restore_return" (define_insn "riscv_frflags" [(set (match_operand:SI 0 "register_operand" "=r") - (unspec_volatile [(const_int 0)] UNSPECV_FRFLAGS))] + (reg:SI FFLAGS_REGNUM))] "TARGET_HARD_FLOAT" "frflags\t%0") (define_insn "riscv_fsflags" - [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] UNSPECV_FSFLAGS)] + [(set (reg:SI FFLAGS_REGNUM) (match_operand:SI 0 "csr_operand" "rK"))] "TARGET_HARD_FLOAT" "fsflags\t%0") -(define_insn "*riscv_fsnvsnan<mode>2" - [(unspec_volatile [(match_operand:ANYF 0 "register_operand") +(define_insn "riscv_fsnvsnan<mode>2" + [(set (reg:SI FFLAGS_REGNUM) + (unspec [(match_operand:ANYF 0 "register_operand") (match_operand:ANYF 1 "register_operand")] - UNSPECV_FSNVSNAN)] + UNSPEC_FSNVSNAN))] "TARGET_HARD_FLOAT" "feq.<fmt>\tzero,%0,%1" [(set_attr "type" "fcmp") diff --git a/gcc/function.cc b/gcc/function.cc index ad0096a43ef..cb0d743558f 100644 --- a/gcc/function.cc +++ b/gcc/function.cc @@ -5590,6 +5590,7 @@ expand_function_end (void) sh mach_dep_reorg) that still try and compute their own lifetime info instead of using the general framework. */ use_return_register (); + emit_insn (gen_rtx_USE(SImode, gen_rtx_REG(SImode,66))); } rtx -- 2.34.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns 2022-06-23 16:44 ` Kito Cheng @ 2022-07-04 14:12 ` Maciej W. Rozycki 2022-07-18 15:42 ` [PING][PATCH " Maciej W. Rozycki 0 siblings, 1 reply; 7+ messages in thread From: Maciej W. Rozycki @ 2022-07-04 14:12 UTC (permalink / raw) To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman We have unordered FP comparisons implemented as RTL insns that produce multiple machine instructions. Such RTL insns are hard to match with a processor pipeline description and additionally there is a redundant SNEZ instruction produced on the result of these comparisons even though the FLT.fmt and FLE.fmt machine instructions already produce either 0 or 1, e.g.: long flt (double x, double y) { return __builtin_isless (x, y); } with `-O2 -fno-finite-math-only -ftrapping-math -fno-signaling-nans' gets compiled to: .globl flt .type flt, @function flt: frflags a5 flt.d a0,fa0,fa1 fsflags a5 snez a0,a0 ret .size flt, .-flt because the middle end can't see through the UNSPEC operation unordered FP comparisons have been defined in terms of. These instructions are only produced via an expander already, so change the expander to emit individual RTL insns for each machine instruction in the ultimate ultimate sequence produced rather than deferring to a single RTL insn producing the whole sequence at once. gcc/ * config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant. (QUIET_PATTERN): New int attribute. (f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended RTL insns entirely within the preparation statements. (*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default) (*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove insns. (*riscv_fsnvsnan<mode>2): New insn. gcc/testsuite/ * gcc.target/riscv/fle-ieee.c: New test. * gcc.target/riscv/fle-snan.c: New test. * gcc.target/riscv/fle.c: New test. * gcc.target/riscv/flef-ieee.c: New test. * gcc.target/riscv/flef-snan.c: New test. * gcc.target/riscv/flef.c: New test. * gcc.target/riscv/flt-ieee.c: New test. * gcc.target/riscv/flt-snan.c: New test. * gcc.target/riscv/flt.c: New test. * gcc.target/riscv/fltf-ieee.c: New test. * gcc.target/riscv/fltf-snan.c: New test. * gcc.target/riscv/fltf.c: New test. --- Hi Kito, Thank you for your review. > Thanks for detail analysis and performance number report, I am concern > about this patch might let compiler schedule the fsflags/frflags with > other floating point instructions, and the major issue is we didn't > model fflags right in GCC as you mentioned in the first email. I have now looked through various places and we're good. First of all the C language standard has this: "F.8.1 Environment management "IEC 60559 requires that floating-point operations implicitly raise floating-point exception status flags, and that rounding control modes can be set explicitly to affect result values of floating-point operations. When the state for the FENV_ACCESS pragma (defined in <fenv.h>) is "on", these changes to the floating-point state are treated as side effects which respect sequence points." We don't actually support the FENV_ACCESS pragma, however we provide for having FP environment support in the C library (e.g. `riscv_getflags' and `riscv_setflags' among others is the RISC-V port of glibc): " * 'The default state for the 'FENV_ACCESS' pragma (C99 and C11 7.6.1).' " This pragma is not implemented, but the default is to "off" unless '-frounding-math' is used in which case it is "on"." (I find this misleading, because my interpretation of our documentation and code is that the default is "on" whenever `-frounding-math' and `-ftrapping-math' are active both at a time; arguably the text is however correct, because `-ftrapping-math' is on by default, so it doesn't have to "be used" and it's `-fno-trapping-math' that has to "be unused" for the effect to be in place of what FENV_ACCESS "on" would do should we implement it). Now `riscv_getflags' and `riscv_setflags' use `asm volatile' to peek or poke at IEEE exception flags, so for these pieces of code to work the compiler has to make sure not to reorder volatile instructions around trapping instructions and that is handled in `can_move_insns_across': if (may_trap_or_fault_p (PATTERN (insn)) && (trapping_insns_in_across || other_branch_live != NULL || volatile_insn_p (PATTERN (insn)))) break; (cf.: <https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00254.html>, and commit c6d851b95a7b). And we consider FP arithmetic instructions trapping under `-ftrapping-math' in `may_trap_p_1': case NEG: case ABS: case SUBREG: case VEC_MERGE: case VEC_SELECT: case VEC_CONCAT: case VEC_DUPLICATE: /* These operations don't trap even with floating point. */ break; default: /* Any floating arithmetic may trap. */ if (FLOAT_MODE_P (GET_MODE (x)) && flag_trapping_math) return 1; (other relevant operations are handled elsewhere with this switch statement). This is consistent with my observations where only FLD or FABS.D (neither trapping) get reordered across FRFLAGS (volatile by means of `unspec_volatile'). However following the observations above I chose to update the test cases to better reflect how we control IEEE exception handling and use `-fno-trapping-math' rather than `-ffinite-math-only' to disable the use of unordered comparison operations. > So I think we should model this right before we split that, I guess > that would be a bunch of work: > 1. Add fflags to the hard register list. > 2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop > (operands...))) to those floating point operations which might change > fflags > 3. Rewrite riscv_frflags and riscv_fsflags pattern by right RTL > pattern: (set (reg) (reg fflags)) and (set (reg fflags) (reg)). > 4. Then split *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and > *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan pattern. > > However I am not sure about the code gen impact of 2, especially the > impact to the combine pass, not sure if you are interested to give a > try? I think we can look into it long-term as a further optimisation, but to solve the problem of insn scheduling at hand my current proposal seems adequate enough and I suspect adding full data dependency tracking for the IEEE flags could be a rabbit hole (ISTM after all no other target does that, and I smell there's been a reason for that). FAOD if at all I'd envision doing such tracking individually for each exception flag, following "Accumulating CSRs" listings from Section 14.3 "Source and Destination Register Listings" in the unprivileged ISA spec. While re-reviewing code I have spotted I previously missed operand constraints for the new `*riscv_fsnvsnan<mode>2' insn, so I have added them now. I have verified that the new test cases still pass with the update in place, and I have rerun full `-fsignaling-nans' regression testing for the constraint fix. OK to apply? Maciej Changes from v1: - Add operand constraints for the new `*riscv_fsnvsnan<mode>2' insn. - In test cases use `-fno-trapping-math' rather than `-ffinite-math-only'; consequently force `-fno-finite-math-only' so that any defaults do not interfere with the results expected. --- gcc/config/riscv/riscv.md | 67 +++++++++++++++-------------- gcc/testsuite/gcc.target/riscv/fle-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fle-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fle.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flef.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/flt.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf-snan.c | 12 +++++ gcc/testsuite/gcc.target/riscv/fltf.c | 12 +++++ 13 files changed, 179 insertions(+), 32 deletions(-) gcc-riscv-fcmp-split.diff Index: gcc/gcc/config/riscv/riscv.md =================================================================== --- gcc.orig/gcc/config/riscv/riscv.md +++ gcc/gcc/config/riscv/riscv.md @@ -57,6 +57,7 @@ ;; Floating-point unspecs. UNSPECV_FRFLAGS UNSPECV_FSFLAGS + UNSPECV_FSNVSNAN ;; Interrupt handler instructions. UNSPECV_MRET @@ -360,6 +361,7 @@ ;; Iterator and attributes for quiet comparisons. (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET]) (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")]) +(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")]) ;; This code iterator allows signed and unsigned widening multiplications ;; to use the same template. @@ -2326,39 +2328,31 @@ (set_attr "mode" "<UNITMODE>")]) (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4" - [(parallel [(set (match_operand:X 0 "register_operand") - (unspec:X - [(match_operand:ANYF 1 "register_operand") - (match_operand:ANYF 2 "register_operand")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3))])] - "TARGET_HARD_FLOAT") - -(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default" - [(set (match_operand:X 0 "register_operand" "=r") - (unspec:X - [(match_operand:ANYF 1 "register_operand" " f") - (match_operand:ANYF 2 "register_operand" " f")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3 "=&r"))] - "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)" - "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3" - [(set_attr "type" "fcmp") - (set_attr "mode" "<UNITMODE>") - (set (attr "length") (const_int 12))]) + [(set (match_operand:X 0 "register_operand") + (unspec:X [(match_operand:ANYF 1 "register_operand") + (match_operand:ANYF 2 "register_operand")] + QUIET_COMPARISON))] + "TARGET_HARD_FLOAT" +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx tmp = gen_reg_rtx (SImode); + rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2); + rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx), + UNSPECV_FRFLAGS); + rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp), + UNSPECV_FSFLAGS); -(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan" - [(set (match_operand:X 0 "register_operand" "=r") - (unspec:X - [(match_operand:ANYF 1 "register_operand" " f") - (match_operand:ANYF 2 "register_operand" " f")] - QUIET_COMPARISON)) - (clobber (match_scratch:X 3 "=&r"))] - "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)" - "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2" - [(set_attr "type" "fcmp") - (set_attr "mode" "<UNITMODE>") - (set (attr "length") (const_int 16))]) + emit_insn (gen_rtx_SET (tmp, frflags)); + emit_insn (gen_rtx_SET (op0, cmp)); + emit_insn (fsflags); + if (HONOR_SNANS (<ANYF:MODE>mode)) + emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode, + gen_rtvec (2, op1, op2), + UNSPECV_FSNVSNAN)); + DONE; +}) (define_insn "*seq_zero_<X:mode><GPR:mode>" [(set (match_operand:GPR 0 "register_operand" "=r") @@ -2766,6 +2760,15 @@ "TARGET_HARD_FLOAT" "fsflags\t%0") +(define_insn "*riscv_fsnvsnan<mode>2" + [(unspec_volatile [(match_operand:ANYF 0 "register_operand" "f") + (match_operand:ANYF 1 "register_operand" "f")] + UNSPECV_FSNVSNAN)] + "TARGET_HARD_FLOAT" + "feq.<fmt>\tzero,%0,%1" + [(set_attr "type" "fcmp") + (set_attr "mode" "<UNITMODE>")]) + (define_insn "riscv_mret" [(return) (unspec_volatile [(const_int 0)] UNSPECV_MRET)] Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fle.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */ + +long +fle (double x, double y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flef.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */ + +long +flef (float x, float y) +{ + return __builtin_islessequal (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/flt.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */ + +long +flt (double x, double y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */ +/* { dg-final { scan-assembler-not "snez" } } */ Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c =================================================================== --- /dev/null +++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" } */ + +long +fltf (float x, float y) +{ + return __builtin_isless (x, y); +} + +/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */ +/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PING][PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns 2022-07-04 14:12 ` [PATCH v2] " Maciej W. Rozycki @ 2022-07-18 15:42 ` Maciej W. Rozycki 2022-07-27 10:40 ` Kito Cheng 0 siblings, 1 reply; 7+ messages in thread From: Maciej W. Rozycki @ 2022-07-18 15:42 UTC (permalink / raw) To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman On Mon, 4 Jul 2022, Maciej W. Rozycki wrote: > These instructions are only produced via an expander already, so change > the expander to emit individual RTL insns for each machine instruction > in the ultimate ultimate sequence produced rather than deferring to a > single RTL insn producing the whole sequence at once. Ping for: <https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597767.html> Maciej ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PING][PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns 2022-07-18 15:42 ` [PING][PATCH " Maciej W. Rozycki @ 2022-07-27 10:40 ` Kito Cheng 2022-07-28 13:20 ` Maciej W. Rozycki 0 siblings, 1 reply; 7+ messages in thread From: Kito Cheng @ 2022-07-27 10:40 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: GCC Patches, Andrew Waterman Hi Maciej: I am convinced that is OK for now, I agree modeling fflags would be a rabbit hole, I tried to build a full GNU toolchain with my quick patch and saw many ICE during build libraries, that definitely should be a long-term optimization project. Although I'm thinking if we should default -fno-trapping-math for RISC-V, because RISC-V didn't trap for any floating point operations, however I think that would be another topic. so you got my LGTM :) On Mon, Jul 18, 2022 at 11:43 PM Maciej W. Rozycki <macro@embecosm.com> wrote: > > On Mon, 4 Jul 2022, Maciej W. Rozycki wrote: > > > These instructions are only produced via an expander already, so change > > the expander to emit individual RTL insns for each machine instruction > > in the ultimate ultimate sequence produced rather than deferring to a > > single RTL insn producing the whole sequence at once. > > Ping for: > > <https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597767.html> > > Maciej ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PING][PATCH v2] RISC-V: Split unordered FP comparisons into individual RTL insns 2022-07-27 10:40 ` Kito Cheng @ 2022-07-28 13:20 ` Maciej W. Rozycki 0 siblings, 0 replies; 7+ messages in thread From: Maciej W. Rozycki @ 2022-07-28 13:20 UTC (permalink / raw) To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman Hi Kito, > I am convinced that is OK for now, I agree modeling fflags would be a > rabbit hole, I tried to build a full GNU toolchain with my quick patch > and saw many ICE during build libraries, that definitely should be a > long-term optimization project. > > Although I'm thinking if we should default -fno-trapping-math for > RISC-V, because RISC-V didn't trap for any floating point operations, > however I think that would be another topic. No need to do anything for RISC-V I believe, as Richard has mentioned that's been his plan for GCC 13 compiler-wide; see the discussion here: <https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598589.html> for further details. > so you got my LGTM :) I have applied this change now then, thank you for your review. Maciej ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-07-28 13:21 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-09 13:44 [PATCH] RISC-V: Split unordered FP comparisons into individual RTL insns Maciej W. Rozycki 2022-06-23 13:39 ` Maciej W. Rozycki 2022-06-23 16:44 ` Kito Cheng 2022-07-04 14:12 ` [PATCH v2] " Maciej W. Rozycki 2022-07-18 15:42 ` [PING][PATCH " Maciej W. Rozycki 2022-07-27 10:40 ` Kito Cheng 2022-07-28 13:20 ` Maciej W. Rozycki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).