[PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable
@ 2023-08-14  5:32 Tsukasa OI
  2023-08-14  5:32 ` [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension Tsukasa OI
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Tsukasa OI @ 2023-08-14  5:32 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt, Andrew Waterman, Jim Wilson
  Cc: gcc-patches

Hello,

I think this might be my first *large* patch set contributed to GCC, and
it is definitely the first one to touch the machine description.

So, please review it carefully.


Background
===========

This patch set optimizes FP constant initialization using the FLI
instructions, which are part of the 'Zfa' extension (additional
floating-point instructions).

FLI instructions ("fli.h" for binary16, "fli.s" for binary32, "fli.d" for
binary64 and "fli.q" for binary128 [which can be ignored because current
GCC for RISC-V does not natively support binary128]) provide a
load-immediate operation for the following 32 immediates.

| Binary Encoding | Immediate (and its part of binary representation) |
| --------------- | ------------------------------------------------- |
|    `00000` ( 0) | -1.0          (-0b1.00 * 2^(+ 0))                 |
|    `00001` ( 1) | Minimum positive normal value                     |
|                 | sign=[0] exponent=[0..01] significand=[000..000]  |
|    `00010` ( 2) | 1.00*2^(-16)  (+0b1.00 * 2^(-16))                 |
|    `00011` ( 3) | 1.00*2^(-15)  (+0b1.00 * 2^(-15))                 |
|    `00100` ( 4) | 1.00*2^(- 8)  (+0b1.00 * 2^(- 8))                 |
|    `00101` ( 5) | 1.00*2^(- 7)  (+0b1.00 * 2^(- 7))                 |
|    `00110` ( 6) | 1.00*2^(- 4)  (+0b1.00 * 2^(- 4)) = 0.0625        |
|    `00111` ( 7) | 1.00*2^(- 3)  (+0b1.00 * 2^(- 3)) = 0.125         |
|    `01000` ( 8) | 1.00*2^(- 2)  (+0b1.00 * 2^(- 2)) = 0.25          |
|    `01001` ( 9) | 1.25*2^(- 2)  (+0b1.01 * 2^(- 2)) = 0.3125        |
|    `01010` (10) | 1.50*2^(- 2)  (+0b1.10 * 2^(- 2)) = 0.375         |
|    `01011` (11) | 1.75*2^(- 2)  (+0b1.11 * 2^(- 2)) = 0.4375        |
|    `01100` (12) | 1.00*2^(- 1)  (+0b1.00 * 2^(- 1)) = 0.5           |
|    `01101` (13) | 1.25*2^(- 1)  (+0b1.01 * 2^(- 1)) = 0.625         |
|    `01110` (14) | 1.50*2^(- 1)  (+0b1.10 * 2^(- 1)) = 0.75          |
|    `01111` (15) | 1.75*2^(- 1)  (+0b1.11 * 2^(- 1)) = 0.875         |
|    `10000` (16) | 1.00*2^(+ 0)  (+0b1.00 * 2^(+ 0)) = 1.0           |
|    `10001` (17) | 1.25*2^(+ 0)  (+0b1.01 * 2^(+ 0)) = 1.25          |
|    `10010` (18) | 1.50*2^(+ 0)  (+0b1.10 * 2^(+ 0)) = 1.5           |
|    `10011` (19) | 1.75*2^(+ 0)  (+0b1.11 * 2^(+ 0)) = 1.75          |
|    `10100` (20) | 1.00*2^(+ 1)  (+0b1.00 * 2^(+ 1)) = 2.0           |
|    `10101` (21) | 1.25*2^(+ 1)  (+0b1.01 * 2^(+ 1)) = 2.5           |
|    `10110` (22) | 1.50*2^(+ 1)  (+0b1.10 * 2^(+ 1)) = 3.0           |
|    `10111` (23) | 1.00*2^(+ 2)  (+0b1.00 * 2^(+ 2)) = 4             |
|    `11000` (24) | 1.00*2^(+ 3)  (+0b1.00 * 2^(+ 3)) = 8             |
|    `11001` (25) | 1.00*2^(+ 4)  (+0b1.00 * 2^(+ 4)) = 16            |
|    `11010` (26) | 1.00*2^(+ 7)  (+0b1.00 * 2^(+ 7)) = 128           |
|    `11011` (27) | 1.00*2^(+ 8)  (+0b1.00 * 2^(+ 8)) = 256           |
|    `11100` (28) | 1.00*2^(+15)  (+0b1.00 * 2^(+15)) = 32768         |
|    `11101` (29) | 1.00*2^(+16)  (+0b1.00 * 2^(+16)) = 65536         |
|                 | On "fli.h", this is equivalent to positive inf.   |
|    `11110` (30) | Positive infinity                                 |
|                 | sign=[0] exponent=[1..11] significand=[000..000]  |
|    `11111` (31) | Canonical NaN (positive, quiet and zero payload)  |
|                 | sign=[0] exponent=[1..11] significand=[100..000]  |
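
As an illustration only, the table above can be written down as a small
C helper.  This is a sketch for "fli.d" (binary64); the function name
fli_d_value is hypothetical and not part of this patch set.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper (not part of the patch): the value "fli.d" loads
   for a 5-bit encoding, following the table above.  */
static double
fli_d_value (unsigned enc)
{
  /* Exponents of the 1.00 * 2^r entries at encodings 2-7 and 23-29.  */
  static const int low_exp[] = { -16, -15, -8, -7, -4, -3 };
  static const int high_exp[] = { 2, 3, 4, 7, 8, 15, 16 };

  assert (enc < 32);
  if (enc == 0)
    return -1.0;
  if (enc == 1)
    return DBL_MIN;		/* Minimum positive normal (binary64).  */
  if (enc <= 7)
    return ldexp (1.0, low_exp[enc - 2]);
  if (enc <= 22)
    {
      /* Encodings 8-22: 0b1.xx * 2^r with r in [-2, 1]; 3.5 is absent.  */
      unsigned i = enc - 8;
      return ldexp (1.0 + 0.25 * (i % 4), (int) (i / 4) - 2);
    }
  if (enc <= 29)
    return ldexp (1.0, high_exp[enc - 23]);
  return enc == 30 ? INFINITY : NAN;
}
```

For example, fli_d_value (21) yields 2.5 and fli_d_value (15) yields
0.875, matching the corresponding rows.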

Currently, initializing an FP constant (other than zero) requires a load
from memory, and FLI instructions can reduce such memory accesses.

There may be room to generate more complex constants with multiple FLI
instructions (as is done for long integer constants), but for starters we
can begin by optimizing one FP constant initialization with one FLI
instruction (and because FP arithmetic often has larger latency, the
benefit of a multiple-instruction FLI sequence is not as high as for
integers).


FLI FP constant checking
=========================

An instruction with a role similar to RISC-V's FLI instructions is Arm/
AArch64's vmov.f32 instruction.  It provides a load-immediate operation for
constants that can be represented in the following form:

> (-1)^s * 0b1.xxxx * 2^r   (where -3 <= r <= +4; fits in 3-bits)
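
For comparison, a test for that form could be sketched in plain C as
follows.  This is only an illustration of the formula above: the name
arm_imm8_representable is hypothetical, and it is not how GCC's Arm or
AArch64 back ends implement the check.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if X has the form
   (-1)^s * 0b1.xxxx * 2^r with -3 <= r <= +4.  */
static bool
arm_imm8_representable (double x)
{
  int e;
  double m;

  if (!isfinite (x) || x == 0.0)
    return false;
  /* frexp yields a fraction in [0.5, 1), so the IEEE-style exponent r
     is e - 1.  */
  m = frexp (fabs (x), &e);
  if (e - 1 < -3 || e - 1 > 4)
    return false;
  /* 0b1.xxxx has five significant bits, so m * 32 must be an integer.  */
  m *= 32.0;
  assert (m >= 16.0 && m < 32.0);
  return m == (double) (int) m;
}
```

Here every exponent in the range accepts every 4-bit fraction, which is
why a single range check is enough for this form.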

This patch is largely influenced by AArch64's handling, but handling
RISC-V's FLI FP constants is a little trickier:

*   FLI normally generates only values with sign bit 0 except the binary
    encoding 0 (which loads -1.0 with sign bit 1).
*   Besides finite values, FLI can also generate positive infinity and
    the canonical NaN.
*   Because FLI can generate canonical NaN, handling NaN is preferred but
    FLI only generates canonical NaN.  Since we can easily create a non-
    canonical NaN with __builtin_nan ("[PAYLOAD]") and that could be a
    direct return value of a function, we must reject non-canonical NaNs
    (otherwise it'll generate "fli.d fa0,nan" where NaN is non-canonical).
*   The exponent range and mantissa constraint are a bit tricky.
    For binary encodings 8-22, the value looks like 0b1.xx * 2^r (where
    -2 <= r <= 1), but we have to explicitly reject 0b1.11 * 2^1 (that is,
    3.5) because the value 3.5 is not in the list.
    The other 1.00 * 2^r values have discontinuous r.
*   Binary encoding 1 (minimum positive normal value for corresponding
    type) depends on the type (or mode) we are on.
*   Assembler accepts three string operands: "min", "inf" and "nan".
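
To make the points above concrete, here is a minimal stand-alone sketch
of the same acceptance rules for "fli.d", written against plain C doubles
instead of GCC's REAL_VALUE_TYPE.  The name fli_d_representable is
hypothetical, the code assumes double is IEEE 754 binary64, and it is not
the implementation in this patch.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-alone check (not the patch's implementation):
   can "fli.d" load X?  Assumes double is IEEE 754 binary64.  */
static bool
fli_d_representable (double x)
{
  int e, r;
  double m;

  if (isnan (x))
    {
      /* Only the canonical NaN (positive, quiet, zero payload).  */
      uint64_t bits;
      memcpy (&bits, &x, sizeof bits);
      return bits == UINT64_C (0x7ff8000000000000);
    }
  if (isinf (x))
    return x > 0.0;		/* Only positive infinity.  */
  if (x == -1.0 || x == DBL_MIN)
    return true;		/* Encodings 0 and 1 ("min").  */
  if (x <= 0.0)
    return false;		/* No zero, no other negative values.  */

  /* frexp returns a fraction in [0.5, 1); scale it to [4, 8) so that
     0b1.xx becomes the integer 0b1xx.  r is the IEEE-style exponent.  */
  m = frexp (x, &e) * 8.0;
  r = e - 1;
  assert (m >= 4.0 && m < 8.0);
  if (m != (double) (int) m)
    return false;		/* More than two fraction bits.  */

  if (m != 4.0)
    /* 0b1.01, 0b1.10, 0b1.11: r in [-2, 1], but 3.5 is not listed.  */
    return r >= -2 && r <= 1 && !(r == 1 && m == 7.0);

  /* 1.00 * 2^r: contiguous for r in [-4, 4], otherwise a sparse set.  */
  if (r >= -4 && r <= 4)
    return true;
  return r == -16 || r == -15 || r == -8 || r == -7
	 || r == 7 || r == 8 || r == 15 || r == 16;
}
```

With this sketch, 2.5, 1.0 and 0.875 are accepted while 3.5, 5.0 and
0.1875 are rejected, which matches the constants used in the
zfa-fli-1.c and zfa-fli-2.c tests.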

Handling those like aarch64_float_const_representable_p can be
inefficient.  So, I implemented the riscv_get_float_fli_const function,
which returns complex information about a FLI constant (including whether
the constant is valid as a FLI constant).

This complex information contains:

1.  Validity
2.  Sign bit (only set for -1.0)
3.  FLI constant type ("min", "inf", "nan" or a finite number other than
    "min")
4.  Highest two bits of mantissa under the point (xx for 0b1.xx)
    on a finite value except "min".
5.  Biased exponent (yet sparse representation to make handling easier)
    on a finite value except "min".  For 0b1.xx * 2^r, (r+16) is stored.
    Valid range of this is [0, 32] (inclusive) so it requires 6 bits.

On many ABIs, this information is packed into an integer-sized bit-field.
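
A plain C sketch of such a packed record is shown below.  It mirrors the
five items listed above, but the type and function names (fli_info,
fli_info_finite) are hypothetical, and it uses unsigned int bit-fields
throughout for portability, unlike the bool and enum bit-fields in the
patch itself.

```c
#include <assert.h>

enum fli_kind { FLI_MIN, FLI_INF, FLI_NAN, FLI_FINITE };

/* Hypothetical record: 1 + 1 + 2 + 2 + 6 = 12 bits in total, so it
   normally fits in a single unsigned int storage unit.  */
struct fli_info
{
  unsigned int valid : 1;
  unsigned int sign : 1;
  unsigned int kind : 2;
  unsigned int mantissa_below_point : 2;	/* xx of 0b1.xx  */
  unsigned int biased_exponent : 6;		/* r + 16, in [0, 32]  */
};

/* Build the record for a finite constant (-1)^sign * 0b1.xx * 2^r.  */
static struct fli_info
fli_info_finite (int sign, unsigned xx, int r)
{
  struct fli_info info = { 0 };

  assert (xx < 4 && r >= -16 && r <= 16);
  info.valid = 1;
  info.sign = sign != 0;
  info.kind = FLI_FINITE;
  info.mantissa_below_point = xx;
  info.biased_exponent = (unsigned) (r + 16);
  return info;
}
```

For instance, 2.5 = 0b1.01 * 2^1 is stored with xx = 1 and a biased
exponent of 17.  The largest stored exponent is 32, which is why five
bits would not be enough.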


New Constraint: "H"
====================

According to the GCC Internals documentation, "H" (along with "G") is
a constraint letter that a target may define in a machine-dependent way
to permit immediate floating operands in particular ranges of values.
Because "G" is already used to represent +0.0, this patch set uses "H"
for FLI-capable FP constants.

It adds one "H -> f" alternative to each of the following move patterns:

*   movhf_hardfloat
*   movsf_hardfloat
*   movdf_hardfloat_rv32
*   movdf_hardfloat_rv64

Note that the 'Zfa' extension requires the 'F' extension (which is the
hard float).



Portions that I'm not sure whether they are okay
=================================================

*   NaN handling (comparison with canonical NaN)
    Due to constraints, I had to compare the binary representation of
    a NaN with the known canonical NaN of IEEE 754 binary16/32/64, but
    is there any better way to perform this?
*   Any ICE possibility?
    For simple programs, I confirmed that no ICE occurs but I'm not sure
    whether this applies to other programs.  If I miss some cases in
    riscv_output_move or riscv_print_operand functions (corresponding
    mov instructions in riscv.md), it can easily cause an ICE.
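
For reference, the NaN comparison in question boils down to a bit-pattern
test.  The sketch below shows the idea on host float/double values; the
helper names are hypothetical, it assumes the host uses IEEE 754
binary32/binary64, and the patch itself works on real_to_target output
rather than host floating-point types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.  */
static float
f32_from_bits (uint32_t bits)
{
  float f;
  assert (sizeof f == sizeof bits);
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Canonical NaN for binary32: positive, quiet, zero payload.  */
static bool
is_canonical_nan_f32 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits == UINT32_C (0x7fc00000);
}

/* Canonical NaN for binary64: positive, quiet, zero payload.  */
static bool
is_canonical_nan_f64 (double d)
{
  uint64_t bits;
  assert (sizeof d == sizeof bits);
  memcpy (&bits, &d, sizeof bits);
  return bits == UINT64_C (0x7ff8000000000000);
}
```

A quiet NaN with a non-zero payload (for example 0x7fc00001) or with the
sign bit set (0xffc00000) fails this test, which is the case that must
not be emitted as "fli.s fa0,nan".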


Sincerely,
Tsukasa




Tsukasa OI (2):
  RISC-V: Add support for the 'Zfa' extension
  RISC-V: Constant FP Optimization with 'Zfa'

 gcc/common/config/riscv/riscv-common.cc    |   3 +
 gcc/config/riscv/constraints.md            |   7 +
 gcc/config/riscv/riscv-opts.h              |   2 +
 gcc/config/riscv/riscv-protos.h            |  34 +++
 gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
 gcc/config/riscv/riscv.md                  |  24 +-
 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
 14 files changed, 697 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c


base-commit: 614052dd4ea083e086712809c754ffebd9361316
-- 
2.41.0



* [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension
  2023-08-14  5:32 [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable Tsukasa OI
@ 2023-08-14  5:32 ` Tsukasa OI
  2023-08-25 20:22   ` Jeff Law
  2023-08-14  5:32 ` [PATCH 2/2] RISC-V: Constant FP Optimization with 'Zfa' Tsukasa OI
  2023-08-14  6:19 ` [PATCH 0/2] " Tsukasa OI
  2 siblings, 1 reply; 10+ messages in thread
From: Tsukasa OI @ 2023-08-14  5:32 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt, Andrew Waterman, Jim Wilson
  Cc: gcc-patches

From: Tsukasa OI <research_trasio@irq.a4lg.com>

This commit adds support for the 'Zfa' extension containing additional
floating point instructions, version 0.1 (stable and approved).

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc
	(riscv_implied_info): Add implication 'Zfa' -> 'F'.
	(riscv_ext_version_table): Add support for the 'Zfa' extension.
	(riscv_ext_flag_table): Set MASK_ZFA if 'Zfa' is available.
	* config/riscv/riscv-opts.h (MASK_ZFA, TARGET_ZFA): New.
---
 gcc/common/config/riscv/riscv-common.cc | 3 +++
 gcc/config/riscv/riscv-opts.h           | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 21f83f26371f..01d68856bc40 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -121,6 +121,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvksg", "zvks"},
   {"zvksg", "zvkg"},
 
+  {"zfa", "f"},
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
   {"zvfhmin", "zve32f"},
@@ -257,6 +258,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zfa",       ISA_SPEC_CLASS_NONE, 0, 1},
   {"zfh",       ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",    ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvfhmin",   ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1400,6 +1402,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B},
   {"zvl65536b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL65536B},
 
+  {"zfa",       &gcc_options::x_riscv_zf_subext, MASK_ZFA},
   {"zfhmin",    &gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
   {"zfh",       &gcc_options::x_riscv_zf_subext, MASK_ZFH},
   {"zvfhmin",   &gcc_options::x_riscv_zf_subext, MASK_ZVFHMIN},
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index aeea805b3425..e31ec7c4074a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -249,11 +249,13 @@ enum riscv_entity
 #define MASK_ZFH      (1 << 1)
 #define MASK_ZVFHMIN  (1 << 2)
 #define MASK_ZVFH     (1 << 3)
+#define MASK_ZFA      (1 << 4)
 
 #define TARGET_ZFHMIN  ((riscv_zf_subext & MASK_ZFHMIN) != 0)
 #define TARGET_ZFH     ((riscv_zf_subext & MASK_ZFH) != 0)
 #define TARGET_ZVFHMIN ((riscv_zf_subext & MASK_ZVFHMIN) != 0)
 #define TARGET_ZVFH    ((riscv_zf_subext & MASK_ZVFH) != 0)
+#define TARGET_ZFA     ((riscv_zf_subext & MASK_ZFA) != 0)
 
 #define MASK_ZMMUL      (1 << 0)
 #define TARGET_ZMMUL    ((riscv_zm_subext & MASK_ZMMUL) != 0)
-- 
2.41.0



* [PATCH 2/2] RISC-V: Constant FP Optimization with 'Zfa'
  2023-08-14  5:32 [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable Tsukasa OI
  2023-08-14  5:32 ` [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension Tsukasa OI
@ 2023-08-14  5:32 ` Tsukasa OI
  2023-08-14 12:51   ` [2/2] " Jin Ma
  2023-08-14  6:19 ` [PATCH 0/2] " Tsukasa OI
  2 siblings, 1 reply; 10+ messages in thread
From: Tsukasa OI @ 2023-08-14  5:32 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt, Andrew Waterman, Jim Wilson
  Cc: gcc-patches

From: Tsukasa OI <research_trasio@irq.a4lg.com>

This commit implements an optimization for assignments from a FP constant
to a FP register using a FLI instruction from the 'Zfa' extension.

For this purpose, it adds the constraint "H" and an "H -> f" variant to
the hardfloat move instructions.  Because the FLI instruction constraint
is a bit complex, it adds the riscv_get_float_fli_const function, which
parses a floating point constant if appropriate and reports the validity
in its return value.

It also modifies the cost model for floating point constants and implements
a simple yet bit-accurate printer for valid finite FLI constants.

This optimization is partially based on AArch64
(vmov instruction handling).
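
The printer for finite constants mentioned above emits a C99-style
hexadecimal FP literal.  The following stand-alone sketch shows why that
is bit-accurate: the two fraction bits map to one hex digit, and the
result parses back to the exact value.  The function names are
hypothetical; only the format string follows the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-alone version of the finite-constant printer.
   XX is the two fraction bits of 0b1.xx; BIASED_EXPONENT is r + 16.
   Returns a pointer to a static buffer (not reentrant).  */
static const char *
fli_finite_str (int sign, unsigned xx, unsigned biased_exponent)
{
  static char buf[16];

  assert (xx < 4 && biased_exponent <= 32);
  /* 0b1.00, 0b1.01, 0b1.10, 0b1.11 are 0x1.0, 0x1.4, 0x1.8, 0x1.c.  */
  snprintf (buf, sizeof buf, "%s0x1.%cp%+d", sign ? "-" : "",
	    "048c"[xx], (int) biased_exponent - 16);
  return buf;
}

/* Parse the printed operand back, to check that nothing is lost.  */
static double
fli_finite_roundtrip (int sign, unsigned xx, unsigned biased_exponent)
{
  return strtod (fli_finite_str (sign, xx, biased_exponent), NULL);
}
```

For example, xx = 1 with a biased exponent of 17 is printed as
"0x1.4p+1", which reads back as exactly 2.5.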

gcc/ChangeLog:

	* config/riscv/constraints.md (H): New.
	* config/riscv/riscv-protos.h (enum riscv_float_fli_const_type):
	New to identify the FLI constant type.
	(struct riscv_float_fli_const): New to represent an optional
	FLI constant.
	* config/riscv/riscv.cc (riscv_get_float_fli_const): New function
	to parse a CONST_DOUBLE and return optionally-valid FLI constant.
	(riscv_const_insns): Modify CONST_DOUBLE cost model.
	(riscv_output_move): Add FLI instruction outputs.
	(riscv_print_operand): Print a finite FLI constant as a hexadecimal
	FP representation or a string operand "min", "inf" or "nan".
	* config/riscv/riscv.md (movhf_hardfloat, movsf_hardfloat,
	movdf_hardfloat_rv32, movdf_hardfloat_rv64): Add "H" variant
	for 'Zfa' extension-based FP constant moves.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zfa-fli-1.c: New test.
	* gcc.target/riscv/zfa-fli-2.c: Ditto.
	* gcc.target/riscv/zfa-fli-3.c: Ditto.
	* gcc.target/riscv/zfa-fli-4.c: Ditto.
	* gcc.target/riscv/zfa-fli-5.c: Ditto.
	* gcc.target/riscv/zfa-fli-6.c: Ditto.
	* gcc.target/riscv/zfa-fli-7.c: Ditto.
	* gcc.target/riscv/zfa-fli-8.c: Ditto.
---
 gcc/config/riscv/constraints.md            |   7 +
 gcc/config/riscv/riscv-protos.h            |  34 +++
 gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
 gcc/config/riscv/riscv.md                  |  24 +-
 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
 12 files changed, 692 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 44525b2da491..d57c72ef14f0 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -98,6 +98,13 @@
   (and (match_code "const_double")
        (match_test "op == CONST0_RTX (mode)")))
 
+;; Floating-point constant that can be generated by a FLI instruction
+;; in the 'Zfa' standard extension.
+(define_constraint "H"
+  "@internal"
+  (and (match_code "const_double")
+       (match_test "riscv_get_float_fli_const (op).valid")))
+
 (define_memory_constraint "A"
   "An address that is held in a general-purpose register."
   (and (match_code "mem")
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2fbed04ff84c..6effa2437251 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -80,6 +80,39 @@ struct riscv_address_info {
   enum riscv_symbol_type symbol_type;
 };
 
+/* Classifies a floating point constant possibly retrieved by
+   the FLI instructions.
+
+   RISCV_FLOAT_CONST_MIN
+       The minimum positive normal value for given mode.
+
+   RISCV_FLOAT_CONST_INF
+       Positive infinity.
+
+   RISCV_FLOAT_CONST_NAN
+       Canonical NaN (positive, quiet and zero payload NaN).
+
+   RISCV_FLOAT_CONST_FINITE
+       A finite number.  */
+enum riscv_float_fli_const_type {
+  RISCV_FLOAT_CONST_MIN,
+  RISCV_FLOAT_CONST_INF,
+  RISCV_FLOAT_CONST_NAN,
+  RISCV_FLOAT_CONST_FINITE,
+};
+
+/* Information about a floating point constant possibly retrieved by
+   the FLI instructions.  */
+struct riscv_float_fli_const {
+  bool valid: 1;
+  bool sign: 1;
+  enum riscv_float_fli_const_type type: 2;
+  /* Highest 2 bits of IEEE754 mantissa on RISCV_FLOAT_CONST_FINITE.  */
+  unsigned int mantissa_below_point: 2;
+  /* IEEE754 unbiased exponent + 16 on RISCV_FLOAT_CONST_FINITE.  */
+  unsigned int biased_exponent: 6;
+};
+
 /* Routines implemented in riscv.cc.  */
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
@@ -125,6 +158,7 @@ extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
 extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
+extern struct riscv_float_fli_const riscv_get_float_fli_const (rtx);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_v_ext_tuple_mode_p (machine_mode);
 extern bool riscv_v_ext_vls_mode_p (machine_mode);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f9b7a9ee749f..a8c13b014130 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -812,6 +812,185 @@ riscv_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   return riscv_const_insns (x) > 0;
 }
 
+/* Check and generate information corresponding to a floating point constant
+   that can be generated from a FLI instruction.  */
+
+struct riscv_float_fli_const
+riscv_get_float_fli_const (rtx x)
+{
+  struct riscv_float_fli_const result = {
+    false, false, RISCV_FLOAT_CONST_FINITE, 0, 0
+  };
+  REAL_VALUE_TYPE r, m;
+
+  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
+    return result;
+  switch (GET_MODE (x))
+    {
+    case HFmode:
+      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
+      if (!TARGET_ZFH && !TARGET_ZVFH)
+	return result;
+      break;
+    case SFmode: break;
+    case DFmode: break;
+    default: return result;
+    }
+
+  if (!CONST_DOUBLE_P (x))
+    return result;
+
+  r = *CONST_DOUBLE_REAL_VALUE (x);
+
+  if (REAL_VALUE_ISNAN (r))
+    {
+      long reprs[2] = { 0 };
+      /* Compare with canonical NaN.  */
+      switch (GET_MODE (x))
+	{
+	case HFmode:
+	  reprs[0] = real_to_target (NULL, &r,
+				     float_mode_for_size (16).require ());
+	  /* 0x7e00: Canonical NaN for binary16.  */
+	  if (reprs[0] != 0x7e00)
+	    return result;
+	  break;
+	case SFmode:
+	  reprs[0] = real_to_target (NULL, &r,
+				     float_mode_for_size (32).require ());
+	  /* 0x7fc00000: Canonical NaN for binary32.  */
+	  if (reprs[0] != 0x7fc00000)
+	    return result;
+	  break;
+	case DFmode:
+	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
+	  if (FLOAT_WORDS_BIG_ENDIAN)
+	    std::swap (reprs[0], reprs[1]);
+	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
+	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
+	    return result;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      result.type = RISCV_FLOAT_CONST_NAN;
+      result.valid = true;
+      return result;
+    }
+  else if (REAL_VALUE_ISINF (r))
+    {
+      if (REAL_VALUE_NEGATIVE (r))
+	return result;
+      result.type = RISCV_FLOAT_CONST_INF;
+      result.valid = true;
+      return result;
+    }
+
+  bool sign = REAL_VALUE_NEGATIVE (r);
+  result.sign = sign;
+
+  r = real_value_abs (&r);
+  /* GCC internally does not use IEEE754-like encoding (where normalized
+     significands are in the range [1, 2)).  GCC uses [0.5, 1) (see real.cc).
+     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
+  int exponent_p1 = REAL_EXP (&r);
+
+  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
+     highest (sign) bit, with a fixed binary point at bit point_pos.
+     m1 holds the low part of the mantissa, m2 the high part.
+     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
+     bits for the mantissa, this can fail (low bits will be lost).  */
+  bool fail = false;
+  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
+  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
+  if (fail)
+    return result;
+
+  /* If the low part of the mantissa has bits set we cannot represent
+     the value.  */
+  if (w.ulow () != 0)
+    return result;
+  /* We have rejected the lower HOST_WIDE_INT, so update our
+     understanding of how many bits lie in the mantissa and
+     look only at the high HOST_WIDE_INT.  */
+  unsigned HOST_WIDE_INT mantissa = w.elt (1);
+
+  /* We cannot represent the value 0.0.  */
+  if (mantissa == 0)
+    return result;
+
+  /* We can only represent values with a mantissa of the form 1.xx.  */
+  unsigned HOST_WIDE_INT mask
+      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
+  if ((mantissa & mask) != 0)
+    return result;
+  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
+  /* Now the lowest 3-bits of mantissa should form (1.xx)b.  */
+  gcc_assert (mantissa & (1u << 2));
+  /* Mask out the highest bit.  */
+  mantissa &= ~(1u << 2);
+
+  if (mantissa == 0)
+    {
+      /* We cannot represent negative values other than -1.0.  */
+      if (exponent_p1 != 1 && sign)
+	return result;
+      switch (exponent_p1)
+	{
+	case -15: /* 1.0 * 2^(-16)  */
+	case -14: /* 1.0 * 2^(-15)  */
+	case -7:  /* 1.0 * 2^(- 8)  */
+	case -6:  /* 1.0 * 2^(- 7)  */
+	case 8:   /* 1.0 * 2^(+ 7)  */
+	case 9:   /* 1.0 * 2^(+ 8)  */
+	case 16:  /* 1.0 * 2^(+15)  */
+	case 17:  /* 1.0 * 2^(+16)  */
+	  break;
+	default:
+	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
+	    /* 1.0 * 2^[-4,4]  */
+	    break;
+	  switch (GET_MODE (x))
+	    {
+	    case HFmode: /* IEEE 754 binary16.  */
+	      /* Minimum positive normal == 1.0 * 2^(-14)  */
+	      if (exponent_p1 != -13) return result;
+	      break;
+	    case SFmode: /* IEEE 754 binary32.  */
+	      /* Minimum positive normal == 1.0 * 2^(-126)  */
+	      if (exponent_p1 != -125) return result;
+	      break;
+	    case DFmode: /* IEEE 754 binary64.  */
+	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
+	      if (exponent_p1 != -1021) return result;
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+	  result.type = RISCV_FLOAT_CONST_MIN;
+	  result.valid = true;
+	  return result;
+	}
+    }
+  else
+    {
+      if (sign)
+	return result;
+      if (exponent_p1 < -1 || exponent_p1 > 2)
+	return result;
+      /* The value is (+1.xx)b * 2^[-2,1].
+	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
+      if (exponent_p1 == 2 && mantissa == 3)
+	return result;
+    }
+
+  result.valid = true;
+  result.mantissa_below_point = mantissa;
+  result.biased_exponent = exponent_p1 + 15;
+
+  return result;
+}
+
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
@@ -1322,8 +1501,12 @@ riscv_const_insns (rtx x)
       }
 
     case CONST_DOUBLE:
-      /* We can use x0 to load floating-point zero.  */
-      return x == CONST0_RTX (GET_MODE (x)) ? 1 : 0;
+      /* We can use x0 to load floating-point zero.
+	 We also have FLI instructions when the Zfa extension is enabled.  */
+      return x == CONST0_RTX (GET_MODE (x))        ? 1
+	     : riscv_get_float_fli_const (x).valid ? 1
+						   : 0;
+
     case CONST_VECTOR:
       {
 	/* TODO: This is not accurate, we will need to
@@ -1362,17 +1545,14 @@ riscv_const_insns (rtx x)
 		   constant incurs a literal-pool access.  Allow this in
 		   order to increase vectorization possibilities.  */
 		int n = riscv_const_insns (elt);
-		if (CONST_DOUBLE_P (elt))
-		    return 1 + 4; /* vfmv.v.f + memory access.  */
+		/* We need as many insns as it takes to load the constant
+		   into a GPR and one vmv.v.x.  */
+		if (n != 0)
+		  return 1 + n;
+		else if (CONST_DOUBLE_P (elt))
+		  return 1 + 4; /* vfmv.v.f + memory access.  */
 		else
-		  {
-		    /* We need as many insns as it takes to load the constant
-		       into a GPR and one vmv.v.x.  */
-		    if (n != 0)
-		      return 1 + n;
-		    else
-		      return 1 + 4; /*vmv.v.x + memory access.  */
-		  }
+		  return 1 + 4; /* vmv.v.x + memory access.  */
 	      }
 	  }
 
@@ -3196,6 +3376,22 @@ riscv_output_move (rtx dest, rtx src)
       gcc_assert (known_eq (rtx_to_poly_int64 (src), BYTES_PER_RISCV_VECTOR));
       return "csrr\t%0,vlenb";
     }
+  if (dest_code == REG && src_code == CONST_DOUBLE)
+    {
+      struct riscv_float_fli_const flt = riscv_get_float_fli_const (src);
+      if (flt.valid)
+	{
+	  switch (width)
+	    {
+	    case 2:
+	      return "fli.h\t%0,%1";
+	    case 4:
+	      return "fli.s\t%0,%1";
+	    case 8:
+	      return "fli.d\t%0,%1";
+	    }
+	}
+    }
   gcc_unreachable ();
 }
 
@@ -5117,6 +5313,36 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 	    output_address (mode, XEXP (op, 0));
 	  break;
 
+	case CONST_DOUBLE:
+	  {
+	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
+	    if (flt.valid)
+	      {
+		switch (flt.type)
+		  {
+		  case RISCV_FLOAT_CONST_MIN:
+		    fputs ("min", file);
+		    break;
+		  case RISCV_FLOAT_CONST_INF:
+		    fputs ("inf", file);
+		    break;
+		  case RISCV_FLOAT_CONST_NAN:
+		    fputs ("nan", file);
+		    break;
+		  default:
+		    /* Use simpler (and bit-perfect) printer.  */
+		    if (flt.sign)
+		      fputc ('-', file);
+		    fprintf (file, "0x1.%cp%+d",
+			     "048c"[flt.mantissa_below_point],
+			     (int) flt.biased_exponent - 16);
+		    break;
+		  }
+		break;
+	      }
+	  }
+	  /* Fall through.  */
+
 	default:
 	  if (letter == 'z' && op == CONST0_RTX (GET_MODE (op)))
 	    fputs (reg_names[GP_REG_FIRST], file);
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b456fa6abb3c..ce73db33830d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1744,13 +1744,13 @@
 })
 
 (define_insn "*movhf_hardfloat"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*r,  *r,*r,*m")
-	(match_operand:HF 1 "move_operand"         " f,G,m,f,G,*r,*f,*G*r,*m,*r"))]
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *f,*r,   *r,*r,*m")
+	(match_operand:HF 1 "move_operand"         " f,G,H,m,f,G,*r,*f,*G*r,*m,*r"))]
   "TARGET_ZFHMIN
    && (register_operand (operands[0], HFmode)
        || reg_or_0_operand (operands[1], HFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "HF")])
 
 (define_insn "*movhf_softfloat"
@@ -2075,13 +2075,13 @@
 })
 
 (define_insn "*movsf_hardfloat"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*r,  *r,*r,*m")
-	(match_operand:SF 1 "move_operand"         " f,G,m,f,G,*r,*f,*G*r,*m,*r"))]
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *f,*r,   *r,*r,*m")
+	(match_operand:SF 1 "move_operand"         " f,G,H,m,f,G,*r,*f,*G*r,*m,*r"))]
   "TARGET_HARD_FLOAT
    && (register_operand (operands[0], SFmode)
        || reg_or_0_operand (operands[1], SFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "SF")])
 
 (define_insn "*movsf_softfloat"
@@ -2109,23 +2109,23 @@
 ;; In RV32, we lack fmv.x.d and fmv.d.x.  Go through memory instead.
 ;; (However, we can still use fcvt.d.w to zero a floating-point register.)
 (define_insn "*movdf_hardfloat_rv32"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,m,m,*th_f_fmv,*th_r_fmv,  *r,*r,*m")
-	(match_operand:DF 1 "move_operand"         " f,G,m,f,G,*th_r_fmv,*th_f_fmv,*r*G,*m,*r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *th_f_fmv,*th_r_fmv,  *r, *r,*m")
+	(match_operand:DF 1 "move_operand"         " f,G,H,m,f,G,*th_r_fmv,*th_f_fmv,*r*G,*m,*r"))]
   "!TARGET_64BIT && TARGET_DOUBLE_FLOAT
    && (register_operand (operands[0], DFmode)
        || reg_or_0_operand (operands[1], DFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "DF")])
 
 (define_insn "*movdf_hardfloat_rv64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,m,m,*f,*r,  *r,*r,*m")
-	(match_operand:DF 1 "move_operand"         " f,G,m,f,G,*r,*f,*r*G,*m,*r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=f,f,f,f,m,m, *f,*r,  *r, *r,*m")
+	(match_operand:DF 1 "move_operand"         " f,G,H,m,f,G,*r,*f,*r*G,*m,*r"))]
   "TARGET_64BIT && TARGET_DOUBLE_FLOAT
    && (register_operand (operands[0], DFmode)
        || reg_or_0_operand (operands[1], DFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
-  [(set_attr "move_type" "fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+  [(set_attr "move_type" "fmove,mtc,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
    (set_attr "mode" "DF")])
 
 (define_insn "*movdf_softfloat"
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-1.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
new file mode 100644
index 000000000000..35ea5c477676
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-Oz"} } */
+
+#ifndef __riscv_zfa
+#error Feature macro not defined
+#endif
+
+double
+foo_positive_d (double a)
+{
+  /* Use 3 FLI FP constants.  */
+  return (2.5 * a - 1.0) / 0.875;
+}
+
+float
+foo_positive_s (float a)
+{
+  return ((float) 2.5 * a - (float) 1.0) / (float) 0.875;
+}
+
+/* { dg-final { scan-assembler-times "fli\\.s\t" 3 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\t" 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-2.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
new file mode 100644
index 000000000000..10d49d116e46
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Os" "-Og" "-Oz"} } */
+
+#ifndef __riscv_zfa
+#error Feature macro not defined
+#endif
+
+double
+foo_negative_d (double a)
+{
+  /* Use 3 "non-FLI" FP constants.  */
+  return (3.5 * a - 5.0) / 0.1875;
+}
+
+float
+foo_negative_s (float a)
+{
+  return ((float) 3.5 * a - (float) 5.0) / (float) 0.1875;
+}
+
+/* { dg-final { scan-assembler-not "fli\\.s\t" } } */
+/* { dg-final { scan-assembler-not "fli\\.d\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-3.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
new file mode 100644
index 000000000000..6d069b2a4a9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-Oz"} } */
+
+double
+foo_positive_s (float a)
+{
+  /* Use 3 FLI FP constants (but type conversion occurs in the middle).  */
+  return (2.5f * a - 1.0) / 0.875;
+}
+
+/* { dg-final { scan-assembler-times "fli\\.s\t" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\t" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-4.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
new file mode 100644
index 000000000000..153853efb196
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
@@ -0,0 +1,111 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa_zfh -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa_zfh -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+#define TYPE_h _Float16
+#define TYPE_s float
+#define TYPE_d double
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+#define DECL_FINITE_FUNCS(TYPE_SHORT)                                         \
+  DECL_FUNC (TYPE_SHORT, 00, -1)                                              \
+  DECL_FUNC (TYPE_SHORT, 02, 0.0000152587890625)                              \
+  DECL_FUNC (TYPE_SHORT, 03, 0.000030517578125)                               \
+  DECL_FUNC (TYPE_SHORT, 04, 0.00390625)                                      \
+  DECL_FUNC (TYPE_SHORT, 05, 0.0078125)                                       \
+  DECL_FUNC (TYPE_SHORT, 06, 0.0625)                                          \
+  DECL_FUNC (TYPE_SHORT, 07, 0.125)                                           \
+  DECL_FUNC (TYPE_SHORT, 08, 0.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 09, 0.3125)                                          \
+  DECL_FUNC (TYPE_SHORT, 10, 0.375)                                           \
+  DECL_FUNC (TYPE_SHORT, 11, 0.4375)                                          \
+  DECL_FUNC (TYPE_SHORT, 12, 0.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 13, 0.625)                                           \
+  DECL_FUNC (TYPE_SHORT, 14, 0.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 15, 0.875)                                           \
+  DECL_FUNC (TYPE_SHORT, 16, 1)                                               \
+  DECL_FUNC (TYPE_SHORT, 17, 1.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 18, 1.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 19, 1.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 20, 2)                                               \
+  DECL_FUNC (TYPE_SHORT, 21, 2.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 22, 3)                                               \
+  DECL_FUNC (TYPE_SHORT, 23, 4)                                               \
+  DECL_FUNC (TYPE_SHORT, 24, 8)                                               \
+  DECL_FUNC (TYPE_SHORT, 25, 16)                                              \
+  DECL_FUNC (TYPE_SHORT, 26, 128)                                             \
+  DECL_FUNC (TYPE_SHORT, 27, 256)                                             \
+  DECL_FUNC (TYPE_SHORT, 28, 32768)                                           \
+  DECL_FUNC (TYPE_SHORT, 29, 65536)
+
+/* Finite numbers (except 2^16 in _Float16, making an inf).  */
+DECL_FINITE_FUNCS (h)
+DECL_FINITE_FUNCS (s)
+DECL_FINITE_FUNCS (d)
+
+/* min.  */
+DECL_FUNC (h, 01, __FLT16_MIN__)
+DECL_FUNC (s, 01, __FLT_MIN__)
+DECL_FUNC (d, 01, __DBL_MIN__)
+
+/* inf.  */
+DECL_FUNC (h, 30, __builtin_inff16 ())
+DECL_FUNC (s, 30, __builtin_inff ())
+DECL_FUNC (d, 30, __builtin_inf ())
+
+/* nan.  */
+DECL_FUNC (h, 31, __builtin_nanf16 (""))
+DECL_FUNC (s, 31, __builtin_nanf (""))
+DECL_FUNC (d, 31, __builtin_nan (""))
+
+
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,-0x1\\.0p\\+0\n" 3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-16\n"   3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-15\n"   3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-8\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-7\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-4\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-3\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.cp-2\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.cp-1\n"    3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.cp\\+0\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+1\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.4p\\+1\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.8p\\+1\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+2\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+3\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+4\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+7\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+8\n"  3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,0x1\\.0p\\+15\n" 3 } } */
+/* { dg-final { scan-assembler-times "fli\\.\[sd]\tfa0,0x1\\.0p\\+16\n"  2 } } */
+
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,min\n" 3 } } */
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,inf\n" 2 } } */
+/* { dg-final { scan-assembler-times "fli\\.s\tfa0,inf\n" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\tfa0,inf\n" 1 } } */
+
+/* { dg-final { scan-assembler-times "fli\\.\[hsd]\tfa0,nan\n" 3 } } */
+
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0," 32 } } */
+/* { dg-final { scan-assembler-times "fli\\.s\tfa0," 32 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\tfa0," 32 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-5.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
new file mode 100644
index 000000000000..186f91ffb349
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
@@ -0,0 +1,98 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imf_zfa_zvfh -mabi=lp64f"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imf_zfa_zvfh -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* Even if 'Zfh' is disabled, "fli.h" is usable when
+   both 'Zfa' and 'Zvfh' are available.  */
+#ifdef __riscv_zfh
+#error Invalid feature macro defined
+#endif
+
+#define TYPE_h _Float16
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+#define DECL_FINITE_FUNCS(TYPE_SHORT)                                         \
+  DECL_FUNC (TYPE_SHORT, 00, -1)                                              \
+  DECL_FUNC (TYPE_SHORT, 02, 0.0000152587890625)                              \
+  DECL_FUNC (TYPE_SHORT, 03, 0.000030517578125)                               \
+  DECL_FUNC (TYPE_SHORT, 04, 0.00390625)                                      \
+  DECL_FUNC (TYPE_SHORT, 05, 0.0078125)                                       \
+  DECL_FUNC (TYPE_SHORT, 06, 0.0625)                                          \
+  DECL_FUNC (TYPE_SHORT, 07, 0.125)                                           \
+  DECL_FUNC (TYPE_SHORT, 08, 0.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 09, 0.3125)                                          \
+  DECL_FUNC (TYPE_SHORT, 10, 0.375)                                           \
+  DECL_FUNC (TYPE_SHORT, 11, 0.4375)                                          \
+  DECL_FUNC (TYPE_SHORT, 12, 0.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 13, 0.625)                                           \
+  DECL_FUNC (TYPE_SHORT, 14, 0.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 15, 0.875)                                           \
+  DECL_FUNC (TYPE_SHORT, 16, 1)                                               \
+  DECL_FUNC (TYPE_SHORT, 17, 1.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 18, 1.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 19, 1.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 20, 2)                                               \
+  DECL_FUNC (TYPE_SHORT, 21, 2.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 22, 3)                                               \
+  DECL_FUNC (TYPE_SHORT, 23, 4)                                               \
+  DECL_FUNC (TYPE_SHORT, 24, 8)                                               \
+  DECL_FUNC (TYPE_SHORT, 25, 16)                                              \
+  DECL_FUNC (TYPE_SHORT, 26, 128)                                             \
+  DECL_FUNC (TYPE_SHORT, 27, 256)                                             \
+  DECL_FUNC (TYPE_SHORT, 28, 32768)                                           \
+  DECL_FUNC (TYPE_SHORT, 29, 65536)
+
+/* Finite numbers (except 2^16 in _Float16, making an inf).  */
+DECL_FINITE_FUNCS (h)
+
+/* min.  */
+DECL_FUNC (h, 01, __FLT16_MIN__)
+
+/* inf.  */
+DECL_FUNC (h, 30, __builtin_inff16 ())
+
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,-0x1\\.0p\\+0\n" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-16\n"   1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-15\n"   1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-8\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-7\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-4\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-3\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.cp-2\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.cp-1\n"    1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.cp\\+0\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+1\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.4p\\+1\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.8p\\+1\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+2\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+3\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+4\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+7\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+8\n"  1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,0x1\\.0p\\+15\n" 1 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,inf\n"           2 } } */
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,min\n"           1 } } */
+
+
+/* nan.  */
+DECL_FUNC (h, 31, __builtin_nanf16 (""))
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,nan\n"           1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-6.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
new file mode 100644
index 000000000000..2ee830d5c14c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imf_zfa_zfhmin -mabi=lp64f"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imf_zfa_zfhmin -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* "fli.h" is unavailable even if both 'Zfa' and 'Zfhmin' are enabled.  */
+
+#define TYPE_h _Float16
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+#define DECL_FINITE_FUNCS(TYPE_SHORT)                                         \
+  DECL_FUNC (TYPE_SHORT, 00, -1)                                              \
+  DECL_FUNC (TYPE_SHORT, 02, 0.0000152587890625)                              \
+  DECL_FUNC (TYPE_SHORT, 03, 0.000030517578125)                               \
+  DECL_FUNC (TYPE_SHORT, 04, 0.00390625)                                      \
+  DECL_FUNC (TYPE_SHORT, 05, 0.0078125)                                       \
+  DECL_FUNC (TYPE_SHORT, 06, 0.0625)                                          \
+  DECL_FUNC (TYPE_SHORT, 07, 0.125)                                           \
+  DECL_FUNC (TYPE_SHORT, 08, 0.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 09, 0.3125)                                          \
+  DECL_FUNC (TYPE_SHORT, 10, 0.375)                                           \
+  DECL_FUNC (TYPE_SHORT, 11, 0.4375)                                          \
+  DECL_FUNC (TYPE_SHORT, 12, 0.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 13, 0.625)                                           \
+  DECL_FUNC (TYPE_SHORT, 14, 0.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 15, 0.875)                                           \
+  DECL_FUNC (TYPE_SHORT, 16, 1)                                               \
+  DECL_FUNC (TYPE_SHORT, 17, 1.25)                                            \
+  DECL_FUNC (TYPE_SHORT, 18, 1.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 19, 1.75)                                            \
+  DECL_FUNC (TYPE_SHORT, 20, 2)                                               \
+  DECL_FUNC (TYPE_SHORT, 21, 2.5)                                             \
+  DECL_FUNC (TYPE_SHORT, 22, 3)                                               \
+  DECL_FUNC (TYPE_SHORT, 23, 4)                                               \
+  DECL_FUNC (TYPE_SHORT, 24, 8)                                               \
+  DECL_FUNC (TYPE_SHORT, 25, 16)                                              \
+  DECL_FUNC (TYPE_SHORT, 26, 128)                                             \
+  DECL_FUNC (TYPE_SHORT, 27, 256)                                             \
+  DECL_FUNC (TYPE_SHORT, 28, 32768)                                           \
+  DECL_FUNC (TYPE_SHORT, 29, 65536)
+
+/* Finite numbers (except 2^16 in _Float16, making an inf).  */
+DECL_FINITE_FUNCS (h)
+
+/* min.  */
+DECL_FUNC (h, 01, __FLT16_MIN__)
+
+/* inf.  */
+DECL_FUNC (h, 30, __builtin_inff16 ())
+
+/* nan.  */
+DECL_FUNC (h, 31, __builtin_nanf16 (""))
+
+/* { dg-final { scan-assembler-not "fli\\.h\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-7.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
new file mode 100644
index 000000000000..4da8a2985852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa_zfh -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa_zfh -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* Canonical NaN is a positive, quiet NaN with zero payload.  */
+
+#define TYPE_h _Float16
+#define TYPE_s float
+#define TYPE_d double
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+/* Canonical NaN.  */
+DECL_FUNC (h, 1, __builtin_nanf16 (""))
+DECL_FUNC (s, 1, __builtin_nanf (""))
+DECL_FUNC (d, 1, __builtin_nan (""))
+DECL_FUNC (h, 2, __builtin_nanf16 ("0"))
+DECL_FUNC (s, 2, __builtin_nanf ("0"))
+DECL_FUNC (d, 2, __builtin_nan ("0"))
+
+/* { dg-final { scan-assembler-times "fli\\.h\tfa0,nan\n" 2 } } */
+/* { dg-final { scan-assembler-times "fli\\.s\tfa0,nan\n" 2 } } */
+/* { dg-final { scan-assembler-times "fli\\.d\tfa0,nan\n" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fli-8.c b/gcc/testsuite/gcc.target/riscv/zfa-fli-8.c
new file mode 100644
index 000000000000..a09726c0cb59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fli-8.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imfd_zfa_zfh -mabi=lp64d"  { target { rv64 } } } */
+/* { dg-options "-march=rv32imfd_zfa_zfh -mabi=ilp32d" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Oz"} } */
+
+/* Canonical NaN is a positive, quiet NaN with zero payload.  */
+
+#define TYPE_h _Float16
+#define TYPE_s float
+#define TYPE_d double
+
+#define DECL_TYPE(TYPE_SHORT) TYPE_##TYPE_SHORT
+
+#define DECL_FUNC(TYPE_SHORT, N, VALUE)                                       \
+  DECL_TYPE (TYPE_SHORT) const_##TYPE_SHORT##_##N (void)                      \
+    {                                                                         \
+      return VALUE;                                                           \
+    }
+
+/* Non-canonical NaN.  */
+DECL_FUNC (h, 1, __builtin_nansf16 (""))
+DECL_FUNC (s, 1, __builtin_nansf (""))
+DECL_FUNC (d, 1, __builtin_nans (""))
+DECL_FUNC (h, 2, __builtin_nansf16 ("0"))
+DECL_FUNC (s, 2, __builtin_nansf ("0"))
+DECL_FUNC (d, 2, __builtin_nans ("0"))
+DECL_FUNC (h, 3, __builtin_nanf16 ("1"))
+DECL_FUNC (s, 3, __builtin_nanf ("1"))
+DECL_FUNC (d, 3, __builtin_nan ("1"))
+DECL_FUNC (h, 4, __builtin_nansf16 ("1"))
+DECL_FUNC (s, 4, __builtin_nansf ("1"))
+DECL_FUNC (d, 4, __builtin_nans ("1"))
+
+/* Canonical NaN, negated (making it non-canonical).  */
+DECL_FUNC (h, 5, -__builtin_nanf16 (""))
+DECL_FUNC (s, 5, -__builtin_nanf (""))
+DECL_FUNC (d, 5, -__builtin_nan (""))
+
+/* { dg-final { scan-assembler-not "fli\\.\[hsd]\tfa0,nan\n" } } */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] RISC-V: Constant FP Optimization with 'Zfa'
  2023-08-14  5:32 [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable Tsukasa OI
  2023-08-14  5:32 ` [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension Tsukasa OI
  2023-08-14  5:32 ` [PATCH 2/2] RISC-V: Constant FP Optimization with 'Zfa' Tsukasa OI
@ 2023-08-14  6:19 ` Tsukasa OI
  2 siblings, 0 replies; 10+ messages in thread
From: Tsukasa OI @ 2023-08-14  6:19 UTC (permalink / raw)
  Cc: gcc-patches

Oh my, I forgot to change the subject of PATCH 0/2.
That should have been "RISC-V: Constant FP Optimization with 'Zfa'", the
same subject as PATCH 2/2.

Sorry for the confusion!

On 2023/08/14 14:32, Tsukasa OI wrote:
> Hello,
> 
> and... I think this might be my first *large* patch set for GCC
> contribution and definitely the first one to touch the machine description.
> 
> So, please review it carefully.
> 
> 
> Background
> ===========
> 
> This patch set adds an optimization to FP constant initialization using a
> FLI instruction, which is a part of the 'Zfa' extension which provides
> additional floating-point instructions.
> 
> FLI instructions ("fli.h" for binary16, "fli.s" for binary32, "fli.d" for
> binary64 and "fli.q" for binary128 [which can be ignored because current
> GCC for RISC-V does not natively support binary128]) provide a
> load-immediate operation for the following 32 immediates.
> 
> | Binary Encoding | Immediate (and its part of binary representation) |
> | --------------- | --------------------------------------------------|
> |    `00000` ( 0) | -1.0          (-0b1.00 * 2^(+ 0))                 |
> |    `00001` ( 1) | Minimum positive normal value                     |
> |                 | sign=[0] exponent=[0..01] significand=[000..000]  |
> |    `00010` ( 2) | 1.00*2^(-16)  (+0b1.00 * 2^(-16))                 |
> |    `00011` ( 3) | 1.00*2^(-15)  (+0b1.00 * 2^(-15))                 |
> |    `00100` ( 4) | 1.00*2^(- 8)  (+0b1.00 * 2^(- 8))                 |
> |    `00101` ( 5) | 1.00*2^(- 7)  (+0b1.00 * 2^(- 7))                 |
> |    `00110` ( 6) | 1.00*2^(- 4)  (+0b1.00 * 2^(- 4)) = 0.0625        |
> |    `00111` ( 7) | 1.00*2^(- 3)  (+0b1.00 * 2^(- 3)) = 0.125         |
> |    `01000` ( 8) | 1.00*2^(- 2)  (+0b1.00 * 2^(- 2)) = 0.25          |
> |    `01001` ( 9) | 1.25*2^(- 2)  (+0b1.01 * 2^(- 2)) = 0.3125        |
> |    `01010` (10) | 1.50*2^(- 2)  (+0b1.10 * 2^(- 2)) = 0.375         |
> |    `01011` (11) | 1.75*2^(- 2)  (+0b1.11 * 2^(- 2)) = 0.4375        |
> |    `01100` (12) | 1.00*2^(- 1)  (+0b1.00 * 2^(- 1)) = 0.5           |
> |    `01101` (13) | 1.25*2^(- 1)  (+0b1.01 * 2^(- 1)) = 0.625         |
> |    `01110` (14) | 1.50*2^(- 1)  (+0b1.10 * 2^(- 1)) = 0.75          |
> |    `01111` (15) | 1.75*2^(- 1)  (+0b1.11 * 2^(- 1)) = 0.875         |
> |    `10000` (16) | 1.00*2^(+ 0)  (+0b1.00 * 2^(+ 0)) = 1.0           |
> |    `10001` (17) | 1.25*2^(+ 0)  (+0b1.01 * 2^(+ 0)) = 1.25          |
> |    `10010` (18) | 1.50*2^(+ 0)  (+0b1.10 * 2^(+ 0)) = 1.5           |
> |    `10011` (19) | 1.75*2^(+ 0)  (+0b1.11 * 2^(+ 0)) = 1.75          |
> |    `10100` (20) | 1.00*2^(+ 1)  (+0b1.00 * 2^(+ 1)) = 2.0           |
> |    `10101` (21) | 1.25*2^(+ 1)  (+0b1.01 * 2^(+ 1)) = 2.5           |
> |    `10110` (22) | 1.50*2^(+ 1)  (+0b1.10 * 2^(+ 1)) = 3.0           |
> |    `10111` (23) | 1.00*2^(+ 2)  (+0b1.00 * 2^(+ 2)) = 4             |
> |    `11000` (24) | 1.00*2^(+ 3)  (+0b1.00 * 2^(+ 3)) = 8             |
> |    `11001` (25) | 1.00*2^(+ 4)  (+0b1.00 * 2^(+ 4)) = 16            |
> |    `11010` (26) | 1.00*2^(+ 7)  (+0b1.00 * 2^(+ 7)) = 128           |
> |    `11011` (27) | 1.00*2^(+ 8)  (+0b1.00 * 2^(+ 8)) = 256           |
> |    `11100` (28) | 1.00*2^(+15)  (+0b1.00 * 2^(+15)) = 32768         |
> |    `11101` (29) | 1.00*2^(+16)  (+0b1.00 * 2^(+16)) = 65536         |
> |                 | On "fli.h", this is equivalent to positive inf.   |
> |    `11110` (30) | Positive infinity                                 |
> |                 | sign=[0] exponent=[1..11] significand=[000..000]  |
> |    `11111` (31) | Canonical NaN (positive, quiet and zero payload)  |
> |                 | sign=[0] exponent=[1..11] significand=[100..000]  |
> 
> Currently, initializing an FP constant (other than zero) involves a memory
> access, and FLI instructions can reduce such accesses.
> 
> There may be room to generate more complex constants with multiple FLI
> instructions (as is done for long integer constants), but as a start, we
> can begin by optimizing one FP constant initialization with one FLI
> instruction.  Because FP arithmetic often has higher latency, the benefit
> of multi-instruction FLI sequences is smaller than for integers.
> 
> 
> FLI FP constant checking
> =========================
> 
> An instruction with a similar role to RISC-V's FLI instructions is the
> Arm/AArch64 vmov.f32 instruction.  It provides a load-immediate operation
> for constants that can be represented in the following form:
> 
>> (-1)^s * 0b1.xxxx * 2^r   (where -3 <= r <= +4; r fits in 3 bits)
> 
> This patch set is largely influenced by AArch64's handling, but handling
> RISC-V's FLI FP constants is a little trickier:
> 
> *   FLI normally generates only values with sign bit 0 except the binary
>     encoding 0 (which loads -1.0 with sign bit 1).
> *   Not only finite values, FLI can generate positive infinity and
>     canonical NaN.
> *   Because FLI can generate canonical NaN, handling NaN is preferred but
>     FLI only generates canonical NaN.  Since we can easily create a non-
>     canonical NaN with __builtin_nan ("[PAYLOAD]") and that could be a
>     direct return value of a function, we must reject non-canonical NaNs
>     (otherwise it'll generate "fli.d fa0,nan" where NaN is non-canonical).
> *   The exponent range and mantissa constraints are a bit tricky.
>     For binary encodings 8-22, the value looks like 0b1.xx * 2^r (where
>     -2 <= r <= 1), but we have to explicitly reject 0b1.11 * 2^1 (that is
>     3.5) because the value 3.5 is not in the list.
>     The other 1.00 * 2^r values have discontinuous r.
> *   Binary encoding 1 (minimum positive normal value for corresponding
>     type) depends on the type (or mode) we are on.
> *   Assembler accepts three string operands: "min", "inf" and "nan".
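
As an illustration only (this is not code from the patch), the finite part of the check described in the list above can be sketched in plain C on the binary64 bit pattern. The helper name `is_fli_finite_f64` is hypothetical, the sketch assumes the host `double` is IEEE 754 binary64, and it deliberately leaves out "min", "inf" and "nan", which need the separate handling discussed above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): true when V is one of the
   finite FLI immediates other than "min".  Assumes IEEE 754 binary64.  */
static bool
is_fli_finite_f64 (double v)
{
  uint64_t bits;
  memcpy (&bits, &v, sizeof bits);

  /* Encoding 0: -1.0 is the only value with the sign bit set.  */
  if (bits == UINT64_C (0xbff0000000000000))
    return true;
  if (bits >> 63)
    return false;

  unsigned biased = (unsigned) ((bits >> 52) & 0x7ff);
  uint64_t frac = bits & ((UINT64_C (1) << 52) - 1);

  /* Zero, subnormals, infinity and NaN are not handled here.  */
  if (biased == 0 || biased == 0x7ff)
    return false;

  /* Only the top two fraction bits (xx in 0b1.xx) may be set.  */
  if (frac & ((UINT64_C (1) << 50) - 1))
    return false;

  int r = (int) biased - 1023;
  unsigned xx = (unsigned) (frac >> 50);

  /* Encodings 8-22: 0b1.xx * 2^r with -2 <= r <= 1, except 3.5.  */
  if (r >= -2 && r <= 1)
    return !(r == 1 && xx == 3);

  /* Remaining encodings: 1.00 * 2^r with a sparse set of exponents.  */
  if (xx != 0)
    return false;
  switch (r)
    {
    case -16: case -15: case -8: case -7: case -4: case -3:
    case 2: case 3: case 4: case 7: case 8: case 15: case 16:
      return true;
    default:
      return false;
    }
}
```

With this sketch, 2.5, 0.875 and -1.0 are accepted, while 3.5, 5.0 and 0.1875 (the "non-FLI" constants used in zfa-fli-2.c) are rejected.
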
> 
> Handling those like aarch64_float_const_representable_p can be
> inefficient.  So, I implemented the riscv_get_float_fli_const function,
> which returns compound information about an FLI constant (including
> whether the constant is valid as an FLI constant).
> 
> This information contains:
> 
> 1.  Validity
> 2.  Sign bit (only set for -1.0)
> 3.  FLI constant type ("min", "inf", "nan" or a finite number other than
>     "min")
> 4.  Highest two bits of mantissa under the point (xx for 0b1.xx)
>     on a finite value except "min".
> 5.  Biased exponent (yet sparse representation to make handling easier)
>     on a finite value except "min".  For 0b1.xx * 2^r, (r+16) is stored.
>     Valid range of this is [0, 32] (inclusive) so it requires 6 bits.
> 
> On many ABIs, this information is packed into an integer-sized bitfield.
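
To make the five items above concrete, here is a rough sketch of what such a packed record could look like. The struct, enum and field names are hypothetical and chosen for illustration; the actual declaration lives in the patch's riscv-protos.h changes and may differ.

```c
#include <stdbool.h>

/* Hypothetical names; not the declarations from the patch.  */
enum fli_kind { FLI_FINITE, FLI_MIN, FLI_INF, FLI_NAN };

struct fli_info
{
  unsigned valid : 1;        /* 1. Is this a valid FLI constant?  */
  unsigned sign : 1;         /* 2. Set only for -1.0.  */
  unsigned kind : 2;         /* 3. One of enum fli_kind.  */
  unsigned mantissa_xx : 2;  /* 4. xx of 0b1.xx (finite, not "min").  */
  unsigned biased_exp : 6;   /* 5. r + 16, in [0, 32].  */
};

/* Build the record for a finite constant (-1)^negative * 0b1.xx * 2^r.
   The caller is assumed to have checked that the value is in the table.  */
static struct fli_info
fli_info_finite (bool negative, unsigned xx, int r)
{
  struct fli_info info = { 0 };
  info.valid = 1;
  info.sign = negative;
  info.kind = FLI_FINITE;
  info.mantissa_xx = xx & 3;
  info.biased_exp = (unsigned) (r + 16);
  return info;
}
```

The fields add up to 12 bits, so on typical ABIs the whole record fits in a single `unsigned int`. For example, 2.5 (0b1.01 * 2^1) would be stored with `mantissa_xx` 1 and `biased_exp` 17.
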
> 
> 
> New Constraint: "H"
> ====================
> 
> According to the GCC Internals documentation, "H" (along with "G") may be
> defined in a machine-dependent fashion to permit immediate floating
> operands in particular ranges of values.  Because "G" is already used to
> represent +0.0, this patch set uses "H" for FLI-capable FP constants.
> 
> It adds one variant per operation:
> 
> *   movhf_hardfloat
> *   movsf_hardfloat
> *   movdf_hardfloat_rv32
> *   movdf_hardfloat_rv64
> 
> Note that the 'Zfa' extension requires the 'F' extension (which is the
> hard float).
> 
> 
> 
> Portions that I'm not sure whether they are okay
> =================================================
> 
> *   NaN handling (comparison with canonical NaN)
>     Due to constraints, I had to compare a NaN against the known binary
>     representations of the IEEE 754 binary16/32/64 canonical NaNs, but
>     is there any better way to perform this?
> *   Any ICE possibility?
>     For simple programs, I confirmed that no ICE occurs but I'm not sure
>     whether this applies to other programs.  If I miss some cases in
>     riscv_output_move or riscv_print_operand functions (corresponding
>     mov instructions in riscv.md), it can easily cause an ICE.
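
For readers following the NaN question, the comparison being described can be illustrated outside the compiler with a small host-side sketch. This is not the patch's code (the patch works on GCC's internal real values via real_to_target); the helper names are hypothetical, and the sketch assumes the host uses IEEE 754 binary32/binary64 with the same quiet-NaN convention as RISC-V.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: compare the raw bit pattern with the canonical
   NaN (positive, quiet, zero payload).  */
static bool
is_canonical_nan_f32 (float v)
{
  uint32_t bits;
  memcpy (&bits, &v, sizeof bits);
  return bits == UINT32_C (0x7fc00000);  /* Canonical NaN, binary32.  */
}

static bool
is_canonical_nan_f64 (double v)
{
  uint64_t bits;
  memcpy (&bits, &v, sizeof bits);
  return bits == UINT64_C (0x7ff8000000000000);  /* Canonical, binary64.  */
}
```

Under those assumptions, `__builtin_nanf ("")` matches, while a NaN with a non-zero payload such as `__builtin_nanf ("1")` or a negated NaN does not, which mirrors the split between zfa-fli-7.c and zfa-fli-8.c.
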
> 
> 
> Sincerely,
> Tsukasa
> 
> 
> 
> 
> Tsukasa OI (2):
>   RISC-V: Add support for the 'Zfa' extension
>   RISC-V: Constant FP Optimization with 'Zfa'
> 
>  gcc/common/config/riscv/riscv-common.cc    |   3 +
>  gcc/config/riscv/constraints.md            |   7 +
>  gcc/config/riscv/riscv-opts.h              |   2 +
>  gcc/config/riscv/riscv-protos.h            |  34 +++
>  gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
>  gcc/config/riscv/riscv.md                  |  24 +-
>  gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
>  gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
>  14 files changed, 697 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c
> 
> 
> base-commit: 614052dd4ea083e086712809c754ffebd9361316

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [2/2] RISC-V: Constant FP Optimization with 'Zfa'
  2023-08-14  5:32 ` [PATCH 2/2] RISC-V: Constant FP Optimization with 'Zfa' Tsukasa OI
@ 2023-08-14 12:51   ` Jin Ma
  2023-08-15  3:38     ` Tsukasa OI
  2023-08-25 20:59     ` Jeff Law
  0 siblings, 2 replies; 10+ messages in thread
From: Jin Ma @ 2023-08-14 12:51 UTC (permalink / raw)
  To: gcc-patches
  Cc: research_trasio, jeffreyalaw, palmer, richard.sandiford,
	kito.cheng, philipp.tomsich, christoph.muellner, rdapp.gcc,
	juzhe.zhong, vineetg, jinma.contrib

Hi Tsukasa,
  What a coincidence: I also implemented the 'Zfa' extension, including the
FLI-related instructions :)

links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html

> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
> +    return result;
> +  switch (GET_MODE (x))
> +    {
> +    case HFmode:
> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
> +      if (!TARGET_ZFH && !TARGET_ZVFH)

Enabling Zvfh means that Zfh is also on, so there may be no need to check
TARGET_ZVFH here.  By the way, the indentation here looks wrong; maybe a
'tab' is needed for alignment?

> +	return result;
> +      break;
> +    case SFmode: break;
> +    case DFmode: break;

Maybe we still have to check TARGET_DOUBLE_FLOAT here?

> +    default: return result;
> +    }
> +
> +  if (!CONST_DOUBLE_P (x))
> +    return result;

I think it might be better to check whether x satisfies CONST_DOUBLE_P
before the switch (GET_MODE (x)) above.

> +
> +  r = *CONST_DOUBLE_REAL_VALUE (x);
> +
> +  if (REAL_VALUE_ISNAN (r))
> +    {
> +      long reprs[2] = { 0 };
> +      /* Compare with canonical NaN.  */
> +      switch (GET_MODE (x))
> +	{
> +	case HFmode:
> +	  reprs[0] = real_to_target (NULL, &r,
> +				     float_mode_for_size (16).require ());
> +	  /* 0x7e00: Canonical NaN for binary16.  */
> +	  if (reprs[0] != 0x7e00)
> +	    return result;
> +	  break;
> +	case SFmode:
> +	  reprs[0] = real_to_target (NULL, &r,
> +				     float_mode_for_size (32).require ());
> +	  /* 0x7fc00000: Canonical NaN for binary32.  */
> +	  if (reprs[0] != 0x7fc00000)
> +	    return result;
> +	  break;
> +	case DFmode:
> +	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
> +	  if (FLOAT_WORDS_BIG_ENDIAN)
> +	    std::swap (reprs[0], reprs[1]);
> +	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
> +	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
> +	    return result;
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      result.type = RISCV_FLOAT_CONST_NAN;
> +      result.valid = true;
> +      return result;
> +    }
> +  else if (REAL_VALUE_ISINF (r))
> +    {
> +      if (REAL_VALUE_NEGATIVE (r))
> +	return result;
> +      result.type = RISCV_FLOAT_CONST_INF;
> +      result.valid = true;
> +      return result;
> +    }
> +
> +  bool sign = REAL_VALUE_NEGATIVE (r);
> +  result.sign = sign;
> +
> +  r = real_value_abs (&r);
> +  /* GCC internally does not use IEEE754-like encoding (where normalized
> +     significands are in the range [1, 2).  GCC uses [0.5, 1) (see real.cc).
> +     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
> +  int exponent_p1 = REAL_EXP (&r);
> +
> +  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
> +     highest (sign) bit, with a fixed binary point at bit point_pos.
> +     m1 holds the low part of the mantissa, m2 the high part.
> +     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
> +     bits for the mantissa, this can fail (low bits will be lost).  */
> +  bool fail = false;
> +  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
> +  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
> +  if (fail)
> +    return result;
> +
> +  /* If the low part of the mantissa has bits set we cannot represent
> +     the value.  */
> +  if (w.ulow () != 0)
> +    return result;
> +  /* We have rejected the lower HOST_WIDE_INT, so update our
> +     understanding of how many bits lie in the mantissa and
> +     look only at the high HOST_WIDE_INT.  */
> +  unsigned HOST_WIDE_INT mantissa = w.elt (1);
> +
> +  /* We cannot represent the value 0.0.  */
> +  if (mantissa == 0)
> +    return result;
> +
> +  /* We can only represent values with a mantissa of the form 1.xx.  */
> +  unsigned HOST_WIDE_INT mask
> +      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
> +  if ((mantissa & mask) != 0)
> +    return result;
> +  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
> +  /* Now the lowest 3-bits of mantissa should form (1.xx)b.  */
> +  gcc_assert (mantissa & (1u << 2));
> +  /* Mask out the highest bit.  */
> +  mantissa &= ~(1u << 2);
> +
> +  if (mantissa == 0)
> +    {
> +      /* We cannot represent any values but -1.0.  */
> +      if (exponent_p1 != 1 && sign)
> +	return result;
> +      switch (exponent_p1)
> +	{
> +	case -15: /* 1.0 * 2^(-16)  */
> +	case -14: /* 1.0 * 2^(-15)  */
> +	case -7:  /* 1.0 * 2^(- 8)  */
> +	case -6:  /* 1.0 * 2^(- 7)  */
> +	case 8:   /* 1.0 * 2^(+ 7)  */
> +	case 9:   /* 1.0 * 2^(+ 8)  */
> +	case 16:  /* 1.0 * 2^(+15)  */
> +	case 17:  /* 1.0 * 2^(+16)  */
> +	  break;
> +	default:
> +	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
> +	    /* 1.0 * 2^[-4,4]  */
> +	    break;
> +	  switch (GET_MODE (x))
> +	    {
> +	    case HFmode: /* IEEE 754 binary16.  */
> +	      /* Minimum positive normal == 1.0 * 2^(-14)  */
> +	      if (exponent_p1 != -13) return result;
> +	      break;
> +	    case SFmode: /* IEEE 754 binary32.  */
> +	      /* Minimum positive normal == 1.0 * 2^(-126)  */
> +	      if (exponent_p1 != -125) return result;
> +	      break;
> +	    case DFmode: /* IEEE 754 binary64.  */
> +	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
> +	      if (exponent_p1 != -1021) return result;
> +	      break;
> +	    default:
> +	      gcc_unreachable ();
> +	    }
> +	  result.type = RISCV_FLOAT_CONST_MIN;
> +	  result.valid = true;
> +	  return result;
> +	}
> +    }
> +  else
> +    {
> +      if (sign)
> +	return result;
> +      if (exponent_p1 < -1 || exponent_p1 > 2)
> +	return result;
> +      /* The value is (+1.xx)b * 2^[-2,1].
> +	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
> +      if (exponent_p1 == 2 && mantissa == 3)
> +	return result;
> +    }
> +
> +  result.valid = true;
> +  result.mantissa_below_point = mantissa;
> +  result.biased_exponent = exponent_p1 + 15;
> +
> +  return result;
> +}
> +

This code is great and completely different from the way I implemented it.
I'm not sure which one is better, but my idea is that the FLI instructions
correspond to three tables (HF, SF and DF), each listing the specific
values that can be loaded.  The routines in GCC's real.h can readily convert
a constant into the form stored in those tables, so a simple binary search
over the tables is all that is needed.
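
To make the table idea concrete, here is a minimal standalone sketch for the
binary32 case only.  It is not code from either patch: the names fli_s_bits
and fli_s_index are hypothetical, it works on a host float rather than on
GCC's REAL_VALUE_TYPE, and the table simply holds the 32 "fli.s" immediates
of the 'Zfa' specification as IEEE 754 binary32 bit patterns.  Sorted as
unsigned integers, the table is encodings 1..31 followed by encoding 0
(-1.0), so the encoding can be recovered from the position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iterator>

static_assert (sizeof (float) == sizeof (std::uint32_t),
               "this sketch assumes a 32-bit IEEE 754 float");

// Bit patterns of the "fli.s" immediates, sorted as unsigned integers.
// Position p holds FLI encoding (p + 1) % 32.
static const std::uint32_t fli_s_bits[32] = {
  0x00800000, /* min normal */ 0x37800000, /* 2^-16 */
  0x38000000, /* 2^-15 */      0x3b800000, /* 2^-8 */
  0x3c000000, /* 2^-7 */       0x3d800000, /* 0.0625 */
  0x3e000000, /* 0.125 */      0x3e800000, /* 0.25 */
  0x3ea00000, /* 0.3125 */     0x3ec00000, /* 0.375 */
  0x3ee00000, /* 0.4375 */     0x3f000000, /* 0.5 */
  0x3f200000, /* 0.625 */      0x3f400000, /* 0.75 */
  0x3f600000, /* 0.875 */      0x3f800000, /* 1.0 */
  0x3fa00000, /* 1.25 */       0x3fc00000, /* 1.5 */
  0x3fe00000, /* 1.75 */       0x40000000, /* 2.0 */
  0x40200000, /* 2.5 */        0x40400000, /* 3.0 */
  0x40800000, /* 4.0 */        0x41000000, /* 8.0 */
  0x41800000, /* 16.0 */       0x43000000, /* 128.0 */
  0x43800000, /* 256.0 */      0x47000000, /* 2^15 */
  0x47800000, /* 2^16 */       0x7f800000, /* +inf */
  0x7fc00000, /* canonical NaN */
  0xbf800000  /* -1.0 */
};

// Return the 5-bit FLI encoding for VALUE, or -1 if no "fli.s" can load it.
int
fli_s_index (float value)
{
  std::uint32_t bits;
  std::memcpy (&bits, &value, sizeof bits);

  const std::uint32_t *first = std::begin (fli_s_bits);
  const std::uint32_t *last = std::end (fli_s_bits);
  assert (std::is_sorted (first, last)); // Precondition of the search.

  const std::uint32_t *it = std::lower_bound (first, last, bits);
  if (it == last || *it != bits)
    return -1;
  return static_cast<int> ((it - first + 1) % 32);
}
```

Comparing bit patterns rather than values keeps the lookup exact and lets
only the canonical NaN match; tables for HF and DF would follow the same
shape with 16-bit and 64-bit patterns.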

@@ -1362,17 +1545,14 @@  riscv_const_insns (rtx x)
 		   constant incurs a literal-pool access.  Allow this in
 		   order to increase vectorization possibilities.  */
 		int n = riscv_const_insns (elt);
-		if (CONST_DOUBLE_P (elt))
-		    return 1 + 4; /* vfmv.v.f + memory access.  */
> +		/* We need as many insns as it takes to load the constant
> +		   into a GPR and one vmv.v.x.  */
> +		if (n != 0)
> +		  return 1 + n;
> +		else if (CONST_DOUBLE_P (elt))
> +		  return 1 + 4; /* vfmv.v.f + memory access.  */
 		else
-		  {
-		    /* We need as many insns as it takes to load the constant
-		       into a GPR and one vmv.v.x.  */
-		    if (n != 0)
-		      return 1 + n;
-		    else
-		      return 1 + 4; /*vmv.v.x + memory access.  */
-		  }
> +		  return 1 + 4; /* vmv.v.x + memory access.  */
 	      }
 	  }

I'm not sure I understand this part: if n == 0, do we always return 1 + 4?
If so, it could simply be
if (n != 0)
  return 1 + n;
else
  return 1 + 4;

@@ -5117,6 +5313,36 @@  riscv_print_operand (FILE *file, rtx op, int letter)
 	    output_address (mode, XEXP (op, 0));
 	  break;
 
> +	case CONST_DOUBLE:
> +	  {
> +	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
> +	    if (flt.valid)
> +	      {
> +		switch (flt.type)
> +		  {
> +		  case RISCV_FLOAT_CONST_MIN:
> +		    fputs ("min", file);
> +		    break;
> +		  case RISCV_FLOAT_CONST_INF:
> +		    fputs ("inf", file);
> +		    break;
> +		  case RISCV_FLOAT_CONST_NAN:
> +		    fputs ("nan", file);
> +		    break;
> +		  default:
> +		    /* Use simpler (and bit-perfect) printer.  */
> +		    if (flt.sign)
> +		      fputc ('-', file);
> +		    fprintf (file, "0x1.%cp%+d",
> +			     "048c"[flt.mantissa_below_point],
> +			     (int) flt.biased_exponent - 16);
> +		    break;
> +		  }
> +		break;
> +	      }
> +	  }
> +	  /* Fall through.  */

For how to display floating-point values at the assembly level, you can
refer to LLVM: https://reviews.llvm.org/D145645

It may also be necessary to handle riscv_split_64bit_move_p
and riscv_legitimize_const_move for rv32; otherwise a DFmode move
on rv32 will be split into a high 32-bit move and a low 32-bit
move, so no fli instruction can be generated.

Thanks,
Jin


* Re: [2/2] RISC-V: Constant FP Optimization with 'Zfa'
  2023-08-14 12:51   ` [2/2] " Jin Ma
@ 2023-08-15  3:38     ` Tsukasa OI
  2023-08-15  7:59       ` Tsukasa OI
  2023-08-25 20:59     ` Jeff Law
  1 sibling, 1 reply; 10+ messages in thread
From: Tsukasa OI @ 2023-08-15  3:38 UTC (permalink / raw)
  To: Jin Ma; +Cc: gcc-patches

On 2023/08/14 21:51, Jin Ma wrote:
> Hi Tsukasa,
>   What a coincidence: I also implemented the 'Zfa' extension, including the
> FLI-related instructions :)

Hi, I'm glad to know that someone is working on this extension more
comprehensively (especially when that "someone" is an experienced GCC
contributor).  I prefer your patch set in general, and I'm glad to learn
from your patch set and your response that my approach was not as bad as
I had expected.

When a new extension becomes available, I will be more confident making a
patch set for GCC (as I already do for GNU Binutils).

> 
> links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html
> 
>> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
>> +    return result;
>> +  switch (GET_MODE (x))
>> +    {
>> +    case HFmode:
>> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
>> +      if (!TARGET_ZFH && !TARGET_ZVFH)
> 
> Enabling Zvfh means that Zfh is also on, so there may be no need to check
> TARGET_ZVFH here.  By the way, the indentation here looks wrong; maybe a
> 'tab' is needed for alignment?

As for indentation, I believe this is okay: the line is three indentation
levels (soft tabs) deep, which means 6 spaces.

As for the specification requirements, I think I'm correct.

The spec says that 'Zvfh' depends on 'Zve32f' and 'Zfhmin'.  'Zfhmin' is
a conversion-only 'Zfh' subset ('Zve32f' doesn't require any
FP16-related extensions).

Note that "fli.h" requires 'Zfa' and ('Zfh' and/or 'Zvfh').

So, checking 'Zfh' alone is not sufficient to decide whether the "fli.h"
instruction is available.  Checking TARGET_ZFH || TARGET_ZVFH (for the
existence of "fli.h") should therefore be correct, and I think your patch
needs to be changed "in the long term".

"In the long term" because current GNU Binutils has a bug: it accepts
"fli.h" only with 'Zfa' and 'Zfh' ('Zfa' with 'Zvfh' does not work).
My initial 'Zfa' proposal (improved by Christoph Müllner and upstreamed
into master) intentionally ignored this case because I assumed that
approval/ratification of 'Zvfh' would take some time and that we would
have time to fix it before the Binutils release following approval of both
'Zfa' and 'Zvfh' (that assumption turned out to be wrong).

cf. <https://sourceware.org/pipermail/binutils/2023-August/129006.html>

So, "fixing" this part (in your patch) alone will not make the program
work (on the simulator), because the current buggy GNU Binutils won't
accept it.  I'm working on that on the GNU Binutils side.
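
The requirement described above can be written down as a small predicate.
This is only a sketch under the assumptions stated in this mail ('Zvfh'
depends on 'Zfhmin' but not on full 'Zfh', and "fli.h" needs 'Zfa' plus
'Zfh' and/or 'Zvfh').  The struct and function names are made up for
illustration; GCC itself tests the TARGET_* macros.

```cpp
#include <cassert>

// Hypothetical flag set standing in for TARGET_ZFA, TARGET_ZFH, etc.
struct ext_flags
{
  bool zfa;
  bool zfhmin;
  bool zfh;
  bool zvfh;
};

// "fli.h" exists when 'Zfa' is combined with 'Zfh' and/or 'Zvfh'.
// 'Zfhmin' (the conversion-only subset) is not enough on its own.
bool
fli_h_available (const ext_flags &e)
{
  // 'Zvfh' depends on 'Zfhmin', so a consistent flag set has both.
  assert (!e.zvfh || e.zfhmin);
  return e.zfa && (e.zfh || e.zvfh);
}
```

With these flags, a configuration that has 'Zfa' and 'Zvfh' but not 'Zfh'
is accepted, which is exactly the case a check on 'Zfh' alone would miss.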

> 
>> +	return result;
>> +      break;
>> +    case SFmode: break;
>> +    case DFmode: break;
> 
> Maybe we still have to check TARGET_DOUBLE_FLOAT here?

Indeed.  I just missed that.

> 
>> +    default: return result;
>> +    }
>> +
>> +  if (!CONST_DOUBLE_P (x))
>> +    return result;
> 
> I think it might be better to check whether x satisfies CONST_DOUBLE_P
> before the switch (GET_MODE (x)) above.

That's correct.  I think that's a leftover from when I was experimenting.

> 
>> +
>> +  r = *CONST_DOUBLE_REAL_VALUE (x);
>> +
>> +  if (REAL_VALUE_ISNAN (r))
>> +    {
>> +      long reprs[2] = { 0 };
>> +      /* Compare with canonical NaN.  */
>> +      switch (GET_MODE (x))
>> +	{
>> +	case HFmode:
>> +	  reprs[0] = real_to_target (NULL, &r,
>> +				     float_mode_for_size (16).require ());
>> +	  /* 0x7e00: Canonical NaN for binary16.  */
>> +	  if (reprs[0] != 0x7e00)
>> +	    return result;
>> +	  break;
>> +	case SFmode:
>> +	  reprs[0] = real_to_target (NULL, &r,
>> +				     float_mode_for_size (32).require ());
>> +	  /* 0x7fc00000: Canonical NaN for binary32.  */
>> +	  if (reprs[0] != 0x7fc00000)
>> +	    return result;
>> +	  break;
>> +	case DFmode:
>> +	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
>> +	  if (FLOAT_WORDS_BIG_ENDIAN)
>> +	    std::swap (reprs[0], reprs[1]);
>> +	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
>> +	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
>> +	    return result;
>> +	  break;
>> +	default:
>> +	  gcc_unreachable ();
>> +	}
>> +      result.type = RISCV_FLOAT_CONST_NAN;
>> +      result.valid = true;
>> +      return result;
>> +    }
>> +  else if (REAL_VALUE_ISINF (r))
>> +    {
>> +      if (REAL_VALUE_NEGATIVE (r))
>> +	return result;
>> +      result.type = RISCV_FLOAT_CONST_INF;
>> +      result.valid = true;
>> +      return result;
>> +    }
>> +
>> +  bool sign = REAL_VALUE_NEGATIVE (r);
>> +  result.sign = sign;
>> +
>> +  r = real_value_abs (&r);
>> +  /* GCC internally does not use IEEE754-like encoding (where normalized
>> +     significands are in the range [1, 2).  GCC uses [0.5, 1) (see real.cc).
>> +     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
>> +  int exponent_p1 = REAL_EXP (&r);
>> +
>> +  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
>> +     highest (sign) bit, with a fixed binary point at bit point_pos.
>> +     m1 holds the low part of the mantissa, m2 the high part.
>> +     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
>> +     bits for the mantissa, this can fail (low bits will be lost).  */
>> +  bool fail = false;
>> +  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
>> +  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
>> +  if (fail)
>> +    return result;
>> +
>> +  /* If the low part of the mantissa has bits set we cannot represent
>> +     the value.  */
>> +  if (w.ulow () != 0)
>> +    return result;
>> +  /* We have rejected the lower HOST_WIDE_INT, so update our
>> +     understanding of how many bits lie in the mantissa and
>> +     look only at the high HOST_WIDE_INT.  */
>> +  unsigned HOST_WIDE_INT mantissa = w.elt (1);
>> +
>> +  /* We cannot represent the value 0.0.  */
>> +  if (mantissa == 0)
>> +    return result;
>> +
>> +  /* We can only represent values with a mantissa of the form 1.xx.  */
>> +  unsigned HOST_WIDE_INT mask
>> +      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
>> +  if ((mantissa & mask) != 0)
>> +    return result;
>> +  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
>> +  /* Now the lowest 3-bits of mantissa should form (1.xx)b.  */
>> +  gcc_assert (mantissa & (1u << 2));
>> +  /* Mask out the highest bit.  */
>> +  mantissa &= ~(1u << 2);
>> +
>> +  if (mantissa == 0)
>> +    {
>> +      /* We cannot represent any values but -1.0.  */
>> +      if (exponent_p1 != 1 && sign)
>> +	return result;
>> +      switch (exponent_p1)
>> +	{
>> +	case -15: /* 1.0 * 2^(-16)  */
>> +	case -14: /* 1.0 * 2^(-15)  */
>> +	case -7:  /* 1.0 * 2^(- 8)  */
>> +	case -6:  /* 1.0 * 2^(- 7)  */
>> +	case 8:   /* 1.0 * 2^(+ 7)  */
>> +	case 9:   /* 1.0 * 2^(+ 8)  */
>> +	case 16:  /* 1.0 * 2^(+15)  */
>> +	case 17:  /* 1.0 * 2^(+16)  */
>> +	  break;
>> +	default:
>> +	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
>> +	    /* 1.0 * 2^[-4,4]  */
>> +	    break;
>> +	  switch (GET_MODE (x))
>> +	    {
>> +	    case HFmode: /* IEEE 754 binary16.  */
>> +	      /* Minimum positive normal == 1.0 * 2^(-14)  */
>> +	      if (exponent_p1 != -13) return result;
>> +	      break;
>> +	    case SFmode: /* IEEE 754 binary32.  */
>> +	      /* Minimum positive normal == 1.0 * 2^(-126)  */
>> +	      if (exponent_p1 != -125) return result;
>> +	      break;
>> +	    case DFmode: /* IEEE 754 binary64.  */
>> +	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
>> +	      if (exponent_p1 != -1021) return result;
>> +	      break;
>> +	    default:
>> +	      gcc_unreachable ();
>> +	    }
>> +	  result.type = RISCV_FLOAT_CONST_MIN;
>> +	  result.valid = true;
>> +	  return result;
>> +	}
>> +    }
>> +  else
>> +    {
>> +      if (sign)
>> +	return result;
>> +      if (exponent_p1 < -1 || exponent_p1 > 2)
>> +	return result;
>> +      /* The value is (+1.xx)b * 2^[-2,1].
>> +	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
>> +      if (exponent_p1 == 2 && mantissa == 3)
>> +	return result;
>> +    }
>> +
>> +  result.valid = true;
>> +  result.mantissa_below_point = mantissa;
>> +  result.biased_exponent = exponent_p1 + 15;
>> +
>> +  return result;
>> +}
>> +
> 
> This code is great and completely different from the way I implemented it.
> I'm not sure which one is better, but my idea is that the FLI instructions
> correspond to three tables (HF, SF and DF), each listing the specific
> values that can be loaded.  The routines in GCC's real.h can readily convert
> a constant into the form stored in those tables, so a simple binary search
> over the tables is all that is needed.

Yup.  My approach (based on AArch64's VMOV.F32 constraint checking code)
is more generic, but I think constants loaded by a single FLI instruction
don't need to be that generic.

If multi-instruction FLI sequences become realistic, this kind of generic
approach (handling finite constants precisely) will be helpful (a multi-FLI
sequence with addition might need some additional measures to avoid
underflow, though).  But for now, I think your approach is better and
simpler.
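
For comparison, the core of the generic approach is to split a finite
constant into an exponent and a short significand and to check that the
significand has the form (1.xx)b.  Below is a standalone sketch of that one
step.  It uses std::frexp on a host double instead of GCC's real.cc
routines (std::frexp uses the same [0.5, 1) normalization, so its exponent
corresponds to exponent_p1 in the patch).  The names fli_parts and
decompose are hypothetical, and the per-exponent range checks are left out.

```cpp
#include <cassert>
#include <cmath>

struct fli_parts
{
  bool valid;               // True if the value is +/-(1.xx)b * 2^e.
  bool sign;                // True for negative values.
  int exponent_p1;          // IEEE 754 unbiased exponent + 1.
  int mantissa_below_point; // The two bits "xx" below the binary point.
};

fli_parts
decompose (double value)
{
  fli_parts p = { false, std::signbit (value), 0, 0 };
  if (!std::isfinite (value) || value == 0.0)
    return p;

  int exp;
  // m is in [0.5, 1), i.e. (0.1xx...)b, and |value| == m * 2^exp.
  double m = std::frexp (std::fabs (value), &exp);

  // Shift three significand bits above the binary point: (1xx.)b.
  double scaled = m * 8.0;
  assert (scaled >= 4.0 && scaled < 8.0);
  if (scaled != std::floor (scaled))
    return p; // More than two fraction bits are set.

  p.valid = true;
  p.exponent_p1 = exp;
  p.mantissa_below_point = static_cast<int> (scaled) - 4;
  return p;
}
```

For 3.5 this yields exponent_p1 == 2 and mantissa_below_point == 3, the
combination the patch has to reject explicitly.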

> 
> @@ -1362,17 +1545,14 @@  riscv_const_insns (rtx x)
>  		   constant incurs a literal-pool access.  Allow this in
>  		   order to increase vectorization possibilities.  */
>  		int n = riscv_const_insns (elt);
> -		if (CONST_DOUBLE_P (elt))
> -		    return 1 + 4; /* vfmv.v.f + memory access.  */
>> +		/* We need as many insns as it takes to load the constant
>> +		   into a GPR and one vmv.v.x.  */
>> +		if (n != 0)
>> +		  return 1 + n;
>> +		else if (CONST_DOUBLE_P (elt))
>> +		  return 1 + 4; /* vfmv.v.f + memory access.  */
>  		else
> -		  {
> -		    /* We need as many insns as it takes to load the constant
> -		       into a GPR and one vmv.v.x.  */
> -		    if (n != 0)
> -		      return 1 + n;
> -		    else
> -		      return 1 + 4; /*vmv.v.x + memory access.  */
> -		  }
>> +		  return 1 + 4; /* vmv.v.x + memory access.  */
>  	      }
>  	  }
> 
> I'm not sure I understand this part: if n == 0, do we always return 1 + 4?
> If so, it could simply be
> if (n != 0)
>   return 1 + n;
> else
>   return 1 + 4;
> 
> @@ -5117,6 +5313,36 @@  riscv_print_operand (FILE *file, rtx op, int letter)
>  	    output_address (mode, XEXP (op, 0));
>  	  break;
>  
>> +	case CONST_DOUBLE:
>> +	  {
>> +	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
>> +	    if (flt.valid)
>> +	      {
>> +		switch (flt.type)
>> +		  {
>> +		  case RISCV_FLOAT_CONST_MIN:
>> +		    fputs ("min", file);
>> +		    break;
>> +		  case RISCV_FLOAT_CONST_INF:
>> +		    fputs ("inf", file);
>> +		    break;
>> +		  case RISCV_FLOAT_CONST_NAN:
>> +		    fputs ("nan", file);
>> +		    break;
>> +		  default:
>> +		    /* Use simpler (and bit-perfect) printer.  */
>> +		    if (flt.sign)
>> +		      fputc ('-', file);
>> +		    fprintf (file, "0x1.%cp%+d",
>> +			     "048c"[flt.mantissa_below_point],
>> +			     (int) flt.biased_exponent - 16);
>> +		    break;
>> +		  }
>> +		break;
>> +	      }
>> +	  }
>> +	  /* Fall through.  */
> 
> For how to display floating-point values at the assembly level, you can
> refer to LLVM: https://reviews.llvm.org/D145645

Thanks for the link.  I personally prefer hexfloats, to avoid precision
problems as far as possible, and that's how GNU Binutils prints the FLI
constants.  But the LLVM approach makes sense (and I feel decimals are okay).
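
For reference, the hexfloat spelling is cheap to produce because the two
fraction bits map directly to one hex digit (00 -> 0, 01 -> 4, 10 -> 8,
11 -> c).  A standalone sketch follows; the function name is hypothetical,
and it takes the unbiased IEEE 754 exponent directly rather than the
biased_exponent field used in the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format +/-(1.xx)b * 2^exponent as a C-style hexadecimal float.
std::string
fli_hexfloat (bool sign, unsigned mantissa_below_point, int exponent)
{
  assert (mantissa_below_point < 4); // Only two fraction bits exist.
  char buf[32];
  std::snprintf (buf, sizeof buf, "%s0x1.%cp%+d", sign ? "-" : "",
                 "048c"[mantissa_below_point], exponent);
  return std::string (buf);
}
```

Every digit of such a string corresponds to bits of the value, so no
rounding is involved when the assembler parses it back.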

> 
> It may also be necessary to handle riscv_split_64bit_move_p
> and riscv_legitimize_const_move for rv32; otherwise a DFmode move
> on rv32 will be split into a high 32-bit move and a low 32-bit
> move, so no fli instruction can be generated.

Thanks for letting me know.  I'm still wrestling with GCC's large code base
in preparation for future contributions, so that's a lot of help to me.

Thanks,
Tsukasa

> 
> Thanks,
> Jin
> 


* Re: [2/2] RISC-V: Constant FP Optimization with 'Zfa'
  2023-08-15  3:38     ` Tsukasa OI
@ 2023-08-15  7:59       ` Tsukasa OI
  2023-08-15  9:20         ` Jin Ma
  0 siblings, 1 reply; 10+ messages in thread
From: Tsukasa OI @ 2023-08-15  7:59 UTC (permalink / raw)
  To: Jin Ma, Tsukasa OI; +Cc: gcc-patches

On 2023/08/15 12:38, Tsukasa OI wrote:
> On 2023/08/14 21:51, Jin Ma wrote:
>> Hi Tsukasa,
>>   What a coincidence, I also implemented zfa extension, which also includes fli related instructions :)
> 
> Hi, I'm glad to know that someone is working on this extension more
> comprehensively (especially when "someone" is an experienced GCC
> contributor).  I prefer your patch set in general and glad to learn from
> your patch set and your response that my approach was not *that* bad as
> I expected.
> 
> When a new extension gets available, I will be more confident making a
> patch set for GCC (as I already do in GNU Binutils).
> 
>>
>> links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html
>>
>>> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
>>> +    return result;
>>> +  switch (GET_MODE (x))
>>> +    {
>>> +    case HFmode:
>>> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
>>> +      if (!TARGET_ZFH && !TARGET_ZVFH)
>>
>> When Zvfh means that zfh is also on, so there may be no need to judge
>> the TARGET_ZVFH here. By the way,the format here seems wrong, maybe 'tab'
>> is needed for alignment?
> 
> For indentation, I believe this is okay considering 3 indent (soft tab)
> from the top (meaning 6 spaces).
> 
> For specification requirements, I think I'm correct.
> 
> The spec says that 'Zvfh' depends on 'Zve32f' and 'Zfhmin'.  'Zfhmin' is
> a conversion-only 'Zfh' subset ('Zve32f' doesn't require any
> FP16-related extensions).
> 
> Note that "fli.h" requires 'Zfa' and ('Zfh' and/or 'Zvfh').
> 
> So, 'Zfh' alone will not be sufficient to check requirements to the
> "fli.h" instruction.  So, checking TARGET_ZFH || TARGET_ZVFH (for
> existence of the "fli.h") should be correct and I think your patch needs
> to be changed "in the long term".
> 
> "In the long term" means that, current GNU Binutils has a bug which
> "fli.h" requires 'Zfa' and 'Zfh' ('Zfa' and 'Zvfh' does not work).
> My initial 'Zfa' proposal (improved by Christoph Müllner and upstreamed
> into master) intentionally ignored this case because I assumed that
> approval/ratification of 'Zvfh' will take some time and we have a time
> to fix before a release of Binutils following approval of both 'Zfa' and
> 'Zvfh' (it turned out to be wrong).
> 
> cf. <https://sourceware.org/pipermail/binutils/2023-August/129006.html>
> 
> So, "fixing" this part (on your patch) alone will not make the program
> work (on the simulator) because current buggy GNU Binutils won't accept
> it.  I'm working on it on the GNU Binutils side.

Okay, the bug is now fixed in GNU Binutils (master), and the fix is awaiting
approval from the release maintainer (for binutils-2_41-branch).

Thanks,
Tsukasa

> 
>>
>>> +	return result;
>>> +      break;
>>> +    case SFmode: break;
>>> +    case DFmode: break;
>>
>> Maybe we still have to judge TARGET_DOUBLE_FLOAT?
> 
> Indeed.  I just missed that.
> 
>>
>>> +    default: return result;
>>> +    }
>>> +
>>> +  if (!CONST_DOUBLE_P (x))
>>> +    return result;
>>
>> I think it might be better to judge whether x satisfies the CONST_DOUBLE_P
>> before switch (GET_MODE (x)) above.
> 
> That's correct.  I think that's a part of leftover when I'm experimenting.
> 
>>
>>> +
>>> +  r = *CONST_DOUBLE_REAL_VALUE (x);
>>> +
>>> +  if (REAL_VALUE_ISNAN (r))
>>> +    {
>>> +      long reprs[2] = { 0 };
>>> +      /* Compare with canonical NaN.  */
>>> +      switch (GET_MODE (x))
>>> +	{
>>> +	case HFmode:
>>> +	  reprs[0] = real_to_target (NULL, &r,
>>> +				     float_mode_for_size (16).require ());
>>> +	  /* 0x7e00: Canonical NaN for binary16.  */
>>> +	  if (reprs[0] != 0x7e00)
>>> +	    return result;
>>> +	  break;
>>> +	case SFmode:
>>> +	  reprs[0] = real_to_target (NULL, &r,
>>> +				     float_mode_for_size (32).require ());
>>> +	  /* 0x7fc00000: Canonical NaN for binary32.  */
>>> +	  if (reprs[0] != 0x7fc00000)
>>> +	    return result;
>>> +	  break;
>>> +	case DFmode:
>>> +	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
>>> +	  if (FLOAT_WORDS_BIG_ENDIAN)
>>> +	    std::swap (reprs[0], reprs[1]);
>>> +	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
>>> +	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
>>> +	    return result;
>>> +	  break;
>>> +	default:
>>> +	  gcc_unreachable ();
>>> +	}
>>> +      result.type = RISCV_FLOAT_CONST_NAN;
>>> +      result.valid = true;
>>> +      return result;
>>> +    }
>>> +  else if (REAL_VALUE_ISINF (r))
>>> +    {
>>> +      if (REAL_VALUE_NEGATIVE (r))
>>> +	return result;
>>> +      result.type = RISCV_FLOAT_CONST_INF;
>>> +      result.valid = true;
>>> +      return result;
>>> +    }
>>> +
>>> +  bool sign = REAL_VALUE_NEGATIVE (r);
>>> +  result.sign = sign;
>>> +
>>> +  r = real_value_abs (&r);
>>> +  /* GCC internally does not use IEEE754-like encoding (where normalized
>>> +     significands are in the range [1, 2).  GCC uses [0.5, 1) (see real.cc).
>>> +     So, this exponent_p1 variable equals IEEE754 unbiased exponent + 1.  */
>>> +  int exponent_p1 = REAL_EXP (&r);
>>> +
>>> +  /* For the mantissa, we expand into two HOST_WIDE_INTS, apart from the
>>> +     highest (sign) bit, with a fixed binary point at bit point_pos.
>>> +     m1 holds the low part of the mantissa, m2 the high part.
>>> +     WARNING: If we ever have a representation using more than 2 * H_W_I - 1
>>> +     bits for the mantissa, this can fail (low bits will be lost).  */
>>> +  bool fail = false;
>>> +  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
>>> +  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
>>> +  if (fail)
>>> +    return result;
>>> +
>>> +  /* If the low part of the mantissa has bits set we cannot represent
>>> +     the value.  */
>>> +  if (w.ulow () != 0)
>>> +    return result;
>>> +  /* We have rejected the lower HOST_WIDE_INT, so update our
>>> +     understanding of how many bits lie in the mantissa and
>>> +     look only at the high HOST_WIDE_INT.  */
>>> +  unsigned HOST_WIDE_INT mantissa = w.elt (1);
>>> +
>>> +  /* We cannot represent the value 0.0.  */
>>> +  if (mantissa == 0)
>>> +    return result;
>>> +
>>> +  /* We can only represent values with a mantissa of the form 1.xx.  */
>>> +  unsigned HOST_WIDE_INT mask
>>> +      = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
>>> +  if ((mantissa & mask) != 0)
>>> +    return result;
>>> +  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
>>> +  /* Now the lowest 3-bits of mantissa should form (1.xx)b.  */
>>> +  gcc_assert (mantissa & (1u << 2));
>>> +  /* Mask out the highest bit.  */
>>> +  mantissa &= ~(1u << 2);
>>> +
>>> +  if (mantissa == 0)
>>> +    {
>>> +      /* We cannot represent any values but -1.0.  */
>>> +      if (exponent_p1 != 1 && sign)
>>> +	return result;
>>> +      switch (exponent_p1)
>>> +	{
>>> +	case -15: /* 1.0 * 2^(-16)  */
>>> +	case -14: /* 1.0 * 2^(-15)  */
>>> +	case -7:  /* 1.0 * 2^(- 8)  */
>>> +	case -6:  /* 1.0 * 2^(- 7)  */
>>> +	case 8:   /* 1.0 * 2^(+ 7)  */
>>> +	case 9:   /* 1.0 * 2^(+ 8)  */
>>> +	case 16:  /* 1.0 * 2^(+15)  */
>>> +	case 17:  /* 1.0 * 2^(+16)  */
>>> +	  break;
>>> +	default:
>>> +	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
>>> +	    /* 1.0 * 2^[-4,4]  */
>>> +	    break;
>>> +	  switch (GET_MODE (x))
>>> +	    {
>>> +	    case HFmode: /* IEEE 754 binary16.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-14)  */
>>> +	      if (exponent_p1 != -13) return result;
>>> +	      break;
>>> +	    case SFmode: /* IEEE 754 binary32.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-126)  */
>>> +	      if (exponent_p1 != -125) return result;
>>> +	      break;
>>> +	    case DFmode: /* IEEE 754 binary64.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-1022)  */
>>> +	      if (exponent_p1 != -1021) return result;
>>> +	      break;
>>> +	    default:
>>> +	      gcc_unreachable ();
>>> +	    }
>>> +	  result.type = RISCV_FLOAT_CONST_MIN;
>>> +	  result.valid = true;
>>> +	  return result;
>>> +	}
>>> +    }
>>> +  else
>>> +    {
>>> +      if (sign)
>>> +	return result;
>>> +      if (exponent_p1 < -1 || exponent_p1 > 2)
>>> +	return result;
>>> +      /* The value is (+1.xx)b * 2^[-2,1].
>>> +	 But we cannot represent (+1.11)b * 2^1 (that is 3.5). */
>>> +      if (exponent_p1 == 2 && mantissa == 3)
>>> +	return result;
>>> +    }
>>> +
>>> +  result.valid = true;
>>> +  result.mantissa_below_point = mantissa;
>>> +  result.biased_exponent = exponent_p1 + 15;
>>> +
>>> +  return result;
>>> +}
>>> +
>>
>> This code is great and completely different from the way I implemented it.
>> I'm not sure which one is better, but my idea is that the fli instruction
>> corresponds to three tables (HF, SF and DF), all of which represent
>> specific values. the library in gcc's real.h can very well convert
>> the corresponding values into the values in the table, so it is only
>> necessary to perform a simple binary search to look up the tables.
> 
> Yup.  My approach (based on AArch64's VMOV.F32 constraint checking code)
> is more generic but I think constants with single FLI instruction don't
> need to be that generic.
> 
> If multi-instruction FLI sequence gets realistic, this kind of generic
> approach (handling finite constants precisely) will be helpful (multi
> FLI sequence with addition might need some additional measures to avoid
> underflow, though).  But for now, I think your approach is better and
> simpler.
> 
>>
>> @@ -1362,17 +1545,14 @@  riscv_const_insns (rtx x)
>>  		   constant incurs a literal-pool access.  Allow this in
>>  		   order to increase vectorization possibilities.  */
>>  		int n = riscv_const_insns (elt);
>> -		if (CONST_DOUBLE_P (elt))
>> -		    return 1 + 4; /* vfmv.v.f + memory access.  */
>>> +		/* We need as many insns as it takes to load the constant
>>> +		   into a GPR and one vmv.v.x.  */
>>> +		if (n != 0)
>>> +		  return 1 + n;
>>> +		else if (CONST_DOUBLE_P (elt))
>>> +		  return 1 + 4; /* vfmv.v.f + memory access.  */
>>  		else
>> -		  {
>> -		    /* We need as many insns as it takes to load the constant
>> -		       into a GPR and one vmv.v.x.  */
>> -		    if (n != 0)
>> -		      return 1 + n;
>> -		    else
>> -		      return 1 + 4; /*vmv.v.x + memory access.  */
>> -		  }
>>> +		  return 1 + 4; /* vmv.v.x + memory access.  */
>>  	      }
>>  	  }
>>
>> I don't seem to understand here, if n = = 0, always return 1 + 4?
>> If so, it could be
>> if (n != 0)
>>    return 1 + n;
>> else
>>   return 1 + 4;
>>
>> @@ -5117,6 +5313,36 @@  riscv_print_operand (FILE *file, rtx op, int letter)
>>  	    output_address (mode, XEXP (op, 0));
>>  	  break;
>>  
>>> +	case CONST_DOUBLE:
>>> +	  {
>>> +	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
>>> +	    if (flt.valid)
>>> +	      {
>>> +		switch (flt.type)
>>> +		  {
>>> +		  case RISCV_FLOAT_CONST_MIN:
>>> +		    fputs ("min", file);
>>> +		    break;
>>> +		  case RISCV_FLOAT_CONST_INF:
>>> +		    fputs ("inf", file);
>>> +		    break;
>>> +		  case RISCV_FLOAT_CONST_NAN:
>>> +		    fputs ("nan", file);
>>> +		    break;
>>> +		  default:
>>> +		    /* Use simpler (and bit-perfect) printer.  */
>>> +		    if (flt.sign)
>>> +		      fputc ('-', file);
>>> +		    fprintf (file, "0x1.%cp%+d",
>>> +			     "048c"[flt.mantissa_below_point],
>>> +			     (int) flt.biased_exponent - 16);
>>> +		    break;
>>> +		  }
>>> +		break;
>>> +	      }
>>> +	  }
>>> +	  /* Fall through.  */
>>
>> For displaying floating-point values at the assembly level, you can refer
>> to LLVM: https://reviews.llvm.org/D145645
> 
> Thanks for the link.  I personally prefer hexfloats to avoid precision
> problems as much as possible, and that's how GNU Binutils prints the FLI
> constants.  But that makes sense (and I feel decimals are okay).
> 
>>
>> It may also be necessary to deal with riscv_split_64bit_move_p
>> and riscv_legitimize_const_move for rv32, otherwise the mov of
>> DFmode on rv32 will be split into high 32-bit mov and low 32-bit
>> mov, thus unable to generate fli instructions.
> 
> Thanks for letting me know.  I'm fighting against GCC's large code base
> for future contribution and that's a lot of help for me.
> 
> Thanks,
> Tsukasa
> 
>>
>> Thanks,
>> Jin
>>

* Re: [2/2] RISC-V: Constant FP Optimization with 'Zfa'
  2023-08-15  7:59       ` Tsukasa OI
@ 2023-08-15  9:20         ` Jin Ma
  0 siblings, 0 replies; 10+ messages in thread
From: Jin Ma @ 2023-08-15  9:20 UTC (permalink / raw)
  To: Tsukasa OI; +Cc: gcc-patches

On 2023/08/15 12:38, Tsukasa OI wrote:
> > On 2023/08/14 21:51, Jin Ma wrote:
> >> Hi Tsukasa,
> >>   What a coincidence, I also implemented the Zfa extension, which also includes the FLI-related instructions :)
> > 
> > Hi, I'm glad to know that someone is working on this extension more
> > comprehensively (especially when "someone" is an experienced GCC
> > contributor).  I prefer your patch set in general and am glad to learn
> > from your patch set and your response that my approach was not as bad as
> > I expected.
> > 
> > When a new extension gets available, I will be more confident making a
> > patch set for GCC (as I already do in GNU Binutils).
> > 
> >>
> >> links: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html
> >>
> >>> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
> >>> +    return result;
> >>> +  switch (GET_MODE (x))
> >>> +    {
> >>> +    case HFmode:
> >>> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
> >>> +      if (!TARGET_ZFH && !TARGET_ZVFH)
> >>
> >> Zvfh means that Zfh is also on, so there may be no need to check
> >> TARGET_ZVFH here.  By the way, the formatting here seems wrong; maybe a
> >> tab is needed for alignment?
> > 
> > For indentation, I believe this is okay considering 3 levels of
> > indentation (soft tabs) from the top (meaning 6 spaces).
> > 
> > For specification requirements, I think I'm correct.
> > 
> > The spec says that 'Zvfh' depends on 'Zve32f' and 'Zfhmin'.  'Zfhmin' is
> > a conversion-only 'Zfh' subset ('Zve32f' doesn't require any
> > FP16-related extensions).
> > 
> > Note that "fli.h" requires 'Zfa' and ('Zfh' and/or 'Zvfh').
> > 
> > So, checking 'Zfh' alone will not be sufficient to test the requirements
> > of the "fli.h" instruction.  So, checking TARGET_ZFH || TARGET_ZVFH (for
> > existence of the "fli.h") should be correct and I think your patch needs
> > to be changed "in the long term".
> > 
> > "In the long term" means that current GNU Binutils has a bug in which
> > "fli.h" requires 'Zfa' and 'Zfh' ('Zfa' and 'Zvfh' does not work).
> > My initial 'Zfa' proposal (improved by Christoph Müllner and upstreamed
> > into master) intentionally ignored this case because I assumed that
> > approval/ratification of 'Zvfh' would take some time and we would have
> > time to fix it before a release of Binutils following approval of both
> > 'Zfa' and 'Zvfh' (it turned out to be wrong).
> > 
> > cf. <https://sourceware.org/pipermail/binutils/2023-August/129006.html>
> > 
> > So, "fixing" this part (on your patch) alone will not make the program
> > work (on the simulator) because current buggy GNU Binutils won't accept
> > it.  I'm working on it on the GNU Binutils side.
> 
> Okay, the bug is fixed on GNU Binutils (master) and is awaiting approval
> from the release maintainer (for binutils-2_41-branch).
> 
> Thanks,
> Tsukasa
> 

Yes, you are right. I did not notice that zfh and zvfh are relatively independent.
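
For reference, the requirement agreed on above can be written down as a small predicate. This is a standalone sketch, not code from either patch: `struct riscv_exts` and `fli_h_available` are hypothetical names, and the boolean fields merely stand in for the TARGET_* macros quoted earlier.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the enabled extensions.  */
struct riscv_exts {
    bool hard_float; /* stands in for TARGET_HARD_FLOAT */
    bool zfa;
    bool zfhmin;     /* conversion-only subset of Zfh */
    bool zfh;
    bool zvfh;       /* depends on Zve32f and Zfhmin, not on full Zfh */
};

/* "fli.h" needs Zfa plus Zfh and/or Zvfh; Zfhmin alone is not enough.  */
static bool fli_h_available(const struct riscv_exts *e)
{
    return e->hard_float && e->zfa && (e->zfh || e->zvfh);
}
```

With Zfa and Zvfh enabled but Zfh disabled, the predicate still returns true, which is the case a TARGET_ZFH-only check would miss.
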

* Re: [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension
  2023-08-14  5:32 ` [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension Tsukasa OI
@ 2023-08-25 20:22   ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2023-08-25 20:22 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt, Andrew Waterman, Jim Wilson
  Cc: gcc-patches



On 8/13/23 23:32, Tsukasa OI via Gcc-patches wrote:
> From: Tsukasa OI <research_trasio@irq.a4lg.com>
> 
> This commit adds support for the 'Zfa' extension containing additional
> floating point instructions, version 0.1 (stable and approved).
> 
> gcc/ChangeLog:
> 
> 	* common/config/riscv/riscv-common.cc
> 	(riscv_implied_info): Add implication 'Zfa' -> 'F'.
> 	(riscv_ext_version_table) Add support for the 'Zfa' extension.
> 	(riscv_ext_flag_table) Set MASK_ZFA if 'Zfa' is available.
> 	* config/riscv/riscv-opts.h (MASK_ZFA, TARGET_ZFA): New.
So I think this and Jin Ma's most recently posted Zfa bits are almost 
functionally equivalent.  The only notable difference is that Jin's work 
puts Zfa into its own subextension rather than in the existing zf 
subextension.

I think that was done in the v10 patch from Jin in response to the 
implies/depends comment from Kito.  So I'm inclined to say that's the 
preferred approach.

Given that's the only notable difference between this patch and Jin's 
patch, I'm going to consider 1/2 in this series superseded by Jin's work.

jeff

* Re: [2/2] RISC-V: Constant FP Optimization with 'Zfa'
  2023-08-14 12:51   ` [2/2] " Jin Ma
  2023-08-15  3:38     ` Tsukasa OI
@ 2023-08-25 20:59     ` Jeff Law
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff Law @ 2023-08-25 20:59 UTC (permalink / raw)
  To: Jin Ma, gcc-patches
  Cc: research_trasio, palmer, richard.sandiford, kito.cheng,
	philipp.tomsich, christoph.muellner, rdapp.gcc, juzhe.zhong,
	vineetg, jinma.contrib



On 8/14/23 06:51, Jin Ma wrote:

> 
> This code is great and completely different from the way I implemented it.
> I'm not sure which one is better, but my idea is that the fli instruction
> corresponds to three tables (HF, SF and DF), all of which represent
> specific values.  The library in GCC's real.h can readily convert
> the corresponding values into the values in the table, so it is only
> necessary to perform a simple binary search to look up the tables.
Yeah, I was kind of amazed at how Tsukasa implemented that code.  But I 
think the tables are easier to understand, so I'd tend to prefer them.
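
For readers following along, the table-plus-binary-search idea can be sketched roughly as below for the binary32 case. This is not Jin's code and does not use GCC's real.h: it works on plain `float`, the names `fli_s_sorted` and `fli_s_encoding` are hypothetical, and the immediates for encodings 9 and above are filled in from a reading of the Zfa specification rather than from this thread, so they should be checked against the spec. Encodings 1 through 30 happen to be in ascending order, which is what makes a binary search possible; -1.0 (encoding 0) and NaN (encoding 31) sit outside that order.

```c
#include <float.h>
#include <math.h>

/* fli.s immediates for encodings 1..30, ascending.  Index i holds the
   value for encoding i + 1.  */
static const float fli_s_sorted[30] = {
    FLT_MIN,  0x1p-16f, 0x1p-15f, 0x1p-8f, 0x1p-7f, 0.0625f,
    0.125f,   0.25f,    0.3125f,  0.375f,  0.4375f, 0.5f,
    0.625f,   0.75f,    0.875f,   1.0f,    1.25f,   1.5f,
    1.75f,    2.0f,     2.5f,     3.0f,    4.0f,    8.0f,
    16.0f,    128.0f,   256.0f,   0x1p15f, 0x1p16f, INFINITY
};

/* Return the 5-bit FLI encoding for v, or -1 if v is not representable.
   NaN is rejected here; a real implementation would have to compare
   against the canonical NaN bit pattern to use encoding 31.  */
static int fli_s_encoding(float v)
{
    if (isnan(v))
        return -1;
    if (v == -1.0f)
        return 0;

    int lo = 0, hi = 29;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (fli_s_sorted[mid] == v)
            return mid + 1;
        if (fli_s_sorted[mid] < v)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

A lookup for 0.25f yields encoding 8, matching the table in the cover letter, while a value such as 0.1f falls between entries and is rejected.
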

I'm still evaluating, but in general it looks like your implementation 
is (functionally) a superset of what Tsukasa has done.  I've still got 
some testing to do with Tsukasa's tests to verify, but my inclination is 
to go with your v10 patch right now.

Jeff

Thread overview: 10+ messages
2023-08-14  5:32 [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable Tsukasa OI
2023-08-14  5:32 ` [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension Tsukasa OI
2023-08-25 20:22   ` Jeff Law
2023-08-14  5:32 ` [PATCH 2/2] RISC-V: Constant FP Optimization with 'Zfa' Tsukasa OI
2023-08-14 12:51   ` [2/2] " Jin Ma
2023-08-15  3:38     ` Tsukasa OI
2023-08-15  7:59       ` Tsukasa OI
2023-08-15  9:20         ` Jin Ma
2023-08-25 20:59     ` Jeff Law
2023-08-14  6:19 ` [PATCH 0/2] " Tsukasa OI
