public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable
@ 2023-08-14  5:32 Tsukasa OI
From: Tsukasa OI @ 2023-08-14  5:32 UTC (permalink / raw)
  To: Tsukasa OI, Kito Cheng, Palmer Dabbelt, Andrew Waterman, Jim Wilson
  Cc: gcc-patches

Hello,

I think this might be my first *large* patch set contributed to GCC, and it
is definitely the first one to touch the machine description.

So, please review it carefully.


Background
===========

This patch set adds an optimization for FP constant initialization using
the FLI instructions, which are part of the 'Zfa' extension (additional
floating-point instructions).

FLI instructions ("fli.h" for binary16, "fli.s" for binary32, "fli.d" for
binary64 and "fli.q" for binary128 [which can be ignored because current
GCC for RISC-V does not natively support binary128]) provide a
load-immediate operation for the following 32 immediates.

| Binary Encoding | Immediate (and its binary representation)         |
| --------------- | --------------------------------------------------|
|    `00000` ( 0) | -1.0          (-0b1.00 * 2^(+ 0))                 |
|    `00001` ( 1) | Minimum positive normal value                     |
|                 | sign=[0] exponent=[0..01] significand=[000..000]  |
|    `00010` ( 2) | 1.00*2^(-16)  (+0b1.00 * 2^(-16))                 |
|    `00011` ( 3) | 1.00*2^(-15)  (+0b1.00 * 2^(-15))                 |
|    `00100` ( 4) | 1.00*2^(- 8)  (+0b1.00 * 2^(- 8))                 |
|    `00101` ( 5) | 1.00*2^(- 7)  (+0b1.00 * 2^(- 7))                 |
|    `00110` ( 6) | 1.00*2^(- 4)  (+0b1.00 * 2^(- 4)) = 0.0625        |
|    `00111` ( 7) | 1.00*2^(- 3)  (+0b1.00 * 2^(- 3)) = 0.125         |
|    `01000` ( 8) | 1.00*2^(- 2)  (+0b1.00 * 2^(- 2)) = 0.25          |
|    `01001` ( 9) | 1.25*2^(- 2)  (+0b1.01 * 2^(- 2)) = 0.3125        |
|    `01010` (10) | 1.50*2^(- 2)  (+0b1.10 * 2^(- 2)) = 0.375         |
|    `01011` (11) | 1.75*2^(- 2)  (+0b1.11 * 2^(- 2)) = 0.4375        |
|    `01100` (12) | 1.00*2^(- 1)  (+0b1.00 * 2^(- 1)) = 0.5           |
|    `01101` (13) | 1.25*2^(- 1)  (+0b1.01 * 2^(- 1)) = 0.625         |
|    `01110` (14) | 1.50*2^(- 1)  (+0b1.10 * 2^(- 1)) = 0.75          |
|    `01111` (15) | 1.75*2^(- 1)  (+0b1.11 * 2^(- 1)) = 0.875         |
|    `10000` (16) | 1.00*2^(+ 0)  (+0b1.00 * 2^(+ 0)) = 1.0           |
|    `10001` (17) | 1.25*2^(+ 0)  (+0b1.01 * 2^(+ 0)) = 1.25          |
|    `10010` (18) | 1.50*2^(+ 0)  (+0b1.10 * 2^(+ 0)) = 1.5           |
|    `10011` (19) | 1.75*2^(+ 0)  (+0b1.11 * 2^(+ 0)) = 1.75          |
|    `10100` (20) | 1.00*2^(+ 1)  (+0b1.00 * 2^(+ 1)) = 2.0           |
|    `10101` (21) | 1.25*2^(+ 1)  (+0b1.01 * 2^(+ 1)) = 2.5           |
|    `10110` (22) | 1.50*2^(+ 1)  (+0b1.10 * 2^(+ 1)) = 3.0           |
|    `10111` (23) | 1.00*2^(+ 2)  (+0b1.00 * 2^(+ 2)) = 4             |
|    `11000` (24) | 1.00*2^(+ 3)  (+0b1.00 * 2^(+ 3)) = 8             |
|    `11001` (25) | 1.00*2^(+ 4)  (+0b1.00 * 2^(+ 4)) = 16            |
|    `11010` (26) | 1.00*2^(+ 7)  (+0b1.00 * 2^(+ 7)) = 128           |
|    `11011` (27) | 1.00*2^(+ 8)  (+0b1.00 * 2^(+ 8)) = 256           |
|    `11100` (28) | 1.00*2^(+15)  (+0b1.00 * 2^(+15)) = 32768         |
|    `11101` (29) | 1.00*2^(+16)  (+0b1.00 * 2^(+16)) = 65536         |
|                 | On "fli.h", this is equivalent to positive inf.   |
|    `11110` (30) | Positive infinity                                 |
|                 | sign=[0] exponent=[1..11] significand=[000..000]  |
|    `11111` (31) | Canonical NaN (positive, quiet and zero payload)  |
|                 | sign=[0] exponent=[1..11] significand=[100..000]  |
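To make the table concrete, here is a minimal C sketch of the value "fli.d" (binary64) loads for each 5-bit encoding. The names fli_d_table and fli_d_value are hypothetical and not taken from the patch; entry 1 is assumed to be binary64's DBL_MIN, and entry 31 uses the C NAN macro, whose exact bit pattern is not checked here.

```c
#include <float.h>
#include <math.h>

/* Hypothetical decode table for "fli.d" (binary64), indexed by the 5-bit
   encoding.  Hex float literals keep the powers of two exact.  */
static const double fli_d_table[32] = {
    -1.0, DBL_MIN, 0x1p-16, 0x1p-15, 0x1p-8, 0x1p-7, 0x1p-4,   0x1p-3, /*  0.. 7 */
    0.25, 0.3125,  0.375,   0.4375,  0.5,    0.625,  0.75,     0.875,  /*  8..15 */
    1.0,  1.25,    1.5,     1.75,    2.0,    2.5,    3.0,      4.0,    /* 16..23 */
    8.0,  16.0,    128.0,   256.0,   0x1p15, 0x1p16, INFINITY, NAN,    /* 24..31 */
};

/* Returns the value "fli.d" loads for encoding idx (only the low 5 bits
   of idx are used).  */
double fli_d_value(unsigned idx)
{
    return fli_d_table[idx & 31u];
}
```

Each finite entry can be cross-checked against the 0b1.xx * 2^r column; for example, entry 9 is 1.25 * 2^-2 = 0.3125.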

Currently, initializing an FP constant (other than zero) involves a load
from memory, and FLI instructions can reduce such memory use.

There may be room to generate more complex constants with multiple FLI
instructions (as is done for long integer constants), but as a starter, we
can begin by optimizing FP constants that need only one FLI instruction
(also, because FP arithmetic often has longer latency, the benefit of a
multi-instruction FLI sequence is smaller than it is for integers).


FLI FP constant checking
=========================

An instruction with a role similar to RISC-V's FLI instructions is Arm's
vmov.f32 instruction (fmov on AArch64).  It provides a load-immediate
operation for constants that can be represented in the following form:

> (-1)^s * 0b1.xxxx * 2^r   (where -3 <= r <= +4; fits in 3 bits)
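For comparison, that Arm form can be tested with a few bit operations on the binary64 representation. This is a minimal sketch under a hypothetical name (vfp_imm_representable); it is not GCC's aarch64_float_const_representable_p, only an illustration of the same condition.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: is v == (-1)^s * 0b1.xxxx * 2^r with -3 <= r <= 4?  */
bool vfp_imm_representable(double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);

    int biased_exp = (int)((bits >> 52) & 0x7FF);
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);

    /* Zero, subnormals, infinities and NaNs are never representable.  */
    if (biased_exp == 0 || biased_exp == 0x7FF)
        return false;

    int r = biased_exp - 1023;        /* unbiased exponent */
    if (r < -3 || r > 4)
        return false;

    /* Only the top four fraction bits (xxxx) may be non-zero; the sign is free.  */
    return (frac & ((UINT64_C(1) << 48) - 1)) == 0;
}
```

For example, 31.0 (0b1.1111 * 2^4) passes while 32.0 (r = 5) does not.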

This patch is largely influenced by AArch64's handling but, compared to
that, handling RISC-V's FLI FP constants is a little trickier.

*   FLI only generates values with sign bit 0, except for binary encoding 0
    (which loads -1.0, with sign bit 1).
*   In addition to finite values, FLI can generate positive infinity and
    the canonical NaN.
*   Because FLI can generate the canonical NaN, handling NaNs is
    worthwhile, but FLI can only generate the canonical NaN.  Since a
    non-canonical NaN is easily created with __builtin_nan ("[PAYLOAD]")
    and may be returned directly from a function, we must reject
    non-canonical NaNs (otherwise we would generate "fli.d fa0,nan" for a
    non-canonical NaN).
*   The exponent range and mantissa constraints are a bit tricky.
    Binary encodings 8-22 look like 0b1.xx * 2^r (where -2 <= r <= 1), but
    we have to explicitly reject 0b1.11 * 2^1 (that is, 3.5) because the
    value 3.5 is not in the list.
    The other 1.00 * 2^r values have discontinuous r.
*   Binary encoding 1 (the minimum positive normal value of the
    corresponding type) depends on the type (mode) being loaded.
*   The assembler accepts three string operands: "min", "inf" and "nan".
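The rules above can be sketched for binary64 ("fli.d") as one C function that maps a value to its 5-bit encoding, or -1 when no single FLI instruction can load it. The names fli_d_encoding and double_from_bits are hypothetical; this illustrates the rules and is not the patch's riscv_get_float_fli_const. "min" is hard-coded to binary64's minimum normal value, so binary16/binary32 would differ there.

```c
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double v)
{
    uint64_t b;
    memcpy(&b, &v, sizeof b);
    return b;
}

/* Hypothetical helper to build doubles (e.g. NaNs with payloads) from bits.  */
double double_from_bits(uint64_t b)
{
    double v;
    memcpy(&v, &b, sizeof v);
    return v;
}

/* Hypothetical: 5-bit "fli.d" encoding of v, or -1 if FLI cannot load it.  */
int fli_d_encoding(double v)
{
    uint64_t b = bits_of(v);
    unsigned sign = (unsigned)(b >> 63);
    int bexp = (int)((b >> 52) & 0x7FF);
    uint64_t frac = b & ((UINT64_C(1) << 52) - 1);

    if (b == UINT64_C(0xBFF0000000000000))   /* -1.0: the only negative entry */
        return 0;
    if (sign)
        return -1;
    if (bexp == 0x7FF) {
        if (frac == 0)
            return 30;                        /* +inf */
        /* Only the canonical NaN (quiet bit only, zero payload) qualifies.  */
        return frac == (UINT64_C(1) << 51) ? 31 : -1;
    }
    if (bexp == 1 && frac == 0)
        return 1;                             /* "min" (binary64 DBL_MIN) */
    if (bexp == 0)
        return -1;                            /* +0.0 and subnormals */

    /* Finite values must be 0b1.xx * 2^r: only the top two fraction bits.  */
    if (frac & ((UINT64_C(1) << 50) - 1))
        return -1;
    int xx = (int)(frac >> 50);
    int r = bexp - 1023;

    if (r >= -2 && r <= 1) {                  /* dense block, encodings 8-22 */
        if (r == 1 && xx == 3)
            return -1;                        /* 3.5 is not in the table */
        return 8 + 4 * (r + 2) + xx;
    }
    if (xx != 0)
        return -1;

    /* Remaining entries are powers of two with discontinuous exponents.  */
    static const int pow2_r[13]   = { -16, -15, -8, -7, -4, -3, 2, 3, 4, 7, 8, 15, 16 };
    static const int pow2_idx[13] = {   2,   3,  4,  5,  6,  7, 23, 24, 25, 26, 27, 28, 29 };
    for (int i = 0; i < 13; i++)
        if (pow2_r[i] == r)
            return pow2_idx[i];
    return -1;
}
```

For example, 3.0 maps to 22 while 3.5 maps to -1, and a NaN built from 0x7FF8000000000001 (non-zero payload) is rejected.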

Handling all of these in the style of aarch64_float_const_representable_p
can be inefficient.  So, I implemented a riscv_get_float_fli_const function
which returns detailed information about an FLI constant (including whether
the value is valid as an FLI constant at all).

This information contains:

1.  Validity
2.  Sign bit (only set for -1.0)
3.  FLI constant type ("min", "inf", "nan" or a finite number other than
    "min")
4.  The two most significant mantissa bits after the binary point (xx in
    0b1.xx) for a finite value other than "min".
5.  Biased exponent (a sparse representation, which makes handling easier)
    for a finite value other than "min".  For 0b1.xx * 2^r, (r+16) is
    stored.  Its valid range is [0, 32] (inclusive), so it requires 6 bits.

On many ABIs, this information is packed into an integer-sized bitfield.
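A hypothetical layout for that information, with the five fields in the order listed (1 + 1 + 2 + 2 + 6 = 12 bits), might look like the following. struct fli_info, enum fli_kind and fli_info_to_double are illustrative names only, and the decoder assumes binary64, where "min" is DBL_MIN.

```c
#include <float.h>
#include <math.h>

/* Hypothetical values for field 3 (FLI constant type).  */
enum fli_kind { FLI_FINITE = 0, FLI_MIN = 1, FLI_INF = 2, FLI_NAN = 3 };

/* Hypothetical packed form of the five fields (12 bits in total).  */
struct fli_info {
    unsigned int valid      : 1;  /* 1. representable by a single FLI */
    unsigned int sign       : 1;  /* 2. set only for -1.0 */
    unsigned int kind       : 2;  /* 3. enum fli_kind */
    unsigned int mantissa   : 2;  /* 4. xx in 0b1.xx (FLI_FINITE only) */
    unsigned int biased_exp : 6;  /* 5. r + 16 in [0, 32] (FLI_FINITE only) */
};

/* Exact 2^e for small |e|, without needing libm.  */
static double pow2i(int e)
{
    double p = 1.0;
    for (; e > 0; e--)
        p *= 2.0;
    for (; e < 0; e++)
        p *= 0.5;
    return p;
}

/* Decodes the packed information back into a binary64 value.  */
double fli_info_to_double(struct fli_info info)
{
    if (!info.valid)
        return NAN;                   /* placeholder: no FLI value exists */
    switch (info.kind) {
    case FLI_MIN:
        return DBL_MIN;
    case FLI_INF:
        return INFINITY;
    case FLI_NAN:
        return NAN;
    default: {
        /* (-1)^sign * 0b1.xx * 2^(biased_exp - 16) */
        double v = (1.0 + info.mantissa / 4.0) * pow2i((int)info.biased_exp - 16);
        return info.sign ? -v : v;
    }
    }
}
```

With unsigned int bitfields, GCC on common ABIs such as x86-64 and RISC-V packs this struct into a single unsigned int.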


New Constraint: "H"
====================

According to the GCC Internals documentation, "G" and "H" may be defined
in a machine-dependent fashion to permit immediate floating operands in
particular ranges of values.  Because "G" is already used to represent
+0.0, this patch set uses "H" for FLI-capable FP constants.

It adds one alternative to each of the following patterns:

*   movhf_hardfloat
*   movsf_hardfloat
*   movdf_hardfloat_rv32
*   movdf_hardfloat_rv64

Note that the 'Zfa' extension requires the 'F' extension (hardware
floating point).



Portions that I'm not sure are okay
====================================

*   NaN handling (comparison with canonical NaN)
    Due to constraints, I had to compare the binary representation of a
    NaN with the known canonical NaN representations of IEEE 754
    binary16/32/64.  Is there any better way to perform this?
*   Any ICE possibility?
    For simple programs, I confirmed that no ICE occurs, but I'm not sure
    whether this holds for other programs.  If I missed some cases in the
    riscv_output_move or riscv_print_operand functions (which correspond to
    the mov patterns in riscv.md), it could easily cause an ICE.
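For reference, the canonical NaN patterns being compared against follow one formula in every IEEE 754 binary format: clear sign bit, all-ones exponent, and only the most significant fraction bit (the quiet bit) set. A small sketch with hypothetical names, limited to formats of at most 64 bits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: canonical NaN bit pattern (positive, quiet, zero payload)
   for a format with exp_bits exponent bits and frac_bits fraction bits.  */
uint64_t canonical_nan_bits(unsigned exp_bits, unsigned frac_bits)
{
    uint64_t exp_all_ones = (UINT64_C(1) << exp_bits) - 1;
    return (exp_all_ones << frac_bits) | (UINT64_C(1) << (frac_bits - 1));
}

/* bits holds the raw encoding, zero-extended to 64 bits.  */
bool is_canonical_nan(uint64_t bits, unsigned exp_bits, unsigned frac_bits)
{
    return bits == canonical_nan_bits(exp_bits, frac_bits);
}
```

The parameters (5, 10), (8, 23) and (11, 52) give 0x7E00, 0x7FC00000 and 0x7FF8000000000000 for binary16, binary32 and binary64 respectively.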


Sincerely,
Tsukasa




Tsukasa OI (2):
  RISC-V: Add support for the 'Zfa' extension
  RISC-V: Constant FP Optimization with 'Zfa'

 gcc/common/config/riscv/riscv-common.cc    |   3 +
 gcc/config/riscv/constraints.md            |   7 +
 gcc/config/riscv/riscv-opts.h              |   2 +
 gcc/config/riscv/riscv-protos.h            |  34 +++
 gcc/config/riscv/riscv.cc                  | 250 ++++++++++++++++++++-
 gcc/config/riscv/riscv.md                  |  24 +-
 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c |  24 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c | 111 +++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c |  98 ++++++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c |  61 +++++
 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c |  30 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c |  39 ++++
 14 files changed, 697 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-8.c


base-commit: 614052dd4ea083e086712809c754ffebd9361316
-- 
2.41.0




Thread overview: 10+ messages
2023-08-14  5:32 [PATCH 0/2] RISC-V: Make "prefetch.i" built-in usable Tsukasa OI
2023-08-14  5:32 ` [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension Tsukasa OI
2023-08-25 20:22   ` Jeff Law
2023-08-14  5:32 ` [PATCH 2/2] RISC-V: Constant FP Optimization with 'Zfa' Tsukasa OI
2023-08-14 12:51   ` [2/2] " Jin Ma
2023-08-15  3:38     ` Tsukasa OI
2023-08-15  7:59       ` Tsukasa OI
2023-08-15  9:20         ` Jin Ma
2023-08-25 20:59     ` Jeff Law
2023-08-14  6:19 ` [PATCH 0/2] " Tsukasa OI
