From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from out30-112.freemail.mail.aliyun.com (out30-112.freemail.mail.aliyun.com [115.124.30.112]) by sourceware.org (Postfix) with ESMTPS id BD64D3858D28 for ; Sat, 6 May 2023 07:54:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BD64D3858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.alibaba.com X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R191e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045168;MF=jinma@linux.alibaba.com;NM=1;PH=DS;RN=6;SR=0;TI=SMTPD_---0Vhsc9aS_1683359689; Received: from localhost.localdomain(mailfrom:jinma@linux.alibaba.com fp:SMTPD_---0Vhsc9aS_1683359689) by smtp.aliyun-inc.com; Sat, 06 May 2023 15:54:51 +0800 From: Jin Ma To: gcc-patches@gcc.gnu.org Cc: jeffreyalaw@gmail.com, kito.cheng@sifive.com, kito.cheng@gmail.com, palmer@dabbelt.com, ijinma@yeah.net Subject: Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2. Date: Sat, 6 May 2023 15:54:40 +0800 Message-Id: <20230506075440.1078-1-jinma@linux.alibaba.com> X-Mailer: git-send-email 2.38.1.windows.1 In-Reply-To: <20230419095751.815-1-jinma@linux.alibaba.com> References: <20230419095751.815-1-jinma@linux.alibaba.com> MIME-Version: 1.0 Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-20.3 required=5.0 tests=BAYES_00,ENV_AND_HDR_SPF_MATCH,GIT_PATCH_0,KAM_DMARC_STATUS,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY,URIBL_BLACK,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > On 4/19/23 03:57, Jin Ma wrote: > > This patch adds the 'Zfa' extension for riscv, which is based on: > > https://github.com/riscv/riscv-isa-manual/commits/zfb > > https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da > > > > The binutils-gdb for 'Zfa' extension: > > https://github.com/a4lg/binutils-gdb/commits/riscv-zfa > > > > What needs special explanation is: > > 1, The immediate number of the instructions FLI.H/S/D is represented in the assembly as a > > floating-point value, with scientific counting when rs1 is 1,2, and decimal numbers for > > the rest. > > > > Related llvm link: > > https://reviews.llvm.org/D145645 > > Related discussion link: > > https://github.com/riscv/riscv-isa-manual/issues/980 > Right. I think the goal right now is to get the bulk of this reviewed > now. Ideally we'll get to the point where the only outstanding issue is > the interface between the assembler & gcc. I will send a new version referring to the latest binutils(v5) in the near future: https://sourceware.org/pipermail/binutils/2023-April/127060.html > > > > > 2, According to riscv-spec, "The FCVTMO D.W.D instruction was added principally to > > accelerate the processing of JavaScript Numbers.", so it seems that no implementation > > is required. > Fair enough. There's seems to be a general desire to wire up builtins > for many things that aren't directly usable by the compiler. So > consider such a change as a follow-up. I don't think something like > this should hold up the blk of Zfa. > > > > > 3, The instructions FMINM and FMAXM correspond to C23 library function fminimum and fmaximum. > > Therefore, this patch has simply implemented the pattern of fminm3 and > > fmaxm3 to prepare for later. > Sounds good. > > > > > > gcc/ChangeLog: > > > > * common/config/riscv/riscv-common.cc: Add zfa extension version. > > * config/riscv/constraints.md (Zf): Constrain the floating point number that the > > instructions FLI.H/S/D can load. > > ((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): enable FMVP.D.X and FMVH.X.D. > > * config/riscv/iterators.md (ceil): New. > > * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New. > > * config/riscv/riscv.cc (find_index_in_array): New. > > (riscv_float_const_rtx_index_for_fli): Get the index of the floating-point number that > > the instructions FLI.H/S/D can mov. > > (riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is not applicable. > > (riscv_const_insns): The cost of FLI.H/S/D is 3. > > (riscv_legitimize_const_move): Likewise. > > (riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is required. > > (riscv_output_move): Output the mov instructions in zfa extension. > > (riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate in assembly > > (riscv_secondary_memory_needed): Likewise. > > * config/riscv/riscv.h (GP_REG_RTX_P): New. > > * config/riscv/riscv.md (fminm3): New. > > > > > index c448e6b37e9..62d9094f966 100644 > > --- a/gcc/config/riscv/constraints.md > > +++ b/gcc/config/riscv/constraints.md > > @@ -118,6 +118,13 @@ (define_constraint "T" > > (and (match_operand 0 "move_operand") > > (match_test "CONSTANT_P (op)"))) > > > > +;; Zfa constraints. > > + > > +(define_constraint "Zf" > > + "A floating point number that can be loaded using instruction `fli` in zfa." > > + (and (match_code "const_double") > > + (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)"))) > > + > > ;; Vector constraints. > > > > (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS" > > @@ -183,8 +190,8 @@ (define_memory_constraint "Wdm" > > > > ;; Vendor ISA extension constraints. > > > > -(define_register_constraint "th_f_fmv" "TARGET_XTHEADFMV ? FP_REGS : NO_REGS" > > +(define_register_constraint "th_f_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS" > > "A floating-point register for XTheadFmv.") > > > > -(define_register_constraint "th_r_fmv" "TARGET_XTHEADFMV ? GR_REGS : NO_REGS" > > +(define_register_constraint "th_r_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? GR_REGS : NO_REGS" > > "An integer register for XTheadFmv.") > I think Christoph had good suggestions on the constraints. So let's go > with his suggestions. > > You might consider a follow-up patch where you use negation of one of > the predefined constants for synthesis. I would not be surprised at all > if that's as efficient on some cores as loading the negated constants > out of the constant pool. But I don't think it has to be a part of this > patch. > I also think the Christoph is right, and I will revise it according to his suggestion. > > > > > diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md > > index 9b767038452..c81b08e3cc5 100644 > > --- a/gcc/config/riscv/iterators.md > > +++ b/gcc/config/riscv/iterators.md > > @@ -288,3 +288,8 @@ (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET]) > > (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")]) > > (define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")]) > > > > +(define_int_iterator ROUND [UNSPEC_ROUND UNSPEC_FLOOR UNSPEC_CEIL UNSPEC_BTRUNC UNSPEC_ROUNDEVEN UNSPEC_NEARBYINT]) > > +(define_int_attr round_pattern [(UNSPEC_ROUND "round") (UNSPEC_FLOOR "floor") (UNSPEC_CEIL "ceil") > > + (UNSPEC_BTRUNC "btrunc") (UNSPEC_ROUNDEVEN "roundeven") (UNSPEC_NEARBYINT "nearbyint")]) > > +(define_int_attr round_rm [(UNSPEC_ROUND "rmm") (UNSPEC_FLOOR "rdn") (UNSPEC_CEIL "rup") > > + (UNSPEC_BTRUNC "rtz") (UNSPEC_ROUNDEVEN "rne") (UNSPEC_NEARBYINT "dyn")]) > Do we really need to use unspecs for all these cases? I would expect > some correspond to the trunc, round, ceil, nearbyint, etc well known RTX > codes. > > In general, we should try to avoid unspecs when there is a clear > semantic match between the instruction and GCC's RTX opcodes. So please > review the existing RTX code semantics to see if any match the new > instructions. If there are matches, use those RTX codes rather than > UNSPECs. I'll try, thanks. > > > > > > > +/* Immediate values loaded by the FLI.S instruction in Chapter 25 of the latest RISC-V ISA > > + Manual draft. For details, please see: > > + https://github.com/riscv/riscv-isa-manual/releases/tag/draft-20221217-cb3b9d1 */ > > + > > +unsigned HOST_WIDE_INT fli_value_hf[32] = > > +{ > > + 0xbc00, 0x400, 0x100, 0x200, 0x1c00, 0x2000, 0x2c00, 0x3000, > > + 0x3400, 0x3500, 0x3600, 0x3700, 0x3800, 0x3900, 0x3a00, 0x3b00, > > + 0x3c00, 0x3d00, 0x3e00, 0x3f00, 0x4000, 0x4100, 0x4200, 0x4400, > > + 0x4800, 0x4c00, 0x5800, 0x5c00, 0x7800, > > + /* Only used for filling, ensuring that 29 and 30 of HF are the same. */ > > + 0x7800, > > + 0x7c00, 0x7e00, > > +}; > > + > > +unsigned HOST_WIDE_INT fli_value_sf[32] = > > +{ > > + 0xbf800000, 0x00800000, 0x37800000, 0x38000000, 0x3b800000, 0x3c000000, 0x3d800000, 0x3e000000, > > + 0x3e800000, 0x3ea00000, 0x3ec00000, 0x3ee00000, 0x3f000000, 0x3f200000, 0x3f400000, 0x3f600000, > > + 0x3f800000, 0x3fa00000, 0x3fc00000, 0x3fe00000, 0x40000000, 0x40200000, 0x40400000, 0x40800000, > > + 0x41000000, 0x41800000, 0x43000000, 0x43800000, 0x47000000, 0x47800000, 0x7f800000, 0x7fc00000 > > +}; > > + > > +unsigned HOST_WIDE_INT fli_value_df[32] = > > +{ > > + 0xbff0000000000000, 0x10000000000000, 0x3ef0000000000000, 0x3f00000000000000, > > + 0x3f70000000000000, 0x3f80000000000000, 0x3fb0000000000000, 0x3fc0000000000000, > > + 0x3fd0000000000000, 0x3fd4000000000000, 0x3fd8000000000000, 0x3fdc000000000000, > > + 0x3fe0000000000000, 0x3fe4000000000000, 0x3fe8000000000000, 0x3fec000000000000, > > + 0x3ff0000000000000, 0x3ff4000000000000, 0x3ff8000000000000, 0x3ffc000000000000, > > + 0x4000000000000000, 0x4004000000000000, 0x4008000000000000, 0x4010000000000000, > > + 0x4020000000000000, 0x4030000000000000, 0x4060000000000000, 0x4070000000000000, > > + 0x40e0000000000000, 0x40f0000000000000, 0x7ff0000000000000, 0x7ff8000000000000, > > +}; > Going to assume these are sane. I think the only concern would be > endianness, but it looks like you handle that reasonably. I did a simple treatment of endianness in the function riscv_float_const_rtx_index_for_fli(), which seems to be correct at present. In addition, in the next version, I used the newer floating point literal formats instead according to your suggestion. For example: unsigned HOST_WIDE_INT fli_value_sf[32] = { 0xbf8p20, 0x008p20, 0x378p20, 0x380p20, 0x3b8p20, 0x3c0p20, 0x3d8p20, 0x3e0p20, 0x3e8p20, 0x3eap20, 0x3ecp20, 0x3eep20, 0x3f0p20, 0x3f2p20, 0x3f4p20, 0x3f6p20, 0x3f8p20, 0x3fap20, 0x3fcp20, 0x3fep20, 0x400p20, 0x402p20, 0x404p20, 0x408p20, 0x410p20, 0x418p20, 0x430p20, 0x438p20, 0x470p20, 0x478p20, 0x7f8p20, 0x7fcp20 }; Is that so? I don't know if I understand correctly. > > + > > +/* Find the index of TARGET in ARRAY, and return -1 if not found. */ > > + > > +static int > > +find_index_in_array (unsigned HOST_WIDE_INT target, unsigned HOST_WIDE_INT *array, int len) > > +{ > > + if (array == NULL) > > + return -1; > > + > > + for (int i = 0; i < len; i++) > > + { > > + if (target == array[i]) > > + return i; > > + } > > + return -1; > > +} > Given the way constraint and operand matching occurrs, I wouldn't be > surprised if this search turns out to be compile-time expensive. Yes, I tried to find a better way, but the compiler seems to have to retrieve the 32 values of the fli instruction, which may need to be optimized, such as the binary search algorithm. > > > > > + > > +/* Return index of the FLI instruction table if rtx X is an immediate constant that > > + can be moved using a single FLI instruction in zfa extension. -1 otherwise. */ > > + > > +int > > +riscv_float_const_rtx_index_for_fli (rtx x) > > +{ > > + machine_mode mode = GET_MODE (x); > > + > > + if (!TARGET_ZFA || mode == VOIDmode > > + || !CONST_DOUBLE_P(x) > > + || (mode == HFmode && !TARGET_ZFH) > > + || (mode == SFmode && !TARGET_HARD_FLOAT) > > + || (mode == DFmode && !TARGET_DOUBLE_FLOAT)) > > + return -1; > Bring the "|| mode == VOIDmode" down to its own line similar to how > you've done with the !CONST_DOUBLE_P check. Fix in the next version. > > > > > static bool > > @@ -826,6 +936,9 @@ riscv_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) > > if (GET_CODE (x) == HIGH) > > return true; > > > > + if (riscv_float_const_rtx_index_for_fli (x) != -1) > > + return true; > > + > So if you do a follow-up handling negative fli constants, obviously we'd > need further changes in this code. This is a query index function, I think it will only return 0 or positive integer, there should be no negative index. > > @@ -1213,6 +1326,11 @@ riscv_const_insns (rtx x) > > } > > > > case CONST_DOUBLE: > > + /* See if we can use FMV directly. */ > > + if (riscv_float_const_rtx_index_for_fli (x) != -1) > > + return 3; > That seems fairly high cost-wise. Where did this value come from? Or > is it relative to COSTS_N_INSNS? Referring to the relevant patch aarch64, in this case the COSTS_N_INSNS (3) is returned, so I simply define it here as 3, or should I change it to COSTS_N_INSNS (3)? https://github.com/gcc-mirror/gcc/commit/a217096563e356fa03cc5163665148227613c62f#diff-2ea6a52c675e9f1862287091ef606b129d9e311224999af1cc017317c62c1efeR6942 > > > if (TARGET_DOUBLE_FLOAT > > && ((FP_REG_RTX_P (src) && FP_REG_RTX_P (dest)) > > || (FP_REG_RTX_P (dest) && MEM_P (src)) > > || (FP_REG_RTX_P (src) && MEM_P (dest)) > > + || (TARGET_ZFA > > + && ((FP_REG_RTX_P (dest) && GP_REG_RTX_P (src)) > > + || (FP_REG_RTX_P (src) && GP_REG_RTX_P (dest)))) > The formatting of the second FP_REG_RTX_P check looks goofy, but that > may be a mailer issue. Double check the "|| FP_REG" should line up > under the FP_REG_RTX_P. It will be fixed in the next version. > > > > @@ -2968,6 +3103,14 @@ riscv_output_move (rtx dest, rtx src) > > case 8: > > return "fld\t%0,%1"; > > } > > + > > + if (src_code == CONST_DOUBLE && (riscv_float_const_rtx_index_for_fli (src) != -1)) > > + switch (width) > > + { > > + case 2: return "fli.h\t%0,%1"; > > + case 4: return "fli.s\t%0,%1"; > > + case 8: return "fli.d\t%0,%1"; > > + } > We generally discourage having code on the same line as a case > statement, so bring those return statements down to a new line. > It will be fixed in the next version. > > > > > > @@ -1580,6 +1609,26 @@ (define_insn "l2" > > [(set_attr "type" "fcvt") > > (set_attr "mode" "")]) > > > > +(define_insn "2" > > + [(set (match_operand:ANYF 0 "register_operand" "=f") > > + (unspec:ANYF > > + [(match_operand:ANYF 1 "register_operand" " f")] > > + ROUND))] > > + "TARGET_HARD_FLOAT && TARGET_ZFA" > > + "fround.\t%0,%1," > > + [(set_attr "type" "fcvt") > > + (set_attr "mode" "")]) > > + > > +(define_insn "rint2" > > + [(set (match_operand:ANYF 0 "register_operand" "=f") > > + (unspec:ANYF > > + [(match_operand:ANYF 1 "register_operand" " f")] > > + UNSPEC_RINT))] > > + "TARGET_HARD_FLOAT && TARGET_ZFA" > > + "froundnx.\t%0,%1" > > + [(set_attr "type" "fcvt") > > + (set_attr "mode" "")]) > Please review the existing RTX codes and their semantics in the > internals manual and if any of the new instructions match those existing > primitives, implement them using those RTX codes rather than with an UNSPEC. > I'll try, thanks. > > Overall it looks pretty good. > > jeff Thank you for your guidance. Jin Ma