From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=eW2k=A3=linux.alibaba.com=jinma@sourceware.org>
Received: from out30-112.freemail.mail.aliyun.com (out30-112.freemail.mail.aliyun.com [115.124.30.112])
	by sourceware.org (Postfix) with ESMTPS id BD64D3858D28
	for <gcc-patches@gcc.gnu.org>; Sat,  6 May 2023 07:54:55 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BD64D3858D28
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.alibaba.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.alibaba.com
X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R191e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045168;MF=jinma@linux.alibaba.com;NM=1;PH=DS;RN=6;SR=0;TI=SMTPD_---0Vhsc9aS_1683359689;
Received: from localhost.localdomain(mailfrom:jinma@linux.alibaba.com fp:SMTPD_---0Vhsc9aS_1683359689)
          by smtp.aliyun-inc.com;
          Sat, 06 May 2023 15:54:51 +0800
From: Jin Ma <jinma@linux.alibaba.com>
To: gcc-patches@gcc.gnu.org
Cc: jeffreyalaw@gmail.com,
	kito.cheng@sifive.com,
	kito.cheng@gmail.com,
	palmer@dabbelt.com,
	ijinma@yeah.net
Subject: Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.
Date: Sat,  6 May 2023 15:54:40 +0800
Message-Id: <20230506075440.1078-1-jinma@linux.alibaba.com>
X-Mailer: git-send-email 2.38.1.windows.1
In-Reply-To: <20230419095751.815-1-jinma@linux.alibaba.com>
References: <20230419095751.815-1-jinma@linux.alibaba.com>
MIME-Version: 1.0
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-20.3 required=5.0 tests=BAYES_00,ENV_AND_HDR_SPF_MATCH,GIT_PATCH_0,KAM_DMARC_STATUS,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY,URIBL_BLACK,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


> On 4/19/23 03:57, Jin Ma wrote:
> > This patch adds the 'Zfa' extension for riscv, which is based on:
> >    https://github.com/riscv/riscv-isa-manual/commits/zfb
> >    https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
> > 
> > The binutils-gdb for 'Zfa' extension:
> >    https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
> > 
> > What needs special explanation is:
> > 1, The immediate number of the instructions FLI.H/S/D is represented in the assembly as a
> >    floating-point value, with scientific counting when rs1 is 1,2, and decimal numbers for
> >    the rest.
> > 
> >    Related llvm link:
> >      https://reviews.llvm.org/D145645
> >    Related discussion link:
> >      https://github.com/riscv/riscv-isa-manual/issues/980
> Right.  I think the goal right now is to get the bulk of this reviewed 
> now.  Ideally we'll get to the point where the only outstanding issue is 
> the interface between the assembler & gcc.

I will send a new version referring to the latest binutils(v5) in the near future:
https://sourceware.org/pipermail/binutils/2023-April/127060.html

> 
> > 
> > 2, According to riscv-spec, "The FCVTMO D.W.D instruction was added principally to
> >    accelerate the processing of JavaScript Numbers.", so it seems that no implementation
> >    is required.
> Fair enough.  There's seems to be a general desire to wire up builtins 
> for many things that aren't directly usable by the compiler.  So 
> consider such a change as a follow-up.   I don't think something like 
> this should hold up the blk of Zfa.
> 
> > 
> > 3, The instructions FMINM and FMAXM correspond to C23 library function fminimum and fmaximum.
> >    Therefore, this patch has simply implemented the pattern of fminm<hf\sf\df>3 and
> >    fmaxm<hf\sf\df>3 to prepare for later.
> Sounds good.
> 
> 
> > 
> > gcc/ChangeLog:
> > 
> > 	* common/config/riscv/riscv-common.cc: Add zfa extension version.
> > 	* config/riscv/constraints.md (Zf): Constrain the floating point number that the
> > 	instructions FLI.H/S/D can load.
> > 	((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): enable FMVP.D.X and FMVH.X.D.
> > 	* config/riscv/iterators.md (ceil): New.
> > 	* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
> > 	* config/riscv/riscv.cc (find_index_in_array): New.
> > 	(riscv_float_const_rtx_index_for_fli): Get the index of the floating-point number that
> > 	the instructions FLI.H/S/D can mov.
> > 	(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is not applicable.
> > 	(riscv_const_insns): The cost of FLI.H/S/D is 3.
> > 	(riscv_legitimize_const_move): Likewise.
> > 	(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is required.
> > 	(riscv_output_move): Output the mov instructions in zfa extension.
> > 	(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate in assembly
> > 	(riscv_secondary_memory_needed): Likewise.
> > 	* config/riscv/riscv.h (GP_REG_RTX_P): New.
> > 	* config/riscv/riscv.md (fminm<mode>3): New.
> > 
> 
> > index c448e6b37e9..62d9094f966 100644
> > --- a/gcc/config/riscv/constraints.md
> > +++ b/gcc/config/riscv/constraints.md
> > @@ -118,6 +118,13 @@ (define_constraint "T"
> >     (and (match_operand 0 "move_operand")
> >          (match_test "CONSTANT_P (op)")))
> >   
> > +;; Zfa constraints.
> > +
> > +(define_constraint "Zf"
> > +  "A floating point number that can be loaded using instruction `fli` in zfa."
> > +  (and (match_code "const_double")
> > +       (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)")))
> > +
> >   ;; Vector constraints.
> >   
> >   (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"
> > @@ -183,8 +190,8 @@ (define_memory_constraint "Wdm"
> >   
> >   ;; Vendor ISA extension constraints.
> >   
> > -(define_register_constraint "th_f_fmv" "TARGET_XTHEADFMV ? FP_REGS : NO_REGS"
> > +(define_register_constraint "th_f_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS"
> >     "A floating-point register for XTheadFmv.")
> >   
> > -(define_register_constraint "th_r_fmv" "TARGET_XTHEADFMV ? GR_REGS : NO_REGS"
> > +(define_register_constraint "th_r_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? GR_REGS : NO_REGS"
> >     "An integer register for XTheadFmv.")
> I think Christoph had good suggestions on the constraints.  So let's go 
> with his suggestions.
> 
> You might consider a follow-up patch where you use negation of one of 
> the predefined constants for synthesis.  I would not be surprised at all 
> if that's as efficient on some cores as loading the negated constants 
> out of the constant pool.  But I don't think it has to be a part of this 
> patch.
> 

I also think the Christoph is right, and I will revise it according to his suggestion.

> 
> 
> 
> > diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> > index 9b767038452..c81b08e3cc5 100644
> > --- a/gcc/config/riscv/iterators.md
> > +++ b/gcc/config/riscv/iterators.md
> > @@ -288,3 +288,8 @@ (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
> >   (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
> >   (define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
> >   
> > +(define_int_iterator ROUND [UNSPEC_ROUND UNSPEC_FLOOR UNSPEC_CEIL UNSPEC_BTRUNC UNSPEC_ROUNDEVEN UNSPEC_NEARBYINT])
> > +(define_int_attr round_pattern [(UNSPEC_ROUND "round") (UNSPEC_FLOOR "floor") (UNSPEC_CEIL "ceil")
> > +				(UNSPEC_BTRUNC "btrunc") (UNSPEC_ROUNDEVEN "roundeven") (UNSPEC_NEARBYINT "nearbyint")])
> > +(define_int_attr round_rm [(UNSPEC_ROUND "rmm") (UNSPEC_FLOOR "rdn") (UNSPEC_CEIL "rup")
> > +			   (UNSPEC_BTRUNC "rtz") (UNSPEC_ROUNDEVEN "rne") (UNSPEC_NEARBYINT "dyn")])
> Do we really need to use unspecs for all these cases?  I would expect 
> some correspond to the trunc, round, ceil, nearbyint, etc well known RTX 
> codes.
> 
> In general, we should try to avoid unspecs when there is a clear 
> semantic match between the instruction and GCC's RTX opcodes.  So please 
> review the existing RTX code semantics to see if any match the new 
> instructions.  If there are matches, use those RTX codes rather than 
> UNSPECs.

I'll try, thanks.

> 
> 
> 
> >   
> > +/* Immediate values loaded by the FLI.S instruction in Chapter 25 of the latest RISC-V ISA
> > +   Manual draft. For details, please see:
> > +   https://github.com/riscv/riscv-isa-manual/releases/tag/draft-20221217-cb3b9d1 */
> > +
> > +unsigned HOST_WIDE_INT fli_value_hf[32] =
> > +{
> > +  0xbc00, 0x400, 0x100, 0x200, 0x1c00, 0x2000, 0x2c00, 0x3000,
> > +  0x3400, 0x3500, 0x3600, 0x3700, 0x3800, 0x3900, 0x3a00, 0x3b00,
> > +  0x3c00, 0x3d00, 0x3e00, 0x3f00, 0x4000, 0x4100, 0x4200, 0x4400,
> > +  0x4800, 0x4c00, 0x5800, 0x5c00, 0x7800,
> > +  /* Only used for filling, ensuring that 29 and 30 of HF are the same. */
> > +  0x7800,
> > +  0x7c00, 0x7e00,
> > +};
> > +
> > +unsigned HOST_WIDE_INT fli_value_sf[32] =
> > +{
> > +  0xbf800000, 0x00800000, 0x37800000, 0x38000000, 0x3b800000, 0x3c000000, 0x3d800000, 0x3e000000,
> > +  0x3e800000, 0x3ea00000, 0x3ec00000, 0x3ee00000, 0x3f000000, 0x3f200000, 0x3f400000, 0x3f600000,
> > +  0x3f800000, 0x3fa00000, 0x3fc00000, 0x3fe00000, 0x40000000, 0x40200000, 0x40400000, 0x40800000,
> > +  0x41000000, 0x41800000, 0x43000000, 0x43800000, 0x47000000, 0x47800000, 0x7f800000, 0x7fc00000
> > +};
> > +
> > +unsigned HOST_WIDE_INT fli_value_df[32] =
> > +{
> > +  0xbff0000000000000, 0x10000000000000, 0x3ef0000000000000, 0x3f00000000000000,
> > +  0x3f70000000000000, 0x3f80000000000000, 0x3fb0000000000000, 0x3fc0000000000000,
> > +  0x3fd0000000000000, 0x3fd4000000000000, 0x3fd8000000000000, 0x3fdc000000000000,
> > +  0x3fe0000000000000, 0x3fe4000000000000, 0x3fe8000000000000, 0x3fec000000000000,
> > +  0x3ff0000000000000, 0x3ff4000000000000, 0x3ff8000000000000, 0x3ffc000000000000,
> > +  0x4000000000000000, 0x4004000000000000, 0x4008000000000000, 0x4010000000000000,
> > +  0x4020000000000000, 0x4030000000000000, 0x4060000000000000, 0x4070000000000000,
> > +  0x40e0000000000000, 0x40f0000000000000, 0x7ff0000000000000, 0x7ff8000000000000,
> > +};
> Going to assume these are sane.  I think the only concern would be 
> endianness, but it looks like you handle that reasonably.

I did a simple treatment of endianness in the function riscv_float_const_rtx_index_for_fli(),
which seems to be correct at present.

In addition, in the next version, I used the newer floating point literal 
formats instead according to your suggestion.

For example:
unsigned HOST_WIDE_INT fli_value_sf[32] =
{
  0xbf8p20, 0x008p20, 0x378p20, 0x380p20, 0x3b8p20, 0x3c0p20, 0x3d8p20, 0x3e0p20,
  0x3e8p20, 0x3eap20, 0x3ecp20, 0x3eep20, 0x3f0p20, 0x3f2p20, 0x3f4p20, 0x3f6p20,
  0x3f8p20, 0x3fap20, 0x3fcp20, 0x3fep20, 0x400p20, 0x402p20, 0x404p20, 0x408p20,
  0x410p20, 0x418p20, 0x430p20, 0x438p20, 0x470p20, 0x478p20, 0x7f8p20, 0x7fcp20
};

Is that so? I don't know if I understand correctly.

> > +
> > +/* Find the index of TARGET in ARRAY, and return -1 if not found. */
> > +
> > +static int
> > +find_index_in_array (unsigned HOST_WIDE_INT target, unsigned HOST_WIDE_INT *array, int len)
> > +{
> > +  if (array == NULL)
> > +    return -1;
> > +
> > +  for (int i = 0; i < len; i++)
> > +    {
> > +      if (target == array[i])
> > +	return i;
> > +    }
> > +  return -1;
> > +}
> Given the way constraint and operand matching occurrs, I wouldn't be 
> surprised if this search turns out to be compile-time expensive.

Yes, I tried to find a better way, but the compiler seems to have to retrieve the 32 values of 
the fli instruction, which may need to be optimized, such as the binary search algorithm.

> 
> 
> 
> > +
> > +/* Return index of the FLI instruction table if rtx X is an immediate constant that
> > +   can be moved using a single FLI instruction in zfa extension. -1 otherwise. */
> > +
> > +int
> > +riscv_float_const_rtx_index_for_fli (rtx x)
> > +{
> > +  machine_mode mode = GET_MODE (x);
> > +
> > +  if (!TARGET_ZFA || mode == VOIDmode
> > +      || !CONST_DOUBLE_P(x)
> > +      || (mode == HFmode && !TARGET_ZFH)
> > +      || (mode == SFmode && !TARGET_HARD_FLOAT)
> > +      || (mode == DFmode && !TARGET_DOUBLE_FLOAT))
> > +    return -1;
> Bring the "|| mode == VOIDmode" down to its own line similar to how 
> you've done with the !CONST_DOUBLE_P check.

Fix in the next version.

> 
> >   
> >   static bool
> > @@ -826,6 +936,9 @@ riscv_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
> >     if (GET_CODE (x) == HIGH)
> >       return true;
> >   
> > +  if (riscv_float_const_rtx_index_for_fli (x) != -1)
> > +   return true;
> > +
> So if you do a follow-up handling negative fli constants, obviously we'd 
> need further changes in this code.

This is a query index function, I think it will only return 0 or positive integer, 
there should be no negative index.

> > @@ -1213,6 +1326,11 @@ riscv_const_insns (rtx x)
> >         }
> >   
> >       case CONST_DOUBLE:
> > +      /* See if we can use FMV directly.  */
> > +      if (riscv_float_const_rtx_index_for_fli (x) != -1)
> > +	return 3;
> That seems fairly high cost-wise.  Where did this value come from?   Or 
> is it relative to COSTS_N_INSNS?

Referring to the relevant patch aarch64, in this case the COSTS_N_INSNS (3) is returned, so I
simply define it here as 3, or should I change it to COSTS_N_INSNS (3)?
https://github.com/gcc-mirror/gcc/commit/a217096563e356fa03cc5163665148227613c62f#diff-2ea6a52c675e9f1862287091ef606b129d9e311224999af1cc017317c62c1efeR6942

> 
> >     if (TARGET_DOUBLE_FLOAT
> >         && ((FP_REG_RTX_P (src) && FP_REG_RTX_P (dest))
> >   	  || (FP_REG_RTX_P (dest) && MEM_P (src))
> >   	  || (FP_REG_RTX_P (src) && MEM_P (dest))
> > +	  || (TARGET_ZFA
> > +	      && ((FP_REG_RTX_P (dest) && GP_REG_RTX_P (src))
> > +	      || (FP_REG_RTX_P (src) && GP_REG_RTX_P (dest))))
> The formatting of the second FP_REG_RTX_P check looks goofy, but that 
> may be a mailer issue.  Double check the "|| FP_REG" should line up 
> under the FP_REG_RTX_P.


It will be fixed in the next version.

> 
> 
> > @@ -2968,6 +3103,14 @@ riscv_output_move (rtx dest, rtx src)
> >   	  case 8:
> >   	    return "fld\t%0,%1";
> >   	  }
> > +
> > +      if (src_code == CONST_DOUBLE && (riscv_float_const_rtx_index_for_fli (src) != -1))
> > +	switch (width)
> > +	  {
> > +	    case 2: return "fli.h\t%0,%1";
> > +	    case 4: return "fli.s\t%0,%1";
> > +	    case 8: return "fli.d\t%0,%1";
> > +	  }
> We generally discourage having code on the same line as a case 
> statement, so bring those return statements down to a new line.
> 


It will be fixed in the next version.

> 
> 
> 
> 
> > @@ -1580,6 +1609,26 @@ (define_insn "l<rint_pattern><ANYF:mode><GPR:mode>2"
> >     [(set_attr "type" "fcvt")
> >      (set_attr "mode" "<ANYF:MODE>")])
> >   
> > +(define_insn "<round_pattern><ANYF:mode>2"
> > +  [(set (match_operand:ANYF     0 "register_operand" "=f")
> > +	(unspec:ANYF
> > +	    [(match_operand:ANYF 1 "register_operand" " f")]
> > +	ROUND))]
> > +  "TARGET_HARD_FLOAT && TARGET_ZFA"
> > +  "fround.<ANYF:fmt>\t%0,%1,<round_rm>"
> > +  [(set_attr "type" "fcvt")
> > +   (set_attr "mode" "<ANYF:MODE>")])
> > +
> > +(define_insn "rint<ANYF:mode>2"
> > +  [(set (match_operand:ANYF     0 "register_operand" "=f")
> > +	(unspec:ANYF
> > +	    [(match_operand:ANYF 1 "register_operand" " f")]
> > +	UNSPEC_RINT))]
> > +  "TARGET_HARD_FLOAT && TARGET_ZFA"
> > +  "froundnx.<ANYF:fmt>\t%0,%1"
> > +  [(set_attr "type" "fcvt")
> > +   (set_attr "mode" "<ANYF:MODE>")])
> Please review the existing RTX codes and their semantics in the 
> internals manual and if any of the new instructions match those existing 
> primitives, implement them using those RTX codes rather than with an UNSPEC.
>

I'll try, thanks.

> 
> Overall it looks pretty good.
> 
> jeff

Thank you for your guidance.

Jin Ma