From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=ta2e=A2=gmail.com=jeffreyalaw@sourceware.org>
Received: from mail-pl1-x62a.google.com (mail-pl1-x62a.google.com [IPv6:2607:f8b0:4864:20::62a])
	by sourceware.org (Postfix) with ESMTPS id 4FAAD3858D1E
	for <gcc-patches@gcc.gnu.org>; Fri,  5 May 2023 23:31:09 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4FAAD3858D1E
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-pl1-x62a.google.com with SMTP id d9443c01a7336-1aafa41116fso16194175ad.1
        for <gcc-patches@gcc.gnu.org>; Fri, 05 May 2023 16:31:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1683329468; x=1685921468;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :from:to:cc:subject:date:message-id:reply-to;
        bh=/aCmZ2ibEIpqMgZ93IPbo9+WscYcMQi4iYF0DHM+4BQ=;
        b=XWW3WdJDsykSeXNfVlV5YD8hGXUO2PsI5IxxoVoLKCHM4SzwyVfz1FN6HwP+FP4OvK
         0Q/VfY2mAPKQ04Re9jYYiFtisK0vqawwOtTvJJ5uYL/5699SmoAaj/3XMnppEWt5NAbM
         8Aw5KfspMheYtEaSGhsgT2fjup+iVaVv/+z5e7Ae7SzV57Q7g7wX0xPHwda/RsZQRw+t
         HaVNzJeges/aF9fqTn6bzkR6kDvzUtM78VCFTZ6FGsCrlsXGQVqHgzQ9yNeJUeKJ0dgp
         NzwVK86n5DlZMyZUvYvC4rfEh+ILfOj0sFRXbFEDkbLlYzU0aa/vVYFwGLyqJxiiW47u
         qPLg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1683329468; x=1685921468;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=/aCmZ2ibEIpqMgZ93IPbo9+WscYcMQi4iYF0DHM+4BQ=;
        b=lEWrJHYoQtORhsGPLxMDnQMXo2Ly5OzVWfpEPzWMfPDp85Hax+KnDUxea8GF2M6svF
         lVjhpMWpbWy9ldotnuiBartkIfeDbAMKdPjAq/ZGdoxXeo2PielFllIMWSiQ5OaJeJ2q
         9PDesQHiQqD7GqKp8wWq3c9CRUMDQJ+JU0x7UboMR8IA3VLPJdKYamMhDyCMoOUTM0ch
         /rv9f9FXIDVs2TSVc/eIRClV/mN+eyl9JSds18cyh2+Ol5neiooCEAK+u2m/upZr38Xy
         hHUhDzVY9nUQdNNsIFYZx5sk+1klOFL5EwEznC6Z622J7F3+ZNTXbVwerf4/UY4g0VD3
         nL3A==
X-Gm-Message-State: AC+VfDxU1tWAH2Ig+yoKp4wVjEsyvYP3NDosynttOdI4N66eFVoOceA2
	sCRLmHewNdHTeHm2NPSB1iw=
X-Google-Smtp-Source: ACHHUZ7XS6Ek5vH05o7OPvLZLiNo6/5a9utyCulBCpjyYKcNAkeVYg2xmJHDul+8sTNDGIbkaQV/WQ==
X-Received: by 2002:a17:902:c40a:b0:1ab:28ec:bf10 with SMTP id k10-20020a170902c40a00b001ab28ecbf10mr3904355plk.51.1683329467898;
        Fri, 05 May 2023 16:31:07 -0700 (PDT)
Received: from ?IPV6:2601:681:8600:13d0::99f? ([2601:681:8600:13d0::99f])
        by smtp.gmail.com with ESMTPSA id ja3-20020a170902efc300b001a1a8e98e93sm2266082plb.287.2023.05.05.16.31.06
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Fri, 05 May 2023 16:31:07 -0700 (PDT)
Message-ID: <b2af30ca-0e9b-9a6a-450b-871238b56722@gmail.com>
Date: Fri, 5 May 2023 17:31:05 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.1
Subject: Re: [PATCH v8] RISC-V: Add the 'zfa' extension, version 0.2.
Content-Language: en-US
To: Jin Ma <jinma@linux.alibaba.com>, gcc-patches@gcc.gnu.org
Cc: kito.cheng@sifive.com, kito.cheng@gmail.com, palmer@dabbelt.com,
 christoph.muellner@vrull.eu, ijinma@yeah.net
References: <20230419095751.815-1-jinma@linux.alibaba.com>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <20230419095751.815-1-jinma@linux.alibaba.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 4/19/23 03:57, Jin Ma wrote:
> This patch adds the 'Zfa' extension for riscv, which is based on:
>    https://github.com/riscv/riscv-isa-manual/commits/zfb
>    https://github.com/riscv/riscv-isa-manual/commit/1f038182810727f5feca311072e630d6baac51da
> 
> The binutils-gdb for 'Zfa' extension:
>    https://github.com/a4lg/binutils-gdb/commits/riscv-zfa
> 
> What needs special explanation is:
> 1, The immediate number of the instructions FLI.H/S/D is represented in the assembly as a
>    floating-point value, with scientific counting when rs1 is 1,2, and decimal numbers for
>    the rest.
> 
>    Related llvm link:
>      https://reviews.llvm.org/D145645
>    Related discussion link:
>      https://github.com/riscv/riscv-isa-manual/issues/980
Right.  I think the goal right now is to get the bulk of this reviewed 
now.  Ideally we'll get to the point where the only outstanding issue is 
the interface between the assembler & gcc.

> 
> 2, According to riscv-spec, "The FCVTMO D.W.D instruction was added principally to
>    accelerate the processing of JavaScript Numbers.", so it seems that no implementation
>    is required.
Fair enough.  There's seems to be a general desire to wire up builtins 
for many things that aren't directly usable by the compiler.  So 
consider such a change as a follow-up.   I don't think something like 
this should hold up the blk of Zfa.

> 
> 3, The instructions FMINM and FMAXM correspond to C23 library function fminimum and fmaximum.
>    Therefore, this patch has simply implemented the pattern of fminm<hf\sf\df>3 and
>    fmaxm<hf\sf\df>3 to prepare for later.
Sounds good.


> 
> gcc/ChangeLog:
> 
> 	* common/config/riscv/riscv-common.cc: Add zfa extension version.
> 	* config/riscv/constraints.md (Zf): Constrain the floating point number that the
> 	instructions FLI.H/S/D can load.
> 	((TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS): enable FMVP.D.X and FMVH.X.D.
> 	* config/riscv/iterators.md (ceil): New.
> 	* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
> 	* config/riscv/riscv.cc (find_index_in_array): New.
> 	(riscv_float_const_rtx_index_for_fli): Get the index of the floating-point number that
> 	the instructions FLI.H/S/D can mov.
> 	(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is not applicable.
> 	(riscv_const_insns): The cost of FLI.H/S/D is 3.
> 	(riscv_legitimize_const_move): Likewise.
> 	(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is required.
> 	(riscv_output_move): Output the mov instructions in zfa extension.
> 	(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate in assembly
> 	(riscv_secondary_memory_needed): Likewise.
> 	* config/riscv/riscv.h (GP_REG_RTX_P): New.
> 	* config/riscv/riscv.md (fminm<mode>3): New.
> 

> index c448e6b37e9..62d9094f966 100644
> --- a/gcc/config/riscv/constraints.md
> +++ b/gcc/config/riscv/constraints.md
> @@ -118,6 +118,13 @@ (define_constraint "T"
>     (and (match_operand 0 "move_operand")
>          (match_test "CONSTANT_P (op)")))
>   
> +;; Zfa constraints.
> +
> +(define_constraint "Zf"
> +  "A floating point number that can be loaded using instruction `fli` in zfa."
> +  (and (match_code "const_double")
> +       (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)")))
> +
>   ;; Vector constraints.
>   
>   (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"
> @@ -183,8 +190,8 @@ (define_memory_constraint "Wdm"
>   
>   ;; Vendor ISA extension constraints.
>   
> -(define_register_constraint "th_f_fmv" "TARGET_XTHEADFMV ? FP_REGS : NO_REGS"
> +(define_register_constraint "th_f_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? FP_REGS : NO_REGS"
>     "A floating-point register for XTheadFmv.")
>   
> -(define_register_constraint "th_r_fmv" "TARGET_XTHEADFMV ? GR_REGS : NO_REGS"
> +(define_register_constraint "th_r_fmv" "(TARGET_XTHEADFMV || TARGET_ZFA) ? GR_REGS : NO_REGS"
>     "An integer register for XTheadFmv.")
I think Christoph had good suggestions on the constraints.  So let's go 
with his suggestions.

You might consider a follow-up patch where you use negation of one of 
the predefined constants for synthesis.  I would not be surprised at all 
if that's as efficient on some cores as loading the negated constants 
out of the constant pool.  But I don't think it has to be a part of this 
patch.


> diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> index 9b767038452..c81b08e3cc5 100644
> --- a/gcc/config/riscv/iterators.md
> +++ b/gcc/config/riscv/iterators.md
> @@ -288,3 +288,8 @@ (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
>   (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
>   (define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
>   
> +(define_int_iterator ROUND [UNSPEC_ROUND UNSPEC_FLOOR UNSPEC_CEIL UNSPEC_BTRUNC UNSPEC_ROUNDEVEN UNSPEC_NEARBYINT])
> +(define_int_attr round_pattern [(UNSPEC_ROUND "round") (UNSPEC_FLOOR "floor") (UNSPEC_CEIL "ceil")
> +				(UNSPEC_BTRUNC "btrunc") (UNSPEC_ROUNDEVEN "roundeven") (UNSPEC_NEARBYINT "nearbyint")])
> +(define_int_attr round_rm [(UNSPEC_ROUND "rmm") (UNSPEC_FLOOR "rdn") (UNSPEC_CEIL "rup")
> +			   (UNSPEC_BTRUNC "rtz") (UNSPEC_ROUNDEVEN "rne") (UNSPEC_NEARBYINT "dyn")])
Do we really need to use unspecs for all these cases?  I would expect 
some correspond to the trunc, round, ceil, nearbyint, etc well known RTX 
codes.

In general, we should try to avoid unspecs when there is a clear 
semantic match between the instruction and GCC's RTX opcodes.  So please 
review the existing RTX code semantics to see if any match the new 
instructions.  If there are matches, use those RTX codes rather than 
UNSPECs.


>   
> +/* Immediate values loaded by the FLI.S instruction in Chapter 25 of the latest RISC-V ISA
> +   Manual draft. For details, please see:
> +   https://github.com/riscv/riscv-isa-manual/releases/tag/draft-20221217-cb3b9d1 */
> +
> +unsigned HOST_WIDE_INT fli_value_hf[32] =
> +{
> +  0xbc00, 0x400, 0x100, 0x200, 0x1c00, 0x2000, 0x2c00, 0x3000,
> +  0x3400, 0x3500, 0x3600, 0x3700, 0x3800, 0x3900, 0x3a00, 0x3b00,
> +  0x3c00, 0x3d00, 0x3e00, 0x3f00, 0x4000, 0x4100, 0x4200, 0x4400,
> +  0x4800, 0x4c00, 0x5800, 0x5c00, 0x7800,
> +  /* Only used for filling, ensuring that 29 and 30 of HF are the same. */
> +  0x7800,
> +  0x7c00, 0x7e00,
> +};
> +
> +unsigned HOST_WIDE_INT fli_value_sf[32] =
> +{
> +  0xbf800000, 0x00800000, 0x37800000, 0x38000000, 0x3b800000, 0x3c000000, 0x3d800000, 0x3e000000,
> +  0x3e800000, 0x3ea00000, 0x3ec00000, 0x3ee00000, 0x3f000000, 0x3f200000, 0x3f400000, 0x3f600000,
> +  0x3f800000, 0x3fa00000, 0x3fc00000, 0x3fe00000, 0x40000000, 0x40200000, 0x40400000, 0x40800000,
> +  0x41000000, 0x41800000, 0x43000000, 0x43800000, 0x47000000, 0x47800000, 0x7f800000, 0x7fc00000
> +};
> +
> +unsigned HOST_WIDE_INT fli_value_df[32] =
> +{
> +  0xbff0000000000000, 0x10000000000000, 0x3ef0000000000000, 0x3f00000000000000,
> +  0x3f70000000000000, 0x3f80000000000000, 0x3fb0000000000000, 0x3fc0000000000000,
> +  0x3fd0000000000000, 0x3fd4000000000000, 0x3fd8000000000000, 0x3fdc000000000000,
> +  0x3fe0000000000000, 0x3fe4000000000000, 0x3fe8000000000000, 0x3fec000000000000,
> +  0x3ff0000000000000, 0x3ff4000000000000, 0x3ff8000000000000, 0x3ffc000000000000,
> +  0x4000000000000000, 0x4004000000000000, 0x4008000000000000, 0x4010000000000000,
> +  0x4020000000000000, 0x4030000000000000, 0x4060000000000000, 0x4070000000000000,
> +  0x40e0000000000000, 0x40f0000000000000, 0x7ff0000000000000, 0x7ff8000000000000,
> +};
Going to assume these are sane.  I think the only concern would be 
endianness, but it looks like you handle that reasonably.
> +
> +/* Find the index of TARGET in ARRAY, and return -1 if not found. */
> +
> +static int
> +find_index_in_array (unsigned HOST_WIDE_INT target, unsigned HOST_WIDE_INT *array, int len)
> +{
> +  if (array == NULL)
> +    return -1;
> +
> +  for (int i = 0; i < len; i++)
> +    {
> +      if (target == array[i])
> +	return i;
> +    }
> +  return -1;
> +}
Given the way constraint and operand matching occurrs, I wouldn't be 
surprised if this search turns out to be compile-time expensive.


> +
> +/* Return index of the FLI instruction table if rtx X is an immediate constant that
> +   can be moved using a single FLI instruction in zfa extension. -1 otherwise. */
> +
> +int
> +riscv_float_const_rtx_index_for_fli (rtx x)
> +{
> +  machine_mode mode = GET_MODE (x);
> +
> +  if (!TARGET_ZFA || mode == VOIDmode
> +      || !CONST_DOUBLE_P(x)
> +      || (mode == HFmode && !TARGET_ZFH)
> +      || (mode == SFmode && !TARGET_HARD_FLOAT)
> +      || (mode == DFmode && !TARGET_DOUBLE_FLOAT))
> +    return -1;
Bring the "|| mode == VOIDmode" down to its own line similar to how 
you've done with the !CONST_DOUBLE_P check.


>   
>   static bool
> @@ -826,6 +936,9 @@ riscv_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>     if (GET_CODE (x) == HIGH)
>       return true;
>   
> +  if (riscv_float_const_rtx_index_for_fli (x) != -1)
> +   return true;
> +
So if you do a follow-up handling negative fli constants, obviously we'd 
need further changes in this code.


> @@ -1213,6 +1326,11 @@ riscv_const_insns (rtx x)
>         }
>   
>       case CONST_DOUBLE:
> +      /* See if we can use FMV directly.  */
> +      if (riscv_float_const_rtx_index_for_fli (x) != -1)
> +	return 3;
That seems fairly high cost-wise.  Where did this value come from?   Or 
is it relative to COSTS_N_INSNS?


>     if (TARGET_DOUBLE_FLOAT
>         && ((FP_REG_RTX_P (src) && FP_REG_RTX_P (dest))
>   	  || (FP_REG_RTX_P (dest) && MEM_P (src))
>   	  || (FP_REG_RTX_P (src) && MEM_P (dest))
> +	  || (TARGET_ZFA
> +	      && ((FP_REG_RTX_P (dest) && GP_REG_RTX_P (src))
> +	      || (FP_REG_RTX_P (src) && GP_REG_RTX_P (dest))))
The formatting of the second FP_REG_RTX_P check looks goofy, but that 
may be a mailer issue.  Double check the "|| FP_REG" should line up 
under the FP_REG_RTX_P.


> @@ -2968,6 +3103,14 @@ riscv_output_move (rtx dest, rtx src)
>   	  case 8:
>   	    return "fld\t%0,%1";
>   	  }
> +
> +      if (src_code == CONST_DOUBLE && (riscv_float_const_rtx_index_for_fli (src) != -1))
> +	switch (width)
> +	  {
> +	    case 2: return "fli.h\t%0,%1";
> +	    case 4: return "fli.s\t%0,%1";
> +	    case 8: return "fli.d\t%0,%1";
> +	  }
We generally discourage having code on the same line as a case 
statement, so bring those return statements down to a new line.


> @@ -1580,6 +1609,26 @@ (define_insn "l<rint_pattern><ANYF:mode><GPR:mode>2"
>     [(set_attr "type" "fcvt")
>      (set_attr "mode" "<ANYF:MODE>")])
>   
> +(define_insn "<round_pattern><ANYF:mode>2"
> +  [(set (match_operand:ANYF     0 "register_operand" "=f")
> +	(unspec:ANYF
> +	    [(match_operand:ANYF 1 "register_operand" " f")]
> +	ROUND))]
> +  "TARGET_HARD_FLOAT && TARGET_ZFA"
> +  "fround.<ANYF:fmt>\t%0,%1,<round_rm>"
> +  [(set_attr "type" "fcvt")
> +   (set_attr "mode" "<ANYF:MODE>")])
> +
> +(define_insn "rint<ANYF:mode>2"
> +  [(set (match_operand:ANYF     0 "register_operand" "=f")
> +	(unspec:ANYF
> +	    [(match_operand:ANYF 1 "register_operand" " f")]
> +	UNSPEC_RINT))]
> +  "TARGET_HARD_FLOAT && TARGET_ZFA"
> +  "froundnx.<ANYF:fmt>\t%0,%1"
> +  [(set_attr "type" "fcvt")
> +   (set_attr "mode" "<ANYF:MODE>")])
Please review the existing RTX codes and their semantics in the 
internals manual and if any of the new instructions match those existing 
primitives, implement them using those RTX codes rather than with an UNSPEC.


Overall it looks pretty good.

jeff