From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <de0f9fab-62a7-9a6b-7d8b-f5552319b013@gmail.com>
Date: Fri, 16 Jun 2023 20:17:15 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.12.0
Subject: Re: [PATCH] RISC-V: Add autovec FP unary operations.
Content-Language: en-US
To: =?UTF-8?B?6ZKf5bGF5ZOy?= <juzhe.zhong@rivai.ai>,
 "rdapp.gcc" <rdapp.gcc@gmail.com>, gcc-patches <gcc-patches@gcc.gnu.org>,
 palmer <palmer@dabbelt.com>, "kito.cheng" <kito.cheng@gmail.com>
References: <490fd4af-75d2-de76-fa74-f9ebb478b8b8@gmail.com>
 <1377c200-9c30-0b7f-0893-0d7d976dfd43@gmail.com>
 <C082D80E441B7DA6+2023061505153418390417@rivai.ai>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <C082D80E441B7DA6+2023061505153418390417@rivai.ai>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <gcc-patches.gcc.gnu.org>


On 6/14/23 15:15, 钟居哲 wrote:
> Hi, Jeff.  Thanks for quick approval.
> 
> When I reviewed the patch:
> (define_expand "<optab><mode>2"
>    [(set (match_operand:VF 0 "register_operand")
>      (any_float_unop_nofrm:VF
>       (match_operand:VF 1 "register_operand")))]
> "TARGET_VECTOR"
> {
>    insn_code icode = code_for_pred (<CODE>, <MODE>mode);
>    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
>    DONE;
> })
> 
> There could be an issue here with FP16 vectors.
> Let's look at the VF iterator:
> (define_mode_iterator VF [
>    (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
>    (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
>    (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
>    (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
>    (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
>    (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
>    (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
> ....
> 
> You can see that for all FP16 modes we use the predicate
> "TARGET_VECTOR_ELEN_FP_16", which is true when either TARGET_ZVFH or
> TARGET_ZVFHMIN is set. We do that because most floating-point
> instruction patterns share the same iterators, so we can't naively add
> TARGET_ZVFHMIN or TARGET_ZVFH to the iterator conditions. Some
> instruction patterns that use VF, for example vle16.v, should be enabled
> as long as TARGET_ZVFHMIN is set, whereas instructions like vfneg.v
> need TARGET_ZVFH.
> 
> So I ran an experiment:
> void
> f (_Float16 *restrict a, _Float16 *restrict b)
> {
>   for (int i = 0; i < 100; ++i)
>     {
>       a[i] = -b[i];
>     }
> }
> 
> with compile option:
> -march=rv64gcv_zvfhmin --param=riscv-autovec-preference=fixed-vlmax -O3
> 
> ICE happens:
> auto.c:26:1: error: unable to generate reloads for:
> (insn 8 7 9 2 (set (reg:VNx8HF 186 [ vect__6.7 ])
>          (if_then_else:VNx8HF (unspec:VNx8BI [
>                      (const_vector:VNx8BI [
>                              (const_int 1 [0x1]) repeated x8
>                          ])
>                      (const_int 8 [0x8])
>                      (const_int 2 [0x2]) repeated x2
>                      (const_int 0 [0])
>                      (reg:SI 66 vl)
>                      (reg:SI 67 vtype)
>                  ] UNSPEC_VPREDICATE)
>              (neg:VNx8HF (reg:VNx8HF 134 [ vect__4.6 ]))
>              (unspec:VNx8HF [
>                      (reg:SI 0 zero)
>                  ] UNSPEC_VUNDEF))) "auto.c":24:14 6631 {pred_negvnx8hf}
>       (expr_list:REG_DEAD (reg:VNx8HF 134 [ vect__4.6 ])
>          (nil)))
> 
> The reason for the ICE is that, according to the VF iterator, the
> auto-vectorization pattern for vfneg.v is enabled when TARGET_ZVFHMIN
> is set, but the vfneg.v instruction pattern itself is (correctly)
> enabled only when TARGET_ZVFH is set, because we have this attribute
> for each RVV instruction pattern:
> (define_attr "fp_vector_disabled" "no,yes"
>    (cond [
>      (and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
>          vfwalu,vfwmul,vfmuladd,vfwmuladd,
>          vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
>          vfclass,vfmerge,
>          vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
>          vfredo,vfredu,vfwredo,vfwredu,
>          vfslide1up,vfslide1down")
>     (and (eq_attr "mode" 
> "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
>          (match_test "!TARGET_ZVFH")))
>      (const_string "yes")
> 
> ;; The mode records as QI for the FP16 <=> INT8 instruction.
>      (and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
>     (and (eq_attr "mode" 
> "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
>          (match_test "!TARGET_ZVFH")))
>      (const_string "yes")
>    ]
>    (const_string "no")))
> 
> When I slightly change the pattern as follows:
> (define_expand "<optab><mode>2"
>    [(set (match_operand:VF 0 "register_operand")
>      (any_float_unop_nofrm:VF
>       (match_operand:VF 1 "register_operand")))]
> "TARGET_VECTOR && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)"
> {
>    insn_code icode = code_for_pred (<CODE>, <MODE>mode);
>    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
>    DONE;
> })
> 
> That is, add && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)
> to the condition.
> 
> It works for both TARGET_ZVFH and TARGET_ZVFHMIN.
> -march=rv64gcv_zvfhmin:
> f:
>          li      a4,2147450880
>          li      a5,-2147450880
>          addi    a4,a4,-1
>          addi    a5,a5,1
>          slli    a3,a5,32
>          slli    a2,a4,32
>          mv      a5,a4
>          li      a4,-2147450880
>          addi    a6,a1,200
>          add     a3,a3,a4
>          add     a2,a2,a5
> .L2:
>          ld      a5,0(a1)
>          addi    a0,a0,8
>          addi    a1,a1,8
>          not     a4,a5
>          and     a5,a5,a2
>          and     a4,a4,a3
>          sub     a5,a3,a5
>          xor     a5,a4,a5
>          sd      a5,-8(a0)
>          bne     a1,a6,.L2
>          ret
> 
> -march=rv64gcv_zvfh:
> f:
>          vsetivli        zero,8,e16,m1,ta,ma
>          addi    a4,a1,16
>          addi    a5,a0,16
>          vle16.v v1,0(a1)
>          vfneg.v v1,v1
>          vse16.v v1,0(a0)
>          addi    a2,a1,32
>          addi    a3,a0,32
>          vle16.v v1,0(a4)
>          vfneg.v v1,v1
>          vse16.v v1,0(a5)
>          addi    a4,a1,48
>          addi    a5,a0,48
>          vle16.v v1,0(a2)
>          vfneg.v v1,v1
>          vse16.v v1,0(a3)
>          addi    a2,a1,64
>          addi    a3,a0,64
>          vle16.v v1,0(a4)
>          vfneg.v v1,v1
>          vse16.v v1,0(a5)
>          addi    a4,a1,80
>          addi    a5,a0,80
>          vle16.v v1,0(a2)
>          vfneg.v v1,v1
>          vse16.v v1,0(a3)
> ....
> 
> 
> This is what we expected: auto-vectorization is enabled with
> TARGET_ZVFH, whereas there is no auto-vectorization with
> TARGET_ZVFHMIN alone, since vfneg.v is not allowed under TARGET_ZVFHMIN.
> 
> However, I think adding !(GET_MODE_INNER (<MODE>mode) == HFmode &&
> !TARGET_ZVFH)
> is an ugly implementation and not easy to maintain, since we would need
> to add this condition to every floating-point pattern.
> 
> So, give me some time to figure out an elegant way to support 
> auto-vectorization.
Sigh.  There are days when I look at how the ISA is managed and I don't 
know whether to cry or scream.

Thanks for the detailed explanation. You're absolutely correct that we 
need to be cognizant of the pitfalls in how the iterators interact with 
ZVFH and ZVFHMIN.

jeff