From: Kito Cheng
Date: Wed, 28 Jun 2023 16:31:19 +0800
Subject: Re: [PATCH] RISC-V: Support vfwmacc combine lowering
To: Juzhe-Zhong
Cc: gcc-patches@gcc.gnu.org, kito.cheng@sifive.com, palmer@dabbelt.com, palmer@rivosinc.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com
In-Reply-To: <20230628082707.256024-1-juzhe.zhong@rivai.ai>

LGTM

On Wed, Jun 28, 2023 at 4:28 PM Juzhe-Zhong wrote:
>
> This patch adds combine patterns as follows:
>
> 1. (set (reg) (fma (float_extend:reg)(float_extend:reg)(reg)))
>    This pattern allows combine: vfwcvt + vfwcvt + vfmacc ==> vfwmacc.
>
> 2. (set (reg) (fma (float_extend:reg)(reg)(reg)))
>    This pattern is the intermediate IR that enhances the combine optimizations.
>    In complicated cases the combine pass cannot combine both operands of the
>    multiplication at once, so it first combines one extension to form
>    (set (reg) (fma (float_extend:reg)(reg)(reg))), then combines the extension
>    of the other operand in a second step.
>
> This can enhance combine optimization for the following case:
> #define TEST_TYPE(TYPE1, TYPE2)                                              \
>   __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                       \
>     TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,   \
>     TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,        \
>     TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                       \
>   {                                                                          \
>     for (int i = 0; i < n; i++)                                              \
>       {                                                                      \
>         dst[i] += (TYPE1) a[i] * (TYPE1) b[i];                               \
>         dst2[i] += (TYPE1) a2[i] * (TYPE1) b[i];                             \
>         dst3[i] += (TYPE1) a2[i] * (TYPE1) a[i];                             \
>         dst4[i] += (TYPE1) a[i] * (TYPE1) b2[i];                             \
>       }                                                                      \
>   }
>
> #define TEST_ALL()                                                           \
>   TEST_TYPE (int16_t, int8_t)                                                \
>   TEST_TYPE (uint16_t, uint8_t)                                              \
>   TEST_TYPE (int32_t, int16_t)                                               \
>   TEST_TYPE (uint32_t, uint16_t)                                             \
>   TEST_TYPE (int64_t, int32_t)                                               \
>   TEST_TYPE (uint64_t, uint32_t)                                             \
>   TEST_TYPE (float, _Float16)                                                \
>   TEST_TYPE (double, float)
>
> TEST_ALL ()
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec-opt.md (*double_widen_fma): New pattern.
>         (*single_widen_fma): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/widen/widen-8.c: Add floating-point.
>         * gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-8.c: New test.
>
> ---
>  gcc/config/riscv/autovec-opt.md               | 58 +++++++++++++++++++
>  .../riscv/rvv/autovec/widen/widen-8.c         |  7 ++-
>  .../rvv/autovec/widen/widen-complicate-5.c    |  7 ++-
>  .../riscv/rvv/autovec/widen/widen_run-8.c     |  5 +-
>  .../rvv/autovec/widen/widen_run_zvfh-8.c      | 32 ++++++++++
>  5 files changed, 103 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-8.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 1fcd55ac2a0..1a1cef0eaa5 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -444,3 +444,61 @@
>    }
>    [(set_attr "type" "vfwmul")
>     (set_attr "mode" "")])
> +
> +;; -------------------------------------------------------------------------
> +;; ---- [FP] VFWMACC
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - vfwmacc.vv
> +;; -------------------------------------------------------------------------
> +
> +;; Combine ext + ext + fma ===> widen fma.
> +;; In most circumstances, LoopVectorizer will generate the following IR:
> +;; vect__8.176_40 = (vector([2,2]) double) vect__7.175_41;
> +;; vect__11.180_35 = (vector([2,2]) double) vect__10.179_36;
> +;; vect__13.182_33 = .FMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
> +(define_insn_and_split "*double_widen_fma"
> +  [(set (match_operand:VWEXTF 0 "register_operand")
> +        (fma:VWEXTF
> +          (float_extend:VWEXTF
> +            (match_operand: 2 "register_operand"))
> +          (float_extend:VWEXTF
> +            (match_operand: 3 "register_operand"))
> +          (match_operand:VWEXTF 1 "register_operand")))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_widen_mul (PLUS, mode),
> +                                              riscv_vector::RVV_WIDEN_TERNOP, operands);
> +    DONE;
> +  }
> +  [(set_attr "type" "vfwmuladd")
> +   (set_attr "mode" "")])
> +
> +;; This helps to match ext + fma.
> +(define_insn_and_split "*single_widen_fma"
> +  [(set (match_operand:VWEXTF 0 "register_operand")
> +        (fma:VWEXTF
> +          (float_extend:VWEXTF
> +            (match_operand: 2 "register_operand"))
> +          (match_operand:VWEXTF 3 "register_operand")
> +          (match_operand:VWEXTF 1 "register_operand")))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    insn_code icode = code_for_pred_extend (mode);
> +    rtx tmp = gen_reg_rtx (mode);
> +    rtx ext_ops[] = {tmp, operands[2]};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops);
> +
> +    rtx dst = expand_ternary_op (mode, fma_optab, tmp, operands[3],
> +                                 operands[1], operands[0], 0);
> +    emit_move_insn (operands[0], dst);
> +    DONE;
> +  }
> +  [(set_attr "type" "vfwmuladd")
> +   (set_attr "mode" "")])
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
> index f3ca07c02e0..8f41bdfdec2 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-8.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
>
>  #include
>
> @@ -19,9 +19,12 @@
>    TEST_TYPE (int32_t, int16_t)                                               \
>    TEST_TYPE (uint32_t, uint16_t)                                             \
>    TEST_TYPE (int64_t, int32_t)                                               \
> -  TEST_TYPE (uint64_t, uint32_t)
> +  TEST_TYPE (uint64_t, uint32_t)                                             \
> +  TEST_TYPE (float, _Float16)                                                \
> +  TEST_TYPE (double, float)
>
>  TEST_ALL ()
>
>  /* { dg-final { scan-assembler-times {\tvwmacc\.vv} 3 } } */
>  /* { dg-final { scan-assembler-times {\tvwmaccu\.vv} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvfwmacc\.vv} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
> index 187b6db21fd..3ff8483cde4 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
>
>  #include
>
> @@ -24,9 +24,12 @@
>    TEST_TYPE (int32_t, int16_t)                                               \
>    TEST_TYPE (uint32_t, uint16_t)                                             \
>    TEST_TYPE (int64_t, int32_t)                                               \
> -  TEST_TYPE (uint64_t, uint32_t)
> +  TEST_TYPE (uint64_t, uint32_t)                                             \
> +  TEST_TYPE (float, _Float16)                                                \
> +  TEST_TYPE (double, float)
>
>  TEST_ALL ()
>
>  /* { dg-final { scan-assembler-times {\tvwmacc\.vv} 12 } } */
>  /* { dg-final { scan-assembler-times {\tvwmaccu\.vv} 12 } } */
> +/* { dg-final { scan-assembler-times {\tvfwmacc\.vv} 8 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
> index f4840d30dc2..15095002154 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-8.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target { riscv_vector } } } */
> -/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
>
>  #include
>  #include "widen-8.c"
> @@ -29,7 +29,8 @@
>    RUN (int32_t, int16_t, -32768)                                             \
>    RUN (uint32_t, uint16_t, 65535)                                            \
>    RUN (int64_t, int32_t, -2147483648)                                        \
> -  RUN (uint64_t, uint32_t, 4294967295)
> +  RUN (uint64_t, uint32_t, 4294967295)                                       \
> +  RUN (double, float, -2147483648)
>
>  int
>  main ()
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-8.c
> new file mode 100644
> index 00000000000..63563b86e7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-8.c
> @@ -0,0 +1,32 @@
> +/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
> +
> +#include
> +#include "widen-8.c"
> +
> +#define SZ 512
> +
> +#define RUN(TYPE1, TYPE2, LIMIT)                                             \
> +  TYPE2 a##TYPE2[SZ];                                                        \
> +  TYPE2 b##TYPE2[SZ];                                                        \
> +  TYPE1 dst##TYPE1[SZ];                                                      \
> +  TYPE1 dst2##TYPE1[SZ];                                                     \
> +  for (int i = 0; i < SZ; i++)                                               \
> +    {                                                                        \
> +      a##TYPE2[i] = LIMIT + i % 8723;                                        \
> +      b##TYPE2[i] = LIMIT + i & 1964;                                        \
> +      dst##TYPE1[i] = LIMIT + i & 628;                                       \
> +      dst2##TYPE1[i] = LIMIT + i & 628;                                      \
> +    }                                                                        \
> +  vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ);               \
> +  for (int i = 0; i < SZ; i++)                                               \
> +    assert (dst##TYPE1[i]                                                    \
> +            == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) + dst2##TYPE1[i]);
> +
> +#define RUN_ALL() RUN (float, _Float16, -32768)
> +
> +int
> +main ()
> +{
> +  RUN_ALL ()
> +}
> --
> 2.36.1
>
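
For reference, the per-element arithmetic described by the quoted *double_widen_fma pattern, (fma (float_extend a) (float_extend b) acc), can be sketched in scalar C. This is only an illustrative model for the double/float case and is not code from the patch; the names widen_macc_elem and widen_macc_ref are hypothetical, and the sketch says nothing about how GCC actually emits vfwmacc.vv.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of one widening multiply-accumulate step (hypothetical
   helper, not part of the patch).  Both narrow inputs are extended to
   double before the multiply, mirroring the RTL shape
   (fma (float_extend a) (float_extend b) acc).  */
static double
widen_macc_elem (double acc, float a, float b)
{
  return (double) a * (double) b + acc;
}

/* Reference loop with the same form as the quoted test body,
   dst[i] += (TYPE1) a[i] * (TYPE1) b[i], with TYPE1 = double and
   TYPE2 = float.  */
void
widen_macc_ref (double *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
  assert (dst != NULL && a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = widen_macc_elem (dst[i], a[i], b[i]);
}
```

Both float inputs are converted to double before the multiply, which is the same shape as the `dst[i] += (TYPE1) a[i] * (TYPE1) b[i]` statement in the quoted TEST_TYPE macro.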