From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20230720085103.159227-1-juzhe.zhong@rivai.ai>
In-Reply-To: <20230720085103.159227-1-juzhe.zhong@rivai.ai>
From: Kito Cheng <kito.cheng@sifive.com>
Date: Thu, 20 Jul 2023 16:59:13 +0800
Message-ID: <CALLt3TjTg9NzUKava6gDSiFWqL3oppFfDDRU1KeWD20Z03NB1A@mail.gmail.com>
Subject: Re: [PATCH V2] RISC-V: Support in-order floating-point reduction
To: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Cc: gcc-patches@gcc.gnu.org, kito.cheng@gmail.com, jeffreyalaw@gmail.com, 
	rdapp.gcc@gmail.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-patches.gcc.gnu.org>

LGTM, but I would like to make sure Robin is OK with it too.

On Thu, Jul 20, 2023 at 4:51=E2=80=AFPM Juzhe-Zhong <juzhe.zhong@rivai.ai> =
wrote:
>
> This patch depends on:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html
>
> Consider the following case:
> float foo (float *__restrict a, int n)
> {
>   float result =3D 1.0;
>   for (int i =3D 0; i < n; i++)
>    result +=3D a[i];
>   return result;
> }
>
> Compile with **NO** -ffast-math:
>
> Before this patch:
> <source>:4:21: missed: couldn't vectorize loop
> <source>:1:7: missed: not vectorized: relevant phi not supported: result_=
14 =3D PHI <result_11(6), 1.0e+0(5)>
>
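A note for readers of the thread on why the loop is rejected without
-ffast-math: floating-point addition is not associative, so the vectorizer
may not reorder the sum unless the target offers an in-order reduction.
A minimal sketch in plain C follows; the helper names sum_in_order and
sum_pairwise are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Strict left-to-right sum: the order the scalar loop in foo () uses.  */
float
sum_in_order (const float *a, size_t n, float init)
{
  assert (a != NULL || n == 0);
  float r = init;
  for (size_t i = 0; i < n; i++)
    r += a[i];
  return r;
}

/* Pairwise (tree) sum: one possible reassociation of the same additions.
   Hypothetical helper, shown only to make the difference visible.  */
float
sum_pairwise (const float *a, size_t n)
{
  assert (a != NULL || n == 0);
  if (n == 0)
    return 0.0f;
  if (n == 1)
    return a[0];
  size_t half = n / 2;
  return sum_pairwise (a, half) + sum_pairwise (a + half, n - half);
}
```

For the input {1e30f, 1.0f, -1e30f, 1.0f} with init 0.0f, the in-order sum
is 1.0f while the pairwise sum is 0.0f, because 1e30f + 1.0f rounds back to
1e30f. That difference is what an unordered reduction would risk here.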
> After this patch:
> foo:
>         lui     a5,%hi(.LC0)
>         flw     fa0,%lo(.LC0)(a5)
>         ble     a1,zero,.L4
> .L3:
>         vsetvli a5,a1,e32,m1,ta,ma
>         vle32.v v1,0(a0)
>         slli    a4,a5,2
>         sub     a1,a1,a5
>         vfmv.s.f        v2,fa0
>         add     a0,a0,a4
>         vfredosum.vs    v1,v1,v2     ----------> FOLD_LEFT_PLUS
>         vfmv.f.s        fa0,v1
>         bne     a1,zero,.L3
>         ret
> .L4:
>         ret
>
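To make the generated loop easier to follow, here is a rough scalar model of
it in C. It is only a sketch under stated assumptions: vlmax is a
hypothetical parameter standing in for the hardware vector length, vsetvli
is assumed to return min (n, vlmax), and the function names are mine rather
than anything in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of vfredosum.vs: ((acc + v[0]) + v[1]) + ... + v[vl-1],
   added strictly in element order.  */
static float
model_vfredosum (float acc, const float *v, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    acc += v[i];
  return acc;
}

/* Model of the .L3 loop: strip-mine the array and carry the running
   sum through each ordered reduction.  */
float
model_foo (const float *a, size_t n, size_t vlmax)
{
  assert (vlmax > 0);
  float result = 1.0f;                      /* flw fa0,%lo(.LC0)(a5) */
  while (n > 0)                             /* bne a1,zero,.L3 */
    {
      size_t vl = n < vlmax ? n : vlmax;    /* vsetvli (assumed min) */
      result = model_vfredosum (result, a, vl);
      a += vl;                              /* add a0,a0,a4 */
      n -= vl;                              /* sub a1,a1,a5 */
    }
  return result;
}

/* Reference: the original scalar loop.  */
float
scalar_foo (const float *a, size_t n)
{
  float result = 1.0f;
  for (size_t i = 0; i < n; i++)
    result += a[i];
  return result;
}
```

Because every chunk is folded left to right and the accumulator is carried
from one chunk to the next, model_foo performs exactly the same sequence of
additions as scalar_foo for any vlmax. That is why the run tests can compare
results with an exact != check.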
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md (fold_left_plus_<mode>): New pattern.
>         (mask_len_fold_left_plus_<mode>): Ditto.
>         * config/riscv/riscv-protos.h (enum insn_type): New enum.
>         (enum reduction_type): Ditto.
>         (expand_reduction): Add in-order reduction.
>         * config/riscv/riscv-v.cc (emit_nonvlmax_fp_reduction_insn): New =
function.
>         (expand_reduction): Add in-order reduction.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c: New te=
st.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c: New te=
st.
>
> ---
>  gcc/config/riscv/autovec.md                   | 39 ++++++++++++++
>  gcc/config/riscv/riscv-protos.h               | 13 ++++-
>  gcc/config/riscv/riscv-v.cc                   | 53 +++++++++++++++----
>  .../riscv/rvv/autovec/reduc/reduc_strict-1.c  | 28 ++++++++++
>  .../riscv/rvv/autovec/reduc/reduc_strict-2.c  | 26 +++++++++
>  .../riscv/rvv/autovec/reduc/reduc_strict-3.c  | 18 +++++++
>  .../riscv/rvv/autovec/reduc/reduc_strict-4.c  | 24 +++++++++
>  .../riscv/rvv/autovec/reduc/reduc_strict-5.c  | 28 ++++++++++
>  .../riscv/rvv/autovec/reduc/reduc_strict-6.c  | 18 +++++++
>  .../riscv/rvv/autovec/reduc/reduc_strict-7.c  | 21 ++++++++
>  .../rvv/autovec/reduc/reduc_strict_run-1.c    | 29 ++++++++++
>  .../rvv/autovec/reduc/reduc_strict_run-2.c    | 31 +++++++++++
>  12 files changed, 317 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict_run-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/redu=
c_strict_run-2.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 00947207f3f..667a877d009 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1687,3 +1687,42 @@
>    riscv_vector::expand_reduction (SMIN, operands, f);
>    DONE;
>  })
> +
> +;; ---------------------------------------------------------------------=
----
> +;; ---- [FP] Left-to-right reductions
> +;; ---------------------------------------------------------------------=
----
> +;; Includes:
> +;; - vfredosum.vs
> +;; ---------------------------------------------------------------------=
----
> +
> +;; Unpredicated in-order FP reductions.
> +(define_expand "fold_left_plus_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:<VEL> 1 "register_operand")
> +   (match_operand:VF 2 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_reduction (PLUS, operands,
> +                                 operands[1],
> +                                 riscv_vector::reduction_type::FOLD_LEFT=
);
> +  DONE;
> +})
> +
> +;; Predicated in-order FP reductions.
> +(define_expand "mask_len_fold_left_plus_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:<VEL> 1 "register_operand")
> +   (match_operand:VF 2 "register_operand")
> +   (match_operand:<VM> 3 "vector_mask_operand")
> +   (match_operand 4 "autovec_length_operand")
> +   (match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +{
> +  if (rtx_equal_p (operands[4], const0_rtx))
> +    emit_move_insn (operands[0], operands[1]);
> +  else
> +    riscv_vector::expand_reduction (PLUS, operands,
> +                                   operands[1],
> +                                   riscv_vector::reduction_type::MASK_LE=
N_FOLD_LEFT);
> +  DONE;
> +})
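For reference, my reading of what the predicated expander computes, written
as a scalar model in C. This is a sketch, not the implementation: the
function name and the bool-array mask are assumptions for illustration, and
the bias operand is left out because the pattern only accepts constant zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of mask_len_fold_left_plus: starting from init, add the
   active elements among the first len, strictly in element order.  */
float
model_mask_len_fold_left_plus (float init, const float *v,
                               const bool *mask, size_t len)
{
  assert (len == 0 || (v != NULL && mask != NULL));

  /* Mirrors the const0_rtx check: nothing to add, result is operand 1.  */
  if (len == 0)
    return init;

  float acc = init;
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      acc += v[i];
  return acc;
}
```

In the model the len == 0 early return is redundant, since the loop would
not execute anyway. In the expander it only fires for a compile-time
constant zero length and avoids emitting the reduction at all.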
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-pro=
tos.h
> index 16fb8dabca0..c9520f689e2 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -199,6 +199,7 @@ enum insn_type
>    RVV_GATHER_M_OP =3D 5,
>    RVV_SCATTER_M_OP =3D 4,
>    RVV_REDUCTION_OP =3D 3,
> +  RVV_REDUCTION_TU_OP =3D RVV_REDUCTION_OP + 2,
>  };
>  enum vlmul_type
>  {
> @@ -247,7 +248,7 @@ void emit_vlmax_merge_insn (unsigned, int, rtx *);
>  void emit_vlmax_cmp_insn (unsigned, rtx *);
>  void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
>  void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
> -void emit_scalar_move_insn (unsigned, rtx *);
> +void emit_scalar_move_insn (unsigned, rtx *, rtx =3D 0);
>  void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
>  enum vlmul_type get_vlmul (machine_mode);
>  unsigned int get_ratio (machine_mode);
> @@ -270,6 +271,13 @@ enum mask_policy
>    MASK_AGNOSTIC =3D 1,
>    MASK_ANY =3D 2,
>  };
> +
> +enum class reduction_type
> +{
> +  UNORDERED,
> +  FOLD_LEFT,
> +  MASK_LEN_FOLD_LEFT,
> +};
>  enum tail_policy get_prefer_tail_policy ();
>  enum mask_policy get_prefer_mask_policy ();
>  rtx get_avl_type_rtx (enum avl_type);
> @@ -282,7 +290,8 @@ bool has_vi_variant_p (rtx_code, rtx);
>  void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
>  bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
>  void expand_cond_len_binop (rtx_code, rtx *);
> -void expand_reduction (rtx_code, rtx *, rtx);
> +void expand_reduction (rtx_code, rtx *, rtx,
> +                      reduction_type =3D reduction_type::UNORDERED);
>  #endif
>  bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
>                           bool, void (*)(rtx *, rtx));
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 53088edf909..e338be151d3 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -1023,11 +1023,11 @@ emit_nonvlmax_fp_tu_insn (unsigned icode, int op_=
num, rtx *ops, rtx avl)
>  /* Emit vmv.s.x instruction.  */
>
>  void
> -emit_scalar_move_insn (unsigned icode, rtx *ops)
> +emit_scalar_move_insn (unsigned icode, rtx *ops, rtx len)
>  {
>    machine_mode dest_mode =3D GET_MODE (ops[0]);
>    machine_mode mask_mode =3D get_mask_mode (dest_mode).require ();
> -  insn_expander<RVV_INSN_OPERANDS_MAX> e (riscv_vector::RVV_SCALAR_MOV_O=
P,
> +  insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_SCALAR_MOV_OP,
>                                           /* HAS_DEST_P */ true,
>                                           /* FULLY_UNMASKED_P */ false,
>                                           /* USE_REAL_MERGE_P */ true,
> @@ -1038,7 +1038,7 @@ emit_scalar_move_insn (unsigned icode, rtx *ops)
>
>    e.set_policy (TAIL_ANY);
>    e.set_policy (MASK_ANY);
> -  e.set_vl (CONST1_RTX (Pmode));
> +  e.set_vl (len ? len : CONST1_RTX (Pmode));
>    e.emit_insn ((enum insn_code) icode, ops);
>  }
>
> @@ -1196,6 +1196,26 @@ emit_vlmax_fp_reduction_insn (unsigned icode, int =
op_num, rtx *ops)
>    e.emit_insn ((enum insn_code) icode, ops);
>  }
>
> +/* Emit reduction instruction.  */
> +static void
> +emit_nonvlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops, r=
tx vl)
> +{
> +  machine_mode dest_mode =3D GET_MODE (ops[0]);
> +  machine_mode mask_mode =3D get_mask_mode (GET_MODE (ops[1])).require (=
);
> +  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
> +                                         /* HAS_DEST_P */ true,
> +                                         /* FULLY_UNMASKED_P */ false,
> +                                         /* USE_REAL_MERGE_P */ true,
> +                                         /* HAS_AVL_P */ true,
> +                                         /* VLMAX_P */ false, dest_mode,
> +                                         mask_mode);
> +
> +  e.set_policy (TAIL_ANY);
> +  e.set_rounding_mode (FRM_DYN);
> +  e.set_vl (vl);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
> +
>  /* Emit merge instruction.  */
>
>  static machine_mode
> @@ -3343,9 +3363,10 @@ expand_cond_len_ternop (unsigned icode, rtx *ops)
>
>  /* Expand reduction operations.  */
>  void
> -expand_reduction (rtx_code code, rtx *ops, rtx init)
> +expand_reduction (rtx_code code, rtx *ops, rtx init, reduction_type type=
)
>  {
> -  machine_mode vmode =3D GET_MODE (ops[1]);
> +  rtx vector =3D type =3D=3D reduction_type::UNORDERED ? ops[1] : ops[2]=
;
> +  machine_mode vmode =3D GET_MODE (vector);
>    machine_mode m1_mode =3D get_m1_mode (vmode).require ();
>    machine_mode m1_mmode =3D get_mask_mode (m1_mode).require ();
>
> @@ -3353,16 +3374,30 @@ expand_reduction (rtx_code code, rtx *ops, rtx in=
it)
>    rtx m1_mask =3D gen_scalar_move_mask (m1_mmode);
>    rtx m1_undef =3D RVV_VUNDEF (m1_mode);
>    rtx scalar_move_ops[] =3D {m1_tmp, m1_mask, m1_undef, init};
> -  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_=
ops);
> +  rtx len =3D type =3D=3D reduction_type::MASK_LEN_FOLD_LEFT ? ops[4] : =
NULL_RTX;
> +  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_=
ops,
> +                        len);
>
>    rtx m1_tmp2 =3D gen_reg_rtx (m1_mode);
> -  rtx reduc_ops[] =3D {m1_tmp2, ops[1], m1_tmp};
> +  rtx reduc_ops[] =3D {m1_tmp2, vector, m1_tmp};
>
>    if (FLOAT_MODE_P (vmode) && code =3D=3D PLUS)
>      {
>        insn_code icode
> -       =3D code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
> -      emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
> +       =3D code_for_pred_reduc_plus (type =3D=3D reduction_type::UNORDER=
ED
> +                                     ? UNSPEC_UNORDERED
> +                                     : UNSPEC_ORDERED,
> +                                   vmode, m1_mode);
> +      if (type =3D=3D reduction_type::MASK_LEN_FOLD_LEFT)
> +       {
> +         rtx mask =3D ops[3];
> +         rtx mask_len_reduc_ops[]
> +           =3D {m1_tmp2, mask, RVV_VUNDEF (m1_mode), vector, m1_tmp};
> +         emit_nonvlmax_fp_reduction_insn (icode, RVV_REDUCTION_TU_OP,
> +                                          mask_len_reduc_ops, len);
> +       }
> +      else
> +       emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops)=
;
>      }
>    else
>      {
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
> new file mode 100644
> index 00000000000..c293e9ae746
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=3Drv32gcv_zvfh -mabi=3Dilp32d --param=
=3Driscv-autovec-preference=3Dscalable -fno-vect-cost-model" } */
> +
> +#include <stdint-gcc.h>
> +
> +#define NUM_ELEMS(TYPE) ((int)(5 * (256 / sizeof (TYPE)) + 3))
> +
> +#define DEF_REDUC_PLUS(TYPE)                   \
> +  TYPE __attribute__ ((noinline, noclone))     \
> +  reduc_plus_##TYPE (TYPE *a, TYPE *b)         \
> +  {                                            \
> +    TYPE r =3D 0, q =3D 3;                         \
> +    for (int i =3D 0; i < NUM_ELEMS (TYPE); i++) \
> +      {                                                \
> +       r +=3D a[i];                              \
> +       q -=3D b[i];                              \
> +      }                                                \
> +    return r * q;                              \
> +  }
> +
> +#define TEST_ALL(T) \
> +  T (_Float16) \
> +  T (float) \
> +  T (double)
> +
> +TEST_ALL (DEF_REDUC_PLUS)
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+=
,\s*v[0-9]+} 6 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
> new file mode 100644
> index 00000000000..2e1e7ab674d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=3Drv32gcv_zvfh -mabi=3Dilp32d --param=
=3Driscv-autovec-preference=3Dscalable -fno-vect-cost-model" } */
> +
> +#define NUM_ELEMS(TYPE) ((int) (5 * (256 / sizeof (TYPE)) + 3))
> +
> +#define DEF_REDUC_PLUS(TYPE)                                   \
> +void __attribute__ ((noinline, noclone))                       \
> +reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],       \
> +                  TYPE *restrict r, int n)                     \
> +{                                                              \
> +  for (int i =3D 0; i < n; i++)                                  \
> +    {                                                          \
> +      r[i] =3D 0;                                                       =
 \
> +      for (int j =3D 0; j < NUM_ELEMS (TYPE); j++)               \
> +        r[i] +=3D a[i][j];                                       \
> +    }                                                          \
> +}
> +
> +#define TEST_ALL(T) \
> +  T (_Float16) \
> +  T (float) \
> +  T (double)
> +
> +TEST_ALL (DEF_REDUC_PLUS)
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+=
,\s*v[0-9]+} 3 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
> new file mode 100644
> index 00000000000..f559d40e60f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=3Drv32gcv_zvfh -mabi=3Dilp32d --param=
=3Driscv-autovec-preference=3Dscalable -fno-vect-cost-model" } */
> +
> +double mat[100][2];
> +
> +double
> +slp_reduc_plus (int n)
> +{
> +  double tmp =3D 0.0;
> +  for (int i =3D 0; i < n; i++)
> +    {
> +      tmp =3D tmp + mat[i][0];
> +      tmp =3D tmp + mat[i][1];
> +    }
> +  return tmp;
> +}
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+=
,\s*v[0-9]+} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
> new file mode 100644
> index 00000000000..428d371d9cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=3Drv32gcv_zvfh -mabi=3Dilp32d --param=
=3Driscv-autovec-preference=3Dscalable -fno-vect-cost-model" } */
> +
> +double mat[100][8];
> +
> +double
> +slp_reduc_plus (int n)
> +{
> +  double tmp =3D 0.0;
> +  for (int i =3D 0; i < n; i++)
> +    {
> +      tmp =3D tmp + mat[i][0];
> +      tmp =3D tmp + mat[i][1];
> +      tmp =3D tmp + mat[i][2];
> +      tmp =3D tmp + mat[i][3];
> +      tmp =3D tmp + mat[i][4];
> +      tmp =3D tmp + mat[i][5];
> +      tmp =3D tmp + mat[i][6];
> +      tmp =3D tmp + mat[i][7];
> +    }
> +  return tmp;
> +}
> +
> +/* { dg-final { scan-assembler {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[=
0-9]+} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
> new file mode 100644
> index 00000000000..24add2291f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=3Drv32gcv_zvfh -mabi=3Dilp32d --param=
=3Driscv-autovec-preference=3Dscalable -fno-vect-cost-model" } */
> +
> +double mat[100][12];
> +
> +double
> +slp_reduc_plus (int n)
> +{
> +  double tmp =3D 0.0;
> +  for (int i =3D 0; i < n; i++)
> +    {
> +      tmp =3D tmp + mat[i][0];
> +      tmp =3D tmp + mat[i][1];
> +      tmp =3D tmp + mat[i][2];
> +      tmp =3D tmp + mat[i][3];
> +      tmp =3D tmp + mat[i][4];
> +      tmp =3D tmp + mat[i][5];
> +      tmp =3D tmp + mat[i][6];
> +      tmp =3D tmp + mat[i][7];
> +      tmp =3D tmp + mat[i][8];
> +      tmp =3D tmp + mat[i][9];
> +      tmp =3D tmp + mat[i][10];
> +      tmp =3D tmp + mat[i][11];
> +    }
> +  return tmp;
> +}
> +
> +/* { dg-final { scan-assembler {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[=
0-9]+} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
> new file mode 100644
> index 00000000000..c1567b067ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=3Drv32gcv_zvfh -mabi=3Dilp32d --param=
=3Driscv-autovec-preference=3Dscalable -fno-vect-cost-model -fdump-tree-vec=
t-details" } */
> +
> +float
> +double_reduc (float (*i)[16])
> +{
> +  float l =3D 0;
> +
> +#pragma GCC unroll 0
> +  for (int a =3D 0; a < 8; a++)
> +    for (int b =3D 0; b < 100; b++)
> +      l +=3D i[b][a];
> +  return l;
> +}
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+=
,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
> new file mode 100644
> index 00000000000..f742a824bb2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=3Drv32gcv_zvfh -mabi=3Dilp32d --param=
=3Driscv-autovec-preference=3Dscalable -fno-vect-cost-model -fdump-tree-vec=
t-details" } */
> +
> +float
> +double_reduc (float *i, float *j)
> +{
> +  float k =3D 0, l =3D 0;
> +
> +  for (int a =3D 0; a < 8; a++)
> +    for (int b =3D 0; b < 100; b++)
> +      {
> +        k +=3D i[b];
> +        l +=3D j[b];
> +      }
> +  return l * k;
> +}
> +
> +/* { dg-final { scan-assembler-times {vle32\.v} 2 } } */
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+=
,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_r=
un-1.c
> new file mode 100644
> index 00000000000..516be97e9eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1=
.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=3Driscv-autovec-preference=3Dscalabl=
e -fno-vect-cost-model" } */
> +
> +#include "reduc_strict-1.c"
> +
> +#define TEST_REDUC_PLUS(TYPE)                  \
> +  {                                            \
> +    TYPE a[NUM_ELEMS (TYPE)];                  \
> +    TYPE b[NUM_ELEMS (TYPE)];                  \
> +    TYPE r =3D 0, q =3D 3;                         \
> +    for (int i =3D 0; i < NUM_ELEMS (TYPE); i++) \
> +      {                                                \
> +       a[i] =3D (i * 0.1) * (i & 1 ? 1 : -1);    \
> +       b[i] =3D (i * 0.3) * (i & 1 ? 1 : -1);    \
> +       r +=3D a[i];                              \
> +       q -=3D b[i];                              \
> +       asm volatile ("" ::: "memory");         \
> +      }                                                \
> +    TYPE res =3D reduc_plus_##TYPE (a, b);       \
> +    if (res !=3D r * q)                          \
> +      __builtin_abort ();                      \
> +  }
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  TEST_ALL (TEST_REDUC_PLUS);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_stric=
t_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_r=
un-2.c
> new file mode 100644
> index 00000000000..0a4238d96f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2=
.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=3Driscv-autovec-preference=3Dscalabl=
e -fno-vect-cost-model" } */
> +
> +#include "reduc_strict-2.c"
> +
> +#define NROWS 5
> +
> +#define TEST_REDUC_PLUS(TYPE)                                  \
> +  {                                                            \
> +    TYPE a[NROWS][NUM_ELEMS (TYPE)];                           \
> +    TYPE r[NROWS];                                             \
> +    TYPE expected[NROWS] =3D {};                                 \
> +    for (int i =3D 0; i < NROWS; ++i)                            \
> +      for (int j =3D 0; j < NUM_ELEMS (TYPE); ++j)               \
> +       {                                                       \
> +         a[i][j] =3D (i * 0.1 + j * 0.6) * (j & 1 ? 1 : -1);     \
> +         expected[i] +=3D a[i][j];                               \
> +         asm volatile ("" ::: "memory");                       \
> +       }                                                       \
> +    reduc_plus_##TYPE (a, r, NROWS);                           \
> +    for (int i =3D 0; i < NROWS; ++i)                            \
> +      if (r[i] !=3D expected[i])                                 \
> +       __builtin_abort ();                                     \
> +  }
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  TEST_ALL (TEST_REDUC_PLUS);
> +  return 0;
> +}
> --
> 2.36.1
>