From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, richard.sandiford@arm.com, Ju-Zhe Zhong
Subject: [PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction
Date: Sat, 15 Jul 2023 07:45:00 +0800
Message-Id: <20230714234500.75826-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

Hi, Richard and Richi.
This patch adds the mask_len_fold_left_plus pattern to support in-order
floating-point reductions on targets that use length-based loop control.

Consider the following case:

double
foo2 (double *__restrict a,
      double init,
      int *__restrict cond,
      int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      init += a[i];
  return init;
}

ARM SVE:

...
vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
_36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
...

For RVV, we want to see:

...
_36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
...

gcc/ChangeLog:

	* doc/md.texi: Add mask_len_fold_left_plus.
	* internal-fn.cc (mask_len_fold_left_direct): Ditto.
	(expand_mask_len_fold_left_optab_fn): Ditto.
	(direct_mask_len_fold_left_optab_supported_p): Ditto.
	* internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
	* optabs.def (OPTAB_D): Ditto.

---
 gcc/doc/md.texi     | 13 +++++++++++++
 gcc/internal-fn.cc  |  5 +++++
 gcc/internal-fn.def |  3 +++
 gcc/optabs.def      |  1 +
 4 files changed, 22 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index cbcb992e5d7..6f44e66399d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5615,6 +5615,19 @@ no reassociation.
 Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 (operand 3) that specifies which elements of the source vector should be added.
 
+@cindex @code{mask_len_fold_left_plus_@var{m}} instruction pattern
+@item @code{mask_len_fold_left_plus_@var{m}}
+Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
+(operand 3), length operand (operand 4) and bias operand (operand 5), and
+performs the following operation strictly in order (no reassociation):
+
+@smallexample
+operand0 = operand1;
+for (i = 0; i < operand4 + operand5; i++)
+  if (operand3[i])
+    operand0 += operand2[i];
+@end smallexample
+
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e698f0bffc7..2bf4fc492fe 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -190,6 +190,7 @@ init_internal_fns ()
 #define fold_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
+#define mask_len_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
@@ -3890,6 +3891,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
 #define expand_mask_fold_left_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 3)
 
+#define expand_mask_len_fold_left_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
+
 #define expand_check_ptrs_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 4)
 
@@ -3997,6 +4001,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
 #define direct_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
+#define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
 #define direct_vec_set_optab_supported_p direct_optab_supported_p
 #define direct_vec_extract_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index ea750a921ed..d3aec51b1f2 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -319,6 +319,9 @@ DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
 DEF_INTERNAL_OPTAB_FN (MASK_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
                        mask_fold_left_plus, mask_fold_left)
 
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
+                       mask_len_fold_left_plus, mask_len_fold_left)
+
 /* Unary math functions.  */
 DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
 DEF_INTERNAL_FLT_FN (ACOSH, ECF_CONST, acosh, unary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3dae228fba6..7023392979e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -385,6 +385,7 @@ OPTAB_D (reduc_ior_scal_optab, "reduc_ior_scal_$a")
 OPTAB_D (reduc_xor_scal_optab, "reduc_xor_scal_$a")
 OPTAB_D (fold_left_plus_optab, "fold_left_plus_$a")
 OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
+OPTAB_D (mask_len_fold_left_plus_optab, "mask_len_fold_left_plus_$a")
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
 
-- 
2.36.1