Date: Wed, 19 Jul 2023 08:17:11 +0000 (UTC)
From: Richard Biener
To: Ju-Zhe Zhong
cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Re: [PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction
On Sat, 15 Jul 2023, juzhe.zhong@rivai.ai wrote:

> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
>
> This patch adds the mask_len_fold_left_plus pattern to support in-order
> floating-point reduction for targets that support length-based loop control.
>
> Consider the following case:
> double
> foo2 (double *__restrict a,
>       double init,
>       int *__restrict cond,
>       int n)
> {
>   for (int i = 0; i < n; i++)
>     if (cond[i])
>       init += a[i];
>   return init;
> }
>
> ARM SVE:
>
> ...
> vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
> vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
> _36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
> ...
>
> For RVV, we want to see:
> ...
> _36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
> ...

OK.

Richard.

> gcc/ChangeLog:
>
> 	* doc/md.texi: Add mask_len_fold_left_plus.
> 	* internal-fn.cc (mask_len_fold_left_direct): Ditto.
> 	(expand_mask_len_fold_left_optab_fn): Ditto.
> 	(direct_mask_len_fold_left_optab_supported_p): Ditto.
> 	* internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
> 	* optabs.def (OPTAB_D): Ditto.
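To make the RVV form concrete, here is a minimal C sketch of what .MASK_LEN_FOLD_LEFT_PLUS is meant to compute, following the semantics the patch documents in md.texi (start from operand 1, add operand 2[i] where operand 3[i] is set, for i < LEN + BIAS, strictly in order). It assumes a fixed vectorization factor of 4 and a bias of 0, both purely for illustration; the names ref_mask_len_fold_left_plus and foo2_len_loop are hypothetical and are not GCC APIs. The sketch checks that a length-controlled, strip-mined version of foo2 matches the scalar loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed vectorization factor for this sketch only.  */
#define VF 4

/* Scalar reference model of the documented mask_len_fold_left_plus
   semantics: starting from INIT, add VEC[i] for every lane i with
   MASK[i] set and i < LEN + BIAS, strictly left to right.  */
static double
ref_mask_len_fold_left_plus (double init, const double *vec,
                             const bool *mask, int len, int bias)
{
  assert (len + bias >= 0 && len + bias <= VF);
  double acc = init;
  for (int i = 0; i < len + bias; i++)
    if (mask[i])
      acc += vec[i];
  return acc;
}

/* The scalar loop from the example (foo2).  */
static double
foo2 (const double *a, double init, const int *cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      init += a[i];
  return init;
}

/* Rough emulation of a length-controlled vector loop: each iteration
   handles loop_len = min (n - i, VF) elements, builds the control mask
   from cond[], and folds the chunk into the accumulator in order.
   A bias of 0 is assumed.  */
static double
foo2_len_loop (const double *a, double init, const int *cond, int n)
{
  for (int i = 0; i < n; i += VF)
    {
      int loop_len = n - i < VF ? n - i : VF;
      double vec[VF];
      bool control_mask[VF];
      for (int j = 0; j < loop_len; j++)
        {
          vec[j] = a[i + j];
          control_mask[j] = cond[i + j] != 0;
        }
      init = ref_mask_len_fold_left_plus (init, vec, control_mask,
                                          loop_len, 0);
    }
  return init;
}
```

Because each chunk is folded into the running accumulator before the next chunk starts, the additions happen in exactly the scalar order, so the two versions compare equal bit for bit rather than only approximately.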
>
> ---
>  gcc/doc/md.texi     | 13 +++++++++++++
>  gcc/internal-fn.cc  |  5 +++++
>  gcc/internal-fn.def |  3 +++
>  gcc/optabs.def      |  1 +
>  4 files changed, 22 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cbcb992e5d7..6f44e66399d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5615,6 +5615,19 @@ no reassociation.
>  Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  (operand 3) that specifies which elements of the source vector should be added.
>
> +@cindex @code{mask_len_fold_left_plus_@var{m}} instruction pattern
> +@item @code{mask_len_fold_left_plus_@var{m}}
> +Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> +(operand 3), len operand (operand 4) and bias operand (operand 5) that
> +performs following operations strictly in-order (no reassociation):
> +
> +@smallexample
> +operand0 = operand1;
> +for (i = 0; i < LEN + BIAS; i++)
> +  if (operand3[i])
> +    operand0 += operand2[i];
> +@end smallexample
> +
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e698f0bffc7..2bf4fc492fe 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -190,6 +190,7 @@ init_internal_fns ()
>  #define fold_extract_direct { 2, 2, false }
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
> +#define mask_len_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
>
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
> @@ -3890,6 +3891,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
>  #define expand_mask_fold_left_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 3)
>
> +#define expand_mask_len_fold_left_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
> +
>  #define expand_check_ptrs_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 4)
>
> @@ -3997,6 +4001,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
> +#define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
>  #define direct_vec_set_optab_supported_p direct_optab_supported_p
>  #define direct_vec_extract_optab_supported_p direct_optab_supported_p
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index ea750a921ed..d3aec51b1f2 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -319,6 +319,9 @@ DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>  DEF_INTERNAL_OPTAB_FN (MASK_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>                         mask_fold_left_plus, mask_fold_left)
>
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
> +                       mask_len_fold_left_plus, mask_len_fold_left)
> +
>  /* Unary math functions.  */
>  DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
>  DEF_INTERNAL_FLT_FN (ACOSH, ECF_CONST, acosh, unary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 3dae228fba6..7023392979e 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -385,6 +385,7 @@ OPTAB_D (reduc_ior_scal_optab, "reduc_ior_scal_$a")
>  OPTAB_D (reduc_xor_scal_optab, "reduc_xor_scal_$a")
>  OPTAB_D (fold_left_plus_optab, "fold_left_plus_$a")
>  OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
> +OPTAB_D (mask_len_fold_left_plus_optab, "mask_len_fold_left_plus_$a")
>
>  OPTAB_D (extract_last_optab, "extract_last_$a")
>  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
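The md.texi hunk specifies that the new pattern adds the active elements "strictly in-order (no reassociation)". A small C sketch shows why that matters for floating point. It assumes IEEE double arithmetic with no excess precision (checked via FLT_EVAL_METHOD) and no -ffast-math; sum_two_lanes is a hypothetical stand-in for a reassociated vector reduction, not anything GCC emits. Summing {1e16, 1.0, -1e16, 1.0} left to right gives 1.0, because 1e16 + 1.0 rounds back to 1e16, while splitting the same values across two lanes gives 2.0.

```c
#include <assert.h>
#include <float.h>

/* The example relies on plain IEEE double rounding for every addition;
   x87-style excess precision (FLT_EVAL_METHOD != 0) would change it.  */
_Static_assert (FLT_EVAL_METHOD == 0, "sketch assumes no excess precision");

/* Strict left-to-right accumulation, the order the fold-left patterns
   preserve.  */
static double
sum_in_order (const double *x, int n)
{
  assert (n >= 0);
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc += x[i];
  return acc;
}

/* Hypothetical two-lane reassociated reduction: lane 0 accumulates the
   even-indexed elements, lane 1 the odd-indexed ones, and the two partial
   sums are combined at the end.  */
static double
sum_two_lanes (const double *x, int n)
{
  assert (n >= 0);
  double lane[2] = { 0.0, 0.0 };
  for (int i = 0; i < n; i++)
    lane[i % 2] += x[i];
  return lane[0] + lane[1];
}
```

Without permission to reassociate, a vectorized loop has to reproduce the scalar order of additions, which is the reason in-order reductions such as foo2 need a dedicated fold-left pattern rather than an ordinary tree reduction.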