From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id BF1163858C41 for ; Mon, 21 Aug 2023 14:16:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BF1163858C41 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4DADA2F4; Mon, 21 Aug 2023 07:16:52 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C17473F740; Mon, 21 Aug 2023 07:16:10 -0700 (PDT) From: Richard Sandiford To: Juzhe-Zhong Mail-Followup-To: Juzhe-Zhong ,gcc-patches@gcc.gnu.org, rguenther@suse.de, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org, rguenther@suse.de Subject: Re: [PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold References: <20230816131205.233568-1-juzhe.zhong@rivai.ai> Date: Mon, 21 Aug 2023 15:16:09 +0100 In-Reply-To: <20230816131205.233568-1-juzhe.zhong@rivai.ai> (Juzhe-Zhong's message of "Wed, 16 Aug 2023 21:12:05 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-25.6 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Juzhe-Zhong writes: > Hi, Richard and Richi. > > Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math. > It's supported in tree-ssa-math-opts.cc. However, GCC failed to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS. > > Consider this following case: > #define TEST_TYPE(TYPE) \ > __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \ > TYPE *__restrict a, \ > TYPE *__restrict b, int n) \ > { \ > for (int i = 0; i < n; i++) \ > dst[i] -= a[i] * b[i]; \ > } > > #define TEST_ALL() \ > TEST_TYPE (float) \ > > TEST_ALL () > > Gimple IR for RVV: > > ... > _39 = -vect__8.14_26; > vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0); > ... > > This is because this following piece of codes in tree-ssa-math-opts.cc: > > if (len) > fma_stmt > = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2, > addop, else_value, len, bias); > else if (cond) > fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1, > op2, addop, else_value); > else > fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); > gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt)); > gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun, > use_stmt)); > gsi_replace (&gsi, fma_stmt, true); > /* Follow all SSA edges so that we generate FMS, FNMA and FNMS > regardless of where the negation occurs. */ > gimple *orig_stmt = gsi_stmt (gsi); > if (fold_stmt (&gsi, follow_all_ssa_edges)) > { > if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) > gcc_unreachable (); > update_stmt (gsi_stmt (gsi)); > } > > 'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA ====> COND_LEN_FNMA. > > This patch support STMT fold into: > > vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0); > > Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments. > > Extend maximum num ops: > - static const unsigned int MAX_NUM_OPS = 5; > + static const unsigned int MAX_NUM_OPS = 7; > > Bootstrap and Regtest on X86 passed. > > Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend. > > Testing on aarch64 is on progress. > > gcc/ChangeLog: > > * genmatch.cc (decision_tree::gen): Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold. > * gimple-match-exports.cc (gimple_simplify): Ditto. > (gimple_resimplify6): New function. > (gimple_resimplify7): New function. > (gimple_match_op::resimplify): Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold. > (convert_conditional_op): Ditto. > (build_call_internal): Ditto. > (try_conditional_simplification): Ditto. > (gimple_extract): Ditto. > * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto. > * internal-fn.cc (CASE): Ditto. > > --- > gcc/genmatch.cc | 2 +- > gcc/gimple-match-exports.cc | 124 ++++++++++++++++++++++++++++++++++-- > gcc/gimple-match.h | 19 +++++- > gcc/internal-fn.cc | 11 ++-- > 4 files changed, 144 insertions(+), 12 deletions(-) > > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc > index f46d2e1520d..a1925a747a7 100644 > --- a/gcc/genmatch.cc > +++ b/gcc/genmatch.cc > @@ -4052,7 +4052,7 @@ decision_tree::gen (vec &files, bool gimple) > } > fprintf (stderr, "removed %u duplicate tails\n", rcnt); > > - for (unsigned n = 1; n <= 5; ++n) > + for (unsigned n = 1; n <= 7; ++n) > { > bool has_kids_p = false; > > diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc > index 7aeb4ddb152..895950309b7 100644 > --- a/gcc/gimple-match-exports.cc > +++ b/gcc/gimple-match-exports.cc > @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), > code_helper, tree, tree, tree, tree, tree); > extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), > code_helper, tree, tree, tree, tree, tree, tree); > +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), > + code_helper, tree, tree, tree, tree, tree, tree, > + tree); > +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), > + code_helper, tree, tree, tree, tree, tree, tree, > + tree, tree); > > /* Functions that are needed by gimple-match but that are exported and used in > other places in the compiler. */ > @@ -89,6 +95,8 @@ static bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree)) > static bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree)); > static bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree)); > static bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree)); > +static bool gimple_resimplify6 (gimple_seq *, gimple_match_op *, tree (*)(tree)); > +static bool gimple_resimplify7 (gimple_seq *, gimple_match_op *, tree (*)(tree)); > > /* Match and simplify the toplevel valueized operation THIS. > Replaces THIS with a simplified and/or canonicalized result and > @@ -109,6 +117,10 @@ gimple_match_op::resimplify (gimple_seq *seq, tree (*valueize)(tree)) > return gimple_resimplify4 (seq, this, valueize); > case 5: > return gimple_resimplify5 (seq, this, valueize); > + case 6: > + return gimple_resimplify6 (seq, this, valueize); > + case 7: > + return gimple_resimplify7 (seq, this, valueize); > default: > gcc_unreachable (); > } > @@ -146,7 +158,14 @@ convert_conditional_op (gimple_match_op *orig_op, > if (ifn == IFN_LAST) > return false; > unsigned int num_ops = orig_op->num_ops; > - new_op->set_op (as_combined_fn (ifn), orig_op->type, num_ops + 2); > + unsigned int num_cond_ops = 2; > + if (orig_op->cond.len) > + { > + /* Convert COND_FNMA to COND_LEN_FNMA if it has LEN and BIAS. */ Suggest just: /* Add the length and bias parameters. */ > + ifn = get_len_internal_fn (ifn); > + num_cond_ops = 4; > + } > + new_op->set_op (as_combined_fn (ifn), orig_op->type, num_ops + num_cond_ops); > new_op->ops[0] = orig_op->cond.cond; > for (unsigned int i = 0; i < num_ops; ++i) > new_op->ops[i + 1] = orig_op->ops[i]; > @@ -155,6 +174,11 @@ convert_conditional_op (gimple_match_op *orig_op, > else_value = targetm.preferred_else_value (ifn, orig_op->type, > num_ops, orig_op->ops); > new_op->ops[num_ops + 1] = else_value; > + if (orig_op->cond.len) > + { > + new_op->ops[num_ops + 2] = orig_op->cond.len; > + new_op->ops[num_ops + 3] = orig_op->cond.bias; > + } > return true; > } > /* Helper for gimple_simplify valueizing OP using VALUEIZE and setting > @@ -219,7 +243,9 @@ build_call_internal (internal_fn fn, gimple_match_op *res_op) > res_op->op_or_null (1), > res_op->op_or_null (2), > res_op->op_or_null (3), > - res_op->op_or_null (4)); > + res_op->op_or_null (4), > + res_op->op_or_null (5), > + res_op->op_or_null (6)); > } > > /* RES_OP is the result of a simplification. If it is conditional, > @@ -319,6 +345,7 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op, > { > code_helper op; > tree_code code = conditional_internal_fn_code (ifn); > + int len_index = internal_fn_len_index (ifn); > if (code != ERROR_MARK) > op = code; > else > @@ -330,12 +357,20 @@ try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op, > } > > unsigned int num_ops = res_op->num_ops; > + /* num_cond_ops = 2 for COND_ADD (MASK and ELSE) > + wheras num_cond_ops = 4 for COND_LEN_ADD (MASK, ELSE, LEN and BIAS). */ > + unsigned int num_cond_ops = len_index < 0 ? 2 : 4; > + tree else_value > + = len_index < 0 ? res_op->ops[num_ops - 1] : res_op->ops[num_ops - 3]; > + tree len = len_index < 0 ? NULL_TREE : res_op->ops[num_ops - 2]; > + tree bias = len_index < 0 ? NULL_TREE : res_op->ops[num_ops - 1]; > + unsigned int last_op_index = len_index < 0 ? num_ops - 1 : num_ops - 3; What's this used for? I'm surprised it doesn't cause a bootstrap failure. > gimple_match_op cond_op (gimple_match_cond (res_op->ops[0], > - res_op->ops[num_ops - 1]), > - op, res_op->type, num_ops - 2); > + else_value, len, bias), > + op, res_op->type, num_ops - num_cond_ops); > > memcpy (cond_op.ops, res_op->ops + 1, (num_ops - 1) * sizeof *cond_op.ops); > - switch (num_ops - 2) > + switch (num_ops - num_cond_ops) > { > case 1: > if (!gimple_resimplify1 (seq, &cond_op, valueize)) > @@ -717,7 +752,7 @@ gimple_extract (gimple *stmt, gimple_match_op *res_op, > /* ??? This way we can't simplify calls with side-effects. */ > if (gimple_call_lhs (stmt) != NULL_TREE > && gimple_call_num_args (stmt) >= 1 > - && gimple_call_num_args (stmt) <= 5) > + && gimple_call_num_args (stmt) <= 7) > { > combined_fn cfn; > if (gimple_call_internal_p (stmt)) > @@ -1145,6 +1180,83 @@ gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op, > return canonicalized; > } > > +/* Helper that matches and simplifies the toplevel result from > + a gimple_simplify run (where we don't want to build > + a stmt in case it's used in in-place folding). Replaces > + RES_OP with a simplified and/or canonicalized result and > + returns whether any change was made. */ > + > +static bool > +gimple_resimplify6 (gimple_seq *seq, gimple_match_op *res_op, > + tree (*valueize)(tree)) > +{ > + /* No constant folding is defined for five-operand functions. */ > + > + /* Canonicalize operand order. */ > + bool canonicalized = false; > + int argno = first_commutative_argument (res_op->code, res_op->type); > + if (argno >= 0 > + && tree_swap_operands_p (res_op->ops[argno], res_op->ops[argno + 1])) > + { > + std::swap (res_op->ops[argno], res_op->ops[argno + 1]); > + canonicalized = true; > + } > + > + gimple_match_op res_op2 (*res_op); > + if (gimple_simplify (&res_op2, seq, valueize, > + res_op->code, res_op->type, > + res_op->ops[0], res_op->ops[1], res_op->ops[2], > + res_op->ops[3], res_op->ops[4], res_op->ops[5])) > + { > + *res_op = res_op2; > + return true; > + } > + > + if (maybe_resimplify_conditional_op (seq, res_op, valueize)) > + return true; > + > + return canonicalized; > +} > + > +/* Helper that matches and simplifies the toplevel result from > + a gimple_simplify run (where we don't want to build > + a stmt in case it's used in in-place folding). Replaces > + RES_OP with a simplified and/or canonicalized result and > + returns whether any change was made. */ > + > +static bool > +gimple_resimplify7 (gimple_seq *seq, gimple_match_op *res_op, > + tree (*valueize)(tree)) > +{ > + /* No constant folding is defined for five-operand functions. */ > + > + /* Canonicalize operand order. */ > + bool canonicalized = false; > + int argno = first_commutative_argument (res_op->code, res_op->type); > + if (argno >= 0 > + && tree_swap_operands_p (res_op->ops[argno], res_op->ops[argno + 1])) > + { > + std::swap (res_op->ops[argno], res_op->ops[argno + 1]); > + canonicalized = true; > + } > + > + gimple_match_op res_op2 (*res_op); > + if (gimple_simplify (&res_op2, seq, valueize, > + res_op->code, res_op->type, > + res_op->ops[0], res_op->ops[1], res_op->ops[2], > + res_op->ops[3], res_op->ops[4], res_op->ops[5], > + res_op->ops[6])) > + { > + *res_op = res_op2; > + return true; > + } > + > + if (maybe_resimplify_conditional_op (seq, res_op, valueize)) > + return true; > + > + return canonicalized; > +} > + > /* Return a canonical form for CODE when operating on TYPE. The idea > is to remove redundant ways of representing the same operation so > that code_helpers can be hashed and compared for equality. > diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h > index b20585dca4b..c16cfede826 100644 > --- a/gcc/gimple-match.h > +++ b/gcc/gimple-match.h > @@ -34,6 +34,7 @@ public: > /* Build an unconditional op. */ > gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {} > gimple_match_cond (tree, tree); > + gimple_match_cond (tree, tree, tree, tree); > > gimple_match_cond any_else () const; > > @@ -44,6 +45,16 @@ public: > /* The value to use when the condition is false. This is NULL_TREE if > the operation is unconditional or if the value doesn't matter. */ > tree else_value; > + > + /* The length value for conditional len operation that operation occurs > + when element index < len + bias. This is NULL_TREE if the operation > + is non-LEN operation. */ > + tree len; > + > + /* The bias value for conditional len operation that operation occurs > + when element index < len + bias. This is NULL_TREE if the operation > + is non-LEN operation. */ > + tree bias; I don't think these two need separate comments. How about: /* The length and bias parameters to be applied to a vector operation, so that the condition is forced to false when the element index is >= LEN + BIAS. These are NULL_TREE if the operation isn't applied to vectors, or if no such length limit is in use. */ tree len; tree bias; > }; > > inline > @@ -52,6 +63,12 @@ gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in) > { > } > > +inline > +gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in, > + tree len_in, tree bias_in) > + : cond (cond_in), else_value (else_value_in), len (len_in), bias (bias_in) > +{} > + > /* Return a gimple_match_cond with the same condition but with an > arbitrary ELSE_VALUE. */ > > @@ -93,7 +110,7 @@ public: > bool resimplify (gimple_seq *, tree (*)(tree)); > > /* The maximum value of NUM_OPS. */ > - static const unsigned int MAX_NUM_OPS = 5; > + static const unsigned int MAX_NUM_OPS = 7; > > /* The conditions under which the operation is performed, and the value to > use as a fallback. */ > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index cc1ede58799..be9b93e9f71 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4477,11 +4477,14 @@ get_unconditional_internal_fn (internal_fn ifn) > { > switch (ifn) > { > -#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME; > - FOR_EACH_COND_FN_PAIR(CASE) > +#define CASE(NAME) \ > + case IFN_COND_##NAME: \ > + case IFN_COND_LEN_##NAME: \ > + return IFN_##NAME; > + FOR_EACH_COND_FN_PAIR (CASE) > #undef CASE > - default: > - return IFN_LAST; > + default: > + return IFN_LAST; > } > } The reindentation isn't correct: labels are supposed to line up with the "{". LGTM otherwise. Thanks, Richard