public inbox for gcc-bugs@sourceware.org
* [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: juzhe.zhong at rivai dot ai @ 2023-09-13 9:31 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

            Bug ID: 111401
           Summary: Middle-end: Missed optimization of
                    MASK_LEN_FOLD_LEFT_PLUS
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

There is a case where I think the loop vectorizer misses an optimization:
https://godbolt.org/z/x5sjdenhM

double
foo2 (double *__restrict a, double init, int *__restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      init += a[i];
  return init;
}

It generates the following GIMPLE IR:

  _60 = .SELECT_VL (ivtmp_58, 4);
  ...
  vect__ifc__35.14_56 = .VCOND_MASK (mask__23.10_50, vect__8.13_54, { 0.0, 0.0, 0.0, 0.0 });
  _36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.14_56, { -1, -1, -1, -1 }, _60, 0);

The mask of the MASK_LEN_FOLD_LEFT_PLUS is the dummy all-ones mask
{ -1, -1, ..., -1 }.  I think we should forward the mask of the VCOND_MASK
into the MASK_LEN_FOLD_LEFT_PLUS; then we can eliminate the VCOND_MASK.

I don't know where the optimal place to do this optimization is.  Should it
be match.pd, or the loop vectorizer code?

Thanks.
* [Bug c/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: rguenth at gcc dot gnu.org @ 2023-09-13 9:46 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
     Ever confirmed|0           |1
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2023-09-13

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer sees if-converted code like

  <bb 3> [local count: 955630224]:
  # init_20 = PHI <_36(8), init_12(D)(18)>
  # i_22 = PHI <i_18(8), 0(18)>
  _1 = (long unsigned int) i_22;
  _2 = _1 * 4;
  _3 = cond_15(D) + _2;
  _4 = *_3;
  _23 = _4 != 0;
  _6 = _1 * 8;
  _38 = _37 + _6;
  _7 = (double *) _38;
  _8 = .MASK_LOAD (_7, 64B, _23);
  _ifc__35 = _23 ? _8 : 0.0;
  _36 = init_20 + _ifc__35;
  i_18 = i_22 + 1;
  if (n_13(D) > i_18)

so what it produces matches up here.  There's the possibility to modify the
if-conversion handling to use a COND_ADD instead of the COND_EXPR plus ADD;
I think that would be the best thing here.  See
tree-if-conv.cc:is_cond_scalar_reduction / convert_scalar_cond_reduction.

I think this is also wrong code when signed zeros are involved.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: rdapp at gcc dot gnu.org @ 2023-09-13 16:52 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

Robin Dapp <rdapp at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
                 CC|            |rdapp at gcc dot gnu.org

--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I played around with this a bit.  Emitting a COND_ADD in if-convert is easy:

  _ifc__35 = .COND_ADD (_23, init_20, _8, init_20);

However, during reduction handling we rely on the reduction being a gimple
assign and a binary operation, so I needed to fix some places and indices as
well as use the proper mask.

What complicates things a bit is that we assume that "init_20" (i.e. the
reduction def) occurs once, while we have it twice in the COND_ADD.  I just
special-cased that for now.  Is this the proper thing to do?

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7..e99add3cf16 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
-  if (code == PLUS_EXPR)
+  if (code == PLUS_EXPR || code == IFN_COND_ADD)
     {
       *reduc_fn = IFN_FOLD_LEFT_PLUS;
       return true;
@@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
           return NULL;
         }

-      nphi_def_loop_uses++;
-      phi_use_stmt = use_stmt;
+      if (use_stmt != phi_use_stmt)
+        {
+          nphi_def_loop_uses++;
+          phi_use_stmt = use_stmt;
+        }

@@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       if (i == STMT_VINFO_REDUC_IDX (stmt_info))
         continue;

+      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+        continue;
+

Apart from that I think what's mainly missing is making the added code nicer.
Going to attach a tentative patch later.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: rdapp at gcc dot gnu.org @ 2023-09-13 21:25 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Several other things came up, so I'm just going to post the latest status
here without having revised or tested it.  Going to try fixing it and
testing tomorrow.

--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
-  if (code == PLUS_EXPR)
+  if (code == PLUS_EXPR || code == IFN_COND_ADD)
     {
       *reduc_fn = IFN_FOLD_LEFT_PLUS;
       return true;
@@ -4106,8 +4106,13 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
           return NULL;
         }

-      nphi_def_loop_uses++;
-      phi_use_stmt = use_stmt;
+      /* We might have two uses in the same instruction, only count them as
+         one.  */
+      if (use_stmt != phi_use_stmt)
+        {
+          nphi_def_loop_uses++;
+          phi_use_stmt = use_stmt;
+        }
     }

   tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
@@ -6861,7 +6866,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
                                gimple **vec_stmt, slp_tree slp_node,
                                gimple *reduc_def_stmt,
                                tree_code code, internal_fn reduc_fn,
-                               tree ops[3], tree vectype_in,
+                               tree *ops, int num_ops, tree vectype_in,
                                int reduc_index, vec_loop_masks *masks,
                                vec_loop_lens *lens)
 {
@@ -6883,11 +6888,24 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
                         TYPE_VECTOR_SUBPARTS (vectype_in)));

-  tree op0 = ops[1 - reduc_index];
+  /* The operands either come from a binary operation or a COND_ADD operation.
+     The former is a gimple assign and the latter is a gimple call with four
+     arguments.  */
+  gcc_assert (num_ops == 2 || num_ops == 4);
+  bool is_cond_add = num_ops == 4;
+  tree op0, opmask;
+  if (!is_cond_add)
+    op0 = ops[1 - reduc_index];
+  else
+    {
+      op0 = ops[2];
+      opmask = ops[0];
+      gcc_assert (!slp_node);
+    }

   int group_size = 1;
   stmt_vec_info scalar_dest_def_info;
-  auto_vec<tree> vec_oprnds0;
+  auto_vec<tree> vec_oprnds0, vec_opmask;
   if (slp_node)
     {
       auto_vec<vec<tree> > vec_defs (2);
@@ -6903,9 +6921,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
       vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
                                      op0, &vec_oprnds0);
       scalar_dest_def_info = stmt_info;
+      if (is_cond_add)
+        {
+          vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
+                                         opmask, &vec_opmask);
+          gcc_assert (vec_opmask.length() == 1);
+        }
     }

-  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
+  gimple *sdef = scalar_dest_def_info->stmt;
+  tree scalar_dest = is_gimple_call (sdef)
+                     ? gimple_call_lhs (sdef)
+                     : gimple_assign_lhs (scalar_dest_def_info->stmt);
   tree scalar_type = TREE_TYPE (scalar_dest);
   tree reduc_var = gimple_phi_result (reduc_def_stmt);
@@ -6945,7 +6972,11 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
                                            i, 1);
       signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
       bias = build_int_cst (intQI_type_node, biasval);
-      mask = build_minus_one_cst (truth_type_for (vectype_in));
+      /* If we have a COND_ADD take its mask.  Otherwise use {-1, ...}.  */
+      if (is_cond_add)
+        mask = vec_opmask[0];
+      else
+        mask = build_minus_one_cst (truth_type_for (vectype_in));
     }

   /* Handle MINUS by adding the negative.  */
@@ -7440,6 +7471,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       if (i == STMT_VINFO_REDUC_IDX (stmt_info))
         continue;

+      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+        continue;
+
       /* There should be only one cycle def in the stmt, the one
          leading to reduc_def.  */
       if (VECTORIZABLE_CYCLE_DEF (dt))
@@ -8211,8 +8245,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
       vec_num = 1;
     }

-  code_helper code = canonicalize_code (op.code, op.type);
-  internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
+  code_helper code (op.code);
+  internal_fn cond_fn;
+
+  if (code.is_internal_fn ())
+    {
+      internal_fn ifn = internal_fn (op.code);
+      code = canonicalize_code (conditional_internal_fn_code (ifn), op.type);
+      cond_fn = ifn;
+    }
+  else
+    {
+      code = canonicalize_code (op.code, op.type);
+      cond_fn = get_conditional_internal_fn (code, op.type);
+    }
+
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
@@ -8240,8 +8287,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
       gcc_assert (code.is_tree_code ());
       return vectorize_fold_left_reduction
           (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
-           tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
-           lens);
+           tree_code (code), reduc_fn, op.ops, op.num_ops, vectype_in,
+           reduc_index, masks, lens);
     }

   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: rguenther at suse dot de @ 2023-09-14 6:46 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
>
> --- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> I played around with this a bit.  Emitting a COND_ADD in if-convert is easy:
>
> _ifc__35 = .COND_ADD (_23, init_20, _8, init_20);
>
> However, during reduction handling we rely on the reduction being a gimple
> assign and binary operation, though so I needed to fix some places and
> indices as well as the proper mask.
>
> What complicates things a bit is that we assume that "init_20" (i.e. the
> reduction def) occurs once when we have it twice in the COND_ADD.  I just
> special cased that for now.  Is this the proper thing to do?

I think so - we should ignore a use in the else value when the other use
is in that same stmt.

> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7..e99add3cf16 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> -  if (code == PLUS_EXPR)
> +  if (code == PLUS_EXPR || code == IFN_COND_ADD)
>      {
>        *reduc_fn = IFN_FOLD_LEFT_PLUS;
>        return true;
> @@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
>            return NULL;
>          }
>
> -      nphi_def_loop_uses++;
> -      phi_use_stmt = use_stmt;
> +      if (use_stmt != phi_use_stmt)
> +        {
> +          nphi_def_loop_uses++;
> +          phi_use_stmt = use_stmt;
> +        }
>
> @@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        if (i == STMT_VINFO_REDUC_IDX (stmt_info))
>          continue;
>
> +      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> +        continue;
> +
>
> Apart from that I think what's mainly missing is making the added code
> nicer.  Going to attach a tentative patch later.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: rguenther at suse dot de @ 2023-09-14 6:51 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
>
> --- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> Several other things came up, so I'm just going to post the latest status
> here without having revised or tested it.  Going to try fixing it and
> testing tomorrow.

I think what's important to do is make sure targets without masking are
still getting the cond-reduction code generation (but with the signed-zero
issue fixed).  Using a cond_add is probably better than the vec_cond + add
even for the not fold-left reduction case.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: rdapp at gcc dot gnu.org @ 2023-09-14 15:07 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #6 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Created attachment 55902
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55902&action=edit
Tentative

You're referring to the case where we have init = -0.0, the condition is
false, and we end up wrongly doing -0.0 + 0.0 = 0.0?
I suppose -0.0 is the proper neutral element for PLUS (and WIDEN_SUM?) when
honoring signed zeros?  And 0.0 for MINUS?  Doesn't that also depend on the
rounding mode?

neutral_op_for_reduction could return a -0 for PLUS if we honor it for that
type.  Or is that too intrusive?  Guess I should add a test case for that as
well.

Another thing is that swapping operands is not as easy with COND_ADD because
the addition would be in the else.  I'd punt for that case for now.

Next problem - might be a mistake on my side.  For avx512 we create a
COND_ADD but the respective MASK_FOLD_LEFT_PLUS is not available, causing us
to create numerous vec_extracts as fallback that increase the cost until we
don't vectorize anymore.  Therefore I added a
vectorized_internal_fn_supported_p (IFN_FOLD_LEFT_PLUS, TREE_TYPE (lhs)).
SLP paths and ncopies != 1 are excluded as well.  Not really happy with how
the patch looks now, but at least the testsuites on aarch64 and x86 pass.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: rguenth at gcc dot gnu.org @ 2023-09-15 6:42 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #6)
> Created attachment 55902 [details]
> Tentative
>
> You're referring to the case where we have init = -0.0, the condition is
> false and we end up wrongly doing -0.0 + 0.0 = 0.0?
> I suppose -0.0 is the proper neutral element for PLUS (and WIDEN_SUM?) when
> honoring signed zeros?  And 0.0 for MINUS?  Doesn't that also depend on the
> rounding mode?

Yes, if the rounding mode isn't known there isn't a working neutral element.

> neutral_op_for_reduction could return a -0 for PLUS if we honor it for that
> type.  Or is that too intrusive?

I suppose that could work, but we need to check that we're not using this
for the initial value.

> Guess I should add a test case for that as well.
>
> Another thing is that swapping operands is not as easy with COND_ADD
> because the addition would be in the else.  I'd punt for that case for now.
>
> Next problem - might be a mistake on my side.  For avx512 we create a
> COND_ADD but the respective MASK_FOLD_LEFT_PLUS is not available, causing
> us to create numerous vec_extracts as fallback that increase the cost until
> we don't vectorize anymore.

Yeah, but then a fold-left reduction wasn't necessary in the first place?
We should avoid that (it's slow even when the target supports it) when
possible.

> Therefore I added a
> vectorized_internal_fn_supported_p (IFN_FOLD_LEFT_PLUS, TREE_TYPE (lhs)).
> SLP paths and ncopies != 1 are excluded as well.  Not really happy with how
> the patch looks now but at least the testsuites on aarch and x86 pass.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: cvs-commit at gcc dot gnu.org @ 2023-11-02 10:50 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>:

https://gcc.gnu.org/g:01c18f58d37865d5f3bbe93e666183b54ec608c7

commit r14-5076-g01c18f58d37865d5f3bbe93e666183b54ec608c7
Author: Robin Dapp <rdapp@ventanamicro.com>
Date:   Wed Sep 13 22:19:35 2023 +0200

    ifcvt/vect: Emit COND_OP for conditional scalar reduction.

    As described in PR111401 we currently emit a COND and a PLUS expression
    for conditional reductions.  This makes it difficult to combine both
    into a masked reduction statement later.  This patch improves that by
    directly emitting a COND_ADD/COND_OP during ifcvt and adjusting some
    vectorizer code to handle it.

    It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
    is true.

    gcc/ChangeLog:

            PR middle-end/111401
            * internal-fn.cc (internal_fn_else_index): New function.
            * internal-fn.h (internal_fn_else_index): Define.
            * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
            if supported.
            (predicate_scalar_phi): Add whitespace.
            * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
            (neutral_op_for_reduction): Return -0 for PLUS.
            (check_reduction_path): Don't count else operand in COND_OP.
            (vect_is_simple_reduction): Ditto.
            (vect_create_epilog_for_reduction): Fix whitespace.
            (vectorize_fold_left_reduction): Add COND_OP handling.
            (vectorizable_reduction): Don't count else operand in COND_OP.
            (vect_transform_reduction): Add COND_OP handling.
            * tree-vectorizer.h (neutral_op_for_reduction): Add default
            parameter.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
            * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
            * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
            * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
From: juzhe.zhong at rivai dot ai @ 2023-11-02 22:40 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
         Resolution|---     |FIXED
             Status|NEW     |RESOLVED

--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed