* [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
  2023-07-21 10:05 UTC, juzhe.zhong
  To: gcc-patches; Cc: richard.sandiford, rguenther, Ju-Zhe Zhong

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Hi Richard and Richi,

This patch supports floating-point in-order reduction for length-based
loop control.

Consider the following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
    result += a[i];
  return result;
}

When compiled **without** -ffast-math on ARM SVE, we end up with:

  loop_mask = WHILE_ULT
  result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

RVV uses length-based loop control instead of a mask, so with this
patch we expect to see:

  loop_len = SELECT_VL
  result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)

gcc/ChangeLog:

	* tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
	(vectorize_fold_left_reduction): Ditto.
	(vectorizable_reduction): Ditto.
	(vect_transform_reduction): Ditto.
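The transformation described above can be modelled in scalar C.  This is a
hedged sketch, not GCC code: the helper names are invented, and SELECT_VL is
modelled as min (remaining, vf), whereas real hardware may legally pick other
lengths.  The point it illustrates is why MASK_LEN_FOLD_LEFT_PLUS is usable
without -ffast-math: the accumulation is strictly left to right, so the
result is bit-identical to the scalar loop regardless of the vector length.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of an in-order (fold-left) reduction over the first
   LEN elements: strictly left-to-right accumulation, so no FP
   reassociation occurs.  This is the effect of
   MASK_LEN_FOLD_LEFT_PLUS with an all-ones mask.  */
static float
fold_left_plus (float acc, const float *v, size_t len)
{
  for (size_t i = 0; i < len; i++)
    acc += v[i];
  return acc;
}

/* Length-controlled strip-mined loop.  SELECT_VL is modelled here as
   min (remaining, vf); VF stands in for the hardware vector length.  */
static float
foo_length_controlled (const float *a, size_t n, size_t vf)
{
  float result = 1.0f;
  for (size_t i = 0; i < n; )
    {
      size_t len = (n - i) < vf ? (n - i) : vf;  /* SELECT_VL model */
      result = fold_left_plus (result, a + i, len);
      i += len;
    }
  return result;
}
```

Because the accumulation is in order, running the model with different vf
values produces the same result as the plain scalar loop; an out-of-order
(tree) reduction would not have that guarantee for floating point.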
---
 gcc/tree-vect-loop.cc | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..59ab7879d55 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;

   switch (reduc_fn)
     {
     case IFN_FOLD_LEFT_PLUS:
       mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+      mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
       break;

     default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
				       OPTIMIZE_FOR_SPEED))
     return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+				      OPTIMIZE_FOR_SPEED))
+    return mask_len_reduc_fn;
   return IFN_LAST;
 }

@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
			       gimple *reduc_def_stmt,
			       tree_code code, internal_fn reduc_fn,
			       tree ops[3], tree vectype_in,
-			       int reduc_index, vec_loop_masks *masks)
+			       int reduc_index, vec_loop_masks *masks,
+			       vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
     {
       gimple *new_stmt;
       tree mask = NULL_TREE;
+      tree len = NULL_TREE;
+      tree bias = NULL_TREE;
       if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
	mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
+      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+	{
+	  len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+				   i, 1);
+	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+	  bias = build_int_cst (intQI_type_node, biasval);
+	  mask = build_minus_one_cst (truth_type_for (vectype_in));
+	}

       /* Handle MINUS by adding the negative.  */
       if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
	 the preceding operation.  */
       if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
	{
-	  if (mask && mask_reduc_fn != IFN_LAST)
+	  if (len && mask && mask_reduc_fn != IFN_LAST)
+	    new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+						   def0, mask, len, bias);
+	  else if (mask && mask_reduc_fn != IFN_LAST)
	    new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
						   def0, mask);
	  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
     {
       vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+      vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
       internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);

       if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
	  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
	}
       else
-	vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-			       vectype_in, NULL);
+	{
+	  internal_fn mask_reduc_fn
+	    = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+	  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+	    vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+				  vectype_in, 1);
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
+				   vectype_in, NULL);
+	}
     }
   return true;
 }
@@ -8137,6 +8166,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   code_helper code = canonicalize_code (op.code, op.type);
   internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);

   /* Transform.  */
@@ -8162,7 +8192,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   gcc_assert (code.is_tree_code ());
   return vectorize_fold_left_reduction
     (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
-     tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks);
+     tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
+     lens);
 }

 bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
--
2.36.3
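As a rough model of the 5-operand internal call the patch emits,
MASK_LEN_FOLD_LEFT_PLUS (reduc_var, def0, mask, len, bias), the sketch below
assumes the usual convention for GCC's length-controlled internal functions:
the call operates on the first len + bias elements (bias is 0 on most targets
and -1 on targets with biased length operands, mirroring len_load/len_store),
and the patch passes an all-ones mask so only the length limits the
reduction.  The function name is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of
     MASK_LEN_FOLD_LEFT_PLUS (reduc_var, def0, mask, len, bias):
   reduce the first (len + bias) elements whose mask bit is set,
   strictly in order.  With the all-ones mask the patch builds, the
   length operand alone decides how many elements participate.  */
static float
mask_len_fold_left_plus_model (float reduc_var, const float *def0,
                               const unsigned char *mask,
                               ptrdiff_t len, int bias)
{
  for (ptrdiff_t i = 0; i < len + bias; i++)
    if (mask[i])
      reduc_var += def0[i];   /* in-order, like the scalar loop */
  return reduc_var;
}
```

Under this model a target with bias -1 passes a length one larger than the
element count, which is why the patch materializes the bias from
LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS as an extra QImode constant operand.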
* Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
  2023-07-21 10:51 UTC, Richard Biener
  To: Ju-Zhe Zhong; Cc: gcc-patches, richard.sandiford

On Fri, 21 Jul 2023, juzhe.zhong@rivai.ai wrote:

> [...]
>
> @@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
> 	 the preceding operation.  */
>        if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
> 	{
> -	  if (mask && mask_reduc_fn != IFN_LAST)
> +	  if (len && mask && mask_reduc_fn != IFN_LAST)

check mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS instead?

> +	    new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
> +						   def0, mask, len, bias);
> +	  else if (mask && mask_reduc_fn != IFN_LAST)

Likewise.

Otherwise looks good to me.

Richard.

> [...]

--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
* Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
  2023-07-21 10:59 UTC, juzhe.zhong
  To: rguenther; Cc: gcc-patches, richard.sandiford

Thanks, Richi.

The comments are addressed in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625167.html

Bootstrap and regression testing are on the way.

juzhe.zhong@rivai.ai

On Fri, 21 Jul 2023, Richard Biener wrote:
> [...]
* Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
  2023-07-21 11:08 UTC, juzhe.zhong
  To: rguenther; Cc: gcc-patches, richard.sandiford

Sorry, I missed one of the fixes.  It is now fixed as you suggested in V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625169.html

The code now reads:

  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
    new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
					   def0, mask, len, bias);
  else if (mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
    new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
					   def0, mask);
  else
    new_stmt = gimple_build_call_internal (reduc_fn, 2, reduc_var, def0);

Sorry for that.  Bootstrap and regression testing are running.

juzhe.zhong@rivai.ai

On Fri, 21 Jul 2023, Richard Biener wrote:
> [...]
* Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
  2023-07-23  4:32 UTC, Lehua Ding
  To: rguenther; Cc: gcc-patches, richard.sandiford, juzhe.zhong

Hi Richard,

Bootstrap and regression tests passed on x86, and no new test cases fail
on AArch64 with the V5 patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625293.html

Is the V5 patch OK for trunk?

Best,
Lehua
* Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control
  2023-07-24  6:44 UTC, Richard Biener
  To: Lehua Ding; Cc: gcc-patches, richard.sandiford, juzhe.zhong

On Sun, 23 Jul 2023, Lehua Ding wrote:

> Hi Richard,
>
> Bootstrap and regression are passed on X86 and no new testcases fail
> on AArch64 with V5 patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625293.html
>
> V5 patch is ok for trunk?

Yes.