public inbox for gcc-bugs@sourceware.org
* [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
@ 2023-09-13 9:31 juzhe.zhong at rivai dot ai
2023-09-13 9:46 ` [Bug c/111401] " rguenth at gcc dot gnu.org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-09-13 9:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
Bug ID: 111401
Summary: Middle-end: Missed optimization of
MASK_LEN_FOLD_LEFT_PLUS
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
There is a case where I think we miss an optimization in the loop vectorizer:
https://godbolt.org/z/x5sjdenhM
double
foo2 (double *__restrict a,
double init,
int *__restrict cond,
int n)
{
for (int i = 0; i < n; i++)
if (cond[i])
init += a[i];
return init;
}
It generates the following GIMPLE IR:
_60 = .SELECT_VL (ivtmp_58, 4);
...
vect__ifc__35.14_56 = .VCOND_MASK (mask__23.10_50, vect__8.13_54, { 0.0, 0.0, 0.0, 0.0 });
_36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.14_56, { -1, -1, -1, -1 }, _60, 0);
The mask of the MASK_LEN_FOLD_LEFT_PLUS is the dummy all-ones mask
{ -1, -1, ..., -1 }.
I think we should forward the mask of the VCOND_MASK into the
MASK_LEN_FOLD_LEFT_PLUS.
Then we can eliminate the VCOND_MASK.
I don't know where the optimal place to do this optimization is.
Should it be match.pd, or the loop vectorizer code?
Thanks.
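As a rough scalar sketch of the equivalence being asked for (the helper names here are illustrative, not the vectorizer's internal functions): modulo signed zeros, selecting with the mask and then doing an unmasked fold-left sum gives the same result as forwarding the mask into the reduction itself, so the select (and its zero vector) can be dropped.

```c
#include <assert.h>

/* Models .VCOND_MASK followed by .MASK_LEN_FOLD_LEFT_PLUS with a
   dummy all-ones mask: inactive lanes contribute 0.0.  */
double
select_then_reduce (double init, const double *a, const int *mask, int n)
{
  for (int i = 0; i < n; i++)
    init += mask[i] ? a[i] : 0.0;
  return init;
}

/* Models .MASK_LEN_FOLD_LEFT_PLUS with the VCOND_MASK mask forwarded:
   inactive lanes are skipped entirely.  */
double
masked_reduce (double init, const double *a, const int *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i])
      init += a[i];
  return init;
}
```

Both functions sum the active lanes onto init in order; they differ only for signed zeros, as noted further down in the thread.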
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug c/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
@ 2023-09-13 9:46 ` rguenth at gcc dot gnu.org
2023-09-13 16:52 ` [Bug middle-end/111401] " rdapp at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-09-13 9:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Last reconfirmed| |2023-09-13
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer sees if-converted code like
<bb 3> [local count: 955630224]:
# init_20 = PHI <_36(8), init_12(D)(18)>
# i_22 = PHI <i_18(8), 0(18)>
_1 = (long unsigned int) i_22;
_2 = _1 * 4;
_3 = cond_15(D) + _2;
_4 = *_3;
_23 = _4 != 0;
_6 = _1 * 8;
_38 = _37 + _6;
_7 = (double *) _38;
_8 = .MASK_LOAD (_7, 64B, _23);
_ifc__35 = _23 ? _8 : 0.0;
_36 = init_20 + _ifc__35;
i_18 = i_22 + 1;
if (n_13(D) > i_18)
so what it produces matches up here. There's the possibility of modifying
the if-conversion handling to use a COND_ADD instead of the COND_EXPR plus
ADD; I think that would be the best thing here.
See tree-if-conv.cc:is_cond_scalar_reduction/convert_scalar_cond_reduction
I think this is also wrong code when signed zeros are involved.
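A scalar model makes the signed-zero concern concrete (hypothetical helper names, round-to-nearest assumed): when init is -0.0 and the condition is false, the select-plus-add form computes -0.0 + 0.0, which is +0.0, while the original guarded add leaves -0.0 untouched.

```c
#include <assert.h>
#include <math.h>

/* Original scalar semantics: the add is guarded by the condition.  */
double
reduc_scalar (double init, double a, int cond)
{
  if (cond)
    init += a;
  return init;
}

/* If-converted semantics: _ifc__35 = cond ? a : 0.0; init += _ifc__35;
   With init == -0.0 and cond false this computes -0.0 + 0.0 == +0.0,
   flipping the sign bit.  */
double
reduc_ifcvt (double init, double a, int cond)
{
  return init + (cond ? a : 0.0);
}
```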
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
2023-09-13 9:46 ` [Bug c/111401] " rguenth at gcc dot gnu.org
@ 2023-09-13 16:52 ` rdapp at gcc dot gnu.org
2023-09-13 21:25 ` rdapp at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-13 16:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
Robin Dapp <rdapp at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rdapp at gcc dot gnu.org
--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I played around with this a bit. Emitting a COND_LEN in if-convert is easy:
_ifc__35 = .COND_ADD (_23, init_20, _8, init_20);
However, during reduction handling we rely on the reduction being a gimple
assign with a binary operation, so I needed to fix several places and indices,
as well as use the proper mask.
What complicates things a bit is that we assume "init_20" (i.e. the
reduction def) occurs only once, while it occurs twice in the COND_ADD. I just
special-cased that for now. Is this the proper thing to do?
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7..e99add3cf16 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
static bool
fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
{
- if (code == PLUS_EXPR)
+ if (code == PLUS_EXPR || code == IFN_COND_ADD)
{
*reduc_fn = IFN_FOLD_LEFT_PLUS;
return true;
@@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
return NULL;
}
- nphi_def_loop_uses++;
- phi_use_stmt = use_stmt;
+ if (use_stmt != phi_use_stmt)
+ {
+ nphi_def_loop_uses++;
+ phi_use_stmt = use_stmt;
+ }
@@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
if (i == STMT_VINFO_REDUC_IDX (stmt_info))
continue;
+ if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+ continue;
+
Apart from that I think what's mainly missing is making the added code nicer.
Going to attach a tentative patch later.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
2023-09-13 9:46 ` [Bug c/111401] " rguenth at gcc dot gnu.org
2023-09-13 16:52 ` [Bug middle-end/111401] " rdapp at gcc dot gnu.org
@ 2023-09-13 21:25 ` rdapp at gcc dot gnu.org
2023-09-14 6:46 ` rguenther at suse dot de
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-13 21:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Several other things came up, so I'm just going to post the latest status here
without having revised or tested it. Going to try fixing it and testing
tomorrow.
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
static bool
fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
{
- if (code == PLUS_EXPR)
+ if (code == PLUS_EXPR || code == IFN_COND_ADD)
{
*reduc_fn = IFN_FOLD_LEFT_PLUS;
return true;
@@ -4106,8 +4106,13 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
return NULL;
}
- nphi_def_loop_uses++;
- phi_use_stmt = use_stmt;
+ /* We might have two uses in the same instruction, only count them as
+ one. */
+ if (use_stmt != phi_use_stmt)
+ {
+ nphi_def_loop_uses++;
+ phi_use_stmt = use_stmt;
+ }
}
tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
@@ -6861,7 +6866,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
gimple **vec_stmt, slp_tree slp_node,
gimple *reduc_def_stmt,
tree_code code, internal_fn reduc_fn,
- tree ops[3], tree vectype_in,
+ tree *ops, int num_ops, tree vectype_in,
int reduc_index, vec_loop_masks *masks,
vec_loop_lens *lens)
{
@@ -6883,11 +6888,24 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
TYPE_VECTOR_SUBPARTS (vectype_in)));
- tree op0 = ops[1 - reduc_index];
+ /* The operands either come from a binary operation or a COND_ADD operation.
+ The former is a gimple assign and the latter is a gimple call with four
+ arguments. */
+ gcc_assert (num_ops == 2 || num_ops == 4);
+ bool is_cond_add = num_ops == 4;
+ tree op0, opmask;
+ if (!is_cond_add)
+ op0 = ops[1 - reduc_index];
+ else
+ {
+ op0 = ops[2];
+ opmask = ops[0];
+ gcc_assert (!slp_node);
+ }
int group_size = 1;
stmt_vec_info scalar_dest_def_info;
- auto_vec<tree> vec_oprnds0;
+ auto_vec<tree> vec_oprnds0, vec_opmask;
if (slp_node)
{
auto_vec<vec<tree> > vec_defs (2);
@@ -6903,9 +6921,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
op0, &vec_oprnds0);
scalar_dest_def_info = stmt_info;
+ if (is_cond_add)
+ {
+ vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
+ opmask, &vec_opmask);
+ gcc_assert (vec_opmask.length() == 1);
+ }
}
- tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
+ gimple *sdef = scalar_dest_def_info->stmt;
+ tree scalar_dest = is_gimple_call (sdef)
+ ? gimple_call_lhs (sdef)
+ : gimple_assign_lhs (scalar_dest_def_info->stmt);
tree scalar_type = TREE_TYPE (scalar_dest);
tree reduc_var = gimple_phi_result (reduc_def_stmt);
@@ -6945,7 +6972,11 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
i, 1);
signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
(loop_vinfo);
bias = build_int_cst (intQI_type_node, biasval);
- mask = build_minus_one_cst (truth_type_for (vectype_in));
+ /* If we have a COND_ADD take its mask. Otherwise use {-1, ...}. */
+ if (is_cond_add)
+ mask = vec_opmask[0];
+ else
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
}
/* Handle MINUS by adding the negative. */
@@ -7440,6 +7471,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
if (i == STMT_VINFO_REDUC_IDX (stmt_info))
continue;
+ if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+ continue;
+
/* There should be only one cycle def in the stmt, the one
leading to reduc_def. */
if (VECTORIZABLE_CYCLE_DEF (dt))
@@ -8211,8 +8245,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
vec_num = 1;
}
- code_helper code = canonicalize_code (op.code, op.type);
- internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
+ code_helper code (op.code);
+ internal_fn cond_fn;
+
+ if (code.is_internal_fn ())
+ {
+ internal_fn ifn = internal_fn (op.code);
+ code = canonicalize_code (conditional_internal_fn_code (ifn), op.type);
+ cond_fn = ifn;
+ }
+ else
+ {
+ code = canonicalize_code (op.code, op.type);
+ cond_fn = get_conditional_internal_fn (code, op.type);
+ }
+
vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
@@ -8240,8 +8287,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
gcc_assert (code.is_tree_code ());
return vectorize_fold_left_reduction
(loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
- tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
- lens);
+ tree_code (code), reduc_fn, op.ops, op.num_ops, vectype_in,
+ reduc_index, masks, lens);
}
bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
` (2 preceding siblings ...)
2023-09-13 21:25 ` rdapp at gcc dot gnu.org
@ 2023-09-14 6:46 ` rguenther at suse dot de
2023-09-14 6:51 ` rguenther at suse dot de
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenther at suse dot de @ 2023-09-14 6:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
>
> Robin Dapp <rdapp at gcc dot gnu.org> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |rdapp at gcc dot gnu.org
>
> --- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> I played around with this a bit. Emitting a COND_LEN in if-convert is easy:
>
> _ifc__35 = .COND_ADD (_23, init_20, _8, init_20);
>
> However, during reduction handling we rely on the reduction being a gimple
> assign and binary operation, though so I needed to fix some places and indices
> as well as the proper mask.
>
> What complicates things a bit is that we assume that "init_20" (i.e. the
> reduction def) occurs once when we have it twice in the COND_ADD. I just
> special cased that for now. Is this the proper thing to do?
I think so - we should ignore a use in the else value when the other
use is in that same stmt.
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7..e99add3cf16 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> static bool
> fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
> {
> - if (code == PLUS_EXPR)
> + if (code == PLUS_EXPR || code == IFN_COND_ADD)
> {
> *reduc_fn = IFN_FOLD_LEFT_PLUS;
> return true;
> @@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
> return NULL;
> }
>
> - nphi_def_loop_uses++;
> - phi_use_stmt = use_stmt;
> + if (use_stmt != phi_use_stmt)
> + {
> + nphi_def_loop_uses++;
> + phi_use_stmt = use_stmt;
> + }
>
> @@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> if (i == STMT_VINFO_REDUC_IDX (stmt_info))
> continue;
>
> + if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> + continue;
> +
>
> Apart from that I think what's mainly missing is making the added code nicer.
> Going to attach a tentative patch later.
>
>
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
` (3 preceding siblings ...)
2023-09-14 6:46 ` rguenther at suse dot de
@ 2023-09-14 6:51 ` rguenther at suse dot de
2023-09-14 15:07 ` rdapp at gcc dot gnu.org
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenther at suse dot de @ 2023-09-14 6:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
>
> --- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> Several other things came up, so I'm just going to post the latest status here
> without having revised or tested it. Going to try fixing it and testing
> tomorrow.
I think what's important to do is make sure targets without
masking are still getting the cond-reduction code generation
(but with the signed-zero issue fixed). Using a cond_add is
probably better than the vec_cond + add even for the non-fold-left
reduction case.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
` (4 preceding siblings ...)
2023-09-14 6:51 ` rguenther at suse dot de
@ 2023-09-14 15:07 ` rdapp at gcc dot gnu.org
2023-09-15 6:42 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-14 15:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #6 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Created attachment 55902
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55902&action=edit
Tentative
You're referring to the case where we have init = -0.0, the condition is false
and we end up wrongly doing -0.0 + 0.0 = 0.0?
I suppose -0.0 is the proper neutral element for PLUS (and WIDEN_SUM?) when
honoring signed zeros? And 0.0 for MINUS? Doesn't that also depend on the
rounding mode?
neutral_op_for_reduction could return a -0 for PLUS if we honor it for that
type. Or is that too intrusive?
Guess I should add a test case for that as well.
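A one-lane sketch of why -0.0 works as the neutral element for PLUS when signed zeros are honored (hypothetical helper, round-to-nearest assumed): x + (-0.0) == x for every x, including x == -0.0, whereas padding inactive lanes with +0.0 turns -0.0 into +0.0.

```c
#include <assert.h>
#include <math.h>

/* One lane of a conditional add padded with a neutral element:
   an inactive lane contributes `neutral` instead of a.  */
double
cond_add_lane (double init, double a, int cond, double neutral)
{
  return init + (cond ? a : neutral);
}
```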
Another thing is that swapping operands is not as easy with COND_ADD because
the addition would be in the else. I'd punt for that case for now.
Next problem - might be a mistake on my side. For avx512 we create a COND_ADD
but the respective MASK_FOLD_LEFT_PLUS is not available, causing us to create
numerous vec_extracts as fallback that increase the cost until we don't
vectorize anymore.
Therefore I added a
vectorized_internal_fn_supported_p (IFN_FOLD_LEFT_PLUS, TREE_TYPE (lhs)).
SLP paths and ncopies != 1 are excluded as well. Not really happy with how the
patch looks now but at least the testsuites on aarch and x86 pass.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
` (5 preceding siblings ...)
2023-09-14 15:07 ` rdapp at gcc dot gnu.org
@ 2023-09-15 6:42 ` rguenth at gcc dot gnu.org
2023-11-02 10:50 ` cvs-commit at gcc dot gnu.org
2023-11-02 22:40 ` juzhe.zhong at rivai dot ai
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-09-15 6:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #6)
> Created attachment 55902 [details]
> Tentative
>
> You're referring to the case where we have init = -0.0, the condition is
> false and we end up wrongly doing -0.0 + 0.0 = 0.0?
> I suppose -0.0 the proper neutral element for PLUS (and WIDEN_SUM?) when
> honoring signed zeros? And 0.0 for MINUS? Doesn't that also depend on the
> rounding mode?
Yes, if the rounding mode isn't known there isn't a working neutral element.
> neutral_op_for_reduction could return a -0 for PLUS if we honor it for that
> type. Or is that too intrusive?
I suppose that could work, but we need to check that we're not using this
for the initial value.
> Guess I should add a test case for that as well.
>
> Another thing is that swapping operands is not as easy with COND_ADD because
> the addition would be in the else. I'd punt for that case for now.
>
> Next problem - might be a mistake on my side. For avx512 we create a
> COND_ADD but the respective MASK_FOLD_LEFT_PLUS is not available, causing us
> to create numerous vec_extracts as fallback that increase the cost until we
> don't vectorize anymore.
Yeah, but then a fold-left reduction wasn't necessary in the first place?
We should avoid that (it's slow even when the target supports it) when
possible.
> Therefore I added a
> vectorized_internal_fn_supported_p (IFN_FOLD_LEFT_PLUS, TREE_TYPE (lhs)).
> SLP paths and ncopies != 1 are excluded as well. Not really happy with how
> the patch looks now but at least the testsuites on aarch and x86 pass.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
` (6 preceding siblings ...)
2023-09-15 6:42 ` rguenth at gcc dot gnu.org
@ 2023-11-02 10:50 ` cvs-commit at gcc dot gnu.org
2023-11-02 22:40 ` juzhe.zhong at rivai dot ai
8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-02 10:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>:
https://gcc.gnu.org/g:01c18f58d37865d5f3bbe93e666183b54ec608c7
commit r14-5076-g01c18f58d37865d5f3bbe93e666183b54ec608c7
Author: Robin Dapp <rdapp@ventanamicro.com>
Date: Wed Sep 13 22:19:35 2023 +0200
ifcvt/vect: Emit COND_OP for conditional scalar reduction.
As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions. This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.
It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.
gcc/ChangeLog:
PR middle-end/111401
* internal-fn.cc (internal_fn_else_index): New function.
* internal-fn.h (internal_fn_else_index): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
` (7 preceding siblings ...)
2023-11-02 10:50 ` cvs-commit at gcc dot gnu.org
@ 2023-11-02 22:40 ` juzhe.zhong at rivai dot ai
8 siblings, 0 replies; 10+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-02 22:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
JuzheZhong <juzhe.zhong at rivai dot ai> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed
end of thread, other threads:[~2023-11-02 22:40 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-13 9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
2023-09-13 9:46 ` [Bug c/111401] " rguenth at gcc dot gnu.org
2023-09-13 16:52 ` [Bug middle-end/111401] " rdapp at gcc dot gnu.org
2023-09-13 21:25 ` rdapp at gcc dot gnu.org
2023-09-14 6:46 ` rguenther at suse dot de
2023-09-14 6:51 ` rguenther at suse dot de
2023-09-14 15:07 ` rdapp at gcc dot gnu.org
2023-09-15 6:42 ` rguenth at gcc dot gnu.org
2023-11-02 10:50 ` cvs-commit at gcc dot gnu.org
2023-11-02 22:40 ` juzhe.zhong at rivai dot ai