public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-20 9:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
Bug ID: 113091
Summary: Over-estimate SLP vector-to-scalar cost for non-live
pattern statement
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: fxue at os dot amperecomputing.com
Target Milestone: ---
GCC fails to vectorize the testcase below on aarch64.
int test(unsigned array[8]);

int foo(char *a, char *b)
{
  unsigned array[8];

  array[0] = (a[0] - b[0]);
  array[1] = (a[1] - b[1]);
  array[2] = (a[2] - b[2]);
  array[3] = (a[3] - b[3]);
  array[4] = (a[4] - b[4]);
  array[5] = (a[5] - b[5]);
  array[6] = (a[6] - b[6]);
  array[7] = (a[7] - b[7]);

  return test(array);
}
The dump shows that the loads of a[i] and b[i] are considered live as scalar
references, which results in an over-estimated vector-to-scalar cost:
*a_50(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
*b_51(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
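Adding up the dump lines above gives a sense of the size of the over-estimate:
there are 16 entries (8 loads from a, 8 from b), each charged a cost of 2.
The following is only an arithmetic sketch in C; the helper names are
hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (not part of GCC): sum the per-entry costs that the
   dump reports as "vec_to_scalar costs N in epilogue".  */
unsigned
sum_epilogue_costs (const unsigned *costs, size_t n)
{
  assert (costs != NULL || n == 0);
  unsigned total = 0;
  for (size_t i = 0; i < n; i++)
    total += costs[i];
  return total;
}

/* The dump lists 8 entries for a[] and 8 for b[], each with cost 2.  */
unsigned
pr113091_extra_cost (void)
{
  unsigned costs[16];
  for (size_t i = 0; i < 16; i++)
    costs[i] = 2;
  return sum_epilogue_costs (costs, 16);
}
```

With the values from the dump, pr113091_extra_cost () is 32 cost units charged
to the vector side of the comparison.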
Subtraction of char operands is recognized as widen-sub, which involves two
kinds of pattern replacement.
* Original
_1 = *a_50(D);
_2 = (int) _1;
_3 = *b_51(D);
_4 = (int) _3;
_5 = _2 - _4;
* After pattern replacement
patt_63 = (unsigned short) _1; // _2 = (int) _1;
patt_64 = (int) patt_63; // _2 = (int) _1;
patt_65 = (unsigned short) _3; // _4 = (int) _3;
patt_66 = (int) patt_65; // _4 = (int) _3;
patt_67 = .VEC_WIDEN_MINUS (_1, _3); // _5 = _2 - _4;
patt_68 = (signed short) patt_67; // _5 = _2 - _4;
patt_69 = (int) patt_68; // _5 = _2 - _4;
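As a sanity check on the replacement above, the pattern sequence can be
modelled in scalar C and compared against the original sequence for every
pair of inputs. This is a sketch under assumptions: the operands are taken to
be unsigned 8-bit values (as the "(unsigned short)" conversions in the pattern
suggest), int is assumed to be wider than 16 bits, and the function names are
made up for illustration.

```c
#include <assert.h>

/* Scalar model of the original statements: widen both operands to int,
   then subtract.  */
int
sub_original (unsigned char a, unsigned char b)
{
  return (int) a - (int) b;
}

/* Scalar model of the pattern statements: a widening subtract into a 16-bit
   unsigned value, a reinterpretation as signed 16-bit, then a widening to
   int.  The reinterpretation is written out arithmetically so that the
   model does not rely on implementation-defined conversions.  */
int
sub_pattern (unsigned char a, unsigned char b)
{
  unsigned short wide
    = (unsigned short) ((unsigned short) a - (unsigned short) b);
  return wide >= 0x8000u ? (int) wide - 0x10000 : (int) wide;
}

/* Compare both models over all 256 * 256 input pairs.  */
void
check_all_pairs (void)
{
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      assert (sub_original ((unsigned char) a, (unsigned char) b)
              == sub_pattern ((unsigned char) a, (unsigned char) b));
}
```

The difference of two 8-bit values always lies in [-255, 255], so it fits in a
signed 16-bit value, which is why the narrower intermediate type loses nothing.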
For the statement "_2 = (int) _1", its vectorization representative "patt_64 =
(int) patt_63" is not marked as PURE_SLP, so it is conservatively considered to
have a scalar use and to be live outside of the SLP bb (in the function
vect_bb_slp_mark_live_stmts). However, the pattern definition is actually dead
and should not contribute to the vector-to-scalar cost.
Defs from pattern statements are not part of the function body, so we cannot
track their def/use chains as we do for ordinary SSA names. We could probably
have a quick fix for one situation: if the original SSA name "_2" has a single
use, it is covered entirely by the vectorized operation, regardless of how it
would be handled without pattern replacement.
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: rguenth at gcc dot gnu.org @ 2023-12-20 13:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's the logic
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
    {
      if (svisited.contains (stmt_info))
        continue;
      stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
      if (STMT_VINFO_IN_PATTERN_P (orig_stmt_info)
          && STMT_VINFO_RELATED_STMT (orig_stmt_info) != stmt_info)
        /* Only the pattern root stmt computes the original scalar value.  */
        continue;
      bool mark_visited = true;
      gimple *orig_stmt = orig_stmt_info->stmt;
      ssa_op_iter op_iter;
      def_operand_p def_p;
      FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
        {
          imm_use_iterator use_iter;
          gimple *use_stmt;
          stmt_vec_info use_stmt_info;
          FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
            if (!is_gimple_debug (use_stmt))
              {
                use_stmt_info = bb_vinfo->lookup_stmt (use_stmt);
                if (!use_stmt_info
                    || !PURE_SLP_STMT (vect_stmt_to_vectorize (use_stmt_info)))
                  {
                    STMT_VINFO_LIVE_P (stmt_info) = true;
specifically the last check. That's supposed to pick up the "main" pattern
that's now covering the scalar stmt.
But somehow the "main" pattern,
patt_67 = .VEC_WIDEN_MINUS (_1, _3); // _5 = _2 - _4;
patt_68 = (signed short) patt_67; // _5 = _2 - _4;
patt_69 = (int) patt_68; // _5 = _2 - _4;
doesn't get picked up here. I wonder what's the orig_stmt and the def
picked and what original scalar use we end up in where the
vect_stmt_to_vectorize isn't the "last" pattern. Maybe we really want
these "overlapping" patterns, but IMHO having "two entries" into
a chain of scalar stmts is bad and we should link up the whole matched
sequence to the final "root" instead?
That said, the current code doesn't see that wherever we end up isn't
dead code (aka fully covered by the vectorization).
IMO vect_stmt_to_vectorize for each of those stmts should end up at
patt_69 = (int) patt_68;
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-21 5:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #2 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to Richard Biener from comment #1)
> It's the logic
>
> FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> {
> if (svisited.contains (stmt_info))
> continue;
> stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
> if (STMT_VINFO_IN_PATTERN_P (orig_stmt_info)
> && STMT_VINFO_RELATED_STMT (orig_stmt_info) != stmt_info)
> /* Only the pattern root stmt computes the original scalar value. */
> continue;
> bool mark_visited = true;
> gimple *orig_stmt = orig_stmt_info->stmt;
> ssa_op_iter op_iter;
> def_operand_p def_p;
> FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
> {
> imm_use_iterator use_iter;
> gimple *use_stmt;
> stmt_vec_info use_stmt_info;
> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> if (!is_gimple_debug (use_stmt))
> {
> use_stmt_info = bb_vinfo->lookup_stmt (use_stmt);
> if (!use_stmt_info
> || !PURE_SLP_STMT (vect_stmt_to_vectorize
> (use_stmt_info)))
> {
> STMT_VINFO_LIVE_P (stmt_info) = true;
>
> specifically the last check. That's supposed to pick up the "main" pattern
> that's now covering the scalar stmt.
>
> But somehow the "main" pattern,
>
> patt_67 = .VEC_WIDEN_MINUS (_1, _3); // _5 = _2 - _4;
> patt_68 = (signed short) patt_67; // _5 = _2 - _4;
> patt_69 = (int) patt_68; // _5 = _2 - _4;
>
> doesn't get picked up here. I wonder what's the orig_stmt and the def
> picked and what original scalar use we end up in where the
> vect_stmt_to_vectorize isn't the "last" pattern. Maybe we really want
This problem happens at slp node:
note: node 0x425bc38 (max_nunits=8, refcnt=1) vector(8) char
note: op template: _1 = *a_50(D);
note: stmt 0 _1 = *a_50(D);
note: stmt 1 _7 = MEM[(char *)a_50(D) + 1B];
note: stmt 2 _13 = MEM[(char *)a_50(D) + 2B];
note: stmt 3 _19 = MEM[(char *)a_50(D) + 3B];
note: stmt 4 _25 = MEM[(char *)a_50(D) + 4B];
note: stmt 5 _31 = MEM[(char *)a_50(D) + 5B];
note: stmt 6 _37 = MEM[(char *)a_50(D) + 6B];
note: stmt 7 _43 = MEM[(char *)a_50(D) + 7B];
The orig_stmt is "_1 = *a_50(D)".
The use stmt is "_2 = (int) _1", whose pattern statement is "patt_64 = (int)
patt_63", which is not referenced by any original or other pattern statement.
In other words, the orig_stmt could be absorbed into a vector operation
without any outlying scalar use.
The aforementioned "last check" in vect_bb_slp_mark_live_stmts marks the
orig_stmt as STMT_VINFO_LIVE_P, which implies that it has a scalar use (though
it should not). The only difference is that the def is re-generated somewhere
rather than the original scalar statement being retained. The following
"vectorizable_live_operation" then accounts the new operations into the
vectorization cost of the SLP instance.
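The effect of that "last check" can be reproduced with a small toy model. This
is a simplified sketch, not GCC code: the struct, its fields and the function
names are hypothetical stand-ins for the predicates named in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one use statement of a def.  */
struct toy_use
{
  bool is_debug;        /* models is_gimple_debug (use_stmt) */
  bool in_region;       /* models bb_vinfo->lookup_stmt (use_stmt) != NULL */
  bool rep_is_pure_slp; /* models PURE_SLP_STMT (vect_stmt_to_vectorize (..)) */
};

/* Model of the check: a def is marked live as soon as one non-debug use is
   outside the region or has a representative that is not PURE_SLP.  */
bool
toy_def_is_live (const struct toy_use *uses, size_t n)
{
  assert (uses != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (uses[i].is_debug)
        continue;
      if (!uses[i].in_region || !uses[i].rep_is_pure_slp)
        return true;
    }
  return false;
}

/* The situation described above: the only use of _1 is "_2 = (int) _1",
   whose representative patt_64 is inside the region but not PURE_SLP.  */
bool
toy_pr113091_load_is_live (void)
{
  struct toy_use cast_use = { false, true, false };
  return toy_def_is_live (&cast_use, 1);
}
```

In this model toy_pr113091_load_is_live () returns true, which matches the
unwanted liveness marking: the check looks only at the immediate use and never
asks whether that use is itself dead after vectorization.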
The function vect_bb_vectorization_profitable_p identifies scalar uses
recursively; for this case, setting STMT_VINFO_LIVE_P or not would
change the scalar cost computation. If we can avoid such fake-liveness
adjustment on the statements we are interested in, the vectorization cost
could beat the scalar cost, and vectorization would succeed.
Unvectorized:
mov x2, x0
stp x29, x30, [sp, -48]!
mov x29, sp
ldrb w3, [x1]
ldrb w4, [x1, 1]
add x0, sp, 16
ldrb w9, [x2]
ldrb w8, [x2, 1]
sub w9, w9, w3
ldrb w7, [x2, 2]
ldrb w3, [x1, 2]
sub w8, w8, w4
ldrb w6, [x2, 3]
ldrb w4, [x1, 3]
sub w7, w7, w3
ldrb w10, [x1, 5]
ldrb w3, [x1, 4]
sub w6, w6, w4
ldrb w5, [x2, 4]
ldrb w4, [x2, 5]
sub w5, w5, w3
ldrb w3, [x2, 6]
sub w4, w4, w10
ldrb w2, [x2, 7]
ldrb w10, [x1, 6]
ldrb w1, [x1, 7]
sub w3, w3, w10
stp w9, w8, [sp, 16]
sub w1, w2, w1
stp w7, w6, [sp, 24]
stp w5, w4, [sp, 32]
stp w3, w1, [sp, 40]
bl test
ldp x29, x30, [sp], 48
ret
Vectorized:
mov x2, x0
stp x29, x30, [sp, -48]!
mov x29, sp
ldr d1, [x1]
add x0, sp, 16
ldr d0, [x2]
usubl v0.8h, v0.8b, v1.8b
sxtl v1.4s, v0.4h
sxtl2 v0.4s, v0.8h
stp q1, q0, [sp, 16]
bl test
ldp x29, x30, [sp], 48
ret
> these "overlapping" patterns, but IMHO having "two entries" into
> a chain of scalar stmts is bad and we should link up the whole matched
> sequence to the final "root" instead?
>
> That said, the current code doesn't see that wherever we end up isn't
> dead code (aka fully covered by the vectorization).
>
> IMO vect_stmt_to_vectorize for each of those stmts should end up at
>
> patt_69 = (int) patt_68;
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-21 5:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #3 from Feng Xue <fxue at os dot amperecomputing.com> ---
The function vect_bb_vectorization_profitable_p identifies scalar uses
recursively; for this case, setting STMT_VINFO_LIVE_P or not would not
change the scalar cost computation.
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: rguenth at gcc dot gnu.org @ 2023-12-21 7:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-12-21
             Status|UNCONFIRMED                 |NEW
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
"The use stmt is "_2 = (int) _1", whose pattern statement is "patt_64 = (int)
patt_63", which is not referenced by any original or other pattern statements.
Or in other word, the orig_stmt could be absorbed into a vector operation,
without any outlier scalar use."
That means the code sees that _2 = (int) _1 isn't vectorized (the pattern
stmt isn't actually used) which means _2 = (int) _1 stays in the code and
thus _1 is live.
The issue here is that because the "outer" pattern consumes
patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1 stmt-to-vectorize
as being the outer pattern root stmt for all this logic to work correctly.
Otherwise we have no means of identifying whether a scalar stmt takes part
in vectorization or not.
I'm not sure what restrictions we place on pattern recognition of patterns - do
we require single-uses or do we allow the situation that one vectorization
path picks up the "inner" pattern while another picks the "outer" one?
In theory we can hack up the liveness analysis but as you noticed that
isn't the part doing the costing. The costing part is just written in
the very same way (vect_bb_vectorization_profitable_p, specifically
vect_slp_gather_vectorized_scalar_stmts and vect_bb_slp_scalar_cost).
Basically the scalar cost is the cost of the scalar stmts that are fully
replaced (can be DCEd after vectorization) by the vector stmts.
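That definition of the scalar cost can be written down as a toy computation.
The sketch below is an illustration only: the struct and function names are
hypothetical, and the strict "vector cost less than scalar cost" comparison is
an assumption of the sketch rather than a statement about the exact test GCC
applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one scalar statement in the region.  */
struct toy_scalar_stmt
{
  unsigned cost;        /* cost of executing the scalar statement */
  bool fully_replaced;  /* true if it can be removed after vectorization */
};

/* Only statements that disappear after vectorization count as savings.  */
unsigned
toy_scalar_cost (const struct toy_scalar_stmt *stmts, size_t n)
{
  assert (stmts != NULL || n == 0);
  unsigned total = 0;
  for (size_t i = 0; i < n; i++)
    if (stmts[i].fully_replaced)
      total += stmts[i].cost;
  return total;
}

/* Assumed decision rule for this sketch: vectorize only if strictly
   cheaper than the scalar code being replaced.  */
bool
toy_profitable (unsigned vector_cost, unsigned scalar_cost)
{
  return vector_cost < scalar_cost;
}
```

Under this model, any cost that is wrongly added to the vector side, such as
extra vec_to_scalar entries for defs that are not really live, can flip the
decision even though the scalar side is computed correctly.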
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: rsandifo at gcc dot gnu.org @ 2023-12-21 11:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
> The issue here is that because the "outer" pattern consumes
> patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1 stmt-to-vectorize
> as being the outer pattern root stmt for all this logic to work correctly.
I don't think it can though, at least not in general. The final pattern
stmt has to compute the same value as the original scalar stmt.
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-22 3:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #6 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to Richard Sandiford from comment #5)
> > The issue here is that because the "outer" pattern consumes
> > patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1 stmt-to-vectorize
> > as being the outer pattern root stmt for all this logic to work correctly.
>
> I don't think it can though, at least not in general. The final pattern
> stmt has to compute the same value as the original scalar stmt.
Could the current pattern replacement support an N:1 mapping (N stmts -> 1
pattern)? If not, this handling would probably break related code somewhere.
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-26 15:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #7 from Feng Xue <fxue at os dot amperecomputing.com> ---
> The issue here is that because the "outer" pattern consumes
> patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1
> stmt-to-vectorize
> as being the outer pattern root stmt for all this logic to work correctly.
We cannot simply make this adjustment, since pattern recognition does not
require the replaced SSA name to have a single use. Suppose we change the above
case to attach another scalar use to "_2":

int foo(char *a, char *b)
{
  unsigned array[8];
  int a0 = a[0];  // _2 = (int) _1;

  array[0] = (a0 - b[0]);
  array[1] = (a[1] - b[1]);
  array[2] = (a[2] - b[2]);
  array[3] = (a[3] - b[3]);
  array[4] = (a[4] - b[4]);
  array[5] = (a[5] - b[5]);
  array[6] = (a[6] - b[6]);
  array[7] = (a[7] - b[7]);

  return test(array) + a0;
}

The pattern statement "patt_64 = (int) patt_63" for "_2 = (int) _1" should be
kept. So we also need the check for "identifying whether a scalar stmt takes
part in vectorization or not" to ensure the adjustment is doable.
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-29 10:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #8 from Feng Xue <fxue at os dot amperecomputing.com> ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641547.html
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: cvs-commit at gcc dot gnu.org @ 2024-01-16 3:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Feng Xue <fxue@gcc.gnu.org>:
https://gcc.gnu.org/g:57f611604e8bab67af6c0bcfe6ea88c001408412
commit r14-7272-g57f611604e8bab67af6c0bcfe6ea88c001408412
Author: Feng Xue <fxue@os.amperecomputing.com>
Date: Thu Dec 28 16:55:39 2023 +0800
Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]
When pattern recognition is involved, a statement whose definition is
consumed in some pattern, may not be included in the final replacement
pattern statements, and would be skipped when building SLP graph.
* Original
char a_c = *(char *) a;
char b_c = *(char *) b;
unsigned short a_s = (unsigned short) a_c;
int a_i = (int) a_s;
int b_i = (int) b_c;
int r_i = a_i - b_i;
* After pattern replacement
a_s = (unsigned short) a_c;
a_i = (int) a_s;
patt_b_s = (unsigned short) b_c; // b_i = (int) b_c
patt_b_i = (int) patt_b_s; // b_i = (int) b_c
patt_r_s = widen_minus(a_c, b_c); // r_i = a_i - b_i
patt_r_i = (int) patt_r_s; // r_i = a_i - b_i
The definitions of a_i (original statement) and b_i (pattern statement)
are related to, but actually not part of, the widen_minus pattern.
Vectorizing the pattern does not cause these definition statements to
be marked as PURE_SLP. For this case, we need to recursively check
whether their uses are all absorbed into vectorized code. But there
is an exception: some use may participate in a vectorized
operation via an external SLP node containing that use as an element.
gcc/ChangeLog:
PR tree-optimization/113091
* tree-vect-slp.cc (vect_slp_has_scalar_use): New function.
(vect_bb_slp_mark_live_stmts): New parameter scalar_use_map, check
scalar use with new function.
(vect_bb_slp_mark_live_stmts): New function as entry to existing
overriden functions with same name.
(vect_slp_analyze_operations): Call new entry function to mark
live statements.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/bb-slp-pr113091.c: New test.
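The recursive check described in the commit message can be illustrated on a
toy def/use graph. This is a simplified model and not the code of
vect_slp_has_scalar_use: every name below is hypothetical, the depth limit is
an assumption of the sketch, and the external-SLP-node exception mentioned in
the commit message is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_USES 4
#define TOY_MAX_DEPTH 8

/* Hypothetical toy statement; uses[] holds indices of the statements that
   consume this statement's result.  */
struct toy_stmt
{
  bool in_region;   /* belongs to the basic block being vectorized */
  bool pure_slp;    /* covered by vector code (directly or via its pattern) */
  bool has_result;  /* false for statements such as a return */
  int n_uses;
  int uses[TOY_MAX_USES];
};

/* Return true if the result of stmts[def] is still needed by scalar code.  */
bool
toy_has_scalar_use (const struct toy_stmt *stmts, int def, int depth)
{
  if (depth > TOY_MAX_DEPTH)
    return true;  /* give up conservatively */
  const struct toy_stmt *d = &stmts[def];
  assert (d->n_uses <= TOY_MAX_USES);
  for (int i = 0; i < d->n_uses; i++)
    {
      const struct toy_stmt *u = &stmts[d->uses[i]];
      if (!u->in_region)
        return true;            /* used outside the region */
      if (u->pure_slp)
        continue;               /* absorbed into vector code */
      if (!u->has_result)
        return true;            /* scalar statement that must stay */
      if (toy_has_scalar_use (stmts, d->uses[i], depth + 1))
        return true;            /* follow the non-vectorized use */
    }
  return false;
}

/* Lane 0 of the original testcase: load -> cast -> subtract.  */
bool
toy_original_has_scalar_use (void)
{
  struct toy_stmt stmts[3] = {
    { true, true,  true, 1, { 1 } },  /* 0: _1 = *a_50(D)  */
    { true, false, true, 1, { 2 } },  /* 1: _2 = (int) _1  */
    { true, true,  true, 0, { 0 } },  /* 2: _5 = _2 - _4   */
  };
  return toy_has_scalar_use (stmts, 0, 0);
}

/* Variant from comment #7: _2 also feeds a scalar add and the return.  */
bool
toy_variant_has_scalar_use (void)
{
  struct toy_stmt stmts[5] = {
    { true, true,  true,  1, { 1 } },     /* 0: load              */
    { true, false, true,  2, { 2, 3 } },  /* 1: cast, two uses    */
    { true, true,  true,  0, { 0 } },     /* 2: subtract          */
    { true, false, true,  1, { 4 } },     /* 3: test (...) + a0   */
    { true, false, false, 0, { 0 } },     /* 4: return            */
  };
  return toy_has_scalar_use (stmts, 0, 0);
}
```

In the model, the load of the original testcase has no scalar use because its
only consumer leads straight into vectorized code, while the comment #7 variant
keeps a scalar use through the addition that feeds the return.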
* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2024-01-31 3:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091
Feng Xue <fxue at os dot amperecomputing.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #10 from Feng Xue <fxue at os dot amperecomputing.com> ---
Fixed