public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/114192] New: scalar code left around following early break vectorization of reduction
From: acoplan at gcc dot gnu.org @ 2024-03-01 16:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
            Bug ID: 114192
           Summary: scalar code left around following early break
                    vectorization of reduction
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---
For the following testcase:
int a[1024];
int f4(int *x, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      sum += a[i];
      if (a[i] == 42)
        break;
    }
  return sum;
}
at -O3 on aarch64 we vectorize it and get the following vector loop:
.L4:
        cmp     x7, x2
        beq     .L23
.L6:
        ubfiz   x3, x2, 4, 32
        ldr     w6, [x4, x2, lsl 2]     // scalar load
        mov     v27.16b, v30.16b
        mov     w0, w5
        add     v30.4s, v30.4s, v25.4s
        add     w5, w5, w6              // scalar add
        ldr     q29, [x4, x3]
        add     x2, x2, 1
        cmeq    v31.4s, v29.4s, v26.4s
        add     v28.4s, v28.4s, v29.4s
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L4
but here the old scalar code (the scalar load and scalar add marked above) has
been left around. If we remove the early exit from the loop, the vectorizer
still leaves the scalar code around, but it is optimized away immediately by
the following DCE pass.
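For comparison, a sketch of the variant without the early exit could look like the following. The name f4_noexit is hypothetical and does not appear in the report; the only intended change from the testcase above is that the break has been dropped.

```c
int a[1024];

/* Hypothetical name: the same reduction as f4, minus the early break.
   The unused parameter x is kept so the signature matches the testcase.  */
int f4_noexit(int *x, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

Compiling both functions with the same options and comparing the loop bodies is one way to see the difference described here.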
Without the early exit, in the vectorizer dump we have:
  <bb 3> [local count: 860067200]:
  # sum_10 = PHI <sum_6(6), 0(9)>
  # i_12 = PHI <i_7(6), 0(9)>
  # vect_sum_10.8_25 = PHI <vect_sum_6.12_29(6), { 0, 0, 0, 0 }(9)>
  # vectp_a.9_26 = PHI <vectp_a.9_27(6), &a(9)>
  # ivtmp_32 = PHI <ivtmp_33(6), 0(9)>
  vect__1.11_28 = MEM <vector(4) int> [(int *)vectp_a.9_26];
  _1 = a[i_12]; // scalar load
  vect_sum_6.12_29 = vect__1.11_28 + vect_sum_10.8_25;
  sum_6 = _1 + sum_10;
  i_7 = i_12 + 1;
  vectp_a.9_27 = vectp_a.9_26 + 16;
  ivtmp_33 = ivtmp_32 + 1;
  if (ivtmp_33 < bnd.5_22)
    goto <bb 6>; [89.00%]
  else
    goto <bb 11>; [11.00%]
i.e. the scalar load is left around, but it seems to get cleaned up by the
immediately following DCE pass:
  <bb 3> [local count: 860067200]:
  # vect_sum_10.8_25 = PHI <vect_sum_6.12_29(6), { 0, 0, 0, 0 }(9)>
  # vectp_a.9_26 = PHI <vectp_a.9_27(6), &a(9)>
  # ivtmp_32 = PHI <ivtmp_33(6), 0(9)>
  vect__1.11_28 = MEM <vector(4) int> [(int *)vectp_a.9_26];
  vect_sum_6.12_29 = vect__1.11_28 + vect_sum_10.8_25;
  vectp_a.9_27 = vectp_a.9_26 + 16;
  ivtmp_33 = ivtmp_32 + 1;
  if (ivtmp_33 < bnd.5_22)
    goto <bb 6>; [89.00%]
  else
    goto <bb 11>; [11.00%]
Perhaps DCE needs improving so that it also cleans up the dead scalar code in
the early exit case.
From: tnfchris at gcc dot gnu.org @ 2024-03-01 17:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-03-01
--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Confirmed.
It looks like DCE6 no longer thinks:
  # sum_11 = PHI <sum_7(7), 0(11)>
  _1 = aD.4432[i_12];
  sum_7 = _1 + sum_11;
is dead after vectorization.
It removes the only dead consumer of sum_7, a PHI node left over in the guard
block that becomes unused once the reduction is vectorized.
DCE says:
marking necessary through sum_11 stmt sum_11 = PHI <sum_7(7), 0(11)>
processing: sum_11 = PHI <sum_7(7), 0(11)>
marking necessary through sum_7 stmt sum_7 = _1 + sum_11;
processing: sum_7 = _1 + sum_11;
marking necessary through _1 stmt _1 = a[i_12];
processing: _1 = a[i_12];
So it thinks the closed definition is needed?
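The marking trace above has the usual mark-and-sweep shape: once one statement is considered necessary, everything it uses gets marked as well. A toy model of that propagation might look like the sketch below. The struct, the statement indices and the idea of seeding the walk from the PHI are illustrative assumptions only; this is not how GCC's DCE pass represents statements.

```c
#include <stdbool.h>

#define NSTMT 3
#define MAXOPS 2

/* Toy statement: its text plus the indices of the statements it uses.  */
struct stmt {
  const char *text;
  int ops[MAXOPS];
  int nops;
  bool necessary;
};

/* The three scalar statements from the trace, as a made-up toy IR.  */
static struct stmt toy[NSTMT] = {
  { "sum_11 = PHI <sum_7(7), 0(11)>", { 1 },    1, false },
  { "sum_7 = _1 + sum_11",            { 0, 2 }, 2, false },
  { "_1 = a[i_12]",                   { 0 },    0, false },
};

/* Mark SEED necessary and propagate through its operands with a
   worklist.  Each statement is pushed at most once, so NSTMT slots
   are enough.  */
static void mark_from(struct stmt *s, int seed)
{
  int work[NSTMT];
  int top = 0;

  if (!s[seed].necessary) {
    s[seed].necessary = true;
    work[top++] = seed;
  }
  while (top > 0) {
    int cur = work[--top];
    for (int i = 0; i < s[cur].nops; i++) {
      int op = s[cur].ops[i];
      if (!s[op].necessary) {
        s[op].necessary = true;
        work[top++] = op;
      }
    }
  }
}

/* Count the statements that survive when something still uses the PHI.  */
int count_necessary_with_live_phi(void)
{
  int n = 0;

  mark_from(toy, 0);
  for (int i = 0; i < NSTMT; i++)
    n += toy[i].necessary;
  return n;
}
```

In this toy model a single remaining use of the PHI is enough to keep the add and the load alive; with no such seed nothing is marked and all three statements would be swept.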
This seems to only happen with reductions, other live operations look fine:
extern int a[1024];
int f4(int *x, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      sum = a[i];
      if (a[i] == 42)
        break;
    }
  return sum;
}
From: rguenth at gcc dot gnu.org @ 2024-03-04 8:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that the scalar reduction value is live from the main loop to the
epilog. We don't seem to use the vector .REDUC_PLUS value on both paths.
This is likely a failure of reduction epilog generation. Let me have a look.
From: rguenth at gcc dot gnu.org @ 2024-03-04 9:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 57600
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57600&action=edit
patch
Ah, so the issue is that we only replace the LHS. In this case we pass
the reduction stmt for the early exit, but the live value is defined by
the PHI (we restart the iteration). That confuses the replacement process.
It looks like it might also be wrong for the peeled case on the main edge?
The following fixes it for me. I didn't check what happens for the peeled
case with a reduction.
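To illustrate the point about restarting the iteration, here is a rough scalar model of a 4-lane early-break reduction. It is a conceptual sketch only, not the code GCC generates and not the attached patch: the function names, the fixed vector factor of 4, the explicit lane array and the dropped unused parameter are all assumptions made for illustration. On the early exit the value carried out is the accumulator as it stood at the start of the vector iteration (the PHI value), and the scalar epilog redoes that iteration.

```c
#define VF 4  /* assumed vectorization factor */

int a[1024];

/* Reference: the scalar loop from the report (unused parameter dropped).  */
int f4_scalar(int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++) {
    sum += a[i];
    if (a[i] == 42)
      break;
  }
  return sum;
}

/* Hypothetical model of the vectorized loop with an early-break exit.  */
int f4_model(int n)
{
  int acc[VF] = { 0, 0, 0, 0 };    /* vector accumulator (the PHI) */
  int i = 0;

  for (; i + VF <= n; i += VF) {
    int hit = 0;
    for (int l = 0; l < VF; l++)   /* models the vector compare */
      hit |= (a[i + l] == 42);
    if (hit)
      break;                       /* early exit: acc is not updated */
    for (int l = 0; l < VF; l++)   /* models the vector add */
      acc[l] += a[i + l];
  }

  /* Reduce the accumulator on either exit (models .REDUC_PLUS).  */
  int sum = 0;
  for (int l = 0; l < VF; l++)
    sum += acc[l];

  /* Scalar epilog: restarts at the first element not yet accumulated.  */
  for (; i < n; i++) {
    sum += a[i];
    if (a[i] == 42)
      break;
  }
  return sum;
}
```

In this model no scalar running sum is needed inside the main loop; both exits take their value from the reduced vector accumulator.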
From: rguenth at gcc dot gnu.org @ 2024-03-04 9:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
vect-early-break_104-pr113373.c might be such a case, which then ICEs.
From: cvs-commit at gcc dot gnu.org @ 2024-03-04 10:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:324d2907c86f05e40dc52d226940308f53a956c2
commit r14-9292-g324d2907c86f05e40dc52d226940308f53a956c2
Author: Richard Biener <rguenther@suse.de>
Date: Mon Mar 4 09:46:13 2024 +0100
tree-optimization/114192 - scalar reduction kept live with early break vect
The following fixes a missing replacement of the reduction value
used in the epilog, causing the scalar reduction to be kept live
across the early break exit path.
PR tree-optimization/114192
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use the
appropriate def for the live out stmt in case of an alternate
exit.
From: rguenth at gcc dot gnu.org @ 2024-03-04 10:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.