public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/113592] New: missed partial sum optimization in vectorizer
@ 2024-01-25 3:48 liuhongt at gcc dot gnu.org
2024-01-25 3:50 ` [Bug tree-optimization/113592] " liuhongt at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25 3:48 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592
Bug ID: 113592
Summary: missed partial sum optimization in vectorizer
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---
double
foo (short* p, int n)
{
  double sum = 0;
  for (int i = 0; i != n; i++)
    sum += p[i] * (double)p[i];
  return sum;
}
With -ffast-math, the vectorizer generates:
<bb 5> [local count: 860067200]:
# vect_sum_16.8_44 = PHI <vect_sum_12.15_61(5), { 0.0, 0.0, 0.0, 0.0 }(4)>
# ivtmp.35_152 = PHI <ivtmp.35_151(5), ivtmp.35_150(4)>
# DEBUG BEGIN_STMT
# DEBUG D#13 => D#14 * 2
# DEBUG D#12 => p_11(D) + D#13
_149 = (void *) ivtmp.35_152;
vect__4.11_47 = MEM <vector(16) short int> [(short int *)_149];
# DEBUG D#11 => *D#12
vect__5.13_48 = [vec_unpack_lo_expr] vect__4.11_47;
vect__5.13_49 = [vec_unpack_hi_expr] vect__4.11_47;
vect__5.12_50 = [vec_unpack_float_lo_expr] vect__5.13_48;
vect__5.12_51 = [vec_unpack_float_hi_expr] vect__5.13_48;
vect__5.12_52 = [vec_unpack_float_lo_expr] vect__5.13_49;
vect__5.12_53 = [vec_unpack_float_hi_expr] vect__5.13_49;
# DEBUG D#10 => (double) D#11
vect_powmult_6.14_55 = vect__5.12_51 * vect__5.12_51;
_62 = .FMA (vect__5.12_50, vect__5.12_50, vect_powmult_6.14_55);
vect_powmult_6.14_57 = vect__5.12_53 * vect__5.12_53;
_45 = .FMA (vect__5.12_52, vect__5.12_52, vect_powmult_6.14_57);
_46 = _45 + _62;
# DEBUG D#9 => D#10 * D#10
vect_sum_12.15_61 = vect_sum_16.8_44 + _46;
# DEBUG sum => D#8
# DEBUG BEGIN_STMT
# DEBUG i => NULL
# DEBUG sum => D#8
# DEBUG BEGIN_STMT
ivtmp.35_151 = ivtmp.35_152 + 32;
if (_18 != ivtmp.35_151)
goto <bb 5>; [89.00%]
else
goto <bb 8>; [11.00%]
But it could be better if the partial sums were kept independent inside the
loop and only combined after it:
....
vect_powmult_6.14_55 = .FMA (vect__5.12_51, vect__5.12_51, 0);
_62 = .FMA (vect__5.12_50, vect__5.12_50, 0);
vect_powmult_6.14_57 = .FMA (vect__5.12_53, vect__5.12_53, 0);
_45 = .FMA (vect__5.12_52, vect__5.12_52, 0);
ivtmp.35_151 = ivtmp.35_152 + 32;
if (_18 != ivtmp.35_151)
goto <bb 5>; [89.00%]
else
goto <bb 8>; [11.00%]
<bb 8>
_tmp1 = vect_powmult_6.14_55 + _62;
_tmp2 = vect_powmult_6.14_57 + _45;
_tmp3 = _tmp1 + _tmp2;
_tmp4_scalar = .REDUCE_SUM (_tmp3);
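
To make the difference concrete, here is a minimal scalar sketch in C of the two loop shapes. It is only an analogue: the four scalars stand in for the four unpacked vectors, the function names are hypothetical, and it assumes the running partial sum is the intended FMA addend where the sketch above writes 0. Reassociating the additions changes floating-point rounding in general, so this is only a valid rewrite under -ffast-math (or -fassociative-math).

```c
#include <assert.h>

/* Loosely mirrors the loop the vectorizer emits today: the four products
   are combined inside the loop and added to a single accumulator. */
double sum_sq_combined_in_loop(const short *p, int n)
{
    assert(n >= 0);
    double sum = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        double a = p[i], b = p[i + 1], c = p[i + 2], d = p[i + 3];
        double lo = a * a + b * b;   /* multiply + FMA, like _62 */
        double hi = c * c + d * d;   /* multiply + FMA, like _45 */
        sum += lo + hi;              /* two more adds on the critical path */
    }
    for (; i < n; i++)               /* scalar tail */
        sum += p[i] * (double)p[i];
    return sum;
}

/* Proposed shape: four independent partial sums, each updated by a single
   multiply-add; the combining adds move to the epilogue. */
double sum_sq_partial(const short *p, int n)
{
    assert(n >= 0);
    double acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        double a = p[i], b = p[i + 1], c = p[i + 2], d = p[i + 3];
        acc0 += a * a;               /* each line can become one FMA */
        acc1 += b * b;
        acc2 += c * c;
        acc3 += d * d;
    }
    double sum = (acc0 + acc1) + (acc2 + acc3);   /* epilogue reduction */
    for (; i < n; i++)               /* scalar tail */
        sum += p[i] * (double)p[i];
    return sum;
}
```

The second form does the same number of multiplies but drops the in-loop combining adds, at the cost of keeping four accumulators live across the loop.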
* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
From: liuhongt at gcc dot gnu.org @ 2024-01-25 3:50 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592
--- Comment #1 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
*** Bug 113593 has been marked as a duplicate of this bug. ***
* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
From: liuhongt at gcc dot gnu.org @ 2024-01-25 3:50 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592
--- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
*** Bug 113594 has been marked as a duplicate of this bug. ***
* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
From: liuhongt at gcc dot gnu.org @ 2024-01-25 4:46 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592
--- Comment #3 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
This testcase is probably not a good example of a typical partial sum, which
relies on loop unrolling.
double
foo (double* p, int n)
{
  double sum = 0;
  for (int i = 0; i != n; i++)
    sum += p[i] * p[i];
  return sum;
}
This should be a better example: the vectorizer can do the unrolling and the
partial sums itself.
* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
From: rguenth at gcc dot gnu.org @ 2024-01-25 9:57 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86_64-*-*
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the original testcase, the vectorizer generates
# vect_sum_20.8_49 = PHI <vect_sum_16.21_75(6), { 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0 }(9)>
...
vect__9.20_68 = vect__5.12_55 * vect__8.16_61;
vect__9.20_69 = vect__5.12_56 * vect__8.17_63;
vect__9.20_70 = vect__5.12_57 * vect__8.18_65;
vect__9.20_71 = vect__5.12_58 * vect__8.19_67;
_9 = _5 * _8;
vect_sum_16.21_72 = vect__9.20_68 + vect_sum_20.8_49;
vect_sum_16.21_73 = vect__9.20_69 + vect_sum_16.21_72;
vect_sum_16.21_74 = vect__9.20_70 + vect_sum_16.21_73;
vect_sum_16.21_75 = vect__9.20_71 + vect_sum_16.21_74;
sum_16 = _9 + sum_20;
The adds come from the optimization that reduces the number of reduction IVs
(alternatively, we could keep them independent with 4 IVs and do the
reduction in the epilogue). This is done to reduce register pressure.

But this also shows that, if the issue isn't the multiple IVs, it could be
handled by reassoc + FMA forming, given that the vectorizer itself doesn't
produce FMAs here.
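
The trade-off between one reduction IV and four can be sketched in plain C. This is an illustration under stated assumptions, not what GCC emits: `vec` is a hypothetical stand-in for one vector register, `LANES` and `UNROLL` are illustrative sizes (the dump above uses 8-lane vectors), and `n` is assumed to be a multiple of `LANES * UNROLL` so that no tail loop is needed.

```c
#include <assert.h>

enum { LANES = 4, UNROLL = 4 };               /* illustrative sizes only */
typedef struct { double lane[LANES]; } vec;   /* stand-in for a vector register */

static vec vec_add(vec a, vec b)
{
    vec r;
    for (int l = 0; l < LANES; l++)
        r.lane[l] = a.lane[l] + b.lane[l];
    return r;
}

/* Load LANES shorts, convert to double, and square each lane. */
static vec vec_sq_load(const short *p)
{
    vec v;
    for (int l = 0; l < LANES; l++)
        v.lane[l] = p[l] * (double)p[l];
    return v;
}

static double vec_reduce(vec v)
{
    double s = 0;
    for (int l = 0; l < LANES; l++)
        s += v.lane[l];
    return s;
}

/* One reduction IV: each add depends on the previous one, so only a single
   accumulator stays live across the loop. */
double sum_sq_chained(const short *p, int n)
{
    assert(n >= 0 && n % (LANES * UNROLL) == 0);
    vec sum = {{0.0}};
    for (int i = 0; i < n; i += LANES * UNROLL)
        for (int u = 0; u < UNROLL; u++)
            sum = vec_add(vec_sq_load(p + i + u * LANES), sum);
    return vec_reduce(sum);
}

/* Four reduction IVs: the adds within one iteration are independent and the
   IVs are combined in the epilogue, at the cost of four live accumulators. */
double sum_sq_independent(const short *p, int n)
{
    assert(n >= 0 && n % (LANES * UNROLL) == 0);
    vec sum[UNROLL] = {{{0.0}}};
    for (int i = 0; i < n; i += LANES * UNROLL)
        for (int u = 0; u < UNROLL; u++)
            sum[u] = vec_add(vec_sq_load(p + i + u * LANES), sum[u]);
    vec total = sum[0];
    for (int u = 1; u < UNROLL; u++)          /* epilogue: combine the IVs */
        total = vec_add(total, sum[u]);
    return vec_reduce(total);
}
```

In the independent form each `sum[u]` update is a separate multiply-add candidate, while the chained form keeps the dependency chain that a later reassoc + FMA pass would have to restructure.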