* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
2022-09-20 23:05 [Bug c/106989] New: GCC fail to vectorize and clang succeed juzhe.zhong at rivai dot ai
@ 2022-09-20 23:21 ` pinskia at gcc dot gnu.org
2022-09-20 23:46 ` crazylht at gmail dot com
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-09-20 23:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |53947
Keywords| |missed-optimization
Component|c |tree-optimization
Severity|normal |enhancement
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
2022-09-20 23:05 [Bug c/106989] New: GCC fail to vectorize and clang succeed juzhe.zhong at rivai dot ai
2022-09-20 23:21 ` [Bug tree-optimization/106989] " pinskia at gcc dot gnu.org
@ 2022-09-20 23:46 ` crazylht at gmail dot com
2022-09-21 0:42 ` pinskia at gcc dot gnu.org
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2022-09-20 23:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
> real_t* __restrict__ xx;
> real_t* yy;
> real_t s243(void)
> {
> for (int nl = 0; nl < iterations; nl++) {
> for (int i = 0; i < LEN_1D-1; i++) {
> a[i] = b[i] + c[i ] * d[i];
> b[i] = a[i] + d[i ] * e[i];
> a[i] = b[i] + a[i+1] * d[i];
> }
> }
> }
Manually change the code to below, gcc can vectorize the loop.
real_t s243(void)
{
for (int nl = 0; nl < iterations; nl++) {
for (int i = 0; i < LEN_1D-1; i++) {
// a[i] = b[i] + c[i ] * d[i]; propagate it into next line.
b[i] = b[i] + c[i ] * d[i] + d[i ] * e[i];
a[i] = b[i] + a[i+1] * d[i];
}
}
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
2022-09-20 23:05 [Bug c/106989] New: GCC fail to vectorize and clang succeed juzhe.zhong at rivai dot ai
2022-09-20 23:21 ` [Bug tree-optimization/106989] " pinskia at gcc dot gnu.org
2022-09-20 23:46 ` crazylht at gmail dot com
@ 2022-09-21 0:42 ` pinskia at gcc dot gnu.org
2022-09-21 0:45 ` pinskia at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-09-21 0:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
/app/example.cpp:20:25: note: Detected interleaving store a[i_27] and a[i_27]
/app/example.cpp:20:25: note: Queuing group with duplicate access for fixup
/app/example.cpp:20:25: note: zero step in outer loop.
/app/example.cpp:20:25: note: zero step in outer loop.
/app/example.cpp:20:25: missed: not vectorized: complicated access pattern.
/app/example.cpp:22:18: missed: not vectorized: complicated access pattern.
/app/example.cpp:20:25: missed: bad data access.
...
/app/example.cpp:21:27: note: dependence distance = 0.
/app/example.cpp:21:27: note: dependence distance == 0 between b[i_27] and
b[i_27]
/app/example.cpp:21:27: note: dependence distance = 1.
/app/example.cpp:22:18: missed: not vectorized, possible dependence between
data-refs a[i_27] and a[_9]
/app/example.cpp:21:27: missed: bad data dependence.
/app/example.cpp:21:27: note: ***** Analysis failed with vector mode V4SF
There is a missing DSE before hand:
# VUSE <.MEM_28>
_1 = bD.3768[i_27];
# VUSE <.MEM_28>
_2 = cD.3769[i_27];
# VUSE <.MEM_28>
_3 = dD.3770[i_27];
_4 = _2 * _3;
_5 = _1 + _4;
# .MEM_19 = VDEF <.MEM_28>
aD.3767[i_27] = _5;
# VUSE <.MEM_19>
_6 = eD.3771[i_27];
_7 = _3 * _6;
_8 = _5 + _7;
# .MEM_20 = VDEF <.MEM_19>
bD.3768[i_27] = _8;
# RANGE [irange] int [1, 31999] NONZERO 0x7fff
_9 = i_27 + 1;
# VUSE <.MEM_20>
_10 = aD.3767[_9];
_11 = _3 * _10;
_12 = _8 + _11;
# .MEM_21 = VDEF <.MEM_20>
aD.3767[i_27] = _12;
DSE does not notice the store defining MEM_19 does touch the load:
# VUSE <.MEM_20>
_10 = aD.3767[_9];
And that it is redudent with the store defining MEM_21.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
2022-09-20 23:05 [Bug c/106989] New: GCC fail to vectorize and clang succeed juzhe.zhong at rivai dot ai
` (2 preceding siblings ...)
2022-09-21 0:42 ` pinskia at gcc dot gnu.org
@ 2022-09-21 0:45 ` pinskia at gcc dot gnu.org
2022-09-21 0:50 ` juzhe.zhong at rivai dot ai
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-09-21 0:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The DSE happens but only at the RTL level ....
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
2022-09-20 23:05 [Bug c/106989] New: GCC fail to vectorize and clang succeed juzhe.zhong at rivai dot ai
` (3 preceding siblings ...)
2022-09-21 0:45 ` pinskia at gcc dot gnu.org
@ 2022-09-21 0:50 ` juzhe.zhong at rivai dot ai
2022-09-21 8:11 ` rguenth at gcc dot gnu.org
2022-09-22 15:56 ` pinskia at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2022-09-21 0:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989
--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #3)
> The DSE happens but only at the RTL level ....
Is it a good idea to do data-ref in DSE and remove the first redundant store?
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
2022-09-20 23:05 [Bug c/106989] New: GCC fail to vectorize and clang succeed juzhe.zhong at rivai dot ai
` (4 preceding siblings ...)
2022-09-21 0:50 ` juzhe.zhong at rivai dot ai
@ 2022-09-21 8:11 ` rguenth at gcc dot gnu.org
2022-09-22 15:56 ` pinskia at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-21 8:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2022-09-21
CC| |rguenth at gcc dot gnu.org
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #4)
> (In reply to Andrew Pinski from comment #3)
> > The DSE happens but only at the RTL level ....
>
> Is it a good idea to do data-ref in DSE and remove the first redundant store?
Probably - of course that makes things more expensive. The case at hand
can probably be handled by what LIMs mem_refs_may_alias_p does, using
get_inner_reference_aff (mem1->mem.ref, &off1, &size1);
get_inner_reference_aff (mem2->mem.ref, &off2, &size2);
aff_combination_expand (&off1, ttae_cache);
aff_combination_expand (&off2, ttae_cache);
aff_combination_scale (&off1, -1);
aff_combination_add (&off2, &off1);
if (aff_comb_cannot_overlap_p (&off2, size1, size2))
return false;
but I'm not sure we should add more code doing things like that ...
If we think that firing up dataref analysis at the point we discover the
possible use is too expensive we could also optimistically queue them
and only when we find a killing def (and thus the store would be dead),
process the queued uses, checking them if they are really important.
But well, maybe just try the simplest approach and measure the compile-time
effect. That is, in
/* If the statement is a use the store is not dead. */
else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
{
/* Handle common cases where we can easily build an ao_ref
structure for USE_STMT and in doing so we find that the
references hit non-live bytes and thus can be ignored.
for a gimple assignment, check dr_may_alias_p after analyzing both
stmts (we can of course at least cache the DR for 'stmt'). Guarded
with flag_expensive_optimizations (thus -O2+). You also need to then
initialize loops and SCEV.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug tree-optimization/106989] GCC fail to vectorize and clang succeed
2022-09-20 23:05 [Bug c/106989] New: GCC fail to vectorize and clang succeed juzhe.zhong at rivai dot ai
` (5 preceding siblings ...)
2022-09-21 8:11 ` rguenth at gcc dot gnu.org
@ 2022-09-22 15:56 ` pinskia at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-09-22 15:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106989
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |DUPLICATE
Status|NEW |RESOLVED
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Exact dup of bug 99407.
*** This bug has been marked as a duplicate of bug 99407 ***
^ permalink raw reply [flat|nested] 8+ messages in thread