public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/116338] New: GCC is not vectoring TSVC s255 while clang can
From: kugan at gcc dot gnu.org @ 2024-08-12 6:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
Bug ID: 116338
Summary: GCC is not vectoring TSVC s255 while clang can
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kugan at gcc dot gnu.org
Target Milestone: ---
reduced test case:
typedef float real_t;
extern __attribute__((aligned(64))) real_t a[32000], b[32000];

void s255()
{
    real_t x, y;
    x = b[32000 - 1];
    y = b[32000 - 2];
    for (int i = 0; i < 32000; i++) {
        a[i] = (b[i] + x + y) * (real_t).333;
        y = x;
        x = b[i];
    }
}
gcc is not able to vectorize the loop whereas clang can. See
https://godbolt.org/z/64Kxaahqr
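One way to see that the loop is vectorizable in principle: x and y only ever hold b[i-1] and b[i-2] (wrapping around to the end of b for the first two iterations). The sketch below compares the loop as reported with a hand-peeled version that has no loop-carried scalars. The names s255_carried, s255_peeled and s255_versions_agree, the local arrays, and the fill pattern for b are hypothetical and only serve the illustration; this is not a claim about the transformation either compiler actually performs.

```c
#include <assert.h>

#define N 32000
typedef float real_t;

/* Local stand-ins for the extern arrays in the report (illustrative only). */
static real_t b_in[N], a_ref[N], a_alt[N];

/* The loop as written in the report: x and y are carried across iterations. */
static void s255_carried(real_t *a, const real_t *b)
{
    real_t x = b[N - 1];
    real_t y = b[N - 2];
    for (int i = 0; i < N; i++) {
        a[i] = (b[i] + x + y) * (real_t).333;
        y = x;
        x = b[i];
    }
}

/* Same computation with x and y replaced by b[i-1] and b[i-2].
   The first two iterations wrap around to the end of b and are peeled. */
static void s255_peeled(real_t *a, const real_t *b)
{
    a[0] = (b[0] + b[N - 1] + b[N - 2]) * (real_t).333;
    a[1] = (b[1] + b[0] + b[N - 1]) * (real_t).333;
    for (int i = 2; i < N; i++)
        a[i] = (b[i] + b[i - 1] + b[i - 2]) * (real_t).333;
}

/* Runs both versions on the same input and checks every element matches.
   Returns 1 when they agree (the asserts abort otherwise). */
int s255_versions_agree(void)
{
    for (int i = 0; i < N; i++)
        b_in[i] = (real_t)(i % 97) * 0.5f + 1.0f;  /* arbitrary test data */
    s255_carried(a_ref, b_in);
    s255_peeled(a_alt, b_in);
    for (int i = 0; i < N; i++)
        assert(a_ref[i] == a_alt[i]);
    return 1;
}
```

Both versions evaluate the additions in the same order, so the results are expected to match exactly rather than only approximately.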
gcc -v
Using built-in specs.
COLLECT_GCC=/home/kvivekananda/install/bin/gcc
COLLECT_LTO_WRAPPER=/home/kvivekananda/install/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc_base/configure --prefix=/home/kvivekananda/install/
--enable-languages=c,c++,fortran,lto,objc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 15.0.0 20240618 (experimental) (GCC)
* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
From: pinskia at gcc dot gnu.org @ 2024-08-12 7:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-08-12
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
From: rguenth at gcc dot gnu.org @ 2024-08-19 14:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is the recurrence
<bb 2> [local count: 10737416]:
x_10 = b[31999];
y_11 = b[31998];
<bb 3> [local count: 1063004408]:
# x_18 = PHI <_1(5), x_10(2)>
# y_19 = PHI <x_18(5), y_11(2)>
_1 = b[i_20];
..
<bb 5> [local count: 1052266995]:
goto <bb 3>; [100.00%]
we handle some cases via vect_phi_first_order_recurrence_p, somebody needs
to dig in why this one isn't (or can't be) handled with that mechanism.
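As a rough model of what handling such a recurrence involves, the sketch below processes the loop in blocks of four elements and builds the "previous value" vectors by selecting lanes from the concatenation of the previous block and the current block, the way a two-input vector permute would. The vector length VF = 4, the helper concat_select, and all other names are assumptions made for the illustration; this is not GCC's implementation. Note that y, being a recurrence on x, simply needs a window shifted by two lanes instead of one.

```c
#include <assert.h>

#define N 32000
#define VF 4            /* assumed vector length, for illustration only */
typedef float real_t;

static real_t b_src[N], a_scalar[N], a_blocked[N];

/* Scalar reference: the loop from the report. */
static void s255_scalar(void)
{
    real_t x = b_src[N - 1];
    real_t y = b_src[N - 2];
    for (int i = 0; i < N; i++) {
        a_scalar[i] = (b_src[i] + x + y) * (real_t).333;
        y = x;
        x = b_src[i];
    }
}

/* Pick VF consecutive lanes, starting at lane `first`, out of the
   2*VF-lane concatenation {prev, cur}. Models a two-input permute. */
static void concat_select(real_t *out, const real_t *prev,
                          const real_t *cur, int first)
{
    for (int l = 0; l < VF; l++) {
        int k = first + l;
        out[l] = k < VF ? prev[k] : cur[k - VF];
    }
}

/* Blocked evaluation: each block of VF results uses only whole-block
   values plus the previous block, so nothing is carried lane to lane. */
static void s255_blocked(void)
{
    real_t prev[VF] = {0};
    /* Seed the last two lanes with the values live on loop entry. */
    prev[VF - 2] = b_src[N - 2];   /* initial y */
    prev[VF - 1] = b_src[N - 1];   /* initial x */

    for (int i = 0; i < N; i += VF) {          /* N is a multiple of VF */
        const real_t *cur = &b_src[i];
        real_t xv[VF], yv[VF];
        concat_select(xv, prev, cur, VF - 1);  /* x: one iteration back */
        concat_select(yv, prev, cur, VF - 2);  /* y: two iterations back */
        for (int l = 0; l < VF; l++)
            a_blocked[i + l] = (cur[l] + xv[l] + yv[l]) * (real_t).333;
        for (int l = 0; l < VF; l++)
            prev[l] = cur[l];
    }
}

/* Returns 1 when the blocked model reproduces the scalar loop exactly. */
int s255_blocked_matches_scalar(void)
{
    for (int i = 0; i < N; i++)
        b_src[i] = (real_t)(i % 101) * 0.25f + 2.0f;  /* arbitrary data */
    s255_scalar();
    s255_blocked();
    for (int i = 0; i < N; i++)
        assert(a_scalar[i] == a_blocked[i]);
    return 1;
}
```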
* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
From: kugan at gcc dot gnu.org @ 2024-08-20 7:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
--- Comment #3 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> The issue is the recurrence
>
> <bb 2> [local count: 10737416]:
> x_10 = b[31999];
> y_11 = b[31998];
>
> <bb 3> [local count: 1063004408]:
> # x_18 = PHI <_1(5), x_10(2)>
> # y_19 = PHI <x_18(5), y_11(2)>
> _1 = b[i_20];
> ..
>
> <bb 5> [local count: 1052266995]:
> goto <bb 3>; [100.00%]
>
> we handle some cases via vect_phi_first_order_recurrence_p, somebody needs
> to dig in why this one isn't (or can't be) handled with that mechanism.
  /* Ensure the loop latch definition is from within the loop.  */
  edge latch = loop_latch_edge (loop);
  tree ldef = PHI_ARG_DEF_FROM_EDGE (phi, latch);
  if (TREE_CODE (ldef) != SSA_NAME
      || SSA_NAME_IS_DEFAULT_DEF (ldef)
      || is_a <gphi *> (SSA_NAME_DEF_STMT (ldef))
      || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (ldef))))
    return false;
(gdb) p debug_tree (ldef)
<ssa_name 0xfffff7979900
type <real_type 0xfffff796d0a8 real_t sizes-gimplified SF
size <integer_cst 0xfffff7a86150 constant 32>
unit-size <integer_cst 0xfffff7a86168 constant 4>
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xfffff7a8b2a0 precision:32
pointer_to_this <pointer_type 0xfffff79b2b28>>
visited var <var_decl 0xfffff79b1510 x>
def_stmt x_18 = PHI <_1(5), x_10(2)>
version:18>
$1 = void
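That is, the PHI argument defined along the loop latch (x_18) is itself
defined by a PHI statement in this case, so the is_a <gphi *> check above
makes the function return false.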
* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
From: rguenth at gcc dot gnu.org @ 2024-08-20 7:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
You can try to see whether adding a SSA copy would make this supported, it
seems not allowing a PHI is simply a missed feature.
* [Bug tree-optimization/116338] GCC is not vectoring TSVC s255 while clang can
From: kugan at gcc dot gnu.org @ 2024-08-21 3:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116338
--- Comment #5 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> You can try to see whether adding a SSA copy would make this supported, it
> seems not allowing a PHI is simply a missed feature.
We now fail in
/* If this isn't a nested cycle or if the nested cycle reduction value
is used ouside of the inner loop we cannot handle uses of the reduction
value. */
if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
Even if I comment this check out, I see:
t1.c:16:25: note: worklist: examine stmt: _22 = x_18 + y_19;
t1.c:16:25: note: vect_is_simple_use: operand x_18 = PHI <_1(5), x_10(2)>,
type of def: unknown
t1.c:16:25: missed: Unsupported pattern.
t1.c:10:6: missed: not vectorized: unsupported use in stmt.
t1.c:16:25: missed: unexpected pattern.
t1.c:16:25: note: ***** Analysis failed with vector mode V4SF
Do we need to somehow mark both PHI stmts as part of the first order
reduction?
<bb 3> [local count: 1063004408]:
# x_18 = PHI <_1(5), x_10(2)>
# y_19 = PHI <x_18(5), y_11(2)>
# i_20 = PHI <i_13(5), 0(2)>
# ivtmp_17 = PHI <ivtmp_16(5), 32000(2)>
_1 = b[i_20];
_22 = x_18 + y_19;
_3 = _1 + _22;
_4 = _3 * 3.33000004291534423828125e-1;
a[i_20] = _4;
i_13 = i_20 + 1;
ivtmp_16 = ivtmp_17 - 1;
if (ivtmp_16 != 0)
goto <bb 5>; [98.99%]
else
goto <bb 4>; [1.01%]
<bb 5> [local count: 1052266995]:
goto <bb 3>; [100.00%]