public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/111697] New: Sub optimal code gen for initialising vector using loop
@ 2023-10-04 19:25 prathamesh3492 at gcc dot gnu.org
From: prathamesh3492 at gcc dot gnu.org @ 2023-10-04 19:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111697
Bug ID: 111697
Summary: Sub optimal code gen for initialising vector using loop
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
Target Milestone: ---
Hi,
For the following test-case:
typedef int v4si __attribute__((vector_size (sizeof (int) * 4)));

v4si f(int x)
{
  v4si v;
  for (int i = 0; i < 4; i++)
    v[i] = x;
  return v;
}
Compiling with -O2 results in the following .optimized dump:
v4si f (int x)
{
  v4si v;

  <bb 2> [local count: 214748368]:
  v_16 = BIT_INSERT_EXPR <v_12(D), x_6(D), 0 (32 bits)>;
  v_20 = BIT_INSERT_EXPR <v_16, x_6(D), 32 (32 bits)>;
  v_24 = BIT_INSERT_EXPR <v_20, x_6(D), 64 (32 bits)>;
  v_2 = BIT_INSERT_EXPR <v_24, x_6(D), 96 (32 bits)>;
  return v_2;
}
and the following code-gen on aarch64:
f:
        movi    v0.4s, 0
        fmov    s31, w0
        ins     v0.s[0], v31.s[0]
        ins     v0.s[1], v31.s[0]
        ins     v0.s[2], v31.s[0]
        ins     v0.s[3], v31.s[0]
        ret
which could instead be a single dup instruction:
f:
        dup     v0.4s, w0
        ret
Similarly, the code-gen on x86_64 is:
f:
        movd    %edi, %xmm0
        movd    %edi, %xmm1
        pshufd  $225, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $225, %xmm0, %xmm0
        pshufd  $198, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $198, %xmm0, %xmm0
        pshufd  $39, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $39, %xmm0, %xmm0
        ret
* [Bug tree-optimization/111697] Sub optimal code gen for initialising vector using loop
From: pinskia at gcc dot gnu.org @ 2023-10-04 19:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111697
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2023-10-04
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. PR 58497 is basically the same issue in the end. I had patches for
this, but I was not 100% sure they handled it in a suitable place.
* [Bug tree-optimization/111697] Sub optimal code gen for initialising vector using loop
From: rguenth at gcc dot gnu.org @ 2023-10-05 7:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111697
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We have quite a lot of code handling vector CTORs in tree-ssa-forwprop.cc, and
this should be optimized to
v_2 = { x_6(D), x_6(D), x_6(D), x_6(D) };
Note that SLP vectorization can do this, but it fails because it doesn't handle
the insert of a default def: it treats a group of BIT_INSERT_EXPRs as a
vector CTOR, and SLP discovery doesn't know how to start from external defs
(it needs actual definition stmts).
A more general approach would be to track vector construction through
symbolic execution, similar to how the bswap pass forms bswaps.
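The symbolic-execution idea can be illustrated with a toy model. The types and names below (insert_stmt, sym_vec, sym_execute, is_full_ctor, is_splat) are hypothetical and are neither GCC's internals nor the bswap pass's actual data structures. Each lane records which scalar SSA name it currently holds, identified by its version number (6 for x_6(D)), and each BIT_INSERT_EXPR overwrites one lane. If no lane still depends on the incoming vector after the chain, the chain can be replaced by a CTOR, and if every lane holds the same name the result is a uniform vector like { x_6(D), x_6(D), x_6(D), x_6(D) }. The sketch assumes 4 lanes and handles only whole, lane-aligned inserts.

```c
#include <assert.h>
#include <stdbool.h>

#define NLANES 4
#define LANE_FROM_BASE (-1)   /* lane still holds the incoming vector's value */

/* One BIT_INSERT_EXPR <vec, value, bitpos (bitsize bits)>; value_id is
   the SSA version of the inserted scalar, e.g. 6 for x_6(D).  */
struct insert_stmt
{
  int value_id;
  unsigned bitpos;
  unsigned bitsize;
};

/* Symbolic vector: which scalar each lane currently holds.  */
struct sym_vec
{
  int lane[NLANES];
};

/* Execute the inserts in order on a symbolic copy of the incoming vector.
   Gives up (returns false) on any insert that is not exactly one
   lane-aligned element of elt_bits bits.  */
static bool sym_execute (const struct insert_stmt *stmts, int n,
                         unsigned elt_bits, struct sym_vec *out)
{
  assert (n >= 0 && elt_bits > 0);
  for (int i = 0; i < NLANES; i++)
    out->lane[i] = LANE_FROM_BASE;
  for (int s = 0; s < n; s++)
    {
      const struct insert_stmt *st = &stmts[s];
      if (st->bitsize != elt_bits || st->bitpos % elt_bits != 0
          || st->bitpos / elt_bits >= NLANES)
        return false;
      out->lane[st->bitpos / elt_bits] = st->value_id;
    }
  return true;
}

/* The chain can become a CTOR once no lane depends on the incoming vector.  */
static bool is_full_ctor (const struct sym_vec *v)
{
  for (int i = 0; i < NLANES; i++)
    if (v->lane[i] == LANE_FROM_BASE)
      return false;
  return true;
}

/* A full CTOR whose lanes all hold the same scalar is a splat.  */
static bool is_splat (const struct sym_vec *v)
{
  if (!is_full_ctor (v))
    return false;
  for (int i = 1; i < NLANES; i++)
    if (v->lane[i] != v->lane[0])
      return false;
  return true;
}
```

For the dump in the report, the four inserts at bit positions 0, 32, 64 and 96 all carry x_6(D), so the model reports a full, uniform CTOR. Because the lanes are tracked by position rather than by statement order, inserts that arrive out of order are handled the same way.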
* [Bug tree-optimization/111697] Sub optimal code gen for initialising vector using loop
From: rguenth at gcc dot gnu.org @ 2023-10-05 7:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111697
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> We have quite some code doing vector CTOR stuff in tree-ssa-forwprop.cc and
> this should be optimized to
>
> v_2 = { x_6(D), x_6(D), x_6(D), x_6(D) };
>
> note SLP vectorization can do this but it fails because it doesn't handle
> a default def insert - it handles a group of BIT_INSERT_EXPRs as
> vector CTOR and SLP discovery doesn't know how to start from external defs
> (it needs actual definition stmts).
>
> A more general approach would be to try to track vector construction through
> symbolic execution like we form bswap in the bswap pass.
You could "steal" the code in vect_slp_check_for_roots,
      else if (code == BIT_INSERT_EXPR
               && VECTOR_TYPE_P (TREE_TYPE (rhs))
               && TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)).is_constant ()
               && TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)).to_constant () > 1
               && integer_zerop (gimple_assign_rhs3 (assign))
               && useless_type_conversion_p
                    (TREE_TYPE (TREE_TYPE (rhs)),
                     TREE_TYPE (gimple_assign_rhs2 (assign)))
               && bb_vinfo->lookup_def (gimple_assign_rhs2 (assign)))
        {
          /* We start to match on insert to lane zero but since the
             inserts need not be ordered we'd have to search both
             the def and the use chains.  */
          ...
and put it into tree-ssa-forwprop.cc, explicitly creating the vector CTOR.
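A rough standalone sketch of such a matcher follows. It uses a hypothetical miniature IR (struct vnode, build_ctor_from_inserts) rather than GCC's gimple API. Unlike vect_slp_check_for_roots, which starts matching at the insert into lane zero, it assumes the caller already knows which insert ends the chain and walks the def chain backwards from there, so the order of the inserts does not matter and no use-chain search is needed. Walking backwards, the first insert seen for a lane is the one that survives. Like the quoted condition, it requires more than one lane and inserts at lane-aligned positions; the element-type check is modelled only by the caller passing the element size in bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: a node is either a default def (base == NULL,
   like v_12(D)) or BIT_INSERT_EXPR <base, value_id, bitpos>.  */
struct vnode
{
  const struct vnode *base;
  int value_id;        /* SSA version of the inserted scalar */
  unsigned bitpos;
};

/* Walk from the last insert back towards the default def, recording the
   value that ends up in each lane.  Returns true, with ctor[] holding the
   elements of an equivalent vector CTOR, once every lane is covered.  */
static bool build_ctor_from_inserts (const struct vnode *last,
                                     unsigned nlanes, unsigned elt_bits,
                                     int *ctor)
{
  bool seen[64] = { false };
  unsigned filled = 0;
  assert (nlanes > 1 && nlanes <= 64 && elt_bits > 0);

  for (const struct vnode *n = last; n && n->base; n = n->base)
    {
      if (n->bitpos % elt_bits != 0 || n->bitpos / elt_bits >= nlanes)
        return false;               /* not a whole, in-range lane */
      unsigned lane = n->bitpos / elt_bits;
      if (!seen[lane])              /* later inserts override earlier ones */
        {
          seen[lane] = true;
          ctor[lane] = n->value_id;
          filled++;
        }
      if (filled == nlanes)
        return true;                /* older inserts are dead */
    }
  return false;                     /* some lane still comes from the base */
}
```

On the chain from the report (v_16, v_20, v_24, v_2 inserting x_6(D) at bits 0, 32, 64 and 96), this yields ctor = { 6, 6, 6, 6 }, i.e. v_2 = { x_6(D), x_6(D), x_6(D), x_6(D) }.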