public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2021-12-06 23:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed on trunk. This is no longer a target issue:
__builtin_ia32_shufps is now lowered to a VEC_PERM_EXPR:

  _1 = MEM[(const float &)v_7(D) + 12];
  MEM[(float &)&D.5891] = _1;
  _24 = D.5891._rep.vecf;
  _25 = VEC_PERM_EXPR <_24, _24, { 0, 0, 0, 0 }>;
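
To make the lowering mentioned above concrete, here is a minimal sketch of how a shufps immediate maps onto a VEC_PERM_EXPR selector. It only models the lane selection of the instruction (two lanes from the first operand, two from the second); it is not GCC's lowering code, and the name shufps_selector and the same_operand flag are invented for this example.

```c
/* Decode a shufps imm8 into a 4-entry permute selector.
   Selector indices 0..3 address the first operand, 4..7 the second.
   Hypothetical helper for illustration; not part of GCC. */
void shufps_selector(unsigned imm8, int same_operand, unsigned sel[4])
{
    sel[0] = imm8 & 3;               /* result lane 0 comes from operand 1 */
    sel[1] = (imm8 >> 2) & 3;        /* result lane 1 comes from operand 1 */
    sel[2] = ((imm8 >> 4) & 3) + 4;  /* result lane 2 comes from operand 2 */
    sel[3] = ((imm8 >> 6) & 3) + 4;  /* result lane 3 comes from operand 2 */

    /* If both operands are the same vector, index k + 4 names the same
       lane as index k, so the selector can be written with 0..3 only. */
    if (same_operand) {
        sel[2] -= 4;
        sel[3] -= 4;
    }
}
```

With imm8 = 0 and the same vector for both operands, this yields { 0, 0, 0, 0 }, which matches the selector in the dump above.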
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: rguenth at gcc dot gnu.org @ 2021-12-07 8:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
Good:

  <bb 2> [local count: 1073741824]:
  _1 = *m_12(D);
  _14 = VEC_PERM_EXPR <v_13(D), v_13(D), { 0, 0, 0, 0 }>;
  _2 = _1 * _14;
  _3 = MEM[(__v4sf *)m_12(D) + 16B];
  _15 = VEC_PERM_EXPR <v_13(D), v_13(D), { 1, 1, 1, 1 }>;
  _4 = _3 * _15;
  _5 = _2 + _4;
  _6 = MEM[(__v4sf *)m_12(D) + 32B];
  _16 = VEC_PERM_EXPR <v_13(D), v_13(D), { 2, 2, 2, 2 }>;
  _7 = _6 * _16;
  _8 = _5 + _7;
  _9 = MEM[(__v4sf *)m_12(D) + 48B];
  _17 = VEC_PERM_EXPR <v_13(D), v_13(D), { 3, 3, 3, 3 }>;
  _10 = _9 * _17;
  _18 = _8 + _10;
  return _18;
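
For orientation, the "Good" dump computes m[0]*splat(v,0) + m[1]*splat(v,1) + m[2]*splat(v,2) + m[3]*splat(v,3). The following is a rough source-level sketch of that computation using the generic vector extensions. It is not the attachment from the bug: transform_sketch and the SPLAT macro are names made up here, and the macro picks __builtin_shuffle on GCC and __builtin_shufflevector on Clang.

```c
/* Generic vector extension types: four packed floats / ints. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Broadcast lane i (a compile-time constant) of x to all four lanes.
   Hypothetical helper macro for this sketch. */
#if defined(__clang__)
#define SPLAT(x, i) __builtin_shufflevector((x), (x), (i), (i), (i), (i))
#else
#define SPLAT(x, i) __builtin_shuffle((x), (v4si){(i), (i), (i), (i)})
#endif

/* Multiply-accumulate four vectors of m by the broadcast lanes of v,
   in the same order as the statements in the "Good" dump. */
v4sf transform_sketch(const v4sf *m, v4sf v)
{
    return m[0] * SPLAT(v, 0) + m[1] * SPLAT(v, 1)
         + m[2] * SPLAT(v, 2) + m[3] * SPLAT(v, 3);
}
```

Because every broadcast here is written as a permute of v itself, no scalar temporary is involved, which is the shape one would hope the optimizers reach for the other formulation as well.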
Bad:

  <bb 2> [local count: 1073741824]:
  _1 = *m_12(D);
  _30 = BIT_FIELD_REF <v_13(D), 32, 0>;
  v_28 = BIT_INSERT_EXPR <v_27(D), _30, 0>;
  _29 = VEC_PERM_EXPR <v_28, v_28, { 0, 0, 0, 0 }>;
  _2 = _1 * _29;
  _3 = MEM[(__v4sf *)m_12(D) + 16B];
  _26 = BIT_FIELD_REF <v_13(D), 32, 32>;
  v_24 = BIT_INSERT_EXPR <v_23(D), _26, 0>;
  _25 = VEC_PERM_EXPR <v_24, v_24, { 0, 0, 0, 0 }>;
  _4 = _3 * _25;
  _5 = _2 + _4;
  _6 = MEM[(__v4sf *)m_12(D) + 32B];
  _14 = BIT_FIELD_REF <v_13(D), 32, 64>;
  v_16 = BIT_INSERT_EXPR <v_17(D), _14, 0>;
  _15 = VEC_PERM_EXPR <v_16, v_16, { 0, 0, 0, 0 }>;
  _7 = _6 * _15;
  _8 = _5 + _7;
  _9 = MEM[(__v4sf *)m_12(D) + 48B];
  _18 = BIT_FIELD_REF <v_13(D), 32, 96>;
  v_20 = BIT_INSERT_EXPR <v_21(D), _18, 0>;
  _19 = VEC_PERM_EXPR <v_20, v_20, { 0, 0, 0, 0 }>;
  _10 = _9 * _19;
  _22 = _8 + _10;
  return _22;
So what's missing is converting the sequence "extract element N, insert at
position 0, splat element 0" into a single "splat element N".

  _30 = BIT_FIELD_REF <v_13(D), 32, 0>;
  v_28 = BIT_INSERT_EXPR <v_27(D), _30, 0>;
  _29 = VEC_PERM_EXPR <v_28, v_28, { 0, 0, 0, 0 }>;

shows a missing no-op simplification: inserting, at position 0 of a
default-def vector, a value extracted from the same position can simply
return the vector we extracted from.

  _26 = BIT_FIELD_REF <v_13(D), 32, 32>;
  v_24 = BIT_INSERT_EXPR <v_23(D), _26, 0>;
  _25 = VEC_PERM_EXPR <v_24, v_24, { 0, 0, 0, 0 }>;

is a bit more complicated: the VEC_PERM_EXPR indices should be rewritten,
given that the permute only picks the just-inserted element and that element
was extracted from another (compatible) vector.
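
The equivalence behind both cases can be checked with a toy model of the three GIMPLE operations. This is only a sketch for four 32-bit lanes: the vec4 struct and the helper names are invented here, the "junk" vector stands in for the undefined default-def operand, and nothing below reflects how GCC itself implements the folding.

```c
#include <string.h>

#define LANES 4

typedef struct { float lane[LANES]; } vec4;

/* BIT_FIELD_REF <v, 32, bitpos>: read the 32-bit lane at bit offset bitpos. */
static float bit_field_ref(vec4 v, unsigned bitpos)
{
    return v.lane[bitpos / 32];
}

/* BIT_INSERT_EXPR <v, x, bitpos>: copy of v with one lane replaced by x. */
static vec4 bit_insert_expr(vec4 v, float x, unsigned bitpos)
{
    v.lane[bitpos / 32] = x;
    return v;
}

/* VEC_PERM_EXPR <a, b, sel>: indices 0..3 pick from a, 4..7 from b. */
static vec4 vec_perm_expr(vec4 a, vec4 b, const unsigned sel[LANES])
{
    vec4 r;
    for (int i = 0; i < LANES; i++) {
        unsigned s = sel[i] % (2 * LANES);
        r.lane[i] = s < LANES ? a.lane[s] : b.lane[s - LANES];
    }
    return r;
}

/* Long form from the "Bad" dump: extract lane n, insert it at lane 0 of
   an otherwise arbitrary vector, then splat lane 0. */
static vec4 splat_long(vec4 v, unsigned n)
{
    static const unsigned zero_sel[LANES] = { 0, 0, 0, 0 };
    vec4 junk = { { -1.0f, -2.0f, -3.0f, -4.0f } }; /* models v_27(D) */
    float x = bit_field_ref(v, 32 * n);
    vec4 t = bit_insert_expr(junk, x, 0);
    return vec_perm_expr(t, t, zero_sel);
}

/* Short form from the "Good" dump: one permute of the source vector. */
static vec4 splat_short(vec4 v, unsigned n)
{
    unsigned sel[LANES] = { n, n, n, n };
    return vec_perm_expr(v, v, sel);
}

/* Returns 1 if both forms produce identical lanes for every n. */
int forms_agree(vec4 v)
{
    for (unsigned n = 0; n < LANES; n++) {
        vec4 a = splat_long(v, n);
        vec4 b = splat_short(v, n);
        if (memcmp(a.lane, b.lane, sizeof a.lane) != 0)
            return 0;
    }
    return 1;
}
```

The contents of the junk vector never reach the result because the permute reads only lane 0, which is why the insert can be dropped for N = 0 and replaced by an index rewrite for N > 0.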
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14 1:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
--- Comment #19 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57941
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57941&action=edit
Uninclude version of the "More concise demonstration of the v4sf->float->v4sf
issue"
This version has the includes removed, so it is easier to test with newer
compilers.
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14 1:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |14.0
--- Comment #20 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks fixed on trunk; I will look into what fixed it in a few minutes.
transform_bad no longer has a BIT_INSERT_EXPR.
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14 1:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
--- Comment #21 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
r14-3381-g27de9aa152141e, combined with r13-3212-gb88adba751da63 and
r13-3271-g786e4c024f9416, fixed the "More concise demonstration of the
v4sf->float->v4sf issue" testcase.
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14 1:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
--- Comment #22 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57942
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57942&action=edit
uninclude of the original testcase
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14 1:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|14.0                        |
--- Comment #23 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The original testcase still has an issue though.
We get:
```
_1 = MEM[(const float &)v_7(D) + 12];
MEM[(float &)&D.6716] = _1;
_24 = D.6716._rep.vecf;
_25 = VEC_PERM_EXPR <_24, _24, { 0, 0, 0, 0 }>;
```
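
As a rough illustration of the shape of that remaining sequence, the sketch below loads one float at byte offset 12, stores it into lane 0 of a temporary that is then reread as a vector, and splats lane 0. It is written against the generic vector extensions and is not the original testcase: union rep, SPLAT0 and splat_w_via_temp are hypothetical names, and the temporary is zero-initialized here so the C code is well defined, whereas the temporary in the dump has no initializer.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the object behind "_rep.vecf" in the dump. */
union rep {
    float f[4];
    v4sf vecf;
};

/* Broadcast lane 0 of x; hypothetical helper macro for this sketch. */
#if defined(__clang__)
#define SPLAT0(x) __builtin_shufflevector((x), (x), 0, 0, 0, 0)
#else
#define SPLAT0(x) __builtin_shuffle((x), (v4si){0, 0, 0, 0})
#endif

/* Scalar load, store into a temporary, vector reload, then splat. */
v4sf splat_w_via_temp(const float *v)
{
    union rep tmp = { { 0 } };  /* assumption: zeroed, unlike the dump */
    tmp.f[0] = v[3];            /* v + 12 bytes is the fourth float */
    return SPLAT0(tmp.vecf);    /* only lane 0 of tmp is ever read */
}
```

The round trip through the temporary is the part the dump shows as a store to D.6716 followed by a vector read of the same object.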