public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
       [not found] <bug-29756-4@http.gcc.gnu.org/bugzilla/>
@ 2021-12-06 23:36 ` pinskia at gcc dot gnu.org
  2021-12-07  8:22 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-06 23:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed on the trunk; this is no longer a target issue.

__builtin_ia32_shufps is lowered into a VEC_PERM_EXPR now:
  _1 = MEM[(const float &)v_7(D) + 12];
  MEM[(float &)&D.5891] = _1;
  _24 = D.5891._rep.vecf;
  _25 = VEC_PERM_EXPR <_24, _24, { 0, 0, 0, 0 }>;
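
Read back as source, the sequence above amounts to roughly the following. This is only a sketch using GCC's generic vector extension: the function names are invented for illustration, and the zero-initialized temporary stands in for D.5891, which is left uninitialized in the real dump.

```cpp
// Hypothetical illustration; none of these names come from the testcase.
typedef float v4sf __attribute__((vector_size(16)));  // GCC/Clang vector extension

// Mirrors the GIMPLE: scalar load of lane 3 (byte offset 12), store into
// lane 0 of a temporary, then splat lane 0 of that temporary.
v4sf splat3_via_temporary(const v4sf &v)
{
    v4sf tmp = {0.0f, 0.0f, 0.0f, 0.0f};  // assumption: zeroed to keep the sketch well defined
    tmp[0] = v[3];                         // _1 = MEM[v + 12]; MEM[&tmp] = _1
    v4sf r = {tmp[0], tmp[0], tmp[0], tmp[0]};  // VEC_PERM_EXPR <tmp, tmp, {0,0,0,0}>
    return r;
}

// The form one would hope to end up with: a single permute of v itself.
v4sf splat3_direct(const v4sf &v)
{
    v4sf r = {v[3], v[3], v[3], v[3]};     // VEC_PERM_EXPR <v, v, {3,3,3,3}>
    return r;
}
```

Both functions return the same value for every input; the difference is only the trip through the temporary.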


* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: rguenth at gcc dot gnu.org @ 2021-12-07  8:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
Good:

  <bb 2> [local count: 1073741824]:
  _1 = *m_12(D);
  _14 = VEC_PERM_EXPR <v_13(D), v_13(D), { 0, 0, 0, 0 }>;
  _2 = _1 * _14;
  _3 = MEM[(__v4sf *)m_12(D) + 16B];
  _15 = VEC_PERM_EXPR <v_13(D), v_13(D), { 1, 1, 1, 1 }>;
  _4 = _3 * _15;
  _5 = _2 + _4;
  _6 = MEM[(__v4sf *)m_12(D) + 32B];
  _16 = VEC_PERM_EXPR <v_13(D), v_13(D), { 2, 2, 2, 2 }>;
  _7 = _6 * _16;
  _8 = _5 + _7;
  _9 = MEM[(__v4sf *)m_12(D) + 48B];
  _17 = VEC_PERM_EXPR <v_13(D), v_13(D), { 3, 3, 3, 3 }>;
  _10 = _9 * _17;
  _18 = _8 + _10;
  return _18;

Bad:

  <bb 2> [local count: 1073741824]:
  _1 = *m_12(D);
  _30 = BIT_FIELD_REF <v_13(D), 32, 0>;
  v_28 = BIT_INSERT_EXPR <v_27(D), _30, 0>;
  _29 = VEC_PERM_EXPR <v_28, v_28, { 0, 0, 0, 0 }>;
  _2 = _1 * _29;
  _3 = MEM[(__v4sf *)m_12(D) + 16B];
  _26 = BIT_FIELD_REF <v_13(D), 32, 32>;
  v_24 = BIT_INSERT_EXPR <v_23(D), _26, 0>;
  _25 = VEC_PERM_EXPR <v_24, v_24, { 0, 0, 0, 0 }>;
  _4 = _3 * _25;
  _5 = _2 + _4;
  _6 = MEM[(__v4sf *)m_12(D) + 32B];
  _14 = BIT_FIELD_REF <v_13(D), 32, 64>;
  v_16 = BIT_INSERT_EXPR <v_17(D), _14, 0>;
  _15 = VEC_PERM_EXPR <v_16, v_16, { 0, 0, 0, 0 }>;
  _7 = _6 * _15;
  _8 = _5 + _7;
  _9 = MEM[(__v4sf *)m_12(D) + 48B];
  _18 = BIT_FIELD_REF <v_13(D), 32, 96>;
  v_20 = BIT_INSERT_EXPR <v_21(D), _18, 0>;
  _19 = VEC_PERM_EXPR <v_20, v_20, { 0, 0, 0, 0 }>;
  _10 = _9 * _19;
  _22 = _8 + _10;
  return _22;
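
For orientation, source of roughly this shape would correspond to the two dumps. It is a guess at the structure, not the attached testcase: the function names, the `splat` and `from_scalar` helpers, and the zero fill in `from_scalar` (the dump inserts into an uninitialized vector) are all assumptions.

```cpp
// Hypothetical sketch using GCC's generic vector extension.
typedef float v4sf __attribute__((vector_size(16)));

// Broadcast lane N of v to all four lanes.
template <int N>
static inline v4sf splat(v4sf v)
{
    v4sf r = {v[N], v[N], v[N], v[N]};
    return r;
}

// "Good" shape: every broadcast reads straight from v.
v4sf transform_direct(const v4sf *m, v4sf v)
{
    return m[0] * splat<0>(v) + m[1] * splat<1>(v)
         + m[2] * splat<2>(v) + m[3] * splat<3>(v);
}

// Put a scalar into lane 0 of a fresh vector (assumption: other lanes zeroed).
static inline v4sf from_scalar(float f)
{
    v4sf t = {0.0f, 0.0f, 0.0f, 0.0f};
    t[0] = f;
    return t;
}

// "Bad" shape: v4sf -> float -> v4sf round trip before each broadcast.
v4sf transform_via_float(const v4sf *m, v4sf v)
{
    return m[0] * splat<0>(from_scalar(v[0])) + m[1] * splat<0>(from_scalar(v[1]))
         + m[2] * splat<0>(from_scalar(v[2])) + m[3] * splat<0>(from_scalar(v[3]));
}
```

The two functions compute the same result; only the second forces each lane through a scalar.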

So what's missing is converting the sequence "extract element N, insert it
at position 0, splat element 0" into a single splat of element N.

  _30 = BIT_FIELD_REF <v_13(D), 32, 0>;
  v_28 = BIT_INSERT_EXPR <v_27(D), _30, 0>;
  _29 = VEC_PERM_EXPR <v_28, v_28, { 0, 0, 0, 0 }>;

shows a missing no-op: inserting into a default-def at position 0 a value
that was extracted from position 0 of another vector can simply return the
vector we extracted from.

  _26 = BIT_FIELD_REF <v_13(D), 32, 32>;
  v_24 = BIT_INSERT_EXPR <v_23(D), _26, 0>;
  _25 = VEC_PERM_EXPR <v_24, v_24, { 0, 0, 0, 0 }>;

is a bit more complicated: the VEC_PERM_EXPR indices should be rewritten
to refer to the source vector, based on the fact that we only pick the
just-inserted element and that element was extracted from another
(compatible) vector.
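
The equivalence behind both cases can be checked with a small scalar model of the three operations. This is a sketch only: `Vec`, `Sel` and all function names are invented here, the model is not GCC code, and `junk` plays the role of the default-def vector whose contents do not matter.

```cpp
#include <array>
#include <cstddef>

using Vec = std::array<float, 4>;
using Sel = std::array<unsigned, 4>;

// Model of VEC_PERM_EXPR <a, b, sel>: indices 0..3 pick from a, 4..7 from b.
Vec vec_perm(const Vec &a, const Vec &b, const Sel &sel)
{
    Vec r{};
    for (std::size_t i = 0; i < 4; ++i) {
        unsigned s = sel[i] % 8;  // this model takes indices modulo 8
        r[i] = s < 4 ? a[s] : b[s - 4];
    }
    return r;
}

// Model of BIT_FIELD_REF <v, 32, bitpos>: read the 32-bit lane at bitpos.
float bit_field_ref(const Vec &v, unsigned bitpos) { return v[bitpos / 32]; }

// Model of BIT_INSERT_EXPR <v, f, bitpos>: copy of v with that lane replaced.
Vec bit_insert(Vec v, float f, unsigned bitpos)
{
    v[bitpos / 32] = f;
    return v;
}

// Unsimplified form: extract lane n, insert at lane 0 of junk, splat lane 0.
Vec splat_via_insert(const Vec &v, const Vec &junk, unsigned n)
{
    float e = bit_field_ref(v, 32 * n);
    Vec t = bit_insert(junk, e, 0);
    return vec_perm(t, t, {0, 0, 0, 0});
}

// Simplified form: the permute only reads the inserted lane, so it can
// index the source vector directly.
Vec splat_direct(const Vec &v, unsigned n)
{
    return vec_perm(v, v, {n, n, n, n});
}
```

For n = 0 this is the no-op case, and for n = 1..3 it is the index rewrite; in both, the result does not depend on what `junk` holds.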


* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14  1:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

--- Comment #19 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57941
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57941&action=edit
Uninclude version of the "More concise demonstration of the v4sf->float->v4sf
issue"

This version has the #include directives removed, so it is easier to test
with newer compilers.


* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14  1:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |14.0

--- Comment #20 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks fixed on the trunk; I will look into what fixed it in a few minutes.

transform_bad no longer has a BIT_INSERT_EXPR.


* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14  1:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

--- Comment #21 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
r14-3381-g27de9aa152141e, combined with
r13-3212-gb88adba751da63 and
r13-3271-g786e4c024f9416,
fixed the "More concise demonstration of the v4sf->float->v4sf issue" testcase.


* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14  1:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

--- Comment #22 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57942
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57942&action=edit
uninclude of the original testcase


* [Bug tree-optimization/29756] SSE intrinsics hard to use without redundant temporaries appearing
From: pinskia at gcc dot gnu.org @ 2024-04-14  1:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29756

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|14.0                        |

--- Comment #23 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The original testcase still has an issue, though.
We still get a scalar load and store through a temporary before the splat:
```
  _1 = MEM[(const float &)v_7(D) + 12];
  MEM[(float &)&D.6716] = _1;
  _24 = D.6716._rep.vecf;
  _25 = VEC_PERM_EXPR <_24, _24, { 0, 0, 0, 0 }>;
```

