public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/115534] New: intermediate stack use not eliminated
From: tnfchris at gcc dot gnu.org @ 2024-06-18 7:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
Bug ID: 115534
Summary: intermediate stack use not eliminated
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
Consider the following example:
#include <stdint.h>

typedef struct _pixel_t
{
  double red, green, blue, opacity;
} pixel_t;

typedef struct _PixelPacket
{
  unsigned short blue, green, red, opacity;
} PixelPacket;

pixel_t f (unsigned height, unsigned width, unsigned virt_width,
           uint8_t *restrict k, const PixelPacket *restrict k_pixels)
{
  pixel_t result = {};
  for (unsigned u=0; u < (width & -4); u++, k--) {
    result.red += (*k)*k_pixels[u].red;
    result.green += (*k)*k_pixels[u].green;
    result.blue += (*k)*k_pixels[u].blue;
    result.opacity += (*k)*k_pixels[u].opacity;
    k_pixels += virt_width;
  }
  return result;
}
---
Compiled with -O3 this vectorizes well, but the epilogue code is very
inefficient:
        fadd    v29.2d, v29.2d, v30.2d
        fadd    v28.2d, v28.2d, v31.2d
        cmp     w5, w1
        bhi     .L3
        mov     v31.16b, v28.16b
        ins     v31.d[1], v29.d[1]
        ins     v29.d[1], v28.d[1]
        stp     q31, q29, [sp, 32]
        ldp     d0, d1, [sp, 32]
        ldp     d2, d3, [sp, 48]
        add     sp, sp, 64
        ret
.L4:
        movi    v29.2d, 0
        mov     v31.16b, v29.16b
        stp     q31, q29, [sp, 32]
        ldp     d0, d1, [sp, 32]
        ldp     d2, d3, [sp, 48]
        add     sp, sp, 64
        ret
That is, it goes through the stack to populate the return registers. It looks
like the store is still present at the GIMPLE level:
  <bb 5> [local count: 105119324]:
  _33 = VEC_PERM_EXPR <vect__10.16_41, vect__10.16_42, { 0, 3 }>;
  _31 = VEC_PERM_EXPR <vect__10.16_42, vect__10.16_41, { 0, 3 }>;

  <bb 6> [local count: 118111600]:
  # vect_result_red_64.18_28 = PHI <_33(5), { 0.0, 0.0 }(2)>
  # vect_result_red_64.18_105 = PHI <_31(5), { 0.0, 0.0 }(2)>
  MEM <vector(2) double> [(double *)&D.4535] = vect_result_red_64.18_28;
  MEM <vector(2) double> [(double *)&D.4535 + 16B] = vect_result_red_64.18_105;
  return D.4535;
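
For readers less familiar with the permutes in the dump: VEC_PERM_EXPR <a, b, sel> indexes into the concatenation of a and b, so the selector { 0, 3 } on two-lane vectors takes lane 0 of the first operand and lane 1 of the second. The sketch below models that in plain C. The `v2df` type and the `vec_perm2` helper are illustrative names chosen for this note, not GCC internals.

```c
/* Illustrative model only: v2df and vec_perm2 are hypothetical names,
   not GCC internals.  */
typedef struct { double lane[2]; } v2df;

/* Model of VEC_PERM_EXPR <a, b, { s0, s1 }> for two-lane vectors:
   selector values 0..1 pick a lane of a, 2..3 pick a lane of b.  */
static v2df vec_perm2 (v2df a, v2df b, unsigned s0, unsigned s1)
{
  const double cat[4] = { a.lane[0], a.lane[1], b.lane[0], b.lane[1] };
  v2df r = { { cat[s0 & 3], cat[s1 & 3] } };
  return r;
}
```

With a = { a0, a1 } and b = { b0, b1 }, `vec_perm2 (a, b, 0, 3)` gives { a0, b1 } and `vec_perm2 (b, a, 0, 3)` gives { b0, a1 }. In other words the two permutes only swap the upper lanes, which is what the two `ins` instructions in the GCC output do.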
clang is able to generate much better code here:
fadd v0.2d, v0.2d, v1.2d
fadd v2.2d, v2.2d, v3.2d
b.ne .LBB0_2
.LBB0_3:
mov d1, v2.d[1]
mov d3, v0.d[1]
ret
The vectorized code gets reg-alloc'ed so that d0 and d2 are already in the right
registers at the end of the vector loop, and the epilogue only has to split the
registers up to get d1 and d3.
I think we would generate the same if we were to elide the intermediate stack
store.
See https://godbolt.org/z/ocqchWWs5
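
As a scalar model of what the epilogue has to do, assume that one accumulator holds { red, opacity } and the other { blue, green }. This layout is inferred from the clang output and the member order of PixelPacket; the report does not state it. Returning pixel_t then only needs lane 1 of each accumulator moved into its own register. The function name and array parameters below are hypothetical.

```c
typedef struct _pixel_t
{
  double red, green, blue, opacity;
} pixel_t;

/* Hypothetical helper: acc_ro models a vector holding { red, opacity },
   acc_bg one holding { blue, green }.  The layout is an inference from
   the clang output, not something the report states.  */
pixel_t unpack_accumulators (const double acc_ro[2], const double acc_bg[2])
{
  pixel_t result;
  result.red = acc_ro[0];      /* lane 0, already where d0 is expected */
  result.green = acc_bg[1];    /* lane 1 extract, becomes d1 */
  result.blue = acc_bg[0];     /* lane 0, already where d2 is expected */
  result.opacity = acc_ro[1];  /* lane 1 extract, becomes d3 */
  return result;
}
```

Nothing in this model needs memory: two of the four fields are already lane 0 of a register, and the other two are single lane extracts.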
* [Bug tree-optimization/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 12:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect there is a dup of this already. See the bug which I made this one
blocking for a list of related bugs.
* [Bug middle-end/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 12:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end
           Severity|normal                      |enhancement
* [Bug middle-end/115534] intermediate stack use not eliminated
From: tnfchris at gcc dot gnu.org @ 2024-06-18 12:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> I suspect there is a dup of this already. See the bug which I made this one
> blocking for a list of related bugs.
Most of the other bugs relate to argument expansion. This one, however,
shouldn't need the intermediate stack store regardless of the expansion itself.
I think there are various other ways the operation could have been kept in a
gimple register.
* [Bug middle-end/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 17:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-06-18
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
Note that even with the vectorizer turned off we get really bad code for the
return at expand time (it just happens that the RTL optimizers can remove the
loads/stores but not the stack slot):
```
;; return D.4535;

(insn 60 59 61 (set (reg:DF 156)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -32 [0xffffffffffffffe0])) [4 D.4535+0 S8 A128]))
     "/app/example.cpp":24:12 -1
     (nil))
(insn 61 60 62 (set (reg:DF 157)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -24 [0xffffffffffffffe8])) [4 D.4535+8 S8 A64]))
     "/app/example.cpp":24:12 -1
     (nil))
(insn 62 61 63 (set (reg:DF 158)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -16 [0xfffffffffffffff0])) [4 D.4535+16 S8 A128]))
     "/app/example.cpp":24:12 -1
     (nil))
(insn 63 62 64 (set (reg:DF 159)
        (mem/c:DF (plus:DI (reg/f:DI 95 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [4 D.4535+24 S8 A64]))
     "/app/example.cpp":24:12 -1
     (nil))
(insn 64 63 65 (set (reg:DF 132 [ <retval> ])
        (reg:DF 156)) "/app/example.cpp":24:12 -1
     (nil))
(insn 65 64 66 (set (reg:DF 133 [ <retval>+8 ])
        (reg:DF 157)) "/app/example.cpp":24:12 -1
     (nil))
(insn 66 65 67 (set (reg:DF 134 [ <retval>+16 ])
        (reg:DF 158)) "/app/example.cpp":24:12 -1
     (nil))
(insn 67 66 68 (set (reg:DF 135 [ <retval>+24 ])
        (reg:DF 159)) "/app/example.cpp":24:12 -1
     (nil))
(jump_insn 68 67 69 (set (pc)
        (label_ref 0)) "/app/example.cpp":24:12 -1
     (nil))
```
As for the stack slot, we still get the allocation:
```
sub sp, sp, #64
```
This is why I said there are a few duplicates there ...
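
For reference, the four loads in the RTL dump read D.4535 at byte offsets 0, 8, 16 and 24, one per double member of pixel_t. A small compile-time check of that layout, assuming 8-byte doubles with no padding (true on AArch64, but not guaranteed by ISO C):

```c
#include <stddef.h>

typedef struct _pixel_t
{
  double red, green, blue, opacity;
} pixel_t;

/* Assumption: 8-byte double, no padding between members.  These match
   the D.4535+0, +8, +16 and +24 memory references in the dump.  */
_Static_assert (offsetof (pixel_t, red) == 0, "red at offset 0");
_Static_assert (offsetof (pixel_t, green) == 8, "green at offset 8");
_Static_assert (offsetof (pixel_t, blue) == 16, "blue at offset 16");
_Static_assert (offsetof (pixel_t, opacity) == 24, "opacity at offset 24");
_Static_assert (sizeof (pixel_t) == 32, "pixel_t occupies 32 bytes");
```

The 32-byte object is therefore loaded back one field at a time into four DF pseudos, which is the round trip the RTL optimizers later clean up without releasing the slot itself.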
* [Bug middle-end/115534] intermediate stack use not eliminated
From: pinskia at gcc dot gnu.org @ 2024-06-18 17:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This might be improved by
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it might
be the case the vectorizer case needs to be improved afterwards. But I think
that is the infrastructure for fixing this issue.
* [Bug middle-end/115534] intermediate stack use not eliminated
From: tnfchris at gcc dot gnu.org @ 2024-06-18 18:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> This might be improved by
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it
> might be the case the vectorizer case needs to be improved afterwards. But I
> think that is the infrastructure for fixing this issue.
Yeah, Richard pointed me to this today as well. The vectorizer case is a bit
unique because the vectorizer has packed the scalar values into two vector
registers. So yeah, I think it's likely some work will be needed afterwards,
but we'll see after the fsra patch lands :)