public inbox for gcc-bugs@sourceware.org
* [Bug target/104723] New: [12 regression] Redundant usage of stack
@ 2022-03-01 8:35 crazylht at gmail dot com
2022-03-01 8:44 ` [Bug target/104723] " crazylht at gmail dot com
` (14 more replies)
0 siblings, 15 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2022-03-01 8:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Bug ID: 104723
Summary: [12 regression] Redundant usage of stack
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*
bool f256(char *a)
{
char t[] = "012345678901234567890123456789012345678901234567";
return __builtin_memcpy(a, &t[0], sizeof(t)) == 0;
}
https://godbolt.org/z/jcjbT4d8e
gcc12 generates
vmovdqa64 ymm31, YMMWORD PTR .LC0[rip]
xor eax, eax
vmovdqu64 YMMWORD PTR [rsp-72], ymm31
vmovdqa64 ymm31, YMMWORD PTR .LC1[rip]
vmovdqu64 YMMWORD PTR [rsp-55], ymm31
vmovdqu64 ymm31, YMMWORD PTR [rsp-72]
vmovdqu64 YMMWORD PTR [rdi], ymm31
vmovdqu64 ymm31, YMMWORD PTR [rsp-55]
vmovdqu64 YMMWORD PTR [rdi+17], ymm31
Why build the "unaligned string" on the stack instead of copying it directly
from the constant pool?
gcc 11 seems fine.
f256(char*):
vmovdqa xmm0, XMMWORD PTR .LC0[rip]
mov BYTE PTR [rdi+48], 0
vmovdqu XMMWORD PTR [rdi], xmm0
vmovdqa xmm0, XMMWORD PTR .LC1[rip]
xor eax, eax
vmovdqu XMMWORD PTR [rdi+16], xmm0
vmovdqa xmm0, XMMWORD PTR .LC2[rip]
vmovdqu XMMWORD PT
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
@ 2022-03-01 8:44 ` crazylht at gmail dot com
2022-03-01 8:47 ` crazylht at gmail dot com
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2022-03-01 8:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #0)
> bool f256(char *a)
> {
> char t[] = "012345678901234567890123456789012345678901234567";
> return __builtin_memcpy(a, &t[0], sizeof(t)) == 0;
> }
>
> https://godbolt.org/z/jcjbT4d8e
>
> gcc12 generates
>
> vmovdqa64 ymm31, YMMWORD PTR .LC0[rip]
> xor eax, eax
> vmovdqu64 YMMWORD PTR [rsp-72], ymm31
> vmovdqa64 ymm31, YMMWORD PTR .LC1[rip]
> vmovdqu64 YMMWORD PTR [rsp-55], ymm31
> vmovdqu64 ymm31, YMMWORD PTR [rsp-72]
Is there an STLF (store-to-load forwarding) issue here?
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
2022-03-01 8:44 ` [Bug target/104723] " crazylht at gmail dot com
@ 2022-03-01 8:47 ` crazylht at gmail dot com
2022-03-01 12:38 ` lili.cui at intel dot com
` (12 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2022-03-01 8:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
Updated testcase:
void f256(char *a)
{
char t[] = "012345678901234567890123456789012345678901234567";
__builtin_memcpy(a, &t[0], sizeof(t));
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
2022-03-01 8:44 ` [Bug target/104723] " crazylht at gmail dot com
2022-03-01 8:47 ` crazylht at gmail dot com
@ 2022-03-01 12:38 ` lili.cui at intel dot com
2022-03-01 13:41 ` rguenth at gcc dot gnu.org
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: lili.cui at intel dot com @ 2022-03-01 12:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #3 from cuilili <lili.cui at intel dot com> ---
(In reply to Hongtao.liu from comment #1)
> STF issue here?
Yes. Since "YMMWORD PTR [rsp-72]" crosses a cache line, there is an STLF issue
here.
vmovdqu64 YMMWORD PTR [rsp-72], ymm31 --> stores 32 bytes at [rsp-72],
crossing a cache line
vmovdqu64 YMMWORD PTR [rsp-55], ymm31 --> overwrites part of YMMWORD PTR
[rsp-72]
vmovdqu64 ymm31, YMMWORD PTR [rsp-72] --> STLF from the first instruction,
with a penalty.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (2 preceding siblings ...)
2022-03-01 12:38 ` lili.cui at intel dot com
@ 2022-03-01 13:41 ` rguenth at gcc dot gnu.org
2022-03-01 16:00 ` jakub at gcc dot gnu.org
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-03-01 13:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Last reconfirmed| |2022-03-01
Status|UNCONFIRMED |NEW
Keywords| |missed-optimization,
| |needs-bisection
Target Milestone|--- |12.0
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed with -O2 -march=skylake-avx512. On the high-level side we fail to
promote write-once vars to static storage:
bool f256 (char * a)
{
char t[49];
<bb 2> [local count: 1073741824]:
t = "012345678901234567890123456789012345678901234567";
__builtin_memcpy (a_3(D), &t[0], 49);
t ={v} {CLOBBER};
return 0;
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (3 preceding siblings ...)
2022-03-01 13:41 ` rguenth at gcc dot gnu.org
@ 2022-03-01 16:00 ` jakub at gcc dot gnu.org
2022-03-01 16:07 ` jakub at gcc dot gnu.org
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-03-01 16:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We can only do that if we have a guarantee that the callee doesn't care
whether it is an automatic or a static var and, in the case where it isn't
const, also a guarantee that the callee doesn't modify it. Both hold for a
memcpy source, of course.
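To make the transformation from comments #4 and #5 concrete, here is a
hand-written sketch of the promotion done at the source level. The function
names f256_auto and f256_static are made up for this illustration, and the
sketch does not claim to show what GCC does internally; it only shows that the
two shapes are indistinguishable to the caller when the array is written once
and then only read by memcpy.

```c
#include <string.h>

/* Original shape: t is an automatic array initialized from a literal,
   so the initializer is first materialized on the stack. */
void f256_auto(char *a)
{
    char t[] = "012345678901234567890123456789012345678901234567";
    memcpy(a, &t[0], sizeof(t));
}

/* Hand-promoted shape (illustrative only): t is never modified and
   memcpy does not care about the storage duration of its source, so
   reading from static read-only storage gives the same bytes. */
void f256_static(char *a)
{
    static const char t[] = "012345678901234567890123456789012345678901234567";
    memcpy(a, &t[0], sizeof(t));
}
```

Both functions copy the same 49 bytes (48 characters plus the terminating NUL)
into the caller's buffer.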
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (4 preceding siblings ...)
2022-03-01 16:00 ` jakub at gcc dot gnu.org
@ 2022-03-01 16:07 ` jakub at gcc dot gnu.org
2022-03-01 18:45 ` hjl.tools at gmail dot com
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-03-01 16:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|needs-bisection |
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with r12-2666-g29f0e955c97da002b5adb4e8c9dfd2ea9709e207
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (5 preceding siblings ...)
2022-03-01 16:07 ` jakub at gcc dot gnu.org
@ 2022-03-01 18:45 ` hjl.tools at gmail dot com
2022-03-01 18:50 ` hjl.tools at gmail dot com
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2022-03-01 18:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #6)
> Started with r12-2666-g29f0e955c97da002b5adb4e8c9dfd2ea9709e207
DSE can remove redundant load/store for TI, but not OI/XI.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (6 preceding siblings ...)
2022-03-01 18:45 ` hjl.tools at gmail dot com
@ 2022-03-01 18:50 ` hjl.tools at gmail dot com
2022-03-02 8:11 ` lili.cui at intel dot com
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2022-03-01 18:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #7)
> (In reply to Jakub Jelinek from comment #6)
> > Started with r12-2666-g29f0e955c97da002b5adb4e8c9dfd2ea9709e207
>
> DSE can remove redundant load/store for TI, but not OI/XI.
It is due to overlapping store.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (7 preceding siblings ...)
2022-03-01 18:50 ` hjl.tools at gmail dot com
@ 2022-03-02 8:11 ` lili.cui at intel dot com
2022-04-22 16:28 ` jakub at gcc dot gnu.org
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: lili.cui at intel dot com @ 2022-03-02 8:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #9 from cuilili <lili.cui at intel dot com> ---
(In reply to cuilili from comment #3)
> (In reply to Hongtao.liu from comment #1)
> > STF issue here?
>
Correction to comment #3:
I used perf to collect the "ld_blocks.store_forward" event for these two test
cases. stlf_64_55_64.S has an STLF issue because the two stores overlap; it is
not related to crossing a cache line.
In this case there is an STLF issue.
----------------------------------------------------------------
$cat stlf_64_55_64.S
...
.LFB0:
.cfi_startproc
vmovdqu %ymm0, -64(%rsp)
vmovdqu %ymm1, -55(%rsp)
vmovdqu -64(%rsp), %ymm0
ret
.cfi_endproc
...
$ perf stat -e ld_blocks.store_forward ./stlf_64_55_64.out
runtime= : 128883744
Performance counter stats for './stlf_64_55_64.out':
10,000,507 ld_blocks.store_forward:u
----------------------------------------------------------------
In this case store-to-load forwarding succeeds.
----------------------------------------------------------------
$ cat stlf_64_128_64.S
...
.LFB0:
.cfi_startproc
vmovdqu %ymm0, -64(%rsp)
vmovdqu %ymm1, -128(%rsp)
vmovdqu -64(%rsp), %ymm0
ret
.cfi_endproc
...
$ perf stat -e ld_blocks.store_forward ./stlf_64_128_64.out
runtime= : 56477424
Performance counter stats for './stlf_64_128_64.out':
2 ld_blocks.store_forward:u
0.022103902 seconds time elapsed
-------------------------------------------------------------
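The difference between the two test cases can be approximated with a very
simplified model: a load is treated as forwardable only when the youngest
earlier store that overlaps it covers it completely. This rule is an
assumption made for illustration, not a description of any particular
microarchitecture, and the names struct access and load_can_forward are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One memory access: byte offset from a common base (e.g. rsp) and length. */
struct access {
    long off;
    long len;
};

/* True when the two byte ranges share at least one byte. */
static bool overlaps(struct access a, struct access b)
{
    return a.off < b.off + b.len && b.off < a.off + a.len;
}

/* True when the store range contains the whole load range. */
static bool covers(struct access st, struct access ld)
{
    return st.off <= ld.off && ld.off + ld.len <= st.off + st.len;
}

/* Simplified model (an assumption, not a hardware specification):
   stores[] is in program order. Walk from the youngest store to the
   oldest; the first store that overlaps the load must cover it
   entirely, otherwise forwarding is considered blocked. */
bool load_can_forward(const struct access *stores, size_t n, struct access ld)
{
    for (size_t i = n; i-- > 0;) {
        if (overlaps(stores[i], ld))
            return covers(stores[i], ld);
    }
    return false; /* no earlier store overlaps: nothing to forward from */
}
```

With stores at offsets -64 and -55 (32 bytes each) followed by a 32-byte load
at -64, the model reports a blocked forward, because the younger store at -55
overlaps the load without covering it. With stores at -64 and -128 the younger
store does not touch the load, so the store at -64 can supply all 32 bytes.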
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (8 preceding siblings ...)
2022-03-02 8:11 ` lili.cui at intel dot com
@ 2022-04-22 16:28 ` jakub at gcc dot gnu.org
2022-04-24 7:37 ` lili.cui at intel dot com
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-22 16:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #8)
> > DSE can remove redundant load/store for TI, but not OI/XI.
DSE can remove redundant load/store for OI/XI just fine, just remove the last 7
from the string so that it is 48 bytes instead of 49 and all of a sudden it
works fine.
It is indeed due to:
> It is due to overlapping store.
this.
I wonder whether we could special-case overlapping stores if they are loaded
from the constant pool and the overlapping bytes have the same values.
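The condition in the previous paragraph can be sketched as a small check on
two constant stores. This is only an illustration of the idea, not code from
GCC's DSE pass; the helper name overlap_bytes_agree and its parameters are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: two stores of len bytes each, the first writing
   data1 at destination offset off1 and the second writing data2 at
   destination offset off2. Returns true when every destination byte
   written by both stores receives the same value from each of them,
   i.e. the overlap does not change the final memory contents. */
bool overlap_bytes_agree(size_t off1, const unsigned char *data1,
                         size_t off2, const unsigned char *data2,
                         size_t len)
{
    size_t lo = off1 > off2 ? off1 : off2;   /* first overlapping byte */
    size_t end1 = off1 + len;
    size_t end2 = off2 + len;
    size_t hi = end1 < end2 ? end1 : end2;   /* one past the last one */

    if (lo >= hi)
        return true;                         /* the stores do not overlap */
    return memcmp(data1 + (lo - off1), data2 + (lo - off2), hi - lo) == 0;
}
```

For the 49-byte string in this report, the two 32-byte pieces at offsets 0 and
17 both come from the same literal, so the 15 overlapping bytes (17 through 31)
necessarily agree.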
And for the backend, the question is how big the penalty for the overlapping
store is compared to doing multiple non-overlapping stores. Say for those 49
bytes one could do one OI, one TI/V1TI and one QI load/store as opposed to
one aligned and one misaligned OI load/store.
For say:
void
foo (void *p, void *q)
{
__builtin_memcpy (p, q, 49);
}
we emit the 2 overlapping loads/stores for -mavx512f and 4 non-overlapping
loads/stores with say -mavx2.
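The two strategies can be written out in portable C to show which byte ranges
each one touches. This is a sketch only: the function names are made up, and a
compiler is free to lower these memcpy calls differently from the instruction
sequences discussed in this thread.

```c
#include <string.h>

/* Two overlapping 32-byte pieces at offsets 0 and 17.
   Bytes 17..31 of the destination are written twice. */
void copy49_overlapping(void *p, const void *q)
{
    unsigned char *d = (unsigned char *)p;
    const unsigned char *s = (const unsigned char *)q;

    memcpy(d, s, 32);
    memcpy(d + 17, s + 17, 32);
}

/* Four non-overlapping pieces: 16 + 16 + 16 + 1 bytes.
   No destination byte is written more than once. */
void copy49_split(void *p, const void *q)
{
    unsigned char *d = (unsigned char *)p;
    const unsigned char *s = (const unsigned char *)q;

    memcpy(d, s, 16);
    memcpy(d + 16, s + 16, 16);
    memcpy(d + 32, s + 32, 16);
    d[48] = s[48];
}
```

Both functions produce the same 49 destination bytes; they differ only in the
number and width of the accesses and in whether any byte is stored twice.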
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (9 preceding siblings ...)
2022-04-22 16:28 ` jakub at gcc dot gnu.org
@ 2022-04-24 7:37 ` lili.cui at intel dot com
2022-04-26 9:41 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: lili.cui at intel dot com @ 2022-04-24 7:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #11 from cuilili <lili.cui at intel dot com> ---
(In reply to Jakub Jelinek from comment #10)
> And for the backend, the question is how big the penalty for the overlapping
> store is compared to doing multiple non-overlapping stores. Say for those
> 49 bytes one could do one OI, one TI/V1TI and one QI load/store as opposed to
> one aligned and one misaligned OI load/store.
>
> For say:
> void
> foo (void *p, void *q)
> {
> __builtin_memcpy (p, q, 49);
> }
> we emit the 2 overlapping loads/stores for -mavx512f and 4 non-overlapping
> loads/stores with say -mavx2.
I executed both code sequences 100000 times on ICX and Znver3 machines.
For ICX: 2 overlapping loads/stores are 3.5x faster than 4 non-overlapping
loads/stores.
For Znver3: 2 overlapping loads/stores are 1.39x faster than 4 non-overlapping
loads/stores.
------------------------------------
vmovdqu ymm0, YMMWORD PTR [rsi]
vmovdqu YMMWORD PTR [rdi], ymm0
vmovdqu ymm1, YMMWORD PTR [rsi+17]
vmovdqu YMMWORD PTR [rdi+17], ymm1
------------------------------------
vmovdqu xmm0, XMMWORD PTR [rsi]
vmovdqu XMMWORD PTR [rdi], xmm0
vmovdqu xmm1, XMMWORD PTR [rsi+16]
vmovdqu XMMWORD PTR [rdi+16], xmm1
vmovdqu xmm2, XMMWORD PTR [rsi+32]
vmovdqu XMMWORD PTR [rdi+32], xmm2
movzx eax, BYTE PTR [rsi+48]
mov BYTE PTR [rdi+48], al
-----------------------------------
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (10 preceding siblings ...)
2022-04-24 7:37 ` lili.cui at intel dot com
@ 2022-04-26 9:41 ` jakub at gcc dot gnu.org
2022-05-06 8:32 ` [Bug target/104723] [12/13 " jakub at gcc dot gnu.org
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-04-26 9:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The overlapping stores happen due to TARGET_OVERLAP_OP_BY_PIECES_P returning
true since PR90773.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12/13 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (11 preceding siblings ...)
2022-04-26 9:41 ` jakub at gcc dot gnu.org
@ 2022-05-06 8:32 ` jakub at gcc dot gnu.org
2022-07-26 12:58 ` rguenth at gcc dot gnu.org
2023-05-08 12:23 ` [Bug target/104723] [12/13/14 " rguenth at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-06 8:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|12.0 |12.2
--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12/13 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (12 preceding siblings ...)
2022-05-06 8:32 ` [Bug target/104723] [12/13 " jakub at gcc dot gnu.org
@ 2022-07-26 12:58 ` rguenth at gcc dot gnu.org
2023-05-08 12:23 ` [Bug target/104723] [12/13/14 " rguenth at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-26 12:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/104723] [12/13/14 regression] Redundant usage of stack
2022-03-01 8:35 [Bug target/104723] New: [12 regression] Redundant usage of stack crazylht at gmail dot com
` (13 preceding siblings ...)
2022-07-26 12:58 ` rguenth at gcc dot gnu.org
@ 2023-05-08 12:23 ` rguenth at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|12.3 |12.4
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.
^ permalink raw reply [flat|nested] 16+ messages in thread