public inbox for gcc-bugs@sourceware.org
* [Bug c++/109985] New: __builtin_prefetch ignored by GCC 12/13
From: pdimov at gmail dot com @ 2023-05-26 11:38 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985
Bug ID: 109985
Summary: __builtin_prefetch ignored by GCC 12/13
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: pdimov at gmail dot com
Target Milestone: ---
We are investigating a Boost.Unordered performance regression with GCC 12,
on the following benchmark:
https://github.com/boostorg/boost_unordered_benchmarks/blob/4c717baac1bff8d3e51cb8485b72bbb63d533265/scattered_lookup.cpp
and it looks like the reason is that GCC 12 (and 13) ignore a call to
`__builtin_prefetch`.
While GCC 11 generates this:
```
.L108:
mov r8, r12
movdqa xmm0, xmm1
sal r8, 4
lea r14, [r10+r8]
pcmpeqb xmm0, XMMWORD PTR [r14]
pmovmskb edx, xmm0
and edx, 32767
je .L104
sub r8, r12
sal r8, 4
add r8, QWORD PTR [rbx+32]
prefetcht0 [r8]
.L106:
xor r15d, r15d
rep bsf r15d, edx
movsx r15, r15d
sal r15, 4
add r15, r8
cmp rsi, QWORD PTR [r15]
jne .L144
add r9, QWORD PTR [r15+8]
mov rax, rdi
cmp r11, rdi
jne .L145
```
(https://godbolt.org/z/d663fdM16 - prefetcht0 [r8] right before L106)
GCC 12 generates this in the same function:
```
.L108:
mov r8, r10
movdqa xmm0, xmm1
sal r8, 4
lea r9, [rbp+0+r8]
pcmpeqb xmm0, XMMWORD PTR [r9]
pmovmskb edx, xmm0
and edx, 32767
je .L104
mov rdi, QWORD PTR [rsp+16]
sub r8, r10
mov QWORD PTR [rsp+24], rax
sal r8, 4
mov rdi, QWORD PTR [rdi+32]
mov QWORD PTR [rsp+8], rdi
mov rax, rdi
.L106:
xor edi, edi
rep bsf edi, edx
movsx rdi, edi
sal rdi, 4
add rdi, r8
add rdi, rax
cmp r11, QWORD PTR [rdi]
jne .L143
add rsi, 8
add rbx, QWORD PTR [rdi+8]
cmp r12, rsi
jne .L109
```
(https://godbolt.org/z/T7csq7TPz - no prefetcht0 instruction before L106)
Unfortunately, simplifying this code causes the prefetcht0 to be generated again.
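The following is a minimal sketch of a software-prefetching lookup loop of the general kind the benchmark exercises, for readers unfamiliar with the pattern. It is not Boost.Unordered's code. `Slot` and `sum_scattered` are hypothetical names, and the slot layout is an arbitrary choice. A prefetch is only a hint, so dropping it (as happens here with GCC 12/13) cannot change the result, only the performance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 16-byte key/value slot (layout chosen for this sketch).
struct Slot {
    std::uint64_t key;
    std::uint64_t value;
};

// Sum the values stored at a sequence of scattered positions. Before
// reading slot i, request the cache line for slot i + 1 so that memory
// access can overlap with the current work. Builtin arguments:
// rw = 0 (prefetch for read), locality = 3 (leave in all cache levels),
// which on x86 typically becomes prefetcht0.
std::uint64_t sum_scattered(const std::vector<Slot>& table,
                            const std::vector<std::size_t>& positions) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        if (i + 1 < positions.size()) {
            __builtin_prefetch(&table[positions[i + 1]], 0, 3);
        }
        sum += table[positions[i]].value;
    }
    return sum;
}
```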
* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: pinskia at gcc dot gnu.org @ 2023-05-26 16:36 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |WAITING
Ever confirmed|0 |1
Last reconfirmed| |2023-05-26
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There are only two __builtin_prefetch calls in the .optimized dump for GCC 12.
This is definitely going to be hard to debug ...
Can you attach the preprocessed source?
From: christian.mazakas at gmail dot com @ 2023-05-26 17:15 UTC
Christian Mazakas <christian.mazakas at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |christian.mazakas at gmail dot com
--- Comment #2 from Christian Mazakas <christian.mazakas at gmail dot com> ---
Created attachment 55172
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55172&action=edit
Preprocessed source from the relevant godbolt.org link
This is the preprocessed output from my machine, generated from the benchmark's
code and the develop branch of Boost.Unordered.
Let me know if it doesn't provide enough information or if more is required.
From: pinskia at gcc dot gnu.org @ 2023-05-26 17:17 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|1 |0
Status|WAITING |UNCONFIRMED
From: jakub at gcc dot gnu.org @ 2023-05-26 17:38 UTC
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
CC| |hubicka at gcc dot gnu.org,
| |jakub at gcc dot gnu.org
Status|UNCONFIRMED |NEW
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Since r12-5236-g5aa91072e24c1e16, the -O3 assembly contains just 2 prefetches
rather than 4.
From: pinskia at gcc dot gnu.org @ 2023-05-26 22:28 UTC
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm:
modref analyzing 'void boost::unordered::detail::foa::prefetch(const void*)/3452' (ipa=0) (pure)
Analyzing flags of ssa name: p_1(D)
Analyzing stmt: __builtin_prefetch (p_1(D));
current flags of p_1(D) no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape not_returned_directly not_returned_indirectly no_direct_read no_indirect_read
flags of ssa name p_1(D) no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape not_returned_directly not_returned_indirectly no_direct_read no_indirect_read
Always executed bbbs (assuming return or EH): 2
- Analyzing call:__builtin_prefetch (p_1(D));
- ECF_CONST | ECF_NOVOPS, ignoring all stores and all loads except for args.
Function found to be const: void boost::unordered::detail::foa::prefetch(const void*)/3452
Declaration updated to be const: void boost::unordered::detail::foa::prefetch(const void*)/3452
- modref done with result: tracked.
loads:
stores:
Try dse
parm 0 flags: not_returned_directly not_returned_indirectly no_direct_read no_indirect_read
void boost::unordered::detail::foa::prefetch (const void * p)
{
  <bb 2> [local count: 1073741824]:
  __builtin_prefetch (p_1(D));
  return;
}
Maybe that explains it, given how the builtin is declared in gcc/builtins.def:
DEF_GCC_BUILTIN (BUILT_IN_PREFETCH, "prefetch",
BT_FN_VOID_CONST_PTR_VAR, ATTR_NOVOPS_LEAF_LIST)
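The function in the dump is a one-statement wrapper around `__builtin_prefetch`. A sketch of that shape follows. The namespace is renamed to a hypothetical `demo`, and the caller `probe` is also hypothetical. The builtin is declared NOVOPS, and modref then marked the wrapper's declaration const. A call to a `void` function believed to be const has no observable effect, so it can be deleted as dead code, which is consistent with the missing `prefetcht0`. The snippet is illustrative only and has not been checked to reproduce the problem. The reporter noted that simplified versions kept the prefetch.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

// Mirrors the shape of boost::unordered::detail::foa::prefetch(const void*)
// from the dump: the entire body is a single __builtin_prefetch call.
void prefetch(const void* p) {
    __builtin_prefetch(p);
}

// Hypothetical caller: hint the next element, then read the current one.
// The return value never depends on the prefetch, which is why a compiler
// that believes prefetch() is const may drop the call entirely.
int probe(const int* data, std::size_t i, std::size_t n) {
    if (i + 1 < n) {
        prefetch(data + i + 1);
    }
    return data[i];
}

}  // namespace demo
```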
From: hubicka at gcc dot gnu.org @ 2023-05-28 20:33 UTC
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org
Status|NEW |ASSIGNED
--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Hmm, this is slippery. So novops tells GCC that the function has no memory
side effects, and in turn we optimize out the call?
I think we need to handle novops as having side effects.
From: hubicka at gcc dot gnu.org @ 2023-05-28 20:40 UTC
--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Created attachment 55180
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55180&action=edit
untested patch
It turns out that because modref was originally written to track memory
loads/stores only, and side-effect discovery was retrofitted later, I forgot to
revisit the code handling CONST and NOVOPS together. There are quite a few
places where we must not short-circuit on NOVOPS and must make sure we merge in
the side-effect and determinism flags.
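A toy model can illustrate the short-circuit problem described above. Everything in it is hypothetical. `CallFlags`, `Summary`, `summarize_buggy`, `summarize_fixed`, and `call_is_removable` were invented for this sketch and are not GCC's modref data structures or functions. The determinism flags are not modeled.

The model builds a function's summary from the calls it makes:

* The buggy variant skips NOVOPS callees as contributing nothing. A prefetch wrapper therefore looks effect-free, and calls to it become removable.
* The fixed variant treats NOVOPS as a side effect, as suggested in comment #5.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified stand-ins for ECF_CONST / ECF_NOVOPS.
struct CallFlags {
    bool is_const;  // reads and writes no memory, no other effects
    bool novops;    // touches no memory, but may still have an effect (e.g. prefetch)
};

// Hypothetical summary of what a function body does.
struct Summary {
    bool accesses_memory = false;
    bool side_effects = false;
};

// Buggy merge: CONST and NOVOPS callees are both short-circuited, so a
// NOVOPS callee's effect is never propagated to the caller.
Summary summarize_buggy(const std::vector<CallFlags>& calls) {
    Summary s;
    for (const CallFlags& c : calls) {
        if (c.is_const || c.novops) {
            continue;
        }
        // Unknown callee: conservatively assume memory access and effects.
        s.accesses_memory = true;
        s.side_effects = true;
    }
    return s;
}

// Fixed merge: NOVOPS still means "no memory access", but the side effect
// is merged into the caller's summary.
Summary summarize_fixed(const std::vector<CallFlags>& calls) {
    Summary s;
    for (const CallFlags& c : calls) {
        if (c.novops) {
            s.side_effects = true;
            continue;
        }
        if (c.is_const) {
            continue;
        }
        s.accesses_memory = true;
        s.side_effects = true;
    }
    return s;
}

// A call whose result is unused may be deleted only if the callee neither
// accesses memory nor has other side effects.
bool call_is_removable(const Summary& callee) {
    return !callee.accesses_memory && !callee.side_effects;
}
```

In this model, a wrapper whose body is one NOVOPS call becomes removable under the buggy merge and is kept under the fixed one.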
From: rguenth at gcc dot gnu.org @ 2023-05-30 7:31 UTC
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
novops for prefetch also isn't the best choice: it makes the call free to be
scheduled across loads/stores, which probably isn't the intent. Of course, it
shouldn't be a barrier for CSE either, so modeling the exact behavior
is difficult.
* [Bug tree-optimization/109985] [12/13/14 Regression] __builtin_prefetch ignored by GCC 12/13
From: rguenth at gcc dot gnu.org @ 2023-05-30 7:32 UTC
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |12.4
Summary|__builtin_prefetch ignored |[12/13/14 Regression]
|by GCC 12/13 |__builtin_prefetch ignored
| |by GCC 12/13
Keywords| |wrong-code
Priority|P3 |P2