public inbox for gcc-bugs@sourceware.org

* [Bug c++/109985] New: __builtin_prefetch ignored by GCC 12/13
From: pdimov at gmail dot com @ 2023-05-26 11:38 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985

            Bug ID: 109985
           Summary: __builtin_prefetch ignored by GCC 12/13
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pdimov at gmail dot com
  Target Milestone: ---

We are investigating a Boost.Unordered performance regression with GCC 12 on
the following benchmark:

https://github.com/boostorg/boost_unordered_benchmarks/blob/4c717baac1bff8d3e51cb8485b72bbb63d533265/scattered_lookup.cpp

and it looks like the reason is that GCC 12 (and 13) ignore a call to
`__builtin_prefetch`.

While GCC 11 generates this:

```
.L108:
        mov     r8, r12
        movdqa  xmm0, xmm1
        sal     r8, 4
        lea     r14, [r10+r8]
        pcmpeqb xmm0, XMMWORD PTR [r14]
        pmovmskb        edx, xmm0
        and     edx, 32767
        je      .L104
        sub     r8, r12
        sal     r8, 4
        add     r8, QWORD PTR [rbx+32]
        prefetcht0      [r8]
.L106:
        xor     r15d, r15d
        rep bsf r15d, edx
        movsx   r15, r15d
        sal     r15, 4
        add     r15, r8
        cmp     rsi, QWORD PTR [r15]
        jne     .L144
        add     r9, QWORD PTR [r15+8]
        mov     rax, rdi
        cmp     r11, rdi
        jne     .L145
```
(https://godbolt.org/z/d663fdM16 - prefetcht0 [r8] right before .L106)

GCC 12 generates this in the same function:
```
.L108:
        mov     r8, r10
        movdqa  xmm0, xmm1
        sal     r8, 4
        lea     r9, [rbp+0+r8]
        pcmpeqb xmm0, XMMWORD PTR [r9]
        pmovmskb        edx, xmm0
        and     edx, 32767
        je      .L104
        mov     rdi, QWORD PTR [rsp+16]
        sub     r8, r10
        mov     QWORD PTR [rsp+24], rax
        sal     r8, 4
        mov     rdi, QWORD PTR [rdi+32]
        mov     QWORD PTR [rsp+8], rdi
        mov     rax, rdi
.L106:
        xor     edi, edi
        rep bsf edi, edx
        movsx   rdi, edi
        sal     rdi, 4
        add     rdi, r8
        add     rdi, rax
        cmp     r11, QWORD PTR [rdi]
        jne     .L143
        add     rsi, 8
        add     rbx, QWORD PTR [rdi+8]
        cmp     r12, rsi
        jne     .L109
```
(https://godbolt.org/z/T7csq7TPz - no prefetcht0 instruction before .L106)

Unfortunately, simplifying this code brings the prefetcht0 back, which makes
the problem hard to reduce.
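For readers without the benchmark at hand, the hot path can be pictured roughly as below. This is a simplified, hypothetical sketch, not Boost.Unordered's actual code: the `group` and `element` layouts, the `match` and `find_in_group` helpers, and the 15-slot mask are inferred from the assembly above (`pcmpeqb`/`pmovmskb` followed by `and edx, 32767`, 16-byte elements whose key is compared and whose second word is summed), and a scalar loop stands in for the SSE2 comparison. The `prefetch(const void*)` wrapper mirrors the `foa::prefetch` that appears in the modref dump later in the thread. As noted above, a reduction like this should not be expected to reproduce the missing `prefetcht0`.

```cpp
#include <cassert>
#include <cstdint>

// Thin wrapper around the builtin, shaped like foa::prefetch(const void*).
inline void prefetch(const void* p) { __builtin_prefetch(p); }

// Hypothetical layout: 16 metadata bytes per group, 15 of which describe slots.
struct group {
    std::uint8_t meta[16];
};

// Hypothetical 16-byte element: the assembly compares [r15] and adds [r15+8].
struct element {
    std::uint64_t key;
    std::uint64_t value;
};

// Scalar stand-in for pcmpeqb + pmovmskb: bit i is set when meta[i] == h.
// The result keeps only the 15 slot bits (0x7FFF == 32767).
inline unsigned match(const group& g, std::uint8_t h) {
    unsigned mask = 0;
    for (unsigned i = 0; i < 16; ++i)
        if (g.meta[i] == h) mask |= 1u << i;
    return mask & 0x7FFFu;
}

// Probe one group: on a metadata hit, prefetch the group's elements before
// walking the candidate slots. This is the prefetch GCC 12/13 drop.
inline const element* find_in_group(const group& g, const element* elems,
                                    std::uint8_t h, std::uint64_t key) {
    assert(elems != nullptr);
    unsigned mask = match(g, h);
    if (mask == 0) return nullptr;  // no candidate slot in this group
    prefetch(elems);
    do {
        // Lowest candidate slot, like the bsf in the listings.
        unsigned n = static_cast<unsigned>(__builtin_ctz(mask));
        if (elems[n].key == key) return &elems[n];
        mask &= mask - 1;  // clear the lowest set bit
    } while (mask != 0);
    return nullptr;
}
```

With candidates in slots 3 and 9 sharing the same metadata byte, a lookup for the key stored in slot 9 first rejects slot 3 and then returns slot 9; the prefetch has no observable effect on the result, only on timing.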

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: pinskia at gcc dot gnu.org @ 2023-05-26 16:36 UTC
  To: gcc-bugs

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-26

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There are only two __builtin_prefetch calls in the .optimized dump for GCC 12.

This is definitely going to be hard to debug ...

Can you attach the preprocessed source?

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: christian.mazakas at gmail dot com @ 2023-05-26 17:15 UTC
  To: gcc-bugs

Christian Mazakas <christian.mazakas at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christian.mazakas at gmail dot com

--- Comment #2 from Christian Mazakas <christian.mazakas at gmail dot com> ---
Created attachment 55172
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55172&action=edit
Preprocessed source from the relevant godbolt.org link

This is the preprocessed output on my machine, generated from the benchmark
code and the develop branch of Boost.Unordered.

Let me know if it doesn't provide enough information or if more is required.

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: pinskia at gcc dot gnu.org @ 2023-05-26 17:17 UTC
  To: gcc-bugs

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|1                           |0
             Status|WAITING                     |UNCONFIRMED

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: jakub at gcc dot gnu.org @ 2023-05-26 17:38 UTC
  To: gcc-bugs

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Since r12-5236-g5aa91072e24c1e16, the -O3 assembly contains just 2 prefetches
rather than 4.
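One way to repeat that count is to compile the preprocessed source with `g++ -O3 -S` (adding `-masm=intel` to match the listings above) and count the prefetch mnemonics in the output. The helper below is a hypothetical sketch of such a check, not something from GCC or from this thread; `count_prefetches` is an illustrative name.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts instructions whose mnemonic starts with "prefetch" (prefetcht0,
// prefetcht1, prefetcht2, prefetchnta, prefetchw, ...). Only the first token
// of each line is examined, so labels such as ".L106:", assembler directives
// and "#" comment lines are never counted.
std::size_t count_prefetches(std::istream& asm_text) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(asm_text, line)) {
        std::istringstream tokens(line);
        std::string mnemonic;
        if (tokens >> mnemonic && mnemonic.rfind("prefetch", 0) == 0) ++count;
    }
    return count;
}
```

Fed the GCC 11 excerpt from the first message, the helper finds the single `prefetcht0 [r8]`; fed the GCC 12 excerpt, it finds none.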

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: pinskia at gcc dot gnu.org @ 2023-05-26 22:28 UTC
  To: gcc-bugs

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm:
modref analyzing 'void boost::unordered::detail::foa::prefetch(const void*)/3452' (ipa=0) (pure)
Analyzing flags of ssa name: p_1(D)
  Analyzing stmt: __builtin_prefetch (p_1(D));
  current flags of p_1(D) no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape not_returned_directly not_returned_indirectly no_direct_read no_indirect_read
flags of ssa name p_1(D) no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape not_returned_directly not_returned_indirectly no_direct_read no_indirect_read
Always executed bbbs (assuming return or EH): 2
 - Analyzing call:__builtin_prefetch (p_1(D));
 - ECF_CONST | ECF_NOVOPS, ignoring all stores and all loads except for args.
Function found to be const: void boost::unordered::detail::foa::prefetch(const void*)/3452
Declaration updated to be const: void boost::unordered::detail::foa::prefetch(const void*)/3452
 - modref done with result: tracked.
  loads:
  stores:
  Try dse
  parm 0 flags: not_returned_directly not_returned_indirectly no_direct_read no_indirect_read
void boost::unordered::detail::foa::prefetch (const void * p)
{
  <bb 2> [local count: 1073741824]:
  __builtin_prefetch (p_1(D));
  return;

}


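Maybe that explains it, since __builtin_prefetch is declared NOVOPS in
builtins.def:


DEF_GCC_BUILTIN        (BUILT_IN_PREFETCH, "prefetch", BT_FN_VOID_CONST_PTR_VAR, ATTR_NOVOPS_LEAF_LIST)

Modref therefore records no memory effects for the wrapper, marks
foa::prefetch as const, and a call to a const function whose result is unused
is dead code that can be removed.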

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: hubicka at gcc dot gnu.org @ 2023-05-28 20:33 UTC
  To: gcc-bugs

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |hubicka at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Hmm, this is slippery.  So novops tells GCC that the function has no memory
side effects, and in turn we optimize out the call?

I think we need to handle novops as having side effects.

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: hubicka at gcc dot gnu.org @ 2023-05-28 20:40 UTC
  To: gcc-bugs

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Created attachment 55180
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55180&action=edit
untested patch

It turns out that, because modref was written to track only memory loads and
stores and side-effect discovery was retrofitted later, I forgot to revisit
the code that handles CONST and NOVOPS together. There are quite a few places
where we must not short-circuit on NOVOPS, so that we are sure to merge in the
side-effect and determinism flags.
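A toy model may help picture the short-circuit described above. Everything in it is hypothetical: `call_info`, `summary`, `analyze_buggy` and `analyze_fixed` are not GCC's data structures, and this is not the attached patch. Each call in a function body contributes to a summary, and the function may be treated as const only if the summary records nothing. In the buggy shape a NOVOPS call is skipped before its side-effect bit is merged, so a body consisting only of a prefetch looks const; the fixed shape merges side effects first and, following comment #5, treats NOVOPS itself as a side effect.

```cpp
#include <vector>

// Hypothetical per-call facts seen while scanning a function body.
struct call_info {
    bool novops;        // like ECF_NOVOPS: no loads/stores worth tracking
    bool side_effects;  // e.g. a prefetch: still an effect the caller wants kept
};

// Hypothetical function summary.
struct summary {
    bool reads = false;
    bool writes = false;
    bool side_effects = false;
    // Only a summary with nothing recorded allows treating the function as const.
    bool can_be_const() const { return !reads && !writes && !side_effects; }
};

// Buggy shape: NOVOPS calls short-circuit before anything is merged.
summary analyze_buggy(const std::vector<call_info>& calls) {
    summary s;
    for (const call_info& c : calls) {
        if (c.novops) continue;  // "ignoring all stores and all loads"
        s.reads = s.writes = true;  // an opaque call: assume it touches memory
        s.side_effects = s.side_effects || c.side_effects;
    }
    return s;
}

// Fixed shape: loads/stores of NOVOPS calls are still ignored, but the
// side-effect flag is merged first, and NOVOPS counts as a side effect.
summary analyze_fixed(const std::vector<call_info>& calls) {
    summary s;
    for (const call_info& c : calls) {
        s.side_effects = s.side_effects || c.side_effects || c.novops;
        if (c.novops) continue;
        s.reads = s.writes = true;
    }
    return s;
}
```

For a body containing just the prefetch call, `analyze_buggy` reports a const candidate (so callers could drop the call as dead), while `analyze_fixed` does not.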

* [Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13
From: rguenth at gcc dot gnu.org @ 2023-05-30  7:31 UTC
  To: gcc-bugs

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
novops for prefetch also isn't the best choice: it makes the call free to be
scheduled across loads and stores, which probably isn't the intent.  Of course,
it shouldn't be a barrier for CSE either, so modeling the exact desired
behavior is difficult.
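Why ordering matters shows up in the usual software-prefetch loop, sketched below as a hypothetical example unrelated to the Boost code: a prefetch only helps if it is issued ahead of the load it is meant to cover, so a prefetch that could move freely relative to loads and stores might end up too late to be useful. `gather_sum` and the distance `kPrefetchDistance = 16` are illustrative assumptions; the optional second and third arguments of `__builtin_prefetch` (0 for a read, 3 for high temporal locality) are GCC's documented ones.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tuning knob: how many iterations ahead to prefetch.
constexpr std::size_t kPrefetchDistance = 16;

// Sums a[idx[i]] over a scattered index sequence. Each iteration prefetches
// the element that will be loaded kPrefetchDistance iterations later; the
// hint is only useful while it stays ahead of that later load.
std::uint64_t gather_sum(const std::uint64_t* a, const std::size_t* idx,
                         std::size_t n) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n)  // never form an out-of-range index
            __builtin_prefetch(&a[idx[i + kPrefetchDistance]], 0, 3);
        sum += a[idx[i]];
    }
    return sum;
}
```

The prefetch never changes the result, only when the data arrives in cache, which is also why its loss in GCC 12/13 shows up as a performance regression rather than as wrong output.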

* [Bug tree-optimization/109985] [12/13/14 Regression] __builtin_prefetch ignored by GCC 12/13
From: rguenth at gcc dot gnu.org @ 2023-05-30  7:32 UTC
  To: gcc-bugs

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.4
            Summary|__builtin_prefetch ignored  |[12/13/14 Regression]
                   |by GCC 12/13                |__builtin_prefetch ignored
                   |                            |by GCC 12/13
           Keywords|                            |wrong-code
           Priority|P3                          |P2
