public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/110879] New: Unnecessary reread from memory in a loop
@ 2023-08-02 19:26 palevichva at gmail dot com
  2023-08-03 11:42 ` [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: palevichva at gmail dot com @ 2023-08-02 19:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

            Bug ID: 110879
           Summary: Unnecessary reread from memory in a loop
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: palevichva at gmail dot com
  Target Milestone: ---

Created attachment 55678
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55678&action=edit
preprocessed file by g++ from revision dd2eb972a

I've found a strange regression in optimization. Trunk version of g++ produces
less optimal assembly. It rereads same memory location in every iteration of a
loop. More specifically, it rereads fields _M_finish and _M_end_of_storage of a
vector from memory every push_back call, although it is not necessary.
Released version 13.2 doesn't do that, and just uses values from registers.

I'm compiling following code:

#include <vector>

std::vector<int> f(std::size_t n) {
    std::vector<int> res;
    res.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        res.push_back(i*i);
    }
    return res;
}

The main body of a loop looks like this:
~/.local/gcc/bin/g++ -S -fverbose-asm -O3 -std=c++20 pb.cpp

>.L41:
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/stl_construct.h:97:     { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
>        movl    %r15d, (%rbx)   # _3, *prephitmp_51
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:119:          ++this->_M_impl._M_finish;
>        addq    $4, %rbx        #, tmp135
>        movq    %rbx, 8(%rbp)   # tmp135, res_8(D)->D.35756._M_impl.D.35067._M_finish
>.L8:
># pb.cpp:6:     for (std::size_t i = 0; i < n; ++i) {
>        addq    $1, %r13        #, i
># pb.cpp:6:     for (std::size_t i = 0; i < n; ++i) {
>        cmpq    %r13, %r12      # i, n
>        je      .L1     #,
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:114:      if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        movq    8(%rbp), %rbx   # res_8(D)->D.35756._M_impl.D.35067._M_finish, prephitmp_51
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:114:      if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        movq    16(%rbp), %rax  # res_8(D)->D.35756._M_impl.D.35067._M_end_of_storage, pretmp_52
>.L16:
># pb.cpp:7:         res.push_back(i*i);
>        movl    %r13d, %r15d    # i, _3
>        imull   %r13d, %r15d    # i, _3
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:114:      if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        cmpq    %rax, %rbx      # pretmp_52, prephitmp_51
>        jne     .L41    #,

Same loop as produced by 13.2:
~/.local/gcc-13.2/bin/g++ -v -S -fverbose-asm -O3 -std=c++20 pb.cpp

>.L43:
># /home/scaiper/.local/gcc-13.2/include/c++/13.2.0/bits/stl_construct.h:97:     { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
>        movl    %r12d, (%rcx)   # _3, *prephitmp_4
># /home/scaiper/.local/gcc-13.2/include/c++/13.2.0/bits/vector.tcc:119:             ++this->_M_impl._M_finish;
>        addq    $4, %rcx        #, prephitmp_4
>        movq    %rcx, 8(%rbp)   # prephitmp_4, res_8(D)->D.35699._M_impl.D.35010._M_finish
>.L8:
># pb.cpp:6:     for (std::size_t i = 0; i < n; ++i) {
>        addq    $1, %rbx        #, i
># pb.cpp:6:     for (std::size_t i = 0; i < n; ++i) {
>        cmpq    %rbx, %r13      # i, n
>        je      .L1     #,
>.L18:
># pb.cpp:7:         res.push_back(i*i);
>        movl    %ebx, %r12d     # i, _3
>        imull   %ebx, %r12d     # i, _3
># /home/scaiper/.local/gcc-13.2/include/c++/13.2.0/bits/vector.tcc:114:         if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        cmpq    %r8, %rcx       # prephitmp_74, prephitmp_4
>        jne     .L43    #,

Notice this extra commands in the first snippet:
movq    8(%rbp), %rbx
movq    16(%rbp), %rax

I've bisected this problem to the commit dd2eb972a (libstdc++: Use RAII in
std::vector::_M_realloc_insert)
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=dd2eb972a5b063e10c83878d5c9336a818fa8291).
It doesn't look like commit is the problem. Code looks pretty equivalent. But
for some reason compiler produces different result.

I'm using version built from aforementioned commit dd2eb972a:
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-languages=c++ --disable-multilib
--prefix=/home/scaiper/.local/gcc
gcc version 14.0.0 20230623 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-S' '-fverbose-asm' '-O3' '-std=c++20'
'-shared-libgcc' '-mtune=generic' '-march=x86-64'

Comparing with 13.2:
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-languages=c++ --disable-multilib
--prefix=/home/scaiper/.local/gcc-13.2
gcc version 13.2.0 (GCC)
COLLECT_GCC_OPTIONS='-v' '-S' '-fverbose-asm' '-O3' '-std=c++20'
'-shared-libgcc' '-mtune=generic' '-march=x86-64'

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector
  2023-08-02 19:26 [Bug c++/110879] New: Unnecessary reread from memory in a loop palevichva at gmail dot com
@ 2023-08-03 11:42 ` rguenth at gcc dot gnu.org
  2023-09-01 15:02 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-08-03 11:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
            Summary|Unnecessary reread from     |[14 Regression] Unnecessary
                   |memory in a loop            |reread from memory in a
                   |                            |loop with std::vector
          Component|c++                         |libstdc++

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector
  2023-08-02 19:26 [Bug c++/110879] New: Unnecessary reread from memory in a loop palevichva at gmail dot com
  2023-08-03 11:42 ` [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector rguenth at gcc dot gnu.org
@ 2023-09-01 15:02 ` cvs-commit at gcc dot gnu.org
  2023-09-01 16:12 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-09-01 15:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

--- Comment #1 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:419c423d3aeca754e47e1ce1bf707735603a90a3

commit r14-3627-g419c423d3aeca754e47e1ce1bf707735603a90a3
Author: Vladimir Palevich <palevichva@gmail.com>
Date:   Wed Aug 9 01:34:05 2023 +0300

    libstdc++: fix memory clobbering in std::vector [PR110879]

    Fix ordering to prevent clobbering of class members by a call to deallocate
    in _M_realloc_insert and _M_default_append.

    Because of recent changes in _M_realloc_insert and _M_default_append,
    calls to deallocate were ordered after assignment to class members of
    std::vector (in the guard destructor), which is causing said members to
    be call-clobbered.  This is preventing further optimization, the
    compiler is unable to move memory read out of a hot loop in this case.

    This patch reorders the call to before assignments by putting guard in
    its own block. Plus a new testsuite for this case.  I'm not very happy
    with the new testsuite, but I don't know how to properly test this.

            PR libstdc++/110879

    libstdc++-v3/ChangeLog:

            * include/bits/vector.tcc (_M_realloc_insert): End guard
            lifetime just before assignment to class members.
            (_M_default_append): Likewise.

    gcc/testsuite/ChangeLog:

            * g++.dg/pr110879.C: New test.

    Signed-off-by: Vladimir Palevich  <palevichva@gmail.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector
  2023-08-02 19:26 [Bug c++/110879] New: Unnecessary reread from memory in a loop palevichva at gmail dot com
  2023-08-03 11:42 ` [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector rguenth at gcc dot gnu.org
  2023-09-01 15:02 ` cvs-commit at gcc dot gnu.org
@ 2023-09-01 16:12 ` redi at gcc dot gnu.org
  2023-11-22  8:53 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2023-09-01 16:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This should be fixed now. Thanks for the report and the patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector
  2023-08-02 19:26 [Bug c++/110879] New: Unnecessary reread from memory in a loop palevichva at gmail dot com
                   ` (2 preceding siblings ...)
  2023-09-01 16:12 ` redi at gcc dot gnu.org
@ 2023-11-22  8:53 ` jakub at gcc dot gnu.org
  2023-11-22 11:27 ` redi at gcc dot gnu.org
  2023-11-23 10:58 ` cvs-commit at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-11-22  8:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The new testcase FAILs on both x86_64-linux and i686-linux with C++98:
FAIL: g++.dg/opt/pr110879.C  -std=gnu++98  scan-tree-dump-not optimized
"=\\s*\\S*res_(?!\\S*_M_end_of_storage;)"
Is it expected (if so, the test should have { target c++11 } either on dg-do or
on dg-final), or not?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector
  2023-08-02 19:26 [Bug c++/110879] New: Unnecessary reread from memory in a loop palevichva at gmail dot com
                   ` (3 preceding siblings ...)
  2023-11-22  8:53 ` jakub at gcc dot gnu.org
@ 2023-11-22 11:27 ` redi at gcc dot gnu.org
  2023-11-23 10:58 ` cvs-commit at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-22 11:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Ah I think that's probably expected. In _M_realloc_insert (and now
_M_realloc_append) we have:

#if __cplusplus >= 201103L
        if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
          {
            // Relocation cannot throw.
            __new_finish = _S_relocate(__old_start, __position.base(),
                                       __new_start, _M_get_Tp_allocator());
            ++__new_finish;
            __new_finish = _S_relocate(__position.base(), __old_finish,
                                       __new_finish, _M_get_Tp_allocator());
          }
        else
#endif

and then an alternative path used for non-trivial types and for C++98. That
alternative path does more work and probably can't be optimized as well, so the
reads from _M_end_of_storage aren't optimized out.

I think we can just use { target c++11 } for the test.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector
  2023-08-02 19:26 [Bug c++/110879] New: Unnecessary reread from memory in a loop palevichva at gmail dot com
                   ` (4 preceding siblings ...)
  2023-11-22 11:27 ` redi at gcc dot gnu.org
@ 2023-11-23 10:58 ` cvs-commit at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-23 10:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:256d64b3460807f33ff2f70e676801bdcf908c1c

commit r14-5774-g256d64b3460807f33ff2f70e676801bdcf908c1c
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Nov 22 21:24:08 2023 +0000

    c++: Require C++11 for g++.dg/opt/pr110879.C [PR110879]

    The _M_realloc_insert member does not have the trivial relocation
    optimization for C++98, which seems to be why the _M_end_of_storage
    member does not get optimized away. Make this test unsupported for
    C++98.

    gcc/testsuite/ChangeLog:

            PR libstdc++/110879
            * g++.dg/opt/pr110879.C: Require C++11 or later.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-11-23 10:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-02 19:26 [Bug c++/110879] New: Unnecessary reread from memory in a loop palevichva at gmail dot com
2023-08-03 11:42 ` [Bug libstdc++/110879] [14 Regression] Unnecessary reread from memory in a loop with std::vector rguenth at gcc dot gnu.org
2023-09-01 15:02 ` cvs-commit at gcc dot gnu.org
2023-09-01 16:12 ` redi at gcc dot gnu.org
2023-11-22  8:53 ` jakub at gcc dot gnu.org
2023-11-22 11:27 ` redi at gcc dot gnu.org
2023-11-23 10:58 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).