From: Jan Hubicka <hubicka@ucw.cz>
To: Matthias Kretz <m.kretz@gsi.de>
Cc: jwakely@redhat.com, libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: libstdc++: Speed up push_back
Date: Fri, 24 Nov 2023 21:07:35 +0100
Message-ID: <ZWECh8DQFpocwj+0@kam.mff.cuni.cz>
In-Reply-To: <ZV9qqZqt6BpBjrlN@kam.mff.cuni.cz>
> With my changes at -O3 we now inline push_back, so we could optimize the
> first loop to the second. However with
> ~/trunk-install/bin/gcc -O3 auto.C -S -fdump-tree-all-details -fno-exceptions -fno-store-merging -fno-tree-slp-vectorize
> the first problem is right at the beginning:
>
> <bb 2> [local count: 97603128]:
> MEM[(struct _Vector_impl_data *)x_4(D)]._M_start = 0B;
> MEM[(struct _Vector_impl_data *)x_4(D)]._M_finish = 0B;
> MEM[(struct _Vector_impl_data *)x_4(D)]._M_end_of_storage = 0B;
> _37 = operator new (40);
> _22 = x_4(D)->D.26019._M_impl.D.25320._M_finish;
> _23 = x_4(D)->D.26019._M_impl.D.25320._M_start;
Thinking about this problem, it is easy to adjust reserve to copy _M_start
and _M_finish to local variables before the call to operator new, which makes
the old values visible to the compiler regardless of points-to analysis.
In fact _M_realloc_insert already has such code:
// Make local copies of these members because the compiler thinks
// the allocator can alter them if 'this' is globally reachable.
pointer __old_start = this->_M_impl._M_start;
pointer __old_finish = this->_M_impl._M_finish;
So the attached patch does the same in reserve. The downside is that if things
are not inlined we may end up pushing an extra copy to the stack, but I believe
the benefit from inlining pays this back.
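To illustrate the idea outside of libstdc++, here is a minimal sketch. The tiny_vec type and tiny_reserve function are hypothetical stand-ins invented for this example (they are not libstdc++ code), and malloc plays the role of the opaque allocator call. The old pointers are read into locals once, before the call, so nothing the allocator does can force a reload:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the three-pointer layout of std::vector<int>.
struct tiny_vec {
  int *start = nullptr;
  int *finish = nullptr;
  int *end_of_storage = nullptr;
};

// Grow v so it can hold at least n ints. Returns false if allocation fails.
inline bool tiny_reserve(tiny_vec &v, std::size_t n) {
  if (static_cast<std::size_t>(v.end_of_storage - v.start) >= n)
    return true;

  // Local copies taken before the opaque call: the allocator cannot
  // modify locals, so the compiler need not reload them afterwards.
  int *old_start = v.start;
  int *old_finish = v.finish;
  const std::size_t old_size =
      static_cast<std::size_t>(old_finish - old_start);

  int *tmp = static_cast<int *>(std::malloc(n * sizeof(int)));
  if (!tmp)
    return false;
  if (old_size != 0)
    std::memcpy(tmp, old_start, old_size * sizeof(int));
  std::free(old_start);

  v.start = tmp;
  v.finish = tmp + old_size;
  v.end_of_storage = tmp + n;
  return true;
}
```

Whether this actually helps depends on inlining and on what the compiler can prove about 'this'; the sketch only shows the source-level pattern.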
The testcase with the loop is still not optimized, so I simplified it:
#include <vector>

auto
f()
{
  std::vector<int> x;
  x.reserve(10);
  x.push_back(0);
  x.push_back(0);
  x.push_back(0);
  x.push_back(0);
  x.push_back(0);
  x.push_back(0);
  x.push_back(0);
  x.push_back(0);
  x.push_back(0);
  return x;
}

auto
g()
{ return std::vector<int>(10, 0); }
This now compiles to less code, but the result is somewhat funny:
<bb 2> [local count: 1073741824]:
MEM <vector(2) long unsigned int> [(int * *)x_3(D)] = { 0, 0 };
MEM[(struct _Vector_impl_data *)x_3(D)]._M_end_of_storage = 0B;
_70 = operator new (40);
<bb 3> [local count: 1073741824]:
x_3(D)->D.26024._M_impl.D.25325._M_start = _70;
_65 = _70 + 40;
x_3(D)->D.26024._M_impl.D.25325._M_end_of_storage = _65;
_74 = _70 + 4;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _74;
MEM <unsigned long> [(int *)_70] = 0;
_80 = _70 + 8;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _80;
*_80 = 0;
_86 = _70 + 12;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _86;
*_86 = 0;
_92 = _70 + 16;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _92;
*_92 = 0;
_98 = _70 + 20;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _98;
*_98 = 0;
_104 = _70 + 24;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _104;
*_104 = 0;
_110 = _70 + 28;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _110;
*_110 = 0;
_116 = _70 + 32;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _116;
*_116 = 0;
_122 = _70 + 36;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _122;
return x_3(D);
The setup code in BB2 is useless and is due to the fake escape, as discussed in
PR112653.
However it is funny that we miss dead store elimination and repeatedly
set x_3(D)->D.26024._M_impl.D.25325._M_finish = _104;
The problem here is that until pushback.C.208t.forwprop4:std::vector we
do not optimize the reallocation code:
<bb 2> [local count: 1073741824]:
MEM <vector(2) long unsigned int> [(int * *)x_3(D)] = { 0, 0 };
MEM[(struct _Vector_impl_data *)x_3(D)]._M_end_of_storage = 0B;
_70 = operator new (40);
<bb 3> [local count: 1073741824]:
x_3(D)->D.26024._M_impl.D.25325._M_start = _70;
_65 = _70 + 40;
x_3(D)->D.26024._M_impl.D.25325._M_end_of_storage = _65;
*_70 = 0;
_74 = _70 + 4;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _74;
D.26115 = 0;
if (_65 != _74)
goto <bb 4>; [82.57%]
else
goto <bb 5>; [17.43%]
<bb 4> [local count: 886588624]:
*_74 = 0;
_80 = _70 + 8;
x_3(D)->D.26024._M_impl.D.25325._M_finish = _80;
goto <bb 8>; [100.00%]
<bb 5> [local count: 187153200]:
std::vector<int>::_M_realloc_append<int> (x_3(D), &D.26115);
goto <bb 7>; [100.00%]
<bb 6> [count: 0]:
<L12>:
goto <bb 44>; [100.00%]
<bb 7> [local count: 187153200]:
pretmp_127 = MEM[(int * const &)x_3(D) + 8];
pretmp_144 = MEM[(struct _Vector_base *)x_3(D)]._M_impl.D.25325._M_end_of_storage;
And after forwprop4 it is too late because DSE is no longer run.
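To make the missed dead store elimination concrete, here is a hand-written sketch. The vec_ptrs struct and both function names are hypothetical and are not GCC output. The first function has the shape of the dump (a store to finish after every element), the second shows what is left once the intermediate stores are recognised as dead:

```cpp
#include <cassert>

// Hypothetical stand-in for the vector's pointer triple.
struct vec_ptrs {
  int *start = nullptr;
  int *finish = nullptr;
  int *end_of_storage = nullptr;
};

// Shape of the dump: each element store is followed by a store to finish.
inline void fill_with_repeated_stores(vec_ptrs &v, int *buf, int n) {
  v.start = buf;
  v.finish = buf;
  for (int i = 0; i < n; ++i) {
    buf[i] = 0;
    v.finish = buf + i + 1;  // overwritten by the next iteration, never read
  }
}

// After dead store elimination only the last store to finish remains.
inline void fill_with_single_store(vec_ptrs &v, int *buf, int n) {
  v.start = buf;
  for (int i = 0; i < n; ++i)
    buf[i] = 0;
  v.finish = buf + n;
}
```

Both functions leave the same final state; the intermediate stores are dead only because nothing reads finish between them.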
I filed PR112706. For some reason FRE fails to optimize:
int *ptr;
void link_error ();
void
test ()
{
  int *ptr1 = ptr + 10;
  int *ptr2 = ptr + 20;
  if (ptr1 == ptr2)
    link_error ();
}
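The reason the call in test() should fold away is that two pointers formed from the same base with different constant offsets can never compare equal. A small runnable sketch of that property follows; the helper name offsets_compare_equal is made up for this example:

```cpp
#include <cassert>

// Hypothetical helper mirroring the comparison in test() above.
// base must point into an array with at least 21 elements.
inline bool offsets_compare_equal(int *base) {
  int *p1 = base + 10;
  int *p2 = base + 20;
  // p2 - p1 is always 10, so this comparison is false for every valid base.
  return p1 == p2;
}
```

An optimizer that knows both pointers derive from the same base can replace the comparison with false and drop the branch.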
The vector.tcc change was regtested on x86_64-linux, OK?
libstdc++-v3/ChangeLog:
* include/bits/vector.tcc (reserve): Copy _M_start and _M_finish
to local variables to allow propagation across call to
allocator.
diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc
index 0ccef7911b3..0a9db29c1c7 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -72,27 +72,30 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
if (this->capacity() < __n)
{
const size_type __old_size = size();
+ // Make local copies of these members because the compiler thinks
+ // the allocator can alter them if 'this' is globally reachable.
+ pointer __old_start = this->_M_impl._M_start;
+ pointer __old_finish = this->_M_impl._M_finish;
pointer __tmp;
#if __cplusplus >= 201103L
if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
{
__tmp = this->_M_allocate(__n);
- _S_relocate(this->_M_impl._M_start, this->_M_impl._M_finish,
+ _S_relocate(__old_start, __old_finish,
__tmp, _M_get_Tp_allocator());
}
else
#endif
{
__tmp = _M_allocate_and_copy(__n,
- _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(this->_M_impl._M_start),
- _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(this->_M_impl._M_finish));
- std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
+ _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(__old_start),
+ _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(__old_finish));
+ std::_Destroy(__old_start, __old_finish,
_M_get_Tp_allocator());
}
_GLIBCXX_ASAN_ANNOTATE_REINIT;
- _M_deallocate(this->_M_impl._M_start,
- this->_M_impl._M_end_of_storage
- - this->_M_impl._M_start);
+ _M_deallocate(__old_start,
+ this->_M_impl._M_end_of_storage - __old_start);
this->_M_impl._M_start = __tmp;
this->_M_impl._M_finish = __tmp + __old_size;
this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
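For reference, the observable behaviour that any change to reserve has to keep can be checked with a few lines. This is only a sketch with a hypothetical helper name; it is not part of the libstdc++ testsuite:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: returns true if reserve(n) kept the contents of v
// unchanged and left capacity() at least n.
inline bool reserve_preserves_contents(std::vector<int> v, std::size_t n) {
  const std::vector<int> before = v;
  v.reserve(n);
  return v == before && v.capacity() >= n;
}
```

It can be called with both an empty vector (as in the testcase above) and a non-empty one, so that the relocation path is exercised as well.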