* [committed] libstdc++: Optimize ref-count updates in COW std::string
@ 2021-12-01 15:08 Jonathan Wakely
2021-12-01 18:16 ` Florian Weimer
0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Wakely @ 2021-12-01 15:08 UTC (permalink / raw)
To: libstdc++, gcc-patches
Tested x86_64-linux, pushed to trunk.
Most ref-count updates in the COW string are done via the functions in
<ext/atomicity.h>, which will use non-atomic ops when the program is
known to be single-threaded. The _M_is_leaked() and _M_is_shared()
functions use __atomic_load_n directly, because <ext/atomicity.h>
doesn't provide a load operation. Those functions can check the
__is_single_threaded() predicate to avoid using __atomic_load_n when not
needed.
The move constructor for the fully-dynamic-string increments the
ref-count by either 2 or 1, for leaked or non-leaked strings
respectively. That can be changed to use a non-atomic store of 1 for all
non-shared strings. It can be non-atomic because even if the program is
multi-threaded, conflicting access to the rvalue object while it's being
moved from would be data race anyway. It can store 1 directly for all
non-shared strings because it doesn't matter whether the initial
refcount was -1 or 0, it should be 1 after the move constructor creates
a second owner.
libstdc++-v3/ChangeLog:
* include/bits/cow_string.h (basic_string::_M_is_leaked): Use
non-atomic load when __is_single_threaded() is true.
(basic_string::_M_is_shared): Likewise.
(basic_string::(basic_string&&)) [_GLIBCXX_FULLY_DYNAMIC_STRING]:
Use non-atomic store when rvalue is not shared.
---
libstdc++-v3/include/bits/cow_string.h | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
index ced395b80b8..4fae1d02981 100644
--- a/libstdc++-v3/include/bits/cow_string.h
+++ b/libstdc++-v3/include/bits/cow_string.h
@@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* destroy the empty-string _Rep object.
*
* All but the last paragraph is considered pretty conventional
- * for a C++ string implementation.
+ * for a Copy-On-Write C++ string implementation.
*/
// 21.3 Template class basic_string
template<typename _CharT, typename _Traits, typename _Alloc>
@@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// so we need to use an atomic load. However, _M_is_leaked
// predicate does not change concurrently (i.e. the string is either
// leaked or not), so a relaxed load is enough.
- return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
-#else
- return this->_M_refcount < 0;
+ if (!__gnu_cxx::__is_single_threaded())
+ return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
#endif
+ return this->_M_refcount < 0;
}
bool
@@ -222,10 +222,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// but one reference concurrently with this check, so we need this
// load to be acquire to synchronize with release fetch_and_add in
// _M_dispose.
- return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0;
-#else
- return this->_M_refcount > 0;
+ if (!__gnu_cxx::__is_single_threaded())
+ return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0;
#endif
+ return this->_M_refcount > 0;
}
void
@@ -629,12 +629,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
#else
// Rather than allocate an empty string for the rvalue string,
// just share ownership with it by incrementing the reference count.
- // If the rvalue string was "leaked" then it was the unique owner,
- // so need an extra increment to indicate shared ownership.
- if (_M_rep()->_M_is_leaked())
- __gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 2);
- else
+ // If the rvalue string was the unique owner then there are exactly
+ // two owners now.
+ if (_M_rep()->_M_is_shared())
__gnu_cxx::__atomic_add_dispatch(&_M_rep()->_M_refcount, 1);
+ else
+ _M_rep()->_M_refcount = 1;
#endif
}
--
2.31.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [committed] libstdc++: Optimize ref-count updates in COW std::string
2021-12-01 15:08 [committed] libstdc++: Optimize ref-count updates in COW std::string Jonathan Wakely
@ 2021-12-01 18:16 ` Florian Weimer
2021-12-01 18:24 ` Jonathan Wakely
0 siblings, 1 reply; 4+ messages in thread
From: Florian Weimer @ 2021-12-01 18:16 UTC (permalink / raw)
To: Jonathan Wakely via Libstdc++; +Cc: gcc-patches, Jonathan Wakely
* Jonathan Wakely via Libstdc:
> diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> index ced395b80b8..4fae1d02981 100644
> --- a/libstdc++-v3/include/bits/cow_string.h
> +++ b/libstdc++-v3/include/bits/cow_string.h
> @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> * destroy the empty-string _Rep object.
> *
> * All but the last paragraph is considered pretty conventional
> - * for a C++ string implementation.
> + * for a Copy-On-Write C++ string implementation.
> */
> // 21.3 Template class basic_string
> template<typename _CharT, typename _Traits, typename _Alloc>
> @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> // so we need to use an atomic load. However, _M_is_leaked
> // predicate does not change concurrently (i.e. the string is either
> // leaked or not), so a relaxed load is enough.
> - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> -#else
> - return this->_M_refcount < 0;
> + if (!__gnu_cxx::__is_single_threaded())
> + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> #endif
> + return this->_M_refcount < 0;
> }
Relaxed MO loads of word-size values on all current architectures only
have a compiler barrier, so I think the optimization makes things worse?
(I doubt the conditional lack of a compiler barrier leads to
optimization improvements elsewhere.)
Thanks,
Florian
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [committed] libstdc++: Optimize ref-count updates in COW std::string
2021-12-01 18:16 ` Florian Weimer
@ 2021-12-01 18:24 ` Jonathan Wakely
2021-12-02 16:55 ` [committed] libstdc++: Restore unconditional atomic load " Jonathan Wakely
0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Wakely @ 2021-12-01 18:24 UTC (permalink / raw)
To: Florian Weimer; +Cc: Jonathan Wakely via Libstdc++, gcc Patches
On Wed, 1 Dec 2021 at 18:16, Florian Weimer wrote:
>
> * Jonathan Wakely via Libstdc:
>
> > diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> > index ced395b80b8..4fae1d02981 100644
> > --- a/libstdc++-v3/include/bits/cow_string.h
> > +++ b/libstdc++-v3/include/bits/cow_string.h
> > @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > * destroy the empty-string _Rep object.
> > *
> > * All but the last paragraph is considered pretty conventional
> > - * for a C++ string implementation.
> > + * for a Copy-On-Write C++ string implementation.
> > */
> > // 21.3 Template class basic_string
> > template<typename _CharT, typename _Traits, typename _Alloc>
> > @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > // so we need to use an atomic load. However, _M_is_leaked
> > // predicate does not change concurrently (i.e. the string is either
> > // leaked or not), so a relaxed load is enough.
> > - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> > -#else
> > - return this->_M_refcount < 0;
> > + if (!__gnu_cxx::__is_single_threaded())
> > + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> > #endif
> > + return this->_M_refcount < 0;
> > }
>
> Relaxed MO loads of word-size values on all current architectures only
> have a compiler barrier, so I think the optimization makes things worse?
Hmm, yes.
> (I doubt the conditional lack of a compiler barrier leads to
> optimization improvements elsewhere.)
Probably not. I'll revert the change to _M_is_leaked() and just keep
it for _M_is_shared().
Thanks for pointing that out.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [committed] libstdc++: Restore unconditional atomic load in COW std::string
2021-12-01 18:24 ` Jonathan Wakely
@ 2021-12-02 16:55 ` Jonathan Wakely
0 siblings, 0 replies; 4+ messages in thread
From: Jonathan Wakely @ 2021-12-02 16:55 UTC (permalink / raw)
To: Florian Weimer; +Cc: Jonathan Wakely via Libstdc++, gcc Patches
[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]
On Wed, 1 Dec 2021 at 18:24, Jonathan Wakely wrote:
>
> On Wed, 1 Dec 2021 at 18:16, Florian Weimer wrote:
> >
> > * Jonathan Wakely via Libstdc:
> >
> > > diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> > > index ced395b80b8..4fae1d02981 100644
> > > --- a/libstdc++-v3/include/bits/cow_string.h
> > > +++ b/libstdc++-v3/include/bits/cow_string.h
> > > @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > * destroy the empty-string _Rep object.
> > > *
> > > * All but the last paragraph is considered pretty conventional
> > > - * for a C++ string implementation.
> > > + * for a Copy-On-Write C++ string implementation.
> > > */
> > > // 21.3 Template class basic_string
> > > template<typename _CharT, typename _Traits, typename _Alloc>
> > > @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > // so we need to use an atomic load. However, _M_is_leaked
> > > // predicate does not change concurrently (i.e. the string is either
> > > // leaked or not), so a relaxed load is enough.
> > > - return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> > > -#else
> > > - return this->_M_refcount < 0;
> > > + if (!__gnu_cxx::__is_single_threaded())
> > > + return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
> > > #endif
> > > + return this->_M_refcount < 0;
> > > }
> >
> > Relaxed MO loads of word-size values on all current architectures only
> > have a compiler barrier, so I think the optimization makes things worse?
>
> Hmm, yes.
>
> > (I doubt the conditional lack of a compiler barrier leads to
> > optimization improvements elsewhere.)
>
> Probably not. I'll revert the change to _M_is_leaked() and just keep
> it for _M_is_shared().
>
> Thanks for pointing that out.
Reverted by the attached patch, tested powerpc64le-linux.
[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 1263 bytes --]
commit b5a568683f71b4a8b1e4e45a43484398e9a66ff2
Author: Jonathan Wakely <jwakely@redhat.com>
Date: Wed Dec 1 20:58:58 2021
libstdc++: Restore unconditional atomic load in COW std::string
The relaxed load is already optimal, checking the __single_threaded
global before doing a non-atomic load isn't an optimization.
libstdc++-v3/ChangeLog:
* include/bits/cow_string.h (basic_string::_M_is_leaked()):
Revert change to check __is_single_threaded() before using
atomic load.
diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
index d6ddf3489d1..389b39583e4 100644
--- a/libstdc++-v3/include/bits/cow_string.h
+++ b/libstdc++-v3/include/bits/cow_string.h
@@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// so we need to use an atomic load. However, _M_is_leaked
// predicate does not change concurrently (i.e. the string is either
// leaked or not), so a relaxed load is enough.
- if (!__gnu_cxx::__is_single_threaded())
- return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
-#endif
+ return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0;
+#else
return this->_M_refcount < 0;
+#endif
}
bool
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-12-02 16:55 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-01 15:08 [committed] libstdc++: Optimize ref-count updates in COW std::string Jonathan Wakely
2021-12-01 18:16 ` Florian Weimer
2021-12-01 18:24 ` Jonathan Wakely
2021-12-02 16:55 ` [committed] libstdc++: Restore unconditional atomic load " Jonathan Wakely
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).