public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libstdc++: Do not leak empty COW strings
@ 2021-12-02 17:00 Jonathan Wakely
  2021-12-09 23:25 ` Jonathan Wakely
  0 siblings, 1 reply; 2+ messages in thread
From: Jonathan Wakely @ 2021-12-02 17:00 UTC (permalink / raw)
  To: libstdc++, gcc-patches

Apart from "don't bother changing the COW string", does anybody see a
reason we shouldn't do this? This passes all tests for normal COW
strings and fully-dynamic COW strings.


When non-const references, pointers or iterators are obtained to the
contents of a COW std::basic_string, the implementation has to assume it
could result in a write to the contents. If the string was previously
shared, it does the "copy-on-write" step of creating a new copy of the
data that is not shared by another object.  It also marks the string as
"leaked", so that no future copies of it will share ownership either.

However, if the string is empty then the only character in the sequence
is the terminating null, and modifying that is undefined behaviour. This
means that non-const references/pointers/iterators to an empty string
are effectively const. Since no direct modification is possible, there
is no need to "leak" the string, it can be safely shared with other
objects. This avoids unnecessary allocations to create new copies of
empty strings that can't be modified anyway.

We already did this optimization for strings that share ownership of the
static _S_empty_rep() object, but not for strings that have non-zero
capacity, and not for fully-dynamic-strings (where the _S_empty_rep()
object is never used).

With this change we avoid two allocations in the return statement:

  std::string s;
  s.reserve(1);       // allocate
  std::string s2 = s;
  std::string s3 = s;
  return s[0] + s2[0] + s3[0]; // leak+allocate twice

libstdc++-v3/ChangeLog:

	* include/bits/cow_string.h (basic_string::_M_leak_hard): Do not
	reallocate an empty string.
---
 libstdc++-v3/include/bits/cow_string.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
index 389b39583e4..b21a7422246 100644
--- a/libstdc++-v3/include/bits/cow_string.h
+++ b/libstdc++-v3/include/bits/cow_string.h
@@ -3366,10 +3366,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     basic_string<_CharT, _Traits, _Alloc>::
     _M_leak_hard()
     {
-#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0
-      if (_M_rep() == &_S_empty_rep())
+      // No need to create a new copy of an empty string when a non-const
+      // reference/pointer/iterator into it is obtained. Modifying the
+      // trailing null character is undefined, so the ref/pointer/iterator
+      // is effectively const anyway.
+      if (this->empty())
 	return;
-#endif
+
       if (_M_rep()->_M_is_shared())
 	_M_mutate(0, 0, 0);
       _M_rep()->_M_set_leaked();
-- 
2.31.1


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] libstdc++: Do not leak empty COW strings
  2021-12-02 17:00 [PATCH] libstdc++: Do not leak empty COW strings Jonathan Wakely
@ 2021-12-09 23:25 ` Jonathan Wakely
  0 siblings, 0 replies; 2+ messages in thread
From: Jonathan Wakely @ 2021-12-09 23:25 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc Patches

On Thu, 2 Dec 2021 at 17:21, Jonathan Wakely via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> Apart from "don't bother changing the COW string", does anybody see a
> reason we shouldn't do this? This passes all tests for normal COW
> strings and fully-dynamic COW strings.

Pushed to trunk.

>
> When non-const references, pointers or iterators are obtained to the
> contents of a COW std::basic_string, the implementation has to assume it
> could result in a write to the contents. If the string was previously
> shared, it does the "copy-on-write" step of creating a new copy of the
> data that is not shared by another object.  It also marks the string as
> "leaked", so that no future copies of it will share ownership either.
>
> However, if the string is empty then the only character in the sequence
> is the terminating null, and modifying that is undefined behaviour. This
> means that non-const references/pointers/iterators to an empty string
> are effectively const. Since no direct modification is possible, there
> is no need to "leak" the string, it can be safely shared with other
> objects. This avoids unnecessary allocations to create new copies of
> empty strings that can't be modified anyway.
>
> We already did this optimization for strings that share ownership of the
> static _S_empty_rep() object, but not for strings that have non-zero
> capacity, and not for fully-dynamic-strings (where the _S_empty_rep()
> object is never used).
>
> With this change we avoid two allocations in the return statement:
>
>   std::string s;
>   s.reserve(1);       // allocate
>   std::string s2 = s;
>   std::string s3 = s;
>   return s[0] + s2[0] + s3[0]; // leak+allocate twice
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/cow_string.h (basic_string::_M_leak_hard): Do not
>         reallocate an empty string.
> ---
>  libstdc++-v3/include/bits/cow_string.h | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> index 389b39583e4..b21a7422246 100644
> --- a/libstdc++-v3/include/bits/cow_string.h
> +++ b/libstdc++-v3/include/bits/cow_string.h
> @@ -3366,10 +3366,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      basic_string<_CharT, _Traits, _Alloc>::
>      _M_leak_hard()
>      {
> -#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0
> -      if (_M_rep() == &_S_empty_rep())
> +      // No need to create a new copy of an empty string when a non-const
> +      // reference/pointer/iterator into it is obtained. Modifying the
> +      // trailing null character is undefined, so the ref/pointer/iterator
> +      // is effectively const anyway.
> +      if (this->empty())
>         return;
> -#endif
> +
>        if (_M_rep()->_M_is_shared())
>         _M_mutate(0, 0, 0);
>        _M_rep()->_M_set_leaked();
> --
> 2.31.1
>


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-12-09 23:25 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-02 17:00 [PATCH] libstdc++: Do not leak empty COW strings Jonathan Wakely
2021-12-09 23:25 ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).