From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 2181) id CD28C3858402; Thu, 9 Dec 2021 23:26:39 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CD28C3858402 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="utf-8" From: Jonathan Wakely To: gcc-cvs@gcc.gnu.org, libstdc++-cvs@gcc.gnu.org Subject: [gcc r12-5875] libstdc++: Do not leak empty COW strings X-Act-Checkin: gcc X-Git-Author: Jonathan Wakely X-Git-Refname: refs/heads/master X-Git-Oldrev: f8463b0e3ec2438b4cfb8c9a468d59761db2c8a0 X-Git-Newrev: fb9875ebf10c86be21824cb836b7b3b80f3a731b Message-Id: <20211209232639.CD28C3858402@sourceware.org> Date: Thu, 9 Dec 2021 23:26:39 +0000 (GMT) X-BeenThere: libstdc++-cvs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++-cvs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Dec 2021 23:26:39 -0000 https://gcc.gnu.org/g:fb9875ebf10c86be21824cb836b7b3b80f3a731b commit r12-5875-gfb9875ebf10c86be21824cb836b7b3b80f3a731b Author: Jonathan Wakely Date: Thu Dec 2 15:50:17 2021 +0000 libstdc++: Do not leak empty COW strings When non-const references, pointers or iterators are obtained to the contents of a COW std::basic_string, the implementation has to assume it could result in a write to the contents. If the string was previously shared, it does the "copy-on-write" step of creating a new copy of the data that is not shared by another object. It also marks the string as "leaked", so that no future copies of it will share ownership either. However, if the string is empty then the only character in the sequence is the terminating null, and modifying that is undefined behaviour. This means that non-const references/pointers/iterators to an empty string are effectively const. Since no direct modification is possible, there is no need to "leak" the string, it can be safely shared with other objects. This avoids unnecessary allocations to create new copies of empty strings that can't be modified anyway. We already did this optimization for strings that share ownership of the static _S_empty_rep() object, but not for strings that have non-zero capacity, and not for fully-dynamic-strings (where the _S_empty_rep() object is never used). With this change we avoid two allocations in the return statement: std::string s; s.reserve(1); // allocate std::string s2 = s; std::string s3 = s; return s[0] + s2[0] + s3[0]; // leak+allocate twice libstdc++-v3/ChangeLog: * include/bits/cow_string.h (basic_string::_M_leak_hard): Do not reallocate an empty string. Diff: --- libstdc++-v3/include/bits/cow_string.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h index 389b39583e4..b21a7422246 100644 --- a/libstdc++-v3/include/bits/cow_string.h +++ b/libstdc++-v3/include/bits/cow_string.h @@ -3366,10 +3366,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION basic_string<_CharT, _Traits, _Alloc>:: _M_leak_hard() { -#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0 - if (_M_rep() == &_S_empty_rep()) + // No need to create a new copy of an empty string when a non-const + // reference/pointer/iterator into it is obtained. Modifying the + // trailing null character is undefined, so the ref/pointer/iterator + // is effectively const anyway. + if (this->empty()) return; -#endif + if (_M_rep()->_M_is_shared()) _M_mutate(0, 0, 0); _M_rep()->_M_set_leaked();