From: Jonathan Wakely
Date: Thu, 9 Dec 2021 23:25:33 +0000
Subject: Re: [PATCH] libstdc++: Do not leak empty COW strings
To: Jonathan Wakely
Cc: "libstdc++", gcc Patches

On Thu, 2 Dec 2021 at 17:21, Jonathan Wakely via Libstdc++ wrote:
>
> Apart from "don't bother changing the COW string", does anybody see a
> reason we shouldn't do this? This passes all tests for normal COW
> strings and fully-dynamic COW strings.

Pushed to trunk.

>
> When non-const references, pointers or iterators are obtained to the
> contents of a COW std::basic_string, the implementation has to assume it
> could result in a write to the contents. If the string was previously
> shared, it does the "copy-on-write" step of creating a new copy of the
> data that is not shared by another object. It also marks the string as
> "leaked", so that no future copies of it will share ownership either.
>
> However, if the string is empty then the only character in the sequence
> is the terminating null, and modifying that is undefined behaviour. This
> means that non-const references/pointers/iterators to an empty string
> are effectively const.
> Since no direct modification is possible, there is no need to "leak"
> the string, it can be safely shared with other objects. This avoids
> unnecessary allocations to create new copies of empty strings that
> can't be modified anyway.
>
> We already did this optimization for strings that share ownership of the
> static _S_empty_rep() object, but not for strings that have non-zero
> capacity, and not for fully-dynamic-strings (where the _S_empty_rep()
> object is never used).
>
> With this change we avoid two allocations in the return statement:
>
>   std::string s;
>   s.reserve(1);       // allocate
>   std::string s2 = s;
>   std::string s3 = s;
>   return s[0] + s2[0] + s3[0]; // leak+allocate twice
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/cow_string.h (basic_string::_M_leak_hard): Do not
>         reallocate an empty string.
> ---
>  libstdc++-v3/include/bits/cow_string.h | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
> index 389b39583e4..b21a7422246 100644
> --- a/libstdc++-v3/include/bits/cow_string.h
> +++ b/libstdc++-v3/include/bits/cow_string.h
> @@ -3366,10 +3366,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      basic_string<_CharT, _Traits, _Alloc>::
>      _M_leak_hard()
>      {
> -#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0
> -      if (_M_rep() == &_S_empty_rep())
> +      // No need to create a new copy of an empty string when a non-const
> +      // reference/pointer/iterator into it is obtained. Modifying the
> +      // trailing null character is undefined, so the ref/pointer/iterator
> +      // is effectively const anyway.
> +      if (this->empty())
>          return;
> -#endif
> +
>        if (_M_rep()->_M_is_shared())
>          _M_mutate(0, 0, 0);
>        _M_rep()->_M_set_leaked();
> --
> 2.31.1
>