From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id A9A21385843B for ; Wed, 1 Dec 2021 15:08:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A9A21385843B Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-441-jfIoMr8POlyDPG0qSyfZcw-1; Wed, 01 Dec 2021 10:08:09 -0500 X-MC-Unique: jfIoMr8POlyDPG0qSyfZcw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BB15C425E1; Wed, 1 Dec 2021 15:08:08 +0000 (UTC) Received: from localhost (unknown [10.33.36.16]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6BCCD60843; Wed, 1 Dec 2021 15:08:08 +0000 (UTC) From: Jonathan Wakely To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [committed] libstdc++: Avoid unwanted allocations in filesystem::path Date: Wed, 1 Dec 2021 15:08:07 +0000 Message-Id: <20211201150807.217431-1-jwakely@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII" X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Dec 2021 15:08:25 -0000 Tested x86_64-linux, pushed to trunk. When using COW strings, accessing _M_pathname[0] and similar non-const accessors can cause the string to "leak", meaning it reallocates itself if it shares ownership with another string object. This causes test failures for --enable-fully-dynamic-string builds: /home/jwakely/src/gcc/libstdc++-v3/testsuite/experimental/filesystem/path/construct/90634.cc:62: void test01(): Assertion 'bytes_allocated == 0' failed. FAIL: experimental/filesystem/path/construct/90634.cc execution test This FAIL happens because the fully-dynamic move constructor results in shared ownership, so for path(std::move(std::string("foo"))) the _M_pathname member shares ownership with the temporary, and the non-const accesses in _M_split_cmpts() cause a new copy of the string to be allocated. This un-sharing is wasteful, and entirely unnecessary when sharing ownership with an rvalue that is about to release its ownership anyway. Even for lvalues, sharing ownership is not a problem and reallocating a unique copy of the string is wasteful. This removes non-const accesses of _M_pathname in the path::_M_split_cmpts() members. libstdc++-v3/ChangeLog: * src/c++17/fs_path.cc (path::_M_split_cmpts()): Remove micro-optimization for "/" path. * src/filesystem/path.cc (path::_M_split_cmpts()): Only access the contents of _M_pathname using const member functions. --- libstdc++-v3/src/c++17/fs_path.cc | 5 ----- libstdc++-v3/src/filesystem/path.cc | 31 ++++++++++++++++------------- 2 files changed, 17 insertions(+), 19 deletions(-) diff --git a/libstdc++-v3/src/c++17/fs_path.cc b/libstdc++-v3/src/c++17/fs_path.cc index 506ff25f9a6..967d9471134 100644 --- a/libstdc++-v3/src/c++17/fs_path.cc +++ b/libstdc++-v3/src/c++17/fs_path.cc @@ -1872,11 +1872,6 @@ path::_M_split_cmpts() _M_cmpts.type(_Type::_Filename); return; } - if (_M_pathname.length() == 1 && _M_pathname[0] == preferred_separator) - { - _M_cmpts.type(_Type::_Root_dir); - return; - } _Parser parser(_M_pathname); diff --git a/libstdc++-v3/src/filesystem/path.cc b/libstdc++-v3/src/filesystem/path.cc index a935573740f..8e8806a953f 100644 --- a/libstdc++-v3/src/filesystem/path.cc +++ b/libstdc++-v3/src/filesystem/path.cc @@ -337,15 +337,18 @@ path::_M_split_cmpts() _M_type = _Type::_Multi; _M_cmpts.clear(); - if (_M_pathname.empty()) + // Use const-reference to access _M_pathname, to avoid "leaking" COW string. + const auto& pathname = _M_pathname; + + if (pathname.empty()) return; { // Approximate count of components, to reserve space in _M_cmpts vector: int count = 1; - bool saw_sep_last = _S_is_dir_sep(_M_pathname[0]); + bool saw_sep_last = _S_is_dir_sep(pathname[0]); bool saw_non_sep = !saw_sep_last; - for (value_type c : _M_pathname) + for (value_type c : pathname) { if (_S_is_dir_sep(c)) saw_sep_last = true; @@ -363,13 +366,13 @@ path::_M_split_cmpts() } size_t pos = 0; - const size_t len = _M_pathname.size(); + const size_t len = pathname.size(); // look for root name or root directory - if (_S_is_dir_sep(_M_pathname[0])) + if (_S_is_dir_sep(pathname[0])) { // look for root name, such as "//" or "//foo" - if (len > 1 && _M_pathname[1] == _M_pathname[0]) + if (len > 1 && pathname[1] == pathname[0]) { if (len == 2) { @@ -378,11 +381,11 @@ path::_M_split_cmpts() return; } - if (!_S_is_dir_sep(_M_pathname[2])) + if (!_S_is_dir_sep(pathname[2])) { // got root name, find its end pos = 3; - while (pos < len && !_S_is_dir_sep(_M_pathname[pos])) + while (pos < len && !_S_is_dir_sep(pathname[pos])) ++pos; if (pos == len) { @@ -409,7 +412,7 @@ path::_M_split_cmpts() ++pos; } #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS - else if (len > 1 && _M_pathname[1] == L':') + else if (len > 1 && pathname[1] == L':') { // got disk designator if (len == 2) @@ -418,7 +421,7 @@ path::_M_split_cmpts() return; } _M_add_root_name(2); - if (len > 2 && _S_is_dir_sep(_M_pathname[2])) + if (len > 2 && _S_is_dir_sep(pathname[2])) _M_add_root_dir(2); pos = 2; } @@ -426,9 +429,9 @@ path::_M_split_cmpts() else { size_t n = 1; - for (; n < _M_pathname.size() && !_S_is_dir_sep(_M_pathname[n]); ++n) + for (; n < pathname.size() && !_S_is_dir_sep(pathname[n]); ++n) { } - if (n == _M_pathname.size()) + if (n == pathname.size()) { _M_type = _Type::_Filename; return; @@ -438,7 +441,7 @@ path::_M_split_cmpts() size_t back = pos; while (pos < len) { - if (_S_is_dir_sep(_M_pathname[pos])) + if (_S_is_dir_sep(pathname[pos])) { if (back != pos) _M_add_filename(back, pos - back); @@ -450,7 +453,7 @@ path::_M_split_cmpts() if (back != pos) _M_add_filename(back, pos - back); - else if (_S_is_dir_sep(_M_pathname.back())) + else if (_S_is_dir_sep(pathname.back())) { // [path.itr]/8 // "Dot, if one or more trailing non-root slash characters are present." -- 2.31.1