public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jonathan Wakely <jwakely@redhat.com>
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [committed] libstdc++: Fix regression in memory use when constructing paths
Date: Wed, 13 Oct 2021 23:57:09 +0100	[thread overview]
Message-ID: <YWdkRVBp41FbiG0K@redhat.com> (raw)
In-Reply-To: <YWc/XUt+NJUGimJX@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3621 bytes --]

On 13/10/21 21:19 +0100, Jonathan Wakely wrote:
>On 13/10/21 20:41 +0100, Jonathan Wakely wrote:
>>Adjust the __detail::__effective_range overloads so they always return a
>>string or string view using std::char_traits, because we don't care
>>about the traits of an incoming string.
>>
>>Use std::contiguous_iterator in the __effective_range(const Source&)
>>overload, to allow returning a basic_string_view in more cases. For the
>>non-contiguous cases in both __effective_range and __string_from_range,
>>return a std::string instead of std::u8string when the value type of the
>>range is char8_t.  These changes avoid unnecessary basic_string
>>temporaries.
>
>[...]
>
>>  template<typename _InputIterator>
>>    inline auto
>>    __string_from_range(_InputIterator __first, _InputIterator __last)
>>    {
>>      using _EcharT
>>	= typename std::iterator_traits<_InputIterator>::value_type;
>>-      static_assert(__is_encoded_char<_EcharT>);
>>+      static_assert(__is_encoded_char<_EcharT>); // C++17 [fs.req]/3
>>
>>-#if __cpp_lib_concepts
>>-      constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>;
>>-#else
>>-      constexpr bool __contiguous
>>-	= is_pointer_v<decltype(std::__niter_base(__first))>;
>>-#endif
>>-      if constexpr (__contiguous)
>>+      if constexpr (__is_contiguous<_InputIterator>)
>
>Oops, this pessimiszes construction from string::iterator and
>vector::iterator in C++17 mode, because the new __is_contiguous
>variable template just uses is_pointer_v, without the __niter_base
>call that unwraps a __normal_iterator.
>
>That means that we now create a basic_string<C> temporary where we
>previously jsut returned a basic_string_view<C>.
>
>I am testing a fix.

Here's the fix, committed to trunk.

With this change there are no temporaries for string and vector
iterators passed to path constructors. For debug mode there are some
unnecessary temporaries for vector::iterator arguments, because the
safe iterator isn't recognized as contiguous in C++17 mode (but it's
fine in C++20 mode).

The bytes allocated to construct a path consisting of a single
filename with 38 characters are:

Constructor arguments               GCC 11 | GCC 12 | 11 debug | 12 debug
-------------------------------------------|--------|----------|---------
const char*                           39   |   39   |    39    |   39
const char(&)[N]                      39   |   39   |    39    |   39
const char8_t*                        39   |   39   |    39    |   39
const char8_t(&)[N]                   39   |   39   |    39    |   39
std::string::iterator pair            39   |   39   |    39    |   39
std::string::iterator NTCTS           92   |   39   |    92    |   39
std::u8string::iterator pair          39   |   39   |    39    |   39
std::u8string::iterator NTCTS        131   |   39   |   131    |   39
std::vector<char8_t>::iterator pair   39   |   39   |    78    |   78
std::vector<char8_t>::iterator NTCTS 131   |   39   |   131    |   39
std::list<char8_t>::iterator pair     78   |   78   |    78    |   78
std::list<char8_t>::iterator NTCTS   131   |  131   |   131    |  131

So for GCC 12 there are no unwanted allocations unless the iterators
are not contiguous (e.g. std::list::iterator). In that case we need to
construct a temporary string. For the pair(Iter, Iter) constructor we
can use distance(first, last) to size that temporary string correctly,
but for the path(const Source&) constructor we read one character at a
time from the input and call push_back.

In any case, the regression is fixed and we're at least as good as GCC
11 in all cases now.



[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 2297 bytes --]

commit f874a13ca3870a56036a90758b0d41c8c217f4f7
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Oct 13 21:32:14 2021

    libstdc++: Fix regression in memory use when constructing paths
    
    The changes in r12-4381 were intended to reduce memory usage, but
    replacing the __contiguous constant in __string_from_range with the new
    __is_contiguous variable template caused a regression. The old code
    checked is_pointer_v<decltype(std::__niter_base(__first))> but he new
    code just checks is_pointer_v<_InputIterator>. This means that we no
    longer recognise basic_string::iterator and vector::iterator as
    contiguous, and so return a temporary basic_string instead of a
    basic_string_view. This only affects C++17 mode, because the
    std::contiguous_iterator concept is used in C++20 which gives the right
    answer for __normal_iterator (and more types as well).
    
    The fix is to specialize the new __is_contiguous variable template so it
    is true for __normal_iterator<T*, C> specializations. The new partial
    specializations are defined for C++20 too, because it should be cheaper
    to match the partial specialization than to check whether the
    std::contiguous_iterator concept is satisfied.
    
    libstdc++-v3/ChangeLog:
    
            * include/bits/fs_path.h (__detail::__is_contiguous): Add
            partial specializations for pointers and __normal_iterator.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 05db792fbae..c51bfa3095a 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -158,9 +158,16 @@ namespace __detail
     constexpr bool __is_contiguous = std::contiguous_iterator<_Iter>;
 #else
   template<typename _Iter>
-    constexpr bool __is_contiguous = is_pointer_v<_Iter>;
+    constexpr bool __is_contiguous = false;
 #endif
 
+  template<typename _Tp>
+    constexpr bool __is_contiguous<_Tp*> = true;
+
+  template<typename _Tp, typename _Seq>
+    constexpr bool
+    __is_contiguous<__gnu_cxx::__normal_iterator<_Tp*, _Seq>> = true;
+
 #if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T
   // For POSIX treat char8_t sequences as char without encoding conversions.
   template<typename _EcharT>

  reply	other threads:[~2021-10-13 22:57 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-13 19:41 [committed] libstdc++: Refactor filesystem::path encoding conversions Jonathan Wakely
2021-10-13 20:19 ` Jonathan Wakely
2021-10-13 22:57   ` Jonathan Wakely [this message]
2021-10-14 14:18 ` [committed] libstdc++: Fix brainwrong in path::_S_convert(T) [PR102743] Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YWdkRVBp41FbiG0K@redhat.com \
    --to=jwakely@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).