From: Jonathan Wakely <jwakely@redhat.com>
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [committed] libstdc++: Refactor filesystem::path encoding conversions
Date: Wed, 13 Oct 2021 20:41:38 +0100 [thread overview]
Message-ID: <YWc2cp/MJDN0/LDa@redhat.com> (raw)
[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]
Adjust the __detail::__effective_range overloads so they always return a
string or string view using std::char_traits, because we don't care
about the traits of an incoming string.
Use std::contiguous_iterator in the __effective_range(const Source&)
overload, to allow returning a basic_string_view in more cases. For the
non-contiguous cases in both __effective_range and __string_from_range,
return a std::string instead of std::u8string when the value type of the
range is char8_t. These changes avoid unnecessary basic_string
temporaries.
Also simplify __string_from_range(Iter, Iter) to not need
std::__to_address for the contiguous case.
Combine the _S_convert(string_type) and _S_convert(const T&) overloads
into a single _S_convert(T) function which also avoids the dangling
view problem of PR 102592 (should that recur somehow).
libstdc++-v3/ChangeLog:
* include/bits/fs_path.h (__detail::__is_contiguous): New
variable template to identify contiguous iterators.
(__detail::__unified_char8_t): New alias template to decide when
to treat char8_t as char without encoding conversion.
(__detail::__effective_range(const basic_string<C,T>&)): Use
std::char_traits<C> for returned string view.
(__detail::__effective_range(const basic_string_view<C,T>&)):
Likewise.
(__detail::__effective_range(const Source&)): Use
__is_contiguous to detect mode cases of contiguous iterators.
Use __unified_char8_t to return a std::string instead of
std::u8string.
Tested powerpc64le-linux. Committed to trunk.
[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 10123 bytes --]
commit b83b810ac440f72e7551b6496539e60ac30c0d8a
Author: Jonathan Wakely <jwakely@redhat.com>
Date: Wed Oct 13 17:19:57 2021
libstdc++: Refactor filesystem::path encoding conversions
Adjust the __detail::__effective_range overloads so they always return a
string or string view using std::char_traits, because we don't care
about the traits of an incoming string.
Use std::contiguous_iterator in the __effective_range(const Source&)
overload, to allow returning a basic_string_view in more cases. For the
non-contiguous casecl in both __effective_range and __string_from_range,
return a std::string instead of std::u8string when the value type of the
range is char8_t. These changes avoid unnecessary basic_string
temporaries.
Also simplify __string_from_range(Iter, Iter) to not need
std::__to_address for the contiguous case.
Combine the _S_convert(string_type) and _S_convert(const T&) overloads
into a single _S_convert(T) function which also avoids the dangling
view problem of PR 102592 (should that recur somehow).
libstdc++-v3/ChangeLog:
* include/bits/fs_path.h (__detail::__is_contiguous): New
variable template to identify contiguous iterators.
(__detail::__unified_char8_t): New alias template to decide when
to treat char8_t as char without encoding conversion.
(__detail::__effective_range(const basic_string<C,T>&)): Use
std::char_traits<C> for returned string view.
(__detail::__effective_range(const basic_string_view<C,T>&)):
Likewise.
(__detail::__effective_range(const Source&)): Use
__is_contiguous to detect mode cases of contiguous iterators.
Use __unified_char8_t to return a std::string instead of
std::u8string.
diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 7ead8ac299c..05db792fbae 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -153,33 +153,56 @@ namespace __detail
template<typename _Iter, typename _Tr = __safe_iterator_traits<_Iter>>
using _Path2 = enable_if_t<__is_path_iter_src<_Tr>::value, path>;
+#if __cpp_lib_concepts
+ template<typename _Iter>
+ constexpr bool __is_contiguous = std::contiguous_iterator<_Iter>;
+#else
+ template<typename _Iter>
+ constexpr bool __is_contiguous = is_pointer_v<_Iter>;
+#endif
+
+#if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T
+ // For POSIX treat char8_t sequences as char without encoding conversions.
+ template<typename _EcharT>
+ using __unified_u8_t
+ = __conditional_t<is_same_v<_EcharT, char8_t>, char, _EcharT>;
+#else
+ template<typename _EcharT>
+ using __unified_u8_t = _EcharT;
+#endif
+
// The __effective_range overloads convert a Source parameter into
- // either a basic_string_view or basic_string containing the
+ // either a basic_string_view<C> or basic_string<C> containing the
// effective range of the Source, as defined in [fs.path.req].
template<typename _CharT, typename _Traits, typename _Alloc>
- inline basic_string_view<_CharT, _Traits>
+ inline basic_string_view<_CharT>
__effective_range(const basic_string<_CharT, _Traits, _Alloc>& __source)
+ noexcept
{ return __source; }
template<typename _CharT, typename _Traits>
- inline const basic_string_view<_CharT, _Traits>&
+ inline basic_string_view<_CharT>
__effective_range(const basic_string_view<_CharT, _Traits>& __source)
+ noexcept
{ return __source; }
+ // Return the effective range of an NTCTS.
template<typename _Source>
- inline auto
+ auto
__effective_range(const _Source& __source)
{
- if constexpr (is_pointer_v<decay_t<_Source>>)
- return basic_string_view{&*__source};
+ // Remove a level of normal/safe iterator indirection, or decay an array.
+ using _Iter = decltype(std::__niter_base(__source));
+ using value_type = typename iterator_traits<_Iter>::value_type;
+
+ if constexpr (__is_contiguous<_Iter>)
+ return basic_string_view<value_type>{&*__source};
else
{
// _Source is an input iterator that iterates over an NTCTS.
// Create a basic_string by reading until the null character.
- using value_type
- = typename iterator_traits<_Source>::value_type;
- basic_string<value_type> __str;
+ basic_string<__unified_u8_t<value_type>> __str;
_Source __it = __source;
for (value_type __ch = *__it; __ch != value_type(); __ch = *++__it)
__str.push_back(__ch);
@@ -188,20 +211,39 @@ namespace __detail
}
// The value type of a Source parameter's effective range.
- template<typename _Tp>
- using __value_t = typename remove_reference_t<
- decltype(__detail::__effective_range(std::declval<_Tp>()))>::value_type;
+ template<typename _Source>
+ struct __source_value_type_impl
+ {
+ using type
+ = typename __safe_iterator_traits<decay_t<_Source>>::value_type;
+ };
+
+ template<typename _CharT, typename _Traits, typename _Alloc>
+ struct __source_value_type_impl<basic_string<_CharT, _Traits, _Alloc>>
+ {
+ using type = _CharT;
+ };
+
+ template<typename _CharT, typename _Traits>
+ struct __source_value_type_impl<basic_string_view<_CharT, _Traits>>
+ {
+ using type = _CharT;
+ };
+
+ // The value type of a Source parameter's effective range.
+ template<typename _Source>
+ using __source_value_t = typename __source_value_type_impl<_Source>::type;
// SFINAE helper to check that an effective range has value_type char,
// as required by path constructors taking a std::locale parameter.
// The type _Tp must have already been checked by _Path<Tp> or _Path2<_Tp>.
- template<typename _Tp, typename _Val = __value_t<_Tp>>
+ template<typename _Tp, typename _Val = __source_value_t<_Tp>>
using __value_type_is_char
= std::enable_if_t<std::is_same_v<_Val, char>, _Val>;
// As above, but also allows char8_t, as required by u8path
// C++20 [depr.fs.path.factory]
- template<typename _Tp, typename _Val = __value_t<_Tp>>
+ template<typename _Tp, typename _Val = __source_value_t<_Tp>>
using __value_type_is_char_or_char8_t
= std::enable_if_t<std::is_same_v<_Val, char>
#ifdef _GLIBCXX_USE_CHAR8_T
@@ -209,31 +251,27 @@ namespace __detail
#endif
, _Val>;
- // Create a string or string view from an iterator range.
+ // Create a basic_string<C> or basic_string_view<C> from an iterator range.
template<typename _InputIterator>
inline auto
__string_from_range(_InputIterator __first, _InputIterator __last)
{
using _EcharT
= typename std::iterator_traits<_InputIterator>::value_type;
- static_assert(__is_encoded_char<_EcharT>);
+ static_assert(__is_encoded_char<_EcharT>); // C++17 [fs.req]/3
-#if __cpp_lib_concepts
- constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>;
-#else
- constexpr bool __contiguous
- = is_pointer_v<decltype(std::__niter_base(__first))>;
-#endif
- if constexpr (__contiguous)
+ if constexpr (__is_contiguous<_InputIterator>)
{
// For contiguous iterators we can just return a string view.
- const auto* __f = std::__to_address(std::__niter_base(__first));
- const auto* __l = std::__to_address(std::__niter_base(__last));
- return basic_string_view<_EcharT>(__f, __l - __f);
+ if (auto __len = __last - __first) [[__likely__]]
+ return basic_string_view<_EcharT>(&*__first, __len);
+ return basic_string_view<_EcharT>();
}
else
- // Conversion requires contiguous characters, so create a string.
- return basic_string<_EcharT>(__first, __last);
+ {
+ // Conversion requires contiguous characters, so create a string.
+ return basic_string<__unified_u8_t<_EcharT>>(__first, __last);
+ }
}
/// @} group filesystem
@@ -573,27 +611,22 @@ namespace __detail
pair<const string_type*, size_t> _M_find_extension() const noexcept;
// path::_S_convert creates a basic_string<value_type> or
- // basic_string_view<value_type> from a range (either the effective
- // range of a Source parameter, or a pair of InputIterator parameters),
+ // basic_string_view<value_type> from a basic_string<C> or
+ // basic_string_view<C>, for an encoded character type C,
// performing the conversions required by [fs.path.type.cvt].
- // If the value_type of the range value type is path::value_type,
- // no encoding conversion is performed. If the range is contiguous
- // a string_view
-
- static string_type
- _S_convert(string_type __str)
- { return __str; }
-
template<typename _Tp>
static auto
- _S_convert(const _Tp& __str)
+ _S_convert(_Tp __str)
{
- if constexpr (is_same_v<_Tp, string_type>)
- return __str;
- else if constexpr (is_same_v<_Tp, basic_string_view<value_type>>)
- return __str;
- else if constexpr (is_same_v<typename _Tp::value_type, value_type>)
- return basic_string_view<value_type>(__str.data(), __str.size());
+ if constexpr (is_same_v<_Tp, typename _Tp::value_type>)
+ return __str; // No conversion needed.
+#if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T
+ else if constexpr (is_same_v<_Tp, std::u8string>)
+ // Calling _S_convert<char8_t> will return a u8string_view that
+ // refers to __str and would dangle after this function returns.
+ // Return a string_type instead, to avoid dangling.
+ return string_type(_S_convert(std::u8string_view(__str)));
+#endif
else
return _S_convert(__str.data(), __str.data() + __str.size());
}
@@ -602,6 +635,9 @@ namespace __detail
static auto
_S_convert(const _EcharT* __first, const _EcharT* __last);
+ // _S_convert_loc converts a range of char to string_type, using the
+ // supplied locale for encoding conversions.
+
static string_type
_S_convert_loc(const char* __first, const char* __last,
const std::locale& __loc);
next reply other threads:[~2021-10-13 19:41 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-13 19:41 Jonathan Wakely [this message]
2021-10-13 20:19 ` Jonathan Wakely
2021-10-13 22:57 ` [committed] libstdc++: Fix regression in memory use when constructing paths Jonathan Wakely
2021-10-14 14:18 ` [committed] libstdc++: Fix brainwrong in path::_S_convert(T) [PR102743] Jonathan Wakely
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YWc2cp/MJDN0/LDa@redhat.com \
--to=jwakely@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=libstdc++@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).