From: Jonathan Wakely <jwakely@redhat.com>
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [committed 3/3] libstdc++: Refactor filesystem::path string conversions
Date: Tue, 2 Jun 2020 00:14:22 +0100 [thread overview]
Message-ID: <20200601231422.GX2678@redhat.com> (raw)
In-Reply-To: <20200523084402.GT2678@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 2347 bytes --]
On 23/05/20 09:44 +0100, Jonathan Wakely wrote:
>This simplifies the logic of converting Source arguments and pairs of
>InputIterator arguments into the native string format. For any input
>that is a contiguous range of path::value_type (or char8_t for POSIX)
>a string view can be created and the conversion can be done directly,
>with no intermediate allocation. Previously some cases created a
>basic_string unnecessarily, for example construction from a pair of
>path::string_type::iterators, or a pair of non-const value_type*
>pointers.
>
> * include/bits/fs_path.h (__detail::_S_range_begin)
> (__detail::_S_range_end, path::_S_string_from_iter): Replace with
> overloaded function template __detail::__effective_range.
> (__detail::__effective_range): New overloaded function template to
> create a basic_string or basic_string_view for an effective range.
> (__detail::__value_type_is_char): Use __detail::__effective_range.
> Do not use remove_const on value type.
> (__detail::__value_type_is_char_or_char8_t): Likewise.
> (path::path(const Source&, format))
> (path::path(const Source&, const locale&))
> (path::operator/=(const Source&), path::append(const Source&))
> (path::concat(const Source&)): Use __detail::__effective_range.
> (path::_S_to_string(InputIterator, InputIterator)): New function
> template to create a string view if possible, or string otherwise.
> (path::_S_convert): Add overloads that convert a string returned
> by __detail::__effective_range. Use if-constexpr to inline conversion
> logic from all overloads of _Cvt::_S_convert.
> (path::_S_convert_loc): Add overload that converts a string. Use
> _S_to_string to avoid allocation when possible.
> (path::_Cvt): Remove.
> (path::operator+=(CharT)): Remove indirection through path::concat.
> * include/experimental/bits/fs_path.h (path::_S_convert_loc): Add
> overload for non-const pointers, to avoid constructing a std::string.
> * src/c++17/fs_path.cc (path::_S_convert_loc): Replace conditional
> compilation with call to _S_convert.
This commit broke *-*-mingw* bootstrap. Fixed with the attached patch.
Tested powerpc64le-linux and x86_64-w64-mingw32, committed to master.
[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 8517 bytes --]
commit cd3f067b82a1331f5fb695879ba5c3d9bb2cca3a
Author: Jonathan Wakely <jwakely@redhat.com>
Date: Tue Jun 2 00:07:05 2020 +0100
libstdc++: Fix filesystem::u8path for mingw targets (PR 95392)
When I refactored filesystem::path string conversions in
r11-587-584d52b088f9fcf78704b504c3f1f07e17c1cded I failed to update the
mingw-specific code in filesystem::u8path, causing a bootstrap failure.
This fixes it, and further refactors the mingw-specific code along the
same lines as the previous commit. All conversions from UTF-8 strings to
wide strings now use the same helper function, __wstr_from_utf8.
PR libstdc++/95392
* include/bits/fs_path.h (path::_S_to_string): Move to
namespace-scope and rename to ...
(__detail::__string_from_range): ... this.
[WINDOWS] (__detail::__wstr_from_utf8): New function template to
convert a char sequence containing UTF-8 to wstring.
(path::_S_convert(Iter, Iter)): Adjust call to _S_to_string.
(path::_S_convert_loc(Iter, Iter, const locale&)): Likewise.
(u8path(InputIterator, InputIterator)) [WINDOWS]: Use
__string_from_range to obtain a contiguous range and
__wstr_from_utf8 to obtain a wide string.
(u8path(const Source&)) [WINDOWS]: Use __effective_range to
obtain a contiguous range and __wstr_from_utf8 to obtain a wide
string.
(path::_S_convert(const _EcharT*, const _EcharT)) [WINDOWS]:
Use __wstr_from_utf8.
diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 2d2766ec62e..26ddf0afec4 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -211,6 +211,51 @@ namespace __detail
#endif
, _Val>;
+ // Create a string or string view from an iterator range.
+ template<typename _InputIterator>
+ inline auto
+ __string_from_range(_InputIterator __first, _InputIterator __last)
+ {
+ using _EcharT
+ = typename std::iterator_traits<_InputIterator>::value_type;
+ static_assert(__is_encoded_char<_EcharT>);
+
+#if __cpp_lib_concepts
+ constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>;
+#else
+ constexpr bool __contiguous
+ = is_pointer_v<decltype(std::__niter_base(__first))>;
+#endif
+ if constexpr (__contiguous)
+ {
+ // For contiguous iterators we can just return a string view.
+ const auto* __f = std::__to_address(std::__niter_base(__first));
+ const auto* __l = std::__to_address(std::__niter_base(__last));
+ return basic_string_view<_EcharT>(__f, __l - __f);
+ }
+ else
+ // Conversion requires contiguous characters, so create a string.
+ return basic_string<_EcharT>(__first, __last);
+ }
+
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+ template<typename _Tp>
+ inline std::wstring
+ __wstr_from_utf8(const _Tp& __str)
+ {
+ static_assert(std::is_same_v<typename _Tp::value_type, char>);
+ std::wstring __wstr;
+ // XXX This assumes native wide encoding is UTF-16.
+ std::codecvt_utf8_utf16<wchar_t> __wcvt;
+ const auto __p = __str.data();
+ if (!__str_codecvt_in_all(__p, __p + __str.size(), __wstr, __wcvt))
+ _GLIBCXX_THROW_OR_ABORT(filesystem_error(
+ "Cannot convert character sequence",
+ std::make_error_code(errc::illegal_byte_sequence)));
+ return __wstr;
+ }
+#endif
+
} // namespace __detail
/// @endcond
@@ -542,33 +587,6 @@ namespace __detail
pair<const string_type*, size_t> _M_find_extension() const noexcept;
- // Create a string or string view from an iterator range.
- template<typename _InputIterator>
- static auto
- _S_to_string(_InputIterator __first, _InputIterator __last)
- {
- using _EcharT
- = typename std::iterator_traits<_InputIterator>::value_type;
- static_assert(__detail::__is_encoded_char<_EcharT>);
-
-#if __cpp_lib_concepts
- constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>;
-#else
- constexpr bool __contiguous
- = is_pointer_v<decltype(std::__niter_base(__first))>;
-#endif
- if constexpr (__contiguous)
- {
- // For contiguous iterators we can just return a string view.
- const auto* __f = std::__to_address(std::__niter_base(__first));
- const auto* __l = std::__to_address(std::__niter_base(__last));
- return basic_string_view<_EcharT>(__f, __l - __f);
- }
- else
- // Conversion requires contiguous characters, so create a string.
- return basic_string<_EcharT>(__first, __last);
- }
-
// path::_S_convert creates a basic_string<value_type> or
// basic_string_view<value_type> from a range (either the effective
// range of a Source parameter, or a pair of InputIterator parameters),
@@ -602,7 +620,7 @@ namespace __detail
template<typename _Iter>
static auto
_S_convert(_Iter __first, _Iter __last)
- { return _S_convert(_S_to_string(__first, __last)); }
+ { return _S_convert(__detail::__string_from_range(__first, __last)); }
static string_type
_S_convert_loc(const char* __first, const char* __last,
@@ -612,7 +630,7 @@ namespace __detail
static string_type
_S_convert_loc(_Iter __first, _Iter __last, const std::locale& __loc)
{
- const auto __s = _S_to_string(__first, __last);
+ const auto __s = __detail::__string_from_range(__first, __last);
return _S_convert_loc(__s.data(), __s.data() + __s.size(), __loc);
}
@@ -738,28 +756,10 @@ namespace __detail
{
#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
if constexpr (is_same_v<_CharT, char>)
- {
- // XXX This assumes native wide encoding is UTF-16.
- std::codecvt_utf8_utf16<path::value_type> __cvt;
- path::string_type __tmp;
- if constexpr (is_pointer_v<_InputIterator>)
- {
- if (__str_codecvt_in_all(__first, __last, __tmp, __cvt))
- return path{ __tmp };
- }
- else
- {
- const std::string __u8str{__first, __last};
- const char* const __p = __u8str.data();
- if (__str_codecvt_in_all(__p, __p + __u8str.size(), __tmp, __cvt))
- return path{ __tmp };
- }
- _GLIBCXX_THROW_OR_ABORT(filesystem_error(
- "Cannot convert character sequence",
- std::make_error_code(errc::illegal_byte_sequence)));
- }
+ return path{ __detail::__wstr_from_utf8(
+ __detail::__string_from_range(__first, __last)) };
else
- return path{ __first, __last };
+ return path{ __first, __last }; // constructor handles char8_t
#else
// This assumes native normal encoding is UTF-8.
return path{ __first, __last };
@@ -778,21 +778,12 @@ namespace __detail
{
#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
if constexpr (is_same_v<_CharT, char>)
- {
- if constexpr (is_convertible_v<const _Source&, std::string_view>)
- {
- const std::string_view __s = __source;
- return filesystem::u8path(__s.data(), __s.data() + __s.size());
- }
- else
- {
- std::string __s = path::_S_string_from_iter(__source);
- return filesystem::u8path(__s.data(), __s.data() + __s.size());
- }
- }
+ return path{ __detail::__wstr_from_utf8(
+ __detail::__effective_range(__source)) };
else
- return path{ __source };
+ return path{ __source }; // constructor handles char8_t
#else
+ // This assumes native normal encoding is UTF-8.
return path{ __source };
#endif
}
@@ -836,11 +827,8 @@ namespace __detail
#ifdef _GLIBCXX_USE_CHAR8_T
else if constexpr (is_same_v<_EcharT, char8_t>)
{
- const char* __f2 = (const char*)__f;
- const char* __l2 = (const char*)__l;
- std::codecvt_utf8_utf16<wchar_t> __wcvt;
- if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
- return __wstr;
+ const auto __f2 = reinterpret_cast<const char*>(__f);
+ return __detail::__wstr_from_utf8(string_view(__f2, __l - __f));
}
#endif
else // char16_t or char32_t
@@ -849,13 +837,7 @@ namespace __detail
{ } __cvt;
std::string __str;
if (__str_codecvt_out_all(__f, __l, __str, __cvt))
- {
- const char* __f2 = __str.data();
- const char* __l2 = __f2 + __str.size();
- std::codecvt_utf8_utf16<wchar_t> __wcvt;
- if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
- return __wstr;
- }
+ return __detail::__wstr_from_utf8(__str);
}
#else // ! windows
struct _UCvt : std::codecvt<_EcharT, char, std::mbstate_t>
prev parent reply other threads:[~2020-06-01 23:14 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-23 8:40 [committed 0/3] " Jonathan Wakely
2020-05-23 8:42 ` [committed 1/3] libstdc++: Simplify filesystem::path SFINAE constraints Jonathan Wakely
2020-05-23 8:43 ` [committed 2/3] libstdc++: Remove incorrect static specifiers Jonathan Wakely
2020-05-23 8:44 ` [committed 3/3] libstdc++: Refactor filesystem::path string conversions Jonathan Wakely
2020-06-01 23:14 ` Jonathan Wakely [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200601231422.GX2678@redhat.com \
--to=jwakely@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=libstdc++@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).