From: Jonathan Wakely <jwakely@redhat.com>
To: Tom Honermann <tom@honermann.net>
Cc: "libstdc++@gcc.gnu.org" <libstdc++@gcc.gnu.org>,
gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 0/4]: C++ P1423R3 char8_t remediation implementation
Date: Sat, 30 Nov 2019 01:04:00 -0000 [thread overview]
Message-ID: <20191130010341.GB11522@redhat.com> (raw)
In-Reply-To: <20191129214534.GA11522@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 2885 bytes --]
On 29/11/19 21:45 +0000, Jonathan Wakely wrote:
>On 29/11/19 19:48 +0000, Jonathan Wakely wrote:
>>On 29/11/19 17:45 +0000, Jonathan Wakely wrote:
>>>On 15/09/19 15:39 -0400, Tom Honermann wrote:
>>>>This series of patches provides an implementation of the changes
>>>>for C++ proposal P1423R3 [1].
>>>>
>>>>These changes do not impact default libstdc++ behavior for C++17
>>>>and earlier; they are only active for C++2a or when the
>>>>-fchar8_t option is specified.
>>>>
>>>>Tested x86_64-linux.
>>>>
>>>>Patch 1: Decouple constraints for u8path from path constructors.
>>>>Patch 2: Update __cpp_lib_char8_t feature test macro value, add
>>>>deleted operators, update u8path.
>>>>Patch 3: Updates to existing tests.
>>>>Patch 4: New tests.
>>>>
>>>>Tom.
>>>>
>>>>[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html
>>>
>>>It took a while, but I've committed these four patches, with just some
>>>minor whitespace changes and changelog tweaks.
>>
>>Running the new tests revealed a latent bug on Windows, where
>>experimental::filesystem::u8path(const Source&) assumed the input
>>was an iterator over a NTCTS. That worked for a const char* but not a
>>std::string or experimental::string_view.
>>
>>The attached patch fixes that (and simplifies the #if and if-constexpr
>>conditions for Windows) but there's a remaining bug. Constructing a
>>experimental::filesystem::path from a char8_t string doesn't do the
>>right thing on Windows, so these cases fails:
>>
>>fs::path p(u8"\xf0\x9d\x84\x9e");
>>VERIFY( p.u8string() == u8"\U0001D11E" );
>>
>>p = fs::u8path(u8"\xf0\x9d\x84\x9e");
>>VERIFY( p.u8string() == u8"\U0001D11E" );
>>
>>It works correctly for std::filesystem::path, just not the TS version.
>
>I think this is the fix needed for the TS code:
>
>--- a/libstdc++-v3/include/experimental/bits/fs_path.h
>+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
>@@ -765,7 +765,14 @@ namespace __detail
> {
>#ifdef _GLIBCXX_USE_CHAR8_T
> if constexpr (is_same<_CharT, char8_t>::value)
>- return _S_wconvert((const char*)__f, (const char*)__l, true_type());
>+ {
>+ const char* __f2 = (const char*)__f;
>+ const char* __l2 = (const char*)__l;
>+ std::wstring __wstr;
>+ std::codecvt_utf8_utf16<wchar_t> __wcvt;
>+ if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
>+ return __wstr;
>+ }
> else
>#endif
> {
>
>The current code uses std::codecvt<wchar_t, char, mbstate_t> but when
>we know the input is UTF-8 encoded we should use codecvt_utf8_utf16
>(which is what the C++17 code already does for char8_t input).
>
>I'll add that the patch I'm testing.
Here's the final patch I'm committing. Tested powerpc64le-linux and
x86_64-w64-mingw, committed to trunk.
[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 7586 bytes --]
commit cb232cf3d475d51fe5e550680bff64dfd32f57a5
Author: Jonathan Wakely <jwakely@redhat.com>
Date: Fri Nov 29 19:36:39 2019 +0000
libstdc++: Fix experimental::filesystem::u8path(const Source&) for Windows
This function failed to compile when called with a std::string.
Also, constructing a path with a char8_t string did not correctly treat
the string as already UTF-8 encoded.
* include/bits/fs_path.h (u8path(InputIterator, InputIterator))
(u8path(const Source&)) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify
conditions.
* include/experimental/bits/fs_path.h [_GLIBCXX_FILESYSTEM_IS_WINDOWS]
(__u8path(const Source&, char)): Add overloads for std::string and
types convertible to std::string.
(_Cvt::_S_wconvert): Add a new overload for char8_t strings and use
codecvt_utf8_utf16 to do the correct conversion.
diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index b129372447b..20ec42da57d 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -691,14 +691,8 @@ namespace __detail
u8path(_InputIterator __first, _InputIterator __last)
{
#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-#ifdef _GLIBCXX_USE_CHAR8_T
- if constexpr (is_same_v<_CharT, char8_t>)
+ if constexpr (is_same_v<_CharT, char>)
{
- return path{ __first, __last };
- }
- else
- {
-#endif
// XXX This assumes native wide encoding is UTF-16.
std::codecvt_utf8_utf16<path::value_type> __cvt;
path::string_type __tmp;
@@ -710,16 +704,16 @@ namespace __detail
else
{
const std::string __u8str{__first, __last};
- const char* const __ptr = __u8str.data();
- if (__str_codecvt_in_all(__ptr, __ptr + __u8str.size(), __tmp, __cvt))
+ const char* const __p = __u8str.data();
+ if (__str_codecvt_in_all(__p, __p + __u8str.size(), __tmp, __cvt))
return path{ __tmp };
}
_GLIBCXX_THROW_OR_ABORT(filesystem_error(
"Cannot convert character sequence",
std::make_error_code(errc::illegal_byte_sequence)));
-#ifdef _GLIBCXX_USE_CHAR8_T
}
-#endif
+ else
+ return path{ __first, __last };
#else
// This assumes native normal encoding is UTF-8.
return path{ __first, __last };
@@ -737,14 +731,8 @@ namespace __detail
u8path(const _Source& __source)
{
#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-#ifdef _GLIBCXX_USE_CHAR8_T
- if constexpr (is_same_v<_CharT, char8_t>)
+ if constexpr (is_same_v<_CharT, char>)
{
- return path{ __source };
- }
- else
- {
-#endif
if constexpr (is_convertible_v<const _Source&, std::string_view>)
{
const std::string_view __s = __source;
@@ -755,9 +743,9 @@ namespace __detail
std::string __s = path::_S_string_from_iter(__source);
return filesystem::u8path(__s.data(), __s.data() + __s.size());
}
-#ifdef _GLIBCXX_USE_CHAR8_T
}
-#endif
+ else
+ return path{ __source };
#else
return path{ __source };
#endif
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index 91202e5b008..5ce012eec81 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -644,8 +644,22 @@ namespace __detail
/// Create a path from a UTF-8-encoded sequence of char
#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+ inline path
+ __u8path(const string& __s, char)
+ {
+ return filesystem::u8path(__s.data(), __s.data() + __s.size());
+ }
+
template<typename _Source>
- inline path
+ inline __enable_if_t<is_convertible<const _Source&, string>::value, path>
+ __u8path(const _Source& __source, char)
+ {
+ std::string __s = __source;
+ return filesystem::u8path(__s.data(), __s.data() + __s.size());
+ }
+
+ template<typename _Source>
+ inline __enable_if_t<!is_convertible<const _Source&, string>::value, path>
__u8path(const _Source& __source, char)
{
std::string __s = path::_S_string_from_iter(__source);
@@ -733,8 +747,21 @@ namespace __detail
struct path::_Cvt
{
#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+#ifdef _GLIBCXX_USE_CHAR8_T
static string_type
- _S_wconvert(const char* __f, const char* __l, true_type)
+ _S_wconvert(const char8_t* __f, const char8_t* __l, const char8_t*)
+ {
+ const char* __f2 = (const char*)__f;
+ const char* __l2 = (const char*)__l;
+ std::wstring __wstr;
+ std::codecvt_utf8_utf16<wchar_t> __wcvt;
+ if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
+ return __wstr;
+ }
+#endif
+
+ static string_type
+ _S_wconvert(const char* __f, const char* __l, const char*)
{
using _Cvt = std::codecvt<wchar_t, char, mbstate_t>;
const auto& __cvt = std::use_facet<_Cvt>(std::locale{});
@@ -747,36 +774,29 @@ namespace __detail
}
static string_type
- _S_wconvert(const _CharT* __f, const _CharT* __l, false_type)
+ _S_wconvert(const _CharT* __f, const _CharT* __l, const void*)
{
-#ifdef _GLIBCXX_USE_CHAR8_T
- if constexpr (is_same<_CharT, char8_t>::value)
- return _S_wconvert((const char*)__f, (const char*)__l, true_type());
- else
-#endif
+ struct _UCvt : std::codecvt<_CharT, char, std::mbstate_t>
+ { } __cvt;
+ std::string __str;
+ if (__str_codecvt_out_all(__f, __l, __str, __cvt))
{
- struct _UCvt : std::codecvt<_CharT, char, std::mbstate_t>
- { } __cvt;
- std::string __str;
- if (__str_codecvt_out_all(__f, __l, __str, __cvt))
- {
- const char* __f2 = __str.data();
- const char* __l2 = __f2 + __str.size();
- std::codecvt_utf8_utf16<wchar_t> __wcvt;
- std::wstring __wstr;
- if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
- return __wstr;
- }
- _GLIBCXX_THROW_OR_ABORT(filesystem_error(
- "Cannot convert character sequence",
- std::make_error_code(errc::illegal_byte_sequence)));
+ const char* __f2 = __str.data();
+ const char* __l2 = __f2 + __str.size();
+ std::codecvt_utf8_utf16<wchar_t> __wcvt;
+ std::wstring __wstr;
+ if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
+ return __wstr;
}
+ _GLIBCXX_THROW_OR_ABORT(filesystem_error(
+ "Cannot convert character sequence",
+ std::make_error_code(errc::illegal_byte_sequence)));
}
static string_type
_S_convert(const _CharT* __f, const _CharT* __l)
{
- return _S_wconvert(__f, __l, is_same<_CharT, char>{});
+ return _S_wconvert(__f, __l, (const _CharT*)nullptr);
}
#else
static string_type
@@ -786,19 +806,17 @@ namespace __detail
if constexpr (is_same<_CharT, char8_t>::value)
return string_type(__f, __l);
else
- {
#endif
+ {
struct _UCvt : std::codecvt<_CharT, char, std::mbstate_t>
{ } __cvt;
std::string __str;
if (__str_codecvt_out_all(__f, __l, __str, __cvt))
return __str;
-#ifdef _GLIBCXX_USE_CHAR8_T
+ _GLIBCXX_THROW_OR_ABORT(filesystem_error(
+ "Cannot convert character sequence",
+ std::make_error_code(errc::illegal_byte_sequence)));
}
-#endif
- _GLIBCXX_THROW_OR_ABORT(filesystem_error(
- "Cannot convert character sequence",
- std::make_error_code(errc::illegal_byte_sequence)));
}
#endif
prev parent reply other threads:[~2019-11-30 1:03 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-15 19:39 Tom Honermann
2019-11-29 17:47 ` Jonathan Wakely
2019-11-29 19:53 ` Jonathan Wakely
2019-11-29 22:11 ` Jonathan Wakely
2019-11-30 1:04 ` Jonathan Wakely [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191130010341.GB11522@redhat.com \
--to=jwakely@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=libstdc++@gcc.gnu.org \
--cc=tom@honermann.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).