From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) by sourceware.org (Postfix) with ESMTP id 89FE03840C06 for ; Mon, 1 Jun 2020 23:14:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 89FE03840C06 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-280-6ajEgB5sO0Wy-eY6YZZToA-1; Mon, 01 Jun 2020 19:14:25 -0400 X-MC-Unique: 6ajEgB5sO0Wy-eY6YZZToA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4B1F9835B40; Mon, 1 Jun 2020 23:14:24 +0000 (UTC) Received: from localhost (unknown [10.33.36.10]) by smtp.corp.redhat.com (Postfix) with ESMTP id BB31C10013D2; Mon, 1 Jun 2020 23:14:23 +0000 (UTC) Date: Tue, 2 Jun 2020 00:14:22 +0100 From: Jonathan Wakely To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: Re: [committed 3/3] libstdc++: Refactor filesystem::path string conversions Message-ID: <20200601231422.GX2678@redhat.com> References: <20200523084043.GQ2678@redhat.com> <20200523084402.GT2678@redhat.com> MIME-Version: 1.0 In-Reply-To: <20200523084402.GT2678@redhat.com> X-Clacks-Overhead: GNU Terry Pratchett X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/mixed; boundary="00hq2S6J2Jlg6EbK" Content-Disposition: inline X-Spam-Status: No, score=-17.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Jun 2020 23:14:29 -0000 --00hq2S6J2Jlg6EbK Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline On 23/05/20 09:44 +0100, Jonathan Wakely wrote: >This simplifies the logic of converting Source arguments and pairs of >InputIterator arguments into the native string format. For any input >that is a contiguous range of path::value_type (or char8_t for POSIX) >a string view can be created and the conversion can be done directly, >with no intermediate allocation. Previously some cases created a >basic_string unnecessarily, for example construction from a pair of >path::string_type::iterators, or a pair of non-const value_type* >pointers. > > * include/bits/fs_path.h (__detail::_S_range_begin) > (__detail::_S_range_end, path::_S_string_from_iter): Replace with > overloaded function template __detail::__effective_range. > (__detail::__effective_range): New overloaded function template to > create a basic_string or basic_string_view for an effective range. > (__detail::__value_type_is_char): Use __detail::__effective_range. > Do not use remove_const on value type. > (__detail::__value_type_is_char_or_char8_t): Likewise. > (path::path(const Source&, format)) > (path::path(const Source&, const locale&)) > (path::operator/=(const Source&), path::append(const Source&)) > (path::concat(const Source&)): Use __detail::__effective_range. > (path::_S_to_string(InputIterator, InputIterator)): New function > template to create a string view if possible, or string otherwise. > (path::_S_convert): Add overloads that convert a string returned > by __detail::__effective_range. Use if-constexpr to inline conversion > logic from all overloads of _Cvt::_S_convert. > (path::_S_convert_loc): Add overload that converts a string. Use > _S_to_string to avoid allocation when possible. > (path::_Cvt): Remove. > (path::operator+=(CharT)): Remove indirection through path::concat. > * include/experimental/bits/fs_path.h (path::_S_convert_loc): Add > overload for non-const pointers, to avoid constructing a std::string. > * src/c++17/fs_path.cc (path::_S_convert_loc): Replace conditional > compilation with call to _S_convert. This commit broke *-*-mingw* bootstrap. Fixed with the attached patch. Tested powerpc64le-linux and x86_64-w64-mingw32, committed to master. --00hq2S6J2Jlg6EbK Content-Type: text/x-patch; charset=us-ascii Content-Disposition: attachment; filename="patch.txt" commit cd3f067b82a1331f5fb695879ba5c3d9bb2cca3a Author: Jonathan Wakely Date: Tue Jun 2 00:07:05 2020 +0100 libstdc++: Fix filesystem::u8path for mingw targets (PR 95392) When I refactored filesystem::path string conversions in r11-587-584d52b088f9fcf78704b504c3f1f07e17c1cded I failed to update the mingw-specific code in filesystem::u8path, causing a bootstrap failure. This fixes it, and further refactors the mingw-specific code along the same lines as the previous commit. All conversions from UTF-8 strings to wide strings now use the same helper function, __wstr_from_utf8. PR libstdc++/95392 * include/bits/fs_path.h (path::_S_to_string): Move to namespace-scope and rename to ... (__detail::__string_from_range): ... this. [WINDOWS] (__detail::__wstr_from_utf8): New function template to convert a char sequence containing UTF-8 to wstring. (path::_S_convert(Iter, Iter)): Adjust call to _S_to_string. (path::_S_convert_loc(Iter, Iter, const locale&)): Likewise. (u8path(InputIterator, InputIterator)) [WINDOWS]: Use __string_from_range to obtain a contiguous range and __wstr_from_utf8 to obtain a wide string. (u8path(const Source&)) [WINDOWS]: Use __effective_range to obtain a contiguous range and __wstr_from_utf8 to obtain a wide string. (path::_S_convert(const _EcharT*, const _EcharT)) [WINDOWS]: Use __wstr_from_utf8. diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h index 2d2766ec62e..26ddf0afec4 100644 --- a/libstdc++-v3/include/bits/fs_path.h +++ b/libstdc++-v3/include/bits/fs_path.h @@ -211,6 +211,51 @@ namespace __detail #endif , _Val>; + // Create a string or string view from an iterator range. + template + inline auto + __string_from_range(_InputIterator __first, _InputIterator __last) + { + using _EcharT + = typename std::iterator_traits<_InputIterator>::value_type; + static_assert(__is_encoded_char<_EcharT>); + +#if __cpp_lib_concepts + constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>; +#else + constexpr bool __contiguous + = is_pointer_v; +#endif + if constexpr (__contiguous) + { + // For contiguous iterators we can just return a string view. + const auto* __f = std::__to_address(std::__niter_base(__first)); + const auto* __l = std::__to_address(std::__niter_base(__last)); + return basic_string_view<_EcharT>(__f, __l - __f); + } + else + // Conversion requires contiguous characters, so create a string. + return basic_string<_EcharT>(__first, __last); + } + +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS + template + inline std::wstring + __wstr_from_utf8(const _Tp& __str) + { + static_assert(std::is_same_v); + std::wstring __wstr; + // XXX This assumes native wide encoding is UTF-16. + std::codecvt_utf8_utf16 __wcvt; + const auto __p = __str.data(); + if (!__str_codecvt_in_all(__p, __p + __str.size(), __wstr, __wcvt)) + _GLIBCXX_THROW_OR_ABORT(filesystem_error( + "Cannot convert character sequence", + std::make_error_code(errc::illegal_byte_sequence))); + return __wstr; + } +#endif + } // namespace __detail /// @endcond @@ -542,33 +587,6 @@ namespace __detail pair _M_find_extension() const noexcept; - // Create a string or string view from an iterator range. - template - static auto - _S_to_string(_InputIterator __first, _InputIterator __last) - { - using _EcharT - = typename std::iterator_traits<_InputIterator>::value_type; - static_assert(__detail::__is_encoded_char<_EcharT>); - -#if __cpp_lib_concepts - constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>; -#else - constexpr bool __contiguous - = is_pointer_v; -#endif - if constexpr (__contiguous) - { - // For contiguous iterators we can just return a string view. - const auto* __f = std::__to_address(std::__niter_base(__first)); - const auto* __l = std::__to_address(std::__niter_base(__last)); - return basic_string_view<_EcharT>(__f, __l - __f); - } - else - // Conversion requires contiguous characters, so create a string. - return basic_string<_EcharT>(__first, __last); - } - // path::_S_convert creates a basic_string or // basic_string_view from a range (either the effective // range of a Source parameter, or a pair of InputIterator parameters), @@ -602,7 +620,7 @@ namespace __detail template static auto _S_convert(_Iter __first, _Iter __last) - { return _S_convert(_S_to_string(__first, __last)); } + { return _S_convert(__detail::__string_from_range(__first, __last)); } static string_type _S_convert_loc(const char* __first, const char* __last, @@ -612,7 +630,7 @@ namespace __detail static string_type _S_convert_loc(_Iter __first, _Iter __last, const std::locale& __loc) { - const auto __s = _S_to_string(__first, __last); + const auto __s = __detail::__string_from_range(__first, __last); return _S_convert_loc(__s.data(), __s.data() + __s.size(), __loc); } @@ -738,28 +756,10 @@ namespace __detail { #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS if constexpr (is_same_v<_CharT, char>) - { - // XXX This assumes native wide encoding is UTF-16. - std::codecvt_utf8_utf16 __cvt; - path::string_type __tmp; - if constexpr (is_pointer_v<_InputIterator>) - { - if (__str_codecvt_in_all(__first, __last, __tmp, __cvt)) - return path{ __tmp }; - } - else - { - const std::string __u8str{__first, __last}; - const char* const __p = __u8str.data(); - if (__str_codecvt_in_all(__p, __p + __u8str.size(), __tmp, __cvt)) - return path{ __tmp }; - } - _GLIBCXX_THROW_OR_ABORT(filesystem_error( - "Cannot convert character sequence", - std::make_error_code(errc::illegal_byte_sequence))); - } + return path{ __detail::__wstr_from_utf8( + __detail::__string_from_range(__first, __last)) }; else - return path{ __first, __last }; + return path{ __first, __last }; // constructor handles char8_t #else // This assumes native normal encoding is UTF-8. return path{ __first, __last }; @@ -778,21 +778,12 @@ namespace __detail { #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS if constexpr (is_same_v<_CharT, char>) - { - if constexpr (is_convertible_v) - { - const std::string_view __s = __source; - return filesystem::u8path(__s.data(), __s.data() + __s.size()); - } - else - { - std::string __s = path::_S_string_from_iter(__source); - return filesystem::u8path(__s.data(), __s.data() + __s.size()); - } - } + return path{ __detail::__wstr_from_utf8( + __detail::__effective_range(__source)) }; else - return path{ __source }; + return path{ __source }; // constructor handles char8_t #else + // This assumes native normal encoding is UTF-8. return path{ __source }; #endif } @@ -836,11 +827,8 @@ namespace __detail #ifdef _GLIBCXX_USE_CHAR8_T else if constexpr (is_same_v<_EcharT, char8_t>) { - const char* __f2 = (const char*)__f; - const char* __l2 = (const char*)__l; - std::codecvt_utf8_utf16 __wcvt; - if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt)) - return __wstr; + const auto __f2 = reinterpret_cast(__f); + return __detail::__wstr_from_utf8(string_view(__f2, __l - __f)); } #endif else // char16_t or char32_t @@ -849,13 +837,7 @@ namespace __detail { } __cvt; std::string __str; if (__str_codecvt_out_all(__f, __l, __str, __cvt)) - { - const char* __f2 = __str.data(); - const char* __l2 = __f2 + __str.size(); - std::codecvt_utf8_utf16 __wcvt; - if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt)) - return __wstr; - } + return __detail::__wstr_from_utf8(__str); } #else // ! windows struct _UCvt : std::codecvt<_EcharT, char, std::mbstate_t> --00hq2S6J2Jlg6EbK--