From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id ACEF83858D32; Mon, 10 Jul 2023 15:47:23 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org ACEF83858D32
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1689004043;
	bh=2N0DMF2UQ6C4EbxWdWl+bh2VSAtKpVAX3e2Nq71wURs=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=mWTcBkJ4IQYNJciSn2nQ67GuQBbSixgJukm86f7gue9u4ai3o5MulTEpNJyRzayY1
	 +aTC5Dxw5zoSupKFSgQIMLWjD8/70OhfKXL6uGd5BSV2FQeNh/2LvO5NDvBXIS3INm
	 S+NJurauIINPTbGN1pVqLKJEiGAoi+5NDn4FT3uI=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/95048] [11 Regression] wstring-constructor of
 std::filesystem::path throws for non-ASCII characters
Date: Mon, 10 Jul 2023 15:47:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: 10.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: redi at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.5
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #26 from CVS Commits ---
The releases/gcc-11 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:d308b11fa94728507984b4ccc949219511273ab6

commit r11-10903-gd308b11fa94728507984b4ccc949219511273ab6
Author: Jonathan Wakely
Date:   Fri Nov 11 15:22:02 2022 +0000

    libstdc++: Fix wstring conversions in filesystem::path [PR95048]

    In commit r9-7381-g91756c4abc1757 I changed filesystem::path to use
    std::codecvt for
    conversions from all wide strings to UTF-8, instead of using
    std::codecvt_utf8. This was done because for 16-bit wchar_t,
    std::codecvt_utf8 only supports UCS-2 and not UTF-16.

    The rationale for the change was sound, but the actual fix was not. It's
    OK to use std::codecvt for char16_t or char32_t, because the
    specializations for those types always use UTF-8, but
    std::codecvt<wchar_t, char, mbstate_t> uses the current locale's
    encodings, and the narrow encoding is probably ASCII and can't support
    non-ASCII characters.

    The correct fix is to use std::codecvt only for char16_t and char32_t.
    For 32-bit wchar_t we could have continued using std::codecvt_utf8
    because that uses UTF-32, which is fine; switching to std::codecvt broke
    non-Windows targets with 32-bit wchar_t. For 16-bit wchar_t we did need
    to change, but should have changed to std::codecvt_utf8_utf16 instead,
    as that always uses UTF-16, not UCS-2. I actually noted that in the
    commit message for r9-7381-g91756c4abc1757 but didn't use that option.
    Oops.

    This replaces the unconditional std::codecvt with a type defined via
    template specialization, so it can vary depending on the wide character
    type. The code is also simplified to remove some of the mess of #ifdef
    and if-constexpr conditions.

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * include/bits/fs_path.h (path::_Codecvt): New class template
            that selects the kind of code conversion done.
            (path::_Codecvt<wchar_t>): Select based on sizeof(wchar_t).
            (_GLIBCXX_CONV_FROM_UTF8): New macro to allow the same code to
            be used for Windows and POSIX.
            (path::_S_convert(const EcharT*, const EcharT*)): Simplify by
            using _Codecvt and _GLIBCXX_CONV_FROM_UTF8 abstractions.
            (path::_S_str_convert(basic_string_view, const A&)): Simplify
            nested conditions.
            * include/experimental/bits/fs_path.h (path::_Cvt): Define
            nested typedef controlling type of code conversion done.
            (path::_Cvt::_S_wconvert): Use new typedef.
            (path::string(const A&)): Likewise.
            * testsuite/27_io/filesystem/path/construct/95048.cc: New test.
            * testsuite/experimental/filesystem/path/construct/95048.cc: New test.

    (cherry picked from commit b331bf303bdc1edead41e2b3d11d1a7804b433cf)
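
For readers who want to see the idea in isolation, here is a minimal sketch of
picking the wide-to-UTF-8 conversion by template specialization on the size of
the code unit, so that 16-bit units are treated as UTF-16 (with surrogate
pairs, not UCS-2) and 32-bit units as UTF-32, independent of the current
locale. This is not the libstdc++ code from the commit: the names
sketch::append_utf8, sketch::wide_decoder and sketch::to_utf8 are hypothetical,
and the conversion is hand-written rather than using std::codecvt_utf8 or
std::codecvt_utf8_utf16, which the commit message discusses and which are
deprecated since C++17.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace sketch {  // hypothetical names, not libstdc++ internals

// Append one Unicode code point to a UTF-8 encoded string.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Primary template: each code unit is a whole UTF-32 code point.
template<std::size_t UnitSize>
struct wide_decoder {
    template<typename It>
    static char32_t next(It& it, It /*end*/) {
        char32_t cp = static_cast<char32_t>(*it++);
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::range_error("invalid UTF-32 code point");
        return cp;
    }
};

// Specialization for 16-bit code units: UTF-16, so surrogate pairs
// are combined instead of being rejected as they would be for UCS-2.
template<>
struct wide_decoder<2> {
    template<typename It>
    static char32_t next(It& it, It end) {
        char32_t hi = static_cast<char16_t>(*it++);
        if (hi < 0xD800 || hi > 0xDFFF)
            return hi;  // ordinary BMP character
        if (hi > 0xDBFF || it == end)
            throw std::range_error("unpaired surrogate");
        char32_t lo = static_cast<char16_t>(*it++);
        if (lo < 0xDC00 || lo > 0xDFFF)
            throw std::range_error("unpaired surrogate");
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }
};

// Convert a wide string to UTF-8; the decoder is chosen at compile
// time from sizeof(CharT), never from the current locale.
template<typename CharT>
std::string to_utf8(const std::basic_string<CharT>& in) {
    static_assert(sizeof(CharT) == 2 || sizeof(CharT) == 4,
                  "expects 16-bit or 32-bit code units");
    using decoder = wide_decoder<sizeof(CharT)>;
    std::string out;
    const CharT* it = in.data();
    const CharT* end = it + in.size();
    while (it != end)
        append_utf8(out, decoder::next(it, end));
    return out;
}

}  // namespace sketch
```

With this shape, sketch::to_utf8(std::wstring(L"\u00e4")) takes the UTF-16 path
where wchar_t is 16 bits and the UTF-32 path where it is 32 bits, and in both
cases should yield the two-byte UTF-8 sequence for U+00E4 regardless of the
global locale.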