From: "thiago at kde dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/111244] New: std::filesystem::path encoding mismatches locale on Windows
Date: Wed, 30 Aug 2023 19:10:19 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111244

            Bug ID: 111244
           Summary: std::filesystem::path encoding mismatches locale on
                    Windows
           Product: gcc
           Version: 13.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org
  Target Milestone: ---

Test:
$ cat fstest.cpp
#include <filesystem>
#include <stdio.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; ++i) {
        std::filesystem::path p(argv[i]);
        if (std::filesystem::exists(p)) {
            printf("%s %llu\n", argv[1],
                   (unsigned long long)std::filesystem::file_size(p));
        } else {
            printf("%s does not exist\n", argv[1]);
        }
    }
}
$ touch filæ
$ g++ fstest.cpp
$ ./a.out fstest.cpp filæ

On Linux (and any other Unix):
fstest.cpp 377
fstest.cpp 0

On Windows with libc++ or MS STL:
fstest.cpp 377
fstest.cpp 0

On Windows with libstdc++:
fstest.cpp 377
terminate called after throwing an instance of
'std::filesystem::__cxx11::filesystem_error'
  what():  filesystem error: Cannot convert character sequence: Illegal byte
sequence

This is caused by std::filesystem::path interpreting the narrow-character
input as UTF-8. On Windows, the input is not UTF-8; it must be decoded using
the locale codec.

Strictly speaking, the same should apply to the conversion to Unicode on Unix
systems too, but a) they're almost all UTF-8 these days, so the corner cases
may be ignored by a policy decision and b) the mismatch of input does not lead
to inability to refer to files by fs::path alone.
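
For illustration only, here is a minimal sketch of the decoding step described above: converting the narrow string with the locale code page before handing it to std::filesystem::path. The helper name path_from_narrow is hypothetical and is not part of libstdc++. The use of CP_ACP (the active ANSI code page) as "the locale codec" is an assumption; other implementations may select the code page differently. On non-Windows systems the sketch passes the bytes through unchanged.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#  include <windows.h>
#endif

// Hypothetical helper (not a libstdc++ API): build a path from a narrow
// string, decoding it with the active ANSI code page on Windows.
// Assumption: CP_ACP matches the encoding of argv[] in a narrow main().
std::filesystem::path path_from_narrow(const std::string &s)
{
#ifdef _WIN32
    if (s.empty())
        return {};

    // First call: ask how many UTF-16 code units the result needs.
    const int len = static_cast<int>(s.size());
    const int n = MultiByteToWideChar(CP_ACP, MB_ERR_INVALID_CHARS,
                                      s.data(), len, nullptr, 0);
    if (n <= 0)
        throw std::runtime_error("input is not valid in the active code page");

    // Second call: perform the conversion into the sized buffer.
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    const int written = MultiByteToWideChar(CP_ACP, MB_ERR_INVALID_CHARS,
                                            s.data(), len, w.data(), n);
    assert(written == n);
    (void)written;

    // Constructing from wchar_t uses the native Windows encoding directly,
    // so no UTF-8 interpretation of the narrow bytes takes place.
    return std::filesystem::path(w);
#else
    // On Unix-like systems the native narrow encoding is used as-is.
    return std::filesystem::path(s);
#endif
}
```

In the test program above, this would be used as `std::filesystem::path p = path_from_narrow(argv[i]);` in place of constructing the path from argv[i] directly. It is a workaround sketch for user code, not a statement of how the library should be fixed.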