public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters
@ 2020-05-11  7:59 kontakt at neonfoto dot de
  2020-05-11 20:49 ` [Bug libstdc++/95048] " redi at gcc dot gnu.org
                   ` (33 more replies)
  0 siblings, 34 replies; 35+ messages in thread
From: kontakt at neonfoto dot de @ 2020-05-11  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

            Bug ID: 95048
           Summary: wstring-constructor of std::filesystem::path throws
                    for non-ASCII characters
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kontakt at neonfoto dot de
  Target Milestone: ---

When trying to port our Windows application to Linux, I encountered a problem
constructing an instance of std::filesystem::path with a wide-character literal
containing a non-ASCII wchar:

#include <filesystem>

int main()
{
    std::filesystem::path p = L"ä";
}

This builds fine with g++-10 -Wall -Wextra -pedantic -std=c++17 minimal.cpp on
my Ubuntu 18.04 in WSL (using g++ from this ppa:
https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test) but throws an
exception on execution: 

terminate called after throwing an instance of
'std::filesystem::__cxx11::filesystem_error'
  what():  filesystem error: Cannot convert character sequence: Invalid or
incomplete multibyte or wide character

Reading the C++ standard, I believe this should not happen and libstdc++ should
be able to convert the wchar literal to a path. Using clang with libc++ instead
of libstdc++ performs a conversion as I expected. Trying different versions of
g++ in the Compiler Explorer (https://godbolt.org/z/KQD1I6) shows that this
also used to work with g++9.1 and stopped working in g++9.2.

The problem was already described by someone else in this StackOverflow post:
https://stackoverflow.com/questions/58521857/cross-platform-way-to-handle-stdstring-stdwstring-with-stdfilesystempath

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
@ 2020-05-11 20:49 ` redi at gcc dot gnu.org
  2020-05-11 20:59 ` carlos at redhat dot com
                   ` (32 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2020-05-11 20:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-05-11

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This happens because glibc won't convert the wide string to UTF-8:

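#include <wchar.h>
#include <string.h>
#include <assert.h>

int main()
{
  const wchar_t wstr[] = L"ä";
  const wchar_t* from = wstr;
  char to[10];
  mbstate_t s;
  memset(&s, 0, sizeof(s));  // start from the initial conversion state
  // No setlocale() call here, so the default "C" locale is in effect.
  size_t res = wcsnrtombs(to, &from, 1, sizeof(to), &s);
  assert(res != (size_t)-1);  // this is the assertion that fails
}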

I'm not yet sure why glibc refuses to convert that.

* [Bug libstdc++/95048] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
  2020-05-11 20:49 ` [Bug libstdc++/95048] " redi at gcc dot gnu.org
@ 2020-05-11 20:59 ` carlos at redhat dot com
  2020-05-11 21:14 ` [Bug libstdc++/95048] [9/10/11 Regression] " redi at gcc dot gnu.org
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: carlos at redhat dot com @ 2020-05-11 20:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #2 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Jonathan Wakely from comment #1)
> This happens because glibc won't convert the wide string to UTF-8:
> 
> #include <wchar.h>
> #include <assert.h>
> 
> int main()
> {
>   const wchar_t wstr[] = L"ä";
>   const wchar_t* from = wstr;
>   char to[10];
>   mbstate_t s;
>   size_t res = wcsnrtombs(to, &from, 1, sizeof(to), &s);
>   assert(res != (size_t)-1);
> }
> 
> I'm not yet sure why glibc refuses to convert that.

ISO C says:

"At program startup, the equivalent of
setlocale(LC_ALL, "C");
is executed."

Which means you are trying to convert the wide string to ASCII.

You should call setlocale to select a locale with a non-ASCII character set
(for example a UTF-8 locale) to make this work.
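
A minimal sketch of that workaround, assuming a system where a UTF-8 locale
such as "C.UTF-8" or "en_US.UTF-8" is installed. The helper name
wide_to_narrow_in_locale and the locale names are illustrative assumptions,
not something taken from the report:

```cpp
#include <climits>
#include <clocale>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: temporarily switch the C locale to `locale_name`,
// then convert `in` to that locale's narrow encoding with wcsrtombs.
// Returns false if the locale is unavailable or the conversion fails.
bool wide_to_narrow_in_locale(const char* locale_name,
                              const std::wstring& in, std::string& out)
{
    // Remember the current locale so it can be restored afterwards.
    const char* cur = std::setlocale(LC_ALL, nullptr);
    const std::string saved = cur ? cur : "C";

    if (std::setlocale(LC_ALL, locale_name) == nullptr)
        return false;                        // locale not installed

    std::mbstate_t state;
    std::memset(&state, 0, sizeof state);    // initial conversion state
    const wchar_t* from = in.c_str();
    std::vector<char> buf(in.size() * MB_LEN_MAX + 1);
    const std::size_t n =
        std::wcsrtombs(buf.data(), &from, buf.size(), &state);

    // Success means no error and the terminating null was reached.
    const bool ok = n != static_cast<std::size_t>(-1) && from == nullptr;
    if (ok)
        out.assign(buf.data(), n);

    std::setlocale(LC_ALL, saved.c_str());   // restore the previous locale
    return ok;
}
```

Calling wide_to_narrow_in_locale("C.UTF-8", L"\u00e4", out) should yield the
two UTF-8 bytes for "ä" when that locale exists. Note that this makes the
result depend on process-wide locale state.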

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
  2020-05-11 20:49 ` [Bug libstdc++/95048] " redi at gcc dot gnu.org
  2020-05-11 20:59 ` carlos at redhat dot com
@ 2020-05-11 21:14 ` redi at gcc dot gnu.org
  2020-05-11 21:19 ` redi at gcc dot gnu.org
                   ` (30 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2020-05-11 21:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
            Summary|wstring-constructor of      |[9/10/11 Regression]
                   |std::filesystem::path       |wstring-constructor of
                   |throws for non-ASCII        |std::filesystem::path
                   |characters                  |throws for non-ASCII
                   |                            |characters
           Assignee|unassigned at gcc dot gnu.org      |redi at gcc dot gnu.org

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
In libstdc++ filesystem::path uses std::codecvt<wchar_t, char, mbstate_t> to
convert from wide character strings in the native wide encoding format (which
is set by GCC's -fwide-exec-charset option) to the native encoding for
pathnames.

The native encoding for pathnames should be UTF-8, but because we use codecvt,
which uses wcsnrtombs, we're actually converting to the narrow encoding of the
current C locale (which is ASCII by default as Carlos said).

So I think libstdc++ needs to use a different conversion from the native wide
encoding to UTF-8 (to be independent of the current locale's narrow encoding).
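
To illustrate what "independent of the current locale" means here, the sketch
below encodes UTF-32 code points as UTF-8 by hand. It assumes the wide
encoding is UTF-32 (the usual GCC default on Linux); the function name
utf32_to_utf8 is hypothetical, and this is not the conversion libstdc++
actually uses:

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-8 without consulting
// the C or C++ locale. Returns false for surrogates and out-of-range values.
bool utf32_to_utf8(const std::u32string& in, std::string& out)
{
    out.clear();
    for (char32_t cp : in) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;                               // lone surrogate
        if (cp < 0x80) {                                // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                        // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                      // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            return false;                               // beyond Unicode
        }
    }
    return true;
}
```

The result is the same whatever setlocale has or has not been called with,
which is the property the path conversion needs.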

This would have changed with r272385 (and r272389 on the gcc-9 branch) which
was fixing PR libstdc++/90281.

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (2 preceding siblings ...)
  2020-05-11 21:14 ` [Bug libstdc++/95048] [9/10/11 Regression] " redi at gcc dot gnu.org
@ 2020-05-11 21:19 ` redi at gcc dot gnu.org
  2020-05-11 21:25 ` redi at gcc dot gnu.org
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2020-05-11 21:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Using std::codecvt_utf8<wchar_t> fixes it:

--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -836,8 +836,7 @@ namespace __detail
              }
          }
 #else // ! windows
-       struct _UCvt : std::codecvt<_CharT, char, std::mbstate_t>
-       { } __cvt;
+       struct _UCvt : std::codecvt_utf8<_CharT> { } __cvt;
        std::string __str;
        if (__str_codecvt_out_all(__f, __l, __str, __cvt))
          return __str;

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (3 preceding siblings ...)
  2020-05-11 21:19 ` redi at gcc dot gnu.org
@ 2020-05-11 21:25 ` redi at gcc dot gnu.org
  2020-06-26 12:05 ` rguenth at gcc dot gnu.org
                   ` (28 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2020-05-11 21:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Argh, this only works if the wide encoding happens to be UCS-4 (or for 16-bit
wchar_t, UCS-2) because std::codecvt_utf8 only supports converting between that
and UTF-8. Which is why I used std::codecvt<wchar_t, char, mbstate_t> in the
first place.
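
For reference, a small user-level sketch of what std::codecvt_utf8<wchar_t>
does, using std::wstring_convert (deprecated since C++17, so a compiler may
warn). It is only correct under the assumption described above, namely that
wchar_t holds UCS-4 or UCS-2 values; the function name wide_to_utf8_ucs is
hypothetical:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Converts a wide string to UTF-8 with std::codecvt_utf8<wchar_t>.
// Assumes wchar_t holds UCS-4 (32-bit) or UCS-2 (16-bit) values; with any
// other wide execution charset the output would not be meaningful.
// Throws std::range_error if a character cannot be converted.
std::string wide_to_utf8_ucs(const std::wstring& in)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(in);
}
```

Unlike std::codecvt<wchar_t, char, mbstate_t>, this does not look at the
current locale's narrow encoding at all.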

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (4 preceding siblings ...)
  2020-05-11 21:25 ` redi at gcc dot gnu.org
@ 2020-06-26 12:05 ` rguenth at gcc dot gnu.org
  2020-11-12 13:48 ` gcc-bugzilla at m dot chronial.de
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-06-26 12:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.4

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (5 preceding siblings ...)
  2020-06-26 12:05 ` rguenth at gcc dot gnu.org
@ 2020-11-12 13:48 ` gcc-bugzilla at m dot chronial.de
  2020-11-12 13:58 ` redi at gcc dot gnu.org
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: gcc-bugzilla at m dot chronial.de @ 2020-11-12 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Christian Fersch <gcc-bugzilla at m dot chronial.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gcc-bugzilla at m dot chronial.de

--- Comment #6 from Christian Fersch <gcc-bugzilla at m dot chronial.de> ---
It seems like the solution would be to use codecvt_utf8 if wchar_t is 32-bit
and codecvt_utf8_utf16 if wchar_t is 16-bit. This also seems to be what libc++
is doing. Would you accept a patch for this?

Do we need to handle systems where wchar_t is something other than 16 or 32
bits wide?

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (6 preceding siblings ...)
  2020-11-12 13:48 ` gcc-bugzilla at m dot chronial.de
@ 2020-11-12 13:58 ` redi at gcc dot gnu.org
  2020-11-12 14:02 ` redi at gcc dot gnu.org
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2020-11-12 13:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Christian Fersch from comment #6)
> It seems like the solution would be to use codecvt_utf8 if wchar_t is 32bit
> and  codecvt_utf8_utf16 if wchar_t is 16bit. This also seems to be what
> libc++ is doing. Would you accept a patch for this?

Doesn't this have the problem I pointed out in comment 5? Using codecvt_utf8
assumes that the wchar_t encoding is either UCS-2 or UCS-4. GCC supports
changing that encoding using the -fwide-exec-charset= option.

> Do we need to handle systems where wchar_t is something other than 16 or 32
> bit wide?

No.

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (7 preceding siblings ...)
  2020-11-12 13:58 ` redi at gcc dot gnu.org
@ 2020-11-12 14:02 ` redi at gcc dot gnu.org
  2020-11-12 23:49 ` gcc-bugzilla at m dot chronial.de
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2020-11-12 14:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
We do have a codecvt specialization that uses iconv, which would allow us to
convert from the native wide encoding to UTF-8, independent of the locale's
narrow encoding.

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (8 preceding siblings ...)
  2020-11-12 14:02 ` redi at gcc dot gnu.org
@ 2020-11-12 23:49 ` gcc-bugzilla at m dot chronial.de
  2021-01-14  8:44 ` rguenth at gcc dot gnu.org
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: gcc-bugzilla at m dot chronial.de @ 2020-11-12 23:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #9 from Christian Fersch <gcc-bugzilla at m dot chronial.de> ---
But is it possible to query the value of -fwide-exec-charset? I had a quick
look and couldn't find anything.

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (9 preceding siblings ...)
  2020-11-12 23:49 ` gcc-bugzilla at m dot chronial.de
@ 2021-01-14  8:44 ` rguenth at gcc dot gnu.org
  2021-04-19 10:40 ` redi at gcc dot gnu.org
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-14  8:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

* [Bug libstdc++/95048] [9/10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (10 preceding siblings ...)
  2021-01-14  8:44 ` rguenth at gcc dot gnu.org
@ 2021-04-19 10:40 ` redi at gcc dot gnu.org
  2021-06-01  8:17 ` [Bug libstdc++/95048] [9/10/11/12 " rguenth at gcc dot gnu.org
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2021-04-19 10:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|redi at gcc dot gnu.org            |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

* [Bug libstdc++/95048] [9/10/11/12 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (11 preceding siblings ...)
  2021-04-19 10:40 ` redi at gcc dot gnu.org
@ 2021-06-01  8:17 ` rguenth at gcc dot gnu.org
  2021-07-23 12:40 ` gcc-bugzilla at m dot chronial.de
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-01  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.4                         |9.5

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.

* [Bug libstdc++/95048] [9/10/11/12 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (12 preceding siblings ...)
  2021-06-01  8:17 ` [Bug libstdc++/95048] [9/10/11/12 " rguenth at gcc dot gnu.org
@ 2021-07-23 12:40 ` gcc-bugzilla at m dot chronial.de
  2021-10-19 13:25 ` redi at gcc dot gnu.org
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: gcc-bugzilla at m dot chronial.de @ 2021-07-23 12:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #11 from Christian Fersch <gcc-bugzilla at m dot chronial.de> ---
Would you accept a patch that implements my solution from comment 6? It seems
to me like that would be an improvement over the current situation.

* [Bug libstdc++/95048] [9/10/11/12 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (13 preceding siblings ...)
  2021-07-23 12:40 ` gcc-bugzilla at m dot chronial.de
@ 2021-10-19 13:25 ` redi at gcc dot gnu.org
  2022-05-27  9:42 ` [Bug libstdc++/95048] [10/11/12/13 " rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2021-10-19 13:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #12 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Christian Fersch from comment #9)
> But is it possible to query the value of -fwide-exec-charset? I had a quick
> look and couldn't find anything.

It's possible now: __GNUC_WIDE_EXECUTION_CHARSET_NAME

https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
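
A minimal sketch of querying that macro, guarded so it still compiles where
the macro is not predefined. The function name wide_exec_charset_name and the
"unknown" fallback are assumptions made for illustration:

```cpp
#include <string>

// Returns the wide execution character set name reported by the compiler,
// or "unknown" when the macro is not predefined (for example an older GCC
// or a different compiler).
std::string wide_exec_charset_name()
{
#ifdef __GNUC_WIDE_EXECUTION_CHARSET_NAME
    return __GNUC_WIDE_EXECUTION_CHARSET_NAME;  // expands to a string literal
#else
    return "unknown";
#endif
}
```

The value reflects -fwide-exec-charset for the translation unit being
compiled, so code in a header could in principle use it to decide whether a
UCS-4 based conversion is safe.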

* [Bug libstdc++/95048] [10/11/12/13 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (14 preceding siblings ...)
  2021-10-19 13:25 ` redi at gcc dot gnu.org
@ 2022-05-27  9:42 ` rguenth at gcc dot gnu.org
  2022-06-28 10:40 ` jakub at gcc dot gnu.org
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-27  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |10.4

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed

* [Bug libstdc++/95048] [10/11/12/13 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (15 preceding siblings ...)
  2022-05-27  9:42 ` [Bug libstdc++/95048] [10/11/12/13 " rguenth at gcc dot gnu.org
@ 2022-06-28 10:40 ` jakub at gcc dot gnu.org
  2022-10-24 15:09 ` ulf.lorenz at ptvgroup dot com
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-28 10:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.

* [Bug libstdc++/95048] [10/11/12/13 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (16 preceding siblings ...)
  2022-06-28 10:40 ` jakub at gcc dot gnu.org
@ 2022-10-24 15:09 ` ulf.lorenz at ptvgroup dot com
  2022-10-25 11:54 ` redi at gcc dot gnu.org
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: ulf.lorenz at ptvgroup dot com @ 2022-10-24 15:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Ulf Lorenz <ulf.lorenz at ptvgroup dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ulf.lorenz at ptvgroup dot com

--- Comment #15 from Ulf Lorenz <ulf.lorenz at ptvgroup dot com> ---
I submitted a patch to the patch mailing list on the 19th. I assume it needs
someone to review it.

* [Bug libstdc++/95048] [10/11/12/13 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (17 preceding siblings ...)
  2022-10-24 15:09 ` ulf.lorenz at ptvgroup dot com
@ 2022-10-25 11:54 ` redi at gcc dot gnu.org
  2022-10-25 12:47 ` ulf.lorenz at ptvgroup dot com
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2022-10-25 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #16 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Thanks. As per https://gcc.gnu.org/lists.html patches for libstdc++ need to be
CC'd to the libstdc++ list, but I'll find it in the gcc-patches archive and
review it ASAP.

I'm still concerned that this just moves the problem, so that different cases
fail instead. But maybe it's an improvement for the most common case, and it's
better to fail in the rare situations.

* [Bug libstdc++/95048] [10/11/12/13 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (18 preceding siblings ...)
  2022-10-25 11:54 ` redi at gcc dot gnu.org
@ 2022-10-25 12:47 ` ulf.lorenz at ptvgroup dot com
  2022-11-11 17:43 ` cvs-commit at gcc dot gnu.org
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: ulf.lorenz at ptvgroup dot com @ 2022-10-25 12:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #17 from Ulf Lorenz <ulf.lorenz at ptvgroup dot com> ---
Your point came up during implementation. Besides the other cases being rare
and probably exotic, the killer argument was that handling them would be far
more complex. While I am now confident that I can find my way around the
libstdc++ source code reasonably well and write simple patches and working
tests, understanding iconv and the required
acceptance_tests_with_special_compilation_flags would take considerably more
time than I have needed so far.

I agree that this is not a bullet-proof solution, only an incremental
improvement.

* [Bug libstdc++/95048] [10/11/12/13 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (19 preceding siblings ...)
  2022-10-25 12:47 ` ulf.lorenz at ptvgroup dot com
@ 2022-11-11 17:43 ` cvs-commit at gcc dot gnu.org
  2022-11-11 17:46 ` [Bug libstdc++/95048] [10/11/12 " redi at gcc dot gnu.org
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-11 17:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:b331bf303bdc1edead41e2b3d11d1a7804b433cf

commit r13-3909-gb331bf303bdc1edead41e2b3d11d1a7804b433cf
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 11 15:22:02 2022 +0000

    libstdc++: Fix wstring conversions in filesystem::path [PR95048]

    In commit r9-7381-g91756c4abc1757 I changed filesystem::path to use
    std::codecvt<CharT, char, mbstate_t> for conversions from all wide
    strings to UTF-8, instead of using std::codecvt_utf8<CharT>. This was
    done because for 16-bit wchar_t, std::codecvt_utf8<wchar_t> only
    supports UCS-2 and not UTF-16. The rationale for the change was sound,
    but the actual fix was not. It's OK to use std::codecvt for char16_t or
    char32_t, because the specializations for those types always use UTF-8,
    but std::codecvt<wchar_t, char, mbstate_t> uses the current locale's
    encodings, and the narrow encoding is probably ASCII and can't support
    non-ASCII characters.

    The correct fix is to use std::codecvt only for char16_t and char32_t.
    For 32-bit wchar_t we could have continued using std::codecvt_utf8
    because that uses UTF-32, which is fine; switching to std::codecvt broke
    non-Windows targets with 32-bit wchar_t. For 16-bit wchar_t we did need
    to change, but should have changed to std::codecvt_utf8_utf16<wchar_t>
    instead, as that always uses UTF-16 not UCS-2. I actually noted that in
    the commit message for r9-7381-g91756c4abc1757 but didn't use that
    option. Oops.

    This replaces the unconditional std::codecvt<CharT, char, mbstate_t>
    with a type defined via template specialization, so it can vary
    depending on the wide character type. The code is also simplified to
    remove some of the mess of #ifdef and if-constexpr conditions.

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * include/bits/fs_path.h (path::_Codecvt): New class template
            that selects the kind of code conversion done.
            (path::_Codecvt<wchar_t>): Select based on sizeof(wchar_t).
            (_GLIBCXX_CONV_FROM_UTF8): New macro to allow the same code to
            be used for Windows and POSIX.
            (path::_S_convert(const EcharT*, const EcharT*)): Simplify by
            using _Codecvt and _GLIBCXX_CONV_FROM_UTF8 abstractions.
            (path::_S_str_convert(basic_string_view<value_type>, const A&)):
            Simplify nested conditions.
            * include/experimental/bits/fs_path.h (path::_Cvt): Define
            nested typedef controlling type of code conversion done.
            (path::_Cvt::_S_wconvert): Use new typedef.
            (path::string(const A&)): Likewise.
            * testsuite/27_io/filesystem/path/construct/95048.cc: New test.
            * testsuite/experimental/filesystem/path/construct/95048.cc: New
            test.
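
    A rough, user-level sketch of the idea described in this commit message:
    a type chosen by template specialization on the width of the wide
    character type. The names wide_codecvt and wide_to_utf8 are hypothetical;
    this is not the actual path::_Codecvt code, and it uses the deprecated
    std::wstring_convert only to keep the example short:

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical selector: pick the conversion facet from the width of the
// wide character type. The primary template is left undefined so that an
// unsupported width fails to compile.
template<typename CharT, std::size_t Width = sizeof(CharT)>
struct wide_codecvt;

template<typename CharT>
struct wide_codecvt<CharT, 2>               // 16-bit: UTF-16 <-> UTF-8
{ using type = std::codecvt_utf8_utf16<CharT>; };

template<typename CharT>
struct wide_codecvt<CharT, 4>               // 32-bit: UTF-32 <-> UTF-8
{ using type = std::codecvt_utf8<CharT>; };

// Convert a wide string to UTF-8 using the facet selected above.
// Throws std::range_error if the input is not valid for that facet.
std::string wide_to_utf8(const std::wstring& in)
{
    std::wstring_convert<wide_codecvt<wchar_t>::type> conv;
    return conv.to_bytes(in);
}
```

    Neither branch consults the current locale, so a call such as
    wide_to_utf8(L"\u00e4") does not depend on setlocale having been called.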

* [Bug libstdc++/95048] [10/11/12 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (20 preceding siblings ...)
  2022-11-11 17:43 ` cvs-commit at gcc dot gnu.org
@ 2022-11-11 17:46 ` redi at gcc dot gnu.org
  2022-11-11 17:51 ` redi at gcc dot gnu.org
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-11 17:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[10/11/12/13 Regression]    |[10/11/12 Regression]
                   |wstring-constructor of      |wstring-constructor of
                   |std::filesystem::path       |std::filesystem::path
                   |throws for non-ASCII        |throws for non-ASCII
                   |characters                  |characters
           Assignee|unassigned at gcc dot gnu.org      |redi at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #19 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Thanks for the patch, but I've fixed this slightly differently. The conditional
logic was a mess, I hope it's easier to follow now. The fix has also been
applied to experimental::filesystem::path.

The fix still needs to be backported to the release branches.

Thanks to everybody who helped me understand what the right conversions were
here.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [10/11/12 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (21 preceding siblings ...)
  2022-11-11 17:46 ` [Bug libstdc++/95048] [10/11/12 " redi at gcc dot gnu.org
@ 2022-11-11 17:51 ` redi at gcc dot gnu.org
  2022-11-11 22:29 ` cvs-commit at gcc dot gnu.org
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-11 17:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jengelh at inai dot de

--- Comment #20 from Jonathan Wakely <redi at gcc dot gnu.org> ---
*** Bug 102839 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [10/11/12 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (22 preceding siblings ...)
  2022-11-11 17:51 ` redi at gcc dot gnu.org
@ 2022-11-11 22:29 ` cvs-commit at gcc dot gnu.org
  2022-11-14 18:34 ` cvs-commit at gcc dot gnu.org
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-11 22:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #21 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:8214ec0cf33482f60139ae18a40567317e63c1ff

commit r13-3915-g8214ec0cf33482f60139ae18a40567317e63c1ff
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 11 22:25:14 2022 +0000

    libstdc++: Fix <experimental/filesystem> for Windows [PR95048]

    I meant to include this change in r13-3909-gb331bf303bdc1e but I forgot
    to sync it from the machine where I did the mingw testing to the one
    where I pushed the commit.

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * include/experimental/bits/fs_path.h (path::_Cvt::_S_wconvert):
            Construct codecvt directly instead of getting it from the
            locale.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [10/11/12 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (23 preceding siblings ...)
  2022-11-11 22:29 ` cvs-commit at gcc dot gnu.org
@ 2022-11-14 18:34 ` cvs-commit at gcc dot gnu.org
  2022-11-22 11:51 ` [Bug libstdc++/95048] [10/11 " ulf.lorenz at ptvgroup dot com
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-14 18:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:c6bd8fac5e3bc6003f889fbd6042c0d8aa9c40ed

commit r12-8909-gc6bd8fac5e3bc6003f889fbd6042c0d8aa9c40ed
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 11 15:22:02 2022 +0000

    libstdc++: Fix wstring conversions in filesystem::path [PR95048]

    In commit r9-7381-g91756c4abc1757 I changed filesystem::path to use
    std::codecvt<CharT, char, mbstate_t> for conversions from all wide
    strings to UTF-8, instead of using std::codecvt_utf8<CharT>. This was
    done because for 16-bit wchar_t, std::codecvt_utf8<wchar_t> only
    supports UCS-2 and not UTF-16. The rationale for the change was sound,
    but the actual fix was not. It's OK to use std::codecvt for char16_t or
    char32_t, because the specializations for those types always use UTF-8,
    but std::codecvt<wchar_t, char, mbstate_t> uses the current locale's
    encodings, and the narrow encoding is probably ASCII and can't support
    non-ASCII characters.

    The correct fix is to use std::codecvt only for char16_t and char32_t.
    For 32-bit wchar_t we could have continued using std::codecvt_utf8
    because that uses UTF-32, which is fine; switching to std::codecvt broke
    non-Windows targets with 32-bit wchar_t. For 16-bit wchar_t we did need
    to change, but should have changed to std::codecvt_utf8_utf16<wchar_t>
    instead, as that always uses UTF-16 not UCS-2. I actually noted that in
    the commit message for r9-7381-g91756c4abc1757 but didn't use that
    option. Oops.

    This replaces the unconditional std::codecvt<CharT, char, mbstate_t>
    with a type defined via template specialization, so it can vary
    depending on the wide character type. The code is also simplified to
    remove some of the mess of #ifdef and if-constexpr conditions.

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * include/bits/fs_path.h (path::_Codecvt): New class template
            that selects the kind of code conversion done.
            (path::_Codecvt<wchar_t>): Select based on sizeof(wchar_t).
            (_GLIBCXX_CONV_FROM_UTF8): New macro to allow the same code to
            be used for Windows and POSIX.
            (path::_S_convert(const EcharT*, const EcharT*)): Simplify by
            using _Codecvt and _GLIBCXX_CONV_FROM_UTF8 abstractions.
            (path::_S_str_convert(basic_string_view<value_type>, const A&)):
            Simplify nested conditions.
            * include/experimental/bits/fs_path.h (path::_Cvt): Define
            nested typedef controlling type of code conversion done.
            (path::_Cvt::_S_wconvert): Use new typedef.
            (path::string(const A&)): Likewise.
            * testsuite/27_io/filesystem/path/construct/95048.cc: New test.
            * testsuite/experimental/filesystem/path/construct/95048.cc: New
            test.

    (cherry picked from commit b331bf303bdc1edead41e2b3d11d1a7804b433cf)
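
The selection described in the commit message above can be sketched outside
libstdc++ roughly as follows. This is only an illustration, not the code from
fs_path.h: the names wide_utf8_cvt and to_utf8 are made up for this example,
and it uses std::wstring_convert (deprecated since C++17, like the <codecvt>
facets) purely to drive the chosen facet. The point it tries to show is that
the facet for wchar_t is picked from sizeof(wchar_t) and does not consult the
current locale.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical trait (not a libstdc++ name): choose a UTF-8 conversion
// facet for wchar_t based on its size.
template<std::size_t WideSize> struct wide_utf8_cvt;

// 16-bit wchar_t (e.g. Windows): wide strings are UTF-16, so use the
// facet that handles surrogate pairs rather than plain UCS-2.
template<> struct wide_utf8_cvt<2>
{ using type = std::codecvt_utf8_utf16<wchar_t>; };

// 32-bit wchar_t (e.g. GNU/Linux): wide strings are UTF-32.
template<> struct wide_utf8_cvt<4>
{ using type = std::codecvt_utf8<wchar_t>; };

// Hypothetical helper: convert a wide string to UTF-8 with the facet
// selected above, independent of the global locale.
std::string to_utf8(const std::wstring& ws)
{
  using cvt = wide_utf8_cvt<sizeof(wchar_t)>::type;
  std::wstring_convert<cvt, wchar_t> conv;
  return conv.to_bytes(ws);
}
```

Calling to_utf8(L"\u00e4") should give the two bytes C3 A4 on either kind of
target, which is the conversion the original report expected from the path
constructor.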

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (24 preceding siblings ...)
  2022-11-14 18:34 ` cvs-commit at gcc dot gnu.org
@ 2022-11-22 11:51 ` ulf.lorenz at ptvgroup dot com
  2022-11-22 12:30 ` redi at gcc dot gnu.org
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: ulf.lorenz at ptvgroup dot com @ 2022-11-22 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #23 from Ulf Lorenz <ulf.lorenz at ptvgroup dot com> ---
I have just one tiny comment on the patches, and that is the tests.

In my original submission, I also modified the tests that verify the path
output (path::wstring() in particular), which AFAIR also had the encoding
problem. In particular, the file
libstdc++-v3/testsuite/27_io/filesystem/path/native/string.cc is lacking a test
with std::filesystem::path that uses non-ASCII characters. There is test02(),
which looks good except that the string "abc" does not catch most conversion
problems, and there is test04(), which does not test path::wstring().

So I think, test04() could just do with a few lines like

#ifdef _GLIBCXX_USE_WCHAR_T
auto strw = p.wstring();
VERIFY( strw == L"\xf0\x9d\x84\x9e");
#endif
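
One detail worth checking when writing a wide-string expectation like this:
in a wide literal each \x escape produces one wchar_t, so the UTF-8 bytes
F0 9D 84 9E written with an L prefix become four separate wide characters,
not the single code point U+1D11E that those bytes encode. A small sketch of
the difference (the variable names are made up for illustration, and this is
not taken from the testsuite):

```cpp
#include <cstddef>
#include <string>

// Four wide characters with the values 0xF0, 0x9D, 0x84 and 0x9E.
const std::wstring bytes_as_wide = L"\xf0\x9d\x84\x9e";

// The code point U+1D11E itself: one wchar_t where wchar_t is 32 bits,
// a surrogate pair (two wchar_t) where it is 16 bits.
const std::wstring clef = L"\U0001D11E";

// Number of wchar_t units expected for U+1D11E on this target.
constexpr std::size_t clef_units = sizeof(wchar_t) == 2 ? 2 : 1;
```

A comparison against path::wstring() would presumably want the second form.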

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [10/11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (25 preceding siblings ...)
  2022-11-22 11:51 ` [Bug libstdc++/95048] [10/11 " ulf.lorenz at ptvgroup dot com
@ 2022-11-22 12:30 ` redi at gcc dot gnu.org
  2023-07-07 10:37 ` [Bug libstdc++/95048] [11 " rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-22 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #24 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Yes, I agree. I'll get those added.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (26 preceding siblings ...)
  2022-11-22 12:30 ` redi at gcc dot gnu.org
@ 2023-07-07 10:37 ` rguenth at gcc dot gnu.org
  2023-07-10 15:47 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (27 preceding siblings ...)
  2023-07-07 10:37 ` [Bug libstdc++/95048] [11 " rguenth at gcc dot gnu.org
@ 2023-07-10 15:47 ` cvs-commit at gcc dot gnu.org
  2023-07-12 20:04 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-10 15:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #26 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:d308b11fa94728507984b4ccc949219511273ab6

commit r11-10903-gd308b11fa94728507984b4ccc949219511273ab6
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 11 15:22:02 2022 +0000

    libstdc++: Fix wstring conversions in filesystem::path [PR95048]

    In commit r9-7381-g91756c4abc1757 I changed filesystem::path to use
    std::codecvt<CharT, char, mbstate_t> for conversions from all wide
    strings to UTF-8, instead of using std::codecvt_utf8<CharT>. This was
    done because for 16-bit wchar_t, std::codecvt_utf8<wchar_t> only
    supports UCS-2 and not UTF-16. The rationale for the change was sound,
    but the actual fix was not. It's OK to use std::codecvt for char16_t or
    char32_t, because the specializations for those types always use UTF-8,
    but std::codecvt<wchar_t, char, mbstate_t> uses the current locale's
    encodings, and the narrow encoding is probably ASCII and can't support
    non-ASCII characters.

    The correct fix is to use std::codecvt only for char16_t and char32_t.
    For 32-bit wchar_t we could have continued using std::codecvt_utf8
    because that uses UTF-32, which is fine; switching to std::codecvt broke
    non-Windows targets with 32-bit wchar_t. For 16-bit wchar_t we did need
    to change, but should have changed to std::codecvt_utf8_utf16<wchar_t>
    instead, as that always uses UTF-16 not UCS-2. I actually noted that in
    the commit message for r9-7381-g91756c4abc1757 but didn't use that
    option. Oops.

    This replaces the unconditional std::codecvt<CharT, char, mbstate_t>
    with a type defined via template specialization, so it can vary
    depending on the wide character type. The code is also simplified to
    remove some of the mess of #ifdef and if-constexpr conditions.

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * include/bits/fs_path.h (path::_Codecvt): New class template
            that selects the kind of code conversion done.
            (path::_Codecvt<wchar_t>): Select based on sizeof(wchar_t).
            (_GLIBCXX_CONV_FROM_UTF8): New macro to allow the same code to
            be used for Windows and POSIX.
            (path::_S_convert(const EcharT*, const EcharT*)): Simplify by
            using _Codecvt and _GLIBCXX_CONV_FROM_UTF8 abstractions.
            (path::_S_str_convert(basic_string_view<value_type>, const A&)):
            Simplify nested conditions.
            * include/experimental/bits/fs_path.h (path::_Cvt): Define
            nested typedef controlling type of code conversion done.
            (path::_Cvt::_S_wconvert): Use new typedef.
            (path::string(const A&)): Likewise.
            * testsuite/27_io/filesystem/path/construct/95048.cc: New test.
            * testsuite/experimental/filesystem/path/construct/95048.cc: New
            test.

    (cherry picked from commit b331bf303bdc1edead41e2b3d11d1a7804b433cf)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (28 preceding siblings ...)
  2023-07-10 15:47 ` cvs-commit at gcc dot gnu.org
@ 2023-07-12 20:04 ` cvs-commit at gcc dot gnu.org
  2023-07-12 20:16 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 20:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:d6384ad1a9ab7ea46990a7ed1299d5a2be4acece

commit r14-2478-gd6384ad1a9ab7ea46990a7ed1299d5a2be4acece
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jul 12 14:40:19 2023 +0100

    libstdc++: Check conversion from filesystem::path to wide strings [PR95048]

    The testcase added for this bug only checks conversion from wide strings
    on construction, but the fix also covered conversion to wide strings via
    path::wstring(). Add checks for that, and u16string() and u32string().

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * testsuite/27_io/filesystem/path/construct/95048.cc: Check
            conversions to wide strings.
            * testsuite/experimental/filesystem/path/construct/95048.cc:
            Likewise.
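
The kind of check this commit describes might look roughly like the sketch
below. It is not the contents of 95048.cc; the function names are invented
for this example and plain return values stand in for the testsuite's VERIFY
macro. The wide round trip catches filesystem_error so that a library without
the fix reports false instead of terminating.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical check: a path built from a UTF-32 literal should convert
// back to the same text as UTF-16 and as UTF-32.
bool unicode_conversions_ok()
{
  const fs::path p(U"\U0001D11E");
  return p.u16string() == u"\U0001D11E"
      && p.u32string() == U"\U0001D11E";
}

// Hypothetical check: construct from a wide string and convert back with
// path::wstring(). Returns false if the conversion throws.
bool wide_roundtrip_ok(const std::wstring& ws)
{
  try {
    return fs::path(ws).wstring() == ws;
  } catch (const fs::filesystem_error&) {
    return false;
  }
}
```

With the fix applied, wide_roundtrip_ok(L"\u00e4") would be expected to
return true; on an affected release it is the case that fails.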

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (29 preceding siblings ...)
  2023-07-12 20:04 ` cvs-commit at gcc dot gnu.org
@ 2023-07-12 20:16 ` cvs-commit at gcc dot gnu.org
  2023-07-12 20:19 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 20:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #28 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:db179779c9416cebb646758d276744f78536cc25

commit r13-7559-gdb179779c9416cebb646758d276744f78536cc25
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jul 12 14:40:19 2023 +0100

    libstdc++: Check conversion from filesystem::path to wide strings [PR95048]

    The testcase added for this bug only checks conversion from wide strings
    on construction, but the fix also covered conversion to wide strings via
    path::wstring(). Add checks for that, and u16string() and u32string().

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * testsuite/27_io/filesystem/path/construct/95048.cc: Check
            conversions to wide strings.
            * testsuite/experimental/filesystem/path/construct/95048.cc:
            Likewise.

    (cherry picked from commit d6384ad1a9ab7ea46990a7ed1299d5a2be4acece)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (30 preceding siblings ...)
  2023-07-12 20:16 ` cvs-commit at gcc dot gnu.org
@ 2023-07-12 20:19 ` cvs-commit at gcc dot gnu.org
  2023-07-12 20:23 ` cvs-commit at gcc dot gnu.org
  2023-07-12 20:24 ` redi at gcc dot gnu.org
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 20:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #29 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:530b749c71d7aaaf965d53227911411572c35146

commit r12-9768-g530b749c71d7aaaf965d53227911411572c35146
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jul 12 14:40:19 2023 +0100

    libstdc++: Check conversion from filesystem::path to wide strings [PR95048]

    The testcase added for this bug only checks conversion from wide strings
    on construction, but the fix also covered conversion to wide strings via
    path::wstring(). Add checks for that, and u16string() and u32string().

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * testsuite/27_io/filesystem/path/construct/95048.cc: Check
            conversions to wide strings.
            * testsuite/experimental/filesystem/path/construct/95048.cc:
            Likewise.

    (cherry picked from commit d6384ad1a9ab7ea46990a7ed1299d5a2be4acece)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (31 preceding siblings ...)
  2023-07-12 20:19 ` cvs-commit at gcc dot gnu.org
@ 2023-07-12 20:23 ` cvs-commit at gcc dot gnu.org
  2023-07-12 20:24 ` redi at gcc dot gnu.org
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 20:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:470f32f964574febf484edaf9e580067ac97f3b6

commit r11-10906-g470f32f964574febf484edaf9e580067ac97f3b6
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jul 12 14:40:19 2023 +0100

    libstdc++: Check conversion from filesystem::path to wide strings [PR95048]

    The testcase added for this bug only checks conversion from wide strings
    on construction, but the fix also covered conversion to wide strings via
    path::wstring(). Add checks for that, and u16string() and u32string().

    libstdc++-v3/ChangeLog:

            PR libstdc++/95048
            * testsuite/27_io/filesystem/path/construct/95048.cc: Check
            conversions to wide strings.
            * testsuite/experimental/filesystem/path/construct/95048.cc:
            Likewise.

    (cherry picked from commit d6384ad1a9ab7ea46990a7ed1299d5a2be4acece)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug libstdc++/95048] [11 Regression] wstring-constructor of std::filesystem::path throws for non-ASCII characters
  2020-05-11  7:59 [Bug libstdc++/95048] New: wstring-constructor of std::filesystem::path throws for non-ASCII characters kontakt at neonfoto dot de
                   ` (32 preceding siblings ...)
  2023-07-12 20:23 ` cvs-commit at gcc dot gnu.org
@ 2023-07-12 20:24 ` redi at gcc dot gnu.org
  33 siblings, 0 replies; 35+ messages in thread
From: redi at gcc dot gnu.org @ 2023-07-12 20:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95048

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #31 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #24)
> Yes, I agree. I'll get those added.

Tests updated to include conversion to wide strings with wstring().

^ permalink raw reply	[flat|nested] 35+ messages in thread
