* [:xdigit:] does not work with std::wstring in a Cygwin environment
@ 2022-02-11 16:02 Gans, Markus
2022-02-11 19:35 ` Corinna Vinschen
2022-02-13 18:25 ` Achim Gratz
0 siblings, 2 replies; 5+ messages in thread
From: Gans, Markus @ 2022-02-11 16:02 UTC
To: 'cygwin@cygwin.com'
This seems to be an internal Cygwin error:
https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/
------------------------------------------------------------------------------
I see unexpected behavior on Cygwin with the character class [:xdigit:]. When matching against a wide string, [:xdigit:] behaves like [:digit:]. With `std::string` everything works fine.
Example:
#include <iostream>
#include <regex>
#include <string>

int main ()
{
  // Wide string: match a single character against [[:xdigit:]]
  std::cout << "Wide character string\n";
  std::wstring w_character = L"a";

  if ( std::regex_match(w_character, std::wregex(L"[[:xdigit:]]")) )
    std::cout << "'" << char(w_character[0]) << "' is a hex digit\n";
  else
    std::cout << "'" << char(w_character[0]) << "' is not a hex digit\n";

  // Narrow string: same test with std::string and std::regex
  std::cout << "----------------------\n"
            << "String with 1 byte character\n";
  std::string character = "a";

  if ( std::regex_match(character, std::regex("[[:xdigit:]]")) )
    std::cout << "'" << character[0] << "' is a hex digit\n";
  else
    std::cout << "'" << character[0] << "' is not a hex digit\n";

  return 0;
}
Output in a Cygwin environment:
Wide character string
'a' is not a hex digit
----------------------
String with 1 byte character
'a' is a hex digit
Output on Linux:
Wide character string
'a' is a hex digit
----------------------
String with 1 byte character
'a' is a hex digit
Question: Why does Cygwin not detect the letters a, b, c, d, e, and f as hexadecimal digits in a wide string?
------------------------------------------------------------------------------
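To see exactly which characters are affected, the single-character test from the report can be extended to every hexadecimal digit. The following is only a sketch: the helper name unmatched_wide_hex_digits is made up for illustration and is not part of the original report. It returns the characters that the wide [[:xdigit:]] class fails to match, so an empty result means the class works as expected.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original report): collects every
// hexadecimal digit that std::wregex's [[:xdigit:]] class does NOT match.
std::wstring unmatched_wide_hex_digits()
{
  const std::wstring candidates = L"0123456789abcdefABCDEF";
  const std::wregex hex_class(L"[[:xdigit:]]");

  std::wstring unmatched;
  for (wchar_t c : candidates)
  {
    // Match a one-character wide string against the character class.
    if (!std::regex_match(std::wstring(1, c), hex_class))
      unmatched += c;
  }
  return unmatched;
}
```

If the description in the report is accurate, an affected Cygwin build would be expected to return the letters and none of the decimal digits, but that has not been verified here.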
* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
From: Corinna Vinschen @ 2022-02-11 19:35 UTC
To: cygwin; +Cc: Achim Gratz
On Feb 11 16:02, Gans, Markus wrote:
> This seems to be an internal Cygwin error:
>
> https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/
>
> ------------------------------------------------------------------------------
> I have an unexpected behavior with Cygwin for the character class [:xdigit:]. The pattern matching for [:xdigit:] behaves like the pattern matching of [:digit:] when using a wide string. With `std::string` everything works fine.
> [...]
> Question: Why does Cygwin not detect the letters a, b, c, d, e, and f as hexadecimal digits in a wide string?
> ------------------------------------------------------------------------------
This seems to be a bug in libstdc++. None of the functions above calls
any Cygwin-internal library function that could affect the result, i.e.
regcomp(3), regexec(3), isxdigit{_l}(3) or iswxdigit{_l}(3).
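One way to narrow this down is to ask the C library and the C++ library separately how they classify a wide character. The sketch below makes some assumptions: the two function names are invented for illustration, and it assumes that std::wregex ends up consulting the std::ctype<wchar_t> facet for [:xdigit:]. If the first function answers correctly and the second does not, the C library is not the culprit.

```cpp
#include <cwctype>
#include <locale>

// C library view: classification by iswxdigit() (hypothetical wrapper name).
bool c_library_is_xdigit(wchar_t c)
{
  return std::iswxdigit(static_cast<std::wint_t>(c)) != 0;
}

// C++ library view: classification by the std::ctype<wchar_t> facet of the
// classic "C" locale (hypothetical wrapper name).
bool ctype_facet_is_xdigit(wchar_t c)
{
  const auto& facet =
      std::use_facet<std::ctype<wchar_t>>(std::locale::classic());
  return facet.is(std::ctype_base::xdigit, c);
}
```

Calling both with L'a' should give the same answer on a correctly working system.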
Achim, any idea? Is wchar support broken in Cygwin's libstdc++, by any
chance?
Corinna
* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
From: Achim Gratz @ 2022-02-13 18:25 UTC
To: cygwin
Gans, Markus writes:
> This seems to be an internal Cygwin error:
>
> https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/
[…]
> Question: Why does Cygwin not detect the letters a, b, c, d, e, and f as hexadecimal digits in a wide string?
I have no idea; there don't seem to be any external libraries
involved. At a quick glance there also weren't any commits that would
obviously fix a bug in that area. There is no explicit OS-specific
configuration for Cygwin; the one for newlib gets used instead.
Please report this bug upstream.
Regards,
Achim.
* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
From: Hans-Bernhard Bröker @ 2022-02-15 1:36 UTC
To: cygwin
Am 13.02.2022 um 19:25 schrieb Achim Gratz:
> Gans, Markus writes:
>> This seems to be an internal Cygwin error:
>>
>> https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/
>
>>
[…]
>> Question: Why does Cygwin not detect the letters a, b, c, d, e, and
>> f as hexadecimal digits in a wide string?
[...]
> There is no OS specific configuration for Cygwin explicitly, instead
> there is one for newlib that actually gets used.
This piqued my curiosity, so I had a look at how libstdc++ is built. I
found that at least for one crucial source file, called
ctype_members.cc, cygwin builds do _not_ use the newlib edition, but
rather the "generic" one. And that may very well be the problem here.
The superficial cause of the problem is that member function
_M_initialize_ctype() in
libstdc++-v3/config/locale/generic/ctype_members.cc fills most of its
array _M_wmask[] with zeroes instead of meaningful character class
identifiers.
The slightly deeper reason is that the companion array _M_bit[] is also
suspiciously full of zeroes.
But the real problem, IMHO, is that the type ctype<wchar_t>::mask is
just a plain char. That is too narrow for the looped shift used to fill
_M_bit[]: the shifted bit overflows the type and the stored value ends up
as zero, which in turn leads to nonsense in _M_wmask[].
I didn't manage to find where this ctype<wchar_t>::mask is defined, but
the way it's used here cannot work if it's defined as plain char. The
newlib edition of ctype_members.cc loops over just 8 bits instead of
16, which would allow this to work.
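The truncation can be reproduced outside of libstdc++. The following is a minimal sketch, not the actual library code: count_zero_masks is a made-up name, and the loop only mimics the "shift 1 by 0..15 and store it in the mask type" pattern described above. It counts how many of the 16 single-bit values are lost when stored in a given type.

```cpp
#include <cstddef>

// Hypothetical illustration (not libstdc++ source): counts how many of the
// 16 single-bit values 1 << 0 ... 1 << 15 become zero when stored in MaskType.
template <typename MaskType>
std::size_t count_zero_masks()
{
  std::size_t zeros = 0;
  for (std::size_t i = 0; i <= 15; ++i)
  {
    // Same pattern as described in the text: shift, then narrow to the mask type.
    const MaskType bit = static_cast<MaskType>(1 << i);
    if (bit == 0)
      ++zeros;
  }
  return zeros;
}
```

With an 8-bit char, count_zero_masks<char>() yields 8, because bits 8 to 15 do not fit; with a 16-bit type such as unsigned short it yields 0.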
So we either have to pick up a different type definition of
ctype<wchar_t>::mask, or a different edition of ctype_members.cc --- I
guess it should be the newlib one.
* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
From: Hans-Bernhard Bröker @ 2022-02-17 23:11 UTC
To: cygwin
Am 15.02.2022 um 02:36 schrieb Hans-Bernhard Bröker:
> Am 13.02.2022 um 19:25 schrieb Achim Gratz:
>> There is no OS specific configuration for Cygwin explicitly, instead
>> there is one for newlib that actually gets used.
> This piqued my curiosity, so I had a look at how libstdc++ is built. I
> found that at least for one crucial source file, called
> ctype_members.cc, cygwin builds do _not_ use the newlib edition, but
> rather the "generic" one. And that may very well be the problem here.
[...]
I've taken the liberty of filing this upstream as a GCC/libstdc++ issue.
The extremely condensed version of the issue is that the libstdc++ build
selects config/os/newlib, but does not select --enable-clocale=newlib.
Enabling the more global --with-newlib flag would do the latter for us,
but it might have other, less desirable effects on top of that.