public inbox for cygwin@cygwin.com
* [:xdigit:] does not work with std::wstring in a Cygwin environment
@ 2022-02-11 16:02 Gans, Markus
  2022-02-11 19:35 ` Corinna Vinschen
  2022-02-13 18:25 ` Achim Gratz
  0 siblings, 2 replies; 5+ messages in thread
From: Gans, Markus @ 2022-02-11 16:02 UTC (permalink / raw)
  To: 'cygwin@cygwin.com'

This seems to be an internal Cygwin error:

https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/

------------------------------------------------------------------------------
I am seeing unexpected behavior in Cygwin for the character class [:xdigit:]. With a wide string, pattern matching for [:xdigit:] behaves like pattern matching for [:digit:]. With `std::string` everything works fine.

Example:

    #include <iostream>
    #include <string>
    #include <regex>
    
    int main ()
    {
      std::cout << "Wide character string\n";
      std::wstring w_character = L"a";
    
      if ( regex_match(w_character, std::wregex(L"[[:xdigit:]]")) )
        std::cout << "'" << char(w_character[0]) << "' is a hex digit\n";
      else
        std::cout << "'" << char(w_character[0]) << "' is not a hex digit\n";
    
      std::cout << "----------------------\n"
                << "String with 1 byte character\n";
      std::string character = "a";
    
      if ( regex_match(character, std::regex("[[:xdigit:]]")) )
        std::cout << "'" << character[0] << "' is a hex digit\n";
      else
        std::cout << "'" << character[0] << "' is not a hex digit\n";
    
      return 0;
    }

Output in a Cygwin environment:

    Wide character string
    'a' is not a hex digit
    ----------------------
    String with 1 byte character
    'a' is a hex digit

Output on Linux:

    Wide character string
    'a' is a hex digit
    ----------------------
    String with 1 byte character
    'a' is a hex digit

Question: Why does Cygwin not detect the letters a, b, c, d, e, and f as hexadecimal digits in a wide string?
------------------------------------------------------------------------------




* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
  2022-02-11 16:02 [:xdigit:] does not work with std::wstring in a Cygwin environment Gans, Markus
@ 2022-02-11 19:35 ` Corinna Vinschen
  2022-02-13 18:25 ` Achim Gratz
  1 sibling, 0 replies; 5+ messages in thread
From: Corinna Vinschen @ 2022-02-11 19:35 UTC (permalink / raw)
  To: cygwin; +Cc: Achim Gratz

On Feb 11 16:02, Gans, Markus wrote:
> This seems to be an internal Cygwin error:
> 
> https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/
> [...]
> Question: Why does Cygwin not detect the letters a, b, c, d, e, and f as hexadecimal digits in a wide string?

This seems to be a bug in libstdc++.  None of the functions above call
any Cygwin-internal library function that could affect the result, i.e.,
regcomp(3), regexec(3), isxdigit{_l}(3), or iswxdigit{_l}(3).

Achim, any idea?  Is wchar support broken in Cygwin's libstdc++, by any
chance?


Corinna


* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
  2022-02-11 16:02 [:xdigit:] does not work with std::wstring in a Cygwin environment Gans, Markus
  2022-02-11 19:35 ` Corinna Vinschen
@ 2022-02-13 18:25 ` Achim Gratz
  2022-02-15  1:36   ` Hans-Bernhard Bröker
  1 sibling, 1 reply; 5+ messages in thread
From: Achim Gratz @ 2022-02-13 18:25 UTC (permalink / raw)
  To: cygwin

Gans, Markus writes:
> This seems to be an internal Cygwin error:
>
> https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/
[…]
> Question: Why does Cygwin not detect the letters a, b, c, d, e, and f as hexadecimal digits in a wide string?

I have no idea; there don't seem to be any external libraries
involved.  At a quick glance there also weren't any commits that would
obviously fix a bug in that area.  There is no OS-specific
configuration explicitly for Cygwin; instead, the one for newlib
is what actually gets used.

Please report this bug upstream.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Samples for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldSamplesExtra


* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
  2022-02-13 18:25 ` Achim Gratz
@ 2022-02-15  1:36   ` Hans-Bernhard Bröker
  2022-02-17 23:11     ` Hans-Bernhard Bröker
  0 siblings, 1 reply; 5+ messages in thread
From: Hans-Bernhard Bröker @ 2022-02-15  1:36 UTC (permalink / raw)
  To: cygwin

On 13.02.2022 at 19:25, Achim Gratz wrote:
> Gans, Markus writes:
>> This seems to be an internal Cygwin error:
>> 
>> https://www.reddit.com/r/cpp_questions/comments/sp52gq/xdigit_does_not_work_with_stdwstring_in_a_cygwin/
>
>> 
[…]
>> Question: Why does Cygwin not detect the letters a, b, c, d, e, and
>> f as hexadecimal digits in a wide string?

[...]

> There is no OS specific configuration for Cygwin explicitly, instead
> there is one for newlib that actually gets used.

This piqued my curiosity, so I had a look at how libstdc++ is built.  I
found that at least for one crucial source file, called
ctype_members.cc, cygwin builds do _not_ use the newlib edition, but
rather the "generic" one.  And that may very well be the problem here.

The superficial cause of the problem is that the member function
_M_initialize_ctype() in
libstdc++-v3/config/locale/generic/ctype_members.cc fills most of its
array _M_wmask[] with zeroes instead of meaningful character class
identifiers.

The slightly deeper reason is that the companion array _M_bit[] is also
suspiciously full of zeroes.

But the real problem, IMHO, is that the type ctype<wchar_t>::mask is
just a plain char.  That overflows the looped shift used to fill
_M_bit[], which in turn leads to nonsense in _M_wmask[].

I didn't manage to find where this ctype<wchar_t>::mask is defined, but 
the way it's used here cannot work if it's defined as plain char.  The
newlib edition of ctype_members.cc loops over just 8 bits instead of
16, which would allow this to work.

So we either have to pick up a different type definition of 
ctype<wchar_t>::mask, or a different edition of ctype_members.cc --- I 
guess it should be the newlib one.


* Re: [:xdigit:] does not work with std::wstring in a Cygwin environment
  2022-02-15  1:36   ` Hans-Bernhard Bröker
@ 2022-02-17 23:11     ` Hans-Bernhard Bröker
  0 siblings, 0 replies; 5+ messages in thread
From: Hans-Bernhard Bröker @ 2022-02-17 23:11 UTC (permalink / raw)
  To: cygwin

On 15.02.2022 at 02:36, Hans-Bernhard Bröker wrote:
> On 13.02.2022 at 19:25, Achim Gratz wrote:

>> There is no OS specific configuration for Cygwin explicitly, instead
>> there is one for newlib that actually gets used.

> This piqued my curiosity, so I had a look at how libstdc++ is built.  I
> found that at least for one crucial source file, called
> ctype_members.cc, cygwin builds do _not_ use the newlib edition, but
> rather the "generic" one.  And that may very well be the problem here.
[...]

I've taken the liberty of filing this upstream as a GCC/libstdc++ issue.

The extremely condensed version of the issue is that the libstdc++ build
selects config/os/newlib, but does not pick --enable-clocale=newlib.

Enabling the more global --with-newlib flag would do the latter for us,
but it might have other, less desirable effects on top of that.
