public inbox for libstdc++@gcc.gnu.org
* Re: [Mingw-w64-public] std::regex freezes in Japanese locale
From: Liu Hao @ 2021-01-24  8:10 UTC
  To: mingw-w64-public, libstdc++; +Cc: Hannes Domani, Jonathan Wakely


On 2021-01-24 02:05, Hannes Domani via Mingw-w64-public wrote:
> On Saturday, 23 January 2021, 16:46:18 CET, Jeroen Ooms <jeroen@berkeley.edu> wrote:
> 
>> A user of the R programming language has reported that std::regex
>> causes a hang for certain regular expressions when running in Japanese
>> locale. I was able to reproduce this both with our production
>> toolchain (mingw-w64 v5 + gcc 8) as well as the latest msys2
>> toolchains.
>>
>> Is this a bug in mingw-w64 or elsewhere? Below a minimal example:
>>
>> #include <clocale>
>> #include <regex>
>> int main() {
>>     std::setlocale(LC_ALL, "Japanese");
>>     std::regex reg("[0-9]");
>>     return 0;
>> }
> 
> I can reproduce this as well, it took 108 seconds to finish here.
> 
> Deep in regex is this function:
> std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>::_M_make_cache(std::integral_constant<bool, true>)
> 
> This caches the transformed values of the character values 0-255 for the
> current locale, using strxfrm_l [1].
> The transformation fails for many of them in the Japanese locale and, as
> documented, strxfrm_l returns INT_MAX in that case.
> But std::collate::do_transform does not handle any error case; it treats
> every return value as the length of the transformed string.
> It then creates a copy of this 2GB string, which takes a lot of time,
> about 1s for each failing character.
> 
> I think this should be reported to gcc (libstdc++).
> 
> 
> [1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strxfrm-wcsxfrm-strxfrm-l-wcsxfrm-l?view=msvc-160
> 
> 

Adding libstdc++ and jwakely to CC.

Despite what the Microsoft docs say, the standard `_?(str|wcs)xfrm(_l)?` functions have no return
value that indicates an error. This issue seems to be caused by invalid multibyte character
sequences (MBCS) being passed to `_strxfrm_l`, which should be avoided.
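
For illustration only, here is a minimal sketch of how a caller could refuse to treat an INT_MAX result as a real key length. The helper name `checked_transform` is hypothetical and is not what libstdc++ does. The INT_MAX check rests on the Microsoft CRT behaviour described above; standard C defines no error return for `strxfrm`, so this is a guard against that one behaviour rather than a portable error check.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper, not part of libstdc++ or mingw-w64.
// Stores the collation key of `s` (current C locale) in `key` and returns
// true, or returns false if strxfrm reports an implausible length.
bool checked_transform(const std::string& s, std::string& key)
{
    // strxfrm works on C strings, so an embedded NUL would truncate `s`.
    assert(s.find('\0') == std::string::npos);

    // With a null buffer and size 0, strxfrm only reports the key length.
    const std::size_t needed = std::strxfrm(nullptr, s.c_str(), 0);

    // Assumption from the thread: the Microsoft CRT returns INT_MAX on
    // failure. ISO C defines no error return, so this only guards against
    // that one behaviour.
    if (needed >= static_cast<std::size_t>(INT_MAX))
        return false;

    std::string buffer(needed + 1, '\0');   // +1 for the terminating NUL
    const std::size_t written =
        std::strxfrm(&buffer[0], s.c_str(), buffer.size());
    if (written >= buffer.size())
        return false;                       // second call disagreed

    buffer.resize(written);
    key = buffer;
    return true;
}
```

A caller would test the return value and, on failure, fall back to something cheap (for example, comparing the untransformed bytes) instead of allocating a buffer of the reported size.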




-- 
Best regards,
LH_Mouse



* Re: [Mingw-w64-public] std::regex freezes in Japanese locale
From: Jonathan Wakely @ 2021-01-24 16:41 UTC
  To: Liu Hao; +Cc: MinGW-64 Mailinglist, libstdc++, Hannes Domani

I think this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723
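
To check whether a given system is affected, one rough diagnostic is to count how many single-byte strings make `strxfrm` report a length of INT_MAX or more in the current locale. The function name `count_failing_bytes` is hypothetical, and the INT_MAX threshold is an assumption taken from the Microsoft CRT behaviour reported in this thread. The caller is expected to select the locale first, for example with the `std::setlocale(LC_ALL, "Japanese")` call from the original report.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

// Hypothetical diagnostic, not part of any library: counts the single-byte
// strings for which strxfrm reports a length >= INT_MAX in the current
// C locale.
int count_failing_bytes()
{
    int failures = 0;
    for (int value = 1; value < 256; ++value) {  // 0 would be the empty string
        const char text[2] = { static_cast<char>(value), '\0' };

        // Null buffer and size 0: only ask for the required key length.
        const std::size_t needed = std::strxfrm(nullptr, text, 0);

        // Assumption: INT_MAX signals failure (Microsoft CRT behaviour).
        if (needed >= static_cast<std::size_t>(INT_MAX))
            ++failures;
    }
    return failures;
}
```

A non-zero count would suggest that the bracket-matcher cache described earlier in the thread is hitting the slow path for that many characters.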
