From: Jonathan Wakely
Date: Sun, 24 Jan 2021 16:41:34 +0000
Subject: Re: [Mingw-w64-public] std::regex freezes in Japanese locale
To: Liu Hao
Cc: MinGW-64 Mailinglist, "libstdc++", Hannes Domani

On Sun, 24 Jan 2021 at 08:15, Liu Hao wrote:
>
> On 2021-01-24 02:05, Hannes Domani via Mingw-w64-public wrote:
> > On Saturday, 23 January 2021, 16:46:18 CET, Jeroen Ooms wrote:
> >
> >> A user of the R programming language has reported that std::regex
> >> causes a hang for certain regular expressions when running in Japanese
> >> locale. I was able to reproduce this both with our production
> >> toolchain (mingw-w64 v5 + gcc 8) as well as the latest msys2
> >> toolchains.
> >>
> >> Is this a bug in mingw-w64 or elsewhere? Below is a minimal example:
> >>
> >> #include <regex>
> >> int main() {
> >>   setlocale(LC_ALL, "Japanese");
> >>   std::regex reg("[0-9]");
> >>   return 0;
> >> }
> >
> > I can reproduce this as well, it took 108 seconds to finish here.
> >
> > Deep in regex is this function:
> > std::__detail::_BracketMatcher, false, false>::_M_make_cache(std::integral_constant)
> >
> > This caches transformed values of the Unicode values 0-255 to the current
> > locale, with strxfrm_l [1].
> > This fails for a lot of them for Japanese, and as documented, strxfrm_l
> > returns INT_MAX in this case.
> > But std::collate::do_transform does not handle any error case, it uses all
> > return values as the length of the transformed string.
> > And then it creates a copy of this 2GB string, which takes a lot of time,
> > around 1s for each failing character.
> >
> > I think this should be reported to gcc (libstdc++).
> >
> >
> > [1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strxfrm-wcsxfrm-strxfrm-l-wcsxfrm-l?view=msvc-160
> >
> >
>
> Add CC libstdc++ and jwakely.
>
> Despite Microsoft docs, the standard `_?(str|wcs)xfrm(_l)?` functions don't have return values to
> indicate errors. This issue seems to be caused by invalid MBCSs passed to `_strxfrm_l`, which should
> be avoided.

I think this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723
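
To illustrate the failure mode described in the quoted analysis, here is a minimal sketch of a length check around `std::strxfrm`. It is not the libstdc++ code and not the fix tracked in the bug report: `checked_transform` and its `max_len` cap are hypothetical names invented for this example, and the `INT_MAX` comparison assumes the Microsoft runtime behaviour cited in [1]. The idea is only that the reported length gets validated before it is used to size an allocation.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical helper (not libstdc++ code): transform `s` with std::strxfrm,
// but refuse to allocate when the reported length is implausible.
std::optional<std::string> checked_transform(const char* s,
                                             std::size_t max_len = 1u << 20)
{
    // With a zero-sized buffer, strxfrm only reports the length it needs.
    const std::size_t needed = std::strxfrm(nullptr, s, 0);

    // Assumption: the Microsoft runtime reports INT_MAX on error. Anything
    // above the caller's cap is rejected the same way.
    if (needed >= static_cast<std::size_t>(INT_MAX) || needed > max_len)
        return std::nullopt;

    std::string out(needed + 1, '\0');  // extra byte for the terminating NUL
    const std::size_t written = std::strxfrm(out.data(), s, out.size());
    if (written > needed)               // result would not fit: treat as failure
        return std::nullopt;

    out.resize(written);                // drop the NUL and any unused tail
    return out;
}
```

A caller building a cache for the byte values 0-255 could skip any entry for which `checked_transform` returns `std::nullopt`, rather than copying a string whose length was taken from an error code.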