From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16859 invoked by alias); 6 Feb 2015 06:55:36 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 16802 invoked by uid 48); 6 Feb 2015 06:55:31 -0000
From: "timshen at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/63776] [C++11] Regex collate matching not working
Date: Fri, 06 Feb 2015 06:55:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: timshen at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg00539.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63776

--- Comment #8 from Tim Shen ---
I'm not sure how you call boost::regex in your code; here's what I did:

// g++ b.cc -lboost_regex -licuuc
#include <boost/regex/icu.hpp>
#include <iostream>
#include <locale>
#include <string>

using namespace boost;

int main() {
  std::locale loc("en_US.UTF-8");
  std::string s(u8"Ī");
  u32regex re = make_u32regex("[[:alpha:]]");
  std::cout << u32regex_match(s.data(), s.data() + s.size(), re) << "\n";
  return 0;
}

If this is the way that we do utf-8 matching using boost, then I don't think
std::regex_match and boost::u32regex_match (notice that it's not
boost::regex_match) have the same semantics.
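
The difference between the two calls can be seen without Boost. What follows is only a minimal sketch under stated assumptions, not the libstdc++ or Boost implementation: decode_utf8 and self_check are hypothetical names introduced here, the decoder is hand-written and does not validate continuation bytes, and wchar_t is assumed wide enough to hold one code point (true with glibc, not on Windows). It uses "." rather than [[:alpha:]] so that the result does not depend on the active locale.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not part of std or Boost: turn UTF-8 bytes into one
// wchar_t per code point. Invalid lead bytes and truncated sequences become
// U+FFFD; continuation bytes are NOT validated (assumption: input is sane).
std::wstring decode_utf8(const std::string& in) {
    const wchar_t replacement = static_cast<wchar_t>(0xFFFD);
    std::wstring out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;
        std::size_t len = 0;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else { out.push_back(replacement); ++i; continue; }
        if (i + len > in.size()) { out.push_back(replacement); break; }
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(static_cast<wchar_t>(cp));
        i += len;
    }
    return out;
}

// Hypothetical demonstration of byte-wise versus code-point-wise matching.
void self_check() {
    const std::string s = "\xC4\xAA";  // U+012A encoded as two UTF-8 bytes
    // std::regex over char sees two elements, so a one-element pattern fails.
    assert(!std::regex_match(s, std::regex(".")));
    // After decoding there is a single wchar_t, which "." matches.
    const std::wstring w = decode_utf8(s);
    assert(w.size() == 1 && w[0] == 0x012A);
    assert(std::regex_match(w, std::wregex(L".")));
}
```

Calling self_check() from main exercises both cases. Whether a class such as [[:alpha:]] then accepts the decoded character still depends on the locale imbued in the regex traits, which this sketch deliberately leaves out.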

A user who uses boost::u32regex_match explicitly tells the library that "I
want a unicode match here, here's my regex object, with type u32regex, please
do the decoding and matching for me", and u32regex is actually
boost::basic_regex< ::UChar32, icu_regex_traits> with a library-defined
regex_traits. u32regex_match, unlike std::regex_match, takes no user-defined
regex_traits type, only a u32regex.

I don't think std::regex_match should care about decoding a char string to a
wchar_t string and then calling the wchar_t instantiation of
std::regex_match, leaving a user-defined RegexTraits potentially unused.
Instead, the user can manually decode the utf-8 string (I'm sad we don't have
a standard char iterator adaptor which converts a utf-8 char iterator to a
char32_t iterator) and call std::regex_match<..., wchar_t, ...>.

This is my understanding, so it's surely possible that I have missed
something. Thoughts?

>>From gcc-bugs-return-476207-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Fri Feb 06 07:03:56 2015
Return-Path:
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 32281 invoked by alias); 6 Feb 2015 07:03:56 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 32123 invoked by uid 48); 6 Feb 2015 07:03:52 -0000
From: "timshen at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/63775] [C++11] Regex range with leading dash (-) not working
Date: Fri, 06 Feb 2015 07:03:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: timshen at gcc dot gnu.org
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
 bug_status resolution
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg00540.txt.bz2
Content-length: 416

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63775

Tim Shen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Tim Shen ---
Fixed.