Subject: Re: [WIPv5] Expected behaviour for a-z, A-Z, and 0-9 (Bug 23393).
From: Carlos O'Donell
To: Florian Weimer, libc-alpha@sourceware.org
Date: Mon, 30 Jul 2018 17:45:00 -0000
In-Reply-To: <52e7399c-1e6c-0ddc-ca04-3fed2410123d@redhat.com>

On 07/30/2018 01:39 PM, Florian Weimer wrote:
> On 07/28/2018 03:12 AM, Carlos O'Donell wrote:
>> On 07/26/2018 10:50 AM, Florian Weimer wrote:
>>> shs_CA: U+0000E6 matches /[a-z]/ unexpectedly
>>> shs_CA: U+0000C6 matches /[A-Z]/ unexpectedly
>>> shs_CA.utf8: U+0000E6 matches /[a-z]/ unexpectedly
>>> shs_CA.utf8: U+0000C6 matches /[A-Z]/ unexpectedly
>
>> This is a WIP, because the number of tests now is too big
>> to simply add them to tst-fnmatch.input, and
>> so I'm writing a new tester tst-rational-ranges.c. I'm parsing
>> SUPPORTED, expecting all of the locales to be built for testing,
>> and then running through all the rational ranges to test
>> inclusion of the required datums.
>
> Let me repeat my suggestion that we should initially fix the locales
> with the common collation order, where glibc 2.28 regresses.

I do not think it is appropriate to release rational range support
on only a subset of the SUPPORTED set of locales. Either we support
it on all SUPPORTED locales or we work until we are ready.

At present glibc 2.28 does not regress because of commit
7cd7d36f1feb3ccacf476e909b115b45cdd46e77 to deinterlace lower and
uppercase.

In glibc 2.28 we simply have ~2500 characters in the range of a-z,
and in 2.27 we had ~250; either way, it's still a large set of
non-ASCII characters accepted by the range, all because we caught up
to Unicode 9.0.0 with the ISO 14651 collation update (and will soon
be updated to Unicode 10.0.0 with the next release, probably always
lagging a bit).

I don't see an urgent need to get rational range support into 2.28.
I was happy to get it in earlier, but now with deeper testing showing
that not all locales are working correctly, I'm not happy to see this
go out the door.

I think it will be ready very shortly, and we can check it in
immediately into 2.29, and then continue our work on code point
ranges as the next step, which will require even more testing and
internal API cleanup.

--
Cheers,
Carlos.