From: Carlos O'Donell
To: Florian Weimer, GNU C Library, Rich Felker, Mike Fabian, Zorro Lang, "Joseph S. Myers"
Subject: Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).
Date: Thu, 26 Jul 2018 01:20:00 -0000
Message-ID: <1313f0d2-8c64-8ec0-ef09-cd39bd6d4416@redhat.com>
In-Reply-To: <646a94c8-3b25-b65e-7fc7-0637e58cacc1@redhat.com>
References: <9d6f47ec-f9eb-ead0-889c-3b9aae66551c@redhat.com> <4de6a552-8b4c-ffe0-caf2-0a2d07a908f4@redhat.com> <646a94c8-3b25-b65e-7fc7-0637e58cacc1@redhat.com>

On 07/25/2018 06:50 PM, Florian Weimer wrote:
> On 07/25/2018 11:35 PM, Carlos O'Donell wrote:
>> I have committed only the most conservative fix for this issue,
>> which is to deinterlace the lower and upper case ranges.
>>
>> I think we are too late to commit rational ranges, and we can do
>> that in 2.29 when it opens. Right now I want to remove the blocker
>> that is causing regressions for en_US.UTF-8 scripts that use [a-z],
>> and [A-Z].
>
> How is this the most conservative fix, relative to glibc 2.27
> upstream?

We have two solutions to fix the regression:

* Revert the entire ISO 14651 update.
  - This is 13 commits for just the update.
  - Several more commits for Rafal and Mike's work on locales on top of that.

* Fix the key issue of a-z interleaving with A-Z.

My opinion is that it is most conservative to fix the interleaving.

In 2.27 we accepted 297 characters between A-Z.

In 2.28 we accept 2280 characters between A-Z as part of the ISO 14651 update.

> [a-z] still matches lots of non-ASCII characters, which it did not
> before.

This is not true, we were already matching 297 characters between A-Z in 2.27.

It has always been the case that we accepted non-ASCII characters in the range.

With the ISO 14651 update the *key* issue was that lowercase and uppercase
were now mixed in collation element ordering, resulting in surprising matches
and failures like the reported xfs test failure where [a-z] matched "Makefile"
and broke their test infrastructure.

> When I meant that we left regression-fixing territory, I was talking
> about the locales which had iso14651_t1_common customizations.

OK, so to be clear you think we *should* go forward with rational ranges?

I don't think it's too late, we could commit it tomorrow, it should not
impact machine testing in any way.

My v4 fixes all of the locales that either have customizations on
iso14651_t1_common or have their own custom locales. No more locales
remain to be fixed, I tested all of them with tst-fnmatch.input
additions to catch the ones that needed fixing.

Cheers,
Carlos.
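
For readers following the thread, the interleaving problem described above can be sketched with a toy model. This is only an illustration under stated assumptions: the two collation orders below are made-up 52-character sequences, not glibc's actual ISO 14651 tables, and the helper names (in_range, all_in_range) are hypothetical. It shows why a range evaluated by collation position can match "Makefile" when lowercase and uppercase are interleaved, and why it no longer does once the two cases are deinterlaced.

```python
import string

# Toy collation orders (assumptions for illustration, not glibc's real tables).
# Interleaved: a A b B ... z Z   (lowercase and uppercase mixed)
INTERLEAVED = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)
# Deinterlaced: a b ... z A B ... Z   (each case kept contiguous)
DEINTERLACED = string.ascii_lowercase + string.ascii_uppercase


def in_range(ch, lo, hi, order):
    """Return True if ch sorts between lo and hi (inclusive) in `order`."""
    pos = order.find(ch)
    return pos != -1 and order.index(lo) <= pos <= order.index(hi)


def all_in_range(word, lo, hi, order):
    """Return True if every character of word falls inside the range lo-hi."""
    return all(in_range(c, lo, hi, order) for c in word)


if __name__ == "__main__":
    # With interleaving, 'M' sorts between 'a' and 'z', so the whole word matches.
    print(all_in_range("Makefile", "a", "z", INTERLEAVED))
    # With the cases deinterlaced, 'M' sorts after 'z' and the match fails.
    print(all_in_range("Makefile", "a", "z", DEINTERLACED))
```

In this toy interleaved order only 'Z' falls outside a-z among the uppercase letters, which is the kind of surprising result the xfs test ran into. The real tables also place many non-ASCII characters inside the range, which this sketch does not model.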