Subject: Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).
From: Carlos O'Donell
To: Andreas Schwab, Florian Weimer
Cc: Jonathan Nieder, GNU C Library, Rich Felker, Mike Fabian, Zorro Lang, "Joseph S. Myers"
Date: Thu, 26 Jul 2018 13:25:00 -0000
Message-ID: <6e16e657-d946-53ce-2010-daaa523cec1f@redhat.com>

On 07/26/2018 04:18 AM, Andreas Schwab wrote:
> On Jul 26 2018, Florian Weimer wrote:
>
>> The bash implementation of glob always uses strcoll/wcscoll ordering when
>> globasciirange is not active.  It does not use collation element ordering,
>> so rearranging collation data does not affect it.
>
> Why does strcoll not agree with the collation sequence?

There are two terms that mean very different things.

The strcoll output and collation sequence are the same.

The collation sequence is not the same as the collation element ordering
(the order of the rules in the source file).

POSIX mandated the use of collation element ordering (not sequence) for
regular expression ranges, and then decided this was a bad idea and instead
made it unspecified.

In glibc we continue to implement and support collation element ordering,
not collation sequence, for POSIX regular expression ranges.

Even collation sequence is a bad idea because [a-z] does not include all the
z's that are sorted after z (for example, accented forms), and you need
special collation element markers like AFTER-Z to find all the z's.

Instead we should use rational ranges and make everything based on code
points to make it portable across all locales.

Cheers,
Carlos.
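
To illustrate the difference between a rational (code point) range and a
collation-sequence range, here is a minimal sketch in C. The function names
in_rational_range and in_collation_range are hypothetical and are not glibc
or bash internals. The second function only approximates a strcoll/wcscoll
based range check of the kind described for bash above; it does not model
the collation element ordering that glibc uses for regular expression ranges.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper: rational range membership.  The answer depends
   only on code point values, so it is the same in every locale.  */
bool
in_rational_range (wchar_t c, wchar_t lo, wchar_t hi)
{
  return lo <= c && c <= hi;
}

/* Hypothetical helper: collation-sequence range membership.  The answer
   depends on the LC_COLLATE category of the current locale, because
   wcscoll compares according to that locale's collation sequence.  */
bool
in_collation_range (wchar_t c, wchar_t lo, wchar_t hi)
{
  wchar_t cs[2] = { c, L'\0' };
  wchar_t los[2] = { lo, L'\0' };
  wchar_t his[2] = { hi, L'\0' };

  return wcscoll (los, cs) <= 0 && wcscoll (cs, his) <= 0;
}
```

With the first helper, in_rational_range (L'_', L'A', L'z') is true in any
locale because U+005F lies between U+0041 and U+007A, while
in_rational_range (L'B', L'a', L'z') is false.  The result of the second
helper can change once a program calls setlocale with a non-"C" locale,
which is the portability problem that code point ranges avoid.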