public inbox for libc-alpha@sourceware.org
From: Jonathan Nieder <jrnieder@gmail.com>
To: Carlos O'Donell <carlos@redhat.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	Florian Weimer <fweimer@redhat.com>,
	Rich Felker <dalias@aerifal.cx>, Mike Fabian <mfabian@redhat.com>,
	Zorro Lang <zlang@redhat.com>,
	"Joseph S. Myers" <joseph@codesourcery.com>
Subject: Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).
Date: Thu, 26 Jul 2018 01:33:00 -0000
Message-ID: <20180726013351.GC217613@aiede.svl.corp.google.com>
In-Reply-To: <9d6f47ec-f9eb-ead0-889c-3b9aae66551c@redhat.com>

Hi,

Carlos O'Donell wrote:

> In commit 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4 we updated all of
> the collation data to harmonize with the new version of ISO 14651
> which is derived from Unicode 9.0.0.  This collation update brought
> with it some changes to locales which were not desirable by some
> users, in particular it altered the meaning of the
> locale-dependent-range regular expression, namely [a-z] and [A-Z], and
> for en_US it caused uppercase letters to be matched by [a-z] for the
> first time.

The Debian system where it is most convenient for me to test has
Debian's libc6 package, version 2.24-12.  There, [a-z] matches
uppercase letters.  I've always considered that undesirable, but I'm
confused about the described regression.  Did one of Debian's patches to
localedata cause it to pick up the regression early (by which I mean,
more than 5 years ago)?

> In glibc we implement the requirement of ISO POSIX-2:1993 and use
> collation element order (CEO) to construct the range expression, the
> API internally is __collseq_table_lookup().  The fact that we use CEO
> and also have 4-level weights on each collation rule means that we can
> in practice reorder the collation rules in iso14651_t1_common (the new
> data) to provide consistent range expression resolution *and* the
> weights should maintain the expected total order.
[...]
> * Adds new test data en_US.UTF-8.in for sort-test.sh which exercises
>   strcoll* and strxfrm* and ensures the ISO 14651 collation remains.

Cool!  Checking my understanding: does this mean that if I have files

	lll
	MMM
	nnn

then with this patch,

	echo [a-z]*

would no longer match MMM, and

	ls | sort

would continue to sort in the order lll < MMM < nnn?
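
One rough way to check both behaviours on a given system is sketched
below.  The file names come from the example above; the helper names
glob_in and sort_in, the temporary directory, and the choice of
en_US.UTF-8 are illustrative assumptions.  The C locale lines are only
a baseline.  What the en_US.UTF-8 lines print depends on the installed
glibc and locale data, and on whether the shell behind sh consults the
locale for range expressions at all.

```shell
#!/bin/sh
# Illustrative check only: glob_in and sort_in are hypothetical helpers,
# not part of the patch or of glibc.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch lll MMM nnn

# Expand the glob in a child shell started under the given locale.
glob_in() { LC_ALL=$1 sh -c 'echo [a-z]*'; }

# Sort the file names under the given locale and join them on one line.
sort_in() { ls | LC_ALL=$1 sort | xargs; }

# Baseline (C locale), then the locale discussed in this thread.
for loc in C en_US.UTF-8; do
    printf '%s glob: %s\n' "$loc" "$(glob_in "$loc")"
    printf '%s sort: %s\n' "$loc" "$(sort_in "$loc")"
done
```

If en_US.UTF-8 is not installed, the second pair of lines silently
falls back to C locale behaviour, so it is worth confirming with
"locale -a" first.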

I wish we had done it 10 years ago. ;-)  Thanks for getting it done.

Jonathan

