public inbox for libc-alpha@sourceware.org
From: Carlos O'Donell <carlos@redhat.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	Florian Weimer <fweimer@redhat.com>,
	Rich Felker <dalias@aerifal.cx>, Mike Fabian <mfabian@redhat.com>,
	Zorro Lang <zlang@redhat.com>,
	"Joseph S. Myers" <joseph@codesourcery.com>
Subject: Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).
Date: Thu, 26 Jul 2018 03:48:00 -0000
Message-ID: <2cf8f1be-dd69-e2ed-9ad2-8ec14e776d8b@redhat.com>
In-Reply-To: <20180726021643.GE217613@aiede.svl.corp.google.com>

On 07/25/2018 10:16 PM, Jonathan Nieder wrote:
> Carlos O'Donell wrote:
>> On 07/25/2018 09:33 PM, Jonathan Nieder wrote:
> 
>>> The Debian system where it is most convenient for me to test has
>>> Debian's libc6 package, version 2.24-12.  [a-z] matches uppercase
>>> letters.  I've always considered that undesirable but I'm confused
>>> about the described regression.  Did one of Debian's patches to
>>> localedata cause it to pick up the regression early (by which I mean,
>>> more than 5 years ago)?
>>
>> It depends entirely on the locale you use. Some locales already have
>> [a-z] matching uppercase and have had it for years. The problem is that
>> this is new for en_US.UTF-8.
>>
>> Which locale did you use? en_US.UTF-8? If so, then yes, Debian must have
>> done something different with iso14651_t1_common to change this, or added
>> something else. I did a quick look at the debian patches for 2.24-12 and
>> didn't see anything that would change this materially for en_US.
> 
> I tried with the following locales:
> 
>  en_US:		matches (bad)
>  en_US.UTF-8:	matches (bad)
>  C:		does not match (good)
>  C.UTF-8:	does not match (good)
>  fr_CH:		matches (bad)
>  fr_CH.UTF-8:	matches (bad)
> 
> Looking over
> https://salsa.debian.org/glibc-team/glibc/tree/sid/debian/patches/localedata
> and https://salsa.debian.org/glibc-team/glibc/tree/sid/debian/patches/locale,
> I don't see any obvious culprits.  Anyway, please just take this as more
> feedback in favor of your approach.
> 
> See the user reports merged with https://bugs.debian.org/301717.

This is your shell doing the expanding and, worse, doing it
differently from glibc.

My bash shell also handles [a-z] expansion differently depending
on the locale data. It appears to use the collation sequence,
i.e. the order in which the elements sort.

Using grep doesn't result in these matches.

The fix is this: `shopt -s globasciiranges`, and we should
make it the default from now on. The option turns on rational
ranges for bash. Florian found this out when digging into
the issue.
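
A minimal sketch of checking this in a script, assuming bash 4.3 or
newer (where the globasciiranges option exists). The helper name
in_lower_range is hypothetical and only for illustration; it is not
part of bash or glibc. With the option set, the range should compare
as in the traditional C locale regardless of the active locale:

```shell
#!/usr/bin/env bash
# Sketch: check whether a single character falls inside the glob range [a-z].

# Turn on ASCII ("rational") ranges when running under bash.
# The guard and the fallback keep the script usable in shells without shopt
# and in bash versions that do not know the option.
if [ -n "${BASH_VERSION:-}" ]; then
    shopt -s globasciiranges 2>/dev/null || true
fi

# in_lower_range CHAR: succeed if CHAR matches the pattern [a-z].
# (Hypothetical helper name, chosen for this example only.)
in_lower_range() {
    case "$1" in
        [a-z]) return 0 ;;
        *)     return 1 ;;
    esac
}

# Report the result for a few lowercase and uppercase letters.
for ch in a B z Z; do
    if in_lower_range "$ch"; then
        echo "$ch: matches [a-z]"
    else
        echo "$ch: does not match [a-z]"
    fi
done
```

Running the same loop after `shopt -u globasciiranges` in a locale
such as en_US.UTF-8 is one way to compare against the collation-based
behaviour described above.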

We have a lot of cleanup to do to get rational ranges enabled
at each step of expansion.

Cheers,
Carlos.

