From: Florian Weimer <fweimer@redhat.com>
To: Carlos O'Donell <carlos@redhat.com>,
GNU C Library <libc-alpha@sourceware.org>,
Rich Felker <dalias@aerifal.cx>, Mike Fabian <mfabian@redhat.com>,
Zorro Lang <zlang@redhat.com>,
"Joseph S. Myers" <joseph@codesourcery.com>
Subject: Re: [PATCH] Keep expected behaviour for [a-z] and [A-z] (Bug 23393).
Date: Fri, 20 Jul 2018 19:19:00 -0000
Message-ID: <5bcef059-b928-d2e9-82dd-2ae68be96020@redhat.com>
In-Reply-To: <f905879a-fd42-331e-eac1-46ed54d06d9e@redhat.com>
On 07/20/2018 08:49 PM, Carlos O'Donell wrote:
> On 07/19/2018 04:39 PM, Florian Weimer wrote:
>> On 07/19/2018 09:43 PM, Carlos O'Donell wrote:
>>> * Add back tests to tst-fnmatch.input and tst-regexloc.c which
>>> exercise that [a-z] does not match A or Z.
>>
>> [a-z] still matches ñ, ð, but not 𝚣, which I doubt is useful.
>
> Sorry, I don't follow, it absolutely matches ASCII z.

The z I wrote above is not ASCII z; it is one of the non-BMP math
characters.

> We deinterlace the collation element ordering (not sequence) to get
> the right range expression resolution.
>
> See the added fnmatch tests:
>
> +en_US.UTF-8 "a" "[a-z]" 0
> +en_US.UTF-8 "z" "[a-z]" 0
> +en_US.UTF-8 "A" "[a-z]" NOMATCH
> +en_US.UTF-8 "Z" "[a-z]" NOMATCH
> +en_US.UTF-8 "a" "[A-Z]" NOMATCH
> +en_US.UTF-8 "z" "[A-Z]" NOMATCH
> +en_US.UTF-8 "A" "[A-Z]" 0
> +en_US.UTF-8 "Z" "[A-Z]" 0
> +en_US.UTF-8 "0" "[0-9]" 0
> +en_US.UTF-8 "9" "[0-9]" 0
>
> [a-z] matches a-z (including z), *and* all the lowercase in between,
> and so behaves like [:lower:] effectively.

There are characters equivalent to ASCII z (like the z above) that
sort after z, so they are not matched. This is one reason why I think
this is a bad idea: it looks like [:lower:], but it's not. I assume
the same holds for [0-9].
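
To make the distinction concrete, here is a minimal sketch in Python (not glibc code). The collation order below is a hypothetical, heavily abbreviated stand-in for a real locale's ordering, and treating a "rational" range as a plain code point comparison is an assumption made for illustration only.

```python
import string

def build_order():
    """Hypothetical collation order: a..z with ñ after n and 𝚣 after z."""
    order = []
    for ch in string.ascii_lowercase:
        order.append(ch)
        if ch == "n":
            order.append("\u00f1")      # ñ sorts between n and o
    order.append("\U0001d6a3")          # 𝚣 (math monospace z) sorts after z
    return order

COLLATION_ORDER = build_order()
RANK = {ch: i for i, ch in enumerate(COLLATION_ORDER)}

def in_collation_range(ch, lo, hi):
    """Range membership by position in the (hypothetical) collation order."""
    rank = RANK.get(ch)
    return rank is not None and RANK[lo] <= rank <= RANK[hi]

def in_codepoint_range(ch, lo, hi):
    """Range membership by code point (assumed 'rational' interpretation)."""
    return ord(lo) <= ord(ch) <= ord(hi)
```

Under this model, the collation-based [a-z] accepts ñ but rejects 𝚣, while the code point range accepts neither, so it never looks like a partial [:lower:].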
>> It's an improvement, and it may be good enough for glibc 2.28, but I would
>> rather see us implement the rational ranges interpretation.
>
> That requires all ranges behave rationally?
>
> We could fix a-z, A-Z, and 0-9 easily.
>
> Patch attached.
(NB: Patch is relative to the previous patch.)
My enumeration tester likes it much more. 8-)

  actual: "abcdefghijklmnopqrstuvwxyz"
  actual: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  actual: "0123456789"

That's for [a-z], [A-Z], and [0-9], in en_US.UTF-8 and de_DE.ISO-8859-1.
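
For readers unfamiliar with the idea, an enumeration test of this kind can be sketched roughly as follows. This is not the actual tester: it uses Python's re module as a stand-in for regcomp/regexec, and Python compares bracket ranges by code point, so it cannot reproduce locale-dependent results. The function name mirrors the one in the test script; the charset parameter is an assumption.

```python
import re

def enumerate_chars(pattern, charset="iso8859-1"):
    """Return every character of a single-byte charset that pattern matches.

    Each of the 256 byte values is decoded and matched on its own, so the
    result is the full set of characters the bracket expression accepts.
    """
    compiled = re.compile(pattern)
    matched = []
    for byte in range(256):
        ch = bytes([byte]).decode(charset)
        if compiled.fullmatch(ch):
            matched.append(ch)
    return "".join(matched)
```

Comparing the returned string against an expected string catches both unexpected extra matches (such as accented letters) and missing ones.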
However, I still get this:

tst-regex-classes.script:85:0: result character set difference in locale tr_TR.ISO-8859-9
  enumerate_chars '[a-z]' "abcdefghijklmnopqrstuvwxyz";
  ^
  expected: "abcdefghijklmnopqrstuvwxyz"
  actual: "abcdefghjklmnopqrstuvwxyz"
tst-regex-classes.script:86:0: result character set difference in locale tr_TR.ISO-8859-9
  enumerate_chars '[A-Z]' "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  ^
  expected: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  actual: "ABCDEFGHJKLMNOPQRSTUVWXYZ"
error: 2 test failures

Can you fix this with data-only changes, too?
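
The difference in the tr_TR.ISO-8859-9 output is easy to miss by eye. A small helper (hypothetical, not part of the tester) makes it explicit; the strings are copied from the failure report above.

```python
def missing_chars(expected, actual):
    """Return the characters of expected that do not occur in actual."""
    present = set(actual)
    return "".join(ch for ch in expected if ch not in present)

lower = missing_chars("abcdefghijklmnopqrstuvwxyz",
                      "abcdefghjklmnopqrstuvwxyz")
upper = missing_chars("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                      "ABCDEFGHJKLMNOPQRSTUVWXYZ")
print(lower, upper)
```

The only characters dropped are i and I, presumably because of where the Turkish locale places its dotted and dotless i in the collation order.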
posix/bug-regex17 regresses as well in the test for bug 9697, but I can
incorporate that into my enumeration tester. I don't think the bug is
actually regressing; the test just does not express its objective
properly.
posix/tst-rxspencer fails as well, presumably due to this:
UTF-8 aA FAIL regcomp failed: Invalid range end
UTF-8 aAcC FAIL regcomp failed: Invalid range end
I think this happens because the test blindly replaces ASCII characters
with non-ASCII characters, which causes issues if they are not ordered
as expected.
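
As a rough illustration of that failure mode (a sketch only: the substitution table is hypothetical and is not the one tst-rxspencer uses, and Python's re orders ranges by code point rather than by locale collation), replacing one endpoint of a range can leave the start point ordered after the end point, at which point compilation fails:

```python
import re

# Hypothetical substitution table, chosen for illustration only.
SUBSTITUTIONS = {"a": "\u00e1", "A": "\u00c1"}   # a -> á, A -> Á

def substitute(pattern, table=SUBSTITUTIONS):
    """Blindly replace characters, ignoring where they sit in a range."""
    return "".join(table.get(ch, ch) for ch in pattern)

def compiles(pattern):
    """Return True if the pattern compiles, False on a regex error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

Here "[a-c]" compiles, but its substituted form "[á-c]" does not, because á is ordered after c. The regcomp "Invalid range end" errors above look like the same kind of problem, with the locale's collation order in place of code points.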
Thanks,
Florian