On 07/19/2018 04:39 PM, Florian Weimer wrote:
> On 07/19/2018 09:43 PM, Carlos O'Donell wrote:
>> * Add back tests to tst-fnmatch.input and tst-regexloc.c which 
>> exercise that [a-z] does not match A or Z.
> 
> [a-z] still matches ñ, ð, but not ð£, which I doubt is useful.

Sorry, I don't follow; it absolutely matches ASCII z.

We deinterlace the collation element ordering (not sequence) to get
the right range expression resolution.

See the added fnmatch tests:

+en_US.UTF-8     "a"                    "[a-z]"                0
+en_US.UTF-8     "z"                    "[a-z]"                0
+en_US.UTF-8     "A"                    "[a-z]"                NOMATCH
+en_US.UTF-8     "Z"                    "[a-z]"                NOMATCH
+en_US.UTF-8     "a"                    "[A-Z]"                NOMATCH
+en_US.UTF-8     "z"                    "[A-Z]"                NOMATCH
+en_US.UTF-8     "A"                    "[A-Z]"                0
+en_US.UTF-8     "Z"                    "[A-Z]"                0
+en_US.UTF-8     "0"                    "[0-9]"                0
+en_US.UTF-8     "9"                    "[0-9]"                0

[a-z] matches a-z (including z), *and* all the lowercase in between,
and so effectively behaves like :lower:.

[A-Z] matches A-Z (including Z), *and* all the uppercase in between,
and so effectively behaves like :upper:.

I left in all the matches for the accented characters because it was
the most conservative thing to do for now.

I could be persuaded otherwise, I think. Reading the old history
and seeing the new reports seems to indicate that we should back down
to behaving like C/POSIX in these cases.

> It's an improvement, and it may be good enough for glibc 2.28, but I would
> rather see us implement the rational ranges interpretation.

Does that require that all ranges behave rationally?

We could fix a-z, A-Z, and 0-9 easily.

Patch attached.

It has no effect on collation sequence. It will break scripts
that expect the new-style behaviour, and we knew that, but it
certainly aligns us with the pre-POSIX requirement and the rest of
the GNU tools implementing rational ranges, which is a much better
reason.

-- 
Cheers,
Carlos.