On 07/19/2018 04:39 PM, Florian Weimer wrote:
> On 07/19/2018 09:43 PM, Carlos O'Donell wrote:
>> * Add back tests to tst-fnmatch.input and tst-regexloc.c which
>>   exercise that [a-z] does not match A or Z.
>
> [a-z] still matches ñ, 𝚗, but not 𝚣, which I doubt is useful.

Sorry, I don't follow, it absolutely matches ASCII z.

We deinterlace the collation element ordering (not sequence) to get
the right range expression resolution.

See the added fnmatch tests:

+en_US.UTF-8 "a" "[a-z]" 0
+en_US.UTF-8 "z" "[a-z]" 0
+en_US.UTF-8 "A" "[a-z]" NOMATCH
+en_US.UTF-8 "Z" "[a-z]" NOMATCH
+en_US.UTF-8 "a" "[A-Z]" NOMATCH
+en_US.UTF-8 "z" "[A-Z]" NOMATCH
+en_US.UTF-8 "A" "[A-Z]" 0
+en_US.UTF-8 "Z" "[A-Z]" 0
+en_US.UTF-8 "0" "[0-9]" 0
+en_US.UTF-8 "9" "[0-9]" 0

[a-z] matches a-z (including z), *and* all the lowercase in between,
and so effectively behaves like :lower:.

[A-Z] matches A-Z (including Z), *and* all the uppercase in between,
and so effectively behaves like :upper:.

I left in all the matches for the accented characters because it was
the most conservative thing to do for now. I think I could be persuaded
otherwise: reading the old history and seeing the new reports seems to
indicate we should back down to behaving like C/POSIX in these cases.

> It's an improvement, and it may be good enough for glibc 2.28, but I would
> rather see us implement the rational ranges interpretation.

That requires all ranges behave rationally?

We could fix a-z, A-Z, and 0-9 easily. Patch attached.

It has no effect on collation sequence. It will break scripts that
expect the new-style behaviour, and we knew that, but it certainly
aligns us with the pre-POSIX requirement and the rest of the GNU tools
implementing rational ranges, which is a much better reason.

-- 
Cheers,
Carlos.