On 07/20/2018 03:19 PM, Florian Weimer wrote:
> On 07/20/2018 08:49 PM, Carlos O'Donell wrote:
>> On 07/19/2018 04:39 PM, Florian Weimer wrote:
>>> On 07/19/2018 09:43 PM, Carlos O'Donell wrote:
>>>> * Add back tests to tst-fnmatch.input and tst-regexloc.c which
>>>>   exercise that [a-z] does not match A or Z.
>>>
>>> [a-z] still matches ñ, 𝚗, but not 𝚣, which I doubt is useful.
>>
>> Sorry, I don't follow, it absolutely matches ASCII z.
>
> The z I wrote above is one of the non-BMP math characters.

Thanks :-}

It was a conservative solution.

>> We deinterlace the collation element ordering (not sequence) to get
>> the right range expression resolution.
>>
>> See the added fnmatch tests:
>>
>> +en_US.UTF-8     "a"     "[a-z]"     0
>> +en_US.UTF-8     "z"     "[a-z]"     0
>> +en_US.UTF-8     "A"     "[a-z]"     NOMATCH
>> +en_US.UTF-8     "Z"     "[a-z]"     NOMATCH
>> +en_US.UTF-8     "a"     "[A-Z]"     NOMATCH
>> +en_US.UTF-8     "z"     "[A-Z]"     NOMATCH
>> +en_US.UTF-8     "A"     "[A-Z]"     0
>> +en_US.UTF-8     "Z"     "[A-Z]"     0
>> +en_US.UTF-8     "0"     "[0-9]"     0
>> +en_US.UTF-8     "9"     "[0-9]"     0
>>
>> [a-z] matches a-z (including z), *and* all the lowercase in between,
>> and so behaves like :lower: effectively.
>
> There are characters equivalent to ASCII z (like the z above), but
> which sort after z, so they are not matched. This is one reason why
> I think this is a bad idea: it looks like [:lower:], but it's not.
> Same for [0-9], I assume.

Again, this is the conservative choice: it is how ranges worked before,
and they now work the same way again, while retaining the improvement
from the added ISO 14651 data.
>>> It's an improvement, and it may be good enough for glibc 2.28, but I would
>>> rather see us implement the rational ranges interpretation.
>>
>> That requires all ranges behave rationally?
>>
>> We could fix a-z, A-Z, and 0-9 easily.
>>
>> Patch attached.
>
> (NB: Patch is relative to the previous patch.)
>
> My enumeration tester likes it much more. 8-)

It was designed exactly for your enumerator ;-)

>   actual:   "abcdefghijklmnopqrstuvwxyz"
>   actual:   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>   actual:   "0123456789"
>
> That's for [a-z], [A-Z], [0-9], in en_US.UTF-8 and de_DE.ISO-8859-1.
> However, I still get this:
>
> tst-regex-classes.script:85:0: result character set difference in locale tr_TR.ISO-8859-9
> enumerate_chars '[a-z]' "abcdefghijklmnopqrstuvwxyz";
> ^
>   expected: "abcdefghijklmnopqrstuvwxyz"
>   actual:   "abcdefghjklmnopqrstuvwxyz"
>
> tst-regex-classes.script:86:0: result character set difference in locale tr_TR.ISO-8859-9
> enumerate_chars '[A-Z]' "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> ^
>   expected: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>   actual:   "ABCDEFGHJKLMNOPQRSTUVWXYZ"
> error: 2 test failures
>
> Can you fix this with data-only changes, too?

Yes, I need to duplicate the rational range for A-Z in tr_TR and remove
'i' since it's just fine the way it is, the existing

New patch attached, with additional tests in tst-fnmatch.input for
tr_TR in both UTF-8 and ISO-8859-9.

I noticed some equivalence class issues, so I filed a bug and added an
XFAIL-ish test case in tst-fnmatch.input:
https://sourceware.org/bugzilla/show_bug.cgi?id=23437

> posix/bug-regex17 regresses as well in the test for bug 9697, but I
> can incorporate that into my enumeration tester. I don't think the
> bug is actually regressing, it's just that the test objective is not
> expressed properly in it.

Fixed.
>
> posix/tst-rxspencer fails as well, presumably due to this:
>
> UTF-8 aA FAIL regcomp failed: Invalid range end
> UTF-8 aAcC FAIL regcomp failed: Invalid range end
>
> I think this happens because the test blindly replaces ASCII
> characters with non-ASCII characters, which causes issues if they are
> not ordered as expected.

Fixed.

v2
- Fixed tr_TR by duplicating the A-Z rational range.
- Fixed tst-rxspencer.
- Fixed bug-regex17.

Tell me how the new version does.

-- 
Cheers,
Carlos.