From: Carlos O'Donell To: glibc-cvs@sourceware.org Subject: [glibc/codonell/c-utf8] Fix fnmatch and regcomp for zero collation rule locales. X-Act-Checkin: glibc X-Git-Author: Carlos O'Donell X-Git-Refname: refs/heads/codonell/c-utf8 X-Git-Oldrev: 147ed422f86c5753b861e473dceca0b7787f0f65 X-Git-Newrev: 3bca8f2cb69f1cc453511e1b8ac5be5cceb35bd7 Message-Id: <20210727173423.8E6843855005@sourceware.org> Date: Tue, 27 Jul 2021 17:34:23 +0000 (GMT) List-Id: Glibc-cvs mailing list https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3bca8f2cb69f1cc453511e1b8ac5be5cceb35bd7 commit 3bca8f2cb69f1cc453511e1b8ac5be5cceb35bd7 Author: Carlos O'Donell Date: Tue Jul 27 13:32:26 2021 -0400 Fix fnmatch and regcomp for zero collation rule locales. Add test coverage for a zero rule locale via additional testing in bug-regex1, bug-regex4, bug-regex6, bug-regex19, transbug, tst-fnmatch, tst-fnmatch7, tst-regcomp-truncated, and tst-regex using C.UTF-8 (zero collation rule locale).
Diff: --- posix/Makefile | 180 +++++++++++++++---- posix/bug-regex1.c | 20 +++ posix/bug-regex19.c | 22 ++- posix/bug-regex4.c | 25 +++ posix/bug-regex6.c | 2 +- posix/fnmatch_loop.c | 29 +-- posix/regcomp.c | 12 +- posix/transbug.c | 22 ++- posix/tst-fnmatch.input | 397 ++++++++++++++++++++++++++++++++++++++++++ posix/tst-fnmatch7.c | 18 ++ posix/tst-regcomp-truncated.c | 1 + posix/tst-regex.c | 25 ++- 12 files changed, 678 insertions(+), 75 deletions(-) diff --git a/posix/Makefile b/posix/Makefile index 059efb3cd2..61aba17db8 100644 --- a/posix/Makefile +++ b/posix/Makefile @@ -73,41 +73,133 @@ routines := \ globfree64-time64 aux := init-posix environ -tests := test-errno tstgetopt testfnm runtests runptests \ - tst-preadwrite tst-preadwrite64 test-vfork regexbug1 \ - tst-mmap tst-mmap-offset tst-getaddrinfo tst-truncate \ - tst-truncate64 tst-fork tst-fnmatch tst-regexloc tst-dir \ - tst-chmod bug-regex1 bug-regex2 bug-regex3 bug-regex4 \ - tst-gnuglob tst-gnuglob64 tst-regex bug-regex6 bug-regex7 \ - bug-regex8 bug-regex9 bug-regex10 bug-regex11 bug-regex12 \ - bug-regex13 bug-regex14 bug-regex15 bug-regex16 \ - bug-regex17 bug-regex18 bug-regex19 \ - bug-regex21 bug-regex22 bug-regex23 bug-regex24 \ - bug-regex25 bug-regex26 bug-regex27 bug-regex28 \ - bug-regex29 bug-regex30 bug-regex31 bug-regex32 \ - tst-nice tst-nanosleep tst-regex2 \ - transbug tst-rxspencer tst-pcre tst-boost \ - bug-ga1 tst-vfork1 tst-vfork2 \ - tst-waitid tst-wait4 tst-wait3 \ - tst-getaddrinfo2 bug-glob2 bug-glob3 tst-sysconf \ - tst-execvp1 tst-execvp2 tst-execlp1 tst-execlp2 \ - tst-execv1 tst-execv2 tst-execl1 tst-execl2 \ - tst-execve1 tst-execve2 tst-execle1 tst-execle2 \ - tst-execvp3 tst-execvp4 \ - tst-execvpe1 tst-execvpe2 tst-execvpe3 tst-execvpe4 \ - tst-execvpe5 tst-execvpe6 \ - tst-getaddrinfo3 tst-fnmatch2 tst-cpucount tst-cpuset \ - bug-getopt1 bug-getopt2 bug-getopt3 bug-getopt4 \ - bug-getopt5 tst-getopt_long1 bug-regex34 bug-regex35 \ - tst-pathconf 
tst-rxspencer-no-utf8 \ - tst-fnmatch3 bug-regex36 \ - tst-fnmatch4 tst-fnmatch5 tst-fnmatch6 \ - tst-posix_spawn-fd tst-posix_spawn-setsid \ - tst-posix_fadvise tst-posix_fadvise64 \ - tst-sysconf-empty-chroot tst-glob_symlinks tst-fexecve \ - tst-glob-tilde test-ssize-max tst-spawn4 bug-regex37 \ - bug-regex38 tst-regcomp-truncated tst-spawn-chdir \ - tst-wordexp-nocmd tst-execveat tst-spawn5 +tests := \ + bug-ga1 \ + bug-getopt1 \ + bug-getopt2 \ + bug-getopt3 \ + bug-getopt4 \ + bug-getopt5 \ + bug-glob2 \ + bug-glob3 \ + bug-regex1 \ + bug-regex10 \ + bug-regex11 \ + bug-regex12 \ + bug-regex13 \ + bug-regex14 \ + bug-regex15 \ + bug-regex16 \ + bug-regex17 \ + bug-regex18 \ + bug-regex19 \ + bug-regex2 \ + bug-regex21 \ + bug-regex22 \ + bug-regex23 \ + bug-regex24 \ + bug-regex25 \ + bug-regex26 \ + bug-regex27 \ + bug-regex28 \ + bug-regex29 \ + bug-regex3 \ + bug-regex30 \ + bug-regex31 \ + bug-regex32 \ + bug-regex34 \ + bug-regex35 \ + bug-regex36 \ + bug-regex37 \ + bug-regex38 \ + bug-regex4 \ + bug-regex6 \ + bug-regex7 \ + bug-regex8 \ + bug-regex9 \ + runptests \ + runtests \ + test-errno \ + testfnm \ + test-ssize-max \ + test-vfork regexbug1 \ + transbug \ + tst-boost \ + tst-chmod \ + tst-cpucount \ + tst-cpuset \ + tst-dir \ + tst-execl1 \ + tst-execl2 \ + tst-execle1 \ + tst-execle2 \ + tst-execlp1 \ + tst-execlp2 \ + tst-execv1 \ + tst-execv2 \ + tst-execve1 \ + tst-execve2 \ + tst-execveat \ + tst-execvp1 \ + tst-execvp2 \ + tst-execvp3 \ + tst-execvp4 \ + tst-execvpe1 \ + tst-execvpe2 \ + tst-execvpe3 \ + tst-execvpe4 \ + tst-execvpe5 \ + tst-execvpe6 \ + tst-fexecve \ + tst-fnmatch \ + tst-fnmatch2 \ + tst-fnmatch3 \ + tst-fnmatch4 \ + tst-fnmatch5 \ + tst-fnmatch6 \ + tst-fnmatch7 \ + tst-fork \ + tst-getaddrinfo \ + tst-getaddrinfo2 \ + tst-getaddrinfo3 \ + tstgetopt \ + tst-getopt_long1 \ + tst-glob_symlinks \ + tst-glob-tilde \ + tst-gnuglob \ + tst-gnuglob64 \ + tst-mmap \ + tst-mmap-offset \ + tst-nanosleep \ + tst-nice \ + 
tst-pathconf \ + tst-pcre \ + tst-posix_fadvise \ + tst-posix_fadvise64 \ + tst-posix_spawn-fd \ + tst-posix_spawn-setsid \ + tst-preadwrite \ + tst-preadwrite64 \ + tst-regcomp-truncated \ + tst-regex \ + tst-regex2 \ + tst-regexloc \ + tst-rxspencer \ + tst-rxspencer-no-utf8 \ + tst-spawn4 \ + tst-spawn5 \ + tst-spawn-chdir \ + tst-sysconf \ + tst-sysconf-empty-chroot \ + tst-truncate \ + tst-truncate64 \ + tst-vfork1 \ + tst-vfork2 \ + tst-wait3 \ + tst-wait4 \ + tst-waitid \ + tst-wordexp-nocmd \ + # tests # Test for the glob symbol version that was replaced in glibc 2.27. ifeq ($(have-GLIBC_2.26)$(build-shared),yesyes) @@ -190,9 +282,20 @@ $(objpfx)wordexp-tst.out: wordexp-tst.sh $(objpfx)wordexp-test $(evaluate-test) endif -LOCALES := cs_CZ.UTF-8 da_DK.ISO-8859-1 de_DE.ISO-8859-1 de_DE.UTF-8 \ - en_US.UTF-8 es_US.ISO-8859-1 es_US.UTF-8 ja_JP.EUC-JP tr_TR.UTF-8 \ - cs_CZ.ISO-8859-2 +LOCALES := \ + C.UTF-8 \ + cs_CZ.UTF-8 \ + da_DK.ISO-8859-1 \ + de_DE.ISO-8859-1 \ + de_DE.UTF-8 \ + en_US.UTF-8 \ + es_US.ISO-8859-1 \ + es_US.UTF-8 \ + ja_JP.EUC-JP \ + tr_TR.UTF-8 \ + cs_CZ.ISO-8859-2 \ + # LOCALES + include ../gen-locales.mk $(objpfx)bug-regex1.out: $(gen-locales) @@ -222,6 +325,7 @@ $(objpfx)tst-regexloc.out: $(gen-locales) $(objpfx)tst-rxspencer.out: $(gen-locales) $(objpfx)tst-rxspencer-no-utf8.out: $(gen-locales) $(objpfx)tst-regcomp-truncated.out: $(gen-locales) +$(objpfx)tst-fnmatch7.out: $(gen-locales) endif # If we will use the generic uname implementation, we must figure out what diff --git a/posix/bug-regex1.c b/posix/bug-regex1.c index 38eb543951..85da8cc7ca 100644 --- a/posix/bug-regex1.c +++ b/posix/bug-regex1.c @@ -41,6 +41,26 @@ main (void) puts (" -> OK"); } + puts ("in C.UTF-8 locale"); + setlocale (LC_ALL, "C.UTF-8"); + s = re_compile_pattern ("[anù]*n", 7, &regex); + if (s != NULL) + { + puts ("re_compile_pattern return non-NULL value"); + result = 1; + } + else + { + match = re_match (&regex, "an", 2, 0, &regs); + if (match != 2) + { + printf 
("re_match returned %d, expected 2\n", match); + result = 1; + } + else + puts (" -> OK"); + } + puts ("in de_DE.ISO-8859-1 locale"); setlocale (LC_ALL, "de_DE.ISO-8859-1"); s = re_compile_pattern ("[anù]*n", 7, &regex); diff --git a/posix/bug-regex19.c b/posix/bug-regex19.c index b3fee0a730..e00ff60a14 100644 --- a/posix/bug-regex19.c +++ b/posix/bug-regex19.c @@ -25,6 +25,7 @@ #include #include #include +#include #define BRE RE_SYNTAX_POSIX_BASIC #define ERE RE_SYNTAX_POSIX_EXTENDED @@ -407,8 +408,8 @@ do_mb_tests (const struct test_s *test) return 0; } -int -main (void) +static int +do_test (void) { size_t i; int ret = 0; @@ -417,20 +418,17 @@ main (void) for (i = 0; i < sizeof (tests) / sizeof (tests[0]); ++i) { - if (setlocale (LC_ALL, "de_DE.ISO-8859-1") == NULL) - { - puts ("setlocale de_DE.ISO-8859-1 failed"); - ret = 1; - } + xsetlocale (LC_ALL, "de_DE.ISO-8859-1"); ret |= do_one_test (&tests[i], ""); - if (setlocale (LC_ALL, "de_DE.UTF-8") == NULL) - { - puts ("setlocale de_DE.UTF-8 failed"); - ret = 1; - } + xsetlocale (LC_ALL, "de_DE.UTF-8"); + ret |= do_one_test (&tests[i], "UTF-8 "); + ret |= do_mb_tests (&tests[i]); + xsetlocale (LC_ALL, "C.UTF-8"); ret |= do_one_test (&tests[i], "UTF-8 "); ret |= do_mb_tests (&tests[i]); } return ret; } + +#include diff --git a/posix/bug-regex4.c b/posix/bug-regex4.c index 8d5ae11567..6475833c52 100644 --- a/posix/bug-regex4.c +++ b/posix/bug-regex4.c @@ -32,8 +32,33 @@ main (void) memset (&regex, '\0', sizeof (regex)); + printf ("INFO: Checking C.\n"); setlocale (LC_ALL, "C"); + s = re_compile_pattern ("ab[cde]", 7, &regex); + if (s != NULL) + { + puts ("re_compile_pattern returned non-NULL value"); + result = 1; + } + else + { + match[0] = re_search_2 (&regex, "xyabez", 6, "", 0, 1, 5, NULL, 6); + match[1] = re_search_2 (&regex, NULL, 0, "abc", 3, 0, 3, NULL, 3); + match[2] = re_search_2 (&regex, "xya", 3, "bd", 2, 2, 3, NULL, 5); + if (match[0] != 2 || match[1] != 0 || match[2] != 2) + { + printf ("re_search_2 returned %d,%d,%d, 
expected 2,0,2\n", + match[0], match[1], match[2]); + result = 1; + } + else + puts (" -> OK"); + } + + printf ("INFO: Checking C.UTF-8.\n"); + setlocale (LC_ALL, "C.UTF-8"); + s = re_compile_pattern ("ab[cde]", 7, &regex); if (s != NULL) { diff --git a/posix/bug-regex6.c b/posix/bug-regex6.c index 2bdf2126a4..0929b69b83 100644 --- a/posix/bug-regex6.c +++ b/posix/bug-regex6.c @@ -30,7 +30,7 @@ main (int argc, char *argv[]) regex_t re; regmatch_t mat[10]; int i, j, ret = 0; - const char *locales[] = { "C", "de_DE.UTF-8" }; + const char *locales[] = { "C", "C.UTF-8", "de_DE.UTF-8" }; const char *string = "http://www.regex.com/pattern/matching.html#intro"; regmatch_t expect[10] = { { 0, 48 }, { 0, 5 }, { 0, 4 }, { 5, 20 }, { 7, 20 }, { 20, 42 }, diff --git a/posix/fnmatch_loop.c b/posix/fnmatch_loop.c index 7f938af590..2092a2de73 100644 --- a/posix/fnmatch_loop.c +++ b/posix/fnmatch_loop.c @@ -51,6 +51,7 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end, _NL_CURRENT(LC_COLLATE, _NL_COLLATE_COLLSEQMB); # endif #endif + uint32_t nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); while ((c = *p++) != L_('\0')) { @@ -324,8 +325,6 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end, diagnose a "used initialized" in a dead branch in the findidx function. */ UCHAR str; - uint32_t nrules = - _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); const CHAR *startp = p; c = *++p; @@ -437,8 +436,6 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end, if (c == L_('[') && *p == L_('.')) { - uint32_t nrules = - _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); const CHAR *startp = p; size_t c1 = 0; @@ -608,10 +605,12 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end, various characters appear in the source file. A strange concept, nowhere documented. */ - uint32_t fcollseq; - uint32_t lcollseq; + uint32_t fcollseq = 0; + uint32_t lcollseq = 1; /* Set higher than fcollseq. 
*/ UCHAR cend = *p++; + if (nrules != 0) + { # if WIDE_CHAR_VERSION /* Search in the 'names' array for the characters. */ fcollseq = __collseq_table_lookup (collseq, fn); @@ -629,13 +628,11 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end, fcollseq = collseq[fn]; lcollseq = is_seqval ? cold : collseq[(UCHAR) cold]; # endif + } is_seqval = false; if (cend == L_('[') && *p == L_('.')) { - uint32_t nrules = - _NL_CURRENT_WORD (LC_COLLATE, - _NL_COLLATE_NRULES); const CHAR *startp = p; size_t c1 = 0; @@ -755,11 +752,11 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end, /* XXX It is not entirely clear to me how to handle characters which are not mentioned in the collation specification. */ - if ( + if (nrules != 0 && ( # if WIDE_CHAR_VERSION lcollseq == 0xffffffff || # endif - lcollseq <= fcollseq) + lcollseq <= fcollseq)) { /* We have to look at the upper bound. */ uint32_t hcollseq; @@ -789,6 +786,16 @@ FCT (const CHAR *pattern, const CHAR *string, const CHAR *string_end, if (lcollseq <= hcollseq && fcollseq <= hcollseq) goto matched; } + else + { + /* No rules, but it is a range. 
*/ + + if (cend == L_('\0')) + return FNM_NOMATCH; + + if ((UCHAR) cold <= fn && fn <= cend) + goto matched; + } # if WIDE_CHAR_VERSION range_not_matched: # endif diff --git a/posix/regcomp.c b/posix/regcomp.c index d93698ae78..f55d20cbfd 100644 --- a/posix/regcomp.c +++ b/posix/regcomp.c @@ -2889,7 +2889,7 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token, if (MB_CUR_MAX == 1) */ if (nrules == 0) - return collseqmb[br_elem->opr.ch]; + return br_elem->opr.ch; else { wint_t wc = __btowc (br_elem->opr.ch); @@ -2900,6 +2900,8 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token, { if (nrules != 0) return __collseq_table_lookup (collseqwc, br_elem->opr.wch); + else + return br_elem->opr.wch; } else if (br_elem->type == COLL_SYM) { @@ -2935,7 +2937,7 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token, } } else if (sym_name_len == 1) - return collseqmb[br_elem->opr.name[0]]; + return br_elem->opr.name[0]; } return UINT_MAX; } @@ -3017,7 +3019,7 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token, if (MB_CUR_MAX == 1) */ if (nrules == 0) - ch_collseq = collseqmb[ch]; + ch_collseq = ch; else ch_collseq = __collseq_table_lookup (collseqwc, __btowc (ch)); if (start_collseq <= ch_collseq && ch_collseq <= end_collseq) @@ -3103,11 +3105,11 @@ parse_bracket_exp (re_string_t *regexp, re_dfa_t *dfa, re_token_t *token, int token_len; bool first_round = true; #ifdef _LIBC - collseqmb = (const unsigned char *) - _NL_CURRENT (LC_COLLATE, _NL_COLLATE_COLLSEQMB); nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES); if (nrules) { + collseqmb = (const unsigned char *) + _NL_CURRENT (LC_COLLATE, _NL_COLLATE_COLLSEQMB); /* if (MB_CUR_MAX > 1) */ diff --git a/posix/transbug.c b/posix/transbug.c index d0983b4d44..71632b7976 100644 --- a/posix/transbug.c +++ b/posix/transbug.c @@ -116,14 +116,30 @@ do_test (void) static const char lower[] = "[[:lower:]]+"; static const char upper[] = 
"[[:upper:]]+"; struct re_registers regs[4]; + int result; +#define CHECK(exp) \ + if (exp) { puts (#exp); result = 1; } + + printf ("INFO: Checking C.\n"); setlocale (LC_ALL, "C"); (void) re_set_syntax (RE_SYNTAX_GNU_AWK); - int result; -#define CHECK(exp) \ - if (exp) { puts (#exp); result = 1; } + result = run_test (lower, regs); + result |= run_test (upper, ®s[2]); + if (! result) + { + CHECK (regs[0].start[0] != regs[2].start[0]); + CHECK (regs[0].end[0] != regs[2].end[0]); + CHECK (regs[1].start[0] != regs[3].start[0]); + CHECK (regs[1].end[0] != regs[3].end[0]); + } + + printf ("INFO: Checking C.UTF-8.\n"); + setlocale (LC_ALL, "C.UTF-8"); + + (void) re_set_syntax (RE_SYNTAX_GNU_AWK); result = run_test (lower, regs); result |= run_test (upper, ®s[2]); diff --git a/posix/tst-fnmatch.input b/posix/tst-fnmatch.input index 67aac5aada..9ba975a285 100644 --- a/posix/tst-fnmatch.input +++ b/posix/tst-fnmatch.input @@ -472,6 +472,403 @@ C "\\" "[Z-\\]]" 0 C "]" "[Z-\\]]" 0 C "-" "[Z-\\]]" NOMATCH +C "a" "[0-9]*" NOMATCH +C "48" "[0-9]*" 0 + +# B.6 004(C) +C.UTF-8 "!#%+,-./01234567889" "!#%+,-./01234567889" 0 +C.UTF-8 ":;=@ABCDEFGHIJKLMNO" ":;=@ABCDEFGHIJKLMNO" 0 +C.UTF-8 "PQRSTUVWXYZ]abcdefg" "PQRSTUVWXYZ]abcdefg" 0 +C.UTF-8 "hijklmnopqrstuvwxyz" "hijklmnopqrstuvwxyz" 0 +C.UTF-8 "^_{}~" "^_{}~" 0 + +# B.6 005(C) +C.UTF-8 "\"$&'()" "\\\"\\$\\&\\'\\(\\)" 0 +C.UTF-8 "*?[\\`|" "\\*\\?\\[\\\\\\`\\|" 0 +C.UTF-8 "<>" "\\<\\>" 0 + +# B.6 006(C) +C.UTF-8 "?*[" "[?*[][?*[][?*[]" 0 +C.UTF-8 "a/b" "?/b" 0 + +# B.6 007(C) +C.UTF-8 "a/b" "a?b" 0 +C.UTF-8 "a/b" "a/?" 0 +C.UTF-8 "aa/b" "?/b" NOMATCH +C.UTF-8 "aa/b" "a?b" NOMATCH +C.UTF-8 "a/bb" "a/?" 
NOMATCH + +# B.6 009(C) +C.UTF-8 "abc" "[abc]" NOMATCH +C.UTF-8 "x" "[abc]" NOMATCH +C.UTF-8 "a" "[abc]" 0 +C.UTF-8 "[" "[[abc]" 0 +C.UTF-8 "a" "[][abc]" 0 +C.UTF-8 "a]" "[]a]]" 0 + +# B.6 010(C) +C.UTF-8 "xyz" "[!abc]" NOMATCH +C.UTF-8 "x" "[!abc]" 0 +C.UTF-8 "a" "[!abc]" NOMATCH + +# B.6 011(C) +C.UTF-8 "]" "[][abc]" 0 +C.UTF-8 "abc]" "[][abc]" NOMATCH +C.UTF-8 "[]abc" "[][]abc" NOMATCH +C.UTF-8 "]" "[!]]" NOMATCH +C.UTF-8 "aa]" "[!]a]" NOMATCH +C.UTF-8 "]" "[!a]" 0 +C.UTF-8 "]]" "[!a]]" 0 + +# B.6 012(C) +C.UTF-8 "a" "[[.a.]]" 0 +C.UTF-8 "-" "[[.-.]]" 0 +C.UTF-8 "-" "[[.-.][.].]]" 0 +C.UTF-8 "-" "[[.].][.-.]]" 0 +C.UTF-8 "-" "[[.-.][=u=]]" 0 +C.UTF-8 "-" "[[.-.][:alpha:]]" 0 +C.UTF-8 "a" "[![.a.]]" NOMATCH + +# B.6 013(C) +C.UTF-8 "a" "[[.b.]]" NOMATCH +C.UTF-8 "a" "[[.b.][.c.]]" NOMATCH +C.UTF-8 "a" "[[.b.][=b=]]" NOMATCH + + +# B.6 015(C) +C.UTF-8 "a" "[[=a=]]" 0 +C.UTF-8 "b" "[[=a=]b]" 0 +C.UTF-8 "b" "[[=a=][=b=]]" 0 +C.UTF-8 "a" "[[=a=][=b=]]" 0 +C.UTF-8 "a" "[[=a=][.b.]]" 0 +C.UTF-8 "a" "[[=a=][:digit:]]" 0 + +# B.6 016(C) +C.UTF-8 "=" "[[=a=]b]" NOMATCH +C.UTF-8 "]" "[[=a=]b]" NOMATCH +C.UTF-8 "a" "[[=b=][=c=]]" NOMATCH +C.UTF-8 "a" "[[=b=][.].]]" NOMATCH +C.UTF-8 "a" "[[=b=][:digit:]]" NOMATCH + +# B.6 017(C) +C.UTF-8 "a" "[[:alnum:]]" 0 +C.UTF-8 "a" "[![:alnum:]]" NOMATCH +C.UTF-8 "-" "[[:alnum:]]" NOMATCH +C.UTF-8 "a]a" "[[:alnum:]]a" NOMATCH +C.UTF-8 "-" "[[:alnum:]-]" 0 +C.UTF-8 "aa" "[[:alnum:]]a" 0 +C.UTF-8 "-" "[![:alnum:]]" 0 +C.UTF-8 "]" "[!][:alnum:]]" NOMATCH +C.UTF-8 "[" "[![:alnum:][]" NOMATCH +C.UTF-8 "a" "[[:alnum:]]" 0 +C.UTF-8 "b" "[[:alnum:]]" 0 +C.UTF-8 "c" "[[:alnum:]]" 0 +C.UTF-8 "d" "[[:alnum:]]" 0 +C.UTF-8 "e" "[[:alnum:]]" 0 +C.UTF-8 "f" "[[:alnum:]]" 0 +C.UTF-8 "g" "[[:alnum:]]" 0 +C.UTF-8 "h" "[[:alnum:]]" 0 +C.UTF-8 "i" "[[:alnum:]]" 0 +C.UTF-8 "j" "[[:alnum:]]" 0 +C.UTF-8 "k" "[[:alnum:]]" 0 +C.UTF-8 "l" "[[:alnum:]]" 0 +C.UTF-8 "m" "[[:alnum:]]" 0 +C.UTF-8 "n" "[[:alnum:]]" 0 +C.UTF-8 "o" "[[:alnum:]]" 0 +C.UTF-8 "p" 
"[[:alnum:]]" 0 +C.UTF-8 "q" "[[:alnum:]]" 0 +C.UTF-8 "r" "[[:alnum:]]" 0 +C.UTF-8 "s" "[[:alnum:]]" 0 +C.UTF-8 "t" "[[:alnum:]]" 0 +C.UTF-8 "u" "[[:alnum:]]" 0 +C.UTF-8 "v" "[[:alnum:]]" 0 +C.UTF-8 "w" "[[:alnum:]]" 0 +C.UTF-8 "x" "[[:alnum:]]" 0 +C.UTF-8 "y" "[[:alnum:]]" 0 +C.UTF-8 "z" "[[:alnum:]]" 0 +C.UTF-8 "A" "[[:alnum:]]" 0 +C.UTF-8 "B" "[[:alnum:]]" 0 +C.UTF-8 "C" "[[:alnum:]]" 0 +C.UTF-8 "D" "[[:alnum:]]" 0 +C.UTF-8 "E" "[[:alnum:]]" 0 +C.UTF-8 "F" "[[:alnum:]]" 0 +C.UTF-8 "G" "[[:alnum:]]" 0 +C.UTF-8 "H" "[[:alnum:]]" 0 +C.UTF-8 "I" "[[:alnum:]]" 0 +C.UTF-8 "J" "[[:alnum:]]" 0 +C.UTF-8 "K" "[[:alnum:]]" 0 +C.UTF-8 "L" "[[:alnum:]]" 0 +C.UTF-8 "M" "[[:alnum:]]" 0 +C.UTF-8 "N" "[[:alnum:]]" 0 +C.UTF-8 "O" "[[:alnum:]]" 0 +C.UTF-8 "P" "[[:alnum:]]" 0 +C.UTF-8 "Q" "[[:alnum:]]" 0 +C.UTF-8 "R" "[[:alnum:]]" 0 +C.UTF-8 "S" "[[:alnum:]]" 0 +C.UTF-8 "T" "[[:alnum:]]" 0 +C.UTF-8 "U" "[[:alnum:]]" 0 +C.UTF-8 "V" "[[:alnum:]]" 0 +C.UTF-8 "W" "[[:alnum:]]" 0 +C.UTF-8 "X" "[[:alnum:]]" 0 +C.UTF-8 "Y" "[[:alnum:]]" 0 +C.UTF-8 "Z" "[[:alnum:]]" 0 +C.UTF-8 "0" "[[:alnum:]]" 0 +C.UTF-8 "1" "[[:alnum:]]" 0 +C.UTF-8 "2" "[[:alnum:]]" 0 +C.UTF-8 "3" "[[:alnum:]]" 0 +C.UTF-8 "4" "[[:alnum:]]" 0 +C.UTF-8 "5" "[[:alnum:]]" 0 +C.UTF-8 "6" "[[:alnum:]]" 0 +C.UTF-8 "7" "[[:alnum:]]" 0 +C.UTF-8 "8" "[[:alnum:]]" 0 +C.UTF-8 "9" "[[:alnum:]]" 0 +C.UTF-8 "!" "[[:alnum:]]" NOMATCH +C.UTF-8 "#" "[[:alnum:]]" NOMATCH +C.UTF-8 "%" "[[:alnum:]]" NOMATCH +C.UTF-8 "+" "[[:alnum:]]" NOMATCH +C.UTF-8 "," "[[:alnum:]]" NOMATCH +C.UTF-8 "-" "[[:alnum:]]" NOMATCH +C.UTF-8 "." 
"[[:alnum:]]" NOMATCH +C.UTF-8 "/" "[[:alnum:]]" NOMATCH +C.UTF-8 ":" "[[:alnum:]]" NOMATCH +C.UTF-8 ";" "[[:alnum:]]" NOMATCH +C.UTF-8 "=" "[[:alnum:]]" NOMATCH +C.UTF-8 "@" "[[:alnum:]]" NOMATCH +C.UTF-8 "[" "[[:alnum:]]" NOMATCH +C.UTF-8 "\\" "[[:alnum:]]" NOMATCH +C.UTF-8 "]" "[[:alnum:]]" NOMATCH +C.UTF-8 "^" "[[:alnum:]]" NOMATCH +C.UTF-8 "_" "[[:alnum:]]" NOMATCH +C.UTF-8 "{" "[[:alnum:]]" NOMATCH +C.UTF-8 "}" "[[:alnum:]]" NOMATCH +C.UTF-8 "~" "[[:alnum:]]" NOMATCH +C.UTF-8 "\"" "[[:alnum:]]" NOMATCH +C.UTF-8 "$" "[[:alnum:]]" NOMATCH +C.UTF-8 "&" "[[:alnum:]]" NOMATCH +C.UTF-8 "'" "[[:alnum:]]" NOMATCH +C.UTF-8 "(" "[[:alnum:]]" NOMATCH +C.UTF-8 ")" "[[:alnum:]]" NOMATCH +C.UTF-8 "*" "[[:alnum:]]" NOMATCH +C.UTF-8 "?" "[[:alnum:]]" NOMATCH +C.UTF-8 "`" "[[:alnum:]]" NOMATCH +C.UTF-8 "|" "[[:alnum:]]" NOMATCH +C.UTF-8 "<" "[[:alnum:]]" NOMATCH +C.UTF-8 ">" "[[:alnum:]]" NOMATCH +C.UTF-8 "\t" "[[:cntrl:]]" 0 +C.UTF-8 "t" "[[:cntrl:]]" NOMATCH +C.UTF-8 "t" "[[:lower:]]" 0 +C.UTF-8 "\t" "[[:lower:]]" NOMATCH +C.UTF-8 "T" "[[:lower:]]" NOMATCH +C.UTF-8 "\t" "[[:space:]]" 0 +C.UTF-8 "t" "[[:space:]]" NOMATCH +C.UTF-8 "t" "[[:alpha:]]" 0 +C.UTF-8 "\t" "[[:alpha:]]" NOMATCH +C.UTF-8 "0" "[[:digit:]]" 0 +C.UTF-8 "\t" "[[:digit:]]" NOMATCH +C.UTF-8 "t" "[[:digit:]]" NOMATCH +C.UTF-8 "\t" "[[:print:]]" NOMATCH +C.UTF-8 "t" "[[:print:]]" 0 +C.UTF-8 "T" "[[:upper:]]" 0 +C.UTF-8 "\t" "[[:upper:]]" NOMATCH +C.UTF-8 "t" "[[:upper:]]" NOMATCH +C.UTF-8 "\t" "[[:blank:]]" 0 +C.UTF-8 "t" "[[:blank:]]" NOMATCH +C.UTF-8 "\t" "[[:graph:]]" NOMATCH +C.UTF-8 "t" "[[:graph:]]" 0 +C.UTF-8 "." 
"[[:punct:]]" 0 +C.UTF-8 "t" "[[:punct:]]" NOMATCH +C.UTF-8 "\t" "[[:punct:]]" NOMATCH +C.UTF-8 "0" "[[:xdigit:]]" 0 +C.UTF-8 "\t" "[[:xdigit:]]" NOMATCH +C.UTF-8 "a" "[[:xdigit:]]" 0 +C.UTF-8 "A" "[[:xdigit:]]" 0 +C.UTF-8 "t" "[[:xdigit:]]" NOMATCH +C.UTF-8 "a" "[[alpha]]" NOMATCH +C.UTF-8 "a" "[[alpha:]]" NOMATCH +C.UTF-8 "a]" "[[alpha]]" 0 +C.UTF-8 "a]" "[[alpha:]]" 0 +C.UTF-8 "a" "[[:alpha:][.b.]]" 0 +C.UTF-8 "a" "[[:alpha:][=b=]]" 0 +C.UTF-8 "a" "[[:alpha:][:digit:]]" 0 +C.UTF-8 "a" "[[:digit:][:alpha:]]" 0 + +# B.6 018(C) +C.UTF-8 "a" "[a-c]" 0 +C.UTF-8 "b" "[a-c]" 0 +C.UTF-8 "c" "[a-c]" 0 +C.UTF-8 "a" "[b-c]" NOMATCH +C.UTF-8 "d" "[b-c]" NOMATCH +C.UTF-8 "B" "[a-c]" NOMATCH +C.UTF-8 "b" "[A-C]" NOMATCH +C.UTF-8 "" "[a-c]" NOMATCH +C.UTF-8 "as" "[a-ca-z]" NOMATCH +C.UTF-8 "a" "[[.a.]-c]" 0 +C.UTF-8 "a" "[a-[.c.]]" 0 +C.UTF-8 "a" "[[.a.]-[.c.]]" 0 +C.UTF-8 "b" "[[.a.]-c]" 0 +C.UTF-8 "b" "[a-[.c.]]" 0 +C.UTF-8 "b" "[[.a.]-[.c.]]" 0 +C.UTF-8 "c" "[[.a.]-c]" 0 +C.UTF-8 "c" "[a-[.c.]]" 0 +C.UTF-8 "c" "[[.a.]-[.c.]]" 0 +C.UTF-8 "d" "[[.a.]-c]" NOMATCH +C.UTF-8 "d" "[a-[.c.]]" NOMATCH +C.UTF-8 "d" "[[.a.]-[.c.]]" NOMATCH + +# B.6 019(C) +C.UTF-8 "a" "[c-a]" NOMATCH +C.UTF-8 "a" "[[.c.]-a]" NOMATCH +C.UTF-8 "a" "[c-[.a.]]" NOMATCH +C.UTF-8 "a" "[[.c.]-[.a.]]" NOMATCH +C.UTF-8 "c" "[c-a]" NOMATCH +C.UTF-8 "c" "[[.c.]-a]" NOMATCH +C.UTF-8 "c" "[c-[.a.]]" NOMATCH +C.UTF-8 "c" "[[.c.]-[.a.]]" NOMATCH + +# B.6 020(C) +C.UTF-8 "a" "[a-c0-9]" 0 +C.UTF-8 "d" "[a-c0-9]" NOMATCH +C.UTF-8 "B" "[a-c0-9]" NOMATCH + +# B.6 021(C) +C.UTF-8 "-" "[-a]" 0 +C.UTF-8 "a" "[-b]" NOMATCH +C.UTF-8 "-" "[!-a]" NOMATCH +C.UTF-8 "a" "[!-b]" 0 +C.UTF-8 "-" "[a-c-0-9]" 0 +C.UTF-8 "b" "[a-c-0-9]" 0 +C.UTF-8 "a:" "a[0-9-a]" NOMATCH +C.UTF-8 "a:" "a[09-a]" 0 + +# B.6 024(C) +C.UTF-8 "" "*" 0 +C.UTF-8 "asd/sdf" "*" 0 + +# B.6 025(C) +C.UTF-8 "as" "[a-c][a-z]" 0 +C.UTF-8 "as" "??" 
0 + +# B.6 026(C) +C.UTF-8 "asd/sdf" "as*df" 0 +C.UTF-8 "asd/sdf" "as*" 0 +C.UTF-8 "asd/sdf" "*df" 0 +C.UTF-8 "asd/sdf" "as*dg" NOMATCH +C.UTF-8 "asdf" "as*df" 0 +C.UTF-8 "asdf" "as*df?" NOMATCH +C.UTF-8 "asdf" "as*??" 0 +C.UTF-8 "asdf" "a*???" 0 +C.UTF-8 "asdf" "*????" 0 +C.UTF-8 "asdf" "????*" 0 +C.UTF-8 "asdf" "??*?" 0 + +# B.6 027(C) +C.UTF-8 "/" "/" 0 +C.UTF-8 "/" "/*" 0 +C.UTF-8 "/" "*/" 0 +C.UTF-8 "/" "/?" NOMATCH +C.UTF-8 "/" "?/" NOMATCH +C.UTF-8 "/" "?" 0 +C.UTF-8 "." "?" 0 +C.UTF-8 "/." "??" 0 +C.UTF-8 "/" "[!a-c]" 0 +C.UTF-8 "." "[!a-c]" 0 + +# B.6 029(C) +C.UTF-8 "/" "/" 0 PATHNAME +C.UTF-8 "//" "//" 0 PATHNAME +C.UTF-8 "/.a" "/*" 0 PATHNAME +C.UTF-8 "/.a" "/?a" 0 PATHNAME +C.UTF-8 "/.a" "/[!a-z]a" 0 PATHNAME +C.UTF-8 "/.a/.b" "/*/?b" 0 PATHNAME + +# B.6 030(C) +C.UTF-8 "/" "?" NOMATCH PATHNAME +C.UTF-8 "/" "*" NOMATCH PATHNAME +C.UTF-8 "a/b" "a?b" NOMATCH PATHNAME +C.UTF-8 "/.a/.b" "/*b" NOMATCH PATHNAME + +# B.6 031(C) +C.UTF-8 "/$" "\\/\\$" 0 +C.UTF-8 "/[" "\\/\\[" 0 +C.UTF-8 "/[" "\\/[" 0 +C.UTF-8 "/[]" "\\/\\[]" 0 + +# B.6 032(C) +C.UTF-8 "/$" "\\/\\$" NOMATCH NOESCAPE +C.UTF-8 "/\\$" "\\/\\$" NOMATCH NOESCAPE +C.UTF-8 "\\/\\$" "\\/\\$" 0 NOESCAPE + +# B.6 033(C) +C.UTF-8 ".asd" ".*" 0 PERIOD +C.UTF-8 "/.asd" "*" 0 PERIOD +C.UTF-8 "/as/.df" "*/?*f" 0 PERIOD +C.UTF-8 "..asd" ".[!a-z]*" 0 PERIOD + +# B.6 034(C) +C.UTF-8 ".asd" "*" NOMATCH PERIOD +C.UTF-8 ".asd" "?asd" NOMATCH PERIOD +C.UTF-8 ".asd" "[!a-z]*" NOMATCH PERIOD + +# B.6 035(C) +C.UTF-8 "/." "/." 0 PATHNAME|PERIOD +C.UTF-8 "/.a./.b." "/.*/.*" 0 PATHNAME|PERIOD +C.UTF-8 "/.a./.b." "/.??/.??" 0 PATHNAME|PERIOD + +# B.6 036(C) +C.UTF-8 "/." "*" NOMATCH PATHNAME|PERIOD +C.UTF-8 "/." "/*" NOMATCH PATHNAME|PERIOD +C.UTF-8 "/." "/?" NOMATCH PATHNAME|PERIOD +C.UTF-8 "/." "/[!a-z]" NOMATCH PATHNAME|PERIOD +C.UTF-8 "/a./.b." "/*/*" NOMATCH PATHNAME|PERIOD +C.UTF-8 "/a./.b." "/??/???" NOMATCH PATHNAME|PERIOD + +# Some home-grown tests. 
+C.UTF-8 "foobar" "foo*[abc]z" NOMATCH +C.UTF-8 "foobaz" "foo*[abc][xyz]" 0 +C.UTF-8 "foobaz" "foo?*[abc][xyz]" 0 +C.UTF-8 "foobaz" "foo?*[abc][x/yz]" 0 +C.UTF-8 "foobaz" "foo?*[abc]/[xyz]" NOMATCH PATHNAME +C.UTF-8 "a" "a/" NOMATCH PATHNAME +C.UTF-8 "a/" "a" NOMATCH PATHNAME +C.UTF-8 "//a" "/a" NOMATCH PATHNAME +C.UTF-8 "/a" "//a" NOMATCH PATHNAME +C.UTF-8 "az" "[a-]z" 0 +C.UTF-8 "bz" "[ab-]z" 0 +C.UTF-8 "cz" "[ab-]z" NOMATCH +C.UTF-8 "-z" "[ab-]z" 0 +C.UTF-8 "az" "[-a]z" 0 +C.UTF-8 "bz" "[-ab]z" 0 +C.UTF-8 "cz" "[-ab]z" NOMATCH +C.UTF-8 "-z" "[-ab]z" 0 +C.UTF-8 "\\" "[\\\\-a]" 0 +C.UTF-8 "_" "[\\\\-a]" 0 +C.UTF-8 "a" "[\\\\-a]" 0 +C.UTF-8 "-" "[\\\\-a]" NOMATCH +C.UTF-8 "\\" "[\\]-a]" NOMATCH +C.UTF-8 "_" "[\\]-a]" 0 +C.UTF-8 "a" "[\\]-a]" 0 +C.UTF-8 "]" "[\\]-a]" 0 +C.UTF-8 "-" "[\\]-a]" NOMATCH +C.UTF-8 "\\" "[!\\\\-a]" NOMATCH +C.UTF-8 "_" "[!\\\\-a]" NOMATCH +C.UTF-8 "a" "[!\\\\-a]" NOMATCH +C.UTF-8 "-" "[!\\\\-a]" 0 +C.UTF-8 "!" "[\\!-]" 0 +C.UTF-8 "-" "[\\!-]" 0 +C.UTF-8 "\\" "[\\!-]" NOMATCH +C.UTF-8 "Z" "[Z-\\\\]" 0 +C.UTF-8 "[" "[Z-\\\\]" 0 +C.UTF-8 "\\" "[Z-\\\\]" 0 +C.UTF-8 "-" "[Z-\\\\]" NOMATCH +C.UTF-8 "Z" "[Z-\\]]" 0 +C.UTF-8 "[" "[Z-\\]]" 0 +C.UTF-8 "\\" "[Z-\\]]" 0 +C.UTF-8 "]" "[Z-\\]]" 0 +C.UTF-8 "-" "[Z-\\]]" NOMATCH + +C.UTF-8 "a" "[0-9]*" NOMATCH +C.UTF-8 "48" "[0-9]*" 0 + # Following are tests outside the scope of IEEE 2003.2 since they are using # locales other than the C locale. The main focus of the tests is on the # handling of ranges and the recognition of character (vs bytes). 
diff --git a/posix/tst-fnmatch7.c b/posix/tst-fnmatch7.c new file mode 100644 index 0000000000..440c9ca59e --- /dev/null +++ b/posix/tst-fnmatch7.c @@ -0,0 +1,18 @@ +#include +#include +#include +#include + +static int +do_test (void) +{ + char pattern[] = "[a-c]"; + const char *string = "a"; + + xsetlocale (LC_ALL, "C.UTF-8"); + TEST_VERIFY (fnmatch (pattern, string, 0) == 0); + + return 0; +} + +#include diff --git a/posix/tst-regcomp-truncated.c b/posix/tst-regcomp-truncated.c index 84195fcd2e..da3f97799e 100644 --- a/posix/tst-regcomp-truncated.c +++ b/posix/tst-regcomp-truncated.c @@ -37,6 +37,7 @@ static const char locales[][17] = { "C", + "C.UTF-8", "en_US.UTF-8", "de_DE.ISO-8859-1", }; diff --git a/posix/tst-regex.c b/posix/tst-regex.c index e7c2b05e86..4be5d173eb 100644 --- a/posix/tst-regex.c +++ b/posix/tst-regex.c @@ -32,6 +32,7 @@ #include #include #include +#include #if defined _POSIX_CPUTIME && _POSIX_CPUTIME >= 0 @@ -150,9 +151,23 @@ test_expr (const char *expr, int expected, int expectedicase) size_t outlen; char *uexpr; - /* First test: search with an UTF-8 locale. */ - if (setlocale (LC_ALL, "de_DE.UTF-8") == NULL) - error (EXIT_FAILURE, 0, "cannot set locale de_DE.UTF-8"); + /* First test: search with basic C.UTF-8 locale. */ + printf ("INFO: Testing C.UTF-8.\n"); + xsetlocale (LC_ALL, "C.UTF-8"); + + printf ("\nTest \"%s\" with multi-byte locale\n", expr); + result = run_test (expr, mem, memlen, 0, expected); + printf ("\nTest \"%s\" with multi-byte locale, case insensitive\n", expr); + result |= run_test (expr, mem, memlen, 1, expectedicase); + printf ("\nTest \"%s\" backwards with multi-byte locale\n", expr); + result |= run_test_backwards (expr, mem, memlen, 0, expected); + printf ("\nTest \"%s\" backwards with multi-byte locale, case insensitive\n", + expr); + result |= run_test_backwards (expr, mem, memlen, 1, expectedicase); + + /* Second test: search with an UTF-8 locale. 
*/ + printf ("INFO: Testing de_DE.UTF-8.\n"); + xsetlocale (LC_ALL, "de_DE.UTF-8"); printf ("\nTest \"%s\" with multi-byte locale\n", expr); result = run_test (expr, mem, memlen, 0, expected); @@ -165,8 +180,8 @@ test_expr (const char *expr, int expected, int expectedicase) result |= run_test_backwards (expr, mem, memlen, 1, expectedicase); /* Second test: search with an ISO-8859-1 locale. */ - if (setlocale (LC_ALL, "de_DE.ISO-8859-1") == NULL) - error (EXIT_FAILURE, 0, "cannot set locale de_DE.ISO-8859-1"); + printf ("INFO: Testing de_DE.ISO-8859-1.\n"); + xsetlocale (LC_ALL, "de_DE.ISO-8859-1"); inmem = (char *) expr; inlen = strlen (expr);