Subject: Re: [Patch v3 11/14] [BZ #14095] update collation data from Unicode / ISO 14651
To: Mike FABIAN, libc-alpha@sourceware.org
Cc: "Dmitry V. Levin"
From: Carlos O'Donell
Message-ID: <41c1f7dc-f603-6dd3-895f-7f755865e4d3@redhat.com>
Date: Sat, 24 Feb 2018 06:21:00 -0000

On 02/23/2018 02:24 AM, Mike FABIAN wrote:
> From 5c65168e569ba0c59ad43bbd88f37cdb356c16b6 Mon Sep 17 00:00:00 2001
> From: Mike FABIAN
> Date: Tue, 23 Jan 2018 17:29:36 +0100
> Subject: [PATCH 11/14] Fix test cases tst-fnmatch and tst-regexloc for the new
>  iso14651_t1_common file.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit

OK with the following changes:
- Comment added in tst-fnmatch.input about range usage like this.
- Rework the test input to keep testing the range.

See comments below.

Reviewed-by: Carlos O'Donell

> See:
>
> http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html
>
>> A range expression represents the set of collating elements that fall
>> between two elements in the current collation sequence,
>> inclusively. It is expressed as the starting point and the ending
>> point separated by a hyphen (-).
>>
>> Range expressions must not be used in portable applications because
>> their behaviour is dependent on the collating sequence. Ranges will be
>> treated according to the current collating sequence, and include such
>> characters that fall within the range based on that collating
>> sequence, regardless of character values. This, however, means that
>> the interpretation will differ depending on collating sequence. If,
>> for instance, one collating sequence defines ä as a variant of a,
>> while another defines it as a letter following z, then the expression
>> [ä-z] is valid in the first language and invalid in the second.
>
> Therefore, using [a-z] does not make much sense except in the C/POSIX locale.
> The new iso14651_t1_common lists upper case and lower case Latin characters
> in a different order than the old one which causes surprising results
> for example in the de_DE locale: [a-z] now includes A because A comes
> after a in iso14651_t1_common but does not include Z because that comes
> after z in iso14651_t1_common.

Why delete the tests though?

Why not adjust them to cover the result?

The old tests were adjusted in the same way, since they expect 'ä' to be
within the range [a-z]. Couldn't we adjust the tests similarly here?

>
> * posix/tst-fnmatch.input: Use range expressions only in C locale.
> * posix/tst-regexloc.c: Do not use a range expression for
> de_DE.ISO-8859-1 locale.
> ---
>  posix/tst-fnmatch.input | 40 ----------------------------------------
>  posix/tst-regexloc.c | 4 ++--
>  2 files changed, 2 insertions(+), 42 deletions(-)
>
> diff --git a/posix/tst-fnmatch.input b/posix/tst-fnmatch.input
> index 88b3f739a5..1e2f62c0ed 100644
> --- a/posix/tst-fnmatch.input
> +++ b/posix/tst-fnmatch.input
> @@ -418,26 +418,6 @@ C "-" "[Z-\\]]" NOMATCH
>  # Following are tests outside the scope of IEEE 2003.2 since they are using
>  # locales other than the C locale. The main focus of the tests is on the
>  # handling of ranges and the recognition of character (vs bytes).

Here we need a comment explaining exactly why [a-z] is tricky.

Basically include the text you wrote for the commit message here :-)

> -de_DE.ISO-8859-1 "a" "[a-z]" 0
> -de_DE.ISO-8859-1 "z" "[a-z]" 0
> -de_DE.ISO-8859-1 "ä" "[a-z]" 0
> -de_DE.ISO-8859-1 "ö" "[a-z]" 0
> -de_DE.ISO-8859-1 "ü" "[a-z]" 0
> -de_DE.ISO-8859-1 "A" "[a-z]" NOMATCH

This becomes 0.

> -de_DE.ISO-8859-1 "Z" "[a-z]" NOMATCH

Stays the same.

> -de_DE.ISO-8859-1 "Ä" "[a-z]" NOMATCH
> -de_DE.ISO-8859-1 "Ö" "[a-z]" NOMATCH
> -de_DE.ISO-8859-1 "Ü" "[a-z]" NOMATCH

All become 0. etc.
> -de_DE.ISO-8859-1 "a" "[A-Z]" NOMATCH
> -de_DE.ISO-8859-1 "z" "[A-Z]" NOMATCH
> -de_DE.ISO-8859-1 "ä" "[A-Z]" NOMATCH
> -de_DE.ISO-8859-1 "ö" "[A-Z]" NOMATCH
> -de_DE.ISO-8859-1 "ü" "[A-Z]" NOMATCH
> -de_DE.ISO-8859-1 "A" "[A-Z]" 0
> -de_DE.ISO-8859-1 "Z" "[A-Z]" 0
> -de_DE.ISO-8859-1 "Ä" "[A-Z]" 0
> -de_DE.ISO-8859-1 "Ö" "[A-Z]" 0
> -de_DE.ISO-8859-1 "Ü" "[A-Z]" 0
>  de_DE.ISO-8859-1 "a" "[[:lower:]]" 0
>  de_DE.ISO-8859-1 "z" "[[:lower:]]" 0
>  de_DE.ISO-8859-1 "ä" "[[:lower:]]" 0
> @@ -510,26 +490,6 @@ de_DE.ISO-8859-1 "ba" "[[.a.]]a" NOMATCH
>
>
>  # And with a multibyte character set.
> -de_DE.UTF-8 "a" "[a-z]" 0
> -de_DE.UTF-8 "z" "[a-z]" 0
> -de_DE.UTF-8 "ä" "[a-z]" 0
> -de_DE.UTF-8 "ö" "[a-z]" 0
> -de_DE.UTF-8 "ü" "[a-z]" 0
> -de_DE.UTF-8 "A" "[a-z]" NOMATCH
> -de_DE.UTF-8 "Z" "[a-z]" NOMATCH
> -de_DE.UTF-8 "Ä" "[a-z]" NOMATCH
> -de_DE.UTF-8 "Ö" "[a-z]" NOMATCH
> -de_DE.UTF-8 "Ü" "[a-z]" NOMATCH
> -de_DE.UTF-8 "a" "[A-Z]" NOMATCH
> -de_DE.UTF-8 "z" "[A-Z]" NOMATCH
> -de_DE.UTF-8 "ä" "[A-Z]" NOMATCH
> -de_DE.UTF-8 "ö" "[A-Z]" NOMATCH
> -de_DE.UTF-8 "ü" "[A-Z]" NOMATCH
> -de_DE.UTF-8 "A" "[A-Z]" 0
> -de_DE.UTF-8 "Z" "[A-Z]" 0
> -de_DE.UTF-8 "Ä" "[A-Z]" 0
> -de_DE.UTF-8 "Ö" "[A-Z]" 0
> -de_DE.UTF-8 "Ü" "[A-Z]" 0
>  de_DE.UTF-8 "a" "[[:lower:]]" 0
>  de_DE.UTF-8 "z" "[[:lower:]]" 0
>  de_DE.UTF-8 "ä" "[[:lower:]]" 0
> diff --git a/posix/tst-regexloc.c b/posix/tst-regexloc.c
> index 60235b4d3b..7fbc496d0c 100644
> --- a/posix/tst-regexloc.c
> +++ b/posix/tst-regexloc.c
> @@ -29,8 +29,8 @@ do_test (void)
>
>    if (setlocale (LC_ALL, "de_DE.ISO-8859-1") == NULL)
>      puts ("cannot set locale");
> -  else if (regcomp (&re, "[a-f]*", 0) != REG_NOERROR)
> -    puts ("cannot compile expression \"[a-f]*\"");
> +  else if (regcomp (&re, "[abcdef]*", 0) != REG_NOERROR)
> +    puts ("cannot compile expression \"[abcdef]*\"");

OK.

>    else if (regexec (&re, "abcdefCDEF", 1, mat, 0) == REG_NOMATCH)
>      puts ("no match");
>    else
> --
> 2.14.3

--
Cheers,
Carlos.
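
For anyone who wants to poke at the range behaviour discussed above
outside the test suite, here is a minimal sketch. It is not part of the
patch or of glibc: match_in_locale is a hypothetical helper name, and
only setlocale and fnmatch are real library calls. It maps the result
to 1 (the "0" rows in tst-fnmatch.input), 0 (the NOMATCH rows), or -1
if the locale cannot be set or fnmatch reports an error.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of the glibc test suite.
   Returns 1 if STRING matches PATTERN under locale NAME,
   0 if fnmatch reports FNM_NOMATCH, and -1 if the locale is
   unavailable or fnmatch returns some other error.  */
int
match_in_locale (const char *name, const char *pattern, const char *string)
{
  /* Range expressions are interpreted through LC_COLLATE; setting
     LC_ALL keeps the character set (LC_CTYPE) consistent with it.  */
  if (setlocale (LC_ALL, name) == NULL)
    return -1;

  int result = fnmatch (pattern, string, 0);
  if (result == 0)
    return 1;
  if (result == FNM_NOMATCH)
    return 0;
  return -1;
}
```

In the C locale, match_in_locale ("C", "[a-z]", "A") should return 0,
and an explicit list such as "[abcdef]" does not depend on the
collation order at all. What the same call returns for a de_DE locale
depends on the installed collation data, which is exactly the point of
the review comments above, so no result is claimed for it here.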