Subject: Re: [PATCH v9 2/2] Add generic C.UTF-8 locale (Bug 17318)
To: Florian Weimer
Cc: libc-alpha@sourceware.org
References: <20210902020546.90935-1-carlos@redhat.com> <20210902020546.90935-3-carlos@redhat.com> <87mtov81g2.fsf@oldenburg.str.redhat.com>
From: Carlos O'Donell
Organization: Red Hat
Message-ID: <837d13d5-fccd-0dfe-759f-910cf9a01f5d@redhat.com>
Date: Sun, 5 Sep 2021 23:41:17 -0400
In-Reply-To: <87mtov81g2.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list

On 9/2/21 11:03 AM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> diff --git a/NEWS b/NEWS
>> index 79c895e382..807105a596 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -9,7 +9,15 @@ Version 2.35
>>
>>  Major new features:
>>
>> -  [Add new features here]
>> +* Support for the C.UTF-8 locale has been added to glibc. The locale
>> +  supports full code-point sorting for all valid Unicode code points.
>> +  A limitation in the framework for fnmatch, regexec, and regcomp requires
>> +  a compromise to save space and only ASCII-based range expressions are
>> +  supported for now (see bug 28255). The full size of the locale is only
>> +  ~400KiB, with 346KiB coming from LC_CTYPE information for Unicode. This
>> +  locale harmonizes downstream C.UTF-8 already shipping in Gentoo, Debian,
>> +  Ubuntu, Fedora, CentOS Stream, and RHEL. The locale is not built into
>> +  glibc, and must be installed.
>
> I would say “various downstream distributions”. You left out SUSE's
> distributions, and they have C.UTF-8 as well:
>
>

I double-checked that implementation and it's a copy of Mike Fabian's
original that we put into Fedora/RHEL, so we are already harmonized with
that, which is good.

I've adjusted the text following your recommendation, though; it's clearer.

>> --- /dev/null
>> +++ b/iconv/tst-iconv9.c
>
>> +  /* From ISO-8859-1 to ASCII. */
>
>> +  /* From UTF-8 to ASCII. */
>
> Missing spaces after “.”.

Fixed.

>> diff --git a/posix/transbug.c b/posix/transbug.c
>> index d0983b4d44..71632b7976 100644
>> --- a/posix/transbug.c
>> +++ b/posix/transbug.c
>> @@ -116,14 +116,30 @@ do_test (void)
>>    static const char lower[] = "[[:lower:]]+";
>>    static const char upper[] = "[[:upper:]]+";
>>    struct re_registers regs[4];
>> +  int result;
>>
>> +#define CHECK(exp) \
>> +  if (exp) { puts (#exp); result = 1; }
>> +
>> +  printf ("INFO: Checking C.\n");
>>    setlocale (LC_ALL, "C");
>>
>>    (void) re_set_syntax (RE_SYNTAX_GNU_AWK);
>>
>> -  int result;
>> -#define CHECK(exp) \
>> -  if (exp) { puts (#exp); result = 1; }
>> +  result = run_test (lower, regs);
>> +  result |= run_test (upper, &regs[2]);
>> +  if (! result)
>> +    {
>> +      CHECK (regs[0].start[0] != regs[2].start[0]);
>> +      CHECK (regs[0].end[0] != regs[2].end[0]);
>> +      CHECK (regs[1].start[0] != regs[3].start[0]);
>> +      CHECK (regs[1].end[0] != regs[3].end[0]);
>> +    }
>> +
>> +  printf ("INFO: Checking C.UTF-8.\n");
>> +  setlocale (LC_ALL, "C.UTF-8");
>> +
>> +  (void) re_set_syntax (RE_SYNTAX_GNU_AWK);
>>
>>    result = run_test (lower, regs);
>>    result |= run_test (upper, &regs[2]);
>
> The second-to-last line overwrites the previous test results.
>
> I think this can go in if you address those nits.

Fixed. I'll use |= for all of them and init to zero.

I'll post a v10. Only 2/2 needs a Reviewed-by.

Thanks for your review.

-- 
Cheers,
Carlos.
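
For readers following the thread, here is a minimal sketch of the overwrite
problem raised in the review and of the "|= everywhere, init to zero" fix
described in the reply. It is not code from the patch: fake_phase,
run_overwriting, and run_accumulating are hypothetical names, and fake_phase
only stands in for a test phase that returns 0 on success and nonzero on
failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for one test phase.  Returns 0 on success and
   1 on failure.  This is NOT run_test from posix/transbug.c.  */
static int
fake_phase (const char *name, int fails)
{
  assert (name != NULL);
  printf ("INFO: Checking %s.\n", name);
  return fails ? 1 : 0;
}

/* Problematic shape: the plain assignment in the second phase
   discards whatever the first phase reported.  */
static int
run_overwriting (int first_fails, int second_fails)
{
  int result;
  result = fake_phase ("C", first_fails);
  result = fake_phase ("C.UTF-8", second_fails);  /* Overwrites.  */
  return result;
}

/* Fixed shape: start from zero and OR in every phase, so a failure
   in any phase survives to the final result.  */
static int
run_accumulating (int first_fails, int second_fails)
{
  int result = 0;
  result |= fake_phase ("C", first_fails);
  result |= fake_phase ("C.UTF-8", second_fails);
  return result;
}
```

With the overwriting shape, a failure in the first phase is lost whenever the
second phase passes; with the accumulating shape, the final result is nonzero
if any phase failed.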