From: Florian Weimer
To: Carlos O'Donell
Cc: Mike FABIAN, Andreas Schwab, libc-alpha@sourceware.org
Subject: Re: Is it OK to write ASCII strings directly into locale source files?
References: <5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com> <87h8y13gvb.fsf@mid.deneb.enyo.de> <87379lczdi.fsf@mid.deneb.enyo.de> <7fa0552d-c24b-3c5c-cad3-1359eb4dd6bd@redhat.com> <87mv7sbo75.fsf@mid.deneb.enyo.de>
Date: Tue, 25 Jul 2017 19:05:00 -0000
In-Reply-To: (Carlos O'Donell's message of "Tue, 25 Jul 2017 10:21:27 -0400")
Message-ID: <87h8y0bn1s.fsf@mid.deneb.enyo.de>

* Carlos O'Donell:

>>> However, I caution against throwing away the compatibility of our locales
>>> with POSIX, which doesn't seem to allow UTF-8 in the specification.
>>
>> It does, to some extent:
>>
>> | A character in the portable character set can be represented by the
>> | character itself, in which case the value of the character is
>> | implementation-defined.  (Implementations may allow other characters
>> | to be represented as themselves, but such locale definitions are not
>> | portable.)
>>
>> You'll need a very hostile interpretation to say that this doesn't
>> allow multi-byte character sequences in localedef input.
>
> I see what you're saying, which is that we are *still* POSIX compliant,
> but not portable?

Right, and I think that's okay because the glibc locales are for glibc.

> I assume we are focusing on the "()" text which allows some kind of escape
> hatch outside of the portable character set and allow us to use UTF-8?

Exactly.

>> But I found this in the guts of localedef:
>>
>>   /* The standards leave it up to the implementation to decide
>>      what to do with character which stand for themself.  We
>>      could jump through hoops to find out the value relative to
>>      the charmap and the repertoire map, but instead we leave
>>      it up to the locale definition author to write a better
>>      definition.  We assume here that every character which
>>      stands for itself is encoded using ISO 8859-1.  Using the
>>      escape character is allowed.  */
>>
>> So we currently hard-code ISO 8859-1 (not UTF-8) to avoid the
>> bootstrapping problem.
>
> We could just assume UTF-8, but yes, it looks like this needs a little bit
> more looking into.

Yes, and we don't have a real bootstrapping problem because while we
have a charmap file for UTF-8, we have a separate UTF-8 implementation
in iconv/gconv, and we could use that to break the loop.

> Either way, I support using the portable character set today, and that's
> a step forward.

Agreed.
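
To illustrate why the ISO 8859-1 assumption quoted above matters, here is
a minimal sketch in C.  It is not localedef's actual code: decode_latin1
and decode_utf8 are hypothetical helpers written only for this example,
and the UTF-8 decoder is deliberately simplified (it does not reject
overlong forms or surrogates).  The point is that the same input bytes
yield different code points depending on which encoding is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: treat every byte as one ISO 8859-1 character,
   so the byte value is the code point.  Returns the number of code
   points written to OUT.  */
size_t
decode_latin1 (const unsigned char *in, size_t len, uint32_t *out)
{
  for (size_t i = 0; i < len; i++)
    out[i] = in[i];
  return len;
}

/* Hypothetical helper: simplified UTF-8 decoder for 1- to 4-byte
   sequences.  Returns the number of code points written to OUT, or
   (size_t) -1 on truncated or malformed input.  */
size_t
decode_utf8 (const unsigned char *in, size_t len, uint32_t *out)
{
  size_t n = 0;
  size_t i = 0;

  while (i < len)
    {
      unsigned char b = in[i];
      uint32_t cp;
      size_t extra;

      /* The lead byte determines how many continuation bytes follow.  */
      if (b < 0x80)
        { cp = b; extra = 0; }
      else if ((b & 0xE0) == 0xC0)
        { cp = b & 0x1F; extra = 1; }
      else if ((b & 0xF0) == 0xE0)
        { cp = b & 0x0F; extra = 2; }
      else if ((b & 0xF8) == 0xF0)
        { cp = b & 0x07; extra = 3; }
      else
        return (size_t) -1;

      /* Reject sequences that run past the end of the input.  */
      if (extra > len - i - 1)
        return (size_t) -1;

      /* Each continuation byte contributes its low six bits.  */
      for (size_t k = 1; k <= extra; k++)
        {
          unsigned char c = in[i + k];
          if ((c & 0xC0) != 0x80)
            return (size_t) -1;
          cp = (cp << 6) | (c & 0x3F);
        }

      out[n++] = cp;
      i += extra + 1;
    }
  return n;
}
```

Given the two bytes 0xC3 0xA9 (the UTF-8 encoding of U+00E9),
decode_latin1 produces two code points, U+00C3 and U+00A9, while
decode_utf8 produces the single code point U+00E9.  For bytes in the
ASCII range both helpers agree, which is consistent with the portable
character set being the safe choice today.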