From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-82407-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 116956 invoked by alias); 25 Jul 2017 14:12:21 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 116946 invoked by uid 89); 25 Jul 2017 14:12:20 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_NONE,RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=H*r:TLS1.2, interpretation, month, stand
X-HELO: albireo.enyo.de
From: Florian Weimer <fw@deneb.enyo.de>
To: Carlos O'Donell <carlos@redhat.com>
Cc: Mike FABIAN <mfabian@redhat.com>,  Andreas Schwab <schwab@suse.de>,  libc-alpha@sourceware.org
Subject: Re: Is it OK to write ASCII strings directly into locale source files?
References: <s9d8tje9e1k.fsf@redhat.com>
	<5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com>
	<mvmy3rdx577.fsf@suse.de> <87h8y13gvb.fsf@mid.deneb.enyo.de>
	<e43a088a-cb33-c322-7587-c20d993e7fa6@redhat.com>
	<87379lczdi.fsf@mid.deneb.enyo.de>
	<7fa0552d-c24b-3c5c-cad3-1359eb4dd6bd@redhat.com>
	<s9dbmo9xcjq.fsf@redhat.com>
	<f550c8ca-da3f-7fbb-e54a-d372fea36a9d@redhat.com>
Date: Tue, 25 Jul 2017 14:21:00 -0000
In-Reply-To: <f550c8ca-da3f-7fbb-e54a-d372fea36a9d@redhat.com> (Carlos
	O'Donell's message of "Tue, 25 Jul 2017 08:17:44 -0400")
Message-ID: <87mv7sbo75.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0
Content-Type: text/plain
X-SW-Source: 2017-07/txt/msg00851.txt.bz2

* Carlos O'Donell:

> On 07/25/2017 02:20 AM, Mike FABIAN wrote:
>> Carlos O'Donell <carlos@redhat.com> wrote:
>> 
>>> My only argument is that when you are forced to use <Uxxx> encoding it
>>> is empirically less likely you'll make a mistake. Like reading a sentence
>>> backwards to catch errors since it prevents your brain from filling in
>>> the missing information.
>> 
>> But there are also many mistakes because somebody mistyped code points.
>> Several weird typos in things like month names look as if somebody
>> mistyped code points.
>
> Ultimately I defer to your judgement as localedata maintainer to create
> a workflow that is easy for you and benefits your work.
>
> However, I caution against throwing away the compatibility of our locales
> with POSIX, which doesn't seem to allow UTF-8 in the specification.

It does, to some extent:

| A character in the portable character set can be represented by the
| character itself, in which case the value of the character is
| implementation-defined. (Implementations may allow other characters
| to be represented as themselves, but such locale definitions are not
| portable.)

You'll need a very hostile interpretation to say that this doesn't
allow multi-byte character sequences in localedef input.

But I found this in the guts of localedef:

	      /* The standards leave it up to the implementation to decide
		 what to do with character which stand for themself.  We
		 could jump through hoops to find out the value relative to
		 the charmap and the repertoire map, but instead we leave
		 it up to the locale definition author to write a better
		 definition.  We assume here that every character which
		 stands for itself is encoded using ISO 8859-1.  Using the
		 escape character is allowed.  */

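So localedef currently hard-codes ISO 8859-1 (not UTF-8) for characters
that stand for themselves.  That avoids the bootstrapping problem of
having to consult the charmap and the repertoire map first.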
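
To make the consequence concrete, here is a minimal sketch of what the
ISO 8859-1 assumption in that comment implies for non-ASCII input.  It
is not the localedef code: latin1_to_ucs4 is a hypothetical helper
written only for illustration.  It relies on the fact that ISO 8859-1
maps byte value N directly to code point U+00NN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from glibc: treat every input byte as
   one ISO 8859-1 character.  In ISO 8859-1 the byte value is the
   Unicode code point, so the conversion is a plain widening copy.
   Returns the number of code points written to OUT.  */
static size_t
latin1_to_ucs4 (const unsigned char *in, size_t len, uint32_t *out)
{
  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < len; ++i)
    out[i] = in[i];
  return len;
}
```

Under this reading, the two bytes 0xC3 0xA9 (the UTF-8 encoding of
U+00E9) come out as two characters, U+00C3 and U+00A9, rather than as
one.  ASCII-only strings are unaffected, because ASCII, ISO 8859-1 and
UTF-8 agree on bytes below 0x80.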