From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-82379-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 97638 invoked by alias); 24 Jul 2017 19:00:58 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Subject: Re: Is it OK to write ASCII strings directly into locale source
 files?
To: Florian Weimer <fw@deneb.enyo.de>, Andreas Schwab <schwab@suse.de>
Cc: Mike FABIAN <mfabian@redhat.com>, libc-alpha@sourceware.org
References: <s9d8tje9e1k.fsf@redhat.com>
 <5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com> <mvmy3rdx577.fsf@suse.de>
 <87h8y13gvb.fsf@mid.deneb.enyo.de>
From: Carlos O'Donell <carlos@redhat.com>
Message-ID: <e43a088a-cb33-c322-7587-c20d993e7fa6@redhat.com>
Date: Mon, 24 Jul 2017 20:07:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <87h8y13gvb.fsf@mid.deneb.enyo.de>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2017-07/txt/msg00823.txt.bz2

On 07/24/2017 01:05 PM, Florian Weimer wrote:
> * Andreas Schwab:
> 
>> On Jul 24 2017, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>>> So let us start slowly and agree with 'ASCII - [<>]' where < denotes
>>> the start of a code point and > the end of the code point.
>>
>> POSIX says "character in the portable character set" if you want to keep
>> it portable.
> 
> But our locales only have to be compatible with our localedef, right?

Should developers be able to write tools to the POSIX locale spec and parse
our source locale definitions? Supporting more than just GNU/Linux? Do the
BSDs share our locale definitions?

> I know that the FSF does not claim copyright on our locales, so anyone
> is free to take them and use them with their own non-GNU systems (or
> sell them as PDFs/books).  But this does not mean we have to make
> their lives easier if it comes at a cost to us (e.g., verifying that
> we only use the portable character set, or refraining from using full
> UTF-8 at a future date).

I agree with your sentiment, and I leave it up to Mike, as the subsystem
maintainer, to decide what is ultimately easier for him to work with. There
is certainly a cost/reward balance.

My only technical objection to writing straight UTF-8 is that it could
lead to more mistakes. Mike just found one in CLDR, where an Arabic/Farsi
character was used incorrectly because it displays the same glyph as the
intended one. It was caught when harmonizing with glibc, where you have to
write out the code points (Mike filed a bug upstream with CLDR).
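
To illustrate why spelling out code points catches this class of mistake,
here is a minimal Python sketch. It is not the actual CLDR case (the
characters involved are not named in this thread); as an assumed example it
uses U+064A ARABIC LETTER YEH and U+06CC ARABIC LETTER FARSI YEH, which can
render alike in some positions. The helper name to_ucs_notation is
hypothetical.

```python
import unicodedata


def to_ucs_notation(text):
    """Spell out every character in <UXXXX> form (8 hex digits above U+FFFF)."""
    return "".join(
        f"<U{ord(ch):04X}>" if ord(ch) <= 0xFFFF else f"<U{ord(ch):08X}>"
        for ch in text
    )


# Assumed example pair, not the characters from the CLDR bug.
arabic_yeh = "\u064A"
farsi_yeh = "\u06CC"

for ch in (arabic_yeh, farsi_yeh):
    # The glyphs may look alike on screen; the spelled-out forms never do.
    print(to_ucs_notation(ch), unicodedata.name(ch))
```

In raw UTF-8 the two are easy to confuse by eye, while the <UXXXX> forms
differ visibly in a diff or a review.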

My preference would be to start small: use the POSIX portable character
set to its maximum extent for all Latin-based languages, see how that
works out, and then decide whether we even need to pursue full UTF-8, and
in which form.
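
As a rough sketch of what "ASCII minus < and >" could look like in
practice, the Python below keeps allowed ASCII characters literal and
writes everything else as a code point. The ALLOWED set (printable ASCII
0x20-0x7E without '<' and '>') and the name encode_locale_string are
assumptions for illustration, not what localedef implements, and the sketch
ignores any quoting or escape-character handling a real locale file needs.

```python
# Assumption: "literal" characters are printable ASCII except the
# code point delimiters '<' and '>'.
ALLOWED = {chr(c) for c in range(0x20, 0x7F)} - {"<", ">"}


def encode_locale_string(text):
    """Keep allowed ASCII as-is; write every other character as <UXXXX>."""
    out = []
    for ch in text:
        if ch in ALLOWED:
            out.append(ch)
        else:
            cp = ord(ch)
            out.append(f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>")
    return "".join(out)


print(encode_locale_string("Januar"))
print(encode_locale_string("M\u00E4rz"))
```

Plain Latin-script strings stay readable, and anything outside the allowed
set still shows up as an explicit code point.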

-- 
Cheers,
Carlos.