From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-82391-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 85907 invoked by alias); 24 Jul 2017 22:55:57 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 83662 invoked by uid 89); 24 Jul 2017 22:55:55 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM autolearn=no version=3.3.2 spammy=sentence, HContent-Transfer-Encoding:8bit
X-HELO: mail-qk0-f182.google.com
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:subject:to:cc:references:from:organization
         :message-id:date:user-agent:mime-version:in-reply-to
         :content-language:content-transfer-encoding;
        bh=OX7Kg1deC5Qvqvts2NnQWTl47qHt8G/ORizwUK88p3c=;
        b=Amcs+MP7UJsTfLaEfWwsJ9grBDgFGysztFHdBKw6MVtTrf0h95mrLI5qAuZrwHzrgx
         QSkcQOkRUkuiIJecEBH5HEgcdZKhCxIV03vTOFkdTHCdUUFfCV0A1tFxw5TUBU8tnSUQ
         sznlNjYciaDce8cUVb7yGqcx/qOo6qmCobz064njX4phkmWribOBYVRfrVvTux8MdJrK
         tZBCwgm12WS9+kI7eouHavrJR5SWvo9E/A8qhMwPzZIUgNCiEEEa/lcj+ZRwmObL9U6P
         dbPdiv/w8XSwQxdp9gXq36gKoAPZ5ziR4Dw2sQXk4K3ms6ni0uKSx682GbJxWlTgLkLf
         xJEw==
X-Gm-Message-State: AIVw110oLXBiymyRZz59XU3hIU00AuSiJJ859mbD33JybEnzIQRnQKUg
	JDwCDMm2AziN21sKcu2rWg==
X-Received: by 10.55.69.73 with SMTP id s70mr4027665qka.291.1500936952240;
        Mon, 24 Jul 2017 15:55:52 -0700 (PDT)
Subject: Re: Is it OK to write ASCII strings directly into locale source
 files?
To: Florian Weimer <fw@deneb.enyo.de>
Cc: Andreas Schwab <schwab@suse.de>, Mike FABIAN <mfabian@redhat.com>,
 libc-alpha@sourceware.org
References: <s9d8tje9e1k.fsf@redhat.com>
 <5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com> <mvmy3rdx577.fsf@suse.de>
 <87h8y13gvb.fsf@mid.deneb.enyo.de>
 <e43a088a-cb33-c322-7587-c20d993e7fa6@redhat.com>
 <87379lczdi.fsf@mid.deneb.enyo.de>
From: Carlos O'Donell <carlos@redhat.com>
Message-ID: <7fa0552d-c24b-3c5c-cad3-1359eb4dd6bd@redhat.com>
Date: Tue, 25 Jul 2017 05:40:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <87379lczdi.fsf@mid.deneb.enyo.de>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-SW-Source: 2017-07/txt/msg00835.txt.bz2

On 07/24/2017 05:13 PM, Florian Weimer wrote:
>> My only technical objection with writing straight UTF-8 is that it could
>> lead to more mistakes, and Mike just found one in CLDR where an Arabic
>> Farsi character was used incorrectly because it displayed the same glyph.
>> It was caught when harmonizing with glibc where you have to write out the
>> code points (Mike filed a bug upstream with CLDR).
> 
> Wasn't it caught by locale testing which revealed that the locale
> wasn't compatible with ISO-8859-6?  That sanity check would still
> apply to locale definitions written in UTF-8.

My point was that the mistake was made upstream in CLDR, and I can only
presume it was made because the glyphs are identical.

If we had not been using ISO-8859-6, or if we'd had a mapping from
all the UTF-8 chars into ISO-8859-6 (there was no transliteration for the
Farsi character), then we would not have noticed the error in the 
original source locale.
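
A minimal sketch of that kind of check, in Python. The thread does not say
which characters were involved, so the pair used here (ARABIC LETTER YEH
and ARABIC LETTER FARSI YEH) is an assumption, and the helper name
unencodable is hypothetical. It only illustrates the idea that a character
with no ISO-8859-6 mapping stands out even when it looks the same on screen.

```python
# Hypothetical illustration: the actual characters from the CLDR bug are
# not named in this thread; this look-alike pair is an assumption.
import unicodedata

ARABIC_YEH = "\u064A"  # has a mapping in ISO-8859-6
FARSI_YEH = "\u06CC"   # has no mapping in ISO-8859-6


def unencodable(text, charset="iso8859_6"):
    """Return the characters of text that charset cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


for ch in (ARABIC_YEH, FARSI_YEH):
    status = "not encodable" if unencodable(ch) else "ok"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")
```

This is not how the glibc locale tests are implemented; it is only the same
sanity check expressed with Python's standard codecs.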

My only argument is that when you are forced to use <Uxxx> encoding it
is empirically less likely you'll make a mistake. It is like reading a
sentence backwards to catch errors: it prevents your brain from filling
in the missing information.
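
To make that concrete, here is a rough sketch that spells a string out in
code point notation. The function name to_ucs_notation is hypothetical, and
the output format (four hex digits for BMP characters, eight otherwise) is
my assumption about the usual form in locale sources. Two characters that
render with the same glyph come out visibly different once written this way.

```python
def to_ucs_notation(text):
    """Spell out each character of text as <Uxxxx> (or <Uxxxxxxxx>).

    The digit widths are an assumption: 4 hex digits for code points up
    to U+FFFF, 8 hex digits for anything above that.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        width = 4 if cp <= 0xFFFF else 8
        parts.append(f"<U{cp:0{width}X}>")
    return "".join(parts)


# Two look-alike letters (an assumed pair, not taken from the CLDR bug).
print(to_ucs_notation("\u064A"))
print(to_ucs_notation("\u06CC"))
```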

> I would still prefer the <U…> encoding for control characters which
> are in the portable character set.  So I have to object to the
> “maximum” part. :)

Yes, I had ignored the control characters, so I agree, not maximally :}

-- 
Cheers,
Carlos.