Subject: Re: Is it OK to write ASCII strings directly into locale source files?
To: Florian Weimer
Cc: Andreas Schwab, Mike FABIAN, libc-alpha@sourceware.org
From: Carlos O'Donell
Date: Tue, 25 Jul 2017 05:40:00 -0000
Message-ID: <7fa0552d-c24b-3c5c-cad3-1359eb4dd6bd@redhat.com>
In-Reply-To: <87379lczdi.fsf@mid.deneb.enyo.de>

On 07/24/2017 05:13 PM, Florian Weimer wrote:
>> My only technical objection with writing straight UTF-8 is that it could
>> lead to more mistakes, and Mike just found one in CLDR where an Arabic
>> Farsi character was used incorrectly because it displayed the same glyph.
>> It was caught when harmonizing with glibc where you have to write out the
>> code points (Mike filed a bug upstream with CLDR).
>
> Wasn't it caught by locale testing which revealed that the locale
> wasn't compatible with ISO-8859-6?  That sanity check would still
> apply to locale definitions written in UTF-8.

My point was that the mistake was made upstream in CLDR, and I can only
presume it was made because the two glyphs are identical. If we had not
been using ISO-8859-6, or if we had had a mapping from every UTF-8
character into ISO-8859-6 (there was no transliteration for the Farsi
character), then we would not have noticed the error in the original
source locale.

My only argument is that when you are forced to write out the encoded
code points, in my experience you are less likely to make a mistake. It
is like reading a sentence backwards to catch errors: it prevents your
brain from filling in the missing information.

> I would still prefer the encoding for control characters which
> are in the portable character set.  So I have to object to the
> “maximum” part.
:) Yes, I had ignored the control characters, so I agree: not maximally. :}

-- 
Cheers,
Carlos.
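A rough illustration of the two safety nets discussed above: spelling characters out as `<Uxxxx>` symbolic names makes lookalike glyphs visibly different, and a representability check against ISO-8859-6 flags characters the legacy charset cannot hold. The thread does not say which Farsi character CLDR used, so the pair below (U+0649 ARABIC LETTER ALEF MAKSURA and U+06CC ARABIC LETTER FARSI YEH, which render alike in isolated and final form) is only an assumed example. The helper names are hypothetical, and Python's `iso8859_6` codec stands in for glibc's own locale tests; this is not how glibc implements that check.

```python
import unicodedata


def to_ucs_symbols(text: str) -> str:
    """Render every character as a glibc-style <Uxxxx> symbolic name."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Four hex digits inside the BMP, eight beyond it.
        out.append(f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>")
    return "".join(out)


def non_representable(text: str, charset: str = "iso8859_6") -> list[str]:
    """List characters the legacy charset cannot encode, with Unicode names."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(f"{to_ucs_symbols(ch)} {unicodedata.name(ch, '?')}")
    return bad


if __name__ == "__main__":
    alef_maksura = "\u0649"  # ARABIC LETTER ALEF MAKSURA (in ISO-8859-6)
    farsi_yeh = "\u06cc"     # ARABIC LETTER FARSI YEH (not in ISO-8859-6)
    # Same-looking glyphs, clearly different symbolic names.
    print(to_ucs_symbols(alef_maksura), to_ucs_symbols(farsi_yeh))
    # <U0649> <U06CC>
    print(non_representable(alef_maksura + farsi_yeh))
    # ['<U06CC> ARABIC LETTER FARSI YEH']
```

Note that, as Florian points out, the second check works equally well on a locale written in raw UTF-8; the first only helps a human reviewer who reads the source.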
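The compromise reached here (printable ASCII written directly, control characters still encoded) could be applied mechanically when generating locale strings. The sketch below uses a hypothetical helper, `locale_string`, that emits printable ASCII literally and writes everything else as `<Uxxxx>`. It assumes the file declares `escape_char /` and `comment_char %`, and, to stay on the safe side, it also encodes the quote, the angle brackets and those two characters. That conservative set is an assumption, not a rule taken from glibc.

```python
# Hypothetical policy: printable ASCII literal, everything else <Uxxxx>.
ESCAPE_CHAR = "/"   # assumption: the file declares "escape_char /"
COMMENT_CHAR = "%"  # assumption: the file declares "comment_char %"
# Characters assumed to be significant in locale source syntax.
SPECIAL = {'"', "<", ">", ESCAPE_CHAR, COMMENT_CHAR}


def ucs_symbol(ch: str) -> str:
    """Format one character as a <Uxxxx> (or <Uxxxxxxxx>) symbolic name."""
    cp = ord(ch)
    return f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>"


def locale_string(text: str) -> str:
    """Quote text for a locale source file under the policy above."""
    parts = []
    for ch in text:
        # U+0020..U+007E is printable ASCII; below that and U+007F are controls.
        if 0x20 <= ord(ch) <= 0x7E and ch not in SPECIAL:
            parts.append(ch)
        else:
            parts.append(ucs_symbol(ch))
    return '"' + "".join(parts) + '"'


if __name__ == "__main__":
    print(locale_string("M\u00e4rz\t%d"))
    # "M<U00E4>rz<U0009><U0025>d"
```

The tab becomes `<U0009>` and the non-ASCII `ä` becomes `<U00E4>`, while plain letters stay readable, which is the "not maximally" middle ground.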