Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Sender: libc-alpha-owner@sourceware.org
Subject: Re: Is it OK to write ASCII strings directly into locale source files?
To: Mike FABIAN
Cc: libc-alpha@sourceware.org
References: <5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com>
From: Carlos O'Donell
Message-ID: <9d38a4b0-9b06-8ee5-79b7-ed6b5e7fc40d@redhat.com>
Date: Mon, 24 Jul 2017 14:47:00 -0000

On 07/24/2017 09:28 AM, Mike FABIAN wrote:
> Carlos O'Donell wrote:
>
>> On 07/24/2017 09:09 AM, Mike FABIAN wrote:
>>>
>>> Currently the locale source files use a lot of code points even for
>>> strings which are pure ASCII. For example localedata/locales/de_DE
>>> contains:
>>>
>>> % "%a %d %b %Y %T %Z"
>>> d_t_fmt
>>> "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
>>>
>>> Would it be OK to write this as
>>>
>>> d_t_fmt "%a %d %b %Y %T %Z"
>>>
>>> ??
>>>
>>> This would make the files much more readable.
>>>
>>> Stuff that is mostly ASCII can probably be written like this:
>>>
>>> % https://oc.wikipedia.org/wiki/Fran%C3%A7a França
>>> country_name "Fran<U00E7>a"
>>>
>>> which is already more readable than writing it all in code points.
>>>
>>> It would be even nicer to write it completely in UTF-8, i.e.:
>>>
>>> country_name "França"
>>>
>>> but I am not sure whether this is allowed in the locale source files.
>>>
>>> But at least for everything which is ASCII, it might be OK already to
>>> write the characters directly.
>>>
>>> Is writing ASCII there allowed or not??
>>
>> It's not ASCII though is it? Since '<' and '>' have to be reserved
>> to support parsing of UTF-8 code points, so it's "almost ASCII."
>>
>> I'm ok using 'almost' ASCII characters as their 1-byte UTF-8 form
>> instead of the verbose code-points, but we need to document exactly
>> which characters are allowed. I believe the answer is everything
>> except '<>'.
>>
>> I'm not entirely ready to allow all UTF-8, since that descends into
>> the much more complex discussion around NFC, NFKC, NFD, NFKD etc. and
>> which form should be used. Then there are discussions around uniqueness
>> of decomposition and exactly what did the source author want.
>>
>> So let us start slowly and agree with 'ASCII - [<>]' where < denotes
>> the start of a code point and > the end of the code point.
>
> Yes, that sounds like a very reasonable first step!
>
> Is it OK to use that already *now*?

You and Rafal are localedata maintainers, so you can assume consensus
and start changing things in whatever way you wish.

Before you change this, though, I would like to see your list of reasons
for making the change. What benefits do you see it bringing? Is
readability the only one?

> Or is any change necessary to make that work?

I do not know.

> I tried
>
> country_name "Fran<U00E7>a"
>
> and it seems to work:
>
> bash-4.4# LC_ALL=oc_FR.UTF-8 locale -k country_name
> country_name="França"
>
> So maybe it is possible to use that right now without having to change
> anything in the code parsing the locale source files.

You need to document somewhere what is acceptable and what is not, and
which ASCII characters cannot be used.

-- 
Cheers,
Carlos.
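
For illustration only, the 'ASCII - [<>]' rule discussed above could be
applied to existing strings roughly as in the Python sketch below. This is
not part of glibc or localedef. The function name simplify, the keep_encoded
parameter, and the decision to also leave the double quote encoded are
assumptions made for this example; a real conversion might need to keep more
characters encoded (for instance the file's escape character).

```python
import re

# Matches the <Uxxxx> code-point notation used in locale source files.
# Assumption: 4 to 8 hex digits; localedef's exact grammar is not checked.
CODE_POINT = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def simplify(text: str, keep_encoded: str = '<>"') -> str:
    """Replace <Uxxxx> tokens with the literal character when it is
    printable ASCII and not listed in keep_encoded (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        cp = int(match.group(1), 16)
        # Only printable ASCII (0x20..0x7E) is ever written literally.
        if 0x20 <= cp <= 0x7E and chr(cp) not in keep_encoded:
            return chr(cp)
        # Everything else, including '<' and '>', stays as a code point.
        return match.group(0)

    return CODE_POINT.sub(repl, text)


if __name__ == "__main__":
    print(simplify('d_t_fmt "<U0025><U0061><U0020><U0025><U0064>"'))
    print(simplify('country_name "Fran<U00E7>a"'))
```

With these assumptions the first call prints d_t_fmt "%a %d" and the second
leaves the string unchanged, because U+00E7 lies outside ASCII.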