Subject: Re: Is it OK to write ASCII strings directly into locale source files?
To: Mike FABIAN, libc-alpha@sourceware.org
From: Carlos O'Donell
Message-ID: <5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com>
Date: Mon, 24 Jul 2017 13:28:00 -0000

On 07/24/2017 09:09 AM, Mike FABIAN wrote:
>
> Currently the locale source files use a lot of code points even for
> strings which are pure ASCII. For example localedata/locales/de_DE
> contains:
>
> % "%a %d %b %Y %T %Z"
> d_t_fmt "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
>
> Would it be OK to write this as
>
> d_t_fmt "%a %d %b %Y %T %Z"
>
> ??
>
> This would make the files much more readable.
>
> Stuff that is mostly ASCII can probably be written like this:
>
> % https://oc.wikipedia.org/wiki/Fran%C3%A7a França
> country_name "Fran<U00E7>a"
>
> which is already more readable then writing it all in code points.
>
> It would be even nicer to write it completely in UTF-8, i.e.:
>
> country_name "França"
>
> but I am not sure whether this is allowed in the locale source files.
>
> But at least for everything which is ASCII, it might be OK already to
> write the characters directly.
>
> Is writing ASCII there allowed or not??

It's not ASCII though, is it? '<' and '>' have to be reserved to support
parsing of <Uxxxx> code points, so it's "almost ASCII."

I'm OK with writing 'almost' ASCII characters in their 1-byte UTF-8 form
instead of the verbose code points, but we need to document exactly which
characters are allowed. I believe the answer is everything except '<>'.

I'm not entirely ready to allow all UTF-8, since that descends into the
much more complex discussion around NFC, NFKC, NFD, NFKD, etc. and which
form should be used. Then there are discussions around uniqueness of
decomposition and exactly what the source author wanted.
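
To make the normalization concern concrete, here is a rough Python sketch (illustration only, not part of glibc or localedef; the helper name as_code_points is made up). It shows that the same visible string "França" maps to two different code point sequences depending on whether it is in NFC or NFD, which is the ambiguity raw UTF-8 in the source files would bring in.

```python
import unicodedata

# "França" written with an explicit escape so the starting form is unambiguous.
name = "Fran\u00e7a"

nfc = unicodedata.normalize("NFC", name)  # precomposed: c-cedilla is one code point
nfd = unicodedata.normalize("NFD", name)  # decomposed: 'c' followed by a combining cedilla


def as_code_points(text):
    """Render text in the <Uxxxx> notation used by locale source files."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


print(as_code_points(nfc))  # <U0046><U0072><U0061><U006E><U00E7><U0061>
print(as_code_points(nfd))  # <U0046><U0072><U0061><U006E><U0063><U0327><U0061>
print(nfc == nfd)           # False: same rendering, different code points
```

With explicit code points the author's choice of form is visible in the file; with raw UTF-8 it depends on whatever the editor happened to produce.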

So let us start slowly and agree on 'ASCII - [<>]', where < denotes the
start of a code point and > the end of the code point.

--
Cheers,
Carlos.
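
For illustration, a minimal Python sketch of what an 'ASCII - [<>]' rewrite of a locale string could look like. This is not an existing glibc tool; the names CODE_POINT, KEEP_AS_CODE_POINT and simplify are hypothetical. It also leaves the double quote as a code point, since a literal '"' would end the string; that extra exclusion is an assumption of the sketch and not something settled in this thread. Other characters that may be special to the parser, such as the escape character, are not handled.

```python
import re

# Matches the <Uxxxx> notation used in locale source files.
CODE_POINT = re.compile(r"<U([0-9A-Fa-f]{4,8})>")

# '<' and '>' are reserved by the proposed rule. The double quote is an
# extra assumption of this sketch: written literally it would end the string.
KEEP_AS_CODE_POINT = set('<>"')


def simplify(value):
    """Replace <Uxxxx> with the literal character where the rule allows it."""
    def replace(match):
        cp = int(match.group(1), 16)
        # Only printable ASCII (space through tilde) is written literally.
        if 0x20 <= cp <= 0x7E and chr(cp) not in KEEP_AS_CODE_POINT:
            return chr(cp)
        return match.group(0)  # leave everything else as a code point

    return CODE_POINT.sub(replace, value)


d_t_fmt = (
    "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020>"
    "<U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
)
print(simplify(d_t_fmt))                                       # %a %d %b %Y %T %Z
print(simplify("<U0046><U0072><U0061><U006E><U00E7><U0061>"))  # Fran<U00E7>a
print(simplify("<U003C><U0061><U003E>"))                       # <U003C>a<U003E>
```

Non-ASCII characters stay in code point form, so the sketch sidesteps the normalization question entirely.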