Re: Is it OK to write ASCII strings directly into locale source files?

From: Carlos O'Donell <carlos@redhat.com>
To: Mike FABIAN <mfabian@redhat.com>
Cc: libc-alpha@sourceware.org
Subject: Re: Is it OK to write ASCII strings directly into locale source files?
Date: Mon, 24 Jul 2017 14:47:00 -0000
Message-ID: <9d38a4b0-9b06-8ee5-79b7-ed6b5e7fc40d@redhat.com>
In-Reply-To: <s9d7eyy6k1y.fsf@redhat.com>

On 07/24/2017 09:28 AM, Mike FABIAN wrote:
> Carlos O'Donell <carlos@redhat.com> wrote:
> 
>> On 07/24/2017 09:09 AM, Mike FABIAN wrote:
>>>
>>> Currently the locale source files use a lot of code points even for
>>> strings which are pure ASCII. For example localedata/locales/de_DE
>>> contains:
>>>
>>> %	"%a %d %b %Y %T %Z"
>>> d_t_fmt
>>> "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
>>>
>>> Would it be OK to write this as
>>>
>>> d_t_fmt "%a %d %b %Y %T %Z"
>>>
>>> ??
>>>
>>> This would make the files much more readable.
>>>
>>> Stuff that is mostly ASCII can probably be written like this:
>>>
>>> % https://oc.wikipedia.org/wiki/Fran%C3%A7a França
>>> country_name "Fran<U00E7>a"
>>>
>>> which is already more readable than writing it all in <U00??> code points.
>>>
>>> It would be even nicer to write it completely in UTF-8, i.e.:
>>>
>>> country_name "França"
>>>
>>> but I am not sure whether this is allowed in the locale source files.
>>>
>>> But at least for everything which is ASCII, it might be OK already to
>>> write the characters directly.
>>>
>>> Is writing ASCII there allowed or not??
>>  
>> It's not ASCII though is it? Since '<' and '>' have to be reserved
>> to support parsing of UTF-8 code points, so it's "almost ASCII."
>>
>> I'm ok using 'almost' ASCII characters as their 1-byte UTF-8 form
>> instead of the verbose code-points, but we need to document exactly
>> which characters are allowed. I believe the answer is everything
>> except '<>'.
>>
>> I'm not entirely ready to allow all UTF-8, since that descends into
>> the much more complex discussion around NFC, NFKC, NFD, NFKD etc. and
>> which form should be used. Then there are discussions around uniqueness
>> of decomposition and exactly what did the source author want.
>>
>> So let us start slowly and agree with 'ASCII - [<>]' where < denotes
>> the start of a code point and > the end of the code point.
> 
> Yes, that sounds like a very reasonable first step!
> 
> Is it OK to use that already *now*?

You and Rafal are localedata maintainers, you can assume consensus, therefore
you can start changing things in whatever way you wish.

Before you change this, though, I would like to see your list of reasons
for making the change. What benefits do you see it bringing? Is readability
the only one?

> Or is any change necessary to make that work?

I do not know.

> I tried
> 
> country_name "Fran<U00E7>a"
> 
> and it seems to work:
> 
> bash-4.4# LC_ALL=oc_FR.UTF-8 locale -k country_name
> country_name="França"
> 
> So maybe it is possible to use that right now without having to change
> anything in the code parsing the locale source files.
 
You need to document somewhere what is acceptable and what is not and
which ASCII characters cannot be used.
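
As an illustration of the 'ASCII - [<>]' rule discussed above, here is a
minimal Python sketch of how existing <UXXXX> tokens could be rewritten.
It is not part of glibc or localedef; the function name simplify, the
reserved parameter, and the restriction to printable ASCII (0x20-0x7E)
are assumptions made for this example. Other characters (for instance
the double quote or the escape character) may also need to stay in
code-point form; that is exactly what would have to be documented.

```python
import re

# One <UXXXX> token. Accepting 4 to 8 hex digits is an assumption of this sketch.
CODE_POINT = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def simplify(text, reserved="<>"):
    """Rewrite <UXXXX> tokens as literal characters where the rule allows it.

    A token is replaced only if it denotes a printable ASCII character
    (0x20-0x7E) that is not listed in `reserved`. Everything else,
    including non-ASCII code points, is left untouched.
    """
    def replace(match):
        code_point = int(match.group(1), 16)
        if 0x20 <= code_point <= 0x7E and chr(code_point) not in reserved:
            return chr(code_point)
        return match.group(0)  # keep the original token as written

    return CODE_POINT.sub(replace, text)


# The d_t_fmt value from de_DE quoted earlier in this thread.
d_t_fmt = (
    "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020>"
    "<U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
)
print(simplify(d_t_fmt))          # %a %d %b %Y %T %Z
print(simplify("Fran<U00E7>a"))   # Fran<U00E7>a (non-ASCII stays a code point)
print(simplify("<U003C><U0041>")) # <U003C>A ('<' stays reserved)
```

Passing a longer reserved string, for example simplify(text, reserved='<>"'),
keeps further characters in code-point form if the documented rule ends up
stricter than 'ASCII - [<>]'.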

-- 
Cheers,
Carlos.
