From: Mike FABIAN <mfabian@redhat.com>
To: Carlos O'Donell <carlos@redhat.com>
Cc: libc-alpha@sourceware.org
Subject: Re: Is it OK to write ASCII strings directly into locale source files?
Date: Mon, 24 Jul 2017 13:32:00 -0000
Message-ID: <s9d7eyy6k1y.fsf@redhat.com>
In-Reply-To: <5f71f2f6-be0e-2b5d-91ce-03386eafa7f7@redhat.com> (Carlos O'Donell's message of "Mon, 24 Jul 2017 09:22:48 -0400")

Carlos O'Donell <carlos@redhat.com> wrote:
> On 07/24/2017 09:09 AM, Mike FABIAN wrote:
>>
>> Currently the locale source files use a lot of code points even for
>> strings which are pure ASCII. For example localedata/locales/de_DE
>> contains:
>>
>> % "%a %d %b %Y %T %Z"
>> d_t_fmt
>> "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062><U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
>>
>> Would it be OK to write this as
>>
>> d_t_fmt "%a %d %b %Y %T %Z"
>>
>> ??
>>
>> This would make the files much more readable.
>>
>> Stuff that is mostly ASCII can probably be written like this:
>>
>> % https://oc.wikipedia.org/wiki/Fran%C3%A7a França
>> country_name "Fran<U00E7>a"
>>
>> which is already more readable than writing it all in <U00??> code points.
>>
>> It would be even nicer to write it completely in UTF-8, i.e.:
>>
>> country_name "França"
>>
>> but I am not sure whether this is allowed in the locale source files.
>>
>> But at least for everything which is ASCII, it might be OK already to
>> write the characters directly.
>>
>> Is writing ASCII there allowed or not??
>
> It's not ASCII though is it? Since '<' and '>' have to be reserved
> to support parsing of UTF-8 code points, so it's "almost ASCII."
>
> I'm ok using 'almost' ASCII characters as their 1-byte UTF-8 form
> instead of the verbose code-points, but we need to document exactly
> which characters are allowed. I believe the answer is everything
> except '<>'.
>
> I'm not entirely ready to allow all UTF-8, since that descends into
> the much more complex discussion around NFC, NFKC, NFD, NFKD etc. and
> which form should be used. Then there are discussions around uniqueness
> of decomposition and exactly what did the source author want.
>
> So let us start slowly and agree with 'ASCII - [<>]' where < denotes
> the start of a code point and > the end of the code point.

Yes, that sounds like a very reasonable first step!

Is it OK to use that already *now*?
Or is any change necessary to make that work?

I tried

    country_name "Fran<U00E7>a"

and it seems to work:

    bash-4.4# LC_ALL=oc_FR.UTF-8 locale -k country_name
    country_name="França"

So maybe it is possible to use that right now without having to change
anything in the code parsing the locale source files.
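
To make the proposed 'ASCII - [<>]' rule concrete, here is a minimal
Python sketch of how existing <UXXXX> sequences could be rewritten. It is
not part of glibc; the function name simplify_ascii and its `reserved`
parameter are hypothetical. It assumes that only printable ASCII
(0x20 to 0x7E) should become literal characters, so control characters
and all non-ASCII code points stay in <UXXXX> form. The default reserved
set is exactly '<' and '>' as discussed above.

```python
import re

# Matches one symbolic code point such as <U0025> or <U0001F600>.
CODE_POINT = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def simplify_ascii(text, reserved="<>"):
    """Replace <UXXXX> with the literal character when it is printable
    ASCII and not in `reserved`; leave every other code point untouched.

    `reserved` defaults to '<' and '>' (the rule proposed in this thread).
    Restricting to printable ASCII is an assumption of this sketch.
    """

    def replace(match):
        value = int(match.group(1), 16)
        if 0x20 <= value <= 0x7E and chr(value) not in reserved:
            return chr(value)
        # Non-ASCII, control, or reserved: keep the symbolic form.
        return match.group(0)

    return CODE_POINT.sub(replace, text)


if __name__ == "__main__":
    d_t_fmt = (
        "<U0025><U0061><U0020><U0025><U0064><U0020><U0025><U0062>"
        "<U0020><U0025><U0059><U0020><U0025><U0054><U0020><U0025><U005A>"
    )
    print(simplify_ascii(d_t_fmt))                  # %a %d %b %Y %T %Z
    print(simplify_ascii("Fran<U00E7>a"))           # Fran<U00E7>a
    print(simplify_ascii("<U003C><U0041><U003E>"))  # <U003C>A<U003E>
```

A real conversion would probably also need to keep the double quote and
the file's escape character as code points inside quoted strings; that is
an assumption not settled in this thread, and it can be tried by passing
a larger set, e.g. simplify_ascii(text, reserved='<>"/').
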
--
Mike FABIAN <mfabian@redhat.com>