From: Keld Simonsen <keld@keldix.com>
To: Troy Korjuslommi <tjk@tksoft.com>
Cc: Steven Abner <pheonix@zoomtown.com>,
libc-locales@sourceware.org, Carlos O'Donell <carlos@redhat.com>
Subject: Re: locale encodings
Date: Thu, 14 Nov 2013 11:33:00 -0000
Message-ID: <20131114113308.GA9638@rap.rap.dk>
In-Reply-To: <1384415405.2935.29.camel@uno11.loco>
I am aware of the problem, and will look into it.
It may take some time, tho.
Best regards
keld
On Thu, Nov 14, 2013 at 09:50:05AM +0200, Troy Korjuslommi wrote:
> By the way, I ran some tests on the fi_FI locale for glibc-2.18 and it
> seems to contain out-of-date information with regard to collation. The
> correct collation order and data are specified in Finnish standard SFS-EN
> 13710, published in 2011 (Finnish standard based on EN 13710 ~aka ISO/IEC
> 14651), and in CLDR, and are implemented in ICU. A quick look at the fi_FI
> file tells me that at least the dates are off, which would imply the data
> are off too. The collation errors seem to be diacritic related, so I would
> have to go through the actual data to determine whether the error is in
> strcoll's handling of UTF-8 or in the collation data. The collation data
> seems to be the most likely suspect. Keld, your name is listed as the
> contact, so maybe it is best that you check this out, in case only the
> comments are off. Also, the charset is wrong. It is listed as iso-8859-1
> for fi_FI and iso-8859-15 for fi_FI@euro. The correct charset for
> Finnish is UTF-8. Only UTF-8 includes all the characters included in the
> current standards.
>
> Since EN 13710 specifies a European collation order, it should also be
> used in other European locales as the default sorting order.
>
> I've tried to push for more cooperation with CLDR in the past too, and
> here is a good case in point why it would actually be a good idea to
> keep an eye on CLDR. There is no need to automate the process
> (the difficulty of which seems to be the reason for resisting CLDR), just
> get the relevant data. Running comparison tests between CLDR and libc
> would also be a good idea. ICU is pretty up-to-date in terms of CLDR and
> other Unicode.org data, so that would be an easy way to implement the
> tests.
>
> Troy
>
>
> On Tue, 2013-11-12 at 10:37 -0500, Steven Abner wrote:
> > On 12 Nov 2013, at 9:34 AM, Steven Abner wrote:
> >
> > > all data that is important, save one, is in POSIX's 7-bit ASCII
> >
> > I wish to add that the quoted strings, however, are in UTF8 instead of the default set. Off the top of my
> > head, the JP file has quoted ("") strings for correct display of months, hours, etc. in UTF8.
> > As far as embedded, a Japanese microwave doesn't need UTF8 for display, but the designer
> > who butchers the code for the microwave, even a Japanese one, can readily use UTF8 to set up
> > JIS0201 or even their own proprietary 128 or less byte display code, and internal communications.
> > That same designer could use UTF8, and default character information from glibc locales to
> > create an embedded version of a code set for microwaves in China.
> > I'm not saying this is standard, but my point, I guess, is that the default character set for the locale could
> > or should go into the ASCII section of "LC" data. Comments in any encoding get gobbled; quoted
> > strings are either in the default character set or UTF8.
> > I am no expert, just food for thought.
> > Steve
>
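
A minimal sketch of the kind of comparison test suggested in the quoted
message: sort a word list with the C library's collation for a given locale,
then report the first position where the result differs from a reference
ordering. The function names sort_with_libc and first_mismatch are
hypothetical, and the reference ordering (for example, one produced by ICU
from CLDR data) is assumed to be supplied by the caller; none is included
here. It also assumes the locale under test is installed on the system.

```python
import locale
from typing import List, Optional, Tuple


def sort_with_libc(words: List[str], locale_name: str) -> Optional[List[str]]:
    """Sort words using the C library's LC_COLLATE rules for locale_name.

    Returns None if the locale cannot be selected (e.g. not installed).
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # locale.strxfrm builds sort keys that follow the active collation.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore previous setting


def first_mismatch(
    actual: List[str], expected: List[str]
) -> Optional[Tuple[int, str, str]]:
    """Return (index, actual_item, expected_item) at the first difference.

    Returns None if the two orderings agree over their common length.
    """
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return (index, got, want)
    return None
```

To use it, one would call sort_with_libc(words, "fi_FI.UTF-8") and pass the
result, together with the same words as ordered by the reference
implementation, to first_mismatch. A non-None result points at the first
string worth inspecting in the locale's collation data.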