Re: locale encodings - Keld Simonsen

public inbox for libc-locales@sourceware.org

From: Keld Simonsen <keld@keldix.com>
To: Troy Korjuslommi <tjk@tksoft.com>
Cc: Steven Abner <pheonix@zoomtown.com>,
	libc-locales@sourceware.org, Carlos O'Donell <carlos@redhat.com>
Subject: Re: locale encodings
Date: Thu, 14 Nov 2013 11:33:00 -0000
Message-ID: <20131114113308.GA9638@rap.rap.dk>
In-Reply-To: <1384415405.2935.29.camel@uno11.loco>

I am aware of the problem and will look into it.
It may take some time, though.

Best regards
keld

On Thu, Nov 14, 2013 at 09:50:05AM +0200, Troy Korjuslommi wrote:
> By the way, I ran some tests on the fi_FI locale in glibc-2.18, and its
> collation information seems to be out of date. The correct collation
> order and data are specified in Finnish standard SFS-EN 13710, published
> in 2011 (a Finnish standard based on EN 13710, roughly a.k.a. ISO/IEC
> 14651), and in CLDR, and are implemented in ICU. A quick look at the
> fi_FI file tells me that at least the dates are off, which would imply
> that the data is off too. The collation errors seem to be
> diacritic-related, so I would have to go through the actual data to
> determine whether the error is in strcoll's handling of UTF-8 or in the
> collation data. The collation data seems the most likely suspect. Keld,
> your name is listed as the contact, so it may be best that you check
> this out, in case only the comments are off. Also, the charset is wrong:
> it is listed as iso-8859-1 for fi_FI and iso-8859-15 for fi_FI@euro. The
> correct charset for Finnish is UTF-8; only UTF-8 includes all the
> characters covered by the current standards.
> 
> Since EN 13710 specifies a European collation order, it should also be
> used in other European locales as the default sorting order.
> 
> I've tried to push for more cooperation with CLDR in the past too, and
> this is a good case in point for why it would be a good idea to keep an
> eye on CLDR. There is no need to automate the process (the difficulty of
> which seems to be the reason for resisting CLDR); just get the relevant
> data. Running comparison tests between CLDR and libc would also be a
> good idea. ICU is pretty up to date with CLDR and other Unicode.org
> data, so it would be an easy way to implement the tests.
> 
> Troy
> 
> 
> On Tue, 2013-11-12 at 10:37 -0500, Steven Abner wrote:
> > On 12 Nov 2013, at 9:34 AM, Steven Abner wrote:
> > 
> > > all data that is important, save one, is in POSIX's 7-bit ASCII
> > 
> >  I wish to add that the quoted strings, however, are in UTF-8 instead of the default set. Off the top of my
> > head, the JP file has quoted ("") strings in UTF-8 for correct display of months, hours, etc.
> >  As for embedded use, a Japanese microwave doesn't need UTF-8 for display, but the designer
> > who butchers the code for the microwave, even a Japanese one, can readily use UTF-8 to set up
> > JIS0201 or even their own proprietary 128-or-less-byte display code, and internal communications.
> > That same designer could use UTF-8 and the default character information from glibc locales to
> > create an embedded version of a code set for microwaves in China.
> >   I'm not saying this is standard, but my point, I guess, is that the default character set for the locale could
> > or should go into the ASCII section of the "LC" data. Comments in any encoding get gobbled; quoted
> > strings are either in the default character set or in UTF-8.
> >   I am no expert; this is just food for thought.
> > Steve
>
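
The comparison test Troy suggests (libc collation checked against a
reference order such as one produced by ICU) could be sketched roughly as
below. This is only an illustration, not anything posted in the thread:
the helper names sort_with_locale and first_mismatch are hypothetical,
the locale name "fi_FI.UTF-8" is assumed to be installed on the test
machine, and the sample words are arbitrary. The reference order would
have to come from ICU or the CLDR data; none is supplied here.

```python
import functools
import itertools
import locale


def sort_with_locale(words, locale_name):
    """Sort words with libc's strcoll under locale_name.

    Returns None if the locale is not installed, so callers can skip
    the comparison instead of silently testing the wrong locale.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


def first_mismatch(libc_order, reference_order):
    """Return (index, libc_item, reference_item) at the first difference,
    or None if both orders are identical."""
    pairs = itertools.zip_longest(libc_order, reference_order)
    for index, (ours, theirs) in enumerate(pairs):
        if ours != theirs:
            return (index, ours, theirs)
    return None


if __name__ == "__main__":
    # Arbitrary sample words containing letters with diacritics.
    sample = ["zebra", "äiti", "aamu", "öljy", "yö"]
    libc_order = sort_with_locale(sample, "fi_FI.UTF-8")
    if libc_order is None:
        print("fi_FI.UTF-8 is not installed; nothing to compare")
    else:
        print("libc order:", libc_order)
        # A reference list obtained from ICU or CLDR would be compared
        # here with first_mismatch(libc_order, reference_order).
```

Any mismatch reported this way only shows that the two orders differ; it
does not say whether strcoll or the locale's collation data is at fault.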

Thread overview: 16+ messages
2013-11-11  1:28 Steven Abner
2013-11-11  5:19 ` Carlos O'Donell
2013-11-11 12:58 ` Troy Korjuslommi
2013-11-12  1:23   ` Keld Simonsen
2013-11-12  5:38     ` Carlos O'Donell
2013-11-12 13:36       ` Keld Simonsen
2013-11-12 14:39         ` Carlos O'Donell
2013-11-12 16:11           ` Keld Simonsen
2013-11-12 14:52         ` Steven Abner
2013-11-12 16:15           ` Steven Abner
2013-11-14  7:47             ` Troy Korjuslommi
2013-11-14 11:33               ` Keld Simonsen [this message]
2013-11-14 20:47               ` Steven Abner
2013-11-14 21:17                 ` Steven Abner
2013-11-14 21:17                 ` Keld Simonsen
2013-11-26 17:05 Marko Myllynen
