public inbox for libc-locales@sourceware.org
* locale encodings
@ 2013-11-11  1:28 Steven Abner
  2013-11-11  5:19 ` Carlos O'Donell
  2013-11-11 12:58 ` Troy Korjuslommi
  0 siblings, 2 replies; 16+ messages in thread
From: Steven Abner @ 2013-11-11  1:28 UTC
  To: libc-locales

Hi,
Can you tell me what encoding the "cs_CZ", "sk_SK", "sv_SE" and "wo_SN" files are in? I was going to try
to fix them for my own use, but I can't open them in a normal editor. I was doing a design test when these
files tripped a character outside the POSIX portable character set in my scanf()'s isspace(). I think they
might be ISO-8859-2, but I'm not sure; my editor claims they can't be opened as UTF-8. I'd rather not
second-guess someone else's work if I can avoid it. If it is ISO-8859-2, I'll just convert the file to
UTF-8 to examine it. Two other files contain UTF-8 encoded characters, which is no problem. Others do too,
but weren't within the scope of the trap (from the comment character to the first word after it). I am
only trying to verify that the file parser picks up the exact data and, hopefully, isn't being corrupted
by unusual codes, as some have been.
Thanks,
Steve
pheonix@zoomtown.com
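
One way to answer the encoding question locally is to test whether each file is valid UTF-8. The sketch below is a minimal C function, assuming the file has already been read into memory; the name `first_invalid_utf8` is hypothetical and not a glibc API. It returns the offset of the first byte that does not begin a well-formed UTF-8 sequence, or -1 if the whole buffer validates. A file that fails here while containing only isolated bytes in the 0xA0 to 0xFF range is probably in a single-byte ISO-8859 encoding, but the bytes alone cannot tell ISO-8859-1 from ISO-8859-2; the locale's language is the deciding hint.

```c
#include <stddef.h>

/* Hypothetical helper (not a glibc function): return the offset of the
   first byte that does not start a well-formed UTF-8 sequence in
   buf[0..len), or -1 if the whole buffer is valid UTF-8. Overlong
   forms, surrogates and values above U+10FFFF are rejected. */
long first_invalid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t need = 0;                    /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (b < 0x80) {                     /* ASCII, one byte */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;       /* reject overlong forms */
            if (b == 0xED) hi = 0x9F;       /* reject surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;       /* reject overlong forms */
            if (b == 0xF4) hi = 0x8F;       /* stay at or below U+10FFFF */
        } else {
            return (long)i;                 /* 0x80..0xC1, 0xF5..0xFF */
        }

        if (i + need >= len)
            return (long)i;                 /* sequence cut off at the end */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return (long)i;
        for (size_t k = 2; k <= need; k++)
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
                return (long)i;

        i += need + 1;
    }
    return -1;
}
```

For example, the letter Č is the two bytes C4 8C in UTF-8 but the single byte C8 in ISO-8859-2, so the buffer {'x', 0xC8, 'y'} fails at offset 1 while {'x', 0xC4, 0x8C, 'y'} validates.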

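The isspace() trap deserves a note of its own. Its argument must be representable as an unsigned char or be EOF, so passing a plain char holding a byte such as 0xC8 is undefined on platforms where char is signed, and in a locale other than "C" the set of bytes classified as space may differ. A parser that should only ever treat POSIX portable white space as a separator can test for the six characters isspace() accepts in the C locale directly. Both helper names below are hypothetical; this is a sketch, not how any particular scanf() is implemented.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true only for the six characters that isspace()
   accepts in the POSIX ("C") locale. Bytes outside the portable
   character set (for example 0xC8 from an ISO-8859-2 file) are never
   treated as white space, whatever the current locale is. */
bool is_portable_space(int c)
{
    switch (c) {
    case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
        return true;
    default:
        return false;
    }
}

/* Hypothetical wrapper: when calling the <ctype.h> function itself on a
   byte stored in a plain char, convert through unsigned char first;
   a negative argument other than EOF is undefined behavior. */
bool ctype_space(char ch)
{
    return isspace((unsigned char)ch) != 0;
}
```

The first helper is locale-independent by construction; the second still follows the current locale, which is the right choice only when locale-dependent behavior is actually wanted.
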
* Re: locale encodings
@ 2013-11-26 17:05 Marko Myllynen
  0 siblings, 0 replies; 16+ messages in thread
From: Marko Myllynen @ 2013-11-26 17:05 UTC
  To: 'Troy Korjuslommi'; +Cc: libc-locales

Hi Troy,

> I ran some tests on the fi_FI locale for glibc-2.18 and it seems to
> contain out of date information in regards to collation. The correct
> collation order/data are specified in Finnish standard SFS-EN 13710
> published in 2011 (Finnish standard based on EN 13710 ~aka ISO/IEC
> 14651) and CLDR, and implemented in ICU.

Yes, as Keld mentioned, we're aware of this. Once EN 13710 is
implemented, it should be easy to implement SFS-EN 13710 on top of it.
We're tracking EN 13710 support in
https://sourceware.org/bugzilla/show_bug.cgi?id=16052.
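
Why the collation data matters can be illustrated with a toy two-level comparator in the spirit of ISO 14651: base letters are compared first, and case is used only as a tie-breaker. The weights below are invented for illustration and encode just one well-known Finnish rule, that å, ä and ö sort after z; they are not the EN 13710 or SFS-EN 13710 tables, and `toy_collate` is a hypothetical name. Strings are passed as arrays of Unicode code points to avoid UTF-8 decoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy primary weight, for illustration only: a..z in either case get
   1..26 and, as in the Finnish alphabet, å, ä, ö follow z in that
   order. Every other code point gets 0, i.e. sorts before all letters
   (a simplification; real tables treat many characters as ignorable). */
static int primary_weight(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z') return (int)(cp - 'a') + 1;
    if (cp >= 'A' && cp <= 'Z') return (int)(cp - 'A') + 1;
    switch (cp) {
    case 0xE5: case 0xC5: return 27;   /* å Å */
    case 0xE4: case 0xC4: return 28;   /* ä Ä */
    case 0xF6: case 0xD6: return 29;   /* ö Ö */
    default:              return 0;
    }
}

/* Case weight: lowercase (0) before uppercase (1), an arbitrary
   choice for this sketch. */
static int case_weight(uint32_t cp)
{
    return (cp >= 'A' && cp <= 'Z') || cp == 0xC5 || cp == 0xC4 || cp == 0xD6;
}

/* Compare primary weights over the whole string first; a string that is
   a primary-level prefix of the other sorts first. Only if the primary
   level is fully equal is case consulted. Returns <0, 0 or >0. */
int toy_collate(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;

    for (size_t i = 0; i < n; i++) {
        int d = primary_weight(a[i]) - primary_weight(b[i]);
        if (d != 0) return d;
    }
    if (na != nb) return na < nb ? -1 : 1;
    for (size_t i = 0; i < n; i++) {
        int d = case_weight(a[i]) - case_weight(b[i]);
        if (d != 0) return d;
    }
    return 0;
}
```

With these weights "åke" sorts before "äiti", whereas plain code point order puts ä (U+00E4) before å (U+00E5); likewise "apple" sorts before "Zeta" even though 'Z' (0x5A) is below 'a' (0x61) in ASCII.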

> The charset is wrong. It is listed as iso-8859-1 for fi_FI and
> iso-8859-15 for fi_FI@euro. The correct charset for Finnish is UTF-8.
> Only UTF-8 includes all the characters included in the current
> standards.

Good point. ISO-8859-1 has certainly been incorrect since the introduction
of the Euro sign, which it cannot encode; I'll send a patch to fix this shortly.
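
The charset point can be checked mechanically. ISO-8859-1 maps each byte straight to U+0000..U+00FF, so it has no Euro sign; ISO-8859-15 replaces eight Latin-1 positions with €, Š, š, Ž, ž, Œ, œ and Ÿ (š and ž appear in some Finnish loanwords); UTF-8 can encode every Unicode scalar value. The helper names below are hypothetical; the sketch only encodes those published mappings.

```c
#include <stdbool.h>
#include <stdint.h>

/* ISO-8859-1: byte N is U+00NN, so exactly U+0000..U+00FF. */
bool in_iso8859_1(uint32_t cp)
{
    return cp <= 0xFF;
}

/* ISO-8859-15: ISO-8859-1 with eight positions replaced
   (0xA4 Euro, 0xA6/0xA8 S/s caron, 0xB4/0xB8 Z/z caron,
   0xBC/0xBD OE/oe, 0xBE Y diaeresis). */
bool in_iso8859_15(uint32_t cp)
{
    switch (cp) {
    case 0x20AC: case 0x0160: case 0x0161: case 0x017D:
    case 0x017E: case 0x0152: case 0x0153: case 0x0178:
        return true;     /* the eight added characters */
    case 0xA4: case 0xA6: case 0xA8: case 0xB4:
    case 0xB8: case 0xBC: case 0xBD: case 0xBE:
        return false;    /* the Latin-1 characters they displaced */
    default:
        return cp <= 0xFF;
    }
}

/* Write the UTF-8 encoding of cp into out (room for 4 bytes) and return
   its length, or 0 for surrogates and values above U+10FFFF. */
int utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, in_iso8859_1(0x20AC) is false, in_iso8859_15(0x20AC) is true, and utf8_encode(0x20AC, buf) writes the three bytes E2 82 AC. Any character outside both 256-entry sets still needs UTF-8.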

> I've tried to push for more cooperation with CLDR in the past too, and
> here is a good case in point why it would actually be a good idea to
> keep an eye on CLDR. There is no need to automate the process
> (difficulty of which seems to be the reason for resisting CLDR), just
> get the relevant data. Running comparison tests between cldr and libc
> would also be a good idea. ICU is pretty up-to-date in terms of CLDR
> and other Unicode.org data, so that would be an easy way to implement
> the tests.

I updated fi_FI two years ago to match CLDR where applicable and to
implement some missing fields (see
https://sourceware.org/bugzilla/show_bug.cgi?id=12962). Based on that
experience, comparing glibc and CLDR data manually is quite tedious, and
even today some parts are not fully compatible with CLDR/recommendations due to
limitations in POSIX (see
https://sourceware.org/bugzilla/show_bug.cgi?id=12747). However, now
that it's been done once it should be pretty straightforward to keep the
glibc fi_FI data in sync with CLDR.
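
A comparison test of the kind Troy suggests could start very small: pair each locale keyword with the value from glibc and the value from CLDR (or ICU), and report the differences. In the sketch below, `struct field_pair` and `report_mismatches` are hypothetical names, and loading the values from the glibc locale sources or from CLDR is deliberately left out.

```c
#include <stdio.h>
#include <string.h>

/* One locale field as seen by two sources. How the values are obtained
   (parsing glibc's locale source, reading CLDR XML, or querying ICU)
   is outside this sketch. */
struct field_pair {
    const char *keyword;      /* a locale keyword, e.g. "decimal_point" */
    const char *glibc_value;
    const char *cldr_value;
};

/* Print every field whose two values differ and return the number of
   mismatches, so a test run can fail when the count is nonzero. */
size_t report_mismatches(const struct field_pair *f, size_t n, FILE *out)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(f[i].glibc_value, f[i].cldr_value) != 0) {
            fprintf(out, "%s: glibc \"%s\" vs CLDR \"%s\"\n",
                    f[i].keyword, f[i].glibc_value, f[i].cldr_value);
            bad++;
        }
    }
    return bad;
}
```

A test driver would fill the table for fi_FI and fail on a nonzero result; known, intentional differences (such as those forced by the POSIX limitations mentioned above) would likely need an allow-list.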

Thanks,

-- 
Marko Myllynen


end of thread [~2013-11-26 17:05 UTC]

Thread overview: 16+ messages
2013-11-11  1:28 locale encodings Steven Abner
2013-11-11  5:19 ` Carlos O'Donell
2013-11-11 12:58 ` Troy Korjuslommi
2013-11-12  1:23   ` Keld Simonsen
2013-11-12  5:38     ` Carlos O'Donell
2013-11-12 13:36       ` Keld Simonsen
2013-11-12 14:39         ` Carlos O'Donell
2013-11-12 16:11           ` Keld Simonsen
2013-11-12 14:52         ` Steven Abner
2013-11-12 16:15           ` Steven Abner
2013-11-14  7:47             ` Troy Korjuslommi
2013-11-14 11:33               ` Keld Simonsen
2013-11-14 20:47               ` Steven Abner
2013-11-14 21:17                 ` Steven Abner
2013-11-14 21:17                 ` Keld Simonsen
2013-11-26 17:05 Marko Myllynen
