From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-2795-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 28530 invoked by alias); 26 Nov 2013 17:05:05 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 28507 invoked by uid 89); 26 Nov 2013 17:05:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_50,RDNS_NONE,SPF_HELO_PASS,SPF_PASS,URIBL_BLOCKED autolearn=no version=3.3.2
X-HELO: mx1.redhat.com
Message-ID: <5294D465.2020006@redhat.com>
Date: Tue, 26 Nov 2013 17:05:00 -0000
From: Marko Myllynen <myllynen@redhat.com>
Reply-To: myllynen@redhat.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131028 Thunderbird/17.0.10
MIME-Version: 1.0
To: "'Troy Korjuslommi'" <tjk@tksoft.com>
CC: libc-locales@sourceware.org
Subject: Re: locale encodings
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-SW-Source: 2013-q4/txt/msg00150.txt.bz2

Hi Troy,

> I ran some tests on the fi_FI locale for glibc-2.18 and it seems to
> contain out of date information in regards to collation. The correct
> collation order/data are specified in Finnish standard SFS-EN 13710
> published in 2011 (Finnish standard based on EN 13710 ~aka ISO/IEC
> 14651) and CLDR, and implemented in ICU.

Yes, as Keld mentioned, we're aware of this. Once EN 13710 is
implemented, it should be easy to implement SFS-EN 13710 on top of it.
We're tracking EN 13710 support in
https://sourceware.org/bugzilla/show_bug.cgi?id=16052.
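
For anyone who wants to check what collation order an installed glibc
currently produces, here is a rough sketch in Python. It assumes the
locale is installed under the name fi_FI.UTF-8; the helper name
sort_with_locale and the sample words are made up for illustration. The
resulting order depends on the glibc version in use.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words with the C library's collation rules for locale_name.

    Returns None when the locale is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares in locale order
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


if __name__ == "__main__":
    # Illustrative sample words, not taken from any standard
    sample = ["zeta", "vene", "wc", "\u00e5ker", "\u00e4iti", "\u00f6ljy", "alfa"]
    print(sort_with_locale(sample, "fi_FI.UTF-8"))
```

Running the same list through ICU would give a CLDR-based order to
compare against.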

> The charset is wrong. It is listed as iso-8859-1 for fi_FI and
> iso-8859-15 for fi_FI@euro. The correct charset for Finnish is UTF-8.
> Only UTF-8 includes all the characters included in the current
> standards.

Good point. ISO-8859-1 has certainly been incorrect since the
introduction of the Euro sign, which it cannot encode. I'll send a patch
to fix this shortly.
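
The Euro sign problem is easy to demonstrate. The following is a minimal
sketch in Python, using its built-in codecs rather than glibc's charmaps;
the helper name can_encode is made up for illustration.

```python
EURO = "\u20ac"  # U+20AC EURO SIGN


def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for charset in ("iso-8859-1", "iso-8859-15", "utf-8"):
        print(charset, can_encode(EURO, charset))
```

ISO-8859-1 fails here, while ISO-8859-15 and UTF-8 both succeed.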

> I've tried to push for more cooperation with CLDR in the past too, and
> here is a good case in point why it would actually be a good idea to
> keep an eye on CLDR. There is no need to automate the process
> (difficulty of which seems to be the reason for resisting CLDR), just
> get the relevant data. Running comparison tests between cldr and libc
> would also be a good idea. ICU is pretty up-to-date in terms of CLDR
> and other Unicode.org data, so that would be an easy way to implement
> the tests.

I updated fi_FI two years ago to match CLDR where applicable and to
implement some missing fields (see
https://sourceware.org/bugzilla/show_bug.cgi?id=12962). In my
experience from that work, comparing glibc and CLDR data manually is
quite tedious, and even today some parts are not fully compatible with
CLDR/recommendations due to limitations in POSIX (see
https://sourceware.org/bugzilla/show_bug.cgi?id=12747). However, now
that it has been done once, it should be pretty straightforward to keep
the glibc fi_FI data in sync with CLDR.
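
A field-by-field comparison could look roughly like the sketch below. It
assumes the glibc and CLDR values have already been extracted into plain
dictionaries by some other means; the function name diff_fields, the
field names, and the values are hypothetical placeholders, not real
locale data.

```python
def diff_fields(glibc_fields, cldr_fields):
    """Return {field: (glibc_value, cldr_value)} for each differing field.

    A field missing on one side is reported with None on that side.
    """
    differences = {}
    for field in sorted(set(glibc_fields) | set(cldr_fields)):
        glibc_value = glibc_fields.get(field)
        cldr_value = cldr_fields.get(field)
        if glibc_value != cldr_value:
            differences[field] = (glibc_value, cldr_value)
    return differences


if __name__ == "__main__":
    # Placeholder values for illustration only, not actual fi_FI data
    glibc_example = {"field_a": "x", "field_b": "y"}
    cldr_example = {"field_a": "x", "field_b": "z", "field_c": "w"}
    for field, (ours, theirs) in diff_fields(glibc_example, cldr_example).items():
        print(f"{field}: glibc={ours!r} cldr={theirs!r}")
```

Differences that come from POSIX limitations would still need to be
reviewed by hand.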

Thanks,

-- 
Marko Myllynen