From: Marko Myllynen
Reply-To: myllynen@redhat.com
To: Troy Korjuslommi
CC: libc-locales@sourceware.org
Date: Tue, 26 Nov 2013 17:05:00 -0000
Subject: Re: locale encodings
Message-ID: <5294D465.2020006@redhat.com>

Hi Troy,

> I ran some tests on the fi_FI locale for glibc-2.18 and it seems to
> contain out of date information in regards to collation. The correct
> collation order/data are specified in Finnish standard SFS-EN 13710
> published in 2011 (Finnish standard based on EN 13710 ~aka ISO/IEC
> 14651) and CLDR, and implemented in ICU.

Yes, as Keld mentioned, we're aware of this. Once EN 13710 is implemented, it should be easy to implement SFS-EN 13710 on top of it. We're tracking EN 13710 support in https://sourceware.org/bugzilla/show_bug.cgi?id=16052.

> The charset is wrong. It is listed as iso-8859-1 for fi_FI and
> iso-8859-15 for fi_FI@euro. The correct charset for Finnish is UTF-8.
> Only UTF-8 includes all the characters included in the current
> standards.

Good point. ISO-8859-1 has certainly been incorrect since the introduction of the Euro sign, which it cannot encode. I'll send a patch to fix this shortly.
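A minimal sketch of the charset point, assuming Python 3's built-in codecs. The sample characters are illustrative choices (not taken from SFS-EN 13710 or any other standard), and `SAMPLES`, `can_encode`, and `coverage_table` are hypothetical names used only for this example:

```python
# Sketch: check which candidate charsets for fi_FI can represent a few sample characters.
# The samples are illustrative only; they are NOT a list from any Finnish standard.
SAMPLES = {
    "finnish vowels": "\u00e4\u00f6\u00e5",  # a-umlaut, o-umlaut, a-ring
    "euro sign": "\u20ac",                   # EURO SIGN
    "eng": "\u014b",                         # LATIN SMALL LETTER ENG (used e.g. in Northern Sami)
}

# fi_FI currently uses ISO-8859-1, fi_FI@euro uses ISO-8859-15; UTF-8 is the proposed charset.
CHARSETS = ["iso-8859-1", "iso-8859-15", "utf-8"]


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def coverage_table() -> dict:
    """Map each charset to {sample name: encodable?}."""
    return {
        cs: {name: can_encode(text, cs) for name, text in SAMPLES.items()}
        for cs in CHARSETS
    }


if __name__ == "__main__":
    for cs, row in coverage_table().items():
        print(cs, row)
```

With these samples, ISO-8859-1 rejects both the Euro sign and U+014B, ISO-8859-15 rejects only U+014B, and UTF-8 encodes all of them.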
> I've tried to push for more cooperation with CLDR in the past too, and
> here is a good case in point why it would actually be a good idea to
> keep an eye on CLDR. There is no need to automate the process
> (difficulty of which seems to be the reason for resisting CLDR), just
> get the relevant data. Running comparison tests between cldr and libc
> would also be a good idea. ICU is pretty up-to-date in terms of CLDR
> and other Unicode.org data, so that would be an easy way to implement
> the tests.

I updated fi_FI two years ago to match CLDR where applicable and to implement some missing fields (see https://sourceware.org/bugzilla/show_bug.cgi?id=12962). Based on that experience, comparing glibc and CLDR data manually is quite tedious, and even today some parts are not fully compatible with CLDR and its recommendations due to limitations in POSIX (see https://sourceware.org/bugzilla/show_bug.cgi?id=12747). However, now that it's been done once, it should be fairly straightforward to keep the glibc fi_FI data in sync with CLDR.

Thanks,

-- 
Marko Myllynen