From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-2731-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 6362 invoked by alias); 14 Nov 2013 11:33:19 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 6349 invoked by uid 89); 14 Nov 2013 11:33:18 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=1.3 required=5.0 tests=AWL,BAYES_50,RDNS_NONE,URIBL_BLOCKED autolearn=no version=3.3.2
X-HELO: rap.rap.dk
Date: Thu, 14 Nov 2013 11:33:00 -0000
From: Keld Simonsen <keld@keldix.com>
To: Troy Korjuslommi <tjk@tksoft.com>
Cc: Steven Abner <pheonix@zoomtown.com>, libc-locales@sourceware.org,
	Carlos O'Donell <carlos@redhat.com>
Subject: Re: locale encodings
Message-ID: <20131114113308.GA9638@rap.rap.dk>
References: <31AACAB8-A716-47CC-B755-F33DD77BA51E@zoomtown.com>
 <1384174607.4028.8.camel@uno11.loco>
 <20131112012257.GA31828@rap.rap.dk>
 <5281BEB1.2010909@redhat.com>
 <20131112133642.GA22738@rap.rap.dk>
 <98244D14-49A6-4953-8F6B-9D393E435324@zoomtown.com>
 <EC3F7154-A278-4126-B33C-10E107B63BD9@zoomtown.com>
 <1384415405.2935.29.camel@uno11.loco>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <1384415405.2935.29.camel@uno11.loco>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SW-Source: 2013-q4/txt/msg00086.txt.bz2

I am aware of the problem, and will look into it.
It may take some time, though.
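
For anyone who wants to reproduce the kind of collation check described in the quoted message below, here is a minimal sketch in Python. It assumes the test machine has a fi_FI.UTF-8 locale installed, and it uses Python's locale.strcoll, which calls the C library's strcoll. The helper names order_mismatches and libc_comparator are made up for illustration. The sample list is only the tail of the Finnish alphabet (z, å, ä, ö), not data taken from SFS-EN 13710 or CLDR; a real test would take its expected order from one of those sources.

```python
import locale


def order_mismatches(expected, compare):
    """Return adjacent pairs (a, b) from `expected` that `compare`
    does not place in strictly ascending order."""
    return [(a, b) for a, b in zip(expected, expected[1:]) if compare(a, b) >= 0]


def libc_comparator(locale_name):
    """Switch LC_COLLATE to `locale_name` and return locale.strcoll,
    or return None if the locale is not installed on this system."""
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    return locale.strcoll


if __name__ == "__main__":
    # Illustrative sample only: the Finnish alphabet ends in ... z, å, ä, ö.
    expected = ["a", "z", "å", "ä", "ö"]
    compare = libc_comparator("fi_FI.UTF-8")  # assumed locale name
    if compare is None:
        print("fi_FI.UTF-8 is not installed; nothing compared")
    else:
        for a, b in order_mismatches(expected, compare):
            print(f"libc does not sort {a!r} before {b!r}")
```

An empty result only means the sampled pairs agree; it says nothing about the diacritic cases outside the sample.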

Best regards
keld

On Thu, Nov 14, 2013 at 09:50:05AM +0200, Troy Korjuslommi wrote:
> By the way, I ran some tests on the fi_FI locale for glibc-2.18 and it
> seems to contain out-of-date information with regard to collation. The
> correct collation order/data are specified in Finnish standard SFS-EN
> 13710 published in 2011 (Finnish standard based on EN 13710 ~aka ISO/IEC
> 14651) and CLDR, and implemented in ICU. Quick look at the fi_FI file
> tells me that at least the dates are off, which would imply the data
> being off. The collation errors seem to be diacritic related, so I would
> have to go through the actual data to determine whether the error is in
> strcoll's dealing with UTF-8 or the collation data. The collation data
> seems to be the most likely suspect. Keld, your name is listed as the
> contact, so maybe best that you check this out. In case only the
> comments are off. Also, the charset is wrong. It is listed as iso-8859-1
> for fi_FI and iso-8859-15 for fi_FI@euro. The correct charset for
> Finnish is UTF-8. Only UTF-8 includes all the characters included in the
> current standards.
> 
> Since EN 13710 specifies a European collation order, it should also be
> used in other European locales as the default sorting order.
> 
> I've tried to push for more cooperation with CLDR in the past too, and
> here is a good case in point why it would actually be a good idea to
> keep an eye on CLDR. There is no need to automate the process
> (difficulty of which seems to be the reason for resisting CLDR), just
> get the relevant data. Running comparison tests between cldr and libc
> would also be a good idea. ICU is pretty up-to-date in terms of CLDR and
> other Unicode.org data, so that would be an easy way to implement the
> tests.
> 
> Troy
> 
> 
> On Tue, 2013-11-12 at 10:37 -0500, Steven Abner wrote:
> > On 12 Nov 2013, at 9:34 AM, Steven Abner wrote:
> > 
> > > all data that is important, save one, is in POSIX's 7-bit ASCII
> > 
> >  I wish to add, the quoted strings however are UTF8 instead of the default set. Off the top of my
> > head, the JP file has quoted ("") strings for correct display of months, hours, etc. in UTF8.
> >  As far as embedded, a Japanese microwave doesn't need UTF8 for display, but the designer
> > who butchers the code for the microwave, even a Japanese one, can readily use UTF8 to set up
> > JIS0201 or even their own proprietary 128 or less byte display code, and internal communications.
> > That same designer could use UTF8, and default character information from glibc locales to
> > create an embedded version of a code set for microwaves in China.
> >   Not saying this is standard, but my point, I guess, is that the default character set for the locale could
> > or should go into the ASCII section of "LC" data. Comments in any encoding get gobbled, quoted
> > strings either in default character set or UTF8.
> >   I am no expert, just food for thought.
> > Steve
>