From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 7566 invoked by alias); 12 May 2006 10:18:02 -0000
Received: (qmail 7537 invoked by uid 48); 12 May 2006 10:17:55 -0000
Date: Fri, 12 May 2006 10:18:00 -0000
Message-ID: <20060512101755.7536.qmail@sourceware.org>
From: "pablo at mandriva dot com"
To: glibc-bugs@sources.redhat.com
In-Reply-To: <20050116082251.672.barbier@linuxfr.org>
References: <20050116082251.672.barbier@linuxfr.org>
Reply-To: sourceware-bugzilla@sourceware.org
Subject: [Bug localedata/672] Include iso14651_t1 in collation rules
X-Bugzilla-Reason: CC
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: glibc-bugs-owner@sourceware.org
X-SW-Source: 2006-05/txt/msg00079.txt.bz2

------- Additional Comments From pablo at mandriva dot com  2006-05-12 10:17 -------
Using iso14651_t1 by default, and then redefining or adding only the local
rules that are needed, is indeed much better than redefining everything in
each locale. Because the redefined part is much smaller, the important rules
are easier to understand, and errors are easier to detect and correct.
It also allows characters outside the scope of the locale to be sorted in a
predictable way, which is a very nice thing to have.

I attached an improved iso14651_t1 that adds a lot of other Latin and Cyrillic
characters that were missing, so they get sorted too. It also handles
double-accented letters (as in Vietnamese), adds the Armenian and Tifinagh
script blocks, treats t/s with cedilla and t/s with comma below as synonyms
for sorting, and makes digraphs (as opposed to ligatures) synonyms of the base
letters for sorting. It provides a much better default collating set.

Note that, with the exception of t/s with cedilla, t/s with comma below, and
the digraphs (which are Unicode compatibility characters and should never be
typed directly, by the way), I mainly added new, previously ignored characters.

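
To illustrate what "synonyms for sorting" means for t/s with cedilla and t/s
with comma below, here is a minimal Python sketch. It is not how glibc or the
attached iso14651_t1 implements it (there it is done with collation weights);
the names SYNONYMS and sort_key are hypothetical and exist only for this
example. It just folds the comma-below forms onto the cedilla forms before
comparing, so both spellings sort together.

```python
# Illustration only: glibc expresses this with weights in iso14651_t1,
# not with code. SYNONYMS and sort_key are hypothetical names.

# Fold the comma-below letters onto their cedilla counterparts so that
# both forms produce the same sort key.
SYNONYMS = str.maketrans({
    "\u0219": "\u015f",  # s with comma below -> s with cedilla
    "\u0218": "\u015e",  # S with comma below -> S with cedilla
    "\u021b": "\u0163",  # t with comma below -> t with cedilla
    "\u021a": "\u0162",  # T with comma below -> T with cedilla
})


def sort_key(word: str) -> str:
    """Return a key in which both spellings of t/s compare as equal."""
    return word.translate(SYNONYMS)


# Two words spelled with comma below, one with cedilla.
words = ["\u0219i", "\u015fu", "\u0219a"]

# Plain code point order puts the cedilla word first, whatever follows it.
plain_order = sorted(words)

# With the key, the letter after the s decides the order (a, i, u).
synonym_order = sorted(words, key=sort_key)

print(plain_order)
print(synonym_order)
```

With plain code point ordering the cedilla spelling always sorts before the
comma-below spelling; with the folded key the two are interleaved as a reader
would expect.
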
The main advantage of the modified file is proper (or at least quite
acceptable) sorting when using a generic locale (i.e. one not specific to the
language in question), in particular when sorting words from Armenian or
Vietnamese, from African or Native American languages written in Latin script,
or from languages of the former USSR written in Cyrillic script.

-- 
http://sourceware.org/bugzilla/show_bug.cgi?id=672