From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 20228 invoked by alias); 6 Oct 2005 17:45:12 -0000
Mailing-List: contact glibc-bugs-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-owner@sources.redhat.com
Received: (qmail 19917 invoked by uid 48); 6 Oct 2005 17:45:08 -0000
Date: Thu, 06 Oct 2005 17:45:00 -0000
From: "egmont at uhulinux dot hu"
To: glibc-bugs@sources.redhat.com
Message-ID: <20051006174507.1430.egmont@uhulinux.hu>
Reply-To: sourceware-bugzilla@sourceware.org
Subject: [Bug localedata/1430] New: regression: worse collation for hu_HU
X-Bugzilla-Reason: CC
X-SW-Source: 2005-10/txt/msg00009.txt.bz2
List-Id:

Please revert libc/localedata/locales/hu_HU revision 1.18, "Better collation". It is not better, it is worse.

According to the Hungarian rules, aacute, eacute, iacute, oacute and uacute must be treated the same as their unaccented counterparts, and vowels with diaeresis must be treated the same as their counterparts with double acutes. In other words:

    a = á < e = é < i = í < o = ó < ö = ő < u = ú < ü = ű

For example, the following is a correct alphabetical order:

    ablak
    állat
    apa
    áru
    az

Vowels in the same equivalence class only make a difference if they are the only letters that differ, e.g.:

    Eger
    egér
    éger
    eget
    éget

This was perfectly implemented in the previous version, and it is also described in some comment lines within this file (the comment is still there, although it no longer corresponds to what is implemented right now).

I don't know who suggested the modifications in 1.18, or why, but he was surely wrong. If needed, I can scan some pages of dictionaries or phone books and upload them to prove these sorting rules.

If someone just happens to prefer sorting this way, then he is of course absolutely free to create his own locale, or set LC_COLLATE=C or something similar, but there's hardly any place for that work in glibc.
Glibc should follow the national rules, and r1.18 was a move against them.

Ulrich,

If I recall correctly, some years ago it was you to whom I sent the hu_HU sorting rules which fixed some bugs. Then you asked me to manually sort a lot of words you had previously received from some other Hungarian guy and test whether glibc sorts them in the same order. Glibc with those Hungarian collating rules passed that test, but the new rules would obviously fail it. Do you happen to still have that file? (I don't think I have it, but I'll take a look.) I guess it would be a really wise move to put sorted files of this kind into glibc's source and add a sorting test case for them.

Ps1: a and á, as well as e and é, are different sounds, so it's often argued whether it's logical to put them in the same group; this is a tradition rather than a logical decision. On the other hand, i and í, o and ó, ö and ő, u and ú, and finally ü and ű are the same sounds, with the latter ones pronounced longer. Crosswords and similar stuff treat a and á, and e and é, differently, while the other pairs are interchangeable there. But alphabetical sorting uses different rules.

Ps2: All the words above in the examples are real Hungarian words.

-- 
           Summary: regression: worse collation for hu_HU
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: libc-locales at sources dot redhat dot com
        ReportedBy: egmont at uhulinux dot hu
                CC: glibc-bugs at sources dot redhat dot com

http://sourceware.org/bugzilla/show_bug.cgi?id=1430

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
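
The ordering described in the report, together with the kind of sorted-word test it proposes, can be illustrated with a minimal Python sketch. This is not the glibc locale definition and not how glibc implements collation. The names `PRIMARY`, `LONG_TO_BASE` and `sort_key` are hypothetical. The sketch assumes precomposed (NFC) input, ignores the Hungarian digraphs (cs, sz, and so on), and assumes that case is only a last-resort tie-breaker, none of which the report specifies.

```python
# Sketch only: a multi-level sort key approximating the rule in the report.
# Assumptions (not from the report): NFC input, digraphs ignored,
# lowercase ordered before uppercase at the last level.

# Primary alphabet: "ö" and "ü" are separate letters that follow "o" and "u".
PRIMARY = "abcdefghijklmnoöpqrstuüvwxyz"
PRIMARY_WEIGHT = {letter: index for index, letter in enumerate(PRIMARY)}

# Each accented vowel shares a primary weight with its counterpart:
# a = á, e = é, i = í, o = ó, ö = ő, u = ú, ü = ű.
LONG_TO_BASE = {"á": "a", "é": "e", "í": "i", "ó": "o",
                "ő": "ö", "ú": "u", "ű": "ü"}


def sort_key(word):
    """Return (primary, secondary, tertiary) weight lists for one word."""
    primary, secondary, tertiary = [], [], []
    for ch in word:
        lower = ch.lower()
        base = LONG_TO_BASE.get(lower, lower)
        # Characters outside the alphabet sort after all letters.
        primary.append(PRIMARY_WEIGHT.get(base, len(PRIMARY) + ord(base)))
        # The accent only matters when the primary weights are all equal.
        secondary.append(1 if lower in LONG_TO_BASE else 0)
        # Case is compared last.
        tertiary.append(1 if ch != lower else 0)
    return (primary, secondary, tertiary)


words = ["éget", "az", "egér", "áru", "Eger", "apa", "eget", "állat", "éger", "ablak"]
print(sorted(words, key=sort_key))
```

Because Python compares tuples element by element, the accent level is consulted only when two words have identical primary weights, which is the "only letters that differ" condition from the report. A regression test along the lines suggested above would keep a hand-sorted word file and check that sorting a shuffled copy reproduces it.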