* [Bug localedata/1430] New: regression: worse collation for hu_HU
@ 2005-10-06 17:45 egmont at uhulinux dot hu
  2005-10-14 20:25 ` [Bug localedata/1430] " drepper at redhat dot com
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: egmont at uhulinux dot hu @ 2005-10-06 17:45 UTC
  To: glibc-bugs


Please revert libc/localedata/locales/hu_HU revision 1.18, "Better collation".
It is not better, it is worse.

According to the Hungarian rules, á, é, í, ó and ú must be treated the same
as their unaccented counterparts; likewise, vowels with a diaeresis should be
treated the same as their counterparts with double acutes.
In other words:
a = á < e = é < i = í < o = ó < ö = ő < u = ú < ü = ű

For example, the following is a correct alphabetical order:
ablak
állat
apa
áru
az

Vowels within one equivalence class only make a difference if they are the
only letters that differ, e.g.:
Eger
egér
éger
eget
éget
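
The two rules above (equivalence classes first, accents only as a tie-breaker) can be sketched as a multi-level sort key. This is an illustrative sketch only, not the glibc locale definition: the names `ALPHABET`, `FOLD` and `hu_sort_key` are made up for the example, Hungarian digraphs (cs, sz, ...) are ignored, and ranking unaccented before accented and lowercase before uppercase on a tie are assumptions.

```python
# Hypothetical sketch, not glibc's implementation: a three-level sort key.
# Level 1 compares equivalence classes, level 2 accents, level 3 case.

# Primary order of the base letters; note that ö and ü are letters of their
# own, placed after o and u. Digraphs are deliberately left out.
ALPHABET = "abcdefghijklmnoöpqrstuüvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# Accented vowels mapped to the class they share at the primary level.
FOLD = {"á": "a", "é": "e", "í": "i", "ó": "o", "ő": "ö", "ú": "u", "ű": "ü"}


def hu_sort_key(word):
    lower = word.lower()
    # Level 1: rank of each letter's equivalence class. Characters outside
    # the table sort after all known letters (an assumption).
    primary = tuple(
        RANK.get(FOLD.get(ch, ch), len(ALPHABET) + ord(ch)) for ch in lower
    )
    # Level 2: only consulted when level 1 is identical for both words.
    secondary = tuple(int(ch in FOLD) for ch in lower)
    # Level 3: case, consulted last.
    tertiary = tuple(int(ch.isupper()) for ch in word)
    return (primary, secondary, tertiary)


print(sorted(["az", "áru", "apa", "állat", "ablak"], key=hu_sort_key))
print(sorted(["éget", "eget", "éger", "egér", "Eger"], key=hu_sort_key))
```

Under these assumptions both calls reproduce the two example orders listed above.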

This was perfectly implemented in the previous version, and it is also
described in some comment lines within this file (the comment is still there,
although it no longer corresponds to what is implemented right now).

I don't know who suggested the modifications of 1.18, or why, but he was surely
wrong. If needed, I can scan some pages of dictionaries or phone books and
upload them to prove these sorting rules.

If someone just happens to prefer sorting this way, then he is of course
absolutely free to create his own locale, or set LC_COLLATE=C or
something similar, but there's hardly any place for that work in glibc. Glibc
should follow the national rules, and r1.18 was a move against them.


Ulrich, if I recall correctly, some years ago it was you to whom I sent the
hu_HU sorting rules which fixed some bugs. You then asked me to manually
sort a long list of words you had previously received from another Hungarian
and to test whether glibc sorted them in the same order. Glibc with those
Hungarian collating rules passed that test, but the new rules would obviously
fail it. Do you happen to still have that file? (I don't think I have
it, but I'll take a look.)

I guess it would be a really wise move to put sorted files of this kind into
glibc's source and add a sorting test case for them.
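
A minimal sketch of what such a test could look like, assuming a word list that is already in the known-correct order. The helper names `first_misordered_pair` and `locale_key` are hypothetical and this is not an existing glibc test; `locale_key` also assumes that a locale named `hu_HU.UTF-8` is installed, otherwise `locale.setlocale` raises `locale.Error`.

```python
import locale


def first_misordered_pair(words, key):
    """Return the first adjacent pair that `key` puts in the opposite order
    from the reference list, or None if the whole list is consistent."""
    for a, b in zip(words, words[1:]):
        if key(a) > key(b):
            return (a, b)
    return None


def locale_key(locale_name="hu_HU.UTF-8"):
    """Switch LC_COLLATE to the given locale (assumed to be installed) and
    return a key function based on the C library's collation."""
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return locale.strxfrm


# Plain code-point comparison fails on the reference order, which is the
# kind of regression such a test is meant to catch.
reference = ["ablak", "állat", "apa", "áru", "az"]
print(first_misordered_pair(reference, key=str))
```

To check the installed locale instead, the same call would be made with `key=locale_key()`.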


Ps1: a and á, as well as e and é, are different sounds, so it's often argued
whether it's logical to put them in the same group; this is a tradition rather
than a logical decision. On the other hand, i and í, o and ó, ö and ő, u and ú,
and finally ü and ű are the same sounds, with the latter ones pronounced longer.
Crosswords and similar stuff treat a and á, and e and é, differently, while the
other pairs are interchangeable there. But alphabetical sorting uses different
rules.

Ps2: All the words above in the examples are real Hungarian words.

-- 
           Summary: regression: worse collation for hu_HU
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: libc-locales at sources dot redhat dot com
        ReportedBy: egmont at uhulinux dot hu
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=1430




Thread overview: 14+ messages
2005-10-06 17:45 [Bug localedata/1430] New: regression: worse collation for hu_HU egmont at uhulinux dot hu
2005-10-14 20:25 ` [Bug localedata/1430] " drepper at redhat dot com
2005-10-17  7:57 ` egmont at uhulinux dot hu
2005-10-17  8:01 ` jakub at redhat dot com
2005-10-18 11:06 ` egmont at uhulinux dot hu
2005-10-18 14:24 ` drepper at redhat dot com
2005-10-18 14:29 ` egmont at uhulinux dot hu
2006-04-25 18:28 ` drepper at redhat dot com
2006-04-25 18:50 ` egmont at uhulinux dot hu
2006-05-03 14:05 ` egmont at uhulinux dot hu
2006-05-03 14:07 ` egmont at uhulinux dot hu
2006-05-03 14:13 ` egmont at uhulinux dot hu
2006-11-16 11:52 ` [Bug localedata/1430] [2.4/2.5 regression] " egmont at uhulinux dot hu
2007-02-18  4:43 ` drepper at redhat dot com
