------- Additional Comments From mfabian at suse dot de 2006-05-09 16:02 -------

Comment from Markus Kuhn from the Novell Bugzilla:

Comment #4 From Markus Kuhn 2006-03-21 11:24 MST

Glibc implements a 4-pass sorting algorithm, something like the Unicode
Collation Algorithm defined at

  http://www.unicode.org/reports/tr10/

or equivalently the International Standard Ordering defined in ISO 14651.

SPACE is not ignored; it affects the sorting order, but only with lower
priority than

  - the base characters
  - accents
  - whether base characters are uppercase or lowercase

At level 4, space is treated like punctuation.

The Unicode sorting algorithm has lots of options. If you look at

  http://www.unicode.org/reports/tr10/#Variable_Weighting

you will see that variable weighting options are available for characters
such as SPACE. Perhaps the UTF-8 locales were configured to use something
equivalent to the "blanked" option, whereas what the user expects here is
the "non-ignorable" option?

It is up to the locale designer to choose these options, and I suspect the
necessary discussion on which options are best here has never taken place.

The culprit is probably in the file

  /usr/share/i18n/locales/iso14651_t1

the line

  IGNORE;IGNORE;IGNORE; # 32

which says that SPACE is sorted at level 4 only, i.e. with lowest priority.
I don't think this is a particularly good choice.

File format spec:

  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14652.pdf

People like Ulrich Drepper, Alain LaBonté, Keld J. Simonsen would know more
on the origins of this.

-- 
http://sourceware.org/bugzilla/show_bug.cgi?id=2648
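
To make the difference between the two treatments of SPACE concrete, here is a minimal Python sketch of a 4-level sort key. It is only an illustration under stated assumptions: the function name `sort_key`, the mode names, and all weights are made up for this example and are not glibc's tables or API. It handles plain ASCII letters and SPACE only, so the accent level is always empty of distinctions.

```python
def sort_key(s, space_mode="level4-only"):
    """Build a toy 4-level collation key (hypothetical weights, not glibc's).

    space_mode="level4-only":   SPACE is ignored at levels 1-3 and only
                                contributes at level 4.
    space_mode="non-ignorable": SPACE gets a primary weight below all letters.
    """
    l1, l2, l3, l4 = [], [], [], []
    for ch in s:
        if ch == " ":
            if space_mode == "non-ignorable":
                l1.append(1)   # assumed primary weight, lower than any letter
                l2.append(0)
                l3.append(0)
                l4.append(0)
            else:
                l4.append(1)   # SPACE is visible at level 4 only
            continue
        l1.append(ord(ch.lower()))           # level 1: base character
        l2.append(0)                         # level 2: accents (none here)
        l3.append(1 if ch.isupper() else 0)  # level 3: case
        l4.append(2)                         # level 4: marks a non-special char
    # Tuples compare level by level, so level 4 only breaks ties left by 1-3.
    return (l1, l2, l3, l4)


words = ["ab c", "abb", "ab a", "abd"]

level4_only = sorted(words, key=lambda w: sort_key(w, "level4-only"))
non_ignorable = sorted(words, key=lambda w: sort_key(w, "non-ignorable"))

print(level4_only)    # ['ab a', 'abb', 'ab c', 'abd']
print(non_ignorable)  # ['ab a', 'ab c', 'abb', 'abd']
```

With SPACE visible at level 4 only, the strings interleave as if the spaces were absent, which matches the behaviour reported in this bug. With a non-ignorable SPACE, all strings beginning with "ab " sort together ahead of "abb".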