public inbox for libc-locales@sourceware.org
* [Bug localedata/19922] New: [PATCH] iso14651_t1_common: Define collation for Malayalam chillu characters
@ 2016-04-08  6:58 santhosh.thottingal at gmail dot com
  2016-04-15 21:27 ` [Bug localedata/19922] " vapier at gentoo dot org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: santhosh.thottingal at gmail dot com @ 2016-04-08  6:58 UTC
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=19922

            Bug ID: 19922
           Summary: [PATCH] iso14651_t1_common: Define collation for
                    Malayalam chillu characters
           Product: glibc
           Version: 2.25
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: santhosh.thottingal at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 9164
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9164&action=edit
iso14651_t1_common: define collation for Malayalam chillu characters

The Malayalam chillu characters, which were added in Unicode 5.1, are not
considered in the collation rules for Malayalam. These 6 characters are
U+0D7A to U+0D7F.

Unicode defines them as an alternate representation of the ZWJ-based chillus
(consonant + virama + ZWJ). The ZWJ-based chillus are already covered by the
collation rules.

So U+0D7A to U+0D7F should have the same primary collation weight as the
ZWJ-based chillus. Note that ZWJ has zero collation weight (it is ignorable in
collation). So:

U+0D7A (ൺ) and U+0D23 (ണ) + U+0D4D (്) have the same primary weight and differ
at the secondary level.
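
For reference, the correspondence between the six atomic chillus and their
ZWJ-based sequences can be sketched in Python. This is only an illustration:
the table is taken from the CLDR pairs quoted in this report, and the names
CHILLU_BASE and zwj_form are made up for the example.

```python
import unicodedata

VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Atomic chillu -> base consonant, following the CLDR pairs quoted in this
# report (hypothetical helper table, not part of the patch).
CHILLU_BASE = {
    "\u0D7A": "\u0D23",  # ൺ <- ണ
    "\u0D7B": "\u0D28",  # ൻ <- ന
    "\u0D7C": "\u0D30",  # ർ <- ര
    "\u0D7D": "\u0D32",  # ൽ <- ല
    "\u0D7E": "\u0D33",  # ൾ <- ള
    "\u0D7F": "\u0D15",  # ൿ <- ക
}


def zwj_form(chillu):
    """Return the consonant + virama + ZWJ spelling of an atomic chillu."""
    return CHILLU_BASE[chillu] + VIRAMA + ZWJ


# Print each atomic chillu next to the code points of its ZWJ-based form.
for chillu in sorted(CHILLU_BASE):
    seq = " ".join(f"U+{ord(c):04X}" for c in zwj_form(chillu))
    print(f"U+{ord(chillu):04X} {unicodedata.name(chillu)} ~ {seq}")
```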

Unicode CLDR collation follows exactly the same logic. See
http://unicode.org/cldr/trac/browser/trunk/common/collation/ml.xml

 [...]
 #  Pre-5.1 Chillus secondary equal to 5.1 chillus.
 #  Chillus primary equal to their consonant_dead form.
 &ക്<<ക്\u200D<<<ൿ
 &ണ്<<ണ്\u200D<<<ൺ
 &ന്<<ന്\u200D<<<ൻ
 &ര്<<ര്\u200D<<<ർ
 &ല്<<ല്\u200D<<<ൽ
 &ള്<<ള്\u200D<<<ൾ
 [...]


The attached patch implements this.

To test, create a text file (~/sort.txt here) with the following content:
ണ്‍
ണ്
ൺ

$ LANG=ml_IN.UTF-8 sort ~/sort.txt
ണ്
ണ്‍
ൺ

The same input can be tested with
http://demo.icu-project.org/icu-bin/collation.html to verify that its output is
the same as the output above.
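
The same check can also be run from Python through the C library's collation,
as a rough sketch. It assumes the ml_IN.UTF-8 locale is installed; the helper
name sort_with_locale is made up for this example, and it returns None when
the locale is missing instead of failing.

```python
import locale


def sort_with_locale(lines, name="ml_IN.UTF-8"):
    """Sort lines by the LC_COLLATE rules of `name`; None if unavailable."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        # strxfrm() turns each string into a key that compares like strcoll().
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


# The three test lines, written with escapes so the ZWJ is visible.
samples = ["\u0D23\u0D4D\u200D", "\u0D23\u0D4D", "\u0D7A"]
result = sort_with_locale(samples)
if result is None:
    print("ml_IN.UTF-8 locale is not installed")
else:
    for s in result:
        print(" ".join(f"U+{ord(c):04X}" for c in s))
```

Printing code points rather than the glyphs makes it easy to tell the ZWJ-based
chillu and the atomic chillu apart in a terminal.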

Explanation of output:

1. ണ\u0D4D - This is ണ + ്
2. ണ\u0D4D\u200D - This is ണ + ് + ZWJ, the ZWJ-based chillu. It sorts after the
ZWJ-less dead form of ണ.
3. ൺ - This is the atomic chillu ൺ (U+0D7A), whose secondary-level collation
weight differs from that of the ZWJ-based chillu above.
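
The ordering can be mimicked with a toy multi-level sort key in Python. This is
a simplified sketch, not the glibc or ICU algorithm: it follows the CLDR rules
quoted earlier (`<<` for the ZWJ form, `<<<` for the atomic form), uses raw
code points as primary weights, and the names toy_key and ATOMIC_TO_BASE are
invented for the example.

```python
VIRAMA = "\u0D4D"
ZWJ = "\u200D"

# Atomic chillu -> base consonant (illustrative table from the CLDR excerpt).
ATOMIC_TO_BASE = {
    "\u0D7A": "\u0D23", "\u0D7B": "\u0D28", "\u0D7C": "\u0D30",
    "\u0D7D": "\u0D32", "\u0D7E": "\u0D33", "\u0D7F": "\u0D15",
}


def toy_key(s):
    """Build a (primary, secondary, tertiary) key; lower levels break ties."""
    primary, secondary, tertiary = [], [], []
    for ch in s:
        if ch in ATOMIC_TO_BASE:
            # Atomic chillu: primary equal to consonant + virama, secondary
            # equal to the ZWJ form, different only at the third level.
            primary += [ATOMIC_TO_BASE[ch], VIRAMA]
            secondary += [0, 1]
            tertiary += [0, 1]
        elif ch == ZWJ:
            # ZWJ is ignorable at the primary level; it only bumps the
            # secondary weight of the preceding element.
            if secondary:
                secondary[-1] = 1
        else:
            primary.append(ch)
            secondary.append(0)
            tertiary.append(0)
    return tuple(primary), tuple(secondary), tuple(tertiary)


lines = ["\u0D23\u0D4D\u200D", "\u0D23\u0D4D", "\u0D7A"]
ordered = sorted(lines, key=toy_key)
for s in ordered:
    print(" ".join(f"U+{ord(c):04X}" for c in s))
```

All three strings get the same primary key, so the order is decided entirely by
the lower levels, which is the behaviour the explanation above describes.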


end of thread, other threads:[~2017-07-13 15:37 UTC | newest]

Thread overview: 13+ messages
2016-04-08  6:58 [Bug localedata/19922] New: [PATCH] iso14651_t1_common: Define collation for Malayalam chillu characters santhosh.thottingal at gmail dot com
2016-04-15 21:27 ` [Bug localedata/19922] " vapier at gentoo dot org
2016-05-15 11:07 ` santhosh.thottingal at gmail dot com
2017-04-18  9:59 ` pravin.d.s at gmail dot com
2017-04-18  9:59 ` santhosh.thottingal at gmail dot com
2017-04-19 10:44 ` santhosh.thottingal at gmail dot com
2017-04-19 11:05 ` pravin.d.s at gmail dot com
2017-06-11 14:12 ` cvs-commit at gcc dot gnu.org
2017-06-11 14:19 ` zackw at panix dot com
2017-06-11 14:29 ` cvs-commit at gcc dot gnu.org
2017-06-11 14:32 ` cvs-commit at gcc dot gnu.org
2017-06-11 14:35 ` cvs-commit at gcc dot gnu.org
2017-07-13 15:37 ` cvs-commit at gcc dot gnu.org
