* [Bug localedata/10501] New: bn_IN collation does not have canonical equivalence definitions
@ 2009-08-09 5:12 santhosh dot thottingal at gmail dot com
From: santhosh dot thottingal at gmail dot com @ 2009-08-09 5:12 UTC (permalink / raw)
To: libc-locales
The bn_IN collation definitions lack canonical-equivalence entries for the
canonical decompositions of the following letters:
U+09CB BENGALI VOWEL SIGN O
U+09CC BENGALI VOWEL SIGN AU
U+09DC BENGALI LETTER RRA
U+09DD BENGALI LETTER RHA
U+09DF BENGALI LETTER YYA
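For reference, the canonical decompositions of these five characters can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the use of Python is an assumption, since the thread itself contains no code. U+09DC, U+09DD and U+09DF are composition exclusions, so NFC leaves them decomposed, while the decomposed forms of U+09CB and U+09CC recompose under NFC.

```python
import unicodedata

# The five characters listed in the report.
CHARS = ["\u09CB", "\u09CC", "\u09DC", "\u09DD", "\u09DF"]

def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for ch in CHARS:
    nfd = unicodedata.normalize("NFD", ch)  # canonical decomposition
    nfc = unicodedata.normalize("NFC", nfd)  # recomposition (skipped for exclusions)
    print(f"{unicodedata.name(ch)}: NFD = {code_points(nfd)}; "
          f"NFC(NFD) = {code_points(nfc)}")
```

Any sequence on the NFD side of this output is canonically equivalent to the precomposed character, which is what the report asks the locale to treat identically.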
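How to reproduce:
Sort the following sequences in LANG=bn_IN.UTF-8 (the Bengali strings are garbled
in this archived copy; only their structure is recoverable):
(1) a Bengali letter followed by U+09C7 U+09BE, i.e. written using the canonical
    decomposition of U+09CB BENGALI VOWEL SIGN O
(2) and (3) two other two-character Bengali sequences
The expected sorting order places (1) between the other two sequences.
But the actual result sorts (1) first, followed by the other two.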
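Until the locale defines these equivalences, one possible caller-side workaround is to normalize strings to NFC before computing a collation key. Below is a minimal Python sketch. It assumes the consonant KA (U+0995) as a stand-in for the garbled test strings, and the helper name `collation_key` is hypothetical. This is not a fix to glibc and does not reproduce the report's exact data.

```python
import locale
import unicodedata

def collation_key(s: str) -> str:
    """Caller-side workaround sketch: normalize to NFC so canonically
    equivalent spellings become identical, then apply LC_COLLATE rules."""
    return locale.strxfrm(unicodedata.normalize("NFC", s))

# Hypothetical example strings built on KA (U+0995).
decomposed = "\u0995\u09C7\u09BE"   # KA + VOWEL SIGN E + VOWEL SIGN AA
precomposed = "\u0995\u09CB"        # KA + VOWEL SIGN O

try:
    locale.setlocale(locale.LC_COLLATE, "bn_IN.UTF-8")
except locale.Error:
    pass  # locale not installed; strxfrm uses the current (e.g. C) rules

words = [decomposed, "\u0995\u09C7", "\u0995\u09CC"]
print(sorted(words, key=collation_key))
```

Because both spellings of KA + O normalize to the same string, they receive the same key regardless of which LC_COLLATE tables are active; the trade-off is that every caller must remember to normalize.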
--
Summary: bn_IN collation does not have canonical equivalence
definitions
Product: glibc
Version: 2.10
Status: NEW
Severity: normal
Priority: P2
Component: localedata
AssignedTo: libc-locales at sources dot redhat dot com
ReportedBy: santhosh dot thottingal at gmail dot com
CC: glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=10501
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: pravin dot d dot s at gmail dot com @ 2009-08-10 4:16 UTC (permalink / raw)
To: libc-locales
------- Additional Comments From pravin dot d dot s at gmail dot com 2009-08-10 04:16 -------
(In reply to comment #0)
> The bn_IN collation definitions lack canonical-equivalence entries for the
> canonical decompositions of the following letters:
> U+09CB BENGALI VOWEL SIGN O
> U+09CC BENGALI VOWEL SIGN AU
These combinations never occur in real-world typed data, so there is no need to
handle them.
Even if somebody types them by mistake, we should tell them that this is
incorrect; such sequences can also be used for spoofing (and note that these are
not normalized sequences). That is why rendering engines show a dotted circle for
these combinations. Please check Qt and ICU; there is a bug in Pango. If possible,
check Uniscribe as well.
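The "reject non-normalized input" position above could be enforced on the application side with a normalization check. A hedged sketch, assuming Python 3.8+ (for `unicodedata.is_normalized`) and an example string built on KA (U+0995); `check_input` is a hypothetical helper name.

```python
import unicodedata
from typing import Tuple

def check_input(text: str) -> Tuple[bool, str]:
    """Return (is_nfc, nfc_form): flag text that is not in NFC and
    suggest its normalized form instead of silently accepting it."""
    return (unicodedata.is_normalized("NFC", text),
            unicodedata.normalize("NFC", text))

ok, fixed = check_input("\u0995\u09C7\u09BE")  # KA + E sign + AA sign
print(ok, [f"U+{ord(c):04X}" for c in fixed])
```

Note that this check says nothing about U+09DC, U+09DD and U+09DF: because they are composition exclusions, their NFC form is the decomposed sequence.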
> U+09DC BENGALI LETTER RRA
> U+09DD BENGALI LETTER RHA
> U+09DF BENGALI LETTER YYA
This is already handled.
So, in my opinion, we should close this bug as Not a Bug.
* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: santhosh dot thottingal at gmail dot com @ 2009-08-17 11:40 UTC (permalink / raw)
To: libc-locales
------- Additional Comments From santhosh dot thottingal at gmail dot com 2009-08-17 11:40 -------
Refer to the UCA collation rules:
http://www.unicode.org/Public/UCA/latest/allkeys.txt
[...]
09CB ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
09C7 09BE ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
09CC ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
09C7 09D7 ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
[...]
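These entries map both the precomposed vowel sign and its two-code-point decomposition to the same collation element. The small sketch below parses exactly these lines and checks that. It assumes the bracketed weight format shown in the excerpt. `parse_allkeys` is a hypothetical helper; it skips `@` directive lines rather than handling them, and does not cover everything in the full file.

```python
import re

# The four entries quoted above from allkeys.txt.
ALLKEYS_EXCERPT = """\
09CB ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
09C7 09BE ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
09CC ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
09C7 09D7 ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
"""

# One collation element: '[' then '.' or '*', then dot-separated hex weights.
CE_RE = re.compile(r"\[[.*]([0-9A-F.]+)\]")

def parse_allkeys(text: str) -> dict:
    """Map each code point sequence (tuple of ints) to its list of
    collation elements (tuples of int weights)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not line or line.startswith("@"):   # skip blanks and directives
            continue
        chars, elements = line.split(";", 1)
        key = tuple(int(cp, 16) for cp in chars.split())
        table[key] = [tuple(int(w, 16) for w in m.group(1).split("."))
                      for m in CE_RE.finditer(elements)]
    return table

table = parse_allkeys(ALLKEYS_EXCERPT)
print(table[(0x09CB,)] == table[(0x09C7, 0x09BE)])
```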
This is implemented in the UCA and should be available in glibc localedata too,
i.e. collation weights of canonically equivalent sequences should be explicitly
defined in glibc, and there should be no assumption about the input to the
collation.
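To illustrate what explicitly defined weights for canonically equivalent sequences achieve, here is a toy table-driven sort key with longest-match contractions. It uses only the primary weights from the allkeys.txt excerpt. Everything else (`primary_key`, and the fallback weight for characters missing from the table) is a hypothetical simplification. In glibc localedata the equivalent would presumably be expressed through `collating-element` definitions in LC_COLLATE.

```python
# Primary weights from the allkeys.txt excerpt: the precomposed vowel
# signs and their canonical decompositions are listed explicitly.
TABLE = {
    (0x09CB,): 0x1B48,          # BENGALI VOWEL SIGN O
    (0x09C7, 0x09BE): 0x1B48,   # its canonical decomposition
    (0x09CC,): 0x1B49,          # BENGALI VOWEL SIGN AU
    (0x09C7, 0x09D7): 0x1B49,   # its canonical decomposition
}
MAX_LEN = max(len(k) for k in TABLE)

def primary_key(s: str) -> list:
    """Build a list of primary weights, preferring the longest match."""
    cps = [ord(c) for c in s]
    key, i = [], 0
    while i < len(cps):
        for n in range(min(MAX_LEN, len(cps) - i), 0, -1):
            seq = tuple(cps[i:i + n])
            if seq in TABLE:
                key.append(TABLE[seq])
                i += n
                break
        else:
            # Hypothetical fallback for characters not in this tiny table.
            key.append(0x10000 + cps[i])
            i += 1
    return key

print(primary_key("\u0995\u09C7\u09BE") == primary_key("\u0995\u09CB"))
```

Because the contraction is matched before its parts, the decomposed spelling never gets weighted as two separate vowel signs. That separate weighting is the behaviour the report describes for the current bn_IN tables.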
* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: sayamindu at gmail dot com @ 2009-08-17 12:18 UTC (permalink / raw)
To: libc-locales
------- Additional Comments From sayamindu at gmail dot com 2009-08-17 12:18 -------
(In reply to comment #2)
> Refer to the UCA collation rules:
> http://www.unicode.org/Public/UCA/latest/allkeys.txt
> [...]
> 09CB ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
> 09C7 09BE ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
> 09CC ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
> 09C7 09D7 ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
> [...]
>
> This is implemented in the UCA and should be available in glibc localedata too,
> i.e. collation weights of canonically equivalent sequences should be explicitly
> defined in glibc, and there should be no assumption about the input to the
> collation.
>
I would tend to second Santhosh here, since we do not know where the data might
be coming from (e.g. someone might take a shortcut while implementing a
legacy-encoding-to-Unicode converter).