public inbox for libc-locales@sourceware.org
* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: fweimer at redhat dot com @ 2014-07-01  9:36 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=10501

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

-- 
You are receiving this mail because:
You are the assignee for the bug.

* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: maiku.fabian at gmail dot com @ 2024-01-03 16:09 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=10501

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

--- Comment #4 from Mike FABIAN <maiku.fabian at gmail dot com> ---
glibc 2.38 seems to sort the way you want:

mfabian@hathi:/local/mfabian/src/glibc/localedata (master $%)
$ cat bn_IN.UTF-8.in
কো
কৈ
কো 
mfabian@hathi:/local/mfabian/src/glibc/localedata (master $%)
$ LC_ALL=bn_IN.UTF-8 sort < bn_IN.UTF-8.in
কৈ
কো
কো 
mfabian@hathi:/local/mfabian/src/glibc/localedata (master $%)
$ rpm -q glibc
glibc-2.38-14.fc39.x86_64
mfabian@hathi:/local/mfabian/src/glibc/localedata (master $%)
$ 
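The same check can be sketched in Python. This is only an illustration under stated assumptions: the code points of the input file are not visible in the listing above, so the sample strings below (KA followed by the precomposed U+09CB, by U+09C8, and by the decomposed pair U+09C7 U+09BE) are a guess at its content, and `sort_with_locale` is a hypothetical helper name. It returns `None` when the requested locale is not installed.

```python
import locale

def sort_with_locale(lines, name="bn_IN.UTF-8"):
    """Sort lines with the collation rules of locale `name`.

    Returns None if that locale is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares like strcoll.
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)

# Assumed sample data: precomposed O, AI, decomposed O (E + AA).
sample = ["\u0995\u09CB", "\u0995\u09C8", "\u0995\u09C7\u09BE"]
result = sort_with_locale(sample)
if result is None:
    print("bn_IN.UTF-8 is not installed here")
else:
    for line in result:
        print(line, [f"U+{ord(c):04X}" for c in line])
```

Printing the code points next to each line makes it visible which spelling ended up where, which the plain `sort` output does not show.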

The Bengali locale bn_IN just includes the standard ISO sort order:

LC_COLLATE
% Copy the template from ISO/IEC 14651
copy "iso14651_t1"

%
END LC_COLLATE

That ISO sort order should be the same as DUCET:

https://unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table

It used to be extremely out of date, but in 2018 I updated it to the 2016
version; see this commit:

    commit 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4
    Author: Mike FABIAN <mfabian@redhat.com>
    Date:   Tue Jan 30 17:59:00 2018 +0100

        Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]

        [BZ #14095] - Review / update collation data from Unicode / ISO 14651

        File downloaded from:
        http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt

        Updating this file alone is not enough, there are problems in the new
        file which need to be fixed and the collation rules for many locales
        need to be adapted. This is done by the following patches.

        This update also fixes the problem that many characters are treated as
        identical when sorting because they were not yet in the old
        iso14651_t1_common file, see:

        https://bugzilla.redhat.com/show_bug.cgi?id=1336308
        - Infinite (∞) and empty set (∅) are treated as if they were the same
          character by sort and uniq

                [BZ #14095]
                * localedata/locales/iso14651_t1_common: Update file to
                latest version from ISO (ISO14651_2016_TABLE1_en.txt).

This might have fixed the problem reported here.

* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: maiku.fabian at gmail dot com @ 2024-01-03 16:10 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=10501

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
           Assignee|libc-locales at sourceware dot org |maiku.fabian at gmail dot com
   Target Milestone|---                         |2.38
             Status|NEW                         |RESOLVED

--- Comment #5 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Closing as fixed.

* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: sayamindu at gmail dot com @ 2009-08-17 12:18 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From sayamindu at gmail dot com  2009-08-17 12:18 -------
(In reply to comment #2)
> Refer to the collation rules of UCA:
> http://www.unicode.org/Public/UCA/latest/allkeys.txt
> [...]
> 09CB  ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
> 09C7 09BE ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
> 09CC  ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
> 09C7 09D7 ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
> [...]
> 
> It is implemented in UCA and should be available in glibc localedata too.
> I.e., collation weights of canonically equivalent sequences should be
> explicitly defined in glibc, and no assumption should be made about the
> input to the collation.
> 

I would tend to second Santhosh here, since we do not know where the data might
be coming from (e.g., someone might take a shortcut while implementing a
legacy encoding -> Unicode converter).


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10501

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: santhosh dot thottingal at gmail dot com @ 2009-08-17 11:40 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From santhosh dot thottingal at gmail dot com  2009-08-17 11:40 -------
Refer to the collation rules of UCA:
http://www.unicode.org/Public/UCA/latest/allkeys.txt
[...]
09CB  ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
09C7 09BE ; [.1B48.0020.0002.09CB] # BENGALI VOWEL SIGN O
09CC  ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
09C7 09D7 ; [.1B49.0020.0002.09CC] # BENGALI VOWEL SIGN AU
[...]

It is implemented in UCA and should be available in glibc localedata too.
I.e., collation weights of canonically equivalent sequences should be
explicitly defined in glibc, and no assumption should be made about the
input to the collation.
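To illustrate what "explicitly defined" means here, the following is a toy Python sketch, not the glibc or UCA implementation. It copies the four entries from the allkeys.txt excerpt in this comment (first three weight levels only) and looks them up with longest-match-first, so the precomposed character and its canonical decomposition yield the same sort key. The function names and the fallback weights for unlisted characters are hypothetical.

```python
# Entries taken from the allkeys.txt excerpt quoted in this comment;
# only the primary, secondary and tertiary weights are kept.
WEIGHTS = {
    "\u09CB": (0x1B48, 0x0020, 0x0002),        # BENGALI VOWEL SIGN O
    "\u09C7\u09BE": (0x1B48, 0x0020, 0x0002),  # canonical decomposition of O
    "\u09CC": (0x1B49, 0x0020, 0x0002),        # BENGALI VOWEL SIGN AU
    "\u09C7\u09D7": (0x1B49, 0x0020, 0x0002),  # canonical decomposition of AU
}

def collation_elements(text):
    """Map text to collation elements, trying two-character entries first."""
    elements = []
    i = 0
    while i < len(text):
        for length in (2, 1):
            chunk = text[i:i + length]
            if chunk in WEIGHTS:
                elements.append(WEIGHTS[chunk])
                i += len(chunk)
                break
        else:
            # Hypothetical fallback for characters outside the toy table:
            # use the code point as the primary weight.
            elements.append((ord(text[i]), 0x0020, 0x0002))
            i += 1
    return elements

def sort_key(text):
    """Build a three-level key: all primaries, then secondaries, then tertiaries."""
    elements = collation_elements(text)
    key = []
    for level in range(3):
        key.extend(element[level] for element in elements)
        key.append(0)  # level separator
    return tuple(key)

precomposed = "\u0995\u09CB"       # KA + VOWEL SIGN O
decomposed = "\u0995\u09C7\u09BE"  # KA + VOWEL SIGN E + VOWEL SIGN AA
print(sort_key(precomposed) == sort_key(decomposed))
```

Because both spellings are listed in the table, the comparison does not depend on the input having been normalized first, which is the point being argued in this comment.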


* [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions
From: pravin dot d dot s at gmail dot com @ 2009-08-10  4:16 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From pravin dot d dot s at gmail dot com  2009-08-10 04:16 -------
(In reply to comment #0)
> The bn_IN collation definitions do not have canonical equivalence definitions
> for the canonical decomposition of the following letters:
> U+09CB BENGALI VOWEL SIGN O
> U+09CC BENGALI VOWEL SIGN AU

These combinations never occur in real-world typed data, so there is no need
to handle them.

Even if somebody types them by mistake, we are supposed to tell them that this
is incorrect and can enable spoofing (note that these are not normalized
sequences). That is why rendering engines show a dotted circle for these
combinations. Please check Qt and ICU; there is a bug in Pango. If possible,
check Uniscribe as well.
 
> U+09DC BENGALI LETTER RRA
> U+09DD BENGALI LETTER RHA
> U+09DF BENGALI LETTER YYA

This is already handled.

So IMO we should close this bug as Not a Bug.
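For reference, the canonical decompositions of the five letters named in the report can be inspected with Python's standard `unicodedata` module. This is a small sketch and its results depend on the Unicode data bundled with the interpreter; `describe` is a hypothetical helper name.

```python
import unicodedata

# The five characters listed in comment #0.
CHARS = ["\u09CB", "\u09CC", "\u09DC", "\u09DD", "\u09DF"]

def describe(ch):
    """Return (NFD form of ch, whether NFC turns that form back into ch)."""
    nfd = unicodedata.normalize("NFD", ch)
    recomposes = unicodedata.normalize("NFC", nfd) == ch
    return nfd, recomposes

for ch in CHARS:
    nfd, recomposes = describe(ch)
    code_points = " ".join(f"{ord(c):04X}" for c in nfd)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"NFD = {code_points}, NFC recomposes: {recomposes}")
```

The two vowel signs recompose under NFC, so the two-character sequences are the NFD spelling rather than the NFC one. RRA, RHA and YYA are composition exclusions, so for them the decomposed spelling is the normalized form in both NFC and NFD.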

Thread overview: 6+ messages
     [not found] <bug-10501-716@http.sourceware.org/bugzilla/>
2014-07-01  9:36 ` [Bug localedata/10501] bn_IN collation does not have canonical equivalence definitions fweimer at redhat dot com
2024-01-03 16:09 ` maiku.fabian at gmail dot com
2024-01-03 16:10 ` maiku.fabian at gmail dot com
2009-08-09  5:12 [Bug localedata/10501] New: " santhosh dot thottingal at gmail dot com
2009-08-10  4:16 ` [Bug localedata/10501] " pravin dot d dot s at gmail dot com
2009-08-17 11:40 ` santhosh dot thottingal at gmail dot com
2009-08-17 12:18 ` sayamindu at gmail dot com
