public inbox for libc-locales@sourceware.org
* [Bug localedata/18943] New: Collation of NFD strings
@ 2015-09-09 21:19 egmont at gmail dot com
  2015-09-10  5:51 ` Keld Simonsen
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: egmont at gmail dot com @ 2015-09-09 21:19 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18943

            Bug ID: 18943
           Summary: Collation of NFD strings
           Product: glibc
           Version: 2.22
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: egmont at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Forking off from bug 18927 comment 8 & 11:

Collate definitions currently assume the input to be in NFC. If the available
UTF-8 unit tests are converted to NFD (the localedata/*.in files which have
UTF-8 in Makefile's test-input) then they fail.

It would be nice to automatically make normalization the lowest priority factor
when deciding on collation, so that different normalizations of the same word
are as close to each other as possible. That is, to implement it once (e.g. in
iso14651_common) without having to modify individual locale definitions.
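
A minimal sketch of the idea, in Python rather than in the locale sources:
compare the canonically decomposed forms first and use the raw code points only
as a final tie-break. The name nfd_last_key is made up for illustration, and
plain code point order stands in for a real locale's collation weights.

```python
import unicodedata

def nfd_last_key(s):
    """Sort key: canonical (NFD) form first, raw code points as tie-break.

    Hypothetical helper; plain code point order stands in for the
    locale's real collation weights.
    """
    return (unicodedata.normalize("NFD", s), s)

nfc = "caf\u00e9"    # precomposed: U+00E9
nfd = "cafe\u0301"   # decomposed: 'e' followed by U+0301
words = [nfc, "cafz", nfd, "cafe"]

# Naive code point order separates the two spellings of the same word.
naive = sorted(words)
# With normalization as the lowest-priority factor they end up adjacent.
grouped = sorted(words, key=nfd_last_key)
```

In the naive ordering "cafz" lands between the two spellings, while the keyed
ordering keeps them next to each other.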

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* Re: [Bug localedata/18943] New: Collation of NFD strings
  2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
@ 2015-09-10  5:51 ` Keld Simonsen
  2015-09-10  5:52 ` [Bug localedata/18943] " keld at keldix dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Keld Simonsen @ 2015-09-10  5:51 UTC (permalink / raw)
  To: egmont at gmail dot com; +Cc: libc-locales

On Wed, Sep 09, 2015 at 07:46:02PM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18943
> 
>             Bug ID: 18943
>            Summary: Collation of NFD strings
>            Product: glibc
>            Version: 2.22
>             Status: NEW
>           Severity: enhancement
>           Priority: P2
>          Component: localedata
>           Assignee: unassigned at sourceware dot org
>           Reporter: egmont at gmail dot com
>                 CC: libc-locales at sourceware dot org
>   Target Milestone: ---
> 
> Forking off from bug 18927 comment 8 & 11:
> 
> Collate definitions currently assume the input to be in NFC. If the available
> UTF-8 unittests are converted to NFD (the localedata/*.in files which have
> UTF-8 in Makefile's test-input) then they fail.
> 
> It would be nice to automatically make normalization the lowest priority factor
> when deciding on collation, so that different normalizations of the same word
> are as close to each other as possible. That is, to implement it once (e.g. in
> iso14651_common) without having to modify individual locale definitions.

Both NFC and NFD data should collate as expected. And you can mix them as you like;
you do not need to normalize them.

Best regards
keld


* [Bug localedata/18943] Collation of NFD strings
  2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
  2015-09-10  5:51 ` Keld Simonsen
@ 2015-09-10  5:52 ` keld at keldix dot com
  2015-09-10  6:40 ` egmont at gmail dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: keld at keldix dot com @ 2015-09-10  5:52 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18943

--- Comment #1 from keld at keldix dot com <keld at keldix dot com> ---
On Wed, Sep 09, 2015 at 07:46:02PM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18943
> 
>             Bug ID: 18943
>            Summary: Collation of NFD strings
>            Product: glibc
>            Version: 2.22
>             Status: NEW
>           Severity: enhancement
>           Priority: P2
>          Component: localedata
>           Assignee: unassigned at sourceware dot org
>           Reporter: egmont at gmail dot com
>                 CC: libc-locales at sourceware dot org
>   Target Milestone: ---
> 
> Forking off from bug 18927 comment 8 & 11:
> 
> Collate definitions currently assume the input to be in NFC. If the available
> UTF-8 unittests are converted to NFD (the localedata/*.in files which have
> UTF-8 in Makefile's test-input) then they fail.
> 
> It would be nice to automatically make normalization the lowest priority factor
> when deciding on collation, so that different normalizations of the same word
> are as close to each other as possible. That is, to implement it once (e.g. in
> iso14651_common) without having to modify individual locale definitions.

Both NFC and NFD data should collate as expected. And you can mix them as you
like; you do not need to normalize them.

Best regards
keld


* [Bug localedata/18943] Collation of NFD strings
  2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
  2015-09-10  5:51 ` Keld Simonsen
  2015-09-10  5:52 ` [Bug localedata/18943] " keld at keldix dot com
@ 2015-09-10  6:40 ` egmont at gmail dot com
  2015-09-10  6:49   ` Keld Simonsen
  2015-09-10  6:52 ` keld at keldix dot com
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: egmont at gmail dot com @ 2015-09-10  6:40 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18943

--- Comment #2 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to keld@keldix.com from comment #1)

> Both NFC and NFD data should collate as expected. And you can mix them as
> you like; you do not need to normalize them.

Not sure what you mean by "should" or "can"... whether you agree with me that
this should be the desired behavior (glad to hear it), or claim that this is
what actually happens (which is unfortunately false).

Revert a broken change pointed out in bug 18589 (to make the tests pass
deterministically in the first place). Run "make tests" -> success.

Then use "uconv -x any-nfd" to convert fr_FR.in, si_LK.in, tr_TR.in, uk_UA.in
(and perhaps hu_HU.in from bug 18934) to NFD. Re-run "make tests" -> failure.
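
For anyone without ICU's uconv at hand, the conversion step can be approximated
with Python's standard library. This is only a sketch, not the procedure used
above: to_nfd_file is a hypothetical helper, and it assumes the .in files are
UTF-8 encoded.

```python
import unicodedata
from pathlib import Path

def to_nfd_file(path):
    """Rewrite a UTF-8 text file in place in NFD; return True if it changed.

    Hypothetical helper, meant as a rough stand-in for
    `uconv -x any-nfd`. Works on bytes to leave line endings untouched.
    """
    p = Path(path)
    original = p.read_bytes().decode("utf-8")
    decomposed = unicodedata.normalize("NFD", original)
    if decomposed != original:
        p.write_bytes(decomposed.encode("utf-8"))
    return decomposed != original
```

A file that is already fully decomposed is left alone and the function returns
False, so it is safe to run more than once.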


* Re: [Bug localedata/18943] Collation of NFD strings
  2015-09-10  6:40 ` egmont at gmail dot com
@ 2015-09-10  6:49   ` Keld Simonsen
  0 siblings, 0 replies; 8+ messages in thread
From: Keld Simonsen @ 2015-09-10  6:49 UTC (permalink / raw)
  To: egmont at gmail dot com; +Cc: libc-locales

On Thu, Sep 10, 2015 at 06:38:43AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18943
> 
> --- Comment #2 from Egmont Koblinger <egmont at gmail dot com> ---
> (In reply to keld@keldix.com from comment #1)
> 
> > Both NFC and NFD data should collate as expected. And you can mix them as
> > you like; you do not need to normalize them.
> 
> Not sure what you mean by "should" or "can"... whether you agree with me that
> this should be the desired behavior (glad to hear it), or claim that this is
> what actually happens (which is unfortunately false).
> 
> Revert a broken change pointed out in bug 18589 (to make the tests pass
> deterministically in the first place). Run "make tests" -> success.
> 
> Then use "uconv -x any-nfd" to convert fr_FR.in, si_LK.in, tr_TR.in, uk_UA.in
> (and perhaps hu_HU.in from bug 18934) to NFD. Re-run "make tests" -> failure.

What I mean is that the ISO 14651 tables are designed to have this feature; I
specified it. As I understand it, this is not currently the case for the i18n
locale in glibc. Whether that is because of old data or an insufficient
implementation of the standard, I don't know.
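
One way to check whether a given libc and locale actually behave this way is to
compare every pair of test words in both normalization forms and see whether
the ordering agrees. A sketch, assuming Python's locale.strcoll as the default
comparator (it follows whatever LC_COLLATE is currently set);
order_disagreements and sign are names invented here.

```python
import locale
import unicodedata
from itertools import combinations

def sign(n):
    """Reduce a comparator result to -1, 0 or 1."""
    return (n > 0) - (n < 0)

def order_disagreements(words, compare=locale.strcoll):
    """Return word pairs whose relative order differs between NFC and NFD.

    Hypothetical helper: `compare` is any strcoll-style function that
    returns a negative, zero or positive number.
    """
    bad = []
    for a, b in combinations(words, 2):
        as_nfc = compare(unicodedata.normalize("NFC", a),
                         unicodedata.normalize("NFC", b))
        as_nfd = compare(unicodedata.normalize("NFD", a),
                         unicodedata.normalize("NFD", b))
        if sign(as_nfc) != sign(as_nfd):
            bad.append((a, b))
    return bad
```

Calling locale.setlocale(locale.LC_COLLATE, "") beforehand selects the
environment's locale; an empty result then means NFC and NFD input ordered
identically for that word list.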

best regards
keld


* [Bug localedata/18943] Collation of NFD strings
  2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
                   ` (2 preceding siblings ...)
  2015-09-10  6:40 ` egmont at gmail dot com
@ 2015-09-10  6:52 ` keld at keldix dot com
  2015-09-10  7:25 ` egmont at gmail dot com
  2017-10-21  8:12 ` maiku.fabian at gmail dot com
  5 siblings, 0 replies; 8+ messages in thread
From: keld at keldix dot com @ 2015-09-10  6:52 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18943

--- Comment #3 from keld at keldix dot com <keld at keldix dot com> ---
On Thu, Sep 10, 2015 at 06:38:43AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18943
> 
> --- Comment #2 from Egmont Koblinger <egmont at gmail dot com> ---
> (In reply to keld@keldix.com from comment #1)
> 
> > Both NFC and NFD data should collate as expected. And you can mix them as
> > you like; you do not need to normalize them.
> 
> Not sure what you mean by "should" or "can"... whether you agree with me that
> this should be the desired behavior (glad to hear it), or claim that this is
> what actually happens (which is unfortunately false).
> 
> Revert a broken change pointed out in bug 18589 (to make the tests pass
> deterministically in the first place). Run "make tests" -> success.
> 
> Then use "uconv -x any-nfd" to convert fr_FR.in, si_LK.in, tr_TR.in, uk_UA.in
> (and perhaps hu_HU.in from bug 18934) to NFD. Re-run "make tests" -> failure.

What I mean is that the ISO 14651 tables are designed to have this feature; I
specified it. As I understand it, this is not currently the case for the i18n
locale in glibc. Whether that is because of old data or an insufficient
implementation of the standard, I don't know.

best regards
keld


* [Bug localedata/18943] Collation of NFD strings
  2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
                   ` (3 preceding siblings ...)
  2015-09-10  6:52 ` keld at keldix dot com
@ 2015-09-10  7:25 ` egmont at gmail dot com
  2017-10-21  8:12 ` maiku.fabian at gmail dot com
  5 siblings, 0 replies; 8+ messages in thread
From: egmont at gmail dot com @ 2015-09-10  7:25 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18943

--- Comment #4 from Egmont Koblinger <egmont at gmail dot com> ---
Thanks for the clarification!


* [Bug localedata/18943] Collation of NFD strings
  2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
                   ` (4 preceding siblings ...)
  2015-09-10  7:25 ` egmont at gmail dot com
@ 2017-10-21  8:12 ` maiku.fabian at gmail dot com
  5 siblings, 0 replies; 8+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-10-21  8:12 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=18943

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

