* [Bug localedata/18943] New: Collation of NFD strings
@ 2015-09-09 21:19 egmont at gmail dot com
2015-09-10 5:51 ` Keld Simonsen
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: egmont at gmail dot com @ 2015-09-09 21:19 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=18943
Bug ID: 18943
Summary: Collation of NFD strings
Product: glibc
Version: 2.22
Status: NEW
Severity: enhancement
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: egmont at gmail dot com
CC: libc-locales at sourceware dot org
Target Milestone: ---
Forking off from bug 18927 comment 8 & 11:
Collate definitions currently assume the input to be in NFC. If the available
UTF-8 unit tests (the localedata/*.in files that have UTF-8 in the Makefile's
test-input) are converted to NFD, they fail.
It would be nice to automatically make normalization the lowest priority factor
when deciding on collation, so that different normalizations of the same word
are as close to each other as possible. That is, to implement it once (e.g. in
iso14651_common) without having to modify individual locale definitions.
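
To illustrate the idea of normalization being the lowest priority factor,
here is a minimal Python sketch. It is not how glibc or iso14651_common
works: the function name collation_key is hypothetical, and plain code point
order stands in for real locale collation weights. It only shows that
comparing on a canonical form first, and on the original spelling last,
keeps the NFC and NFD variants of a word next to each other.

```python
import unicodedata


def collation_key(s: str) -> tuple:
    """Hypothetical sort key: the normalization form is the last tie-breaker."""
    # Higher priority: the canonically decomposed form, so NFC and NFD
    # spellings of the same word compare equal at this level.
    canonical = unicodedata.normalize("NFD", s)
    # Lowest priority: the original code points, used only to break ties
    # between different normalizations of the same word.
    return (canonical, s)


words = ["zebra", "\u00e9cole", "ecole", "e\u0301cole"]
ordered = sorted(words, key=collation_key)
```

With this key, "\u00e9cole" (NFC) and "e\u0301cole" (NFD) end up adjacent in
the sorted list, whatever else is in the input.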
--
You are receiving this mail because:
You are on the CC list for the bug.
* Re: [Bug localedata/18943] New: Collation of NFD strings
2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
@ 2015-09-10 5:51 ` Keld Simonsen
2015-09-10 5:52 ` [Bug localedata/18943] " keld at keldix dot com
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Keld Simonsen @ 2015-09-10 5:51 UTC (permalink / raw)
To: egmont at gmail dot com; +Cc: libc-locales
On Wed, Sep 09, 2015 at 07:46:02PM +0000, egmont at gmail dot com wrote:
> Forking off from bug 18927 comment 8 & 11:
>
> Collate definitions currently assume the input to be in NFC. If the available
> UTF-8 unittests are converted to NFD (the localedata/*.in files which have
> UTF-8 in Makefile's test-input) then they fail.
>
> It would be nice to automatically make normalization the lowest priority factor
> when deciding on collation, so that different normalizations of the same word
> are as close to each other as possible. That is, to implement it once (e.g. in
> iso14651_common) without having to modify individual locale definitions.
Both NFC and NFD data should collate as expected. And you can mix them as you like;
you do not need to normalize them.
Best regards
keld
* [Bug localedata/18943] Collation of NFD strings
2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
2015-09-10 5:51 ` Keld Simonsen
@ 2015-09-10 5:52 ` keld at keldix dot com
2015-09-10 6:40 ` egmont at gmail dot com
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: keld at keldix dot com @ 2015-09-10 5:52 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=18943
--- Comment #1 from keld at keldix dot com <keld at keldix dot com> ---
On Wed, Sep 09, 2015 at 07:46:02PM +0000, egmont at gmail dot com wrote:
> Forking off from bug 18927 comment 8 & 11:
>
> Collate definitions currently assume the input to be in NFC. If the available
> UTF-8 unittests are converted to NFD (the localedata/*.in files which have
> UTF-8 in Makefile's test-input) then they fail.
>
> It would be nice to automatically make normalization the lowest priority factor
> when deciding on collation, so that different normalizations of the same word
> are as close to each other as possible. That is, to implement it once (e.g. in
> iso14651_common) without having to modify individual locale definitions.
Both NFC and NFD data should collate as expected. And you can mix them as you
like; you do not need to normalize them.
Best regards
keld
* [Bug localedata/18943] Collation of NFD strings
2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
2015-09-10 5:51 ` Keld Simonsen
2015-09-10 5:52 ` [Bug localedata/18943] " keld at keldix dot com
@ 2015-09-10 6:40 ` egmont at gmail dot com
2015-09-10 6:49 ` Keld Simonsen
2015-09-10 6:52 ` keld at keldix dot com
` (2 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: egmont at gmail dot com @ 2015-09-10 6:40 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=18943
--- Comment #2 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to keld@keldix.com from comment #1)
> Both NFC and NFD data should collate as expected. And you can mix them as
> you like; you do not need to normalize them.
Not sure what you mean by "should" or "can"... whether you agree with me that
this should be the desired behavior (glad to hear it), or claim that this is
what actually happens (which is unfortunately false).
Revert a broken change pointed out in bug 18589 (to make the tests pass
deterministically in the first place). Run "make tests" -> success.
Then use "uconv -x any-nfd" to convert fr_FR.in, si_LK.in, tr_TR.in, uk_UA.in
(and perhaps hu_HU.in from bug 18934) to NFD. Re-run "make tests" -> failure.
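
For anyone without uconv at hand, the conversion step can be approximated
with the Python standard library. This is only a sketch: the helper names
to_nfd and convert_file_to_nfd are made up for illustration, and it assumes
the .in files are valid UTF-8. It has not been checked to be byte-for-byte
identical to the "uconv -x any-nfd" output.

```python
import unicodedata
from pathlib import Path


def to_nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of the given text."""
    return unicodedata.normalize("NFD", text)


def convert_file_to_nfd(src: str, dst: str) -> None:
    """Read a UTF-8 file and write its NFD form to dst (hypothetical helper)."""
    data = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(to_nfd(data), encoding="utf-8")
```

For example, convert_file_to_nfd("fr_FR.in", "fr_FR.nfd.in") would write a
decomposed copy of the test input; the output file name is only an example.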
* Re: [Bug localedata/18943] Collation of NFD strings
2015-09-10 6:40 ` egmont at gmail dot com
@ 2015-09-10 6:49 ` Keld Simonsen
0 siblings, 0 replies; 8+ messages in thread
From: Keld Simonsen @ 2015-09-10 6:49 UTC (permalink / raw)
To: egmont at gmail dot com; +Cc: libc-locales
On Thu, Sep 10, 2015 at 06:38:43AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18943
>
> --- Comment #2 from Egmont Koblinger <egmont at gmail dot com> ---
> (In reply to keld@keldix.com from comment #1)
>
> > Both NFC and NFD data should collate as expected. And you can mix them as
> > you like; you do not need to normalize them.
>
> Not sure what you mean by "should" or "can"... whether you agree with me that
> this should be the desired behavior (glad to hear it), or claim that this is
> what actually happens (which is unfortunately false).
>
> Revert a broken change pointed out in bug 18589 (to make the tests pass
> deterministically in the first place). Run "make tests" -> success.
>
> Then use "uconv -x any-nfd" to convert fr_FR.in, si_LK.in, tr_TR.in, uk_UA.in
> (and perhaps hu_HU.in from bug 18934) to NFD. Re-run "make tests" -> failure.
What I mean is that the ISO 14651 tables are designed to have this feature; I
specified it. As for the i18n locale in glibc, I understand that this is not
the case at this point. Whether that is because of old data or an insufficient
implementation of the standard, I don't know.
best regards
keld
* [Bug localedata/18943] Collation of NFD strings
2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
` (2 preceding siblings ...)
2015-09-10 6:40 ` egmont at gmail dot com
@ 2015-09-10 6:52 ` keld at keldix dot com
2015-09-10 7:25 ` egmont at gmail dot com
2017-10-21 8:12 ` maiku.fabian at gmail dot com
5 siblings, 0 replies; 8+ messages in thread
From: keld at keldix dot com @ 2015-09-10 6:52 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=18943
--- Comment #3 from keld at keldix dot com <keld at keldix dot com> ---
On Thu, Sep 10, 2015 at 06:38:43AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18943
>
> --- Comment #2 from Egmont Koblinger <egmont at gmail dot com> ---
> (In reply to keld@keldix.com from comment #1)
>
> > Both NFC and NFD data should collate as expected. And you can mix them as
> > you like; you do not need to normalize them.
>
> Not sure what you mean by "should" or "can"... whether you agree with me that
> this should be the desired behavior (glad to hear it), or claim that this is
> what actually happens (which is unfortunately false).
>
> Revert a broken change pointed out in bug 18589 (to make the tests pass
> deterministically in the first place). Run "make tests" -> success.
>
> Then use "uconv -x any-nfd" to convert fr_FR.in, si_LK.in, tr_TR.in, uk_UA.in
> (and perhaps hu_HU.in from bug 18934) to NFD. Re-run "make tests" -> failure.
What I mean is that the ISO 14651 tables are designed to have this feature; I
specified it. As for the i18n locale in glibc, I understand that this is not
the case at this point. Whether that is because of old data or an insufficient
implementation of the standard, I don't know.
best regards
keld
* [Bug localedata/18943] Collation of NFD strings
2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
` (3 preceding siblings ...)
2015-09-10 6:52 ` keld at keldix dot com
@ 2015-09-10 7:25 ` egmont at gmail dot com
2017-10-21 8:12 ` maiku.fabian at gmail dot com
5 siblings, 0 replies; 8+ messages in thread
From: egmont at gmail dot com @ 2015-09-10 7:25 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=18943
--- Comment #4 from Egmont Koblinger <egmont at gmail dot com> ---
Thanks for the clarification!
* [Bug localedata/18943] Collation of NFD strings
2015-09-09 21:19 [Bug localedata/18943] New: Collation of NFD strings egmont at gmail dot com
` (4 preceding siblings ...)
2015-09-10 7:25 ` egmont at gmail dot com
@ 2017-10-21 8:12 ` maiku.fabian at gmail dot com
5 siblings, 0 replies; 8+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-10-21 8:12 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=18943
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |maiku.fabian at gmail dot com