public inbox for glibc-bugs@sourceware.org
* [Bug localedata/17750] New: wrong collation order of diacritics in most locales
@ 2014-12-23  4:25 aoliva at sourceware dot org
  2014-12-23  5:34 ` [Bug localedata/17750] " aoliva at sourceware dot org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: aoliva at sourceware dot org @ 2014-12-23  4:25 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

            Bug ID: 17750
           Summary: wrong collation order of diacritics in most locales
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: aoliva at sourceware dot org
                CC: libc-locales at sourceware dot org

http://www.unicode.org/reports/tr10/tr10-30.html states:

<quote>
Normally, all differences in sorting are assessed from the start to the end of
the string. If all of the base letters are the same, the first accent
difference determines the final order. In row 1 of Table 5, the first accent
difference is on the o, so that is what determines the order. In some French
dictionary ordering traditions, however, it is the last accent difference that
determines the order, as shown in row 2.
</quote>

Table 5 says:

<pre>
Normal Accent Ordering      cote < coté < côte < côté
Backward Accent Ordering     cote < côte < coté < côté
</pre>

However, glibc implements backward accent ordering for all locales except de_DE
and lb_LU.  

Unicode CLDR 26 confirms this is wrong: the only file in
http://unicode.org/cldr/trac/browser/tags/release-26/common/collation/ that has
settings backwards="on" is fr_CA.xml.
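
As a rough illustration of the two rows of Table 5, here is a minimal Python sketch. It is not the UCA and not glibc's implementation: it models only two levels (base letters, then accents), weighs accents by code point, and the helper names `split_levels` and `sort_key` are made up for this example.

```python
import unicodedata


def split_levels(word):
    """Split a word into base letters and the accents attached to each letter."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # A combining mark belongs to the base letter before it.
            accents[-1] += ch
        else:
            bases.append(ch.lower())
            accents.append("")
    return bases, accents


def sort_key(word, backward=False):
    """Base letters first; accents decide only when the base letters tie."""
    bases, accents = split_levels(word)
    # Backward ordering scans the accent level from the end of the string.
    return (bases, accents[::-1] if backward else accents)


words = ["côté", "coté", "côte", "cote"]
forward = sorted(words, key=sort_key)
backward = sorted(words, key=lambda w: sort_key(w, backward=True))
# forward  == ["cote", "coté", "côte", "côté"]   (row 1 of Table 5)
# backward == ["cote", "côte", "coté", "côté"]   (row 2 of Table 5)
```

The only difference between the two keys is the direction in which the accent level is read, which is the single setting the report is about.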

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From: aoliva at sourceware dot org @ 2014-12-23  4:30 UTC
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

Alexandre Oliva <aoliva at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org   |aoliva at sourceware dot org

--- Comment #1 from Alexandre Oliva <aoliva at sourceware dot org> ---
Mine.  I posted a patch at
https://sourceware.org/ml/libc-alpha/2014-12/msg00524.html



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
@ 2014-12-23  5:34 ` aoliva at sourceware dot org
  2014-12-23 18:12 ` keld at keldix dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: aoliva at sourceware dot org @ 2014-12-23  5:34 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

Alexandre Oliva <aoliva at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
  2014-12-23  5:34 ` [Bug localedata/17750] " aoliva at sourceware dot org
@ 2014-12-23 18:12 ` keld at keldix dot com
  2015-01-29 13:17 ` fweimer at redhat dot com
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: keld at keldix dot com @ 2014-12-23 18:12 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #2 from keld at keldix dot com <keld at keldix dot com> ---
On Tue, Dec 23, 2014 at 04:25:27AM +0000, aoliva at sourceware dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
> 
>             Bug ID: 17750
>            Summary: wrong collation order of diacritics in most locales
>            Product: glibc
>            Version: unspecified
>             Status: NEW
>           Severity: normal
>           Priority: P2
>          Component: localedata
>           Assignee: unassigned at sourceware dot org
>           Reporter: aoliva at sourceware dot org
>                 CC: libc-locales at sourceware dot org
> 
> http://www.unicode.org/reports/tr10/tr10-30.html states:
> 
> <quote>
> Normally, all differences in sorting are assessed from the start to the end of
> the string. If all of the base letters are the same, the first accent
> difference determines the final order. In row 1 of Table 5, the first accent
> difference is on the o, so that is what determines the order. In some French
> dictionary ordering traditions, however, it is the last accent difference that
> determines the order, as shown in row 2.
> </quote>
> 
> Table 5 says:
> 
> <pre>
> Normal Accent Ordering      cote < coté < côte < côté
> Backward Accent Ordering     cote < côte < coté < côté
> </pre>
> 
> However, glibc implements backward accent ordering for all locales except de_DE
> and lb_LU.  
> 
> Unicode CLDR 26 confirms this is wrong: the only file in
> http://unicode.org/cldr/trac/browser/tags/release-26/common/collation/ that has
> settings backwards="on" is fr_CA.xml.

This was probably done because if there is more than one accented letter in a
string, the word or name is probably French, and then the French rules should
be followed.
This would mean that CLDR is wrong.

Best regards
Keld

From: aoliva at sourceware dot org @ 2014-12-23 23:00 UTC
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #3 from Alexandre Oliva <aoliva at sourceware dot org> ---
Even if your assumption that more than one diacritic in a word implies the word
is French were correct, there are various other points that make your suggestion
flawed.

First of all, the forward or backward accent ordering doesn't even apply to all
French speakers.

Second, there are words with more than one diacritic in other languages.  I
happen to be a native speaker of one such language.

Third, you don't need more than one diacritic in a word to trigger the problem.
 Consider Cortes, Córtes, and Cortés; pelo, pêlo, pelô; Schlagerforderung,
Schlagerförderung, Schlägerforderung, Schlägerförderung.

Fourth, Unicode and CLDR are the result of a lot of work by a lot of people who
study lots of languages and local customs.  It would take a lot more than
groundless speculation to conclude they're wrong.  (Which is not to say they're
perfect in all regards, of course ;-)
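
To make the third point concrete, here is a small Python sketch of a pairwise comparison in which either the first or the last accent difference decides. It is a simplified model, not glibc or ICU code: accents are weighed by code point, and `decompose`, `compare_accents` and `sort_words` are hypothetical names.

```python
import unicodedata
from functools import cmp_to_key


def decompose(word):
    """Return the base letters and the accents carried by each base letter."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            accents[-1] += ch
        else:
            bases.append(ch.lower())
            accents.append("")
    return bases, accents


def compare_accents(a, b, backward=False):
    """Compare base letters first, then the first (or last) accent difference."""
    bases_a, acc_a = decompose(a)
    bases_b, acc_b = decompose(b)
    if bases_a != bases_b:
        return -1 if bases_a < bases_b else 1
    # Equal base letters imply accent lists of equal length.
    positions = range(len(acc_a))
    if backward:
        positions = reversed(positions)
    for i in positions:
        if acc_a[i] != acc_b[i]:
            return -1 if acc_a[i] < acc_b[i] else 1
    return 0


def sort_words(words, backward=False):
    return sorted(words, key=cmp_to_key(lambda a, b: compare_accents(a, b, backward)))


names = ["Cortés", "Córtes", "Cortes"]
# sort_words(names)                == ["Cortes", "Cortés", "Córtes"]
# sort_words(names, backward=True) == ["Cortes", "Córtes", "Cortés"]
```

Each accented word here carries a single diacritic, yet the two directions already disagree, because what matters is where the diacritic sits in the word.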

From: carlos at redhat dot com @ 2014-12-24 14:04 UTC
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Alexandre Oliva from comment #3)
> Even if your assumption that more than one diacritic in a word implies the
> word is French were correct, there are various other points that make your
> suggestion flawed.
> 
> First of all, the forward or backward accent ordering doesn't even apply to
> all French speakers.
> 
> Second, there are words with more than one diacritic in other languages.  I
> happen to be a native speaker of one such language.
> 
> Third, you don't need more than one diacritic in a word to trigger the
> problem.  Consider Cortes, Córtes, and Cortés; pelo, pêlo, pelô;
> Schlagerforderung, Schlagerförderung, Schlägerforderung, Schlägerförderung.
> 
> Fourth, Unicode and CLDR are the result of a lot of work by a lot of people
> who study lots of languages and local customs.  It would take a lot more
> than groundless speculation to conclude they're wrong.  (Which is not to say
> they're perfect in all regards, of course ;-)

I agree with Alex. We would need a very detailed analysis of why CLDR is wrong
to ignore their implementation and do something different.

From: keld at keldix dot com @ 2014-12-25 11:38 UTC
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #5 from keld at keldix dot com <keld at keldix dot com> ---
On Tue, Dec 23, 2014 at 11:00:50PM +0000, aoliva at sourceware dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
> 
> --- Comment #3 from Alexandre Oliva <aoliva at sourceware dot org> ---
> Even if your assumption that more than one diacritic in a word implies the word
> is French were correct, there are various other points that make your suggestion
> flawed.
> 
> First of all, the forward or backward accent ordering doesn't even apply to all
> French speakers.
> 
> Second, there are words with more than one diacritic in other languages.  I
> happen to be a native speaker of one such language.
> 
> Third, you don't need more than one diacritic in a word to trigger the problem.
>  Consider Cortes, Córtes, and Cortés; pelo, pêlo, pelô; Schlagerforderung,
> Schlagerförderung, Schlägerforderung, Schlägerförderung.
> 
> Fourth, Unicode and CLDR are the result of a lot of work by a lot of people who
> study lots of languages and local customs.  It would take a lot more than
> groundless speculation to conclude they're wrong.  (Which is not to say they're
> perfect in all regards, of course ;-)

1. Which French speakers do not use backward accent ordering?
I do have access to some of the sorting experts from the French community.

2. I see that for some languages, e.g. German, it makes sense to use forward
ordering on accents. Which languages would that apply to?

3. Yes, I see that there may be just one accent in some strings, and then the
ordering depends on the position. I was involved in the current recommendation
to use backward ordering in the default tables, and I was not the only one:
the recommendation came out of the sorting experts in ISO and, I believe,
also in CEN.

4. Well, CLDR does not have more resources than we have. And they are known
not to listen to other expertise than their own.


Best regards
Keld



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
  2014-12-23  5:34 ` [Bug localedata/17750] " aoliva at sourceware dot org
  2014-12-23 18:12 ` keld at keldix dot com
@ 2015-01-29 13:17 ` fweimer at redhat dot com
  2015-01-29 14:35 ` carlos at redhat dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: fweimer at redhat dot com @ 2015-01-29 13:17 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #6 from Florian Weimer <fweimer at redhat dot com> ---
Fixing this will change the sort order of existing data, which is quite risky. 
Is it really worth it?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
                   ` (2 preceding siblings ...)
  2015-01-29 13:17 ` fweimer at redhat dot com
@ 2015-01-29 14:35 ` carlos at redhat dot com
  2015-01-30 15:25 ` keld at keldix dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: carlos at redhat dot com @ 2015-01-29 14:35 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #7 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Florian Weimer from comment #6)
> Fixing this will change the sort order of existing data, which is quite
> risky.  Is it really worth it?

For the long-term support of locales it must change. Unless we get more
maintainers, my plan is to continue to push for matching CLDR and Unicode, and
thus exactly what libicu does, to reduce the "surprise" for developers going
from Java to C/C++ or vice versa.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
                   ` (3 preceding siblings ...)
  2015-01-29 14:35 ` carlos at redhat dot com
@ 2015-01-30 15:25 ` keld at keldix dot com
  2015-01-30 17:52 ` carlos at redhat dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: keld at keldix dot com @ 2015-01-30 15:25 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #9 from keld at keldix dot com <keld at keldix dot com> ---
On Thu, Jan 29, 2015 at 02:35:11PM +0000, carlos at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
> 
> --- Comment #7 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to Florian Weimer from comment #6)
> > Fixing this will change the sort order of existing data, which is quite
> > risky.  Is it really worth it?
> 
> For the long-term support of locales it must change. Unless we get more
> maintainers, my plan is to continue to push for matching CLDR and Unicode, and
> thus exactly what libicu does, to reduce the "surprise" for developers going
> from Java to C/C++ or vice versa.

The fix is wrong, IMHO.

Best regards
Keld



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
                   ` (4 preceding siblings ...)
  2015-01-30 15:25 ` keld at keldix dot com
@ 2015-01-30 17:52 ` carlos at redhat dot com
  2015-09-08  8:50 ` egmont at gmail dot com
  2015-09-08  8:52 ` egmont at gmail dot com
  7 siblings, 0 replies; 9+ messages in thread
From: carlos at redhat dot com @ 2015-01-30 17:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #10 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to keld@keldix.com from comment #9)
> On Thu, Jan 29, 2015 at 02:35:11PM +0000, carlos at redhat dot com wrote:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=17750
> > 
> > --- Comment #7 from Carlos O'Donell <carlos at redhat dot com> ---
> > (In reply to Florian Weimer from comment #6)
> > > Fixing this will change the sort order of existing data, which is quite
> > > risky.  Is it really worth it?
> > 
> > For the long-term support of locales it must change. Unless we get more
> > maintainers, my plan is to continue to push for matching CLDR and Unicode, and
> > thus exactly what libicu does, to reduce the "surprise" for developers going
> > from Java to C/C++ or vice versa.
> 
> The fix is wrong, IMHO.

Thanks for stating that. In this case we'll need to discuss why it's wrong and
try to come to a consensus, including talking to CLDR about it. Thus this issue
is going to be more work, but not impossible.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
                   ` (5 preceding siblings ...)
  2015-01-30 17:52 ` carlos at redhat dot com
@ 2015-09-08  8:50 ` egmont at gmail dot com
  2015-09-08  8:52 ` egmont at gmail dot com
  7 siblings, 0 replies; 9+ messages in thread
From: egmont at gmail dot com @ 2015-09-08  8:50 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com

--- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
This change broke (among others) the Hungarian locales (see 18934).

I totally agree with Alexandre's opinion (the assumptions made by the patch
being wrong on so many levels); extending with a fifth one:

Even if there are some French words present in a list, if you're using a
certain language then the alphabetical rules of that language should apply, not
the French one. This is what locale definitions are about. Define in the French
locales the way to sort words on a French UI, but please leave the other
locales alone.

I'm disappointed that such a change that was doomed to break so many locales
managed to make it into glibc. But I think that in the end it boils down to the
lack of proper unittest coverage.

In the above mentioned bug I created an extensive unittest for Hungarian, one
that points to the official rules of alphabetical sorting and takes the
examples from that (plus many more), and would have failed with this change.

I encourage maintainers of locale files to come up with similarly extensive
unittests.
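
A collation unit test of this kind can be sketched in a few lines of Python. This is only an illustration, not the test attached to bug 18934: the helper name `first_misordered_pair` is hypothetical, and the sort key is passed in so that the same check can run against any collation function.

```python
def first_misordered_pair(expected_order, key):
    """Return the first adjacent pair that `key` fails to sort as expected, or None."""
    for left, right in zip(expected_order, expected_order[1:]):
        if not key(left) < key(right):
            return (left, right)
    return None


# A list written in the order the official rules require; plain code point
# order (key=str) stands in for a real locale-aware key here.
reference = ["cote", "coté", "côte"]
problem = first_misordered_pair(reference, key=str)
# problem is None when every adjacent pair sorts as expected
```

To exercise a real locale, one could pass `locale.strxfrm` as the key after calling `locale.setlocale(locale.LC_COLLATE, ...)`, assuming the locale under test is installed on the machine running the test.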



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug localedata/17750] wrong collation order of diacritics in most locales
  2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
                   ` (6 preceding siblings ...)
  2015-09-08  8:50 ` egmont at gmail dot com
@ 2015-09-08  8:52 ` egmont at gmail dot com
  7 siblings, 0 replies; 9+ messages in thread
From: egmont at gmail dot com @ 2015-09-08  8:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #12 from Egmont Koblinger <egmont at gmail dot com> ---
Sorry, make it a link: bug 18934.



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-09-08  8:52 UTC | newest]

Thread overview: 9+ messages
2014-12-23  4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
2014-12-23  5:34 ` [Bug localedata/17750] " aoliva at sourceware dot org
2014-12-23 18:12 ` keld at keldix dot com
2015-01-29 13:17 ` fweimer at redhat dot com
2015-01-29 14:35 ` carlos at redhat dot com
2015-01-30 15:25 ` keld at keldix dot com
2015-01-30 17:52 ` carlos at redhat dot com
2015-09-08  8:50 ` egmont at gmail dot com
2015-09-08  8:52 ` egmont at gmail dot com
