* [Bug localedata/17750] wrong collation order of diacritics in most locales
2014-12-23 4:25 [Bug localedata/17750] New: wrong collation order of diacritics in most locales aoliva at sourceware dot org
2014-12-23 5:34 ` [Bug localedata/17750] " aoliva at sourceware dot org
@ 2014-12-23 18:12 ` keld at keldix dot com
2015-01-29 13:17 ` fweimer at redhat dot com
From: keld at keldix dot com @ 2014-12-23 18:12 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=17750
--- Comment #2 from keld at keldix dot com <keld at keldix dot com> ---
On Tue, Dec 23, 2014 at 04:25:27AM +0000, aoliva at sourceware dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
>
> Bug ID: 17750
> Summary: wrong collation order of diacritics in most locales
> Product: glibc
> Version: unspecified
> Status: NEW
> Severity: normal
> Priority: P2
> Component: localedata
> Assignee: unassigned at sourceware dot org
> Reporter: aoliva at sourceware dot org
> CC: libc-locales at sourceware dot org
>
> http://www.unicode.org/reports/tr10/tr10-30.html states:
>
> <quote>
> Normally, all differences in sorting are assessed from the start to the end of
> the string. If all of the base letters are the same, the first accent
> difference determines the final order. In row 1 of Table 5, the first accent
> difference is on the o, so that is what determines the order. In some French
> dictionary ordering traditions, however, it is the last accent difference that
> determines the order, as shown in row 2.
> </quote>
>
> Table 5 says:
>
>   Normal Accent Ordering:    cote < coté < côte < côté
>   Backward Accent Ordering:  cote < côte < coté < côté
>
> However, glibc implements backward accent ordering for all locales except de_DE
> and lb_LU.
>
> Unicode CLDR 26 confirms this is wrong: the only file in
> http://unicode.org/cldr/trac/browser/tags/release-26/common/collation/ that has
> settings backwards="on" is fr_CA.xml.
This was probably done because if there is more than one accented letter in a
string, the word or name is probably French, and then the French rules should
be followed.
This would mean that CLDR is wrong.
Best regards
Keld
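The two rows of Table 5 in the quoted report can be reproduced with a toy two-level comparison: base letters are compared first, and only when they are equal are the accents compared, either from the start of the string (normal) or from the end (backward, the French dictionary tradition). Below is a minimal Python sketch under that assumption; `collation_key` is a hypothetical helper, and its accent weights (NFD combining marks, with an unaccented letter sorting first) are illustrative, not glibc's or CLDR's real weights.

```python
import unicodedata


def collation_key(word, backward_accents=False):
    """Toy two-level sort key: base letters first, then accents.

    Assumption: this is not glibc's or CLDR's real weighting. Each base
    letter's accents are the combining marks that follow it after NFD
    decomposition; an unaccented letter gets (), which sorts before any
    accent.
    """
    bases = []
    accents = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # Attach the combining mark to the preceding base letter.
            accents[-1] += (ord(ch),)
        else:
            bases.append(ch.lower())
            accents.append(())
    if backward_accents:
        # Backward ("French") ordering: the last accent difference decides.
        accents.reverse()
    return (tuple(bases), tuple(accents))


words = ["côté", "coté", "côte", "cote"]
print(" < ".join(sorted(words, key=collation_key)))
# cote < coté < côte < côté
print(" < ".join(sorted(words, key=lambda w: collation_key(w, True))))
# cote < côte < coté < côté
```

Only the accent sequence is reversed for backward ordering; the base letters are still compared left to right, which is why the two orderings agree whenever the base letters differ.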
--
You are receiving this mail because:
You are on the CC list for the bug.
From: "aoliva at sourceware dot org" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales
Date: Tue, 23 Dec 2014 23:00:00 -0000
https://sourceware.org/bugzilla/show_bug.cgi?id=17750
--- Comment #3 from Alexandre Oliva <aoliva at sourceware dot org> ---
Even if your assumption that more than one diacritic in a word implies the word
is French were correct, there are various other points that make your
suggestion flawed.
First of all, the forward or backward accent ordering doesn't even apply to all
French speakers.
Second, there are words with more than one diacritic in other languages. I
happen to be a native speaker of one such language.
Third, you don't need more than one diacritic in a word to trigger the problem.
Consider Cortes, Córtes, and Cortés; pelo, pêlo, pelô; Schlagerforderung,
Schlagerförderung, Schlägerforderung, Schlägerförderung.
Fourth, Unicode and CLDR are the result of a lot of work by a lot of people who
study lots of languages and local customs. It would take a lot more than
groundless speculation to conclude they're wrong. (Which is not to say they're
perfect in all regards, of course ;-)
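The third point can be checked with a toy accent-only comparison: each of Cortes, Córtes and Cortés carries at most one accent, yet comparing accents forward and backward orders them differently. A minimal Python sketch, assuming the words share identical base letters (so only accents decide) and using illustrative weights taken from NFD combining marks rather than any real collation table; `accent_marks` and `ordered` are hypothetical helpers.

```python
import unicodedata


def accent_marks(word):
    """Toy accent-only key (hypothetical): for each base letter, the
    combining marks that follow it after NFD decomposition; () means
    unaccented and sorts first."""
    marks = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and marks:
            marks[-1] += (ord(ch),)
        else:
            marks.append(())
    return tuple(marks)


def ordered(words, backward):
    # Assumes all words share the same base letters, so only accents decide.
    def key(word):
        marks = accent_marks(word)
        return marks[::-1] if backward else marks
    return sorted(words, key=key)


for group in (["Cortés", "Córtes", "Cortes"], ["pelô", "pêlo", "pelo"]):
    print("forward: ", " < ".join(ordered(group, backward=False)))
    print("backward:", " < ".join(ordered(group, backward=True)))
# forward:  Cortes < Cortés < Córtes
# backward: Cortes < Córtes < Cortés
# forward:  pelo < pelô < pêlo
# backward: pelo < pêlo < pelô
```

With a single accent per word, the position of that accent alone decides the order, so the choice between forward and backward accent ordering is visible without any word carrying two diacritics.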
From: "carlos at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales
Date: Wed, 24 Dec 2014 14:04:00 -0000
https://sourceware.org/bugzilla/show_bug.cgi?id=17750
Carlos O'Donell <carlos at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |carlos at redhat dot com
--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Alexandre Oliva from comment #3)
I agree with Alex. We would need a very detailed analysis of why CLDR is wrong
to ignore their implementation and do something different.
From: "keld at keldix dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales
Date: Thu, 25 Dec 2014 11:38:00 -0000
https://sourceware.org/bugzilla/show_bug.cgi?id=17750
--- Comment #5 from keld at keldix dot com <keld at keldix dot com> ---
(In reply to Alexandre Oliva from comment #3, Tue, Dec 23, 2014 at 11:00:50PM +0000)
1. Which French speakers do not use the backward accent ordering?
I do have access to some of the sorting experts from the French community.
2. I see that for some languages, e.g. German, it makes sense to use forward
ordering on accents.
Which languages would that apply to?
3. Yes, I see that there may be just one accent in some strings, and then the
ordering depends on the position. I was involved in the current recommendation
to use backward ordering in the default tables, and I was not the only one;
the recommendation came from the sorting experts in ISO and, I believe, also
in CEN.
4. Well, CLDR does not have more resources than we have. And they are known
not to listen to expertise other than their own.
Best regards
Keld
From: "jwilk at jwilk dot net" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug libc/17715] Robustify TZ file parser and reduce attack surface
Date: Fri, 26 Dec 2014 15:31:00 -0000
https://sourceware.org/bugzilla/show_bug.cgi?id=17715
Jakub Wilk <jwilk at jwilk dot net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jwilk at jwilk dot net