From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 30264 invoked by alias); 23 Dec 2014 18:12:00 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 30219 invoked by uid 55); 23 Dec 2014 18:11:55 -0000
From: "keld at keldix dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales
Date: Tue, 23 Dec 2014 18:12:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: keld at keldix dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: aoliva at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-12/txt/msg00194.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #2 from keld at keldix dot com ---
On Tue, Dec 23, 2014 at 04:25:27AM +0000, aoliva at sourceware dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
>
> Bug ID: 17750
> Summary: wrong collation order of diacritics in most locales
> Product: glibc
> Version: unspecified
> Status: NEW
> Severity: normal
> Priority: P2
> Component: localedata
> Assignee: unassigned at sourceware dot org
> Reporter: aoliva at sourceware dot org
> CC: libc-locales at sourceware dot org
>
> http://www.unicode.org/reports/tr10/tr10-30.html states:
>
> > Normally, all differences in sorting are assessed from the start to the end of
> the string. If all of the base letters are the same, the first accent
> difference determines the final order. In row 1 of Table 5, the first accent
> difference is on the o, so that is what determines the order. In some French
> dictionary ordering traditions, however, it is the last accent difference that
> determines the order, as shown in row 2.
> >
> Table 5 says:
>
> Normal Accent Ordering       cote < coté < côte < côté
> Backward Accent Ordering     cote < côte < coté < côté
> 
>
> However, glibc implements backward accent ordering for all locales except de_DE
> and lb_LU.
>
> Unicode CLDR 26 confirms this is wrong: the only file in
> http://unicode.org/cldr/trac/browser/tags/release-26/common/collation/ that has
> settings backwards="on" is fr_CA.xml.

This was probably done because if there is more than one accented letter in
a string, the word or name is probably French, and then the French rules
should be followed.

This would mean that CLDR is wrong.

Best regards
Keld

-- 
You are receiving this mail because:
You are on the CC list for the bug.
>>From glibc-bugs-return-26952-listarch-glibc-bugs=sources.redhat.com@sourceware.org Tue Dec 23 23:00:58 2014
Return-Path:
Delivered-To: listarch-glibc-bugs@sources.redhat.com
Received: (qmail 24037 invoked by alias); 23 Dec 2014 23:00:57 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
Delivered-To: mailing list glibc-bugs@sourceware.org
Received: (qmail 23985 invoked by uid 48); 23 Dec 2014 23:00:53 -0000
From: "aoliva at sourceware dot org"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales
Date: Tue, 23 Dec 2014 23:00:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: aoliva at sourceware dot org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: aoliva at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-12/txt/msg00195.txt.bz2
Content-length: 1135

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #3 from Alexandre Oliva ---
Even if your assumption that more than one diacritic in a word implied the word
was in French, there are various other points that make your suggestion flawed.

First of all, the forward or backward accent ordering doesn't even apply to all
French speakers.

Second, there are words with more than one diacritic in other languages. I
happen to be a native speaker of one such language.

Third, you don't need more than one diacritic in a word to trigger the problem.
Consider Cortes, Córtes, and Cortés; pelo, pêlo, pelô; Schlagerforderung,
Schlagerförderung, Schlägerforderung, Schlägerförderung.

Fourth, Unicode and CLDR are the result of a lot of work by a lot of people who
study lots of languages and local customs. It would take a lot more than
groundless speculation to conclude they're wrong. (Which is not to say they're
perfect in all regards, of course ;-)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
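The difference between the two orderings discussed in this thread can be
illustrated with a small sketch. It is not the Unicode Collation Algorithm
and not what glibc does: it assumes a simplified two-level key (base letters
first, then accents), uses the code points of the combining marks as stand-in
accent weights, and the function name collation_key is made up for this
example.

```python
import unicodedata


def collation_key(word, backwards=False):
    """Simplified two-level sort key (hypothetical helper, not glibc's).

    Level 1: the base letters with combining marks removed.
    Level 2: the marks attached to each base letter, read left to right
             (forward ordering) or right to left (backward ordering).
    """
    base = []     # primary level: one entry per base letter
    accents = []  # secondary level: tuple of marks per base letter
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Attach the mark to the preceding base letter, if there is one.
            if accents:
                accents[-1] += (ord(ch),)
        else:
            base.append(ch.casefold())
            accents.append(())
    if backwards:
        # Backward ordering: the LAST accent difference decides.
        accents.reverse()
    return ("".join(base), tuple(accents))


words = ["côté", "côte", "coté", "cote"]

forward = sorted(words, key=collation_key)
backward = sorted(words, key=lambda w: collation_key(w, backwards=True))

print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']

# A single diacritic per word is enough to make the two orders differ.
names = ["Cortés", "Córtes", "Cortes"]
print(sorted(names, key=collation_key))
# ['Cortes', 'Cortés', 'Córtes']
print(sorted(names, key=lambda w: collation_key(w, backwards=True)))
# ['Cortes', 'Córtes', 'Cortés']
```

The first two results reproduce the rows of Table 5 quoted in the bug
report. The last two show the third point of comment #3: the orderings
already disagree when each word carries only one accent.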
>>From glibc-bugs-return-26953-listarch-glibc-bugs=sources.redhat.com@sourceware.org Wed Dec 24 14:04:43 2014
Return-Path:
Delivered-To: listarch-glibc-bugs@sources.redhat.com
Received: (qmail 28070 invoked by alias); 24 Dec 2014 14:04:43 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
Delivered-To: mailing list glibc-bugs@sourceware.org
Received: (qmail 28014 invoked by uid 48); 24 Dec 2014 14:04:38 -0000
From: "carlos at redhat dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales
Date: Wed, 24 Dec 2014 14:04:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: carlos at redhat dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: aoliva at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-12/txt/msg00196.txt.bz2
Content-length: 1622

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

Carlos O'Donell changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #4 from Carlos O'Donell ---
(In reply to Alexandre Oliva from comment #3)
> Even if your assumption that more than one diacritic in a word implied the
> word was in French, there are various other points that make your suggestion
> flawed.
>
> First of all, the forward or backward accent ordering doesn't even apply to
> all French speakers.
>
> Second, there are words with more than one diacritic in other languages. I
> happen to be a native speaker of one such language.
>
> Third, you don't need more than one diacritic in a word to trigger the
> problem. Consider Cortes, Córtes, and Cortés; pelo, pêlo, pelô;
> Schlagerforderung, Schlagerförderung, Schlägerforderung, Schlägerförderung.
>
> Fourth, Unicode and CLDR are the result of a lot of work by a lot of people
> who study lots of languages and local customs. It would take a lot more
> than groundless speculation to conclude they're wrong. (Which is not to say
> they're perfect in all regards, of course ;-)

I agree with Alex. We would need a very detailed analysis of why CLDR is wrong
to ignore their implementation and do something different.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
>>From glibc-bugs-return-26954-listarch-glibc-bugs=sources.redhat.com@sourceware.org Thu Dec 25 11:38:42 2014
Return-Path:
Delivered-To: listarch-glibc-bugs@sources.redhat.com
Received: (qmail 28798 invoked by alias); 25 Dec 2014 11:38:41 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
Delivered-To: mailing list glibc-bugs@sourceware.org
Received: (qmail 28744 invoked by uid 55); 25 Dec 2014 11:38:34 -0000
From: "keld at keldix dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/17750] wrong collation order of diacritics in most locales
Date: Thu, 25 Dec 2014 11:38:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: keld at keldix dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: aoliva at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-12/txt/msg00197.txt.bz2
Content-length: 2156

https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #5 from keld at keldix dot com ---
On Tue, Dec 23, 2014 at 11:00:50PM +0000, aoliva at sourceware dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
>
> --- Comment #3 from Alexandre Oliva ---
> Even if your assumption that more than one diacritic in a word implied the word
> was in French, there are various other points that make your suggestion flawed.
>
> First of all, the forward or backward accent ordering doesn't even apply to all
> French speakers.
>
> Second, there are words with more than one diacritic in other languages. I
> happen to be a native speaker of one such language.
>
> Third, you don't need more than one diacritic in a word to trigger the problem.
> Consider Cortes, Córtes, and Cortés; pelo, pêlo, pelô; Schlagerforderung,
> Schlagerförderung, Schlägerforderung, Schlägerförderung.
>
> Fourth, Unicode and CLDR are the result of a lot of work by a lot of people who
> study lots of languages and local customs. It would take a lot more than
> groundless speculation to conclude they're wrong. (Which is not to say they're
> perfect in all regards, of course ;-)

1. Which French speakers do not use the backward accent ordering?
I do have access to some of the sorting experts from the French community.

2. I see that for some languages, e.g. German, it makes sense to use forward
ordering on accents. Which languages would that apply to?

3. Yes, I see that there may be just one accent in some strings, and then the
ordering depends on the position. I was involved in the current recommendation
to use backward ordering in the default tables. I was not the only one: the
recommendation came out of the sorting experts in ISO and, I believe, also
in CEN.

4. Well, CLDR does not have more resources than we have. And they are known
not to listen to other expertise than their own.

Best regards
Keld

-- 
You are receiving this mail because:
You are on the CC list for the bug.
>>From glibc-bugs-return-26955-listarch-glibc-bugs=sources.redhat.com@sourceware.org Fri Dec 26 15:31:52 2014
Return-Path:
Delivered-To: listarch-glibc-bugs@sources.redhat.com
Received: (qmail 21080 invoked by alias); 26 Dec 2014 15:31:51 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
Delivered-To: mailing list glibc-bugs@sourceware.org
Received: (qmail 21019 invoked by uid 48); 26 Dec 2014 15:31:46 -0000
From: "jwilk at jwilk dot net"
To: glibc-bugs@sourceware.org
Subject: [Bug libc/17715] Robustify TZ file parser and reduce attack surface
Date: Fri, 26 Dec 2014 15:31:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: libc
X-Bugzilla-Version: 2.21
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jwilk at jwilk dot net
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: fweimer at redhat dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: security+
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-12/txt/msg00198.txt.bz2
Content-length: 384

https://sourceware.org/bugzilla/show_bug.cgi?id=17715

Jakub Wilk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jwilk at jwilk dot net

-- 
You are receiving this mail because:
You are on the CC list for the bug.