From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 9410 invoked by alias); 6 Nov 2014 11:03:14 -0000
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: glibc-bugs-owner@sourceware.org
Received: (qmail 9005 invoked by uid 48); 6 Nov 2014 11:03:05 -0000
From: "maiku.fabian at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
Date: Thu, 06 Nov 2014 11:03:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: 2.21
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: maiku.fabian at gmail dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: pravin.d.s at gmail dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: security-
X-Bugzilla-Changed-Fields:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-11/txt/msg00022.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #21 from Mike FABIAN ---
Now, when using gen-unicode-ctype.c with UnicodeData.txt-7.0.0 to generate
LC_CTYPE, the generated file is missing far fewer characters compared to the
old i18n file in glibc:

alpha: Missing 246 characters of old ctype in new ctype
blank: Missing 1 characters of old ctype in new ctype
cntrl: Missing 0 characters of old ctype in new ctype
combining: Missing 3 characters of old ctype in new ctype
combining_level3: Missing 5 characters of old ctype in new ctype
digit: Missing 0 characters of old ctype in new ctype
graph: Missing 0 characters of old ctype in new ctype
lower: Missing 20 characters of old ctype in new ctype
print: Missing 0 characters of old ctype in new ctype
punct: Missing 16 characters of old ctype in new ctype
space: Missing 1 characters of old ctype in new ctype
tolower: Missing 0 characters of old ctype in new ctype
totitle: Missing 0 characters of old ctype in new ctype
toupper: Missing 0 characters of old ctype in new ctype
upper: Missing 0 characters of old ctype in new ctype
xdigit: Missing 0 characters of old ctype in new ctype

For example, gen-unicode-ctype.c does not put U+0901 into the “alpha” class
although it should be there according to DerivedCoreProperties.txt:

error: 0x901 ँ alpha False: These have general category “Mn” (“Mc” for
U+0903), i.e. these are combining characters (both in UnicodeData.txt 5.0.0
and 7.0.0):
“0901;DEVANAGARI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;;”,
“0902;DEVANAGARI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;;”,
“0903;DEVANAGARI SIGN VISARGA;Mc;0;L;;;;;N;;;;;”.
According to DerivedCoreProperties.txt (7.0.0) these are “Alphabetic”.

Apparently this has been edited manually (correctly) in the old i18n file of
glibc. So this would be fixed in the automatic generation by using
DerivedCoreProperties.txt for “alpha”.

But some of the above seem to be errors in the old i18n file of glibc, for
example:

error: 0x1090 ႐ punct True: MYANMAR SHAN DIGIT ZERO - MYANMAR SHAN DIGIT
NINE. These are digits, but because ISO C 99 forbids putting them into
“digit”, they should go into “alpha”.

This character is in “punct” in the old i18n file, but gen-unicode-ctype.c
would put it into “alpha”, which seems better for such digits according to
the comments in gen-unicode-ctype.c.

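As an illustration of deriving “alpha” from DerivedCoreProperties.txt, here is
a minimal Python sketch. It is not the code of gen-unicode-ctype.c or of
ctype-compatibility.py; the function name code_points_with_property and the
embedded SAMPLE lines are assumptions made for this example. The sample only
mimics the file's "range ; property # comment" layout, and a real run would
read the actual Unicode 7.0.0 file instead.

```python
import re

# Illustrative lines in the DerivedCoreProperties.txt layout
# ("code point or range ; property # comment"). They are an assumption for
# this sketch, not a verbatim excerpt of the Unicode 7.0.0 file.
SAMPLE = """\
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0900..0902    ; Alphabetic # Mn   [3] DEVANAGARI SIGN INVERTED CANDRABINDU..DEVANAGARI SIGN ANUSVARA
0903          ; Alphabetic # Mc       DEVANAGARI SIGN VISARGA
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""

# One code point, or a "start..end" range, followed by "; PropertyName".
LINE_RE = re.compile(r'^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)')


def code_points_with_property(lines, wanted):
    """Collect all code points that carry the property named `wanted`."""
    result = set()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group(3) != wanted:
            continue  # comment, blank line, or a different property
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        result.update(range(start, end + 1))
    return result


alphabetic = code_points_with_property(SAMPLE.splitlines(), "Alphabetic")
for cp in (0x0901, 0x0902, 0x0903):
    print(f"U+{cp:04X} alpha: {cp in alphabetic}")
```

With a set like this, the combining marks U+0901 to U+0903 land in “alpha”
regardless of their general category in UnicodeData.txt.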
I went through all these “Missing” characters individually, looked them up
in UnicodeData.txt and DerivedCoreProperties.txt, checked how they should be
classified, and added test cases for them to the ctype-compatibility.py
script.

I’ll attach the full report after using gen-unicode-ctype.c with
UnicodeData.txt-7.0.0 to generate LC_CTYPE.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

>>From glibc-bugs-return-26531-listarch-glibc-bugs=sources.redhat.com@sourceware.org Thu Nov 06 11:06:32 2014
Delivered-To: listarch-glibc-bugs@sources.redhat.com
Received: (qmail 11037 invoked by alias); 6 Nov 2014 11:06:32 -0000
Delivered-To: mailing list glibc-bugs@sourceware.org
Received: (qmail 10964 invoked by uid 48); 6 Nov 2014 11:06:28 -0000
From: "maiku.fabian at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
Date: Thu, 06 Nov 2014 11:06:00 -0000
X-Bugzilla-Type: changed
X-Bugzilla-Changed-Fields: attachments.created
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-SW-Source: 2014-11/txt/msg00023.txt.bz2
Content-length: 505

https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #22 from Mike FABIAN ---
Created attachment 7908
  --> https://sourceware.org/bugzilla/attachment.cgi?id=7908&action=edit
unicode-7.0.0-report-full-output

Full report from ctype-compatibility.py when comparing the old i18n file in
glibc with the file generated by gen-unicode-ctype.c using UnicodeData.txt
from Unicode 7.0.0.
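
For readers who want to see how a per-class “Missing N characters” summary
like the one in this report can be computed, here is a minimal Python sketch.
It is not the actual ctype-compatibility.py; the function name missing_report
and the tiny old_ctype/new_ctype tables are hypothetical and only stand in
for the character classes parsed from the two LC_CTYPE files.

```python
def missing_report(old, new):
    """Return one summary line per class for characters in old but not in new.

    `old` and `new` map a class name (e.g. "alpha") to a set of code points.
    """
    lines = []
    for name in sorted(old):
        # Set difference: code points the old file had in this class that
        # the newly generated file no longer puts there.
        missing = old[name] - new.get(name, set())
        lines.append(
            f"{name}: Missing {len(missing)} characters of old ctype in new ctype"
        )
    return lines


# Hypothetical toy data: U+0901 was "alpha" only in the old file, and
# U+1090 moved from "punct" to "alpha" in the new file.
old_ctype = {"alpha": {0x0041, 0x0901}, "punct": {0x0021, 0x1090}}
new_ctype = {"alpha": {0x0041, 0x1090}, "punct": {0x0021}}

for line in missing_report(old_ctype, new_ctype):
    print(line)
```

A plain set difference per class is enough for the counts; deciding whether
each missing character is an error in the old file or in the generator still
requires looking it up individually, as described in comment #21.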