public inbox for glibc-bugs@sourceware.org
From: "maiku.fabian at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
Date: Thu, 06 Nov 2014 11:03:00 -0000
Message-ID: <bug-14094-131-5W0TApSwRp@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-14094-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #21 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Now, when gen-unicode-ctype.c is used with UnicodeData.txt-7.0.0
to generate LC_CTYPE, the generated file lacks far fewer of the
characters that are in the old i18n file in glibc:
alpha: Missing 246 characters of old ctype in new ctype
blank: Missing 1 characters of old ctype in new ctype
cntrl: Missing 0 characters of old ctype in new ctype
combining: Missing 3 characters of old ctype in new ctype
combining_level3: Missing 5 characters of old ctype in new ctype
digit: Missing 0 characters of old ctype in new ctype
graph: Missing 0 characters of old ctype in new ctype
lower: Missing 20 characters of old ctype in new ctype
print: Missing 0 characters of old ctype in new ctype
punct: Missing 16 characters of old ctype in new ctype
space: Missing 1 characters of old ctype in new ctype
tolower: Missing 0 characters of old ctype in new ctype
totitle: Missing 0 characters of old ctype in new ctype
toupper: Missing 0 characters of old ctype in new ctype
upper: Missing 0 characters of old ctype in new ctype
xdigit: Missing 0 characters of old ctype in new ctype
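The counts above read like per-class set differences ("in the old i18n file but not in the generated one"). A minimal sketch of that comparison, assuming each class has already been parsed into a set of code points; the function name `report_missing` and the toy data are hypothetical and are not taken from ctype-compatibility.py:

```python
# Hypothetical sketch: for each class in the old file, count the code
# points that the newly generated file no longer contains.


def report_missing(old, new):
    """old, new: dict mapping class name -> set of code points."""
    lines = []
    for name in sorted(old):
        missing = old[name] - new.get(name, set())
        lines.append(
            f"{name}: Missing {len(missing)} characters of old ctype in new ctype"
        )
    return lines


if __name__ == "__main__":
    # Toy data: U+0901 is "alpha" in the old file but not in the new one.
    old = {"alpha": {0x0041, 0x0901}, "digit": set(range(0x30, 0x3A))}
    new = {"alpha": {0x0041}, "digit": set(range(0x30, 0x3A))}
    for line in report_missing(old, new):
        print(line)
```

Only "old minus new" is reported here, which matches the direction of the lines above; characters that are new in the generated file would need the reverse difference.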
For example, gen-unicode-ctype.c does not put U+0901 into
the “alpha” class, although it should be there
according to DerivedCoreProperties.txt:
error: 0x901 ँ alpha False
These have general category “Mn” (“Mc” for U+0903), i.e. they are
combining characters (both in UnicodeData.txt 5.0.0 and 7.0.0):
“0901;DEVANAGARI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;;”,
“0902;DEVANAGARI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;;”,
“0903;DEVANAGARI SIGN VISARGA;Mc;0;L;;;;;N;;;;;”.
According to DerivedCoreProperties.txt (7.0.0), these are
“Alphabetic”.
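To show why a purely category-based rule misses these characters, here is a small sketch that parses the three UnicodeData.txt records quoted above (field 0 is the code point, field 1 the name, field 2 the general category) and applies a letters-only test. The rule `alpha_by_category` is a hypothetical simplification for illustration, not the actual logic of gen-unicode-ctype.c:

```python
# UnicodeData.txt records for U+0901..U+0903, as quoted above.
RECORDS = """\
0901;DEVANAGARI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;;
0902;DEVANAGARI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;;
0903;DEVANAGARI SIGN VISARGA;Mc;0;L;;;;;N;;;;;
"""


def parse_unicodedata(text):
    """Map code point -> (name, general category)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        table[int(fields[0], 16)] = (fields[1], fields[2])
    return table


def alpha_by_category(category):
    # Hypothetical, deliberately simplified rule: only letters (L*) are
    # alphabetic. Marks (Mn, Mc) are therefore never accepted.
    return category.startswith("L")


for cp, (name, cat) in sorted(parse_unicodedata(RECORDS).items()):
    print(f"U+{cp:04X} {name} {cat} alpha={alpha_by_category(cat)}")
```

All three records come out as `alpha=False`, which is the kind of miss that the “Alphabetic” property in DerivedCoreProperties.txt covers.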
Apparently this was edited manually (and correctly) in the old i18n file
of glibc.
So this would be fixed in the automatic generation
by using DerivedCoreProperties.txt for “alpha”.
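A minimal sketch of reading the “Alphabetic” set from DerivedCoreProperties.txt, assuming its usual line format `<code point or range> ; <property> # <comment>`. The sample text below is toy input written in that format, not a verbatim excerpt, and `parse_derived_property` is a hypothetical helper name:

```python
# Toy input in the DerivedCoreProperties.txt line format; not a verbatim excerpt.
SAMPLE = """\
# comment lines start with '#'
0041..005A    ; Alphabetic # Lu  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0901..0902    ; Alphabetic # Mn   [2] DEVANAGARI SIGN CANDRABINDU..DEVANAGARI SIGN ANUSVARA
0903          ; Alphabetic # Mc       DEVANAGARI SIGN VISARGA
0041..005A    ; Uppercase
"""


def parse_derived_property(text, prop):
    """Return the set of code points listed with property `prop`."""
    result = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if fields[1] != prop:
            continue
        if ".." in fields[0]:
            start, end = (int(x, 16) for x in fields[0].split(".."))
        else:
            start = end = int(fields[0], 16)
        result.update(range(start, end + 1))
    return result


alphabetic = parse_derived_property(SAMPLE, "Alphabetic")
print(sorted(hex(cp) for cp in alphabetic if cp >= 0x900))
```

Taking “alpha” from this set, instead of deriving it from the general category alone, would include U+0901..U+0903 without manual edits.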
But some of the above seem to be errors in the old i18n file
of glibc, for example:
error: 0x1090 ႐ punct True
MYANMAR SHAN DIGIT ZERO - MYANMAR SHAN DIGIT NINE:
these are digits, but because ISO C99 forbids putting them into
“digit”, they should go into “alpha”.
U+1090 is in “punct” in the old i18n file, but gen-unicode-ctype.c
would put it into “alpha”, which seems better for such digits
according to the comments in gen-unicode-ctype.c.
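The rule described above (only ASCII 0-9 may be in “digit”; other decimal digits go to “alpha”) can be sketched as follows. This is a hypothetical illustration using Python's `unicodedata` module, whose Unicode version depends on the Python build; it is not the code of gen-unicode-ctype.c:

```python
import unicodedata


def digit_or_alpha(ch):
    """Hypothetical sketch: only ASCII 0-9 may be in "digit";
    other decimal digits (general category Nd) are sent to "alpha"."""
    if "0" <= ch <= "9":
        return "digit"
    if unicodedata.category(ch) == "Nd":
        return "alpha"
    return None  # neither class under this simplified rule


for ch in ("7", "\u1090", "\u0966"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {digit_or_alpha(ch)}")
```

Under this rule U+1090 MYANMAR SHAN DIGIT ZERO lands in “alpha”, never in “punct”.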
I went through all these “Missing” characters individually,
looked them up in UnicodeData.txt and DerivedCoreProperties.txt,
checked how they should be classified, and added test cases
for them to the ctype-compatibility.py script.
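A sketch of what such per-character test cases could look like: a table of (code point, class, expected membership) entries, checked against the parsed classes, printing mismatches in the same shape as the “error:” lines quoted above. The table layout and the `check` function are assumptions for illustration; the real ctype-compatibility.py may be organized differently:

```python
# Hypothetical test-case table: (code point, class, expected membership).
TEST_CASES = [
    (0x0901, "alpha", True),   # DEVANAGARI SIGN CANDRABINDU, "Alphabetic"
    (0x1090, "alpha", True),   # MYANMAR SHAN DIGIT ZERO, non-ASCII digit
    (0x1090, "punct", False),  # ... and therefore not punctuation
]


def check(ctype, cases):
    """ctype: dict mapping class name -> set of code points.

    Return an "error:" line (hex code point, character, class, actual
    membership) for every case whose actual membership is unexpected.
    """
    errors = []
    for cp, cls, expected in cases:
        actual = cp in ctype.get(cls, set())
        if actual != expected:
            errors.append(f"error: 0x{cp:x} {chr(cp)} {cls} {actual}")
    return errors


if __name__ == "__main__":
    # Toy data mimicking the old i18n file: U+1090 sits in both classes.
    old_ctype = {"alpha": {0x1090}, "punct": {0x1090}}
    for line in check(old_ctype, TEST_CASES):
        print(line)
```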
I’ll attach the full report obtained by using gen-unicode-ctype.c with
UnicodeData.txt-7.0.0 to generate LC_CTYPE.
--
You are receiving this mail because:
You are on the CC list for the bug.
From: "maiku.fabian at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
Date: Thu, 06 Nov 2014 11:06:00 -0000
Message-ID: <bug-14094-131-3sr7lFrpCn@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-14094-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #22 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7908
--> https://sourceware.org/bugzilla/attachment.cgi?id=7908&action=edit
unicode-7.0.0-report-full-output
Full report from ctype-compatibility.py when comparing the old i18n
file in glibc with the file generated by gen-unicode-ctype.c using
UnicodeData.txt from Unicode 7.0.0.
Thread overview: 37+ messages
2012-05-10 20:28 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
2012-05-11 3:26 ` [Bug localedata/14094] " bugdal at aerifal dot cx
2013-11-26 17:05 ` myllynen at redhat dot com
2014-02-18 9:24 ` pravin.d.s at gmail dot com
2014-05-21 11:11 ` allan at archlinux dot org
2014-05-23 7:54 ` [Bug localedata/14094] Update locale data to Unicode 6.3 pravin.d.s at gmail dot com
2014-05-23 12:02 ` joseph at codesourcery dot com
2014-05-23 13:20 ` pravin.d.s at gmail dot com
2014-06-10 9:38 ` pravin.d.s at gmail dot com
2014-06-10 14:38 ` carlos at redhat dot com
2014-06-11 3:49 ` pravin.d.s at gmail dot com
2014-06-19 10:28 ` pravin.d.s at gmail dot com
2014-06-21 19:10 ` [Bug localedata/14094] Update locale data to Unicode 7.0.0 pravin.d.s at gmail dot com
2014-06-25 11:02 ` fweimer at redhat dot com
2014-06-25 12:24 ` pravin.d.s at gmail dot com
2014-06-25 13:47 ` carlos at redhat dot com
2014-07-04 9:13 ` pravin.d.s at gmail dot com
2014-07-17 10:41 ` pravin.d.s at gmail dot com
2014-07-22 12:18 ` pravin.d.s at gmail dot com
2014-09-05 1:07 ` carlos at redhat dot com
2014-09-29 7:13 ` maiku.fabian at gmail dot com
2014-09-29 7:17 ` pravin.d.s at gmail dot com
2014-11-06 11:00 ` maiku.fabian at gmail dot com
2014-11-06 11:03 ` maiku.fabian at gmail dot com [this message]
2014-11-06 11:45 ` maiku.fabian at gmail dot com
2014-11-12 10:13 ` pravin.d.s at gmail dot com
2014-11-12 10:18 ` pravin.d.s at gmail dot com
2014-11-14 7:15 ` maiku.fabian at gmail dot com
2014-11-14 7:34 ` maiku.fabian at gmail dot com
2014-11-24 11:20 ` maiku.fabian at gmail dot com
2014-12-01 10:14 ` maiku.fabian at gmail dot com
2014-12-03 12:27 ` maiku.fabian at gmail dot com
2014-12-03 12:27 ` maiku.fabian at gmail dot com
2014-12-04 10:33 ` maiku.fabian at gmail dot com
2015-02-20 22:36 ` cvs-commit at gcc dot gnu.org
2015-02-21 0:06 ` aoliva at sourceware dot org
2015-02-21 20:24 ` aoliva at sourceware dot org