* [Bug localedata/14094] New: Update locale data to Unicode 6.1
@ 2012-05-10 21:06 jsm28 at gcc dot gnu.org
2012-05-11 7:28 ` [Bug localedata/14094] " bugdal at aerifal dot cx
` (47 more replies)
0 siblings, 48 replies; 49+ messages in thread
From: jsm28 at gcc dot gnu.org @ 2012-05-10 21:06 UTC (permalink / raw)
To: libc-locales
http://sourceware.org/bugzilla/show_bug.cgi?id=14094
Bug #: 14094
Summary: Update locale data to Unicode 6.1
Product: glibc
Version: 2.15
Status: NEW
Severity: normal
Priority: P2
Component: localedata
AssignedTo: unassigned@sourceware.org
ReportedBy: jsm28@gcc.gnu.org
CC: libc-locales@sources.redhat.com
Classification: Unclassified
The Unicode locale data - character map and LC_CTYPE information - should be
updated from Unicode 6.1 (the character map is currently based on 6.0, and
LC_CTYPE is currently based on 5.0). This should be done with proper
automation, and wiki documentation should be added on how to do future
updates. I identified the following tasks at
<http://sourceware.org/ml/libc-alpha/2012-05/msg00590.html>:
* Ensure the character type data in localedata/charmaps/i18n can be
properly reproduced from Unicode 5.0 data using gen-unicode-ctype.c,
adapting gen-unicode-ctype.c as needed to replicate any changes that
may have been made not using that program.
* Update the character type data to Unicode 6.1, removing any local
hacks from gen-unicode-ctype.c that are no longer needed.
(10646:2012, corresponding to Unicode 6.1, appears to be in
publication stage so should be out very soon.)
* Ensure the character data in localedata/charmaps/UTF-8 can be
reproduced in some automated fashion from Unicode 6.0, locating any
previously used automation for this or creating some new automation
if any previous automation can't be found.
* Update the character data to Unicode 6.1, removing any local hacks
in the automation from the previous step.
* Document thoroughly on the wiki how the automation works and how to
do updates to new Unicode versions.
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 49+ messages in thread
* [Bug localedata/14094] Update locale data to Unicode 6.1
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
@ 2012-05-11 7:28 ` bugdal at aerifal dot cx
2013-11-26 17:07 ` myllynen at redhat dot com
` (46 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: bugdal at aerifal dot cx @ 2012-05-11 7:28 UTC (permalink / raw)
To: libc-locales
http://sourceware.org/bugzilla/show_bug.cgi?id=14094
Rich Felker <bugdal at aerifal dot cx> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bugdal at aerifal dot cx
--- Comment #1 from Rich Felker <bugdal at aerifal dot cx> 2012-05-11 03:25:47 UTC ---
One of the major "local hacks" can be fixed, fixing many other problems at the
same time, by switching to using the Unicode "Alphabetic" property (from
DerivedCoreProperties.txt) instead of just categories L* for class alpha. Right
now there are many languages whose letters are considered non-alphabetic by
glibc because they're in category Mn or Mc or even Cf. There are "local hacks"
to fix this for maybe one or two languages, but using the right Unicode
property would fix it for all languages.
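For illustration, a minimal Python sketch of collecting the code points that carry the "Alphabetic" property, assuming the usual "range ; Property # comment" line layout of DerivedCoreProperties.txt. The function name parse_property and the three sample lines are illustrative only and are not taken from any glibc script:

```python
def parse_property(lines, wanted="Alphabetic"):
    """Return the set of code points that carry the property `wanted`.

    Hypothetical helper: assumes each data line looks like
    '0041..005A    ; Alphabetic # comment' or '0345 ; Alphabetic # comment'.
    """
    code_points = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != wanted:
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


# Sample lines in the assumed format; U+0345 is in category Mn yet Alphabetic.
sample = [
    "0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "0345          ; Alphabetic # Mn       COMBINING GREEK YPOGEGRAMMENI",
    "0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
]
alpha = parse_property(sample)
```

Reading the derived property directly means a character such as U+0345 lands in class alpha without any per-language special case.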
* [Bug localedata/14094] Update locale data to Unicode 6.1
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
2012-05-11 7:28 ` [Bug localedata/14094] " bugdal at aerifal dot cx
@ 2013-11-26 17:07 ` myllynen at redhat dot com
2014-02-18 10:12 ` pravin.d.s at gmail dot com
` (45 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: myllynen at redhat dot com @ 2013-11-26 17:07 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Marko Myllynen <myllynen at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |myllynen at redhat dot com
* [Bug localedata/14094] Update locale data to Unicode 6.1
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
2012-05-11 7:28 ` [Bug localedata/14094] " bugdal at aerifal dot cx
2013-11-26 17:07 ` myllynen at redhat dot com
@ 2014-02-18 10:12 ` pravin.d.s at gmail dot com
2014-05-21 12:52 ` allan at archlinux dot org
` (44 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-02-18 10:12 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Pravin S <pravin.d.s at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pravin.d.s at gmail dot com
* [Bug localedata/14094] Update locale data to Unicode 6.1
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (2 preceding siblings ...)
2014-02-18 10:12 ` pravin.d.s at gmail dot com
@ 2014-05-21 12:52 ` allan at archlinux dot org
2014-05-21 12:52 ` johannes at kyriasis dot com
` (43 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: allan at archlinux dot org @ 2014-05-21 12:52 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Allan McRae <allan at archlinux dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |allan at archlinux dot org
* [Bug localedata/14094] Update locale data to Unicode 6.1
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (3 preceding siblings ...)
2014-05-21 12:52 ` allan at archlinux dot org
@ 2014-05-21 12:52 ` johannes at kyriasis dot com
2014-05-23 7:56 ` [Bug localedata/14094] Update locale data to Unicode 6.3 pravin.d.s at gmail dot com
` (42 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: johannes at kyriasis dot com @ 2014-05-21 12:52 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Johannes Löthberg <johannes at kyriasis dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |johannes at kyriasis dot com
--- Comment #2 from Johannes Löthberg <johannes at kyriasis dot com> ---
*** Bug 16969 has been marked as a duplicate of this bug. ***
* [Bug localedata/14094] Update locale data to Unicode 6.3
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (4 preceding siblings ...)
2014-05-21 12:52 ` johannes at kyriasis dot com
@ 2014-05-23 7:56 ` pravin.d.s at gmail dot com
2014-05-23 12:11 ` joseph at codesourcery dot com
` (41 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-05-23 7:56 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Pravin S <pravin.d.s at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Update locale data to |Update locale data to
|Unicode 6.1 |Unicode 6.3
--- Comment #3 from Pravin S <pravin.d.s at gmail dot com> ---
Rather than Unicode 6.1, the target should be Unicode 6.3.
The two files mentioned in the bug are:
1. i18n (LC_CTYPE), which used to be generated by gen-unicode-ctype.c.
2. UTF-8, which looks like a conversion from Unicode to UTF-8; I will find out.
Are there any other files involved in upgrading the glibc localedata to
Unicode 6.1?
* [Bug localedata/14094] Update locale data to Unicode 6.3
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (5 preceding siblings ...)
2014-05-23 7:56 ` [Bug localedata/14094] Update locale data to Unicode 6.3 pravin.d.s at gmail dot com
@ 2014-05-23 12:11 ` joseph at codesourcery dot com
2014-05-23 13:55 ` pravin.d.s at gmail dot com
` (40 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: joseph at codesourcery dot com @ 2014-05-23 12:11 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #4 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
Once the data is updated (maybe once just the character map is updated),
__STDC_ISO_10646__ should be updated in include/stdc-predef.h to reflect
the publication date of the edition or amendment to ISO 10646
corresponding to the version of Unicode in use.
I advise keeping each of the tasks I listed as a separate patch, as it's
important to be confident we aren't losing desired local changes in the
course of the update (which means the existing files need to be reproduced
exactly by some automation before the update is done).
Bug 16061 relates to transliteration data, some of which came from
Unicode, and bug 14095 to collation data. The same principles apply to
those - reproduce the existing files, understanding any local changes in
the process, then update to a newer Unicode version - but they are likely
to involve much more work in understanding the existing state then
updating while preserving any desired local changes.
* [Bug localedata/14094] Update locale data to Unicode 6.3
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (6 preceding siblings ...)
2014-05-23 12:11 ` joseph at codesourcery dot com
@ 2014-05-23 13:55 ` pravin.d.s at gmail dot com
2014-06-10 9:43 ` pravin.d.s at gmail dot com
` (39 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-05-23 13:55 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #5 from Pravin S <pravin.d.s at gmail dot com> ---
Yes, backward compatibility is a must.
I will write a small script to check that we are not changing existing maps, so
we can be confident before committing.
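A rough Python sketch of what such a check could look like, assuming each character class has already been loaded as a set of code points. The function name compare_classes and the sample data are hypothetical; this is not the script that was later published:

```python
def compare_classes(old, new):
    """Map each class name to a (removed, added) pair of code point sets."""
    report = {}
    for name in sorted(set(old) | set(new)):
        old_set = old.get(name, set())
        new_set = new.get(name, set())
        report[name] = (old_set - new_set, new_set - old_set)
    return report


# Made-up sample data, only to show the shape of the report.
old = {"alpha": {0x0041, 0x0042, 0x0D70}, "digit": {0x0030}}
new = {"alpha": {0x0041, 0x0042, 0x1F00}, "digit": {0x0030}}

report = compare_classes(old, new)
for name, (removed, added) in report.items():
    print(name,
          "removed:", [f"U+{c:04X}" for c in sorted(removed)],
          "added:", [f"U+{c:04X}" for c in sorted(added)])
```

Any non-empty "removed" set is what needs a human decision: either an intentional change in Unicode or a local change that must be preserved.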
* [Bug localedata/14094] Update locale data to Unicode 6.3
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (7 preceding siblings ...)
2014-05-23 13:55 ` pravin.d.s at gmail dot com
@ 2014-06-10 9:43 ` pravin.d.s at gmail dot com
2014-06-10 14:40 ` carlos at redhat dot com
` (38 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-06-10 9:43 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Pravin S <pravin.d.s at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at sourceware dot org |pravin.d.s at gmail dot com
--- Comment #6 from Pravin S <pravin.d.s at gmail dot com> ---
I have written a script for checking backward compatibility of the new LC_CTYPE
with the old LC_CTYPE.
The script is available at https://github.com/pravins/glibc-i18n
The important thing for us at present is the report generated by the script, i.e.
https://raw.githubusercontent.com/pravins/glibc-i18n/master/Report
While doing this I also found that the existing i18n file includes
<U0D70>..<U0D75>; twice:
% MALAYALAM/
<U0D66>..<U0D75>;<U0D70>..<U0D75>;/
Let me know if anything is missing.
In the next step, I will compare LC_CTYPE 5.0.0 with LC_CTYPE 6.3.0 for missing
characters and confirm whether these are intentional changes in Unicode or
something we are missing.
I will be ready with a patch for updating LC_CTYPE next time.
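The doubled MALAYALAM range above is the kind of thing a small check can catch mechanically. A minimal Python sketch, assuming class lines use the <Uxxxx> and <Uxxxx>..<Uxxxx> notation shown in the example; the names RANGE_RE, parse_ranges and find_overlaps are illustrative, not part of the published script:

```python
import re

# Matches a single <Uxxxx> symbol or a <Uxxxx>..<Uxxxx> range.
RANGE_RE = re.compile(r"<U([0-9A-F]{4,8})>(?:\.\.<U([0-9A-F]{4,8})>)?")


def parse_ranges(text):
    """Return (start, end) code point pairs found in a class line."""
    ranges = []
    for match in RANGE_RE.finditer(text):
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        ranges.append((start, end))
    return ranges


def find_overlaps(ranges):
    """Return the (start, end) spans that are covered more than once."""
    overlaps = []
    max_end = -1  # highest code point covered so far
    for start, end in sorted(ranges):
        if start <= max_end:
            overlaps.append((start, min(end, max_end)))
        max_end = max(max_end, end)
    return overlaps


line = "<U0D66>..<U0D75>;<U0D70>..<U0D75>;/"
overlaps = find_overlaps(parse_ranges(line))
for start, end in overlaps:
    print(f"covered twice: <U{start:04X}>..<U{end:04X}>")
```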
* [Bug localedata/14094] Update locale data to Unicode 6.3
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (8 preceding siblings ...)
2014-06-10 9:43 ` pravin.d.s at gmail dot com
@ 2014-06-10 14:40 ` carlos at redhat dot com
2014-06-11 4:25 ` pravin.d.s at gmail dot com
` (37 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: carlos at redhat dot com @ 2014-06-10 14:40 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Carlos O'Donell <carlos at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |carlos at redhat dot com
--- Comment #7 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Pravin S from comment #6)
> I have written script for checking backward compabitibility of new LC_CTYPE
> with old LC_CTYPE.
>
> Script is available at https://github.com/pravins/glibc-i18n
>
> Important thing for us presently is report generated by script. i.e.
>
> https://raw.githubusercontent.com/pravins/glibc-i18n/master/Report
>
> While doing this also found in existing i18n file <U0D70>..<U0D75>; included
> twice.
>
> % MALAYALAM/
> <U0D66>..<U0D75>;<U0D70>..<U0D75>;/
>
> Let me know if anything is missing.
>
> In next step, i will check missing characters from LC_CTYPE 5.0.0 with
> LC_CTYPE 6.3.0 and confirm are these intentional changes at Unicode or
> something we are missing.
>
> Will be ready with patch for updating LC_CTYPE next time.
Thanks Pravin! I think the missing step is to get these scripts checked into
glibc's scripts/ directory so that we have them in a central location with some
internal comments showing how to run the script. This way we can re-run them at
later stages to verify what's missing and stay in sync (say the release manager
runs it before a release).
Eventually we want a documented process here:
https://sourceware.org/glibc/wiki/Regeneration
Even if it's just "Run this script. Fix all warnings by hand" it would be a
good start.
* [Bug localedata/14094] Update locale data to Unicode 6.3
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (9 preceding siblings ...)
2014-06-10 14:40 ` carlos at redhat dot com
@ 2014-06-11 4:25 ` pravin.d.s at gmail dot com
2014-06-19 11:28 ` pravin.d.s at gmail dot com
` (36 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-06-11 4:25 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #8 from Pravin S <pravin.d.s at gmail dot com> ---
Agree with you, will do it.
* [Bug localedata/14094] Update locale data to Unicode 6.3
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (10 preceding siblings ...)
2014-06-11 4:25 ` pravin.d.s at gmail dot com
@ 2014-06-19 11:28 ` pravin.d.s at gmail dot com
2014-06-21 19:15 ` [Bug localedata/14094] Update locale data to Unicode 7.0.0 pravin.d.s at gmail dot com
` (35 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-06-19 11:28 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #9 from Pravin S <pravin.d.s at gmail dot com> ---
(In reply to Rich Felker from comment #1)
> One of the major "local hacks" can be fixed, fixing many other problems at
> the same time, by switching to using the Unicode "Alphabetic" property (from
> DerivedCoreProperties.txt) instead of just categories L* for class alpha.
> Right now there are many languages whose letters are considered
> non-alphabetic by glibc because they're in category Mn or Mc or even Cf.
> There are "local hacks" to fix this for maybe one or two languages, but
> using the right Unicode property would fix it for all languages.
I was almost done, but while updating this I found that around 248
characters were added to the ALPHA group of the present i18n CTYPE after
gen-unicode-ctype.c processing (Unicode 5.1,
https://github.com/pravins/glibc-i18n/blob/master/unicode5-1/Report), and I am
facing the same issue while upgrading it to Unicode 6.3 (246 characters,
https://github.com/pravins/glibc-i18n/blob/master/Report).
While reading http://www.unicode.org/reports/tr44/#Property_List_Table I found
this statement:
"Implementations should simply use the derived properties, and should not try
to rederive them from lists of simple properties and collections of rules,
because of the chances for error and divergence when doing so."
I agree with Rich: we should collect what is available from
DerivedCoreProperties.txt rather than processing raw UnicodeData.txt. I am
writing a script to process the groups from DerivedCoreProperties.txt.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (11 preceding siblings ...)
2014-06-19 11:28 ` pravin.d.s at gmail dot com
@ 2014-06-21 19:15 ` pravin.d.s at gmail dot com
2014-06-25 12:08 ` fweimer at redhat dot com
` (34 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-06-21 19:15 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Pravin S <pravin.d.s at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Update locale data to |Update locale data to
|Unicode 6.3 |Unicode 7.0.0
--- Comment #10 from Pravin S <pravin.d.s at gmail dot com> ---
I am working with latest Unicode standard, so updated bug summary.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (12 preceding siblings ...)
2014-06-21 19:15 ` [Bug localedata/14094] Update locale data to Unicode 7.0.0 pravin.d.s at gmail dot com
@ 2014-06-25 12:08 ` fweimer at redhat dot com
2014-06-25 12:43 ` pravin.d.s at gmail dot com
` (33 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: fweimer at redhat dot com @ 2014-06-25 12:08 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags| |security-
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (13 preceding siblings ...)
2014-06-25 12:08 ` fweimer at redhat dot com
@ 2014-06-25 12:43 ` pravin.d.s at gmail dot com
2014-06-25 13:54 ` carlos at redhat dot com
` (32 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-06-25 12:43 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #11 from Pravin S <pravin.d.s at gmail dot com> ---
(In reply to Joseph Myers from comment #0)
>
> * Ensure the character data in localedata/charmaps/UTF-8 can be
> reproduced in some automated fashion from Unicode 6.0, locating any
> previously used automation for this or creating some new automation
> if any previous automation can't be found.
I am not able to find the previous automation for this either.
I can simply pass every Unicode code point through a Python Unicode-to-UTF-8
conversion and format the result as required by the UTF-8 file.
Any hints on how to do this?
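A minimal Python 3 sketch of that idea for a single code point. The helper name charmap_line is hypothetical, and both the column spacing and the use of eight hex digits in the symbol above U+FFFF are assumptions about the file layout, not something confirmed in this thread:

```python
def charmap_line(code_point, name):
    """Format one charmap entry: symbol, UTF-8 bytes as /xNN, then the name.

    Assumptions: <Uxxxx> for the BMP and <Uxxxxxxxx> above it, and a fixed
    run of spaces between the columns. Surrogates (U+D800..U+DFFF) cannot be
    encoded this way and raise UnicodeEncodeError.
    """
    digits = 4 if code_point <= 0xFFFF else 8
    symbol = f"<U{code_point:0{digits}X}>"
    utf8 = chr(code_point).encode("utf-8")
    hex_bytes = "".join(f"/x{byte:02x}" for byte in utf8)
    return f"{symbol}     {hex_bytes} {name}"


print(charmap_line(0x0041, "LATIN CAPITAL LETTER A"))
print(charmap_line(0x3400, "<CJK Ideograph Extension A>"))
```

The real work is in the details this sketch leaves out: ranges, control character names and the WIDTH section.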
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (14 preceding siblings ...)
2014-06-25 12:43 ` pravin.d.s at gmail dot com
@ 2014-06-25 13:54 ` carlos at redhat dot com
2014-07-04 10:51 ` pravin.d.s at gmail dot com
` (31 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: carlos at redhat dot com @ 2014-06-25 13:54 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #12 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Pravin S from comment #11)
> (In reply to Joseph Myers from comment #0)
> >
> > * Ensure the character data in localedata/charmaps/UTF-8 can be
> > reproduced in some automated fashion from Unicode 6.0, locating any
> > previously used automation for this or creating some new automation
> > if any previous automation can't be found.
>
> Me too not able to find previous automation for same.
>
> I can simply pass all Unicode to python unicode-to-utf8 and format it as
> required by UTF-8 file.
>
> Any hint on how to do this?
Not really, this is why this problem requires "work" ;-)
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (15 preceding siblings ...)
2014-06-25 13:54 ` carlos at redhat dot com
@ 2014-07-04 10:51 ` pravin.d.s at gmail dot com
2014-07-17 12:44 ` pravin.d.s at gmail dot com
` (30 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-07-04 10:51 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #13 from Pravin S <pravin.d.s at gmail dot com> ---
Created attachment 7679
--> https://sourceware.org/bugzilla/attachment.cgi?id=7679&action=edit
Patch to update UTF-8 CHARMAP to unicode 7.0
I have worked on updating the UTF-8 file to Unicode 7.0. The following are the
important points to note before reviewing this patch.
1. The present patch is only for CHARMAP; a patch for updating WIDTH will be
available soon.
2. utf8-gen.py: New script to generate the UTF-8 file.
3. The patch was created ignoring whitespace changes (-w).
4. Where UnicodeData.txt gives characters as a range, for example:
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
the UTF-8 file lists the range in blocks between the First and Last Unicode
characters, each block ending 0x3F after its first character. For example:
<U3400>..<U343F> /xe3/x90/x80 <CJK Ideograph Extension A>
.
.
<U4D80>..<U4DB5> /xe4/xb6/x80 <CJK Ideograph Extension A>
Note: I have no idea why the Hangul syllables (AC00 to D7A3) were not expanded
in the Unicode 5.0 UTF-8 file. For consistency, we are expanding Hangul as well.
5. Names are changed relative to UnicodeData.txt in some cases.
Some characters have <control> as their name, so the "Unicode 1.0 Name" is
used instead.
Characters U+0080, U+0081, U+0084 and U+0099 have "<control>" as their name
and no "Unicode 1.0 Name" (10th field) in UnicodeData.txt either.
We could write code to take their alternate names from NameAliases.txt.
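The range splitting described in point 4 can be sketched in a few lines of Python. This assumes the blocks are aligned on 0x40 boundaries, which matches the two example lines but is otherwise a guess. The helper names split_range and range_line are hypothetical and are not taken from utf8-gen.py:

```python
def split_range(first, last, step=0x40):
    """Split first..last into blocks that end on step-aligned boundaries."""
    chunks = []
    start = first
    while start <= last:
        end = min(start | (step - 1), last)  # e.g. 0x3400 | 0x3F == 0x343F
        chunks.append((start, end))
        start = end + 1
    return chunks


def range_line(start, end, name):
    """Format one block; the bytes shown are the UTF-8 form of `start`."""
    hex_bytes = "".join(f"/x{b:02x}" for b in chr(start).encode("utf-8"))
    return f"<U{start:04X}>..<U{end:04X}> {hex_bytes} {name}"


chunks = split_range(0x3400, 0x4DB5)
print(range_line(*chunks[0], "<CJK Ideograph Extension A>"))
print(range_line(*chunks[-1], "<CJK Ideograph Extension A>"))
```

With these assumptions the first and last blocks come out as the two example lines shown in point 4.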
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (16 preceding siblings ...)
2014-07-04 10:51 ` pravin.d.s at gmail dot com
@ 2014-07-17 12:44 ` pravin.d.s at gmail dot com
2014-07-22 13:03 ` pravin.d.s at gmail dot com
` (29 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-07-17 12:44 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Pravin S <pravin.d.s at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #7679|0 |1
is obsolete| |
--- Comment #14 from Pravin S <pravin.d.s at gmail dot com> ---
Created attachment 7715
--> https://sourceware.org/bugzilla/attachment.cgi?id=7715&action=edit
Patch to update UTF-8 CHARMAP and WIDTH to unicode 7.0
All work on the UTF-8 file is done.
Added two scripts:
1. utf8-gen.py: generates the UTF-8 file
2. utf8-compatibility.py: checks backward compatibility of the newly generated
UTF-8 file
The report on backward compatibility of the new UTF-8 file is available at
https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
I am submitting this to libc-alpha; please help with a quick review and a push
to git.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (17 preceding siblings ...)
2014-07-17 12:44 ` pravin.d.s at gmail dot com
@ 2014-07-22 13:03 ` pravin.d.s at gmail dot com
2014-09-05 1:08 ` carlos at redhat dot com
` (28 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-07-22 13:03 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #15 from Pravin S <pravin.d.s at gmail dot com> ---
Created attachment 7720
--> https://sourceware.org/bugzilla/attachment.cgi?id=7720&action=edit
Patch to update UTF-8 i18n file (CTYPE) to unicode 7.0
The patch does the following:
* locales/i18n: Updated to Unicode 7.0.0.
* scripts/gen-unicode-ctype.c: Disabled upper, lower, alpha and outdigit
classes.
* scripts/ctype-gen.sh: Shell script to generate LC_CTYPE for a new Unicode
version.
* scripts/gen-unicode-ctype-dcp.py: New script for generating the locales/i18n
upper, lower and alpha ctype classes from DerivedCoreProperties.txt.
* scripts/ctype-compatibility.py: Script for testing backward
compatibility of LC_CTYPE in locales/i18n.
The report on backward compatibility is available at
https://raw.githubusercontent.com/pravins/glibc-i18n/master/unicode7-0/ctype-compatibility5_1-to-7_0
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (18 preceding siblings ...)
2014-07-22 13:03 ` pravin.d.s at gmail dot com
@ 2014-09-05 1:08 ` carlos at redhat dot com
2014-09-29 7:29 ` maiku.fabian at gmail dot com
` (27 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: carlos at redhat dot com @ 2014-09-05 1:08 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Carlos O'Donell <carlos at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|2.15 |2.21
--- Comment #16 from Carlos O'Donell <carlos at redhat dot com> ---
Pravin,
Is any part of your work ready for 2.21 when it opens?
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (19 preceding siblings ...)
2014-09-05 1:08 ` carlos at redhat dot com
@ 2014-09-29 7:29 ` maiku.fabian at gmail dot com
2014-09-29 7:30 ` pravin.d.s at gmail dot com
` (26 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-09-29 7:29 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |maiku.fabian at gmail dot com
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (20 preceding siblings ...)
2014-09-29 7:29 ` maiku.fabian at gmail dot com
@ 2014-09-29 7:30 ` pravin.d.s at gmail dot com
2014-10-14 8:08 ` maiku.fabian at gmail dot com
` (25 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-09-29 7:30 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #17 from Pravin S <pravin.d.s at gmail dot com> ---
I am still waiting for someone to review these patches.
The best way to do so would be:
1. Build glibc with the patches.
2. Test the WIDTH and CTYPE functions (do they return proper values?); one could
do the same with the existing glibc and compare.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (21 preceding siblings ...)
2014-09-29 7:30 ` pravin.d.s at gmail dot com
@ 2014-10-14 8:08 ` maiku.fabian at gmail dot com
2014-11-06 11:03 ` maiku.fabian at gmail dot com
` (24 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-10-14 8:08 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #18 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Pravin S from comment #14)
> Created attachment 7715 [details]
> Patch to update UTF-8 CHARMAP and WIDTH to unicode 7.0
>
> Done with all work with UTF-8 file.
> Added two script:
> 1. utf8-gen.py to generate UTF-8 file
> 2. utf8-compatibility.py : to check backward compatibility of newly
> generated UTF-8 file
> 3. Report of new UTF-8 file backward compatibility is available AT
> https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
>
> Submitting to glibc-alpha, please help to quick review and push to git.
I checked the scripts Pravin used and the resulting UTF-8 file.
I found only one minor problem:
In some cases, both UnicodeData.txt and EastAsianWidth.txt have information
about width. For example, EastAsianWidth.txt has:
302A..302D;W # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
which gives us width 2 for these 4 characters (because of “W”) but
UnicodeData.txt has:
302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;
which would give width 0 (because of “NSM”).
I changed Pravin’s script a bit to prefer the information from
EastAsianWidth.txt in case of conflicts.
Pravin has already merged my change into his git repository.
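The precedence rule described above could be sketched roughly like this. It is only an illustration, not the code from utf8-gen.py; the function name resolve_width and its parameters are hypothetical, and treating “F” (fullwidth) the same as “W” is an assumption not discussed in this comment.

```python
def resolve_width(east_asian_width, bidi_class):
    """Return a display width for one character (hypothetical helper).

    east_asian_width: property value from EastAsianWidth.txt, or None
    bidi_class: bidirectional class field from UnicodeData.txt
    """
    # EastAsianWidth.txt wins in case of conflict: wide characters get
    # width 2. Handling 'F' like 'W' is an assumption of this sketch.
    if east_asian_width in ('W', 'F'):
        return 2
    # Otherwise fall back to UnicodeData.txt: non-spacing marks get width 0.
    if bidi_class == 'NSM':
        return 0
    # Everything else is assumed to have the default width 1.
    return 1
```

With this ordering, U+302A (“W” in EastAsianWidth.txt, “NSM” in UnicodeData.txt) ends up with width 2 instead of 0.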
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (23 preceding siblings ...)
2014-11-06 11:03 ` maiku.fabian at gmail dot com
@ 2014-11-06 11:03 ` maiku.fabian at gmail dot com
2014-11-06 11:05 ` maiku.fabian at gmail dot com
` (22 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-06 11:03 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #19 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I extended Pravin’s ctype-compatibility.py script to produce more
human readable output and added many extra tests.
Joseph Myers> * Ensure the character type data in
Joseph Myers> localedata/charmaps/i18n can be properly reproduced from
Joseph Myers> Unicode 5.0 data using gen-unicode-ctype.c, adapting
Joseph Myers> gen-unicode-ctype.c as needed to replicate any changes
Joseph Myers> that may have been made not using that program.
When using gen-unicode-ctype.c with UnicodeData.txt-5.0.0
to generate LC_CTYPE, the generated file lacks
many characters which apparently have been manually added
to glibc’s i18n file:
alpha: Missing 1238 characters of old ctype in new ctype
blank: Missing 0 characters of old ctype in new ctype
cntrl: Missing 0 characters of old ctype in new ctype
combining: Missing 124 characters of old ctype in new ctype
combining_level3: Missing 49 characters of old ctype in new ctype
digit: Missing 0 characters of old ctype in new ctype
graph: Missing 1571 characters of old ctype in new ctype
lower: Missing 115 characters of old ctype in new ctype
print: Missing 1571 characters of old ctype in new ctype
punct: Missing 335 characters of old ctype in new ctype
space: Missing 0 characters of old ctype in new ctype
tolower: Missing 19 characters of old ctype in new ctype
totitle: Missing 8 characters of old ctype in new ctype
toupper: Missing 18 characters of old ctype in new ctype
upper: Missing 100 characters of old ctype in new ctype
xdigit: Missing 0 characters of old ctype in new ctype
I.e. reproducing the localedata/charmaps/i18n character type data
from Unicode 5.0 data using gen-unicode-ctype.c does not work
well because glibc’s i18n file apparently has been edited
manually a lot already to include newer Unicode data.
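For readers who want to reproduce such counts, here is a minimal sketch of the comparison. It assumes both files have already been parsed into dictionaries mapping a class name to a set of code points; the function names are hypothetical and this is not the code from ctype-compatibility.py.

```python
def missing_counts(old_ctype, new_ctype):
    """Count, per class, the code points of the old file absent from the new one.

    old_ctype, new_ctype: dict mapping class name -> set of code points
    """
    return {
        name: len(old_ctype[name] - new_ctype.get(name, set()))
        for name in sorted(old_ctype)
    }


def format_report(counts):
    """Format the counts in the style of the report lines quoted above."""
    return [
        '%s: Missing %d characters of old ctype in new ctype' % (name, count)
        for name, count in counts.items()
    ]
```

A plain set difference is enough here because the report only asks which characters of the old class are no longer present, not which ones were added.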
Apparently quite a few mistakes have been made by manually editing
the i18n file. For example, the report from ctype-compatibility.py
also produces for the old i18n file:
error: 0xa67f ꙿ punct True: 0xa67f CYRILLIC PAYEROK. Not in Unicode 5.0.0.
In Unicode 7.0.0. General category Lm (Letter modifier).
DerivedCoreProperties.txt says it is “Alphabetic”. Apparently added
manually to punct by mistake in glibc’s old LC_CTYPE.
error: 0xa67f ꙿ alpha False: 0xa67f CYRILLIC PAYEROK. Not in Unicode 5.0.0.
In Unicode 7.0.0. General category Lm (Letter modifier).
DerivedCoreProperties.txt says it is “Alphabetic”. Apparently added
manually to punct by mistake in glibc’s old LC_CTYPE.
Another example:
error: 0x9f4 ৴ alpha True:
“09F4;BENGALI CURRENCY NUMERATOR ONE;No;0;L;;;;1/16;N;;;;;”
“09F5;BENGALI CURRENCY NUMERATOR TWO;No;0;L;;;;1/8;N;;;;;”
“09F6;BENGALI CURRENCY NUMERATOR THREE;No;0;L;;;;3/16;N;;;;;”
“09F7;BENGALI CURRENCY NUMERATOR FOUR;No;0;L;;;;1/4;N;;;;;”
“09F8;BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR;No;0;L;;;;3/4;N;;;;;”
“09F9;BENGALI CURRENCY DENOMINATOR SIXTEEN;No;0;L;;;;16;N;;;;;”
“09FA;BENGALI ISSHAR;So;0;L;;;;;N;;;;;”
According to DerivedCoreProperties.txt (7.0.0) these are *not*
“Alphabetic”.
So these have been mistakenly added to “alpha” in the old i18n file
of glibc (gen-unicode-ctype.c correctly puts them into “punct”,
i.e. this seems to be another mistake made by manual editing).
Some of the errors reported by ctype-compatibility.py
error: 0x250 ɐ lower False: Should be lower in Unicode 7.0.0 (was not lower in
Unicode 5.0.0).
would be fixed by using gen-unicode-ctype.c with Unicode 7.0.0 input.
There are many more problems like this in the old i18n file,
my tests found 133 errors total:
------------------------------------------------------------
Old file = /local/mfabian/src/glibc/localedata/locales/i18n
Number of errors in old file = 133
------------------------------------------------------------
I’ll attach the full report.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (22 preceding siblings ...)
2014-10-14 8:08 ` maiku.fabian at gmail dot com
@ 2014-11-06 11:03 ` maiku.fabian at gmail dot com
2014-11-06 11:03 ` maiku.fabian at gmail dot com
` (23 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-06 11:03 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #20 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7907
--> https://sourceware.org/bugzilla/attachment.cgi?id=7907&action=edit
unicode-5.0.0-report-full-output
Full report from ctype-compatibility.py when comparing the old i18n
file in glibc with the file generated by gen-unicode-ctype.c using
UnicodeData.txt from Unicode 5.0.0.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (24 preceding siblings ...)
2014-11-06 11:03 ` maiku.fabian at gmail dot com
@ 2014-11-06 11:05 ` maiku.fabian at gmail dot com
2014-11-06 11:22 ` maiku.fabian at gmail dot com
` (21 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-06 11:05 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #21 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Now when using gen-unicode-ctype.c with UnicodeData.txt-7.0.0
to generate LC_CTYPE, the generated file lacks far fewer
characters compared to the old i18n file in glibc:
alpha: Missing 246 characters of old ctype in new ctype
blank: Missing 1 characters of old ctype in new ctype
cntrl: Missing 0 characters of old ctype in new ctype
combining: Missing 3 characters of old ctype in new ctype
combining_level3: Missing 5 characters of old ctype in new ctype
digit: Missing 0 characters of old ctype in new ctype
graph: Missing 0 characters of old ctype in new ctype
lower: Missing 20 characters of old ctype in new ctype
print: Missing 0 characters of old ctype in new ctype
punct: Missing 16 characters of old ctype in new ctype
space: Missing 1 characters of old ctype in new ctype
tolower: Missing 0 characters of old ctype in new ctype
totitle: Missing 0 characters of old ctype in new ctype
toupper: Missing 0 characters of old ctype in new ctype
upper: Missing 0 characters of old ctype in new ctype
xdigit: Missing 0 characters of old ctype in new ctype
For example, gen-unicode-ctype.c does not put U+0901 into
the “alpha” class although it should be there
according to DerivedCoreProperties.txt:
error: 0x901 ँ alpha False: These have general category “Mn” i.e. these are
combining
characters (both in UnicodeData.txt 5.0.0 and 7.0.0):
“0901;DEVANAGARI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;;”,
”0902;DEVANAGARI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;;”,
“0903;DEVANAGARI SIGN VISARGA;Mc;0;L;;;;;N;;;;;”.
According to DerivedCoreProperties.txt (7.0.0) these are
“Alphabetic”.
Apparently this has been edited manually (correctly) in the old i18n file
of glibc.
So this would be fixed in the automatic generation
when using DerivedCoreProperties.txt for “alpha”.
But some of the above seem to be errors in the old i18n file
of glibc, for example:
error: 0x1090 ႐ punct True: MYANMAR SHAN DIGIT ZERO - MYANMAR SHAN DIGIT NINE.
These are digits, but because ISO C 99 forbids to
put them into digit they should go into alpha.
This is in “punct” in the old i18n file but gen-unicode-ctype.c
would put it into “alpha” which seems better for such digits
according to the comments in gen-unicode-ctype.c.
I went through all these “Missing” characters individually,
looked them up in UnicodeData.txt and DerivedCoreProperties.txt,
checked how they should be classified, and added test cases
for them to the ctype-compatibility.py script.
I’ll attach the full report after using gen-unicode-ctype.c with
UnicodeData.txt-7.0.0 to generate LC_CTYPE.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (25 preceding siblings ...)
2014-11-06 11:05 ` maiku.fabian at gmail dot com
@ 2014-11-06 11:22 ` maiku.fabian at gmail dot com
2014-11-06 11:56 ` maiku.fabian at gmail dot com
` (20 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-06 11:22 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #22 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7908
--> https://sourceware.org/bugzilla/attachment.cgi?id=7908&action=edit
unicode-7.0.0-report-full-output
Full report from ctype-compatibility.py when comparing the old i18n
file in glibc with the file generated by gen-unicode-ctype.c using
UnicodeData.txt from Unicode 7.0.0.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (26 preceding siblings ...)
2014-11-06 11:22 ` maiku.fabian at gmail dot com
@ 2014-11-06 11:56 ` maiku.fabian at gmail dot com
2014-11-06 11:59 ` maiku.fabian at gmail dot com
` (19 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-06 11:56 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #23 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Now Pravin’s approach in the patch attached to comment#15
is to comment out the generation of “upper”, “lower”
and “alpha” from gen-unicode-ctype.c and add another
script gen-unicode-ctype-dcp.py which adds these.
But this is a bit problematic.
1) it does not put digits like
alpha: Missing: ٠ 0x660 ARABIC-INDIC DIGIT ZERO
into “alpha”, which gen-unicode-ctype.c would have done.
gen-unicode-ctype.c contains the comment
/* Consider all the non-ASCII digits as alphabetic.
ISO C 99 forbids us to have them in category "digit",
but we want iswalnum to return true on them. */
which sounds reasonable.
2) it does not put characters like
lower: Missing: ǅ 0x1c5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
into lower. This is actually title case, not lower case,
but glibc has only the “lower” and “upper” classes, not “title”,
although it does have the “toupper”, “tolower”, and “totitle” mappings.
gen-unicode-ctype.c puts characters which change when “toupper”
is applied into “lower” and characters which change when “tolower”
is applied into “upper”. Therefore, gen-unicode-ctype.c
puts title case characters like Dž 0x1c5 into *both*, “upper” *and*
“lower”. Which seems reasonable if glibc has no “title”.
3) it does not put some characters like:
upper: Missing: ᾈ 0x1f88 GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
into “upper”. Surprisingly,
“U+1F88 ᾈ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI”
is *not* listed as “Uppercase” in
http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt .
Although U+1F88 seems to be Uppercase according to
http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
because it has a tolower mapping to U+1F80:
1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88
1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;
So this might be a bug in DerivedCoreProperties.txt.
Generating “upper” and “lower” the way gen-unicode-ctype.c does,
i.e. just using UnicodeData.txt and checking whether characters
change when mapping them to upper or to lower, does not produce this
error. I think the approach gen-unicode-ctype.c uses for “upper”
and “lower” is fine; it is not necessary to use DerivedCoreProperties.txt
for this.
4) *many* characters end up being in “alpha” *and* “punct”
For example:
error: ⷶ 0x2df6 is alpha and punct
gen-unicode-ctype.c has the comment:
/* alpha restriction: "No character specified for the keywords cntrl,
digit, punct or space shall be specified." */
This restriction is violated because the second script,
gen-unicode-ctype-dcp.py, used in Pravin’s 2-pass approach does not
check whether gen-unicode-ctype.c has already put a character into
“punct” before putting it into “alpha”.
The character “ⷶ U+2df6 COMBINING CYRILLIC LETTER A” is “Alphabetic”
according to DerivedCoreProperties.txt:
2DE0..2DFF ; Alphabetic # Mn [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS
So Pravin’s script rightly puts it into “alpha”.
But looking at this, it does not seem to be a good idea to have two
independent programs generating the file in 2 independent passes.
Verifications like gen-unicode-ctype.c does:
/* toupper restriction: "Only characters specified for the keywords
lower and upper shall be specified. */
...
/* tolower restriction: "Only characters specified for the keywords
lower and upper shall be specified. */
...
/* alpha restriction: "Characters classified as either upper or lower
shall automatically belong to this class. */
...
/* alpha restriction: "No character specified for the keywords cntrl,
digit, punct or space shall be specified." */
...
/* space restriction: "No character specified for the keywords upper,
lower, alpha, digit, graph or xdigit shall be specified."
upper, lower, alpha already checked above. */
...
/* cntrl restriction: "No character specified for the keywords upper,
lower, alpha, digit, punct, graph, print or xdigit shall be
specified." upper, lower, alpha already checked above. */
...
can be done much easier when using a single program.
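To illustrate why a single program makes this easy, here is a minimal sketch of two of the checks quoted above, run over the complete set of classes at once. The function name and the dictionary layout are hypothetical; gen-unicode-ctype.c implements its checks differently and covers more restrictions than shown here.

```python
def check_restrictions(classes):
    """Return error strings for violated class restrictions.

    classes: dict mapping class name -> set of code points
    """
    errors = []
    # alpha restriction: no character from cntrl, digit, punct or space.
    for other in ('cntrl', 'digit', 'punct', 'space'):
        for cp in sorted(classes['alpha'] & classes.get(other, set())):
            errors.append(
                'error: %s 0x%x is alpha and %s' % (chr(cp), cp, other))
    # alpha restriction: upper and lower automatically belong to alpha.
    for name in ('upper', 'lower'):
        for cp in sorted(classes.get(name, set()) - classes['alpha']):
            errors.append(
                'error: %s 0x%x is %s but not alpha' % (chr(cp), cp, name))
    return errors
```

Because all classes are available in one place, a character such as U+2DF6 that ends up in both “alpha” and “punct” is reported immediately instead of slipping through between two passes.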
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (27 preceding siblings ...)
2014-11-06 11:56 ` maiku.fabian at gmail dot com
@ 2014-11-06 11:59 ` maiku.fabian at gmail dot com
2014-11-06 12:09 ` maiku.fabian at gmail dot com
` (18 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-06 11:59 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #24 from Mike FABIAN <maiku.fabian at gmail dot com> ---
So I think we should do either:
1) improve gen-unicode-ctype.c and make it use
DerivedCoreProperties.txt for “alpha”
or:
2) rewrite gen-unicode-ctype.c in Python
First a rewrite which produces *exactly* the same
output as gen-unicode-ctype.c, then add code
to use DerivedCoreProperties.txt for “alpha”
No matter whether extending the C-Program or writing a Python program,
it should be a single program to be able to verify the restrictions
mentioned easily.
It would be nice, of course, to make the program read in the old i18n
file, replace the character classes, and write out a new file which
keeps the rest of the original file, so that no manual copy&paste of
the generated character classes is necessary.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (28 preceding siblings ...)
2014-11-06 11:59 ` maiku.fabian at gmail dot com
@ 2014-11-06 12:09 ` maiku.fabian at gmail dot com
2014-11-12 10:15 ` pravin.d.s at gmail dot com
` (17 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-06 12:09 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #25 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mike FABIAN from comment #24)
> No matter whether extending the C-Program or writing a Python program,
> it should be a single program to be able to verify the restrictions
> mentioned easily.
And as a 2nd pass, after the single program to generate the character
class data, use ctype-compatibility.py as a "test-suite".
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (29 preceding siblings ...)
2014-11-06 12:09 ` maiku.fabian at gmail dot com
@ 2014-11-12 10:15 ` pravin.d.s at gmail dot com
2014-11-12 10:25 ` pravin.d.s at gmail dot com
` (16 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-11-12 10:15 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Pravin S <pravin.d.s at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Depends on| |17588
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (30 preceding siblings ...)
2014-11-12 10:15 ` pravin.d.s at gmail dot com
@ 2014-11-12 10:25 ` pravin.d.s at gmail dot com
2014-11-14 15:10 ` maiku.fabian at gmail dot com
` (15 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-11-12 10:25 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #26 from Pravin S <pravin.d.s at gmail dot com> ---
(In reply to Mike FABIAN from comment #18)
> (In reply to Pravin S from comment #14)
> > Created attachment 7715 [details]
> > Patch to update UTF-8 CHARMAP and WIDTH to unicode 7.0
> >
> > Done with all work with UTF-8 file.
> > Added two script:
> > 1. utf8-gen.py to generate UTF-8 file
> > 2. utf8-compatibility.py : to check backward compatibility of newly
> > generated UTF-8 file
> > 3. Report of new UTF-8 file backward compatibility is available AT
> > https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
> >
> > Submitting to glibc-alpha, please help to quick review and push to git.
>
> I checked the scripts Pravin used and the resulting UTF-8 file.
>
> I found only one minor problem:
>
> In some cases, both UnicodeData.txt and EastAsianWidth.txt have information
> about width. For example, EastAsianWidth.txt has:
>
> 302A..302D;W # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
>
> which gives us width 2 for these 4 characters (because of “W”) but
> UnicodeData.txt has:
>
> 302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
> 302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
> 302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
> 302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;
>
> which would give width 0 (because of “NSM”).
>
> I changed Pravin’s script a bit to prefer the information from
> EastAsianWidth.txt in case of conflicts.
>
> Pravin has already merged my change into his git repository.
Thanks, Mike, for the review. This bug is presently tracking two changes, one to
the i18n file and the other to the UTF-8 file. Both changes are significant, so
for better tracking I created a new bug,
https://sourceware.org/bugzilla/show_bug.cgi?id=17588, for the UTF-8 file. I will
submit the respective patches there.
The i18n ctype work is still pending.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (34 preceding siblings ...)
2014-11-14 15:10 ` maiku.fabian at gmail dot com
@ 2014-11-14 15:10 ` maiku.fabian at gmail dot com
2014-11-14 15:11 ` maiku.fabian at gmail dot com
` (11 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-14 15:10 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #28 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7932
--> https://sourceware.org/bugzilla/attachment.cgi?id=7932&action=edit
gen-unicode-ctype.py
Improved version of gen-unicode-ctype.py which also parses
DerivedCoreProperties.txt and uses it (partly) for is_alpha(),
is_lower(), and is_upper().
"partly" because of 1):
# Consider all the non-ASCII digits as alphabetic.
# ISO C 99 forbids us to have them in category “digit”,
# but we want iswalnum to return true on them.
These digits are not “Alphabetic” in DerivedCoreProperties.txt,
but it seems to make sense to treat them as alpha according
to this comment by Bruno.
and 2):
title case characters are treated as both upper *and* lower.
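The “partly” in point 1) could look roughly like the following sketch. It is not taken from the attached script; the function signature is hypothetical, and using general category “Nd” to detect digits is an assumption of this sketch.

```python
def is_alpha(code_point, general_category, alphabetic):
    """Decide whether a character goes into the "alpha" class.

    general_category: general category field from UnicodeData.txt
    alphabetic: set of code points marked "Alphabetic" in
                DerivedCoreProperties.txt
    """
    if code_point in alphabetic:
        return True
    # Consider all the non-ASCII digits as alphabetic, so that iswalnum
    # returns true on them. Assumption: digits are identified by the
    # general category "Nd"; ASCII 0-9 stay in "digit" only.
    return general_category == 'Nd' and not 0x30 <= code_point <= 0x39
```

With this rule U+0660 ARABIC-INDIC DIGIT ZERO lands in “alpha” even though DerivedCoreProperties.txt does not list it as “Alphabetic”.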
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (33 preceding siblings ...)
2014-11-14 15:10 ` maiku.fabian at gmail dot com
@ 2014-11-14 15:10 ` maiku.fabian at gmail dot com
2014-11-14 15:10 ` maiku.fabian at gmail dot com
` (12 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-14 15:10 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #29 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7933
--> https://sourceware.org/bugzilla/attachment.cgi?id=7933&action=edit
report-gen-unicode-ctype.py-DerivedCoreProperties-7.0.0
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (32 preceding siblings ...)
2014-11-14 15:10 ` maiku.fabian at gmail dot com
@ 2014-11-14 15:10 ` maiku.fabian at gmail dot com
2014-11-14 15:10 ` maiku.fabian at gmail dot com
` (13 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-14 15:10 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #30 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mike FABIAN from comment #29)
> Created attachment 7933 [details]
> report-gen-unicode-ctype.py-DerivedCoreProperties-7.0.0
From this report:
alpha: Missing: ⒜ 0x249c PARENTHESIZED LATIN SMALL LETTER A
...
These are *not* “Alphabetic” in DerivedCoreProperties.txt, therefore
it is correct to remove them.
978 characters have been removed from “punct” which are now in “alpha”
because of DerivedCoreProperties.txt.
Number of errors in new file = 11:
These are only errors like:
error: 0xe2f ฯ alpha True: FIXME: Theppitak Karoonboonyanan
<thep@links.nectec.or.th> says
<U0E2F>, <U0E46> should belong to punct. DerivedCoreProperties.txt
says it is alpha.
...
error: 0xe4e ๎ alpha False: FIXME: gen-unicode-ctype.c: Theppitak
Karoonboonyanan
<thep@links.nectec.or.th> says <U0E47>..<U0E4E> are
is_alpha. DerivedCoreProperties does *not*.
I wrote mail to Theppitak Karoonboonyanan <thep@links.nectec.or.th>
and Bruno. The mail to thep@links.nectec.or.th bounced and I did not
get an answer from Bruno.
I think it is better to trust DerivedCoreProperties.txt here, so I don’t
think these are errors.
So I think my updated gen-unicode-ctype.py produces the character
classes correctly (as far as possible with the limitations caused by
glibc and ISO C 99).
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (31 preceding siblings ...)
2014-11-12 10:25 ` pravin.d.s at gmail dot com
@ 2014-11-14 15:10 ` maiku.fabian at gmail dot com
2014-11-14 15:10 ` maiku.fabian at gmail dot com
` (14 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-14 15:10 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #27 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7931
--> https://sourceware.org/bugzilla/attachment.cgi?id=7931&action=edit
gen-unicode-ctype.py
Python rewrite of Bruno Haible’s gen-unicode-ctype.c.
This version produces *exactly* the same output as the C program:
$ gcc -o gen-unicode-ctype gen-unicode-ctype.c
$ ./gen-unicode-ctype UnicodeData.txt 7.0.0
$ ./gen-unicode-ctype.py -u UnicodeData.txt -o unicode-new --unicode_version 7.0.0
$ diff -u unicode unicode-new
$
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (35 preceding siblings ...)
2014-11-14 15:10 ` maiku.fabian at gmail dot com
@ 2014-11-14 15:11 ` maiku.fabian at gmail dot com
2014-11-24 11:28 ` maiku.fabian at gmail dot com
` (10 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-14 15:11 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #31 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I think I should probably do another update to gen-unicode-ctype.py
to read in the original “i18n” file of glibc and write out a new
one replacing the character classes to avoid having to do cut and paste
manually.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (36 preceding siblings ...)
2014-11-14 15:11 ` maiku.fabian at gmail dot com
@ 2014-11-24 11:28 ` maiku.fabian at gmail dot com
2014-12-01 10:38 ` maiku.fabian at gmail dot com
` (9 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-11-24 11:28 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #32 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mike FABIAN from comment #23)
> 3) it does not put some characters like:
>
> upper: Missing: ᾈ 0x1f88 GREEK CAPITAL LETTER ALPHA WITH PSILI AND
> PROSGEGRAMMENI
>
> into “upper”. Surprisingly,
>
> “U+1F88 ᾈ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI”
> is *not* listed as “Uppercase” in
> http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt .
>
> Although U+1F88 seems to be Uppercase according to
> http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
> because it has a tolower mapping to U+1F80:
>
> 1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88
> 1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;
>
> So this might be a bug in DerivedCoreProperties.txt.
It is not a bug in DerivedCoreProperties.txt; I asked on the Unicode
mailing list:
http://www.unicode.org/mail-arch/unicode-ml/y2014-m11/0010.html
So these are actually title case as well.
That means, because of the restrictions of ISO C 99, these title
characters should be both in the “upper” and “lower” character class
in LC_CTYPE (my gen-unicode-ctype.py from comment#28 does this).
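As a rough sketch of that rule, the classification could be derived from one UnicodeData.txt line like this. The function name is hypothetical, and using general category “Lt” as the marker for title case is an assumption of this sketch; the attached script may decide this differently.

```python
def classify_case(line):
    """Return (code point, is_upper, is_lower) for one UnicodeData.txt line."""
    fields = line.split(';')
    code_point = int(fields[0], 16)
    category = fields[2]
    # Fields 12 and 13 hold the simple uppercase and lowercase mappings;
    # an empty field means the character maps to itself.
    to_upper = int(fields[12], 16) if fields[12] else code_point
    to_lower = int(fields[13], 16) if fields[13] else code_point
    # Assumption: title case characters (category Lt) count as both
    # upper and lower, since there is no "title" class.
    is_title = category == 'Lt'
    is_lower = to_upper != code_point or is_title
    is_upper = to_lower != code_point or is_title
    return code_point, is_upper, is_lower
```

For the U+1F88 line quoted above, the mapping rule alone would only yield “upper” (there is a tolower mapping but no toupper mapping); the extra “Lt” test is what puts it into “lower” as well.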
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
2012-05-10 21:06 [Bug localedata/14094] New: Update locale data to Unicode 6.1 jsm28 at gcc dot gnu.org
` (37 preceding siblings ...)
2014-11-24 11:28 ` maiku.fabian at gmail dot com
@ 2014-12-01 10:38 ` maiku.fabian at gmail dot com
2014-12-03 10:01 ` maiku.fabian at gmail dot com
` (8 subsequent siblings)
47 siblings, 0 replies; 49+ messages in thread
From: maiku.fabian at gmail dot com @ 2014-12-01 10:38 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #7931|0 |1
is obsolete| |
Attachment #7932|0 |1
is obsolete| |
--- Comment #33 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7979
--> https://sourceware.org/bugzilla/attachment.cgi?id=7979&action=edit
gen-unicode-ctype.py
New version of gen-unicode-ctype.py which can read the head and tail
of the original i18n file. To avoid having to cut and paste the
generated LC_CTYPE character classes into the new glibc i18n file,
read the old file as well. Copy everything from the old file to the
newly generated file except the LC_CTYPE character class data, which
are generated from the UnicodeData.txt and DerivedCoreProperties.txt
given.
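The head and tail handling could be sketched as follows. This is not the code from the attachment: the function names are hypothetical, and the marker strings that delimit the generated region are passed in as parameters because the exact lines used in the i18n file are not stated here.

```python
def split_head_and_tail(lines, start_marker, end_marker):
    """Split the old file around the region that will be regenerated.

    head: all lines before the first line starting with start_marker
    tail: the first later line starting with end_marker and everything after
    """
    head, tail = [], []
    state = 'head'
    for line in lines:
        if state == 'head':
            if line.startswith(start_marker):
                state = 'skip'   # old class data begins, drop it
            else:
                head.append(line)
        elif state == 'skip':
            if line.startswith(end_marker):
                state = 'tail'   # old class data is over, keep the rest
                tail.append(line)
        else:
            tail.append(line)
    return head, tail


def rebuild(lines, generated, start_marker, end_marker):
    """Return the old file with the class data replaced by generated lines."""
    head, tail = split_head_and_tail(lines, start_marker, end_marker)
    return head + generated + tail
```

Everything outside the two markers is copied unchanged, so only the generated character class data differs between the old and the new file.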
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: maiku.fabian at gmail dot com @ 2014-12-03 10:01 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #34 from Mike FABIAN <maiku.fabian at gmail dot com> ---
When I generate a new glibc/localedata/locales/i18n file
using gen-unicode-ctype.py from comment#33 and build
glibc with that and then run the tests with “make check”, I get
one failure:
FAIL: localedata/tst-ctype
Looking into why it fails, I find in ./localedata/tst-ctype.out:
Locale-specific tests for `lower'
islower('ª' = '\xaa') is true
islower('º' = '\xba') is true
Locale-specific tests for `lower'
...
2 errors for `de_DE.ISO-8859-1' locale
The new “lower” character class generated by gen-unicode-ctype.py
contains U+00AA ª FEMININE ORDINAL INDICATOR and U+00BA º MASCULINE
ORDINAL INDICATOR.
The test tst-ctype run by “make check” expects them *not* to be lower case.
DerivedCoreProperties.txt lists both as lower case, though:
00AA ; Lowercase # Lo FEMININE ORDINAL INDICATOR
00BA ; Lowercase # Lo MASCULINE ORDINAL INDICATOR
That’s why gen-unicode-ctype.py adds them to the “lower” character
class: it adds every character that DerivedCoreProperties.txt marks
as “Lowercase” to the character class “lower”.
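As a rough illustration of that rule (not the code of
gen-unicode-ctype.py itself), the following sketch collects every code
point carrying a given property from lines in the
DerivedCoreProperties.txt format. The helper name
`code_points_with_property` is hypothetical. The first two sample lines
are the ones quoted above; the two range lines are added only to show
how ranges are handled.

```python
def code_points_with_property(lines, wanted):
    """Collect the code points whose property field equals `wanted`."""
    result = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank line or pure comment line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != wanted:
            continue
        # The first field is either a single code point or "first..last".
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        stop = int(last, 16) if last else start
        result.update(range(start, stop + 1))
    return result


sample = [
    "00AA ; Lowercase # Lo FEMININE ORDINAL INDICATOR",
    "00BA ; Lowercase # Lo MASCULINE ORDINAL INDICATOR",
    "0061..007A ; Lowercase # L& [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z",
    "0041..005A ; Uppercase # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
]
lower = code_points_with_property(sample, "Lowercase")
print(0xAA in lower, 0xBA in lower, len(lower))
```

With a rule this simple, the ordinal indicators end up in “lower” even
though their general category is Lo rather than Ll.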
I wonder what needs to be done here.
Is the test in glibc wrong?
If so, it could be fixed by a patch like this:
$ git show | iconv -f iso-8859-1 -t utf-8
commit 25c913674386011a44b6270579a894b2e8200d25
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Dec 3 10:05:42 2014 +0100
Fix test case localedata/tst-ctype-de_DE.ISO-8859-1.in
DerivedCoreProperties.txt from Unicode 7.0.0 lists
the characters U+00AA (ª) and U+00BA (º) as lower case:
00AA ; Lowercase # Lo FEMININE ORDINAL INDICATOR
00BA ; Lowercase # Lo MASCULINE ORDINAL INDICATOR
diff --git a/localedata/tst-ctype-de_DE.ISO-8859-1.in b/localedata/tst-ctype-de_DE.ISO-8859-1.in
index f71d76c..e124a52 100644
--- a/localedata/tst-ctype-de_DE.ISO-8859-1.in
+++ b/localedata/tst-ctype-de_DE.ISO-8859-1.in
@@ -1,5 +1,5 @@
lower ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
- 000000000000000000000100000000000000000000000000
+ 000000000010000000000100001000000000000000000000
lower ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
000000000000000111111111111111111111111011111111
upper ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
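To see which characters the changed mask line affects, the two rows can
be decoded with a short sketch. It assumes that each digit applies to
the character in the same position of the row above it, and that the
row covers the ISO-8859-1 code points 0xA0 to 0xCF, with the no-break
space (0xA0) and the soft hyphen (0xAD) simply not visible in the mail.
Both are assumptions read off the diff, not taken from the test
harness.

```python
# Assumption: the row covers 0xA0..0xCF in ISO-8859-1 (48 characters).
chars = bytes(range(0xA0, 0xD0)).decode("iso-8859-1")

# Mask lines copied from the "-" and "+" lines of the diff.
old_mask = "000000000000000000000100000000000000000000000000"
new_mask = "000000000010000000000100001000000000000000000000"


def members(chars, mask):
    """Return the set of characters whose mask digit is '1'."""
    return {c for c, bit in zip(chars, mask) if bit == "1"}


# Characters that are "lower" after the patch but were not before.
added = sorted(members(chars, new_mask) - members(chars, old_mask))
for c in added:
    print(f"U+{ord(c):04X} {c}")
```

Under these assumptions the only characters that gain the “lower” flag
are the two ordinal indicators, while U+00B5 MICRO SIGN is flagged in
both versions.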
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: maiku.fabian at gmail dot com @ 2014-12-03 12:47 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #35 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7988
--> https://sourceware.org/bugzilla/attachment.cgi?id=7988&action=edit
0001-Update-LC_CTYPE-character-class-data-to-Unicode-7.0..patch
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: maiku.fabian at gmail dot com @ 2014-12-03 12:47 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #36 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 7989
--> https://sourceware.org/bugzilla/attachment.cgi?id=7989&action=edit
0002-Fix-test-case-localedata-tst-ctype-de_DE.ISO-8859-1..patch
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: maiku.fabian at gmail dot com @ 2014-12-04 10:35 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #37 from Mike FABIAN <maiku.fabian at gmail dot com> ---
*** Bug 14010 has been marked as a duplicate of this bug. ***
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: cvs-commit at gcc dot gnu.org @ 2015-02-21 1:04 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #38 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via 4a4839c94a4c93ffc0d5b95c69a08b02a57007f2 (commit)
from e4a399dc3dbb3228eb39af230ad11bc42a018c93 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4a4839c94a4c93ffc0d5b95c69a08b02a57007f2
commit 4a4839c94a4c93ffc0d5b95c69a08b02a57007f2
Author: Alexandre Oliva <aoliva@redhat.com>
Date: Fri Feb 20 20:14:59 2015 -0200
Unicode 7.0.0 update; added generator scripts.
for localedata/ChangeLog
[BZ #17588]
[BZ #13064]
[BZ #14094]
[BZ #17998]
* unicode-gen/Makefile: New.
* unicode-gen/unicode-license.txt: New, from Unicode.
* unicode-gen/UnicodeData.txt: New, from Unicode.
* unicode-gen/DerivedCoreProperties.txt: New, from Unicode.
* unicode-gen/EastAsianWidth.txt: New, from Unicode.
* unicode-gen/gen_unicode_ctype.py: New generator, from Mike
FABIAN <mfabian@redhat.com>.
* unicode-gen/ctype_compatibility.py: New verifier, from
Pravin Satpute <psatpute@redhat.com> and Mike FABIAN.
* unicode-gen/ctype_compatibility_test_cases.py: New verifier
module, from Mike FABIAN.
* unicode-gen/utf8_gen.py: New generator, from Pravin Satpute
and Mike FABIAN.
* unicode-gen/utf8_compatibility.py: New verifier, from Pravin
Satpute and Mike FABIAN.
* charmaps/UTF-8: Update.
* locales/i18n: Update.
* gen-unicode-ctype.c: Remove.
* tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns
true for ordinal indicators.
-----------------------------------------------------------------------
Summary of changes:
NEWS | 11 +-
localedata/ChangeLog | 27 +
localedata/charmaps/UTF-8 |11946 ++++++---
localedata/gen-unicode-ctype.c | 784 -
localedata/locales/i18n | 2652 +-
localedata/tst-ctype-de_DE.ISO-8859-1.in | 2 +-
localedata/unicode-gen/DerivedCoreProperties.txt |10794 ++++++++
localedata/unicode-gen/EastAsianWidth.txt | 2121 ++
localedata/unicode-gen/Makefile | 99 +
localedata/unicode-gen/UnicodeData.txt |27268 ++++++++++++++++++++
localedata/unicode-gen/ctype_compatibility.py | 546 +
.../unicode-gen/ctype_compatibility_test_cases.py | 951 +
localedata/unicode-gen/gen_unicode_ctype.py | 751 +
localedata/unicode-gen/unicode-license.txt | 50 +
localedata/unicode-gen/utf8_compatibility.py | 399 +
localedata/unicode-gen/utf8_gen.py | 286 +
16 files changed, 53305 insertions(+), 5382 deletions(-)
delete mode 100644 localedata/gen-unicode-ctype.c
create mode 100644 localedata/unicode-gen/DerivedCoreProperties.txt
create mode 100644 localedata/unicode-gen/EastAsianWidth.txt
create mode 100644 localedata/unicode-gen/Makefile
create mode 100644 localedata/unicode-gen/UnicodeData.txt
create mode 100755 localedata/unicode-gen/ctype_compatibility.py
create mode 100644 localedata/unicode-gen/ctype_compatibility_test_cases.py
create mode 100755 localedata/unicode-gen/gen_unicode_ctype.py
create mode 100644 localedata/unicode-gen/unicode-license.txt
create mode 100755 localedata/unicode-gen/utf8_compatibility.py
create mode 100755 localedata/unicode-gen/utf8_gen.py
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: aoliva at sourceware dot org @ 2015-02-21 1:05 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Bug 14094 depends on bug 17588, which changed state.
Bug 17588 Summary: Update UTF-8 charmap and width to Unicode 7.0.0
https://sourceware.org/bugzilla/show_bug.cgi?id=17588
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: aoliva at sourceware dot org @ 2015-02-21 20:50 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Alexandre Oliva <aoliva at sourceware dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
CC| |aoliva at sourceware dot org
Resolution|--- |FIXED
--- Comment #39 from Alexandre Oliva <aoliva at sourceware dot org> ---
Fixed
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: egmont at gmail dot com @ 2016-03-22 11:19 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Egmont Koblinger <egmont at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |egmont at gmail dot com
--- Comment #40 from Egmont Koblinger <egmont at gmail dot com> ---
Please see bug 19852 for a followup issue.
* [Bug localedata/14094] Update locale data to Unicode 7.0.0
From: vapier at gentoo dot org @ 2016-03-22 18:33 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
Mike Frysinger <vapier at gentoo dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://sourceware.org/bugzilla/show_bug.cgi?id=19852