[Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range

public inbox for libc-locales@sourceware.org

* [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
@ 2017-11-02 14:00 claude at 2xlibre dot net
  2017-11-02 14:15 ` [Bug localedata/22387] " claude at 2xlibre dot net
                   ` (33 more replies)
  0 siblings, 34 replies; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-02 14:00 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

            Bug ID: 22387
           Summary: Replace unicode sequences <Uxxxx> for characters
                    inside the ASCII printable range
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: claude at 2xlibre dot net
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Quoting Mike Fabian from #22382:

>> As a side note, I see fewer unicode sequence codes like <U0063> in locale
>> files. Do you have a new policy in place?

>We agreed that it is OK to use ASCII directly, so one has to use <U....>
>only for stuff which is not ASCII.

>> Would you like patches for more
>> global replacements for all files?

>I think yes. When we started to use more ASCII a while ago, we did not
>do global replacements and changed it only in the files we touched anyway
>to see whether it would cause any problems. As far as I know we did not
>encounter any problems so far, so it seems OK to do it globally.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
@ 2017-11-02 14:15 ` claude at 2xlibre dot net
  2017-11-02 23:34   ` Keld Simonsen
  2017-11-02 16:54 ` maiku.fabian at gmail dot com
                   ` (32 subsequent siblings)
  33 siblings, 1 reply; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-02 14:15 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #1 from Claude Paroz <claude at 2xlibre dot net> ---
Created attachment 10570
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10570&action=edit
Partial patch for opinions

Here is a first draft of what the patch could look like. Is this the right
direction? Do you want one big patch, or would you prefer something else?

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
  2017-11-02 14:15 ` [Bug localedata/22387] " claude at 2xlibre dot net
@ 2017-11-02 16:54 ` maiku.fabian at gmail dot com
  2017-11-02 17:00 ` maiku.fabian at gmail dot com
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-11-02 16:54 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2017-11-02
                 CC|                            |maiku.fabian at gmail dot com
           Assignee|unassigned at sourceware dot org   |maiku.fabian at gmail dot com
     Ever confirmed|0                           |1

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
  2017-11-02 14:15 ` [Bug localedata/22387] " claude at 2xlibre dot net
  2017-11-02 16:54 ` maiku.fabian at gmail dot com
@ 2017-11-02 17:00 ` maiku.fabian at gmail dot com
  2017-11-02 17:06 ` claude at 2xlibre dot net
                   ` (30 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-11-02 17:00 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #2 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Claude Paroz from comment #1)
> Created attachment 10570 [details]
> Partial patch for opinions
> 
> Hereby a first draft of what the patch could be. Is this the right
> direction? 

Yes, but I should have been clearer that even some ASCII characters cannot be
written literally. For example, % is usually the comment character, so it
cannot be used like this:

punct   !;";#;$;%;&;';(;);*;+;,;-;.;<U002F>;:;;;<;=;>;?;@;[;\;];^;_;`;{;|;};~

And / is usually the escape character, which is also used for line continuation.

> Do you want one big patch or anything else?

Whatever is easiest, I could also write a script to do it ...
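
A minimal sketch of what such a script could look like, in Python. The names
UCODE and to_ascii are made up for illustration and are not taken from the
attached patch. The sketch keeps % and / in their <Uxxxx> form, as described
above, and makes no distinction between text inside and outside quoted strings:

```python
import re

# Matches a <Uxxxx> sequence with 4 to 8 hexadecimal digits.
UCODE = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def to_ascii(line, keep_escaped="%/"):
    """Replace <Uxxxx> by the literal character when it is printable ASCII.

    Characters listed in keep_escaped (by default the usual comment and
    escape characters) and everything outside 0x20..0x7E stay untouched.
    """
    def repl(match):
        code_point = int(match.group(1), 16)
        if 0x20 <= code_point <= 0x7E:
            char = chr(code_point)
            if char not in keep_escaped:
                return char
        return match.group(0)

    return UCODE.sub(repl, line)


print(to_ascii('abday "<U004D><U006F><U006E>"'))
print(to_ascii("<U0025><U002F><U00E9>"))
```

The keep_escaped parameter is there so that further characters can be kept
escaped if they turn out to be special as well.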

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (2 preceding siblings ...)
  2017-11-02 17:00 ` maiku.fabian at gmail dot com
@ 2017-11-02 17:06 ` claude at 2xlibre dot net
  2017-11-02 17:21 ` claude at 2xlibre dot net
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-02 17:06 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #3 from Claude Paroz <claude at 2xlibre dot net> ---
OK, I'll special case '%' and '/'.

I'm also using a script to do most of the changes. A quick manual review then
allows, for example, deleting comments that merely repeat the (now
unobfuscated) value.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (3 preceding siblings ...)
  2017-11-02 17:06 ` claude at 2xlibre dot net
@ 2017-11-02 17:21 ` claude at 2xlibre dot net
  2017-11-02 17:38 ` schwab@linux-m68k.org
                   ` (28 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-02 17:21 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #4 from Claude Paroz <claude at 2xlibre dot net> ---
But hopefully we can still use '%' and '/' when they are inside a string (see
for example the d_fmt   "%d/%m/%y" line in the current an_ES locale).

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (4 preceding siblings ...)
  2017-11-02 17:21 ` claude at 2xlibre dot net
@ 2017-11-02 17:38 ` schwab@linux-m68k.org
  2017-11-02 20:40 ` egmont at gmail dot com
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: schwab@linux-m68k.org @ 2017-11-02 17:38 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #5 from Andreas Schwab <schwab@linux-m68k.org> ---
The escape character is also special inside strings.

See
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03>
for the full rules.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (5 preceding siblings ...)
  2017-11-02 17:38 ` schwab@linux-m68k.org
@ 2017-11-02 20:40 ` egmont at gmail dot com
  2017-11-02 23:34 ` keld at keldix dot com
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: egmont at gmail dot com @ 2017-11-02 20:40 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com

--- Comment #6 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Andreas Schwab from comment #5)

> See
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.
> html#tag_07_03> for the full rules.

Bullet point 2 here says "Within a string, the double-quote character, the
escape character, and the right angle bracket character shall be escaped [...]"

Why not the left angle bracket too? Otherwise you can't tell for sure whether
"<U+0020>" stands for a space, or for literal
lessthan-you-plus-oh-oh-two-oh-greaterthan.

I think it doesn't hurt to remain a bit safer with special characters, e.g.
escape the comma, semicolon, less-than, greater-than, backslash, and whatever
the escape character is (typically overridden to slash in locale files)
everywhere.

---

On the other hand, what about non-ASCII characters? Are they allowed as raw
UTF-8, or do they still need to be escaped? Allowing raw UTF-8, such as a
weekday name of "hétfő" rather than "h<U00E9>tf<U0151>" would highly improve
readability of the file.
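
For reading existing files in the meantime, the escaped form can be expanded
with a few lines of Python. This is only an illustrative sketch; the function
name expand is hypothetical, and it expands every <Uxxxx> sequence without
regard to where it appears in the file:

```python
import re


def expand(text):
    """Turn every <Uxxxx> sequence into the character it names."""
    return re.sub(
        r"<U([0-9A-Fa-f]{4,8})>",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


print(expand("h<U00E9>tf<U0151>"))
```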

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (6 preceding siblings ...)
  2017-11-02 20:40 ` egmont at gmail dot com
@ 2017-11-02 23:34 ` keld at keldix dot com
  2017-11-03  3:12 ` carlos at redhat dot com
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: keld at keldix dot com @ 2017-11-02 23:34 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #7 from keld at keldix dot com <keld at keldix dot com> ---
I think we should not do this, as it would make locales unusable
with ebcdic encodings. I am also unsure how it will work with utf-16.

I propose you use better mnemonics for the ascii range, such as <a> for a,
etc.  That is, use the mnemonics defined in the POSIX standard for the ascii
range.

best regards
keld

On Thu, Nov 02, 2017 at 02:15:41PM +0000, claude at 2xlibre dot net wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #1 from Claude Paroz <claude at 2xlibre dot net> ---
> Created attachment 10570
>   --> https://sourceware.org/bugzilla/attachment.cgi?id=10570&action=edit
> Partial patch for opinions
> 
> Hereby a first draft of what the patch could be. Is this the right direction?
> Do you want one big patch or anything else?

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (7 preceding siblings ...)
  2017-11-02 23:34 ` keld at keldix dot com
@ 2017-11-03  3:12 ` carlos at redhat dot com
  2017-11-07 23:54   ` Keld Simonsen
  2017-11-08 19:59   ` Keld Simonsen
  2017-11-03  6:49 ` schwab@linux-m68k.org
                   ` (24 subsequent siblings)
  33 siblings, 2 replies; 42+ messages in thread
From: carlos at redhat dot com @ 2017-11-03  3:12 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #8 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to keld@keldix.com from comment #7)
> I think we should not do this, as it would make locales unusable
> with ebcdic encodings. I am also unsure how it will work with utf-16.

Please provide a justification for this requirement to support EBCDIC and
UTF-16, including systems that would be impacted today by this change.

I spoke with Ulrich Drepper directly, and he did point out that the design idea
behind using <Uxxxx> sequences was indeed to support the locales on systems
that had other encodings like EBCDIC, but with the rise of UTF-8 as the de facto
standard, no such systems have really materialized.

> I propose you use better mnemonics for the ascii range, such as <a> for a,
> etc.  That is, use the mnemonics defined in the POSIX standard for the ascii
> range.

I disagree strongly with this. Why use '<a>' instead of 'a'? Please provide a
strong rationale for why we should keep using the <Uxxxx> format.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (8 preceding siblings ...)
  2017-11-03  3:12 ` carlos at redhat dot com
@ 2017-11-03  6:49 ` schwab@linux-m68k.org
  2017-11-03  9:52 ` egmont at gmail dot com
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: schwab@linux-m68k.org @ 2017-11-03  6:49 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #9 from Andreas Schwab <schwab@linux-m68k.org> ---
(In reply to Egmont Koblinger from comment #6)
> Bullet point 2 here says "Within a string, the double-quote character, the
> escape character, and the right angle bracket character shall be escaped
> [...]"
> 
> Why not the left angle bracket too?

I think "right" is a typo here.  It doesn't really make sense otherwise.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (9 preceding siblings ...)
  2017-11-03  6:49 ` schwab@linux-m68k.org
@ 2017-11-03  9:52 ` egmont at gmail dot com
  2017-11-03  9:56 ` egmont at gmail dot com
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: egmont at gmail dot com @ 2017-11-03  9:52 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #10 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Andreas Schwab from comment #9)

> I think "right" is a typo here.  It doesn't really make sense otherwise.

Or they meant "right angle" (i.e. 90 degrees) bracket :-D

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (10 preceding siblings ...)
  2017-11-03  9:52 ` egmont at gmail dot com
@ 2017-11-03  9:56 ` egmont at gmail dot com
  2017-11-09 10:18   ` Keld Simonsen
  2017-11-03 15:43 ` maiku.fabian at gmail dot com
                   ` (21 subsequent siblings)
  33 siblings, 1 reply; 42+ messages in thread
From: egmont at gmail dot com @ 2017-11-03  9:56 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
I don't understand the EBCDIC worries at all.

These locale definition files are in ASCII. If you interpret these same files
in EBCDIC, section names and property names don't make any sense, and neither
do encoded characters such as "<U0020>": the bytes no longer read as
less-than, uppercase U, digits and greater-than.

Then, if you iconv the file, the resulting <U0020> and friends still define
Unicode codepoints and not EBCDIC ones.

So, in order to use these files in an EBCDIC environment, they need to be
converted on two different levels.

This does not become any harder or any more complicated by allowing plain ASCII
characters.
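
To make the two levels concrete, here is a small Python sketch. It uses
Python's cp037 codec as a stand-in for "EBCDIC"; which EBCDIC variant a real
system would use is an assumption on my side:

```python
# Level 1: the bytes of the file itself. The token "<U0020>" is spelled with
# different bytes in ASCII and in EBCDIC (cp037 is one EBCDIC variant).
token = "<U0020>"
ascii_bytes = token.encode("ascii")
ebcdic_bytes = token.encode("cp037")

# Level 2: the meaning of the token. It names Unicode code point U+0020
# (space), whose EBCDIC byte again differs from its ASCII byte.
space = chr(int(token[2:-1], 16))
ascii_space = space.encode("ascii")
ebcdic_space = space.encode("cp037")

print("file bytes:     ", ascii_bytes.hex(), "vs", ebcdic_bytes.hex())
print("character bytes:", ascii_space.hex(), "vs", ebcdic_space.hex())
```

Both conversions are needed whether the source file says "<U0061>" or just "a".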

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (11 preceding siblings ...)
  2017-11-03  9:56 ` egmont at gmail dot com
@ 2017-11-03 15:43 ` maiku.fabian at gmail dot com
  2017-11-03 15:51 ` egmont at gmail dot com
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-11-03 15:43 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #12 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Andreas Schwab from comment #5)
> The escape character is also special inside strings.
> 
> See
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.
> html#tag_07_03> for the full rules.

To understand what that means in practice, here is an example.
The following source:

d_fmt   "%d/%m/%Y  %% // /% %/ \% \/ \\"

produces this:

[root@taka /]# LC_ALL=tpi_PG.UTF-8 locale -k d_fmt
d_fmt="%d%m%Y  %% / % % \% \ \\"
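
The output above is consistent with a simple model in which the escape
character (here /) is dropped and the character following it is taken
literally. A Python sketch of that model follows; the function name unescape
is hypothetical, and this is a simplification for illustration, not the actual
localedef parser:

```python
def unescape(text, escape_char="/"):
    """Drop each escape character and keep the character after it as-is."""
    out = []
    i = 0
    while i < len(text):
        char = text[i]
        if char == escape_char and i + 1 < len(text):
            # The escaped character is copied literally, even if it is
            # the escape character itself.
            out.append(text[i + 1])
            i += 2
        else:
            out.append(char)
            i += 1
    return "".join(out)


source = r"%d/%m/%Y  %% // /% %/ \% \/ \\"
print(unescape(source))
print(unescape("%d//%m//%Y"))
```

In this model a backslash has no special meaning, because the escape character
has been set to / in the locale source.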

I.e. the current source for tpi_PG has an error in d_fmt,
instead of 

"%d/%m/%Y" 

it should be

"%d//%m//%Y"

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (12 preceding siblings ...)
  2017-11-03 15:43 ` maiku.fabian at gmail dot com
@ 2017-11-03 15:51 ` egmont at gmail dot com
  2017-11-03 16:49 ` joseph at codesourcery dot com
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: egmont at gmail dot com @ 2017-11-03 15:51 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #13 from Egmont Koblinger <egmont at gmail dot com> ---
Can't a proposed patch be verified by comparing the compiled locale files
byte-by-byte?
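
A rough sketch of such a check in Python, assuming the locales have been
compiled (for example with localedef) once before and once after the patch
into two separate directory trees. The function name differing_files and the
directory layout are hypothetical:

```python
import filecmp
import os


def differing_files(old_root, new_root):
    """Return relative paths of files under old_root that are missing
    from new_root or whose contents differ byte for byte."""
    diffs = []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        rel_dir = os.path.relpath(dirpath, old_root)
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(new_root, rel_dir, name)
            # shallow=False forces a comparison of the file contents
            # instead of relying on size and modification time.
            if not os.path.isfile(new_path) or not filecmp.cmp(
                old_path, new_path, shallow=False
            ):
                diffs.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(diffs)
```

An empty result would mean the patch changed nothing in the compiled output.
Files that exist only in the new tree are not reported by this sketch.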

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (13 preceding siblings ...)
  2017-11-03 15:51 ` egmont at gmail dot com
@ 2017-11-03 16:49 ` joseph at codesourcery dot com
  2017-11-06 13:23 ` claude at 2xlibre dot net
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: joseph at codesourcery dot com @ 2017-11-03 16:49 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #14 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
Furthermore, glibc effectively requires locales to be ASCII-compatible, in 
that plenty of code dealing with strings in glibc directly generates or 
interprets ASCII characters based on string or character constants in the 
glibc code.  (There may be a few variations for a few characters in some 
locales, and it can't be assumed that toupper ('i') or tolower ('I') are 
ASCII-compatible because of Turkish locales.)  Thus we can assume that 
localedef is run in an ASCII-compatible locale.  EBCDIC variants are 
supported by iconv for character set conversions; they are *not* supported 
as locale character sets.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (14 preceding siblings ...)
  2017-11-03 16:49 ` joseph at codesourcery dot com
@ 2017-11-06 13:23 ` claude at 2xlibre dot net
  2017-11-06 13:26 ` claude at 2xlibre dot net
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-06 13:23 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #15 from Claude Paroz <claude at 2xlibre dot net> ---
(In reply to Egmont Koblinger from comment #13)
> Can't a proposed patch be verified by comparing the compiled locale files
> byte-by-byte?

Thanks, that was an excellent suggestion that allowed me to spot some errors in
my (almost-ready) forthcoming patch.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (15 preceding siblings ...)
  2017-11-06 13:23 ` claude at 2xlibre dot net
@ 2017-11-06 13:26 ` claude at 2xlibre dot net
  2017-11-07 15:12 ` claude at 2xlibre dot net
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-06 13:26 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #16 from Claude Paroz <claude at 2xlibre dot net> ---
(In reply to Mike FABIAN from comment #12)
> I.e. the current source for tpi_PG has an error in d_fmt,
> instead of 
> 
> "%d/%m/%Y" 
> 
> it should be
> 
> "%d//%m//%Y"

I reported similar errors in #22403 for locales an_ES, kab_DZ and om_ET.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (16 preceding siblings ...)
  2017-11-06 13:26 ` claude at 2xlibre dot net
@ 2017-11-07 15:12 ` claude at 2xlibre dot net
  2017-11-07 16:57 ` piotrdrag at gmail dot com
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-07 15:12 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Claude Paroz <claude at 2xlibre dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #10570|0                           |1
        is obsolete|                            |

--- Comment #17 from Claude Paroz <claude at 2xlibre dot net> ---
Created attachment 10577
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10577&action=edit
Complete patch

Here's the complete patch implementing these sequence replacements.

I'm open to splitting the patch into smaller chunks if that makes it easier to
review.

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (17 preceding siblings ...)
  2017-11-07 15:12 ` claude at 2xlibre dot net
@ 2017-11-07 16:57 ` piotrdrag at gmail dot com
  2017-11-07 23:54 ` keld at keldix dot com
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: piotrdrag at gmail dot com @ 2017-11-07 16:57 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Piotr Drąg <piotrdrag at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |piotrdrag at gmail dot com

* Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-03  3:12 ` carlos at redhat dot com
@ 2017-11-07 23:54   ` Keld Simonsen
  2017-11-08 19:59   ` Keld Simonsen
  1 sibling, 0 replies; 42+ messages in thread
From: Keld Simonsen @ 2017-11-07 23:54 UTC (permalink / raw)
  To: carlos at redhat dot com; +Cc: libc-locales

Hi

I am not sure I can make a very strong case against using ASCII in strings.
I have no practical experience with problems, but I see a number of possible
conflicts. You could also say that because of the design used until now,
where all our locales have been character coding independent, we have
not seen any problems!

I am the editor of ISO 14652 and ISO 30112, and of the 100-page annex in
the POSIX standard that originally introduced the codeset independent locales
for POSIX and thus Linux. 14652 and 30112 are the standards that define many
of the extensions from POSIX that we use in glibc for i18n and l10n.
From an architectural view I would really like us to keep glibc locales
character coding independent, so our locales can be used without change
on all systems that adhere to those standards.

But even if we restrict ourselves to only look at glibc implementations,
using coding dependent locales may cause problems. Not on everyday Linux
systems, where we mostly operate in UTF-8, and sometimes in other coded
character sets, but on other systems.

gcc and glibc are probably the most ported C compiler and C library in the world.
Some of the platforms they have been ported to run in non-ASCII-compatible
environments. I think this includes:

    MS Windows, which uses UTF-16, and which now includes an Ubuntu system
    Mac OS / iOS, which use UTF-16, and where gcc/glibc ports exist
    EBCDIC machines, where gcc/glibc ports exist - they run many banking and aviation systems
    Embedded systems, where many kinds of character sets are used
    Older systems in Eastern Asia, still using older Eastern Asian 14-bit character sets

I do see the need for better-looking locales. They would be easier to write and debug.
Therefore I propose that we use the mnemonics defined in ISO 14652/ISO 30112, at least
for the ASCII characters. These were also used in the original locales that I wrote
and Ulrich Drepper used for his initial work for glibc. At some point Ulrich decided
to use Uxxxx mnemonics, which made locales less readable. I do agree that using Uxxxx
is a good solution for the characters that are not known to everybody, such as Chinese,
Korean and Japanese characters. This gives everybody in the world a chance to
work on locales using these characters, which in our modern world actually means
all locales in the world, as we all may use full UTF-8 or the like.

best regards
Keld

On Fri, Nov 03, 2017 at 03:12:25AM +0000, carlos at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> Carlos O'Donell <carlos at redhat dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |carlos at redhat dot com
> 
> --- Comment #8 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to keld@keldix.com from comment #7)
> > I think we should not do this, as it would make locales unusable
> > with ebcdic encodings. I am also unsure how it will work with utf-16.
> 
> Please provide a justification for this requirement to support EBCDIC and
> UTF-16, including systems that would be impacted today by this change.
> 
> I spoke with Ulrich Drepper directly, and he did point out that the design idea
> behind using <Uxxxx> sequences was indeed to support the locales on systems
> that had other encodings like EBCDIC, but with the rise of UTF-8 as the de facto
> standard, no such systems have really materialized.
> 
> > I propose you use better mnemonics for the ascii range, such as <a> for a,
> > etc.  That is, use the mnemonics defined in the POSIX standard for the ascii
> > range.
> 
> I disagree strongly with this, why use '<a>' instead of 'a'? Please provide
> strong rationale for why we should keep using the <Uxxxx> format.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (18 preceding siblings ...)
  2017-11-07 16:57 ` piotrdrag at gmail dot com
@ 2017-11-07 23:54 ` keld at keldix dot com
  2017-11-08 20:00 ` keld at keldix dot com
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: keld at keldix dot com @ 2017-11-07 23:54 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #18 from keld at keldix dot com <keld at keldix dot com> ---
Hi

I am not sure if I can write a very strong case against using ASCII in strings.
I have no practical experience with problems, but I see a number of possible
conflicts. You could also say that because of the design used until now,
where all our locales have been character coding independent, we have
not seen any problems!

I am the editor of ISO 14652 and ISO 30112, and of the 100 pages annex in
the POSIX standard that originally introduced the codeset independent locales
for POSIX and thus Linux. 14652 and 30112 are the standards that define many
of the extensions from POSIX that we use in glibc for i18n and l10n.
From an architectural view I would really like us to keep glibc locales
character coding independent, so our locales can be used without change
on all systems that adhere to those standards.

But even if we restrict ourselves to only look at glibc implementations,
using coding dependent locales may cause problems. Not on everyday Linux
systems, where we mostly operate in UTF-8, and sometimes in other coded
character sets, but on other systems.

gcc and glibc are probably the most ported C compiler and C library in the
world. Some of the platforms they have been ported to run in
non-ASCII-compatible environments. I think this includes

    MS Windows, which uses UTF-16, and which now includes an Ubuntu system
    Mac OS/iOS, which uses UTF-16, and where gcc/glibc ports exist
    EBCDIC machines, where gcc/glibc ports exist - they run many banking and aviation systems
    Embedded systems, where many kinds of character sets are used
    Older systems in Eastern Asia, still using older Eastern Asian 14-bit character sets

I do see the need for better-looking locales. They would be easier to write
and debug. Therefore I propose that we use the mnemonics defined in
ISO 14652/ISO 30112, at least for the ASCII characters. These were also used
in the original locales that I wrote and that Ulrich Drepper used for his
initial work on glibc. At some point Ulrich decided to use Uxxxx mnemonics,
which made locales less readable. I do agree that using Uxxxx is a good
solution for characters that are not known to everybody, such as Chinese,
Korean and Japanese characters. This gives everybody in the world a chance to
work on locales using these characters, which in our modern world actually
means all locales in the world, as we may all use full UTF-8 or the like.

best regards
Keld

On Fri, Nov 03, 2017 at 03:12:25AM +0000, carlos at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> Carlos O'Donell <carlos at redhat dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |carlos at redhat dot com
> 
> --- Comment #8 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to keld@keldix.com from comment #7)
> > I think we should not do this, as it would make locales unusable
> > with ebcdic encodings. I am also unsure how it will work with utf-16.
> 
> Please provide a justification for this requirement to support EBCDIC and
> UTF-16, including systems that would be impacted today by this change.
> 
> I spoke with Ulrich Drepper directly, and he did point out that the design idea
> behind using <Uxxxx> sequences was indeed to support the locales on systems
> that had other encodings like EBCDIC, but with the rise of UTF-8 as the de facto
> standard, no such systems have really materialized.
> 
> > I propose you use better mnemonics for the ascii range, such as <a> for a,
> > etc.  That is, use the mnemonics defined in the POSIX standard for the ascii
> > range.
> 
> I disagree strongly with this, why use '<a>' instead of 'a'? Please provide
> strong rationale for why we should keep using the <Uxxxx> format.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-03  3:12 ` carlos at redhat dot com
  2017-11-07 23:54   ` Keld Simonsen
@ 2017-11-08 19:59   ` Keld Simonsen
  1 sibling, 0 replies; 42+ messages in thread
From: Keld Simonsen @ 2017-11-08 19:59 UTC (permalink / raw)
  To: carlos at redhat dot com; +Cc: libc-locales

On Fri, Nov 03, 2017 at 03:12:25AM +0000, carlos at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> Carlos O'Donell <carlos at redhat dot com> changed:
> 
> > I propose you use better mnemonics for the ascii range, such as <a> for a,
> > etc.  That is, use the mnemonics defined in the POSIX standard for the ascii
> > range.
> 
> I disagree strongly with this, why use '<a>' instead of 'a'? Please provide
> strong rationale for why we should keep using the <Uxxxx> format.

If you use <a> instead of 'a', then the compiled locale has well-defined
content even when the source encoding and the target encoding differ. That is
not the case with a bare 'a', which can give wrong results.
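
A minimal sketch of the mismatch described above, using Python's built-in
codecs. The code page cp037 is only a representative EBCDIC encoding chosen
for illustration; the thread does not name a specific one. It shows that the
byte stored for a literal 'a' in an ASCII source is not the byte an EBCDIC
target needs for the same letter.

```python
# Byte that an ASCII-encoded locale source stores for a literal 'a'.
ascii_byte = "a".encode("ascii")      # b'\x61'

# Byte that an EBCDIC target (cp037, an assumed example) needs for 'a'.
ebcdic_byte = "a".encode("cp037")     # b'\x81'

# If the source byte is copied unchanged into the compiled locale, an
# EBCDIC reader decodes it as some other character, not as 'a'.
misread = ascii_byte.decode("cp037")

print(ascii_byte, ebcdic_byte, repr(misread))
```

A symbolic name such as <a> or <U0061> avoids this only if the compiler
resolves it through the charmap of the target encoding.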

I listed a number of scenarios where you would use different source and
target character encodings in my previous mail. This involves major OSes
like OS X, Microsoft Windows and iOS, and embedded systems.

Best regards
keld

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (19 preceding siblings ...)
  2017-11-07 23:54 ` keld at keldix dot com
@ 2017-11-08 20:00 ` keld at keldix dot com
  2017-11-09 10:19 ` keld at keldix dot com
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: keld at keldix dot com @ 2017-11-08 20:00 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #19 from keld at keldix dot com <keld at keldix dot com> ---
On Fri, Nov 03, 2017 at 03:12:25AM +0000, carlos at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> Carlos O'Donell <carlos at redhat dot com> changed:
> 
> > I propose you use better mnemonics for the ascii range, such as <a> for a,
> > etc.  That is, use the mnemonics defined in the POSIX standard for the ascii
> > range.
> 
> I disagree strongly with this, why use '<a>' instead of 'a'? Please provide
> strong rationale for why we should keep using the <Uxxxx> format.

If you use <a> instead of 'a', then the compiled locale has well-defined
content even when the source encoding and the target encoding differ. That is
not the case with a bare 'a', which can give wrong results.

I listed a number of scenarios where you would use different source and
target character encodings in my previous mail. This involves major OSes
like OS X, Microsoft Windows and iOS, and embedded systems.

Best regards
keld

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-03  9:56 ` egmont at gmail dot com
@ 2017-11-09 10:18   ` Keld Simonsen
  0 siblings, 0 replies; 42+ messages in thread
From: Keld Simonsen @ 2017-11-09 10:18 UTC (permalink / raw)
  To: egmont at gmail dot com; +Cc: libc-locales

On Fri, Nov 03, 2017 at 09:56:16AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
> I don't understand the EBCDIC worries at all.
> 
> These locale definition files are in ASCII. If you interpret these same files
> in EBCDIC, section names and property names don't make any sense, and neither
> do encoded characters such as "<U0020>", I mean it's no longer
> less/greater-than, uppercase U and digits.

Yes, all source files should be converted from ASCII to the EBCDIC encoding in question.
This is also the case on UTF-16 systems: the source files should be converted
from some sort of ASCII-compatible encoding to UTF-16. Or the other way, if you
move sources from a non-ASCII-compatible system to an ASCII-compatible system.

This process can be done automatically using e.g. iconv.

> Then, if you iconv the file, the resulting <U0020> and friends still define
> Unicode codepoints and not EBCDIC ones.

No, they are not Unicode (or UCS) code points. When you compile the locale into a binary
format, you apply an EBCDIC charmap, and the symbolic <Uxxxx> character names get
encoded according to the EBCDIC charmap passed to localedef with the -f option.
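
The resolution step described above can be sketched roughly as follows. This
is not localedef's implementation: the two charmap tables are tiny
hand-written excerpts, and the names UTF8_CHARMAP, EBCDIC_CHARMAP and
compile_string are hypothetical. The point is only that the same symbolic
source yields different bytes depending on which charmap is applied.

```python
import re

# Hand-written charmap excerpts mapping symbolic names to encoded bytes.
# Real charmaps cover the whole character set; these are illustrative only.
UTF8_CHARMAP = {"<U0061>": b"\x61", "<U0062>": b"\x62", "<U0020>": b"\x20"}
EBCDIC_CHARMAP = {"<U0061>": b"\x81", "<U0062>": b"\x82", "<U0020>": b"\x40"}

# Matches one symbolic character name such as <U0061>.
SYMBOL = re.compile(r"<U[0-9A-F]{4}>")


def compile_string(source, charmap):
    """Encode a string made only of symbolic names using the given charmap."""
    return b"".join(charmap[name] for name in SYMBOL.findall(source))


source = "<U0061><U0020><U0062>"
print(compile_string(source, UTF8_CHARMAP))    # bytes for an ASCII-compatible target
print(compile_string(source, EBCDIC_CHARMAP))  # bytes for the assumed EBCDIC target
```

A literal character in the source has no such lookup step, which is the
difference this message is arguing about.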

> So, in order to use these files in an EBCDIC environment, they need to be
> converted on two different levels.

No, only one level of conversion is needed and that can be fully automated.

> This does not become any harder or any more complicated by allowing plain ASCII
> characters.

Well, not so, if you operate in an environment with a source encoding different
from the EBCDIC target encoding, and vice versa.

best regards
Keld

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (20 preceding siblings ...)
  2017-11-08 20:00 ` keld at keldix dot com
@ 2017-11-09 10:19 ` keld at keldix dot com
  2017-11-09 16:31 ` joseph at codesourcery dot com
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: keld at keldix dot com @ 2017-11-09 10:19 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #20 from keld at keldix dot com <keld at keldix dot com> ---
On Fri, Nov 03, 2017 at 09:56:16AM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #11 from Egmont Koblinger <egmont at gmail dot com> ---
> I don't understand the EBCDIC worries at all.
> 
> These locale definition files are in ASCII. If you interpret these same files
> in EBCDIC, section names and property names don't make any sense, and neither
> do encoded characters such as "<U0020>", I mean it's no longer
> less/greater-than, uppercase U and digits.

Yes, all source files should be converted from ASCII to the EBCDIC encoding in
question. This is also the case on UTF-16 systems: the source files should be
converted from some sort of ASCII-compatible encoding to UTF-16. Or the other
way, if you move sources from a non-ASCII-compatible system to an
ASCII-compatible system.

This process can be done automatically using e.g. iconv.

> Then, if you iconv the file, the resulting <U0020> and friends still define
> Unicode codepoints and not EBCDIC ones.

No, they are not Unicode (or UCS) code points. When you compile the locale
into a binary format, you apply an EBCDIC charmap, and the symbolic <Uxxxx>
character names get encoded according to the EBCDIC charmap passed to
localedef with the -f option.

> So, in order to use these files in an EBCDIC environment, they need to be
> converted on two different levels.

No, only one level of conversion is needed and that can be fully automated.

> This does not become any harder or any more complicated by allowing plain ASCII
> characters.

Well, not so, if you operate in an environment with a source encoding different
from the EBCDIC target encoding, and vice versa.

best regards
Keld

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (21 preceding siblings ...)
  2017-11-09 10:19 ` keld at keldix dot com
@ 2017-11-09 16:31 ` joseph at codesourcery dot com
  2017-11-14  8:11 ` cvs-commit at gcc dot gnu.org
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: joseph at codesourcery dot com @ 2017-11-09 16:31 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #21 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
On Thu, 9 Nov 2017, keld at keldix dot com wrote:

> Yes, all source files should be converted from ASCII to the EBCDIC encoding in
> question. This is also the case on UTF-16 systems: the source files should be
> converted from some sort of ASCII-compatible encoding to UTF-16. Or the other
> way, if you move sources from a non-ASCII-compatible system to an
> ASCII-compatible system.
> 
> This process can be done automatically using e.g. iconv.

No, it can't be done automatically, without having information somewhere 
about which character set each source file is in (it's entirely possible 
some, e.g. those representing expected output of testcases, are in mixed 
character sets - and in any case represent particular sequences of octets 
that must be preserved because they are to be compared against test output 
in particular locales).
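
A small sketch of why the conversion needs outside information, using
Python's built-in codecs. The byte sequence and the codec choices are
illustrative assumptions, not taken from the glibc tree. The same octets
decode without error under more than one character set, and transcoding
them changes the octets that a test would compare against.

```python
# Two octets that are valid in both UTF-8 and Latin-1.
octets = b"\xc3\xa9"

as_utf8 = octets.decode("utf-8")       # one character, U+00E9
as_latin1 = octets.decode("latin-1")   # two characters, U+00C3 U+00A9

# Both decodes succeed, so the bytes alone do not say which one is right.
print(len(as_utf8), len(as_latin1))

# Transcoding to another character set (cp037 as an assumed EBCDIC example)
# produces different octets, so a byte-for-byte comparison against expected
# test output no longer matches.
transcoded = as_utf8.encode("cp037")
print(transcoded != octets)
```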

glibc does not make any attempt to support locales that are not more or 
less ASCII compatible, and does not make any attempt to support 16-bit 
bytes (which are not supported by POSIX either) which would be needed for 
UTF-16 to be a valid locale encoding.  We should not pretend that it does, 
any more than we should pretend it supports non-ELF object formats.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (22 preceding siblings ...)
  2017-11-09 16:31 ` joseph at codesourcery dot com
@ 2017-11-14  8:11 ` cvs-commit at gcc dot gnu.org
  2017-11-14  8:13 ` maiku.fabian at gmail dot com
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2017-11-14  8:11 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #22 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  a259f5d388d6195da958b2d147d17c2e2d16b857 (commit)
      from  cae87e64dca14f50da7bbd99085c7f5e413ad0f8 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a259f5d388d6195da958b2d147d17c2e2d16b857

commit a259f5d388d6195da958b2d147d17c2e2d16b857
Author: Claude Paroz <claude@2xlibre.net>
Date:   Thu Nov 2 15:10:42 2017 +0100

    Replaced unicode sequences in the ASCII printable range

        [BZ #22387]
        * localedata/locales/aa_DJ: Improved readability by replacing
        <Uxxxx> sequences in the ASCII printable range by their ASCII
        character equivalents.
        * localedata/locales/aa_ER: Likewise.
        * localedata/locales/aa_ER@saaho: Likewise.
        * localedata/locales/aa_ET: Likewise.
        * localedata/locales/af_ZA: Likewise.
        * localedata/locales/agr_PE: Likewise.
        * localedata/locales/ak_GH: Likewise.
        * localedata/locales/am_ET: Likewise.
        * localedata/locales/anp_IN: Likewise.
        * localedata/locales/ar_AE: Likewise.
        * localedata/locales/ar_BH: Likewise.
        * localedata/locales/ar_DZ: Likewise.
        * localedata/locales/ar_EG: Likewise.
        * localedata/locales/ar_IN: Likewise.
        * localedata/locales/ar_IQ: Likewise.
        * localedata/locales/ar_JO: Likewise.
        * localedata/locales/ar_KW: Likewise.
        * localedata/locales/ar_LB: Likewise.
        * localedata/locales/ar_LY: Likewise.
        * localedata/locales/ar_MA: Likewise.
        * localedata/locales/ar_OM: Likewise.
        * localedata/locales/ar_QA: Likewise.
        * localedata/locales/ar_SA: Likewise.
        * localedata/locales/ar_SD: Likewise.
        * localedata/locales/ar_SS: Likewise.
        * localedata/locales/ar_SY: Likewise.
        * localedata/locales/ar_TN: Likewise.
        * localedata/locales/ar_YE: Likewise.
        * localedata/locales/as_IN: Likewise.
        * localedata/locales/ast_ES: Likewise.
        * localedata/locales/ayc_PE: Likewise.
        * localedata/locales/az_AZ: Likewise.
        * localedata/locales/az_IR: Likewise.
        * localedata/locales/be_BY: Likewise.
        * localedata/locales/be_BY@latin: Likewise.
        * localedata/locales/bem_ZM: Likewise.
        * localedata/locales/ber_DZ: Likewise.
        * localedata/locales/ber_MA: Likewise.
        * localedata/locales/bg_BG: Likewise.
        * localedata/locales/bhb_IN: Likewise.
        * localedata/locales/bho_IN: Likewise.
        * localedata/locales/bi_VU: Likewise.
        * localedata/locales/bn_BD: Likewise.
        * localedata/locales/bn_IN: Likewise.
        * localedata/locales/bo_CN: Likewise.
        * localedata/locales/bo_IN: Likewise.
        * localedata/locales/br_FR: Likewise.
        * localedata/locales/brx_IN: Likewise.
        * localedata/locales/bs_BA: Likewise.
        * localedata/locales/byn_ER: Likewise.
        * localedata/locales/ca_AD: Likewise.
        * localedata/locales/ca_ES: Likewise.
        * localedata/locales/ca_FR: Likewise.
        * localedata/locales/ca_IT: Likewise.
        * localedata/locales/ce_RU: Likewise.
        * localedata/locales/chr_US: Likewise.
        * localedata/locales/cmn_TW: Likewise.
        * localedata/locales/crh_UA: Likewise.
        * localedata/locales/cs_CZ: Likewise.
        * localedata/locales/csb_PL: Likewise.
        * localedata/locales/cv_RU: Likewise.
        * localedata/locales/cy_GB: Likewise.
        * localedata/locales/da_DK: Likewise.
        * localedata/locales/de_AT: Likewise.
        * localedata/locales/de_BE: Likewise.
        * localedata/locales/de_CH: Likewise.
        * localedata/locales/de_DE: Likewise.
        * localedata/locales/de_IT: Likewise.
        * localedata/locales/de_LI: Likewise.
        * localedata/locales/de_LU: Likewise.
        * localedata/locales/doi_IN: Likewise.
        * localedata/locales/dv_MV: Likewise.
        * localedata/locales/dz_BT: Likewise.
        * localedata/locales/el_CY: Likewise.
        * localedata/locales/el_GR: Likewise.
        * localedata/locales/en_AG: Likewise.
        * localedata/locales/en_AU: Likewise.
        * localedata/locales/en_BW: Likewise.
        * localedata/locales/en_CA: Likewise.
        * localedata/locales/en_DK: Likewise.
        * localedata/locales/en_GB: Likewise.
        * localedata/locales/en_HK: Likewise.
        * localedata/locales/en_IE: Likewise.
        * localedata/locales/en_IL: Likewise.
        * localedata/locales/en_IN: Likewise.
        * localedata/locales/en_NG: Likewise.
        * localedata/locales/en_NZ: Likewise.
        * localedata/locales/en_PH: Likewise.
        * localedata/locales/en_SG: Likewise.
        * localedata/locales/en_US: Likewise.
        * localedata/locales/en_ZA: Likewise.
        * localedata/locales/en_ZM: Likewise.
        * localedata/locales/en_ZW: Likewise.
        * localedata/locales/eo: Likewise.
        * localedata/locales/es_AR: Likewise.
        * localedata/locales/es_BO: Likewise.
        * localedata/locales/es_CL: Likewise.
        * localedata/locales/es_CO: Likewise.
        * localedata/locales/es_CR: Likewise.
        * localedata/locales/es_CU: Likewise.
        * localedata/locales/es_DO: Likewise.
        * localedata/locales/es_EC: Likewise.
        * localedata/locales/es_ES: Likewise.
        * localedata/locales/es_GT: Likewise.
        * localedata/locales/es_HN: Likewise.
        * localedata/locales/es_MX: Likewise.
        * localedata/locales/es_NI: Likewise.
        * localedata/locales/es_PA: Likewise.
        * localedata/locales/es_PE: Likewise.
        * localedata/locales/es_PR: Likewise.
        * localedata/locales/es_PY: Likewise.
        * localedata/locales/es_SV: Likewise.
        * localedata/locales/es_US: Likewise.
        * localedata/locales/es_UY: Likewise.
        * localedata/locales/es_VE: Likewise.
        * localedata/locales/et_EE: Likewise.
        * localedata/locales/eu_ES: Likewise.
        * localedata/locales/eu_ES@euro: Likewise.
        * localedata/locales/fa_IR: Likewise.
        * localedata/locales/ff_SN: Likewise.
        * localedata/locales/fi_FI: Likewise.
        * localedata/locales/fil_PH: Likewise.
        * localedata/locales/fo_FO: Likewise.
        * localedata/locales/fr_BE: Likewise.
        * localedata/locales/fr_CA: Likewise.
        * localedata/locales/fr_CH: Likewise.
        * localedata/locales/fr_FR: Likewise.
        * localedata/locales/fr_LU: Likewise.
        * localedata/locales/fur_IT: Likewise.
        * localedata/locales/fy_DE: Likewise.
        * localedata/locales/fy_NL: Likewise.
        * localedata/locales/ga_IE: Likewise.
        * localedata/locales/gd_GB: Likewise.
        * localedata/locales/gez_ER: Likewise.
        * localedata/locales/gez_ET: Likewise.
        * localedata/locales/gl_ES: Likewise.
        * localedata/locales/gu_IN: Likewise.
        * localedata/locales/gv_GB: Likewise.
        * localedata/locales/ha_NG: Likewise.
        * localedata/locales/hak_TW: Likewise.
        * localedata/locales/he_IL: Likewise.
        * localedata/locales/hi_IN: Likewise.
        * localedata/locales/hif_FJ: Likewise.
        * localedata/locales/hne_IN: Likewise.
        * localedata/locales/hr_HR: Likewise.
        * localedata/locales/hsb_DE: Likewise.
        * localedata/locales/ht_HT: Likewise.
        * localedata/locales/hu_HU: Likewise.
        * localedata/locales/hy_AM: Likewise.
        * localedata/locales/i18n: Likewise.
        * localedata/locales/ia_FR: Likewise.
        * localedata/locales/id_ID: Likewise.
        * localedata/locales/ig_NG: Likewise.
        * localedata/locales/ik_CA: Likewise.
        * localedata/locales/is_IS: Likewise.
        * localedata/locales/it_CH: Likewise.
        * localedata/locales/it_IT: Likewise.
        * localedata/locales/iu_CA: Likewise.
        * localedata/locales/ja_JP: Likewise.
        * localedata/locales/ka_GE: Likewise.
        * localedata/locales/kk_KZ: Likewise.
        * localedata/locales/kl_GL: Likewise.
        * localedata/locales/kn_IN: Likewise.
        * localedata/locales/ko_KR: Likewise.
        * localedata/locales/kok_IN: Likewise.
        * localedata/locales/ks_IN: Likewise.
        * localedata/locales/ks_IN@devanagari: Likewise.
        * localedata/locales/ku_TR: Likewise.
        * localedata/locales/kw_GB: Likewise.
        * localedata/locales/ky_KG: Likewise.
        * localedata/locales/lb_LU: Likewise.
        * localedata/locales/lg_UG: Likewise.
        * localedata/locales/li_BE: Likewise.
        * localedata/locales/li_NL: Likewise.
        * localedata/locales/lij_IT: Likewise.
        * localedata/locales/ln_CD: Likewise.
        * localedata/locales/lo_LA: Likewise.
        * localedata/locales/lt_LT: Likewise.
        * localedata/locales/lv_LV: Likewise.
        * localedata/locales/lzh_TW: Likewise.
        * localedata/locales/mag_IN: Likewise.
        * localedata/locales/mai_IN: Likewise.
        * localedata/locales/mg_MG: Likewise.
        * localedata/locales/mhr_RU: Likewise.
        * localedata/locales/mi_NZ: Likewise.
        * localedata/locales/mk_MK: Likewise.
        * localedata/locales/ml_IN: Likewise.
        * localedata/locales/mn_MN: Likewise.
        * localedata/locales/mni_IN: Likewise.
        * localedata/locales/mr_IN: Likewise.
        * localedata/locales/ms_MY: Likewise.
        * localedata/locales/mt_MT: Likewise.
        * localedata/locales/my_MM: Likewise.
        * localedata/locales/nan_TW: Likewise.
        * localedata/locales/nan_TW@latin: Likewise.
        * localedata/locales/nb_NO: Likewise.
        * localedata/locales/nds_DE: Likewise.
        * localedata/locales/nds_NL: Likewise.
        * localedata/locales/ne_NP: Likewise.
        * localedata/locales/nhn_MX: Likewise.
        * localedata/locales/niu_NU: Likewise.
        * localedata/locales/niu_NZ: Likewise.
        * localedata/locales/nl_AW: Likewise.
        * localedata/locales/nl_BE: Likewise.
        * localedata/locales/nl_NL: Likewise.
        * localedata/locales/nn_NO: Likewise.
        * localedata/locales/nr_ZA: Likewise.
        * localedata/locales/nso_ZA: Likewise.
        * localedata/locales/oc_FR: Likewise.
        * localedata/locales/om_ET: Likewise.
        * localedata/locales/om_KE: Likewise.
        * localedata/locales/or_IN: Likewise.
        * localedata/locales/os_RU: Likewise.
        * localedata/locales/pa_IN: Likewise.
        * localedata/locales/pa_PK: Likewise.
        * localedata/locales/pap_AW: Likewise.
        * localedata/locales/pap_CW: Likewise.
        * localedata/locales/pl_PL: Likewise.
        * localedata/locales/ps_AF: Likewise.
        * localedata/locales/pt_BR: Likewise.
        * localedata/locales/pt_PT: Likewise.
        * localedata/locales/quz_PE: Likewise.
        * localedata/locales/raj_IN: Likewise.
        * localedata/locales/ro_RO: Likewise.
        * localedata/locales/ru_RU: Likewise.
        * localedata/locales/ru_UA: Likewise.
        * localedata/locales/rw_RW: Likewise.
        * localedata/locales/sa_IN: Likewise.
        * localedata/locales/sat_IN: Likewise.
        * localedata/locales/sc_IT: Likewise.
        * localedata/locales/sd_IN: Likewise.
        * localedata/locales/sd_IN@devanagari: Likewise.
        * localedata/locales/se_NO: Likewise.
        * localedata/locales/sgs_LT: Likewise.
        * localedata/locales/shs_CA: Likewise.
        * localedata/locales/si_LK: Likewise.
        * localedata/locales/sid_ET: Likewise.
        * localedata/locales/sk_SK: Likewise.
        * localedata/locales/sl_SI: Likewise.
        * localedata/locales/sm_WS: Likewise.
        * localedata/locales/so_DJ: Likewise.
        * localedata/locales/so_ET: Likewise.
        * localedata/locales/so_KE: Likewise.
        * localedata/locales/so_SO: Likewise.
        * localedata/locales/sq_AL: Likewise.
        * localedata/locales/sq_MK: Likewise.
        * localedata/locales/sr_ME: Likewise.
        * localedata/locales/sr_RS: Likewise.
        * localedata/locales/sr_RS@latin: Likewise.
        * localedata/locales/ss_ZA: Likewise.
        * localedata/locales/st_ZA: Likewise.
        * localedata/locales/sv_FI: Likewise.
        * localedata/locales/sv_SE: Likewise.
        * localedata/locales/sw_KE: Likewise.
        * localedata/locales/sw_TZ: Likewise.
        * localedata/locales/szl_PL: Likewise.
        * localedata/locales/ta_IN: Likewise.
        * localedata/locales/ta_LK: Likewise.
        * localedata/locales/tcy_IN: Likewise.
        * localedata/locales/te_IN: Likewise.
        * localedata/locales/tg_TJ: Likewise.
        * localedata/locales/th_TH: Likewise.
        * localedata/locales/the_NP: Likewise.
        * localedata/locales/ti_ER: Likewise.
        * localedata/locales/ti_ET: Likewise.
        * localedata/locales/tig_ER: Likewise.
        * localedata/locales/tk_TM: Likewise.
        * localedata/locales/tl_PH: Likewise.
        * localedata/locales/tn_ZA: Likewise.
        * localedata/locales/to_TO: Likewise.
        * localedata/locales/tpi_PG: Likewise.
        * localedata/locales/tr_CY: Likewise.
        * localedata/locales/tr_TR: Likewise.
        * localedata/locales/ts_ZA: Likewise.
        * localedata/locales/tt_RU: Likewise.
        * localedata/locales/tt_RU@iqtelif: Likewise.
        * localedata/locales/ug_CN: Likewise.
        * localedata/locales/uk_UA: Likewise.
        * localedata/locales/unm_US: Likewise.
        * localedata/locales/ur_IN: Likewise.
        * localedata/locales/ur_PK: Likewise.
        * localedata/locales/uz_UZ: Likewise.
        * localedata/locales/uz_UZ@cyrillic: Likewise.
        * localedata/locales/ve_ZA: Likewise.
        * localedata/locales/vi_VN: Likewise.
        * localedata/locales/wa_BE: Likewise.
        * localedata/locales/wae_CH: Likewise.
        * localedata/locales/wal_ET: Likewise.
        * localedata/locales/wo_SN: Likewise.
        * localedata/locales/xh_ZA: Likewise.
        * localedata/locales/yi_US: Likewise.
        * localedata/locales/yo_NG: Likewise.
        * localedata/locales/yue_HK: Likewise.
        * localedata/locales/yuw_PG: Likewise.
        * localedata/locales/zh_CN: Likewise.
        * localedata/locales/zh_HK: Likewise.
        * localedata/locales/zh_SG: Likewise.
        * localedata/locales/zh_TW: Likewise.
        * localedata/locales/zu_ZA: Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                           |  305 +++++++++++++++++++++++++++++++++++
 localedata/locales/aa_DJ            |  136 ++++++----------
 localedata/locales/aa_ER            |  129 ++++++---------
 localedata/locales/aa_ER@saaho      |  104 +++++-------
 localedata/locales/aa_ET            |  130 ++++++---------
 localedata/locales/af_ZA            |  113 +++++--------
 localedata/locales/agr_PE           |  113 ++++++--------
 localedata/locales/ak_GH            |  152 +++++++-----------
 localedata/locales/am_ET            |   50 ++----
 localedata/locales/anp_IN           |   35 ++---
 localedata/locales/ar_AE            |   71 ++++-----
 localedata/locales/ar_BH            |   68 +++-----
 localedata/locales/ar_DZ            |   68 +++-----
 localedata/locales/ar_EG            |   68 +++-----
 localedata/locales/ar_IN            |   39 ++---
 localedata/locales/ar_IQ            |   84 ++++------
 localedata/locales/ar_JO            |   84 ++++------
 localedata/locales/ar_KW            |   69 +++-----
 localedata/locales/ar_LB            |   84 ++++------
 localedata/locales/ar_LY            |   69 +++-----
 localedata/locales/ar_MA            |   68 +++-----
 localedata/locales/ar_OM            |   67 +++-----
 localedata/locales/ar_QA            |   69 +++-----
 localedata/locales/ar_SA            |   59 +++-----
 localedata/locales/ar_SD            |   71 +++-----
 localedata/locales/ar_SS            |   70 +++-----
 localedata/locales/ar_SY            |   85 ++++------
 localedata/locales/ar_TN            |   69 +++-----
 localedata/locales/ar_YE            |   68 +++-----
 localedata/locales/as_IN            |   36 ++---
 localedata/locales/ast_ES           |   85 +++++------
 localedata/locales/ayc_PE           |  113 ++++++-------
 localedata/locales/az_AZ            |  114 ++++++-------
 localedata/locales/az_IR            |   39 +----
 localedata/locales/be_BY            |   49 +++----
 localedata/locales/be_BY@latin      |  116 ++++++-------
 localedata/locales/bem_ZM           |  150 +++++++-----------
 localedata/locales/ber_DZ           |  129 +++++++--------
 localedata/locales/ber_MA           |  129 +++++++--------
 localedata/locales/bg_BG            |   58 +++----
 localedata/locales/bhb_IN           |  114 ++++++--------
 localedata/locales/bho_IN           |   34 ++---
 localedata/locales/bi_VU            |  158 ++++++++-----------
 localedata/locales/bn_BD            |   50 +++----
 localedata/locales/bn_IN            |   35 ++---
 localedata/locales/bo_CN            |   18 +--
 localedata/locales/bo_IN            |   19 +--
 localedata/locales/br_FR            |   93 +++++------
 localedata/locales/brx_IN           |   47 ++----
 localedata/locales/bs_BA            |  105 ++++++-------
 localedata/locales/byn_ER           |   52 +++----
 localedata/locales/ca_AD            |   34 ++---
 localedata/locales/ca_ES            |  101 +++++-------
 localedata/locales/ca_FR            |   26 +--
 localedata/locales/ca_IT            |   26 +--
 localedata/locales/ce_RU            |   61 +++-----
 localedata/locales/chr_US           |   28 ++--
 localedata/locales/cmn_TW           |   90 ++++------
 localedata/locales/crh_UA           |  129 +++++++--------
 localedata/locales/cs_CZ            |  190 +++++++++++-----------
 localedata/locales/csb_PL           |   60 ++++----
 localedata/locales/cv_RU            |  113 ++++++-------
 localedata/locales/cy_GB            |   91 +++++------
 localedata/locales/da_DK            |  119 ++++++--------
 localedata/locales/de_AT            |   98 +++++------
 localedata/locales/de_BE            |   94 +++++------
 localedata/locales/de_CH            |  104 ++++++-------
 localedata/locales/de_DE            |  140 +++++++---------
 localedata/locales/de_IT            |   82 +++++-----
 localedata/locales/de_LI            |   31 ++--
 localedata/locales/de_LU            |  106 ++++++-------
 localedata/locales/doi_IN           |   43 ++----
 localedata/locales/dv_MV            |   51 +++----
 localedata/locales/dz_BT            |   69 ++++-----
 localedata/locales/el_CY            |   46 ++----
 localedata/locales/el_GR            |   60 +++-----
 localedata/locales/en_AG            |  112 ++++++-------
 localedata/locales/en_AU            |  117 ++++++--------
 localedata/locales/en_BW            |   46 ++----
 localedata/locales/en_CA            |  116 ++++++--------
 localedata/locales/en_DK            |  105 ++++++-------
 localedata/locales/en_GB            |  117 ++++++--------
 localedata/locales/en_HK            |   99 +++++-------
 localedata/locales/en_IE            |  106 ++++++-------
 localedata/locales/en_IL            |   85 +++++------
 localedata/locales/en_IN            |   85 +++++------
 localedata/locales/en_NG            |  129 ++++++---------
 localedata/locales/en_NZ            |  117 ++++++--------
 localedata/locales/en_PH            |  101 +++++-------
 localedata/locales/en_SG            |  104 ++++++-------
 localedata/locales/en_US            |  133 +++++++---------
 localedata/locales/en_ZA            |  161 +++++++------------
 localedata/locales/en_ZM            |  101 +++++-------
 localedata/locales/en_ZW            |   47 ++----
 localedata/locales/eo               |  107 ++++++-------
 localedata/locales/es_AR            |  116 ++++++-------
 localedata/locales/es_BO            |  112 ++++++-------
 localedata/locales/es_CL            |  112 ++++++-------
 localedata/locales/es_CO            |  117 ++++++--------
 localedata/locales/es_CR            |  131 +++++++--------
 localedata/locales/es_CU            |  109 ++++++-------
 localedata/locales/es_DO            |  116 ++++++-------
 localedata/locales/es_EC            |  112 ++++++-------
 localedata/locales/es_ES            |  114 ++++++-------
 localedata/locales/es_GT            |  116 ++++++-------
 localedata/locales/es_HN            |  115 ++++++-------
 localedata/locales/es_MX            |  115 ++++++-------
 localedata/locales/es_NI            |  122 +++++++--------
 localedata/locales/es_PA            |  116 ++++++-------
 localedata/locales/es_PE            |  118 ++++++--------
 localedata/locales/es_PR            |  116 ++++++-------
 localedata/locales/es_PY            |  112 ++++++-------
 localedata/locales/es_SV            |  116 ++++++-------
 localedata/locales/es_US            |  113 ++++++-------
 localedata/locales/es_UY            |  112 ++++++-------
 localedata/locales/es_VE            |  118 ++++++--------
 localedata/locales/et_EE            |  107 ++++++-------
 localedata/locales/eu_ES            |  109 ++++++-------
 localedata/locales/eu_ES@euro       |    8 +-
 localedata/locales/fa_IR            |   67 +++------
 localedata/locales/ff_SN            |  178 ++++++++-------------
 localedata/locales/fi_FI            |  123 +++++++--------
 localedata/locales/fil_PH           |  114 ++++++--------
 localedata/locales/fo_FO            |   98 +++++------
 localedata/locales/fr_BE            |  110 ++++++-------
 localedata/locales/fr_CA            |   99 +++++-------
 localedata/locales/fr_CH            |   99 +++++-------
 localedata/locales/fr_FR            |  123 ++++++--------
 localedata/locales/fr_LU            |  107 ++++++-------
 localedata/locales/fur_IT           |   82 ++++------
 localedata/locales/fy_DE            |   90 +++++------
 localedata/locales/fy_NL            |  102 +++++-------
 localedata/locales/ga_IE            |  114 ++++++-------
 localedata/locales/gd_GB            |  109 +++++--------
 localedata/locales/gez_ER           |   37 ++---
 localedata/locales/gez_ET           |   36 ++---
 localedata/locales/gl_ES            |  112 ++++++-------
 localedata/locales/gu_IN            |   38 ++---
 localedata/locales/gv_GB            |  129 +++++++--------
 localedata/locales/ha_NG            |  103 +++++-------
 localedata/locales/hak_TW           |   90 ++++------
 localedata/locales/he_IL            |   60 +++----
 localedata/locales/hi_IN            |   48 ++----
 localedata/locales/hif_FJ           |  129 +++++++--------
 localedata/locales/hne_IN           |   32 ++---
 localedata/locales/hr_HR            |  105 ++++++-------
 localedata/locales/hsb_DE           |   97 +++++------
 localedata/locales/ht_HT            |  144 +++++++----------
 localedata/locales/hu_HU            |  116 ++++++--------
 localedata/locales/hy_AM            |   48 +++---
 localedata/locales/i18n             |   51 +++----
 localedata/locales/ia_FR            |   93 +++++------
 localedata/locales/id_ID            |  115 ++++++-------
 localedata/locales/ig_NG            |   98 +++++------
 localedata/locales/ik_CA            |   92 +++++------
 localedata/locales/is_IS            |  116 ++++++-------
 localedata/locales/it_CH            |  109 ++++++-------
 localedata/locales/it_IT            |  115 ++++++--------
 localedata/locales/iu_CA            |   34 ++---
 localedata/locales/ja_JP            |  108 ++++++-------
 localedata/locales/ka_GE            |   40 ++---
 localedata/locales/kk_KZ            |   53 +++----
 localedata/locales/kl_GL            |  100 +++++-------
 localedata/locales/kn_IN            |   46 ++----
 localedata/locales/ko_KR            |   91 +++++------
 localedata/locales/kok_IN           |   39 ++---
 localedata/locales/ks_IN            |   36 ++---
 localedata/locales/ks_IN@devanagari |   45 ++----
 localedata/locales/ku_TR            |  107 ++++++-------
 localedata/locales/kw_GB            |  113 ++++++-------
 localedata/locales/ky_KG            |   57 +++----
 localedata/locales/lb_LU            |  146 ++++++++---------
 localedata/locales/lg_UG            |  133 +++++++---------
 localedata/locales/li_BE            |   24 ++--
 localedata/locales/li_NL            |   88 +++++------
 localedata/locales/lij_IT           |   99 +++++-------
 localedata/locales/ln_CD            |  140 +++++++---------
 localedata/locales/lo_LA            |  105 ++++++-------
 localedata/locales/lt_LT            |  110 ++++++-------
 localedata/locales/lv_LV            |  108 ++++++-------
 localedata/locales/lzh_TW           |   89 ++++------
 localedata/locales/mag_IN           |   43 ++----
 localedata/locales/mai_IN           |   32 ++---
 localedata/locales/mg_MG            |  129 ++++++---------
 localedata/locales/mhr_RU           |   36 ++---
 localedata/locales/mi_NZ            |   89 +++++------
 localedata/locales/mk_MK            |   58 +++----
 localedata/locales/ml_IN            |   35 ++---
 localedata/locales/mn_MN            |  237 +++++++++++++--------------
 localedata/locales/mni_IN           |   38 ++---
 localedata/locales/mr_IN            |   50 ++----
 localedata/locales/ms_MY            |  103 ++++++-------
 localedata/locales/mt_MT            |  129 +++++++---------
 localedata/locales/my_MM            |   55 +++----
 localedata/locales/nan_TW           |   90 ++++------
 localedata/locales/nan_TW@latin     |  119 ++++++--------
 localedata/locales/nb_NO            |  119 ++++++--------
 localedata/locales/nds_DE           |   84 +++++-----
 localedata/locales/nds_NL           |   82 +++++-----
 localedata/locales/ne_NP            |   53 +++----
 localedata/locales/nhn_MX           |   90 +++++------
 localedata/locales/niu_NU           |  126 +++++++--------
 localedata/locales/niu_NZ           |   18 +--
 localedata/locales/nl_AW            |  104 ++++++-------
 localedata/locales/nl_BE            |   89 +++++------
 localedata/locales/nl_NL            |  107 ++++++-------
 localedata/locales/nn_NO            |  110 ++++++--------
 localedata/locales/nr_ZA            |  123 ++++++--------
 localedata/locales/nso_ZA           |  108 ++++++-------
 localedata/locales/oc_FR            |   82 +++++-----
 localedata/locales/om_ET            |   39 ++---
 localedata/locales/om_KE            |   81 +++++-----
 localedata/locales/or_IN            |   54 +++----
 localedata/locales/os_RU            |   29 ++---
 localedata/locales/pa_IN            |   46 ++----
 localedata/locales/pa_PK            |   30 ++---
 localedata/locales/pap_AW           |  103 ++++++-------
 localedata/locales/pap_CW           |  101 ++++++-------
 localedata/locales/pl_PL            |  109 ++++++-------
 localedata/locales/ps_AF            |   72 ++++-----
 localedata/locales/pt_BR            |  114 ++++++-------
 localedata/locales/pt_PT            |  119 +++++++--------
 localedata/locales/quz_PE           |  118 ++++++--------
 localedata/locales/raj_IN           |   19 +--
 localedata/locales/ro_RO            |  134 +++++++---------
 localedata/locales/ru_RU            |   43 ++---
 localedata/locales/ru_UA            |   49 +++----
 localedata/locales/rw_RW            |  111 ++++++-------
 localedata/locales/sa_IN            |   66 +++-----
 localedata/locales/sat_IN           |   39 ++---
 localedata/locales/sc_IT            |   93 +++++------
 localedata/locales/sd_IN            |   38 ++---
 localedata/locales/sd_IN@devanagari |   50 ++----
 localedata/locales/se_NO            |  121 +++++++--------
 localedata/locales/sgs_LT           |   87 +++++------
 localedata/locales/shs_CA           |  105 ++++++-------
 localedata/locales/si_LK            |   61 +++----
 localedata/locales/sid_ET           |  125 +++++++--------
 localedata/locales/sk_SK            |  132 +++++++--------
 localedata/locales/sl_SI            |  108 ++++++-------
 localedata/locales/sm_WS            |  151 +++++++----------
 localedata/locales/so_DJ            |   79 ++++-----
 localedata/locales/so_ET            |  118 ++++++--------
 localedata/locales/so_KE            |  118 ++++++--------
 localedata/locales/so_SO            |  157 +++++++++----------
 localedata/locales/sq_AL            |  131 +++++++---------
 localedata/locales/sq_MK            |   36 ++---
 localedata/locales/sr_ME            |   56 +++----
 localedata/locales/sr_RS            |   66 +++-----
 localedata/locales/sr_RS@latin      |  121 +++++++--------
 localedata/locales/ss_ZA            |  122 ++++++--------
 localedata/locales/st_ZA            |  123 ++++++--------
 localedata/locales/sv_FI            |   98 +++++------
 localedata/locales/sv_SE            |  110 ++++++-------
 localedata/locales/sw_KE            |  145 ++++++-----------
 localedata/locales/sw_TZ            |  142 +++++++----------
 localedata/locales/szl_PL           |   91 +++++------
 localedata/locales/ta_IN            |   47 +++---
 localedata/locales/ta_LK            |   26 ++--
 localedata/locales/tcy_IN           |   32 ++---
 localedata/locales/te_IN            |   49 ++----
 localedata/locales/tg_TJ            |   46 ++----
 localedata/locales/th_TH            |  108 ++++++-------
 localedata/locales/the_NP           |   45 ++----
 localedata/locales/ti_ER            |   70 ++++-----
 localedata/locales/ti_ET            |   75 ++++-----
 localedata/locales/tig_ER           |   48 +++----
 localedata/locales/tk_TM            |  123 +++++++--------
 localedata/locales/tl_PH            |   98 +++++------
 localedata/locales/tn_ZA            |  121 ++++++---------
 localedata/locales/to_TO            |  145 +++++++----------
 localedata/locales/tpi_PG           |    6 +-
 localedata/locales/tr_CY            |   29 ++---
 localedata/locales/tr_TR            |  133 +++++++---------
 localedata/locales/ts_ZA            |  118 ++++++--------
 localedata/locales/tt_RU            |   29 ++--
 localedata/locales/tt_RU@iqtelif    |  130 +++++++--------
 localedata/locales/ug_CN            |   31 ++---
 localedata/locales/uk_UA            |   92 ++++++------
 localedata/locales/unm_US           |  105 ++++++-------
 localedata/locales/ur_IN            |   39 ++---
 localedata/locales/ur_PK            |   50 +++----
 localedata/locales/uz_UZ            |  112 ++++++-------
 localedata/locales/uz_UZ@cyrillic   |   46 +++---
 localedata/locales/ve_ZA            |  108 ++++++-------
 localedata/locales/vi_VN            |  133 +++++++---------
 localedata/locales/wa_BE            |   96 +++++------
 localedata/locales/wae_CH           |  131 +++++++--------
 localedata/locales/wal_ET           |   44 ++----
 localedata/locales/wo_SN            |  121 ++++++--------
 localedata/locales/xh_ZA            |  123 ++++++--------
 localedata/locales/yi_US            |   50 +++----
 localedata/locales/yo_NG            |  113 ++++++-------
 localedata/locales/yue_HK           |   57 +++----
 localedata/locales/yuw_PG           |    2 +-
 localedata/locales/zh_CN            |   66 ++++-----
 localedata/locales/zh_HK            |   65 ++++----
 localedata/locales/zh_SG            |   50 +++---
 localedata/locales/zh_TW            |   88 ++++-------
 localedata/locales/zu_ZA            |  124 +++++++--------
 300 files changed, 11664 insertions(+), 15069 deletions(-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (23 preceding siblings ...)
  2017-11-14  8:11 ` cvs-commit at gcc dot gnu.org
@ 2017-11-14  8:13 ` maiku.fabian at gmail dot com
  2017-11-14  8:17 ` claude at 2xlibre dot net
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-11-14  8:13 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |2.27

--- Comment #23 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Fixed in glibc master.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (24 preceding siblings ...)
  2017-11-14  8:13 ` maiku.fabian at gmail dot com
@ 2017-11-14  8:17 ` claude at 2xlibre dot net
  2017-11-14 13:02   ` Keld Simonsen
  2017-11-14 13:02 ` keld at keldix dot com
                   ` (7 subsequent siblings)
  33 siblings, 1 reply; 42+ messages in thread
From: claude at 2xlibre dot net @ 2017-11-14  8:17 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #24 from Claude Paroz <claude at 2xlibre dot net> ---
Awesome, thanks Mike for the commit!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-14  8:17 ` claude at 2xlibre dot net
@ 2017-11-14 13:02   ` Keld Simonsen
  0 siblings, 0 replies; 42+ messages in thread
From: Keld Simonsen @ 2017-11-14 13:02 UTC (permalink / raw)
  To: claude at 2xlibre dot net; +Cc: libc-locales

This commit is highly problematic, damaging the portability of glibc locales.
I wish it would be reverted.

Best regards
keld

On Tue, Nov 14, 2017 at 08:15:04AM +0000, claude at 2xlibre dot net wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #24 from Claude Paroz <claude at 2xlibre dot net> ---
> Awesome, thanks Mike for the commit!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (25 preceding siblings ...)
  2017-11-14  8:17 ` claude at 2xlibre dot net
@ 2017-11-14 13:02 ` keld at keldix dot com
  2017-11-14 13:06   ` Keld Simonsen
  2017-11-14 13:07 ` keld at keldix dot com
                   ` (6 subsequent siblings)
  33 siblings, 1 reply; 42+ messages in thread
From: keld at keldix dot com @ 2017-11-14 13:02 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #25 from keld at keldix dot com <keld at keldix dot com> ---
This commit is highly problematic, damaging the portability of glibc locales.
I wish it would be reverted.

Best regards
keld

On Tue, Nov 14, 2017 at 08:15:04AM +0000, claude at 2xlibre dot net wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #24 from Claude Paroz <claude at 2xlibre dot net> ---
> Awesome, thanks Mike for the commit!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-14 13:02 ` keld at keldix dot com
@ 2017-11-14 13:06   ` Keld Simonsen
  0 siblings, 0 replies; 42+ messages in thread
From: Keld Simonsen @ 2017-11-14 13:06 UTC (permalink / raw)
  To: keld at keldix dot com; +Cc: libc-locales

Is there a script to convert these ASCII values to <Uxxxx> strings?

I would need it for the ISO 30112 standard, as I do not want to publish 
non-portable code in the ISO standard.

best regards
Keld

On Tue, Nov 14, 2017 at 01:02:38PM +0000, keld at keldix dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #25 from keld at keldix dot com <keld at keldix dot com> ---
> This commit is highly problematic, damaging the portability of glibc locales.
> I wish it would be reverted.
> 
> Best regards
> keld
> 
> On Tue, Nov 14, 2017 at 08:15:04AM +0000, claude at 2xlibre dot net wrote:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> > 
> > --- Comment #24 from Claude Paroz <claude at 2xlibre dot net> ---
> > Awesome, thanks Mike for the commit!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (26 preceding siblings ...)
  2017-11-14 13:02 ` keld at keldix dot com
@ 2017-11-14 13:07 ` keld at keldix dot com
  2017-11-14 13:19 ` egmont at gmail dot com
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: keld at keldix dot com @ 2017-11-14 13:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #26 from keld at keldix dot com <keld at keldix dot com> ---
Is there a script to convert these ASCII values to <Uxxxx> strings?

I would need it for the ISO 30112 standard, as I do not want to publish 
non-portable code in the ISO standard.

best regards
Keld

On Tue, Nov 14, 2017 at 01:02:38PM +0000, keld at keldix dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #25 from keld at keldix dot com <keld at keldix dot com> ---
> This commit is highly problematic, damaging the portability of glibc locales.
> I wish it would be reverted.
> 
> Best regards
> keld
> 
> On Tue, Nov 14, 2017 at 08:15:04AM +0000, claude at 2xlibre dot net wrote:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> > 
> > --- Comment #24 from Claude Paroz <claude at 2xlibre dot net> ---
> > Awesome, thanks Mike for the commit!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (27 preceding siblings ...)
  2017-11-14 13:07 ` keld at keldix dot com
@ 2017-11-14 13:19 ` egmont at gmail dot com
  2017-11-14 20:25 ` jwilk at jwilk dot net
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: egmont at gmail dot com @ 2017-11-14 13:19 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #27 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to keld@keldix.com from comment #25)

> This commit is highly problematic, damaging the portability of glibc locales.

If this kind of portability is really a concern, someone could come up with a
script that converts from the new version to the old one. It could even be
integrated with the build system to the level where these generated files are
actually placed under BUILD and then further processed.
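
A conversion in that direction could look roughly like the Python sketch
below. It is only an illustration, not a script that exists in glibc: the
function names are made up here, and it assumes that only the contents of
double-quoted strings on a single line need rewriting. Continuation lines,
comments and the escape character are not handled.

```python
import re

# Matches an existing <Uxxxx> token, or else any single character.
_TOKEN = re.compile(r"<U[0-9A-Fa-f]{4,8}>|.", re.DOTALL)


def encode_string_body(body: str) -> str:
    """Rewrite every character not already in <Uxxxx> form as <Uxxxx>."""
    out = []
    for match in _TOKEN.finditer(body):
        token = match.group(0)
        if len(token) > 1:
            # Already a <Uxxxx> token: keep it unchanged.
            out.append(token)
        else:
            cp = ord(token)
            # Assumption: code points above U+FFFF use the 8-digit form.
            out.append("<U%04X>" % cp if cp <= 0xFFFF else "<U%08X>" % cp)
    return "".join(out)


def encode_line(line: str) -> str:
    """Encode the contents of each double-quoted string on one line."""
    return re.sub(
        r'"([^"]*)"',
        lambda m: '"' + encode_string_body(m.group(1)) + '"',
        line,
    )
```

With these definitions, encode_line('day "h<U00E9>tf<U0151>"') gives back
the fully encoded form day "<U0068><U00E9><U0074><U0066><U0151>", while
text outside the quotes is left alone.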

I wish the current change even pushed it further, towards raw UTF-8 at least
for printable and "non-problematic" (to some vague, arbitrary definition)
characters.

I have on a few occasions made some minor edits to affected parts of a locale
file; dealing with the <Uxxxx> notation was a nightmare. Working with a string
like "h<U00E9>tf<U0151>" is already much better than
"<U0068><U00E9><U0074><U0066><U0151>", but seeing "hétfő" would be ideal.

Source code is meant to be human-readable, which all these <Uxxxx> sequences
most certainly are not.

There's a reason people write code like
  printf("Hello world!\n");
and not
  printf("\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a");

If for whatever reason the latter, hard-to-read (and hard-to-write) form is
required, it should be auto-generated from the former, easy-to-read (and
easy-to-write) one.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (28 preceding siblings ...)
  2017-11-14 13:19 ` egmont at gmail dot com
@ 2017-11-14 20:25 ` jwilk at jwilk dot net
  2017-11-14 23:32 ` carlos at redhat dot com
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: jwilk at jwilk dot net @ 2017-11-14 20:25 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

Jakub Wilk <jwilk at jwilk dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jwilk at jwilk dot net

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (29 preceding siblings ...)
  2017-11-14 20:25 ` jwilk at jwilk dot net
@ 2017-11-14 23:32 ` carlos at redhat dot com
  2017-11-15 10:15 ` maiku.fabian at gmail dot com
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 42+ messages in thread
From: carlos at redhat dot com @ 2017-11-14 23:32 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #28 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to keld@keldix.com from comment #25)
> This commit is highly problematic, damaging the portability of glibc locales.
> I wish it would be reverted.

The glibc community is a consensus-driven community. If you have an objection,
please raise that objection on libc-alpha, and include the relevant parties to
discuss the issue. Consensus discussions should not be held on the bug tracker.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (30 preceding siblings ...)
  2017-11-14 23:32 ` carlos at redhat dot com
@ 2017-11-15 10:15 ` maiku.fabian at gmail dot com
  2017-11-16 23:39   ` Keld Simonsen
  2017-11-15 10:36 ` schwab@linux-m68k.org
  2017-11-16 23:39 ` keld at keldix dot com
  33 siblings, 1 reply; 42+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-11-15 10:15 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #29 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Egmont Koblinger from comment #27)
> (In reply to keld@keldix.com from comment #25)
> 
> > This commit is highly problematic, damaging the portability of glibc locales.
> 
> If this kind of portability is really a concern, someone could come up with
> a script that converts from the new version to the old one. It could even be
> integrated with the build system to the level where these generated files
> are actually placed under BUILD and then further processed.

Yes, if that is really a concern, we could easily convert it to different
formats.
I really doubt that this can cause problems though. If the file contained
“<a>”, one still has to be able to read the ASCII characters “<”, “a”, and “>”
to interpret the file, so I don’t see anything that is lost by just writing “a”
instead. If one cannot read an ASCII file, one would not be able to read the
keywords in the file either. So if something other than ASCII, such as EBCDIC,
is needed, one would need some conversion anyway. Using “a” instead of “<a>”
does not make such conversion any harder.
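
For reference, the replacement this bug asks for can be sketched in a few
lines of Python. This is an illustration under stated assumptions, not the
script used for the commit: the names are hypothetical, only double-quoted
strings on one line are touched, and the set of characters left in <Uxxxx>
form (the double quote, the angle brackets, and "/" as the usual escape
character) is a conservative guess rather than the rule actually applied.

```python
import re

# Hypothetical choice: characters kept encoded because they may carry
# syntactic meaning inside a locale source string.
KEEP_ENCODED = '"<>/'

# Only 4-digit tokens can denote ASCII, so longer forms are never matched.
_UCS = re.compile(r"<U([0-9A-Fa-f]{4})>")


def decode_string_body(body: str, keep: str = KEEP_ENCODED) -> str:
    """Replace <Uxxxx> tokens for printable ASCII with the character itself."""
    def repl(match):
        cp = int(match.group(1), 16)
        if 0x20 <= cp <= 0x7E and chr(cp) not in keep:
            return chr(cp)
        # Non-ASCII, control, or protected character: leave the token as is.
        return match.group(0)
    return _UCS.sub(repl, body)


def decode_line(line: str) -> str:
    """Decode the contents of each double-quoted string on one line."""
    return re.sub(
        r'"([^"]*)"',
        lambda m: '"' + decode_string_body(m.group(1)) + '"',
        line,
    )
```

Applied to abday "<U0068><U00E9><U0074><U0066><U0151>", decode_line yields
abday "h<U00E9>tf<U0151>". Tokens outside quoted strings, such as those in
collation rules, are not touched.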

> I wish the current change even pushed it further, towards raw UTF-8 at least
> for printable and "non-problematic" (to some vague, arbitrary definition)
> characters.

I agree. In the long run this would be even better. Readability of the
source is useful. Let’s see what our experiences with using ASCII directly
are; if no problems occur, we can think about using UTF-8 for “non-problematic”
characters.

> I have on a few occasions made some minor edits to affected parts of a
> locale file; dealing with the <Uxxxx> notation was a nightmare. Working with
> a string like "h<U00E9>tf<U0151>" is already much better than
> "<U0068><U00E9><U0074><U0066><U0151>", but seeing "hétfő" would be ideal.

Yes, I also found the <Uxxxx> annoying when browsing the files; it
makes it much harder to spot errors.
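
For proofreading outside an editor, a small Python sketch like the one below
can expand <Uxxxx> sequences into the characters they name. The function name
expand_ucs is hypothetical, and the sketch assumes every token holds a valid
Unicode code point (chr() raises ValueError above U+10FFFF).

```python
import re

# One symbolic code point such as <U0151>; the hex digits are captured.
UCS_TOKEN = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def expand_ucs(text: str) -> str:
    """Replace each <Uxxxx> token with the character it names."""
    return UCS_TOKEN.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    print(expand_ucs("<U0068><U00E9><U0074><U0066><U0151>"))
    print(expand_ucs("h<U00E9>tf<U0151>"))
```

Both the fully symbolic and the mixed spelling of the example quoted above
expand to the same readable string.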

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (31 preceding siblings ...)
  2017-11-15 10:15 ` maiku.fabian at gmail dot com
@ 2017-11-15 10:36 ` schwab@linux-m68k.org
  2017-11-16 23:39 ` keld at keldix dot com
  33 siblings, 0 replies; 42+ messages in thread
From: schwab@linux-m68k.org @ 2017-11-15 10:36 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #30 from Andreas Schwab <schwab@linux-m68k.org> ---
> Yes, I also found the <Uxxxx> annoying when browsing the files; it
> makes it much harder to spot errors.

Try this:

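  ;; Display each <Uxxxx> sequence as the character it encodes
  ;; (the buffer text itself is left unchanged).
  (font-lock-add-keywords nil
     '(("<U\\(....\\)>"
        (0 (progn (compose-region (match-beginning 0) (match-end 0)
                  (string-to-number (match-string 1) 16)))))))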

* Re: [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-15 10:15 ` maiku.fabian at gmail dot com
@ 2017-11-16 23:39   ` Keld Simonsen
  0 siblings, 0 replies; 42+ messages in thread
From: Keld Simonsen @ 2017-11-16 23:39 UTC (permalink / raw)
  To: maiku.fabian at gmail dot com; +Cc: libc-locales

On Wed, Nov 15, 2017 at 10:15:31AM +0000, maiku.fabian at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #29 from Mike FABIAN <maiku.fabian at gmail dot com> ---
> (In reply to Egmont Koblinger from comment #27)
> > (In reply to keld@keldix.com from comment #25)
> > 
> > > This commit is highly problematic, damaging the portability of glibc locales.
> > 
> > If this kind of portability is really a concern, someone could come up with
> > a script that converts from the new version to the old one. It could even be
> > integrated with the build system to the level where these generated files
> > are actually placed under BUILD and then further processed.
> 
> Yes, if that is really a concern, we could easily convert it to different
> formats.
> I really doubt that this can cause problems though. If the file contained
> “<a>”, one still has to be able to read the ASCII characters “<”, “a”, and “>”
> to interpret the file. I don’t see anything which is lost by just writing “a”
> instead. If one cannot read an ASCII file, one would not be able to read the
> keywords in the file either. So if something other than ASCII, like EBCDIC,
> is needed, one would need some conversion anyway. Using “a” instead of “<a>”
> does not make such conversion any harder.

I have explained earlier that not using symbolic character names will generate
wrong results in situations where the source and target coded character sets have
different encodings of ASCII characters.

The locales as they have come from my hand even preserve portability when some
characters in the ASCII character set have different encodings, which happens
on EBCDIC systems with different national EBCDIC character sets. These are still
in use on big banking and aviation systems AFAIK.
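
To make the point about EBCDIC variants concrete, the Python sketch below lists
the printable ASCII characters whose byte value differs between two EBCDIC code
pages. It uses Python's built-in cp037 and cp500 codecs purely as examples; the
function name differing_ascii is hypothetical. It shows only that the encodings
differ for some characters, not how any locale compiler behaves on such
systems.

```python
import string


def differing_ascii(codec_a: str, codec_b: str) -> list[str]:
    """Return printable ASCII characters encoded differently by two codecs."""
    printable = string.ascii_letters + string.digits + string.punctuation + " "
    return [
        ch
        for ch in printable
        if ch.encode(codec_a) != ch.encode(codec_b)
    ]


if __name__ == "__main__":
    # Letters and digits agree between these two code pages, while a few
    # punctuation characters such as "[" do not.
    print(differing_ascii("cp037", "cp500"))
```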

As an editor of multiple ISO standards on POSIX/Linux locales I do strive for
general specs and portability. I can understand that this is not an issue for
glibc people. I have just been happy that glibc has been using the ISO specs,
and that I as ISO editor could use the glibc specs in return. This is not the
case anymore with the recent patch.

I do have a great concern for the readability of the locales. That is why I made
an elaborate set of symbolic character names that were much easier to proofread
than the <Uxxxx> names, such as the <a> and Greek <a*> names, Japanese kana,
Arabic, Hebrew etc. Thus the locales were both portable over almost all known
platforms, and readable to some extent. I was quite happy when I saw that the
Arabic name for the 10th month was something like "octobr" - it meant that I,
as someone who could not read Arabic at all, could write and maintain an Arabic
locale with some confidence.

Also, I cannot edit Japanese or Arabic characters in UTF-8, as I don't know them,
and I think this is also the case for many maintainers of glibc locales. They may
be fluent in their own locale, but locales from other cultures may be beyond their
capability to edit in raw UTF-8.

I wish that we could have some arrangement so that we can have mutual exchange
of locale specs again.

Best regards
keld

* [Bug localedata/22387] Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range
  2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
                   ` (32 preceding siblings ...)
  2017-11-15 10:36 ` schwab@linux-m68k.org
@ 2017-11-16 23:39 ` keld at keldix dot com
  33 siblings, 0 replies; 42+ messages in thread
From: keld at keldix dot com @ 2017-11-16 23:39 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=22387

--- Comment #31 from keld at keldix dot com <keld at keldix dot com> ---
On Wed, Nov 15, 2017 at 10:15:31AM +0000, maiku.fabian at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=22387
> 
> --- Comment #29 from Mike FABIAN <maiku.fabian at gmail dot com> ---
> (In reply to Egmont Koblinger from comment #27)
> > (In reply to keld@keldix.com from comment #25)
> > 
> > > This commit is highly problematic, damaging the portability of glibc locales.
> > 
> > If this kind of portability is really a concern, someone could come up with
> > a script that converts from the new version to the old one. It could even be
> > integrated with the build system to the level where these generated files
> > are actually placed under BUILD and then further processed.
> 
> Yes, if that is really a concern, we could easily convert it to different
> formats.
> I really doubt that this can cause problems though. If the file contained
> “<a>”, one still has to be able to read the ASCII characters “<”, “a”, and “>”
> to interpret the file. I don’t see anything which is lost by just writing “a”
> instead. If one cannot read an ASCII file, one would not be able to read the
> keywords in the file either. So if something other than ASCII, like EBCDIC,
> is needed, one would need some conversion anyway. Using “a” instead of “<a>”
> does not make such conversion any harder.

I have explained earlier that not using symbolic character names will generate
wrong results in situations where the source and target coded character sets have
different encodings of ASCII characters.

The locales as they have come from my hand even preserve portability when some
characters in the ASCII character set have different encodings, which happens
on EBCDIC systems with different national EBCDIC character sets. These are still
in use on big banking and aviation systems AFAIK.

As an editor of multiple ISO standards on POSIX/Linux locales I do strive for
general specs and portability. I can understand that this is not an issue for
glibc people. I have just been happy that glibc has been using the ISO specs,
and that I as ISO editor could use the glibc specs in return. This is not the
case anymore with the recent patch.

I do have a great concern for the readability of the locales. That is why I made
an elaborate set of symbolic character names that were much easier to proofread
than the <Uxxxx> names, such as the <a> and Greek <a*> names, Japanese kana,
Arabic, Hebrew etc. Thus the locales were both portable over almost all known
platforms, and readable to some extent. I was quite happy when I saw that the
Arabic name for the 10th month was something like "octobr" - it meant that I,
as someone who could not read Arabic at all, could write and maintain an Arabic
locale with some confidence.

Also, I cannot edit Japanese or Arabic characters in UTF-8, as I don't know them,
and I think this is also the case for many maintainers of glibc locales. They may
be fluent in their own locale, but locales from other cultures may be beyond their
capability to edit in raw UTF-8.

I wish that we could have some arrangement so that we can have mutual exchange
of locale specs again.

Best regards
keld

end of thread, other threads:[~2017-11-16 23:39 UTC | newest]

Thread overview: 42+ messages
2017-11-02 14:00 [Bug localedata/22387] New: Replace unicode sequences <Uxxxx> for characters inside the ASCII printable range claude at 2xlibre dot net
2017-11-02 14:15 ` [Bug localedata/22387] " claude at 2xlibre dot net
2017-11-02 23:34   ` Keld Simonsen
2017-11-02 16:54 ` maiku.fabian at gmail dot com
2017-11-02 17:00 ` maiku.fabian at gmail dot com
2017-11-02 17:06 ` claude at 2xlibre dot net
2017-11-02 17:21 ` claude at 2xlibre dot net
2017-11-02 17:38 ` schwab@linux-m68k.org
2017-11-02 20:40 ` egmont at gmail dot com
2017-11-02 23:34 ` keld at keldix dot com
2017-11-03  3:12 ` carlos at redhat dot com
2017-11-07 23:54   ` Keld Simonsen
2017-11-08 19:59   ` Keld Simonsen
2017-11-03  6:49 ` schwab@linux-m68k.org
2017-11-03  9:52 ` egmont at gmail dot com
2017-11-03  9:56 ` egmont at gmail dot com
2017-11-09 10:18   ` Keld Simonsen
2017-11-03 15:43 ` maiku.fabian at gmail dot com
2017-11-03 15:51 ` egmont at gmail dot com
2017-11-03 16:49 ` joseph at codesourcery dot com
2017-11-06 13:23 ` claude at 2xlibre dot net
2017-11-06 13:26 ` claude at 2xlibre dot net
2017-11-07 15:12 ` claude at 2xlibre dot net
2017-11-07 16:57 ` piotrdrag at gmail dot com
2017-11-07 23:54 ` keld at keldix dot com
2017-11-08 20:00 ` keld at keldix dot com
2017-11-09 10:19 ` keld at keldix dot com
2017-11-09 16:31 ` joseph at codesourcery dot com
2017-11-14  8:11 ` cvs-commit at gcc dot gnu.org
2017-11-14  8:13 ` maiku.fabian at gmail dot com
2017-11-14  8:17 ` claude at 2xlibre dot net
2017-11-14 13:02   ` Keld Simonsen
2017-11-14 13:02 ` keld at keldix dot com
2017-11-14 13:06   ` Keld Simonsen
2017-11-14 13:07 ` keld at keldix dot com
2017-11-14 13:19 ` egmont at gmail dot com
2017-11-14 20:25 ` jwilk at jwilk dot net
2017-11-14 23:32 ` carlos at redhat dot com
2017-11-15 10:15 ` maiku.fabian at gmail dot com
2017-11-16 23:39   ` Keld Simonsen
2017-11-15 10:36 ` schwab@linux-m68k.org
2017-11-16 23:39 ` keld at keldix dot com
