public inbox for libc-locales@sourceware.org
* [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth
@ 2017-07-11 14:18 tg at mirbsd dot de
  2017-07-12 11:01 ` [Bug localedata/21750] " tjk at tksoft dot com
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: tg at mirbsd dot de @ 2017-07-11 14:18 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

            Bug ID: 21750
           Summary: column width of characters incompatible with classical
                    wcwidth
           Product: glibc
           Version: 2.26
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: tg at mirbsd dot de
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

I’ve compared the new autogenerated column width from
localedata/unicode-gen/utf8_gen.py with the results of the classical wcwidth()
implementation from xterm (adjusted to Unicode 10.0.0) and found a few
divergences, as well as some bugs on my side, which I fixed (MirBSD uses
something based on xterm’s data system-wide).

1. U+00AD is forced to width 1 in xterm, autodetected as combining in glibc

Rationale for forcing it to 1 is likely that U+0000‥U+00FF are latin1, which,
when displayed as 8bit on terminals, had no combining characters at all.

Change Request to glibc: force U+00AD to width 1.

2. The UCD has three codepoints that are Me/Mn category but not NSM bidi class:
U+0CBF U+0CC6 U-00011C3F

This is likely a bug in UCD but can be fixed by glibc treating Me/Mn the same
as Cf/NSM, which I do.

Change Request to glibc: handle Me/Mn category the same as NSM bidi class.

3. Hangul Jamo medial vowels and final consonants are set to 0 by xterm so they
combine on top of the preceding initial ones: U+1160‥U+11FF

Change Request to glibc: force U+1160‥U+11FF to width 0.

4. During parsing, EastAsianWidth data overrides UCD data, more specifically
the NSM property.

This leads to U+302A‥U+302D and – see also
https://sourceware.org/bugzilla/show_bug.cgi?id=19852 – U+3099 and U+309A being
treated as width 2.

Change Request to glibc: read EAW before UCD so the NSM overrides EAW here.

5. Ambiguous circled numbers and neutral hexagrams changed width

xterm used to set those to width 2, likely because they are ideographs and not
unlike zodiac signs and emoji (which, I notice, have been set to width 2 in UCD
nowadays)

Change Request to glibc: force U+3248‥U+324F and U+4DC0‥U+4DFF to width 2.


Note: I’ve initially reported the surprising change to Debian as
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=826256 but have redone the
research today (against 2.24 in Debian and git master commit
2a91300176a5991d9825eba085e502196a3f47cd in glibc) against Unicode 10,
double-checked *all* differences against MirBSD code and fixed a few bugs there
after making it possible to compare the results (considering glibc only puts
actually assigned codepoints into the localedata/charmaps/UTF-8 file).

Rationale for requesting the change in glibc is so that all systems I have
access to use the same width data, preventing display artifacts and glitches up
to making an editor somewhat unusable with heavy Unicode (I have test files
containing the entire Unicode range). Thank you for listening.

If necessary, I will provide patches (to utf8_gen.py most likely) when asked.
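
The five change requests above can be read as one priority order: forced
ranges first, then the combining check, then East Asian Width. The following
is a minimal sketch of that order, not the code in utf8_gen.py. It assumes
Python's standard unicodedata module as the data source (whose Unicode version
depends on the Python release), and the names FORCED_WIDTHS and column_width
are hypothetical. It ignores control characters and unassigned code points,
which a real wcwidth() has to handle separately.

```python
import unicodedata

# Hypothetical table of the forced widths requested in items 1, 3 and 5.
FORCED_WIDTHS = [
    (0x00AD, 0x00AD, 1),  # soft hyphen (item 1)
    (0x1160, 0x11FF, 0),  # Hangul Jamo medial vowels and final consonants (item 3)
    (0x3248, 0x324F, 2),  # ambiguous circled numbers (item 5)
    (0x4DC0, 0x4DFF, 2),  # hexagrams (item 5)
]


def column_width(cp):
    """Return an illustrative column width (0, 1 or 2) for code point cp."""
    # Forced ranges win over anything derived from the Unicode data.
    for first, last, width in FORCED_WIDTHS:
        if first <= cp <= last:
            return width
    ch = chr(cp)
    # Item 2: treat category Me/Mn the same as Cf/NSM.
    # Item 4: this check runs before East Asian Width, so NSM overrides it.
    if (unicodedata.category(ch) in ("Me", "Mn", "Cf")
            or unicodedata.bidirectional(ch) == "NSM"):
        return 0
    # Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1
```

With this order, U+3099 comes out as width 0 even though its East Asian Width
is W, and U+0CBF comes out as width 0 even though its bidi class is not NSM.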

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: tjk at tksoft dot com @ 2017-07-12 11:01 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #1 from Troy Korjuslommi <tjk at tksoft dot com> ---
Excuse my ignorance, but isn't U+00AD (soft hyphen) usually invisible,
i.e. zero columns? If an app breaks up words at end-of-line, it can use
the soft hyphens as helpers to detect the correct locations. The app can
then add a visible hyphen to the end of the line. (If the app also reads
from the terminal, then it can e.g. ignore visible hyphens when preceded
by a soft hyphen, or use some other mechanism to mark the character as
for terminal display only).

I am not suggesting a change, if xterm etc. multitude of apps are
already handling soft hyphens in some other manner, just wondering.

Troy





* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: tg at mirbsd dot de @ 2017-07-12 13:39 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #2 from Thorsten Glaser <tg at mirbsd dot de> ---
(In reply to Troy Korjuslommi from comment #1)
> Excuse my ignorance, but isn't U+00AD (soft hyphen) usually invisible,
> i.e. zero columns? If an app breaks up words at end-of-line, it can use
> the soft hyphens as helpers to detect the correct locations. The app can

Yes, in theory. This codepoint could be used in the *input data* to
determine soft breaks. However (see below) they should *not* output
those to a terminal emulator (GUIs that handle this themselves are
likely fine).

> I am not suggesting a change, if xterm etc. multitude of apps are
> already handling soft hyphens in some other manner, just wondering.

However, as with U+0060 (the grave accent 「`」), terminal emulators
have treated both ASCII (for U+0060) and 8-bit codepages like
ISO 8859-1 (for U+00AD) as giving each (non-control) character a constant
width of 1 (for SBCS), and xterm’s wcwidth() code had special handling
to force U+00AD to 1:

/*
 […]
 *    - SOFT HYPHEN (U+00AD) has a column width of 1.
 […]
 */
[…]
  /* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */

Source:
http://www.mirbsd.org/cvs.cgi/X11/xc/programs/xterm/wcwidth.c?rev=1.1.103.1;content-type=text%2Fplain


So you’d want to output U+0060 U+0008 U+0061 (` + backspace + a) to get à on a
(printed) terminal (or in code that uses such to emulate them), and similarly,
strip soft hyphens from the output (or manifest them as regular ones) before
outputting a soft-wrapped text. This is mostly because the terminal emulator
will not soft-wrap either; it will break at the end of the line, so you’d
convert U+00AD to some kind of hyphen (hyphen-minus or U+2010 perhaps) followed
by a line break(⚠) if preparing something for terminal output.
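
The preparation step described above could look roughly like the following.
This is a sketch only: the function names are hypothetical, and it assumes
every remaining character occupies exactly one column (len() is used as the
width), which does not hold for wide or combining characters.

```python
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text):
    """Remove all soft hyphens before sending text to a terminal."""
    return text.replace(SOFT_HYPHEN, "")


def split_at_soft_hyphen(word, room):
    """Break word at the last soft hyphen whose head fits in room columns.

    Returns (head, tail). The head ends in a visible hyphen-minus; the tail
    keeps its remaining soft hyphens for later breaks. If no break fits,
    head is empty and tail is the unchanged word.
    """
    parts = word.split(SOFT_HYPHEN)
    # Try the longest possible head first.
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i]) + "-"
        if len(head) <= room:
            return head, SOFT_HYPHEN.join(parts[i:])
    return "", word
```

A caller would emit the head, a line break, and then continue with the tail,
calling strip_soft_hyphens() on whatever is finally written out unbroken.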


I’ve noticed the incompatibilities especially when the hexagrams, one of which
I’m using for UI purposes, changed width, and tried to discover all of them in
order to harmonise the width assumptions the various programs I have access to
use on all systems I use, with classical xterm wcwidth.c as base, since those
widths are the domain of a fixed-cell terminal emulator more than something
else (which can use its own data, if necessary).

I do volunteer to provide patches, here and elsewhere, so that, with the same
UCD version as input, we get consistent output (and I’ve sanity-checked the
output I got before opening this report).


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: tg at mirbsd dot de @ 2017-07-14 12:04 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #3 from Thorsten Glaser <tg at mirbsd dot de> ---
Created attachment 10257
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10257&action=edit
tarball of “git am”able patches

I’ve done the patches and compared the output, which Looks Good To Me™. Please
apply.


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: maiku.fabian at gmail dot com @ 2017-08-15 13:11 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com
           Assignee|unassigned at sourceware dot org   |maiku.fabian at gmail dot com


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: maiku.fabian at gmail dot com @ 2017-08-16  7:54 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2017-08-16
     Ever confirmed|0                           |1


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: maiku.fabian at gmail dot com @ 2017-08-16 14:17 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #5 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Summary of the chatlog in the last comment

https://sourceware.org/bugzilla/show_bug.cgi?id=21750#c4

in English:

Thorsten and I agree on setting the width of U+3248..U+324F
to 2 because the glyphs for these characters are square in
most fonts.

(I also asked on the
Unicode mailing list now whether this could be a bug in
the Unicode data:
http://www.unicode.org/mail-arch/unicode-ml/y2017-m08/0007.html
But even if it is not a bug, setting these to 2 seems to be
much better for users of terminals and that is what wcwidth
in glibc is mostly used for after all).

We also agree on setting the width of the hexagrams U+4DC0..U+4DFF
to 2, because they are considerably wider than single width in most
fonts. In some classic Xorg fonts they are fully double width. In most
scalable fonts they are somewhat narrower than double width but
considerably wider than single width. So marking them as width 1 would
cause problems in terminals; even if they are not fully double width,
it makes sense to mark them as width 2 because they certainly
won’t fit in a single character cell in a terminal.

We also agree that the Hangul Jamo U+1160‥U+11FF are sort
of "combining characters" although they are not marked as such
in the Unicode data. But they are fragments of Hangul characters
which combine. So it seems correct to mark them as width 0.


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: maiku.fabian at gmail dot com @ 2017-08-16 15:28 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #6 from Mike FABIAN <maiku.fabian at gmail dot com> ---
And we also agree that setting the width of the soft hyphen U+00AD
to 0, as in Unicode, does not seem helpful for terminal
applications. As wcwidth is mostly important for terminal
applications, it makes sense to keep the width of U+00AD set to
1, as it "historically" always was in wcwidth.


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: egmont at gmail dot com @ 2017-08-16 18:18 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com

--- Comment #7 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Mike FABIAN from comment #5)

> [...] setting these to 2 seems to be
> much better for users of terminals and that is what wcwidth
> in glibc is mostly used for after all).

Guys,

With huge thanks and great respect for your work on addressing these
issues, please allow me to firmly oppose deviating from the Unicode
database.

The width is probably indeed primarily used by terminal emulators and apps
running inside. They, however, use all kinds of various sources for this data,
not just glibc's wcwidth().

For example, VTE-based emulators (such as GNOME Terminal) rely on glib's
g_unichar_iswide(), see [1]. Alas I don't have any usage metrics, but the poll
at [2] suggests that VTE's usage share amongst terminal emulators on Linux might
be somewhere in the ballpark of 50%.

As for apps, if my memories are correct, I believe Vim uses its own built-in
database rather than wcwidth(). So does the Joe text editor [3] (okay, it's a
really marginal one), and presumably many more apps.

Let alone all other non-glibc based systems with their own wcwidth()
implementation that one might ssh to/from.

For apps inside terminal emulators to work correctly, it's crucial that all the
relevant components agree on the width. This has caused quite a headache when
Unicode 9.0 changed the width of plenty of codepoints, see e.g. the bugreport
with animgif at [4] (and tons of duplicates in other bugzillas and
stackoverflow forums), but this is going to fade away as eventually everyone's
upgrading their Unicode version.

You cannot, however, reasonably assume that other folks out there, i.e.
terminal emulators as well as applications that don't rely on wcwidth() but
some other data source, or those other data sources such as glib and probably a
whole lot more, are all going to apply your modifications. And then again we
haven't talked about ssh'ing to/from non-glibc systems.

If a certain glyph does not fit in its designated character cell, most terminal
emulators will overflow it to the next cell. A slight overflow happens at way
more codepoints than the ones debated now, e.g. in case of VTE and a not too
large font, even the antialiasing of English letters such as 'W', 'm' overflows
to the next cell. Of course I understand that the overflow in case of U+3248
"㉈" and friends is way more prominent, potentially causing the given and the
subsequent glyph not to be readable at all, which is indeed bad.

But causing the entire canvas's contents to fall apart is even worse. And
that's what typically happens when players of the game disagree on the width,
as seen e.g. again at [4].

If you'd really like to see these particular codepoints becoming double wide
(which I'm also in favor of), I firmly believe this change should be made in
the Unicode database first, so that eventually everyone implementing a
wcwidth()-like method gets that update; rather than just glibc, resulting in a
long-term disagreement between parties and in turn inevitable corruption of the
entire terminal window in quite a few terminal emulators and apps.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=772890
[2] https://opensource.com/life/15/11/top-open-source-terminal-emulators
[3] https://sourceforge.net/p/joe-editor/bugs/363/
[4] https://github.com/powerline/powerline/issues/1652
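
Whether two components agree on widths can be checked mechanically by running
both width functions over the same code points and listing where they differ.
The sketch below shows the idea with two toy width functions; all names are
hypothetical, and the toy tables stand in for real sources such as glibc's
wcwidth() or glib's g_unichar_iswide(), which are not called here.

```python
def width_disagreements(width_a, width_b, codepoints):
    """Yield (codepoint, a, b) for each code point where the widths differ."""
    for cp in codepoints:
        a, b = width_a(cp), width_b(cp)
        if a != b:
            yield cp, a, b


def group_ranges(codepoints):
    """Collapse ascending code points into inclusive (first, last) ranges."""
    ranges = []
    for cp in codepoints:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


def toy_plain(cp):
    """Toy source that treats every code point as single width."""
    return 1


def toy_patched(cp):
    """Toy source that forces U+3248..U+324F to double width."""
    return 2 if 0x3248 <= cp <= 0x324F else 1


diffs = list(width_disagreements(toy_plain, toy_patched, range(0x3240, 0x3260)))
diff_ranges = group_ranges(cp for cp, _, _ in diffs)
```

Here diff_ranges holds the single range U+3248..U+324F, which is the kind of
list a terminal emulator and an application would need to reconcile.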


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: tg at mirbsd dot de @ 2017-08-17  9:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #8 from Thorsten Glaser <tg at mirbsd dot de> ---
Hi Egmont,

only a short response because we have FrOSCon/FrogLabs preparations and
workshop until Monday:

We’re not strictly speaking deviating from UCD because UCD does *not* define
wcwidth.

Terminal emulators use wcwidth, especially xterm uses ONLY it *and* defines it.

Applications such as editors in the terminal (cf. jupp) use wcwidth or carry
their own data which is prepared the same way as wcwidth (often they use a copy
of xterm's code).

You speak of compatibility and breaking. Strictly speaking, the switch glibc
recently made (two or three major releases ago, I think) to regenerated data
*did* break applications, and this bugreport is 100% about returning the glibc
data to the way it was before in the places where the previous change
introduced bugs, while still keeping it up-to-date with recent Unicode.

So, therefore, with this patch applied, fewer things will break than without.

Outliers like libglib (used by only one of the multitude of terminal emulators)
can then import the data (and the mechanism used to generate it) from here.

Other systems use the old wcwidth code from xterm, with which this one (with my
patches applied) is compatible for all chars that did not get changed in or
added to Unicode. That is the maximum compatibility, and an easily achievable
goal that should be achieved.


* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
From: cvs-commit at gcc dot gnu.org @ 2017-08-17  9:16 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #9 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  bb6274ee1293a6bc76d9d7c889783303de181295 (commit)
       via  c14b84baae83bfb73f7cd00ba7c24964ad1c712c (commit)
       via  7a79e321c6f85b204036c33d85f6b2aa794e7c76 (commit)
       via  267ee5d7ab57591a6b1bc2d2a010c88188427063 (commit)
       via  41b6f0ce85d98c62739b04863e8c38a1f4154e80 (commit)
       via  580be3035d2e0f479c4ac955bf719b0bf936f5cf (commit)
      from  038d1cafafb3094a9fbebd35f4aa8d0ebae0e55b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bb6274ee1293a6bc76d9d7c889783303de181295

commit bb6274ee1293a6bc76d9d7c889783303de181295
Author: Akhilesh Kumar <akhilesh.k@samsung.com>
Date:   Wed Aug 16 15:33:58 2017 +0530

    Fix abmon for bem_ZM

    Until now the abbreviated month names were in English.

        [BZ #21960]
        * locales/bem_ZM (LC_TIME): Fix abmon, make it agree with CLDR.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c14b84baae83bfb73f7cd00ba7c24964ad1c712c

commit c14b84baae83bfb73f7cd00ba7c24964ad1c712c
Author: Akhilesh Kumar <akhilesh.k@samsung.com>
Date:   Wed Aug 16 18:01:53 2017 +0530

    Fix country name for xh_ZA

        [BZ #21959]
        * locales/xh_ZA (LC_ADDRESS): Fix country name.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7a79e321c6f85b204036c33d85f6b2aa794e7c76

commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:50 2017 +0200

    Refresh generated charmap data and ChangeLog

        [BZ #21750]
        * charmaps/UTF-8: Refresh.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=267ee5d7ab57591a6b1bc2d2a010c88188427063

commit 267ee5d7ab57591a6b1bc2d2a010c88188427063
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:46 2017 +0200

    Resolve some historically special cases of ambiguous width

    [BZ #21750]
    * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
    * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
    * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
    * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=41b6f0ce85d98c62739b04863e8c38a1f4154e80

commit 41b6f0ce85d98c62739b04863e8c38a1f4154e80
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:44 2017 +0200

    Handle more cases of combining characters

    [BZ #21750]
    * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=580be3035d2e0f479c4ac955bf719b0bf936f5cf

commit 580be3035d2e0f479c4ac955bf719b0bf936f5cf
Author: Thorsten Glaser <tg@mirbsd.de>
Date:   Fri Jul 14 14:02:37 2017 +0200

    UnicodeData has precedence over EastAsianWidth

    [BZ #19852]
    [BZ #21750]
    * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
      UnicodeData lines so the latter have precedence; remove hack
      to group output by EastAsianWidth ranges.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog               |   24 +
 localedata/charmaps/UTF-8          |111468 +++++++++++++++++++++++++++++++++++-
 localedata/locales/bem_ZM          |   25 +-
 localedata/locales/xh_ZA           |    5 +-
 localedata/unicode-gen/utf8_gen.py |   38 +-
 5 files changed, 111400 insertions(+), 160 deletions(-)
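
The ordering described in the log entries above (EastAsianWidth first, then UnicodeData so that it takes precedence, then the explicit special cases) can be sketched roughly as follows. This is only an illustration, not the code in utf8_gen.py: the function names and the tiny sample tables are hypothetical, and real input would come from the UCD files.

```python
# Hypothetical miniature of the precedence described in the ChangeLog entries.
# Sample data and helper names are illustrative, not taken from utf8_gen.py.

# (code point, East_Asian_Width value), as if parsed from EastAsianWidth.txt
EAST_ASIAN_WIDTH = [(0x0041, "Na"), (0x302A, "W"), (0x3042, "W"), (0xFF21, "F")]

# (code point, general category, bidi class), as if parsed from UnicodeData.txt
UNICODE_DATA = [
    (0x0041, "Lu", "L"),
    (0x00AD, "Cf", "BN"),
    (0x0CBF, "Mn", "L"),    # Mn category but not NSM bidi class
    (0x302A, "Mn", "NSM"),  # Wide in EastAsianWidth, yet combining
]

# Special cases named in the ChangeLog: (first, last, width)
OVERRIDES = [
    (0x00AD, 0x00AD, 1),
    (0x1160, 0x11FF, 0),
    (0x3248, 0x324F, 2),
    (0x4DC0, 0x4DFF, 2),
]


def build_width_table():
    """Return a dict mapping code point -> width for non-default widths."""
    widths = {}
    # Pass 1: East Asian Width; Wide and Fullwidth take two columns.
    for cp, eaw in EAST_ASIAN_WIDTH:
        if eaw in ("W", "F"):
            widths[cp] = 2
    # Pass 2: UnicodeData runs later, so its result overwrites pass 1.
    for cp, category, bidi in UNICODE_DATA:
        if category in ("Me", "Mn", "Cf") or bidi == "NSM":
            widths[cp] = 0
    # Pass 3: explicit special cases win over both data files.
    for first, last, width in OVERRIDES:
        for cp in range(first, last + 1):
            widths[cp] = width
    return widths


def width_of(table, cp):
    """Code points absent from the table default to width 1."""
    return table.get(cp, 1)
```

With this sample data, U+302A ends up with width 0 even though EastAsianWidth marks it Wide, and U+00AD ends up with width 1 even though its category is Cf.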

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (10 preceding siblings ...)
  2017-08-17  9:16 ` cvs-commit at gcc dot gnu.org
@ 2017-08-17 13:51 ` maiku.fabian at gmail dot com
  2017-08-18  7:29 ` schwab@linux-m68k.org
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-08-17 13:51 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |2.27

--- Comment #10 from Mike FABIAN <maiku.fabian at gmail dot com> ---
FIXED.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (11 preceding siblings ...)
  2017-08-17 13:51 ` maiku.fabian at gmail dot com
@ 2017-08-18  7:29 ` schwab@linux-m68k.org
  2017-08-18 11:04 ` egmont at gmail dot com
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: schwab@linux-m68k.org @ 2017-08-18  7:29 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #11 from Andreas Schwab <schwab@linux-m68k.org> ---
Thorsten Glaser does not have an assignment for glibc on file, we cannot accept
his contributions until this is sorted out.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (12 preceding siblings ...)
  2017-08-18  7:29 ` schwab@linux-m68k.org
@ 2017-08-18 11:04 ` egmont at gmail dot com
  2017-08-21  7:24 ` cvs-commit at gcc dot gnu.org
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: egmont at gmail dot com @ 2017-08-18 11:04 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #12 from Egmont Koblinger <egmont at gmail dot com> ---
(In reply to Thorsten Glaser from comment #8)

> We’re not strictly speaking deviating from UCD because UCD does *not* define
> wcwidth.

Well, it defines the East_Asian_Width property from which you derive wcwidth
using a couple of generic rules plus a few exceptions to them.

You've just (re?)added 3248..324F and a few other ranges to these exceptions,
which in my eyes means that yes, you are deviating from Unicode.

> Terminal emulators use wcwidth, especially xterm uses ONLY it *and* defines
> it.
> 
> Applications such as editors in the terminal (cf. jupp) use wcwidth or carry
> their own data which is prepared the same way as wcwidth (often they use a
> copy of xterm's code).

To be more precise, xterm and a few others copy Markus Kuhn's implementation. I
don't think anyone copies from xterm.

This defines the 3248..324F range as ambiguous (I've checked the most recent
xterm-330 and a randomly chosen ~4 year old xterm-300; a randomly picked even
older xterm-260 is different, which suggests that xterm caught up with the
changes long ago), which, by default, means it is 1 cell wide in xterm (unless
-cjk_width is specified, in which case it and all other ambiguous ones are
turned into double)...

> You speak of compatibility and breaking. Strictly speaking, the switch glibc
> recently (two or three majors, I think) did to regenerated data *did* break
> applications, and this bugreport is 100% returning the glibc data to the way
> it was before in the places the previous change introduced bugs, while still
> keeping it up-to-date with recent Unicode.
> 
> So, therefore, with this patch applied, less things will break than without.

... so I absolutely don't get why less things would be broken now. As far as I
can see, with this patch you have just further broken the handling of these
codepoints by deviating from Unicode and from xterm.

> Outlyers like libglib (used by only one of the multitude of terminal
> emulators) can then import the data (and mechanism used to generate) from
> here.

You really don't seriously expect that two glibc maintainers decide over a chat
that they add a few exceptions to the generic rules, and "outlyers" (like glib,
maybe Qt, maybe Java, maybe some other "giant" pieces of (perhaps commercial)
software, maybe other libc implementations of other Unices (like Mac), maybe a
whole lot more) will follow; do you??

(And on a side note... IMHO submitting a change right after someone brings up
some concerns, not even giving time for a reasonable discussion, isn't really a
polite thing... Especially since recently it took me about 2 years and about
10-15 unanswered pings to get a well unit-tested locale change through, I
can't understand the hurry now.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (13 preceding siblings ...)
  2017-08-18 11:04 ` egmont at gmail dot com
@ 2017-08-21  7:24 ` cvs-commit at gcc dot gnu.org
  2017-09-03 16:35 ` vapier at gentoo dot org
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2017-08-21  7:24 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #13 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  486afa6d27156665959e59b86e7aad18c3832cbe (commit)
      from  a3fe6a20bf81ef6a97a761dac9050517e7fd7a1f (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=486afa6d27156665959e59b86e7aad18c3832cbe

commit 486afa6d27156665959e59b86e7aad18c3832cbe
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Fri Aug 18 13:41:34 2017 +0200

    Use the range notation in charmaps/UTF-8 for all ranges of neighbouring characters with the same width

        [BZ #21750]
        * charmaps/UTF-8: Use the range notation for all ranges
        of neighbouring characters with the same width.

-----------------------------------------------------------------------

Summary of changes:
 localedata/ChangeLog      |    6 +
 localedata/charmaps/UTF-8 |113545 +--------------------------------------------
 2 files changed, 300 insertions(+), 113251 deletions(-)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (14 preceding siblings ...)
  2017-08-21  7:24 ` cvs-commit at gcc dot gnu.org
@ 2017-09-03 16:35 ` vapier at gentoo dot org
  2017-09-03 20:43 ` vapier at gentoo dot org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: vapier at gentoo dot org @ 2017-09-03 16:35 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #14 from Mike Frysinger <vapier at gentoo dot org> ---
this bug report has a lot of things in it.  i think each request in the
original post should be split out into sep reports.  other than overall
discussion about keeping things in sync, it's impossible to follow discussion
about specific codepoints.

wrt U+00AD: https://www.cs.tut.fi/~jkorpela/shy.html

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (15 preceding siblings ...)
  2017-09-03 16:35 ` vapier at gentoo dot org
@ 2017-09-03 20:43 ` vapier at gentoo dot org
  2017-09-03 21:03 ` vapier at gentoo dot org
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: vapier at gentoo dot org @ 2017-09-03 20:43 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |22073


Referenced Bugs:

https://sourceware.org/bugzilla/show_bug.cgi?id=22073
[Bug 22073] charmaps/UTF-8: wcwidth of U+00AD: 0 or 1 ?
-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (16 preceding siblings ...)
  2017-09-03 20:43 ` vapier at gentoo dot org
@ 2017-09-03 21:03 ` vapier at gentoo dot org
  2017-09-03 21:32 ` vapier at gentoo dot org
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: vapier at gentoo dot org @ 2017-09-03 21:03 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |22074


Referenced Bugs:

https://sourceware.org/bugzilla/show_bug.cgi?id=22074
[Bug 22074] charmaps/UTF-8: wcwidth for U+1160-U+11FF (Hangul Jungseong and
Jongseong) should be 0
-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (17 preceding siblings ...)
  2017-09-03 21:03 ` vapier at gentoo dot org
@ 2017-09-03 21:32 ` vapier at gentoo dot org
  2017-09-04 14:33 ` maiku.fabian at gmail dot com
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: vapier at gentoo dot org @ 2017-09-03 21:32 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #15 from Mike Frysinger <vapier at gentoo dot org> ---
i've forked soft hyphen (U+00AD) into bug 22073 and Hangul Jamo into bug 22074.
 feel free to take follow ups for those topics to those respective bugs so the
discussion can stay focused and not get cluttered up.

i haven't looked into the other codepoints raised in the original comment, so
if they aren't resolved, feel free to fork them out too.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (18 preceding siblings ...)
  2017-09-03 21:32 ` vapier at gentoo dot org
@ 2017-09-04 14:33 ` maiku.fabian at gmail dot com
  2017-09-06 13:06 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-09-04 14:33 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #16 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mike Frysinger from comment #15)
> i've forked soft hyphen (U+00AD) into bug 22073 and Hangul Jamo into bug
> 22074.  feel free to take follow ups for those topics to those respective
> bugs so the discussion can stay focused and not get cluttered up.
> 
> i haven't looked into the other codepoints raised in the original comment,
> so if they aren't resolved, feel free to fork them out too.

For the code points

3248..324F;A # No [8] CIRCLED NUMBER TEN ON BLACK SQUARE..CIRCLED NUMBER EIGHTY ON BLACK SQUARE

I asked on the unicode mailing list:

http://www.unicode.org/mail-arch/unicode-ml/y2017-m08/0007.html

And the response makes me think that we are free to use wcwidth 2 for
these in glibc if that fits our “context” best:

http://www.unicode.org/mail-arch/unicode-ml/y2017-m08/0023.html

> "A" means, you get to decide whether to treat these as "W" or "N" based on context.
>
> There's really not strong need to change an "A" towards "W", because
> "A" doesn't get in your way if you decided that "W" works better for
> you.
>
> Remember that all the EAW properties are supposed to be "resolved"
> down to W or N. For some, like Na that resolution is deterministic,
> for A it is context/application dependent, but when you finally
> process your data, only W(ide) or N(arrow) remain after resolution.
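
As I read the quoted reply, the resolution step could be sketched like this. It is a minimal illustration under my own assumptions: the function names and the ambiguous_as_wide switch are hypothetical and are not an existing glibc or Unicode API.

```python
def resolve_eaw(eaw, ambiguous_as_wide=False):
    """Resolve an East_Asian_Width value down to 'W' (wide) or 'N' (narrow).

    W and F always resolve to wide; Na, H and N always resolve to narrow.
    Only A depends on context, modelled here by a hypothetical flag.
    """
    if eaw in ("W", "F"):
        return "W"
    if eaw == "A":
        return "W" if ambiguous_as_wide else "N"
    return "N"


def columns(eaw, ambiguous_as_wide=False):
    """Column count for a spacing character with the given EAW value."""
    return 2 if resolve_eaw(eaw, ambiguous_as_wide) == "W" else 1
```

Under this sketch, giving 3248..324F a width of 2 amounts to resolving "A" as wide for that range only, while other ambiguous characters keep resolving to narrow.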

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (19 preceding siblings ...)
  2017-09-04 14:33 ` maiku.fabian at gmail dot com
@ 2017-09-06 13:06 ` cvs-commit at gcc dot gnu.org
  2017-09-14 13:45 ` maiku.fabian at gmail dot com
  2017-09-14 18:25 ` tg at mirbsd dot de
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2017-09-06 13:06 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #17 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  2ae5be041d9ea89cdd0f37734d72051e8f773947 (commit)
       via  af83ed5c4647bda196fc1a7efebbe8019aa83f4a (commit)
      from  4f3647e46e3f645c6516faa299efc6e89d520d7b (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2ae5be041d9ea89cdd0f37734d72051e8f773947

commit 2ae5be041d9ea89cdd0f37734d72051e8f773947
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Wed Sep 6 11:19:33 2017 +0200

    Improve utf8_gen.py to set the width for characters with Prepended_Concatenation_Mark property to 1

        [BZ #22070]
        * localedata/unicode-gen/utf8_gen.py: Set the width for
        characters with Prepended_Concatenation_Mark property to 1
        * localedata/charmaps/UTF-8: Updated using the improved script.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=af83ed5c4647bda196fc1a7efebbe8019aa83f4a

commit af83ed5c4647bda196fc1a7efebbe8019aa83f4a
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Fri Aug 18 10:12:29 2017 +0200

    Write all ranges of neighbouring characters with the same width using the range notation in charmaps/UTF-8

    Writing ranges of neighbouring characters with the same width like this

        <U000E0100>...<U000E01EF>       0

    in charmaps/UTF-8 is more efficient than writing many single character
    lines like:

        <U000E0100>     0
        <U000E0101>     0
        ...

        [BZ #21750]
        * unicode-gen/utf8_gen.py: Write all ranges of neighbouring characters
        with the same width using the range notation in charmaps/UTF-8.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                           |   14 +
 localedata/charmaps/UTF-8           |   10 +-
 localedata/unicode-gen/Makefile     |    4 +-
 localedata/unicode-gen/PropList.txt | 1618 +++++++++++++++++++++++++++++++++++
 localedata/unicode-gen/utf8_gen.py  |   84 ++-
 5 files changed, 1704 insertions(+), 26 deletions(-)
 create mode 100644 localedata/unicode-gen/PropList.txt
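
The range notation described in the second log entry above can be illustrated with a short sketch. This is not the code from utf8_gen.py; the function names are hypothetical, and the <Uxxxx> formatting for code points below U+10000 is an assumption (the example in the log only shows the eight-digit form).

```python
def compress_runs(widths):
    """Collapse a {code point: width} dict into (first, last, width) runs.

    Neighbouring code points with the same width are merged into one run.
    """
    runs = []
    for cp in sorted(widths):
        width = widths[cp]
        if runs and runs[-1][1] == cp - 1 and runs[-1][2] == width:
            runs[-1][1] = cp  # extend the current run by one code point
        else:
            runs.append([cp, cp, width])
    return [tuple(run) for run in runs]


def ucs_symbol(cp):
    """Format a code point as <Uxxxx> or <Uxxxxxxxx> (assumed convention)."""
    if cp <= 0xFFFF:
        return "<U{:04X}>".format(cp)
    return "<U{:08X}>".format(cp)


def width_lines(widths):
    """Return charmap-style lines, using range notation for longer runs."""
    lines = []
    for first, last, width in compress_runs(widths):
        if first == last:
            lines.append("{}\t{}".format(ucs_symbol(first), width))
        else:
            lines.append("{}...{}\t{}".format(
                ucs_symbol(first), ucs_symbol(last), width))
    return lines
```

For instance, a dict holding width 0 for every code point from U+E0100 to U+E01EF comes out as the single line from the log entry, instead of 240 separate lines.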

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (20 preceding siblings ...)
  2017-09-06 13:06 ` cvs-commit at gcc dot gnu.org
@ 2017-09-14 13:45 ` maiku.fabian at gmail dot com
  2017-09-14 18:25 ` tg at mirbsd dot de
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-09-14 13:45 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mike Frysinger from comment #15)
> i've forked soft hyphen (U+00AD) into bug 22073 and Hangul Jamo into bug
> 22074.  feel free to take follow ups for those topics to those respective
> bugs so the discussion can stay focused and not get cluttered up.
> 
> i haven't looked into the other codepoints raised in the original comment,
> so if they aren't resolved, feel free to fork them out too.

I think there is nothing more to do in this bug here, 
therefore I close it as FIXED.

(Copyright assignment by Thorsten Glaser is underway).

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug localedata/21750] column width of characters incompatible with classical wcwidth
  2017-07-11 14:18 [Bug localedata/21750] New: column width of characters incompatible with classical wcwidth tg at mirbsd dot de
                   ` (21 preceding siblings ...)
  2017-09-14 13:45 ` maiku.fabian at gmail dot com
@ 2017-09-14 18:25 ` tg at mirbsd dot de
  22 siblings, 0 replies; 24+ messages in thread
From: tg at mirbsd dot de @ 2017-09-14 18:25 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=21750

--- Comment #19 from Thorsten Glaser <tg at mirbsd dot de> ---
I submitted it on Wed, 6 Sep 2017 15:15:38 +0000 (UTC)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread
