public inbox for libc-locales@sourceware.org
* [Bug localedata/26120] New: column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
@ 2020-06-16  5:43 maiku.fabian at gmail dot com
  2020-06-16  5:44 ` [Bug localedata/26120] " maiku.fabian at gmail dot com
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  5:43 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

            Bug ID: 26120
           Summary: column width of  of some Korean JUNGSEONG/JONGSEONG
                    characters wrong (should be 0)
           Product: glibc
           Version: 2.31
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: maiku.fabian at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Robert Ross <rob.ross@ymail.com> writes:

> Thank you for maintaining glibc's "localedata/charmaps/UTF-8".  It is
> good that most "HANGUL JUNGSEONG" characters have zero width due to
> "<U1160>...<U11FF> 0" on line 48775 but strange that the newer "HANGUL
> JUNGSEONG" characters have width 1 since there is no
> "<UD7B0>...<UD7C6> 0".  Similarly most "HANGUL JONGSEONG" characters
> have width 0 due to line 48775 but the newer ones have width 1 since
> there is no "<UD7CB>...<UD7FB> 0".  Please correct this if it's an
> error or explain if it's not.

In https://www.unicode.org/Public/13.0.0/ucd/EastAsianWidth.txt all of these
have width "N".
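
The "N" classification can be checked against the Unicode data bundled with
Python. This is only a sketch: it reads whatever UCD version the local Python
ships (not necessarily 13.0.0), and the sample code points and the helper name
eaw_classes are chosen here purely for illustration.

```python
import unicodedata

# One representative code point from each range discussed in the report.
samples = {
    0x1160: "first code point of <U1160>...<U11FF>",
    0xD7B0: "first of the newer JUNGSEONG range",
    0xD7CB: "first of the newer JONGSEONG range",
}

def eaw_classes(code_points):
    """Map each code point to its East_Asian_Width class."""
    return {cp: unicodedata.east_asian_width(chr(cp)) for cp in code_points}

for cp, eaw in eaw_classes(samples).items():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {eaw} ({samples[cp]})")
```

East_Asian_Width is a classification, not a column count, which is why the
charmap WIDTH entries have to be decided separately.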

http://www.unicode.org/reports/tr11/ says:

6.2 Combining Marks

> Combining marks have been classified and are given a property
> assignment based on their typical applicability. For example,
> combining marks typically applied to characters of class N, Na, or W
> are classified as A. Combining marks for purely non-East Asian scripts
> are marked as N, and nonspacing marks used only with wide characters
> are given a W. Even more so than for other characters, the
> East_Asian_Width property for combining marks is not the same as their
> display width.
> 
> In particular, nonspacing marks do not possess actual advance
> width. Therefore, even when displaying combining marks, the
> East_Asian_Width property cannot be related to the advance width of
> these characters. However, it can be useful in determining the
> encoding length in a legacy encoding, or the choice of font for the
> range of characters including that nonspacing mark. The width of the
> glyph image of a nonspacing mark should always be chosen as the
> appropriate one for the width of the base character.

See also: https://sourceware.org/bugzilla/show_bug.cgi?id=21750#c5

> We also agree that the Hangul Jamo U+1160‥U+11FF are sort
> of "combining characters" although they are not marked as such
> in the Unicode data. But they are fragments of Hangul characters
> which combine. So it seems correct to mark them as width 0.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
@ 2020-06-16  5:44 ` maiku.fabian at gmail dot com
  2020-06-16  5:45 ` maiku.fabian at gmail dot com
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  5:44 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tg at mirbsd dot de


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
  2020-06-16  5:44 ` [Bug localedata/26120] " maiku.fabian at gmail dot com
@ 2020-06-16  5:45 ` maiku.fabian at gmail dot com
  2020-06-16  5:46 ` maiku.fabian at gmail dot com
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  5:45 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
  2020-06-16  5:44 ` [Bug localedata/26120] " maiku.fabian at gmail dot com
  2020-06-16  5:45 ` maiku.fabian at gmail dot com
@ 2020-06-16  5:46 ` maiku.fabian at gmail dot com
  2020-06-16  5:47 ` maiku.fabian at gmail dot com
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  5:46 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rob.ross at ymail dot com


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (2 preceding siblings ...)
  2020-06-16  5:46 ` maiku.fabian at gmail dot com
@ 2020-06-16  5:47 ` maiku.fabian at gmail dot com
  2020-06-16  5:47 ` maiku.fabian at gmail dot com
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  5:47 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #1 from Mike FABIAN <maiku.fabian at gmail dot com> ---
So I think it is best to set all JUNGSEONG/JONGSEONG characters to width 0.


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (3 preceding siblings ...)
  2020-06-16  5:47 ` maiku.fabian at gmail dot com
@ 2020-06-16  5:47 ` maiku.fabian at gmail dot com
  2020-06-16  5:58 ` maiku.fabian at gmail dot com
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  5:47 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org   |maiku.fabian at gmail dot com


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (4 preceding siblings ...)
  2020-06-16  5:47 ` maiku.fabian at gmail dot com
@ 2020-06-16  5:58 ` maiku.fabian at gmail dot com
  2020-06-16  8:26 ` maiku.fabian at gmail dot com
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  5:58 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #2 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Some information from a chat with Thorsten Glaser (translated from German):

<mfabian> Is everything that has JUNGSEONG or JONGSEONG in its name such a
          combining character? [20年06月15日 21:38:17]
<MirWarm> as far as I understand it, Korean characters are always
          choseong + j{u,o}ngseong [20年06月15日 21:54:15]
<MirWarm>  The Hangul jamo are divided into three classes: choseong (Leading
          consonants), jungseong (Vowels) and jongseong (Trailing consonants)
          which in the rest of this write-up will be referred to as L, V and T.
                                                          [20年06月15日 21:58:54]
<MirWarm> A standard Hangul syllable is composed as (L+V+T*)
                                                          [20年06月15日 21:58:55]
<MirWarm> ah, yes [20年06月15日 21:58:57]
<MirWarm> so the choseong are apparently not required in the Korean script,
          but in Unicode they apparently are; you then have to start with U+115F
                                                          [20年06月15日 21:59:24]
<MirWarm> choseong is initial (C), jungseong is medial (G) and nucleus (V),
          jongseong is coda (K) [20年06月15日 22:00:15]
<MirWarm> and Korean syllable words are (C)(G)V(K) [20年06月15日 22:00:27]
<MirWarm> and in Unicode you use U+115F when C is missing [20年06月15日 22:00:53]
<MirWarm> 115F is 2, the others are 0 [20年06月15日 22:01:06]
<MirWarm> fits [20年06月15日 22:01:07]
<MirWarm> back in ~5 minutes [20年06月15日 22:01:14]
*** MirWarm (~mird@2001-4dd7-dca-0-21f-3bff-fe0d-cbb1.ipv6dyn.netcologne.de)
    has quit: Quit: using sirc version 2.211-MirDebian-20181124-1+ssfe
    (RANDOM=2406)
                                                          [20年06月15日 22:01:15]
*** MirWarm (~mird@x61e.mirbsd.org) has joined channel #mirbsd
                                                          [20年06月15日 22:06:44]
<MirWarm> re [20年06月15日 22:07:05]
<MirWarm> I'll set D7B0 .. D7FF to 0 on my side right away, too
                                                          [20年06月15日 22:08:33]
<MirWarm> there, committed [20年06月15日 22:31:19]
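
The L/V/T model from the chat can be turned into a small column-width
calculation. This is a hedged sketch, not glibc's or MirBSD's code: it assumes
leading consonants U+1100‥U+115F (class W in EastAsianWidth.txt) take two
columns and that all vowels and trailing consonants take none. The names
jamo_width and syllable_width are made up for this illustration.

```python
# Assumed width model for conjoining Hangul jamo (illustrative only):
# leading consonants (L) take 2 columns, vowels (V) and trailing
# consonants (T) take 0.
L_RANGES = [(0x1100, 0x115F)]            # includes the filler U+115F
VT_RANGES = [
    (0x1160, 0x11FF),                    # already width 0 in the charmap
    (0xD7B0, 0xD7C6),                    # newer JUNGSEONG characters
    (0xD7CB, 0xD7FB),                    # newer JONGSEONG characters
]

def jamo_width(cp):
    """Return the assumed column width of one conjoining jamo."""
    if any(lo <= cp <= hi for lo, hi in L_RANGES):
        return 2
    if any(lo <= cp <= hi for lo, hi in VT_RANGES):
        return 0
    raise ValueError(f"U+{cp:04X} is not covered by this sketch")

def syllable_width(text):
    """Sum the assumed widths of a decomposed syllable."""
    return sum(jamo_width(ord(ch)) for ch in text)

for seq in ("\u1100\u1161\u11A8", "\u115F\u1161", "\u1100\uD7B0\uD7CB"):
    print(" ".join(f"U+{ord(c):04X}" for c in seq), "->", syllable_width(seq))
```

With width 1 for the D7B0‥D7FB characters, the third sequence would add up to
four columns instead of the two that a rendered syllable block occupies.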


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (5 preceding siblings ...)
  2020-06-16  5:58 ` maiku.fabian at gmail dot com
@ 2020-06-16  8:26 ` maiku.fabian at gmail dot com
  2020-06-16 11:53 ` fweimer at redhat dot com
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16  8:26 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #3 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 12623
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12623&action=edit
0001-Set-width-of-JUNGSEONG-JONGSEONG-characters-from-UD7.patch


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (6 preceding siblings ...)
  2020-06-16  8:26 ` maiku.fabian at gmail dot com
@ 2020-06-16 11:53 ` fweimer at redhat dot com
  2020-06-16 17:32 ` maiku.fabian at gmail dot com
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: fweimer at redhat dot com @ 2020-06-16 11:53 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-
                 CC|                            |fweimer at redhat dot com

--- Comment #4 from Florian Weimer <fweimer at redhat dot com> ---
Does gnulib need updating as well?


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (7 preceding siblings ...)
  2020-06-16 11:53 ` fweimer at redhat dot com
@ 2020-06-16 17:32 ` maiku.fabian at gmail dot com
  2020-06-16 17:35 ` fweimer at redhat dot com
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-16 17:32 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #5 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Florian Weimer from comment #4)
> Does gnulib need updating as well?

I don’t know. Does gnulib have width data?


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (8 preceding siblings ...)
  2020-06-16 17:32 ` maiku.fabian at gmail dot com
@ 2020-06-16 17:35 ` fweimer at redhat dot com
  2020-06-20 21:19 ` tg at mirbsd dot de
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: fweimer at redhat dot com @ 2020-06-16 17:35 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #6 from Florian Weimer <fweimer at redhat dot com> ---
Yes, I think it's here:

http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/uniwidth/width.c;h=c760ad33183418a8f103152ff43d57fabbc3949d;hb=HEAD


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (9 preceding siblings ...)
  2020-06-16 17:35 ` fweimer at redhat dot com
@ 2020-06-20 21:19 ` tg at mirbsd dot de
  2020-06-21  9:00 ` maiku.fabian at gmail dot com
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tg at mirbsd dot de @ 2020-06-20 21:19 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #7 from Thorsten Glaser <tg at mirbsd dot de> ---
Erk… glibc is particular about not defining widths of not-defined characters.

Besides D7FC‥D7FF (which gave me an error in the output from my own scripts),
D7C7‥D7CA are not yet assigned and so probably need to be excluded in glibc.

Should they ever be defined, we’ll need to adjust here, so it’s probably better
to iterate over the entire D7C0‥D7FF range and only change widths for defined
codepoints from the current UCD version.
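
One way to find the gaps is to ask the Unicode database which code points in
the block are still unassigned (general category "Cn"). The sketch below uses
Python's unicodedata module, so the result depends on the UCD version bundled
with the interpreter. The block bounds U+D7B0‥U+D7FF and the helper name
unassigned_in_block are assumptions of this example.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0xD7B0, 0xD7FF  # Hangul Jamo Extended-B (assumed bounds)

def unassigned_in_block(start=BLOCK_START, end=BLOCK_END):
    """Return the code points in [start, end] with general category Cn."""
    return [
        cp for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    ]

print(" ".join(f"{cp:04X}" for cp in unassigned_in_block()))
```

Deriving the gaps from the database rather than hard-coding them means a later
UCD update changes the result without anyone having to remember the ranges.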


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (10 preceding siblings ...)
  2020-06-20 21:19 ` tg at mirbsd dot de
@ 2020-06-21  9:00 ` maiku.fabian at gmail dot com
  2020-06-21  9:07 ` maiku.fabian at gmail dot com
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-21  9:00 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #8 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 12629
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12629&action=edit
0001-Set-width-of-JUNGSEONG-JONGSEONG-characters-from-UD7.patch

Updated patch to omit the unassigned characters.


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (11 preceding siblings ...)
  2020-06-21  9:00 ` maiku.fabian at gmail dot com
@ 2020-06-21  9:07 ` maiku.fabian at gmail dot com
  2020-06-21 14:20 ` tg at mirbsd dot de
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-21  9:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #9 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Thorsten Glaser from comment #7)
> Erk… glibc is particular about not defining widths of not-defined characters.
> 
> Besides D7FC‥D7FF (which gave me an error in the output from my own
> scripts), D7C7‥D7CA are not yet assigned and so probably need to be excluded
> in glibc.
> 
> Should they ever be defined, we’ll need to adjust here, so it’s probably
> better to iterate over the entire D7C0‥D7FF range and only change widths for
> defined codepoints from the current UCD version.

Thank you for noticing that!

I was aware that glibc has a problem with defining width of unassigned
characters, therefore I used 

 for key in list(range(0xD7B0, 0xD7FC)):

instead of 

 for key in list(range(0xD7B0, 0xD800)):

because D7FC‥D7FF are unassigned and localedef gave me errors
when I included them. Surprisingly, localedef did not give errors for the
unassigned D7C7‥D7CA ...

I had checked the range manually and thought all characters
from D7B0 to D7FB were assigned, but apparently I missed D7C7‥D7CA.

I improved the generator script a bit to omit the unassigned characters.
If these get defined in the future, the script will add them.
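
A rough sketch of that idea, not the actual generator script from the patch:
walk the range, skip unassigned code points, and merge consecutive assigned
ones into charmap WIDTH lines in the "<U1160>...<U11FF> 0" form quoted in the
report. The function names are hypothetical, and the result depends on the UCD
version bundled with the local Python.

```python
import unicodedata

def format_width_line(first, last, width=0):
    """Format one WIDTH entry, using a range when first != last."""
    if first == last:
        return f"<U{first:04X}> {width}"
    return f"<U{first:04X}>...<U{last:04X}> {width}"

def zero_width_lines(start=0xD7B0, end=0xD7FF):
    """Group assigned code points in [start, end] into width-0 lines."""
    lines = []
    run_start = prev = None
    for cp in range(start, end + 1):
        if unicodedata.category(chr(cp)) == "Cn":
            continue  # unassigned: localedef must not get a width for it
        if prev is not None and cp == prev + 1:
            prev = cp  # extend the current run
            continue
        if run_start is not None:
            lines.append(format_width_line(run_start, prev))
        run_start = prev = cp  # start a new run
    if run_start is not None:
        lines.append(format_width_line(run_start, prev))
    return lines

print("\n".join(zero_width_lines()))
```

Because the loop covers the whole range up to D7FF, newly assigned code points
would be picked up automatically once the bundled Unicode data knows them.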


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (12 preceding siblings ...)
  2020-06-21  9:07 ` maiku.fabian at gmail dot com
@ 2020-06-21 14:20 ` tg at mirbsd dot de
  2020-06-23  7:08 ` maiku.fabian at gmail dot com
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tg at mirbsd dot de @ 2020-06-21 14:20 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #10 from Thorsten Glaser <tg at mirbsd dot de> ---
Looks okay (but now you can use 0xD800 in the range call), this is similar to
what I did in my script
http://www.mirbsd.org/cvs.cgi/contrib/code/Snippets/eaw2glibc that
postprocesses the width output I normally use (script
http://www.mirbsd.org/cvs.cgi/contrib/code/Snippets/eawparse and
http://www.mirbsd.org/cvs.cgi/X11/xc/programs/xterm/wcwidth.c?rev=HEAD contains
an example of its output) into glibc-compatible format.

The output I get (for UCD 13.0.0) is identical to yours.


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (13 preceding siblings ...)
  2020-06-21 14:20 ` tg at mirbsd dot de
@ 2020-06-23  7:08 ` maiku.fabian at gmail dot com
  2020-06-23  7:33 ` tg at mirbsd dot de
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-23  7:08 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #11 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Thorsten Glaser from comment #10)
> Looks okay (but now you can use 0xD800 in the range call), 

Yes, I could. But if 0xD7FE and 0xD7FF ever get assigned, 
would they be characters of the same type? I would have to check 
that manually anyway.

> The output I get (for UCD 13.0.0) is identical to yours.

Great!


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (14 preceding siblings ...)
  2020-06-23  7:08 ` maiku.fabian at gmail dot com
@ 2020-06-23  7:33 ` tg at mirbsd dot de
  2020-06-23  8:50 ` maiku.fabian at gmail dot com
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tg at mirbsd dot de @ 2020-06-23  7:33 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #12 from Thorsten Glaser <tg at mirbsd dot de> ---
According to Blocks.txt, yes. Unicode does assign characters to blocks.


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (15 preceding siblings ...)
  2020-06-23  7:33 ` tg at mirbsd dot de
@ 2020-06-23  8:50 ` maiku.fabian at gmail dot com
  2020-06-23  9:03 ` maiku.fabian at gmail dot com
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-23  8:50 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

--- Comment #13 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Thorsten Glaser from comment #12)
> According to Blocks.txt, yes. Unicode does assign characters to blocks.

D7B0..D7FF; Hangul Jamo Extended-B

I think you are right, I’ll change the script to end the range at the end of
that block, that seems more likely to be correct if these characters ever get
assigned.
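
For illustration, the Blocks.txt line above can be turned into the bounds for
the range() call. This is a minimal sketch with a made-up helper name
(parse_block_line). It handles only a single data line of that form, not
comments or the rest of the file.

```python
def parse_block_line(line):
    """Parse 'XXXX..YYYY; Name' into (start, end, name)."""
    code_range, name = line.split(";", 1)
    first, last = code_range.strip().split("..")
    return int(first, 16), int(last, 16), name.strip()

start, end, name = parse_block_line("D7B0..D7FF; Hangul Jamo Extended-B")

# range() excludes its stop value, so the stop has to be end + 1.
block = range(start, end + 1)
print(f"{name}: {len(block)} code points, "
      f"range(0x{start:04X}, 0x{end + 1:04X})")
```

The inclusive block end D7FF therefore corresponds to the exclusive stop value
0xD800 mentioned in comment #10.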


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (16 preceding siblings ...)
  2020-06-23  8:50 ` maiku.fabian at gmail dot com
@ 2020-06-23  9:03 ` maiku.fabian at gmail dot com
  2020-06-25 13:05 ` maiku.fabian at gmail dot com
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-23  9:03 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12623|0                           |1
        is obsolete|                            |
  Attachment #12629|0                           |1
        is obsolete|                            |

--- Comment #14 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 12651
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12651&action=edit
0001-Set-width-of-JUNGSEONG-JONGSEONG-characters-from-UD7.patch

End the range at 0xD7FF


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (17 preceding siblings ...)
  2020-06-23  9:03 ` maiku.fabian at gmail dot com
@ 2020-06-25 13:05 ` maiku.fabian at gmail dot com
  2020-06-26 12:26 ` maiku.fabian at gmail dot com
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-25 13:05 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12651|0                           |1
        is obsolete|                            |

--- Comment #15 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 12661
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12661&action=edit
0001-Set-width-of-JUNGSEONG-JONGSEONG-characters-from-UD7.patch

Use "make install" instead of only changing the UTF-8 file.


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (18 preceding siblings ...)
  2020-06-25 13:05 ` maiku.fabian at gmail dot com
@ 2020-06-26 12:26 ` maiku.fabian at gmail dot com
  2020-06-26 12:27 ` maiku.fabian at gmail dot com
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-26 12:26 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |2.32


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (19 preceding siblings ...)
  2020-06-26 12:26 ` maiku.fabian at gmail dot com
@ 2020-06-26 12:27 ` maiku.fabian at gmail dot com
  2020-06-28 12:50 ` maiku.fabian at gmail dot com
  2021-12-30  0:35 ` bruno at clisp dot org
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-26 12:27 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (20 preceding siblings ...)
  2020-06-26 12:27 ` maiku.fabian at gmail dot com
@ 2020-06-28 12:50 ` maiku.fabian at gmail dot com
  2021-12-30  0:35 ` bruno at clisp dot org
  22 siblings, 0 replies; 24+ messages in thread
From: maiku.fabian at gmail dot com @ 2020-06-28 12:50 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #16 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Fixed in glibc master.


* [Bug localedata/26120] column width of  of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0)
  2020-06-16  5:43 [Bug localedata/26120] New: column width of of some Korean JUNGSEONG/JONGSEONG characters wrong (should be 0) maiku.fabian at gmail dot com
                   ` (21 preceding siblings ...)
  2020-06-28 12:50 ` maiku.fabian at gmail dot com
@ 2021-12-30  0:35 ` bruno at clisp dot org
  22 siblings, 0 replies; 24+ messages in thread
From: bruno at clisp dot org @ 2021-12-30  0:35 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

Bruno Haible <bruno at clisp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bruno at clisp dot org

--- Comment #17 from Bruno Haible <bruno at clisp dot org> ---
(In reply to Florian Weimer from comment #6)
> Yes, I think it's here:
> 
> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/uniwidth/width.c;h=c760ad33183418a8f103152ff43d57fabbc3949d;hb=HEAD

I have applied an equivalent change to the uc_width function in gnulib:
https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8026587b94e4274f3406a36bc89348a24ea86b6a

Experiments with xterm did not convince me, but experiments with gnome-terminal
did, since gnome-terminal is widely used and is ahead in terms of Unicode
support.

And the fact that Unicode's EastAsianWidth.txt classifies these characters
as "N" (neutral, i.e. not wide) is irrelevant, because
https://www.unicode.org/reports/tr11/ makes it clear that its focus is on
traditional East Asian rendering engines - but such traditional code cannot
handle conjoining Hangul Jamo anyway. Here we need to care about the
Unicode-compliant rendering engines (such as the one in gnome-terminal), not
the legacy rendering engines.
