From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 8D0643851C2C; Tue, 16 Jun 2020 05:43:43 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8D0643851C2C
From: "maiku.fabian at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/26120] New: column width of some Korean
 JUNGSEONG/JONGSEONG characters wrong (should be 0)
Date: Tue, 16 Jun 2020 05:43:43 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: 2.31
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: maiku.fabian at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter cc target_milestone
Message-ID: <bug-26120-716@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: libc-locales@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-locales mailing list <libc-locales.sourceware.org>
List-Unsubscribe: <http://sourceware.org/mailman/options/libc-locales>,
 <mailto:libc-locales-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-locales/>
List-Help: <mailto:libc-locales-request@sourceware.org?subject=help>
List-Subscribe: <http://sourceware.org/mailman/listinfo/libc-locales>,
 <mailto:libc-locales-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Jun 2020 05:43:43 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

            Bug ID: 26120
           Summary: column width of some Korean JUNGSEONG/JONGSEONG
                    characters wrong (should be 0)
           Product: glibc
           Version: 2.31
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: maiku.fabian at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Robert Ross <rob.ross@ymail.com> writes:

> Thank you for maintaining glibc's "localedata/charmaps/UTF-8".  It is
> good that most "HANGUL JUNGSEONG" characters have zero width due to
> "<U1160>...<U11FF> 0" on line 48775 but strange that the newer "HANGUL
> JUNGSEONG" characters have width 1 since there is no
> "<UD7B0>...<UD7C6> 0".  Similarly most "HANGUL JONGSEONG" characters
> have width 0 due to line 48775 but the newer ones have width 1 since
> there is no "<UD7CB>...<UD7FB> 0".  Please correct this if it's an
> error or explain if it's not.
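
To make the report above concrete, here is a rough Python sketch of how
range entries of the form "<U1160>...<U11FF> 0" could be looked up.  It
is only an illustration, not glibc's actual implementation: the names
parse_width_entries, lookup_width and DEFAULT_WIDTH are made up for this
example, and the fallback to width 1 for unlisted characters is assumed
from the behaviour described in the report.

```python
import re

# Matches one WIDTH-style entry such as "<U1160>...<U11FF> 0" or "<U0300> 0".
_ENTRY = re.compile(
    r"^<U([0-9A-Fa-f]+)>(?:\.\.\.<U([0-9A-Fa-f]+)>)?\s+(\d+)$"
)

DEFAULT_WIDTH = 1  # assumption: unlisted characters fall back to width 1


def parse_width_entries(lines):
    """Return a list of (first, last, width) tuples from WIDTH-style lines."""
    ranges = []
    for line in lines:
        match = _ENTRY.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not recognized
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        ranges.append((first, last, int(match.group(3))))
    return ranges


def lookup_width(ranges, codepoint):
    """Width of a code point: first matching range wins, else the default."""
    for first, last, width in ranges:
        if first <= codepoint <= last:
            return width
    return DEFAULT_WIDTH


# Only the existing entry quoted in the report.
current = parse_width_entries(["<U1160>...<U11FF> 0"])

# The existing entry plus the two ranges the report asks about.
proposed = parse_width_entries([
    "<U1160>...<U11FF> 0",
    "<UD7B0>...<UD7C6> 0",
    "<UD7CB>...<UD7FB> 0",
])

for cp in (0x1160, 0xD7B0, 0xD7FB):
    print(f"U+{cp:04X}: current={lookup_width(current, cp)} "
          f"proposed={lookup_width(proposed, cp)}")
```

With only the first entry, U+D7B0 and U+D7FB fall through to the assumed
default of 1; with the two extra ranges they come out as 0, like the
older jamo.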

In https://www.unicode.org/Public/13.0.0/ucd/EastAsianWidth.txt all of these
have width "N".

http://www.unicode.org/reports/tr11/ says:

6.2 Combining Marks

> Combining marks have been classified and are given a property
> assignment based on their typical applicability. For example,
> combining marks typically applied to characters of class N, Na, or W
> are classified as A. Combining marks for purely non-East Asian scripts
> are marked as N, and nonspacing marks used only with wide characters
> are given a W. Even more so than for other characters, the
> East_Asian_Width property for combining marks is not the same as their
> display width.
>
> In particular, nonspacing marks do not possess actual advance
> width. Therefore, even when displaying combining marks, the
> East_Asian_Width property cannot be related to the advance width of
> these characters. However, it can be useful in determining the
> encoding length in a legacy encoding, or the choice of font for the
> range of characters including that nonspacing mark. The width of the
> glyph image of a nonspacing mark should always be chosen as the
> appropriate one for the width of the base character.

See also: https://sourceware.org/bugzilla/show_bug.cgi?id=21750#c5

> We also agree that the Hangul Jamo U+1160‥U+11FF are sort
> of "combining characters" although they are not marked as such
> in the Unicode data. But they are fragments of Hangul characters
> which combine. So it seems correct to mark them as width 0.
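
The "not marked as such in the Unicode data" point can be checked with
Python's unicodedata module.  This is only a sketch: the helper names
describe and composes are invented for this example, and the property
values it reports depend on the Unicode version bundled with the Python
interpreter in use.

```python
import unicodedata


def describe(codepoint):
    """Collect a few Unicode properties of one code point."""
    ch = chr(codepoint)
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "combining": unicodedata.combining(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }


def composes(sequence):
    """True if NFC merges the jamo sequence into a single code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


# One older jungseong and the first jungseong/jongseong of the newer ranges.
for cp in (0x1161, 0xD7B0, 0xD7CB):
    print(f"U+{cp:04X}", describe(cp))

# CHOSEONG KIYEOK + JUNGSEONG A versus CHOSEONG KIYEOK + JUNGSEONG O-YEO.
print(composes("\u1100\u1161"))
print(composes("\u1100\ud7b0"))
```

The jamo are reported with general category "Lo" and canonical combining
class 0, i.e. not as combining marks, even though a leading consonant
followed by one of the older vowels normalizes to a single precomposed
syllable.  The newer jamo have no precomposed form, so that sequence
stays as two code points under NFC.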

-- 
You are receiving this mail because:
You are on the CC list for the bug.