From mboxrd@z Thu Jan  1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48)
	id 8D0643851C2C; Tue, 16 Jun 2020 05:43:43 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8D0643851C2C
From: "maiku.fabian at gmail dot com"
To: libc-locales@sourceware.org
Subject: [Bug localedata/26120] New: column width of some Korean
 JUNGSEONG/JONGSEONG characters wrong (should be 0)
Date: Tue, 16 Jun 2020 05:43:43 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: 2.31
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: maiku.fabian at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter cc target_milestone
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: libc-locales@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-locales mailing list
X-List-Received-Date: Tue, 16 Jun 2020 05:43:43 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26120

            Bug ID: 26120
           Summary: column width of some Korean JUNGSEONG/JONGSEONG
                    characters wrong (should be 0)
           Product: glibc
           Version: 2.31
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: maiku.fabian at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Robert Ross writes:

> Thank you for maintaining glibc's "localedata/charmaps/UTF-8".
> It is good that most "HANGUL JUNGSEONG" characters have zero width due to
> "... 0" on line 48775 but strange that the newer "HANGUL
> JUNGSEONG" characters have width 1 since there is no
> "... 0". Similarly most "HANGUL JONGSEONG" characters
> have width 0 due to line 48775 but the newer ones have width 1 since
> there is no "... 0". Please correct this if it's an
> error or explain if it's not.

In https://www.unicode.org/Public/13.0.0/ucd/EastAsianWidth.txt
all of these have width "N".

http://www.unicode.org/reports/tr11/ says:

6.2 Combining Marks

> Combining marks have been classified and are given a property
> assignment based on their typical applicability. For example,
> combining marks typically applied to characters of class N, Na, or W
> are classified as A. Combining marks for purely non-East Asian scripts
> are marked as N, and nonspacing marks used only with wide characters
> are given a W. Even more so than for other characters, the
> East_Asian_Width property for combining marks is not the same as their
> display width.
>
> In particular, nonspacing marks do not possess actual advance
> width. Therefore, even when displaying combining marks, the
> East_Asian_Width property cannot be related to the advance width of
> these characters. However, it can be useful in determining the
> encoding length in a legacy encoding, or the choice of font for the
> range of characters including that nonspacing mark. The width of the
> glyph image of a nonspacing mark should always be chosen as the
> appropriate one for the width of the base character.

See also:

https://sourceware.org/bugzilla/show_bug.cgi?id=21750#c5

> We also agree that the Hangul Jamo U+1160‥U+11FF are sort
> of "combining characters" although they are not marked as such
> in the Unicode data. But they are fragments of Hangul characters
> which combine. So it seems correct to mark them as width 0.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
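
For anyone who wants to check the Unicode side of this without opening
EastAsianWidth.txt, here is a rough sketch using Python's standard
unicodedata module. It only reads the Unicode properties for the range
U+1160..U+11FF quoted above; it does not call glibc's wcwidth() and says
nothing about what a given glibc version returns. The helper names
describe() and summarize() are made up for this illustration, and the
sample code points are arbitrary picks from the range, not the elided
ranges from the charmap.

```python
import unicodedata

# Hangul Jamo medial vowels (JUNGSEONG) and final consonants (JONGSEONG),
# U+1160..U+11FF, the range quoted from bug 21750 comment 5.
JAMO_RANGE = range(0x1160, 0x1200)


def describe(cp):
    """Return (label, East_Asian_Width, General_Category, combining class)."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.east_asian_width(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


def summarize(codepoints):
    """Collect the distinct (width, category, combining class) triples."""
    return {describe(cp)[1:] for cp in codepoints}


if __name__ == "__main__":
    # Arbitrary sample points from the range (an assumption of this sketch).
    for cp in (0x1160, 0x11A8, 0x11FF):
        print(*describe(cp))
    print(summarize(JAMO_RANGE))
```

If the whole range reports East_Asian_Width "N", category "Lo" and
combining class 0, that matches both observations above: the Unicode
data does not mark these characters as combining, and East_Asian_Width
alone cannot tell a wcwidth() implementation that they should occupy
zero columns. The charmap has to list them explicitly.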