From: "carlos at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug localedata/31370] wcwidth() does not treat DEFAULT_IGNORABLE_CODE_POINTs as zero-width
Date: Wed, 14 Feb 2024 18:27:21 +0000
Message-ID: <bug-31370-131-mGgv1B6dLY@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-31370-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31370

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Jules Bertholet from comment #3)
> > Please provide a patch to libc-alpha@sourceware.org
>
> https://sourceware.org/pipermail/libc-alpha/2024-February/154574.html
>
> > Please also provide justification for the zero width by quoting another implementation that also provides zero width e.g. CLDR.
>
> CLDR doesn't address width issues at all, this is defined by Unicode itself.
> The Unicode Standard, version 15.0, §5.21 - Characters Ignored for Display
> <https://www.unicode.org/versions/Unicode15.1.0/ch05.pdf#G40095>:

What do the libicu APIs return for these characters?

> > The list of characters which should be ignored for display in fallback rendering is given by a character property: Default_Ignorable_Code_Point (DI). Those characters include almost all format characters, all variation selectors, and a few other exceptional characters, such as Hangul fillers. The exact list is defined in DerivedCoreProperties.txt in the Unicode Character Database.
>
> U+115F HANGUL CHOSEONG FILLER needs a carveout due to the unique behavior of
> the conjoining Korean jamo characters. One composed Hangul "syllable block"
> like 퓛 is made up of two to three individual component characters, or
> "jamo". These are all assigned an `East_Asian_Width` of `Wide` by Unicode,
> which would normally mean they would all be assigned width 2 by glibc; a
> combination of (leading choseong jamo) + (medial jungseong jamo) + (trailing
> jongseong jamo) would then have width 2 + 2 + 2 = 6. However, glibc (and
> other wcwidth implementations) special-cases jungseong and jongseong,
> assigning them all width 0, to ensure that the complete block has width 2 +
> 0 + 0 = 2 as it should. U+115F is meant for use in syllable blocks that are
> intentionally missing a leading jamo; it must be assigned a width of 2 even
> though it has no visible display to ensure that the complete block has width
> 2.

Justification like this is *great* to have in the commit message e.g. here in a v2.

https://patchwork.sourceware.org/project/glibc/patch/20240211175840.228824-2-julesbertholet@quoi.xyz/

> You can read more about Unicode jamo in the Unicode spec, sections 3.12
> <https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G24646> and 18.6
> <https://www.unicode.org/versions/Unicode15.0.0/ch18.pdf#G31028>.

--
You are receiving this mail because:
You are on the CC list for the bug.
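
The width arithmetic quoted above can be sketched in a few lines of Python. This is only an illustrative model of the rule as described in the message, not glibc's implementation: the names `jamo_width` and `block_width` are hypothetical, and the sketch covers only the Hangul Jamo block (U+1100..U+11FF), assuming leading consonants occupy U+1100..U+115F and vowels plus trailing consonants occupy U+1160..U+11FF.

```python
import unicodedata


def jamo_width(cp: int) -> int:
    """Column width for one code point in the Hangul Jamo block.

    Hypothetical helper that models the rule from the message; not glibc code.
    """
    if 0x1100 <= cp <= 0x115F:
        # Leading consonants (choseong), including U+115F HANGUL CHOSEONG
        # FILLER, carry the full width of the syllable block.
        return 2
    if 0x1160 <= cp <= 0x11FF:
        # Vowels (jungseong) and trailing consonants (jongseong) are
        # special-cased to width 0 so they add nothing to the block.
        return 0
    raise ValueError(f"U+{cp:04X} is outside the Hangul Jamo block")


def block_width(text: str) -> int:
    """Sum the per-code-point widths of a decomposed syllable block."""
    return sum(jamo_width(ord(ch)) for ch in text)


# NFD splits the precomposed syllable into leading + vowel + trailing jamo.
full_block = unicodedata.normalize("NFD", "퓛")

# The same block with the leading consonant replaced by the filler.
no_leading = "\u115f" + full_block[1:]

print(block_width(full_block))  # 2 + 0 + 0
print(block_width(no_leading))  # 2 + 0 + 0, because the filler keeps width 2
```

If the filler were given width 0 like other Default_Ignorable_Code_Point characters, the second block would sum to 0 + 0 + 0 = 0 in this model, which is the case the carveout is meant to avoid.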