public inbox for libc-locales@sourceware.org
From: "julesbertholet at quoi dot xyz" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/31370] wcwidth() does not treat DEFAULT_IGNORABLE_CODE_POINTs as zero-width
Date: Wed, 14 Feb 2024 18:02:03 +0000
Message-ID: <bug-31370-716-85kXKDfOKg@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-31370-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31370

--- Comment #3 from Jules Bertholet <julesbertholet at quoi dot xyz> ---
> Please provide a patch to libc-alpha@sourceware.org

https://sourceware.org/pipermail/libc-alpha/2024-February/154574.html

> Please also provide justification for the zero width by quoting another implementation that also provides zero width e.g. CLDR.

CLDR doesn't address width issues at all; this is defined by Unicode itself.
The Unicode Standard, version 15.0, §5.21 - Characters Ignored for Display
<https://www.unicode.org/versions/Unicode15.1.0/ch05.pdf#G40095>:

> The list of characters which should be ignored for display in fallback rendering is given by a character property: Default_Ignorable_Code_Point (DI). Those characters include almost all format characters, all variation selectors, and a few other exceptional characters, such as Hangul fillers. The exact list is defined in DerivedCoreProperties.txt in the Unicode Character Database.
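
As a rough illustration of how that list can be consumed, here is a minimal
Python sketch that parses lines in the `DerivedCoreProperties.txt` format and
tests membership. The names `parse_property_ranges` and `in_ranges` are
hypothetical, and `SAMPLE` is a hand-abridged excerpt written in that file's
format, not the full data; this is not a description of glibc's own generator
scripts.

```python
# Abridged, hand-written sample in the DerivedCoreProperties.txt line format.
# The real file contains many more entries; this is only for illustration.
SAMPLE = """\
# code point(s) ; property name # comment
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
115F..1160    ; Default_Ignorable_Code_Point # Lo   [2] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
FE00..FE0F    ; Default_Ignorable_Code_Point # Mn  [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
"""


def parse_property_ranges(text, prop):
    """Return a list of inclusive (first, last) ranges that carry `prop`."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue  # some other property
        first, _, last = fields[0].partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo  # a single code point has no ".."
        ranges.append((lo, hi))
    return ranges


def in_ranges(cp, ranges):
    """True if code point `cp` falls inside any of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


DI = parse_property_ranges(SAMPLE, "Default_Ignorable_Code_Point")
print(in_ranges(0x200D, DI))  # ZERO WIDTH JOINER lies inside 200B..200F
print(in_ranges(0x0041, DI))  # LATIN CAPITAL LETTER A is Alphabetic only
```

Note that U+115F appears in this property, which is why it needs separate
handling when the property is mapped to a width of 0.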

U+115F HANGUL CHOSEONG FILLER needs a carveout due to the unique behavior of
the conjoining Korean jamo characters. One composed Hangul "syllable block"
like 퓛 is made up of two to three individual component characters, or "jamo".
These are all assigned an `East_Asian_Width` of `Wide` by Unicode, which would
normally mean they would all be assigned width 2 by glibc; a combination of
(leading choseong jamo) + (medial jungseong jamo) + (trailing jongseong jamo)
would then have width 2 + 2 + 2 = 6. However, glibc (and other wcwidth
implementations) special-cases jungseong and jongseong, assigning them all
width 0, to ensure that the complete block has width 2 + 0 + 0 = 2 as it
should. U+115F is meant for use in syllable blocks that are intentionally
missing a leading jamo; although it has no visible glyph, it must be assigned
a width of 2 so that the complete block still has width 2.
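
The arithmetic above can be sketched in a few lines of Python. This is a toy
model under stated assumptions, not glibc's implementation: it covers only the
Hangul Jamo block (U+1100..U+11FF), and the function names `naive_width`,
`sketch_width`, `di_zero_width`, and `block_width` are hypothetical.

```python
def naive_width(cp):
    """Every jamo gets width 2, as in the 2 + 2 + 2 case described above."""
    return 2


def sketch_width(cp):
    """Toy width rule for U+1100..U+11FF with the special cases applied."""
    if 0x1100 <= cp <= 0x115F:
        return 2  # leading consonants (choseong), including the U+115F filler
    if 0x1160 <= cp <= 0x11FF:
        return 0  # vowels (jungseong) and trailing consonants (jongseong)
    raise ValueError("outside the block this sketch covers")


def di_zero_width(cp):
    """Same rule, but without the carve-out: U+115F is treated as width 0."""
    return 0 if cp == 0x115F else sketch_width(cp)


def block_width(code_points, width_fn):
    """Sum the per-code-point widths of one syllable block."""
    return sum(width_fn(cp) for cp in code_points)


full_block = [0x1100, 0x1161, 0x11A8]      # choseong + jungseong + jongseong
no_lead_block = [0x115F, 0x1161, 0x11A8]   # filler stands in for the choseong

print(block_width(full_block, naive_width))       # 6
print(block_width(full_block, sketch_width))      # 2
print(block_width(no_lead_block, sketch_width))   # 2
print(block_width(no_lead_block, di_zero_width))  # 0
```

The last line shows what goes wrong without the carve-out: a block that
occupies two columns on screen would be reported as width 0.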

You can read more about Unicode jamo in the Unicode spec, sections 3.12
<https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G24646> and 18.6
<https://www.unicode.org/versions/Unicode15.0.0/ch18.pdf#G31028>.


