public inbox for libc-locales@sourceware.org
From: "carlos at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/31370] wcwidth() does not treat DEFAULT_IGNORABLE_CODE_POINTs as zero-width
Date: Wed, 14 Feb 2024 18:27:21 +0000
Message-ID: <bug-31370-716-NkvUJonQje@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-31370-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31370

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Jules Bertholet from comment #3)
> > Please provide a patch to libc-alpha@sourceware.org
> 
> https://sourceware.org/pipermail/libc-alpha/2024-February/154574.html
> 
> > Please also provide justification for the zero width by quoting another implementation that also provides zero width e.g. CLDR.
> 
> CLDR doesn't address width issues at all, this is defined by Unicode itself.
> The Unicode Standard, version 15.0, §5.21 - Characters Ignored for Display
> <https://www.unicode.org/versions/Unicode15.1.0/ch05.pdf#G40095>:

What do the libicu APIs return for these characters?

> > The list of characters which should be ignored for display in fallback rendering is given by a character property: Default_Ignorable_Code_Point (DI). Those characters include almost all format characters, all variation selectors, and a few other exceptional characters, such as Hangul fillers. The exact list is defined in DerivedCoreProperties.txt in the Unicode Character Database.
> 
> U+115F HANGUL CHOSEONG FILLER needs a carveout due to the unique behavior of
> the conjoining Korean jamo characters. One composed Hangul "syllable block"
> like 퓛 is made up of two to three individual component characters, or
> "jamo". These are all assigned an `East_Asian_Width` of `Wide` by Unicode,
> which would normally mean they would all be assigned width 2 by glibc; a
> combination of (leading choseong jamo) + (medial jungseong jamo) + (trailing
> jongseong jamo) would then have width 2 + 2 + 2 = 6. However, glibc (and
> other wcwidth implementations) special-cases jungseong and jongseong,
> assigning them all width 0, to ensure that the complete block has width 2 +
> 0 + 0 = 2 as it should. U+115F is meant for use in syllable blocks that are
> intentionally missing a leading jamo; it must be assigned a width of 2 even
> though it has no visible display to ensure that the complete block has width
> 2.

Justification like this is *great* to have in the commit message, e.g. in a
v2 of this patch:
https://patchwork.sourceware.org/project/glibc/patch/20240211175840.228824-2-julesbertholet@quoi.xyz/

> You can read more about Unicode jamo in the Unicode spec, sections 3.12
> <https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G24646> and 18.6
> <https://www.unicode.org/versions/Unicode15.0.0/ch18.pdf#G31028>.
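
The width arithmetic described in the quoted text above can be sketched in C
as follows. This is only a toy model under stated assumptions, not glibc's
wcwidth() and not the proposed patch: the function names sketch_width() and
sketch_string_width() are hypothetical, the Default_Ignorable_Code_Point
check covers only a small illustrative subset (U+200B..U+200F and
U+FE00..U+FE0F) rather than the full list in DerivedCoreProperties.txt, and
every code point not covered by a rule is simply given width 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the rules discussed in this thread.
   NOT glibc's implementation; ranges outside the Hangul Jamo block
   are an illustrative subset only. */
int sketch_width(uint32_t cp)
{
    /* Carve-out: U+115F HANGUL CHOSEONG FILLER is default-ignorable,
       but it must keep width 2 so that a syllable block without a
       visible leading jamo still totals 2. Checked before the
       default-ignorable rule on purpose. */
    if (cp == 0x115F)
        return 2;

    /* Illustrative subset of Default_Ignorable_Code_Point:
       U+200B..U+200F and variation selectors U+FE00..U+FE0F. */
    if ((cp >= 0x200B && cp <= 0x200F) || (cp >= 0xFE00 && cp <= 0xFE0F))
        return 0;

    /* Leading (choseong) jamo, U+1100..U+115E: width 2. */
    if (cp >= 0x1100 && cp <= 0x115E)
        return 2;

    /* Medial (jungseong) and trailing (jongseong) jamo,
       U+1160..U+11FF: width 0, so they add nothing to the block. */
    if (cp >= 0x1160 && cp <= 0x11FF)
        return 0;

    /* Assumption for this sketch: everything else has width 1. */
    return 1;
}

/* Sum of per-code-point widths over an array of n code points. */
int sketch_string_width(const uint32_t *s, size_t n)
{
    int total = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += sketch_width(s[i]);
    return total;
}
```

With this model, the sequence U+1100 U+1161 U+11A8 (leading, medial, and
trailing jamo) sums to 2 + 0 + 0 = 2, and the same sequence with U+115F in
place of the leading jamo also sums to 2, which is the behavior the carve-out
is meant to preserve.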

