public inbox for cygwin@cygwin.com
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: Unicode width data inconsistent/outdated
Date: Sat, 29 Jul 2017 15:23:00 -0000	[thread overview]
Message-ID: <20170728195826.GI24013@calimero.vinschen.de> (raw)
In-Reply-To: <289bd98b-e644-888d-07f8-8965b6538373@towo.net>


On Jul 26 23:43, Thomas Wolff wrote:
> On 26.07.2017 at 11:50, Corinna Vinschen wrote:
> > On Jul 26 03:16, Yaakov Selkowitz wrote:
> > > On 2017-07-26 03:08, Corinna Vinschen wrote:
> > > > On Jul 26 08:49, Thomas Wolff wrote:
> > > > > It would be good to keep wcwidth/wcswidth in sync with the installed
> > > > > Unicode data version (package unicode-ucd).
> > > > > Currently it seems to be hard-coded (in newlib/libc/string/wcwidth.c);
> > > > > it refers to Unicode 5.0 while installed Unicode data suggest 9.0 would
> > > > > be used.
> > > > > I can provide some scripts to generate the respective tables if desired.
> > > > > Thomas
> > > > If you can update the newlib files this way and send matching patches
> > > > to the newlib list, this would be highly appreciated.
> > > Thomas, I just updated unicode-ucd to 10.0 for this purpose.
> Thanks.
> > 
> > Oh, and, btw, the comment in wcwidth.c isn't quite correct.  The
> > cwstate in newlib is on Unicode 5.2, see newlib/libc/ctype/towupper.c.
> Oh, a number of other embedded tables. To make the tow* and isw* functions
> more easily adaptable to Unicode updates, there will be some revisions to do
> here. And the to* and is* ones (without 'w') even refer to locales in a way
> I do not understand. Maybe I'll restrict my effort to wcwidth first...

The to* and is* ones (without 'w') don't matter at all and you don't
have to touch them.

The Unicode data only affects the tow* and isw* functions.

As for how to fetch the data, you may want to have a look at
newlib/libc/ctype/utf8alpha.h and newlib/libc/ctype/utf8print.h.  The
header comments contain the awk scripts used to collect the data.
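
For illustration only, and not the awk scripts from those headers: a
minimal Python sketch of the general idea, assuming input lines in the
semicolon-separated UnicodeData.txt layout (code point in field 0, name
in field 1, general category in field 2).  The function name
category_ranges and the SAMPLE data are hypothetical; the sketch just
collapses all code points of the wanted categories into inclusive
ranges, which is roughly the shape such lookup tables take.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical sample in UnicodeData.txt layout; a real run would read
# the file shipped by the unicode-ucd package instead.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FEA;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
""".splitlines()


def category_ranges(lines: Iterable[str],
                    wanted: Set[str]) -> List[Tuple[int, int]]:
    """Collapse code points whose category is in `wanted` into ranges."""
    ranges: List[Tuple[int, int]] = []
    pending_first = None  # start of a "<..., First>" / "<..., Last>" pair
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        name, cat = fields[1], fields[2]
        if name.endswith(", First>"):
            # Large blocks are listed only by their first and last entry.
            pending_first = cp
            continue
        if name.endswith(", Last>") and pending_first is not None:
            start, end = pending_first, cp
            pending_first = None
        else:
            start = end = cp
        if cat not in wanted:
            continue
        if ranges and ranges[-1][1] + 1 == start:
            # Adjacent to the previous range: extend it.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    for first, last in category_ranges(SAMPLE, {"Lu", "Ll"}):
        print("{ 0x%04X, 0x%04X }," % (first, last))
```

Which categories to pass in depends on the function being generated,
so the set above is only a placeholder.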

All other isw* files like iswblank.c contain comments explaining
what Unicode character categories are covered.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
