public inbox for cygwin@cygwin.com
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: Bruno Haible <bruno@clisp.org>
Cc: cygwin@cygwin.com
Subject: Re: GB18030 locale
Date: Mon, 31 Jul 2023 15:38:11 +0200
Message-ID: <ZMe5Q02S5ap5gBbJ@calimero.vinschen.de>
In-Reply-To: <ZMeH6yZQkK0exU8H@calimero.vinschen.de>

Hi Bruno,

On Jul 31 12:07, Corinna Vinschen via Cygwin wrote:
> On Jul 29 11:53, Bruno Haible via Cygwin wrote:
> > Corinna Vinschen wrote:
> > > However, on debugging this, I see it's totally broken.  Trying to fix
> > > this in the existing functions is futile.  We need dedicated
> > > support functions for GB18030, kind of like the FreeBSD functions,
> > > just with extra support for surrogate pairs, as with our UTF8 stuff.
> > 
> > In case it helps: Find here a test suite for the various multibyte
> > functions with GB18030 specific test cases. (Extracted from gnulib.)
> > https://haible.de/bruno/gnu/testdir-gb18030.tar.gz
> 
> Thank you, I'm already hacking and testing :)

I have a problem with the c32isalpha function.

The c32isalpha test fails for the character U+FF11 FULLWIDTH DIGIT ONE,
because the test expects that character to be classified as alphabetic.

Cygwin's Unicode information is generated automatically from the
Unicode data file UnicodeData.txt, fresh from the Unicode homepage.
iswalpha in newlib checks the Unicode general category, using the
expression:

    return cat == CAT_LC || cat == CAT_Lu || cat == CAT_Ll || cat == CAT_Lt
        || cat == CAT_Lm || cat == CAT_Lo
        || cat == CAT_Nl // Letter_Number
        ;

with CAT_foo being equivalent to Unicode category foo.

Per UnicodeData.txt, U+FF11 is of category Nd (decimal digit), so it's
a digit, not an alphabetic character.
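
To make the category argument concrete, here is a minimal sketch of the
same check against a tiny hand-written table.  Everything in it is an
assumption for illustration: the enum, sample_table, sample_category and
sample_isalpha are hypothetical names, not newlib's, and the four rows
are typed in by hand rather than generated from UnicodeData.txt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of Unicode general categories.  The names follow
   the CAT_foo convention, but this enum is not newlib's. */
enum category {
  CAT_LC, CAT_Lu, CAT_Ll, CAT_Lt, CAT_Lm, CAT_Lo,
  CAT_Nl, CAT_Nd, CAT_Other
};

struct cat_entry {
  uint32_t cp;
  enum category cat;
};

/* Hand-written sample rows, NOT generated from UnicodeData.txt. */
const struct cat_entry sample_table[] = {
  { 0x0041, CAT_Lu },  /* LATIN CAPITAL LETTER A */
  { 0x0061, CAT_Ll },  /* LATIN SMALL LETTER A */
  { 0x2160, CAT_Nl },  /* ROMAN NUMERAL ONE */
  { 0xFF11, CAT_Nd },  /* FULLWIDTH DIGIT ONE */
};

/* Linear lookup; anything not listed is treated as CAT_Other. */
enum category
sample_category (uint32_t cp)
{
  size_t n = sizeof sample_table / sizeof sample_table[0];
  for (size_t i = 0; i < n; i++)
    if (sample_table[i].cp == cp)
      return sample_table[i].cat;
  return CAT_Other;
}

/* Same shape as the expression quoted above: letters plus Nl. */
int
sample_isalpha (uint32_t cp)
{
  enum category cat = sample_category (cp);
  return cat == CAT_LC || cat == CAT_Lu || cat == CAT_Ll || cat == CAT_Lt
      || cat == CAT_Lm || cat == CAT_Lo
      || cat == CAT_Nl;
}
```

With this table, sample_isalpha (0x2160) is nonzero because Nl is in the
accepted set, while sample_isalpha (0xFF11) is 0 because Nd is not.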

I see that Glibc returns 1 from c32isalpha for U+FF11, but I don't see
where it gets that information or why this would be correct.  Can you
point me to some info on this?


Thanks,
Corinna

