public inbox for cygwin@cygwin.com
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: raise(-1) has stopped returning an error recently
Date: Thu, 25 Nov 2021 13:54:42 +0100
Message-ID: <YZ+HkgPIwmCuTcJr@calimero.vinschen.de>
In-Reply-To: <643c1cb7-9b18-25cf-62b0-8085c8fab137@Shaw.ca>

On Nov 24 11:01, Brian Inglis via Cygwin wrote:
> On 2021-11-24 02:25, Corinna Vinschen via Cygwin wrote:
> > > On Tue, Nov 23, 2021 at 11:18:25AM -0700, Brian Inglis wrote:
> > > > Do Cygwin and/or Windows support surrogate pairs in UTF-8?
> > 
> > You mean UTF-16.  UTF-8 doesn't know surrogate pairs, UTF-16 does.
> > Originally there was UCS-2, 16 bits, with only 65536 code points.
> > However, Unicode already left the BMP with version 2.0 in 1996, so
> > UTF-16 and surrogate pairs became necessary.  Windows as well as Cygwin
> > support them.
> 
> How does Cygwin support UTF-16 locales with surrogate pairs?

UTF-16 locales?  There's no such thing.  UTF-16 is just the 16-bit
encoding form of Unicode and, as such, is independent of the locale.
On the user side, Cygwin only supports UTF-8 as a Unicode encoding.
Internally you can then convert UTF-8 strings to wchar_t, which on
Cygwin is 16 bits wide and holds UTF-16.

> Are they the "native" locales inherited from Windows if others are not
> specified e.g. UTF-8, some OEM SBCS or MBCS?

Just try `locale -av' and you'll see all supported locales and their
respective default codesets.  All of them can be used with the .utf8
specifier to use UTF-8 instead of the default codeset.  Some of them
use UTF-8 as default codeset anyway, e.g., fa_IR or yo_NG.

> > > There are 3 tests in surrogate-pair and only the 3rd one failed. So I guess
> > > surrogate pairs in UTF-8 "mostly work".
> > 
> > UTF-16.  The surrogate stuff is evil at times.  Have a look at the
> > __utf8_wctomb function in
> > https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/wctomb_r.c
> > Lone surrogate halves in an input stream are a problem, for instance.
> 
> Thus the confusion with grep surrogate pair tests which appear to be running
> under a UTF-8 locale: see attached surrogate pair extract from cygport
> --debug grep.cygport check.

An STC (simple test case) in plain C might be helpful.


Corinna


Thread overview: 14+ messages
2021-11-22  5:20 Duncan Roe
2021-11-22 10:25 ` Corinna Vinschen
2021-11-22 13:06   ` Corinna Vinschen
2021-11-23  8:27     ` Duncan Roe
2021-11-23  9:50       ` Corinna Vinschen
2021-11-23 18:18         ` Brian Inglis
2021-11-23 22:36           ` Duncan Roe
2021-11-24  9:25             ` Corinna Vinschen
2021-11-24 18:01               ` Brian Inglis
2021-11-25 12:54                 ` Corinna Vinschen [this message]
2021-11-27  7:24                   ` Brian Inglis
2021-11-28  3:04                 ` Duncan Roe
2021-11-26 23:43               ` Duncan Roe
2021-11-29 10:41                 ` Corinna Vinschen
