Date: Thu, 25 Nov 2021 13:54:42 +0100
From: Corinna Vinschen
To: cygwin@cygwin.com
Subject: Re: raise(-1) has stopped returning an error recently
Reply-To: cygwin@cygwin.com
References: <42c9bb90-dd78-edfa-99ff-f65f7e000956@SystematicSw.ab.ca> <643c1cb7-9b18-25cf-62b0-8085c8fab137@Shaw.ca>
In-Reply-To: <643c1cb7-9b18-25cf-62b0-8085c8fab137@Shaw.ca>

On Nov 24 11:01, Brian Inglis via Cygwin wrote:
> On 2021-11-24 02:25, Corinna Vinschen via Cygwin wrote:
>
> > On Tue, Nov 23, 2021 at 11:18:25AM -0700, Brian Inglis wrote:
>
> > > Do Cygwin and/or Windows support surrogate pairs in UTF-8?
> >
> > You mean UTF-16.  UTF-8 doesn't know surrogate pairs, UTF-16 does.
> > Originally there was UCS-2, 16 bits, with only 65536 code points.
> > However, Unicode left the BMP already with version 2.0 in 1996, so
> > UTF-16 and surrogate pairs became necessary.  Windows as well as Cygwin
> > support them.
>
> How does Cygwin support UTF-16 locales with surrogate pairs?

UTF-16 locales?  There's no such thing.  UTF-16 is just the 16-bit
representation for Unicode, and as such, is independent of the locale.

On the user side, Cygwin only supports UTF-8 as Unicode representation.
Internally you can then convert UTF-8 strings to wchar_t, which is UTF-16.

> Are they the "native" locales inherited from Windows if others are not
> specified e.g. UTF-8, some OEM SBCS or MBCS?

Just try `locale -av' and you'll see all supported locales and their
respective default codeset.
All of them can be used with the .utf8 specifier to use UTF-8 instead of
the default codeset.  Some of them use UTF-8 as default codeset anyway,
e.g., fa_IR or yo_NG.

> > > There are 3 tests in surrogate-pair and only the 3rd one failed.  So I guess
> > > surrogate pairs in UTF-8 "mostly work".
> >
> > UTF-16.  The surrogate stuff is evil at times.  Have a look at the
> > __utf8_wctomb function in
> > https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/wctomb_r.c
> > Lone surrogate halfs in an input stream are a problem, for instance.
>
> Thus the confusion with grep surrogate pair tests which appear to be running
> under a UTF-8 locale: see attached surrogate pair extract from cygport
> --debug grep.cygport check.

An STC in plain C might be helpful.


Corinna
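
For readers following the thread, here is a minimal sketch in plain C of
the surrogate arithmetic discussed above.  It is not the newlib
__utf8_wctomb code and is not taken from the original message.  All
function names (cp_to_surrogates, surrogates_to_cp, cp_to_utf8,
surrogate_selftest) are hypothetical helpers made up for illustration.
The sketch assumes the standard Unicode definitions: code points above
the BMP are split into a high surrogate (0xD800..0xDBFF) and a low
surrogate (0xDC00..0xDFFF) in UTF-16, while UTF-8 encodes the code
point directly and has no surrogates of its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of newlib or Cygwin. */

/* Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.
   Returns 0 on success, -1 if the code point needs no surrogates or is
   out of range. */
static int cp_to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return -1;
    cp -= 0x10000;                            /* 20 bits remain */
    *hi = (uint16_t)(0xD800 | (cp >> 10));    /* upper 10 bits */
    *lo = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* lower 10 bits */
    return 0;
}

/* Combine a high/low surrogate pair back into a code point.
   Returns -1 if either half is not a valid surrogate of its kind. */
static int surrogates_to_cp(uint16_t hi, uint16_t lo, uint32_t *cp)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return -1;
    *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 0;
}

/* Encode one code point as UTF-8.  Returns the number of bytes written,
   or -1 for a lone surrogate or an out-of-range value.  Rejecting lone
   surrogates is this sketch's choice; a real converter may handle them
   differently. */
static int cp_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return -1;
}

/* Round trip for U+1F600: UTF-16 pair D83D DE00, UTF-8 bytes F0 9F 98 80. */
static void surrogate_selftest(void)
{
    uint16_t hi, lo;
    uint32_t cp;
    unsigned char buf[4];

    assert(cp_to_surrogates(0x1F600, &hi, &lo) == 0);
    assert(hi == 0xD83D && lo == 0xDE00);
    assert(surrogates_to_cp(hi, lo, &cp) == 0 && cp == 0x1F600);
    assert(cp_to_utf8(cp, buf) == 4);
    assert(buf[0] == 0xF0 && buf[1] == 0x9F && buf[2] == 0x98 && buf[3] == 0x80);
    /* A lone high surrogate has no valid UTF-8 encoding in this sketch. */
    assert(cp_to_utf8(0xD83D, buf) == -1);
}
```

Calling surrogate_selftest() from a main() is enough to exercise the
round trip.  A self-contained test case for the grep failure would
additionally need to go through the locale-dependent conversion
functions, which this sketch deliberately leaves out.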