From: Warren Young
To: cygwin@cygwin.com
Subject: Re: With bad UTF-8, cygwin can create files it can't read
Date: Wed, 01 Apr 2015 16:01:00 -0000
In-Reply-To: <20150401133401.GV13285@calimero.vinschen.de>
References: <20150330110446.GK29875@calimero.vinschen.de> <20150401133401.GV13285@calimero.vinschen.de>

On Apr 1, 2015, at 7:34 AM, Corinna Vinschen wrote:
>
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.

I happened to have run across a similar strangeness in Unicode earlier today. Does Cygwin cope with/care about Unicode normalization forms?

http://goo.gl/jnsqhC

For example, will open(2) cope with any UTF-8 form of a string that you could pass in UTF-16 encoding to CreateFile()?

You could imagine, say, a web app getting a string from a user, then using that to access a file on disk. A different browser given the “same” string could result in a different series of bytes passed to the Cygwin POSIX layer.
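
To make the concern concrete, here is a minimal Python sketch of how the "same" visible name can arrive as two different UTF-8 byte sequences depending on its normalization form. The file name `café.txt` and the helper `utf8_forms` are made up for illustration, and nothing here says what Cygwin actually does with either sequence. It only shows that the bytes differ.

```python
import unicodedata


def utf8_forms(name):
    """Return the UTF-8 bytes of `name` in NFC and in NFD form."""
    nfc = unicodedata.normalize("NFC", name)  # precomposed: U+00E9
    nfd = unicodedata.normalize("NFD", name)  # decomposed: 'e' + U+0301
    return nfc.encode("utf-8"), nfd.encode("utf-8")


# Hypothetical file name; both forms render identically on screen.
nfc_bytes, nfd_bytes = utf8_forms("caf\u00e9.txt")

print(nfc_bytes)               # b'caf\xc3\xa9.txt'
print(nfd_bytes)               # b'cafe\xcc\x81.txt'
print(nfc_bytes == nfd_bytes)  # False
```

An application that wants the two to name the same file would presumably have to normalize to a single form itself before calling open(2). Whether the layers underneath do any of that is the open question.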