From: Corinna Vinschen
To: cygwin@cygwin.com
Date: Wed, 01 Apr 2015 16:10:00 -0000
Subject: Re: With bad UTF-8, cygwin can create files it can't read
Message-ID: <20150401161029.GB13285@calimero.vinschen.de>
References: <20150330110446.GK29875@calimero.vinschen.de> <20150401133401.GV13285@calimero.vinschen.de>
In-Reply-To: <20150401133401.GV13285@calimero.vinschen.de>

On Apr  1 15:34, Corinna Vinschen wrote:
> Hi Stuart,
>
> On Mar 30 13:04, Corinna Vinschen wrote:
> > On Mar 25 14:34, Kyzer wrote:
> > > Hello,
> > >
> > > I've found that if you use cygwin to create a file with badly-encoded
> > > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > > subsequently accept.
> > >
> > > * create a file using filename with hex bytes F4 8F BF BF
> > > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > > * attempting to open or unlink the filename F4 8F BF BF succeeds
> >
> > Thanks for the testcase.  I'll have a look later this week (I hope).
>
> Wow.  Just wow.  You found a long-standing bug in the wctomb conversion
> from UTF-16 to UTF-8.
>
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
>
> While the conversion from UTF-8 f4 8f bf bf to UTF-16 dbff dfff
> worked fine, the conversion back to UTF-8 has a subtle bug.  There's
> a test for a lone high surrogate in the underlying conversion
> function.  This tests the next UTF-16 value like this:
>
>   if (wchar < 0xdc00 || wchar >= 0xdfff)
>     /* Handle lone high surrogate */
>
> Notice the >= 0xdfff?  That should have been > 0xdfff.  Duh.  This
> bug is only a bit over 5 years old...
>
> Fixed in the git repo.  I'll regenerate today's fool..., erm,
> today's developer snapshot on https://cygwin.com/snapshots/ later today.

Snapshot is up.  Please give it a try.
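
For anyone who wants to see the off-by-one in isolation, here is a minimal
sketch.  It is not the actual conversion code from the repo: the helper names
(to_surrogates, lone_high_buggy, lone_high_fixed, pair_to_utf8) are made up
for illustration, and only the two range checks mirror the condition quoted
above.  It shows that U+10FFFF maps to the pair dbff dfff, that the old check
wrongly treats dfff as "not a low surrogate", and that the corrected check
accepts it so the pair converts back to f4 8f bf bf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the Cygwin sources. */

/* Split a code point beyond the base plane (0x10000..0x10ffff) into a
   UTF-16 surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10ffff);
    cp -= 0x10000;
    *high = (uint16_t)(0xd800 | (cp >> 10));    /* upper 10 bits */
    *low  = (uint16_t)(0xdc00 | (cp & 0x3ff));  /* lower 10 bits */
}

/* The check as quoted above: returns nonzero if `next` is NOT taken as a
   low surrogate.  0xdfff is wrongly rejected because of the >=. */
static int lone_high_buggy(uint16_t next)
{
    return next < 0xdc00 || next >= 0xdfff;
}

/* Corrected check: only values outside 0xdc00..0xdfff are rejected. */
static int lone_high_fixed(uint16_t next)
{
    return next < 0xdc00 || next > 0xdfff;
}

/* Combine a valid surrogate pair and write it as 4 UTF-8 bytes.
   Returns the number of bytes written. */
static size_t pair_to_utf8(uint16_t high, uint16_t low, unsigned char out[4])
{
    uint32_t cp = 0x10000
        + (((uint32_t)(high - 0xd800) << 10) | (uint32_t)(low - 0xdc00));

    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
}
```

Only code points whose low surrogate is exactly dfff hit the old check, which
would explain why this went unnoticed for so long.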

Thanks,
Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat