From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: With bad UTF-8, cygwin can create files it can't read
Date: Wed, 01 Apr 2015 16:10:00 -0000
Message-ID: <20150401161029.GB13285@calimero.vinschen.de>
In-Reply-To: <20150401133401.GV13285@calimero.vinschen.de>

On Apr  1 15:34, Corinna Vinschen wrote:
> Hi Stuart,
> 
> On Mar 30 13:04, Corinna Vinschen wrote:
> > On Mar 25 14:34, Kyzer wrote:
> > > Hello,
> > > 
> > > I've found that if you use cygwin to create a file with badly-encoded
> > > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > > subsequently accept.
> > > 
> > > * create a file using filename with hex bytes F4 8F BF BF
> > > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > > * attempting to open or unlink the filename F4 8F BF BF succeeds
> > 
> > Thanks for the testcase.  I'll have a look later this week (I hope).
> 
> Wow.  Just wow.  You found a long-standing bug in the wctomb conversion
> from UTF-16 to UTF-8.
> 
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
> 
> While the conversion from UTF-8 f4 8f bf bf to UTF-16 dbff dfff
> worked fine, the conversion back to UTF-8 has a subtle bug.  There's
> a test for a lone high surrogate in the underlying conversion
> function.  This tests the next UTF-16 value like this:
> 
>   if (wchar < 0xdc00 || wchar >= 0xdfff)
>     /* Handle lone high surrogate */
> 
> Notice the >= 0xdfff?  That should have been > 0xdfff.  Duh.  This
> bug is only a bit over 5 years old...
> 
> Fixed in the git repo.  I'll regenerate today's fool..., erm,
> today's developer snapshot on https://cygwin.com/snapshots/ later today.
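
For anyone who wants to see the off-by-one in isolation, here is a minimal
sketch of the two range tests side by side.  The function names are
hypothetical and this is not the actual conversion code, just the quoted
condition wrapped so the boundary values can be checked:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper names; this is not the actual conversion code. */

/* The condition as quoted: true means "treat as lone high surrogate". */
bool lone_high_buggy(unsigned int wchar)
{
    return wchar < 0xdc00 || wchar >= 0xdfff;
}

/* The corrected condition: low surrogates are 0xdc00..0xdfff inclusive. */
bool lone_high_fixed(unsigned int wchar)
{
    return wchar < 0xdc00 || wchar > 0xdfff;
}

/* Boundary checks for the value that follows a high surrogate. */
void check_boundaries(void)
{
    assert(lone_high_buggy(0xdfff));    /* valid low surrogate misclassified */
    assert(!lone_high_fixed(0xdfff));   /* fixed test accepts it */
    assert(!lone_high_buggy(0xdc00) && !lone_high_fixed(0xdc00));
    assert(lone_high_buggy(0xe000) && lone_high_fixed(0xe000));
}
```

The two tests only disagree for 0xdfff, which is why only pairs ending in
dfff (such as dbff dfff) were affected.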

Snapshot is up.  Please give it a try.
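
When testing, the expected round trip from the quoted message can be
checked by hand.  The following is a rough sketch, with hypothetical helper
names, of the standard surrogate-pair and 4-byte UTF-8 arithmetic; it only
illustrates the numbers above and is not taken from the Cygwin sources:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into a code point. */
uint32_t pair_to_codepoint(uint16_t high, uint16_t low)
{
    assert(high >= 0xd800 && high <= 0xdbff);
    assert(low >= 0xdc00 && low <= 0xdfff);    /* 0xdfff is a valid low surrogate */
    return 0x10000u + (((uint32_t)(high - 0xd800) << 10)
                       | (uint32_t)(low - 0xdc00));
}

/* Hypothetical helper: encode a code point above 0xffff as 4 UTF-8 bytes. */
void codepoint_to_utf8_4(uint32_t cp, unsigned char out[4])
{
    assert(cp >= 0x10000u && cp <= 0x10ffffu);
    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
}
```

Feeding dbff dfff through both helpers gives code point 0x10ffff and the
bytes f4 8f bf bf, which is the name readdir() should report for the
testcase once the fix is in.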


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat