public inbox for cygwin@cygwin.com
From: Warren Young <wyml@etr-usa.com>
To: cygwin@cygwin.com
Subject: Re: With bad UTF-8, cygwin can create files it can't read
Date: Wed, 01 Apr 2015 16:01:00 -0000
Message-ID: <F7BC8B64-DE90-4F01-9C8F-2BB3511B4EF5@etr-usa.com>
In-Reply-To: <20150401133401.GV13285@calimero.vinschen.de>

On Apr 1, 2015, at 7:34 AM, Corinna Vinschen <corinna-cygwin@cygwin.com> wrote:
> 
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
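
To make the quoted point concrete, here is a minimal sketch in Python of how a code point above the base plane splits into a UTF-16 surrogate pair. The helper name to_surrogate_pair and the sample code point U+1F600 are illustrative choices, not anything taken from Cygwin.

```python
def to_surrogate_pair(cp):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration; not part of Cygwin.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the base plane")
    v = cp - 0x10000              # 20 bits remain after removing the offset
    high = 0xD800 + (v >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


cp = 0x1F600                      # sample code point beyond the base plane
high, low = to_surrogate_pair(cp)
print(hex(high), hex(low))

# Python's own codecs agree with the manual arithmetic.
print(chr(cp).encode("utf-16-be").hex())

# U+FFFF is the last base-plane code point; its UTF-8 form is ef bf bf.
print(chr(0xFFFF).encode("utf-8").hex())
```

Both surrogates fall in the 0xd800 - 0xdfff range mentioned above, and anything that needs more than the three bytes ef bf bf in UTF-8 needs such a pair in UTF-16.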

I happened to run across a similar oddity in Unicode earlier today.  Does Cygwin cope with, or care about, Unicode normalization forms?

  http://goo.gl/jnsqhC

For example, will open(2) cope with any UTF-8 form of a string that you could pass in UTF-16 encoding to CreateFile()?

You could imagine, say, a web app getting a string from a user, then using that to access a file on disk.  A different browser given the “same” string could send it in a different normalization form (precomposed versus decomposed characters, for instance), so a different series of bytes would reach the Cygwin POSIX layer.
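
A minimal sketch of that concern, in Python, using the standard unicodedata module. The sample word and the helper same_name are illustrative assumptions. The sketch only shows that two normalization forms of the "same" string differ at the byte level; it says nothing about what Cygwin or the underlying filesystem actually does with them.

```python
import unicodedata

# "café" with a precomposed e-acute (NFC) and with e + combining acute (NFD).
nfc = unicodedata.normalize("NFC", "caf\u00e9")
nfd = unicodedata.normalize("NFD", "caf\u00e9")

print(nfc == nfd)                 # the strings compare unequal
print(nfc.encode("utf-8"))        # e-acute encoded as c3 a9
print(nfd.encode("utf-8"))        # plain 'e' followed by cc 81


def same_name(a, b, form="NFC"):
    """Compare two names after normalizing both to one form.

    Hypothetical helper: an application could apply something like this
    before handing a user-supplied name to open(2).
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_name(nfc, nfd))
```

Normalizing on the application side, before the string ever reaches the filesystem call, is one way to sidestep the question; whether the POSIX layer should do it is the open issue.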

