From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yeild bad cfilename when file names have special characters.  Works in cygwin 1.5, fails in 1.7
Date: Fri, 04 Nov 2011 08:47:00 -0000
Message-ID: <20111104084619.GM9159@calimero.vinschen.de>
In-Reply-To: <4EB30DF9.2080006@cwilson.fastmail.fm>

On Nov  3 17:56, Charles Wilson wrote:
> On 11/3/2011 4:48 PM, Leon Vanderploeg wrote:
> > With cygwin 1.7.5, cFileName values containing special characters
> > such as ñ (n with tilde above it) fail to be properly extracted from
> > a WIN32_FIND_DATA structure with FindFirstFile (or FindNextFile).
> > 
> > To set up a simple test scenario, I created a file in C:\Testing
> > named  Mañana.docx.  I compiled the code at the end of this message
> > on Cygwin 1.7.9 with GCC version 3.4.4 on Server 2008 32 bit system.
> > On this system (and on a Windows 7 32 bit machine), it returns:
> 
> a) Why are you using native Win32 APIs in a cygwin program? You should
> be using the POSIX interfaces instead -- see /usr/include/dirent.h.
> 
> DIR *opendir (const char *);
> DIR *fdopendir (int);
> struct dirent *readdir (DIR *);
> int readdir_r (DIR *, struct dirent *, struct dirent **);
> void rewinddir (DIR *);
> int closedir (DIR *);

ACK++

> b) What you observe is an artifact of cygwin-1.7's new *support* for
> i18n.  In cygwin-1.5, it just didn't care and passed all the bytes back
> exactly as found without transliteration.  In 1.7, it (correctly)
> transcodes strings into the current locale -- and your current locale
> does not appear to support ñ -- or, at least, you haven't told cygwin to
> use the correct one.
> 
> (I'm probably thoroughly botching this explanation, but the point is,

Just a bit.  What you have to keep in mind is that Windows stores all
object names, including filenames, as UTF-16 strings, UNICODE in Windows
terminology.  When you use the ANSI Win32 API as in this example, then
the UTF-16 names are converted to the currently defined ANSI charset on
output, for instance codepage 1252 for Western European languages.

Cygwin 1.5 either used the ANSI API, or it converted strings from UTF-16
to the current Windows ANSI charset or vice versa.

Cygwin 1.7 doesn't use the ANSI API anymore.  It talks to Windows
exclusively through the UNICODE API, and the multibyte charset is defined
through the environment(*) as defined in POSIX.  UTF-8 is the default now.
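
To make the charset difference concrete, here is a minimal sketch that
encodes a single Unicode code point as UTF-8.  The helper name
utf8_encode is hypothetical, it is not part of Cygwin or the Win32 API.
For ñ (U+00F1) it produces the two bytes 0xc3 0xb1, whereas Windows
codepage 1252 represents the same character as the single byte 0xf1.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written to out (1..4), or 0 if cp is
   a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xF1, buf) fills buf with 0xc3 0xb1 and returns 2.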

> you need to check your LC_* and LANG env vars, and maybe call
> setlocale(LC_ALL, "") in your application.)

And even then the code won't work.  If you don't define UNICODE,
FindFirstFile/FindNextFile will use the ANSI versions of this API,
FindFirstFileA/FindNextFileA.  Unless you set your LANG/LC_CTYPE/LC_ALL
variables to match your current Windows ANSI charset *and* call
setlocale, Cygwin will use UTF-8 by default.  Therefore, the character ñ
will have another multibyte encoding, 0xc3 0xb1, rather than, say, 0xf1
in Windows codepage 1252.  To avoid this problem, you can use the
UNICODE API FindFirstFileW/FindNextFileW and convert the filename to the
current multibyte charset via wcstombs and friends.
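
The conversion step could look roughly like the following sketch.  The
function name wide_name_to_multibyte is made up for illustration.  It
assumes the caller has already called setlocale(LC_ALL, "") so that
wcstombs uses the charset from the environment, and it treats a name
that cannot be represented or does not fit as an error.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide-character file name to the
   multibyte charset of the current locale.
   Returns 0 on success, -1 if the name cannot be represented in that
   charset or does not fit into out including the terminating NUL. */
int wide_name_to_multibyte(const wchar_t *wname, char *out, size_t outsz)
{
    size_t n;

    if (outsz == 0)
        return -1;
    n = wcstombs(out, wname, outsz);
    if (n == (size_t)-1)
        return -1;      /* character not representable in this charset */
    if (n >= outsz)
        return -1;      /* buffer filled completely, no room for NUL */
    return 0;
}
```

In a Cygwin program, wname would typically be the cFileName member of a
WIN32_FIND_DATAW structure filled in by FindFirstFileW or FindNextFileW.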

However, as Chuck has pointed out, the obviously right thing to do is to
use the POSIX API opendir/readdir/closedir instead.
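
A minimal sketch of the POSIX variant, assuming you only want to print
the names in one directory.  The function name list_directory is
hypothetical.  The names arrive in d_name as multibyte strings, so no
Win32 structure or manual conversion is involved.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every entry of the directory at path to
   out, one name per line, skipping "." and "..".
   Returns the number of names printed, or -1 if the directory cannot
   be opened. */
int list_directory(const char *path, FILE *out)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        fprintf(out, "%s\n", entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

A caller would typically run setlocale(LC_ALL, "") once at startup and
then call, for example, list_directory("/cygdrive/c/Testing", stdout).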


Corinna

(*) http://cygwin.com/cygwin-ug-net/setup-locale.html
    http://cygwin.com/cygwin-ug-net/using-utils.html#locale

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

