From: "Leon Vanderploeg"
References: <135801cc9a69$f73ceaf0$e5b6c0d0$@vaultnow.com> <4EB30DF9.2080006@cwilson.fastmail.fm> <20111104084619.GM9159@calimero.vinschen.de>
In-Reply-To: <20111104084619.GM9159@calimero.vinschen.de>
Subject: RE: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yeild bad cfilename when file names have special characters.
  Works in cygwin 1.5, fails in 1.7
Date: Thu, 10 Nov 2011 05:19:00 -0000
Message-ID: <029901cc9f68$41108a80$c3319f80$@vaultnow.com>

Many thanks to Charles and Corinna for the help.  I have modified the
code to use the POSIX functions.  I still have one problem I cannot seem
to conquer.

I need to be able to read and write the (yes, I know it's evil) archive
bit.  Unless there is a POSIX function (which I seriously doubt) for
these items, I am locked into the Windows APIs.

I have read and re-read the Cygwin documentation on internationalization
at least 6 times and I cannot figure out what I need to do to get this
to work.  I have tried numerous combinations of environment variables
and locale settings in the code, but none of them work.  The Windows API
fails to find the file specified.  I just want US English that can
handle the extended character set to the Windows APIs.  In this case,
let's use the example of the copyright symbol (the small c with a circle
around it).  What needs to be set in the environment, and what needs to
be set in the C code to handle these characters correctly?

Your help and assistance is GREATLY appreciated.

Leon

Leon Vanderploeg
Cell 303-877-9654

On Nov  3 17:56, Charles Wilson wrote:
> On 11/3/2011 4:48 PM, Leon Vanderploeg wrote:
> > With cygwin 1.7.5, cFileName values containing special characters
> > such as ñ (n with a tilde above it) fail to be properly extracted
> > from a WIN32_FIND_DATA structure with FindFirstFile (or
> > FindNextFile).
> >
> > To set up a simple test scenario, I created a file in C:\Testing
> > named Mañana.docx.  I compiled the code at the end of this message
> > on Cygwin 1.7.9 with GCC version 3.4.4 on a Server 2008 32-bit
> > system.  On this system (and on a Windows 7 32-bit machine), it
> > returns:
>
> a) Why are you using native Win32 APIs in a cygwin program?  You should
> be using the POSIX interfaces instead -- see /usr/include/dirent.h.
>
>   DIR *opendir (const char *);
>   DIR *fdopendir (int);
>   struct dirent *readdir (DIR *);
>   int readdir_r (DIR *, struct dirent *, struct dirent **);
>   void rewinddir (DIR *);
>   int closedir (DIR *);

ACK++

> b) What you observe is an artifact of cygwin-1.7's new *support* for
> i18n.  In cygwin-1.5, it just didn't care and passed all the bytes
> back exactly as found without transliteration.  In 1.7, it (correctly)
> transcodes strings into the current locale -- and your current locale
> does not appear to support ñ -- or, at least, you haven't told cygwin
> to use the correct one.
>
> (I'm probably thoroughly botching this explanation, but the point is,

Just a bit.  What you have to keep in mind is that Windows stores all
object names, including filenames, as UTF-16 strings, UNICODE in Windows
terminology.  When you use the ANSI Win32 API as in this example, the
UTF-16 names are converted to the currently defined ANSI charset on
output, for instance codepage 1252 for Western European languages.

Cygwin 1.5 either used the ANSI API, or it converted strings from UTF-16
to the current Windows ANSI charset or vice versa.
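
For illustration, the POSIX directory interface that Charles lists from
/usr/include/dirent.h can be used roughly as follows.  This is a minimal
sketch, not code from the thread: list_directory is a hypothetical
helper name, and the comment about the charset of d_name simply restates
what is said above about Cygwin 1.7 transcoding names into the current
locale.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every entry of `path` except "." and "..".
   Returns the number of entries printed, or -1 if the directory
   cannot be opened. */
int list_directory(const char *path)
{
    DIR *dir;
    struct dirent *entry;
    int count = 0;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        /* d_name is a multibyte string; under Cygwin 1.7 it is expected
           to be in the charset selected by LANG/LC_CTYPE/LC_ALL. */
        printf("%s\n", entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

A caller would typically run setlocale(LC_ALL, "") once at startup and
then call something like list_directory("/cygdrive/c/Testing").  The
/cygdrive/c prefix is an assumption about how C:\ is mounted on the
system in question.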
Cygwin 1.7 doesn't use the ANSI API anymore.  Rather, it talks to
Windows only through the UNICODE API, and the multibyte charset is
defined through the environment(*) as defined in POSIX.  UTF-8 is the
default now.

> you need to check your LC_* and LANG env vars, and maybe call
> setlocale(LC_ALL, "") in your application.)

And even then the code won't work.  If you don't define UNICODE,
FindFirstFile/FindNextFile will use the ANSI versions of this API,
FindFirstFileA/FindNextFileA.  Unless you set your LANG/LC_CTYPE/LC_ALL
variables to your current Windows ANSI charset *and* call setlocale,
Cygwin will use UTF-8 by default.  Therefore, the character ñ will have
another multibyte encoding, 0xc3 0xb1, rather than, say, 0xf1 in Windows
codepage 1252.  To avoid this problem, you can use the UNICODE API
FindFirstFileW/FindNextFileW and convert the filename to the current
multibyte charset via wcstombs and friends.

However, as Chuck has pointed out, the obviously right thing to do is to
use the POSIX API opendir/readdir/closedir instead.


Corinna
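
The byte values mentioned above can be checked with a few lines of C.
The sketch below uses a hand-written encoder (utf8_encode, a
hypothetical helper, not a Cygwin or Windows function) to show why the
same character looks different in the two charsets.  It only illustrates
the encodings; it makes no claim about how Cygwin converts names
internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into `out` and returns the number of bytes,
   or 0 if `cp` is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                     /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 2 bytes: covers U+00A9, U+00F1 */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                  /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                    /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For ñ (U+00F1) the helper yields the two bytes 0xC3 0xB1, and for the
copyright sign (U+00A9) it yields 0xC2 0xA9.  In codepage 1252 each of
these characters is a single byte equal to its code point (0xF1 and
0xA9), which is why a UTF-8 name handed to an ANSI function such as
FindFirstFileA does not match the file on disk.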