From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Wolff
To: cygwin@cygwin.com
Subject: cygwin_conv_ functions and character encoding
Date: Fri, 05 Feb 2016 15:47:00 -0000
Message-ID: <56B4C40A.4060607@towo.net>

The Cygwin path conversion functions ignore the current locale; instead
they seem to always use the locale from the environment that was set when
the program was started. See the test program convloc.c:

    #include <stdio.h>
    #include <locale.h>
    #include <wchar.h>
    #include <sys/cygwin.h>

    int main()
    {
      setlocale(LC_ALL, "C.UTF-8");
      char * utfstring = "böh";
      printf("ustring <%s>\n", utfstring);
      wchar_t * wstring = cygwin_create_path(CCP_POSIX_TO_WIN_W, utfstring);
      printf("wstring <%ls>\n", wstring);
    }

Run in a UTF-8 terminal:

    > LC_CTYPE=de_DE ./convloc
    ustring (C.UTF-8)
    wstring (C.UTF-8)

In sys_wcstombs in strfuncs.cc I see:
    const char *charset = cygheap->locale.charset;
which is set in internal_setlocale ()...

In fact, the situation can be fixed by adding after setlocale():
    cygwin_internal(CW_INT_SETLOCALE); // -> internal_setlocale();
(cf. https://sourceware.org/ml/cygwin-developers/2010-02/msg00054.html)
but I think those functions should use the proper locale implicitly.
According to the generic description in
http://linux.die.net/man/3/setlocale, LC_CTYPE affects ... conversion ...
functions; in my opinion this would include Cygwin-specific conversion
functions as well as implicitly called conversions (see open() below).

The same problem applies to the open() function (involving path
conversion). The wide string function mbstowcs behaves as expected.

The whole issue occurred to me while trying to work around a missing
conversion functionality, namely converting just the pathname syntax
between Unicode strings. The desired options would be like:
    CCP_POSIX_W_TO_WIN_W, /* from is wchar_t *posix, to is wchar_t *win32 */
    CCP_WIN_W_TO_POSIX_W, /* from is wchar_t *win32, to is wchar_t *posix */

------
Thomas