public inbox for cygwin@cygwin.com
* cygwin_conv_ functions and character encoding
@ 2016-02-05 15:47 Thomas Wolff
  2016-02-08 13:15 ` Corinna Vinschen
From: Thomas Wolff @ 2016-02-05 15:47 UTC
  To: cygwin

The Cygwin path conversion functions ignore the current locale.
Instead, they seem to always use the locale taken from the environment
when the program was started; see the test program convloc.c:

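#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <sys/cygwin.h>
int main(void) {
   /* Switch the locale at runtime; the environment may name another one. */
   const char * loc = setlocale(LC_ALL, "C.UTF-8");
   const char * utfstring = "böh";
   printf("ustring (%s) <%s>\n", loc, utfstring);
   /* cygwin_create_path returns a malloc'ed buffer, or NULL on error. */
   wchar_t * wstring = cygwin_create_path(CCP_POSIX_TO_WIN_W, utfstring);
   if (!wstring) { perror("cygwin_create_path"); return 1; }
   printf("wstring (%s) <%ls>\n", loc, wstring);
   free(wstring);
   return 0;
}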

Run in a UTF-8 terminal:
 > LC_CTYPE=de_DE ./convloc
ustring (C.UTF-8) <böh>
wstring (C.UTF-8) <D:\TEMP\böh>

In sys_wcstombs in strfuncs.cc I see:
   const char *charset = cygheap->locale.charset;
which is set in internal_setlocale ()...

In fact, the situation can be fixed by adding the following call after
setlocale():
   cygwin_internal(CW_INT_SETLOCALE);  // -> internal_setlocale();
(cf. https://sourceware.org/ml/cygwin-developers/2010-02/msg00054.html)
However, I think those functions should pick up the current locale
implicitly. According to the generic description in
http://linux.die.net/man/3/setlocale,
LC_CTYPE affects ... conversion ... functions; in my opinion this
includes Cygwin-specific conversion functions as well as implicitly
invoked conversions (see open() below).
The same problem applies to the open() function, which involves path
conversion.
The wide string function mbstowcs, by contrast, behaves as expected.


I ran into this issue while trying to work around a missing conversion
feature: converting only the pathname syntax between wide-character
(Unicode) strings. The desired options would look like:
   CCP_POSIX_W_TO_WIN_W,   /* from is wchar_t *posix, to is wchar_t *win32  */
   CCP_WIN_W_TO_POSIX_W,   /* from is wchar_t *win32, to is wchar_t *posix  */

------
Thomas


* Re: cygwin_conv_ functions and character encoding
  2016-02-05 15:47 cygwin_conv_ functions and character encoding Thomas Wolff
@ 2016-02-08 13:15 ` Corinna Vinschen
From: Corinna Vinschen @ 2016-02-08 13:15 UTC
  To: cygwin

On Feb  5 16:47, Thomas Wolff wrote:
> The cygwin path conversion functions ignore the current locale;
> rather they seem to always use the locale environment set when the program
> was started, see test program convloc.c:
> [...]
> In sys_wcstombs in strfuncs.cc I see:
>   const char *charset = cygheap->locale.charset;
> which is set in internal_setlocale ()...
> 
> In fact, the situation can be fixed by adding after setlocale():
>   cygwin_internal(CW_INT_SETLOCALE);  // -> internal_setlocale();
> (cf. https://sourceware.org/ml/cygwin-developers/2010-02/msg00054.html)
> but I think those functions should use the proper locale implicitly;
> according to the generic description in
> http://linux.die.net/man/3/setlocale,
> LC_CTYPE affects ... conversion ... functions, in my opinion this would
> include cygwin-specific conversion functions as well as implicitly called
> conversion (see open() below).
> The same problem applies to the open() function (involving path conversion).
> The wide string function mbstowcs behaves as expected.

Path conversion is a problem when switching locales.  Typically
an application only calls setlocale (LC_ALL, ""), and in that case the
conversion works as desired.  We could switch cygwin_conv_path as
you said, but I'm wondering what side effects that may have.  See
the comment in internal_setlocale().

> The whole issue occurred to me while trying to work around a missing
> conversion functionality, just converting the pathname syntax between
> Unicode strings. The desired options would be like:
>   CCP_POSIX_W_TO_WIN_W,   /* from is wchar_t *posix, to is wchar_t *win32 */
>   CCP_WIN_W_TO_POSIX_W,   /* from is wchar_t *win32, to is wchar_t *posix */

Those are not available because POSIX paths are always multibyte strings.
Patches welcome, though.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
