* The C locale
From: Andy Koppe @ 2009-08-30 16:59 UTC
To: cygwin

Trying to reply to Tuomo Valkonen's post about locale issues, I got
rather confused about the C locale. The manual and the POSIX standard
say that it supports ASCII only, so in theory anything above 0x7F
should be rejected. In practice though, both Cygwin 1.5 and 1.7 do
support characters above 0x7F in the C locale, which could be quite
useful. Trouble is, they do so rather inconsistently.

Both in 1.5 and 1.7, the mb conversion functions treat such characters
as ISO-8859-1. In other words, conversions between chars and wchars are
simple casts (except that wchars above 0xFF can't be converted). This
makes some sense.

Filename handling is different though. Cygwin 1.5 translates filenames
according to the system's ANSI codepage. I guess the inconsistency
with the mb functions didn't really matter, as the mb functions were
pretty much useless anyway, and supporting the system codepage was
more important.

So, with Cygwin 1.7, I'd have expected filename handling in the C
locale to either use ISO-8859-1 for consistency with the mb functions,
or the ANSI codepage for compatibility with 1.5. In actual fact
though, it uses UTF-8.

Is this on purpose? If so, shouldn't the multibyte conversion
functions in the C locale use UTF-8 as well?

Andy

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
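
The "simple cast" behaviour described in the message above can be modelled roughly as follows. This is only an illustrative sketch: Python's "latin-1" and "ascii" codecs stand in for the C library conversions, and the helper names char_to_wchar and wchar_to_char are hypothetical, not newlib or Cygwin functions.

```python
# Sketch only: Python codecs stand in for the C-locale mb conversions.
# char_to_wchar / wchar_to_char are hypothetical helper names.

def char_to_wchar(byte: int) -> int:
    """Map one byte to a code point the way an ISO-8859-1 'cast' would."""
    return ord(bytes([byte]).decode("latin-1"))


def wchar_to_char(code_point: int) -> int:
    """Map a code point back to a single byte; raises above 0xFF."""
    return chr(code_point).encode("latin-1")[0]


# Under ISO-8859-1 every byte maps to the code point with the same value.
assert all(char_to_wchar(b) == b for b in range(256))

# A wide character above 0xFF has no single-byte representation.
try:
    wchar_to_char(0x100)
except UnicodeEncodeError as err:
    print("cannot convert U+0100:", err.reason)

# A strict ASCII conversion rejects every byte above 0x7F instead.
try:
    bytes([0xE4]).decode("ascii")
except UnicodeDecodeError as err:
    print("strict ASCII rejects 0xE4:", err.reason)
```

The point of the sketch is only the contrast: the ISO-8859-1 reading accepts all 256 byte values, while a strict ASCII reading treats anything above 0x7F as an encoding error.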

* Re: The C locale
From: Christopher Faylor @ 2009-08-31 0:53 UTC
To: cygwin

On Sun, Aug 30, 2009 at 05:59:11PM +0100, Andy Koppe wrote:
>Trying to reply to Tuomo Valkonen's post about locale issues, I got
>rather confused about the C locale. The manual and the POSIX standard
>say that it supports ASCII only, so in theory anything above 0x7F
>should be rejected. In practice though, both Cygwin 1.5 and 1.7 do
>support characters above 0x7F in the C locale, which could be quite
>useful. Trouble is, they do so rather inconsistently.
>
>Both in 1.5 and 1.7, the mb conversion functions treat such characters
>as ISO-8859-1. In other words, conversions between chars and wchars are
>simple casts (except that wchars above 0xFF can't be converted). This
>makes some sense.
>
>Filename handling is different though. Cygwin 1.5 translates filenames
>according to the system's ANSI codepage. I guess the inconsistency
>with the mb functions didn't really matter, as the mb functions were
>pretty much useless anyway, and supporting the system codepage was
>more important.
>
>So, with Cygwin 1.7, I'd have expected filename handling in the C
>locale to either use ISO-8859-1 for consistency with the mb functions,
>or the ANSI codepage for compatibility with 1.5. In actual fact
>though, it uses UTF-8.
>
>Is this on purpose? If so, shouldn't the multibyte conversion
>functions in the C locale use UTF-8 as well?

Since Cygwin has a clear system that it is supposed to be emulating,
the real question is "What does Linux do?"

cgf

* Re: The C locale
From: Andy Koppe @ 2009-09-02 6:29 UTC
To: cygwin

Christopher Faylor:
>Since Cygwin has a clear system that it is supposed to be emulating,
>the real question is "What does Linux do?"

Tried it on Debian and Suse: the multibyte conversion functions are
strict ASCII, i.e. anything beyond 0x7F is considered an encoding
error. POSIX requires that ASCII is supported in the C locale, but
does not actually outlaw ASCII-compatible extensions beyond that.

Locales don't affect filenames on Linux, i.e. any sequence of bytes
passed to open() goes straight to disk (except for the path
separator). This effectively means that filenames are encoded in
whatever charset happened to be active at the time the file was
created. Hence anyone accessing it with a different charset setting
will get gibberish.

POSIX is impressively unhelpful on the topic of filenames. All it
guarantees for filenames is the "portable filename character set":
ASCII letters and digits, plus the hyphen, dot, and underscore.

So altogether we've got no fewer than four choices here:
- strict ASCII (as with Linux mb functions)
- ISO-8859-1 (as with newlib mb functions)
- Default Windows ANSI/OEM codepage (as with Cygwin 1.5 filenames)
- UTF-8 (as with Cygwin 1.7 filenames)

In Cygwin 1.5, both file operations and the console use the default
Windows codepage, which often contains all the characters a user cares
about. If you set up readline for 8-bit I/O and change the console
font to something useful, this works reasonably well, including
Cygwin-created filenames showing up correctly in Explorer.

A rather important exception is 'ls', which seems to have its own
hardcoded limitation to 7 bits for the C locale: anything non-ASCII is
shown as '?' there. Things do work correctly elsewhere though, e.g. in
bash tab completion or Midnight Commander.

A user with such a setup who upgrades to 1.7 will find that things
will no longer work as before, since filenames are translated to UTF-8
whereas the console now seems to use ISO-8859-1 (presumably via the mb
functions) by default. Hence a file called 'bäh' in Explorer (with
a-umlaut in the middle) will show as 'bÃ¤h' instead.

And if you try to create 'bäh' in Cygwin 1.7, you actually get a file
called 'b', because the 'ä' (0xE4) in ISO-8859-1 turns into an
encoding error when interpreted as UTF-8, and the name simply seems to
be truncated at that point.

I see two good solutions:
- Use the default Windows codepage for filenames, console, and
multibyte functions. This is what happens already if you specify a
locale with a language but no charset, e.g. "en". Maximum 1.5
compatibility.
- Use UTF-8 throughout. Full Unicode support out of the box.

And a cheap'n'nasty one:
- Restrict the multibyte functions and console to 7-bit ASCII. Still
means it's inconsistent with the filename conversions, but at least
non-ASCII characters wouldn't show up wrongly. Instead, they wouldn't
show at all.

Andy
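
The two 'bäh' symptoms described in the message above can be reproduced at the byte level. The following is a minimal sketch using Python codecs, not Cygwin's actual conversion code; the function names are hypothetical, and cutting the name at the first invalid byte simply mirrors the truncation reported in the message.

```python
# Sketch only: models the byte-level mismatch, not Cygwin's implementation.

def shown_as_latin1(name: str) -> str:
    """Filename stored as UTF-8 bytes, then displayed as ISO-8859-1."""
    return name.encode("utf-8").decode("latin-1")


def valid_utf8_prefix(raw: bytes) -> bytes:
    """Return the part of raw that precedes the first UTF-8 decoding error."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return raw[:err.start]
    return raw


name = "b\u00e4h"  # 'b', a-umlaut, 'h'

# Explorer name -> UTF-8 bytes -> shown on an ISO-8859-1 console.
# ascii() is used so the output is safe on any terminal encoding.
print(ascii(shown_as_latin1(name)))

# Name typed on an ISO-8859-1 console -> bytes interpreted as UTF-8.
typed = name.encode("latin-1")  # the a-umlaut is the single byte 0xE4
print(valid_utf8_prefix(typed))
```

In UTF-8 the a-umlaut is the two bytes 0xC3 0xA4, which an ISO-8859-1 display renders as two separate characters. In the other direction, the lone byte 0xE4 starts a multi-byte UTF-8 sequence that the following 'h' cannot complete, so only 'b' survives.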

* Re: The C locale
From: Eric Blake @ 2009-09-02 11:48 UTC
To: cygwin

According to Andy Koppe on 9/2/2009 12:29 AM:
> A rather important exception is 'ls', which seems to have its own
> hardcoded limitation to 7 bits for the C locale: anything non-ASCII is
> shown as '?' there.

That's only because the current build of cygwin ls pre-dates a lot of the
locale support. I'm hoping that when I get time to build coreutils 7.5,
ls will start printing characters marked printable in the current locale.

--
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9@byu.net

* Re: The C locale
From: Andy Koppe @ 2009-09-02 20:10 UTC
To: cygwin

Eric Blake:
>> A rather important exception is 'ls', which seems to have its own
>> hardcoded limitation to 7 bits for the C locale: anything non-ASCII is
>> shown as '?' there.
>
> That's only because the current build of cygwin ls pre-dates a lot of the
> locale support. I'm hoping that when I get time to build coreutils 7.5,
> ls will start printing characters marked printable in the current locale.

Don't worry, on 1.7 it already works fine in locales other than "C".
And it turns out that the restriction with the latter is due to newlib
being inconsistent: whereas the conversion functions use ISO-8859-1,
the ctype functions insist on ASCII, i.e. the is*() functions return 0
for anything above 0x7F.

So in the C locale we've currently got UTF-8 for filenames, ISO-8859-1
for the console and multibyte conversions, and ASCII for the ctype
functions.

Andy

* Re: The C locale
From: IWAMURO Motonori @ 2009-09-02 13:56 UTC
To: cygwin

Hi.

2009/9/2 Andy Koppe <andy.koppe@gmail.com>:
> I see two good solutions:
> - Use the default Windows codepage for filenames, console, and
> multibyte functions. This is what happens already if you specify a
> locale with a language but no charset, e.g. "en". Maximum 1.5
> compatibility.
> - Use UTF-8 throughout. Full Unicode support out of the box.

I want to use UTF-8 throughout, because:
- A lot of UNIX tools that work over the network (e.g. rsync, scp, ...)
treat the file name as an 8-bit byte array.
- The default locale of modern UNIX-based OSes is *.UTF-8.
- Files whose names include characters outside the codepage (e.g.
files in the iTunes folder) can be handled.

--
IWAMURO Motonori <http://vmi.jp/>

* Re: The C locale
From: Andy Koppe @ 2009-09-07 20:08 UTC
To: cygwin

2009/9/2 IWAMURO Motonori:
> I want to use UTF-8 throughout, because:
> - A lot of UNIX tools that work over the network (e.g. rsync, scp, ...)
> treat the file name as an 8-bit byte array.
> - The default locale of modern UNIX-based OSes is *.UTF-8.
> - Files whose names include characters outside the codepage (e.g.
> files in the iTunes folder) can be handled.

I'm minded to agree, but actually there's a big stumbling block here:
many interactive programs in Cygwin do not (yet) support UTF-8, e.g.
nano, mutt, and mc. If you try, you get all sorts of funny effects
with invalid characters and mispositioned cursors. That's not
acceptable as a default.

Which leaves one apparently good solution for the "C" locale:
>> - Use the default Windows codepage for filenames, console, and
>> multibyte functions. This is what happens already if you specify a
>> locale with a language but no charset, e.g. "en". Maximum 1.5
>> compatibility.

On a closely related note, Debian are introducing a "C.UTF-8" locale
as a language-neutral locale with a UTF-8 character set. This is
useful for choosing UTF-8 without picking up language-specific stuff
like sorting rules. See here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776. It's a rather
lengthy thread, but in the end they did decide to go for it.

Cygwin 1.7, through newlib, already has "C-UTF-8", as well as the
likes of "C-ISO-8859-1" or "C-SJIS". So how about replacing the "C-"
with "C." in those, considering that Cygwin has no backward
compatibility requirement regarding those?

Andy

* Re: The C locale
From: Corinna Vinschen @ 2009-09-08 19:35 UTC
To: cygwin

On Sep  7 21:08, Andy Koppe wrote:
> Which leaves one apparently good solution for the "C" locale:
> >> - Use the default Windows codepage for filenames, console, and
> >> multibyte functions. This is what happens already if you specify a
> >> locale with a language but no charset, e.g. "en". Maximum 1.5
> >> compatibility.

UTF-8 has been chosen because it has the advantage that every UTF-16
Windows filename will result in a valid multibyte string.

Every choice has its advantages and its trade-offs. Maximum 1.5
compatibility (what for and how long?) vs. maximum default usability
in the long run (at least I hope so).

> On a closely related note, Debian are introducing a "C.UTF-8" locale
> as a language-neutral locale with a UTF-8 character set. This is
> useful for choosing UTF-8 without picking up language-specific stuff
> like sorting rules. See here:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776. It's a rather
> lengthy thread, but in the end they did decide to go for it.

Doesn't just setting LC_CTYPE=fo_ba.UTF-8 have the same result?

> Cygwin 1.7, through newlib, already has "C-UTF-8", as well as the
> likes of "C-ISO-8859-1" or "C-SJIS". So how about replacing the "C-"
> with "C." in those, considering that Cygwin has no backward
> compatibility requirement regarding those?

No, but newlib has. That was the only reason to keep these specifiers.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
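
The argument for UTF-8 in the message above is that it can represent any character a UTF-16 filename may contain, whereas a single ANSI codepage cannot. A small sketch of that difference, using Python's codecs and cp1252 as an example ANSI codepage (both are choices made for illustration, and try_encode is a hypothetical helper):

```python
# Sketch only: compares charset coverage; it does not model Cygwin's code.
from typing import Optional


def try_encode(filename: str, charset: str) -> Optional[bytes]:
    """Return the encoded filename, or None if the charset cannot hold it."""
    try:
        return filename.encode(charset)
    except UnicodeEncodeError:
        return None


names = [
    "b\u00e4h.txt",            # a-umlaut: present in cp1252
    "\u65e5\u672c\u8a9e.txt",  # Japanese: outside cp1252
    "\U0001d11e.txt",          # outside the BMP: a surrogate pair in UTF-16
]

for name in names:
    utf16_units = len(name.encode("utf-16-le")) // 2
    print(ascii(name), "UTF-16 units:", utf16_units,
          "| utf-8 ok:", try_encode(name, "utf-8") is not None,
          "| cp1252 ok:", try_encode(name, "cp1252") is not None)
```

This only covers well-formed UTF-16. Names that fall outside the active charset are the case the ^N scheme mentioned elsewhere in the thread is meant to handle.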

* Re: The C locale
From: Andy Koppe @ 2009-09-08 20:48 UTC
To: cygwin

2009/9/8 Corinna Vinschen:
>> Which leaves one apparently good solution for the "C" locale:
>> >> - Use the default Windows codepage for filenames, console, and
>> >> multibyte functions. This is what happens already if you specify a
>> >> locale with a language but no charset, e.g. "en". Maximum 1.5
>> >> compatibility.
>
> UTF-8 has been chosen because it has the advantage that every UTF-16
> Windows filename will result in a valid multibyte string.

Fair enough, if the console and the character conversion functions
used UTF-8 as well (and if applications such as mc, nano and mutt were
rebuilt with UTF-8 support). Unfortunately, they use ISO-8859-1, so
out of the box the support for non-ASCII characters in Cygwin 1.7 is
effectively broken. Please see posts earlier in this thread for the
problems caused by this.

Yes, users can set a locale variable to get this working, but hacking
Cygwin.bat or finding the Windows environment variable dialog isn't
exactly intuitive. And they didn't have to do that in 1.5 to at least
get the Windows "ANSI" codepage working.

> Every choice has its advantages and its trade-offs.

The current choices have nothing but disadvantages, due to the mixing
of UTF-8 and ISO-8859-1.

Besides, regarding the Windows codepage, wasn't the ^N scheme
introduced to deal with filename characters outside the current
charset?

>> On a closely related note, Debian are introducing a "C.UTF-8" locale
>> as a language-neutral locale with a UTF-8 character set. This is
>> useful for choosing UTF-8 without picking up language-specific stuff
>> like sorting rules. See here:
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776. It's a rather
>> lengthy thread, but in the end they did decide to go for it.
>
> Doesn't just setting LC_CTYPE=fo_ba.UTF-8 have the same result?

For newlib, yes, because it doesn't (yet) care about the language
part. But the language part nevertheless matters for many programs,
and it may also matter when connecting to other hosts, e.g. by
changing the sort order in 'ls'. "C.charset" would mean: give me all
the default behaviours, except that I want this specific charset.

>> Cygwin 1.7, through newlib, already has "C-UTF-8", as well as the
>> likes of "C-ISO-8859-1" or "C-SJIS". So how about replacing the "C-"
>> with "C." in those, considering that Cygwin has no backward
>> compatibility requirement regarding those?
>
> No, but newlib has.

Understood. I meant a __CYGWIN__-guarded change.

Andy

* Re: The C locale
From: Andy Koppe @ 2009-09-08 21:49 UTC
To: cygwin

PS:

> Maximum 1.5 compatibility (what for and how long?) vs. maximum
> default usability in the long run (at least I hope so).

Compatibility for users upgrading to 1.7, who are used to being able to
use the non-ASCII chars in their ANSI codepage, which is usually all
they care about. And who have files encoded in that codepage, while
being blissfully unaware of what stuff like "LC_CTYPE" or "CP1251"
means. And who are therefore going to complain about Cygwin 1.7
breaking their files.

Using UTF-8 throughout is a worthwhile aim of course, but it's a bumpy
road to get there, with lots of apps not yet ready. Moreover, is there
actually any other OS where the "C" locale uses UTF-8? Afaik, Linuxes
just set LANG to *.UTF-8 somewhere in the startup scripts.

Andy

* Re: The C locale
From: Corinna Vinschen @ 2009-09-21 10:38 UTC
To: cygwin

On Sep  8 22:49, Andy Koppe wrote:
> PS:
> > Maximum 1.5 compatibility (what for and how long?) vs. maximum
> > default usability in the long run (at least I hope so).
>
> Compatibility for users upgrading to 1.7, who are used to being able to
> use the non-ASCII chars in their ANSI codepage, which is usually all
> they care about. And who have files encoded in that codepage, while
> being blissfully unaware of what stuff like "LC_CTYPE" or "CP1251"
> means. And who are therefore going to complain about Cygwin 1.7
> breaking their files.
>
> Using UTF-8 throughout is a worthwhile aim of course, but it's a bumpy
> road to get there, with lots of apps not yet ready. Moreover, is there
> actually any other OS where the "C" locale uses UTF-8? Afaik, Linuxes
> just set LANG to *.UTF-8 somewhere in the startup scripts.

Back from vacation, I have now re-read this thread and I have to say I
just don't know what the best course of action is here.

The idea behind using UTF-8 for filename and console operations by
default was to have the fewest problems converting from UTF-16 to
multibyte, so that readdir() always returns a valid filename. Since
the filename is supposed to be just a NUL-terminated stream of bytes,
the application shouldn't care what the filename looks like, it should
just always use it as is. In contrast to Linux filesystems, where the
filename actually *is* a simple byte stream, we have to convert the
filename back and forth from and to UTF-16.

As for the conversion of filenames, you get the same problem on Linux
if the filename contains non-ASCII bytes and these bytes are not a
valid multibyte character in the current locale.

Referring to another of your mails in this thread:

> A user with such a setup who upgrades to 1.7 will find that things
> will no longer work as before, since filenames are translated to UTF-8
> whereas the console now seems to use ISO-8859-1 (presumably via the mb
> functions) by default. Hence a file called 'b\344h' in Explorer (with
> a-umlaut in the middle) will show as 'bÃ¤h' instead.

That's because the console uses the ASCII conversion by default, which
in the newlib implementation just passes all bytes through unconverted,
even the >=0x80 ones. That's ISO-8859-1, coincidentally. However, that
means the console uses the same conversion as the application. Only
the filename conversion uses UTF-8.

> And if you try to create 'b\344h' in Cygwin 1.7, you actually get a file
> called 'b', because the '\344' (0xE4) in ISO-8859-1 turns into an
> encoding error when interpreted as UTF-8, and the name simply seems to
> be truncated at that point.

Yes, that *is* a problem.

> I see two good solutions:
> - Use the default Windows codepage for filenames, console, and
> multibyte functions. This is what happens already if you specify a
> locale with a language but no charset, e.g. "en". Maximum 1.5
> compatibility.

Hmm, yes, that might be an option. Allowing the C.UTF-8 locale
could work around the remaining problems.

> - Use UTF-8 throughout. Full Unicode support out of the box.

What does "throughout" mean? Do you want ASCII multibyte conversion to
use UTF-8 as well? Of course that will still result in problems if a
shell script has a filename hardcoded in, say, CP1252.

> And a cheap'n'nasty one:
> - Restrict the multibyte functions and console to 7-bit ASCII. Still
> means it's inconsistent with the filename conversions, but at least
> non-ASCII characters wouldn't show up wrongly. Instead, they wouldn't
> show at all.

I remember having seen this on Linux as well in some GUI applications.

Apart from that, the fourth solution is to stick to the current
implementation: UTF-8 for filenames by default and relaxed ASCII
(ISO-8859-1) as provided by newlib for everything else.

The problem is, I don't know for sure what the best approach is, and it
seems nobody except you and Iwamuro is actually interested in
discussing this. And the two of you hold opposing opinions on the
matter.

Personally I have no problem with the current approach. I understand
the potential problems, but, as usual, solving it one way results in
problems in another scenario and vice versa.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: The C locale
From: Lapo Luchini @ 2009-09-21 13:08 UTC
To: cygwin

Corinna Vinschen wrote:
>> And if you try to create 'b\344h' in Cygwin 1.7, you actually get a file
>> called 'b', because the '\344' (0xE4) in ISO-8859-1 turns into an
>> encoding error when interpreted as UTF-8, and the name simply seems to
>> be truncated at that point.
>
> Yes, that *is* a problem.

It doesn't seem to be quite that simple: the name isn't cut at the
FIRST non-UTF-8 character, but just before the LAST one. So I guess
the cause is not simply the encoding error (I doubt the conversion runs
from the end to the start?) but something more complex.
http://cygwin.com/ml/cygwin/2009-09/msg00329.html

> Personally I have no problem with the current approach. I understand
> the potential problems, but, as usual, solving it one way results in
> problems in another scenario and vice versa.

FWIW I do like the current approach: for example I can transfer with
rsync, and commit and check out with monotone, any filename including
Japanese characters... (well, except the names from the aforementioned
thread, but that's a bug which can be solved, not something implied by
the current approach)

--
Lapo Luchini - http://lapo.it/

“C is quirky, flawed, and an enormous success.” (Dennis M. Ritchie)

* Re: The C locale
From: Charles Wilson @ 2009-09-21 14:39 UTC
To: cygwin

Corinna Vinschen wrote:
> The problem is, I don't know for sure what the best approach is, and it
> seems nobody except you and Iwamuro is actually interested in
> discussing this.

I don't know about anyone else, but I haven't chimed in because I don't
know enough about the issue to have an intelligent opinion. I think the
problem is that the already limited universe of cygwin contributors
becomes very tiny when those ignorant of NLS/char-encoding issues are
self-excluded.

Sorry. I've just been hoping that those of you who DO know enough about
this issue can reach a mutually satisfactory compromise/solution.

--
Chuck

* Re: The C locale
From: Andy Koppe @ 2009-09-21 21:20 UTC
To: cygwin

2009/9/21 Corinna Vinschen:
> Back from vacation, I have now re-read this thread and I have to say I
> just don't know what the best course of action is here.

I'm afraid I can only reiterate what I said previously.

Let's use the Windows "ANSI" codepage as the character set for the C
locale, for both the conversion functions and filenames. This means
CP1252 on Western systems, CP1251 on Cyrillic ones, CP932 on Japanese
ones, and so on.

This way, the non-ASCII needs of most users are covered
out of the box, and compatibility with Cygwin 1.5 and users'
ANSI-encoded files is ensured. Applications that still assume that a
byte and a character are the same thing work correctly (except that
they'll treat East Asian double-byte chars as two characters, but a
different default charset won't cure that).

Filenames created on the Cygwin side show up correctly in Explorer.
Windows filenames show up correctly in Cygwin as long as they're
limited to the ANSI codepage. The ^N encoding nevertheless ensures
that UTF-16 characters outside that codepage are uniquely represented.

Beyond that, encourage maintainers to make their applications
UTF-8-capable and encourage users to choose a UTF-8 locale. Consider
adding a locale setting to setup.exe that gets written to cygwin.bat.

> The idea behind using UTF-8 for filename and console operations by
> default was to have the fewest problems converting from UTF-16 to
> multibyte, so that readdir() always returns a valid filename.

But the ^N scheme does ensure that for any charset anyway, doesn't it?

> As for the conversion of filenames, you get the same problem on Linux
> if the filename contains non-ASCII bytes and these bytes are not a
> valid multibyte character in the current locale.

Yes, but Cygwin does actually have a big advantage here. Unlike Linux,
where the filename encoding is basically undefined, we *know* that
Windows filenames are always encoded as UTF-16. Therefore, the Cygwin
file functions do have the chance to always translate filenames
correctly into the application's locale.

And with any locale except "C" and "POSIX", this is working very well,
due to your great work implementing all the difficult bits such as the
^N and 0xDC?? encodings and UTF-16 surrogates (and notwithstanding the
issue with translating 0xDC??s to charsets other than UTF-8).

>> I see two good solutions:
>> - Use the default Windows codepage for filenames, console, and
>> multibyte functions. This is what happens already if you specify a
>> locale with a language but no charset, e.g. "en". Maximum 1.5
>> compatibility.
>
> Hmm, yes, that might be an option. Allowing the C.UTF-8 locale
> could work around the remaining problems.

Not sure that the C.UTF-8 locale is necessary for that, but it would
be nice to have, and it's easy to implement.

>> - Use UTF-8 throughout. Full Unicode support out of the box.
>
> What does "throughout" mean? Do you want ASCII multibyte conversion to
> use UTF-8 as well?

Yep, that was the idea, but later on I realised that it's not a good
one, because too many applications still assume that a byte and a
character are the same thing. For example, start nano in a UTF-8
locale, enter a few umlauts, and move the cursor around, and you'll
see some weird effects. Similarly, filenames with non-ASCII chars will
corrupt Midnight Commander's display.

Andy
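
For readers unfamiliar with the ^N scheme referred to in the message above, here is a rough model of how such an escape can keep filenames unique in a limited charset. It assumes that an unrepresentable character is written as the byte 0x0E (Ctrl-N) followed by the character's UTF-8 bytes; that layout is an assumption made for this sketch and is not spelled out in the thread. The function names are hypothetical, the decoder only handles single-byte charsets such as cp1252, and names containing a literal 0x0E character are not considered.

```python
# Sketch only: a model of a ^N-style escape, NOT Cygwin's implementation.
# Assumption: escape = 0x0E followed by the character's UTF-8 bytes.

SO = 0x0E  # ASCII "shift out", i.e. Ctrl-N


def encode_filename(name: str, charset: str) -> bytes:
    """Encode name in charset, escaping characters the charset lacks."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(SO)
            out += ch.encode("utf-8")
    return bytes(out)


def utf8_sequence_length(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from its lead byte."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    return 1


def decode_filename(raw: bytes, charset: str) -> str:
    """Inverse of encode_filename for single-byte charsets."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == SO:
            n = utf8_sequence_length(raw[i + 1])
            out.append(raw[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            out.append(raw[i:i + 1].decode(charset))
            i += 1
    return "".join(out)


name = "b\u00e4h-\u65e5\u672c.txt"   # a-umlaut fits cp1252, the kanji do not
raw = encode_filename(name, "cp1252")
print(raw)
print(decode_filename(raw, "cp1252") == name)
```

Under this model the round trip is lossless, which is the uniqueness property claimed above. The escaped form is not readable as text in the target charset, so the filename stays usable by programs but not legible to the user.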

* Re: The C locale
From: Lapo Luchini @ 2009-09-22 5:59 UTC
To: cygwin

Andy Koppe wrote:
> This way, the non-ASCII needs of most users are covered
> out of the box
[...]
> Windows filenames show up correctly in Cygwin as long as they're
> limited to the ANSI codepage.

I fail to see how that is a desirable thing.
The filesystem is UTF-16, Cygwin is now Unicode-aware, but anything
that doesn't fit ANSI is thrown away for the sake of backward
compatibility with Cygwin 1.5, which was not Unicode-aware?

As a user, the ability to show correctly formatted UTF-8 filenames is
one of the features I most appreciated in Cygwin 1.7, and reverting that
would be a serious setback... even writing that in a ChangeLog would be
a bit troublesome... "we added support for Unicode - except you can't
use it for anything you couldn't already do before it was there, since
we're using ANSI as an intermediate format anyway"?

> For example, start nano in a UTF-8 locale, enter a few umlauts, and
> move the cursor around, and you'll see some weird effects.

IMHO a few "weird effects" while moving the cursor are a much less
severe problem than being unable to write the filename "like I think it
is". Using ANSI, anything over U+100 would be ugly ^N-encoded,
3-bytes-per-char stuff which no human can "see" as the filename they
intended to use.

just my 2c

--
Lapo Luchini - http://lapo.it/

“You don't have to distrust the government to want to use cryptography.”
(Phil Zimmermann)
* Re: The C locale 2009-09-22 5:59 ` Lapo Luchini @ 2009-09-22 6:23 ` Lapo Luchini 2009-09-22 6:50 ` Andy Koppe 2009-09-22 6:47 ` Andy Koppe 1 sibling, 1 reply; 51+ messages in thread From: Lapo Luchini @ 2009-09-22 6:23 UTC (permalink / raw) To: cygwin Lapo Luchini wrote: > I fail to see how that is a desiderable thing. > Filesystem is UTF-16, Cygwin is now Unicode-aware, but anything that > doesn't fit ANSI is thrown away for the sake of retro-compatibility of > Cygwin-1.5 which was not Unicode-aware? On a second reading, I guess you meant that *ONLY for LANG=C* and would leave the current usage for LANG=xx_XX.UTF-8, is that so? In that case, the "forced ANSI retro-compatibility" would only bite people with a missing (or messed up) environment, and that's an ill-defined situation where nobody can really complain if he gets suboptimal results. Also, for scripts with no user interaction, being able to save every filename and read it back in the same format is a definite plus. If, OTOH, you're in an interactive shell and want to "see stuff" properly, you'd better set LANG. PS: this would work around the "LANG=C ls" file-not-found issues, but a solution for the "LANG=xx_XX.UTF-8 ls" issue would still be needed. (same goes for `find $dir -delete`) -- Lapo Luchini - http://lapo.it/ “Two can keep a secret if one is dead.” (anonymous)
* Re: The C locale 2009-09-22 6:23 ` Lapo Luchini @ 2009-09-22 6:50 ` Andy Koppe 0 siblings, 0 replies; 51+ messages in thread From: Andy Koppe @ 2009-09-22 6:50 UTC (permalink / raw) To: cygwin 2009/9/22 Lapo Luchini: > On a second reading, I guess you meant that *ONLY for LANG=C* and leave > the current usage for LANG=xx_XX.UTF-8, is that so? Yes, this thread is solely about the C locale. Andy
* Re: The C locale 2009-09-22 5:59 ` Lapo Luchini 2009-09-22 6:23 ` Lapo Luchini @ 2009-09-22 6:47 ` Andy Koppe 2009-09-22 8:43 ` Lapo Luchini 1 sibling, 1 reply; 51+ messages in thread From: Andy Koppe @ 2009-09-22 6:47 UTC (permalink / raw) To: cygwin 2009/9/22 Lapo Luchini: > Andy Koppe wrote: >> This way, the non-ASCII needs of most users are covered >> out-of-the-box [...] >> Windows filenames show up correctly in Cygwin as long as they're >> limited to the ANSI codepage. > > I fail to see how that is a desiderable thing. > Filesystem is UTF-16, Cygwin is now Unicode-aware, but anything that > doesn't fit ANSI is thrown away [...]? No, it isn't. UTF-16 filename characters that can't be represented in the current charset are encoded by a ^N followed by the character's UTF-8 representation. The current C locale, on the other hand, simply represents all non-ASCII characters as UTF-8, even though the application charset is ISO-8859-1. This means that even those characters that can be represented in the application charset show up incorrectly. For example, a Windows filename "bäh" turns into "bÅ¤h" in the C locale, while it shows up correctly with explicitly set ISO-8859-1 or CP1252. > As a user, the ability to show correctly formatted UTF-8 filenames is > one of the features I most appreciated in Cygwin-1.7 That ability isn't going anywhere. As before, you need to set your locale to one with a UTF-8 charset to get full UTF-8 support. Btw, are you actually using the C locale? Andy
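The two behaviours described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_filename` is hypothetical and is not Cygwin's code; it follows the description above (a ^N byte, 0x0E, followed by the character's UTF-8 bytes, for any character the charset cannot represent). The last two lines show what happens when UTF-8 bytes are read by something that assumes ISO-8859-1.

```python
SO = b"\x0e"  # ^N (Ctrl-N), the escape byte mentioned in the message


def encode_filename(name: str, charset: str) -> bytes:
    """Hypothetical sketch: encode each character in `charset`, falling back
    to ^N followed by the character's UTF-8 bytes when that is impossible."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out += SO + ch.encode("utf-8")
    return bytes(out)


# A character that ISO-8859-1 can represent stays a single byte.
latin1_name = encode_filename("bäh", "iso-8859-1")

# The Euro sign is not in ISO-8859-1, so it takes the ^N + UTF-8 path.
escaped_name = encode_filename("b€h", "iso-8859-1")

# The mismatch discussed in the thread: UTF-8 bytes for the filename,
# read back by something that assumes ISO-8859-1.
utf8_bytes = "bäh".encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")  # one "ä" becomes two characters

print(latin1_name.hex(), escaped_name.hex(), utf8_bytes.hex(), len(misread))
```

The exact glyphs a reader ends up seeing for the misread name depend on which charset the display or mail client assumes, which is why the same filename is rendered differently at various points in this thread.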
* Re: The C locale 2009-09-22 6:47 ` Andy Koppe @ 2009-09-22 8:43 ` Lapo Luchini 2009-09-22 12:50 ` Andy Koppe 0 siblings, 1 reply; 51+ messages in thread From: Lapo Luchini @ 2009-09-22 8:43 UTC (permalink / raw) To: cygwin Andy Koppe wrote: > No, it isn't. UTF-16 filename characters that can't be represented in > the current charset are encoded by a ^N followed by the character's > UTF-8 representation. OK, right. > For example, a Windows filename "bäh" turns into "bÅ¤h" in the C locale, > while it shows up correctly with explicitly set ISO-8859-1 or CP1252. Uh? Doesn't seem so to me: if I create "bäh" in WindowsExplorer, then open up a UTF-8 mintty console, I get consistent output with both LANG=C and LANG=it_IT.UTF-8 (of course, since right now C is UTF-8): % LANG=C ls -l|egrep b.h -rw-r--r-- 1 lapo None 0 Sep 22 09:53 bäh % LANG=it_IT.UTF-8 ls -l|egrep b.h -rw-r--r-- 1 lapo None 0 22 Sep 09:53 bäh So I'm not sure what you mean by 'a Windows filename "bäh" turns into "bÅ¤h" in the C locale'... you mean that a script sees it as 62C3A468 as opposed as 62E468? Or that actual "bÅ¤h" is shown somewhere? "bÅ¤h" is just a representation, and it depends on the charset the console expects (in fact, in this UTF-8-encoded message it will probably be represented as 62C385C2A468)... if the console is UTF-8, what's currently shown is what I'd expect. If OTOH we're talking about what it is in raw form and not about what is shown (i.e. about a "3 bytes" vs a "4 bytes" string), well, that's a different issue, and I'm not sure why a program should prefer a 3-byte representation over a 4-byte one...? But OTOH as far as "not caring" goes, it sure can be a nice feature to be retro-compatible in that single case, since the behavior is not well-defined anyway... 
But again, if a script creates a filename that happens to contain Japanese characters (or even umlauts or r-quotes/l-quotes) I would expect to see that on the filesystem too, and not some random-looking escape sequence... > Btw, are you actually using the C locale? Not usually, but it happens from time to time (mostly in scripts, or in cases such as the monotone "make check" unit tests; one of them, which tries to create UTF-8 filenames and then ISO-8859-1 filenames, currently fails). -- Lapo Luchini - http://lapo.it/ “Endure. In enduring, grow strong.” (Dak'kon, videogame "Torment", 1999)
* Re: The C locale 2009-09-22 8:43 ` Lapo Luchini @ 2009-09-22 12:50 ` Andy Koppe 2009-09-22 16:26 ` Lapo Luchini 0 siblings, 1 reply; 51+ messages in thread From: Andy Koppe @ 2009-09-22 12:50 UTC (permalink / raw) To: cygwin 2009/9/22 Lapo Luchini: >> For example, a Windows filename "bäh" turns into "bÅ¤h" in the C locale, >> while it shows up correctly with explicitly set ISO-8859-1 or CP1252. > > Uh? Doesn't seem so to me: if I create "bäh" in WindowsExplorer, then > open up an UTF-8 mintty console I have a consistent output with both > LANG=C and LANG=it_IT.UTF-8 (of course, since right now C is UTF-8): > > % LANG=C ls -l|egrep b.h > -rw-r--r-- 1 lapo None 0 Sep 22 09:53 bäh > % LANG=it_IT.UTF-8 ls -l|egrep b.h > -rw-r--r-- 1 lapo None 0 22 Sep 09:53 bäh You've presumably got mintty set to UTF-8, hence mintty's output conversion turned ls's ISO-8859-1 "Å¤" (i.e. "\xC3\xA4") into "ä". > So I'm not sure what do you mean with 'a Windows filename "bäh" turns > into "bÅ¤h" in the C locale'... you mean that a script sees it as > 62C3A468 as opposed as 62E468? Or that actual "bÅ¤h" is shown somewhere? Both. For the latter, try it in the default Cygwin console, without any locale variables set. > But OTOH as far as "not caring" goes, it sure can be a nice feature to > be retro-compatible in that single case Thanks. Unfortunately the "C" locale is rather important though, because that's what people will be using unless they go to the effort of finding out how to set a different locale. Andy
* Re: The C locale 2009-09-22 12:50 ` Andy Koppe @ 2009-09-22 16:26 ` Lapo Luchini 2009-09-22 16:49 ` Mark J. Reed 2009-09-22 22:11 ` Thorsten Kampe 0 siblings, 2 replies; 51+ messages in thread From: Lapo Luchini @ 2009-09-22 16:26 UTC (permalink / raw) To: cygwin Andy Koppe wrote: > You've presumably got mintty set to UTF-8, hence mintty's output > conversion turned ls's ISO-8859-1 "Å¤" (i.e. "\xC3\xA4") into "ä". There never was any ISO-8859-1 "Å¤" in the first place, only one a-umlaut entered in WindowsExplorer (in the expected way) and correctly interpreted by a UTF8-capable terminal which is doing its job. Nobody ever intended to write a Latin1 string with the meaning of "A-ring + currency symbol" which has been translated by chance into an a-umlaut... >> you mean that a script sees it as 62C3A468 as opposed as 62E468? >> Or that actual "bÅ¤h" is shown somewhere? > > Both. For the latter, try it in the default Cygwin console, without > any locale variables set. OK, if you consider "what is shown in cmd.exe" as "the real stuff" then I agree with you. But cmd.exe isn't even capable of printing the Euro sign (no cygwin involved, I mean the plain Windows Prompt), I guess there's no hope of ever seeing in there anything but a very limited output... (which surprises me a bit: Euro sign is present in CP1252) I agree with you that the "default console" installed by the default installation SHOULD be able to show the more common accents at the very least (àèéìòù in Italy, umlauts and ß in Germany, and so on), but wouldn't it be possible to offer the user *something better* than plain limited cmd.exe, in the default installation? 
-- Lapo Luchini - http://lapo.it/ “There is no reason anyone would want a computer in their home.” (Ken Olson, founder of DEC, 1977)
* Re: The C locale 2009-09-22 16:26 ` Lapo Luchini @ 2009-09-22 16:49 ` Mark J. Reed 2009-09-22 17:04 ` Lapo Luchini 2009-09-22 22:11 ` Thorsten Kampe 1 sibling, 1 reply; 51+ messages in thread From: Mark J. Reed @ 2009-09-22 16:49 UTC (permalink / raw) To: cygwin On Tue, Sep 22, 2009 at 12:26 PM, Lapo Luchini wrote: > There never was any ISO-8859-1 "Å¤" in the first place, only one > a-umlaut entered in WindowsExplorer (in the expected way) and correctly > interpreted by a UTF8-capable terminal which is doing his job. > > Nobody ever intended to write a Latin1 string with the meaning of > "A-ring + currency symbol" which has been translated by chance in a > a-umlaut... Yes, but it's working because you (1) lied about your locale (using C when your terminal is set to UTF-8) and (2) happen to have your terminal set to UTF-8, which is how Cygwin happens to be encoding the character. It's a big accident and stops working if you were actually using a non-UTF-8 terminal and locale (hopefully matching ones). -- Mark J. Reed <markjreed@gmail.com>
* Re: The C locale 2009-09-22 16:49 ` Mark J. Reed @ 2009-09-22 17:04 ` Lapo Luchini 0 siblings, 0 replies; 51+ messages in thread From: Lapo Luchini @ 2009-09-22 17:04 UTC (permalink / raw) To: cygwin Mark J. Reed wrote: > Yes, but it's working because you (1) lied about your locale (using C > when your terminal is set to UTF-8) and (2) happen to have your > terminal set to UTF-8, which is how Cygwin happens to be encoding the > character. It's a big accident and stops working if you were > actually using a non-UTF-8 terminal and locale (hopefully matching > ones). I'm very sorry, but I still can't see your point... =( It's true, "by accident" my terminal is using the most general ASCII-compatible charset possible (that is, UTF-8) and Cygwin is currently using that as a default as well, ok. So LANG=C works essentially because my terminal uses THE SAME charset as Cygwin uses by default (and not specifically because that's UTF-8). But OTOH if LANG=C used CP1252, it would work only if my terminal "by accident" was using the very same CP1252, and would stop working if I were using a non-CP1252 terminal and matching locale. How is this a fundamentally different case? In the first case I have to match my terminal, but I can really see *any* character and never get any "surprise". In the second case I can use the default cmd.exe, but I get crippled output in many possible use cases. The main reason I see for using CP1252 (or whatever the default CP is, CP1252 is just an example) is that cygwin-in-cmd.exe would show the *same* crippledness shown by the default native WindowsPrompt, so even if very limited, the user would get the least surprise. And as far as traffic on cygwin@cygwin.com goes, I see that's a VERY valid issue. -- Lapo Luchini - http://lapo.it/ “Premature optimisation is the root of all evil in programming.” (C. A. R. Hoare)
* Re: The C locale 2009-09-22 16:26 ` Lapo Luchini 2009-09-22 16:49 ` Mark J. Reed @ 2009-09-22 22:11 ` Thorsten Kampe 2009-09-23 5:12 ` Lapo Luchini 1 sibling, 1 reply; 51+ messages in thread From: Thorsten Kampe @ 2009-09-22 22:11 UTC (permalink / raw) To: cygwin * Lapo Luchini (Tue, 22 Sep 2009 18:26:32 +0200) > But cmd.exe isn't even capable of printing the Euro sign (no cygwin > involved, I mean the plain Windows Prompt), I guess there's no hope to > ever seeing in there anything but a very limited output... (which > surprises me a bit: Euro sign is present in CP1252) Microsoft Windows [Version 6.1.7600] Copyright (c) 2009 Microsoft Corporation. All rights reserved. thorsten@HOMBRE[C:\Users\thorsten]> echo € € thorsten@HOMBRE[C:\Users\thorsten]> chcp Active code page: 437
* Re: The C locale 2009-09-22 22:11 ` Thorsten Kampe @ 2009-09-23 5:12 ` Lapo Luchini 2009-09-23 9:04 ` Thorsten Kampe 0 siblings, 1 reply; 51+ messages in thread From: Lapo Luchini @ 2009-09-23 5:12 UTC (permalink / raw) To: cygwin Thorsten Kampe wrote: > * Lapo Luchini (Tue, 22 Sep 2009 18:26:32 +0200) >> But cmd.exe isn't even capable of printing the Euro sign (no cygwin >> involved, I mean the plain Windows Prompt), I guess there's no hope to >> ever seeing in there anything but a very limited output... (which >> surprises me a bit: Euro sign is present in CP1252) > > Microsoft Windows [Version 6.1.7600] > Copyright (c) 2009 Microsoft Corporation. All rights reserved. > > thorsten@HOMBRE[C:\Users\thorsten]> echo € > € > > thorsten@HOMBRE[C:\Users\thorsten]> chcp > Active code page: 437 OK, so it *can* display it. But why it does not, when showing a filename? (which was what I did in the previous message to check) http://img223.imageshack.us/img223/7821/winprompt.png I created a file "aà â¬ç§.txt" in WinExplorer, and then: C:\>dir *.txt Volume in drive C is Primary Volume Serial Number is 8437-B5FC Directory of C:\ 23/09/2009 06.58 0 aà ??.txt 1 File(s) 0 bytes Errr.... actually I can't even reproduce your example. If I write "echo €" at the prompt I get this: C:\>echo ? ? C:\>chcp Active code page: 437 -- Lapo Luchini - http://lapo.it/ “I think, therefore I am… I think.” (Nordom, videogame "Torment", 1999)
* Re: The C locale 2009-09-23 5:12 ` Lapo Luchini @ 2009-09-23 9:04 ` Thorsten Kampe 2009-09-23 10:48 ` Lapo Luchini 0 siblings, 1 reply; 51+ messages in thread From: Thorsten Kampe @ 2009-09-23 9:04 UTC (permalink / raw) To: cygwin * Lapo Luchini (Wed, 23 Sep 2009 07:11:48 +0200) > > Thorsten Kampe wrote: > > * Lapo Luchini (Tue, 22 Sep 2009 18:26:32 +0200) > >> But cmd.exe isn't even capable of printing the Euro sign (no cygwin > >> involved, I mean the plain Windows Prompt), I guess there's no hope to > >> ever seeing in there anything but a very limited output... (which > >> surprises me a bit: Euro sign is present in CP1252) > > > > Microsoft Windows [Version 6.1.7600] > > Copyright (c) 2009 Microsoft Corporation. All rights reserved. > > > > thorsten@HOMBRE[C:\Users\thorsten]> echo € > > € > > > > thorsten@HOMBRE[C:\Users\thorsten]> chcp > > Active code page: 437 > > OK, so it *can* display it. > But why it does not, when showing a filename? > (which was what I did in the previous message to check) > > http://img223.imageshack.us/img223/7821/winprompt.png > > I created a file "aà �.txt" in WinExplorer, and then: > > C:\>dir *.txt > Volume in drive C is Primary > Volume Serial Number is 8437-B5FC > > Directory of C:\ > > 23/09/2009 06.58 0 aà ??.txt > 1 File(s) 0 bytes Works for me, too. Maybe not only the codepage but also the GUI locale settings are involved. This is on Windows 7. Thorsten
* Re: The C locale 2009-09-23 9:04 ` Thorsten Kampe @ 2009-09-23 10:48 ` Lapo Luchini 2009-09-23 12:04 ` Andy Koppe 2009-09-24 7:58 ` Thorsten Kampe 0 siblings, 2 replies; 51+ messages in thread From: Lapo Luchini @ 2009-09-23 10:48 UTC (permalink / raw) To: cygwin Thorsten Kampe wrote: >> I created a file "aà â¬ç§.txt" in WinExplorer, and then: >> >> 23/09/2009 06.58 0 aà ??.txt >> 1 File(s) 0 bytes > > Works for me, too. Maybe not only the codepage but also the GUI locale > settings are involved. This is on Windows 7. Oh, that's interesting, maybe they improved the console in Win7? Did you see only the euro or also the Japanese character? Uh, nope. I still get "aà ??.txt" in my Win7 virtual machine... both with chcp 850 and 437. Did you do anything regarding the console settings? (not that it seems to have any) -- Lapo Luchini - http://lapo.it/ “The best way to predict the future is to implement it.” (David Heinemeier Hansson)
* Re: The C locale 2009-09-23 10:48 ` Lapo Luchini @ 2009-09-23 12:04 ` Andy Koppe 2009-09-23 15:16 ` Mark J. Reed 2009-09-24 7:58 ` Thorsten Kampe 1 sibling, 1 reply; 51+ messages in thread From: Andy Koppe @ 2009-09-23 12:04 UTC (permalink / raw) To: cygwin 2009/9/23 Lapo Luchini: >> Works for me, too. Maybe not only the codepage but also the GUI locale >> settings are involved. This is on Windows 7. > > Oh, that's interesting, it may be they improved the console in Win7? > Did you see only the euro or also the Japanese character? > > Uh, nope. I still get "aà??.txt" in my Win7 vitual machine... both with > chcp 850 and 437. Did you do anything regarding the console settings? > (not that it does seem to have any) I think it depends on the font that's selected in the console properties. With the "Raster Font", only the "OEM" codepage (i.e. 437, 850, ...) is supported. With "Lucida Console" or "Consolas", however, it seems to switch to UTF-16 mode. (Well, more UCS2 than UTF-16 actually, since surrogates aren't supported.) Andy -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: The C locale 2009-09-23 12:04 ` Andy Koppe @ 2009-09-23 15:16 ` Mark J. Reed 0 siblings, 0 replies; 51+ messages in thread From: Mark J. Reed @ 2009-09-23 15:16 UTC (permalink / raw) To: cygwin If I switch the console font to Lucida, I can see the Euro sign, too (even on XP Pro). But mixing and matching with Cygwin doesn't work well H:\>echo € | c:\cygwin\bin\od -t x1 0000000 3f 20 0d 0a (the Cygwin process saw the Euro sign as a question mark) but H:\>c:\cygwin\bin\echo € | c:\cygwin\bin\od -t x1 0000000 e2 82 ac 0a which is the proper UTF-8 encoding of the Euro sign. So the output of a Windows process coming in through a pipe is treated differently than input from the Windows console. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 51+ messages in thread
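The byte values in the two `od` dumps above can be reproduced with Python's codecs. This is only an illustration of the bytes involved. It assumes that the lossy path behaves like an encode to the OEM codepage 437 with unrepresentable characters replaced by `?`; that is a guess about what happened on the way into the pipe, not a statement about how the Windows console is implemented.

```python
euro = "\u20ac"  # the Euro sign

# What the second dump shows: the UTF-8 encoding, three bytes.
utf8 = euro.encode("utf-8")

# What the first dump is consistent with: the Euro sign does not exist in
# codepage 437, so a lossy conversion substitutes a question mark (0x3f).
oem = euro.encode("cp437", errors="replace")

print(utf8.hex(), oem.hex())
```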
* Re: The C locale 2009-09-23 10:48 ` Lapo Luchini 2009-09-23 12:04 ` Andy Koppe @ 2009-09-24 7:58 ` Thorsten Kampe 1 sibling, 0 replies; 51+ messages in thread From: Thorsten Kampe @ 2009-09-24 7:58 UTC (permalink / raw) To: cygwin * Lapo Luchini (Wed, 23 Sep 2009 12:48:03 +0200) > Thorsten Kampe wrote: > >> I created a file "aà �.txt" in WinExplorer, and then: > >> > >> 23/09/2009 06.58 0 aà ??.txt > >> 1 File(s) 0 bytes > > > > Works for me, too. Maybe not only the codepage but also the GUI locale > > settings are involved. This is on Windows 7. > > Oh, that's interesting, it may be they improved the console in Win7? > Did you see only the euro or also the Japanese character? The Japanese character displayed as a question mark in my newsreader - so I had to skip that character when creating a file. > Uh, nope. I still get "aà ??.txt" in my Win7 vitual machine... both with > chcp 850 and 437. Did you do anything regarding the console settings? > (not that it does seem to have any) The only thing I changed was setting the console font to Dejavu Sans Mono. I created a text file with Japanese characters from Wikipedia. These displayed in Windows Explorer but not in Cmd.exe (empty squares - but no question marks). Thorsten -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: The C locale 2009-09-21 21:20 ` Andy Koppe 2009-09-22 5:59 ` Lapo Luchini @ 2009-09-24 7:03 ` IWAMURO Motonori 2009-09-24 7:34 ` Corinna Vinschen 1 sibling, 1 reply; 51+ messages in thread From: IWAMURO Motonori @ 2009-09-24 7:03 UTC (permalink / raw) To: cygwin 2009/9/22 Andy Koppe <andy.koppe@gmail.com>: > Let's use the Windows "ANSI" codepage as the character set for the C > locale, for both the conversion functions and filenames. This means > CP1252 on Western systems, CP1251 on Cyrillic ones, CP932 on Japanese > ones, and so on. I oppose the approach (the ANSI codepage is used at C locale) because CP932 (the codepage for Japanese) is hostile to the UNIX-like tools. The reason is that the CP932 format contains a lot of meta characters as follows. single character of CP932: /[\x00-\x7F\xA0-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/ This has a ruined influence to the tools that don't see locale. -- IWAMURO Motnori <http://vmi.jp/> -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: The C locale 2009-09-24 7:03 ` IWAMURO Motonori @ 2009-09-24 7:34 ` Corinna Vinschen 2009-09-24 9:39 ` IWAMURO Motonori 0 siblings, 1 reply; 51+ messages in thread From: Corinna Vinschen @ 2009-09-24 7:34 UTC (permalink / raw) To: cygwin On Sep 24 16:03, IWAMURO Motonori wrote: > 2009/9/22 Andy Koppe <andy.koppe@gmail.com>: > > Let's use the Windows "ANSI" codepage as the character set for the C > > locale, for both the conversion functions and filenames. This means > > CP1252 on Western systems, CP1251 on Cyrillic ones, CP932 on Japanese > > ones, and so on. > > I oppose the approach (the ANSI codepage is used at C locale) because > CP932 (the codepage for Japanese) is hostile to the UNIX-like tools. > > The reason is that the CP932 format contains a lot of meta characters > as follows. > > single character of CP932: > /[\x00-\x7F\xA0-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/ I don't understand. Are you saying that the single character in CP932 consists of 12 bytes? As far as I can see, CP932 is S-JIS, which is a just a simple double byte character set. What am I missing. > This has a ruined influence to the tools that don't see locale. Can you please try to explain the problem in a bit more detail for those of us not fluent in eastern asian languages? What do you mean with "hostile" and "ruined influence"? Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: The C locale 2009-09-24 7:34 ` Corinna Vinschen @ 2009-09-24 9:39 ` IWAMURO Motonori 2009-09-24 9:57 ` Corinna Vinschen 0 siblings, 1 reply; 51+ messages in thread From: IWAMURO Motonori @ 2009-09-24 9:39 UTC (permalink / raw) To: cygwin 2009/9/24 Corinna Vinschen <corinna-cygwin@cygwin.com>: > On Sep 24 16:03, IWAMURO Motonori wrote: >> 2009/9/22 Andy Koppe <andy.koppe@gmail.com>: >> > Let's use the Windows "ANSI" codepage as the character set for the C >> > locale, for both the conversion functions and filenames. This means >> > CP1252 on Western systems, CP1251 on Cyrillic ones, CP932 on Japanese >> > ones, and so on. >> >> I oppose the approach (the ANSI codepage is used at C locale) because >> CP932 (the codepage for Japanese) is hostile to the UNIX-like tools. >> >> The reason is that the CP932 format contains a lot of meta characters >> as follows. >> >> single character of CP932: >> /[\x00-\x7F\xA0-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/ > > I don't understand. Are you saying that the single character in CP932 > consists of 12 bytes? As far as I can see, CP932 is S-JIS, which > is a just a simple double byte character set. What am I missing. - CP932 (Shift_JIS) has 1byte character and 2bytes character. - The range of 1byte character is 0x00-0x7F and 0xA0-0xDF. - The range of first byte of 2byte character is 0x80-0x9F and 0xE0-0xFC. - The range of second byte of 2byte character is 0x40-7E and 0x80-0xFC. This includes "[", "\", "]", "^", "`", "{", "|", "}". The last fact causes many problems in tools that ignore the locale and use escaped strings, globbing or regexps: - Can't open file or directory. - Destroy filenames. - Lost files. For example: Case1: The CP932 byte sequence of "項目表.xls" is 8D 80 96 DA 95 *5C* (=='\') 2E 78 6C 73. When a tool that ignores the locale processes this string as an escaped string, the 0x5C is taken as an escape character and disappears. Case2: When using the regexp /スポット/, I expect it to match strings that include "スポット". 
But tools that ignore the locale treat it as /ス\x83|ット/, because the byte sequence of "スポット" is 83 58 83 *7C* (=='|') 83 62 83 67. As a result, unexpected strings are matched. Case3: When using the glob "データ0[0-9].dat", it is treated as "デ\x81[\x83^0[0-9].dat". As a result, the expected files are not matched. -- IWAMURO Motnori <http://vmi.jp/>
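The three cases above can be checked with Python's built-in `cp932` codec. The sketch below is an illustration only: the helper name `meta_trail_bytes` is made up for this example. It lists every double-byte character whose second byte coincides with one of the ASCII metacharacters named in the message.

```python
# The ASCII metacharacters listed in the message: [ \ ] ^ ` { | }
META = set("[\\]^`{|}")


def meta_trail_bytes(text: str, codec: str = "cp932"):
    """Return (character, metacharacter) pairs for every double-byte
    character whose second byte is an ASCII metacharacter in `codec`."""
    hits = []
    for ch in text:
        raw = ch.encode(codec)
        if len(raw) == 2 and chr(raw[1]) in META:
            hits.append((ch, chr(raw[1])))
    return hits


case1 = meta_trail_bytes("項目表.xls")        # escaped-string example
case2 = meta_trail_bytes("スポット")          # regexp example
case3 = meta_trail_bytes("データ0[0-9].dat")  # glob example

for hits in (case1, case2, case3):
    print(hits)

# For comparison: in UTF-8 every byte of a non-ASCII character is >= 0x80,
# so none of them can be mistaken for an ASCII metacharacter.
utf8_safe = all(b >= 0x80 for b in "表ポータ".encode("utf-8"))
print(utf8_safe)
```

A byte-oriented tool that does not know the charset cannot tell these trail bytes from real metacharacters, which is the failure mode the message describes.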
* Re: The C locale 2009-09-24 9:39 ` IWAMURO Motonori @ 2009-09-24 9:57 ` Corinna Vinschen 2009-09-24 10:00 ` Corinna Vinschen 2009-09-27 3:44 ` IWAMURO Motonori 0 siblings, 2 replies; 51+ messages in thread From: Corinna Vinschen @ 2009-09-24 9:57 UTC (permalink / raw) To: cygwin On Sep 24 18:37, IWAMURO Motonori wrote: > 2009/9/24 Corinna Vinschen <corinna-cygwin@cygwin.com>: > > On Sep 24 16:03, IWAMURO Motonori wrote: > >> 2009/9/22 Andy Koppe <andy.koppe@gmail.com>: > >> > Let's use the Windows "ANSI" codepage as the character set for the C > >> > locale, for both the conversion functions and filenames. This means > >> > CP1252 on Western systems, CP1251 on Cyrillic ones, CP932 on Japanese > >> > ones, and so on. > >> > >> I oppose the approach (the ANSI codepage is used at C locale) because > >> CP932 (the codepage for Japanese) is hostile to the UNIX-like tools. > >> > >> The reason is that the CP932 format contains a lot of meta characters > >> as follows. > >> > >> single character of CP932: > >> /[\x00-\x7F\xA0-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/ > > > > I don't understand. Are you saying that the single character in CP932 > > consists of 12 bytes? As far as I can see, CP932 is S-JIS, which > > is a just a simple double byte character set. What am I missing. > > - CP932 (Shift_JIS) has 1byte character and 2bytes character. > > - The range of 1byte character is 0x00-0x7F and 0xA0-0xDF. > > - The range of first byte of 2byte character is 0x80-0x9F and 0xE0-0xFC. > > - The range of second byte of 2byte character is 0x40-7E and 0x80-0xFC. > This includes "[", "\", "]", "^", "`", "{", "|", "}". Ok, thanks for your examples, they show neatly where the problem is. As you might know, the codepage 20932 (EUC-JP) is also not the same as the UNIX EUC_JP implementation. 
The JIS-X-0212 three byte codes are folded into two-byte sequences as described in a comment in strfuncs.cc: /* Unfortunately, the Windows eucJP codepage 20932 is not really 100% compatible to eucJP. It's a cute approximation which makes it a doublebyte codepage. The JIS-X-0212 three byte codes (0x8f,0xa1-0xfe,0xa1-0xfe) are folded into two byte codes as follows: The 0x8f is stripped, the next byte is taken as is, the third byte is mapped into the lower 7-bit area by masking it with 0x7f. So, for instance, the eucJP code 0x8f,0xdd,0xf8 becomes 0xdd,0x78 in CP 20932. To be really eucJP compatible, we have to map the JIS-X-0212 characters between CP 20932 and eucJP ourselves. */ My question is this: Is the S-JIS implementation on UNIX systems also using a different implementation to avoid using characters from the ASCII range? If so, can't we change the __sjis_wctomb and __sjis_mbtowc functions in the same manner as the __eucjp_wctomb and __eucjp_mbtowc functions to get a safer implementation? Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 51+ messages in thread
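The folding described in the strfuncs.cc comment can be written out as a small Python sketch. The function names are hypothetical and this is not the Cygwin code; it only restates the rule from the comment (strip 0x8f, keep the second byte, mask the third byte with 0x7f). The reverse direction additionally assumes that any CP 20932 pair whose second byte is below 0x80 stands for a JIS-X-0212 character.

```python
def fold_jisx0212(seq: bytes) -> bytes:
    """eucJP three-byte JIS-X-0212 code -> two-byte form described for CP 20932."""
    if len(seq) != 3 or seq[0] != 0x8F:
        raise ValueError("not a three-byte JIS-X-0212 sequence")
    if not (0xA1 <= seq[1] <= 0xFE and 0xA1 <= seq[2] <= 0xFE):
        raise ValueError("bytes outside the 0xa1-0xfe range")
    # Drop the 0x8f, keep the second byte, map the third into the 7-bit area.
    return bytes([seq[1], seq[2] & 0x7F])


def unfold_jisx0212(pair: bytes) -> bytes:
    """Reverse mapping, under the assumption stated in the lead-in."""
    if len(pair) != 2 or pair[1] >= 0x80:
        raise ValueError("not a folded JIS-X-0212 pair")
    return bytes([0x8F, pair[0], pair[1] | 0x80])


# The example from the comment: 0x8f,0xdd,0xf8 becomes 0xdd,0x78.
folded = fold_jisx0212(b"\x8f\xdd\xf8")
restored = unfold_jisx0212(folded)
print(folded.hex(), restored.hex())
```

Note that the folded second byte (0x78 in the example) lands in the ASCII range, so this folding alone does not remove the kind of problem raised for CP932.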
* Re: The C locale
From: Corinna Vinschen @ 2009-09-24 10:00 UTC
To: cygwin

On Sep 24 11:57, Corinna Vinschen wrote:
> On Sep 24 18:37, IWAMURO Motonori wrote:
> > - CP932 (Shift_JIS) has 1byte character and 2bytes character.
> >
> > - The range of 1byte character is 0x00-0x7F and 0xA0-0xDF.
> >
> > - The range of first byte of 2byte character is 0x80-0x9F and 0xE0-0xFC.
> >
> > - The range of second byte of 2byte character is 0x40-7E and 0x80-0xFC.
> >   This includes "[", "\", "]", "^", "`", "{", "|", "}".
>
> Ok, thanks for your examples, they show neatly where the problem is.
>
> As you might know, the codepage 20932 (EUC-JP) is also not the same
> as the UNIX EUC_JP implementation.  The JIS-X-0212 three byte codes
> are folded into two-byte sequences as described in a comment in
> strfuncs.cc:
>
>   /* Unfortunately, the Windows eucJP codepage 20932 is not really 100%
>      compatible to eucJP.  It's a cute approximation which makes it a
>      doublebyte codepage.
>      The JIS-X-0212 three byte codes (0x8f,0xa1-0xfe,0xa1-0xfe) are folded
>      into two byte codes as follows: The 0x8f is stripped, the next byte is
>      taken as is, the third byte is mapped into the lower 7-bit area by
>      masking it with 0x7f.  So, for instance, the eucJP code 0x8f,0xdd,0xf8
>      becomes 0xdd,0x78 in CP 20932.
>
>      To be really eucJP compatible, we have to map the JIS-X-0212 characters
>      between CP 20932 and eucJP ourselves.  */
>
> My question is this: Is the S-JIS implementation on UNIX systems
> also using a different implementation to avoid using characters
> from the ASCII range?  If so, can't we change the __sjis_wctomb
> and __sjis_mbtowc functions in the same manner as the __eucjp_wctomb
> and __eucjp_mbtowc functions to get a safer implementation?

Hmm, as far as I can see from wikipedia, S-JIS is simply defined
that way.  Bah.

Corinna
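Editorial illustration: the byte ranges quoted in this message explain why S-JIS is awkward for tools that scan for ASCII characters such as "\". The Python sketch below walks a byte string using exactly the ranges given in the thread (they are taken from the message, not from the CP932 specification) and reports every trail byte that is also an ASCII character. The helper names are made up for this example, and Python's `cp932` codec is used only to produce sample bytes.

```python
# Byte ranges as quoted in the thread.
LEAD_RANGES = ((0x80, 0x9F), (0xE0, 0xFC))
TRAIL_RANGES = ((0x40, 0x7E), (0x80, 0xFC))


def in_ranges(byte, ranges):
    """True if byte falls into one of the inclusive (low, high) ranges."""
    return any(low <= byte <= high for low, high in ranges)


def ascii_trail_bytes(data: bytes):
    """Return (offset, char) for every trail byte that is also an ASCII character."""
    hits = []
    i = 0
    while i < len(data):
        if in_ranges(data[i], LEAD_RANGES) and i + 1 < len(data):
            trail = data[i + 1]
            if in_ranges(trail, TRAIL_RANGES) and trail < 0x80:
                hits.append((i + 1, chr(trail)))
            i += 2  # skip lead and trail byte together
        else:
            i += 1  # single-byte character
    return hits


encoded = "表".encode("cp932")  # a kanji whose trail byte is 0x5c, the backslash
print(encoded.hex(" "), ascii_trail_bytes(encoded))
```

A byte-oriented tool that treats 0x5c as a path separator or escape character would misread the second half of this kanji, which is the usability problem discussed here.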
* Re: The C locale
From: Corinna Vinschen @ 2009-09-26 9:15 UTC
To: cygwin

On Sep 24 12:00, Corinna Vinschen wrote:
> On Sep 24 11:57, Corinna Vinschen wrote:
> > On Sep 24 18:37, IWAMURO Motonori wrote:
> > > - CP932 (Shift_JIS) has 1byte character and 2bytes character.
> > >
> > > - The range of 1byte character is 0x00-0x7F and 0xA0-0xDF.
> > >
> > > - The range of first byte of 2byte character is 0x80-0x9F and 0xE0-0xFC.
> > >
> > > - The range of second byte of 2byte character is 0x40-7E and 0x80-0xFC.
> > >   This includes "[", "\", "]", "^", "`", "{", "|", "}".
> >
> > Ok, thanks for your examples, they show neatly where the problem is.
> >
> > As you might know, the codepage 20932 (EUC-JP) is also not the same
> > as the UNIX EUC_JP implementation.  The JIS-X-0212 three byte codes
> > are folded into two-byte sequences as described in a comment in
> > strfuncs.cc:
> >
> >   /* Unfortunately, the Windows eucJP codepage 20932 is not really 100%
> >      compatible to eucJP.  It's a cute approximation which makes it a
> >      doublebyte codepage.
> >      The JIS-X-0212 three byte codes (0x8f,0xa1-0xfe,0xa1-0xfe) are folded
> >      into two byte codes as follows: The 0x8f is stripped, the next byte is
> >      taken as is, the third byte is mapped into the lower 7-bit area by
> >      masking it with 0x7f.  So, for instance, the eucJP code 0x8f,0xdd,0xf8
> >      becomes 0xdd,0x78 in CP 20932.
> >
> >      To be really eucJP compatible, we have to map the JIS-X-0212 characters
> >      between CP 20932 and eucJP ourselves.  */
> >
> > My question is this: Is the S-JIS implementation on UNIX systems
> > also using a different implementation to avoid using characters
> > from the ASCII range?  If so, can't we change the __sjis_wctomb
> > and __sjis_mbtowc functions in the same manner as the __eucjp_wctomb
> > and __eucjp_mbtowc functions to get a safer implementation?
>
> Hmm, as far as I can see from wikipedia, S-JIS is simply defined
> that way.  Bah.

This leads me to another question to you and other users working with
Japanese systems.

As far as I understand this, the default ANSI and OEM codepage on
Japanese Windows systems is 932/SJIS, right?  And your examples show
nicely how bad codepage 932/SJIS is from a usability perspective.

Right now, if you specify a locale like "ja_JP" on your machine, that
is, without specifying the charset, Cygwin will fetch the ANSI codepage
from Windows and use that as your charset.  That means, LANG="ja_JP"
will result in using the charset SJIS.

The question is this: Wouldn't it be better from a usability perspective
to avoid SJIS in this case, and to switch Cygwin to EUCJP instead?
So, for a Japanese user:

  LANG="C"          -> UTF-8
  LANG="ja"         -> EUCJP
  LANG="ja_JP"      -> EUCJP
  LANG="ja_JP.SJIS" -> SJIS

That would mean, *only* when specifying SJIS explicitly, Cygwin
actually uses SJIS.  Is that a feasible approach?

Thanks,
Corinna
* Re: The C locale
From: IWAMURO Motonori @ 2009-09-27 3:21 UTC
To: cygwin

Hi.

> the default ANSI and OEM codepage on Japanese Windows systems is
> 932/SJIS, right?

Yes.

> LANG="C" -> UTF-8
(snip)
> LANG="ja_JP.SJIS" -> SJIS

That's good.

> LANG="ja" -> EUCJP
> LANG="ja_JP" -> EUCJP

Hmmm, it is a difficult problem.

I think selecting UTF-8 is good because eucJP is legacy.

But, for interoperability with other UNIX-like systems (*), I don't
think selecting UTF-8 is good.

* Solaris: ja, ja_JP -> eucJP
* Linux (Debian): ja -> Unknown, ja_JP -> eucJP

I need to think more...
* Re: The C locale
From: IWAMURO Motonori @ 2009-09-28 16:03 UTC
To: cygwin

2009/9/27 IWAMURO Motonori <deenheart@gmail.com>:
>> LANG="ja" -> EUCJP
>> LANG="ja_JP" -> EUCJP
>
> Hmmm, It is a difficult problem.
>
> I think selecting UTF-8 is good because eucJP is legacy.
>
> But, for interoperability with other UNIX-like system(*), I don't
> think selecting UTF-8 is good.
>
> * Solaris: ja, ja_JP -> eucJP
> * Linux (Debian): ja -> Unknown, ja_JP -> eucJP
>
> I need to think more...

After hearing other Japanese people's opinions, my conclusion is as
follows:

LANG=ja -> UTF-8
LANG=ja_JP -> UTF-8

The reason is that we specify "eucJP" explicitly when we need it.

--
IWAMURO Motonori <http://vmi.jp/>
* Re: The C locale
From: Corinna Vinschen @ 2009-09-28 16:16 UTC
To: cygwin

On Sep 29 01:03, IWAMURO Motonori wrote:
> 2009/9/27 IWAMURO Motonori <deenheart@gmail.com>:
> >> LANG="ja" -> EUCJP
> >> LANG="ja_JP" -> EUCJP
> >
> > Hmmm, It is a difficult problem.
> >
> > I think selecting UTF-8 is good because eucJP is legacy.
> >
> > But, for interoperability with other UNIX-like system(*), I don't
> > think selecting UTF-8 is good.
> >
> > * Solaris: ja, ja_JP -> eucJP
> > * Linux (Debian): ja -> Unknown, ja_JP -> eucJP
> >
> > I need to think more...
>
> My conclusion is as follows as a result of hearing other Japanese
> people's opinion:
>
> LANG=ja -> UTF-8
> LANG=ja_JP -> UTF-8
>
> Because, we specify "eucJP" explicitly when we need it.

Hmm.

That's an interesting point.

In theory this sounds like a good idea to be used for all locales which
don't specify the charset explicitly, because that results in using the
same charset, "UTF-8", for all such locales.  "C", "ja" or "en_US"
would all default to UTF-8.

The downside is that a user who needs to work under the default ANSI
codepage for some reason has to know the name of the default ANSI
codepage.  Right now any user who needs the default ANSI codepage can
simply set LANG to some language code and go ahead, without having to
know the number.  With your solution, that wouldn't be possible anymore
and the user would have to figure out the default ANSI codepage on the
system before being able to use it.

I honestly don't know if that's really a problem, though.  But I don't
want to take that feature away for now.  Anybody having a strong opinion
on this issue?

Corinna
* Re: The C locale
From: wynfield @ 2009-09-29 0:23 UTC
To: cygwin

Though I'm not up on the details involved here, I will give you
feedback on the request for information about the locale issue, because
it affects quick access to and use of Japanese language documents.

Either of the two following values would be acceptable, but I feel that
the UTF-8 charset is becoming more and more widely adopted.

LANG=ja -> UTF-8
LANG=ja_JP -> UTF-8

The following would also be suitable, if possible:

LANG=ja -> iso-2022-jp
LANG=ja_JP -> iso-2022-jp

Regards

> On Sep 29 01:03, IWAMURO Motonori wrote:
> > > ..... <snipped>
> > >
> > > I think selecting UTF-8 is good because eucJP is legacy.
> >
> > <and>
> >
> > My conclusion is as follows as a result of hearing other Japanese
> > people's opinion:
> >
> > LANG=ja -> UTF-8
> > LANG=ja_JP -> UTF-8
> >
> > Because, we specify "eucJP" explicitly when we need it.
* Re: The C locale
From: Andy Koppe @ 2009-09-29 4:04 UTC
To: cygwin

2009/9/29 wynfield:
>
> Though I'm not an up on the details involved here, I will give
> you feedback to the request for information about the locale issue,
> because it affects the quick accessability and usage of Japanese
> language documents.
>
> Either of the two follow values would be acceptable, but I feel that
> the UTF-8 charset is becoming more and more adopted.
> LANG=ja -> UTF-8
> LANG=ja_JP -> UTF-8
>
> Also the following be suitable if possible..
> LANG=ja -> iso-2022-jp
> LANG=ja_JP -> iso-2022-jp

Thanks for the feedback!

Now, Windows knows three different variants of iso-2022-jp.  Do you know
which one's the preferred one?

CP50220: ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)
CP50221: ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)
CP50222: ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)

Also, Wikipedia has this to say:

"Since ISO 2022 is a stateful encoding, a program can not jump in the
middle of a block of text to search, insert or delete characters.  This
makes manipulation of the text very cumbersome and slow when compared to
non-stateful encodings.  Any jump in the middle of the text may require
a back up to the previous escape sequence before the bytes following the
escape sequence can be interpreted."

Doesn't that make it very difficult to use with standard Unix tools?

Andy
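Editorial illustration: the statefulness concern quoted from Wikipedia can be reproduced with Python's standard `iso2022_jp` codec. This is only a sketch of the general behaviour of the encoding, not of any of the Windows codepages 50220 to 50222, and the sample text is an arbitrary choice.

```python
text = "日本語"
encoded = text.encode("iso2022_jp")

# The encoded form is 7-bit only: a three-byte escape sequence switches into
# the two-byte character set, and another one switches back to ASCII at the end.
print(encoded)

# Jumping into the middle (here: skipping the leading escape sequence) loses
# the shift state, so the very same bytes are now read as plain ASCII.
tail = encoded[3:]
print(tail.decode("iso2022_jp"))
```

A tool that seeks into such a file, or that cuts it at an arbitrary byte offset, has to scan backwards for the previous escape sequence before it can interpret what it finds.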
* Re: The C locale
From: IWAMURO Motonori @ 2009-09-29 13:55 UTC
To: cygwin

2009/9/29 <wynfield@gmail.com>:
> Also the following be suitable if possible..
> LANG=ja -> iso-2022-jp
> LANG=ja_JP -> iso-2022-jp

Hmmm, I think that is unrealistic.

--
IWAMURO Motonori <http://vmi.jp/>
* Re: The C locale
From: Andy Koppe @ 2009-09-29 4:27 UTC
To: cygwin

2009/9/28 Corinna Vinschen:
>> My conclusion is as follows as a result of hearing other Japanese
>> people's opinion:
>>
>> LANG=ja -> UTF-8
>> LANG=ja_JP -> UTF-8
>>
>> Because, we specify "eucJP" explicitly when we need it.
>
> Hmm.
>
> That's an interesting point.
>
> In theory this sounds like a good idea to be used for all locales which
> don't specify the charset explicitly, because that results in using the
> same charset, "UTF-8", for all such locales.  "C", "ja" or "en_US"
> would all default to UTF-8.

Hmm, there's much to be said for that.

> The downside is that a user, who needs to work under the default ANSI
> codepage for some reason, has to know the name of the default ANSI
> codepage.  Right now any user who needs the default ANSI codepage can
> simply set LANG to some language code and go ahead, without having to
> know the number.  With your solution, that wouldn't be possible anymore
> and the user would have to figure out the default ANSI codepage on the
> system before being able to use it.

How about an explicit "ANSI" charset that maps to GetACP()?  And "OEM"
for GetOEMCP()?  Those would make easy replacements for the
CYGWIN=codepage:[ansi|oem] option.

Andy
* Re: The C locale
From: Corinna Vinschen @ 2009-09-29 7:03 UTC
To: cygwin

On Sep 29 05:27, Andy Koppe wrote:
> 2009/9/28 Corinna Vinschen
> > The downside is that a user, who needs to work under the default ANSI
> > codepage for some reason, has to know the name of the default ANSI
> > codepage.  Right now any user who needs the default ANSI codepage can
> > simply set LANG to some language code and go ahead, without having to
> > know the number.  With your solution, that wouldn't be possible anymore
> > and the user would have to figure out the default ANSI codepage on the
> > system before being able to use it.
>
> How about an explicit "ANSI" charset that maps to GetACP()? And "OEM"
> for GetOEMCP()? Those would make easy replacements for the
> CYGWIN=codepage:[ansi|oem] option.

Not for 1.7.1.  Maybe later, if there's any actual demand.

Corinna
* Re: The C locale
From: Lapo Luchini @ 2009-09-29 10:55 UTC
To: cygwin

Corinna Vinschen wrote:
> The downside is that a user, who needs to work under the default ANSI
> codepage for some reason, has to know the name of the default ANSI
> codepage.

Mhhh... IMHO any user interested in this probably knows his own ANSI
codepage all too well (CP1252 for me), but maybe that's a programmer's
point of view and many users can have those issues as well.

--
Lapo Luchini - http://lapo.it/
* Re: The C locale
From: Thomas Wolff @ 2009-09-29 11:12 UTC
To: cygwin

Corinna Vinschen wrote:
> On Sep 29 01:03, IWAMURO Motonori wrote:
>> 2009/9/27 IWAMURO Motonori <deenheart@gmail.com>:
>>>> LANG="ja" -> EUCJP
>>>> LANG="ja_JP" -> EUCJP
>>>
>>> Hmmm, It is a difficult problem.
>>>
>>> I think selecting UTF-8 is good because eucJP is legacy.
>>>
>>> But, for interoperability with other UNIX-like system(*), I don't
>>> think selecting UTF-8 is good.
>>>
>>> * Solaris: ja, ja_JP -> eucJP
>>> * Linux (Debian): ja -> Unknown, ja_JP -> eucJP
>>>
>>> I need to think more...
>>
>> My conclusion is as follows as a result of hearing other Japanese
>> people's opinion:
>>
>> LANG=ja -> UTF-8
>> LANG=ja_JP -> UTF-8
>>
>> Because, we specify "eucJP" explicitly when we need it.
>
> Hmm.
>
> That's an interesting point.
>
> In theory this sounds like a good idea to be used for all locales which
> don't specify the charset explicitly, because that results in using the
> same charset, "UTF-8", for all such locales.  "C", "ja" or "en_US"
> would all default to UTF-8.

The keyword here again should be compatibility.  That means,
unfortunately, that I do not think this is a good idea.
A number of locales have been established on common systems that do not
specify their encoding explicitly (i.e. in their name).
Since there is now more or less a common set of such locales among
various Linux and Unix systems, this seems to be a de-facto standard,
although I am not aware of any more formal definition, listing or
description of it.
On a modern Linux system, use the following command to get a list (not
sure if it's appropriate to attach it here):

  for l in `locale -a`
  do echo "$l `LC_ALL=$l locale charmap`"
  done

I have also tried to incorporate a best-guess assembly of mappings from
modern systems in my editor mined so it can derive the encoding from the
locale name, so you could also take a working list from there.

I think this list should be used as the reference for defining the
locale/encoding mapping; other choices may be more attractive but only
raise problems.

> The downside is that a user, who needs to work under the default ANSI
> codepage for some reason, has to know the name of the default ANSI
> codepage.  Right now any user who needs the default ANSI codepage can
> simply set LANG to some language code and go ahead, without having to
> know the number.  With your solution, that wouldn't be possible anymore
> and the user would have to figure out the default ANSI codepage on the
> system before being able to use it.
>
> I honestly don't know if that's really a problem, though.  But I don't
> want to take that feature away for now.  Anybody having a strong opinion
> on this issue?

I wasn't quite aware that the old "codepage:oem" setting didn't strictly
mean "CP850" or "CP437" but apparently the respective system locale.
If that is really needed, maybe the "C" locale should get you there, or
some "OEM" as (I think) Andy proposed.  If someone feels the need to
combine a specific language setting with the unspecific "system locale",
well, maybe a pseudo encoding name could be invented to form names like
"en_GB.OEM".  Just leaving out the encoding suffix should not have that
effect, as I argued above.

Kind regards,
Thomas
* Re: The C locale
From: Corinna Vinschen @ 2009-09-29 12:12 UTC
To: cygwin

On Sep 29 13:05, Thomas Wolff wrote:
> Corinna Vinschen wrote:
>> In theory this sounds like a good idea to be used for all locales which
>> don't specify the charset explicitly, because that results in using the
>> same charset, "UTF-8", for all such locales.  "C", "ja" or "en_US"
>> would all default to UTF-8.
>>
> The keyword here again should be compatibility. That means,
> unfortunately, that I do not think this is a good idea.
> A number of locales have been established on common systems that do not
> specify their encoding explicitly (i.e. in their name).
> Since there is now more or less a common set of such locales among
> various Linux and Unix systems, this seems to be
> a de-facto standard although I am not aware of any more formal
> definition/listing/description of this.
> On a modern Linux system, use the following command to get a list (not
> sure if it's appropriate to attach it here):
> for l in `locale -a`
> do echo "$l `LC_ALL=$l locale charmap`"
> done
>
> I have also tried to incorporate a best guess assembly of mappings from
> modern systems in my editor mined so it can
> derive the encoding from the locale name, so you could also take a
> working list from there.
>
> I think this list should be used for reference to define the
> locale/encoding mapping, other choices may be more attractive
> but only raise problems.

This isn't feasible for now.  As I described in the documentation, the
actual content of the language and territory part is not evaluated for
now.  *Only* the charset part (and the cjknarrow modifier, FWIW) has a
meaning for newlib/Cygwin so far.

What happens for now is that Cygwin calls a function which fetches the
ANSI codepage and generates the current charset from there.  So that's
what happens:

  LANG="C"             -> UTF-8
  LANG="xx"            -> charset equivalent to ANSI codepage
  LANG="xx_XX"         -> ditto
  LANG="xx_XX.CHARSET" -> Use charset CHARSET

We won't add extra functionality.  In the long run it would be nice to
change the setlocale functionality to use actual locale files in every
respect, but that's wishful thinking for now.

To return to the original problem which started this request:

I asked if the default charset for the Japanese language should be set
to EUCJP rather than SJIS.  The actual implementation would have been
like this:

  if (lang="xx" or lang="xx_XX" with x in [a-z] and X in [A-Z])
    set_charset_from_codepage()

  set_charset_from_codepage()
  {
    switch (GetANSI ())
      [...]
      case 932:
        charset="EUCJP"   <-- Instead of the current charset="SJIS"
      [...]
  }

Everything going beyond this in complexity is out of the question for
now.

Corinna
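Editorial illustration: the resolution rules listed in this message, together with the proposed codepage 932 change, can be sketched in Python as follows. This is not the newlib/Cygwin code. The function name `resolve_charset`, the `prefer_eucjp` switch, the contents of `CODEPAGE_CHARSETS` beyond the 932 entry, and the `CP<number>` fallback are all assumptions made for this example.

```python
import re

# Hypothetical excerpt of a codepage-to-charset table.  Only the 932 entry is
# discussed in the thread; the 1252 entry is an illustrative assumption.
CODEPAGE_CHARSETS = {932: "SJIS", 1252: "CP1252"}

# "xx" or "xx_XX" without an explicit charset part.
BARE_LOCALE = re.compile(r"^[a-z]{2}(_[A-Z]{2})?$")


def resolve_charset(lang: str, ansi_codepage: int, prefer_eucjp: bool = False) -> str:
    """Return the charset for a LANG value, following the rules in the message."""
    if "." in lang:
        # LANG="xx_XX.CHARSET": the explicit charset always wins.
        return lang.split(".", 1)[1]
    if lang == "C":
        return "UTF-8"
    if BARE_LOCALE.match(lang):
        # No charset given: derive it from the ANSI codepage.
        if prefer_eucjp and ansi_codepage == 932:
            return "EUCJP"  # the change proposed for Japanese systems
        # Fallback name for unknown codepages is an assumption of this sketch.
        return CODEPAGE_CHARSETS.get(ansi_codepage, f"CP{ansi_codepage}")
    raise ValueError(f"unrecognised locale name: {lang!r}")


for lang in ("C", "ja", "ja_JP", "ja_JP.SJIS"):
    print(lang, "->", resolve_charset(lang, 932, prefer_eucjp=True))
```

Note that the language and territory parts are only pattern-matched and never interpreted, which mirrors the statement above that only the charset part currently has a meaning.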
* Re: The C locale
From: IWAMURO Motonori @ 2009-09-29 14:30 UTC
To: cygwin

2009/9/29 Corinna Vinschen <corinna-cygwin@cygwin.com>:
> I asked if the default charset for the japanese language should be set
> to EUCJP rather than SJIS.  The actual implementation would have been
> like this
>
>   if (lang="xx" or lang="xx_XX" with x in [a-z] and X in [A-Z])
>     set_charset_from_codepage()
>
>   set_charset_from_codepage()
>   {
>     switch (GetANSI ())
>       [...]
>       case 932:
>         charset="EUCJP"   <-- Instead of the current charset="SJIS"
>       [...]
>   }

I think that is not good for Japanese users, because EUCJP is not a
substitute for SJIS.

--
IWAMURO Motonori <http://vmi.jp/>
* Re: The C locale
From: IWAMURO Motonori @ 2009-09-29 14:13 UTC
To: cygwin

2009/9/29 Corinna Vinschen <corinna-cygwin@cygwin.com>:
> The downside is that a user, who needs to work under the default ANSI
> codepage for some reason, has to know the name of the default ANSI
> codepage.

If the problem is one of 1.5 -> 1.7 migration, how about building a
wizard into setup.exe that sets the locale environment variable?
Wouldn't that be a proper solution?

--
IWAMURO Motonori <http://vmi.jp/>
* Re: The C locale
From: Corinna Vinschen @ 2009-09-29 14:55 UTC
To: cygwin

On Sep 29 23:13, IWAMURO Motonori wrote:
> 2009/9/29 Corinna Vinschen <corinna-cygwin@cygwin.com>:
> > The downside is that a user, who needs to work under the default ANSI
> > codepage for some reason, has to know the name of the default ANSI
> > codepage.
>
> If the problem is a problem of 1.5->1.7 migration, how about building
> in the wizard which sets the locale environment variable to setup.exe?
> Is not it proper as the solution?

We don't want to enforce the usage of the ANSI codepage after
installation.  Default should be "C" with UTF-8 charset.  Setting
LC_ALL/LC_CTYPE/LANG is the choice of the user.

Corinna
* Re: The C locale
From: IWAMURO Motonori @ 2009-09-27 3:44 UTC
To: cygwin

2009/9/24 Corinna Vinschen <corinna-cygwin@cygwin.com>:
> My question is this: Is the S-JIS implementation on UNIX systems
> also using a different implementation to avoid using characters
> from the ASCII range?  If so, can't we change the __sjis_wctomb
> and __sjis_mbtowc functions in the same manner as the __eucjp_wctomb
> and __eucjp_mbtowc functions to get a safer implementation?

I don't think it is necessary to think about that.
The eucJP problem does not arise in the SJIS environment, because SJIS
doesn't support JIS-X-0212.

--
IWAMURO Motonori <http://vmi.jp/>