public inbox for cygwin@cygwin.com
From: Lapo Luchini <lapo@lapo.it>
To: cygwin@cygwin.com
Subject: Re: The C locale
Date: Tue, 22 Sep 2009 08:43:00 -0000
Message-ID: <h9a2mo$867$1@ger.gmane.org>
In-Reply-To: <416096c60909212347r7e03a4f3q7d518ff7e8bce55d@mail.gmail.com>

Andy Koppe wrote:
> No, it isn't. UTF-16 filename characters that can't be represented in
> the current charset are encoded by a ^N followed by the character's
> UTF-8 representation.

OK, right.
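A rough Python sketch of the escaping described above, to make the byte sequences concrete. It is only an illustration: it assumes a per-character conversion, with Python's codecs standing in for Cygwin's own charset tables. `escape_filename` is a hypothetical name, and details such as how a literal ^N inside a filename is handled are not covered.

```python
SO = b"\x0e"  # ^N (Shift Out), the escape byte from the quoted description


def escape_filename(name: str, charset: str) -> bytes:
    """Hypothetical helper: encode a filename into `charset`, escaping
    every character the charset cannot represent as ^N + its UTF-8 bytes."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # Not representable in this charset: ^N, then the UTF-8 form.
            out += SO + ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    print(escape_filename("bäh", "ascii").hex())       # 620ec3a468
    print(escape_filename("bäh", "iso-8859-1").hex())  # 62e468
```

With an ASCII-only charset "bäh" picks up the ^N escape, while ISO-8859-1 simply stores the single byte E4.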

> For example, a Windows filename "bäh" turns into "bŤh" in the C locale,
> while it shows up correctly with explicitly set ISO-8859-1 or CP1252.

Huh? That's not what I see: if I create "bäh" in Windows Explorer and
then open a UTF-8 mintty console, I get consistent output with both
LANG=C and LANG=it_IT.UTF-8 (of course, since right now C is UTF-8):

% LANG=C ls -l|egrep b.h
-rw-r--r-- 1 lapo None     0 Sep 22 09:53 bäh
% LANG=it_IT.UTF-8 ls -l|egrep b.h
-rw-r--r-- 1 lapo None     0 22 Sep 09:53 bäh

So I'm not sure what you mean by 'a Windows filename "bäh" turns
into "bŤh" in the C locale'... Do you mean that a script sees it as
62C3A468 as opposed to 62E468? Or that an actual "bŤh" is shown somewhere?

"bŤh" is just a representation, and it depends on the charset the
console expects (in fact, in this UTF-8-encoded message it will
probably be represented as 62C385C2A468)... If the console is UTF-8,
what's currently shown is what I'd expect.
If OTOH we're talking about the raw bytes rather than what is shown
(i.e. a 3-byte vs. a 4-byte string), well, that's a different
issue, and I'm not sure why a program should prefer a 3-byte
representation over a 4-byte one...?
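To put the "3 bytes vs. 4 bytes" point in code, a small Python sketch (standard codecs only) showing the raw bytes of "bäh" in each charset, and how the very same UTF-8 bytes render depending on what the console assumes. CP1252 is just an example of a non-UTF-8 console charset here.

```python
name = "bäh"

utf8 = name.encode("utf-8")         # bytes stored under a UTF-8 locale
latin1 = name.encode("iso-8859-1")  # bytes stored under an ISO-8859-1 locale

if __name__ == "__main__":
    print(utf8.hex(), len(utf8))      # 62c3a468 4
    print(latin1.hex(), len(latin1))  # 62e468 3
    # The same 4 UTF-8 bytes shown by a console that assumes CP1252...
    print(utf8.decode("cp1252"))      # bÃ¤h
    # ...and by a console that assumes UTF-8.
    print(utf8.decode("utf-8"))       # bäh
```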

But OTOH, as far as "not caring" goes, it can certainly be a nice feature
to be backward-compatible in that single case, since the behavior is not
well-defined anyway...
But again, if a script creates a filename that happens to contain
Japanese characters (or even umlauts or curly left/right quotes), I would
expect to see them on the filesystem too, not some random-looking
escape sequence...

> Btw, are you actually using the C locale?

Not usually, but it happens from time to time (mostly in scripts, or in
cases such as the monotone "make check" unit tests; one of them, which tries
to create UTF-8 filenames and then ISO-8859-1 filenames, currently fails).
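For reference, a minimal Python sketch of the kind of check such a test performs. It is not monotone's actual test code, and `roundtrip` is a hypothetical helper: it creates the same name once as UTF-8 bytes and once as ISO-8859-1 bytes in a temporary directory, then lists them back. On a filesystem that stores names as raw bytes (e.g. ext4 on Linux) both should come back unchanged; on Cygwin the outcome depends on how the current charset maps those bytes to UTF-16 filenames, which is what this thread is about.

```python
import os
import tempfile

# Hypothetical test data: "bäh" as UTF-8 bytes and as ISO-8859-1 bytes.
NAMES = [b"b\xc3\xa4h", b"b\xe4h"]


def roundtrip(names):
    """Create each byte-string filename in a temp dir and return the
    (sorted) names the directory listing gives back."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.fsencode(tmp)  # use bytes paths so no decoding happens
        for n in names:
            with open(os.path.join(base, n), "wb"):
                pass  # an empty file is enough
        return sorted(os.listdir(base))  # bytes in -> bytes out


if __name__ == "__main__":
    print(roundtrip(NAMES) == sorted(NAMES))
```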

-- 
Lapo Luchini - http://lapo.it/

“Endure. In enduring, grow strong.” (Dak'kon, videogame "Torment", 1999)


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
