public inbox for cygwin@cygwin.com
* Umlauts on commandline and in .bat files
@ 2001-07-18  3:50 Ralf Fassel
  2001-07-18  4:02 ` Corinna Vinschen
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf Fassel @ 2001-07-18  3:50 UTC
  To: cygwin

Consider this program, printing the commandline arguments char by char
in octal:

    #include <stdio.h>
    int
    main(int argc, char *argv[])
    {
	int i;
	unsigned char *p;
	for (i=1; i < argc; i++) {
	    p = (unsigned char *)argv[i];
	    while (*p) {
		printf("%03o ", *p++);
	    }
	    printf("\n");
	}
	return 0;
    }

Now I have a .bat file with a command line containing German umlauts:
    $ cat ttt.bat
    ./t.exe "ÄÖÜäöüß"

Running this via `sh' yields the expected result:
    $ sh ttt.bat
    304 326 334 344 366 374 337

But running via the `.bat => cmd' binding
    $ ./ttt.bat

    h:\ralf\si++.4.0.C138>./t.exe "-Í_õ÷³¯"
    055 315 137 365 367 263 257

gives me some unexpected bytes.
However:
    $ cmd /c 'type ttt.bat | od -b'
    0000000 056 057 164 056 145 170 145 040 042 304 326 334 344 366 374 337
    0000020 042 012 012
is OK (the leading bytes are the command itself; the umlauts are the
expected 304 326 334 344 366 374 337).

What's even more confusing:
    $ cmd /c type ttt.bat
    ./t.exe "-Í_õ÷³¯"
shows me the `wrong' bytes 055 315 ...,
but 
    $ cmd /c type ttt.bat  | od -b
    0000000 056 057 164 056 145 170 145 040 042 304 326 334 344 366 374 337
    0000020 042 012 012
gives again the correct bytes.

What is it that I'm missing here?

It's obviously not an error in bash/cygwin, but I'd be very thankful
for any pointers to the underlying problem...

R', not a Doze-Expert :-/
--------------------
cygcheck:
WinNT Ver 4.0 build 1381 Service Pack 6
(german language version)
    Cygwin DLL version info:
        dll major: 1003
        dll minor: 1
        dll epoch: 19
        dll bad signal mask: 19005
        dll old termios: 5
        dll malloc env: 28
        api major: 0
        api minor: 38
        shared data: 3
        dll identifier: cygwin1
        mount registry: 2
        cygnus registry name: Cygnus Solutions
        cygwin registry name: Cygwin
        program options name: Program Options
        cygwin mount registry name: mounts v2
        cygdrive flags: cygdrive flags
        cygdrive prefix: cygdrive prefix
        cygdrive default prefix: 
        build date: Tue Apr 24 20:01:02 EDT 2001
        shared id: cygwin1S3


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

* Re: Umlauts on commandline and in .bat files
  2001-07-18  3:50 Umlauts on commandline and in .bat files Ralf Fassel
@ 2001-07-18  4:02 ` Corinna Vinschen
  2001-07-18  4:11   ` egor duda
  0 siblings, 1 reply; 7+ messages in thread
From: Corinna Vinschen @ 2001-07-18  4:02 UTC
  To: cygwin

On Wed, Jul 18, 2001 at 12:49:02PM +0200, Ralf Fassel wrote:
> Consider this program, printing the commandline arguments char by char
> in octal:
> 
>     #include <stdio.h>
>     int
>     main(int argc, char *argv[])
>     {
> 	int i;
> 	unsigned char *p;
> 	for (i=1; i < argc; i++) {
> 	    p = (unsigned char *)argv[i];
> 	    while (*p) {
> 		printf("%03o ", *p++);
> 	    }
> 	    printf("\n");
> 	}
> 	return 0;
>     }
> 
> Now I have a .bat file with a command line containing German umlauts:
>     $ cat ttt.bat
>     ./t.exe "ÄÖÜäöüß"
> 
> Running this via `sh' yields the expected result:
>     $ sh ttt.bat
>     304 326 334 344 366 374 337
> 
> But running via the `.bat => cmd' binding
>     $ ./ttt.bat
> 
>     h:\ralf\si++.4.0.C138>./t.exe "-Í_õ÷³¯"
>     055 315 137 365 367 263 257

CMD is running with the OEM character set, Cygwin processes with ANSI.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Developer                                mailto:cygwin@cygwin.com
Red Hat, Inc.

* Re: Umlauts on commandline and in .bat files
  2001-07-18  4:02 ` Corinna Vinschen
@ 2001-07-18  4:11   ` egor duda
  2001-07-18  5:03     ` Ralf Fassel
  0 siblings, 1 reply; 7+ messages in thread
From: egor duda @ 2001-07-18  4:11 UTC
  To: Corinna Vinschen; +Cc: Ralf Fassel

Hi!

Wednesday, 18 July, 2001 Corinna Vinschen cygwin@cygwin.com wrote:

>> Now I have a .bat file with a command line containing German umlauts:
>>     $ cat ttt.bat
>>     ./t.exe "ÄÖÜäöüß"
>> 
>> Running this via `sh' yields the expected result:
>>     $ sh ttt.bat
>>     304 326 334 344 366 374 337
>> 
>> But running via the `.bat => cmd' binding
>>     $ ./ttt.bat
>> 
>>     h:\ralf\si++.4.0.C138>./t.exe "-Í_õ÷³¯"
>>     055 315 137 365 367 263 257

CV> CMD is running with the OEM character set, Cygwin processes with ANSI.

But one can change the latter by adding 'codepage:oem' to the CYGWIN
environment variable.

Egor.            mailto:deo@logos-m.ru ICQ 5165414 FidoNet 2:5020/496.19


* Re: Umlauts on commandline and in .bat files
  2001-07-18  4:11   ` egor duda
@ 2001-07-18  5:03     ` Ralf Fassel
  2001-07-18  9:36       ` egor duda
  2001-07-18  9:36       ` egor duda
  0 siblings, 2 replies; 7+ messages in thread
From: Ralf Fassel @ 2001-07-18  5:03 UTC
  To: egor duda

* egor duda
| >>     h:\ralf\si++.4.0.C138>./t.exe "-Í_õ÷³¯"
| >>     055 315 137 365 367 263 257
| 
| CV> CMD is running with the OEM character set, Cygwin processes with ANSI.
| 
| But one can change the latter by adding 'codepage:oem' to the CYGWIN
| environment variable.

I'd rather change the former... :-/

I thought the character set only determines which character
representation is shown on the screen (octal 304 is Umlaut-A in one
set and fuzzy-bar in another), not which *byte* value is passed to the
command?  Octal 304 is octal 304 no matter what character set?

R'

* Re: Umlauts on commandline and in .bat files
  2001-07-18  5:03     ` Ralf Fassel
  2001-07-18  9:36       ` egor duda
@ 2001-07-18  9:36       ` egor duda
  2001-07-18 11:26         ` Solved: " Ralf Fassel
  1 sibling, 1 reply; 7+ messages in thread
From: egor duda @ 2001-07-18  9:36 UTC
  To: Ralf Fassel; +Cc: cygwin

Hi!

Wednesday, 18 July, 2001 Ralf Fassel ralf@akutech.de wrote:

RF> * egor duda
| >>>     h:\ralf\si++.4.0.C138>./t.exe "-Í_õ÷³¯"
| >>>     055 315 137 365 367 263 257
RF> | 
| CV>> CMD is running with the OEM character set, Cygwin processes with ANSI.
RF> | 
RF> | But one can change the latter by adding 'codepage:oem' to the CYGWIN
RF> | environment variable.

RF> I'd rather change the former... :-/

You'd have to write your own cmd.exe then...

RF> I thought the character set only determines which character
RF> representation is shown on the screen (octal 304 is Umlaut-A in one
RF> set and fuzzy-bar in another), not which *byte* value is passed to the
RF> command?  Octal 304 is octal 304 no matter what character set?

It also affects input. When you type something in the console window and
some program reads that input, what it gets depends on which code page
is currently selected.

Egor.            mailto:deo@logos-m.ru ICQ 5165414 FidoNet 2:5020/496.19


* Re: Umlauts on commandline and in .bat files
  2001-07-18  5:03     ` Ralf Fassel
@ 2001-07-18  9:36       ` egor duda
  2001-07-18  9:36       ` egor duda
  1 sibling, 0 replies; 7+ messages in thread
From: egor duda @ 2001-07-18  9:36 UTC
  To: Ralf Fassel; +Cc: egor duda

Hi!

Wednesday, 18 July, 2001 Ralf Fassel ralf@akutech.de wrote:

RF> * egor duda
| >>>     h:\ralf\si++.4.0.C138>./t.exe "-Í_õ÷³¯"
| >>>     055 315 137 365 367 263 257
RF> | 
| CV>> CMD is running with the OEM character set, Cygwin processes with ANSI.
RF> | 
RF> | But one can change the latter by adding 'codepage:oem' to the CYGWIN
RF> | environment variable.

RF> I'd rather change the former... :-/

IIRC, the 'chcp' command can change the former.

Egor.            mailto:deo@logos-m.ru ICQ 5165414 FidoNet 2:5020/496.19


* Solved: Re: Umlauts on commandline and in .bat files
  2001-07-18  9:36       ` egor duda
@ 2001-07-18 11:26         ` Ralf Fassel
  0 siblings, 0 replies; 7+ messages in thread
From: Ralf Fassel @ 2001-07-18 11:26 UTC
  To: egor duda

* egor duda
| Hi!

Rehi!

| RF> [different codepages]
| RF> Octal 304 is octal 304 no matter what character set?
| 
| It also affects input. When you type something in the console window and
| some program reads that input, what it gets depends on which code page
| is currently selected.

Yes, but... I don't type anything here.  The .bat file comes from
disk, and the bytes in it are the ones I want; nevertheless cmd.exe
seems to mangle them into different bytes.  Obviously some *very*
clever input processing in cmd.exe...

| RF> I'd rather change the former... :-/
| 
| IIRC, the 'chcp' command can change the former.

Yup, I tried that before, but obviously not hard enough.  I assumed
that code page 850 was the one I wanted (the docs say `Western European
multilingual'), but it actually seems to be 1252, the default ANSI code
page according to some Resource Kit documentation.

If I run `chcp 1252' and then call my .bat, the output is OK (the
characters displayed are still weird, but...).  Changing the
equivalent registry entry `OEMCP' in
  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
to `1252' and rebooting even changes the code page at cmd.exe startup,
and also fixes my `real' problem, the NutCracker shell.  I don't know if
this is a clever idea, but it seems to `work'.

Thanks so far...
R'
