public inbox for cygwin@cygwin.com
* Quotes around command-line argument that has unicode characters are not removed
@ 2018-03-22  5:41 Dmitry Katsubo via cygwin
  2018-03-22 12:24 ` Andrey Repin
  2018-03-22 13:35 ` Mikhail Usenko via cygwin
  0 siblings, 2 replies; 13+ messages in thread
From: Dmitry Katsubo via cygwin @ 2018-03-22  5:41 UTC (permalink / raw)
  To: cygwin

Dear Cygwin community,

I observe the following on my Cygwin: when I put quotes around a file name that
has non-ASCII characters, the quotes are passed to the process's argv literally;
otherwise they are removed. I would expect consistent behavior.

I have written a small C program that displays its arguments, and ran it three
times:

#1 For the file with space, taken into quotes ("the file.txt") -- OK
#2 For the file with non-ASCII characters (Château.txt) -- OK
#3 For the file with non-ASCII characters, taken into quotes ("Château.txt") -- WRONG

d:\cli> uname -a
CYGWIN_NT-6.1-WOW PC 2.9.0(0.318/5/3) 2017-09-12 10:41 i686 Cygwin

D:\cli> chcp
Active code page: 866

D:\cli> dir
...cut...
2018-03-22  00:43                 0 Château.txt
2018-03-22  00:01               393 test.c
2018-03-22  00:01           150,230 test.exe
2018-03-21  00:15               186 test.pl
2018-03-22  00:43                 0 the file.txt
2018-03-22  00:40                16 текст плюс.txt
               6 File(s)        150,825 bytes
               2 Dir(s)  41,972,293,632 bytes free

D:\cli> test "the file.txt"
param 0 = test
param 1 = the file.txt
File 'the file.txt' was opened

D:\cli> test Château.txt
param 0 = test
param 1 = Château.txt
File 'Château.txt' was opened

D:\cli> test "Château.txt"
param 0 = test
param 1 = "Château.txt"
Failed to open '"Château.txt"': No such file or directory

As one can see, the last run fails. I am a bit puzzled: how can I pass the name
of a file that contains both spaces and Unicode characters? I need to do it in a
uniform way, as I am calling a Cygwin program from a native Windows program, as in [1].

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = "текст плюс.txt"
Failed to open '"текст плюс.txt"': No such file or directory

I have searched a bit, but I couldn't find a direct answer. From posts [1] and [2]
I see that the compiler inserts code that does some argument pre-processing, such as
@pathnames [3], but what exactly are the rules? Is quote pre-processing done in
dcrt0.cc:177 [4]?

Any feedback is appreciated.

[1] https://sourceware.org/ml/cygwin/2016-05/msg00082.html
[2] http://daviddeley.com/autohotkey/parameters/parameters.htm
[3] https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-at
[4] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L177

=== test.c ===
#include <stdio.h>
#include <errno.h>
#include <string.h>

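int main(int argc, char* argv[])
{
	/* Print every argument exactly as it arrived in argv. */
	for (int i = 0; i < argc; i++)
	{
		printf("param %d = %s\n", i, argv[i]);
	}
	/* argv[1] is NULL when no argument was given; don't pass it to fopen(). */
	if (argc < 2)
	{
		fprintf(stderr, "usage: %s FILE\n", argv[0]);
		return 1;
	}
	/* Try to open the first argument as a file name. */
	FILE* f = fopen(argv[1], "r");
	if (f != NULL)
	{
		printf("File '%s' was opened\n", argv[1]);
		fclose(f);
	} else {
		printf("Failed to open '%s': %s\n", argv[1], strerror(errno));
	}
	return 0;
}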

-- 
With best regards,
Dmitry

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22  5:41 Quotes around command-line argument that has unicode characters are not removed Dmitry Katsubo via cygwin
@ 2018-03-22 12:24 ` Andrey Repin
  2018-03-22 17:25   ` Kaz Kylheku
                     ` (2 more replies)
  2018-03-22 13:35 ` Mikhail Usenko via cygwin
  1 sibling, 3 replies; 13+ messages in thread
From: Andrey Repin @ 2018-03-22 12:24 UTC (permalink / raw)
  To: Dmitry Katsubo, cygwin

Greetings, Dmitry Katsubo!

> Dear Cygwin community,

> I observe the following on my Cygwin:

This is not cygwin, this is bare Windows.

> when I put quotes around file that has
> non-ASCII symbols, these quotes are passed to argv of the process literally,
> otherwise they are removed. I would expect that there is a consistency.

Parameter unquoting is done by the shell.
CMD does it differently from POSIX shells.

> I have written a small C program that displays arguments, and run it three
> times:

Run it in bash. I'm pretty sure you will see more consistent results.

> [...]



-- 
With best regards,
Andrey Repin
Thursday, March 22, 2018 14:21:25

Sorry for my terrible english...

* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22  5:41 Quotes around command-line argument that has unicode characters are not removed Dmitry Katsubo via cygwin
  2018-03-22 12:24 ` Andrey Repin
@ 2018-03-22 13:35 ` Mikhail Usenko via cygwin
  2018-03-22 15:35   ` Andrey Repin
  1 sibling, 1 reply; 13+ messages in thread
From: Mikhail Usenko via cygwin @ 2018-03-22 13:35 UTC (permalink / raw)
  To: cygwin; +Cc: Dmitry Katsubo

On Thu, 22 Mar 2018 01:15:00 +0100
Dmitry Katsubo via cygwin <...> wrote:

> [...]

Hello, Dmitry,
consider these test cases:

Native (msvcrt) binary:
-----------------------
$ x86_64-w64-mingw32-gcc test.c -o test-win.exe
$ ldd test-win.exe
        ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7fa05900000)
        KERNEL32.DLL => /cygdrive/c/Windows/system32/KERNEL32.DLL (0x7fa030e0000)
        KERNELBASE.dll => /cygdrive/c/Windows/system32/KERNELBASE.dll (0x7fa028f0000)
        msvcrt.dll => /cygdrive/c/Windows/system32/msvcrt.dll (0x7fa03220000)
-----------------------

Cygwin-flavor binary:
---------------------
$ gcc test.c -o test-cygwin.exe
$ ldd test-cygwin.exe
        ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7fa05900000)
        KERNEL32.DLL => /cygdrive/c/Windows/system32/KERNEL32.DLL (0x7fa030e0000)
        KERNELBASE.dll => /cygdrive/c/Windows/system32/KERNELBASE.dll (0x7fa028f0000)
        cygwin1.dll => /usr/bin/cygwin1.dll (0x180040000)
---------------------

Create a file with non-ascii chars in the name:
-----------------------------------------------
$ touch "текст плюс.txt"
-----------------------------------------------

Run both binaries in mintty with bash:
--------------------------------------
$ ./test-win "текст плюс.txt"
param 0 = D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed\test-win.exe
param 1 = â–’â–’â–’â–’â–’ â–’â–’â–’â–’.txt
File 'â–’â–’â–’â–’â–’ â–’â–’â–’â–’.txt' was opened
$ ./test-cygwin "текст плюс.txt"
param 0 = ./test-cygwin
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened
--------------------------------------

Run the binaries in cmd.exe with bash:
--------------------------------------
$ ./test-win "текст плюс.txt"
param 0 = D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed\test-win.exe
param 1 = ЄхъёЄ яы■ё.txt
File 'ЄхъёЄ яы■ё.txt' was opened
$ ./test-cygwin "текст плюс.txt"
param 0 = ./test-cygwin
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened
--------------------------------------

Run in bare cmd.exe
(/usr/bin/cygwin1.dll should be copied next to ./test-cygwin.exe)
-------------------
D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed>.\test-win.exe "текст плюс.txt"
param 0 = .\test-win.exe
param 1 = ЄхъёЄ яы■ё.txt
File 'ЄхъёЄ яы■ё.txt' was opened
D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed>.\test-cygwin.exe "текст плюс.txt"
param 0 = ./test-cygwin
param 1 = "текст плюс.txt"
Failed to open '"текст плюс.txt"': No such file or directory
-------------------

In bare cmd.exe, the native msvcrt binary handles quoted non-ASCII
arguments correctly, while the Cygwin-flavored binary does not. But I don't
know exactly which level is responsible for this behavior: cmd.exe or
msvcrt.dll/cygwin1.dll.


-- 



* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22 13:35 ` Mikhail Usenko via cygwin
@ 2018-03-22 15:35   ` Andrey Repin
  2018-03-22 22:21     ` Dmitry Katsubo via cygwin
  0 siblings, 1 reply; 13+ messages in thread
From: Andrey Repin @ 2018-03-22 15:35 UTC (permalink / raw)
  To: Mikhail Usenko, cygwin

Greetings, Mikhail Usenko!

> In bare cmd.exe native-msvcrt binary is working OK with quoted non-ascii
> arguments, while cygwin-flavor binary is not. But I don't know exactly which
> level here: cmd.exe or msvcrt.dll/cygwin1.dll is responsible for
> such a behavior.

Locale settings are affecting the Cygwin binary.

If you run (for example)
set LANG=ru_RU.CP866
before invoking the Cygwin test case in native CMD, you will likely see it
working better.
Alternatively, you could try
chcp 65001


-- 
With best regards,
Andrey Repin
Thursday, March 22, 2018 16:22:13

Sorry for my terrible english...



* Re: Quotes around command-line argument that has unicode characters  are not removed
  2018-03-22 12:24 ` Andrey Repin
@ 2018-03-22 17:25   ` Kaz Kylheku
  2018-03-22 22:46     ` Dmitry Katsubo via cygwin
  2018-03-22 21:14   ` Dmitry Katsubo via cygwin
  2018-03-23  7:58   ` Thomas Wolff
  2 siblings, 1 reply; 13+ messages in thread
From: Kaz Kylheku @ 2018-03-22 17:25 UTC (permalink / raw)
  To: cygwin; +Cc: Dmitry Katsubo

On 2018-03-22 04:24, Andrey Repin wrote:
> Greetings, Dmitry Katsubo!
> 
>> Dear Cygwin community,
> 
>> I observe the following on my Cygwin:
> 
> This is not cygwin, this is bare Windows.

That may be so, yet there may be an issue here for someone packaging
Cygwin programs for use as native Windows applications.

That is to say, there could potentially be something here that the Cygnal
project could address:

http://www.kylheku.com/cygnal/

Cygnal is an ultra-light fork of the Cygwin DLL that is intended for
users like Dmitry Katsubo, who run Cygwin programs out of the Windows
environment directly, after building them in Cygwin.

> 
>> when I put quotes around file that has
>> non-ASCII symbols, these quotes are passed to argv of the process literally,
>> otherwise they are removed. I would expect that there is a consistency.
> 
> Parameter unquoting done by the shell.
> CMD does that differently from POSIX shells.

As I recall, CMD doesn't do anything, period! It passes the command line
as one big string. It has to, since that's the OS mechanism.

The quoting conventions come from how various run-time libraries deal with
that string. An influential convention is that of the MS Visual C run-time
library; it behooves other run-times to be compatible with it, for
consistency with programs whose main() was compiled with MSVC.



* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22 12:24 ` Andrey Repin
  2018-03-22 17:25   ` Kaz Kylheku
@ 2018-03-22 21:14   ` Dmitry Katsubo via cygwin
  2018-03-23  7:58   ` Thomas Wolff
  2 siblings, 0 replies; 13+ messages in thread
From: Dmitry Katsubo via cygwin @ 2018-03-22 21:14 UTC (permalink / raw)
  To: cygwin

On 2018-03-22 12:24, Andrey Repin wrote:
> 
> This is not cygwin, this is bare Windows.

This is an executable linked against cygwin1.dll. I personally call such
binaries "Cygwin programs". However, it is run from Windows.

> Parameter unquoting done by the shell.
> CMD does that differently from POSIX shells.

CMD does nothing when you execute a program from it; the command line
is passed literally. I downloaded procmon.exe [1] and filtered by
process name "cmd.exe". When I run

D:\cli> test abc "текст\" плюс.txt"

(one might suppose that CMD would at least remove the backslash), I see the
following in the log:

test abc "текст\" плюс.txt"

[1] https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

-- 
With best regards,
Dmitry


* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22 15:35   ` Andrey Repin
@ 2018-03-22 22:21     ` Dmitry Katsubo via cygwin
  2018-03-27 10:05       ` Andrey Repin
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Katsubo via cygwin @ 2018-03-22 22:21 UTC (permalink / raw)
  To: cygwin

On 2018-03-22 14:25, Andrey Repin wrote:
> Greetings, Mikhail Usenko!
> 
>> In bare cmd.exe native-msvcrt binary is working OK with quoted non-ascii
>> arguments, while cygwin-flavor binary is not. But I don't know exactly which
>> level here: cmd.exe or msvcrt.dll/cygwin1.dll is responsible for
>> such a behavior.

Thanks, Mikhail! I generally agree with you. If you follow the links I
provided in my original mail, you can see that cmd.exe does not do any argument
splitting. I also see that from this function signature [1]:

build_argv (char *cmd, char **&argv, int &argc, int winshell)

which basically takes a string as input and returns an array of strings plus
the number of arguments as output. So the splitting is done either by msvcrt.dll
or by cygwin1.dll, and they do it differently, which is OK provided it is
documented and done consistently. I refer back to dcrt0.cc, where the
voodoo is done. In particular, at line 165 [2] it checks whether the program was
started from bare Windows, and behaves differently if so.

On 2018-03-22 12:24, Andrey Repin wrote:
> Run it in bash. I'm pretty sure you will see your results more consistent.

When "test.exe" is run from bash, it behaves correctly because, as you said,
bash does most of the dirty work. I also tried the workaround below,
but it does not work:

D:\cli> bash -c "./test 'текст плюс.txt'"
bash: ./test 'текст плюс.txt': No such file or directory

> Locale settings affecting Cygwin binary.
> 
> If you
> set LANG=ru_RU.CP866
> (f.e.)
> before invoking cygwin testcase in native CMD, you will likely see it
> working better.

Thanks for this advice, Andrey. I see that it reacts, but works worse :)
I think it tells the program to output characters in CP866, but the console is UTF-8:

D:\cli> set LANG=ru_RU.CP866

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = ⥪▒▒ ▒▒▒▒.txt
Failed to open '⥪▒▒ ▒▒▒▒.txt': No such file or directory

But... ta-da! I got it working like this:

D:\cli> set LANG=ru_RU.UTF-8

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened

Hooray, it worked!

> Alternatively, you could try
> chcp 65001

That does not help:

D:\cli> chcp 65001
Active code page: 65001

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = "текст плюс.txt"
Failed to open '"текст плюс.txt"': No such file or directory

[1] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L297
[2] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L165

-- 
With best regards,
Dmitry


* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22 17:25   ` Kaz Kylheku
@ 2018-03-22 22:46     ` Dmitry Katsubo via cygwin
  2018-03-25  0:04       ` Kaz Kylheku
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Katsubo via cygwin @ 2018-03-22 22:46 UTC (permalink / raw)
  To: cygwin

On 2018-03-22 18:10, Kaz Kylheku wrote:
> That may be so, yet there may be an issue here for someone packaging
> Cygwin programs for use as native Windows applications.
> 
> That is to say, there could potentially be something here that the Cygnal
> project could address:
> 
> http://www.kylheku.com/cygnal/
> 
> Cygnal is an ultra-light fork of the Cygwin DLL that is intended for users,
> who run Cygwin programs out of the Windows environment directly, after building them in Cygwin.

Thanks for the hint. I confirm that just substituting cygwin1.dll makes
the test work:

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened

I was not able to find any relevant difference in dcrt0.cc, but perhaps the
difference is in initial setting of locale (Cygnal initialization).

-- 
With best regards,
Dmitry


* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22 12:24 ` Andrey Repin
  2018-03-22 17:25   ` Kaz Kylheku
  2018-03-22 21:14   ` Dmitry Katsubo via cygwin
@ 2018-03-23  7:58   ` Thomas Wolff
  2018-03-23 12:20     ` Steven Penny
  2 siblings, 1 reply; 13+ messages in thread
From: Thomas Wolff @ 2018-03-23  7:58 UTC (permalink / raw)
  To: cygwin

Am 22.03.2018 um 12:24 schrieb Andrey Repin:
> ...
>> when I put quotes around file that has
>> non-ASCII symbols, these quotes are passed to argv of the process literally,
>> otherwise they are removed. I would expect that there is a consistency.
> Parameter unquoting done by the shell.
> CMD does that differently from POSIX shells.
cmd.exe applies some inconsistent "smart" (in an MS sense...) magic 
quoting; it adds additional quotes if the parameter contains non-ASCII 
characters.
>> I have written a small C program that displays arguments, and run it three times:
> ...
You can also test this with cygwin /bin/echo:
C:\cygwin\bin>.\echo "bla"
bla

C:\cygwin\bin>.\echo "blö"
"blö"

This is also the reason why 'chere' fails on non-ASCII directories.

>> As one can see, the last run fails. I am a bit puzzled: how can I pass the name
>> of the file with space and Unicode symbols? I need to do it in uniform way, as I
>> am calling a Cygwin program from native Windows program, as in [1].
Due to the weird cmd.exe behaviour, you cannot. However, cygwin could 
apply a workaround by magic unquoting.

Thomas

>> Any feedback is appreciated.
>> [1] https://sourceware.org/ml/cygwin/2016-05/msg00082.html
>> [2] http://daviddeley.com/autohotkey/parameters/parameters.htm
>> [3] https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-at
>> [4] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L177


* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-23  7:58   ` Thomas Wolff
@ 2018-03-23 12:20     ` Steven Penny
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Penny @ 2018-03-23 12:20 UTC (permalink / raw)
  To: cygwin

On Fri, 23 Mar 2018 08:39:21, Thomas Wolff wrote:
> Due to the weird cmd.exe behaviour, you cannot. However, cygwin could 
> apply a workaround by magic unquoting.

This is correct. Note that "run" already has this "workaround" via its "--quote"
option. That code could perhaps be applied in other places:

http://sourceware.org/git/gitweb.cgi?p=cygwin-apps/run.git&a=blob&f=src/run.1.in



* Re: Quotes around command-line argument that has unicode characters  are not removed
  2018-03-22 22:46     ` Dmitry Katsubo via cygwin
@ 2018-03-25  0:04       ` Kaz Kylheku
  0 siblings, 0 replies; 13+ messages in thread
From: Kaz Kylheku @ 2018-03-25  0:04 UTC (permalink / raw)
  To: cygwin

On 2018-03-22 15:21, Dmitry Katsubo via cygwin wrote:
> On 2018-03-22 18:10, Kaz Kylheku wrote:
>> That may be so, yet there may be an issue here for someone packaging
>> Cygwin programs for use as native Windows applications.
>> 
>> That is to say, there could potentially be something here that the Cygnal
>> project could address:
>> 
>> http://www.kylheku.com/cygnal/
>> 
>> Cygnal is an ultra-light fork of the Cygwin DLL that is intended for users,
>> who run Cygwin programs out of the Windows environment directly, after
>> building them in Cygwin.
> 
> Thanks for the hint. I confirm that just substituting cygwin1.dll makes
> the test working:
> 
> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = текст плюс.txt
> File 'текст плюс.txt' was opened

Well, that seems like a miracle, because in Cygnal, I don't remember doing
anything to the processing of the command line or the initial locale.

> I was not able to find any relevant difference in dcrt0.cc, but perhaps the
> difference is in initial setting of locale (Cygnal initialization).

It could be some Cygwin issue caused by a newer commit that isn't picked up
in Cygnal; i.e., a "red herring".


* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-22 22:21     ` Dmitry Katsubo via cygwin
@ 2018-03-27 10:05       ` Andrey Repin
  2018-03-27 17:39         ` Brian Inglis
  0 siblings, 1 reply; 13+ messages in thread
From: Andrey Repin @ 2018-03-27 10:05 UTC (permalink / raw)
  To: Dmitry Katsubo, cygwin


Greetings, Dmitry Katsubo!

>> Locale settings affecting Cygwin binary.
>> 
>> If you
>> set LANG=ru_RU.CP866
>> (f.e.)
>> before invoking cygwin testcase in native CMD, you will likely see it
>> working better.

> Thanks for this advise, Andrey. I see that it reacts, but works worth :)
> I think it advises to output characters in CP866, but console is UTF-8:

> D:\cli> set LANG=ru_RU.CP866

> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = ⥪▒▒ ▒▒▒▒.txt
> Failed to open '⥪▒▒ ▒▒▒▒.txt': No such file or directory

> But.. ta-da! I made it working like that:

> D:\cli> set LANG=ru_RU.UTF-8

> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = текст плюс.txt
> File 'текст плюс.txt' was opened

> Hooray, it worked!

This is no magic. Console settings must match the locale set in the environment.
Please test again: run "chcp" to get the current console code page and set LANG to match it.
I could not see which version of Windows you're using, sorry. It is possible
that the console is set to a different code page than usual.

>> Alternatively, you could try
>> chcp 65001

> That does not help:

> D:\cli> chcp 65001
> Active code page: 65001

> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = "текст плюс.txt"
> Failed to open '"текст плюс.txt"': No such file or directory

> [1] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L297
> [2] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L165



-- 
With best regards,
Andrey Repin
Tuesday, March 27, 2018 12:51:10

Sorry for my terrible english...

* Re: Quotes around command-line argument that has unicode characters are not removed
  2018-03-27 10:05       ` Andrey Repin
@ 2018-03-27 17:39         ` Brian Inglis
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Inglis @ 2018-03-27 17:39 UTC (permalink / raw)
  To: cygwin

On 2018-03-27 03:56, Andrey Repin wrote:
>>> Locale settings affecting Cygwin binary.
>>> If you
>>> set LANG=ru_RU.CP866
>>> (f.e.)
>>> before invoking cygwin testcase in native CMD, you will likely see it
>>> working better.
>> Thanks for this advise, Andrey. I see that it reacts, but works worth :)
>> I think it advises to output characters in CP866, but console is UTF-8:
>> D:\cli> set LANG=ru_RU.CP866
>> D:\cli> test "текст плюс.txt"
>> param 0 = test
>> param 1 = ⥪▒▒ ▒▒▒▒.txt
>> Failed to open '⥪▒▒ ▒▒▒▒.txt': No such file or directory
>> But.. ta-da! I made it working like that:
>> D:\cli> set LANG=ru_RU.UTF-8
>> D:\cli> test "текст плюс.txt"
>> param 0 = test
>> param 1 = текст плюс.txt
>> File 'текст плюс.txt' was opened
>> Hooray, it worked!
> This is no magic. Console settings must match locale set in the environment.
> Please test again with "chcp" to get current console codepage and setting LANG to match it.
> I could not see which version of Windows you're using, sorry. It is possible
> that console is set to a different codepage than usual.
>>> Alternatively, you could try
>>> chcp 65001
>> That does not help:
>> D:\cli> chcp 65001
>> Active code page: 65001
>> D:\cli> test "текст плюс.txt"
>> param 0 = test
>> param 1 = "текст плюс.txt"
>> Failed to open '"текст плюс.txt"': No such file or directory
>> [1] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L297
>> [2] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L165


If you're using cmd you can also set AutoRun commands like:

	$ cat HKCU-SW-MS-Command_Processor-AutoRun-chcp_65001.reg
	Windows Registry Editor Version 5.00

	[HKEY_CURRENT_USER\Software\Microsoft\Command Processor]
	"AutoRun"="@chcp 65001 >nul"


- append " && command..." to add more commands to AutoRun; these must use only
the common base characters.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada



Thread overview: 13+ messages
2018-03-22  5:41 Quotes around command-line argument that has unicode characters are not removed Dmitry Katsubo via cygwin
2018-03-22 12:24 ` Andrey Repin
2018-03-22 17:25   ` Kaz Kylheku
2018-03-22 22:46     ` Dmitry Katsubo via cygwin
2018-03-25  0:04       ` Kaz Kylheku
2018-03-22 21:14   ` Dmitry Katsubo via cygwin
2018-03-23  7:58   ` Thomas Wolff
2018-03-23 12:20     ` Steven Penny
2018-03-22 13:35 ` Mikhail Usenko via cygwin
2018-03-22 15:35   ` Andrey Repin
2018-03-22 22:21     ` Dmitry Katsubo via cygwin
2018-03-27 10:05       ` Andrey Repin
2018-03-27 17:39         ` Brian Inglis
