public inbox for cygwin@cygwin.com
* surrounding double quotes not removed from native command line arguments when they contain unicode and locale is default
@ 2020-11-12 16:10 basinilya
  2020-11-16  5:54 ` L A Walsh
From: basinilya @ 2020-11-12 16:10 UTC
  To: cygwin

Hi.
When I launch a Cygwin program from a native Windows program, and an argument in the command line string is quoted and contains national characters, the Cygwin program behaves as if the double quotes were part of the program argument.
This happens if I don't explicitly set LC_ALL, or if I set LC_ALL=C or LC_ALL=C.UTF-8.

This is a problem because arguments with spaces must be quoted.
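
For illustration, here is a minimal Python sketch of how a native Windows program typically ends up with a quoted argument on its command line. It uses `subprocess.list2cmdline`, which applies the Microsoft C runtime quoting rules. The directory name with a space is hypothetical and not taken from this report.

```python
import subprocess

# Hypothetical argument list; the path containing a space is illustrative only.
argv = ["C:/cygwin/bin/ls", "-l", "C:/test dir/some.txt"]

# list2cmdline follows the Microsoft C runtime rules: an argument that
# contains a space or tab is wrapped in double quotes.
command_line = subprocess.list2cmdline(argv)
print(command_line)
```

The launched program is expected to strip those quotes again when it splits the command line back into arguments.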

If I set the locale to some language and country, the quotes are removed as expected no matter which character set I use, UTF-8 or a single-byte code page. The locale doesn't have to match the alphabet used.

If the argument is not quoted, or if it doesn't contain national characters, it works even with the C locale.
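
The expected behaviour can be sketched as a small splitter that removes surrounding double quotes regardless of which characters the argument contains. This is a simplified illustration, not Cygwin's actual implementation: the function name `split_command_line` is hypothetical, and backslash escaping and wildcard expansion are deliberately ignored.

```python
def split_command_line(cmdline: str) -> list[str]:
    """Split on unquoted whitespace and drop the double quotes themselves."""
    args = []
    current = []
    in_quotes = False
    have_token = False  # True once the current argument has started
    for ch in cmdline:
        if ch == '"':
            # A quote toggles quoting mode and is never copied to the output.
            in_quotes = not in_quotes
            have_token = True
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current argument, if any.
            if have_token:
                args.append("".join(current))
                current = []
                have_token = False
        else:
            current.append(ch)
            have_token = True
    if have_token:
        args.append("".join(current))
    return args


parsed = split_command_line('C:/cygwin/bin/ls -l "C:/test-z-я/some.txt"')
```

With this logic the quoted and unquoted forms of the same path produce the same argument, which is what the commands below show for the named locales but not for the default one.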

    C:\>set LC_ALL=
    
    C:\>C:/cygwin/bin/ls -l C:/test-z-я/some.txt
    -rw-r--r-- 1 il None 0 Nov 12 09:52 'C:/test-z-'$'\321\217''/some.txt'
    
    C:\>C:/cygwin/bin/ls -l "C:/test-z-я/some.txt"
    /usr/bin/ls: cannot access '"C:/test-z-'$'\321\217''/some.txt"': No such file or directory
    
    C:\>C:/cygwin/bin/ls -l "C:/test-z-Z/some.txt"
    -rw-r--r-- 1 il None 0 Nov 12 09:52 C:/test-z-Z/some.txt
    
    C:\>C:\cygwin\bin\locale
    LANG=
    LC_CTYPE="C.UTF-8"
    LC_NUMERIC="C.UTF-8"
    LC_TIME="C.UTF-8"
    LC_COLLATE="C.UTF-8"
    LC_MONETARY="C.UTF-8"
    LC_MESSAGES="C.UTF-8"
    LC_ALL=
    
    C:\>set LC_ALL=C.UTF-8
    
    C:\>C:/cygwin/bin/ls -l "C:/test-z-я/some.txt"
    /usr/bin/ls: cannot access '"C:/test-z-я/some.txt"': No such file or directory
    
    C:\>set LC_ALL=en_US.CP1252
    
    C:\>C:/cygwin/bin/ls -l "C:/test-z-я/some.txt"
    -rw-r--r-- 1 il None 0 Nov 12 09:52 'C:/test-z-'$'\030''N'$'\217''/some.txt'
    
    C:\>set LC_ALL=en_US.UTF-8
    
    C:\>C:/cygwin/bin/ls -l "C:/test-z-я/some.txt"
    -rw-r--r-- 1 il None 0 Nov 12 09:52 'C:/test-z-я/some.txt'
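
The `$'\321\217'` sequences that ls prints in the default locale are octal escapes for the raw bytes of the file name. A short Python sketch (the helper name `octal_escapes` is hypothetical) shows that they are simply the UTF-8 encoding of "я":

```python
def octal_escapes(text: str, encoding: str = "utf-8") -> str:
    """Render each encoded byte as a backslash followed by three octal digits."""
    return "".join(f"\\{byte:03o}" for byte in text.encode(encoding))


escaped = octal_escapes("я")
print(escaped)  # \321\217
```

So in those cases the file name bytes reach ls intact; only the quote handling differs between the locales.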
    


* Re: surrounding double quotes not removed from native command line arguments when they contain unicode and locale is default
  2020-11-12 16:10 surrounding double quotes not removed from native command line arguments when they contain unicode and locale is default basinilya
@ 2020-11-16  5:54 ` L A Walsh
From: L A Walsh @ 2020-11-16  5:54 UTC
  To: basinilya; +Cc: cygwin

On 2020/11/12 08:10, Ilya Basin via Cygwin wrote:
> Hi.
> When I launch a Cygwin program from a native Windows program and an argument in the command line string is quoted and contains national characters then the Cygwin program behaves as if double quotes were part of the program argument.
> This happens if I don't explicitly set LC_ALL or if I set LC_ALL=C or set LC_ALL=C.UTF-8
>   
----
    The argument handling for Cygwin and POSIX programs comes from the shell
that is used.  Native Windows programs don't have that.  The best thing to
try is to run bash as a wrapper around the program, like:

C:/cygwin/bin/bash.exe -c "ls -l '/cygdrive/c/test-z-я/some.txt'"
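
A rough Python sketch of the wrapper idea, under stated assumptions: the drive is mounted under the default `/cygdrive` prefix, the helper name `bash_wrapper_argv` and the path with a space are hypothetical, and `shlex.quote` stands in for whatever quoting the caller would do by hand. The script passed to `-c` still crosses the native boundary as one quoted argument, so the thread does not confirm that this avoids the reported behaviour.

```python
import shlex

BASH = "C:/cygwin/bin/bash.exe"  # install path as used in the thread


def bash_wrapper_argv(posix_path: str) -> list[str]:
    """Build an argument list that lets bash parse the path itself."""
    # shlex.quote protects spaces and other shell metacharacters.
    script = "ls -l " + shlex.quote(posix_path)
    return [BASH, "-c", script]


# Hypothetical path; /cygdrive/c assumes the default cygdrive prefix.
argv = bash_wrapper_argv("/cygdrive/c/test dir/some.txt")
```

The resulting list can be handed to `subprocess.run(argv)` on a machine where that bash.exe exists.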

Make sure your LC_CTYPE is set to a valid value for your area, like mine is
set to "en_US.UTF-8". Only my LC_CTYPE is set to something other than the
default, like:
>  locale

LANG=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=



> This is a problem because arguments with spaces must be quoted.
>
> If I set the locale to some language and country the quotes are removed as expected no matter what code page I use, UTF-8 or a single-byte code page. The locale doesn't have to match the alphabet used.
>   
----
    Right -- it is just for other stuff, but the problem is the locale
program still wants *some* valid value.

    Type "locale -a" to list all locales and pick whatever is closest to where
you are, or pick "en_US", like you said, doesn't really matter -- but:

>     C:\>set LC_ALL=C.UTF-8
>   
----
    C.UTF-8 isn't a valid value for LANG or LC_ALL.

    You probably don't want a single code page for your language like:
>     C:\>set LC_ALL=en_US.CP1252
>     
>     C:\>C:/cygwin/bin/ls -l "C:/test-z-я/some.txt"
>     -rw-r--r-- 1 il None 0 Nov 12 09:52 'C:/test-z-'$'\030''N'$'\217''/some.txt'
>   
----
    Because if you use a character that isn't in that code page, you are likely
to have problems.  You want to use a UTF-8, or utf8 codepage. Like this:
> C:\>set LC_ALL=en_US.UTF-8
>   
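
The code page limitation can be checked with a few lines of Python. This is only an approximation: Python's `cp1252` codec is assumed to stand in for the `CP1252` charset, and the helper name `representable` is hypothetical. Cygwin itself substitutes bytes rather than raising an error, as the ls output quoted above suggests.

```python
def representable(text: str, codec: str) -> bool:
    """Return True if every character in text has an encoding in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


cyrillic_in_cp1252 = representable("я", "cp1252")  # False: not in CP1252
cyrillic_in_utf8 = representable("я", "utf-8")     # True: UTF-8 covers it
```

A single-byte code page only covers its own alphabet, while UTF-8 covers every character a file name can contain.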

That's the way the locale system works/interacts with windows.
Just use quotes + UTF-8 -- that way you can write
your stuff consistently and get consistent results.
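
When the native launcher is a script, the same setting can be applied per child process instead of globally. A minimal Python sketch, assuming `en_US.UTF-8` as in this thread (the helper name `with_utf8_locale` is hypothetical):

```python
import os


def with_utf8_locale(base_env=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment with LC_ALL set for the child."""
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = locale_name
    return env


# Typical use (not executed here):
#   subprocess.run(["C:/cygwin/bin/ls", "-l", path], env=with_utf8_locale())
child_env = with_utf8_locale({"PATH": "C:/cygwin/bin"})
```

Copying the parent environment first keeps PATH and the other variables the child needs.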

It's even better if you just use 'bash' and avoid the Win-Cyg-Win boundary
translations.

-linda




