public inbox for cygwin@cygwin.com
From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: cygwin@cygwin.com
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
Date: Tue, 6 Oct 2020 20:20:04 -0600
Message-ID: <6d976fd5-74f9-a9e0-49a1-26993fc13b51@SystematicSw.ab.ca>
In-Reply-To: <CAFC9CLCHk0WMj935OzZF+HeAdDbv-kGU_SHyi47vohagM+ZmtQ@mail.gmail.com>

On 2020-10-06 15:36, Jérôme Froissart wrote:
> Thanks for your replies.
> This issue only happens when a program is run from cmd.exe, not from a
> Cygwin bash shell.
> This is important for me, since I discovered this bug in a project
> that must be run from Windows graphical shell (i.e. there is no
> sensible way to run it through Cygwin and Bash).
> 
>> Please show us the output from "uname -a" and "locale" run from the bash prompt.
> 
>> Please provide the results of "locale" command right before running your test
>> binary.
> Here are the more detailed steps to reproduce the issue (along with
> answers to your requests about `uname`, `locale`, etc.).
> (I mostly reproduced what billziss-gh had done before, I do not take
> all the credits :D)
> 
> Here is an example C file

> I have built it with gcc from Cygwin
>     $ gcc -o binary example.c
> 
> Running it from the same Cygwin bash prompt works as expected
>     $ uname -a
>     CYGWIN_NT-10.0 XPS 3.1.5(0.340/5/3) 2020-06-01 08:59 x86_64 Cygwin
>     # (XPS is my Windows machine name)
> 
>     $ locale
>     LANG=fr_FR.UTF-8
>     LC_CTYPE="fr_FR.UTF-8"
>     LC_NUMERIC="fr_FR.UTF-8"
>     LC_TIME="fr_FR.UTF-8"
>     LC_COLLATE="fr_FR.UTF-8"
>     LC_MONETARY="fr_FR.UTF-8"
>     LC_MESSAGES="fr_FR.UTF-8"
>     LC_ALL=
> 
>     $ which gcc
>     /usr/bin/gcc
> 
>     # The following runs as expected
>     $ ./binary.exe "foo bar" "Jérôme"
>     C="C:\Users\Public\binary.exe"
>     0=./binary
>     1=foo bar
>     2=Jérôme
> 
> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
>     C:\Users\Public>binary.exe "foo bar" "Jérôme"
>     C=binary.exe  "foo bar" "J□r□me"
>     0=binary
>     1=foo bar
>     2="Jérôme"
> 
> This behaviour is not expected and is quite inconsistent with what
> happened through Bash.
> Besides the "strange squares" that appear on the first line, and the
> extra space after binary.exe, I especially did not expect "Jérôme" to
> remain quoted as a second argument.
> 
> Sorry for the delay in my answer. I hope this is now clear, please ask
> me for more examples or investigation if you need.
> Thanks for your help.

Create a new Command Prompt shortcut, or change your current one, to run:

	%windir%\system32\cmd.exe /u

where, per "cmd /?":

	"/U Causes the output of internal commands to a pipe or file to be Unicode"

and add "chcp 65001" to switch the console to the UTF-8 code page:

	%windir%\system32\cmd.exe /u /k chcp 65001

or set the registry value

	HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor\AutoRun

or

	HKEY_CURRENT_USER\Software\Microsoft\Command Processor\AutoRun

to the command

	@chcp 65001 > nul

e.g.

	> reg add "HKEY_CURRENT_USER\Software\Microsoft\Command Processor" ^
		/v AutoRun /d "@chcp 65001 > nul" /f
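
As background to the chcp 65001 suggestion, here is a minimal C sketch of
why the active code page matters. It assumes (this is not confirmed
anywhere in the thread) that the console was on a legacy code page such as
850 or 1252 when the command was typed, in which case "é" and "ô" are single
bytes that are not well-formed UTF-8. The helper name is_valid_utf8 and the
three byte arrays are made up for illustration. It does not by itself
explain why the quotes survived in argument 2.

```c
#include <stdbool.h>
#include <stddef.h>

/* "Jérôme" encoded under three code pages (illustrative data only). */
static const unsigned char jerome_utf8[]   = { 'J', 0xC3, 0xA9, 'r', 0xC3, 0xB4, 'm', 'e' };
static const unsigned char jerome_cp1252[] = { 'J', 0xE9, 'r', 0xF4, 'm', 'e' };
static const unsigned char jerome_cp850[]  = { 'J', 0x82, 'r', 0x93, 'm', 'e' };

/* Hypothetical helper: returns true if the n bytes at s are well-formed UTF-8. */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t extra;                        /* continuation bytes expected */

        if (b < 0x80)                        extra = 0;
        else if (b >= 0xC2 && b <= 0xDF)     extra = 1;
        else if (b >= 0xE0 && b <= 0xEF)     extra = 2;
        else if (b >= 0xF0 && b <= 0xF4)     extra = 3;
        else return false;                   /* stray continuation or bad lead byte */

        if (extra > n - i - 1) return false; /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return false;

        if (extra >= 2) {
            /* Reject overlong forms, surrogates, and values above U+10FFFF. */
            unsigned char c = s[i + 1];
            if ((b == 0xE0 && c < 0xA0) || (b == 0xED && c > 0x9F) ||
                (b == 0xF0 && c < 0x90) || (b == 0xF4 && c > 0x8F))
                return false;
        }
        i += extra + 1;
    }
    return true;
}
```

is_valid_utf8(jerome_utf8, sizeof jerome_utf8) returns true, while the
cp1252 and cp850 arrays return false: 0xE9 is a lead byte not followed by
continuation bytes, and 0x82 cannot start a sequence at all.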

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
