From: Brian Inglis
Organization: Systematic Software
Reply-To: cygwin@cygwin.com
To: cygwin@cygwin.com
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
Date: Tue, 6 Oct 2020 20:20:04 -0600
Message-ID: <6d976fd5-74f9-a9e0-49a1-26993fc13b51@SystematicSw.ab.ca>
References: <634821436.20201004141809@yandex.ru>

On 2020-10-06 15:36, Jérôme Froissart wrote:
> Thanks for your replies.
> This issue only happens when a program is run from cmd.exe, not from a
> Cygwin bash shell.
> This is important for me, since I discovered this bug in a project
> that must be run from Windows graphical shell (i.e. there is no
> sensible way to run it through Cygwin and Bash).
>
>> Please show us the output from "uname -a" and "locale" run from the bash prompt.
>
>> Please provide the results of "locale" command right before running your test
>> binary.
> Here are the more detailed steps to reproduce the issue (along with
> answers to your requests about `uname`, `locale`, etc.).
> (I mostly reproduced what billziss-gh had done before; I do not take
> all the credit :D)
>
> Here is an example C file
> I have built it with gcc from Cygwin
> $ gcc -o binary example.c
>
> Running it from the same Cygwin bash prompt works as expected
> $ uname -a
> CYGWIN_NT-10.0 XPS 3.1.5(0.340/5/3) 2020-06-01 08:59 x86_64 Cygwin
> # (XPS is my Windows machine name)
>
> $ locale
> LANG=fr_FR.UTF-8
> LC_CTYPE="fr_FR.UTF-8"
> LC_NUMERIC="fr_FR.UTF-8"
> LC_TIME="fr_FR.UTF-8"
> LC_COLLATE="fr_FR.UTF-8"
> LC_MONETARY="fr_FR.UTF-8"
> LC_MESSAGES="fr_FR.UTF-8"
> LC_ALL=
>
> $ which gcc
> /usr/bin/gcc
>
> # The following runs as expected
> $ ./binary.exe "foo bar" "Jérôme"
> C="C:\Users\Public\binary.exe"
> 0=./binary
> 1=foo bar
> 2=Jérôme
>
> Now, let's start a Windows shell (cmd.exe).
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in the Windows
> command prompt, so I merely ran my program.
> C:\Users\Public>binary.exe "foo bar" "Jérôme"
> C=binary.exe "foo bar" "J□r□me"
> 0=binary
> 1=foo bar
> 2="Jérôme"
>
> This behaviour is not expected and is quite inconsistent with what
> happened through Bash.
> Besides the "strange squares" that appear on the first line, and the
> extra space after binary.exe, I especially did not expect "Jérôme" to
> remain quoted as a second argument.
>
> Sorry for the delay in my answer. I hope this is now clear; please ask
> me for more examples or investigation if you need them.
> Thanks for your help.

Create a new Command Prompt shortcut, or change your current one, to run:

	%windir%\system32\cmd /u

("/U Causes the output of internal commands to a pipe or file to be
Unicode"), and add "chcp 65001" to switch the console to the UTF-8 code
page (/k runs the command and keeps the prompt open):

	%windir%\system32\cmd /u /k chcp 65001

or set the AutoRun value under either
HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor or
HKEY_CURRENT_USER\Software\Microsoft\Command Processor to the command
"@chcp 65001 > nul", for example:

	reg add "HKEY_CURRENT_USER\Software\Microsoft\Command Processor" ^
	    /v AutoRun /d "@chcp 65001 > nul" /f

(The key path contains a space, so it must be quoted for reg.)

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]