From: "Kaz Kylheku (Cygwin)" <743-406-3965@kylheku.com>
To: Jérôme Froissart
Cc: cygwin@cygwin.com
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
Date: Tue, 13 Oct 2020 09:30:37 -0700

On 2020-10-06 14:36, Jérôme Froissart wrote:
> Here is an example C file
> $ cat example.c
> #include <stdio.h>
>
> const char *GetCommandLineA(void);
>
> int main(int argc, char *argv[])
> {
>     const char *s = GetCommandLineA();
>     printf("C=%s\n", s);
>
>     for (int i = 0; argc > i; i++)
>         printf("%d=%s\n", i, argv[i]);
>
>     return 0;
> }

Your program's comparison seems to be based on the hypothesis that
Cygwin parses the GetCommandLineA() command line. But this hypothesis
is almost certainly wrong.

> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
> C:\Users\Public>binary.exe "foo bar" "Jérôme"
> C=binary.exe "foo bar" "J□r□me"
> 0=binary
> 1=foo bar
> 2="Jérôme"

The "A" command line from GetCommandLineA has "tofu" characters:
é and ô were not decoded properly.

The é and ô characters we see in the Cygwin-parsed arguments coming
into main could not have been recovered from these "tofu" replacement
characters.

What is actually being parsed must be the WCHAR command line
corresponding to what comes from GetCommandLineW().

It's necessary to show that one to get a more complete understanding.
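
One way to take the terminal's rendering out of the picture is to dump the raw bytes of each string in hex, so that what the program actually received is visible regardless of how the console displays it. The following is only a sketch: hex_dump is a hypothetical helper, not part of the original program, and the thread does not show which bytes GetCommandLineA() returned in the run above. For reference, "Jérôme" is 4a c3 a9 72 c3 b4 6d 65 in UTF-8 and 4a e9 72 f4 6d 65 in Windows-1252; whether the poster's ANSI code page is 1252 is an assumption.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical diagnostic helper (not from the original example.c).
 * Writes the bytes of the NUL-terminated string s into out as
 * space-separated lowercase hex pairs, always NUL-terminating out.
 * Returns the number of bytes of s that were dumped; this is less
 * than strlen(s) if out was too small.
 */
size_t hex_dump(const char *s, char *out, size_t outsize)
{
    size_t n = 0;   /* bytes of s consumed so far */
    size_t pos = 0; /* characters written to out so far */

    if (outsize == 0)
        return 0;
    out[0] = '\0';

    for (; s[n] != '\0'; n++) {
        /* Each byte needs at most 3 characters (" xx") plus the NUL. */
        if (pos + 4 > outsize)
            break;
        pos += (size_t)snprintf(out + pos, outsize - pos, "%s%02x",
                                n ? " " : "", (unsigned char)s[n]);
    }
    return n;
}

/*
 * Possible use inside the example program's main():
 *
 *     char buf[1024];
 *     hex_dump(GetCommandLineA(), buf, sizeof buf);
 *     printf("C(hex)=%s\n", buf);
 *     for (int i = 0; i < argc; i++) {
 *         hex_dump(argv[i], buf, sizeof buf);
 *         printf("%d(hex)=%s\n", i, buf);
 *     }
 */
```

If the "A" string shows single bytes where argv shows c3 a9 and c3 b4, the two strings are in different encodings, which would be consistent with argv having been built from the wide-character command line rather than from the "A" one.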