From: Jérôme Froissart
Date: Tue, 6 Oct 2020 23:36:07 +0200
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
To: cygwin@cygwin.com

Thanks for your replies.

This issue only happens when a program is run from cmd.exe, not from a Cygwin bash shell. This is important for me, since I discovered this bug in a project that must be run from the Windows graphical shell (i.e. there is no sensible way to run it through Cygwin and Bash).

> Please show us the output from "uname -a" and "locale" run from the bash prompt.

> Please provide the results of "locale" command right before running your test
> binary.

Here are more detailed steps to reproduce the issue (along with answers to your requests about `uname`, `locale`, etc.).
(I mostly reproduced what billziss-gh had done before, so I do not take all the credit :D)

Here is an example C file:

$ cat example.c
#include <stdio.h>

const char *GetCommandLineA(void);

int main(int argc, char *argv[])
{
    const char *s = GetCommandLineA();
    printf("C=%s\n", s);
    for (int i = 0; argc > i; i++)
        printf("%d=%s\n", i, argv[i]);
    return 0;
}

I built it with gcc from Cygwin:

$ gcc -o binary example.c

Running it from the same Cygwin bash prompt works as expected:

$ uname -a
CYGWIN_NT-10.0 XPS 3.1.5(0.340/5/3) 2020-06-01 08:59 x86_64 Cygwin
# (XPS is my Windows machine name)

$ locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_ALL=

$ which gcc
/usr/bin/gcc

# The following runs as expected
$ ./binary.exe "foo bar" "Jérôme"
C="C:\Users\Public\binary.exe"
0=./binary
1=foo bar
2=Jérôme

Now, let's start a Windows shell (cmd.exe). Note that I had to copy cygwin1.dll from my Cygwin installation directory, otherwise binary.exe would not start. I do not know whether there is a `locale` equivalent in the Windows command prompt, so I merely ran my program.

C:\Users\Public>binary.exe "foo bar" "Jérôme"
C=binary.exe  "foo bar" "J□r□me"
0=binary
1=foo bar
2="Jérôme"

This behaviour is not expected, and it is inconsistent with what happened through Bash. Besides the "strange squares" that appear on the first line and the extra space after binary.exe, I especially did not expect "Jérôme" to remain quoted as the second argument.

Sorry for the delay in my answer. I hope this is now clear; please ask me for more examples or investigation if you need them.

Thanks for your help.

Jérôme
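
P.S. The "strange squares" could partly be a console display problem rather than a parsing one, so a variant of example.c that prints each argument as raw bytes might help separate the two. What follows is only a minimal sketch: `hex_bytes` and `print_args_hex` are names I made up for illustration, and I have not checked what this prints under cmd.exe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Cygwin or Windows): write the bytes of s
 * into out as space-separated uppercase hex, e.g. "4A C3 A9".
 * Returns how many bytes of s were encoded; the output is truncated at a
 * byte boundary if out is too small.
 */
size_t hex_bytes(const char *s, char *out, size_t outsz)
{
    size_t n = strlen(s), done = 0, pos = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* two hex digits, plus a leading space for every byte but the first */
        size_t need = (i ? 3 : 2);
        if (pos + need + 1 > outsz)   /* +1 keeps room for the final NUL */
            break;
        pos += (size_t)snprintf(out + pos, outsz - pos,
                                i ? " %02X" : "%02X", (unsigned char)s[i]);
        done++;
    }
    return done;
}

/* Print every argument as text and as bytes; call this from main(). */
void print_args_hex(int argc, char *argv[])
{
    char hex[1024];

    for (int i = 0; i < argc; i++) {
        hex_bytes(argv[i], hex, sizeof hex);
        printf("%d=%s [%s]\n", i, argv[i], hex);
    }
}
```

Calling `print_args_hex(argc, argv)` from the `main` of example.c would show whether the bytes received are UTF-8 (where "é" is C3 A9) and whether the quote character (22) is really part of the argument, independently of how the console renders them.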