public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed
From: "Kaz Kylheku (Cygwin)" <743-406-3965@kylheku.com>
To: "Jérôme Froissart" <software@froissart.eu>
Cc: cygwin@cygwin.com
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
Date: Tue, 13 Oct 2020 09:30:37 -0700	[thread overview]
Message-ID: <d4f283fe85c31be76dcfc01b20bb375e@mail.kylheku.com> (raw)
In-Reply-To: <CAFC9CLCHk0WMj935OzZF+HeAdDbv-kGU_SHyi47vohagM+ZmtQ@mail.gmail.com>

On 2020-10-06 14:36, Jérôme Froissart wrote:
> Here is an example C file
>     $ cat example.c
>     #include <stdio.h>
> 
>     const char *GetCommandLineA(void);
> 
>     int main(int argc, char *argv[])
>     {
>         const char *s = GetCommandLineA();
>         printf("C=%s\n", s);
> 
>         for (int i = 0; argc > i; i++)
>             printf("%d=%s\n", i, argv[i]);
> 
>         return 0;
>     }

Your program's comparison seems to be based on the
hypothesis that Cygwin parses the GetCommandLineA() command line.

But this hypothesis is almost certainly wrong.

> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
>     C:\Users\Public>binary.exe "foo bar" "Jérôme"
>     C=binary.exe  "foo bar" "J□r□me"
>     0=binary
>     1=foo bar
>     2="Jérôme"

The "A" command line from GetCommandLineA has "tofu"
characters: é and ô were not decoded properly.

The é and ô characters we see in the Cygwin-parsed
arguments coming into main could not have been recovered
from these "tofu" replacement characters.

What is actually being parsed must be the WCHAR command line
corresponding to what comes from GetCommandLineW().

It's necessary to show that one to get a more complete understanding.


  parent reply	other threads:[~2020-10-13 16:30 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-02 21:40 Jérôme Froissart
2020-10-03  2:22 ` Doug Henderson
2020-10-04 11:18 ` Andrey Repin
2020-10-06 21:36   ` Jérôme Froissart
2020-10-07  1:10     ` Andrey Repin
2020-10-07 22:21       ` Jérôme Froissart
2020-10-11 18:55         ` Andrey Repin
2020-10-07  2:20     ` Brian Inglis
2020-10-07  5:17     ` Thomas Wolff
2020-10-07 23:32       ` Brian Inglis
2020-10-08  0:59         ` Eliot Moss
2020-10-08  6:22           ` Brian Inglis
2020-10-13 16:30     ` Kaz Kylheku (Cygwin) [this message]
2020-10-14 21:47       ` Jérôme Froissart
2020-10-14 22:14         ` Jérôme Froissart
2020-10-15  5:14         ` UTF-8 quoted args passed to program include quotes when run from cmd Brian Inglis
2020-10-19  2:32         ` Unconsistent command-line parsing in case of UTF-8 quoted arguments Kaz Kylheku (Cygwin)
2020-10-13 17:34     ` Brian Inglis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d4f283fe85c31be76dcfc01b20bb375e@mail.kylheku.com \
    --to=743-406-3965@kylheku.com \
    --cc=cygwin@cygwin.com \
    --cc=software@froissart.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).