From: Jérôme Froissart
Date: Fri, 2 Oct 2020 23:40:12 +0200
Subject: Unconsistent command-line parsing in case of UTF-8 quoted arguments
To: cygwin@cygwin.com

Hello,

While discussing a merge request on another project [1], I think billziss-gh found an oddity in the way Cygwin parses command-line arguments when non-ASCII characters come into play.

EXPECTED BEHAVIOUR: before calling main(), Cygwin should parse the following command line

    binary.exe --non-ascii "charaçtérs" --ascii "nothing-fancy-here"

as

    argv = ["binary.exe",
            "--non-ascii",
            "chara\xXX\xXXt\xXX\xXXrs",
            "--ascii",
            "nothing-fancy-here"]
    // \xXX\xXX being the UTF-8 encoding of the special characters,
    // but this does not really matter here

ACTUAL BEHAVIOUR: it parses it as

    argv = ["binary.exe",
            "--non-ascii",
            "\"chara\xXX\xXXt\xXX\xXXrs\"",  // mind the unstripped quotes here...
            "--ascii",
            "nothing-fancy-here"             // ...but not here
    ]

It looks like quoted arguments containing non-ASCII (UTF-8 encoded) characters do not have their surrounding quotes stripped, unlike quoted arguments that are pure ASCII.

More examples and a better description are available at [1] (thanks to billziss-gh for his analysis, which is much more thorough than mine).

For the record, we wrote a work-around in our specific program, but handling this issue in Cygwin might be a better way to solve it.

[1]: https://github.com/billziss-gh/sshfs-win/pull/208
     (Checking for quotes around non-ascii usernames passed by Windows)

Thanks for your help! If you do not have time, please tell me where to look, and I might try to fix it myself and send a patch proposal if that is easy enough (I have not read Cygwin's code yet).

Jérôme
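
P.S. To make the expected result concrete, here is a minimal Python sketch of the behaviour described above. It is only a simplified model, not Cygwin's actual parser and not the work-around from [1]: the names split_command_line and strip_surrounding_quotes are hypothetical, and the model deliberately ignores backslash escapes, single quotes, and globbing. The point is that quote stripping should not depend on whether an argument contains non-ASCII characters.

```python
from typing import List


def split_command_line(line: str) -> List[str]:
    """Split on whitespace outside double quotes and drop the quotes.

    Simplified model (assumption): no backslash escapes, no single
    quotes, no globbing. Every character is treated the same way,
    ASCII or not.
    """
    args: List[str] = []
    current: List[str] = []
    in_quotes = False
    has_token = False  # True once the current argument has started

    for ch in line:
        if ch == '"':
            # Toggle quoting; the quote itself is never copied.
            in_quotes = not in_quotes
            has_token = True  # so that "" yields an empty argument
        elif ch.isspace() and not in_quotes:
            # Unquoted whitespace ends the current argument, if any.
            if has_token:
                args.append("".join(current))
                current = []
                has_token = False
        else:
            current.append(ch)
            has_token = True

    if has_token:
        args.append("".join(current))
    return args


def strip_surrounding_quotes(arg: str) -> str:
    """Work-around style helper: drop one pair of enclosing double quotes."""
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        return arg[1:-1]
    return arg


if __name__ == "__main__":
    line = 'binary.exe --non-ascii "charaçtérs" --ascii "nothing-fancy-here"'
    for arg in split_command_line(line):
        # Print the UTF-8 bytes so stray quotes would be easy to spot.
        print(arg.encode("utf-8"))
```

A program that receives the unstripped form could call strip_surrounding_quotes on each argv entry as a stopgap, at the cost of also stripping quotes that a user really intended to pass.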