From: L A Walsh
Date: Sun, 15 Nov 2020 21:54:00 -0800
To: basinilya@gmail.com
CC: cygwin@cygwin.com
Subject: Re: surrounding double quotes not removed from native command line arguments when they contain unicode and locale is default

On 2020/11/12 08:10, Ilya Basin via Cygwin wrote:
> Hi.
> When I launch a Cygwin program from a native Windows program and an argument in the command line string is quoted and contains national characters then the Cygwin program behaves as if double quotes were part of the program argument.
> This happens if I don't explicitly set LC_ALL or if I set LC_ALL=C or set LC_ALL=C.UTF-8
>
----
    The argument handling for Cygwin and POSIX programs comes from the shell
that is used. Native Windows programs don't have that. The best thing to
try is to run bash as a wrapper around the program, like:

C:/cygwin/bin/bash.exe -c "/cygwin/c/test-z-я/some.txt".

Make sure your LC_CTYPE is set to a valid value for your area; mine
is set to "en_US.UTF-8". Only my LC_CTYPE is set to something other than the
default, like:

> locale
LANG=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

> This is a problem because arguments with spaces must be quoted.
>
> If I set the locale to some language and country the quotes are removed as expected no matter what code page I use, UTF-8 or a single-byte code page. The locale doesn't have to match the alphabet used.
>
----
    Right -- it is just for other stuff, but the problem is that the locale
program still wants *some* valid value.

    Type "locale -a" to list all locales and pick whatever is closest to
where you are, or pick "en_US"; like you said, it doesn't really matter -- but:

> C:\>set LC_ALL=C.UTF-8
>
----
    C.UTF-8 isn't a valid value for LANG or LC_ALL.

You probably don't want a single code page for your language like:

> C:\>set LC_ALL=en_US.CP1252
>
> C:\>C:/cygwin/bin/ls -l "C:/test-z-я/some.txt"
> -rw-r--r-- 1 il None 0 Nov 12 09:52 'C:/test-z-'$'/030''N'$'/217''/some.txt'
>
----
    Because if you use a character that isn't in that code page, you are
likely to have problems. You want to use a UTF-8 (or utf8) code page.
Like this:

> C:\>set LC_ALL=en_US.UTF-8
>

That's the way the locale system works and interacts with Windows.
Just use quotes + UTF-8 -- that way you can write your stuff consistently
and get consistent results.
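
To illustrate the code page point, here is a minimal sketch in Python. It is not part of the original exchange, and it uses Python's own codecs rather than Cygwin's conversion routines, so it only shows the general idea: the Cyrillic letter from the example path has no slot in CP1252, while UTF-8 can represent it. The helper name can_encode is made up for this illustration.

```python
# Sketch only: checks whether a string fits in a given character encoding.
# This uses Python's codecs, not Cygwin's locale machinery (assumption:
# the general "character missing from the code page" idea carries over).

name = "test-z-я"  # directory name taken from the example in this thread


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


utf8_ok = can_encode(name, "utf-8")      # UTF-8 covers all of Unicode
cp1252_ok = can_encode(name, "cp1252")   # Western European, no Cyrillic
cp1251_ok = can_encode(name, "cp1251")   # Cyrillic single-byte code page
ya_utf8 = "я".encode("utf-8")            # the two bytes UTF-8 uses for 'я'

print(utf8_ok, cp1252_ok, cp1251_ok, ya_utf8.hex())
```

The two UTF-8 bytes for 'я' are the same D1 8F pair that appears in the quoted-printable form of the path in this message.
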
It's even better if you just use 'bash' and avoid the Win-Cyg-Win boundary
translations.

-linda