Date: Sat, 3 Dec 2022 22:42:46 +0900
From: Takashi Yano
To: cygwin@cygwin.com
Subject: Re: [BUG core?] Regression with parsing Windows’ command-line
Message-Id: <20221203224246.e81fcbb5ba989a4a7c25ddde@nifty.ne.jp>
In-Reply-To: <20221203192810.03c73015303ef3ad4fe241f3@nifty.ne.jp>
References: <20221116124824.zzobomcsmowvjtbr@math.berkeley.edu>
	<20221203034030.a6ghnwcze4rkqeap@math.berkeley.edu>
	<20221203192810.03c73015303ef3ad4fe241f3@nifty.ne.jp>

On Sat, 3 Dec 2022 19:28:10 +0900
Takashi Yano wrote:
> On Fri, 2 Dec 2022 19:40:30 -0800
> Ilya Zakharevich wrote:
> > On Wed, Nov 16, 2022 at 04:48:25AM -0800, I wrote:
> > > De-quoting (converting the Windows’ command-line into argc/argv) does
> > > not remove double quotes if characters not fit for 8-bit (?) are present.
> > >
> > > Broken in: CYGWIN_NT-6.1 Bu 3.3.4(0.341/5/3) 2022-01-31 19:35 x86_64 Cygwin
> > > Works in:  CYGWIN_NT-6.1-WOW Bu 2.2.1(0.289/5/3) 2015-08-20 11:40 i686 Cygwin
> > >
> > > To reproduce, do in CMD’s command line:
> > >
> > >   D:\> D:\Programs\cygwin2022\bin\perl -wle "print for @ARGV" . "/i/" "/и/" .
> > >   .
> > >   /i/
> > >   "/и/"
> > >   .
> >
> > I triple-checked
> >   • with a Win10 machine (and a version of cygwin given above),
> >   • with a fresh latest(=test)-cygwin-dll installation on a Win7 (as above) machine.
> >
> > Same bug everywhere.
>
> This certainly seems to be a problem of cygwin1.dll.
>
> Though I am not sure this is the right thing, I have confirmed
> that the following patch solves the issue.
>
> diff --git a/newlib/libc/locale/lctype.c b/newlib/libc/locale/lctype.c
> index 644669765..732d132e1 100644
> --- a/newlib/libc/locale/lctype.c
> +++ b/newlib/libc/locale/lctype.c
> @@ -25,11 +25,20 @@
>  
>  #define LCCTYPE_SIZE (sizeof(struct lc_ctype_T) / sizeof(char *))
>  
> +#ifdef __CYGWIN__
> +static char	numsix[] = { '\6', '\0'};
> +#else
>  static char	numone[] = { '\1', '\0'};
> +#endif
>  
>  const struct lc_ctype_T _C_ctype_locale = {
> +#ifdef __CYGWIN__
> +	"UTF-8",			/* codeset */
> +	numsix				/* mb_cur_max */
> +#else
>  	"ASCII",			/* codeset */
>  	numone				/* mb_cur_max */
> +#endif
>  #ifdef __HAVE_LOCALE_INFO_EXTENDED__
>  	,
>  	{ "0", "1", "2", "3", "4",	/* outdigits */

The patch above also affects __C_locale.
The patch below should be more appropriate.

diff --git a/newlib/libc/locale/locale.c b/newlib/libc/locale/locale.c
index e523d2366..7485ac292 100644
--- a/newlib/libc/locale/locale.c
+++ b/newlib/libc/locale/locale.c
@@ -244,6 +244,21 @@ const struct __locale_t __C_locale =
 };
 #endif /* _MB_CAPABLE */
 
+#ifdef __CYGWIN__
+static char numsix[] = { '\6', '\0'};
+static const struct lc_ctype_T _C_UTF8_ctype_locale = {
+	"UTF-8",			/* codeset */
+	numsix				/* mb_cur_max */
+#ifdef __HAVE_LOCALE_INFO_EXTENDED__
+	,
+	{ "0", "1", "2", "3", "4",	/* outdigits */
+	  "5", "6", "7", "8", "9" },
+	{ L"0", L"1", L"2", L"3", L"4",	/* woutdigits */
+	  L"5", L"6", L"7", L"8", L"9" }
+#endif
+};
+#endif
+
 struct __locale_t __global_locale =
 {
   { "C", "C", DEFAULT_LOCALE, "C", "C", "C", "C", },
@@ -272,10 +287,11 @@ struct __locale_t __global_locale =
     { NULL, NULL },			/* LC_ALL */
 #ifdef __CYGWIN__
     { &_C_collate_locale, NULL },	/* LC_COLLATE */
+    { &_C_UTF8_ctype_locale, NULL },	/* LC_CTYPE */
 #else
     { NULL, NULL },			/* LC_COLLATE */
-#endif
     { &_C_ctype_locale, NULL },		/* LC_CTYPE */
+#endif
     { &_C_monetary_locale, NULL },	/* LC_MONETARY */
     { &_C_numeric_locale, NULL },	/* LC_NUMERIC */
     { &_C_time_locale, NULL },		/* LC_TIME */

-- 
Takashi Yano
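
For anyone who wants to compare a patched and an unpatched cygwin1.dll, a small probe like the one below may help. It is only a sketch: describe_ctype() and the LOCALE_PROBE_MAIN guard are hypothetical names, not part of newlib or Cygwin, and it is an assumption that the LC_CTYPE entry of __global_locale touched by the patch is what nl_langinfo(CODESET) and MB_CUR_MAX report before the program calls setlocale().

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of newlib or Cygwin): formats the codeset
   name and MB_CUR_MAX of the locale currently in effect into buf.
   Returns what snprintf() returns. */
static int describe_ctype(char *buf, size_t len)
{
    return snprintf(buf, len, "codeset=%s mb_cur_max=%d",
                    nl_langinfo(CODESET), (int) MB_CUR_MAX);
}

#ifdef LOCALE_PROBE_MAIN
int main(void)
{
    char buf[128];

    /* No setlocale() call yet: this reports the startup (global) locale. */
    describe_ctype(buf, sizeof buf);
    printf("startup:  %s\n", buf);

    /* For comparison, the explicitly selected "C" locale. */
    setlocale(LC_ALL, "C");
    describe_ctype(buf, sizeof buf);
    printf("C locale: %s\n", buf);
    return 0;
}
#endif
```

Build with something like "gcc -DLOCALE_PROBE_MAIN probe.c" and run it from CMD next to the perl reproducer above. If the assumption holds, the "startup" line should be where the two DLLs differ.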