Date: Mon, 25 Mar 2024 11:32:10 +0100
From: Corinna Vinschen
To: Jun T
Cc: newlib@sourceware.org, Bruno Haible
Subject: Re: wctomb() accepts out-of-range character in C-locale
Reply-To: newlib@sourceware.org

[CC Bruno Haible, gnulib maintainer, to kick my memory]

Hi Jun,

On Mar 25 16:45, Jun T wrote:
> Dear newlib developers,
> (this is the first time I post to this list)
>
> On recent Cygwin, the following C code outputs '1' (i.e., wide character
> 0x80 can be converted into a valid single-byte character in C-locale):
>
> ---------------------------------------
> #include <stdio.h>
> #include <stdlib.h>
> #include <locale.h>
>
> int main() {
>     char buf[MB_CUR_MAX];
>     setlocale(LC_ALL, "C");
>     printf("%d\n", wctomb(buf, 0x80));
>     return 0;
> }
> ---------------------------------------
>
> On Linux it outputs '-1'.
>
> It seems this is due to the following commit:
>
> ------------------------------------------------
> commit 8a4318943875cd922601d34e54ce8a83ad2e733c
> Author: Corinna Vinschen
> Date:   Mon Jul 31 12:44:16 2023 +0200
>
>     Revert "* libc/stdlib/mbtowc_r.c (__ascii_mbtowc): Disallow conversion of"
>
>     This reverts commit 2b77087a48ea56e77fca5aeab478c922f6473d7c.
>
>     For some reason lost in time, commit 2b77087a48ea5 introduced
>     Cygwin-specific code treating single byte characters outside the
>     portable character set as illegal chars.  However, Cygwin was
>     always alone with this over-correct behaviour and it leads to
>     stuff like gnulib replacing functions defined in Cygwin with
>     their own implementation just due to that.
> ------------------------------------------------
>
> Probably the function __ascii_wctomb() is used not only in C-locale
> but also in some other locales, and the commit is for "fixing"
> some problems in these locales?

No, __ascii_wctomb is by default used in "C".

> But a wide character >= 0x80 can't be converted into a valid
> character in C-locale (7bit), I think.

Yes, I know, and that was what the original code from 2b77087a48 did.

But at the time I reverted this special handling, Bruno had reported a
change in gnulib in terms of fnmatch starting at
https://cygwin.com/pipermail/cygwin/2023-July/254017.html

During testing I found that gnulib was replacing various functions built
into Cygwin for several reasons, and one of them was that the conversion
of wide char to multibyte in the "C" locale was not transparently
converting chars from 0x80 up to 0xff.

I'm actually puzzled right now that this doesn't work in GLibc either.

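To make the two behaviours under discussion concrete, here is a minimal
sketch.  It is only a model and not the newlib source: the names
model_wctomb, strict_wctomb and transparent_wctomb are hypothetical, and
the only thing that differs between the two variants is the upper limit
(0x80 for the strict 7-bit check, 0x100 for transparent pass-through of
0x80..0xff).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model, NOT newlib's actual code: convert a wide char to
   a single byte, rejecting every value at or above `limit`.  */
static int
model_wctomb (char *s, wchar_t wc, unsigned long limit)
{
  /* The result is stored in one byte, so the limit cannot exceed 0x100. */
  assert (limit <= 0x100);

  if (s == NULL)
    return 0;                   /* single-byte encoding has no shift state */

  if ((unsigned long) wc >= limit)
    {
      errno = EILSEQ;           /* value not representable */
      return -1;
    }

  *s = (char) wc;
  return 1;
}

/* Strict variant (assumed): only 7-bit values are valid in "C".  */
static int
strict_wctomb (char *s, wchar_t wc)
{
  return model_wctomb (s, wc, 0x80);
}

/* Transparent variant (assumed): 0x80..0xff pass through as one byte.  */
static int
transparent_wctomb (char *s, wchar_t wc)
{
  return model_wctomb (s, wc, 0x100);
}
```

With this model, strict_wctomb (buf, 0x80) returns -1 and sets errno to
EILSEQ, while transparent_wctomb (buf, 0x80) returns 1 and stores the
byte 0x80.  Both variants agree for values below 0x80 and both reject
0x100 and above.
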
Bruno, I really need your input here, because I just don't remember :(

Do you have an idea what gnulib configure test might have been the
trigger for the above revert?

And if GLibc also doesn't let chars >= 0x80 slip through, then Cygwin's
special handling was right.  But then this would introduce gnulib
trouble again...  Can you help us?


Thanks,
Corinna