From: Bruno Haible
To: bug-gnulib@gnu.org
Cc: Eric Blake , cygwin@cygwin.com
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 02 Feb 2011 23:03:00 -0000
In-Reply-To: <4D49CB7C.5040000@redhat.com>
Message-Id: <201102030003.46763.bruno@clisp.org>

Hello Eric,

> > Here's a new proposal:
> >   - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t
> >     on Windows platforms and to 'wchar_t' otherwise.
> >   - Define functions 'mbrtowwc', 'iswwalpha', 'wwcwidth', and similar.
> >     Their definition will be a trivial redirection to 'mbrtowc', 'iswalpha',
> >     'wcwidth' on most platforms, and a use of libunistring modules on
> >     Windows platforms.
> ...
> Are you thinking of making a sane wrapping around either 4-byte wchar_t
> or which maps to 2-byte wchar_t but sanely handles UTF-16 (which makes
> it a thin wrapper on both Linux and Cygwin, but needing more work on
> mingw), or are you thinking that it is always a 4-byte type (needing
> lots more memory manipulation on cygwin to convert between 2- and 4-byte
> representations when using cygwin's functions, or else reimplementing
> everything from scratch by completely bypassing cygwin)?

I'm not sure I understand your question. The plan is as follows:

  - On platforms with a 32-bit wchar_t, like glibc, *BSD, and many others,
    'wwchar_t' is identical to 'wchar_t', and the function wrappers are
    simple redirections.

  - On Cygwin and mingw, wwchar_t is 'uint32_t' (so as to accommodate all
    Unicode characters and WEOF, and so that it plays well with 'wint_t').
    mbrtowwc is implemented by 1 or 2 calls to mbrtowc. mbsrtowwcs may be
    implemented by a call to mbsrtowcs followed by an additional conversion
    loop, or on top of mbrtowwc; that is merely a speed vs. memory trade-off.
    The plan is not to bypass Cygwin completely, but to use as much of
    Cygwin's built-in functionality as makes sense.

  - On platforms with a 16-bit wchar_t where the wchar_t[] encoding in
    Unicode locales is merely UCS-2, like AIX, the no-op thin wrappers are
    used as well. If the platform does not support more than the BMP, it
    does not make much sense for GNU programs to try to work around that.

> As to the name: I agree the opinion of others that xchar_t is easier to
> type and easier to avoid typos of a missing 'w' than wwchar_t.

If a developer makes a typo here, he is likely to get a gcc warning or a
link error. But yes, it is possible to pass a 'wwchar_t' to iswalpha(),
which will yield wrong results. I don't think a different name would
reduce this risk much.
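To make the Cygwin/mingw case more concrete, here is a minimal sketch of the "additional conversion loop" mentioned above, assuming that the 16-bit wchar_t values produced by mbsrtowcs are UTF-16 code units. The type selection and the helper name 'utf16_to_ucs4' are hypothetical illustrations, not existing gnulib code; the helper works on uint16_t/uint32_t arrays so that it compiles identically on every platform. An mbrtowwc built from 1 or 2 calls to mbrtowc would need the same surrogate-pairing step to combine the results of the two calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical type selection: a 32-bit type where wchar_t is only
   16 bits wide (Cygwin, mingw), plain wchar_t everywhere else.  */
#if defined _WIN32 || defined __CYGWIN__
typedef uint32_t wwchar_t;
#else
typedef wchar_t wwchar_t;
#endif

/* Hypothetical helper: convert N UTF-16 code units in SRC into Unicode
   code points in DST, combining surrogate pairs.  DST must have room for
   N elements.  Returns the number of code points written, or (size_t) -1
   if SRC contains an unpaired surrogate.  */
size_t
utf16_to_ucs4 (const uint16_t *src, size_t n, uint32_t *dst)
{
  size_t i = 0;
  size_t out = 0;

  while (i < n)
    {
      uint32_t c = src[i++];

      if (c >= 0xD800 && c < 0xDC00)
        {
          /* High surrogate: it must be followed by a low surrogate.  */
          if (i == n || src[i] < 0xDC00 || src[i] >= 0xE000)
            return (size_t) -1;
          c = 0x10000 + ((c - 0xD800) << 10)
              + (uint32_t) (src[i++] - 0xDC00);
        }
      else if (c >= 0xDC00 && c < 0xE000)
        /* Low surrogate without a preceding high surrogate.  */
        return (size_t) -1;

      dst[out++] = c;
    }
  return out;
}
```

For example, the code units 0x0041 0xD83D 0xDE00 yield the two code points U+0041 and U+1F600. Whether such a loop runs after mbsrtowcs or the conversion is done character by character via mbrtowwc is exactly the speed vs. memory trade-off described above.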
> gnulib already has xprintf as a counterpart to xmalloc (which calls
> exit() if the printf fails for memory allocation or other non-I/O
> related reasons), so we can't blindly use 'x'

Good point. The 'x' prefix already has several meanings in gnulib:
  - checking against memory allocation failure,
  - checking against errors,
  - no size limitation,
  - a more convenient interface,
  - a wrapper that prints an error message.
It doesn't seem wise to add another meaning to it.

Thanks for the feedback.

-- 
In memoriam Carl Friedrich Goerdeler