From: Bruno Haible <bruno@clisp.org>
To: Eric Blake <eblake@redhat.com>
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Thu, 03 Feb 2011 00:13:00 -0000
Cc: bug-gnulib@gnu.org, cygwin@cygwin.com

Hi Eric,

> I was asking:
> 
> should wwchar_t (or xwchar_t, but not xchar_t) be 2-bytes on cygwin, but
> unlike the POSIX definition of wchar_t being always 1 character per
> unit, the new type is explicitly documented as being multi-unit on some
> platforms but with sane semantics
> 
> or should it always be 4-bytes, where conversion from wchar_t to
> wwchar_t requires some efforts, and where the new type must be used
> everywhere (which means wrapping a lot of APIs), but where you can once
> again assume POSIX semantics of 1 character per unit, simplifying life
> of callers at the expense of converting to the new type

In the first case we wouldn't need a new type.

The plan is the second alternative. The goal is *not* to have to extend
each of quotearg.c, regcomp.c, mbchar.h, wc.c, etc. to handle UTF-16
explicitly with #ifdefs, more variables, and more logic.
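To illustrate where the conversion effort goes in the second alternative, here is a minimal sketch, not the actual gnulib code. The names wwchar_sketch_t and utf16_to_wwc are hypothetical, and mapping unpaired surrogates to U+FFFD is an assumption of this example. It decodes 16-bit units, as found in wchar_t strings on Windows and Cygwin, into one 32-bit unit per character:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed wide-wide character type:
   32 bits, always one character per unit. */
typedef uint32_t wwchar_sketch_t;

/* Decode one character from the UTF-16 units s[0..n-1] into *out.
   Returns the number of units consumed (1 or 2), or 0 if n == 0.
   Assumption of this sketch: an unpaired surrogate becomes U+FFFD
   and consumes one unit. */
static size_t
utf16_to_wwc (const uint16_t *s, size_t n, wwchar_sketch_t *out)
{
  if (n == 0)
    return 0;

  uint16_t u = s[0];

  if (u >= 0xD800 && u <= 0xDBFF)
    {
      /* High surrogate: needs a low surrogate right after it. */
      if (n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF)
        {
          *out = 0x10000
                 + (((uint32_t) (u - 0xD800) << 10)
                    | (uint32_t) (s[1] - 0xDC00));
          return 2;
        }
      *out = 0xFFFD;
      return 1;
    }

  if (u >= 0xDC00 && u <= 0xDFFF)
    {
      /* Low surrogate without a preceding high surrogate. */
      *out = 0xFFFD;
      return 1;
    }

  /* Character in the Basic Multilingual Plane: one unit. */
  *out = u;
  return 1;
}
```

Once this conversion is done at the boundary, callers such as quotearg.c or regcomp.c would not need to know about surrogate pairs at all.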

> if it works out, should we also add wwchar_t natively into cygwin? 

More and more Unix platforms offer only UTF-8 locales. One can predict
that in 10 years, all Unix platforms will offer only UTF-8 locales. At that
point wchar_t will be UCS-4 on all these platforms (except AIX).

The mbrtoc32 function from the C1X API that you pointed to will then be
equivalent to mbrtowwc.
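
For reference, a small sketch of how a caller would use mbrtoc32 to walk a multibyte string one character at a time. The helper name count_chars32 is made up for this example, and it assumes a C library that already ships <uchar.h>, which not every platform does:

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Count the characters of the multibyte string s, interpreted in the
   current locale's encoding.  Returns (size_t) -1 if the string
   contains an invalid or incomplete multibyte sequence. */
static size_t
count_chars32 (const char *s)
{
  mbstate_t state;
  size_t remaining = strlen (s);
  size_t count = 0;

  memset (&state, 0, sizeof state);   /* initial conversion state */

  while (remaining > 0)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, s, remaining, &state);

      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete input */
      if (ret == (size_t) -3)
        {
          /* A character was produced without consuming input. */
          count++;
          continue;
        }
      if (ret == 0)
        break;                        /* null character reached */

      s += ret;
      remaining -= ret;
      count++;                        /* exactly one character per call */
    }
  return count;
}
```

Each successful call yields one complete character in c, regardless of how wide wchar_t is on the platform.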

So, you can view 'wwchar_t' as a temporary measure that will bridge the
gap between the ANSI C Amd. 1 API and the C1X API.

Bruno
-- 
In memoriam Carl Friedrich Goerdeler <http://en.wikipedia.org/wiki/Carl_Friedrich_Goerdeler>
