Date: Tue, 22 Sep 2009 09:57:00 -0000
From: Corinna Vinschen
To: cygwin@cygwin.com
Subject: Re: non-BMP character width
Message-ID: <20090922095739.GS20981@calimero.vinschen.de>
Reply-To: cygwin@cygwin.com
References: <200909161148.n8GBm4ha001469@mail.bln1.bf.nsn-intra.net> <20090921163348.GL20981@calimero.vinschen.de> <20090921175759.GM20981@calimero.vinschen.de> <4AB8592F.9060803@lapo.it>
In-Reply-To: <4AB8592F.9060803@lapo.it>

On Sep 22 06:57, Lapo Luchini wrote:
> Corinna Vinschen wrote:
> > Sure.  I was specifically asking for a testcase, preferably in
> > plain C, which makes it possible to reproduce this under a debugger.
>
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box) and on rxvt it just shows as four ISO-8859-1 chars
> (as expected, as native rxvt doesn't support Unicode):
>
>   mintty% echo "-\xF0\x90\x80\x81-"
>   -???-
>   rxvt% echo "-\xF0\x90\x80\x81-"
>   -ð???-
>
> Also ok on `ls`:
>
>   % cat s.c
>   #include <stdio.h>
>   int main() {
>     fopen("a-\xF0\x90\x80\x81", "w");
>     return 0;
>   }
>   % ./s
>   % ls -l|fgrep a-
>   -rw-r--r-- 1 lapo None 0 22 Sep 06:50 a-???

Uh, I see.  That occurs in the normal Windows console.  This is not
Cygwin's fault.  Cygwin's console code converts the multibyte string to
its WCHAR (UTF-16) representation and prints it to the console using the
WriteConsoleW function.  That function prints two blocks/question marks
for a surrogate pair.  If you list the file in a cmd shell, it will also
show two blocks/question marks for the surrogate pair.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple