From: "Pierre Muller"
To: "'Eli Zaretskii'"
Subject: RE: [RFC-v2] Allow explicit 16 or 32 char in 'x /s'
Date: Thu, 01 Apr 2010 09:34:00 -0000
Message-ID: <000f01cad17e$7686f140$6394d3c0$@muller@ics-cnrs.unistra.fr>
In-Reply-To: <83tyrwxy72.fsf@gnu.org>
References: <11484.4708740295$1268865815@news.gmane.org> <83r5ngix6d.fsf@gnu.org> <15103.6087111153$1269298497@news.gmane.org> <006101cad0ec$cb7915d0$626b4170$%muller@ics-cnrs.unistra.fr> <83tyrwxy72.fsf@gnu.org>
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
>
> > +the unit size defaults to @samp{b}, unless it is explicitly given.
> > +Use @kbd{x /hs} to display 16-bit char strings and @kbd{x /ws} to display
> > +32-bit strings.  The next use of @kbd{x /s} will again display 8-bit
> > strings.
>
> This is okay, but I still think we should mention that the encoding is
> UTF-16 and UCS-4, respectively, and that it cannot be changed.

According to the c_emit_char function, it is
UTF-16 (LE or BE depending on target endianness)
or UTF-32 (LE or BE also).
Is UCS-4 exactly the same as UTF-32?

Furthermore, this is c_emit_char, which means that this is
language-specific output.
Several languages have their own emit_char functions,
and several of those start with a
  c &= 0xFF;
line, which discards the higher bytes of the character value
(found in f-lang.c:86, m2-lang.c:45, objc-lang.c:287 and p-lang.c:161).
Of course these implementations would benefit from
using the more up-to-date c-lang.c implementation, but that is another story.

This means that UTF-16 and UTF-32 will only be used for
c, cplus, assembler and minimal.

The Java language seems to use another scheme to represent
extended characters: it uses
  fprintf_unfiltered (stream, "\\u%.4x", (unsigned int) c);

To summarize, I don't think it is correct to say that
'x /hs' uses UTF-16 without specifying that this is
language specific.

Should I just mention that the output is language dependent
and uses UTF-16 or UTF-32 for the c, cplus, assembler and minimal languages?

Pierre Muller