public inbox for gdb-patches@sourceware.org
* [RFC] Allow explicit 16 or 32 char in 'x /s'
@ 2010-03-17 22:43 Pierre Muller
  2010-03-18  7:01 ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Pierre Muller @ 2010-03-17 22:43 UTC (permalink / raw)
  To: gdb-patches

 
  The patch below makes it possible to print strings made of
16-bit or 32-bit characters using the
'x /hs' or 'x /ws' commands.

  I tried to enable this feature while keeping the change to a minimum:
the size modifier is not remembered for the /s format,
so any subsequent use of /s alone will still
print byte-sized character strings.
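
As a rough illustration of the rule just described, here is a minimal sketch in C. It is not GDB's decode_format; the function name effective_size and the use of '?' for "no size letter typed" are assumptions made only for this example.

```c
/* Hypothetical model of the size selection described above; this is
   not GDB's decode_format.  '?' stands for "no size letter typed".  */
static char
effective_size (char format, char explicit_size, char last_size)
{
  if (explicit_size != '?')
    return explicit_size;	/* 'x /hs' keeps 'h', 'x /ws' keeps 'w'.  */
  if (format == 's')
    return 'b';			/* Plain 'x /s' falls back to byte chars.  */
  return last_size;		/* Other formats reuse the previous size.  */
}
```

Under these assumptions, effective_size ('s', '?', 'h') yields 'b', which is the "not remembered" behaviour: a plain /s after /hs goes back to byte strings.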

  I also found a C-language-specific issue: the position of the next
string was calculated incorrectly if you used the 'x /2hs' command
on two consecutive Unicode strings.
  This patch also fixes that problem,
but I am not sure that it could really occur before,
as the char size was forced to 1 byte...
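
To make the address issue concrete, here is a small self-contained sketch. It assumes, as the `* len` in the patch suggests, that the printing routine reports a count of characters rather than bytes; the helper names below are hypothetical and are not GDB functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GDB code: count the 16-bit characters of
   the string at S, including the terminating zero.  */
static size_t
chars_consumed_16 (const uint16_t *s)
{
  size_t n = 0;

  while (s[n] != 0)
    n++;
  return n + 1;
}

/* Byte offset of the string that follows the one at S.  The character
   count has to be scaled by the character width (2 bytes here);
   adding the bare count would land in the middle of the string.  */
static size_t
next_string_offset_16 (const uint16_t *s)
{
  return chars_consumed_16 (s) * sizeof (uint16_t);
}
```

For a buffer holding the 16-bit strings "ab" and "c" back to back, the first string consumes 3 characters, so the second one starts 6 bytes in, not 3.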

Pierre Muller



2010-03-17  Pierre Muller  <muller@ics.u-strasbg.fr>

	* c-lang.c (classify_type): Recognize also types used
	for /hs or /ws format specifier in 'x' command.
	* printcmd.c (decode_format): Set char size to byte
	for strings unless explicit size is given.
	(print_formatted): Correct calculation of NEXT_ADDRESS
	for 16 or 32 bit strings.
	(do_examine): Do not force byte size for strings.

Index: c-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/c-lang.c,v
retrieving revision 1.81
diff -u -p -r1.81 c-lang.c
--- c-lang.c	5 Mar 2010 20:18:11 -0000	1.81
+++ c-lang.c	17 Mar 2010 22:11:08 -0000
@@ -100,13 +100,19 @@ classify_type (struct type *elttype, str
 	  goto done;
 	}
 
-      if (!strcmp (name, "char16_t"))
+      /* Also recognize the type used by 'x /hs' command.  */
+      if (!strcmp (name, "char16_t")
+          || (TYPE_CODE (elttype) == TYPE_CODE_INT
+              && TYPE_LENGTH (elttype) == 2))
 	{
 	  result = C_CHAR_16;
 	  goto done;
 	}
 
-      if (!strcmp (name, "char32_t"))
+      /* Also recognize the type used by 'x /ws' command.  */
+      if (!strcmp (name, "char32_t")
+          || (TYPE_CODE (elttype) == TYPE_CODE_INT
+              && TYPE_LENGTH (elttype) == 4))
 	{
 	  result = C_CHAR_32;
 	  goto done;
Index: printcmd.c
===================================================================
RCS file: /cvs/src/src/gdb/printcmd.c,v
retrieving revision 1.173
diff -u -p -r1.173 printcmd.c
--- printcmd.c	5 Mar 2010 20:18:14 -0000	1.173
+++ printcmd.c	17 Mar 2010 22:11:08 -0000
@@ -260,6 +260,11 @@ decode_format (char **string_ptr, int of
 	/* Characters default to one byte.  */
 	val.size = osize ? 'b' : osize;
 	break;
+      case 's':
+	/* Display strings with byte size chars unless explicitly specified.  */
+	val.size = 'b';
+	break;
+
       default:
 	/* The default is the size most recently specified.  */
 	val.size = osize;
@@ -295,7 +300,7 @@ print_formatted (struct value *val, int 
 	    next_address = (value_address (val)
 			    + val_print_string (elttype,
 						value_address (val), -1,
-						stream, options));
+						stream, options) * len);
 	  }
 	  return;
 
@@ -802,9 +807,11 @@ do_examine (struct format_data fmt, stru
   next_gdbarch = gdbarch;
   next_address = addr;
 
-  /* String or instruction format implies fetch single bytes
-     regardless of the specified size.  */
-  if (format == 's' || format == 'i')
+  /* Instruction format implies fetch single bytes
+     regardless of the specified size.
+     The case of strings is handled n decode_format, only explicit
+     size operator are not changed to 'b'.  */
+  if (format == 'i')
     size = 'b';
 
   if (size == 'a')


* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
  2010-03-17 22:43 [RFC] Allow explicit 16 or 32 char in 'x /s' Pierre Muller
@ 2010-03-18  7:01 ` Eli Zaretskii
  2010-03-18 14:20   ` Pierre Muller
       [not found]   ` <001e01cac69a$75167630$5f436290$%muller@ics-cnrs.unistra.fr>
  0 siblings, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2010-03-18  7:01 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Date: Wed, 17 Mar 2010 23:42:53 +0100
> 
>  
>   The patch below allows to 
> print strings that are made of 16 bit or 32 bit char 
> using:
> 'x /hs ' or 'x /ws ' commands.

Thanks.  If this patch is accepted, we will need a suitable change for
the manual.


* RE: [RFC] Allow explicit 16 or 32 char in 'x /s'
  2010-03-18  7:01 ` Eli Zaretskii
@ 2010-03-18 14:20   ` Pierre Muller
       [not found]   ` <001e01cac69a$75167630$5f436290$%muller@ics-cnrs.unistra.fr>
  1 sibling, 0 replies; 8+ messages in thread
From: Pierre Muller @ 2010-03-18 14:20 UTC (permalink / raw)
  To: 'Eli Zaretskii'; +Cc: gdb-patches

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Thursday, March 18, 2010 8:02 AM
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
> 
> > From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> > Date: Wed, 17 Mar 2010 23:42:53 +0100
> >
> >
> >   The patch below allows to
> > print strings that are made of 16 bit or 32 bit char
> > using:
> > 'x /hs ' or 'x /ws ' commands.
> 
> Thanks.  If this patch is accepted, we will need a suitable change for
> the manual.

How about this change?

Pierre

doc/ChangeLog entry:

2010-03-18  Pierre Muller  <muller@ics.u-strasbg.fr>

      * gdbint.texinfo (Examining memory): Update for
	change in string display with explicit size.


Index: doc/gdb.texinfo
===================================================================
RCS file: /cvs/src/src/gdb/doc/gdb.texinfo,v
retrieving revision 1.680
diff -u -p -r1.680 gdb.texinfo
--- doc/gdb.texinfo	12 Mar 2010 19:15:52 -0000	1.680
+++ doc/gdb.texinfo	18 Mar 2010 12:50:15 -0000
@@ -7232,8 +7232,11 @@ Giant words (eight bytes).
 @end table
 
 Each time you specify a unit size with @code{x}, that size becomes the
-default unit the next time you use @code{x}.  (For the @samp{s} and
-@samp{i} formats, the unit size is ignored and is normally not written.)
+default unit the next time you use @code{x}. For the @samp{i} format,
+the unit size is ignored and is normally not written. For the @samp{s} format,
+the unit size defaults to @samp{b}, unless it is explicitly given.
+Ue @code{x /hs} to display 16-bit char strings and @code{x /ws} to display
+32-bit strings. The next use of @code{x /s} will still display 8-bit strings.
 
 @item @var{addr}, starting display address
 @var{addr} is the address where you want @value{GDBN} to begin displaying


* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
       [not found]   ` <001e01cac69a$75167630$5f436290$%muller@ics-cnrs.unistra.fr>
@ 2010-03-18 18:26     ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2010-03-18 18:26 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: <gdb-patches@sourceware.org>
> Date: Thu, 18 Mar 2010 13:56:42 +0100
> 
> > >   The patch below allows to
> > > print strings that are made of 16 bit or 32 bit char
> > > using:
> > > 'x /hs ' or 'x /ws ' commands.
> > 
> > Thanks.  If this patch is accepted, we will need a suitable change for
> > the manual.
> 
> How about this change?

It's okay, but it needs a few fixes:

> doc/ChangeLog entry:
> 
> 2010-03-18  Pierre Muller  <muller@ics.u-strasbg.fr>
> 
>       * gdbint.texinfo (Examining memory): Update for

gdb.texinfo, not gdbint.texinfo.

> +default unit the next time you use @code{x}. For the @samp{i} format,
                                              ^^
Two spaces between sentences (here and elsewhere in your patch).

> +Ue @code{x /hs} to display 16-bit char strings and @code{x /ws} to display

I suggest rephrasing:

  Use @kbd{x /hs} to display strings made of 16-bit wide characters

and similarly for x/ws.

> +32-bit strings. The next use of @code{x /s} will still display 8-bit
                                                    ^^^^^
I suggest "again" instead of "still"

Okay with these changes.

Thanks.


* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
       [not found]     ` <15103.6087111153$1269298497@news.gmane.org>
@ 2010-03-30 20:33       ` Tom Tromey
  0 siblings, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2010-03-30 20:33 UTC (permalink / raw)
  To: Pierre Muller; +Cc: 'Eli Zaretskii', gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>   But I don't know exactly for other languages and I would like
Pierre> to be sure about what you want me to add to the docs...

I think no other language has been updated to deal with wide characters.

Pierre>   Furthermore if you look into charset_for_string_type
Pierre> function in c-lang.c source, you will see that there are two FIXME
Pierre> just right at the position of these charset name settings.

Yeah ... those are actually pedantic FIXMEs, in that (IIRC) nothing
guarantees that char16_t==UTF-16, even though that is the common
meaning.

Pierre>   To answer Tom's concern about the change in classify_type function,
Pierre> I modified my patch to change the elttype in do_examine to match exactly
Pierre> what is expected by charset_for_string_type function.
Pierre> Thus this new version has no modification in c-lang.c file.

Suppose the inferior does not define char16_t.  Won't this new code
allocate a new type each time the user uses x/hs?  That seems bad.

What about passing the desired encoding to LA_PRINT_STRING, via a new
argument to val_print_string?  That makes the patch a lot bigger, though
it is mostly mechanical.

Pierre>   I also added a very basic check for string display using 'x
Pierre> /hs' and 'x /ws'.

Thanks.

Pierre> +      case 's':
Pierre> +	/* Display strings with byte size chars unless explicitly specified.
Pierre> */
Pierre> +	val.size = 'b';
Pierre> +	break;

I think x/hs followed by x should probably print another wide string.
I couldn't tell offhand if it does this or not.

Tom


* RE: [RFC] Allow explicit 16 or 32 char in 'x /s'
  2010-03-19  7:32   ` Eli Zaretskii
@ 2010-03-22 22:54     ` Pierre Muller
       [not found]     ` <15103.6087111153$1269298497@news.gmane.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Pierre Muller @ 2010-03-22 22:54 UTC (permalink / raw)
  To: 'Eli Zaretskii', tromey; +Cc: gdb-patches



> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Friday, March 19, 2010 8:32 AM
> To: tromey@redhat.com
> Cc: pierre.muller@ics-cnrs.unistra.fr; gdb-patches@sourceware.org
> Subject: Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
> 
> > From: Tom Tromey <tromey@redhat.com>
> > Cc: <gdb-patches@sourceware.org>
> > Date: Thu, 18 Mar 2010 16:08:27 -0600
> >
> > I think the documentation should reflect that the user can't choose
> the
> > encoding used here.
> 
> I agree.  It should also say which encoding is used by GDB in this
> case.

  Not that I disagree with you, but I would like to
stress that how the string is displayed also depends on the current language.
For C, or any other language using the c_printstr function,
/hs will use UTF-16LE or UTF-16BE according to the current gdbarch endianness,
and /ws will use UTF-32LE or UTF-32BE.
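
The selection described above can be sketched as follows. This is only an illustration, not the code in c-lang.c: the enum and the function name charset_for_width are assumptions introduced for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the target byte order; not a GDB type.  */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Pick a charset name from the character width in bytes, following
   the description above.  Returns NULL for widths that this sketch
   does not cover.  */
static const char *
charset_for_width (int width, enum byte_order order)
{
  switch (width)
    {
    case 2:
      return order == ORDER_BIG ? "UTF-16BE" : "UTF-16LE";
    case 4:
      return order == ORDER_BIG ? "UTF-32BE" : "UTF-32LE";
    default:
      return NULL;
    }
}
```

The point of the sketch is that the user only picks the width (/hs or /ws); the encoding follows from the width and the target byte order and cannot be chosen separately.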

  But I don't know exactly what happens for other languages, and I would
like to be sure about what you want me to add to the docs...
  Furthermore, if you look at the charset_for_string_type
function in the c-lang.c source, you will see that there are two FIXMEs
right at the position of these charset name settings.

  To address Tom's concern about the change in the classify_type function,
I modified my patch to change the elttype in do_examine so that it matches
exactly what the charset_for_string_type function expects.
Thus this new version makes no modification to the c-lang.c file.

  I also added a very basic check for string display using 'x /hs' and
'x /ws'.

Pierre Muller



2010-03-22  Pierre Muller  <muller@ics.u-strasbg.fr>

	* printcmd.c (decode_format): Set char size to byte
	for strings unless explicit size is given.
	(print_formatted): Correct calculation of NEXT_ADDRESS
	for 16 or 32 bit strings.
	(do_examine): Do not force byte size for strings.
	Use 'char16_t' and 'char32_t' types to allow
	for correct recognition in classify_type.
	
2010-03-22  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb.base/charset.c (String16, String32): New variables.
	* gdb.base/charset.exp (gdb_test): Test correct display
	of 16 or 32 bit strings.

Index: printcmd.c
===================================================================
RCS file: /cvs/src/src/gdb/printcmd.c,v
retrieving revision 1.173
diff -u -p -r1.173 printcmd.c
--- printcmd.c	5 Mar 2010 20:18:14 -0000	1.173
+++ printcmd.c	22 Mar 2010 22:25:34 -0000
@@ -260,6 +260,11 @@ decode_format (char **string_ptr, int of
 	/* Characters default to one byte.  */
 	val.size = osize ? 'b' : osize;
 	break;
+      case 's':
+	/* Display strings with byte size chars unless explicitly specified.  */
+	val.size = 'b';
+	break;
+
       default:
 	/* The default is the size most recently specified.  */
 	val.size = osize;
@@ -295,7 +300,7 @@ print_formatted (struct value *val, int 
 	    next_address = (value_address (val)
 			    + val_print_string (elttype,
 						value_address (val), -1,
-						stream, options));
+						stream, options) * len);
 	  }
 	  return;
 
@@ -802,9 +807,11 @@ do_examine (struct format_data fmt, stru
   next_gdbarch = gdbarch;
   next_address = addr;
 
-  /* String or instruction format implies fetch single bytes
-     regardless of the specified size.  */
-  if (format == 's' || format == 'i')
+  /* Instruction format implies fetch single bytes
+     regardless of the specified size.
+     The case of strings is handled in decode_format, only explicit
+     size operators are not changed to 'b'.  */
+  if (format == 'i')
     size = 'b';
 
   if (size == 'a')
@@ -831,6 +838,36 @@ do_examine (struct format_data fmt, stru
   else if (size == 'g')
     val_type = builtin_type (next_gdbarch)->builtin_int64;
 
+  if (format == 's')
+    {
+      struct type *char_type;
+      if (size == 'h')
+	{
+	  char_type = lookup_typename (current_language, next_gdbarch,
+				       "char16_t", NULL, 1);
+	  if (!char_type)
+	    char_type = arch_type (next_gdbarch, TYPE_CODE_INT, 2, "char16_t");
+	  check_typedef (char_type);
+	  if (TYPE_LENGTH (char_type) == 2)
+	    val_type = char_type;
+	}
+      else if (size == 'w')
+	{
+	  char_type = lookup_typename (current_language, next_gdbarch,
+				       "char32_t", NULL, 1);
+	  if (!char_type)
+	    char_type = arch_type (next_gdbarch, TYPE_CODE_INT, 4, "char32_t");
+	  check_typedef (char_type);
+	  if (char_type && TYPE_LENGTH (char_type) == 4)
+	    val_type = char_type;
+	}
+      else
+        {
+	  size = 'b';
+	  val_type = builtin_type (next_gdbarch)->builtin_int8;
+        }
+    }
+
   maxelts = 8;
   if (size == 'w')
     maxelts = 4;
Index: testsuite/gdb.base/charset.c
===================================================================
RCS file: /cvs/src/src/gdb/testsuite/gdb.base/charset.c,v
retrieving revision 1.12
diff -u -p -r1.12 charset.c
--- testsuite/gdb.base/charset.c	1 Jan 2010 07:32:00 -0000	1.12
+++ testsuite/gdb.base/charset.c	22 Mar 2010 22:25:34 -0000
@@ -65,6 +65,9 @@ typedef unsigned int char32_t;
 char16_t uvar;
 char32_t Uvar;
 
+char16_t *String16;
+char32_t *String32;
+
 /* A typedef to a typedef should also work.  */
 typedef wchar_t my_wchar_t;
 my_wchar_t myvar;
Index: testsuite/gdb.base/charset.exp
===================================================================
RCS file: /cvs/src/src/gdb/testsuite/gdb.base/charset.exp,v
retrieving revision 1.21
diff -u -p -r1.21 charset.exp
--- testsuite/gdb.base/charset.exp	17 Feb 2010 22:05:58 -0000	1.21
+++ testsuite/gdb.base/charset.exp	22 Mar 2010 22:25:35 -0000
@@ -616,4 +616,21 @@ gdb_test "print 'a' == 'a' || 'b' == 'b'
   ".* = 1" \
   "EVAL_SKIP cleanup handling regression test"
 
+
+proc string_display { var_name set_prefix x_size x_type} {
+  gdb_test "set ${var_name} = ${set_prefix}\"Test String\\0with zeroes\"" "" "Assign ${var_name} with prefix ${set_prefix}"
+  gdb_test "x /2${x_size}s ${var_name}" ".* ${x_type}\"Test String\"\[\r\n\]+.* ${x_type}\"with zeroes\"" "Display String ${var_name} with x/${x_size}s"
+}
+
+string_display String16 u h u
+if {$wchar_size == 2} {
+  string_display String16 L h u
+}
+ 
+string_display String32 U w U
+if {$wchar_size == 4} {
+  string_display String32 L w U
+}
+
+
 gdb_exit 


* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
  2010-03-18 22:08 ` Tom Tromey
@ 2010-03-19  7:32   ` Eli Zaretskii
  2010-03-22 22:54     ` Pierre Muller
       [not found]     ` <15103.6087111153$1269298497@news.gmane.org>
  0 siblings, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2010-03-19  7:32 UTC (permalink / raw)
  To: tromey; +Cc: pierre.muller, gdb-patches

> From: Tom Tromey <tromey@redhat.com>
> Cc: <gdb-patches@sourceware.org>
> Date: Thu, 18 Mar 2010 16:08:27 -0600
> 
> I think the documentation should reflect that the user can't choose the
> encoding used here.

I agree.  It should also say which encoding is used by GDB in this
case.


* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
       [not found] <11484.4708740295$1268865815@news.gmane.org>
@ 2010-03-18 22:08 ` Tom Tromey
  2010-03-19  7:32   ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Tromey @ 2010-03-18 22:08 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>   The patch below allows to 
Pierre> print strings that are made of 16 bit or 32 bit char 
Pierre> using:
Pierre> 'x /hs ' or 'x /ws ' commands.

It seems like a good idea to me.

Pierre>   I tried to enable this feature, keeping it to a minimum:
Pierre>   The size modifier is not remembered for /s format,
Pierre> thus any subsequent use of /s alone will still 
Pierre> print out byte char strings.

If the user types 'x/2hs' and then 'x/2', does the second invocation
still print wide strings?  I think it should.

Pierre> -      if (!strcmp (name, "char16_t"))
Pierre> +      /* Also recognize the type used by 'x /hs' command.  */
Pierre> +      if (!strcmp (name, "char16_t")
Pierre> +          || (TYPE_CODE (elttype) == TYPE_CODE_INT
Pierre> +              && TYPE_LENGTH (elttype) == 2))
Pierre>  	{
Pierre>  	  result = C_CHAR_16;
Pierre>  	  goto done;
Pierre>  	}

I am a little concerned that this code can confuse the user.
If sizeof(wchar_t) == 2, then sometimes you could end up printing a
wchar_t using UTF-16 -- which may or may not be appropriate.

I'm not sure how much this matters in practice.  However, it seems like
it may be cleaner to override classify_type's decision based directly on
the format character, instead of on the implied type.  What do you think
of that?  This would also let us introduce a new format character
meaning "wchar_t".

I think the documentation should reflect that the user can't choose the
encoding used here.

Pierre> +     The case of strings is handled n decode_format, only explicit

Typo, s/n/in/

Finally, please add some test cases.

Tom
