* [RFC] Allow explicit 16 or 32 char in 'x /s'
@ 2010-03-17 22:43 Pierre Muller
2010-03-18 7:01 ` Eli Zaretskii
0 siblings, 1 reply; 8+ messages in thread
From: Pierre Muller @ 2010-03-17 22:43 UTC
To: gdb-patches
The patch below allows printing strings made of 16-bit or 32-bit
characters using the 'x /hs' or 'x /ws' commands.
I tried to enable this feature, keeping it to a minimum:
The size modifier is not remembered for /s format,
thus any subsequent use of /s alone will still
print out byte char strings.
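The defaulting rule described above can be sketched as a small stand-alone C function. This is only an illustration of the intended behaviour, not the actual decode_format code; the name string_unit_size and both of its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GDB: pick the unit size letter for
   the 's' format.  EXPLICIT_SIZE is the size letter typed by the user
   ('b', 'h', 'w' or 'g'), or 0 when none was given.  LAST_SIZE is the
   size remembered from an earlier 'x' command; it is deliberately
   ignored, so a bare /s always falls back to byte-sized characters.  */
static char
string_unit_size (char explicit_size, char last_size)
{
  (void) last_size;
  assert (explicit_size == 0 || strchr ("bhwg", explicit_size) != NULL);
  if (explicit_size != 0)
    return explicit_size;
  return 'b';
}
```

Under this sketch, 'x /hs' yields 'h', while a later plain 'x /s' yields 'b' even though 'h' was the last size used.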
I found a C-language-specific issue that caused a wrong calculation of the
position of the next string if you used the 'x /2hs' command
on two consecutive Unicode strings.
This patch also fixes that problem,
but I am not sure that this problem could really have appeared before,
as the char size was forced to 1 byte...
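To make the address arithmetic concrete, here is a minimal stand-alone sketch in C. It is not GDB code: the function names are invented for illustration, and it assumes the printing routine reports a count of characters (terminator included) rather than a count of bytes, which is why the count has to be scaled by the character width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only; none of these names exist in GDB.

   Count the characters of the string starting at BUF, where each
   character is WIDTH bytes wide (1, 2 or 4), including the all-zero
   terminator.  LEN is the size of BUF in bytes.  Returns 0 if no
   terminator is found within LEN bytes.  */
static size_t
count_chars_with_terminator (const unsigned char *buf, size_t len,
                             size_t width)
{
  static const unsigned char zero[4] = { 0, 0, 0, 0 };
  size_t i;

  assert (width == 1 || width == 2 || width == 4);
  for (i = 0; i + width <= len; i += width)
    if (memcmp (buf + i, zero, width) == 0)
      return i / width + 1;
  return 0;
}

/* Address of the string that follows the one at ADDR.  The character
   count must be scaled by the character width; adding the raw count
   is only correct for one-byte characters.  */
static uint64_t
next_string_address (uint64_t addr, size_t nchars, size_t width)
{
  return addr + (uint64_t) nchars * width;
}
```

For example, with two-byte characters and the bytes 'a', 0, 'b', 0, 0, 0 at address 0x1000, the count is 3 and the next string starts at 0x1006; without the scaling it would wrongly be 0x1003.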
Pierre Muller
2010-03-17 Pierre Muller <muller@ics.u-strasbg.fr>
* c-lang.c (classify_type): Recognize also types used
for /hs or /ws format specifier in 'x' command.
* printcmd.c (decode_format): Set char size to byte
for strings unless explicit size is given.
(print_formatted): Correct calculation of NEXT_ADDRESS
for 16 or 32 bit strings.
(do_examine): Do not force byte size for strings.
Index: c-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/c-lang.c,v
retrieving revision 1.81
diff -u -p -r1.81 c-lang.c
--- c-lang.c 5 Mar 2010 20:18:11 -0000 1.81
+++ c-lang.c 17 Mar 2010 22:11:08 -0000
@@ -100,13 +100,19 @@ classify_type (struct type *elttype, str
goto done;
}
- if (!strcmp (name, "char16_t"))
+ /* Also recognize the type used by 'x /hs' command. */
+ if (!strcmp (name, "char16_t")
+ || (TYPE_CODE (elttype) == TYPE_CODE_INT
+ && TYPE_LENGTH (elttype) == 2))
{
result = C_CHAR_16;
goto done;
}
- if (!strcmp (name, "char32_t"))
+ /* Also recognize the type used by 'x /ws' command. */
+ if (!strcmp (name, "char32_t")
+ || (TYPE_CODE (elttype) == TYPE_CODE_INT
+ && TYPE_LENGTH (elttype) == 4))
{
result = C_CHAR_32;
goto done;
Index: printcmd.c
===================================================================
RCS file: /cvs/src/src/gdb/printcmd.c,v
retrieving revision 1.173
diff -u -p -r1.173 printcmd.c
--- printcmd.c 5 Mar 2010 20:18:14 -0000 1.173
+++ printcmd.c 17 Mar 2010 22:11:08 -0000
@@ -260,6 +260,11 @@ decode_format (char **string_ptr, int of
/* Characters default to one byte. */
val.size = osize ? 'b' : osize;
break;
+ case 's':
+ /* Display strings with byte size chars unless explicitly specified. */
+ val.size = 'b';
+ break;
+
default:
/* The default is the size most recently specified. */
val.size = osize;
@@ -295,7 +300,7 @@ print_formatted (struct value *val, int
next_address = (value_address (val)
+ val_print_string (elttype,
value_address (val), -1,
- stream, options));
+ stream, options) * len);
}
return;
@@ -802,9 +807,11 @@ do_examine (struct format_data fmt, stru
next_gdbarch = gdbarch;
next_address = addr;
- /* String or instruction format implies fetch single bytes
- regardless of the specified size. */
- if (format == 's' || format == 'i')
+ /* Instruction format implies fetch single bytes
+ regardless of the specified size.
+ The case of strings is handled n decode_format, only explicit
+ size operator are not changed to 'b'. */
+ if (format == 'i')
size = 'b';
if (size == 'a')
* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
2010-03-17 22:43 [RFC] Allow explicit 16 or 32 char in 'x /s' Pierre Muller
@ 2010-03-18 7:01 ` Eli Zaretskii
2010-03-18 14:20 ` Pierre Muller
[not found] ` <001e01cac69a$75167630$5f436290$%muller@ics-cnrs.unistra.fr>
0 siblings, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2010-03-18 7:01 UTC
To: Pierre Muller; +Cc: gdb-patches
> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Date: Wed, 17 Mar 2010 23:42:53 +0100
>
>
> The patch below allows printing strings made of 16-bit or 32-bit
> characters using the 'x /hs' or 'x /ws' commands.
Thanks. If this patch is accepted, we will need a suitable change for
the manual.
* RE: [RFC] Allow explicit 16 or 32 char in 'x /s'
2010-03-18 7:01 ` Eli Zaretskii
@ 2010-03-18 14:20 ` Pierre Muller
[not found] ` <001e01cac69a$75167630$5f436290$%muller@ics-cnrs.unistra.fr>
1 sibling, 0 replies; 8+ messages in thread
From: Pierre Muller @ 2010-03-18 14:20 UTC
To: 'Eli Zaretskii'; +Cc: gdb-patches
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Thursday, March 18, 2010 8:02 AM
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
>
> > From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> > Date: Wed, 17 Mar 2010 23:42:53 +0100
> >
> >
> > The patch below allows printing strings made of 16-bit or 32-bit
> > characters using the 'x /hs' or 'x /ws' commands.
>
> Thanks. If this patch is accepted, we will need a suitable change for
> the manual.
How about this change?
Pierre
doc/ChangeLog entry:
2010-03-18 Pierre Muller <muller@ics.u-strasbg.fr>
* gdbint.texinfo (Examining memory): Update for
change in string display with explicit size.
Index: doc/gdb.texinfo
===================================================================
RCS file: /cvs/src/src/gdb/doc/gdb.texinfo,v
retrieving revision 1.680
diff -u -p -r1.680 gdb.texinfo
--- doc/gdb.texinfo 12 Mar 2010 19:15:52 -0000 1.680
+++ doc/gdb.texinfo 18 Mar 2010 12:50:15 -0000
@@ -7232,8 +7232,11 @@ Giant words (eight bytes).
@end table
Each time you specify a unit size with @code{x}, that size becomes the
-default unit the next time you use @code{x}. (For the @samp{s} and
-@samp{i} formats, the unit size is ignored and is normally not written.)
+default unit the next time you use @code{x}. For the @samp{i} format,
+the unit size is ignored and is normally not written. For the @samp{s} format,
+the unit size defaults to @samp{b}, unless it is explicitly given.
+Ue @code{x /hs} to display 16-bit char strings and @code{x /ws} to display
+32-bit strings. The next use of @code{x /s} will still display 8-bit strings.
@item @var{addr}, starting display address
@var{addr} is the address where you want @value{GDBN} to begin displaying
* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
[not found] ` <001e01cac69a$75167630$5f436290$%muller@ics-cnrs.unistra.fr>
@ 2010-03-18 18:26 ` Eli Zaretskii
0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2010-03-18 18:26 UTC
To: Pierre Muller; +Cc: gdb-patches
> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: <gdb-patches@sourceware.org>
> Date: Thu, 18 Mar 2010 13:56:42 +0100
>
> > > The patch below allows printing strings made of 16-bit or 32-bit
> > > characters using the 'x /hs' or 'x /ws' commands.
> >
> > Thanks. If this patch is accepted, we will need a suitable change for
> > the manual.
>
> How about this change?
It's okay, but it needs a few fixes:
> doc/ChangeLog entry:
>
> 2010-03-18 Pierre Muller <muller@ics.u-strasbg.fr>
>
> * gdbint.texinfo (Examining memory): Update for
gdb.texinfo, not gdbint.texinfo.
> +default unit the next time you use @code{x}. For the @samp{i} format,
^^
Two spaces between sentences (here and elsewhere in your patch).
> +Ue @code{x /hs} to display 16-bit char strings and @code{x /ws} to display
Suggest to rephrase
Use @kbd{x /hs} to display strings made of 16-bit wide characters
and similarly for x/ws.
> +32-bit strings. The next use of @code{x /s} will still display 8-bit
^^^^^
I suggest "again" instead of "still"
Okay with these changes.
Thanks.
* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
[not found] ` <15103.6087111153$1269298497@news.gmane.org>
@ 2010-03-30 20:33 ` Tom Tromey
0 siblings, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2010-03-30 20:33 UTC
To: Pierre Muller; +Cc: 'Eli Zaretskii', gdb-patches
>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
Pierre> But I don't know exactly for other languages and I would like
Pierre> to be sure about what you want me to add to the docs...
I think no other language has been updated to deal with wide characters.
Pierre> Furthermore if you look into charset_for_string_type
Pierre> function in c-lang.c source, you will see that there are two FIXME
Pierre> just right at the position of these charset name settings.
Yeah ... those are actually pedantic FIXMEs, in that (IIRC) nothing
guarantees that char16_t==UTF-16, even though that is the common
meaning.
Pierre> To answer Tom's concern about the change in classify_type function,
Pierre> I modified my patch to change the elttype in do_examine to match exactly
Pierre> what is expected by charset_for_string_type function.
Pierre> Thus this new version has no modification in c-lang.c file.
Suppose the inferior does not define char16_t. Won't this new code
allocate a new type each time the user uses x/hs? That seems bad.
What about passing the desired encoding to LA_PRINT_STRING, via a new
argument to val_print_string? That makes the patch a lot bigger, though
it is mostly mechanical.
Pierre> I also added a very basic check for string display using 'x
Pierre> /hs' and 'x /ws'.
Thanks.
Pierre> + case 's':
Pierre> + /* Display strings with byte size chars unless explicitly specified. */
Pierre> + val.size = 'b';
Pierre> + break;
I think x/hs followed by x should probably print another wide string.
I couldn't tell offhand if it does this or not.
Tom
* RE: [RFC] Allow explicit 16 or 32 char in 'x /s'
2010-03-19 7:32 ` Eli Zaretskii
@ 2010-03-22 22:54 ` Pierre Muller
[not found] ` <15103.6087111153$1269298497@news.gmane.org>
1 sibling, 0 replies; 8+ messages in thread
From: Pierre Muller @ 2010-03-22 22:54 UTC
To: 'Eli Zaretskii', tromey; +Cc: gdb-patches
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Friday, March 19, 2010 8:32 AM
> To: tromey@redhat.com
> Cc: pierre.muller@ics-cnrs.unistra.fr; gdb-patches@sourceware.org
> Subject: Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
>
> > From: Tom Tromey <tromey@redhat.com>
> > Cc: <gdb-patches@sourceware.org>
> > Date: Thu, 18 Mar 2010 16:08:27 -0600
> >
> > I think the documentation should reflect that the user can't choose
> the
> > encoding used here.
>
> I agree. It should also say which encoding is used by GDB in this
> case.
I do not disagree, but I would like to stress that how the string is
displayed also depends on the current language.
For C, or any other language using the c_printstr function:
/hs will use UTF-16LE or UTF-16BE according to the current gdbarch endianness.
/ws will use UTF-32LE or UTF-32BE.
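The mapping described above can be written down as a short stand-alone C sketch. It only restates the rule from this message; it is not the charset_for_string_type function from c-lang.c, and the names utf_name_for_width and utf_name_examples are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only; this is not GDB's charset_for_string_type.
   Return the encoding name described above for characters that are
   WIDTH bytes wide on a target that is big-endian when
   BIG_ENDIAN_TARGET is nonzero, or NULL when the width does not
   imply a UTF-16 or UTF-32 encoding.  */
static const char *
utf_name_for_width (size_t width, int big_endian_target)
{
  switch (width)
    {
    case 2:
      return big_endian_target ? "UTF-16BE" : "UTF-16LE";
    case 4:
      return big_endian_target ? "UTF-32BE" : "UTF-32LE";
    default:
      return NULL;
    }
}

/* A few sanity checks showing how the helper is meant to be used.  */
static void
utf_name_examples (void)
{
  assert (strcmp (utf_name_for_width (2, 0), "UTF-16LE") == 0);
  assert (strcmp (utf_name_for_width (4, 1), "UTF-32BE") == 0);
  assert (utf_name_for_width (1, 0) == NULL);
}
```

Note that the user has no say in this choice: it follows from the size letter and the target byte order alone.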
But I don't know exactly for other languages and I would like to be sure
about what you want me to add to the docs...
Furthermore if you look into charset_for_string_type
function in c-lang.c source, you will see that there are two FIXME
just right at the position of these charset name settings.
To answer Tom's concern about the change in classify_type function,
I modified my patch to change the elttype in do_examine to match exactly
what is expected by charset_for_string_type function.
Thus this new version has no modification in c-lang.c file.
I also added a very basic check for string display using 'x /hs' and
'x /ws'.
Pierre Muller
2010-03-22 Pierre Muller <muller@ics.u-strasbg.fr>
* printcmd.c (decode_format): Set char size to byte
for strings unless explicit size is given.
(print_formatted): Correct calculation of NEXT_ADDRESS
for 16 or 32 bit strings.
(do_examine): Do not force byte size for strings.
Use 'char16_t' and 'char32_t' types to allow
for correct recognition in classify_type.
2010-03-22 Pierre Muller <muller@ics.u-strasbg.fr>
* gdb.base/charset.c (String16, String32): New variables.
* gdb.base/charset.exp (gdb_test): Test correct display
of 16 or 32 bit strings.
Index: printcmd.c
===================================================================
RCS file: /cvs/src/src/gdb/printcmd.c,v
retrieving revision 1.173
diff -u -p -r1.173 printcmd.c
--- printcmd.c 5 Mar 2010 20:18:14 -0000 1.173
+++ printcmd.c 22 Mar 2010 22:25:34 -0000
@@ -260,6 +260,11 @@ decode_format (char **string_ptr, int of
/* Characters default to one byte. */
val.size = osize ? 'b' : osize;
break;
+ case 's':
+ /* Display strings with byte size chars unless explicitly specified. */
+ val.size = 'b';
+ break;
+
default:
/* The default is the size most recently specified. */
val.size = osize;
@@ -295,7 +300,7 @@ print_formatted (struct value *val, int
next_address = (value_address (val)
+ val_print_string (elttype,
value_address (val), -1,
- stream, options));
+ stream, options) * len);
}
return;
@@ -802,9 +807,11 @@ do_examine (struct format_data fmt, stru
next_gdbarch = gdbarch;
next_address = addr;
- /* String or instruction format implies fetch single bytes
- regardless of the specified size. */
- if (format == 's' || format == 'i')
+ /* Instruction format implies fetch single bytes
+ regardless of the specified size.
+ The case of strings is handled n decode_format, only explicit
+ size operator are not changed to 'b'. */
+ if (format == 'i')
size = 'b';
if (size == 'a')
@@ -831,6 +838,36 @@ do_examine (struct format_data fmt, stru
else if (size == 'g')
val_type = builtin_type (next_gdbarch)->builtin_int64;
+ if (format == 's')
+ {
+ struct type *char_type;
+ if (size == 'h')
+ {
+ char_type = lookup_typename (current_language, next_gdbarch,
+ "char16_t", NULL, 1);
+ if (!char_type)
+ char_type = arch_type (next_gdbarch, TYPE_CODE_INT, 2, "char16_t");
+ check_typedef (char_type);
+ if (TYPE_LENGTH (char_type) == 2)
+ val_type = char_type;
+ }
+ else if (size == 'w')
+ {
+ char_type = lookup_typename (current_language, next_gdbarch,
+ "char32_t", NULL, 1);
+ if (!char_type)
+ char_type = arch_type (next_gdbarch, TYPE_CODE_INT, 4, "char32_t");
+ check_typedef (char_type);
+ if (char_type && TYPE_LENGTH (char_type) == 4)
+ val_type = char_type;
+ }
+ else
+ {
+ size = 'b';
+ val_type = builtin_type (next_gdbarch)->builtin_int8;
+ }
+ }
+
maxelts = 8;
if (size == 'w')
maxelts = 4;
Index: testsuite/gdb.base/charset.c
===================================================================
RCS file: /cvs/src/src/gdb/testsuite/gdb.base/charset.c,v
retrieving revision 1.12
diff -u -p -r1.12 charset.c
--- testsuite/gdb.base/charset.c 1 Jan 2010 07:32:00 -0000 1.12
+++ testsuite/gdb.base/charset.c 22 Mar 2010 22:25:34 -0000
@@ -65,6 +65,9 @@ typedef unsigned int char32_t;
char16_t uvar;
char32_t Uvar;
+char16_t *String16;
+char32_t *String32;
+
/* A typedef to a typedef should also work. */
typedef wchar_t my_wchar_t;
my_wchar_t myvar;
Index: testsuite/gdb.base/charset.exp
===================================================================
RCS file: /cvs/src/src/gdb/testsuite/gdb.base/charset.exp,v
retrieving revision 1.21
diff -u -p -r1.21 charset.exp
--- testsuite/gdb.base/charset.exp 17 Feb 2010 22:05:58 -0000 1.21
+++ testsuite/gdb.base/charset.exp 22 Mar 2010 22:25:35 -0000
@@ -616,4 +616,21 @@ gdb_test "print 'a' == 'a' || 'b' == 'b'
".* = 1" \
"EVAL_SKIP cleanup handling regression test"
+
+proc string_display { var_name set_prefix x_size x_type} {
+ gdb_test "set ${var_name} = ${set_prefix}\"Test String\\0with zeroes\"" "" "Assign ${var_name} with prefix ${set_prefix}"
+ gdb_test "x /2${x_size}s ${var_name}" ".* ${x_type}\"Test String\"\[\r\n\]+.* ${x_type}\"with zeroes\"" "Display String ${var_name} with x/${x_size}s"
+}
+
+string_display String16 u h u
+if {$wchar_size == 2} {
+ string_display String16 L h u
+}
+
+string_display String32 U w U
+if {$wchar_size == 4} {
+ string_display String32 L w U
+}
+
+
gdb_exit
* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
2010-03-18 22:08 ` Tom Tromey
@ 2010-03-19 7:32 ` Eli Zaretskii
2010-03-22 22:54 ` Pierre Muller
[not found] ` <15103.6087111153$1269298497@news.gmane.org>
0 siblings, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2010-03-19 7:32 UTC
To: tromey; +Cc: pierre.muller, gdb-patches
> From: Tom Tromey <tromey@redhat.com>
> Cc: <gdb-patches@sourceware.org>
> Date: Thu, 18 Mar 2010 16:08:27 -0600
>
> I think the documentation should reflect that the user can't choose the
> encoding used here.
I agree. It should also say which encoding is used by GDB in this
case.
* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
[not found] <11484.4708740295$1268865815@news.gmane.org>
@ 2010-03-18 22:08 ` Tom Tromey
2010-03-19 7:32 ` Eli Zaretskii
0 siblings, 1 reply; 8+ messages in thread
From: Tom Tromey @ 2010-03-18 22:08 UTC
To: Pierre Muller; +Cc: gdb-patches
>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
Pierre> The patch below allows printing strings made of 16-bit or 32-bit
Pierre> characters using the 'x /hs' or 'x /ws' commands.
It seems like a good idea to me.
Pierre> I tried to enable this feature, keeping it to a minimum:
Pierre> The size modifier is not remembered for /s format,
Pierre> thus any subsequent use of /s alone will still
Pierre> print out byte char strings.
If the user types 'x/2hs' and then 'x/2', does the second invocation
still print wide strings? I think it should.
Pierre> - if (!strcmp (name, "char16_t"))
Pierre> + /* Also recognize the type used by 'x /hs' command. */
Pierre> + if (!strcmp (name, "char16_t")
Pierre> + || (TYPE_CODE (elttype) == TYPE_CODE_INT
Pierre> + && TYPE_LENGTH (elttype) == 2))
Pierre> {
Pierre> result = C_CHAR_16;
Pierre> goto done;
Pierre> }
I am a little concerned that this code can confuse the user.
If sizeof(wchar_t) == 2, then sometimes you could end up printing a
wchar_t using UTF-16 -- which may or may not be appropriate.
I'm not sure how much this matters in practice. However, it seems like
it may be cleaner to override classify_type's decision based directly on
the format character, instead of on the implied type. What do you think
of that? This would also let us introduce a new format character
meaning "wchar_t".
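As a rough illustration of the alternative suggested above, the character kind could be derived from the size letter itself rather than from the length of the element type. The sketch below is stand-alone C with hypothetical names (char_kind, char_kind_for_size_letter); it only echoes the C_CHAR_16 / C_CHAR_32 distinction from the patch and is not a proposal for the actual interface.

```c
#include <assert.h>

/* Illustrative only; this enum and these functions are hypothetical
   and merely echo the C_CHAR_16 / C_CHAR_32 results used in the
   patch.  */
enum char_kind { KIND_CHAR, KIND_CHAR_16, KIND_CHAR_32 };

/* Choose the character kind from the size letter given to 'x /s',
   without looking at the element type at all.  A two-byte integer
   or wchar_t value printed through some other path is therefore
   never reclassified by accident.  */
static enum char_kind
char_kind_for_size_letter (char size)
{
  switch (size)
    {
    case 'h':
      return KIND_CHAR_16;
    case 'w':
      return KIND_CHAR_32;
    default:
      return KIND_CHAR;
    }
}

/* A few sanity checks showing the intended mapping.  */
static void
char_kind_examples (void)
{
  assert (char_kind_for_size_letter ('h') == KIND_CHAR_16);
  assert (char_kind_for_size_letter ('w') == KIND_CHAR_32);
  assert (char_kind_for_size_letter ('b') == KIND_CHAR);
}
```

A further letter meaning "wchar_t" would simply be one more case in the switch under this approach.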
I think the documentation should reflect that the user can't choose the
encoding used here.
Pierre> + The case of strings is handled n decode_format, only explicit
Typo, s/n/in/
Finally, please add some test cases.
Tom