* [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Pierre Muller @ 2010-03-17 22:43 UTC
To: gdb-patches

  The patch below allows to
print strings that are made of 16 bit or 32 bit char
using:
'x /hs ' or 'x /ws ' commands.

  I tried to enable this feature, keeping it to a minimum:
  The size modifier is not remembered for /s format,
thus any subsequent use of /s alone will still
print out byte char strings.

  I found out a c-language specific issue
that made a wrong calculation of the position of the next string,
if you used 'x /2hs ' command and have two consecutive Unicode strings.
  This patch also fixes that problem, but
I am not sure that this problem could really appear before
as the char size was forced to 1 byte...


Pierre Muller

2010-03-17  Pierre Muller  <muller@ics.u-strasbg.fr>

        * c-lang.c (classify_type): Recognize also types used for
        /hs or /ws format specifier in 'x' command.
        * printcmd.c (decode_format): Set char size to byte for strings
        unless explicit size is given.
        (print_formatted): Correct calculation of NEXT_ADDRESS
        for 16 or 32 bit strings.
        (do_examine): Do not force byte size for strings.

Index: c-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/c-lang.c,v
retrieving revision 1.81
diff -u -p -r1.81 c-lang.c
--- c-lang.c    5 Mar 2010 20:18:11 -0000       1.81
+++ c-lang.c    17 Mar 2010 22:11:08 -0000
@@ -100,13 +100,19 @@ classify_type (struct type *elttype, str
       goto done;
     }
 
-  if (!strcmp (name, "char16_t"))
+  /* Also recognize the type used by 'x /hs' command.  */
+  if (!strcmp (name, "char16_t")
+      || (TYPE_CODE (elttype) == TYPE_CODE_INT
+          && TYPE_LENGTH (elttype) == 2))
     {
       result = C_CHAR_16;
       goto done;
     }
 
-  if (!strcmp (name, "char32_t"))
+  /* Also recognize the type used by 'x /ws' command.  */
+  if (!strcmp (name, "char32_t")
+      || (TYPE_CODE (elttype) == TYPE_CODE_INT
+          && TYPE_LENGTH (elttype) == 4))
     {
       result = C_CHAR_32;
       goto done;
Index: printcmd.c
===================================================================
RCS file: /cvs/src/src/gdb/printcmd.c,v
retrieving revision 1.173
diff -u -p -r1.173 printcmd.c
--- printcmd.c  5 Mar 2010 20:18:14 -0000       1.173
+++ printcmd.c  17 Mar 2010 22:11:08 -0000
@@ -260,6 +260,11 @@ decode_format (char **string_ptr, int of
         /* Characters default to one byte.  */
         val.size = osize ? 'b' : osize;
         break;
+      case 's':
+        /* Display strings with byte size chars unless explicitly specified.  */
+        val.size = 'b';
+        break;
+
       default:
         /* The default is the size most recently specified.  */
         val.size = osize;
@@ -295,7 +300,7 @@ print_formatted (struct value *val, int
             next_address = (value_address (val)
                             + val_print_string (elttype,
                                                 value_address (val), -1,
-                                                stream, options));
+                                                stream, options) * len);
           }
           return;
 
@@ -802,9 +807,11 @@ do_examine (struct format_data fmt, stru
   next_gdbarch = gdbarch;
   next_address = addr;
 
-  /* String or instruction format implies fetch single bytes
-     regardless of the specified size.  */
-  if (format == 's' || format == 'i')
+  /* Instruction format implies fetch single bytes
+     regardless of the specified size.
+     The case of strings is handled n decode_format, only explicit
+     size operator are not changed to 'b'.  */
+  if (format == 'i')
     size = 'b';
 
   if (size == 'a')

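To illustrate the NEXT_ADDRESS fix in print_formatted, here is a minimal
standalone sketch; it is not GDB code.  It assumes, as the "* len" in the
patch implies, that the string printer reports how many elements it
consumed (terminator included) rather than how many bytes.  The helper
names count_elements and next_string_offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the elements of a zero-terminated string
   whose characters are WIDTH bytes wide (1, 2 or 4), terminator
   included.  */
static size_t
count_elements (const unsigned char *mem, size_t width)
{
  static const unsigned char zero[4] = { 0, 0, 0, 0 };
  size_t n = 0;

  assert (width == 1 || width == 2 || width == 4);
  while (memcmp (mem + n * width, zero, width) != 0)
    n++;
  return n + 1;   /* Include the terminating element.  */
}

/* Byte offset of the string that follows the one starting at MEM.
   Without the multiplication by WIDTH, a 16-bit string would advance
   by only half of its real size.  */
static size_t
next_string_offset (const unsigned char *mem, size_t width)
{
  return count_elements (mem, width) * width;
}
```

For two 16-bit strings "Hi" and "Yo" stored back to back, the first one
has three elements, so the second starts six bytes in.  Using the
element count alone as a byte offset would land inside the first string,
which is the wrong position 'x /2hs' would otherwise compute.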
* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Eli Zaretskii @ 2010-03-18 7:01 UTC
To: Pierre Muller; +Cc: gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Date: Wed, 17 Mar 2010 23:42:53 +0100
>
>   The patch below allows to
> print strings that are made of 16 bit or 32 bit char
> using:
> 'x /hs ' or 'x /ws ' commands.

Thanks.  If this patch is accepted, we will need a suitable change for
the manual.

* RE: [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Pierre Muller @ 2010-03-18 14:20 UTC
To: 'Eli Zaretskii'; +Cc: gdb-patches

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Thursday, March 18, 2010 8:02 AM
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
>
> > From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> > Date: Wed, 17 Mar 2010 23:42:53 +0100
> >
> >   The patch below allows to
> > print strings that are made of 16 bit or 32 bit char
> > using:
> > 'x /hs ' or 'x /ws ' commands.
>
> Thanks.  If this patch is accepted, we will need a suitable change for
> the manual.

  How about this change?

Pierre

doc/ChangeLog entry:

2010-03-18  Pierre Muller  <muller@ics.u-strasbg.fr>

        * gdbint.texinfo (Examining memory): Update for
        change in string display with explicit size.

Index: doc/gdb.texinfo
===================================================================
RCS file: /cvs/src/src/gdb/doc/gdb.texinfo,v
retrieving revision 1.680
diff -u -p -r1.680 gdb.texinfo
--- doc/gdb.texinfo     12 Mar 2010 19:15:52 -0000      1.680
+++ doc/gdb.texinfo     18 Mar 2010 12:50:15 -0000
@@ -7232,8 +7232,11 @@ Giant words (eight bytes).
 @end table
 
 Each time you specify a unit size with @code{x}, that size becomes the
-default unit the next time you use @code{x}. (For the @samp{s} and
-@samp{i} formats, the unit size is ignored and is normally not written.)
+default unit the next time you use @code{x}. For the @samp{i} format,
+the unit size is ignored and is normally not written. For the @samp{s} format,
+the unit size defaults to @samp{b}, unless it is explicitly given.
+Ue @code{x /hs} to display 16-bit char strings and @code{x /ws} to display
+32-bit strings. The next use of @code{x /s} will still display 8-bit strings.
 
 @item @var{addr}, starting display address
 @var{addr} is the address where you want @value{GDBN} to begin displaying

* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Eli Zaretskii @ 2010-03-18 18:26 UTC
To: Pierre Muller; +Cc: gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: <gdb-patches@sourceware.org>
> Date: Thu, 18 Mar 2010 13:56:42 +0100
>
> > >   The patch below allows to
> > > print strings that are made of 16 bit or 32 bit char
> > > using:
> > > 'x /hs ' or 'x /ws ' commands.
> >
> > Thanks.  If this patch is accepted, we will need a suitable change for
> > the manual.
>
>   How about this change?

It's okay, but it needs a few fixes:

> doc/ChangeLog entry:
>
> 2010-03-18  Pierre Muller  <muller@ics.u-strasbg.fr>
>
>       * gdbint.texinfo (Examining memory): Update for

gdb.texinfo, not gdbint.texinfo.

> +default unit the next time you use @code{x}. For the @samp{i} format,
                                              ^^
Two spaces between sentences (here and elsewhere in your patch).

> +Ue @code{x /hs} to display 16-bit char strings and @code{x /ws} to display

Suggest to rephrase

  Use @kbd{x /hs} to display strings made of 16-bit wide characters

and similarly for x/ws.

> +32-bit strings. The next use of @code{x /s} will still display 8-bit
                                                    ^^^^^
I suggest "again" instead of "still"

Okay with these changes.  Thanks.

* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Tom Tromey @ 2010-03-18 22:08 UTC
To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>   The patch below allows to
Pierre> print strings that are made of 16 bit or 32 bit char
Pierre> using:
Pierre> 'x /hs ' or 'x /ws ' commands.

It seems like a good idea to me.

Pierre>   I tried to enable this feature, keeping it to a minimum:
Pierre>   The size modifier is not remembered for /s format,
Pierre> thus any subsequent use of /s alone will still
Pierre> print out byte char strings.

If the user types 'x/2hs' and then 'x/2', does the second invocation
still print wide strings?  I think it should.

Pierre> -  if (!strcmp (name, "char16_t"))
Pierre> +  /* Also recognize the type used by 'x /hs' command.  */
Pierre> +  if (!strcmp (name, "char16_t")
Pierre> +      || (TYPE_CODE (elttype) == TYPE_CODE_INT
Pierre> +          && TYPE_LENGTH (elttype) == 2))
Pierre>      {
Pierre>        result = C_CHAR_16;
Pierre>        goto done;
Pierre>      }

I am a little concerned that this code can confuse the user.  If
sizeof(wchar_t) == 2, then sometimes you could end up printing a wchar_t
using UTF-16 -- which may or may not be appropriate.  I'm not sure how
much this matters in practice.

However, it seems like it may be cleaner to override classify_type's
decision based directly on the format character, instead of on the
implied type.  What do you think of that?

This would also let us introduce a new format character meaning
"wchar_t".

I think the documentation should reflect that the user can't choose the
encoding used here.

Pierre> +     The case of strings is handled n decode_format, only explicit

Typo, s/n/in/

Finally, please add some test cases.

Tom

* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Eli Zaretskii @ 2010-03-19 7:32 UTC
To: tromey; +Cc: pierre.muller, gdb-patches

> From: Tom Tromey <tromey@redhat.com>
> Cc: <gdb-patches@sourceware.org>
> Date: Thu, 18 Mar 2010 16:08:27 -0600
>
> I think the documentation should reflect that the user can't choose the
> encoding used here.

I agree.  It should also say which encoding is used by GDB in this
case.

* RE: [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Pierre Muller @ 2010-03-22 22:54 UTC
To: 'Eli Zaretskii', tromey; +Cc: gdb-patches

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Friday, March 19, 2010 8:32 AM
> To: tromey@redhat.com
> Cc: pierre.muller@ics-cnrs.unistra.fr; gdb-patches@sourceware.org
> Subject: Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
>
> > From: Tom Tromey <tromey@redhat.com>
> > Cc: <gdb-patches@sourceware.org>
> > Date: Thu, 18 Mar 2010 16:08:27 -0600
> >
> > I think the documentation should reflect that the user can't choose the
> > encoding used here.
>
> I agree.  It should also say which encoding is used by GDB in this
> case.

  Not that I do not agree with you, but
I would like to stress that how the string is displayed
also depends on the current language, so that,
for C or any other language using c_printstr function,
/hs will use UTF-16LE or UTF-16BE according to current gdbarch endianness.
/ws will use UTF-32LE or UTF-32BE.

  But I don't know exactly for other languages and I would like
to be sure about what you want me to add to the docs...

  Furthermore if you look into charset_for_string_type
function in c-lang.c source, you will see that there are two FIXME
just right at the position of these charset name settings.

  To answer Tom's concern about the change in classify_type function,
I modified my patch to change the elttype in do_examine to match exactly
what is expected by charset_for_string_type function.
  Thus this new version has no modification in c-lang.c file.

  I also added a very basic check for string display using 'x /hs' and
'x /ws'.

Pierre Muller

2010-03-22  Pierre Muller  <muller@ics.u-strasbg.fr>

        * printcmd.c (decode_format): Set char size to byte for strings
        unless explicit size is given.
        (print_formatted): Correct calculation of NEXT_ADDRESS
        for 16 or 32 bit strings.
        (do_examine): Do not force byte size for strings.
        Use 'char16_t' and 'char32_t' types to allow for correct
        recognition in classify_type.

2010-03-22  Pierre Muller  <muller@ics.u-strasbg.fr>

        * gdb.base/charset.c (String16, String32): New variables.
        * gdb.base/charset.exp (gdb_test): Test correct display
        of 16 or 32 bit strings.

Index: printcmd.c
===================================================================
RCS file: /cvs/src/src/gdb/printcmd.c,v
retrieving revision 1.173
diff -u -p -r1.173 printcmd.c
--- printcmd.c  5 Mar 2010 20:18:14 -0000       1.173
+++ printcmd.c  22 Mar 2010 22:25:34 -0000
@@ -260,6 +260,11 @@ decode_format (char **string_ptr, int of
         /* Characters default to one byte.  */
         val.size = osize ? 'b' : osize;
         break;
+      case 's':
+        /* Display strings with byte size chars unless explicitly specified.  */
+        val.size = 'b';
+        break;
+
       default:
         /* The default is the size most recently specified.  */
         val.size = osize;
@@ -295,7 +300,7 @@ print_formatted (struct value *val, int
             next_address = (value_address (val)
                             + val_print_string (elttype,
                                                 value_address (val), -1,
-                                                stream, options));
+                                                stream, options) * len);
           }
           return;
 
@@ -802,9 +807,11 @@ do_examine (struct format_data fmt, stru
   next_gdbarch = gdbarch;
   next_address = addr;
 
-  /* String or instruction format implies fetch single bytes
-     regardless of the specified size.  */
-  if (format == 's' || format == 'i')
+  /* Instruction format implies fetch single bytes
+     regardless of the specified size.
+     The case of strings is handled n decode_format, only explicit
+     size operator are not changed to 'b'.  */
+  if (format == 'i')
     size = 'b';
 
   if (size == 'a')
@@ -831,6 +838,36 @@ do_examine (struct format_data fmt, stru
   else if (size == 'g')
     val_type = builtin_type (next_gdbarch)->builtin_int64;
 
+  if (format == 's')
+    {
+      struct type *char_type;
+      if (size == 'h')
+        {
+          char_type = lookup_typename (current_language, next_gdbarch,
+                                       "char16_t", NULL, 1);
+          if (!char_type)
+            char_type = arch_type (next_gdbarch, TYPE_CODE_INT, 2, "char16_t");
+          check_typedef (char_type);
+          if (TYPE_LENGTH (char_type) == 2)
+            val_type = char_type;
+        }
+      else if (size == 'w')
+        {
+          char_type = lookup_typename (current_language, next_gdbarch,
+                                       "char32_t", NULL, 1);
+          if (!char_type)
+            char_type = arch_type (next_gdbarch, TYPE_CODE_INT, 4, "char32_t");
+          check_typedef (char_type);
+          if (char_type && TYPE_LENGTH (char_type) == 4)
+            val_type = char_type;
+        }
+      else
+        {
+          size = 'b';
+          val_type = builtin_type (next_gdbarch)->builtin_int8;
+        }
+    }
+
   maxelts = 8;
   if (size == 'w')
     maxelts = 4;
Index: testsuite/gdb.base/charset.c
===================================================================
RCS file: /cvs/src/src/gdb/testsuite/gdb.base/charset.c,v
retrieving revision 1.12
diff -u -p -r1.12 charset.c
--- testsuite/gdb.base/charset.c        1 Jan 2010 07:32:00 -0000       1.12
+++ testsuite/gdb.base/charset.c        22 Mar 2010 22:25:34 -0000
@@ -65,6 +65,9 @@ typedef unsigned int char32_t;
 char16_t uvar;
 char32_t Uvar;
 
+char16_t *String16;
+char32_t *String32;
+
 /* A typedef to a typedef should also work.  */
 typedef wchar_t my_wchar_t;
 my_wchar_t myvar;
Index: testsuite/gdb.base/charset.exp
===================================================================
RCS file: /cvs/src/src/gdb/testsuite/gdb.base/charset.exp,v
retrieving revision 1.21
diff -u -p -r1.21 charset.exp
--- testsuite/gdb.base/charset.exp      17 Feb 2010 22:05:58 -0000      1.21
+++ testsuite/gdb.base/charset.exp      22 Mar 2010 22:25:35 -0000
@@ -616,4 +616,21 @@ gdb_test "print 'a' == 'a' || 'b' == 'b'
     ".* = 1" \
     "EVAL_SKIP cleanup handling regression test"
 
+
+proc string_display { var_name set_prefix x_size x_type} {
+  gdb_test "set ${var_name} = ${set_prefix}\"Test String\\0with zeroes\"" "" "Assign ${var_name} with prefix ${set_prefix}"
+  gdb_test "x /2${x_size}s ${var_name}" ".* ${x_type}\"Test String\"\[\r\n\]+.* ${x_type}\"with zeroes\"" "Display String ${var_name} with x/${x_size}s"
+}
+
+string_display String16 u h u
+if {$wchar_size == 2} {
+  string_display String16 L h u
+}
+
+string_display String32 U w U
+if {$wchar_size == 4} {
+  string_display String32 L w U
+}
+
+
 gdb_exit

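The encoding choice described in the message above (UTF-16LE/BE for /hs
and UTF-32LE/BE for /ws, picked by target byte order) can be sketched as
follows.  This is a standalone illustration and not the body of
charset_for_string_type; the enum and the function name
wide_charset_name are hypothetical, and the 8-bit case is left out
because the thread does not say which charset plain 'x /s' uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the byte order that gdbarch knows about.  */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Return the charset name for strings whose characters are WIDTH bytes
   wide, or NULL for widths this sketch does not cover.  */
static const char *
wide_charset_name (size_t width, enum byte_order order)
{
  assert (order == ORDER_LITTLE || order == ORDER_BIG);
  switch (width)
    {
    case 2:   /* 'x /hs' */
      return order == ORDER_BIG ? "UTF-16BE" : "UTF-16LE";
    case 4:   /* 'x /ws' */
      return order == ORDER_BIG ? "UTF-32BE" : "UTF-32LE";
    default:
      return NULL;
    }
}
```

The point of the sketch is that the user supplies only the width; the
byte order, and therefore the exact charset, comes from the target and
cannot be chosen on the command line.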
* Re: [RFC] Allow explicit 16 or 32 char in 'x /s'
From: Tom Tromey @ 2010-03-30 20:33 UTC
To: Pierre Muller; +Cc: 'Eli Zaretskii', gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> But I don't know exactly for other languages and I would like
Pierre> to be sure about what you want me to add to the docs...

I think no other language has been updated to deal with wide characters.

Pierre> Furthermore if you look into charset_for_string_type
Pierre> function in c-lang.c source, you will see that there are two FIXME
Pierre> just right at the position of these charset name settings.

Yeah ... those are actually pedantic FIXMEs, in that (IIRC) nothing
guarantees that char16_t==UTF-16, even though that is the common
meaning.

Pierre> To answer Tom's concern about the change in classify_type function,
Pierre> I modified my patch to change the elttype in do_examine to match exactly
Pierre> what is expected by charset_for_string_type function.
Pierre> Thus this new version has no modification in c-lang.c file.

Suppose the inferior does not define char16_t.  Won't this new code
allocate a new type each time the user uses x/hs?  That seems bad.

What about passing the desired encoding to LA_PRINT_STRING, via a new
argument to val_print_string?  That makes the patch a lot bigger, though
it is mostly mechanical.

Pierre> I also added a very basic check for string display using 'x
Pierre> /hs' and 'x /ws'.

Thanks.

Pierre> +      case 's':
Pierre> +       /* Display strings with byte size chars unless explicitly specified.
Pierre> */
Pierre> +       val.size = 'b';
Pierre> +       break;

I think x/hs followed by x should probably print another wide string.
I couldn't tell offhand if it does this or not.

Tom

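The allocation concern raised above can be illustrated with a small
standalone sketch.  It is not the fix proposed in the thread (that
proposal is to pass the encoding to LA_PRINT_STRING); it only shows one
common way to avoid creating a fresh fallback type on every 'x /hs',
namely creating it once and reusing it.  The struct and function names
are hypothetical, and a real version would presumably have to cache per
architecture, which this sketch ignores.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified type descriptor; GDB's struct type is far
   richer than this.  */
struct char_type_desc
{
  const char *name;
  unsigned length;   /* Size in bytes.  */
};

/* One cached fallback per width: slot 0 for 2 bytes, slot 1 for 4.  */
static struct char_type_desc *fallback_cache[2];

/* Return the fallback character type of LENGTH bytes, allocating it
   only the first time it is requested.  */
static struct char_type_desc *
fallback_char_type (unsigned length)
{
  struct char_type_desc **slot;

  assert (length == 2 || length == 4);
  slot = &fallback_cache[length == 2 ? 0 : 1];
  if (*slot == NULL)
    {
      *slot = malloc (sizeof **slot);
      if (*slot == NULL)
        abort ();
      (*slot)->name = length == 2 ? "char16_t" : "char32_t";
      (*slot)->length = length;
    }
  return *slot;
}
```

Repeated calls with the same width return the same object, so repeating
the command does not grow memory use.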