[RFC] Collecting strings at tracepoints

public inbox for gdb@sourceware.org

* [RFC] Collecting strings at tracepoints
@ 2010-06-04 22:53 Stan Shebs
  2010-06-04 23:00 ` Michael Snyder
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Stan Shebs @ 2010-06-04 22:53 UTC
  To: gdb

Collection of strings is a problem for tracepoint users, because the 
literal interpretation of "collect str", where str is a char *, is to 
collect the address of the string, but not any of its contents.  It is 
possible to use the '@' syntax to get some contents; for instance, 
"collect str@40" acquires the first 40 characters, but it is a poor 
approximation.  If the string is shorter than that, you collect more than 
necessary, and possibly run into access trouble if str+40 is outside the 
program's address space; if the string is longer, you may miss the part 
you really wanted.

For normal printing of strings, GDB uses a couple of tricks.  First, 
it explicitly recognizes types that are pointers to chars, and 
automatically dereferences and prints the bytes it finds.  Second, the 
print elements limit prevents excessive output in case the string is long.

For tracepoint collection, I think the automatic heuristic is probably 
not a good idea.  In interactive use, if you print too much string, or 
just wanted to see the address, there's no harm in displaying extra 
data.  But for tracing, the user needs a little more control, so that 
the buffer doesn't inadvertently fill up too soon.  So I think we 
should have the user explicitly request collection of string 
contents.

Looking at how '@' syntax works, we can extend it without disrupting 
expression parsing much.  For instance, "str@@" could mean to dereference 
str, and collect bytes until a 0 is seen or the print elements limit is 
reached (which implies that we would have to tell the target that 
number).  The user could exercise even finer control by supplying the 
limit explicitly, for instance "str@/80" to collect at most 80 chars of 
the string.  ("str@@80" seems like it would cause ambiguity problems vs 
"str@@").
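A minimal C sketch of what the target might do for the proposed "*str@@" form (bound = print elements limit) and "*str@/80" form (bound = 80). The name `collect_string` is hypothetical, and counting the terminating 0 against the bound is an assumption, not something settled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical collector: copy bytes from SRC into BUF until a 0 byte
   has been copied or LIMIT bytes have been copied, whichever comes
   first.  Assumption: the terminating 0 counts against LIMIT.
   Returns the number of bytes collected.  */
static size_t collect_string(const char *src, size_t limit, char *buf)
{
    size_t n = 0;

    assert(src != NULL && buf != NULL);
    while (n < limit) {
        buf[n] = src[n];
        n++;
        if (src[n - 1] == '\0')
            break;  /* stop right after copying the terminator */
    }
    return n;
}
```

The loop never reads past the terminator or the bound, which is what avoids the access trouble a fixed "@40" can run into; a 5-character string costs 6 bytes here instead of 40.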

This extended syntax could work for the print command too, in lieu of 
tweaking the print element limit, and for types that GDB does not 
recognize as a string type.

Under the hood, it's not yet clear if we will need additional bytecodes, 
but probably so.
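If it does come to new bytecodes, the essential difference from a fixed-size collection is that the number of bytes recorded depends on the data. The toy stack interpreter below illustrates that with made-up opcodes (`OP_CONST`, `OP_TRACE_STR`, `OP_END`) and a made-up `struct trace_buf`; it is a sketch under those assumptions and does not reflect GDB's actual agent expression encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; these are NOT GDB's
   actual agent expression opcodes or encodings.  */
enum { OP_CONST = 1, OP_TRACE_STR = 2, OP_END = 3 };

/* Stand-in for the trace buffer that collected bytes go into.  */
struct trace_buf {
    unsigned char data[256];
    size_t used;
};

/* Interprets CODE against simulated target memory MEM (MEM_SIZE bytes).
   OP_CONST pushes its one-byte operand.  OP_TRACE_STR pops a limit, then
   an address, and appends bytes from that address to TB until it has
   appended a 0 byte, appended LIMIT bytes, or reached the end of MEM.
   Returns 0 at OP_END, or -1 for a malformed program or a full buffer.  */
static int run_agent_expr(const unsigned char *code, const unsigned char *mem,
                          size_t mem_size, struct trace_buf *tb)
{
    uint32_t stack[16];
    size_t sp = 0, pc = 0;

    assert(code != NULL && mem != NULL && tb != NULL);
    for (;;) {
        switch (code[pc++]) {
        case OP_CONST:
            if (sp == sizeof stack / sizeof stack[0])
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_TRACE_STR: {
            if (sp < 2)
                return -1;
            uint32_t limit = stack[--sp];
            uint32_t addr = stack[--sp];
            /* The stopping point depends on the bytes read, unlike a
               fixed-size "record N bytes" operation.  */
            for (uint32_t i = 0; i < limit && (size_t)addr + i < mem_size; i++) {
                if (tb->used == sizeof tb->data)
                    return -1;
                tb->data[tb->used++] = mem[addr + i];
                if (mem[addr + i] == 0)
                    break;
            }
            break;
        }
        case OP_END:
            return 0;
        default:
            return -1;
        }
    }
}
```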

Stan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] Collecting strings at tracepoints
  2010-06-04 22:53 [RFC] Collecting strings at tracepoints Stan Shebs
@ 2010-06-04 23:00 ` Michael Snyder
  2010-06-08 21:19 ` Tom Tromey
  2010-06-15 22:51 ` Doug Evans
  2 siblings, 0 replies; 8+ messages in thread
From: Michael Snyder @ 2010-06-04 23:00 UTC
  To: Stan Shebs; +Cc: gdb

Another possibility might be as an option to the "collect" command,
e.g.
 > collect /s foo (collect foo as a string).

Does seem like an additional byte code would be needed.

* Re: [RFC] Collecting strings at tracepoints
  2010-06-04 22:53 [RFC] Collecting strings at tracepoints Stan Shebs
  2010-06-04 23:00 ` Michael Snyder
@ 2010-06-08 21:19 ` Tom Tromey
  2010-06-15 22:51 ` Doug Evans
  2 siblings, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2010-06-08 21:19 UTC
  To: Stan Shebs; +Cc: gdb

>>>>> "Stan" == Stan Shebs <stan@codesourcery.com> writes:

Stan> For tracepoint collection, I think the automatic heuristic is probably
Stan> not a good idea.

I agree.

Stan> Looking at how '@' syntax works, we can extend it without disrupting
Stan> expression parsing much.  For instance, "str@@" could mean to
Stan> dereference str, and collect bytes until a 0 is seen, or the print
Stan> elements limit is reached (implication is that we would have to tell
Stan> the target that number).  The user could exercise even finer control
Stan> by supplying the limit explicitly, for instance "str@/80" to collect
Stan> at most 80 chars of the string.  ("str@@80" seems like it would cause
Stan> ambiguity problems vs "str@@").

Wouldn't it be "*str @@ 80" (note the leading "*").

I'm not super fond of the syntax, but I think it is probably as good as
anything else I'd think up :-)

Stan> Under the hood, it's not yet clear if we will need additional
Stan> bytecodes, but probably so.

If you add additional AX bytecodes, please consider adding what is
needed for DWARF location expressions at the same time:

    http://sourceware.org/bugzilla/show_bug.cgi?id=11662

It seems better to batch such changes.

Tom

* Re: [RFC] Collecting strings at tracepoints
  2010-06-04 22:53 [RFC] Collecting strings at tracepoints Stan Shebs
  2010-06-04 23:00 ` Michael Snyder
  2010-06-08 21:19 ` Tom Tromey
@ 2010-06-15 22:51 ` Doug Evans
  2010-06-16  0:09   ` Stan Shebs
  2 siblings, 1 reply; 8+ messages in thread
From: Doug Evans @ 2010-06-15 22:51 UTC
  To: Stan Shebs; +Cc: gdb

On Fri, Jun 4, 2010 at 3:52 PM, Stan Shebs <stan@codesourcery.com> wrote:
> Looking at how '@' syntax works, we can extend it without disrupting
> expression parsing much.  For instance, "str@@" could mean to dereference str,
> and collect bytes until a 0 is seen, or the print elements limit is reached
> (implication is that we would have to tell the target that number).  The
> user could exercise even finer control by supplying the limit explicitly,
> for instance "str@/80" to collect at most 80 chars of the string.
>  ("str@@80" seems like it would cause ambiguity problems vs "str@@").

Apologies for coming into this a bit late.

I want to make sure I understand the proposed syntax.

str@@ would collect up to the first \0 or print elements limit.
str@/80 would collect up to the first \0 or 80 bytes.

That feels too inconsistent: "@@" triggers the special "up until the
first \0" behavior, *except* when it's @/.
"Up until the first \0" is one thing, and specifying a limit is an
add-on.  Each should have its own syntax [e.g. str@@/80; it's
perhaps klunkier, but @@ is klunky to begin with :-)].

Michael mentioned collect /s as a possibility.
That *feels* better, given that you mention the print command (if p/s
doesn't print its arg as a string, what does p/s mean?).
To add a max-length, "collect /80s" doesn't work; it's inconsistent
with the "x" command; "x /80s" doesn't mean "max 80 chars".
Maybe "collect /s@80"?  [At this point, I don't have a strong opinion
on @ vs another character.]
"x/s@80 foo" feels like a nice extension (print foo as a string up to 80 chars)
Plus "x/20s@80 foo" also works (print 20 strings beginning at foo,
each with a max length of 80).
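One possible reading of "x/20s@80 foo", as a sketch: walk COUNT consecutive strings, printing at most MAX characters of each. How to advance when a string is truncated is not specified in the thread; the code assumes the next string starts right after the bytes already examined. `examine_strings` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Examine COUNT strings starting at MEM, each with at most MAX
   characters.  Records each string's offset and printed length in
   OFFS/LENS and returns the offset just past the last one examined.
   Assumption: a truncated string is followed immediately by the next
   one; otherwise the next string starts after the 0 byte.  */
static size_t examine_strings(const char *mem, size_t count, size_t max,
                              size_t *offs, size_t *lens)
{
    size_t pos = 0;

    assert(max > 0);
    for (size_t k = 0; k < count; k++) {
        size_t len = 0;
        while (len < max && mem[pos + len] != '\0')
            len++;
        offs[k] = pos;
        lens[k] = len;
        /* Skip the terminator only if we actually reached it.  */
        pos += (len < max) ? len + 1 : len;
    }
    return pos;
}
```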

* Re: [RFC] Collecting strings at tracepoints
  2010-06-15 22:51 ` Doug Evans
@ 2010-06-16  0:09   ` Stan Shebs
  2010-06-16 18:00     ` Doug Evans
  0 siblings, 1 reply; 8+ messages in thread
From: Stan Shebs @ 2010-06-16  0:09 UTC
  To: Doug Evans; +Cc: Stan Shebs, gdb

Doug Evans wrote:
> Apologies for coming into this a bit late.
>   

I've been remiss in my replies, so will try to wrap all up here.

> I want to make sure I understand the proposed syntax.
>
> str@@ would collect up to the first \0 or print elements limit.
> str@/80 would collect up to the first \0 or 80 bytes.
>   

As Tom points out, it would actually be "*str@@" etc.

> That feels too inconsistent: "@@" triggers the special "up until the
> first \0" behavior, *except* when it's @/.
> "Up until the first \0" is one thing, and specifying a limit is an
> add-on.  Each should have its own syntax [e.g. str@@/80; it's
> perhaps klunkier, but @@ is klunky to begin with :-)].
>   

I just threw "@/" out there as something that was parseable.  @ is a 
totally general binary operator, the second argument doesn't have to be 
a constant (not even for tracing).  So any extensions to it need to be 
something that is not ambiguous with anything else.  "@@" for the common 
case seemed logical.  Allowing both "@@" and "@@<expr>" could get us 
into dangling-else style ambiguity; given that this is our arbitrary 
extension, why create parsing ambiguity if there is no language syntax 
forcing us to?

> Michael mentioned collect /s as a possibility.
> That *feels* better, given that you mention the print command (if p/s
> doesn't print its arg as a string, what does p/s mean?).
> To add a max-length, "collect /80s" doesn't work; it's inconsistent
> with the "x" command; "x /80s" doesn't mean "max 80 chars".
> Maybe "collect /s@80"?  [At this point, I don't have a strong opinion
> on @ vs another character.]
> "x/s@80 foo" feels like a nice extension (print foo as a string up to 80 chars)
> Plus "x/20s@80 foo" also works (print 20 strings beginning at foo,
> each with a max length of 80).
>
>   

The /s idea is appealing, but it has a couple of downsides.  First, there 
is the default-collect variable, although I suppose "set default-collect 
/s str" could be made to have the right effect.  Second, it would apply 
to everything in the collection line, whether you realized it or not; I 
can see users getting burned because FUNKYTYPE is typedef'ed to char on 
some machines and not others, and so "collect /s str, funkytown" may 
fill the trace buffer unexpectedly quickly.  Having it available in 
expressions means that it can be used in more ways, although admittedly 
something like "collect $tsv = (*str@@[len-1] == (*str2@/80)[79])" is 
pretty freaky, not likely to be seen in real life.  We also need to do 
something for MI, since there are Eclipse users wanting to trace.
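A small C11 illustration of the FUNKYTYPE hazard. `FUNKYTYPE`, `FUNKY_IS_CHAR`, `IS_CHAR_POINTER` and `slash_s_cost` are all hypothetical, and `_Generic` merely stands in for whatever type test a blanket /s would use. Built without `-DFUNKY_IS_CHAR`, `funkytown` is not string-like; built with it, the same "collect /s str, funkytown" line would start recording funkytown's contents too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration switch: FUNKYTYPE is char on some
   machines and something wider on others.  */
#ifdef FUNKY_IS_CHAR
typedef char FUNKYTYPE;
#else
typedef int FUNKYTYPE;
#endif

/* C11 type test standing in for "does this look like a string?":
   true only for pointers to the char types.  */
#define IS_CHAR_POINTER(x) _Generic((x),                  \
        char *: 1, const char *: 1,                        \
        signed char *: 1, unsigned char *: 1,              \
        default: 0)

/* Bytes a blanket /s would record for one collected pointer: the
   contents plus terminator if it looks like a string, otherwise just
   the pointer value.  LEN must be 0 when IS_STRING is 0.  */
static size_t slash_s_cost(int is_string, size_t len)
{
    assert(is_string || len == 0);
    return is_string ? len + 1 : sizeof(void *);
}
```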

But the downsides aren't really bad; I think /s is worth considering 
further.

Stan


* Re: [RFC] Collecting strings at tracepoints
  2010-06-16  0:09   ` Stan Shebs
@ 2010-06-16 18:00     ` Doug Evans
  2010-06-16 18:18       ` Stan Shebs
  0 siblings, 1 reply; 8+ messages in thread
From: Doug Evans @ 2010-06-16 18:00 UTC
  To: Stan Shebs; +Cc: gdb

On Tue, Jun 15, 2010 at 5:09 PM, Stan Shebs <stan@codesourcery.com> wrote:
> As Tom points out, it would actually be "*str@@" etc.

Yeah, I know.  I left that out as it wasn't germane to my msg.

>> That feels too inconsistent: "@@" triggers the special "up until the
>> first \0" behavior, *except* when it's @/.
>> "Up until the first \0" is one thing, and specifying a limit is an
>> add-on.  Each should have its own syntax [e.g. str@@/80; it's
>> perhaps klunkier, but @@ is klunky to begin with :-)].
>>
>
> I just threw "@/" out there as something that was parseable.  @ is a totally
> general binary operator, the second argument doesn't have to be a constant
> (not even for tracing).  So any extensions to it need to be something that
> is not ambiguous with anything else.  "@@" for the common case seemed
> logical.  Allowing both "@@" and "@@<expr>" could get us into dangling-else
> style ambiguity; given that this is our arbitrary extension, why create
> parsing ambiguity if there is no language syntax forcing us to?

I don't quite follow.
You're going from @ being a binary operator and extending it, to
concerns of @@ vs @@<expr>.
Guessing, you're not really extending @ except visually.

> Second, it would apply to
> everything in the collection line, whether you realized it or not; I can see
> users getting burned because FUNKYTYPE is typedef'ed to char on some
> machines and not others, and so "collect /s str, funkytown" may fill the
> trace buffer unexpectedly quickly.

Ah.  I wasn't aware one could do "collect a,b,c,d".

* Re: [RFC] Collecting strings at tracepoints
  2010-06-16 18:00     ` Doug Evans
@ 2010-06-16 18:18       ` Stan Shebs
  2010-06-16 18:27         ` Tom Tromey
  0 siblings, 1 reply; 8+ messages in thread
From: Stan Shebs @ 2010-06-16 18:18 UTC
  To: Doug Evans; +Cc: Stan Shebs, gdb

Doug Evans wrote:
> On Tue, Jun 15, 2010 at 5:09 PM, Stan Shebs <stan@codesourcery.com> wrote:
>   
>> I just threw "@/" out there as something that was parseable.  @ is a totally
>> general binary operator, the second argument doesn't have to be a constant
>> (not even for tracing).  So any extensions to it need to be something that
>> is not ambiguous with anything else.  "@@" for the common case seemed
>> logical.  Allowing both "@@" and "@@<expr>" could get us into dangling-else
>> style ambiguity; given that this is our arbitrary extension, why create
>> parsing ambiguity if there is no language syntax forcing us to?
>>     
>
> I don't quite follow.
> You're going from @ being a binary operator and extending it, to
> concerns of @@ vs @@<expr>.
> Guessing, you're not really extending @ except visually.
>
>   

That's right.  Partly because the expedient for string collection right 
now is "*str@40", so it extends a known behavior, and partly because '@' 
is about the only character that isn't already claimed by language 
and/or GDB command syntax.

As far as parsing goes, it wasn't obvious to me whether it makes more 
sense to add new tokens like "@@" etc, or to add syntax rules using 
single-char tokens.  I haven't actually tried implementing anything yet, 
although looking at the calendar, I think I'd better get busy. :-)
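To make the two parsing options concrete: with a dedicated "@@" token the lexer does the grouping by maximal munch, so "@@" and "@ @" lex differently and "@@@" becomes "@@" followed by "@"; with single-character tokens, both spellings are just two '@' tokens and the grammar has to sort them out. The token names and `next_token` below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for the "new @@ token" option.  */
enum tok { TOK_END, TOK_AT, TOK_ATAT, TOK_OTHER };

/* Returns the next token at *P and advances *P past it.  Spaces are
   skipped; "@@" is taken as one token whenever two '@' are adjacent.  */
static enum tok next_token(const char **p)
{
    assert(p != NULL && *p != NULL);
    while (**p == ' ')
        (*p)++;
    if (**p == '\0')
        return TOK_END;
    if (**p == '@') {
        if ((*p)[1] == '@') {
            *p += 2;
            return TOK_ATAT;
        }
        (*p)++;
        return TOK_AT;
    }
    (*p)++;  /* every other character is lumped together here */
    return TOK_OTHER;
}
```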

It occurs to me that /s and @@ are not mutually exclusive, and it 
wouldn't be bad if both forms were available.  Users like it when they 
can guess at the syntax and everything works as expected. :-)

Stan

* Re: [RFC] Collecting strings at tracepoints
  2010-06-16 18:18       ` Stan Shebs
@ 2010-06-16 18:27         ` Tom Tromey
  0 siblings, 0 replies; 8+ messages in thread
From: Tom Tromey @ 2010-06-16 18:27 UTC
  To: Stan Shebs; +Cc: Doug Evans, gdb

>>>>> "Stan" == Stan Shebs <stan@codesourcery.com> writes:

Stan> That's right.  Partly because the expedient for string collection
Stan> right now is "*str@40", so it extends a known behavior, and partly
Stan> because '@' is about the only character that isn't already claimed by
Stan> language and/or GDB command syntax.

Not to bikeshed this too much, but another idea would be to extend the
current syntax:

*str @ -1          /* find the \0 */
*str @ { -1, 80 }  /* find the \0, or stop at 80 */
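A sketch of how Tom's forms could be decoded from the operand(s) of '@', keeping the existing positive-count meaning. `struct at_spec` and `decode_at` are hypothetical, and `max == 0` meaning "fall back to the print elements limit" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded form of the right-hand side of '@'.  */
struct at_spec {
    int until_nul;  /* nonzero: stop at the first 0 byte */
    size_t max;     /* bound on bytes; 0 = no explicit bound given */
};

/* Decodes the forms
     *str @ N            N > 0: existing fixed-length behavior
     *str @ -1           find the \0
     *str @ { -1, MAX }  find the \0, or stop at MAX
   COUNT is the (first) operand; MAX is used only when HAVE_MAX.
   Returns 0 on success, -1 for combinations with no assigned meaning.  */
static int decode_at(long count, int have_max, long max, struct at_spec *out)
{
    assert(out != NULL);
    if (count > 0 && !have_max) {
        out->until_nul = 0;
        out->max = (size_t)count;
        return 0;
    }
    if (count == -1) {
        if (have_max && max <= 0)
            return -1;
        out->until_nul = 1;
        out->max = have_max ? (size_t)max : 0;
        return 0;
    }
    return -1;
}
```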

Tom

Thread overview: 8+ messages
2010-06-04 22:53 [RFC] Collecting strings at tracepoints Stan Shebs
2010-06-04 23:00 ` Michael Snyder
2010-06-08 21:19 ` Tom Tromey
2010-06-15 22:51 ` Doug Evans
2010-06-16  0:09   ` Stan Shebs
2010-06-16 18:00     ` Doug Evans
2010-06-16 18:18       ` Stan Shebs
2010-06-16 18:27         ` Tom Tromey
