* [RFC] Collecting strings at tracepoints
From: Stan Shebs @ 2010-06-04 22:53 UTC (permalink / raw)
To: gdb
Collection of strings is a problem for tracepoint users, because the
literal interpretation of "collect str", where str is a char *, is to
collect the address of the string, but not any of its contents. It is
possible to use the '@' syntax to get some contents, for instance
"collect str@40" acquires the first 40 characters, but it is a poor
approximation; if the string is shorter than that, you collect more than
necessary, and possibly run into access trouble if str+40 is outside the
program's address space, or else the string is longer, in which case you
may miss the part you really wanted.
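To make the failure modes of a fixed count concrete, here is a minimal Python sketch. It is a toy model, not GDB or target-side code: BASE, MEMORY, and collect_fixed are hypothetical names, and the "mapped region" is just a bytes object.

```python
# Toy model of a target's memory: one mapped region starting at BASE.
# All names are hypothetical; this is not GDB or gdbserver code.
BASE = 0x1000
MEMORY = b"hi\0" + b"\xaa" * 5   # 8 mapped bytes: "hi", NUL, unrelated data


def collect_fixed(addr, count):
    """Model a fixed-count collection: copy exactly `count` bytes or fail."""
    start = addr - BASE
    if start < 0 or start + count > len(MEMORY):
        raise MemoryError("range extends outside the mapped region")
    return MEMORY[start:start + count]


# Case 1: the string is shorter than the count, so extra bytes are collected.
short = collect_fixed(BASE, 8)
wasted = len(short) - (short.index(b"\0") + 1)
print(f"collected {len(short)} bytes, {wasted} of them past the terminator")

# Case 2: the count is smaller than the string, so the tail is lost.
truncated = collect_fixed(BASE, 1)
print("truncated collection:", truncated)

# Case 3: the requested range runs off the end of the mapping.
try:
    collect_fixed(BASE, 40)
except MemoryError as exc:
    print("collection failed:", exc)
```

No single count avoids all three cases unless the string's length is known in advance.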
For normal printing of strings, GDB has a couple of tricks. First,
it explicitly recognizes types that are pointers to chars, and
automatically dereferences and prints the bytes it finds. Second, the
print elements limit prevents excessive output in case the string is long.
For tracepoint collection, I think the automatic heuristic is probably
not a good idea. In interactive use, if you print too much string, or
just wanted to see the address, there's no harm in displaying extra
data. But for tracing, the user needs a little more control, so that
the buffer doesn't inadvertently fill up too soon. So I think that
means that we should have the user explicitly request collection of
string contents.
Looking at how '@' syntax works, we can extend it without disrupting
expression parsing much. For instance, "str@@" could mean to dereference
str, and collect bytes until a 0 is seen, or the print elements limit is
reached (implication is that we would have to tell the target that
number). The user could exercise even finer control by supplying the
limit explicitly, for instance "str@/80" to collect at most 80 chars of
the string. ("str@@80" seems like it would cause ambiguity problems vs
"str@@").
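The intended semantics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: collect_string is a hypothetical helper, memory is modelled as a bytes object, and the sketch assumes the terminating 0 byte is itself collected, which the proposal does not specify. The value 200 is GDB's default "print elements" setting.

```python
def collect_string(memory, offset, limit):
    """Copy bytes starting at `offset` until a 0 byte has been copied or
    `limit` bytes have been taken, whichever comes first."""
    out = bytearray()
    for byte in memory[offset:offset + limit]:
        out.append(byte)
        if byte == 0:          # stop once the terminator is collected
            break
    return bytes(out)


PRINT_ELEMENTS = 200           # GDB's default "print elements" limit

memory = b"hello\0world\0"
print(collect_string(memory, 0, PRINT_ELEMENTS))   # models "str@@"
print(collect_string(memory, 0, 3))                # models "str@/3"
```

The only difference between the two forms is where the limit comes from: a global setting in one case, an explicit operand in the other.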
This extended syntax could work for the print command too, in lieu of
tweaking the print element limit, and for types that GDB does not
recognize as a string type.
Under the hood, it's not yet clear if we will need additional bytecodes,
but probably so.
Stan
* Re: [RFC] Collecting strings at tracepoints
From: Michael Snyder @ 2010-06-04 23:00 UTC (permalink / raw)
To: Stan Shebs; +Cc: gdb
Stan Shebs wrote:
> Under the hood, it's not yet clear if we will need additional bytecodes,
> but probably so.
Another possibility might be an option to the "collect" command,
e.g.
    collect /s foo    (collect foo as a string)
It does seem like an additional bytecode would be needed.
* Re: [RFC] Collecting strings at tracepoints
From: Tom Tromey @ 2010-06-08 21:19 UTC (permalink / raw)
To: Stan Shebs; +Cc: gdb
>>>>> "Stan" == Stan Shebs <stan@codesourcery.com> writes:
Stan> For tracepoint collection, I think the automatic heuristic is probably
Stan> not a good idea.
I agree.
Stan> Looking at how '@' syntax works, we can extend it without disrupting
Stan> expression parsing much. For instance, "str@@" could mean to
Stan> dereference str, and collect bytes until a 0 is seen, or the print
Stan> elements limit is reached (implication is that we would have to tell
Stan> the target that number). The user could exercise even finer control
Stan> by supplying the limit explicitly, for instance "str@/80" to collect
Stan> at most 80 chars of the string. ("str@@80" seems like it would cause
Stan> ambiguity problems vs "str@@").
Wouldn't it be "*str @@ 80" (note the leading "*")?
I'm not super fond of the syntax, but I think it is probably as good as
anything else I'd think up :-)
Stan> Under the hood, it's not yet clear if we will need additional
Stan> bytecodes, but probably so.
If you add additional AX bytecodes, please consider adding what is
needed for DWARF location expressions at the same time:
http://sourceware.org/bugzilla/show_bug.cgi?id=11662
It seems better to batch such changes.
Tom
* Re: [RFC] Collecting strings at tracepoints
From: Doug Evans @ 2010-06-15 22:51 UTC (permalink / raw)
To: Stan Shebs; +Cc: gdb
On Fri, Jun 4, 2010 at 3:52 PM, Stan Shebs <stan@codesourcery.com> wrote:
> Looking at how '@' syntax works, we can extend it without disrupting
> expression parsing much. For instance, "str@@" could mean to dereference str,
> and collect bytes until a 0 is seen, or the print elements limit is reached
> (implication is that we would have to tell the target that number). The
> user could exercise even finer control by supplying the limit explicitly,
> for instance "str@/80" to collect at most 80 chars of the string.
> ("str@@80" seems like it would cause ambiguity problems vs "str@@").
>
> This extended syntax could work for the print command too, in lieu of
> tweaking the print element limit, and for types that GDB does not recognize
> as a string type.
Apologies for coming into this a bit late.
I want to make sure I understand the proposed syntax.
str@@ would collect up to the first \0 or print elements limit.
str@/80 would collect up to the first \0 or 80 bytes.
That feels too inconsistent: "@@" triggers the special "up until the
first \0", *except* when it's @/.
"up until the first \0" is one thing and specifying a limit is an
add-on. Each should have its own syntax [e.g. str@@/80; it's
perhaps klunkier, but @@ is klunky to begin with. :-)]
Michael mentioned collect /s as a possibility.
That *feels* better, given that you mention the print command (if p/s
doesn't print its arg as a string, what does p/s mean?).
To add a max-length, "collect /80s" doesn't work, it's inconsistent
with the "x" command; "x /80s" doesn't mean "max 80 chars".
Maybe "collect /s@80"? [At this point, I don't have a strong opinion
on @ vs another character.]
"x/s@80 foo" feels like a nice extension (print foo as a string up to 80 chars)
Plus "x/20s@80 foo" also works (print 20 strings beginning at foo,
each with a max length of 80).
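A rough Python model of what "x/20s@80 foo" might do is sketched below. It is not how GDB's "x" command is implemented; read_strings is a hypothetical helper, and the sketch assumes that after a truncated string the next one starts after the full string's terminator, a detail the suggestion above leaves open.

```python
def read_strings(memory, offset, count, max_len):
    """Return up to `count` consecutive NUL-terminated strings starting at
    `offset`, truncating each returned string to `max_len` bytes."""
    strings = []
    for _ in range(count):
        if offset >= len(memory):
            break                              # ran out of memory to scan
        end = memory.find(b"\0", offset)
        if end == -1:
            end = len(memory)                  # unterminated final string
        strings.append(memory[offset:min(end, offset + max_len)])
        offset = end + 1                       # assumption: skip the full string
    return strings


memory = b"alpha\0be\0gamma-delta\0"
print(read_strings(memory, 0, 3, 5))    # three strings, each capped at 5 bytes
print(read_strings(memory, 0, 20, 80))  # asks for 20, only 3 are available
```

The repeat count and the per-string cap are independent parameters here, which matches the separation of "/20s" and "@80" in the suggested syntax.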
* Re: [RFC] Collecting strings at tracepoints
From: Stan Shebs @ 2010-06-16 0:09 UTC (permalink / raw)
To: Doug Evans; +Cc: Stan Shebs, gdb
Doug Evans wrote:
> Apologies for coming into this a bit late.
>
I've been remiss in my replies, so will try to wrap all up here.
> I want to make sure I understand the proposed syntax.
>
> str@@ would collect up to the first \0 or print elements limit.
> str@/80 would collect up to the first \0 or 80 bytes.
>
As Tom points out, it would actually be "*str@@" etc.
> That feels too inconsistent: "@@" triggers the special "up until the
> first \0", *except* when it's @/.
> "up until the first \0" is one thing and specifying a limit is an
> add-on. Each should have its own syntax [e.g. str@@/80; it's
> perhaps klunkier, but @@ is klunky to begin with. :-)]
>
I just threw "@/" out there as something that was parseable. @ is a
totally general binary operator, the second argument doesn't have to be
a constant (not even for tracing). So any extensions to it need to be
something that is not ambiguous with anything else. "@@" for the common
case seemed logical. Allowing both "@@" and "@@<expr>" could get us
into dangling-else style ambiguity; given that this is our arbitrary
extension, why create parsing ambiguity if there is no language syntax
forcing us to?
> Michael mentioned collect /s as a possibility.
> That *feels* better, given that you mention the print command (if p/s
> doesn't print its arg as a string, what does p/s mean?).
> To add a max-length, "collect /80s" doesn't work, it's inconsistent
> with the "x" command; "x /80s" doesn't mean "max 80 chars".
> Maybe "collect /s@80"? [At this point, I don't have a strong opinion
> on @ vs another character.]
> "x/s@80 foo" feels like a nice extension (print foo as a string up to 80 chars)
> Plus "x/20s@80 foo" also works (print 20 strings beginning at foo,
> each with a max length of 80).
>
>
The /s idea is appealing, but it has a couple downsides. First, there
is the default-collect variable, although I suppose "set default-collect
/s str" could be made to have the right effect. Second, it would apply
to everything in the collection line, whether you realized it or not; I
can see users getting burned because FUNKYTYPE is typedef'ed to char on
some machines and not others, and so "collect /s str, funkytown" may
fill the trace buffer unexpectedly quickly. Having it available in
expressions means that it can be used in more ways, although admittedly
something like "collect $tsv = (*str@@[len-1] == (*str2@/80)[79])" is
pretty freaky, not likely to be seen in real life. We also need to do
something for MI, since there are Eclipse users wanting to trace.
But the downsides aren't really bad, I think /s is worth considering
further.
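The typedef hazard can be illustrated with a small Python sketch. Everything in it is hypothetical: Expr, resolved_type, and plan_collection are illustration-only names, and the sketch assumes funkytown is declared as a pointer to FUNKYTYPE and that a line-wide /s would apply to every expression whose resolved type is "char *".

```python
from dataclasses import dataclass


@dataclass
class Expr:
    name: str
    resolved_type: str   # type after typedefs are resolved (hypothetical field)


def plan_collection(exprs, as_string):
    """Decide, per expression, whether string contents would be collected."""
    plan = {}
    for expr in exprs:
        if as_string and expr.resolved_type == "char *":
            plan[expr.name] = "pointer + string contents"
        else:
            plan[expr.name] = "pointer only"
    return plan


# The same command, "collect /s str, funkytown", on two hypothetical machines.
machine_a = [Expr("str", "char *"), Expr("funkytown", "char *")]  # FUNKYTYPE is char
machine_b = [Expr("str", "char *"), Expr("funkytown", "int *")]   # FUNKYTYPE is int

print(plan_collection(machine_a, as_string=True))
print(plan_collection(machine_b, as_string=True))
```

On machine A both expressions pull in string contents; on machine B only str does, so the same trace definition can fill the buffer at very different rates.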
Stan
* Re: [RFC] Collecting strings at tracepoints
From: Doug Evans @ 2010-06-16 18:00 UTC (permalink / raw)
To: Stan Shebs; +Cc: gdb
On Tue, Jun 15, 2010 at 5:09 PM, Stan Shebs <stan@codesourcery.com> wrote:
> As Tom points out, it would actually be "*str@@" etc.
Yeah, I know. I left that out as it wasn't germane to my msg.
>> That feels too inconsistent: "@@" triggers the special "up until the
>> first \0", *except* when it's @/.
>> "up until the first \0" is one thing and specifying a limit is an
>> add-on. Each should have its own syntax [e.g. str@@/80; it's
>> perhaps klunkier, but @@ is klunky to begin with. :-)]
>>
>
> I just threw "@/" out there as something that was parseable. @ is a totally
> general binary operator, the second argument doesn't have to be a constant
> (not even for tracing). So any extensions to it need to be something that
> is not ambiguous with anything else. "@@" for the common case seemed
> logical. Allowing both "@@" and "@@<expr>" could get us into dangling-else
> style ambiguity; given that this is our arbitrary extension, why create
> parsing ambiguity if there is no language syntax forcing us to?
I don't quite follow.
You're going from @ being a binary operator and extending it, to
concerns of @@ vs @@<expr>.
Guessing, you're not really extending @ except visually.
> Second, it would apply to
> everything in the collection line, whether you realized it or not; I can see
> users getting burned because FUNKYTYPE is typedef'ed to char on some
> machines and not others, and so "collect /s str, funkytown" may fill the
> trace buffer unexpectedly quickly.
Ah. I wasn't aware one could do "collect a,b,c,d".
* Re: [RFC] Collecting strings at tracepoints
From: Stan Shebs @ 2010-06-16 18:18 UTC (permalink / raw)
To: Doug Evans; +Cc: Stan Shebs, gdb
Doug Evans wrote:
> On Tue, Jun 15, 2010 at 5:09 PM, Stan Shebs <stan@codesourcery.com> wrote:
>
>> I just threw "@/" out there as something that was parseable. @ is a totally
>> general binary operator, the second argument doesn't have to be a constant
>> (not even for tracing). So any extensions to it need to be something that
>> is not ambiguous with anything else. "@@" for the common case seemed
>> logical. Allowing both "@@" and "@@<expr>" could get us into dangling-else
>> style ambiguity; given that this is our arbitrary extension, why create
>> parsing ambiguity if there is no language syntax forcing us to?
>>
>
> I don't quite follow.
> You're going from @ being a binary operator and extending it, to
> concerns of @@ vs @@<expr>.
> Guessing, you're not really extending @ except visually.
>
>
That's right. Partly because the expedient for string collection right
now is "*str@40", so it extends a known behavior, and partly because '@'
is about the only character that isn't already claimed by language
and/or GDB command syntax.
As far as parsing goes, it wasn't obvious to me whether it makes more
sense to add new tokens like "@@" etc, or to add syntax rules using
single-char tokens. I haven't actually tried implementing anything yet,
although looking at the calendar, I think I'd better get busy. :-)
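For the "new tokens" option, a minimal Python sketch of a longest-match lexer follows. It is not GDB's expression parser; the token names STRING_AT and LIMIT_AT and the tokenize function are assumptions made for illustration.

```python
import re

# Longer operators are listed first so "@@" and "@/" win over a plain "@".
TOKEN_RE = re.compile(r"""
    (?P<STRING_AT>@@)        # hypothetical "collect until NUL" operator
  | (?P<LIMIT_AT>@/)         # hypothetical "collect until NUL or limit" operator
  | (?P<AT>@)                # existing binary "@" operator
  | (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<PUNCT>[*()\[\]])
  | (?P<SPACE>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Split an expression into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("*str@40"))
print(tokenize("*str@@"))
print(tokenize("*str@/80"))
```

A lexer like this only separates the spellings. Whether "@@" may also take a right-hand operand is still a grammar decision, which is where the ambiguity concern discussed earlier in the thread would arise.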
It occurs to me that /s and @@ are not mutually exclusive, and it
wouldn't be bad if both forms were available. Users like it when they
can guess at the syntax and everything works as expected. :-)
Stan
* Re: [RFC] Collecting strings at tracepoints
From: Tom Tromey @ 2010-06-16 18:27 UTC (permalink / raw)
To: Stan Shebs; +Cc: Doug Evans, gdb
>>>>> "Stan" == Stan Shebs <stan@codesourcery.com> writes:
Stan> That's right. Partly because the expedient for string collection
Stan> right now is "*str@40", so it extends a known behavior, and partly
Stan> because '@' is about the only character that isn't already claimed by
Stan> language and/or GDB command syntax.
Not to bikeshed this too much, but another idea would be to extend the
current syntax:
*str @ -1 /* find the \0 */
*str @ { -1, 80 } /* find the \0, or stop at 80 */
Tom
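One way to read the two forms above is sketched below in Python. This is only an interpretation: resolve_count is a hypothetical helper, the brace form is modelled as a (count, limit) tuple, and the sketch assumes the terminating 0 byte counts toward the length.

```python
def resolve_count(memory, offset, spec):
    """Turn a count spec into a concrete byte count.

    `spec` is an int (a plain count, or -1 meaning "find the NUL") or a
    (count, limit) pair modelling the brace form.
    """
    count, limit = spec if isinstance(spec, tuple) else (spec, None)
    if count != -1:
        return count                           # ordinary "@ N" behaviour
    end = memory.find(b"\0", offset)
    if end == -1:
        length = len(memory) - offset          # no terminator found
    else:
        length = end - offset + 1              # include the terminator
    return length if limit is None else min(length, limit)


memory = b"hello\0junk"
print(resolve_count(memory, 0, 40))         # plain count is passed through
print(resolve_count(memory, 0, -1))         # scan for the terminator
print(resolve_count(memory, 0, (-1, 3)))    # scan, but stop at the limit
```

Under this reading the existing "@" operator keeps its meaning for non-negative counts, and only the special value -1 triggers the scan.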