public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string
@ 2012-08-03 10:30 mjw at redhat dot com
  2012-08-04 22:58 ` [Bug translator/14431] " mjw at redhat dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2012-08-03 10:30 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=14431

             Bug #: 14431
           Summary: NULL/invalid char * pretty printed as "<unknown>"
                    string
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: translator
        AssignedTo: systemtap@sourceware.org
        ReportedBy: mjw@redhat.com
    Classification: Unclassified


If you have a char or unsigned char pointer and pretty print it, but it doesn't
actually point to a valid string, then the result is not very useful. char *
and unsigned char * are often used as arbitrary pointers/markers in code, not
just for strings.

Contrived example:

#include <malloc.h>
#include <stddef.h>

struct foobar
{
  char *foo;
  size_t foo_size;
  unsigned char *bar;
  size_t bar_size;
};

static struct foobar *fb;

unsigned char
use ()
{
  if (fb->foo) *fb->foo = 'a';
  return fb->bar_size > 0 ? *fb->bar : 0;
}

int
main (int argc, char **argv)
{
  struct foobar stack;
  fb = &stack;
  fb->foo_size = 42;
  fb->bar_size = 47;
  fb->foo = (char *) calloc (fb->foo_size, 1);
  fb->bar = (unsigned char *) calloc (fb->bar_size, 1);
  fb->bar_size = use (&fb);
  fb->foo = NULL;
  fb->bar = fb->foo + fb->foo_size;
  fb->bar_size = 0;
  return use (&fb);
}

$ gcc -g -o foobar foobar.c 
$ stap -e 'probe process.function("use") { log($fb$) }' -c ./foobar
{.foo="", .foo_size=42, .bar="", .bar_size=47}
{.foo="<unknown>", .foo_size=42, .bar="<unknown>", .bar_size=0}

It would have been convenient if those last two foo/bar pointers were
represented by something different than just "<unknown>".

Maybe "<unknown@0x0>" and "<unknown@0x42>"?

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug translator/14431] NULL/invalid char * pretty printed as "<unknown>" string
  2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
@ 2012-08-04 22:58 ` mjw at redhat dot com
  2012-08-05 20:29 ` jistone at redhat dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2012-08-04 22:58 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=14431

--- Comment #1 from Mark Wielaard <mjw at redhat dot com> 2012-08-04 22:57:54 UTC ---
What about we always also add the address of a char array?

diff --git a/tapsets.cxx b/tapsets.cxx
index 1068a42..edeb30c 100644
--- a/tapsets.cxx
+++ b/tapsets.cxx
@@ -2893,18 +2893,20 @@ dwarf_pretty_print::print_chars (Dwarf_Die* start_type,
target_symbol* e,
   const char *name = dwarf_diename (&type);
   if (name && (name == string("char") || name == string("unsigned char")))
     {
-      if (push_deref (pf, "\"%s\"", e))
+      if (push_deref (pf, "\"%s\"@%p", e))
         {
           // steal the last arg for a string access
           assert (!pf->args.empty());
+          expression* expr = pf->args.back();
           functioncall* fcall = new functioncall;
           fcall->tok = e->tok;
           fcall->function = userspace_p ? "user_string2" : "kernel_string2";
-          fcall->args.push_back (pf->args.back());
+          fcall->args.push_back (expr);
           expression *err_msg = new literal_string ("<unknown>");
           err_msg->tok = e->tok;
           fcall->args.push_back (err_msg);
           pf->args.back() = fcall;
+         pf->args.push_back (expr);
         }
       return true;
     }

For our contrived example above that would give:

{.foo=""@0xcb4010, .foo_size=42, .bar=""@0xcb4050, .bar_size=47}
{.foo="<unknown>"@0x0, .foo_size=42, .bar="<unknown>"@0x2a, .bar_size=0}

Does that look reasonable or silly?

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug translator/14431] NULL/invalid char * pretty printed as "<unknown>" string
  2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
  2012-08-04 22:58 ` [Bug translator/14431] " mjw at redhat dot com
@ 2012-08-05 20:29 ` jistone at redhat dot com
  2012-08-06  8:43 ` [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address mjw at redhat dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jistone at redhat dot com @ 2012-08-05 20:29 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=14431

Josh Stone <jistone at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jistone at redhat dot com

--- Comment #2 from Josh Stone <jistone at redhat dot com> 2012-08-05 20:29:44 UTC ---
(In reply to comment #1)
> What about we always also add the address of a char array?
[...]
> For our contrived example above that would give:
> 
> {.foo=""@0xcb4010, .foo_size=42, .bar=""@0xcb4050, .bar_size=47}
> {.foo="<unknown>"@0x0, .foo_size=42, .bar="<unknown>"@0x2a, .bar_size=0}
> 
> Does that look reasonable or silly?

It's tough to automagically do The Right Thing without knowing what the user is
looking for.  And in some cases, especially with kernel pointers, the %p may
overshadow the actual string, something like:

> {.foo=""@0xffffffffffcb4010, .foo_size=42, .bar=""@0xffffffffffcb4050, .bar_size=47}

Rather than trying both string value and address every time, maybe we should
make new kernel/user_string variants that return either a quoted string *or* a
pointer value, e.g. "\"foo\"" or "0x2a" depending on whether it's readable. 
Maybe "?@0x2a" to indicate it was tried but couldn't be read.  Your examples
would be:

> {.foo="", .foo_size=42, .bar="", .bar_size=47}
> {.foo=?@0x0, .foo_size=42, .bar=?@0x2a, .bar_size=0}

We can only try to be just so smart... for people who know they want something
different, like all pointer addresses, they can always roll their own
pretty-print.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address
  2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
  2012-08-04 22:58 ` [Bug translator/14431] " mjw at redhat dot com
  2012-08-05 20:29 ` jistone at redhat dot com
@ 2012-08-06  8:43 ` mjw at redhat dot com
  2012-08-06 15:32 ` mjw at redhat dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2012-08-06  8:43 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=14431

Mark Wielaard <mjw at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|NULL/invalid char * pretty  |char * always being printed
                   |printed as "<unknown>"      |as, possibly INVALID/NULL
                   |string                      |"<unknown>", string without
                   |                            |giving actual address

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address
  2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
                   ` (2 preceding siblings ...)
  2012-08-06  8:43 ` [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address mjw at redhat dot com
@ 2012-08-06 15:32 ` mjw at redhat dot com
  2012-08-06 17:24 ` jistone at redhat dot com
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2012-08-06 15:32 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=14431

--- Comment #3 from Mark Wielaard <mjw at redhat dot com> 2012-08-06 15:32:38 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > What about we always also add the address of a char array?
> [...]
> > For our contrived example above that would give:
> > 
> > {.foo=""@0xcb4010, .foo_size=42, .bar=""@0xcb4050, .bar_size=47}
> > {.foo="<unknown>"@0x0, .foo_size=42, .bar="<unknown>"@0x2a, .bar_size=0}
> > 
> > Does that look reasonable or silly?
> 
> It's tough to automagically do The Right Thing without knowing what the user is
> looking for.

Yes, that is why we should just give all information instead of guessing what
the user wants.

>  And in some cases, especially with kernel pointers, the %p may
> overshadow the actual string, something like:
> 
> > {.foo=""@0xffffffffffcb4010, .foo_size=42, .bar=""@0xffffffffffcb4050, .bar_size=47}

What do you mean by overshadow?
I think this example shows nicely why you always want the address, since the
strings look the same (are empty - by accident?), but clearly they are not the
same since they are at different addresses.

> Rather than trying both string value and address every time, maybe we should
> make new kernel/user_string variants that return either a quoted string *or* a
> pointer value, e.g. "\"foo\"" or "0x2a" depending on whether it's readable. 
> Maybe "?@0x2a" to indicate it was tried but couldn't be read.  Your examples
> would be:
> 
> > {.foo="", .foo_size=42, .bar="", .bar_size=47}
> > {.foo=?@0x0, .foo_size=42, .bar=?@0x2a, .bar_size=0}
> 
> We can only try to be just so smart...

Yes, that was what I was going for at first. But that seems "too smart".
We might think we are seeing valid strings because there is some zero character
nearby, but that might not be at all what the user was going for (they just use
char * as generic pointer, not as string pointer).

I do like the usage of plain ? instead of "<unknown>". Updated patch below.

> for people who know they want something
> different, like all pointer addresses, they can always roll their own
> pretty-print.

Is that really that simple? I wasn't able to do that easily. Except by printing
the fields individually with my own identifiers.

Updated patch:

diff --git a/tapsets.cxx b/tapsets.cxx
index 1068a42..f3ffaa8 100644
--- a/tapsets.cxx
+++ b/tapsets.cxx
@@ -2893,18 +2893,20 @@ dwarf_pretty_print::print_chars (Dwarf_Die* start_type,
target_symbol* e,
   const char *name = dwarf_diename (&type);
   if (name && (name == string("char") || name == string("unsigned char")))
     {
-      if (push_deref (pf, "\"%s\"", e))
+      if (push_deref (pf, "\"%s\"@%p", e))
         {
           // steal the last arg for a string access
           assert (!pf->args.empty());
+          expression* expr = pf->args.back();
           functioncall* fcall = new functioncall;
           fcall->tok = e->tok;
           fcall->function = userspace_p ? "user_string2" : "kernel_string2";
-          fcall->args.push_back (pf->args.back());
-          expression *err_msg = new literal_string ("<unknown>");
+          fcall->args.push_back (expr);
+          expression *err_msg = new literal_string ("?");
           err_msg->tok = e->tok;
           fcall->args.push_back (err_msg);
           pf->args.back() = fcall;
+          pf->args.push_back (expr);
         }
       return true;
     }

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address
  2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
                   ` (3 preceding siblings ...)
  2012-08-06 15:32 ` mjw at redhat dot com
@ 2012-08-06 17:24 ` jistone at redhat dot com
  2012-08-06 18:59 ` mjw at redhat dot com
  2015-10-21 18:50 ` fche at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: jistone at redhat dot com @ 2012-08-06 17:24 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=14431

--- Comment #4 from Josh Stone <jistone at redhat dot com> 2012-08-06 17:24:42 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > 
> > It's tough to automagically do The Right Thing without knowing what the user is
> > looking for.
> 
> Yes, that is why we should just give all information instead of guessing what
> the user wants.

But too much information will make it difficult to read.

> >  And in some cases, especially with kernel pointers, the %p may
> > overshadow the actual string, something like:
> > 
> > > {.foo=""@0xffffffffffcb4010, .foo_size=42, .bar=""@0xffffffffffcb4050, .bar_size=47}
> 
> What do you mean by overshadow?

I meant that since the value of kernel pointers on 64-bit will always take 16
characters, it clutters the output quite a bit.  That may be a matter of
opinion, but this is supposed to be "pretty" printing.

> I think this example shows nicely why you always want the address, since the
> strings look the same (are empty - by accident?), but clearly they are not the
> same since they are at different addresses.

If you really are looking just for string values, then the memory addresses of
those strings are irrelevant.

> I do like the usage of plain ? instead of "<unknown>".

Note the difference that I proposed it *without* quotes, so it's clear that the
string is not really "?".

> > for people who know they want something
> > different, like all pointer addresses, they can always roll their own
> > pretty-print.
> 
> Is that really that simple? I wasn't able to do that easily. Except by printing
> the fields individually with my own identifiers.

Yes, you'd print fields yourself individually to get the formatting you really
want.

We can't be all things to all people with a heuristic like pretty printing.  I
tried to make it as simple-yet-useful as possible, to hopefully capture most
cases.  Another example is chasing & expanding struct pointers, which we
decided was too difficult to get "right" without cluttering the general case.

So here you have "possibly invalid char*", or worse "valid char* to a
non-string" - I'm not sure we can fully address this without making the "char*
pointing to a string" case worse.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address
  2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
                   ` (4 preceding siblings ...)
  2012-08-06 17:24 ` jistone at redhat dot com
@ 2012-08-06 18:59 ` mjw at redhat dot com
  2015-10-21 18:50 ` fche at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: mjw at redhat dot com @ 2012-08-06 18:59 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=14431

--- Comment #5 from Mark Wielaard <mjw at redhat dot com> 2012-08-06 18:59:22 UTC ---
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > 
> > > It's tough to automagically do The Right Thing without knowing what the user is
> > > looking for.
> > 
> > Yes, that is why we should just give all information instead of guessing what
> > the user wants.
> 
> But too much information will make it difficult to read.

Of course. But having the address there isn't too much, is it?

> I meant that since the value of kernel pointers on 64-bit will always take 16
> characters, it clutters the output quite a bit.  That may be a matter of
> opinion, but this is supposed to be "pretty" printing.

Aha. I think this is "pretty", and not cluttered. But can easily be convinced
of any other formatting. So you have a different suggestion for adding the
address?

> > I do like the usage of plain ? instead of "<unknown>".
> 
> Note the difference that I proposed it *without* quotes, so it's clear that the
> string is not really "?".

ah, yes, but that is somewhat more difficult since then we have to be
more fancy about the results of [kernel|user]_string2(). How about using
"<?>".

> > Is that really that simple? I wasn't able to do that easily. Except by printing
> > the fields individually with my own identifiers.
> 
> Yes, you'd print fields yourself individually to get the formatting you really
> want.

Which isn't really practical for larger/deeper structs.
And [unsigned] char * is kind of special since it is often not used as
string pointer, so warrants having the address around always.

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address
  2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
                   ` (5 preceding siblings ...)
  2012-08-06 18:59 ` mjw at redhat dot com
@ 2015-10-21 18:50 ` fche at redhat dot com
  6 siblings, 0 replies; 8+ messages in thread
From: fche at redhat dot com @ 2015-10-21 18:50 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=14431

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |fche at redhat dot com
         Resolution|---                         |FIXED

--- Comment #6 from Frank Ch. Eigler <fche at redhat dot com> ---
commit 9fcf867f09d0

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-10-21 18:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-03 10:30 [Bug translator/14431] New: NULL/invalid char * pretty printed as "<unknown>" string mjw at redhat dot com
2012-08-04 22:58 ` [Bug translator/14431] " mjw at redhat dot com
2012-08-05 20:29 ` jistone at redhat dot com
2012-08-06  8:43 ` [Bug translator/14431] char * always being printed as, possibly INVALID/NULL "<unknown>", string without giving actual address mjw at redhat dot com
2012-08-06 15:32 ` mjw at redhat dot com
2012-08-06 17:24 ` jistone at redhat dot com
2012-08-06 18:59 ` mjw at redhat dot com
2015-10-21 18:50 ` fche at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).