public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Andrew Burgess <aburgess@redhat.com>
To: Tom Tromey <tom@tromey.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH v2 09/18] Include \0 in printable wide characters
Date: Wed, 23 Feb 2022 13:49:30 +0000	[thread overview]
Message-ID: <20220223134930.GT2571@redhat.com> (raw)
In-Reply-To: <20220217220547.3874030-10-tom@tromey.com>

* Tom Tromey <tom@tromey.com> [2022-02-17 15:05:37 -0700]:

> print_wchar can already display \0, so include it in the list of
> "printable" characters in wchar_printable.  This is useful in a coming
> patch to change Rust to use the generic character-printing code.
> ---
>  gdb/valprint.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gdb/valprint.c b/gdb/valprint.c
> index 545dfbca73f..57201164c87 100644
> --- a/gdb/valprint.c
> +++ b/gdb/valprint.c
> @@ -2160,7 +2160,7 @@ wchar_printable (gdb_wchar_t w)
>  	  || w == LCST ('\a') || w == LCST ('\b')
>  	  || w == LCST ('\f') || w == LCST ('\n')
>  	  || w == LCST ('\r') || w == LCST ('\t')
> -	  || w == LCST ('\v'));
> +	  || w == LCST ('\v') || w == LCST ('\0'));
>  }

I'm not convinced that this change is OK.

Unfortunately, I don't an example that shows this change causing a
failure, but I can explain my thinking for why this bothers me.

My first thought was how can you add this to wchar_printable, without
also adding something to print_wchar.  Your commit message says that
print_wchar already handles \0, but I feel that's a little misleading,
the code path that "handles" \0 is (I think) the path that prints
values as escape sequences, so surely your commit could actually just
be:

  static int
  wchar_printable (gdb_wchar_t w)
  {
    /* print_wchar handles everything!  */
    return true;
  }

So, I tried completely removing wchar_printable, and all its one user,
and now the core of generic_emit_char looks like this:

  while (1)
    {
      int num_chars;
      gdb_wchar_t *chars;
      const gdb_byte *buf;
      size_t buflen;
      enum wchar_iterate_result result;

      num_chars = iter.iterate (&result, &chars, &buf, &buflen);
      if (num_chars < 0)
	break;
      if (num_chars > 0)
	{
	  for (int i = 0; i < num_chars; ++i)
	    print_wchar (chars[i], buf, buflen,
			 TYPE_LENGTH (type), byte_order,
			 &wchar_buf, quoter, &need_escape);
	}
      else
	print_wchar (gdb_WEOF, buf, buflen, TYPE_LENGTH (type),
		     byte_order, &wchar_buf, quoter, &need_escape);
    }

With that change there's no regressions on upstream/master.  And if I
apply this series completely, and make a similar change to
generic_emit_char, I still see no regressions.

So, does this mean that wchar_printable is not needed?

I don't think so.  I think there is a problem with the above change,
and its covered by this comment, which I found slightly cryptic, but I
think I eventually figured out:

  /* If all characters are printable, print them.  Otherwise,
     we're going to have to print an escape sequence.  We
     check all characters because we want to print the target
     bytes in the escape sequence, and we don't know character
     boundaries there.  */

My confusion here is that I initially thought; if we have multiple
characters, some that are printable, and some that are not, then
surely, we would want to print the initial printable ones for real,
and only later switch to escape sequences, right?

Except, that's not what we do.

And the reason (probably obvious to quicker minds than mine) is that
characters might have different widths, so we can't "just" print the
initial characters, and then print the unprintable as escape
sequences, as we wouldn't know where in BUF the unprintable character
actually starts.

OK, so my idea of removing wchar_printable is clearly a bad idea, but
how does this relate to your change?

Well, prior to this patch, if we had 3 characters, the first two are
printable, and the third was \0, we would spot the non-printable \0,
and so print the whole buffer, all 3 characters, as escape sequences.

With this patch, all 3 characters will appear to be printable.  So now
we will print the first character, just fine.  Then print the second
character just fine.  Now for the third character, the \0, we call to
print_wchar.  The \0 is not handled by anything but the 'default' case
of the switch.

In the default case, the \0 is non-printable, so we end up in the
escape sequence printing code, which then tries to load bytes starting
from BUF - which isn't going to be correct.

Now, this is where things are a bit weird.  The code in
generic_emit_char is clearly written to handle multiple characters,
but, I've only ever seen it print 1 character, which is why, I claim,
your above change to wchar_printable works.

I'm of the opinion that passing buf, buflen, the TYPE_LENGTH(type),
and byte_order to the first print_wchar call (in generic_emit_char) is
a bad idea, if we hadn't done that then your proposed change would
never have worked.

I propose that there should be three tightly coupled, and related
functions:

  bool wchar_printable (gdb_wchar_t w);
  void print_wchar (gdb_wint_t w, struct obstack *output, int quoter, int *need_escapep);
  void print_wchar_escape (const gdb_byte *orig, int orig_len, int width,
			   enum bfd_endian byte_order, struct obstack *output,
			   int quoter, int *need_escapep);

With the third one of these being entirely new.  I think that the
wchar_printable _must_ return true only for characters that
print_wchar can handle.  If print_wchar sees something it can't
handle, then this is an internal_error situation.

As for the impact on your patch series, my thinking is that you might
need to consider replacing emit_char_ftype with an actual
emit_char_handler object reference.  We'd then have:

  struct emit_wchar_handler
  {
    virtual bool is_printable (gdb_wchar_t w)
    { .... }

    void print (gdb_wint_t w, struct obstack *output,
                int quoter, int *need_escapep)
    { ... }

    virtual void print_wchar_escape (const gdb_byte *orig, int orig_len, int width,
			             enum bfd_endian byte_order, struct obstack *output,
			             int quoter, int *need_escapep)
    { ... }
  };

an instance of this would replace default_emit_wchar in the valprint.h
API.  And languages can create a sub-class of this if they need too.

Have I missed something here?

Thanks,
Andrew


  reply	other threads:[~2022-02-23 13:49 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-17 22:05 [PATCH v2 00/18] Refactor character printing Tom Tromey
2022-02-17 22:05 ` [PATCH v2 01/18] Fix latent quote char bug in generic_printstr Tom Tromey
2022-02-17 22:05 ` [PATCH v2 02/18] Boolify need_escape in generic_emit_char Tom Tromey
2022-02-17 22:05 ` [PATCH v2 03/18] Remove c_emit_char Tom Tromey
2022-02-17 22:05 ` [PATCH v2 04/18] Remove c_printstr Tom Tromey
2022-02-17 22:05 ` [PATCH v2 05/18] Don't use wchar_printable in print_wchar Tom Tromey
2022-02-22 15:36   ` Andrew Burgess
2022-10-10 16:39     ` Tom Tromey
2022-02-17 22:05 ` [PATCH v2 06/18] Fix a latent bug " Tom Tromey
2022-02-17 22:05 ` [PATCH v2 07/18] Remove language_defn::emitchar Tom Tromey
2022-02-17 22:05 ` [PATCH v2 08/18] Add gdb_iswcntrl Tom Tromey
2022-02-17 22:05 ` [PATCH v2 09/18] Include \0 in printable wide characters Tom Tromey
2022-02-23 13:49   ` Andrew Burgess [this message]
2022-02-23 22:28     ` Tom Tromey
2022-02-23 23:59       ` Tom Tromey
2022-02-17 22:05 ` [PATCH v2 10/18] Use a ui_file in print_wchar Tom Tromey
2022-02-17 22:05 ` [PATCH v2 11/18] Add an emitter callback to generic_printstr and generic_emit_char Tom Tromey
2022-02-17 22:05 ` [PATCH v2 12/18] Add a default encoding to generic_emit_char and generic_printstr Tom Tromey
2022-02-17 22:05 ` [PATCH v2 13/18] Change generic_emit_char to print the quotes Tom Tromey
2022-02-17 22:05 ` [PATCH v2 14/18] Use generic_emit_char in Rust Tom Tromey
2022-02-17 22:05 ` [PATCH v2 15/18] Use generic_emit_char in Ada Tom Tromey
2022-02-17 22:05 ` [PATCH v2 16/18] Use generic_emit_char in Modula-2 Tom Tromey
2022-02-23 20:17   ` Gaius Mulley
2022-03-16 12:29   ` [PATCH] Additional modula2 tests Gaius Mulley
2022-04-07 14:21     ` Tom Tromey
2022-04-09 23:16       ` Gaius Mulley
2022-04-11 19:45   ` [PATCH v1] Array access in Modula-2 Gaius Mulley
2022-02-17 22:05 ` [PATCH v2 17/18] Use generic_emit_char in Pascal Tom Tromey
2022-02-17 22:05 ` [PATCH v2 18/18] Simplify Fortran string printing Tom Tromey
2022-10-10 17:37 ` [PATCH v2 00/18] Refactor character printing Tom Tromey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220223134930.GT2571@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).