From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <aburgess@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTPS id 67151394881A
 for <gdb-patches@sourceware.org>; Wed, 23 Feb 2022 13:49:35 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 67151394881A
Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com
 [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-20-GIuGVHH8OfynrUgywPMpkA-1; Wed, 23 Feb 2022 08:49:33 -0500
X-MC-Unique: GIuGVHH8OfynrUgywPMpkA-1
Received: by mail-wr1-f69.google.com with SMTP id
 e1-20020adfa741000000b001e2e74c3d4eso10204680wrd.12
 for <gdb-patches@sourceware.org>; Wed, 23 Feb 2022 05:49:33 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to;
 bh=1d+xQfhn/sYp/OiqKSal7Ayjkn36kjFv0uzlZFHX+lU=;
 b=3AfaGs7bHSep4eAsOstZS1Bhwa/DcPkrYotO3et3taol+Daz6yaBqDIpoIhqzCqsia
 osDB0kMGUZWbUmFG7bGWJQEuCbb8dpYKMxMSbht3iOChdCdfLFRYkEVjv+h/wn70WYPT
 Wfr8e9FBDs5n8GiXMOFE3CFflYZncQc4pNLp518bkZhX/7QbIVWEH4LKvJ3sF8+FtE3x
 TU1Ewip6d6764xj/DLBMFN1Mgn7fylj266NWKSMvkKVMSuWq22uoLK8/adtRlbwxd9Nh
 TIG9AtNQBtP1LO+cWNzTmc6ZDq6nMtf22K5/GnbRu1wMmt+ELu3ygY8/gqgo7Wb82UEA
 Q+TA==
X-Gm-Message-State: AOAM533slfXYDfxBnZOOrBrTsvBEvMVTVqOPsS1k5SYE/D8Ma3nnXfGU
 LVpzyiSN28acwynM6wBLrIXIIKGLQ6Y20cxTfTNLiMY3wGIzLlAmcSsm96Q6Zco26zDFrO5VvfE
 KcYp89tqhhltD3IsQHtpAEw==
X-Received: by 2002:a7b:c245:0:b0:37c:440:c17 with SMTP id
 b5-20020a7bc245000000b0037c04400c17mr7536161wmj.130.1645624172235; 
 Wed, 23 Feb 2022 05:49:32 -0800 (PST)
X-Google-Smtp-Source: ABdhPJyZC5UTZTd4QhgIRKvrlUJ6VSwwRMVnu5u7nhja54pNWlaq/DaMDG7HgwI9BVtwyF38x9mU9Q==
X-Received: by 2002:a7b:c245:0:b0:37c:440:c17 with SMTP id
 b5-20020a7bc245000000b0037c04400c17mr7536145wmj.130.1645624171876; 
 Wed, 23 Feb 2022 05:49:31 -0800 (PST)
Received: from localhost (host86-169-131-29.range86-169.btcentralplus.com.
 [86.169.131.29]) by smtp.gmail.com with ESMTPSA id
 p6-20020a05600c358600b00354d399ef32sm5825791wmq.39.2022.02.23.05.49.30
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 23 Feb 2022 05:49:31 -0800 (PST)
Date: Wed, 23 Feb 2022 13:49:30 +0000
From: Andrew Burgess <aburgess@redhat.com>
To: Tom Tromey <tom@tromey.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH v2 09/18] Include \0 in printable wide characters
Message-ID: <20220223134930.GT2571@redhat.com>
References: <20220217220547.3874030-1-tom@tromey.com>
 <20220217220547.3874030-10-tom@tromey.com>
MIME-Version: 1.0
In-Reply-To: <20220217220547.3874030-10-tom@tromey.com>
X-Operating-System: Linux/5.8.18-100.fc31.x86_64 (x86_64)
X-Uptime: 13:00:19 up 12 days,  2:39,
X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ]
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Feb 2022 13:49:39 -0000

* Tom Tromey <tom@tromey.com> [2022-02-17 15:05:37 -0700]:

> print_wchar can already display \0, so include it in the list of
> "printable" characters in wchar_printable.  This is useful in a coming
> patch to change Rust to use the generic character-printing code.
> ---
>  gdb/valprint.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gdb/valprint.c b/gdb/valprint.c
> index 545dfbca73f..57201164c87 100644
> --- a/gdb/valprint.c
> +++ b/gdb/valprint.c
> @@ -2160,7 +2160,7 @@ wchar_printable (gdb_wchar_t w)
>  	  || w == LCST ('\a') || w == LCST ('\b')
>  	  || w == LCST ('\f') || w == LCST ('\n')
>  	  || w == LCST ('\r') || w == LCST ('\t')
> -	  || w == LCST ('\v'));
> +	  || w == LCST ('\v') || w == LCST ('\0'));
>  }

I'm not convinced that this change is OK.

Unfortunately, I don't an example that shows this change causing a
failure, but I can explain my thinking for why this bothers me.

My first thought was how can you add this to wchar_printable, without
also adding something to print_wchar.  Your commit message says that
print_wchar already handles \0, but I feel that's a little misleading,
the code path that "handles" \0 is (I think) the path that prints
values as escape sequences, so surely your commit could actually just
be:

  static int
  wchar_printable (gdb_wchar_t w)
  {
    /* print_wchar handles everything!  */
    return true;
  }

So, I tried completely removing wchar_printable, and all its one user,
and now the core of generic_emit_char looks like this:

  while (1)
    {
      int num_chars;
      gdb_wchar_t *chars;
      const gdb_byte *buf;
      size_t buflen;
      enum wchar_iterate_result result;

      num_chars = iter.iterate (&result, &chars, &buf, &buflen);
      if (num_chars < 0)
	break;
      if (num_chars > 0)
	{
	  for (int i = 0; i < num_chars; ++i)
	    print_wchar (chars[i], buf, buflen,
			 TYPE_LENGTH (type), byte_order,
			 &wchar_buf, quoter, &need_escape);
	}
      else
	print_wchar (gdb_WEOF, buf, buflen, TYPE_LENGTH (type),
		     byte_order, &wchar_buf, quoter, &need_escape);
    }

With that change there's no regressions on upstream/master.  And if I
apply this series completely, and make a similar change to
generic_emit_char, I still see no regressions.

So, does this mean that wchar_printable is not needed?

I don't think so.  I think there is a problem with the above change,
and its covered by this comment, which I found slightly cryptic, but I
think I eventually figured out:

  /* If all characters are printable, print them.  Otherwise,
     we're going to have to print an escape sequence.  We
     check all characters because we want to print the target
     bytes in the escape sequence, and we don't know character
     boundaries there.  */

My confusion here is that I initially thought; if we have multiple
characters, some that are printable, and some that are not, then
surely, we would want to print the initial printable ones for real,
and only later switch to escape sequences, right?

Except, that's not what we do.

And the reason (probably obvious to quicker minds than mine) is that
characters might have different widths, so we can't "just" print the
initial characters, and then print the unprintable as escape
sequences, as we wouldn't know where in BUF the unprintable character
actually starts.

OK, so my idea of removing wchar_printable is clearly a bad idea, but
how does this relate to your change?

Well, prior to this patch, if we had 3 characters, the first two are
printable, and the third was \0, we would spot the non-printable \0,
and so print the whole buffer, all 3 characters, as escape sequences.

With this patch, all 3 characters will appear to be printable.  So now
we will print the first character, just fine.  Then print the second
character just fine.  Now for the third character, the \0, we call to
print_wchar.  The \0 is not handled by anything but the 'default' case
of the switch.

In the default case, the \0 is non-printable, so we end up in the
escape sequence printing code, which then tries to load bytes starting
from BUF - which isn't going to be correct.

Now, this is where things are a bit weird.  The code in
generic_emit_char is clearly written to handle multiple characters,
but, I've only ever seen it print 1 character, which is why, I claim,
your above change to wchar_printable works.

I'm of the opinion that passing buf, buflen, the TYPE_LENGTH(type),
and byte_order to the first print_wchar call (in generic_emit_char) is
a bad idea, if we hadn't done that then your proposed change would
never have worked.

I propose that there should be three tightly coupled, and related
functions:

  bool wchar_printable (gdb_wchar_t w);
  void print_wchar (gdb_wint_t w, struct obstack *output, int quoter, int *need_escapep);
  void print_wchar_escape (const gdb_byte *orig, int orig_len, int width,
			   enum bfd_endian byte_order, struct obstack *output,
			   int quoter, int *need_escapep);

With the third one of these being entirely new.  I think that the
wchar_printable _must_ return true only for characters that
print_wchar can handle.  If print_wchar sees something it can't
handle, then this is an internal_error situation.

As for the impact on your patch series, my thinking is that you might
need to consider replacing emit_char_ftype with an actual
emit_char_handler object reference.  We'd then have:

  struct emit_wchar_handler
  {
    virtual bool is_printable (gdb_wchar_t w)
    { .... }

    void print (gdb_wint_t w, struct obstack *output,
                int quoter, int *need_escapep)
    { ... }

    virtual void print_wchar_escape (const gdb_byte *orig, int orig_len, int width,
			             enum bfd_endian byte_order, struct obstack *output,
			             int quoter, int *need_escapep)
    { ... }
  };

an instance of this would replace default_emit_wchar in the valprint.h
API.  And languages can create a sub-class of this if they need too.

Have I missed something here?

Thanks,
Andrew