From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 67151394881A for ; Wed, 23 Feb 2022 13:49:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 67151394881A Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-20-GIuGVHH8OfynrUgywPMpkA-1; Wed, 23 Feb 2022 08:49:33 -0500 X-MC-Unique: GIuGVHH8OfynrUgywPMpkA-1 Received: by mail-wr1-f69.google.com with SMTP id e1-20020adfa741000000b001e2e74c3d4eso10204680wrd.12 for ; Wed, 23 Feb 2022 05:49:33 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=1d+xQfhn/sYp/OiqKSal7Ayjkn36kjFv0uzlZFHX+lU=; b=3AfaGs7bHSep4eAsOstZS1Bhwa/DcPkrYotO3et3taol+Daz6yaBqDIpoIhqzCqsia osDB0kMGUZWbUmFG7bGWJQEuCbb8dpYKMxMSbht3iOChdCdfLFRYkEVjv+h/wn70WYPT Wfr8e9FBDs5n8GiXMOFE3CFflYZncQc4pNLp518bkZhX/7QbIVWEH4LKvJ3sF8+FtE3x TU1Ewip6d6764xj/DLBMFN1Mgn7fylj266NWKSMvkKVMSuWq22uoLK8/adtRlbwxd9Nh TIG9AtNQBtP1LO+cWNzTmc6ZDq6nMtf22K5/GnbRu1wMmt+ELu3ygY8/gqgo7Wb82UEA Q+TA== X-Gm-Message-State: AOAM533slfXYDfxBnZOOrBrTsvBEvMVTVqOPsS1k5SYE/D8Ma3nnXfGU LVpzyiSN28acwynM6wBLrIXIIKGLQ6Y20cxTfTNLiMY3wGIzLlAmcSsm96Q6Zco26zDFrO5VvfE KcYp89tqhhltD3IsQHtpAEw== X-Received: by 2002:a7b:c245:0:b0:37c:440:c17 with SMTP id b5-20020a7bc245000000b0037c04400c17mr7536161wmj.130.1645624172235; Wed, 23 Feb 2022 05:49:32 -0800 (PST) X-Google-Smtp-Source: ABdhPJyZC5UTZTd4QhgIRKvrlUJ6VSwwRMVnu5u7nhja54pNWlaq/DaMDG7HgwI9BVtwyF38x9mU9Q== X-Received: by 2002:a7b:c245:0:b0:37c:440:c17 with SMTP id b5-20020a7bc245000000b0037c04400c17mr7536145wmj.130.1645624171876; Wed, 23 Feb 2022 05:49:31 -0800 (PST) Received: from localhost (host86-169-131-29.range86-169.btcentralplus.com. [86.169.131.29]) by smtp.gmail.com with ESMTPSA id p6-20020a05600c358600b00354d399ef32sm5825791wmq.39.2022.02.23.05.49.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Feb 2022 05:49:31 -0800 (PST) Date: Wed, 23 Feb 2022 13:49:30 +0000 From: Andrew Burgess To: Tom Tromey Cc: gdb-patches@sourceware.org Subject: Re: [PATCH v2 09/18] Include \0 in printable wide characters Message-ID: <20220223134930.GT2571@redhat.com> References: <20220217220547.3874030-1-tom@tromey.com> <20220217220547.3874030-10-tom@tromey.com> MIME-Version: 1.0 In-Reply-To: <20220217220547.3874030-10-tom@tromey.com> X-Operating-System: Linux/5.8.18-100.fc31.x86_64 (x86_64) X-Uptime: 13:00:19 up 12 days, 2:39, X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ] X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Feb 2022 13:49:39 -0000 * Tom Tromey [2022-02-17 15:05:37 -0700]: > print_wchar can already display \0, so include it in the list of > "printable" characters in wchar_printable. This is useful in a coming > patch to change Rust to use the generic character-printing code. > --- > gdb/valprint.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gdb/valprint.c b/gdb/valprint.c > index 545dfbca73f..57201164c87 100644 > --- a/gdb/valprint.c > +++ b/gdb/valprint.c > @@ -2160,7 +2160,7 @@ wchar_printable (gdb_wchar_t w) > || w == LCST ('\a') || w == LCST ('\b') > || w == LCST ('\f') || w == LCST ('\n') > || w == LCST ('\r') || w == LCST ('\t') > - || w == LCST ('\v')); > + || w == LCST ('\v') || w == LCST ('\0')); > } I'm not convinced that this change is OK. Unfortunately, I don't an example that shows this change causing a failure, but I can explain my thinking for why this bothers me. My first thought was how can you add this to wchar_printable, without also adding something to print_wchar. Your commit message says that print_wchar already handles \0, but I feel that's a little misleading, the code path that "handles" \0 is (I think) the path that prints values as escape sequences, so surely your commit could actually just be: static int wchar_printable (gdb_wchar_t w) { /* print_wchar handles everything! */ return true; } So, I tried completely removing wchar_printable, and all its one user, and now the core of generic_emit_char looks like this: while (1) { int num_chars; gdb_wchar_t *chars; const gdb_byte *buf; size_t buflen; enum wchar_iterate_result result; num_chars = iter.iterate (&result, &chars, &buf, &buflen); if (num_chars < 0) break; if (num_chars > 0) { for (int i = 0; i < num_chars; ++i) print_wchar (chars[i], buf, buflen, TYPE_LENGTH (type), byte_order, &wchar_buf, quoter, &need_escape); } else print_wchar (gdb_WEOF, buf, buflen, TYPE_LENGTH (type), byte_order, &wchar_buf, quoter, &need_escape); } With that change there's no regressions on upstream/master. And if I apply this series completely, and make a similar change to generic_emit_char, I still see no regressions. So, does this mean that wchar_printable is not needed? I don't think so. I think there is a problem with the above change, and its covered by this comment, which I found slightly cryptic, but I think I eventually figured out: /* If all characters are printable, print them. Otherwise, we're going to have to print an escape sequence. We check all characters because we want to print the target bytes in the escape sequence, and we don't know character boundaries there. */ My confusion here is that I initially thought; if we have multiple characters, some that are printable, and some that are not, then surely, we would want to print the initial printable ones for real, and only later switch to escape sequences, right? Except, that's not what we do. And the reason (probably obvious to quicker minds than mine) is that characters might have different widths, so we can't "just" print the initial characters, and then print the unprintable as escape sequences, as we wouldn't know where in BUF the unprintable character actually starts. OK, so my idea of removing wchar_printable is clearly a bad idea, but how does this relate to your change? Well, prior to this patch, if we had 3 characters, the first two are printable, and the third was \0, we would spot the non-printable \0, and so print the whole buffer, all 3 characters, as escape sequences. With this patch, all 3 characters will appear to be printable. So now we will print the first character, just fine. Then print the second character just fine. Now for the third character, the \0, we call to print_wchar. The \0 is not handled by anything but the 'default' case of the switch. In the default case, the \0 is non-printable, so we end up in the escape sequence printing code, which then tries to load bytes starting from BUF - which isn't going to be correct. Now, this is where things are a bit weird. The code in generic_emit_char is clearly written to handle multiple characters, but, I've only ever seen it print 1 character, which is why, I claim, your above change to wchar_printable works. I'm of the opinion that passing buf, buflen, the TYPE_LENGTH(type), and byte_order to the first print_wchar call (in generic_emit_char) is a bad idea, if we hadn't done that then your proposed change would never have worked. I propose that there should be three tightly coupled, and related functions: bool wchar_printable (gdb_wchar_t w); void print_wchar (gdb_wint_t w, struct obstack *output, int quoter, int *need_escapep); void print_wchar_escape (const gdb_byte *orig, int orig_len, int width, enum bfd_endian byte_order, struct obstack *output, int quoter, int *need_escapep); With the third one of these being entirely new. I think that the wchar_printable _must_ return true only for characters that print_wchar can handle. If print_wchar sees something it can't handle, then this is an internal_error situation. As for the impact on your patch series, my thinking is that you might need to consider replacing emit_char_ftype with an actual emit_char_handler object reference. We'd then have: struct emit_wchar_handler { virtual bool is_printable (gdb_wchar_t w) { .... } void print (gdb_wint_t w, struct obstack *output, int quoter, int *need_escapep) { ... } virtual void print_wchar_escape (const gdb_byte *orig, int orig_len, int width, enum bfd_endian byte_order, struct obstack *output, int quoter, int *need_escapep) { ... } }; an instance of this would replace default_emit_wchar in the valprint.h API. And languages can create a sub-class of this if they need too. Have I missed something here? Thanks, Andrew