From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 9205 invoked by alias); 6 Dec 2019 15:54:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 9196 invoked by uid 89); 6 Dec 2019 15:54:38 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-23.3 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,SPF_PASS autolearn=ham version=3.3.1 spammy=awareness, remind, portion, emoji X-HELO: us-smtp-1.mimecast.com Received: from us-smtp-delivery-1.mimecast.com (HELO us-smtp-1.mimecast.com) (205.139.110.120) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 06 Dec 2019 15:54:36 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1575647675; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pLczbJ0tySA4vzlUeus9xpVXDGs/kFnqiaSa0BsZicY=; b=UEuxECEZzDWF2/yVfV+X+I6meS+8u0Rx6vC61zUjUWRfGfJYYCKz1kEs7O9DsJjerms7o5 f9cH3DsuYQQCZCkRfTY+7723KnRq7PQLC7NQRTQ9GK1lN3C+TE4SDLtWXdXHFRDVtvFuzc BVbDen5SLxpz2S2oq93+BJFsWf3p3+g= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-336-9IGVlkXTNuKsWhx-siv3xw-1; Fri, 06 Dec 2019 10:54:33 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7898810866F6; Fri, 6 Dec 2019 15:54:32 +0000 (UTC) Received: from ovpn-116-76.phx2.redhat.com (ovpn-116-76.phx2.redhat.com [10.3.116.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id B5AE577340; Fri, 6 Dec 2019 15:54:31 +0000 (UTC) Message-ID: Subject: Re: [PATCH] Multibyte awareness for diagnostics (PR 49973) From: David Malcolm To: Lewis Hyatt Cc: gcc-patches@gcc.gnu.org, joseph@codesourcery.com Date: Fri, 06 Dec 2019 15:54:00 -0000 In-Reply-To: <20191126162842.GA43936@ldh.local> References: <20190926201639.GA82807@ldh.local> <20190927204144.GA86720@ldh.local> <8a30a5a30078714f399822b0513b070af59c3e88.camel@redhat.com> <20191120162708.GA45237@ldh.local> <20191120163543.GA45766@ldh.local> <1d0ea3400a755748caa6b07884d769988473a346.camel@redhat.com> <20191126162842.GA43936@ldh.local> User-Agent: Evolution 3.32.4 (3.32.4-1.fc30) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2019-12/txt/msg00428.txt.bz2 On Tue, 2019-11-26 at 11:28 -0500, Lewis Hyatt wrote: > New version 4 patch attached, and responses below too. > Thanks for the various updates; sorry about the delay in responding. [...] > > As noted above, m_x_offset should be renamed to clarify its units > > ("m_x_offset_display"?) > > > > Can you move this calculation of the offset to a subroutine please. > > (I wonder if it can be unit-tested, but don't feel obliged to). > > > > Done. I also split the calculation of m_linenum_width into a > subroutine for consistency, and added new selftests that cover both. > > One thing that came up when setting up the selftests. The existing > code > offsets the caret position by (m_linenum_width + 2) when comparing it > to the > physical end of the display. I think this should be (m_linenum_width > + 3), > because the line number is followed by the three-character string " | > " > prior to the start of the source line. I went ahead and made that > change. This was one part of the patch I wasn't sure about, but I've been re-reading it and have come round. We emit a space after the '|' for m_x_offset's column, which is column 0 for the "no offset" case. There are a few places where move_to_column is used to reset things after a possible newline. I think I've had in my mind the idea that there's an initial space after the '|' that terminates the left margin, and then the source begins, but for some reason I thought that if we were offsetting, then the offset source could potentially begin right up against the '|' character. On re-reading trunk's implementation, I think that that isn't the current behavior, and that we simply have a " | " margin as your patch describes, with the offset source text being printed after it. Indeed, I think we ought to always have at least some indent of the source lines, so that they stand out from the diagnostic lines. > It did require a corresponding change to resolve some new testsuites > failure > afterwards, in this file: > > gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c > > The expected output just needs to shift left by one column, which I > did in > this version of the patch as well. It seems to make sense to me, but > I > wanted to mention in case I am missing something subtle here. I made > a > similar adjustment in case line numbers are not being output; here we > still > output a space before every line so it feels like that should be > taken into > account. > This calculation still does not attempt to take into account whether > pp_print_prefix () will do anything, not sure if that is desired or > not, but > it was the existing behavior. > > I am glad you brought this up, as the logic is a little tricky and > another > issue was fixed when I separated it out and worked through the tests: > some > source lines cannot be offset by exactly as many display columns as > dictated > by m_x_offset_display. For instance, if the offset is 1 column, and > the > portion of the line to be deleted ends with a character of wcwidth 2, > then > it ends up offsetting too much. I added code to pad with a space in > this > case to make it line up again. This is in the selftests as well. Excellent; thanks for doing this. [...] > BTW, bootstrap and reg-test were performed in linux x86-64. Test > results are > the same before and after: > > FAIL 104 104 > PASS 454289 454289 > UNSUPPORTED 10722 10722 > UNTESTED 205 205 > XFAIL 1652 1652 > XPASS 35 35 > > Also I wanted to remind that the three Unicode data files: > > contrib/unicode/EastAsianWidth.txt > contrib/unicode/PropList.txt > contrib/unicode/UnicodeData.txt > > are to be committed along with this this patch but I did not include > them in > the email since they are so large. > > Thanks again for your time, assuming you have the patience to look at > this > again :). > > -Lewis [...] > gcc/testsuite/ChangeLog: > > 2019-11-26 Lewis Hyatt > > * gcc.dg/plugin/diagnostic_plugin_test_show_locus.c > (test_show_locus): Adjust expected output based on new behavior. To be pedantic, this ChangeLog text seems wrong: I think you're actually tweaking the config to preserve the old expected output given the new x-offset implementation, right? The change in question is: > diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c > index 3f62edd408e..153bdb2fd89 100644 > --- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c > +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c > @@ -174,7 +174,7 @@ test_show_locus (function *fun) > > /* Hardcode the "terminal width", to verify the behavior of > very wide lines. */ > - global_dc->caret_max_width = 70; > + global_dc->caret_max_width = 71; > > if (0 == strcmp (fnname, "test_simple")) > { [...] > diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c > index cb920f6b9d0..472cb604c8b 100644 > --- a/gcc/diagnostic-show-locus.c > +++ b/gcc/diagnostic-show-locus.c [...] > +/* Test that layout::calculate_x_offset_display() works. */ > +static void > +test_layout_x_offset_display_utf8 (const line_table_case &case_) > +{ Thanks for adding selftests for this. > + const char *content > + = "This line is very long, so that we can use it to test the logic for " > + "clipping long lines. Also this: \xf0\x9f\x98\x82\xf0\x9f\x98\x82 is a " > + "pair of emojis that occupies 8 bytes and 4 display columns, starting at " > + "column #102.\n"; > + > + const int line_bytes = strlen (content) - 1; > + const int line_display_cols = line_bytes - 2*2; Please add comments explaining where the "-1" and the "- 2*2" come from in the above; presumably it's for the newline, and for the difference between encoding length and display cols for the 2 emoji chars? [...] > + /* Test that the source line is offset as expected when printed. */ > + { > + test_diagnostic_context dc; > + dc.caret_max_width = small_width - 6; > + dc.min_margin_width = test_left_margin - test_linenum_sep + 1; > + dc.show_line_numbers_p = true; > + rich_location richloc (line_table, > + linemap_position_for_column (line_table, > + emoji_col)); > + layout test_layout (&dc, &richloc, DK_ERROR); > + test_layout.print_line (1); > + ASSERT_STREQ (" 1 | \xf0\x9f\x98\x82\xf0\x9f\x98\x82 is a pair of emojis " > + "that occupies 8 bytes and 4 display columns, starting at " > + "column #102.\n" > + " | ^\n\n", > + pp_formatted_text (dc.printer)); > + } > + > + /* Similar to the previous example, but now the offset called for would split > + the first emoji in the middle of the UTF-8 sequence. Check that we replace > + it with a padding space in this case. */ > + { > + test_diagnostic_context dc; > + dc.caret_max_width = small_width - 5; > + dc.min_margin_width = test_left_margin - test_linenum_sep + 1; > + dc.show_line_numbers_p = true; > + rich_location richloc (line_table, > + linemap_position_for_column (line_table, > + emoji_col + 2)); > + layout test_layout (&dc, &richloc, DK_ERROR); > + test_layout.print_line (1); > + ASSERT_STREQ (" 1 | \xf0\x9f\x98\x82 is a pair of emojis " > + "that occupies 8 bytes and 4 display columns, starting at " > + "column #102.\n" > + " | ^\n\n", > + pp_formatted_text (dc.printer)); > + } Might be nice to turn on the ruler in the dc for these tests (I find it helpful anytime I'm debugging x-offsets) - but it's not essential; up to you. [...] The patch is OK for trunk with the nits above fixed. Do you have commit access? (I've got my own patch [1] that touches diagnostic- show-locus.c which I'll need to refresh once yours goes in); need to remember to include those data files when committing. Thanks very much for fixing this, and sorry again for the delay in reviewing it. Dave [1] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01515.html