From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x730.google.com (mail-qk1-x730.google.com [IPv6:2607:f8b0:4864:20::730]) by sourceware.org (Postfix) with ESMTPS id 517873858403 for ; Fri, 4 Nov 2022 21:05:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 517873858403 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-qk1-x730.google.com with SMTP id v8so3819033qkg.12 for ; Fri, 04 Nov 2022 14:05:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=pml0GyDxSRyuG2iMUmSLcuw+ssjdTc1Nk39LJ8sIz0k=; b=jOsSQ4uwYW8jwXmOQK8TTVT71QY4uVwY5+aOKymJEj0oYAcltpOrNRqr2IWL747hPF 7bTzDRd7JKSdlB9IjExd50khEGC4WldX5olOtjVffaYVAUZ8d8lT8EGpbCyJNOGSs1CP s0WyY11dKnoRLl9E/0UUzRzBmI25P+7ia/7qE010g/mEJ3p+klVQ9QcOTIOpAZboiqrp MyDyPeThjDtMpQhv0LShY/cZ0205EyigL0qduj/rraMQztEWLaSEqjhCFl+W3J5QlTxF C4wkwniCd4k7q3Lj4U/az2GmnPaAn7fzg2R9jns88DgxSax2QMblBCNfyfqJimfUbVoh pk2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=pml0GyDxSRyuG2iMUmSLcuw+ssjdTc1Nk39LJ8sIz0k=; b=uYeSgELuzWzxRXWKlpjXajoM0DQijIDXM3gaW0+sC4AOgLgqEzb7Yo01jmn+bKgo3Z NNx1oVWclzcxD7BxNAlc5Yq/gnv8NcfmF75OINkpuN6fmMUBRP14U86Ptuh/epRg1nUL 1GLOo83XcLoUAuP86Qd2IMAGkJ3QycdIpBlzWDby7gFWzlMozDS5R87YOpD+4Zfo5PQg qpMArB2szYy0P5GJDeMSNsNBm1CledZPsn1GL6z7rKI4UA1LPo8GDb+h41lj/dpayo4X IfPc8jqG3Gv9f7mRZwYzmyMBQsoJj8x3GFISPp1nCjAl9GvdpLznN4aw3H2VFUdiCWUE Ah0w== X-Gm-Message-State: ACrzQf2zpowfe7bc8g+k6eT3Xxwk/OH1Zf5MKyersrVBx5JrFMoB1JOe tL8+UsU0lsfo0kQohYeWJMI= X-Google-Smtp-Source: AMsMyM7eW3Xy7O04xUqdBpXL/Yp0FJrJg5Mcbe5WRs47ht38jLzDESsA60EKWtr4fCOwQa8aU9MUUg== X-Received: by 2002:a37:4145:0:b0:6fa:14fc:4fd0 with SMTP id o66-20020a374145000000b006fa14fc4fd0mr26193228qka.291.1667595953547; Fri, 04 Nov 2022 14:05:53 -0700 (PDT) Received: from ldh-imac.local (96-67-140-173-static.hfc.comcastbusiness.net. [96.67.140.173]) by smtp.gmail.com with ESMTPSA id s21-20020ac85cd5000000b003a55fe9f352sm272744qta.64.2022.11.04.14.05.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 04 Nov 2022 14:05:52 -0700 (PDT) Date: Fri, 4 Nov 2022 17:05:50 -0400 From: Lewis Hyatt To: David Malcolm Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 5/6] diagnostics: Support generated data in additional contexts Message-ID: <20221104210550.GA92497@ldh-imac.local> References: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="3V7upXqbjpZ4EhLz" Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-3039.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --3V7upXqbjpZ4EhLz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Fri, Nov 04, 2022 at 12:42:29PM -0400, David Malcolm wrote: > On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote: > > Add awareness that diagnostic locations may be in generated buffers > > rather > > than an actual file to other places in the diagnostics code that may > > care, > > most notably SARIF output (which needs to obtain its own snapshots of > > the code > > involved). For edit context output, which outputs fixit hints as > > diffs, for > > now just make sure we ignore generated data buffers. At the moment, > > there is > > no ability for a fixit hint to be generated in such a buffer. > > > > Because SARIF uses JSON as well, also add the ability to the > > json::string > > class to handle a buffer with nulls in the middle (since we place no > > restriction on LC_GEN content) by providing the option to specify the > > data > > length. > > Please can you split this patch into three parts: > - the SARIF part > - the json changes > - the edit-context.cc changes (I think this at least counts as an > "obvious" change with respect to the other changes in the kit, though > I'm still working my way through patch 4 in the kit). > > Please add a DejaGnu testcase to the SARIF part, with a diagnostic that > references a generated data buffer; see > gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-*.c > for examples of SARIF testcases. > > Please add a selftest to the json change so that we have a unit test of > constructing a json::string with an embedded NUL, and how we serialize > such a string (probably to json.cc's test_writing_strings) > > Thanks > Dave Yes, certainly, sorry for not splitting it up more to start with. Regarding the SARIF testcase, it's not that easy to get SARIF output to actually output generated data, because as of now it can only appear in a _Pragma, and SARIF does not output macro definitions currently. I think the only way I know to do it, is to make use of -fdump-internal-locations, which generates top-level inform() calls inside the _Pragma that can end up in the SARIF output. So I wrote a testcase that does this, but not sure how you will feel about having the testsuite rely on this internal debugging option. I wasn't sure what's the best way to send the 3 split up patches. I attached them here as 5a/6, 5b/6, 5c/6, in case that's right, but I wasn't sure if I should just resend the whole batch (minus perhaps the 2 you have already acked), and/or if I should wait for feedback on the other patches first. Happy to do whatever makes it easier for you, and thanks for your time! Note that the new SARIF patch (5c/6) now needs to come last in the series, after the patch 6/6 that actually supports _Pragma, so that the new testcase can make use of that. -Lewis --3V7upXqbjpZ4EhLz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline; filename="Pragma_patch_5a.txt" [PATCH 5a/6] diagnostics: Handle generated data locations in edit_context Class edit_context handles outputting fixit hints in diff form that could be manually or automatically applied by the user. This will not make sense for generated data locations, such as the contents of a _Pragma string, because the text to be modified does not appear in the user's input files. We do not currently ever generate fixit hints in such a context, but for future-proofing purposes, ignore such locations in edit context now. gcc/ChangeLog: * edit-context.cc (edit_context::apply_fixit): Ignore locations in generated data. diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc index 6879ddd41b4..aa95bc0834f 100644 --- a/gcc/edit-context.cc +++ b/gcc/edit-context.cc @@ -301,8 +301,12 @@ edit_context::apply_fixit (const fixit_hint *hint) return false; if (start.column == 0) return false; + if (start.generated_data) + return false; if (next_loc.column == 0) return false; + if (next_loc.generated_data) + return false; edited_file &file = get_or_insert_file (start.file); if (!m_valid) --3V7upXqbjpZ4EhLz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline; filename="Pragma_patch_5b.txt" [PATCH 5b/6] diagnostics: Remove null-termination requirement for json::string json::string currently handles null-terminated data and so can't work with data that may contain embedded null bytes or that is not null-terminated. Supporting such data will make json::string more robust in some contexts, such as SARIF output, which uses it to output user source code that may contain embedded null bytes. gcc/ChangeLog: * json.h (class string): Add M_LEN member to store the length of the data. Add constructor taking an explicit length. * json.cc (string::string): Implement the new constructor. (string::print): Support print strings that are not null-terminated. Escape embdedded null bytes on output. (test_writing_strings): Test the new null-byte-related features of json::string. diff --git a/gcc/json.cc b/gcc/json.cc index 974f8c36825..3a79cac02ac 100644 --- a/gcc/json.cc +++ b/gcc/json.cc @@ -190,6 +190,15 @@ string::string (const char *utf8) { gcc_assert (utf8); m_utf8 = xstrdup (utf8); + m_len = strlen (utf8); +} + +string::string (const char *utf8, size_t len) +{ + gcc_assert (utf8); + m_utf8 = XNEWVEC (char, len); + m_len = len; + memcpy (m_utf8, utf8, len); } /* Implementation of json::value::print for json::string. */ @@ -198,9 +207,9 @@ void string::print (pretty_printer *pp) const { pp_character (pp, '"'); - for (const char *ptr = m_utf8; *ptr; ptr++) + for (size_t i = 0; i != m_len; ++i) { - char ch = *ptr; + char ch = m_utf8[i]; switch (ch) { case '"': @@ -224,7 +233,9 @@ string::print (pretty_printer *pp) const case '\t': pp_string (pp, "\\t"); break; - + case '\0': + pp_string (pp, "\\0"); + break; default: pp_character (pp, ch); } @@ -341,6 +352,12 @@ test_writing_strings () string contains_quotes ("before \"quoted\" after"); assert_print_eq (contains_quotes, "\"before \\\"quoted\\\" after\""); + + const char data[] = {'a', 'b', 'c', 'd', '\0', 'e', 'f'}; + string not_terminated (data, 3); + assert_print_eq (not_terminated, "\"abc\""); + string embedded_null (data, sizeof data); + assert_print_eq (embedded_null, "\"abcd\\0ef\""); } /* Verify that JSON literals are written correctly. */ diff --git a/gcc/json.h b/gcc/json.h index f272981259b..f7afd843dc5 100644 --- a/gcc/json.h +++ b/gcc/json.h @@ -156,16 +156,19 @@ class integer_number : public value class string : public value { public: - string (const char *utf8); + explicit string (const char *utf8); + string (const char *utf8, size_t len); ~string () { free (m_utf8); } enum kind get_kind () const final override { return JSON_STRING; } void print (pretty_printer *pp) const final override; const char *get_string () const { return m_utf8; } + size_t get_length () const { return m_len; } private: char *m_utf8; + size_t m_len; }; /* Subclass of value for the three JSON literals "true", "false", --3V7upXqbjpZ4EhLz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline; filename="Pragma_patch_5c.txt" [PATCH 5c/6] diagnostics: Support generated data locations in SARIF output The diagnostics routines for SARIF output need to read the source code back in, so that they can generate "snippet" and "content" records, so they need to be able to cope with generated data locations. Add support for that in diagnostic-format-sarif.cc. gcc/ChangeLog: * diagnostic-format-sarif.cc (sarif_builder::xloc_to_fb): New function. (sarif_builder::maybe_make_physical_location_object): Support generated data locations. (sarif_builder::make_artifact_location_object): Likewise. (sarif_builder::maybe_make_region_object_for_context): Likewise. (sarif_builder::make_artifact_object): Likewise. (sarif_builder::maybe_make_artifact_content_object): Likewise. (get_source_lines): Likewise. gcc/testsuite/ChangeLog: * c-c++-common/diagnostic-format-sarif-file-5.c: New test. diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc index 7110db4edd6..81141c9358f 100644 --- a/gcc/diagnostic-format-sarif.cc +++ b/gcc/diagnostic-format-sarif.cc @@ -125,7 +125,10 @@ private: json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const; json::object *maybe_make_physical_location_object (location_t loc); json::object *make_artifact_location_object (location_t loc); - json::object *make_artifact_location_object (const char *filename); + + typedef std::pair filename_or_buffer; + json::object *make_artifact_location_object (filename_or_buffer fb); + json::object *make_artifact_location_object_for_pwd () const; json::object *maybe_make_region_object (location_t loc) const; json::object *maybe_make_region_object_for_context (location_t loc) const; @@ -146,16 +149,17 @@ private: json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const; json::object * make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id); - json::object *make_artifact_object (const char *filename); - json::object *maybe_make_artifact_content_object (const char *filename) const; - json::object *maybe_make_artifact_content_object (const char *filename, - int start_line, + json::object *make_artifact_object (filename_or_buffer fb); + json::object * + maybe_make_artifact_content_object (filename_or_buffer fb) const; + json::object *maybe_make_artifact_content_object (expanded_location xloc, int end_line) const; json::object *make_fix_object (const rich_location &rich_loc); json::object *make_artifact_change_object (const rich_location &richloc); json::object *make_replacement_object (const fixit_hint &hint) const; json::object *make_artifact_content_object (const char *text) const; int get_sarif_column (expanded_location exploc) const; + static filename_or_buffer xloc_to_fb (expanded_location xloc); diagnostic_context *m_context; @@ -166,7 +170,11 @@ private: diagnostic group. */ sarif_result *m_cur_group_result; - hash_set m_filenames; + /* If the second member is >0, then this is a buffer of generated content, + with that length, not a filename. */ + hash_set , + int_hash > + > m_filenames; bool m_seen_any_relative_paths; hash_set m_rule_id_set; json::array *m_rules_arr; @@ -588,6 +596,15 @@ sarif_builder::make_location_object (const diagnostic_event &event) return location_obj; } +/* Populate a filename_or_buffer pair from an expanded location. */ +sarif_builder::filename_or_buffer +sarif_builder::xloc_to_fb (expanded_location xloc) +{ + if (xloc.generated_data_len) + return filename_or_buffer (xloc.generated_data, xloc.generated_data_len); + return filename_or_buffer (xloc.file, 0); +} + /* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC, or return NULL; Add any filename to the m_artifacts. */ @@ -603,7 +620,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc) /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3). */ json::object *artifact_loc_obj = make_artifact_location_object (loc); phys_loc_obj->set ("artifactLocation", artifact_loc_obj); - m_filenames.add (LOCATION_FILE (loc)); + m_filenames.add (xloc_to_fb (expand_location (loc))); /* "region" property (SARIF v2.1.0 section 3.29.4). */ if (json::object *region_obj = maybe_make_region_object (loc)) @@ -627,7 +644,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc) json::object * sarif_builder::make_artifact_location_object (location_t loc) { - return make_artifact_location_object (LOCATION_FILE (loc)); + return make_artifact_location_object (xloc_to_fb (expand_location (loc))); } /* The ID value for use in "uriBaseId" properties (SARIF v2.1.0 section 3.4.4) @@ -639,10 +656,12 @@ sarif_builder::make_artifact_location_object (location_t loc) or return NULL. */ json::object * -sarif_builder::make_artifact_location_object (const char *filename) +sarif_builder::make_artifact_location_object (filename_or_buffer fb) { json::object *artifact_loc_obj = new json::object (); + const auto filename = (fb.second ? special_fname_generated () : fb.first); + /* "uri" property (SARIF v2.1.0 section 3.4.3). */ artifact_loc_obj->set ("uri", new json::string (filename)); @@ -795,9 +814,7 @@ sarif_builder::maybe_make_region_object_for_context (location_t loc) const /* "snippet" property (SARIF v2.1.0 section 3.30.13). */ if (json::object *artifact_content_obj - = maybe_make_artifact_content_object (exploc_start.file, - exploc_start.line, - exploc_finish.line)) + = maybe_make_artifact_content_object (exploc_start, exploc_finish.line)) region_obj->set ("snippet", artifact_content_obj); return region_obj; @@ -1248,24 +1265,24 @@ sarif_builder::maybe_make_cwe_taxonomy_object () const /* Make an artifact object (SARIF v2.1.0 section 3.24). */ json::object * -sarif_builder::make_artifact_object (const char *filename) +sarif_builder::make_artifact_object (filename_or_buffer fb) { json::object *artifact_obj = new json::object (); /* "location" property (SARIF v2.1.0 section 3.24.2). */ - json::object *artifact_loc_obj = make_artifact_location_object (filename); + json::object *artifact_loc_obj = make_artifact_location_object (fb); artifact_obj->set ("location", artifact_loc_obj); /* "contents" property (SARIF v2.1.0 section 3.24.8). */ if (json::object *artifact_content_obj - = maybe_make_artifact_content_object (filename)) + = maybe_make_artifact_content_object (fb)) artifact_obj->set ("contents", artifact_content_obj); /* "sourceLanguage" property (SARIF v2.1.0 section 3.24.10). */ if (m_context->m_client_data_hooks) if (const char *source_lang = m_context->m_client_data_hooks->maybe_get_sarif_source_language - (filename)) + (fb.first)) artifact_obj->set ("sourceLanguage", new json::string (source_lang)); return artifact_obj; @@ -1331,34 +1348,40 @@ maybe_read_file (const char *filename) full contents of FILENAME. */ json::object * -sarif_builder::maybe_make_artifact_content_object (const char *filename) const +sarif_builder::maybe_make_artifact_content_object (filename_or_buffer fb) const { - char *text_utf8 = maybe_read_file (filename); - if (!text_utf8) - return NULL; - - json::object *artifact_content_obj = new json::object (); - artifact_content_obj->set ("text", new json::string (text_utf8)); - free (text_utf8); - + json::object *artifact_content_obj = nullptr; + if (fb.second) + { + artifact_content_obj = new json::object (); + artifact_content_obj->set ("text", new json::string (fb.first, + fb.second)); + } + else if (char *text_utf8 = maybe_read_file (fb.first)) + { + artifact_content_obj = new json::object (); + artifact_content_obj->set ("text", new json::string (text_utf8)); + free (text_utf8); + } return artifact_content_obj; } /* Attempt to read the given range of lines from FILENAME; return - a freshly-allocated 0-terminated buffer containing them, or NULL. */ + a freshly-allocated buffer containing them, or NULL. + The buffer is null-terminated, but could also contain embedded null + bytes, so the char_span's length() accessor should be used. */ -static char * -get_source_lines (const char *filename, - int start_line, +static char_span +get_source_lines (expanded_location xloc, int end_line) { auto_vec result; - for (int line = start_line; line <= end_line; line++) + for (int line = xloc.line; line <= end_line; line++) { - char_span line_content = location_get_source_line (filename, line); + char_span line_content = location_get_source_line (xloc, line); if (!line_content.get_buffer ()) - return NULL; + return char_span (nullptr, 0); result.reserve (line_content.length () + 1); for (size_t i = 0; i < line_content.length (); i++) result.quick_push (line_content[i]); @@ -1366,26 +1389,25 @@ get_source_lines (const char *filename, } result.safe_push ('\0'); - return xstrdup (result.address ()); + return char_span (xstrdup (result.address ()), result.length() - 1); } /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the given - run of lines within FILENAME (including the endpoints). */ + run of lines starting at XLOC (including the endpoints). */ json::object * -sarif_builder::maybe_make_artifact_content_object (const char *filename, - int start_line, +sarif_builder::maybe_make_artifact_content_object (expanded_location xloc, int end_line) const { - char *text_utf8 = get_source_lines (filename, start_line, end_line); + const char_span text_utf8 = get_source_lines (xloc, end_line); if (!text_utf8) return NULL; json::object *artifact_content_obj = new json::object (); - artifact_content_obj->set ("text", new json::string (text_utf8)); - free (text_utf8); - + artifact_content_obj->set ("text", new json::string (text_utf8.get_buffer (), + text_utf8.length ())); + free (const_cast (text_utf8.get_buffer ())); return artifact_content_obj; } diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c new file mode 100644 index 00000000000..2ca6a069d3f --- /dev/null +++ b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c @@ -0,0 +1,31 @@ +/* The goal is to test SARIF output of generated data, such as a _Pragma string. + But SARIF output as of yet does not output macro definitions, so such + generated data buffers never end up in the typical SARIF output. One way we + can achieve it is to use -fdump-internal-locations, which outputs top-level + diagnostic notes inside macro definitions, that SARIF will end up processing. + It also outputs a lot of other stuff to stderr (not to the SARIF file) that + is not relevant to this test, so we use a blanket dg-regexp to filter all of + that away. */ + +/* { dg-do compile } */ +/* { dg-options "-fdiagnostics-format=sarif-file -fdump-internal-locations" } */ +/* { dg-allow-blank-lines-in-output "" } */ + +_Pragma("GCC diagnostic push") + +/* { dg-regexp {(.|[\n\r])*} } */ + +/* Because of the way -fdump-internal-locations works, these regexes themselves + will end up in the sarif output also. But due to the escaping, they don't + match themselves, so they still test what we need. */ + +/* Four of this pair are output for the tokens inside the + _Pragma string (3 plus a PRAGMA_EOL). */ + +/* { dg-final { scan-sarif-file "\"artifactLocation\": \{\"uri\": \"\"," } } */ +/* { dg-final { scan-sarif-file "\"snippet\": \{\"text\": \"GCC diagnostic push\\\\n\"" } } */ + +/* One of this pair is output for the overall internal location. */ + +/* { dg-final { scan-sarif-file "\{\"location\": \{\"uri\": \"\"," } } */ +/* { dg-final { scan-sarif-file "\"contents\": \{\"text\": \"GCC diagnostic push\\\\n\\\\0" } } */ --3V7upXqbjpZ4EhLz--