public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Lewis Hyatt <lhyatt@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output
Date: Tue, 15 Aug 2023 13:51:50 -0400	[thread overview]
Message-ID: <20230815175150.GA40360@ldh-imac.local> (raw)
In-Reply-To: <ac3118311aa63cf6ac7af215884607855d7cc9e9.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3534 bytes --]

On Tue, Aug 15, 2023 at 01:04:04PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > The diagnostics routines for SARIF output need to read the source code back
> > in, so that they can generate "snippet" and "content" records, so they need to
> > be able to cope with generated data locations.  Add support for that in
> > diagnostic-format-sarif.cc.
> > 
> > gcc/ChangeLog:
> > 
> >         * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
> >         to support generated data locations.
> >         (sarif_builder::maybe_make_physical_location_object): Change the
> >         m_filenames hash_set to support generated data.
> >         (sarif_builder::make_artifact_location_object): Use a source_id rather
> >         than a plain file name.
> >         (sarif_builder::maybe_make_region_object): Adapt to
> >         expanded_location interface changes.
> >         (sarif_builder::maybe_make_region_object_for_context): Likewise.
> >         (sarif_builder::make_artifact_object): Likewise.
> >         (sarif_builder::make_run_object): Handle generated data.
> >         (sarif_builder::maybe_make_artifact_content_object): Likewise.
> >         (get_source_lines): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >         * c-c++-common/diagnostic-format-sarif-file-5.c: New test.
> 
> I'm not sure if generated data is allowed as part of a SARIF artefact,
> or if there's a more standard-compliant way of representing this; SARIF
> says an artefact is a "sequence of bytes addressable via a URI".
> 
> Can you post a simple example of the generated .sarif JSON please? 
> e.g. from the new test, so that we can see it looks like.
> 
> You could run it through:
> 
>   python -m json.tool 
> 
> to format it for easier reading.

For a simple example like:

_Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")

for which the normal output is:

=====
In buffer generated from t.cpp:1:
<generated>:1:24: warning: unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
    1 | GCC diagnostic ignored "-Wnot-an-option"
      |                        ^~~~~~~~~~~~~~~~~
t.cpp:1:1: note: in <_Pragma directive>
    1 | _Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")
      | ^~~~~~~
=====

The SARIF output does not end up referencing any generated data locations,
because those are logically part of the "expansion" of the _Pragma
directive, and it doesn't output macro expansions.  In order for SARIF to
currently do something with generated data, it needs to see a generated data
location in a non-macro context. The only way to get GCC to do that, right
now, is with -fdump-internal-locations, which is what the new test case
does. That just unfortunately generates a larger amount of output. I attached
it, in case that's still helpful, for the following program:

=====
_Pragma("GCC diagnostic push")
=====

I guess there's potentially already a problem here because 'python -m
json.tool' is unhappy with this output and refuses to process it:

=====
Invalid \escape: line 1 column 3436 (char 3435)
=====

The related text is:
=====
{"location": {"uri": "<generated>", "uriBaseId": "PWD"},
"contents":{"text": "GCC diagnostic push\n\0"}
=====

And the \0 is not allowed it seems?

I also attached the output of 'python -m json.tool' anyway, after manually
removing the \0.

Is it better to just skip these locations for now?

-Lewis

[-- Attachment #2: t.cpp.sarif --]
[-- Type: text/plain, Size: 5872 bytes --]

{"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json", "version": "2.1.0", "runs": [{"tool": {"driver": {"name": "GNU C++17", "fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) (x86_64-pc-linux-gnu)", "version": "14.0.0 20230811 (experimental)", "informationUri": "https://gcc.gnu.org/gcc-14/", "rules": []}}, "invocations": [{"executionSuccessful": true, "toolExecutionNotifications": []}], "originalUriBaseIds": {"PWD": {"uri": "file:///home/lewis/"}}, "artifacts": [{"location": {"uri": "t.cpp", "uriBaseId": "PWD"}, "contents": {"text": "_Pragma(\"GCC diagnostic push\")\n"}, "sourceLanguage": "cplusplus"}, {"location": {"uri": "/usr/include/stdc-predef.h"}, "contents": {"text": "/* Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of the GNU C Library.\n\n   The GNU C Library is free software; you can redistribute it and/or\n   modify it under the terms of the GNU Lesser General Public\n   License as published by the Free Software Foundation; either\n   version 2.1 of the License, or (at your option) any later version.\n\n   The GNU C Library is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for more details.\n\n   You should have received a copy of the GNU Lesser General Public\n   License along with the GNU C Library; if not, see\n   <https://www.gnu.org/licenses/>.  */\n\n#ifndef\t_STDC_PREDEF_H\n#define\t_STDC_PREDEF_H\t1\n\n/* This header is separate from features.h so that the compiler can\n   include it implicitly at the start of every compilation.  It must\n   not itself include <features.h> or any other header that includes\n   <features.h> because the implicit include comes before any feature\n   test macros that may be defined in a source file before it first\n   explicitly includes a system header.  GCC knows the name of this\n   header in order to preinclude it.  */\n\n/* glibc's intent is to support the IEC 559 math functionality, real\n   and complex.  If the GCC (4.9 and later) predefined macros\n   specifying compiler intent are available, use them to determine\n   whether the overall intent is to support these features; otherwise,\n   presume an older compiler has intent to support these features and\n   define these macros by default.  */\n\n#ifdef __GCC_IEC_559\n# if __GCC_IEC_559 > 0\n#  define __STDC_IEC_559__\t\t1\n#  define __STDC_IEC_60559_BFP__ \t201404L\n# endif\n#else\n# define __STDC_IEC_559__\t\t1\n# define __STDC_IEC_60559_BFP__ \t201404L\n#endif\n\n#ifdef __GCC_IEC_559_COMPLEX\n# if __GCC_IEC_559_COMPLEX > 0\n#  define __STDC_IEC_559_COMPLEX__\t1\n#  define __STDC_IEC_60559_COMPLEX__\t201404L\n# endif\n#else\n# define __STDC_IEC_559_COMPLEX__\t1\n# define __STDC_IEC_60559_COMPLEX__\t201404L\n#endif\n\n/* wchar_t uses Unicode 10.0.0.  Version 10.0 of the Unicode Standard is\n   synchronized with ISO/IEC 10646:2017, fifth edition, plus\n   the following additions from Amendment 1 to the fifth edition:\n   - 56 emoji characters\n   - 285 hentaigana\n   - 3 additional Zanabazar Square characters */\n#define __STDC_ISO_10646__\t\t201706L\n\n#endif\n"}, "sourceLanguage": "cplusplus"}, {"location": {"uri": "<generated>", "uriBaseId": "PWD"}, "contents": {"text": "GCC diagnostic push\n\0"}, "sourceLanguage": "cplusplus"}], "results": [{"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 258918"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "t.cpp", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 1, "endColumn": 8}, "contextRegion": {"startLine": 1, "snippet": {"text": "_Pragma(\"GCC diagnostic push\")\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 259906’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 1, "endColumn": 4}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 1 has ‘x-location == y-location == 260387’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 16, "endColumn": 20}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 2 has ‘x-location == y-location == 260512’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 20, "endColumn": 21}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 189172"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "/usr/include/stdc-predef.h"}, "region": {"startLine": 47, "startColumn": 6, "endColumn": 27}, "contextRegion": {"startLine": 47, "snippet": {"text": "# if __GCC_IEC_559_COMPLEX > 0\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 1’"}, "locations": [{}]}, {"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 148204"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "/usr/include/stdc-predef.h"}, "region": {"startLine": 37, "startColumn": 6, "endColumn": 19}, "contextRegion": {"startLine": 37, "snippet": {"text": "# if __GCC_IEC_559 > 0\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 1’"}, "locations": [{}]}]}]}

[-- Attachment #3: t.cpp.json --]
[-- Type: text/plain, Size: 12185 bytes --]

{
    "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
    "runs": [
        {
            "artifacts": [
                {
                    "contents": {
                        "text": "_Pragma(\"GCC diagnostic push\")\n"
                    },
                    "location": {
                        "uri": "t.cpp",
                        "uriBaseId": "PWD"
                    },
                    "sourceLanguage": "cplusplus"
                },
                {
                    "contents": {
                        "text": "/* Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of the GNU C Library.\n\n   The GNU C Library is free software; you can redistribute it and/or\n   modify it under the terms of the GNU Lesser General Public\n   License as published by the Free Software Foundation; either\n   version 2.1 of the License, or (at your option) any later version.\n\n   The GNU C Library is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for more details.\n\n   You should have received a copy of the GNU Lesser General Public\n   License along with the GNU C Library; if not, see\n   <https://www.gnu.org/licenses/>.  */\n\n#ifndef\t_STDC_PREDEF_H\n#define\t_STDC_PREDEF_H\t1\n\n/* This header is separate from features.h so that the compiler can\n   include it implicitly at the start of every compilation..  It must\n   not itself include <features.h> or any other header that includes\n   <features.h> because the implicit include comes before any feature\n   test macros that may be defined in a source file before it first\n   explicitly includes a system header.  GCC knows the name of this\n   header in order to preinclude it.  */\n\n/* glibc's intent is to support the IEC 559 math functionality, real\n   and complex.  If the GCC (4.9 and later) predefined macros\n   specifying compiler intent are available, use them to determine\n   whether the overall intent is to support these features; otherwise,\n   presume an older compiler has intent to support these features and\n   define these macros by default.  */\n\n#ifdef __GCC_IEC_559\n# if __GCC_IEC_559 > 0\n#  define __STDC_IEC_559__\t\t1\n#  define __STDC_IEC_60559_BFP__ \t201404L\n# endif\n#else\n# define __STDC_IEC_559__\t\t1\n# define __STDC_IEC_60559_BFP__ \t201404L\n#endif\n\n#ifdef __GCC_IEC_559_COMPLEX\n# if __GCC_IEC_559_COMPLEX > 0\n#  define __STDC_IEC_559_COMPLEX__\t1\n#  define __STDC_IEC_60559_COMPLEX__\t201404L\n# endif\n#else\n# define __STDC_IEC_559_COMPLEX__\t1\n# define __STDC_IEC_60559_COMPLEX__\t201404L\n#endif\n\n/* wchar_t uses Unicode 10.0.0.  Version 10.0 of the Unicode Standard is\n   synchronized with ISO/IEC 10646:2017, fifth edition, plus\n   the following additions from Amendment 1 to the fifth edition:\n   - 56 emoji characters\n   - 285 hentaigana\n   - 3 additional Zanabazar Square characters */\n#define __STDC_ISO_10646__\t\t201706L\n\n#endif\n"
                    },
                    "location": {
                        "uri": "/usr/include/stdc-predef.h"
                    },
                    "sourceLanguage": "cplusplus"
                },
                {
                    "contents": {
                        "text": "GCC diagnostic push\n"
                    },
                    "location": {
                        "uri": "<generated>",
                        "uriBaseId": "PWD"
                    },
                    "sourceLanguage": "cplusplus"
                }
            ],
            "invocations": [
                {
                    "executionSuccessful": true,
                    "toolExecutionNotifications": []
                }
            ],
            "originalUriBaseIds": {
                "PWD": {
                    "uri": "file:///home/lewis/"
                }
            },
            "results": [
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "t.cpp",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "_Pragma(\"GCC diagnostic push\")\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 8,
                                    "startColumn": 1,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 258918"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 4,
                                    "startColumn": 1,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 259906\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 20,
                                    "startColumn": 16,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 1 has \u2018x-location == y-location == 260387\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 21,
                                    "startColumn": 20,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 2 has \u2018x-location == y-location == 260512\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "/usr/include/stdc-predef.h"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "# if __GCC_IEC_559_COMPLEX > 0\n"
                                    },
                                    "startLine": 47
                                },
                                "region": {
                                    "endColumn": 27,
                                    "startColumn": 6,
                                    "startLine": 47
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 189172"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {}
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 1\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "/usr/include/stdc-predef.h"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "# if __GCC_IEC_559 > 0\n"
                                    },
                                    "startLine": 37
                                },
                                "region": {
                                    "endColumn": 19,
                                    "startColumn": 6,
                                    "startLine": 37
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 148204"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {}
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 1\u2019"
                    },
                    "ruleId": "note"
                }
            ],
            "tool": {
                "driver": {
                    "fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) (x86_64-pc-linux-gnu)",
                    "informationUri": "https://gcc.gnu.org/gcc-14/",
                    "name": "GNU C++17",
                    "rules": [],
                    "version": "14.0.0 20230811 (experimental)"
                }
            }
        }
    ],
    "version": "2.1.0"
}

  reply	other threads:[~2023-08-15 17:51 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-07-28 22:58   ` David Malcolm
2023-07-31 22:39     ` Lewis Hyatt
2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-08-11 22:45           ` David Malcolm
2023-08-13 20:18             ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations Lewis Hyatt
2023-08-11 23:02           ` David Malcolm
2023-08-14 21:41             ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot Lewis Hyatt
2023-08-15 15:43           ` David Malcolm
2023-08-15 17:58             ` Lewis Hyatt
2023-08-15 19:39               ` David Malcolm
2023-08-23 21:22                 ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers Lewis Hyatt
2023-08-15 16:15           ` David Malcolm
2023-08-15 18:15             ` Lewis Hyatt
2023-08-15 19:46               ` David Malcolm
2023-08-15 20:08                 ` Lewis Hyatt
2023-08-23 19:41                   ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests Lewis Hyatt
2023-08-15 16:27           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 6/8] diagnostics: Full support for generated data locations Lewis Hyatt
2023-08-15 16:39           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-08-15 17:04           ` David Malcolm
2023-08-15 17:51             ` Lewis Hyatt [this message]
2023-07-21 23:08 ` [PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-07-28 22:22 ` [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens David Malcolm
2023-07-29 14:27   ` Lewis Hyatt
2023-07-29 16:03     ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230815175150.GA40360@ldh-imac.local \
    --to=lhyatt@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).