public inbox for gcc-bugs@sourceware.org
From: "dmalcolm at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug analyzer/109098] Encoding errors on SARIF output for non-UTF-8 source files
Date: Sat, 11 Mar 2023 00:31:26 +0000
Message-ID: <bug-109098-4-MeCzzhATTc@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109098-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109098

--- Comment #3 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> I would have assumed you need -finput-charset= for the non-utf8 ones really
> if your LANG/LANGUAGE is not set to C/UTF8 really.

Yeah, but when complaining about encoding issues, the error message we emit
should at least be properly encoded :/

It's a major pain for my integration testing, where two(?) bad bytes in one
source file (out of thousands) lead to an unparseable .sarif file.

When quoting source in the .sarif output, we should ensure that the final JSON
output is all valid UTF-8, perhaps falling back to not quoting source in cases
where e.g.
- the source file isn't validly encoded, or
- the -finput-charset= is wrong, or
- the -finput-charset= is missing, or
- the source file (erroneously) uses a mixture of different encodings in
  different parts of itself
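
A minimal sketch of that fallback, in Python rather than GCC's own C++, and
not the actual GCC implementation. The helper names decode_source_line and
make_region are hypothetical; the "startLine" and "snippet"/"text" keys follow
the SARIF region and artifactContent objects. The idea is to decode strictly
and simply omit the snippet when decoding fails, so the JSON stays valid:

```python
import json
from typing import Optional


def decode_source_line(raw: bytes, input_charset: Optional[str] = None) -> Optional[str]:
    """Return the line as text, or None if it cannot be decoded cleanly.

    Hypothetical helper: input_charset stands in for -finput-charset=,
    and UTF-8 is assumed when it is missing.
    """
    charset = input_charset or "utf-8"
    try:
        return raw.decode(charset, errors="strict")
    except (UnicodeDecodeError, LookupError):
        # Bad bytes, or an unknown charset name: signal "do not quote".
        return None


def make_region(start_line: int, raw: bytes, input_charset: Optional[str] = None) -> dict:
    """Build a SARIF-style region, quoting the source only when it is safe."""
    region = {"startLine": start_line}
    text = decode_source_line(raw, input_charset)
    if text is not None:
        region["snippet"] = {"text": text}
    return region


good = make_region(1, b"int caf\xc3\xa9;\n")            # valid UTF-8
bad = make_region(2, b"int caf\xe9;\n")                 # Latin-1 byte, no charset given
fixed = make_region(2, b"int caf\xe9;\n", "latin-1")    # same bytes, charset supplied

# The serialized result is always encodable as UTF-8, because undecodable
# source is dropped rather than passed through.
output = json.dumps([good, bad, fixed])
output.encode("utf-8")
```

Strict decoding catches invalid bytes, a missing charset, and a file that
mixes encodings, but not every wrong charset: a single-byte encoding such as
Latin-1 accepts any byte sequence, so a wrong -finput-charset= of that kind
would still yield valid (if misleading) UTF-8.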

Probably should also check we do something sane for Trojan Source attacks.
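
One possible reading of "something sane", sketched in Python under stated
assumptions: make Unicode bidirectional control characters visible before the
text is quoted, so a viewer cannot be tricked into reordering the snippet. The
function name escape_bidi_controls and the <U+XXXX> escape format are
hypothetical, not what GCC emits:

```python
# Bidirectional control characters commonly used in Trojan Source attacks.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def escape_bidi_controls(text: str) -> str:
    """Replace each bidi control character with a visible <U+XXXX> marker."""
    return "".join(
        "<U+{:04X}>".format(ord(ch)) if ch in BIDI_CONTROLS else ch
        for ch in text
    )


snippet = escape_bidi_controls("if (admin) /* \u202e } */")
```

Escaping rather than dropping the characters keeps the quoted source faithful
enough for a reader to see that something unusual was present.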
