From: Marek Polacek <polacek@redhat.com>
To: Martin Sebor <msebor@gmail.com>
Cc: Joseph Myers <joseph@codesourcery.com>,
Jakub Jelinek <jakub@redhat.com>,
GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v2] libcpp: Implement -Wbidirectional for CVE-2021-42574 [PR103026]
Date: Tue, 2 Nov 2021 15:52:05 -0400 [thread overview]
Message-ID: <YYGW5eiZhSxq6aHl@redhat.com> (raw)
In-Reply-To: <49fc8d97-4cc7-5c78-d65a-02ea1419f483@gmail.com>
On Tue, Nov 02, 2021 at 01:20:03PM -0600, Martin Sebor wrote:
> On 11/2/21 11:18 AM, Marek Polacek via Gcc-patches wrote:
> > On Mon, Nov 01, 2021 at 10:10:40PM +0000, Joseph Myers wrote:
> > > On Mon, 1 Nov 2021, Marek Polacek via Gcc-patches wrote:
> > >
> > > > + /* We've read a bidi char, update the current vector as necessary. */
> > > > + void on_char (kind k, bool ucn_p)
> > > > + {
> > > > + switch (k)
> > > > + {
> > > > + case kind::LRE:
> > > > + case kind::RLE:
> > > > + case kind::LRO:
> > > > + case kind::RLO:
> > > > + vec.push (ucn_p ? 3u : 1u);
> > > > + break;
> > > > + case kind::LRI:
> > > > + case kind::RLI:
> > > > + case kind::FSI:
> > > > + vec.push (ucn_p ? 2u : 0u);
> > > > + break;
> > > > + case kind::PDF:
> > > > + if (current_ctx () == kind::PDF)
> > > > + pop ();
> > > > + break;
> > > > + case kind::PDI:
> > > > + if (current_ctx () == kind::PDI)
> > > > + pop ();
> > >
> > > My understanding is that PDI should pop all intermediate PDF contexts
> > > outward to a PDI context, which it also pops. (But if it's embedded only
> > > in PDF contexts, with no PDI context containing it, it doesn't pop
> > > anything.)
> > >
> > > I think failing to handle that only means libcpp sometimes models there
> > > as being more bidirectional contexts open than there should be, so it
> > > might give spurious warnings when in fact all such contexts had been
> > > closed by end of string or comment.
> >
> > Ah, you're right.
> > https://www.unicode.org/reports/tr9/#Terminating_Explicit_Directional_Isolates
> > says that "[PDI] terminates the scope of the last LRI, RLI, or FSI whose
> > scope has not yet been terminated, as well as the scopes of any subsequent
> > LREs, RLEs, LROs, or RLOs whose scopes have not yet been terminated."
> > but PDF doesn't have the latter quirk.
> >
> > Fixed in the below: I added a suitable truncate into on_char. The new test
> > Wbidirectional-14.c exercises the handling of PDI.
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > -- >8 --
> > From a link below:
> > "An issue was discovered in the Bidirectional Algorithm in the Unicode
> > Specification through 14.0. It permits the visual reordering of
> > characters via control sequences, which can be used to craft source code
> > that renders different logic than the logical ordering of tokens
> > ingested by compilers and interpreters. Adversaries can leverage this to
> > encode source code for compilers accepting Unicode such that targeted
> > vulnerabilities are introduced invisibly to human reviewers."
> >
> > More info:
> > https://nvd.nist.gov/vuln/detail/CVE-2021-42574
> > https://trojansource.codes/
> >
> > This is not a compiler bug. However, to mitigate the problem, this patch
> > implements -Wbidirectional=[none|unpaired|any] to warn about possibly
> > misleading Unicode bidirectional characters the preprocessor may encounter.
>
> Birectional sounds very general. Can we come up with a name
> that's a bit more descriptive of the problem the warning reports?
> From skimming the docs and the tests it looks like the warning
> points out uses of bidirectonal characters in the program source
> code as well as comments. Would -Wbidirectional-text be better?
> Or -Wbidirectional-chars? (If Clang is also adding a warning
> for this, syncing up with them one way or the other might be
> helpful.)
I dunno, I could go with -Wbidirectional-chars. Does anyone else
think I should rename the current name to -Wbidirectional-chars?
Other ideas: -Wunicode-bidi / -Wmultibyte-chars / -Wmisleading-bidirectional.
The patch for clang-tidy I saw called this misleading-bidirectional.
> ...
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index c5730228821..9dfb95dc24c 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -327,7 +327,9 @@ Objective-C and Objective-C++ Dialects}.
> > -Warith-conversion @gol
> > -Warray-bounds -Warray-bounds=@var{n} -Warray-compare @gol
> > -Wno-attributes -Wattribute-alias=@var{n} -Wno-attribute-alias @gol
> > --Wno-attribute-warning -Wbool-compare -Wbool-operation @gol
> > +-Wno-attribute-warning @gol
> > +-Wbidirectional=@r{[}none@r{|}unpaired@r{|}any@r{]} @gol
> > +-Wbool-compare -Wbool-operation @gol
> > -Wno-builtin-declaration-mismatch @gol
> > -Wno-builtin-macro-redefined -Wc90-c99-compat -Wc99-c11-compat @gol
> > -Wc11-c2x-compat @gol
> > @@ -7674,6 +7676,21 @@ Attributes considered include @code{alloc_align}, @code{alloc_size},
> > This is the default. You can disable these warnings with either
> > @option{-Wno-attribute-alias} or @option{-Wattribute-alias=0}.
> > +@item -Wbidirectional=@r{[}none@r{|}unpaired@r{|}any@r{]}
> > +@opindex Wbidirectional=
> > +@opindex Wbidirectional
> > +@opindex Wno-bidirectional
> > +Warn about UTF-8 bidirectional characters.
>
> I suggest to mention where. If everywhere, enumerate the most
> common contexts to make it clear it means everywhere:
>
> Warn about UTF-8 bidirectional characters in source code,
> including string literals, identifiers, and comments.
OK, I've updated the text to:
Warn about possibly misleading UTF-8 bidirectional characters in comments,
string literals, character constants, and identifiers.
Marek
next prev parent reply other threads:[~2021-11-02 19:52 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-01 16:36 [PATCH] " Marek Polacek
2021-11-01 22:10 ` Joseph Myers
2021-11-02 17:18 ` [PATCH v2] " Marek Polacek
2021-11-02 19:20 ` Martin Sebor
2021-11-02 19:52 ` Marek Polacek [this message]
2021-11-08 21:33 ` Marek Polacek
2021-11-15 17:28 ` [PATCH] libcpp: Implement -Wbidi-chars " Marek Polacek
2021-11-15 23:15 ` David Malcolm
2021-11-16 19:50 ` [PATCH v2] " Marek Polacek
2021-11-16 23:00 ` David Malcolm
2021-11-17 0:37 ` [PATCH v3] " Marek Polacek
2021-11-17 2:28 ` David Malcolm
2021-11-17 3:05 ` Marek Polacek
2021-11-17 22:45 ` [committed] libcpp: escape non-ASCII source bytes in -Wbidi-chars= [PR103026] David Malcolm
2021-11-17 22:45 ` [PATCH 2/2] libcpp: capture and underline ranges " David Malcolm
2021-11-17 23:01 ` Marek Polacek
2021-11-30 8:38 ` [PATCH] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] Stephan Bergmann
2021-11-30 13:26 ` Marek Polacek
2021-11-30 15:00 ` Stephan Bergmann
2021-11-30 15:27 ` Marek Polacek
2022-01-14 9:23 ` Stephan Bergmann
2022-01-14 13:28 ` Marek Polacek
2022-01-14 14:52 ` Stephan Bergmann
2021-11-02 20:57 ` [PATCH 0/2] Re: [PATCH] libcpp: Implement -Wbidirectional " David Malcolm
2021-11-02 20:58 ` [PATCH 1/2] Flag CPP_W_BIDIRECTIONAL so that source lines are escaped David Malcolm
2021-11-02 21:07 ` David Malcolm
2021-11-02 20:58 ` [PATCH 2/2] Capture locations of bidi chars and underline ranges David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YYGW5eiZhSxq6aHl@redhat.com \
--to=polacek@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=joseph@codesourcery.com \
--cc=msebor@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).