public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Marek Polacek <polacek@redhat.com>,
	Joseph Myers <joseph@codesourcery.com>
Cc: Jakub Jelinek <jakub@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v2] libcpp: Implement -Wbidirectional for CVE-2021-42574 [PR103026]
Date: Tue, 2 Nov 2021 13:20:03 -0600	[thread overview]
Message-ID: <49fc8d97-4cc7-5c78-d65a-02ea1419f483@gmail.com> (raw)
In-Reply-To: <YYFzA3Z4rPwTo927@redhat.com>

On 11/2/21 11:18 AM, Marek Polacek via Gcc-patches wrote:
> On Mon, Nov 01, 2021 at 10:10:40PM +0000, Joseph Myers wrote:
>> On Mon, 1 Nov 2021, Marek Polacek via Gcc-patches wrote:
>>
>>> +  /* We've read a bidi char, update the current vector as necessary.  */
>>> +  void on_char (kind k, bool ucn_p)
>>> +  {
>>> +    switch (k)
>>> +      {
>>> +      case kind::LRE:
>>> +      case kind::RLE:
>>> +      case kind::LRO:
>>> +      case kind::RLO:
>>> +	vec.push (ucn_p ? 3u : 1u);
>>> +	break;
>>> +      case kind::LRI:
>>> +      case kind::RLI:
>>> +      case kind::FSI:
>>> +	vec.push (ucn_p ? 2u : 0u);
>>> +	break;
>>> +      case kind::PDF:
>>> +	if (current_ctx () == kind::PDF)
>>> +	  pop ();
>>> +	break;
>>> +      case kind::PDI:
>>> +	if (current_ctx () == kind::PDI)
>>> +	  pop ();
>>
>> My understanding is that PDI should pop all intermediate PDF contexts
>> outward to a PDI context, which it also pops.  (But if it's embedded only
>> in PDF contexts, with no PDI context containing it, it doesn't pop
>> anything.)
>>
>> I think failing to handle that only means libcpp sometimes models there
>> as being more bidirectional contexts open than there should be, so it
>> might give spurious warnings when in fact all such contexts had been
>> closed by end of string or comment.
> 
> Ah, you're right.
> https://www.unicode.org/reports/tr9/#Terminating_Explicit_Directional_Isolates
> says that "[PDI] terminates the scope of the last LRI, RLI, or FSI whose
> scope has not yet been terminated, as well as the scopes of any subsequent
> LREs, RLEs, LROs, or RLOs whose scopes have not yet been terminated."
> but PDF doesn't have the latter quirk.
> 
> Fixed in the below: I added a suitable truncate into on_char.  The new test
> Wbidirectional-14.c exercises the handling of PDI.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
>  From a link below:
> "An issue was discovered in the Bidirectional Algorithm in the Unicode
> Specification through 14.0. It permits the visual reordering of
> characters via control sequences, which can be used to craft source code
> that renders different logic than the logical ordering of tokens
> ingested by compilers and interpreters. Adversaries can leverage this to
> encode source code for compilers accepting Unicode such that targeted
> vulnerabilities are introduced invisibly to human reviewers."
> 
> More info:
> https://nvd.nist.gov/vuln/detail/CVE-2021-42574
> https://trojansource.codes/
> 
> This is not a compiler bug.  However, to mitigate the problem, this patch
> implements -Wbidirectional=[none|unpaired|any] to warn about possibly
> misleading Unicode bidirectional characters the preprocessor may encounter.

Birectional sounds very general.  Can we come up with a name
that's a bit more descriptive of the problem the warning reports?
 From skimming the docs and the tests it looks like the warning
points out uses of bidirectonal characters in the program source
code as well as comments.  Would -Wbidirectional-text be better?
Or -Wbidirectional-chars?  (If Clang is also adding a warning
for this, syncing up with them one way or the other might be
helpful.)

...
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c5730228821..9dfb95dc24c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -327,7 +327,9 @@ Objective-C and Objective-C++ Dialects}.
>   -Warith-conversion @gol
>   -Warray-bounds  -Warray-bounds=@var{n}  -Warray-compare @gol
>   -Wno-attributes  -Wattribute-alias=@var{n} -Wno-attribute-alias @gol
> --Wno-attribute-warning  -Wbool-compare  -Wbool-operation @gol
> +-Wno-attribute-warning  @gol
> +-Wbidirectional=@r{[}none@r{|}unpaired@r{|}any@r{]} @gol
> +-Wbool-compare  -Wbool-operation @gol
>   -Wno-builtin-declaration-mismatch @gol
>   -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
>   -Wc11-c2x-compat @gol
> @@ -7674,6 +7676,21 @@ Attributes considered include @code{alloc_align}, @code{alloc_size},
>   This is the default.  You can disable these warnings with either
>   @option{-Wno-attribute-alias} or @option{-Wattribute-alias=0}.
>   
> +@item -Wbidirectional=@r{[}none@r{|}unpaired@r{|}any@r{]}
> +@opindex Wbidirectional=
> +@opindex Wbidirectional
> +@opindex Wno-bidirectional
> +Warn about UTF-8 bidirectional characters.

I suggest to mention where.  If everywhere, enumerate the most
common contexts to make it clear it means everywhere:

   Warn about UTF-8 bidirectional characters in source code,
   including string literals, identifiers, and comments.

Martin

  reply	other threads:[~2021-11-02 19:20 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-01 16:36 [PATCH] " Marek Polacek
2021-11-01 22:10 ` Joseph Myers
2021-11-02 17:18   ` [PATCH v2] " Marek Polacek
2021-11-02 19:20     ` Martin Sebor [this message]
2021-11-02 19:52       ` Marek Polacek
2021-11-08 21:33         ` Marek Polacek
2021-11-15 17:28           ` [PATCH] libcpp: Implement -Wbidi-chars " Marek Polacek
2021-11-15 23:15             ` David Malcolm
2021-11-16 19:50               ` [PATCH v2] " Marek Polacek
2021-11-16 23:00                 ` David Malcolm
2021-11-17  0:37                   ` [PATCH v3] " Marek Polacek
2021-11-17  2:28                     ` David Malcolm
2021-11-17  3:05                       ` Marek Polacek
2021-11-17 22:45                         ` [committed] libcpp: escape non-ASCII source bytes in -Wbidi-chars= [PR103026] David Malcolm
2021-11-17 22:45                           ` [PATCH 2/2] libcpp: capture and underline ranges " David Malcolm
2021-11-17 23:01                             ` Marek Polacek
2021-11-30  8:38             ` [PATCH] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] Stephan Bergmann
2021-11-30 13:26               ` Marek Polacek
2021-11-30 15:00                 ` Stephan Bergmann
2021-11-30 15:27                   ` Marek Polacek
2022-01-14  9:23                     ` Stephan Bergmann
2022-01-14 13:28                       ` Marek Polacek
2022-01-14 14:52                         ` Stephan Bergmann
2021-11-02 20:57 ` [PATCH 0/2] Re: [PATCH] libcpp: Implement -Wbidirectional " David Malcolm
2021-11-02 20:58   ` [PATCH 1/2] Flag CPP_W_BIDIRECTIONAL so that source lines are escaped David Malcolm
2021-11-02 21:07     ` David Malcolm
2021-11-02 20:58   ` [PATCH 2/2] Capture locations of bidi chars and underline ranges David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49fc8d97-4cc7-5c78-d65a-02ea1419f483@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=polacek@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).