public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Marek Polacek <polacek@redhat.com>
To: Stephan Bergmann <sbergman@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
	Martin Sebor <msebor@redhat.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	Joseph Myers <joseph@codesourcery.com>
Subject: Re: [PATCH] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026]
Date: Tue, 30 Nov 2021 08:26:57 -0500	[thread overview]
Message-ID: <YaYmoSN/8PBEGavw@redhat.com> (raw)
In-Reply-To: <1ab9d323-af24-c2a0-0c6a-596b3f68c605@redhat.com>

On Tue, Nov 30, 2021 at 09:38:57AM +0100, Stephan Bergmann wrote:
> On 15/11/2021 18:28, Marek Polacek via Gcc-patches wrote:
> > On Mon, Nov 08, 2021 at 04:33:43PM -0500, Marek Polacek wrote:
> > > Ping, can we conclude on the name?   IMHO, -Wbidirectional is just fine,
> > > but changing the name is a trivial operation.
> > 
> > Here's a patch with a better name (suggested by Jonathan W.).  Otherwise no
> > changes.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> >  From a link below:
> > "An issue was discovered in the Bidirectional Algorithm in the Unicode
> > Specification through 14.0. It permits the visual reordering of
> > characters via control sequences, which can be used to craft source code
> > that renders different logic than the logical ordering of tokens
> > ingested by compilers and interpreters. Adversaries can leverage this to
> > encode source code for compilers accepting Unicode such that targeted
> > vulnerabilities are introduced invisibly to human reviewers."
> > 
> > More info:
> > https://nvd.nist.gov/vuln/detail/CVE-2021-42574
> > https://trojansource.codes/
> > 
> > This is not a compiler bug.  However, to mitigate the problem, this patch
> > implements -Wbidi-chars=[none|unpaired|any] to warn about possibly
> > misleading Unicode bidirectional characters the preprocessor may encounter.
> > 
> > The default is =unpaired, which warns about improperly terminated
> > bidirectional characters; e.g. a LRE without its appertaining PDF.  The
> > level =any warns about any use of bidirectional characters.
> > 
> > This patch handles both UCNs and UTF-8 characters.  UCNs designating
> > bidi characters in identifiers are accepted since r204886.  Then r217144
> > enabled -fextended-identifiers by default.  Extended characters in C/C++
> > identifiers have been accepted since r275979.  However, this patch still
> > warns about mixing UTF-8 and UCN bidi characters; there seems to be no
> > good reason to allow mixing them.
> 
> I wonder what the rationale is to warn about UCNs, like in
> 
> >         aText = u"\u202D" + aText;
> 
> (as found in the LibreOffice source code).

Is this line mixing a UCN and a UTF-8?  Or is it just that you're
prepending a LRO to aText?  We warn because the LRO is not "closed"
in the context of its string literal, which was part of the Trojan
source attack.  So "\u202D ... \u202C" would not warn.

I'm not sure what workaround I could offer.  Maybe provide an option not to
warn about UCNs at all, though even that is potentially dangerous -- while
you can see UCNs in the source code, if you print strings containing them,
they won't be visible anymore.

Marek


  reply	other threads:[~2021-11-30 13:27 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-01 16:36 [PATCH] libcpp: Implement -Wbidirectional " Marek Polacek
2021-11-01 22:10 ` Joseph Myers
2021-11-02 17:18   ` [PATCH v2] " Marek Polacek
2021-11-02 19:20     ` Martin Sebor
2021-11-02 19:52       ` Marek Polacek
2021-11-08 21:33         ` Marek Polacek
2021-11-15 17:28           ` [PATCH] libcpp: Implement -Wbidi-chars " Marek Polacek
2021-11-15 23:15             ` David Malcolm
2021-11-16 19:50               ` [PATCH v2] " Marek Polacek
2021-11-16 23:00                 ` David Malcolm
2021-11-17  0:37                   ` [PATCH v3] " Marek Polacek
2021-11-17  2:28                     ` David Malcolm
2021-11-17  3:05                       ` Marek Polacek
2021-11-17 22:45                         ` [committed] libcpp: escape non-ASCII source bytes in -Wbidi-chars= [PR103026] David Malcolm
2021-11-17 22:45                           ` [PATCH 2/2] libcpp: capture and underline ranges " David Malcolm
2021-11-17 23:01                             ` Marek Polacek
2021-11-30  8:38             ` [PATCH] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] Stephan Bergmann
2021-11-30 13:26               ` Marek Polacek [this message]
2021-11-30 15:00                 ` Stephan Bergmann
2021-11-30 15:27                   ` Marek Polacek
2022-01-14  9:23                     ` Stephan Bergmann
2022-01-14 13:28                       ` Marek Polacek
2022-01-14 14:52                         ` Stephan Bergmann
2021-11-02 20:57 ` [PATCH 0/2] Re: [PATCH] libcpp: Implement -Wbidirectional " David Malcolm
2021-11-02 20:58   ` [PATCH 1/2] Flag CPP_W_BIDIRECTIONAL so that source lines are escaped David Malcolm
2021-11-02 21:07     ` David Malcolm
2021-11-02 20:58   ` [PATCH 2/2] Capture locations of bidi chars and underline ranges David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YaYmoSN/8PBEGavw@redhat.com \
    --to=polacek@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=msebor@redhat.com \
    --cc=sbergman@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).