public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug preprocessor/103026] Implement warning for Unicode bidi override characters  [CVE-2021-42574]
Date: Wed, 17 Nov 2021 03:01:38 +0000
Message-ID: <bug-103026-4-zIKv1elNN8@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-103026-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103026

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Marek Polacek <mpolacek@gcc.gnu.org>:

https://gcc.gnu.org/g:51c500269bf53749b107807d84271385fad35628

commit r12-5331-g51c500269bf53749b107807d84271385fad35628
Author: Marek Polacek <polacek@redhat.com>
Date:   Wed Oct 6 14:33:59 2021 -0400

    libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026]

    From a link below:
    "An issue was discovered in the Bidirectional Algorithm in the Unicode
    Specification through 14.0. It permits the visual reordering of
    characters via control sequences, which can be used to craft source code
    that renders different logic than the logical ordering of tokens
    ingested by compilers and interpreters. Adversaries can leverage this to
    encode source code for compilers accepting Unicode such that targeted
    vulnerabilities are introduced invisibly to human reviewers."

    More info:
    https://nvd.nist.gov/vuln/detail/CVE-2021-42574
    https://trojansource.codes/

    This is not a compiler bug.  However, to mitigate the problem, this patch
    implements -Wbidi-chars=[none|unpaired|any] to warn about possibly
    misleading Unicode bidirectional control characters the preprocessor may
    encounter.

    The default is =unpaired, which warns about improperly terminated
    bidirectional control characters; e.g. an LRE without its corresponding PDF.
    The level =any warns about any use of bidirectional control characters.
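
    To illustrate the difference between the levels, here is a minimal
    sketch.  It is not the libcpp code: the names BidiLevel,
    is_bidi_control and should_warn are hypothetical, the set of nine
    control characters is an assumption, and PDF and PDI are treated
    alike for simplicity.  A stray closer with no opener is ignored,
    which is also an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for this sketch; not the identifiers used in libcpp.
enum class BidiLevel { None, Unpaired, Any };

// True if s[i..i+2] is the UTF-8 encoding of U+202A..U+202E
// (LRE, RLE, PDF, LRO, RLO) or U+2066..U+2069 (LRI, RLI, FSI, PDI).
static bool is_bidi_control(const std::string &s, std::size_t i) {
  if (i + 2 >= s.size()) return false;
  unsigned char b0 = static_cast<unsigned char>(s[i]);
  unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
  unsigned char b2 = static_cast<unsigned char>(s[i + 2]);
  if (b0 != 0xE2) return false;
  if (b1 == 0x80) return b2 >= 0xAA && b2 <= 0xAE;
  if (b1 == 0x81) return b2 >= 0xA6 && b2 <= 0xA9;
  return false;
}

// Decide whether one context (a comment, a literal, ...) would be diagnosed.
bool should_warn(const std::string &text, BidiLevel level) {
  if (level == BidiLevel::None) return false;
  int open = 0;       // openers not yet closed
  bool seen = false;  // any bidi control character at all
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (!is_bidi_control(text, i)) continue;
    seen = true;
    unsigned char b1 = static_cast<unsigned char>(text[i + 1]);
    unsigned char b2 = static_cast<unsigned char>(text[i + 2]);
    // PDF (E2 80 AC) and PDI (E2 81 A9) are the two closing characters.
    bool closer = (b1 == 0x80 && b2 == 0xAC) || (b1 == 0x81 && b2 == 0xA9);
    if (closer) {
      if (open > 0) --open;
    } else {
      ++open;
    }
    i += 2;  // skip the rest of the three-byte sequence
  }
  return level == BidiLevel::Any ? seen : open > 0;
}
```

    With this sketch, an LRE followed by a PDF is silent at =unpaired
    but still reported at =any.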

    This patch handles both UCNs and UTF-8 characters.  UCNs designating
    bidi characters in identifiers have been accepted since r204886.  Then
    r217144 enabled -fextended-identifiers by default.  Extended characters
    in C/C++ identifiers have been accepted since r275979.  However, this
    patch still warns about mixing UTF-8 and UCN bidi characters; there
    seems to be no good reason to allow mixing them.
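
    For readers unfamiliar with UCNs, the following sketch shows how a
    \uXXXX or \UXXXXXXXX spelling can be mapped to a code point and then
    classified.  The helper names parse_ucn and is_bidi_code_point are
    hypothetical and the set of code points is an assumption; the
    functions added to lex.c by this patch may work differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true for the embedding, override and isolate
// controls U+202A..U+202E and U+2066..U+2069.
bool is_bidi_code_point(std::uint32_t cp) {
  return (cp >= 0x202A && cp <= 0x202E) || (cp >= 0x2066 && cp <= 0x2069);
}

// Parse a UCN of the form \uXXXX or \UXXXXXXXX starting at s[i].
// On success, store the code point in cp and return true.
bool parse_ucn(const std::string &s, std::size_t i, std::uint32_t &cp) {
  if (i + 1 >= s.size() || s[i] != '\\') return false;
  std::size_t digits;
  if (s[i + 1] == 'u') digits = 4;
  else if (s[i + 1] == 'U') digits = 8;
  else return false;
  if (i + 2 + digits > s.size()) return false;
  std::uint32_t value = 0;
  for (std::size_t k = 0; k < digits; ++k) {
    char c = s[i + 2 + k];
    std::uint32_t d;
    if (c >= '0' && c <= '9') d = static_cast<std::uint32_t>(c - '0');
    else if (c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(c - 'a' + 10);
    else if (c >= 'A' && c <= 'F') d = static_cast<std::uint32_t>(c - 'A' + 10);
    else return false;  // not a hexadecimal digit
    value = value * 16 + d;
  }
  cp = value;
  return true;
}
```

    Once both spellings are reduced to a code point, the same pairing
    logic can serve UTF-8 and UCN input, and a checker can additionally
    record which spelling opened a context in order to flag mixing.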

    We warn in different contexts: comments (both C and C++-style), string
    literals, character constants, and identifiers.  As expected, UCNs are
    ignored in comments and raw string literals.  The bidirectional control
    characters can nest, so this patch handles that as well.
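
    Nesting can be tracked with a stack.  The sketch below is one way to
    do it under my reading of the Unicode Bidirectional Algorithm: PDF
    closes the innermost embedding or override only if no isolate has
    been opened after it, and PDI closes the innermost isolate together
    with anything opened inside it.  The names Kind and unterminated are
    hypothetical; this is not the code added to lex.c.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical classification of an open context.
enum class Kind { Embedding, Isolate };  // LRE/RLE/LRO/RLO vs LRI/RLI/FSI

// Return how many contexts are still open after processing the code points.
std::size_t unterminated(const std::vector<std::uint32_t> &cps) {
  std::vector<Kind> stack;
  for (std::uint32_t cp : cps) {
    switch (cp) {
      case 0x202A: case 0x202B: case 0x202D: case 0x202E:  // LRE RLE LRO RLO
        stack.push_back(Kind::Embedding);
        break;
      case 0x2066: case 0x2067: case 0x2068:  // LRI RLI FSI
        stack.push_back(Kind::Isolate);
        break;
      case 0x202C:  // PDF: closes the top entry only if it is an embedding
        if (!stack.empty() && stack.back() == Kind::Embedding)
          stack.pop_back();
        break;
      case 0x2069:  // PDI: closes the innermost isolate and all inside it
        for (std::size_t k = stack.size(); k-- > 0;) {
          if (stack[k] == Kind::Isolate) {
            stack.resize(k);
            break;
          }
        }
        break;
      default:  // not a bidi control character
        break;
    }
  }
  return stack.size();
}
```

    A non-zero result at the end of a comment, literal, or identifier
    corresponds to the "unpaired" situation.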

    I have neither included nor tested this with Fortran (which also has
    string literals and line comments).

    Dave M. posted patches improving diagnostics involving Unicode characters.
    This patch does not make use of this new infrastructure yet.

            PR preprocessor/103026

    gcc/c-family/ChangeLog:

            * c.opt (Wbidi-chars, Wbidi-chars=): New option.

    gcc/ChangeLog:

            * doc/invoke.texi: Document -Wbidi-chars.

    libcpp/ChangeLog:

            * include/cpplib.h (enum cpp_bidirectional_level): New.
            (struct cpp_options): Add cpp_warn_bidirectional.
            (enum cpp_warning_reason): Add CPP_W_BIDIRECTIONAL.
            * internal.h (struct cpp_reader): Add warn_bidi_p member
            function.
            * init.c (cpp_create_reader): Set cpp_warn_bidirectional.
            * lex.c (bidi): New namespace.
            (get_bidi_utf8): New function.
            (get_bidi_ucn): Likewise.
            (maybe_warn_bidi_on_close): Likewise.
            (maybe_warn_bidi_on_char): Likewise.
            (_cpp_skip_block_comment): Implement warning about bidirectional
            control characters.
            (skip_line_comment): Likewise.
            (forms_identifier_p): Likewise.
            (lex_identifier): Likewise.
            (lex_string): Likewise.
            (lex_raw_string): Likewise.

    gcc/testsuite/ChangeLog:

            * c-c++-common/Wbidi-chars-1.c: New test.
            * c-c++-common/Wbidi-chars-2.c: New test.
            * c-c++-common/Wbidi-chars-3.c: New test.
            * c-c++-common/Wbidi-chars-4.c: New test.
            * c-c++-common/Wbidi-chars-5.c: New test.
            * c-c++-common/Wbidi-chars-6.c: New test.
            * c-c++-common/Wbidi-chars-7.c: New test.
            * c-c++-common/Wbidi-chars-8.c: New test.
            * c-c++-common/Wbidi-chars-9.c: New test.
            * c-c++-common/Wbidi-chars-10.c: New test.
            * c-c++-common/Wbidi-chars-11.c: New test.
            * c-c++-common/Wbidi-chars-12.c: New test.
            * c-c++-common/Wbidi-chars-13.c: New test.
            * c-c++-common/Wbidi-chars-14.c: New test.
            * c-c++-common/Wbidi-chars-15.c: New test.
            * c-c++-common/Wbidi-chars-16.c: New test.
            * c-c++-common/Wbidi-chars-17.c: New test.
