public inbox for gcc-bugs@sourceware.org
* [Bug preprocessor/104030] New: -Wbidi-chars should not warn about UCNs
@ 2022-01-14 14:51 sbergman at redhat dot com
  2022-01-14 14:55 ` [Bug preprocessor/104030] [12 Regression] " mpolacek at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: sbergman at redhat dot com @ 2022-01-14 14:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

            Bug ID: 104030
           Summary: -Wbidi-chars should not warn about UCNs
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sbergman at redhat dot com
  Target Milestone: ---

As discussed in the sub-thread starting at
<https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585710.html> "Re:
[PATCH] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026]",
-Wbidi-chars should not emit warnings when the problematic characters are
written as UCNs rather than verbatim.  For example, the line

>         aText = u"\u202D" + aText;

found in the LibreOffice source code should not cause a warning (which couldn't
even be silenced with a local `#pragma GCC diagnostic ignored "-Wbidi-chars"`
due to bug 53431).
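
To make the distinction concrete: with the UCN spelling, the source file contains only ASCII (a backslash, `u`, and four hex digits), so an editor shows exactly what the compiler sees; only the compiled string holds U+202D (LEFT-TO-RIGHT OVERRIDE). A minimal C11 sketch of that, where the names `lro_prefix` and `starts_with_lro` are hypothetical and not taken from LibreOffice or GCC:

```c
#include <stddef.h>

/* Written as a UCN: the source bytes here are plain ASCII.
   The compiler emits the UTF-8 encoding of U+202D: E2 80 AD.  */
const unsigned char lro_prefix[] = u8"\u202D";

/* Hypothetical helper: returns 1 if s begins with the UTF-8 encoding
   of U+202D (LEFT-TO-RIGHT OVERRIDE), 0 otherwise.  The comparisons
   short-circuit, so a shorter NUL-terminated string is never over-read.  */
int starts_with_lro(const unsigned char *s)
{
    return s[0] == 0xE2 && s[1] == 0x80 && s[2] == 0xAD;
}
```

Here `starts_with_lro(lro_prefix)` is true even though no bidi control character ever appears verbatim in the source text.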


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: mpolacek at gcc dot gnu.org @ 2022-01-14 14:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

Marek Polacek <mpolacek at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|-Wbidi-chars should not     |[12 Regression]
                   |warn about UCNs             |-Wbidi-chars should not
                   |                            |warn about UCNs
   Last reconfirmed|                            |2022-01-14
           Assignee|unassigned at gcc dot gnu.org|mpolacek at gcc dot gnu.org
   Target Milestone|---                         |12.0
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
                 CC|                            |mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
Mine then.  Regression because it rejects previously accepted code.


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: jakub at gcc dot gnu.org @ 2022-01-14 14:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Either we drop the UCN support altogether, or make -Wbidi-chars a two-level
warning: -Wbidi-chars maps to -Wbidi-chars=1, which doesn't warn about UCNs,
and -Wbidi-chars=2 does.
UCNs indeed don't have the problem that a user in an editor sees something
different from what is actually there (unless some editor interprets UCNs and
shows them as Unicode chars), but one reason to warn about UCNs was to make
sure that even what the program prints doesn't suffer from such problems.  Of
course, if something like LibreOffice (I bet) carefully ensures it is paired,
but constructs it from smaller separate literals, then it is fine.


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: jakub at gcc dot gnu.org @ 2022-01-14 15:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Or, e.g., for identifiers with UCNs in them: if the bidi control characters
aren't paired, then as/ld/readelf/demangler at runtime etc. can show weird
things.


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: mpolacek at gcc dot gnu.org @ 2022-01-14 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

--- Comment #4 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> Either we drop the UCN support altogether, or make -Wbidi-chars a 2 level
> warning, -Wbidi-chars mapping to -Wbidi-chars=1 which doesn't warn about
> UCNs and -Wbidi-chars=2 that does.

Exactly.  Except I'm not sure how well that will play with the rest of the
-Wbidi-chars= suboptions.  :/

For example,

-Wbidi-chars=any -Wbidi-chars=2

should probably warn about any bidi chars, including UCNs, but the latter
option might override the former, leaving just -Wbidi-chars=unpaired plus
UCN warnings.


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: mpolacek at gcc dot gnu.org @ 2022-01-14 15:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

--- Comment #5 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
So maybe add -Wbidi-chars-ucn, which is off by default.


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: jakub at gcc dot gnu.org @ 2022-01-14 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Or support -Wbidi-chars=unpaired,ucn or -Wbidi-chars=any,ucn ?


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: sbergman at redhat dot com @ 2022-01-14 15:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

--- Comment #7 from Stephan Bergmann <sbergman at redhat dot com> ---
(In reply to Jakub Jelinek from comment #2)
> Of course, if something like libreoffice (I bet) carefully ensures it is
> paired, but constructs it from smaller separate literals, then it is fine.

(Or it doesn't even need to ensure that, e.g., an LRO is paired with a PDF: my
understanding of <https://www.unicode.org/reports/tr9/tr9-44.html> "Unicode
Bidirectional Algorithm" is that an LRO doesn't require a matching PDF.
Without one, its effect extends to the end of the paragraph, which appears to
be what the example LibreOffice code in comment 0 makes use of.)
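
As a rough illustration of what an "unpaired" check has to track, and of why a lone LRO is flagged even though the Bidirectional Algorithm permits it, here is a simplified sketch in C. It is not GCC's implementation: the function name, the stack representation, and the depth cap are assumptions made for this example. It only counts controls that are still open at the end of the input.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAX_DEPTH = 128 };  /* arbitrary cap chosen for this sketch */

/* Returns how many bidi embeddings, overrides, or isolates are still
   open after scanning n code points.  Simplified model:
   - LRE/RLE/LRO/RLO open a context that PDF closes;
   - LRI/RLI/FSI open an isolate that PDI closes, together with
     anything opened inside that isolate.  */
size_t unclosed_bidi_controls(const uint32_t *cp, size_t n)
{
    char stack[MAX_DEPTH];  /* 'E' = embedding/override, 'I' = isolate */
    size_t depth = 0;

    for (size_t i = 0; i < n; i++) {
        switch (cp[i]) {
        case 0x202A: case 0x202B:   /* LRE, RLE */
        case 0x202D: case 0x202E:   /* LRO, RLO */
            if (depth < MAX_DEPTH)
                stack[depth++] = 'E';
            break;
        case 0x2066: case 0x2067: case 0x2068:  /* LRI, RLI, FSI */
            if (depth < MAX_DEPTH)
                stack[depth++] = 'I';
            break;
        case 0x202C:                /* PDF: closes the innermost embedding */
            if (depth > 0 && stack[depth - 1] == 'E')
                depth--;
            break;
        case 0x2069: {              /* PDI: closes the nearest isolate */
            size_t j = depth;
            while (j > 0 && stack[j - 1] != 'I')
                j--;
            if (j > 0)
                depth = j - 1;      /* drop the isolate and all above it */
            break;
        }
        default:
            break;
        }
    }
    return depth;
}
```

Under this model, a string that starts with U+202D and never emits U+202C reports one open control, which matches the LibreOffice case from comment 0: valid for display purposes, yet "unpaired" from a checker's point of view.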


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: rguenth at gcc dot gnu.org @ 2022-01-18 13:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
           Keywords|                            |diagnostic


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: cvs-commit at gcc dot gnu.org @ 2022-01-24 22:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Marek Polacek <mpolacek@gcc.gnu.org>:

https://gcc.gnu.org/g:ae36f839632ddb67a53c26e9c7e73b0f56c4c11b

commit r12-6850-gae36f839632ddb67a53c26e9c7e73b0f56c4c11b
Author: Marek Polacek <polacek@redhat.com>
Date:   Wed Jan 19 19:05:22 2022 -0500

    preprocessor: -Wbidi-chars and UCNs [PR104030]

    Stephan Bergmann reported that our -Wbidi-chars breaks the build
    of LibreOffice because we warn about UCNs even when their usage
    is correct: LibreOffice constructs strings piecewise, as in:

      aText = u"\u202D" + aText;

    and warning about that is overzealous.  Since no editor (AFAIK)
    interprets UCNs to show them as Unicode characters, there's less
    risk in misinterpreting them, and so perhaps we shouldn't warn
    about them by default.  However, identifiers containing UCNs or
    programs generating other programs could still cause confusion,
    so I'm keeping the UCN checking.  To turn it on, you just need
    to use -Wbidi-chars=unpaired,ucn or -Wbidi-chars=any,ucn.

    The implementation is done by using the new EnumSet feature.

            PR preprocessor/104030

    gcc/c-family/ChangeLog:

            * c.opt (Wbidi-chars): Mark as EnumSet.  Also accept =ucn.

    gcc/ChangeLog:

            * doc/invoke.texi: Update documentation for -Wbidi-chars.

    libcpp/ChangeLog:

            * include/cpplib.h (enum cpp_bidirectional_level): Add
            bidirectional_ucn.  Set values explicitly.
            * internal.h (cpp_reader): Adjust warn_bidi_p.
            * lex.cc (maybe_warn_bidi_on_close): Don't warn about UCNs
            unless UCN checking is on.
            (maybe_warn_bidi_on_char): Likewise.

    gcc/testsuite/ChangeLog:

            * c-c++-common/Wbidi-chars-10.c: Turn on UCN checking.
            * c-c++-common/Wbidi-chars-11.c: Likewise.
            * c-c++-common/Wbidi-chars-14.c: Likewise.
            * c-c++-common/Wbidi-chars-16.c: Likewise.
            * c-c++-common/Wbidi-chars-17.c: Likewise.
            * c-c++-common/Wbidi-chars-4.c: Likewise.
            * c-c++-common/Wbidi-chars-5.c: Likewise.
            * c-c++-common/Wbidi-chars-6.c: Likewise.
            * c-c++-common/Wbidi-chars-7.c: Likewise.
            * c-c++-common/Wbidi-chars-8.c: Likewise.
            * c-c++-common/Wbidi-chars-9.c: Likewise.
            * c-c++-common/Wbidi-chars-ranges.c: Likewise.
            * c-c++-common/Wbidi-chars-18.c: New test.
            * c-c++-common/Wbidi-chars-19.c: New test.
            * c-c++-common/Wbidi-chars-20.c: New test.
            * c-c++-common/Wbidi-chars-21.c: New test.
            * c-c++-common/Wbidi-chars-22.c: New test.
            * c-c++-common/Wbidi-chars-23.c: New test.
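
The commit message describes the level as an enum with explicit values that can be combined with a separate UCN bit. The following C sketch shows one way such a comma-separated argument could map onto bit flags. All identifiers (`bidi_flags`, `parse_bidi_arg`, `bidi_checks_char`), the numeric values, and the rule that `none`, `unpaired`, and `any` are mutually exclusive are assumptions made for illustration; this is not the libcpp or option-handling code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; the mode occupies the low bits and the
   UCN request is an independent bit that can be OR-ed in.  */
enum bidi_flags {
    BIDI_NONE     = 0,
    BIDI_UNPAIRED = 1,
    BIDI_ANY      = 2,
    BIDI_UCN      = 4
};

static int tok_is(const char *p, size_t len, const char *name)
{
    return len == strlen(name) && memcmp(p, name, len) == 0;
}

/* Parses e.g. "unpaired,ucn" into a flag set.  Returns -1 for an
   unknown token, an empty token, or more than one mode.  */
int parse_bidi_arg(const char *arg)
{
    int flags = BIDI_NONE;
    int mode_seen = 0;

    for (const char *p = arg;;) {
        size_t len = strcspn(p, ",");
        int mode = -1;

        if (tok_is(p, len, "ucn"))
            flags |= BIDI_UCN;
        else if (tok_is(p, len, "none"))
            mode = BIDI_NONE;
        else if (tok_is(p, len, "unpaired"))
            mode = BIDI_UNPAIRED;
        else if (tok_is(p, len, "any"))
            mode = BIDI_ANY;
        else
            return -1;

        if (mode >= 0) {
            if (mode_seen)
                return -1;          /* e.g. "any,unpaired" */
            mode_seen = 1;
            flags |= mode;
        }
        if (p[len] == '\0')
            break;
        p += len + 1;               /* skip the comma */
    }
    return flags;
}

/* Returns 1 if a bidi control should be examined at all: characters
   written as UCNs are skipped unless the UCN bit is set.  */
int bidi_checks_char(int flags, int is_ucn)
{
    if (is_ucn && !(flags & BIDI_UCN))
        return 0;
    return (flags & (BIDI_UNPAIRED | BIDI_ANY)) != 0;
}
```

With these definitions, `parse_bidi_arg("unpaired,ucn")` yields `BIDI_UNPAIRED | BIDI_UCN`, and `bidi_checks_char(BIDI_UNPAIRED, 1)` is 0, mirroring the new default in which UCNs are not checked.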


* [Bug preprocessor/104030] [12 Regression] -Wbidi-chars should not warn about UCNs
From: mpolacek at gcc dot gnu.org @ 2022-01-24 22:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104030

Marek Polacek <mpolacek at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #9 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
Fixed.
