public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/102447] New: std::regex incorrectly accepts invalid bracket expression
@ 2021-09-22 10:06 redi at gcc dot gnu.org
  2021-09-24 18:50 ` [Bug libstdc++/102447] " mpolacek at gcc dot gnu.org
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: redi at gcc dot gnu.org @ 2021-09-22 10:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

            Bug ID: 102447
           Summary: std::regex incorrectly accepts invalid bracket
                    expression
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
            Blocks: 102445
  Target Milestone: ---

#include <regex>
#include <cassert>

int main()
{
  try {
    std::regex{"[\\w-a]"};
    assert(!"here");
  } catch (const std::regex_error& e) {
    assert(e.code() == std::regex_constants::error_range);
  }
}

This should run and exit successfully, but with GCC we get:

a.out: reg.C:8: int main(): Assertion `!"here"' failed.
Aborted (core dumped)


The bracket expression [\w-a] is invalid.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102445
[Bug 102445] [meta-bug] std::regex issues

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: mpolacek at gcc dot gnu.org @ 2021-09-24 18:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Marek Polacek <mpolacek at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mpolacek at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-24

--- Comment #1 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
Confirmed.

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2021-09-24 19:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |redi at gcc dot gnu.org

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2021-09-24 21:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Actually, this might not be a bug.

We have this comment in regex_compiler.tcc

      // POSIX doesn't allow '-' as a start-range char (say [a-z--0]),
      // except when the '-' is the first or last character in the bracket
      // expression ([--0]). ECMAScript treats all '-' after a range as a
      // normal character. Also see above, where _M_expression_term gets
      // called.
      //
      // As a result, POSIX rejects [-----], but ECMAScript doesn't.
      // Boost (1.57.0) always uses POSIX style even in its ECMAScript syntax.
      // Clang (3.5) always uses ECMAScript style even in its POSIX syntax.
      //
      // It turns out that no one reads BNFs ;)


So [\w-a] is valid for the ECMAScript syntax and is equivalent to POSIX
[-_[:alnum:]].

You can confirm this using your browser's javascript console, where this will
print true:

RegExp('[\\w-a]').test('-')
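For comparison from the C++ side, here is a minimal sketch of the lenient reading described above. It spells out the set that [\w-a] would denote under that reading as [-_[:alnum:]] and checks a few single characters. It assumes std::regex's default ECMAScript grammar, which (as a C++ extension to ECMAScript) accepts POSIX class names such as [:alnum:] inside brackets, and the default "C" locale; the helper name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the set that [\w-a] denotes under the lenient
// (browser-legacy) reading, written without the ambiguous range:
// '-', '_', or any alphanumeric character.
bool lenient_set_matches(const std::string& s) {
  static const std::regex re{"[-_[:alnum:]]"};  // default grammar: ECMAScript
  return std::regex_match(s, re);               // whole string must match
}
```

With this, lenient_set_matches("-") is true, like the RegExp test above, while lenient_set_matches(".") is false.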

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2021-09-27 11:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Not a bug; this is the expected behaviour for ECMAScript regular expressions.

Using one of the POSIX syntax options will cause regex_error to be thrown.
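A minimal sketch of that check, using only the standard std::regex syntax flags (the helper name is hypothetical). Note why POSIX rejects this particular pattern: inside a POSIX bracket expression a backslash is an ordinary character, so [\w-a] is read as '\' followed by the range w-a, whose endpoints are out of order ('w' sorts after 'a').

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` fails to compile with the given
// syntax option (std::regex::basic, std::regex::extended, ...).
bool rejected_with(const std::string& pattern,
                   std::regex::flag_type syntax) {
  try {
    std::regex re{pattern, syntax};
    return false;  // compiled successfully
  } catch (const std::regex_error&) {
    return true;   // the constructor reported a syntax error
  }
}
```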

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2021-10-01 10:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |---
             Status|RESOLVED                    |NEW

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Reopening. JavaScript engines in web browsers accept invalid regexes for legacy
support:
https://262.ecma-international.org/#sec-additional-ecmascript-features-for-web-browsers

If we're not implementing a browser engine, then it should be a syntax error:

 NonemptyClassRanges :: ClassAtom - ClassAtom ClassRanges

    It is a Syntax Error if IsCharacterClass of the first ClassAtom is true or
    IsCharacterClass of the second ClassAtom is true.

    It is a Syntax Error if IsCharacterClass of the first ClassAtom is false
    and IsCharacterClass of the second ClassAtom is false and the
    CharacterValue of the first ClassAtom is larger than the CharacterValue
    of the second ClassAtom.
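The two static-semantics rules quoted above can be restated as a small predicate. This is only a sketch: ClassAtom here is a hypothetical two-field model of the grammar symbol (IsCharacterClass plus CharacterValue), not a type from any real regex implementation.

```cpp
// Hypothetical model of a ClassAtom: either a single character
// (is_character_class == false) or a class escape such as \w.
struct ClassAtom {
  bool is_character_class;
  char32_t character_value;  // only meaningful when !is_character_class
};

// Applies the two rules to a range "first - second".
// Returns true when the range is a Syntax Error.
bool range_is_syntax_error(const ClassAtom& first, const ClassAtom& second) {
  if (first.is_character_class || second.is_character_class)
    return true;  // rule 1: e.g. [\w-a] or [a-\w]
  return first.character_value > second.character_value;  // rule 2: e.g. [z-a]
}
```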

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: rs2740 at gmail dot com @ 2021-10-02  1:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

TC <rs2740 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rs2740 at gmail dot com

--- Comment #5 from TC <rs2740 at gmail dot com> ---
Hmm, but C++'s normative reference is to a 1999 version of ECMAScript...which
might well have the "legacy" behavior? (I haven't looked at it in detail.)

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2021-10-02  6:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I have looked in detail (I have the 3rd, 4th and 5th editions here) but my
brain started oozing out of my ears.

15.10.2.15 NonemptyClassRanges and 15.10.2.16 NonemptyClassRangesNoDash are the
relevant sections of the 1999 3rd edition. The former defines:

  The internal helper function CharacterRange takes two CharSet parameters
  A and B and performs the following:
  1. If A does not contain exactly one character or B does not contain exactly
  one character then throw a SyntaxError exception.

And the latter has this note:

  Informative comments: ClassRanges can expand into single ClassAtoms and/or
  ranges of two ClassAtoms separated by dashes. In the latter case the
  ClassRanges includes all characters between the first ClassAtom and the
  second ClassAtom, inclusive; an error occurs if either ClassAtom does not
  represent a single character (for example, if one is \w) or if the first
  ClassAtom's code point value is greater than the second ClassAtom's code
  point value.



The ClassAtom \w does not contain exactly one character, so I think it's a
syntax error.

The 3rd edition doesn't mention any legacy features of RegExp, but it does seem
to require the strict behaviour.
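For contrast with the static-semantics rules in the newer editions, the 3rd-edition CharacterRange helper quoted above works on character sets. A sketch, assuming CharSet is modelled as std::set<char32_t> and the SyntaxError exception as std::invalid_argument; the order check and the inclusive range follow the informative note.

```cpp
#include <set>
#include <stdexcept>

using CharSet = std::set<char32_t>;

// Sketch of CharacterRange(A, B) from the 3rd edition.
CharSet character_range(const CharSet& a, const CharSet& b) {
  // Step 1: each endpoint must contain exactly one character, so a class
  // escape such as \w (many characters) cannot be a range endpoint.
  if (a.size() != 1 || b.size() != 1)
    throw std::invalid_argument("SyntaxError: endpoint is not one character");
  const char32_t lo = *a.begin();
  const char32_t hi = *b.begin();
  if (lo > hi)
    throw std::invalid_argument("SyntaxError: range out of order");
  // All characters from lo to hi inclusive; the loop stops at hi
  // instead of testing c <= hi, so it cannot overflow.
  CharSet result;
  for (char32_t c = lo;; ++c) {
    result.insert(c);
    if (c == hi) break;
  }
  return result;
}
```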

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: rs2740 at gmail dot com @ 2021-10-02 16:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

--- Comment #7 from TC <rs2740 at gmail dot com> ---
(In reply to Jonathan Wakely from comment #6)
> I have looked in detail (I have the 3rd, 4th and 5th editions here) but my
> brain started oozing out of my ears.
> 
> 15.10.2.15 NonemptyClassRanges and 15.10.2.16 NonemptyClassRangesNoDash are
> the relevant sections of the 1999 3rd edition. The former defines:
> 
>   The internal helper function CharacterRange takes two CharSet parameters
>   A and B and performs the following:
>   1. If A does not contain exactly one character or B does not contain
>   exactly one character then throw a SyntaxError exception.
> 
> And the latter has this note:
> 
>   Informative comments: ClassRanges can expand into single ClassAtoms and/or
>   ranges of two ClassAtoms separated by dashes. In the latter case the
>   ClassRanges includes all characters between the first ClassAtom and the
>   second ClassAtom, inclusive; an error occurs if either ClassAtom does not
>   represent a single character (for example, if one is \w) or if the first
>   ClassAtom's code point value is greater than the second ClassAtom's code
>   point value.
> 
> 
> 
> The ClassAtom \w does not contain exactly one character, so I think it's a
> syntax error.
> 
> The 3rd edition doesn't mention any legacy features of RegExp, but it does
> seem to require the strict behaviour.

I've looked at the 1999 spec now, and agree with your reading.

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: s.ikarashi at fujitsu dot com @ 2021-10-04  5:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Ikarashi <s.ikarashi at fujitsu dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |s.ikarashi at fujitsu dot com

--- Comment #8 from Ikarashi <s.ikarashi at fujitsu dot com> ---
> The ClassAtom \w does not contain exactly one character, so I think it's a syntax error.

If you process '\w', '-', and 'a' in this order,
can the \w be a ClassAtom at all?
According to the definition of Atom,
it seems to be counted as a "\ AtomEscape" before the beginning of a
CharacterClass.

Atom[U, N] ::
      PatternCharacter
      .
      \ AtomEscape[?U, ?N]
      CharacterClass[?U]
      ( GroupSpecifier[?U] Disjunction[?U, ?N] )
      ( ? : Disjunction[?U, ?N] )

A \w is a "\ CharacterClassEscape", so it can be a "\ AtomEscape".
I know it can also be a "\ ClassEscape" and a ClassAtomNoDash;
however, \w-a looks like two Atoms to me, one Atom \w and one Atom -a.

Is there any rule defining the order of such interpretations?

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2021-12-13 22:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: cvs-commit at gcc dot gnu.org @ 2021-12-14 21:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:7ce3c230edf6e498e125c805a6dd313bf87dc439

commit r12-5977-g7ce3c230edf6e498e125c805a6dd313bf87dc439
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Dec 14 14:32:35 2021 +0000

    libstdc++: Fix handling of invalid ranges in std::regex [PR102447]

    std::regex currently allows invalid bracket ranges such as [\w-a] which
    are only allowed by ECMAScript when in web browser compatibility mode.
    It should be an error, because the start of the range is a character
    class, not a single character. The current implementation of
    _Compiler::_M_expression_term does not provide a way to reject this,
    because we only remember a previous character, not whether we just
    processed a character class (or collating symbol etc.)

    This patch replaces the pair<bool, CharT> used to emulate
    optional<CharT> with a custom class closer to pair<tribool,CharT>. That
    allows us to track three states, so that we can tell when we've just
    seen a character class.

    With this additional state the code in _M_expression_term for processing
    the _S_token_bracket_dash can be improved to correctly reject the [\w-a]
    case, without regressing for valid cases such as [\w-] and [----].

    libstdc++-v3/ChangeLog:

            PR libstdc++/102447
            * include/bits/regex_compiler.h (_Compiler::_BracketState): New
            class.
            (_Compiler::_BracketMatcher): New alias template.
            (_Compiler::_M_expression_term): Change pair<bool, CharT>
            parameter to _BracketState. Process first character for
            ECMAScript syntax as well as POSIX.
            * include/bits/regex_compiler.tcc
            (_Compiler::_M_insert_bracket_matcher): Pass _BracketState.
            (_Compiler::_M_expression_term): Use _BracketState to store
            state between calls. Improve handling of dashes in ranges.
            * testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
            Add more tests for ranges containing dashes. Check invalid
            ranges with character class at the beginning.
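The core idea of the patch, that the bracket parser must remember what kind of atom it just read and not merely the last character, can be sketched independently of libstdc++. This is a simplified, hypothetical parser for the text between '[' and ']', not the code in regex_compiler.tcc: it uses one token of lookahead instead of carrying state between calls, recognises only \d, \D, \s, \S, \w and \W as character classes, and treats any other escaped character as a literal. Under these assumptions it accepts [\w-] and [----] and rejects [\w-a].

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Kinds of "previous atom", analogous to the tri-state idea in the patch
// (illustrative only, not libstdc++'s _BracketState).
enum class AtomKind { None, Char, Class };

struct Atom {
  AtomKind kind = AtomKind::None;
  unsigned char value = 0;  // meaningful only when kind == AtomKind::Char
};

// Read one class atom starting at body[i] and advance i past it.
Atom read_atom(std::string_view body, std::size_t& i) {
  const char c = body[i++];
  if (c != '\\')
    return {AtomKind::Char, static_cast<unsigned char>(c)};
  if (i == body.size())
    throw std::invalid_argument("trailing backslash");
  const char e = body[i++];
  switch (e) {
    case 'd': case 'D': case 's': case 'S': case 'w': case 'W':
      return {AtomKind::Class, 0};
    default:
      return {AtomKind::Char, static_cast<unsigned char>(e)};
  }
}

// True if `body` (the text between '[' and ']') is valid under the strict
// ECMAScript rules: a '-' between two atoms forms a range, both ends must
// be single characters, and the range must not be reversed.
bool valid_class_ranges(std::string_view body) {
  std::size_t i = 0;
  while (i < body.size()) {
    const Atom first = read_atom(body, i);
    // A '-' followed by another atom starts a range;
    // a '-' that is the last character is just a literal.
    if (i + 1 < body.size() && body[i] == '-') {
      ++i;  // consume the '-'
      const Atom second = read_atom(body, i);
      if (first.kind == AtomKind::Class || second.kind == AtomKind::Class)
        return false;  // e.g. [\w-a] or [a-\w]
      if (first.value > second.value)
        return false;  // e.g. [z-a]
    }
  }
  return true;
}
```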

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2021-12-14 21:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

--- Comment #10 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The std::regex{"[\\w-a]"} case will throw a std::regex_error exception now. I'd
like to backport this, but I'm going to wait a while. I am not entirely
confident that my changes won't cause regressions elsewhere in the bracket
handling.
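A quick way to check which behaviour a given libstdc++ provides. This sketch assumes a release that contains the fix (GCC 12 or later, or the 11.4 and 10.5 backports recorded later in this thread); with an older release the [\w-a] check is expected to fail, because that pattern is still accepted. compiles() is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` compiles with the default
// ECMAScript grammar, false if std::regex_error is thrown.
bool compiles(const std::string& pattern) {
  try {
    std::regex re{pattern};
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}
```

With a fixed release, compiles("[\\w-a]") is false, while the valid cases named in the commit message, [\w-] and [----], still compile.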

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: cvs-commit at gcc dot gnu.org @ 2022-07-07 23:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:c725028a8bb9478ec84332641147ad12b9236922

commit r11-10130-gc725028a8bb9478ec84332641147ad12b9236922
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Dec 14 14:32:35 2021 +0000

    libstdc++: Fix handling of invalid ranges in std::regex [PR102447]

    std::regex currently allows invalid bracket ranges such as [\w-a] which
    are only allowed by ECMAScript when in web browser compatibility mode.
    It should be an error, because the start of the range is a character
    class, not a single character. The current implementation of
    _Compiler::_M_expression_term does not provide a way to reject this,
    because we only remember a previous character, not whether we just
    processed a character class (or collating symbol etc.)

    This patch replaces the pair<bool, CharT> used to emulate
    optional<CharT> with a custom class closer to pair<tribool,CharT>. That
    allows us to track three states, so that we can tell when we've just
    seen a character class.

    With this additional state the code in _M_expression_term for processing
    the _S_token_bracket_dash can be improved to correctly reject the [\w-a]
    case, without regressing for valid cases such as [\w-] and [----].

    libstdc++-v3/ChangeLog:

            PR libstdc++/102447
            * include/bits/regex_compiler.h (_Compiler::_BracketState): New
            class.
            (_Compiler::_BracketMatcher): New alias template.
            (_Compiler::_M_expression_term): Change pair<bool, CharT>
            parameter to _BracketState. Process first character for
            ECMAScript syntax as well as POSIX.
            * include/bits/regex_compiler.tcc
            (_Compiler::_M_insert_bracket_matcher): Pass _BracketState.
            (_Compiler::_M_expression_term): Use _BracketState to store
            state between calls. Improve handling of dashes in ranges.
            * testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
            Add more tests for ranges containing dashes. Check invalid
            ranges with character class at the beginning.

    (cherry picked from commit 7ce3c230edf6e498e125c805a6dd313bf87dc439)

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2022-07-07 23:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |11.4

--- Comment #12 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Fixed for 11.4 too.

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: cvs-commit at gcc dot gnu.org @ 2023-06-23 16:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:4c347b8d59958d5aa76c5fdcecd72478e08c5aa3

commit r10-11465-g4c347b8d59958d5aa76c5fdcecd72478e08c5aa3
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Dec 14 14:32:35 2021 +0000

    libstdc++: Fix handling of invalid ranges in std::regex [PR102447]

    std::regex currently allows invalid bracket ranges such as [\w-a] which
    are only allowed by ECMAScript when in web browser compatibility mode.
    It should be an error, because the start of the range is a character
    class, not a single character. The current implementation of
    _Compiler::_M_expression_term does not provide a way to reject this,
    because we only remember a previous character, not whether we just
    processed a character class (or collating symbol etc.)

    This patch replaces the pair<bool, CharT> used to emulate
    optional<CharT> with a custom class closer to pair<tribool,CharT>. That
    allows us to track three states, so that we can tell when we've just
    seen a character class.

    With this additional state the code in _M_expression_term for processing
    the _S_token_bracket_dash can be improved to correctly reject the [\w-a]
    case, without regressing for valid cases such as [\w-] and [----].

    libstdc++-v3/ChangeLog:

            PR libstdc++/102447
            * include/bits/regex_compiler.h (_Compiler::_BracketState): New
            class.
            (_Compiler::_BracketMatcher): New alias template.
            (_Compiler::_M_expression_term): Change pair<bool, CharT>
            parameter to _BracketState. Process first character for
            ECMAScript syntax as well as POSIX.
            * include/bits/regex_compiler.tcc
            (_Compiler::_M_insert_bracket_matcher): Pass _BracketState.
            (_Compiler::_M_expression_term): Use _BracketState to store
            state between calls. Improve handling of dashes in ranges.
            * testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
            Add more tests for ranges containing dashes. Check invalid
            ranges with character class at the beginning.

    (cherry picked from commit 7ce3c230edf6e498e125c805a6dd313bf87dc439)

* [Bug libstdc++/102447] std::regex incorrectly accepts invalid bracket expression
From: redi at gcc dot gnu.org @ 2023-06-23 16:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102447

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |10.5

--- Comment #14 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Backported for 10.5 too.

Thread overview: 17+ messages
2021-09-22 10:06 [Bug libstdc++/102447] New: std::regex incorrectly accepts invalid bracket expression redi at gcc dot gnu.org
2021-09-24 18:50 ` [Bug libstdc++/102447] " mpolacek at gcc dot gnu.org
2021-09-24 19:03 ` redi at gcc dot gnu.org
2021-09-24 21:32 ` redi at gcc dot gnu.org
2021-09-27 11:45 ` redi at gcc dot gnu.org
2021-10-01 10:24 ` redi at gcc dot gnu.org
2021-10-02  1:28 ` rs2740 at gmail dot com
2021-10-02  6:55 ` redi at gcc dot gnu.org
2021-10-02 16:54 ` rs2740 at gmail dot com
2021-10-04  5:09 ` s.ikarashi at fujitsu dot com
2021-12-13 22:27 ` redi at gcc dot gnu.org
2021-12-14 21:47 ` cvs-commit at gcc dot gnu.org
2021-12-14 21:51 ` redi at gcc dot gnu.org
2022-07-07 23:33 ` cvs-commit at gcc dot gnu.org
2022-07-07 23:37 ` redi at gcc dot gnu.org
2023-06-23 16:12 ` cvs-commit at gcc dot gnu.org
2023-06-23 16:18 ` redi at gcc dot gnu.org
