public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/102480] New: std::regex fails to match ^ when match_prev_avail is used
From: redi at gcc dot gnu.org @ 2021-09-24 23:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102480

            Bug ID: 102480
           Summary: std::regex fails to match ^ when match_prev_avail is
                    used
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
            Blocks: 102445
  Target Milestone: ---

#include <regex>
#include <cassert>

int main() {
  char str[] = "\na";
  std::regex re("^a");
  assert(std::regex_match(str + 1, str + 2, re));
  using std::regex_constants::match_prev_avail;
  assert(std::regex_match(str + 1, str + 2, re, match_prev_avail));
}

Both assertions should pass.

For the first match, the regex matches at the beginning of the input.

For the second match, the regex should also match because the previous
character is a line terminator.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102445
[Bug 102445] [meta-bug] std::regex issues


* [Bug libstdc++/102480] std::regex fails to match ^ when match_prev_avail is used
From: redi at gcc dot gnu.org @ 2021-09-24 23:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102480

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-09-24
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org|redi at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED


* [Bug libstdc++/102480] std::regex fails to match ^ when match_prev_avail is used
From: redi at gcc dot gnu.org @ 2021-09-27 11:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102480

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Actually, I'm not sure what we're supposed to do here.

The ^ anchor matches the start of the input, not the start of a line (except
when using ECMAScript and multiline, but GCC doesn't support multiline yet).

The standard is very unclear about what match_prev_avail actually does, but I
think it means that the first character of the input is not actually at the
start, so it should not match ^ (except for ECMAScript|multiline cases).

I'm trying to get clarification.
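
The reading described above can be sketched as follows. This is only an
illustration, assuming libstdc++ and the interpretation that match_prev_avail
means the range does not begin at the start of the input. The helper names are
hypothetical and not part of the report, and since the standard is unclear
here, other implementations may answer the second question differently.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helpers for illustration only.
// The buffer holds "\na"; the range handed to regex_match is just the "a".

// Without match_prev_avail, the first character of the range counts as the
// start of the input, so ^ matches there.
inline bool caret_matches_at_range_start() {
  const char str[] = "\na";
  std::regex re("^a");
  return std::regex_match(str + 1, str + 2, re);
}

// With match_prev_avail, the caller states that str[0] exists and precedes
// the range. Under the interpretation in this comment, ^ then no longer
// matches at the first character of the range unless multiline is in effect.
inline bool caret_matches_with_prev_avail() {
  const char str[] = "\na";
  std::regex re("^a");
  return std::regex_match(str + 1, str + 2, re,
                          std::regex_constants::match_prev_avail);
}

// Checks only the case that does not depend on the unclear wording.
inline void run_checks() {
  assert(caret_matches_at_range_start());
}
```

Under that interpretation, caret_matches_with_prev_avail() would be expected to
return false, which is the opposite of what the second assertion in the
original testcase asks for.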


* [Bug libstdc++/102480] std::regex fails to match ^ when match_prev_avail is used
From: cvs-commit at gcc dot gnu.org @ 2021-09-29 12:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102480

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:f38cd3bdb4cd429a5f7082ea91793a59b37d47b9

commit r12-3963-gf38cd3bdb4cd429a5f7082ea91793a59b37d47b9
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Sep 29 13:48:19 2021 +0100

    libstdc++: Implement std::regex_constants::multiline (LWG 2503)

    This implements LWG 2503, which allows ^ and $ to match at line
    terminators, rather than only matching the beginning and end of the
    entire input. The multiline option is only valid for ECMAScript, but
    for other grammars we ignore it rather than throwing an exception.

    This is related to PR libstdc++/102480, which incorrectly said that
    ECMAScript should match the beginning of a line when match_prev_avail
    is used. I think that's only supposed to happen when multiline is used.

    The new regex_constants::multiline and basic_regex::multiline constants
    are not defined for strict -std=c++11 and -std=c++14 modes, but
    regex_constants::__multiline is always defined, so that the
    implementation can use it internally.

    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

    libstdc++-v3/ChangeLog:

            * include/bits/regex.h (basic_regex::multiline): Define constant
            for C++17.
            * include/bits/regex_constants.h (regex_constants::multiline):
            Define constant for C++17.
            (regex_constants::__multiline): Define duplicate constant for
            internal use in C++11 and C++14.
            * include/bits/regex_executor.h (_Executor::_M_match_multiline()):
            New member function.
            (_Executor::_M_is_line_terminator(_CharT)): New member function.
            (_Executor::_M_at_begin(), _Executor::_M_at_end()): Use new
            member functions to support multiline matches.
            * testsuite/28_regex/algorithms/regex_match/multiline.cc: New test.


* [Bug libstdc++/102480] std::regex fails to match ^ when match_prev_avail is used
From: redi at gcc dot gnu.org @ 2021-09-29 15:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102480

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|ASSIGNED                    |RESOLVED

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The original report was invalid, but I've now implemented the multiline option
which does support the behaviour in the original testcase.
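
For illustration, the original testcase can be adapted to use the multiline
option roughly like this. It is a sketch, assuming a libstdc++ that has
std::regex_constants::multiline (GCC 12, or 11.4 with the backport) and a
C++17 or GNU dialect so the constant is declared. The helper names are
hypothetical.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helpers for illustration only.

// Same range as the original testcase ("a" preceded by '\n'), but the regex
// is compiled with ECMAScript|multiline, so ^ may match right after a line
// terminator when match_prev_avail says the previous character is readable.
inline bool multiline_caret_matches_after_newline() {
  using namespace std::regex_constants;
  const char str[] = "\na";
  std::regex re("^a", ECMAScript | multiline);
  return std::regex_match(str + 1, str + 2, re, match_prev_avail);
}

// The more common use of multiline: searching a whole buffer and letting ^
// match at the beginning of a later line.
inline bool multiline_search_finds_second_line() {
  using namespace std::regex_constants;
  std::regex re("^a", ECMAScript | multiline);
  return std::regex_search("x\na", re);
}

inline void run_checks() {
  assert(multiline_caret_matches_after_newline());
  assert(multiline_search_finds_second_line());
}
```

Without the multiline flag the regex in the first helper is the one from the
original report, which is the case this bug was resolved as INVALID for.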


* [Bug libstdc++/102480] std::regex fails to match ^ when match_prev_avail is used
From: cvs-commit at gcc dot gnu.org @ 2022-07-07 23:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102480

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:e61c16fd9ba89e7bc12284e524dcd8379c2bebfc

commit r11-10120-ge61c16fd9ba89e7bc12284e524dcd8379c2bebfc
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Sep 29 13:48:19 2021 +0100

    libstdc++: Implement std::regex_constants::multiline (LWG 2503)

    This implements LWG 2503, which allows ^ and $ to match at line
    terminators, rather than only matching the beginning and end of the
    entire input. The multiline option is only valid for ECMAScript, but
    for other grammars we ignore it rather than throwing an exception.

    This is related to PR libstdc++/102480, which incorrectly said that
    ECMAScript should match the beginning of a line when match_prev_avail
    is used. I think that's only supposed to happen when multiline is used.

    The new regex_constants::multiline and basic_regex::multiline constants
    are not defined for strict -std=c++11 and -std=c++14 modes, but
    regex_constants::__multiline is always defined, so that the
    implementation can use it internally.

    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

    libstdc++-v3/ChangeLog:

            * include/bits/regex.h (basic_regex::multiline): Define constant
            for C++17.
            * include/bits/regex_constants.h (regex_constants::multiline):
            Define constant for C++17.
            (regex_constants::__multiline): Define duplicate constant for
            internal use in C++11 and C++14.
            * include/bits/regex_executor.h (_Executor::_M_match_multiline()):
            New member function.
            (_Executor::_M_is_line_terminator(_CharT)): New member function.
            (_Executor::_M_at_begin(), _Executor::_M_at_end()): Use new
            member functions to support multiline matches.
            * testsuite/28_regex/algorithms/regex_match/multiline.cc: New test.

    (cherry picked from commit f38cd3bdb4cd429a5f7082ea91793a59b37d47b9)


* [Bug libstdc++/102480] std::regex fails to match ^ when match_prev_avail is used
From: redi at gcc dot gnu.org @ 2022-07-07 23:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102480

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #3)
> The original report was invalid, but I've now implemented the multiline
> option which does support the behaviour in the original testcase.

Backported for 11.4 now too.
