public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/61424] New: std::regex matches right to left, not leftmost longest
@ 2014-06-05 20:11 redi at gcc dot gnu.org
  2014-06-05 21:06 ` [Bug libstdc++/61424] " redi at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: redi at gcc dot gnu.org @ 2014-06-05 20:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61424

            Bug ID: 61424
           Summary: std::regex matches right to left, not leftmost longest
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org

#include <regex>
#include <iostream>

using namespace std;

int main()
{
  regex_constants::syntax_option_type grammar[] = {
    regex_constants::ECMAScript, regex_constants::extended,
    regex_constants::awk, regex_constants::egrep
  };
  for (auto g : grammar)
  {
    regex re("tournament|tour", g);
    const char str[] = "tournament";
    cmatch m;
    regex_search(str, m, re);
    cout << m[0] << endl;
  }
}

This prints:

tour
tour
tour
tour

ECMAScript should try the alternatives left to right, and POSIX has the
leftmost-longest rule
(http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/leftmost_longest_rule.html),
so every grammar should match "tournament" here, not "tour".
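
To make the two rules concrete, here is a minimal sketch that models them for
literal alternatives only, without using std::regex at all. The helper names
first_alternative_match and leftmost_longest_match are hypothetical and exist
only for illustration; this is not how libstdc++ implements matching.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: models ECMAScript-style alternation for literal
// alternatives. At the leftmost position where anything matches, the
// alternatives are tried in the order written and the first hit wins.
std::string first_alternative_match(const std::vector<std::string>& alts,
                                    const std::string& text)
{
  for (std::size_t pos = 0; pos < text.size(); ++pos)
    for (const auto& alt : alts)
      if (text.compare(pos, alt.size(), alt) == 0)
        return alt;
  return "";
}

// Hypothetical helper: models the POSIX leftmost-longest rule for literal
// alternatives. At the leftmost position where anything matches, the
// longest matching alternative wins, regardless of the order written.
std::string leftmost_longest_match(const std::vector<std::string>& alts,
                                   const std::string& text)
{
  for (std::size_t pos = 0; pos < text.size(); ++pos)
  {
    std::string best;
    bool found = false;
    for (const auto& alt : alts)
      if (text.compare(pos, alt.size(), alt) == 0
          && (!found || alt.size() > best.size()))
      {
        best = alt;
        found = true;
      }
    if (found)
      return best;
  }
  return "";
}
```

With the alternatives {"tournament", "tour"} and the input "tournament", both
helpers pick "tournament". The two rules only diverge when a shorter
alternative is written before a longer one.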



* [Bug libstdc++/61424] std::regex matches right to left, not leftmost longest
From: redi at gcc dot gnu.org @ 2014-06-05 21:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61424

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
A slight variation:

#include <regex>
#include <iostream>

using namespace std;

int main()
{
  regex_constants::syntax_option_type grammar[] = {
    regex_constants::ECMAScript, regex_constants::extended,
    regex_constants::awk, regex_constants::egrep
  };
  for (auto g : grammar)
  {
    regex re("tour|tournament|tourn", g);
    const char str[] = "tournament";
    cmatch m;
    if (regex_search(str, m, re))
      cout << m[0] << endl;
    else
      cout << "-" << endl;
  }
}

ECMAScript should match "tour" (the first alternative that matches), and the
POSIX ERE grammars should match "tournament" (the longest match).

Instead we match "tourn" for all grammars.
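
The expectation above can be written as a check against std::regex itself. The
following is a minimal sketch; the helper name search_match is hypothetical,
and the "-" return value for a failed search simply mirrors the testcase
above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` with the given grammar, searches
// `text`, and returns the text of the overall match (sub-match 0), or "-"
// if regex_search finds nothing.
std::string search_match(const std::string& pattern, const std::string& text,
                         std::regex_constants::syntax_option_type grammar)
{
  std::regex re(pattern, grammar);
  std::smatch m;
  if (std::regex_search(text, m, re))
    return m.str(0);
  return "-";
}
```

Under the rules stated above, search_match("tour|tournament|tourn",
"tournament", std::regex_constants::ECMAScript) should give "tour", and the
same call with std::regex_constants::extended should give "tournament". An
implementation affected by this bug gives "tourn" for both.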



* [Bug libstdc++/61424] std::regex matches right to left, not leftmost longest
From: timshen at gcc dot gnu.org @ 2014-06-05 22:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61424

--- Comment #2 from Tim Shen <timshen at gcc dot gnu.org> ---
Sorry, which alternative of "|" is preferred is actually arbitrary at the
moment. I'll fix it later.



* [Bug libstdc++/61424] std::regex matches right to left, not leftmost longest
From: timshen at gcc dot gnu.org @ 2014-07-01  2:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61424

--- Comment #3 from Tim Shen <timshen at gcc dot gnu.org> ---
Author: timshen
Date: Tue Jul  1 02:10:31 2014
New Revision: 212184

URL: https://gcc.gnu.org/viewcvs?rev=212184&root=gcc&view=rev
Log:
    PR libstdc++/61424
    * include/bits/regex.tcc (__regex_algo_impl<>): Use DFS for ECMAScript,
    not just regex containing back-references.
    * include/bits/regex_compiler.tcc (_Compiler<>::_M_disjunction):
    exchange _M_next and _M_alt for alternative operator,
    making matching from left to right.
    * include/bits/regex_executor.h (_State_info<>::_M_get_sol_pos):
    Add position tracking for DFS.
    * include/bits/regex_executor.tcc (_Executor<>::_M_main_dispatch,
    _Executor<>::_M_dfs): Likewise.
    * include/bits/regex_scanner.h: Remove unused enum entry.
    * testsuite/28_regex/algorithms/regex_search/61424.cc: New
    testcase from PR.


Added:
    trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/61424.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/regex.tcc
    trunk/libstdc++-v3/include/bits/regex_compiler.tcc
    trunk/libstdc++-v3/include/bits/regex_executor.h
    trunk/libstdc++-v3/include/bits/regex_executor.tcc
    trunk/libstdc++-v3/include/bits/regex_scanner.h



* [Bug libstdc++/61424] std::regex matches right to left, not leftmost longest
From: pierreblavy at yahoo dot fr @ 2015-02-10 14:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61424

pierreblavy at yahoo dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pierreblavy at yahoo dot fr

--- Comment #4 from pierreblavy at yahoo dot fr ---
*** Bug 64936 has been marked as a duplicate of this bug. ***

