public inbox for glibc-bugs-regex@sourceware.org
* [Bug regex/14301] New: Regular expression wrong match with a number of groups
@ 2012-06-27  7:26 valery_reznic at yahoo dot com
From: valery_reznic at yahoo dot com @ 2012-06-27  7:26 UTC (permalink / raw)
  To: glibc-bugs-regex

http://sourceware.org/bugzilla/show_bug.cgi?id=14301

             Bug #: 14301
           Summary: Regular expression wrong match with a number of groups
           Product: glibc
           Version: 2.11
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
        AssignedTo: unassigned@sourceware.org
        ReportedBy: valery_reznic@yahoo.com
                CC: drepper.fsp@gmail.com
    Classification: Unclassified


Created attachment 6487
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6487
test program for bug demonstration

I compile the regular expression with REG_EXTENDED | REG_ICASE (see the attached
program), then run it as follows:

(The first parameter is the regular expression, the second is the text to match.)

1. ./r '[ ](4[0-9]{15})|(4[0-9]{12})[ ]'       ' 4123456789012 ' - match
2. ./r '[ ](4[0-9]{15})|(4[0-9]{12})[ ]'       'r4123456789012'  - no match
3. ./r '[ ](4[0-9]{15})|(4[0-9]{12})|(AAA)[ ]' 'r4123456789012'  - match
4. ./r '[ ](4[0-9]{12})|(4[0-9]{15})|(AAA)[ ]' 'r4123456789012'  - no match

1. Match, as expected: the second group, (4[0-9]{12}).
2. No match, as expected.
3. Match, when there should not be one.
4. Out of interest: swap the first and second groups, 4[0-9]{15} and
   4[0-9]{12}, and all of a sudden it works as expected.

The attached program can be compiled with:

gcc -Wall r.c -o r

I tested it on systems with the following glibc versions:
- 2.3.2 Fedora Core 1, both i386 and x86-64
- 2.11  Fedora 12 - x86-64
- 2.5   CentOS release 5.6 (Final) - i386
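
The attached r.c is not reproduced in this thread. As a rough, hypothetical
stand-in, the check it performs could look like the sketch below. The function
name ere_icase_match is an assumption made for illustration; only regcomp(),
regexec(), regerror(), regfree() and the two flags come from POSIX <regex.h>.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the attached r.c (the attachment itself is not
 * shown in this thread). Returns 1 on match, 0 on no match, and -1 if the
 * pattern does not compile.
 */
int ere_icase_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* Same flags as described in the report. */
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_ICASE);
    if (rc != 0) {
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return -1;
    }

    /* nmatch == 0: only the match / no-match result is needed here. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A main() that calls ere_icase_match(argv[1], argv[2]) and prints "match" or
"no match" would give the ./r command line used in cases 1 to 4 above.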

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.



* [Bug regex/14301] Regular expression wrong match with a number of groups
From: schwab@linux-m68k.org @ 2012-06-27  7:48 UTC (permalink / raw)
  To: glibc-bugs-regex

http://sourceware.org/bugzilla/show_bug.cgi?id=14301

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Andreas Schwab <schwab@linux-m68k.org> 2012-06-27 07:47:26 UTC ---
(4[0-9]{12}) matches 4123456789012.


* [Bug regex/14301] Regular expression wrong match with a number of groups
From: valery_reznic at yahoo dot com @ 2012-06-27  8:16 UTC (permalink / raw)
  To: glibc-bugs-regex

http://sourceware.org/bugzilla/show_bug.cgi?id=14301

Valery <valery_reznic at yahoo dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #2 from Valery <valery_reznic at yahoo dot com> 2012-06-27 08:15:30 UTC ---
(In reply to comment #1)
> (4[0-9]{12}) matches 4123456789012.

So what?
The regular expression requires a leading and a trailing space to be present
for a match.

IMO, the result for case 3 is wrong.

Also please note that in cases 3 and 4 the regular expressions are essentially
the same: only (4[0-9]{15}) and (4[0-9]{12}), which are connected with OR, are
swapped.

But the results are different.


* [Bug regex/14301] Regular expression wrong match with a number of groups
From: fweimer at redhat dot com @ 2012-06-27  8:33 UTC (permalink / raw)
  To: glibc-bugs-regex

http://sourceware.org/bugzilla/show_bug.cgi?id=14301

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
                 CC|                            |fweimer at redhat dot com
         Resolution|                            |INVALID

--- Comment #3 from Florian Weimer <fweimer at redhat dot com> 2012-06-27 08:32:43 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> > (4[0-9]{12}) matches 4123456789012.
> 
> So what?
> Regular expression required leading and trailing space to be present for match.

I don't think so.  "|" does not bind this way.  a[b]|c|d[e] is equivalent to
(ab)|c|(de), not (a)(b|c|d)(e).

> Also please note, that in case 3 and 4 regular expressions are essentially the
> same - only (4[0-9]{15}) and (4[0-9]{12}) that connected with OR swapped.

This also explains the difference when swapping the parenthesized constructs.


* [Bug regex/14301] Regular expression wrong match with a number of groups
From: valery_reznic at yahoo dot com @ 2012-06-27  8:51 UTC (permalink / raw)
  To: glibc-bugs-regex

http://sourceware.org/bugzilla/show_bug.cgi?id=14301

--- Comment #4 from Valery <valery_reznic at yahoo dot com> 2012-06-27 08:50:54 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > (4[0-9]{12}) matches 4123456789012.
> > 
> > So what?
> > Regular expression required leading and trailing space to be present for match.
> 
> I don't think so.  "|" does not bind this way.  a[b]|c|d[e] is equivalent to
> (ab)|c|(de), not (a)(b|c|d)(e).
> 
> > Also please note, that in case 3 and 4 regular expressions are essentially the
> > same - only (4[0-9]{15}) and (4[0-9]{12}) that connected with OR swapped.
> 
> This also explains the difference when swapping the parenthesized constructs.

Got it! The leading space was part of only the first alternative, and the
trailing space was part of only the last alternative.

For some reason I thought that '|' had higher precedence than concatenation.
One more pair of () fixed the problem:
'[ ]((4[0-9]{15})|(4[0-9]{12})|(AAA))[ ]'

Thank you very much for the explanation. I can't believe how stupid I was.


* [Bug regex/14301] Regular expression wrong match with a number of groups
From: fweimer at redhat dot com @ 2014-06-13 14:58 UTC (permalink / raw)
  To: glibc-bugs-regex

https://sourceware.org/bugzilla/show_bug.cgi?id=14301

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

