public inbox for glibc-bugs@sourceware.org
* [Bug libc/30024] New: regcomp does not honour the documented behaviour.
@ 2023-01-18 23:02 gilles.duvert@univ-grenoble-alpes.fr
  2023-01-18 23:26 ` [Bug libc/30024] " schwab@linux-m68k.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: gilles.duvert@univ-grenoble-alpes.fr @ 2023-01-18 23:02 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30024

            Bug ID: 30024
           Summary: regcomp does not honour the documented behaviour.
           Product: glibc
           Version: 2.36
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: gilles.duvert@univ-grenoble-alpes.fr
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Description of problem:

regcomp() should correctly find the occurrences of '{ ' in a string, because
regex(7) (man 7 regex) states:
A  '{'  followed by a character other than a digit is an ordinary character,
not the beginning of a bound(!).

Version-Release number of selected component (if applicable):


How reproducible:
Always.

Steps to Reproduce:
1. Compile and run the small C program below (a slightly edited copy of the
man page example).
2. The result is correct on, e.g., OSX, but not on Mageia 8 (glibc 2.32-30) or
on Mageia Cauldron (glibc 2.36-30 at the time of writing), where an error is
reported.

Instead, the program, which uses regcomp(), should find the positions of '{ '
in the string "1234 G!t!rk{ ss { zz...\n"

Please note that the problem also exists with '{' alone, not only with '{ ', as
I demonstrate in the program below. I believe the sentence " '{' followed by a
character..." holds even if there is no character at all (and indeed the OSX
version behaves the same with '{'), so the exact extent of this glibc bug
remains to be determined.

3. The code:
       #include <stdint.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <regex.h>

       #define ARRAY_SIZE(arr) (sizeof((arr)) / sizeof((arr)[0]))

       static const char *const str =  "1234 G!t!rk{ ss { zz...\n";
       static const char *const re = "{ ";

       int main(void)
       {
           static const char *s = str;
           regex_t     regex;
           regmatch_t  pmatch[1];
           regoff_t    off, len;
           int         cflags = REG_EXTENDED;
           int         res = regcomp(&regex, re, cflags);

           /* Name the error code returned by regcomp(), if any. */
           if (res) {
               printf("regcomp error:");
               if (res == REG_BADBR   ) printf(" REG_BADBR   ");
               if (res == REG_BADPAT  ) printf(" REG_BADPAT  ");
               if (res == REG_BADRPT  ) printf(" REG_BADRPT  ");
               if (res == REG_EBRACE  ) printf(" REG_EBRACE  ");
               if (res == REG_EBRACK  ) printf(" REG_EBRACK  ");
               if (res == REG_ECOLLATE) printf(" REG_ECOLLATE");
               if (res == REG_ECTYPE  ) printf(" REG_ECTYPE  ");
               /* if (res == REG_EEND    ) printf(" REG_EEND    "); */
               if (res == REG_EESCAPE ) printf(" REG_EESCAPE ");
               if (res == REG_EPAREN  ) printf(" REG_EPAREN  ");
               if (res == REG_ERANGE  ) printf(" REG_ERANGE  ");
               /* if (res == REG_ESIZE   ) printf(" REG_ESIZE   ");  */
               if (res == REG_ESPACE  ) printf(" REG_ESPACE  ");
               if (res == REG_ESUBREG ) printf(" REG_ESUBREG ");
               printf("\n");
               exit(EXIT_FAILURE);
           }

           printf("String = \"%s\"\n", str);
           printf("Matches:\n");

           /* Report every match, resuming the search after the previous one. */
           for (int i = 0; ; i++) {
               if (regexec(&regex, s, ARRAY_SIZE(pmatch), pmatch, 0))
                   break;

               off = pmatch[0].rm_so + (s - str);
               len = pmatch[0].rm_eo - pmatch[0].rm_so;
               printf("#%d:\n", i);
               printf("offset = %jd; length = %jd\n", (intmax_t) off,
                       (intmax_t) len);
               /* The precision argument of "%.*s" must be an int. */
               printf("substring = \"%.*s\"\n", (int) len, s + pmatch[0].rm_so);

               s += pmatch[0].rm_eo;
           }

           regfree(&regex);
           exit(EXIT_SUCCESS);
       }
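
For illustration only: the chain of comparisons against the REG_* codes in the
program above could be replaced by a call to regerror(), which turns the value
returned by regcomp() into the library's own message. The helper name
compile_or_explain and its parameters are hypothetical and not part of the
original report; this is a minimal sketch, not the reporter's code.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the original report): compile `pattern`
 * and, if regcomp() fails, store the library's error message in `msg`.
 * Returns the regcomp() result code, which is 0 on success. */
static int compile_or_explain(regex_t *regex, const char *pattern, int cflags,
                              char *msg, size_t msg_size)
{
    int res = regcomp(regex, pattern, cflags);

    /* Start from an empty message so callers can test msg[0]. */
    if (msg_size > 0)
        msg[0] = '\0';

    /* regerror() maps the error code to a human-readable string. */
    if (res != 0)
        regerror(res, regex, msg, msg_size);

    return res;
}
```

Called with the pattern "{ " and REG_EXTENDED, such a helper would print
whatever message the local libc associates with the failure, or nothing on
implementations that accept the pattern.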

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug libc/30024] regcomp does not honour the documented behaviour.
From: schwab@linux-m68k.org @ 2023-01-18 23:26 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30024

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |NOTABUG

--- Comment #1 from Andreas Schwab <schwab@linux-m68k.org> ---
POSIX says:

Any of the following uses produce undefined results: ...
* If a <left-brace> is not part of a valid interval expression (see EREs
Matching Multiple Characters)
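
Since POSIX leaves an unescaped '{' outside an interval expression undefined,
a portable pattern has to mark the brace as literal. A minimal sketch, assuming
the two usual spellings in an extended regular expression: a backslash-escaped
brace, or a one-character bracket expression. The helper name count_matches is
hypothetical and not taken from this thread.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of the extended
 * regular expression `pattern` in `text`.
 * Returns -1 if the pattern does not compile. */
static int count_matches(const char *pattern, const char *text)
{
    regex_t    regex;
    regmatch_t m[1];
    int        count = 0;

    if (regcomp(&regex, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&regex, text, 1, m, 0) == 0) {
        count++;
        if (m[0].rm_eo == 0)
            break;              /* empty match at the start: avoid looping forever */
        text += m[0].rm_eo;     /* resume right after the previous match */
    }

    regfree(&regex);
    return count;
}
```

With the string from the report, count_matches("\\{ ", str) and
count_matches("[{] ", str) should each find the two occurrences of '{ ',
regardless of how the implementation treats a bare '{'.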


* [Bug libc/30024] regcomp does not honour the documented behaviour.
From: gilles.duvert@univ-grenoble-alpes.fr @ 2023-01-19 11:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30024

--- Comment #2 from Gilles Duvert <gilles.duvert@univ-grenoble-alpes.fr> ---
man 7 regex excerpt:
REGEX(7)               Linux Programmer's Manual               REGEX(7)

NAME
       regex - POSIX.2 regular expressions

and man 3 regcomp says:
REGEX(3)               Linux Programmer's Manual               REGEX(3)

NAME
       regcomp, regexec, regerror, regfree - POSIX regex functions


So which is it: POSIX or POSIX.2?


* [Bug libc/30024] regcomp does not honour the documented behaviour.
From: schwab@linux-m68k.org @ 2023-01-19 15:36 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30024

--- Comment #3 from Andreas Schwab <schwab@linux-m68k.org> ---
Third-party manpages are not authoritative.


* [Bug libc/30024] regcomp does not honour the documented behaviour.
From: gilles.duvert@univ-grenoble-alpes.fr @ 2023-01-19 22:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30024

--- Comment #4 from Gilles Duvert <gilles.duvert@univ-grenoble-alpes.fr> ---
Do you recommend filing a report with manpages?

