public inbox for glibc-bugs@sourceware.org
* [Bug glob/26628] New: fnmatch with multi-character collating symbols and equivalence classes gives surprising results
@ 2020-09-16 18:32 harald at gigawatt dot nl
  2020-09-17 15:53 ` [Bug glob/26628] " fweimer at redhat dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: harald at gigawatt dot nl @ 2020-09-16 18:32 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26628

            Bug ID: 26628
           Summary: fnmatch with multi-character collating symbols and
                    equivalence classes gives surprising results
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: glob
          Assignee: unassigned at sourceware dot org
          Reporter: harald at gigawatt dot nl
  Target Milestone: ---

Reported as a new bug as requested in bug #26620. Please consider again this
test program, run in the en_US.UTF-8 locale:

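  #include <stdio.h>
  #include <locale.h>
  #include <fnmatch.h>
  /* Exit status: 0 on match, 1 on no match or error, 2 on usage error. */
  int main(int argc, char *argv[]) {
    /* Use the locale from the environment, e.g. LANG=en_US.UTF-8. */
    setlocale(LC_ALL, "");
    if (argc != 3) {
      fprintf(stderr, "usage: fnmatch <pattern> <string>\n");
      return 2;
    }
    /* fnmatch returns 0 on a match; !! maps any nonzero result to 1. */
    return !!fnmatch(argv[1], argv[2], 0);
  }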

I have tested on glibc 2.32 with the fix for #26620 applied.

1. Matching lists stop after the first match:

  fnmatch '[[.L·.]L]·' 'L·' # should match, does not

POSIX leaves it unspecified whether [[.L·.]L] can match L·, but it must match
L, so [[.L·.]L]· must match L·. glibc lets [[.L·.]L] match L·, which leaves
nothing for the trailing · and fails the match, and it does not retry with
[[.L·.]L] matching just L.

Likewise, the following result is allowed by POSIX but inconsistent with how
glibc behaves in other situations:

  fnmatch '[L[.L·.]]' 'L·' # should match, does not

Here, glibc lets [L[.L·.]] match L, which leaves the · unmatched and fails the
match, and it does not retry with [L[.L·.]] matching L·.
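
The two cases above can be modelled with a small sketch. This is a simplified
illustration, not glibc's implementation: a bracket expression is reduced to a
list of candidate byte strings tried in order, and the rest of the pattern to
a literal tail. The names match_first_only, match_with_retry and MIDDOT are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MIDDOT "\xC2\xB7" /* U+00B7 MIDDLE DOT encoded as UTF-8 */

/* Simplified model, not glibc code: `cands` lists the strings a bracket
   expression may match, and `tail` is the literal rest of the pattern. */

/* Commits to the first candidate that is a prefix of `s`. */
bool match_first_only(const char *const *cands, size_t n,
                      const char *tail, const char *s)
{
  for (size_t i = 0; i < n; i++) {
    size_t len = strlen(cands[i]);
    if (strncmp(s, cands[i], len) == 0)
      return strcmp(s + len, tail) == 0; /* no second chance */
  }
  return false;
}

/* Tries every candidate; succeeds if any of them leaves exactly `tail`. */
bool match_with_retry(const char *const *cands, size_t n,
                      const char *tail, const char *s)
{
  for (size_t i = 0; i < n; i++) {
    size_t len = strlen(cands[i]);
    if (strncmp(s, cands[i], len) == 0 && strcmp(s + len, tail) == 0)
      return true;
  }
  return false;
}
```

With the candidates { "L" MIDDOT, "L" }, the tail MIDDOT and the input
"L" MIDDOT, match_first_only fails while match_with_retry succeeds. The same
holds for the candidates { "L", "L" MIDDOT } with an empty tail.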

2. Equivalence classes do not allow multi-character collating elements:

  fnmatch '[[=L·=]]' 'Ŀ' # should match, does not

POSIX leaves it unspecified whether a collating symbol or equivalence class can
match against multiple characters, but it must still be able to match against a
single character. glibc instead does not syntactically allow an equivalence
class with multiple characters, and treats the second "[" as a literal. As a
result:

  fnmatch '[[=L·=]]' '=]' # should not match, does

3. Equivalence classes check multiple characters but then assume matching a
single character:

  fnmatch '[[=Ŀ=]]·' 'L·' # should not match, does

Here, glibc considers the whole "L·" string to determine that [[=Ŀ=]] matches,
but then advances by a single character rather than by the length of the match.
It therefore sees the remainder of the string as "·" and matches that against
the remainder of the pattern, also "·".
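
The effect of advancing by one character instead of by the match length can
be sketched as follows. This is a simplified model under stated assumptions,
not glibc's code: the equivalence class is a list of UTF-8 byte strings, the
rest of the pattern is a literal tail, and the names class_then_tail,
utf8_len, MIDDOT and L_WITH_MIDDOT are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MIDDOT        "\xC2\xB7" /* U+00B7 as UTF-8 */
#define L_WITH_MIDDOT "\xC4\xBF" /* U+013F as UTF-8 */

/* Length in bytes of a UTF-8 character, judged from its lead byte. */
size_t utf8_len(unsigned char lead)
{
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  return 4;
}

/* Simplified model: find a class member that is a prefix of `s`, then
   compare what follows with `tail`. When `by_match_length` is false the
   input is advanced by one character only, which models the reported
   behaviour; when true it is advanced by the full length of the match. */
bool class_then_tail(const char *const *members, size_t n,
                     const char *tail, const char *s, bool by_match_length)
{
  for (size_t i = 0; i < n; i++) {
    size_t len = strlen(members[i]);
    if (strncmp(s, members[i], len) == 0) {
      size_t step = by_match_length ? len : utf8_len((unsigned char) s[0]);
      return strcmp(s + step, tail) == 0;
    }
  }
  return false;
}
```

With the members { L_WITH_MIDDOT, "L" MIDDOT }, the tail MIDDOT and the input
"L" MIDDOT, the single-character step reports a match, while stepping by the
match length leaves an empty remainder and reports no match.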

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug glob/26628] fnmatch with multi-character collating symbols and equivalence classes gives surprising results
  2020-09-16 18:32 [Bug glob/26628] New: fnmatch with multi-character collating symbols and equivalence classes gives surprising results harald at gigawatt dot nl
@ 2020-09-17 15:53 ` fweimer at redhat dot com
  2020-09-17 17:23 ` harald at gigawatt dot nl
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: fweimer at redhat dot com @ 2020-09-17 15:53 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26628

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #1 from Florian Weimer <fweimer at redhat dot com> ---
POSIX says (for [. .]):

“If the string is not a collating element in the current locale, the expression
is invalid.”

Is L· really a collating element?


* [Bug glob/26628] fnmatch with multi-character collating symbols and equivalence classes gives surprising results
  2020-09-16 18:32 [Bug glob/26628] New: fnmatch with multi-character collating symbols and equivalence classes gives surprising results harald at gigawatt dot nl
  2020-09-17 15:53 ` [Bug glob/26628] " fweimer at redhat dot com
@ 2020-09-17 17:23 ` harald at gigawatt dot nl
  2021-03-13  9:16 ` gadelat at gmail dot com
  2021-03-13  9:16 ` gadelat at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: harald at gigawatt dot nl @ 2020-09-17 17:23 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26628

--- Comment #2 from Harald van Dijk <harald at gigawatt dot nl> ---
(In reply to Florian Weimer from comment #1)
> POSIX says (for [. .]):
> 
> “If the string is not a collating element in the current locale, the
> expression is invalid.”
> 
> Is L· really a collating element?

Yes. For this and for bug #26620 I was initially testing with "ch" and "dd" in
the cy_GB.UTF-8 locale, but I looked through
/usr/share/i18n/locales/iso14651_t1_common and picked the first collating
element in there to make it easier to reproduce, since I know many more systems
will have the en_US.UTF-8 locale than the cy_GB.UTF-8 locale.
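
For anyone who wants to repeat the original cy_GB.UTF-8 test without changing
the environment, a minimal sketch along these lines may help. The helper name
try_match is hypothetical, and the locale has to be installed for the call to
succeed; no particular result is implied here.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: selects `locale` explicitly instead of relying on
   the environment. Returns false if the locale cannot be selected;
   otherwise stores the raw fnmatch() result (0 means match) in *result
   and returns true. */
bool try_match(const char *locale, const char *pattern,
               const char *string, int *result)
{
  if (setlocale(LC_ALL, locale) == NULL)
    return false;
  *result = fnmatch(pattern, string, 0);
  return true;
}
```

A call such as try_match("cy_GB.UTF-8", "[[.ch.]]", "ch", &r) exercises the
"ch" collating element mentioned above; a false return value only means that
the locale is not available on the system.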


* [Bug glob/26628] fnmatch with multi-character collating symbols and equivalence classes gives surprising results
  2020-09-16 18:32 [Bug glob/26628] New: fnmatch with multi-character collating symbols and equivalence classes gives surprising results harald at gigawatt dot nl
  2020-09-17 15:53 ` [Bug glob/26628] " fweimer at redhat dot com
  2020-09-17 17:23 ` harald at gigawatt dot nl
@ 2021-03-13  9:16 ` gadelat at gmail dot com
  2021-03-13  9:16 ` gadelat at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: gadelat at gmail dot com @ 2021-03-13  9:16 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26628

Gabriel Ostrolucký <gadelat at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gadelat at gmail dot com


* [Bug glob/26628] fnmatch with multi-character collating symbols and equivalence classes gives surprising results
  2020-09-16 18:32 [Bug glob/26628] New: fnmatch with multi-character collating symbols and equivalence classes gives surprising results harald at gigawatt dot nl
                   ` (2 preceding siblings ...)
  2021-03-13  9:16 ` gadelat at gmail dot com
@ 2021-03-13  9:16 ` gadelat at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: gadelat at gmail dot com @ 2021-03-13  9:16 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=26628

Gabriel Ostrolucký <gadelat at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|gadelat at gmail dot com    |

