public inbox for glibc-bugs@sourceware.org
* [Bug glob/26628] New: fnmatch with multi-character collating symbols and equivalence classes gives surprising results
From: harald at gigawatt dot nl @ 2020-09-16 18:32 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=26628
Bug ID: 26628
Summary: fnmatch with multi-character collating symbols and
equivalence classes gives surprising results
Product: glibc
Version: 2.32
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: glob
Assignee: unassigned at sourceware dot org
Reporter: harald at gigawatt dot nl
Target Milestone: ---
Reported as a new bug as requested in bug #26620. Please consider again this
test program, run in the en_US.UTF-8 locale:
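#include <stdio.h>
#include <locale.h>
#include <fnmatch.h>

int main(int argc, char *argv[]) {
    /* Use the locale from the environment (en_US.UTF-8 for the results below). */
    setlocale(LC_ALL, "");
    if (argc != 3) {
        fprintf(stderr, "usage: fnmatch <pattern> <string>\n");
        return 2;
    }
    /* Exit status 0 if the string matches the pattern, 1 otherwise. */
    return !!fnmatch(argv[1], argv[2], 0);
}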
I have tested on glibc 2.32 with the fix for #26620 applied.
1. Matching lists stop after the first match:
fnmatch '[[.L·.]L]·' 'L·' # should match, does not
POSIX leaves it unspecified whether [[.L·.]L] can match L·, but it must match
L, so [[.L·.]L]· must match L·. glibc lets [[.L·.]L] match L·, resulting in a
failed match, and does not try again letting [[.L·.]L] match just L.
Likewise, the following result is allowed by POSIX but inconsistent with how
glibc behaves in other situations:
fnmatch '[L[.L·.]]' 'L·' # should match, does not
Here, glibc lets [L[.L·.]] match L, resulting in a failed match, and does not
try again letting [L[.L·.]] match L·.
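To illustrate the retry described in point 1, here is a minimal sketch, not glibc's code, of a matcher that tries every alternative of a bracket expression before giving up. The helper name match_bracket_then_literal and the representation of the bracket expression as an explicit list of strings are assumptions made for illustration; a real implementation would derive the alternatives from the locale's collation data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of glibc.  Match STRING against a
   bracket expression followed by the literal text REST.  The bracket
   expression is modelled as the N strings in ALTS that it may match,
   listed in pattern order.  */
bool
match_bracket_then_literal (const char *const alts[], size_t n,
                            const char *rest, const char *string)
{
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (alts[i]);

      /* Skip alternatives that do not match at the start of STRING.  */
      if (strncmp (string, alts[i], len) != 0)
        continue;

      /* The alternative matched; accept only if the remainder of the
         string also matches the remainder of the pattern.  */
      if (strcmp (string + len, rest) == 0)
        return true;

      /* Otherwise backtrack: try the next alternative instead of
         reporting failure right away.  */
    }
  return false;
}
```

With the alternatives { "L·", "L" } and the remainder "·", the string "L·" matches through the second alternative, which is the result expected for '[[.L·.]L]·' above.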
2. Equivalence classes do not allow multi-character collating elements:
fnmatch '[[=L·=]]' 'Ŀ' # should match, does not
Although POSIX leaves it unspecified whether a collating symbol or equivalence
class can match multiple characters, the class must still match a single
character that belongs to it, such as Ŀ here. glibc instead rejects an
equivalence class containing multiple characters at the syntax level, and
treats the second "[" as a literal. As a result:
fnmatch '[[=L·=]]' '=]' # should not match, does
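As a rough sketch of what accepting such a class syntactically could look like, the following scans from "[=" to the closing "=]" without limiting the content to one character. The function name skip_equiv_class is hypothetical, this is not how glibc's parser is written, and it deliberately ignores whether the content is a valid collating element in the current locale.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of glibc.  P points at the "[=" that
   opens an equivalence class inside a bracket expression.  Return a
   pointer just past the closing "=]", or NULL if the class is not
   terminated.  The content may be any number of bytes, so a
   multi-character collating element such as "L·" is accepted.  */
const char *
skip_equiv_class (const char *p)
{
  /* Require the opener and at least one byte of content.  */
  if (p[0] != '[' || p[1] != '=' || p[2] == '\0')
    return NULL;

  /* Start the search after the first content byte, so that "[===]"
     (the class of "=") is handled as well.  */
  const char *end = strstr (p + 3, "=]");
  return end != NULL ? end + 2 : NULL;
}
```

Called on the "[=L·=]]" part of the pattern '[[=L·=]]', this returns a pointer to the final "]", so that "]" closes the bracket expression instead of being taken as a literal.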
3. Equivalence classes check multiple characters but then assume matching a
single character:
fnmatch '[[=Ŀ=]]·' 'L·' # should not match, does
Here, glibc considers the whole "L·" string to determine that [[=Ŀ=]] matches,
but then advances by a single character, not the length of the match, so sees
the remainder of the string as "·" and attempts to match that to the remainder
of the pattern, also "·".
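The fix implied by point 3 is to advance by the length of whatever the class actually matched. A minimal sketch follows, assuming the members of the class are available as a list of strings; the names bracket_match_length and class_then_literal are hypothetical and not glibc interfaces. For simplicity it takes the first member that matches and does not backtrack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of glibc.  Return the number of bytes
   at the start of STRING matched by a class whose N members are listed
   in MEMBERS, or 0 if no member matches.  */
size_t
bracket_match_length (const char *const members[], size_t n,
                      const char *string)
{
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (members[i]);
      if (len != 0 && strncmp (string, members[i], len) == 0)
        return len;
    }
  return 0;
}

/* Match STRING against the class followed by the literal REST,
   advancing by the matched length rather than by one character.  */
bool
class_then_literal (const char *const members[], size_t n,
                    const char *rest, const char *string)
{
  size_t len = bracket_match_length (members, n, string);
  return len != 0 && strcmp (string + len, rest) == 0;
}
```

With the members { "Ŀ", "L·" } and the remainder "·", the string "L·" is consumed entirely by the class, nothing is left for the trailing "·", and the match fails as expected for '[[=Ŀ=]]·'.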
--
You are receiving this mail because:
You are on the CC list for the bug.
* [Bug glob/26628] fnmatch with multi-character collating symbols and equivalence classes gives surprising results
From: fweimer at redhat dot com @ 2020-09-17 15:53 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=26628
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |fweimer at redhat dot com
--- Comment #1 from Florian Weimer <fweimer at redhat dot com> ---
POSIX says (for [. .]):
“If the string is not a collating element in the current locale, the expression
is invalid.”
Is L· really a collating element?
From: harald at gigawatt dot nl @ 2020-09-17 17:23 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=26628
--- Comment #2 from Harald van Dijk <harald at gigawatt dot nl> ---
(In reply to Florian Weimer from comment #1)
> POSIX says (for [. .]):
>
> “If the string is not a collating element in the current locale, the
> expression is invalid.”
>
> Is L· really a collating element?
Yes. For this and for bug #26620 I was initially testing with "ch" and "dd" in
the cy_GB.UTF-8 locale, but I looked through
/usr/share/i18n/locales/iso14651_t1_common and picked the first collating
element in there to make it easier to reproduce, since I know many more systems
will have the en_US.UTF-8 locale than the cy_GB.UTF-8 locale.
From: gadelat at gmail dot com @ 2021-03-13 9:16 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=26628
Gabriel Ostrolucký <gadelat at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |gadelat at gmail dot com
From: gadelat at gmail dot com @ 2021-03-13 9:16 UTC
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=26628
Gabriel Ostrolucký <gadelat at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC|gadelat at gmail dot com |