public inbox for glibc-bugs@sourceware.org
* [Bug glob/31075] New: fnmatch("??") matches on 2-byte single characters (as well as 2 any-length characters)
@ 2023-11-18 10:03 stephane+sourceware at chazelas dot org
  2023-11-18 11:53 ` [Bug glob/31075] " stephane+sourceware at chazelas dot org
  2023-11-24 19:56 ` [Bug glob/31075] fnmatch("??") matches on one two byte valid character (as well as two " stephane+sourceware at chazelas dot org
  0 siblings, 2 replies; 3+ messages in thread
From: stephane+sourceware at chazelas dot org @ 2023-11-18 10:03 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31075

            Bug ID: 31075
           Summary: fnmatch("??") matches on 2-byte single characters (as
                    well as 2 any-length characters)
           Product: glibc
           Version: 2.34
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: glob
          Assignee: unassigned at sourceware dot org
          Reporter: stephane+sourceware at chazelas dot org
  Target Milestone: ---

Regression introduced in 2.34 by commit
a79328c745219dcb395070cdcd3be065a8347f24  reproduced on Ubuntu 22.04, Debian
sid libc6:amd64 2.37-12, and current git HEAD
(dae3cf4134d476a4b4ef86fd7012231d6436c15e) built on that sid system.

find . -name '??'

In a UTF-8 locale, this matches a UTF-8 encoded éé (0xc3 0xa9 0xc3 0xa9) but
also a single UTF-8 encoded é (0xc3 0xa9).

To reproduce, from a shell with support for ksh93-style $'...' quotes (ksh93,
zsh, bash...) and on a system where the C.UTF-8 locale has been enabled (change
to any other UTF-8 locale if not):

(
  mkdir new-dir && cd new-dir || exit
  touch $'\xc3\xa9' $'\xc3\xa9\xc3\xa9'
  export LC_ALL=C.UTF-8
  locale charmap
  find . -name '??'
)

UTF-8
./é
./éé

It seems that when fnmatch() fails to match in wchar_t mode, it tries again in
char mode. The pattern is then also treated as a char[] array, which makes it
even worse than the (already quite buggy) behaviour of bash pattern matching
(https://lists.gnu.org/archive/html/bug-bash/2021-02/msg00054.html), as that's
done even when both the subject and pattern are properly encoded in the user's
locale charmap.

$ find . -name '*[á-ä]*'
./é
./éé

Those didn't match in wchar_t mode but did match in char mode, where the
pattern became *[\303\241-\303\244]* and so matches anything containing a byte
in the range 0241 to 0303.
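
To see why the byte-wise reading matches, it can help to dump the bytes
involved. The following is a minimal sketch, not glibc's code: octal_bytes and
has_byte_in_range are hypothetical helper names, the script is assumed to be
saved as UTF-8, and the check only models the 0241 to 0303 range described
above.

```shell
# Octal value of each byte of "$1", space-separated on one line.
octal_bytes() {
  printf '%s' "$1" | od -An -v -to1 | xargs
}

# Succeed if any byte of "$1" falls in the octal range 0241..0303, which is
# what the range in [\303\241-\303\244] covers when read one byte at a time.
has_byte_in_range() {
  for b in $(octal_bytes "$1"); do
    if [ "$((0$b))" -ge "$((0241))" ] && [ "$((0$b))" -le "$((0303))" ]; then
      return 0
    fi
  done
  return 1
}

octal_bytes '[á-ä]'   # bytes of the bracket expression
octal_bytes 'é'       # bytes of the subject
has_byte_in_range 'é' && echo 'é: has a byte in 0241..0303'
has_byte_in_range 'e' || echo 'e: no byte in 0241..0303'
```

Both bytes of é (0303 and 0251) fall inside the range, which is consistent
with the char-mode match reported above.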

Like for bash, it becomes worse in locales that have characters whose encoding
contains the encoding of [, ] or \ as it can end up matching on a pattern
completely different from the one intended by the user.
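
As a rough illustration of that last point, the sketch below names the pattern
metacharacter bytes found in a byte sequence. It rests on assumptions not taken
from this report: meta_bytes is a hypothetical helper, and 0xA5 0x5C is used as
an example of a two-byte sequence of the kind found in BIG5, where the second
byte of a character can be 0x5C.

```shell
# Read bytes on stdin and name each one that equals an ASCII pattern
# metacharacter: [ (0133), \ (0134) or ] (0135). One name per line.
meta_bytes() {
  od -An -v -to1 | xargs -n 1 | while read -r b; do
    case $b in
      133) echo 'open-bracket' ;;
      134) echo 'backslash' ;;
      135) echo 'close-bracket' ;;
    esac
  done
}

# Assumed example: 0xA5 0x5C, a two-byte sequence of the kind found in BIG5.
printf '\245\134' | meta_bytes
# A UTF-8 encoded é for comparison: both bytes are above 0x7F, so nothing is
# printed.
printf '\303\251' | meta_bytes
```

A pattern containing such a character, once reinterpreted as char[], would
carry a stray backslash that the user never typed.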

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug glob/31075] fnmatch("??") matches on 2-byte single characters (as well as 2 any-length characters)
From: stephane+sourceware at chazelas dot org @ 2023-11-18 11:53 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31075

--- Comment #1 from Stephane Chazelas <stephane+sourceware at chazelas dot org> ---
(In reply to Stephane Chazelas from comment #0)
> Regression introduced in 2.34 by commit
> a79328c745219dcb395070cdcd3be065a8347f24  reproduced on Ubuntu 22.04, Debian
> sid libc6:amd64 2.37-12, and current git HEAD
> (dae3cf4134d476a4b4ef86fd7012231d6436c15e) built on that sid system.
> 
> find . -name '??'
[...]

To clarify, "find" (GNU find in this case) is used here only to demonstrate the
behaviour of the libc's fnmatch().

$ LD_DEBUG=bindings find é -name '??' |& grep fnmatch
    323895:     binding file /lib/x86_64-linux-gnu/libselinux.so.1 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fnmatch' [GLIBC_2.2.5]
    323895:     binding file find [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `fnmatch' [GLIBC_2.2.5]

$ ltrace -e 'fnmatch' find é éé $'\U10FFFF\U10FFFF' -name '??'
find->fnmatch("foo", "foo", 0) = 0
find->fnmatch("Foo", "foo", 0) = 1
find->fnmatch("Foo", "foo", 16) = 0
find->fnmatch("??", "\303\251", 0) = 0
é
find->fnmatch("??", "\303\251\303\251", 0) = 0
éé
find->fnmatch("??", "\364\217\277\277\364\217\277\277", 0) = 0
??
+++ exited (status 0) +++

(here showing ?? matching one 2-byte character, two 2-byte characters and two
4-byte characters).


* [Bug glob/31075] fnmatch("??") matches on one two byte valid character (as well as two any-length characters)
From: stephane+sourceware at chazelas dot org @ 2023-11-24 19:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31075

Stephane Chazelas <stephane+sourceware at chazelas dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|fnmatch("??") matches on    |fnmatch("??") matches on
                   |2-byte single characters    |one two byte valid
                   |(as well as 2 any-length    |character (as well as two
                   |characters)                 |any-length characters)
              Flags|                            |security?

--- Comment #2 from Stephane Chazelas <stephane+sourceware at chazelas dot org> ---
This can likely be considered a security issue, as it means patterns match
things that were not intended to be matched (I'll let you guys decide on that).
On the other hand, the bug works around long-standing issues whereby, for
instance,

find . ! -name '*evil*' -exec ... {} +

was failing to exclude file names containing "evil" when what's on either side
is not valid text in the user's locale (a common issue these days, where UTF-8
is the norm).
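
A scratch setup for trying that exclusion case might look like the sketch
below. The function name make_evil_fixture, the file names and the choice of
byte 0x80 as invalid UTF-8 are illustrative assumptions, and C.UTF-8 is assumed
to be available as in the earlier reproducer. No output is shown because it
depends on the libc version.

```shell
# Create a scratch directory holding a plain "evil" name, a name where "evil"
# is surrounded by bytes (0x80) that are not valid UTF-8, and a harmless name.
# Prints the directory path.
make_evil_fixture() {
  dir=$(mktemp -d) || return
  touch "$dir/evil" "$dir/$(printf '\200evil\200')" "$dir/good" || return
  printf '%s\n' "$dir"
}

dir=$(make_evil_fixture) &&
  # List the entries that survive the exclusion. Per the report, a libc
  # without the char-mode fallback also lists the \200evil\200 name.
  (cd "$dir" && LC_ALL=C.UTF-8 find . ! -name '*evil*')
rm -rf "$dir"
```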

Though of course, falling back to treating both pattern and subject as char[]
arrays when the subject cannot be decoded as text, as bash does (and which
might have been the intent of a79328c745219dcb395070cdcd3be065a8347f24), is
incorrect (see
https://lists.gnu.org/archive/html/bug-bash/2021-02/msg00054.html for more
details).
