public inbox for glibc-bugs@sourceware.org
* [Bug libc/14185] New: fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: pfrankli at redhat dot com @ 2012-05-30 19:01 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14185

             Bug #: 14185
           Summary: fnmatch() fails when '*' wildcard is applied on the
                    file name containing multi-byte character(s)
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: unassigned@sourceware.org
        ReportedBy: pfrankli@redhat.com
                CC: drepper.fsp@gmail.com
    Classification: Unclassified


Created attachment 6427
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6427
simple testcase

fnmatch tries to convert the string (filename) from multibyte to a wide
character representation via mbsrtowcs.  When the encoding is invalid (say \366
when using en_US.UTF-8), mbsrtowcs returns (size_t) -1 and sets errno to EILSEQ
(illegal byte sequence), and fnmatch then returns -1 (an error) instead of 0
or FNM_NOMATCH.
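A minimal sketch of the failing step, assuming a UTF-8 locale such as en_US.UTF-8 or C.UTF-8 is installed. It is not the attached test case; use_utf8_locale and conversion_fails are hypothetical helpers. It asks mbsrtowcs only to count wide characters (null destination) and reports whether the conversion fails with EILSEQ:

```c
#include <errno.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch to the first UTF-8 locale that is installed. */
static int use_utf8_locale(void)
{
    const char *names[] = { "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_ALL, names[i]) != NULL)
            return 1;
    return 0;
}

/* Returns 1 if s is not a valid multibyte string in the current locale,
   i.e. mbsrtowcs returns (size_t) -1 and sets errno to EILSEQ. */
static int conversion_fails(const char *s)
{
    mbstate_t state;
    const char *src = s;

    memset(&state, 0, sizeof state);
    errno = 0;
    /* A null destination makes mbsrtowcs only count wide characters. */
    size_t n = mbsrtowcs(NULL, &src, 0, &state);
    return n == (size_t) -1 && errno == EILSEQ;
}
```

After use_utf8_locale() succeeds, conversion_fails("\366.txt") returns 1 while conversion_fails("abc.txt") returns 0; that failed conversion is what made fnmatch give up on the whole string.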

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.



* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: bugdal at aerifal dot cx @ 2012-05-31  0:30 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14185

Rich Felker <bugdal at aerifal dot cx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx

--- Comment #1 from Rich Felker <bugdal at aerifal dot cx> 2012-05-31 00:29:51 UTC ---
While this is not a conformance issue (actually, the most correct/conformant
thing may be to reject as non-matching or give an error on invalid strings like
this), I believe it is a quality-of-implementation issue. It makes it
unnecessarily difficult to do things like deleting junk files created by an
obnoxious user or extracting a corrupt archive file (or just one created by
someone with poor taste in character encoding); "rm -f *" will fail.

A related quality-of-implementation issue with the same root cause (the
conversion to a wchar_t string): fnmatch("*", huge_string, 0) fails on glibc
even though it should obviously match without even inspecting huge_string,
much less making a 4x-size copy of it (wchar_t is 4 bytes on glibc).
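With flags 0, a pattern consisting only of '*' matches every string, so the answer is known before the string is read at all. A sketch of such a shortcut; star_only_matches is a hypothetical helper that a caller (or fnmatch itself) might consult before any conversion, not something glibc provides:

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical fast path: returns 1 if the match is decided without reading
   the string, i.e. the pattern is one or more '*' and neither FNM_PATHNAME
   nor FNM_PERIOD is set (those make '/' and a leading '.' special).
   Returns 0 when the full matcher is still needed. */
static int star_only_matches(const char *pattern, int flags)
{
    if (flags & (FNM_PATHNAME | FNM_PERIOD))
        return 0;
    if (*pattern == '\0')
        return 0;               /* "" only matches "", so not handled here */
    for (const char *p = pattern; *p != '\0'; p++)
        if (*p != '*')
            return 0;
    return 1;
}
```

Because the string is never touched, its length and encoding validity are irrelevant on this path.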

Unfortunately glibc's fnmatch implementation is just really ugly, and I don't
think issues like this can be fixed without piling on ugly hacks, or just
replacing the implementation with something saner...


* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: law at redhat dot com @ 2012-05-31  2:49 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14185

law at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com

--- Comment #2 from law at redhat dot com 2012-05-31 02:49:09 UTC ---
AFAICT POSIX defines no specific error codes for fnmatch.  It returns 0 for a
match, FNM_NOMATCH for no match, and some other non-zero value if an error
occurs.
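Since only 0 and FNM_NOMATCH have defined meanings, a careful caller has to treat every other return value as an error rather than as "no match". A minimal sketch; the enum and classify are hypothetical names:

```c
#include <fnmatch.h>

enum match_result { MATCH, NO_MATCH, MATCH_ERROR };

/* Maps fnmatch's return value onto its three possible outcomes:
   0 is a match, FNM_NOMATCH is no match, anything else is an error. */
static enum match_result classify(int rc)
{
    if (rc == 0)
        return MATCH;
    if (rc == FNM_NOMATCH)
        return NO_MATCH;
    return MATCH_ERROR;
}
```

Under the behaviour reported in this bug, the -1 that glibc returned for an unconvertible string would be reported here as MATCH_ERROR.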

But what doesn't make sense in the current implementation is that if the
pattern is something like *.csv and the string is <invalid>.csv, we return an
error.  That seems plainly wrong to me, since the string matches regardless of
the invalid character in it.
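The *.csv case needs no character decoding at all: a matcher that compares raw bytes, with '*' absorbing any run of bytes, accepts <invalid>.csv. A minimal sketch supporting only '*', '?' and literal bytes (no bracket expressions, escapes or flags); byte_glob is a hypothetical name, not a glibc function:

```c
#include <stddef.h>

/* Hypothetical byte-wise matcher: '*' matches any run of bytes, '?' matches
   exactly one byte, every other pattern byte must match literally. */
static int byte_glob(const char *p, const char *s)
{
    const char *star = NULL;    /* pattern position just after the last '*' */
    const char *resume = NULL;  /* string position where that '*' stops absorbing */

    while (*s != '\0') {
        if (*p == '*') {
            star = ++p;
            resume = s;
        } else if (*p != '\0' && (*p == '?' || *p == *s)) {
            p++;
            s++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' absorb one more byte and retry. */
            p = star;
            s = ++resume;
        } else {
            return 0;
        }
    }
    while (*p == '*')
        p++;
    return *p == '\0';
}
```

Note that '?' here consumes one byte, so it would not match a two-byte UTF-8 character such as á; that is the kind of trade-off discussed later in the thread.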

As you note, it makes rm -f * fail, as well as "find" and a variety of other
tools that rely on fnmatch.  This actually came to our attention because "find"
was missing an obvious match because of an invalid character in a filename.

It's worth noting that gnulib's fnmatch has been fixed to address this problem.


Yes, there's an additional issue with handling out-of-memory conditions for
large strings.  We don't have a fix for that, and I think it should be tracked
as a separate issue.


* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: pfrankli at redhat dot com @ 2012-05-31 20:18 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14185

--- Comment #3 from Patsy Franklin <pfrankli at redhat dot com> 2012-05-31 20:18:10 UTC ---
Created attachment 6429
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6429
patch file


* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: bugdal at aerifal dot cx @ 2012-06-01  2:32 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14185

--- Comment #4 from Rich Felker <bugdal at aerifal dot cx> 2012-06-01 02:32:05 UTC ---
This patch fixes the bug in the test case, but I'm not convinced that it
doesn't create other (possibly much worse!) bugs in the process. For instance,
the pattern "[á]*" should not match "\xa1.txt", but with the fallback to
treating everything as bytes, it does match: in UTF-8, "á" is the byte pair
\xc3\xa1, so the byte-wise bracket expression contains the byte \xa1.
Basically, this patch would make "rm -f pattern" potentially delete lots of
files with illegal byte sequences in their names whenever pattern contains a
bracket expression with multibyte characters in it. (Note that the effect
becomes even more severe with a range expression like [à-á]...)
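A simplified sketch of how a bracket body looks when read one byte at a time (literal bytes and x-y ranges only; no negation, classes or escapes). The function names are hypothetical. In UTF-8, "[à-á]" is \xc3\xa0-\xc3\xa1, which parses byte-wise as \xc3, the range \xa0-\xc3, and \xa1, so it accepts 36 of the 256 byte values rather than two characters:

```c
#include <stddef.h>

/* Returns 1 if byte c is matched by a bracket body (the text between '['
   and ']') read one byte at a time, with "x-y" as an inclusive byte range. */
static int byte_bracket_match(const char *body, unsigned char c)
{
    const unsigned char *p = (const unsigned char *) body;

    while (*p != '\0') {
        unsigned char lo = *p++;
        unsigned char hi = lo;
        if (p[0] == '-' && p[1] != '\0') {  /* range such as a-z */
            hi = p[1];
            p += 2;
        }
        if (lo <= c && c <= hi)
            return 1;
    }
    return 0;
}

/* Counts how many of the 256 byte values the bracket body accepts. */
static int byte_bracket_size(const char *body)
{
    int n = 0;
    for (int c = 0; c < 256; c++)
        n += byte_bracket_match(body, (unsigned char) c);
    return n;
}
```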


* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: law at redhat dot com @ 2012-06-18 17:17 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=14185

--- Comment #5 from law at redhat dot com 2012-06-18 17:17:34 UTC ---
Rich,
Yes, the proposed patch may have an issue in that the pattern "[á]*" would
match "\xa1.txt" when it should not; I'm not sure whether that is a worse
situation or not.

The problem the patch fixes has been reported multiple times over the last 10
years or so without anyone taking action in glibc.  gnulib fixed it in
effectively the same way as the patch above in 2005, and I don't think anyone
has reported problems.

SuSE had the patch attached to this BZ in their distro for a while, but pulled
it for reasons unknown.


* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: neleai at seznam dot cz @ 2013-10-09  9:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14185

Ondrej Bilka <neleai at seznam dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |neleai at seznam dot cz

--- Comment #6 from Ondrej Bilka <neleai at seznam dot cz> ---
Was the patch above sent to libc-alpha?

A proper solution would be to use a new function called, say,
mbsrtowcs_with_errors.  At an illegal sequence it would write a special wide
character and resume conversion at the next byte.

It would be slow, but fnmatch is not performance-critical anyway; on UTF-8,
2/3 of the time is spent converting to wide characters, which is not necessary
when the needle is ASCII-only.
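One way to realize the proposed mbsrtowcs_with_errors (which does not exist in glibc) is to decode with mbrtowc and, on an invalid or truncated sequence, emit a marker for one byte and resume at the next byte. The sketch below borrows the mapping of Python's "surrogateescape" error handler, turning an invalid byte b into 0xDC00 + b, as one possible "special wide character". That mapping, the function names, and the assumption of a stateless encoding such as UTF-8 are all illustrative:

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch to the first UTF-8 locale that is installed. */
static int use_utf8_locale(void)
{
    const char *names[] = { "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_ALL, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical stand-in for "mbsrtowcs_with_errors": converts src into dst
   without ever failing.  Each invalid or truncated sequence yields one
   escaped byte (0xDC00 + byte) and decoding resumes at the next byte.
   dst must hold strlen(src) + 1 elements, since every input byte yields at
   most one wide character.  Returns the count written, excluding L'\0'. */
static size_t mbs_to_wcs_lossless(wchar_t *dst, const char *src)
{
    mbstate_t st;
    size_t left = strlen(src);
    size_t n = 0;

    memset(&st, 0, sizeof st);
    while (left > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, src, left, &st);
        if (r == (size_t) -1 || r == (size_t) -2) {
            dst[n++] = (wchar_t) (0xDC00 + (unsigned char) *src);
            src++;
            left--;
            memset(&st, 0, sizeof st);  /* state is unspecified after an error */
        } else {
            /* r >= 1: left excludes the terminator, so no NUL is decoded. */
            dst[n++] = wc;
            src += r;
            left -= r;
        }
    }
    dst[n] = L'\0';
    return n;
}
```

The resulting wide string can then be matched without ever returning an error; whether the escape values should match anything beyond themselves is a separate design decision.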

-- 
You are receiving this mail because:
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: bugdal at aerifal dot cx @ 2013-10-10 16:50 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14185

--- Comment #7 from Rich Felker <bugdal at aerifal dot cx> ---
A much better solution would be to rewrite fnmatch not to go through a
temporary wchar_t string at all. It's actually just as easy, if not easier, to
do the matching directly as a multibyte string. It can be done in-place (no
additional temporary storage at all) for standard fnmatch, but I'm not clear on
whether this can be made efficient for the GNU extensions. I believe it may
also be possible to adapt the Two-Way string-matching algorithm to fnmatch so
that (at least most) matches take linear time rather than quadratic time, but
it's still quite possible that I'm wrong about this (I haven't worked out
enough of the details yet).
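A minimal sketch of matching directly on the multibyte strings, assuming a stateless, ASCII-compatible encoding such as UTF-8: each step decodes one character with mbrtowc, an undecodable byte is kept as a unit that matches only the same byte, and nothing is copied. It supports only '*', '?' and literal characters (no bracket expressions, escapes, flags or GNU extensions) and uses simple backtracking rather than Two-Way; all names are hypothetical:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch to the first UTF-8 locale that is installed. */
static int use_utf8_locale(void)
{
    const char *names[] = { "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_ALL, names[i]) != NULL)
            return 1;
    return 0;
}

/* Decodes one unit at s (not at its terminator): a valid character, or a
   single invalid byte.  Stores the byte length in *len and returns a key:
   the wide character value, or -1 - byte, which never equals a character. */
static long next_unit(const char *s, size_t *len)
{
    mbstate_t st;
    wchar_t wc;
    size_t avail = 0;

    while (avail < MB_CUR_MAX && s[avail] != '\0')  /* never read past NUL */
        avail++;
    memset(&st, 0, sizeof st);
    size_t r = mbrtowc(&wc, s, avail, &st);
    if (r == (size_t) -1 || r == (size_t) -2 || r == 0) {
        *len = 1;
        return -1L - (unsigned char) *s;
    }
    *len = r;
    return (long) wc;
}

/* Returns 1 if s matches p: '*' matches any run of units, '?' exactly one. */
static int mb_glob(const char *p, const char *s)
{
    const char *star = NULL;    /* pattern position just after the last '*' */
    const char *resume = NULL;  /* string position where that '*' stops absorbing */
    size_t slen, plen;

    while (*s != '\0') {
        if (*p == '*') {
            star = ++p;
            resume = s;
            continue;
        }
        long sc = next_unit(s, &slen);
        if (*p != '\0') {
            long pc = next_unit(p, &plen);
            if (pc == L'?' || pc == sc) {   /* '?' matches one whole unit */
                p += plen;
                s += slen;
                continue;
            }
        }
        if (star == NULL)
            return 0;
        /* Mismatch: let the last '*' absorb one more unit and retry. */
        next_unit(resume, &slen);
        resume += slen;
        s = resume;
        p = star;
    }
    while (*p == '*')
        p++;
    return *p == '\0';
}
```

In this sketch "*.csv" matches "\366.csv", '?' consumes all of á, and "á*" does not match "\xa1.txt", because the lone byte \xa1 never compares equal to the character á.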


* [Bug libc/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: fweimer at redhat dot com @ 2014-06-19 14:35 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14185

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
              Flags|                            |security-


* [Bug glob/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: jsm28 at gcc dot gnu.org @ 2015-08-27 22:09 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14185

Joseph Myers <jsm28 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|libc                        |glob


* [Bug glob/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: cvs-commit at gcc dot gnu.org @ 2021-02-23 21:06 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14185

--- Comment #10 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Adhemerval Zanella
<azanella@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a79328c745219dcb395070cdcd3be065a8347f24

commit a79328c745219dcb395070cdcd3be065a8347f24
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Mon Jan 4 14:26:30 2021 -0300

    posix: Falling back to non wide mode in case of encoding error [BZ #14185]

    Gnulib added the proposed fix with aed23714d60 (in 2005), but it
    recently fell back to the old semantics of returning -1 on failure
    when it was merged with glibc in 67306f6 (in 2020, with sync back).

    According to gnulib developer feedback, that was an oversight.  Although
    the full fix for BZ #14185 would require rewriting the fnmatch
    implementation to use mbrtowc instead of mbsrtowcs on the full input,
    this mitigates the issue and has been used by gnulib for a long time.

    This patch also removes the alloca usage in the string conversion to
    wide characters before calling the internal function.

    Checked on x86_64-linux-gnu.
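The commit's approach amounts to a dispatch decision made before matching: use the wide-character matcher when both pattern and string convert, and otherwise fall back to byte-wise matching instead of returning -1. A sketch of only that decision, with hypothetical names; it is not glibc's actual code, and the MB_CUR_MAX == 1 shortcut is an assumption of the sketch:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

enum match_mode { MODE_BYTES, MODE_WIDE };

/* Hypothetical helper: switch to the first UTF-8 locale that is installed. */
static int use_utf8_locale(void)
{
    const char *names[] = { "en_US.UTF-8", "C.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_ALL, names[i]) != NULL)
            return 1;
    return 0;
}

/* Returns 1 if s is a valid multibyte string in the current locale. */
static int converts_cleanly(const char *s)
{
    mbstate_t st;
    const char *src = s;

    memset(&st, 0, sizeof st);
    return mbsrtowcs(NULL, &src, 0, &st) != (size_t) -1;
}

/* Picks how a fnmatch-like function would compare pattern and string. */
static enum match_mode pick_mode(const char *pattern, const char *string)
{
    if (MB_CUR_MAX == 1)
        return MODE_BYTES;      /* single-byte locale: bytes are characters */
    if (converts_cleanly(pattern) && converts_cleanly(string))
        return MODE_WIDE;
    return MODE_BYTES;          /* encoding error: fall back instead of failing */
}
```

The fallback branch is exactly where the bracket-expression concern raised earlier in the thread applies.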


* [Bug glob/14185] fnmatch() fails when '*' wildcard is applied on the file name containing multi-byte character(s)
From: adhemerval.zanella at linaro dot org @ 2021-02-23 21:07 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=14185

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |2.34
           Assignee|unassigned at sourceware dot org |adhemerval.zanella at linaro dot org
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #11 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
Fixed in 2.34.
