From: "stephane+sourceware at chazelas dot org"
To: glibc-bugs@sourceware.org
Subject: [Bug glob/31075] New: fnmatch("??") matches on 2-byte single characters (as well as 2 any-length characters)
Date: Sat, 18 Nov 2023 10:03:11 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=31075

            Bug ID: 31075
           Summary: fnmatch("??") matches on 2-byte single characters (as well
                    as 2 any-length characters)
           Product: glibc
           Version: 2.34
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: glob
          Assignee: unassigned at sourceware dot org
          Reporter: stephane+sourceware at chazelas dot org
  Target Milestone: ---

Regression introduced in
2.34 by commit a79328c745219dcb395070cdcd3be065a8347f24; reproduced on
Ubuntu 22.04, Debian sid libc6:amd64 2.37-12, and current git HEAD
(dae3cf4134d476a4b4ef86fd7012231d6436c15e) built on that sid system.

  find . -name '??'

in a UTF-8 locale matches a UTF-8 encoded éé (0xc3 0xa9 0xc3 0xa9), but
also a UTF-8 encoded é (0xc3 0xa9).

To reproduce, from a shell with support for ksh93-style $'...' quotes (ksh93,
zsh, bash...) and on a system where the C.UTF-8 locale has been enabled
(change to any other UTF-8 locale if not):

(
  mkdir new-dir && cd new-dir || exit
  touch $'\xc3\xa9' $'\xc3\xa9\xc3\xa9'
  export LC_ALL=C.UTF-8
  locale charmap
  find . -name '??'
)
UTF-8
./é
./éé

It seems that when fnmatch() fails to match in wchar_t mode, it tries again
in char mode. The pattern is then also treated as a char[] array, which makes
it even worse than the (already quite buggy) behaviour of bash pattern
matching (https://lists.gnu.org/archive/html/bug-bash/2021-02/msg00054.html),
as that's done even when both the subject and the pattern are properly encoded
in the user's locale charmap.

$ find . -name '*[á-ä]*'
./é
./éé

Those didn't match in wchar_t mode but matched in char mode, as the pattern
became *[\303\241-\303\244]*, which matches anything containing a byte in the
range 0241 to 0303.

Like for bash, it becomes worse in locales that have characters whose encoding
contains the encoding of [, ] or \, as it can end up matching on a pattern
completely different from the one intended by the user.
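
The suspected fallback can be modelled outside glibc. The sketch below is
only an illustration: it uses Python's fnmatch module, which is a separate
implementation and does not call glibc's fnmatch(), and the helper names
(match_chars, match_bytes, match_with_fallback) are hypothetical. It assumes
that "char mode" amounts to matching the UTF-8 bytes of both the name and the
pattern, which is what the report infers from the observed output rather than
something confirmed from the glibc source.

```python
import fnmatch

E_ACUTE = "\u00e9"                  # é, UTF-8 encoded as 0xc3 0xa9
RANGE = "*[\u00e1-\u00e4]*"         # *[á-ä]*, the bracket pattern from the report


def match_chars(name: str, pattern: str) -> bool:
    """Character-wise match: '?' consumes one code point."""
    return fnmatch.fnmatchcase(name, pattern)


def match_bytes(name: str, pattern: str) -> bool:
    """Byte-wise match on the UTF-8 encodings: '?' consumes one byte."""
    return fnmatch.fnmatchcase(name.encode("utf-8"), pattern.encode("utf-8"))


def match_with_fallback(name: str, pattern: str) -> bool:
    """Hypothetical model of the reported behaviour:
    retry byte-wise when the character-wise match fails."""
    return match_chars(name, pattern) or match_bytes(name, pattern)


if __name__ == "__main__":
    for name in (E_ACUTE, E_ACUTE * 2):
        for pattern in ("??", RANGE):
            print(
                f"{name!r} vs {pattern!r}: "
                f"chars={match_chars(name, pattern)} "
                f"bytes={match_bytes(name, pattern)} "
                f"fallback={match_with_fallback(name, pattern)}"
            )
```

In this model a lone é fails '??' character-wise but passes byte-wise, and the
byte-wise bracket expression degenerates into the set {0xc3, 0xa1-0xc3, 0xa4},
so any name containing é matches, in line with the find output above.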