From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 17762 invoked by alias); 30 Apr 2010 07:36:46 -0000
Received: (qmail 17670 invoked by uid 48); 30 Apr 2010 07:36:28 -0000
Date: Fri, 30 Apr 2010 07:36:00 -0000
From: "bonzini at gnu dot org"
To: glibc-bugs-regex@sources.redhat.com
Message-ID: <20100430073626.11561.bonzini@gnu.org>
Reply-To: sourceware-bugzilla@sourceware.org
Subject: [Bug regex/11561] New: Collation characters represented by internal name instead of character sequence
X-Bugzilla-Reason: CC
Mailing-List: contact glibc-bugs-regex-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: glibc-bugs-regex-owner@sourceware.org
X-SW-Source: 2010-04/txt/msg00002.txt.bz2

In the glibc locale definitions, collating elements have a hyphenated name:

  collating-symbol
  collating-element from ""

and the hyphenated name has to be used in regular expressions for [[. .]] to
work properly:

  $ echo '*ch*' | LC_COLLATE=cs_CZ.UTF-8 sed 's/[[.c-h.]]//'
  **
  $ echo 'ch' | LC_COLLATE=cs_CZ.UTF-8 sed 's/[[.ch.]]//'
  sed: -e expression #1, char 12: Invalid collation character

However, POSIX.1-2008 says:

A collating symbol is a collating element enclosed within bracket-period
( "[." and ".]" ) delimiters. Collating elements are defined as described in
Collation Order.

Conforming applications shall represent multi-character collating elements as
collating symbols when it is necessary to distinguish them from a list of the
individual characters that make up the multi-character collating element. For
example, if the string "ch" is a collating element defined using the line:

  collating-element from ""

in the locale definition, the expression "[[.ch.]]" shall be treated as an RE
containing the collating symbol 'ch', while "[ch]" shall be treated as an RE
matching 'c' or 'h'. Collating symbols are recognized only inside bracket
expressions.
If the string is not a collating element in the current locale, the
expression is invalid.

POSIX specifically uses [[.ch.]] in its example rather than [[.ch-digraph.]],
so this is a bug in glibc. It shouldn't be hard to fix in regcomp.

-- 
           Summary: Collation characters represented by internal name instead
                    of character sequence
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
        AssignedTo: bonzini at gnu dot org
        ReportedBy: bonzini at gnu dot org
                CC: glibc-bugs-regex at sources dot redhat dot com,
                    glibc-bugs at sources dot redhat dot com

http://sourceware.org/bugzilla/show_bug.cgi?id=11561
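
The sed behaviour reported above can probably be reproduced against regcomp()
directly. The following is only a minimal sketch, not taken from the bug
report: the helper names try_compile and report_collating_symbols are
hypothetical, and it assumes the locale is selected with setlocale(LC_ALL, ...)
rather than through LC_COLLATE alone as in the shell commands. It compiles both
spellings of the collating symbol as basic REs and prints either "compiles" or
the regerror() message for each.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as a POSIX basic RE under the
 * currently active locale.  Returns regcomp()'s result (0 on success); on
 * failure the regerror() text is copied into errbuf when one is supplied. */
int try_compile(const char *pattern, char *errbuf, size_t errlen)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&re, pattern, REG_NOSUB);
    if (rc != 0) {
        if (errbuf != NULL && errlen > 0)
            regerror(rc, &re, errbuf, errlen);
    } else {
        regfree(&re);   /* only a successfully compiled regex needs freeing */
    }
    return rc;
}

/* Hypothetical helper: switch to `locale_name` (an assumption: LC_ALL is
 * used here, not just LC_COLLATE) and report how each spelling of the
 * collating symbol is treated.  Returns -1 if the locale cannot be set. */
int report_collating_symbols(const char *locale_name)
{
    static const char *const patterns[] = { "[[.ch.]]", "[[.c-h.]]" };
    char msg[128];
    size_t i;

    if (setlocale(LC_ALL, locale_name) == NULL) {
        fprintf(stderr, "locale %s is not available\n", locale_name);
        return -1;
    }
    for (i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        int rc = try_compile(patterns[i], msg, sizeof msg);
        printf("%-12s %s\n", patterns[i], rc == 0 ? "compiles" : msg);
    }
    return 0;
}
```

Calling report_collating_symbols("cs_CZ.UTF-8") from a small main() on a system
that has that locale installed should show whether the installed libc accepts
the character sequence, the internal name, or both. Going by the sed output
above, an affected glibc would be expected to reject "[[.ch.]]" and accept
"[[.c-h.]]".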