Message-ID: <4D9A43E4.50305@redhat.com>
Date: Tue, 05 Apr 2011 00:34:00 -0000
From: Eric Blake
To: cygwin@cygwin.com
Subject: Re: grep problem?

On 04/04/2011 04:09 PM, Jim Garrison wrote:
> I'm getting weird behavior from grep. Searching for a bracketed range of
> characters (i.e. [A-Z]) is doing case-insensitive matching, while an
> identical but explicit character set match (i.e. [ABCDE...Z]) does not.

Your problem is not with grep, but with your LC_COLLATE setting (which
LC_ALL overrides when it is set). POSIX states that range expressions
(such as [A-Z]) are undefined in any locale except C; and some locales
(like en_US.UTF-8) happen to treat A-B as AaB, A-b as AaBb, and so forth
(that is, they collate case-insensitively).

>
> $ grep '[a-b]' test.dat
> abcde
> ABCDE

So, in a case-insensitive collation, this range expression includes at
least one of A or B (but probably not both); and since that matches the
ABCDE line, you get a correct result for the collation locale you
requested.

>
> Contrast with the correctly-working examples below
>
> $ grep '[ab]' test.dat
> abcde

Here, there's no range, so there's no ambiguity. Also, try
"LC_ALL=C grep '[a-b]' test.dat" to see a difference.
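
To spell that comparison out, here is a minimal sketch. It assumes test.dat holds just the two lines visible in your output (abcde and ABCDE); the temporary directory and the variable names are only for illustration. What the last command prints depends on your locale and grep version, so I make no claim about its output.

```shell
# Recreate the two-line test file (contents assumed from the quoted output).
tmpdir=$(mktemp -d)
printf 'abcde\nABCDE\n' > "$tmpdir/test.dat"

# In the C locale a range is defined by byte values, so [a-b] means
# exactly 'a' or 'b' and only the lowercase line can match.
c_range=$(LC_ALL=C grep '[a-b]' "$tmpdir/test.dat")

# An explicit set contains no range, so it is unambiguous.
c_set=$(LC_ALL=C grep '[ab]' "$tmpdir/test.dat")

printf 'C locale, [a-b]: %s\n' "$c_range"
printf 'C locale, [ab]:  %s\n' "$c_set"

# Same range under whatever locale the environment currently selects;
# the result here is locale-dependent.
printf 'current locale, [a-b]:\n'
grep '[a-b]' "$tmpdir/test.dat"

rm -rf "$tmpdir"
```

If the two C-locale lines agree while the last command also prints ABCDE, the difference comes from collation rather than from grep itself.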

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org