Subject: Re: Bug in collation functions?
To: cygwin@cygwin.com
From: Ken Brown
Date: Fri, 30 Oct 2015 19:14:00 -0000
Message-ID: <56337996.2000400@cornell.edu>
In-Reply-To: <20151030120320.GO5319@calimero.vinschen.de>

Hi Corinna,

On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
> On Oct 29 18:21, Ken Brown wrote:
>> The fallback I had in mind is to return the shorter string if they have
>> different lengths and otherwise to revert to wcscmp.
>
> I had a longer look into this suggestion and the below code and it took
> me some time to find out what bugged me with it:
>
> What about str/wcsxfrm?
>
> Per POSIX, calling strcmp on the result of strxfrm is equivalent to
> calling strcoll (analogue with wcs*). If you extend *coll to perform an
> extra check on the length, you will have cases in which the above rule
> fails. You can't perform the length test on the result of *xfrm and
> expect the same result as in *coll.
>
> In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
> have to do this anyway if we add this flag in *coll), the resulting
> transformed strings created from the input strings "11" and "1.1" would
> be identical, so a length test on the xfrm string is not meaningful at
> all.
>
> The bottom line is, afaics, we must make sure that CompareStringW and
> LCMapStringW are called the same way, and their result/output has to be
> returned to the caller. Performing an extra check in *coll which can't
> be reliably performed in *xfrm is not feasible.
>
> Does that make sense?

Yes, I see the problem, and I don't see a good way around it. So I
think we probably have to leave things as they are and live with the
fact that we can't do comparisons that ignore whitespace and
punctuation. The alternative of allowing str/wcscoll to return 0 on
unequal strings doesn't seem feasible in view of Eric's comments.

What about the other issue I raised: Should setlocale return null to
indicate an error if it's given an invalid locale name like
en_DE.UTF-8?

Ken
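
The POSIX rule quoted above (strcmp on strxfrm output must order a pair the same way strcoll does) can be checked with a small test. What follows is only a minimal sketch, not code from Cygwin or from this thread: the helper names sign, xfrm_compare, coll_matches_xfrm and select_collation_locale are hypothetical, and it uses nothing beyond the standard C library. An extra length check added to strcoll alone is exactly the kind of change that would make coll_matches_xfrm start failing for some pairs.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1 so two results can be compared. */
int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Compare a and b by transforming both with strxfrm and calling strcmp
 * on the transformed strings.  Returns -1, 0 or 1, or 2 if memory for
 * the transformed strings could not be allocated. */
int xfrm_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* With a size of 0, strxfrm only reports the length it needs. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ta = malloc(na);
    char *tb = malloc(nb);
    int result = 2;

    if (ta != NULL && tb != NULL) {
        strxfrm(ta, a, na);
        strxfrm(tb, b, nb);
        result = sign(strcmp(ta, tb));
    }
    free(ta);
    free(tb);
    return result;
}

/* Returns 1 when strcoll and strxfrm+strcmp order the pair the same way
 * in the current LC_COLLATE locale, 0 otherwise. */
int coll_matches_xfrm(const char *a, const char *b)
{
    return sign(strcoll(a, b)) == xfrm_compare(a, b);
}

/* Select the collation locale.  Returns 0 on success and -1 when
 * setlocale returns NULL, i.e. when it rejects the locale name. */
int select_collation_locale(const char *name)
{
    return setlocale(LC_COLLATE, name) != NULL ? 0 : -1;
}
```

To try it, call select_collation_locale with a locale name and then coll_matches_xfrm("11", "1.1"). Whether a name such as en_DE.UTF-8 is rejected depends on the setlocale implementation, which is the open question above, so the sketch only reports what setlocale returns.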