From: Keld Simonsen
To: egmont at gmail dot com
Cc: libc-locales@sourceware.org
Subject: Re: [Bug localedata/18943] New: Collation of NFD strings
Date: Thu, 10 Sep 2015 05:51:00 -0000
Message-ID: <20150910055107.GA16592@rap.rap.dk>

On Wed, Sep 09, 2015 at 07:46:02PM +0000, egmont at gmail dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=18943
>
>             Bug ID: 18943
>            Summary: Collation of NFD strings
>            Product: glibc
>            Version: 2.22
>             Status: NEW
>           Severity: enhancement
>           Priority: P2
>          Component: localedata
>           Assignee: unassigned at sourceware dot org
>           Reporter: egmont at gmail dot com
>                 CC: libc-locales at sourceware dot org
>   Target Milestone: ---
>
> Forking off from bug 18927 comment 8 & 11:
>
> Collate definitions currently assume the input to be in NFC. If the available
> UTF-8 unittests are converted to NFD (the localedata/*.in files which have
> UTF-8 in Makefile's test-input) then they fail.
>
> It would be nice to automatically make normalization the lowest priority factor
> when deciding on collation, so that different normalizations of the same word
> are as close to each other as possible. That is, to implement it once (e.g. in
> iso14651_common) without having to modify individual locale definitions.

Both NFC and NFD data should collate as expected. You should also be able to
mix them as you like; you do not need to normalize them.

Best regards
keld
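
To illustrate what "normalization as the lowest priority factor" in the
quoted report could mean, here is a minimal sketch in Python. It is not
how glibc or iso14651_common works: it compares plain code points rather
than locale collation weights, and the helper name normalization_last_key
is made up for this example. The idea shown is only that canonically
equivalent strings compare equal on the main key and are separated by a
final tie-break.

```python
import unicodedata


def normalization_last_key(s: str):
    """Hypothetical sort key: normalization form only matters as a tie-break.

    First element: the NFC form, so canonically equivalent strings are equal here.
    Second element: the original code points, which gives NFC and NFD spellings
    a deterministic order while keeping them next to each other.
    """
    return (unicodedata.normalize("NFC", s), s)


# "café" spelled with a precomposed e-acute (NFC) and with e + combining acute (NFD).
nfc = "caf\u00e9"
nfd = unicodedata.normalize("NFD", nfc)

words = [nfc, "cafz", nfd, "cafe"]

# Plain code point sorting lets "cafz" fall between the two spellings of "café".
naive = sorted(words)

# With the key above, the two spellings end up adjacent.
sorted_words = sorted(words, key=normalization_last_key)
```

In a real locale the first element of the key would be the collation
weights from the locale definition instead of raw NFC code points; the
sketch only shows where the normalization difference would sit in the
comparison.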