Date: Mon, 08 Oct 2018 22:23:00 -0000
From: Rafal Luzynski
To: Marko Myllynen, Egor Kobylkin, Keld Simonsen
Cc: libc-alpha@sourceware.org, libc-locales@sourceware.org, "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny, danilo@gnome.org
Message-ID: <246390048.827062.1539037422672@poczta.nazwa.pl>
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

8.10.2018 14:40 Marko Myllynen wrote:
> Hi,
>
> Thanks for the update. I have a few mostly cosmetic comments below;
> hopefully we'll hear from others whether they agree with this direction.
>
> - Please add the standard glibc locale header (see the existing
>   translit_* files for reference)
> - Consider wrapping the header lines at or around column 70-72
> - Consider describing which characters, character ranges, or blocks are
>   supported (perhaps also describe why some of those are not included, see
>   e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
> - Please remove trailing whitespaces and spaces after ;

Thanks for this, Marko.

While we're at it: in the ChangeLog and in the commit message, paths
like this one:

    * locales/aa_DJ: likewise

1. should be relative paths starting at the root directory of the glibc
   source tree, that is: "* localedata/locales/aa_DJ";
2. should say "Likewise." (starting with an uppercase letter and ending
   with a period).

> - No duplicates:
>
> % CYRILLIC SMALL LETTER IE
> ;
>
> should become:
>
> % CYRILLIC SMALL LETTER IE
>
>
> - There are a few issues with the definitions:
>
> % CYRILLIC CAPITAL LETTER U
> ;
> % CYRILLIC UNDEFINED
> ; ""
>
> % CYRILLIC SMALL LETTER U
> ;
> % CYRILLIC UNDEFINED
> ; ""

Are the duplicates here because some Cyrillic letters may have multiple
Latin transliterations depending on the context? For example, Cyrillic
IE must be transliterated sometimes as "e", sometimes as "ie", and
sometimes as "ye" or "je". Can we provide rules for groups of
characters instead?

> I wonder whether it would be possible to automate generation of this
> file so that issues like the above could be avoided? But perhaps that
> could be the next step once this initial patch lands.

I agree with this.

Regards,
Rafal
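The following minimal Python sketch illustrates both the "no duplicates" point and the idea of generating the file automatically. Python fits here because the existing translit generators under localedata/unicode-gen are Python scripts. The sketch assumes the line syntax of the existing translit_* files:

- a "% NAME" comment line;
- the source symbol, followed by one or more replacements separated by ";";
- multi-character or empty replacements written in double quotes.

The helper names (ucs_symbol, format_target, translit_entry) and the SAMPLE table are hypothetical and illustrative only. SAMPLE is not the mapping from the patch.

```python
import unicodedata


def ucs_symbol(text: str) -> str:
    """Encode a string as <Uxxxx> symbols, e.g. "ye" -> "<U0079><U0065>"."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


def format_target(text: str) -> str:
    """Format one replacement: a single character is written bare,
    anything else (including the empty string) is double-quoted."""
    sym = ucs_symbol(text)
    return sym if len(text) == 1 else f'"{sym}"'


def translit_entry(source: str, alternatives: list[str]) -> str:
    """Return a "% NAME" comment line plus one translit line for source.

    Duplicate alternatives are dropped while keeping their first-seen
    order, so ["e", "e"] produces a single <U0065>.
    """
    unique = list(dict.fromkeys(alternatives))  # order-preserving dedup
    # Taking the comment from unicodedata keeps it in sync with the code point.
    comment = f"% {unicodedata.name(source)}"
    targets = ";".join(format_target(t) for t in unique)
    return f"{comment}\n{ucs_symbol(source)} {targets}"


# Hypothetical sample table for illustration only (not the patch's data).
SAMPLE = {
    "\u0435": ["e", "e"],  # CYRILLIC SMALL LETTER IE, duplicated on purpose
    "\u0436": ["zh"],      # CYRILLIC SMALL LETTER ZHE
}

if __name__ == "__main__":
    for src, alts in SAMPLE.items():
        print(translit_entry(src, alts))
```

Running this prints "<U0435> <U0065>" for CYRILLIC SMALL LETTER IE, so the duplicated "e" collapses to one entry. It prints "<U0436> "<U007A><U0068>"" for ZHE. Because the comment lines come from the Unicode database rather than being typed by hand, mismatches between a comment and its code point cannot occur. The sketch does not address context-dependent rules.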