From: Egor Kobylkin
To: Marko Myllynen, libc-alpha@sourceware.org, libc-locales@sourceware.org
Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
Date: Mon, 19 Nov 2018 09:22:00 -0000
Message-ID: <29627b4c-317d-5e80-f34b-920e0eadadee@kobylkin.com>
In-Reply-To: <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com>
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <837001401.21346.1542406647888@poczta.nazwa.pl> <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com>

On 19.11.18 08:13, Marko Myllynen wrote:
> Hi,
>
> On 17/11/2018 20.34, Egor Kobylkin wrote:
>>
>> Shouldn't we have two explicit rules for transcription and
>> transliteration not dependent on a destination character set?
>>
>> This would contradict ISO 9:1995 (System A).
>> System A was added on Marko's request (so setting him on TO:). I am
>> neutral on keeping it or dropping it, just to be clear.
>>
>> This particular rule with h/x would make sense on its own.
>> But again - it would contradict the standards.
>> On the other hand, for my personal needs I care less about standards than
>> about current functionality and data loss because of missing
>> transcription altogether due to the BZ #2872.
>
> Given the amount of questions above I think the way forward is to try
> to follow the relevant standards as closely as possible and also check what
> the other implementations (i.e., uconv(1)) do. For example, checking the
> earlier mentioned case may or may not give some hints:
>
> $ echo Шема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Šema
> $ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Shema
> $ uconv -V
> uconv v2.1 ICU 50.1.2

Marko,

Your example only covers _transliteration_ to Latin with diacritics

    iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
      | iconv -f ISO-8859-15 -t UTF-8

while BZ #2872 is about _transcription_ to ASCII

    iconv -f UTF-8 -t ASCII//TRANSLIT

The glibc wiki explicitly lists this use case (ASCII) as the test example
https://sourceware.org/glibc/wiki/Locales#Testing_Locales

So again, you are asking to have ISO 9:1995 System A, but the bug is
about ISO 9:1995 System B (GOST 7.79-2000).

Bests,
Egor
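
To make the System A / System B distinction concrete, here is a minimal
Python sketch applied to the two words from the uconv example. It does not
use iconv or the glibc locale data. The tables are partial and illustrative
only: they cover just the letters in these two words, they are filled in
from the published ISO 9:1995 / GOST 7.79-2000 tables rather than from the
patch, and the names SYSTEM_A, SYSTEM_B and romanize are hypothetical.

```python
# Partial, illustrative tables: only the letters needed for the two words
# from the thread. These are assumptions for the sketch, not the glibc patch.
SYSTEM_A = {"ш": "š", "с": "s", "х": "h", "е": "e", "м": "m", "а": "a"}   # diacritics
SYSTEM_B = {"ш": "sh", "с": "s", "х": "x", "е": "e", "м": "m", "а": "a"}  # ASCII only


def romanize(word, table):
    """Map each Cyrillic letter through `table`, keeping initial capitals."""
    out = []
    for ch in word:
        latin = table.get(ch.lower(), ch)  # unknown characters pass through
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


for word in ("Шема", "Схема"):
    print(word, romanize(word, SYSTEM_A), romanize(word, SYSTEM_B))
```

Within each system the two words stay distinct. Combining System B's "sh"
for ш with System A's "h" for х would map both words to "Shema", which is
why the h/x rule matters for an ASCII target.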