From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-6502-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 117097 invoked by alias); 19 Nov 2018 09:22:08 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 117057 invoked by uid 89); 19 Nov 2018 09:22:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=3.1 required=5.0 tests=BAYES_00,BODY_8BITS,GARBLED_BODY,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.2 spammy=covers, hints, him, personal
X-HELO: mout.kundenserver.de
Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ
 #2872]
To: Marko Myllynen <myllynen@redhat.com>, libc-alpha@sourceware.org,
 libc-locales@sourceware.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <b82fe65b-b880-a2b5-c97d-2a6aae9c1165@kobylkin.com>
 <837001401.21346.1542406647888@poczta.nazwa.pl>
 <bef63562-09d1-3306-aae9-20002ccf4130@kobylkin.com>
 <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com>
From: Egor Kobylkin <egor@kobylkin.com>
Openpgp: preference=signencrypt
Message-ID: <29627b4c-317d-5e80-f34b-920e0eadadee@kobylkin.com>
Date: Mon, 19 Nov 2018 09:22:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-SW-Source: 2018-q4/txt/msg00114.txt.bz2

On 19.11.18 08:13, Marko Myllynen wrote:
> Hi,
> 
> On 17/11/2018 20.34, Egor Kobylkin wrote:

>>
>> Shouldn't we have two explicit rules for transcription and
>> transliteration not dependent on a destination character set?
>>
>> This would contradict ISO 9:1995 (System A).
>> System A was added at Marko's request (so I am putting him on To:). I am
>> neutral on keeping it or dropping it, just to be clear.
>>
>> This particular rule with h/x would make sense on its own.
>> But again, it would contradict the standards.
>> On the other hand, for my personal needs I care less about standards than
>> about current functionality and the data loss caused by transcription
>> being missing altogether (BZ #2872).
> 
> Given the number of questions above, I think the way forward is to try to
> follow the relevant standards as closely as possible and also check what
> other implementations (e.g., uconv(1)) do. For example, checking the
> case mentioned earlier may or may not give some hints:
> 
> $ echo Шема  | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Šema
> $ echo Схема | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
> Shema
> $ uconv -V
> uconv v2.1  ICU 50.1.2

Marko,

Your example only covers _transliteration_ to Latin with diacritics:
iconv -f UTF-8 -t ISO-8859-15//TRANSLIT \
| iconv -f ISO-8859-15 -t UTF-8

while BZ #2872 is about _transcription_ to ASCII
iconv -f UTF-8 -t ASCII//TRANSLIT

The glibc wiki explicitly lists this use case (ASCII) as its test
example: https://sourceware.org/glibc/wiki/Locales#Testing_Locales

So again, you are asking for ISO 9:1995 System A, but the bug is
about ISO 9:1995 System B (GOST 7.79-2000).
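
To make the difference concrete, here is a rough sketch using plain sed.
It is an illustration only, not the locale data from the patch: the
function names translit_a and translit_b are made up, and the tables
cover just the letters needed for the two words in the uconv example.
The mappings shown (Ш -> Š, Х -> H for System A; Ш -> Sh, Х -> X for
System B) are the ones the two systems define for these letters.

```shell
#!/bin/sh
# Illustration only: tiny excerpts of the two tables, enough for the
# words "Шема" and "Схема". Function names are hypothetical.

# System A style: one Latin letter per Cyrillic letter, diacritics allowed.
translit_a() {
    printf '%s\n' "$1" | sed \
        -e 's/Ш/Š/g' -e 's/ш/š/g' \
        -e 's/Х/H/g' -e 's/х/h/g' \
        -e 's/С/S/g' -e 's/с/s/g' \
        -e 's/е/e/g' -e 's/м/m/g' -e 's/а/a/g'
}

# System B style: ASCII only. "sh" is reserved for Ш, so Х becomes "x".
translit_b() {
    printf '%s\n' "$1" | sed \
        -e 's/Ш/Sh/g' -e 's/ш/sh/g' \
        -e 's/Х/X/g' -e 's/х/x/g' \
        -e 's/С/S/g' -e 's/с/s/g' \
        -e 's/е/e/g' -e 's/м/m/g' -e 's/а/a/g'
}

translit_a 'Шема'    # Šema
translit_a 'Схема'   # Shema
translit_b 'Шема'    # Shema
translit_b 'Схема'   # Sxema
```

With System B the two words stay distinct in pure ASCII (Shema vs.
Sxema), which is the point of the h/x rule. With System A they are
distinct only as long as the diacritic on Š survives.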


Bests,
Egor