From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-6742-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 75713 invoked by alias); 5 Jun 2019 23:51:19 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 75694 invoked by uid 89); 5 Jun 2019 23:51:18 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: =?ISO-8859-1?Q?No, score=-4.7 required=5.0 tests=AWL,BAYES_00,BODY_8BITS,GARBLED_BODY,KAM_MANYTO,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 spammy==d0=b5=d0=bc=d0=b0, HTo:U*siddhesh, H*i:@kobylkin.com, H*f:@kobylkin.com?=
X-HELO: shared-ano163.rev.nazwa.pl
X-Spam-Score: -1
Date: Wed, 05 Jun 2019 23:51:00 -0000
From: Rafal Luzynski <digitalfreak@lingonborough.com>
To: "Diego (Egor) Kobylkin" <egor@kobylkin.com>,
	Marko Myllynen <myllynen@redhat.com>,
	Carlos O'Donell <codonell@redhat.com>,
	"libc-alpha@sourceware.org" <libc-alpha@sourceware.org>,
	"libc-locales@sourceware.org" <libc-locales@sourceware.org>,
	Siddhesh Poyarekar <siddhesh@gotplt.org>
Cc: Mike Fabian <mfabian@redhat.com>
Message-ID: <2030695416.914859.1559778544120@poczta.nazwa.pl>
In-Reply-To: <DDiRMB942zU2NTs_1xTsb-zTgRD2L6AOaaJW-a0-0YJ3O5voZt2GeTjQJQ0c_hExTwcJKvBMiXIeyHsdieM2Q1m61oOpU27Msj09zowycVM=@kobylkin.com>
References: <DDiRMB942zU2NTs_1xTsb-zTgRD2L6AOaaJW-a0-0YJ3O5voZt2GeTjQJQ0c_hExTwcJKvBMiXIeyHsdieM2Q1m61oOpU27Msj09zowycVM=@kobylkin.com>
Subject: Re: [PING^8][PATCH v12] Locales: Cyrillic -> ASCII transliteration
 [BZ #2872]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-SW-Source: 2019-q2/txt/msg00078.txt.bz2

5.06.2019 08:47 "Diego (Egor) Kobylkin" <egor@kobylkin.com> wrote:
>
> ping
>
> Egor Kobylkin

I second these pings.  Marko, Carlos, Siddhesh, Mike, is there anything
else I can do here?

Since the questions may sound overwhelming, I'd like to focus on
a single issue:

How should we handle letter case when a single uppercase Cyrillic letter
is transliterated to a Latin digraph (trigraph, etc.)?

Possible answers (Cyrillic -> Latin Extended -> ASCII):

1. "Ш" -> "Š" -> "SH"

   e.g.: "Шема" -> "Šema" -> "SHema"
         "Схема" ----------> "Shema"

2. "Ш" -> "Š" -> "Sh"

   e.g.: "Шема" -> "Šema" -> "Shema"
         "Схема" ----------> "Shema"

Personally, I don't like answer 1 because "SHema" looks weird
to me.  Egor, in turn, does not like answer 2 because the output
string becomes ambiguous: two different Cyrillic words yield the
same "Shema".
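
To make the two answers concrete, here is a minimal Python sketch.  It
is only an illustration: the mapping table and the translit() helper
are hypothetical and cover just the letters of the examples above; they
are not the glibc locale data or the patch itself.

```python
# Hypothetical, deliberately tiny mapping for the examples above only.
LOWER = {"ш": "sh", "с": "s", "х": "h", "е": "e", "м": "m", "а": "a"}

def translit(word, digraph_style):
    """Transliterate word; digraph_style is "upper" (answer 1)
    or "title" (answer 2) and only matters for uppercase input letters."""
    out = []
    for ch in word:
        latin = LOWER.get(ch.lower(), ch)  # unknown characters pass through
        if ch.isupper():
            if digraph_style == "upper":
                latin = latin.upper()       # "Ш" -> "SH"
            else:
                latin = latin.capitalize()  # "Ш" -> "Sh"
        out.append(latin)
    return "".join(out)

for word in ("Шема", "Схема"):
    print(word, translit(word, "upper"), translit(word, "title"))
```

With the "upper" style the two input words stay distinguishable
("SHema" vs. "Shema"); with the "title" style both become "Shema".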

Should we maybe have a smarter algorithm that selects title case or
upper case for the output characters depending on the context in the
word?  Note that this would not resolve the problem of the output
text being ambiguous.
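
One possible shape of such a context-dependent rule, again only as a
hedged Python sketch: the table and the translit_contextual() helper
are hypothetical, and the "look at the neighbouring letter" heuristic
is my assumption, not something the patch implements.

```python
# Hypothetical, deliberately tiny mapping for the examples above only.
LOWER = {"ш": "sh", "с": "s", "х": "h", "е": "e", "м": "m", "а": "a"}

def translit_contextual(word):
    """Use upper case for a digraph only when the neighbouring letter is
    uppercase too (an all-caps context); otherwise use title case."""
    out = []
    for i, ch in enumerate(word):
        latin = LOWER.get(ch.lower(), ch)  # unknown characters pass through
        if ch.isupper():
            nxt = word[i + 1] if i + 1 < len(word) else ""
            prev = word[i - 1] if i > 0 else ""
            # Prefer the following letter; fall back to the preceding one.
            neighbour = nxt if nxt.isalpha() else prev
            if neighbour.isupper():
                latin = latin.upper()
            else:
                latin = latin.capitalize()
        out.append(latin)
    return "".join(out)

for word in ("ШЕМА", "Шема", "Схема"):
    print(word, translit_contextual(word))
```

This gives "SHEMA" for the all-caps word and "Shema" for the
capitalized one, but the two capitalized words from the examples above
still collide on "Shema".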

Regards,

Rafal