Date: Wed, 05 Jun 2019 23:51:00 -0000
From: Rafal Luzynski
To: "Diego (Egor) Kobylkin", Marko Myllynen, Carlos O'Donell, "libc-alpha@sourceware.org", "libc-locales@sourceware.org", Siddhesh Poyarekar
Cc: Mike Fabian
Message-ID: <2030695416.914859.1559778544120@poczta.nazwa.pl>
Subject: Re: [PING^8][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]

5.06.2019 08:47 "Diego (Egor) Kobylkin" wrote:
>
> ping
>
> Egor Kobylkin

I second these pings. Marko, Carlos, Siddhesh, Mike, is there anything
else I can do here?

Since the questions may sound overwhelming, I'd like to focus on
a single issue: how should we handle upper/lower case when a single
Cyrillic letter is transliterated to a Latin digraph (trigraph, etc.)?

Possible answers (Cyrillic -> Latin Extended -> ASCII):

1.
   "Ш" -> "Š" -> "SH"
   e.g.: "Шема"  -> "Šema" -> "SHema"
         "Схема" ----------> "Shema"

2.
   "Ш" -> "Š" -> "Sh"
   e.g.: "Шема"  -> "Šema" -> "Shema"
         "Схема" ----------> "Shema"

Personally I don't like answer 1 because "SHema" looks weird to me.
Egor, in turn, does not like answer 2 because the output becomes
ambiguous: "Шема" and "Схема" are both transliterated to "Shema".

Should we perhaps have a smart algorithm that selects title case or
upper case for the output characters depending on the context within
the word? Note that it would not resolve the ambiguity of the output
text.

Regards,

Rafal
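The two answers and the "smart" variant can be compared side by side with a small sketch. This is not glibc code. It is a standalone, hypothetical Python illustration. `TABLE`, `transliterate` and the neighbour rule are all assumptions made up for the example: a capital letter becomes an upper-case digraph when the adjacent Cyrillic letter is also upper case, and a title-case digraph otherwise.

```python
# Hypothetical, deliberately tiny Cyrillic -> ASCII table covering only the
# letters in the examples above; keys are lower-case Cyrillic letters.
TABLE = {"ш": "sh", "с": "s", "х": "h", "е": "e", "м": "m", "а": "a"}


def _neighbour_is_upper(word: str, i: int) -> bool:
    """Look at the next letter; at the end of a word, at the previous one."""
    if i + 1 < len(word) and word[i + 1].isalpha():
        return word[i + 1].isupper()
    if i > 0 and word[i - 1].isalpha():
        return word[i - 1].isupper()
    return False  # a lone capital letter is treated as title case


def transliterate(word: str, mode: str) -> str:
    """mode: "upper" (answer 1), "title" (answer 2) or "context" (smart)."""
    if mode not in ("upper", "title", "context"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    for i, ch in enumerate(word):
        latin = TABLE.get(ch.lower(), ch)  # unknown characters pass through
        if not ch.isupper():
            out.append(latin)
        elif mode == "upper" or (mode == "context" and _neighbour_is_upper(word, i)):
            out.append(latin.upper())
        else:  # "title", or "context" next to a lower-case letter
            out.append(latin[:1].upper() + latin[1:])
    return "".join(out)


for w in ("Шема", "Схема", "ШЕМА"):
    # Шема  ['SHema', 'Shema', 'Shema']
    # Схема ['Shema', 'Shema', 'Shema']
    # ШЕМА  ['SHEMA', 'ShEMA', 'SHEMA']
    print(w, [transliterate(w, m) for m in ("upper", "title", "context")])
```

In this sketch the context rule avoids both "SHema" and the all-caps oddity "ShEMA", but it still maps "Шема" and "Схема" to the same "Shema", so the ambiguity remains.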