public inbox for libc-locales@sourceware.org
 help / color / mirror / Atom feed
From: Egor Kobylkin <egor@kobylkin.com>
To: libc-alpha@sourceware.org, libc-locales@sourceware.org
Subject: Re: [PATCH v8] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
Date: Fri, 02 Nov 2018 23:27:00 -0000	[thread overview]
Message-ID: <ffbdaf2a-7d2c-c096-5762-7e91b6de19b2@kobylkin.com> (raw)
In-Reply-To: <1847777640.296958.1541197328619@poczta.nazwa.pl>

Moving everybody from To: and CC: on BCC. It seems at this stage it is
Rafal and me. It is still going to libc-alpha and libc-locales. If you
are interested to be put back on CC - please let me know.

On 02.11.18 23:22, Rafal Luzynski wrote:
>> * Consistently transliterate single uppercase Cyrillic letters to 
>> sequences of all uppercase Latin letters in all languages
>> (whenever a Cyrillic letter is transliterated to more than one
>> Latin letter), for example "Ї" is now transliterated as "YI" rather
>> than "Yi".
> 
> I think you have not yet explained whether this is required by any
> existing standard (please provide links) or whether this is your
> genuine idea to distinguish between the cases like "Ш" transliterated > to "Sh" and
 "Сх" also transliterated to "Sh".

I remember seeing this form of the capitalization it in actual
transliterated texts long time ago but can't find a formal description
as of now. Just don't want to claim this to be my original idea.

>> The choice for YO, SH, YA, ZH etc. is to avoid naming collisions for
>> example for "Сх" and "Ш" that would both transliterate to Sh:
>> With SH:"Схема"->"Shema" but "Шема"->"SHema"
>> With Sh:"Схема"->"Shema" and "Шема"->"Shema". Collision!
>> This is important e.g. for renaming files, grouping as in using uniq >> etc.

As for the users - I am a user and I have demonstrated the use cases
where the collisions due to "one symbol capitalization" would cause
irreversible damage to data. For a library like glibc this seems like a
relevant issue to consider.

The "two symbol capitalization" on the other hand would prevent
collision and can be easily corrected in the userspace if needed
with something like

foo="SHema"
foo="${foo:0:1}$(tr '[:upper:]' '[:lower:]' <<<${foo:1})"
echo "$foo"
Shema

It looks like everyone really using transliteration for something
sensitive already have done it the userspace since at least 2006 when
this bug was first logged. So we won't brake the official use cases
where the capitalization should be done in a certain way. But we will
prevent new bugs due to collision if we use "two symbol capitalization"
indeed.

Happy to hear arguments to the contrary.

Bests,
Egor Kobylkin

  reply	other threads:[~2018-11-02 23:27 UTC|newest]

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
     [not found] ` <20180412224352.GB2911@altlinux.org>
2018-07-17 19:34   ` SUBJECT: [PATCH] " Egor Kobylkin
2018-07-17 19:41     ` Carlos O'Donell
2018-07-17 19:50       ` Egor Kobylkin
2018-07-17 19:59         ` Carlos O'Donell
2018-08-06 19:00   ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29 Egor Kobylkin
2018-10-03  8:28     ` Egor Kobylkin
2018-10-03  9:20       ` Keld Simonsen
2018-10-03  9:32         ` Egor Kobylkin
2018-10-05  8:44           ` Marko Myllynen
2018-10-05  9:20           ` Rafal Luzynski
2018-10-05 10:37             ` Egor Kobylkin
2018-10-08 22:05               ` Rafal Luzynski
2018-10-08 22:52                 ` Egor Kobylkin
2018-10-09 21:43                   ` Rafal Luzynski
2018-10-09 16:10                 ` Marko Myllynen
2018-10-09 16:22                   ` Egor Kobylkin
2018-10-09 16:49                     ` Marko Myllynen
2018-10-09 22:09                   ` Rafal Luzynski
2018-10-10 11:21                     ` Marko Myllynen
2018-10-11 10:10                   ` Marko Myllynen
     [not found]             ` <deacdf31-d0bb-a92d-1de3-934d6b4cb158@kobylkin.com>
2018-10-05 11:54               ` Marko Myllynen
2018-10-05 12:01                 ` Egor Kobylkin
2018-10-05 12:21                   ` Marko Myllynen
2018-10-05 20:47                     ` Egor Kobylkin
2018-10-08 12:41                       ` Marko Myllynen
2018-10-08 22:23                         ` Rafal Luzynski
2018-10-08 23:36                           ` Egor Kobylkin
2018-10-09 13:18                             ` Egor Kobylkin
2018-10-09 18:34                               ` Egor Kobylkin
2018-10-09 22:18                                 ` Rafal Luzynski
2018-10-09 22:40                                   ` Egor Kobylkin
2018-10-09 22:43                                     ` Egor Kobylkin
2018-10-10 11:23                                       ` Marko Myllynen
2018-10-10 12:20                                         ` Egor Kobylkin
2018-10-10 12:34                                           ` Marko Myllynen
2018-10-10 22:29   ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v2 Egor Kobylkin
2018-10-11 10:00     ` Marko Myllynen
2018-10-11 11:05     ` Rafal Luzynski
2018-10-11 13:10       ` Marko Myllynen
2018-10-11 13:51       ` Volodymyr Lisivka
2018-10-11 14:59       ` Egor Kobylkin
2018-10-11 21:31         ` Egor Kobylkin
2018-10-11 15:05       ` Egor Kobylkin
2018-10-11 15:45   ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v3 Egor Kobylkin
2018-10-11 21:33   ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v4 Egor Kobylkin
2018-10-12 14:06   ` [PATCH v5] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] Egor Kobylkin
2018-10-13  1:01     ` Rafal Luzynski
2018-10-13 16:58       ` Egor Kobylkin
2018-10-15 11:05         ` Marko Myllynen
2018-10-15 11:55           ` Egor Kobylkin
2018-10-23 23:08         ` Rafal Luzynski
2018-10-17 14:17   ` [PATCH v6] " Egor Kobylkin
2018-11-01 22:52   ` [PATCH v7] " Egor Kobylkin
2018-11-02  0:01   ` [PATCH v8] " Egor Kobylkin
2018-11-02 22:22     ` Rafal Luzynski
2018-11-02 23:27       ` Egor Kobylkin [this message]
2018-11-14 21:25   ` [PATCH v9] " Egor Kobylkin
2018-11-16 22:17     ` Rafal Luzynski
2018-11-17 18:35       ` Egor Kobylkin
2018-11-19  7:14         ` Marko Myllynen
2018-11-19  9:22           ` Egor Kobylkin
2018-11-19 19:36             ` Marko Myllynen
2018-12-01 22:09           ` Rafal Luzynski
2018-12-01 22:53             ` Egor Kobylkin
2018-12-03 22:19             ` Egor Kobylkin
     [not found]               ` <1361059722.707244.1544231740358@poczta.nazwa.pl>
2018-12-10 21:20                 ` Marko Myllynen
2018-12-19 22:26                   ` Rafal Luzynski
2018-12-19 22:48                     ` Egor Kobylkin
2018-12-19 23:51                       ` Rafal Luzynski
2018-11-19 11:11   ` [PATCH v10] " Egor Kobylkin
2018-12-07 23:35     ` Rafal Luzynski
2018-12-08 21:51       ` Egor Kobylkin
2018-12-19 22:42         ` Rafal Luzynski
2018-12-19 23:02           ` Egor Kobylkin
2018-12-20  0:06             ` Rafal Luzynski
2018-12-08 22:28   ` [PATCH v11] Locales: Cyrillic -> ASCII transliteration " Egor Kobylkin
2018-12-19 23:16     ` Egor Kobylkin
2018-12-26 10:07       ` Siddhesh Poyarekar
2018-12-26 12:14         ` Egor Kobylkin
2018-12-27  1:31           ` Siddhesh Poyarekar
2018-12-27 11:31             ` Rafal Luzynski
2019-01-02 18:39   ` [PATCH v12] " Egor Kobylkin
2019-01-05 14:36     ` Rafal Luzynski
2019-01-05 21:13       ` Egor Kobylkin
2019-01-07 20:37         ` Marko Myllynen
2019-01-09  0:46           ` Egor Kobylkin
2019-01-09 20:03             ` Marko Myllynen
2019-02-04  7:14               ` [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] ping for 2.30 Egor Kobylkin
2019-02-14 16:48                 ` Marko Myllynen
2019-03-04 22:12                   ` Egor Kobylkin
2019-03-11 13:59                     ` PING " Egor Kobylkin
2019-03-14 19:49                       ` Egor Kobylkin
2019-04-19 22:24                   ` Rafal Luzynski
     [not found]                     ` <5ELixS9SQ0DW4mlvswp96ASpLobBabU9KQ6zOTH-Udrb34mABhcqiPERpBZfPWZ9F77s8XNmiLIAq9UWu0AjLFFdjOz_FZVU5_xF-SiQkrw=@kobylkin.com>
2019-04-27  2:51                       ` Siddhesh Poyarekar
2019-04-27  7:34                         ` Diego (Egor) Kobylkin
2019-04-09  1:04     ` [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] Carlos O'Donell
2019-03-19 10:39   ` ping " Egor Kobylkin
2019-03-28 16:20     ` [PING^4][PATCH " Marko Myllynen
2019-04-04 19:44     ` [PING^5][PATCH " Egor Kobylkin
2019-04-06  1:36       ` Siddhesh Poyarekar
2019-04-16  7:15     ` [PING^6][PATCH " Marko Myllynen
2019-04-16 13:17       ` Carlos O'Donell
2019-04-16 17:07         ` Egor Kobylkin
2019-04-16 17:58           ` Carlos O'Donell
2019-04-16 18:41             ` Egor Kobylkin
2019-04-16 19:06               ` Carlos O'Donell
2019-05-10 12:19                 ` Marko Myllynen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ffbdaf2a-7d2c-c096-5762-7e91b6de19b2@kobylkin.com \
    --to=egor@kobylkin.com \
    --cc=libc-alpha@sourceware.org \
    --cc=libc-locales@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).