public inbox for libc-locales@sourceware.org
From: Egor Kobylkin <egor@kobylkin.com>
To: Rafal Luzynski <digitalfreak@lingonborough.com>,
	Marko Myllynen <myllynen@redhat.com>
Cc: Keld Simonsen <keld@keldix.com>,
	libc-alpha@sourceware.org, libc-locales@sourceware.org,
	"Dmitry V. Levin" <ldv@altlinux.org>,
	Volodymyr Lisivka <vlisivka@gmail.com>,
	Carlos O'Donell <carlos@redhat.com>, Max Kutny <mkutny@gmail.com>,
	danilo@gnome.org
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
Date: Tue, 09 Oct 2018 18:34:00 -0000
Message-ID: <a9af47d8-bf3d-e607-38e1-a6e765a604d3@kobylkin.com>
In-Reply-To: <f6b530b0-53b7-bd90-9bb9-864d0a477f50@kobylkin.com>


The culprits were the double quotes around "<U0423><U0301>" (<U00DA>)
and "<U0443><U0301>" (<U00FA>).
It works now with the source sequences left unquoted:
% CYRILLIC UNDEFINED
<U0423><U0301> <U00DA>;"<U0055><U0060>"
% CYRILLIC UNDEFINED
<U0443><U0301> <U00FA>;"<U0075><U0060>"

<U0301> is a combining character, and apparently the entry does not work
when it is enclosed in quotes together with the letter's code point.
Please let me know if there is another explanation.
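
For readers less familiar with the <Uxxxx> notation, here is a minimal Python sketch (not part of glibc; the helper name `decode_symbols` is made up for illustration) that decodes the symbols from the entries above. It only shows what the symbols denote: U+0301 is a combining mark, and Cyrillic U plus acute has no precomposed form, so the source side is a two-code-point sequence. It does not explain how localedef treats the quotes.

```python
import re
import unicodedata

def decode_symbols(text):
    """Turn locale-source symbols such as <U0423><U0301> into a str."""
    hex_codes = re.findall(r"<U([0-9A-Fa-f]{4,8})>", text)
    return "".join(chr(int(code, 16)) for code in hex_codes)

source = decode_symbols("<U0423><U0301>")           # Cyrillic U + combining acute
latin = decode_symbols("<U00DA>")                   # first alternative (Latin-1)
ascii_fallback = decode_symbols("<U0055><U0060>")   # second alternative (ASCII)

# Two code points, and the second one has a non-zero combining class.
print(len(source), unicodedata.combining(source[1]) != 0)
# NFC normalization cannot merge the pair into a single code point.
print(unicodedata.normalize("NFC", source) == source)
print(repr(latin), repr(ascii_fallback))
```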

I will now make those changes and generate the patch itself.
Egor

On 09.10.2018 15:18, Egor Kobylkin wrote:
> Hi,
> 
> I have now implemented all the changes requested for translit_cyrillic
> file but started hitting what seems like a bug:
> 
> - If the line <U0425> <U0048>;<U0058> is present in translit_cyrillic, the
> locale compilation fails, i.e. grep CYRILLIC < $testfile |
> LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8
> iconv -f UTF-8 -t ASCII//TRANSLIT hangs.
> 
> - If the line <U0425> <U0048>;<U0058> is absent from translit_cyrillic
> everything works, just the transliteration of <U0425> fails as expected
> (? is displayed)
> 
> - If translit_cyrillic contains <U0425> <U0048>;<U0058> as the _only_
> line the transliteration of <U0425> works again (others as ?).
> 
> Would you have any idea in which direction I should look? The new
> translit_cyrillic is attached.
> 
> (<U0425> is % CYRILLIC CAPITAL LETTER HA)
> 
> Best regards,
> Egor
> 
> On 09.10.2018 01:35, Egor Kobylkin wrote:
>> On 09.10.2018 00:23, Rafal Luzynski wrote:
>>> 8.10.2018 14:40 Marko Myllynen <myllynen@redhat.com> wrote:
>>>> Hi,
>>>>
>>>> Thanks for the update. I have a few mostly cosmetic comments below,
>>>> hopefully we'll hear from others whether they agree with this direction.
>>>>
>>
>> Yeah, the earlier we have feedback the more productive we are. I'd be
>> happy to get as much feedback on this as early as possible. So
>> everybody concerned, please chime in.
>>
>>>
>>>> - No duplicates:
>>>>
>>>> % CYRILLIC SMALL LETTER IE
>>>> <U0435> <U0065>; <U0065>
>>>>
>>>> should become:
>>>>
>>>> % CYRILLIC SMALL LETTER IE
>>>> <U0435> <U0065>
>>>>
>>>> - There are a few issues with the definitions:
>>>>
>>>> % CYRILLIC CAPITAL LETTER U
>>>> <U0423> <U0055>; <U0055>
>>>> % CYRILLIC UNDEFINED
>>>> <U0423><U0423> <U00DA>; "<U0055><U0060>"
>>>>
>>>> % CYRILLIC SMALL LETTER U
>>>> <U0443> <U0075>; <U0075>
>>>> % CYRILLIC UNDEFINED
>>>> <U0443><U0443> <U00FA>; "<U0075><U0060>"
>>>
>>> Are the duplicates here because some Cyrillic letters may have multiple
>>> Latin transliterations depending on the context, for example Cyrillic IE
>>> must be transliterated sometimes as "e", sometimes as "ie", sometimes
>>> as "ye" or "je"?  Can we provide rules for groups of characters instead?
>> No, the duplicates are just a by-product of my line-generating logic. I
>> have fixed (removed) them. As far as I understand, the varying
>> transcription between languages/locales cannot be handled in one file
>> at all.
>>
>>>
>>>> I wonder whether it would be possible to automate generation of this
>>>> file so that issues like the above could be avoided? But perhaps that
>>>> could be the next step once this initial patch lands.
>>
>> I am generating the content part of translit_cyrillic from a
>> LibreOffice spreadsheet. Not sure if you have had time to view it yet?
>> https://sourceware.org/bugzilla/attachment.cgi?id=11299
>>
>> Anyway I have just fixed the issues identified by Marko above in that
>> spreadsheet. I will do the changes for the below request and then upload
>> the new translit_cyrillic file to the bugzilla.
>>
>>
>>>> - Please add the standard glibc locale header (see the existing
>>>> translit_* files for reference)
>>>> - Consider wrapping the header lines at or around column 70-72
>>>> - Consider describing which characters, character ranges, or blocks are
>>>> supported (perhaps also describe why some of those are not included, see
>>>> e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
>>>> - Please remove trailing whitespaces and spaces after ;
>>>
>>> Thanks for this, Marko.  While at this, in the ChangeLog and in the commit
>>> message these paths:
>>>
>>> 	* locales/aa_DJ: likewise
>>>
>>> 1. Should be a relative path starting in the root directory of glibc
>>>    source, that is: "* localedata/locales/aa_DJ".
>>> 2. Should be "Likewise." (starting with an uppercase and ending with a
>>>    dot).
>>
>> will do.
>>
>> Bests,
>> Egor
>>
> 
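
The cosmetic points raised in the quoted review (duplicate alternatives such as `<U0435> <U0065>; <U0065>`, spaces after `;`, trailing whitespace) lend themselves to a mechanical check. The following is a rough Python sketch, not an existing glibc tool. The function name `find_issues` is hypothetical. It assumes that `%` starts a comment line, as in the snippets quoted above, and that alternatives never contain a literal `;`.

```python
import re

# One or more <Uxxxx> symbols, whitespace, then the ';'-separated alternatives.
ENTRY = re.compile(r"^(?P<src>(?:<U[0-9A-Fa-f]{4,8}>)+)\s+(?P<alts>.+)$")

def find_issues(lines):
    """Return (line_no, message) pairs for the review points listed above."""
    issues = []
    for no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("%"):
            continue  # blank line or % comment (assumed comment character)
        if line != line.rstrip():
            issues.append((no, "trailing whitespace"))
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # keywords such as "include" are ignored in this sketch
        alts_text = match.group("alts")
        if re.search(r";\s", alts_text):
            issues.append((no, "space after ';'"))
        alts = [alt.strip() for alt in alts_text.split(";")]
        if len(alts) != len(set(alts)):
            issues.append((no, "duplicate alternative"))
    return issues

sample = [
    "% CYRILLIC SMALL LETTER IE",
    "<U0435> <U0065>; <U0065>",
    "<U0435> <U0065>",
]
for line_no, message in find_issues(sample):
    print(line_no, message)
```

Run against the generated file before each submission, a check of this kind would catch regressions from the spreadsheet export without changing how the file is produced.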
