Freeze ping.

I'd like to ping the list on this patch and have some discussion on
moving the ASCII transliteration to locale/C-translit.h.in before the
freeze.

The wiki page for 2.29 [12] is set as "immutable" for newly registered
users; I am not sure whether that is intended. I could not add this patch
there as "desired". I have added the 2.29 keyword to the bug entry.

Bests,
Egor Kobylkin

[12] https://sourceware.org/glibc/wiki/Release/2.29

On 08.12.18 23:28, Egor Kobylkin wrote:
> Changelog v11:
> * Re-targeted the patch against locale/C-translit.h.in as the proper
> file for the ASCII translit table.
> * Correspondingly the patch now only contains the additional
> Cyrillic-ASCII strings in the format of the locale/C-translit.h.in table.
> The 'include "translit_cyrillic";""' directives are not necessary in the
> locale files and they are now all left intact.
> * Also the file translit_cyrillic is no longer needed and is omitted.
> * Edited below email, commit message.
>
> Changelog v10:
> * Removed ISO 9:1995 GOST 7.79-2000 System A (transliteration to Latin
> with diacritics) as conflicting with System B within glibc mechanics and
> not solving BZ #2872
> * Edited below email, commit message, comment in translit_cyrillic to
> reflect System A removal
> * Removed and (Cyrillic U with acute,
> using composition) as composing is not covered by current glibc
> conversion mechanics
>
> Changelog v9:
> * Fixed formatting (trailing spaces etc.)
> * Put commit summary in the patch file, now it is generated completely
> by git format-patch
>
> Changelog v8:
> * Re-added translit_cyrillic, which was missing in patch v7 (due to a
> missing "git add" in the script).
>
> Changelog v7:
> * Generated against git://sourceware.org/git/glibc.git master with git
> format-patch.
> * The 'include "translit_cyrillic";""' now immediately follows the last
> 'include "translit_XXX";""' string (was inserted just before
> translit_end previously.)
> * Only the locales already having 'include .*translit.*;""' are patched
> (see the list for manual exclusions below, full list of included locales
> at the end of the email in the commit section.)
> * Excluded az_AZ completely to avoid circular reference from tr_TR via
> “copy "tr_TR"”.
>
> Changelog v6:
> * Locales removed from the patch: C and sd_PK.
> * Added locales: az_AZ and ky_KG.
> * Consistently transliterate single uppercase Cyrillic letters
> to sequences of all uppercase Latin letters in all languages (whenever
> a Cyrillic letter is transliterated to more than one Latin letter),
> for example "Ї" is now transliterated as "YI" rather than "Yi".
>
> Dear locale maintainers,
>
> this patch fixes glibc bug 2872 "Transliteration Cyrillic -> ASCII
> fails" [1]:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=2872
>
> It adds the Cyrillic transliteration rows to locale/C-translit.h.in.
>
> The patch is attached.
>
>
> Current bug effect:
>
> The glibc wiki explicitly lists this use case as the test example and
> currently it fails on Cyrillic texts [1] [8] [9]:
>
> iconv -f UTF-8 -t ASCII//TRANSLIT < translit-test-input.txt | grep CYRILLIC
>
> CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.
>
> - it produces a string of question marks and spaces.
>
> This is what it should produce, and does produce once the patch is
> applied:
>
> CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
> chayu.
>
>
> The root problem and the fix:
>
> The root problem is the missing transliteration table that I am
> supplying here.
>
>
> COMMIT MESSAGE:
> This translit_cyrillic table enables conversion (e.g. with iconv) from a
> UTF-8 encoded text based on the Cyrillic alphabet to an ASCII//TRANSLIT
> text.
>
> Example: iconv -f UTF-8 -t ASCII//TRANSLIT will produce an ASCII
> compatible transcription.
>
> While a UTF-encoded Cyrillic text requires Cyrillic fonts, the result of
> a transliteration/transcription has only Latin/ASCII codes but can still
> be read by a native speaker. Among other things it is useful for
> processing Cyrillic texts and filenames by programs or on systems
> that are not specifically prepared to work with Cyrillic, don't have
> the corresponding fonts installed or can't handle UTF-8.
>
> The patch content (mapping) is based on the ISO 9:1995 standard [10] and
> its derivative GOST 7.79-2000 System B official source (Federal Agency on
> Technical Regulating and Metrology Of Russian Federation [2]).
> Technically an independent but mostly identical source [3] was used and
> prepared in a spreadsheet [6].
>
> The transliteration of Cyrillic to ASCII according to GOST 7.79-2000
> System B represents what is actually called transcription (preserving
> phonemes), while System A is the transliteration (preserving graphemes).
> There is no meaningful way to preserve graphemes when converting Cyrillic
> to ASCII, and thus System B is chosen [11]. To be clear, System A has
> nothing to do with this bug, regardless of its being a transliteration.
>
> Those interested in implementing System A for transliteration of
> Cyrillic to Latin with diacritics as a new feature are welcome to use the
> spreadsheet in [6] as a starting point.
>
> Links:
>
> [1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
> [2] GOST 7.79-2000 official source
> http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
> available in low quality gif format)
> [3] http://transliteration.ru/gost-7-79-2000/ and
> http://www.yfermer.ru/specifications/285821.html
> [4] Wikipedia article on Cyrillic transliteration with Latin alphabet
> https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
> [5] http://man7.org/linux/man-pages/man5/locale.5.html
> [6] Spreadsheet for generating translit_cyrillic
> https://sourceware.org/bugzilla/attachment.cgi?bugid=2872&action=viewall&hide_obsolete=1
> [8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
> [9] translit-test-input.txt
> https://sourceware.org/bugzilla/attachment.cgi?id=11304
> [10] https://en.wikipedia.org/wiki/ISO_9#GOST_7.79_System_B
> [11]
> https://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=gslmka8xq3
>
> Best regards,
> Egor Kobylkin
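
For readers who want to check the expected output quoted above without a
patched glibc, here is a minimal Python sketch of a per-character System B
style lookup. It is an illustration only, not the patch and not how glibc
implements //TRANSLIT. The names SYSTEM_B, translit_char and translit are
hypothetical, the table covers only the Russian letters plus "ї", the input
sentence is inferred from the expected output, and "ц" is always mapped to
"cz" as a simplification (GOST 7.79-2000 System B uses "c" before e, i, y, j).

```python
import unicodedata

# Hypothetical illustration table: lowercase Cyrillic -> ASCII, following
# GOST 7.79-2000 System B. Not taken from the patch itself.
SYSTEM_B = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "yo", "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "x",
    "ц": "cz",  # simplification: System B uses "c" before e, i, y, j
    "ч": "ch", "ш": "sh", "щ": "shh", "ъ": "``", "ы": "y'", "ь": "`",
    "э": "e`", "ю": "yu", "я": "ya",
    "ї": "yi",  # Ukrainian letter used as the example in changelog v6
}


def translit_char(ch: str) -> str:
    """Map one character to ASCII; unknown non-ASCII becomes '?'."""
    if ch.isascii():
        return ch
    if ch in SYSTEM_B:
        return SYSTEM_B[ch]
    lower = ch.lower()
    if lower in SYSTEM_B:
        # Changelog v6 rule: an uppercase letter maps to an all-uppercase
        # Latin sequence, e.g. "YI" rather than "Yi".
        return SYSTEM_B[lower].upper()
    return "?"


def translit(text: str) -> str:
    """Transliterate text one character at a time after NFC normalization."""
    # NFC composes letters such as "ё" and "й" into single code points,
    # so the per-character lookup can find them.
    return "".join(translit_char(ch) for ch in unicodedata.normalize("NFC", text))


PANGRAM = "Съешь ещё этих мягких французских булок, да выпей же чаю."

if __name__ == "__main__":
    print(translit(PANGRAM))
    # S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe chayu.
```

A plain per-character table is enough to reproduce the expected line for this
test sentence; a context-sensitive rule such as the one for "ц" would need
lookahead, which this sketch deliberately leaves out.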