* Language code changes over time: zh_CN -> zh_Hans
@ 2019-10-10 15:22 Jean-Baptiste Holcroft
2019-10-10 16:25 ` Florian Weimer
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jean-Baptiste Holcroft @ 2019-10-10 15:22 UTC
To: libc-alpha, libc-locales
Hi,
the Fedora community is migrating to the Weblate translation platform.
This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by default,
instead of zh_CN, zh_TW and zh_HK.
If I understand correctly, we need to make sure the language code exists
in glibc before deciding to use the new code for Linux applications.
The issue is as follows: a translation platform can be used by many
projects, not all of them using glibc, such as websites or mobile
applications that already use the new codes.
The web says that the codes zh_CN, zh_TW, zh_HK are old codes and
should be replaced by the new ones.
Multiple sources tend to confirm this (and show that other actors
have already made the move); the most relevant ones are:
https://www.w3.org/TR/i18n-html-tech-lang/#h2_langvalues
http://www.rfc-editor.org/rfc/bcp/bcp47.txt
And the replacement rule is also in the CLDR:
https://github.com/unicode-org/cldr/blob/ed7854cb6209678739712854a2df1cac732be540/common/supplemental/supplementalMetadata.xml#L177
What is the support status of these new language codes in glibc?
If they are not supported, could we have backward compatibility while
upstream projects migrate to the new language codes?
I assume these are not the only language codes being renamed; what
policy do you suggest for such cases?
Please do not hesitate to tell me where this discussion should happen,
as this is my first contact with your community.
Thanks a lot for your help,
Jean-Baptiste
* Re: Language code changes over time: zh_CN -> zh_Hans
2019-10-10 15:22 Language code changes over time: zh_CN -> zh_Hans Jean-Baptiste Holcroft
@ 2019-10-10 16:25 ` Florian Weimer
2019-10-11 20:27 ` Mike FABIAN
2019-10-10 20:53 ` Carlos O'Donell
2019-10-11 20:41 ` Mike FABIAN
2 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2019-10-10 16:25 UTC
To: Jean-Baptiste Holcroft; +Cc: libc-alpha, libc-locales
* Jean-Baptiste Holcroft:
> the Fedora community is migrating to the Weblate translation platform.
> This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by
> default, instead of zh_CN, zh_TW and zh_HK.
Is zh_Hant_HK perhaps a typo? Written as-is, that's going to be
problematic. We had some bugs specific to the eo locale because its
name does not have an underscore in it.
I'm not sure if the language tag syntax is actually compatible with our
locale names. For example, we write sr-Latn-RS (mentioned in RFC 5646)
as sr_RS@latin.
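To make the naming mismatch concrete, here is a minimal sketch of a converter from a language[-Script][-REGION] tag to a glibc-style language[_REGION][@modifier] name. The function name bcp47_to_glibc and the SCRIPT_TO_MODIFIER table are hypothetical; only the Latn -> latin entry comes from the sr_RS@latin example above, and scripts without a known modifier are rejected rather than guessed.

```python
# Hypothetical table: only "Latn" -> "latin" is grounded in the
# sr-Latn-RS / sr_RS@latin example from this thread.
SCRIPT_TO_MODIFIER = {"Latn": "latin"}


def bcp47_to_glibc(tag):
    """Convert language[-Script][-REGION] to language[_REGION][@modifier].

    Illustrative only: it handles neither variants nor extensions.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()      # script subtag, e.g. "Latn"
        elif len(part) == 2 and part.isalpha():
            region = part.upper()      # region subtag, e.g. "RS"
    name = language
    if region:
        name += "_" + region
    if script:
        modifier = SCRIPT_TO_MODIFIER.get(script)
        if modifier is None:
            # No known glibc modifier for this script: refuse to guess.
            raise ValueError("no modifier known for script " + script)
        name += "@" + modifier
    return name


print(bcp47_to_glibc("sr-Latn-RS"))
```

A tag such as zh-Hant-HK raises ValueError in this sketch, which mirrors the open question of how a script subtag should be expressed in a locale name.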
We have an aliasing mechanism, so we don't need to store the locale data
several times over, so that part shouldn't be a problem.
> If I understand correctly, we need to make sure the language code
> exists in glibc before deciding using the new code for Linux
> applications.
>
> The issue is as follow: a translation platform can be used by many
> projects, not all using glibc, like Websites or mobile application who
> already are using the new code.
>
> The web tells that the codes zh_CN, zh_TW, zh_HK are old codes and
> should be replaced by the new ones.
> Multiple sources tends to confirm this (and show that other actors
> already did the move), the most relevant ones are:
> https://www.w3.org/TR/i18n-html-tech-lang/#h2_langvalues
This says:
| Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified
| and Traditional Chinese, respectively. more...
And the “more...” link goes to:
<https://www.w3.org/International/articles/language-tags/Overview.var#script>
which is dead. This isn't very reassuring.
> What is the support status of these new language codes in Glibc?
> If not supported, can we imagine to have backward compatibility while
> upstream projects migrate to the new language code?
> I assume these are not the only language code renaming, what policy do
> you suggest concerning these?
We must have renamed language codes in the past, but I don't remember
any examples. Mostly we did not have 1:1 replacements, but transitions due to
changing political circumstances. There were also some language codes
which were completely wrong.
Thanks,
Florian
* Re: Language code changes over time: zh_CN -> zh_Hans
2019-10-10 15:22 Language code changes over time: zh_CN -> zh_Hans Jean-Baptiste Holcroft
2019-10-10 16:25 ` Florian Weimer
@ 2019-10-10 20:53 ` Carlos O'Donell
2019-10-11 20:41 ` Mike FABIAN
2 siblings, 0 replies; 7+ messages in thread
From: Carlos O'Donell @ 2019-10-10 20:53 UTC
To: Jean-Baptiste Holcroft, libc-alpha, libc-locales, Rafal Luzynski,
Mike Fabian
On 10/10/19 11:22 AM, Jean-Baptiste Holcroft wrote:
> What is the support status of these new language codes in Glibc?
The W3C reference says to use zh-Hans and zh-Hant, which follow the BCP47 RFC.
So glibc would need to add aliases for zh-Hans and zh-Hant.
We don't currently have those aliases, but could add them.
> If not supported, can we imagine to have backward compatibility while
> upstream projects migrate to the new language code? I assume these
> are not the only language code renaming, what policy do you suggest
> concerning these?
We should get the work done upstream and backport.
Mike, Rafal, Any opinions? This just seems like a locale.alias addition?
Is there a wider set of automatic aliases we need to add to conform to BCP47
with the system APIs?
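As a rough illustration of what such an addition might look like, the sketch below prints candidate lines in the "alias, whitespace, locale name" layout that locale.alias uses. The alias targets and the choice of UTF-8 are assumptions taken from the zh_Hans/zh_CN pairing discussed in this thread, and whether glibc would accept these names through locale.alias is exactly the open question, so treat this as a sketch rather than a tested patch.

```python
# Hypothetical alias -> locale pairs; the UTF-8 charset is an assumption.
HYPOTHETICAL_ALIASES = {
    "zh_Hans": "zh_CN.UTF-8",
    "zh_Hant": "zh_TW.UTF-8",
    "zh_Hant_HK": "zh_HK.UTF-8",
}


def format_alias_lines(aliases):
    """Return 'alias<spaces>locale' lines with the targets aligned."""
    width = max(len(alias) for alias in aliases) + 2
    return [alias.ljust(width) + target for alias, target in aliases.items()]


for line in format_alias_lines(HYPOTHETICAL_ALIASES):
    print(line)
```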
--
Cheers,
Carlos.
* Re: Language code changes over time: zh_CN -> zh_Hans
2019-10-10 16:25 ` Florian Weimer
@ 2019-10-11 20:27 ` Mike FABIAN
0 siblings, 0 replies; 7+ messages in thread
From: Mike FABIAN @ 2019-10-11 20:27 UTC
To: Florian Weimer; +Cc: Jean-Baptiste Holcroft, libc-alpha, libc-locales
Florian Weimer <fweimer@redhat.com> wrote:
> * Jean-Baptiste Holcroft:
>
>> the Fedora community is migrating to the Weblate translation platform.
>> This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by
>> default, instead of zh_CN, zh_TW and zh_HK.
>
> Is zh_Hant_HK perhaps a typo?
No, Hant is the script (traditional Chinese, similar to Latn in
sr-Latn-RS). Traditional Chinese is correct for HK, simplified
would be used in zh_Hans_CN and zh_Hans_SG.
> Written as-is, that's going to be
> problematic. We had some bugs specific to the eo locale because its
> name does not have an underscore in it.
> I'm not sure if the language tag syntax is actually compatible with our
> locale names. For example, we write sr-Latn-RS (mentioned in RFC 5646)
> as sr_RS@latin.
I am not sure either.
> We have an aliasing mechanism, so we don't need to store the locale data
> several times over, so that part shouldn't be a problem.
> | Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified
> | and Traditional Chinese, respectively. more...
>
> And the “more...” link goes to:
>
> <https://www.w3.org/International/articles/language-tags/Overview.var#script>
>
> which is dead. This isn't very reassuring.
--
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.
* Re: Language code changes over time: zh_CN -> zh_Hans
2019-10-10 15:22 Language code changes over time: zh_CN -> zh_Hans Jean-Baptiste Holcroft
2019-10-10 16:25 ` Florian Weimer
2019-10-10 20:53 ` Carlos O'Donell
@ 2019-10-11 20:41 ` Mike FABIAN
2019-10-12 9:35 ` Jean-Baptiste
2 siblings, 1 reply; 7+ messages in thread
From: Mike FABIAN @ 2019-10-11 20:41 UTC
To: Jean-Baptiste Holcroft; +Cc: libc-alpha, libc-locales
Jean-Baptiste Holcroft <jean-baptiste@holcroft.fr> wrote:
> Hi,
>
> the Fedora community is migrating to the Weblate translation platform.
> This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by
> default, instead of zh_CN, zh_TW and zh_HK.
Does it have to use these?
This is also weblate:
https://l10n.opensuse.org/projects/yast-network/#languages
https://l10n.opensuse.org/languages/zh_CN/yast-network/
https://l10n.opensuse.org/languages/zh_TW/yast-network/
And it uses zh_CN and zh_TW. So that seems possible with weblate.
I think it would be easier to use zh_CN, zh_TW, zh_SG, zh_HK
in weblate instead of changing glibc to accept zh_Hans_CN etc.
If you look into the folders /usr/share/locale/zh_TW and
/usr/share/locale/zh_HK on a linux system, you will find that there are
many translations there. 1020 .mo files in
/usr/share/locale/zh_CN/LC_MESSAGES/ on my Fedora 30 and 977 .mo files
in /usr/share/locale/zh_TW/LC_MESSAGES. And no translations at all in
/usr/share/locale/zh_Hans_CN/LC_MESSAGES/ and
/usr/share/locale/zh_Hant_TW/LC_MESSAGES/. Although these directories
exist, they are empty. So switching to zh_Hans_CN would be quite
a lot of effort:
- changing glibc to accept zh_Hans_CN.UTF-8 (Not sure whether that is
even possible)
- Changing all these packages with the translations.
It might be possible to change gettext: If the locale is zh_CN.UTF-8,
it could look in the zh_CN folder as it does now and, if no .mo file
is found there, try zh_Hans_CN as well.
But that is also not an easy change.
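The fallback idea can be sketched roughly as follows. This is not how gettext is implemented; FALLBACK_DIRS, candidate_dirs and find_catalog are hypothetical names, and the sketch ignores any other lookups the real gettext performs.

```python
import os

# Hypothetical mapping from an established directory name to extra
# directories to try when the catalog is missing there.
FALLBACK_DIRS = {
    "zh_CN": ["zh_Hans_CN"],
    "zh_TW": ["zh_Hant_TW"],
}


def candidate_dirs(locale_name):
    """Return directory names to search, most specific first."""
    # Drop the codeset and modifier: "zh_CN.UTF-8" -> "zh_CN".
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    return [base] + FALLBACK_DIRS.get(base, [])


def find_catalog(localedir, locale_name, domain):
    """Return the first existing <dir>/LC_MESSAGES/<domain>.mo, or None."""
    for directory in candidate_dirs(locale_name):
        path = os.path.join(localedir, directory, "LC_MESSAGES", domain + ".mo")
        if os.path.isfile(path):
            return path
    return None


print(candidate_dirs("zh_CN.UTF-8"))
```

Trying the established zh_CN directory first keeps existing installations working unchanged; the new directory names are consulted only when the old one has no catalog.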
I think using zh_CN and zh_TW in weblate is easiest and
looking at the above weblate site used by openSUSE, it seems possible.
--
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.
* Re: Language code changes over time: zh_CN -> zh_Hans
2019-10-11 20:41 ` Mike FABIAN
@ 2019-10-12 9:35 ` Jean-Baptiste
2019-10-12 13:52 ` Mike FABIAN
0 siblings, 1 reply; 7+ messages in thread
From: Jean-Baptiste @ 2019-10-12 9:35 UTC
To: libc-locales; +Cc: libc-alpha
Thanks for your answers. I don't dispute that Weblate can use the old codes.
I see two issues:
1. Depending on the target platform, upstream projects have to manually
   convert the language code used by the system.
2. The default setting in a translation platform applies to every project,
   so it forces every project we host to remain on the old codes by default.
The short-term solution is to use the old codes, and I could have done that
without telling anyone.
But I feel we need upstream support for the new language codes so that we
follow the evolution of the standards over time. Android made the change
years ago, as did Django and Microsoft .NET, and I'm sure we can find other
examples.
I'm confused about why we have zh_Hans_CN in the locales folder. Does it
mean we can use it already?
* Re: Language code changes over time: zh_CN -> zh_Hans
2019-10-12 9:35 ` Jean-Baptiste
@ 2019-10-12 13:52 ` Mike FABIAN
0 siblings, 0 replies; 7+ messages in thread
From: Mike FABIAN @ 2019-10-12 13:52 UTC
To: Jean-Baptiste; +Cc: libc-locales, libc-alpha
Jean-Baptiste <jean-baptiste@holcroft.fr> wrote:
> Thank for your answers, I don't debate Weblate can use the old code.
>
> I see two issues:
> 1. depending on the target platform, upstream project have to manually
> convert language code used by the system
> 2. The default setting in a translation platform is for every project,
> it force every user project we host to remain on the old code by
> default
>
> Short term solution is use old codes and I could have done it without telling anyone.
>
> But I feel like we need to have upstream support for new languages
> code so we follow the standards evolution over time. Android did the
> change years ago, Django also, Microsoft. Net also, and I'm sure we
> can find other example.
>
> I'm confused on the reason why we have zh_Hans_CN in Locales folder,
> does it means we can use it already?
No, because gettext would find translations in that folder only
if you set LC_MESSAGES=zh_Hans_CN.UTF-8, which you cannot:
$ LC_ALL=zh_Hans_CN.UTF-8 locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968
Maybe you can set LANGUAGE=zh_Hans_CN to make gettext find translations
in that folder.
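The same check can be made from a program instead of the shell. Below is a minimal sketch using Python's standard locale module, which calls the C library's setlocale(); the helper name can_set_locale is hypothetical, and the result depends entirely on which locales are installed on the machine.

```python
import locale


def can_set_locale(name):
    """Return True if the C library accepts `name` for LC_ALL."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore the previous setting so the probe has no side effects.
        locale.setlocale(locale.LC_ALL, saved)


for name in ("C", "zh_CN.UTF-8", "zh_Hans_CN.UTF-8"):
    print(name, can_set_locale(name))
```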
We have several folders in /usr/share/locale/, some of them probably
only because somebody made a mistake. For example:
$ ls /usr/share/locale/zh_* -d
/usr/share/locale/zh_CN/ /usr/share/locale/zh_Hant/ /usr/share/locale/zh_TW/
/usr/share/locale/zh_CN.GB2312/ /usr/share/locale/zh_Hant_TW/ /usr/share/locale/zh_TW.Big5/
/usr/share/locale/zh_Hans_CN/ /usr/share/locale/zh_HK/
zh_CN.GB2312 and zh_TW.Big5 make no sense.
Or for Serbian:
$ ls /usr/share/locale/sr* -d
/usr/share/locale/sr/ '/usr/share/locale/sr@Latn'/
/usr/share/locale/sr_Cyrl/ /usr/share/locale/sr_ME/
/usr/share/locale/srd/ /usr/share/locale/srn/
'/usr/share/locale/sr@ije'/ /usr/share/locale/srr/
'/usr/share/locale/sr@ijekavian'/ /usr/share/locale/sr_RS/
'/usr/share/locale/sr@ijekavianlatin'/ '/usr/share/locale/sr_RS@latin'/
'/usr/share/locale/sr@latin'/
Why do we have sr@Latn and sr_RS@latin??
--
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.