public inbox for libc-locales@sourceware.org
* Language code changes over time: zh_CN -> zh_Hans
@ 2019-10-10 15:22 Jean-Baptiste Holcroft
  2019-10-10 16:25 ` Florian Weimer
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jean-Baptiste Holcroft @ 2019-10-10 15:22 UTC
  To: libc-alpha, libc-locales

Hi,

the Fedora community is migrating to the Weblate translation platform.
This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by default, 
instead of zh_CN, zh_TW and zh_HK.

If I understand correctly, we need to make sure the language code exists
in glibc before deciding to use the new code for Linux applications.

The issue is as follows: a translation platform can be used by many
projects, not all of which use glibc, such as websites or mobile
applications that already use the new codes.

Online sources say that the codes zh_CN, zh_TW and zh_HK are old and
should be replaced by the new ones.
Multiple sources tend to confirm this (and show that other projects
have already made the move); the most relevant ones are:
https://www.w3.org/TR/i18n-html-tech-lang/#h2_langvalues
http://www.rfc-editor.org/rfc/bcp/bcp47.txt

And the replacement rule is also in the CLDR:
https://github.com/unicode-org/cldr/blob/ed7854cb6209678739712854a2df1cac732be540/common/supplemental/supplementalMetadata.xml#L177


What is the support status of these new language codes in glibc?
If they are not supported, could we have backward compatibility while
upstream projects migrate to the new language codes?
I assume these are not the only language codes being renamed; what
policy do you suggest for such cases?


Please do not hesitate to tell me where this discussion should happen, as
this is my first contact with your community.


Thanks a lot for your help,
Jean-Baptiste

* Re: Language code changes over time: zh_CN -> zh_Hans
  2019-10-10 15:22 Language code changes over time: zh_CN -> zh_Hans Jean-Baptiste Holcroft
@ 2019-10-10 16:25 ` Florian Weimer
  2019-10-11 20:27   ` Mike FABIAN
  2019-10-10 20:53 ` Carlos O'Donell
  2019-10-11 20:41 ` Mike FABIAN
  2 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2019-10-10 16:25 UTC
  To: Jean-Baptiste Holcroft; +Cc: libc-alpha, libc-locales

* Jean-Baptiste Holcroft:

> the Fedora community is migrating to the Weblate translation platform.
> This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by
> default, instead of zh_CN, zh_TW and zh_HK.

Is zh_Hant_HK perhaps a typo?  Written as-is, that's going to be
problematic.  We had some bugs specific to the eo locale because its
name does not have an underscore in it.

I'm not sure if the language tag syntax is actually compatible with our
locale names.  For example, we write sr-Latn-RS (mentioned in RFC 5646)
as sr_RS@latin.
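
To make the mismatch concrete, here is a minimal Python sketch of how a
BCP 47 style tag could be rewritten into the language_TERRITORY@modifier
shape used by glibc locale names. The two lookup tables are assumptions
for illustration only: the sr/Latn entry mirrors the sr_RS@latin example
above, and the zh entries mirror the zh_Hans -> zh_CN and zh_Hant -> zh_TW
pairs from this thread. Nothing here reflects an existing glibc mechanism.

```python
# Hypothetical tables, for illustration only.
# (language, script) -> glibc-style modifier
SCRIPT_TO_MODIFIER = {("sr", "Latn"): "latin"}
# (language, script) -> territory to assume when the tag names no region
DEFAULT_REGION = {("zh", "Hans"): "CN", ("zh", "Hant"): "TW"}


def to_glibc_name(tag):
    """Rewrite a tag such as 'sr-Latn-RS' or 'zh_Hant_HK' as a glibc-style name."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    region = None
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            script = sub.title()      # four letters: script subtag (Latn, Hans, ...)
        elif len(sub) == 2 and sub.isalpha():
            region = sub.upper()      # two letters: region subtag (RS, HK, ...)
        elif len(sub) == 3 and sub.isdigit():
            region = sub              # three digits: numeric region code
    if region is None:
        region = DEFAULT_REGION.get((language, script))
    name = language if region is None else f"{language}_{region}"
    modifier = SCRIPT_TO_MODIFIER.get((language, script))
    if modifier:
        name += "@" + modifier
    return name


if __name__ == "__main__":
    for tag in ("sr-Latn-RS", "zh_Hans", "zh_Hant_HK", "eo"):
        print(tag, "->", to_glibc_name(tag))
```

The sketch silently drops any script it has no table entry for, which is
one reason a purely mechanical translation between the two syntaxes may
not be sufficient.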

We have an aliasing mechanism, so we don't need to store the locale data
several times over, so that part shouldn't be a problem.

> If I understand correctly, we need to make sure the language code
> exists in glibc before deciding to use the new code for Linux
> applications.
>
> The issue is as follows: a translation platform can be used by many
> projects, not all of which use glibc, such as websites or mobile
> applications that already use the new codes.
>
> Online sources say that the codes zh_CN, zh_TW and zh_HK are old and
> should be replaced by the new ones.
> Multiple sources tend to confirm this (and show that other projects
> have already made the move); the most relevant ones are:
> https://www.w3.org/TR/i18n-html-tech-lang/#h2_langvalues

This says:

| Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified
| and Traditional Chinese, respectively. more...

And the “more...” link goes to:

<https://www.w3.org/International/articles/language-tags/Overview.var#script>

which is dead.  This isn't very reassuring.

> What is the support status of these new language codes in glibc?
> If they are not supported, could we have backward compatibility while
> upstream projects migrate to the new language codes?
> I assume these are not the only language codes being renamed; what
> policy do you suggest for such cases?

We must have renamed language codes in the past, but I don't remember
any examples.  Mostly we didn't have 1:1 replacements, but transitions
due to changing political circumstances.  There were also some language
codes which were completely wrong.

Thanks,
Florian

* Re: Language code changes over time: zh_CN -> zh_Hans
  2019-10-10 15:22 Language code changes over time: zh_CN -> zh_Hans Jean-Baptiste Holcroft
  2019-10-10 16:25 ` Florian Weimer
@ 2019-10-10 20:53 ` Carlos O'Donell
  2019-10-11 20:41 ` Mike FABIAN
  2 siblings, 0 replies; 7+ messages in thread
From: Carlos O'Donell @ 2019-10-10 20:53 UTC
  To: Jean-Baptiste Holcroft, libc-alpha, libc-locales, Rafal Luzynski,
	Mike Fabian

On 10/10/19 11:22 AM, Jean-Baptiste Holcroft wrote:
> What is the support status of these new language codes in glibc?

The W3C reference says to use zh-Hans and zh-Hant, which follow the BCP47 RFC.

So glibc would need to add aliases for zh-Hans and zh-Hant.

We don't currently have those aliases, but could add them.

> If they are not supported, could we have backward compatibility while
> upstream projects migrate to the new language codes? I assume these
> are not the only language codes being renamed; what policy do you
> suggest for such cases?

We should get the work done upstream and backport.

Mike, Rafal, any opinions? This just seems like a locale.alias addition?
Is there a wider set of automatic aliases we need to add to conform to BCP47
with the system APIs?
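
For illustration, here is a small Python sketch of what alias resolution
amounts to, assuming the usual locale.alias layout of one
"alias real-name" pair per line with "#" comments. The three entries are
hypothetical and are not taken from any shipped locale.alias file;
whether glibc would accept names of this shape is exactly the open
question in this thread.

```python
# Hypothetical alias entries, written in the two-column locale.alias layout.
HYPOTHETICAL_ALIASES = """
# alias        real locale name
zh_Hans        zh_CN.UTF-8
zh_Hant        zh_TW.UTF-8
zh_Hant_HK     zh_HK.UTF-8
"""


def parse_alias_text(text):
    """Parse 'alias real-name' lines, ignoring blank lines and # comments."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            aliases[fields[0]] = fields[1]
    return aliases


def resolve(name, aliases):
    """Return the aliased locale name, or the name unchanged if no alias exists."""
    return aliases.get(name, name)


if __name__ == "__main__":
    table = parse_alias_text(HYPOTHETICAL_ALIASES)
    for requested in ("zh_Hans", "zh_Hant_HK", "de_DE.UTF-8"):
        print(requested, "->", resolve(requested, table))
```

Names without an alias pass through unchanged, so existing locale names
such as zh_CN.UTF-8 would keep working alongside any new aliases.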

-- 
Cheers,
Carlos.

* Re: Language code changes over time: zh_CN -> zh_Hans
  2019-10-10 16:25 ` Florian Weimer
@ 2019-10-11 20:27   ` Mike FABIAN
  0 siblings, 0 replies; 7+ messages in thread
From: Mike FABIAN @ 2019-10-11 20:27 UTC
  To: Florian Weimer; +Cc: Jean-Baptiste Holcroft, libc-alpha, libc-locales

Florian Weimer <fweimer@redhat.com> wrote:

> * Jean-Baptiste Holcroft:
>
>> the Fedora community is migrating to the Weblate translation platform.
>> This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by
>> default, instead of zh_CN, zh_TW and zh_HK.
>
> Is zh_Hant_HK perhaps a typo?

No, Hant is the script (traditional Chinese, similar to Latn in
sr-Latn-RS). Traditional Chinese is correct for HK, simplified
would be used in zh_Hans_CN and zh_Hans_SG.

> Written as-is, that's going to be
> problematic.  We had some bugs specific to the eo locale because its
> name does not have an underscore in it.

> I'm not sure if the language tag syntax is actually compatible with our
> locale names.  For example, we write sr-Latn-RS (mentioned in RFC 5646)
> as sr_RS@latin.

I am not sure either.

> We have an aliasing mechanism, so we don't need to store the locale data
> several times over, so that part shouldn't be a problem.

> | Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified
> | and Traditional Chinese, respectively. more...
>
> And the “more...” link goes to:
>
> <https://www.w3.org/International/articles/language-tags/Overview.var#script>
>
> which is dead.  This isn't very reassuring.

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.

* Re: Language code changes over time: zh_CN -> zh_Hans
  2019-10-10 15:22 Language code changes over time: zh_CN -> zh_Hans Jean-Baptiste Holcroft
  2019-10-10 16:25 ` Florian Weimer
  2019-10-10 20:53 ` Carlos O'Donell
@ 2019-10-11 20:41 ` Mike FABIAN
  2019-10-12  9:35   ` Jean-Baptiste
  2 siblings, 1 reply; 7+ messages in thread
From: Mike FABIAN @ 2019-10-11 20:41 UTC
  To: Jean-Baptiste Holcroft; +Cc: libc-alpha, libc-locales

Jean-Baptiste Holcroft <jean-baptiste@holcroft.fr> wrote:

> Hi,
>
> the Fedora community is migrating to the Weblate translation platform.
> This translation platform uses zh_Hans, zh_Hant, zh_Hant_HK by
> default, instead of zh_CN, zh_TW and zh_HK.

Does it have to use these?

This is also Weblate:

https://l10n.opensuse.org/projects/yast-network/#languages
https://l10n.opensuse.org/languages/zh_CN/yast-network/
https://l10n.opensuse.org/languages/zh_TW/yast-network/

And it uses zh_CN and zh_TW. So that seems possible with Weblate.

I think it would be easier to use zh_CN, zh_TW, zh_SG, zh_HK
in Weblate instead of changing glibc to accept zh_Hans_CN etc.

If you look into the folders /usr/share/locale/zh_TW and
/usr/share/locale/zh_HK on a Linux system, you will find that there are
many translations there.  There are 1020 .mo files in
/usr/share/locale/zh_CN/LC_MESSAGES/ on my Fedora 30 system and 977 .mo
files in /usr/share/locale/zh_TW/LC_MESSAGES/, but no translations at
all in /usr/share/locale/zh_Hans_CN/LC_MESSAGES/ and
/usr/share/locale/zh_Hant_TW/LC_MESSAGES/. Although these directories
exist, they are empty. So switching to zh_Hans_CN would be quite
a lot of effort:

- changing glibc to accept zh_Hans_CN.UTF-8 (not sure whether that is
  even possible)
- changing all these packages with the translations.

It might be possible to change gettext: if the locale is zh_CN.UTF-8,
it could look in the zh_CN folder as it does now, and if no .mo file
is found there, it could try zh_Hans_CN as well.
But that is also not an easy change.
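
As a rough illustration of that lookup order, here is a Python sketch
that searches the usual <localedir>/<name>/LC_MESSAGES/<domain>.mo
layout. It is not how gettext is implemented: the FALLBACKS table, the
function name and the "demo" domain are all hypothetical, and the
directory names are simply the ones mentioned in this thread.

```python
import os
import tempfile

# Hypothetical fallback table built from the names mentioned in this thread.
FALLBACKS = {"zh_CN": ["zh_Hans_CN"], "zh_TW": ["zh_Hant_TW"]}


def find_catalog(localedir, locale_name, domain):
    """Look in the classic folder first, then in the assumed fallback folders."""
    # Strip codeset and modifier: 'zh_CN.UTF-8' -> 'zh_CN'.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    for candidate in [base] + FALLBACKS.get(base, []):
        path = os.path.join(localedir, candidate, "LC_MESSAGES", domain + ".mo")
        if os.path.exists(path):
            return path
    return None


def demo():
    """Create a throwaway tree that only has zh_Hans_CN and search it as zh_CN."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "zh_Hans_CN", "LC_MESSAGES")
        os.makedirs(target)
        # An empty placeholder is enough: only the file's existence is checked.
        open(os.path.join(target, "demo.mo"), "wb").close()
        found = find_catalog(root, "zh_CN.UTF-8", "demo")
        return os.path.relpath(found, root)


if __name__ == "__main__":
    print(demo())
```

Because the classic folder is tried first, packages that already ship
translations under zh_CN would behave exactly as before.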

I think using zh_CN and zh_TW in Weblate is easiest and
looking at the above Weblate site used by openSUSE, it seems possible.

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.

* Re: Language code changes over time: zh_CN -> zh_Hans
  2019-10-11 20:41 ` Mike FABIAN
@ 2019-10-12  9:35   ` Jean-Baptiste
  2019-10-12 13:52     ` Mike FABIAN
  0 siblings, 1 reply; 7+ messages in thread
From: Jean-Baptiste @ 2019-10-12  9:35 UTC
  To: libc-locales; +Cc: libc-alpha

Thanks for your answers. I don't dispute that Weblate can use the old codes.

I see two issues:
1. Depending on the target platform, upstream projects have to manually convert the language codes used by the system.
2. The default setting in a translation platform applies to every project, so it forces every project we host to remain on the old codes by default.

The short-term solution is to use the old codes, and I could have done that without telling anyone.

But I feel we need upstream support for the new language codes so that we follow the evolution of the standards over time. Android made the change years ago, as did Django and Microsoft .NET, and I'm sure we can find other examples.

I'm confused about why we have zh_Hans_CN in the locales folder; does it mean we can use it already?

* Re: Language code changes over time: zh_CN -> zh_Hans
  2019-10-12  9:35   ` Jean-Baptiste
@ 2019-10-12 13:52     ` Mike FABIAN
  0 siblings, 0 replies; 7+ messages in thread
From: Mike FABIAN @ 2019-10-12 13:52 UTC
  To: Jean-Baptiste; +Cc: libc-locales, libc-alpha

Jean-Baptiste <jean-baptiste@holcroft.fr> wrote:

> Thanks for your answers. I don't dispute that Weblate can use the old
> codes.
>
> I see two issues:
> 1. Depending on the target platform, upstream projects have to
> manually convert the language codes used by the system.
> 2. The default setting in a translation platform applies to every
> project, so it forces every project we host to remain on the old
> codes by default.
>
> The short-term solution is to use the old codes, and I could have done
> that without telling anyone.
>
> But I feel we need upstream support for the new language codes so
> that we follow the evolution of the standards over time. Android made
> the change years ago, as did Django and Microsoft .NET, and I'm sure
> we can find other examples.
>
> I'm confused about why we have zh_Hans_CN in the locales folder;
> does it mean we can use it already?

No, because gettext would find translations in that folder only
if you set LC_MESSAGES=zh_Hans_CN.UTF-8, which you cannot:

$ LC_ALL=zh_Hans_CN.UTF-8 locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968

Maybe you can set LANGUAGE=zh_Hans_CN to make gettext find translations
in that folder.
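
A quick way to experiment with this idea is Python's standard gettext
module, whose gettext.find() takes a list of language names and returns
the first matching .mo file. This is only a sketch: Python's module is a
separate implementation from GNU gettext, the helper name and the "demo"
domain are made up, and the directory is a throwaway one rather than
/usr/share/locale.

```python
import gettext
import os
import tempfile


def find_with_language(localedir, domain, language_value):
    """Treat language_value like a colon-separated LANGUAGE priority list."""
    languages = [entry for entry in language_value.split(":") if entry]
    return gettext.find(domain, localedir=localedir, languages=languages)


def demo():
    """Build a tree containing only zh_Hans_CN and search it with a LANGUAGE-style value."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "zh_Hans_CN", "LC_MESSAGES")
        os.makedirs(target)
        # gettext.find() only checks that the file exists, so it can be empty.
        open(os.path.join(target, "demo.mo"), "wb").close()
        found = find_with_language(root, "demo", "zh_Hans_CN:zh_CN")
        return None if found is None else os.path.relpath(found, root)


if __name__ == "__main__":
    print(demo())
```

With GNU gettext itself, note that LANGUAGE is ignored while the
effective locale is "C", so it would presumably have to be combined
with a locale that can actually be set, such as zh_CN.UTF-8.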

We have several folders in /usr/share/locale/, some of them probably
only because somebody made a mistake. For example:

$ ls /usr/share/locale/zh_* -d
/usr/share/locale/zh_CN/         /usr/share/locale/zh_Hant/     /usr/share/locale/zh_TW/
/usr/share/locale/zh_CN.GB2312/  /usr/share/locale/zh_Hant_TW/  /usr/share/locale/zh_TW.Big5/
/usr/share/locale/zh_Hans_CN/    /usr/share/locale/zh_HK/

zh_CN.GB2312 and zh_TW.Big5 make no sense.

Or for Serbian:
$ ls /usr/share/locale/sr* -d 
 /usr/share/locale/sr/                  '/usr/share/locale/sr@Latn'/
 /usr/share/locale/sr_Cyrl/              /usr/share/locale/sr_ME/
 /usr/share/locale/srd/                  /usr/share/locale/srn/
'/usr/share/locale/sr@ije'/              /usr/share/locale/srr/
'/usr/share/locale/sr@ijekavian'/        /usr/share/locale/sr_RS/
'/usr/share/locale/sr@ijekavianlatin'/  '/usr/share/locale/sr_RS@latin'/
'/usr/share/locale/sr@latin'/

Why do we have sr@Latn and sr_RS@latin??

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.
