From: <yingxiang.yao@kasumi321.de>
To: "'Rickard Armiento'" <misc-dev@armiento.net>
Cc: <fweimer@redhat.com>, <carlos@redhat.com>, <locales@tfan.eu>,
<libc-locales@sourceware.org>
Subject: Re: Adding an en_SE locale (was: Adding an en_DE locale)
Date: Wed, 21 Aug 2024 23:11:06 +0200
Message-ID: <022a01daf40e$a45eb2c0$ed1c1840$@kasumi321.de>
In-Reply-To: <d2f3d7c4-91c2-430c-9adc-39a856e5ad58@armiento.net>
Hello Rickard,
I understand your confusion, and I was also not sure how to do it properly. I removed things like the currency simply to avoid problems with Sweden and other EU countries that do not use the euro. For the thousands separator, I think a thin space would be appropriate, since this is also what the European Commission's English Style Guide states. The same guide also states that a comma is used as the decimal separator for technical reasons.
Regards,
Yingxiang Yao
-----Original Message-----
From: Rickard Armiento <misc-dev@armiento.net>
Sent: Wednesday, 21 August 2024 14:01
To: libc-locales@sourceware.org
Cc: fweimer@redhat.com; carlos@redhat.com; locales@tfan.eu; yingxiang.yao@kasumi321.de
Subject: Adding an en_SE locale (was: Adding an en_DE locale)
On 7/3/24 12:20 PM, Thomas Näveke wrote:
>>> I was wondering if I was welcome to submit a patch for an "en_DE"
>>> locale, i.e. a locale that uses English language but German units,
>>> dates, etc.
On 7/5/24 12:06 PM, Carlos O'Donell wrote:
>> Yes, if Windows 10 and Unicode CLDR have such locales, then I think
>> we should consider that they are useful and in use by users.
On 7/5/24 15:21 PM, Florian Weimer wrote:
> On the other hand, glibc supports on the fly composition of locales,
> so you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
> parts you want from there (or vice versa).
I started looking into how to contribute locales to glibc with a similar motivation: to see whether an en_SE locale for English as used in Sweden could be accepted. As with en_DE, en_SE is present in Unicode CLDR, and a web search shows many examples of confusion caused by uses of this locale designation even though it is not present in glibc.
I then found that Yingxiang Yao (in cc) sent a patch to libc-alpha in January for an en_EU locale, which is closely related to the question of how to handle these locale issues for users of English in the EU.
https://sourceware.org/pipermail/libc-alpha/2024-January/thread.html#153906
(It is also noteworthy that glibc already includes an en_DK locale for English as used in Denmark, which is quite close to the proposed en_EU.)
At first glance, en_EU seems like it could be the general solution to the problem of otherwise having each EU country define its own en_* locale. With en_EU, users can set LANG to it and override the relevant country-specific conventions using the LC_* variables (as suggested by Florian Weimer above).
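The override mechanism relies on the POSIX precedence rules for the locale environment variables: a non-empty LC_ALL wins over everything, then the category's own LC_* variable, then LANG. A minimal Python sketch of that lookup follows; the function name `effective_locale` is hypothetical, the "C" fallback is assumed to be the default when nothing is set, and en_DK.UTF-8 stands in for the base locale because en_EU is only proposed.

```python
from collections.abc import Mapping

# The locale categories discussed in this thread.
CATEGORIES = (
    "LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_TELEPHONE",
    "LC_TIME", "LC_NAME", "LC_PAPER", "LC_MESSAGES",
    "LC_NUMERIC", "LC_ADDRESS",
)


def effective_locale(env: Mapping[str, str], category: str) -> str:
    """Return the locale name a category resolves to (hypothetical helper).

    LC_ALL beats the per-category variable, which beats LANG.
    Empty values are treated as unset; "C" is the assumed fallback.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


if __name__ == "__main__":
    env = {"LANG": "en_DK.UTF-8", "LC_MONETARY": "sv_SE.UTF-8"}
    for cat in CATEGORIES:
        print(f"{cat}={effective_locale(env, cat)}")
```

Note that a non-empty LC_ALL silently defeats every per-category override, which is one reason such compositions are easy to get wrong.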
However, on closer inspection, this gets a bit messy. For English as used in Sweden, all of the following categories should probably be taken from sv_SE rather than en_EU (or en_US, en_GB):
LC_COLLATE=sv_SE.UTF-8
# sv_SE modifies iso14651_t1 for the Swedish umlaut characters, and there
# may be special rules for V and W. Arguably, even when using English in
# Sweden, one would normally sort names, etc., according to this Swedish
# sort order, not US or UK English.
LC_CTYPE=sv_SE.UTF-8
# The Swedish umlaut characters need correct classifications when
# dealing with, e.g., Swedish names (e.g., personal and geographical)
# alongside English text.
LC_MONETARY=sv_SE.UTF-8
# Sweden uses its own currency (SEK), not the euro.
LC_TELEPHONE=sv_SE.UTF-8
# The standard way of writing phone numbers differs across the EU.
# Most importantly, 'int_select' and 'int_prefix' need
# country-specific values.
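To illustrate the LC_COLLATE point: in Swedish, å, ä and ö are separate letters sorted after z, whereas an English-oriented order treats them as accented a and o. The Python sketch below contrasts the two with simplified, hypothetical sort keys. It is not glibc's iso14651_t1-based algorithm (which compares on several levels), and it ignores the V/W question.

```python
import unicodedata

# Simplified Swedish alphabet: å, ä, ö are distinct letters after z.
SV_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SV_RANK = {ch: i for i, ch in enumerate(SV_ALPHABET)}


def english_key(word: str) -> str:
    """Approximate an English-style order: drop diacritics, ignore case."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def swedish_key(word: str) -> list[tuple[int, int]]:
    """Approximate Swedish order: å, ä, ö rank after z.

    Characters outside the alphabet sort after it, by code point.
    """
    text = unicodedata.normalize("NFC", word).casefold()
    return [(0, SV_RANK[ch]) if ch in SV_RANK else (1, ord(ch)) for ch in text]


if __name__ == "__main__":
    places = ["Öland", "Zinkgruvan", "Åre", "Arvika"]
    print(sorted(places, key=english_key))  # ['Åre', 'Arvika', 'Öland', 'Zinkgruvan']
    print(sorted(places, key=swedish_key))  # ['Arvika', 'Zinkgruvan', 'Åre', 'Öland']
```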
So it turns out that only the following categories would use the proposed en_EU (or, in fact, en_DK, since the two are equal for these categories):
LC_TIME=en_EU.UTF-8
LC_NAME=en_EU.UTF-8
LC_PAPER=en_EU.UTF-8
LC_MESSAGES=en_EU.UTF-8
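Spelled out, this composition is a LANG value plus four overrides; LC_TIME, LC_NAME, LC_PAPER and LC_MESSAGES need no variable of their own because they fall back to LANG. A short Python sketch that builds such an environment and prints it as shell `export` lines is shown below. It assumes en_DK.UTF-8 as the base (en_EU is only proposed), the helper names are hypothetical, and LC_NUMERIC and LC_ADDRESS are simply left to the base here even though, as discussed next, neither fits.

```python
import shlex

# en_DK.UTF-8 stands in for the proposed en_EU, which is not in glibc.
BASE = "en_DK.UTF-8"

# Categories that should follow Swedish conventions even in English text.
SWEDISH_OVERRIDES = ("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_TELEPHONE")


def composed_environment(base: str = BASE, local: str = "sv_SE.UTF-8") -> dict[str, str]:
    """Build LANG plus per-category overrides (hypothetical helper)."""
    env = {"LANG": base}
    for category in SWEDISH_OVERRIDES:
        env[category] = local
    return env


def as_shell_exports(env: dict[str, str]) -> str:
    """Render the environment as POSIX shell export lines, safely quoted."""
    return "\n".join(f"export {name}={shlex.quote(value)}" for name, value in env.items())


if __name__ == "__main__":
    print(as_shell_exports(composed_environment()))
```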
But, finally, there are two categories that cause trouble:
LC_NUMERIC: When writing decimal numbers in English in Sweden, it is common to adopt the English decimal point (.) instead of the decimal comma (,). However, the en_US/en_GB use of a comma as the thousands separator is usually *not* adopted, probably in part because it could be confused with the Swedish decimal separator. Hence, a thin space, "<U202F>" (NARROW NO-BREAK SPACE, which sv_SE uses), is arguably the right choice.
None of the other locales discussed here (including en_DK and the proposed en_EU) matches this definition of LC_NUMERIC.
Could the proposed en_EU perhaps be modified to use the more English-centric decimal_point "." but with thousands_sep "<U202F>"?
Maybe there is a need to investigate more deeply how people write numbers alongside English text across different EU countries?
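The LC_NUMERIC combination in question (decimal_point "." with thousands_sep "<U202F>") can be illustrated with a small, hypothetical Python formatter. It assumes groups of three digits and does not read any installed locale; it only shows how the same number looks under the different separator choices.

```python
NARROW_NBSP = "\u202f"  # <U202F> NARROW NO-BREAK SPACE


def format_decimal(value: float, decimals: int = 2,
                   decimal_point: str = ".", thousands_sep: str = NARROW_NBSP) -> str:
    """Format a number with three-digit groups (hypothetical helper)."""
    # Python rounds and inserts ',' as a placeholder group separator.
    text = f"{value:,.{decimals}f}"
    # Swap via a temporary marker so '.' and ',' can be exchanged safely.
    return (text.replace(",", "\0")
                .replace(".", decimal_point)
                .replace("\0", thousands_sep))


if __name__ == "__main__":
    styles = [
        ("en_US-style", ".", ","),
        ("sv_SE-style", ",", NARROW_NBSP),
        ("'.' with <U202F>", ".", NARROW_NBSP),
    ]
    for label, dp, sep in styles:
        print(label, repr(format_decimal(1234567.891, decimal_point=dp, thousands_sep=sep)))
```

Using repr() makes the otherwise invisible U+202F show up as an escape in the output.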
LC_ADDRESS: This category unfortunately combines country-related fields (country_name, country_ab2, and country_ab3) and language-related fields (lang_name, lang_ab, lang_term, and lang_lib). I don't see how this category can be handled fully correctly across the different EU countries without separate en_* locales.
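To make the LC_ADDRESS problem concrete: an English-in-Sweden locale needs the country fields for Sweden and the language fields for English, so neither sv_SE nor an en_EU can supply the whole category. The Python sketch below merges the two groups and prints them in the style of a glibc locale source file. The values are the standard ISO 3166-1 (SE, SWE) and ISO 639 (en, eng) codes, with the country name in English; the helper is hypothetical, and a real LC_ADDRESS section contains more keywords (e.g. postal_fmt) than shown.

```python
# Country-related LC_ADDRESS fields for Sweden (name given in English).
SWEDEN_COUNTRY = {
    "country_name": "Sweden",
    "country_ab2": "SE",
    "country_ab3": "SWE",
}

# Language-related LC_ADDRESS fields for English.
ENGLISH_LANGUAGE = {
    "lang_name": "English",
    "lang_ab": "en",
    "lang_term": "eng",
    "lang_lib": "eng",
}


def lc_address_section(country: dict[str, str], language: dict[str, str]) -> str:
    """Render a partial LC_ADDRESS section (hypothetical helper)."""
    lines = ["LC_ADDRESS"]
    for keyword, value in {**country, **language}.items():
        lines.append(f'{keyword} "{value}"')
    lines.append("END LC_ADDRESS")
    return "\n".join(lines)


if __name__ == "__main__":
    print(lc_address_section(SWEDEN_COUNTRY, ENGLISH_LANGUAGE))
```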
My conclusion is that even if it may be possible (perhaps with some compromises) to deal with the remaining issues, the resulting mix of settings will be complex for end users. Hence, it may be more user-friendly to have separate en_DE, en_SE, and other en_* locales alongside the already existing en_DK. This would also help avoid issues caused by, e.g., the en_SE designation already being in some use (e.g., in CLDR).
I have made my en_SE available here:
https://github.com/httk/locale-en_SE/blob/main/en_SE
I would be happy to hear any thoughts or suggestions about the steps needed to contribute this to glibc.
Best regards,
Rickard