public inbox for libc-locales@sourceware.org
From: Rickard Armiento <misc-dev@armiento.net>
To: libc-locales@sourceware.org
Cc: fweimer@redhat.com, carlos@redhat.com, locales@tfan.eu,
	yingxiang.yao@kasumi321.de
Subject: Adding an en_SE locale (was: Adding an en_DE locale)
Date: Wed, 21 Aug 2024 14:01:04 +0200
Message-ID: <d2f3d7c4-91c2-430c-9adc-39a856e5ad58@armiento.net>
In-Reply-To: <87le2fvp7s.fsf@oldenburg.str.redhat.com>

On 7/3/24 12:20 PM, Thomas Näveke wrote:

 >>> I was wondering if I was welcome to submit a patch for an "en_DE"
 >>> locale, i.e. a locale that uses English language but German units,
 >>> dates, etc.

On 7/5/24 12:06 PM, Carlos O'Donell wrote:

 >> Yes, if Windows 10 and Unicode CLDR have such locales, then I think
 >> we should consider that they are useful and in use by users.

On 7/5/24 15:21, Florian Weimer wrote:

 > On the other hand, glibc supports on the fly composition of locales,
 > so you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
 > parts you want from there (or vice versa).

I started looking into how to contribute locales to glibc with a similar
motivation: to see whether an en_SE locale for English as used in Sweden
could be accepted. As with en_DE, en_SE is present in Unicode CLDR, and
a web search turns up many examples of confusion caused by software
using this locale designation even though glibc does not provide it.

I then found that Yingxiang Yao (included in Cc) sent a patch to
libc-alpha in January for an en_EU locale, which bears directly on the
question of how to handle these locale issues for users of English in
the EU region.

 
https://sourceware.org/pipermail/libc-alpha/2024-January/thread.html#153906

(It is also noteworthy that glibc already includes an en_DK locale for
English as used in Denmark, which is quite close to the proposed en_EU.)

At first glance, en_EU looks like a general solution that would avoid
each EU country having to define its own en_* locale. With en_EU
available, users can set LANG to it and override the relevant
country-specific conventions through the LC_* variables (as suggested by
Florian Weimer above).

However, on closer inspection this gets a bit messy. For English as used
in Sweden, all of the following categories should probably be taken from
sv_SE rather than en_EU (or en_US, en_GB):

LC_COLLATE=sv_SE.UTF-8
# sv_SE modifies iso14651_t1 for the Swedish umlaut characters and there
# may be special rules for V and W. Arguably, even when using English in
# Sweden, one would normally sort names, etc., according to this Swedish
# sort order, not the US or UK English one.

LC_CTYPE=sv_SE.UTF-8
# The Swedish umlaut characters need correct classification when
# dealing with, e.g., Swedish names (personal and geographical)
# alongside English text.

LC_MONETARY=sv_SE.UTF-8
# Sweden uses its own currency (SEK), and not the Euro.

LC_TELEPHONE=sv_SE.UTF-8
# The standard way of writing phone numbers differs across the EU.
# Most importantly, 'int_select' and 'int_prefix' need
# country-specific values.
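
To illustrate the LC_COLLATE point without depending on which locales
are installed, here is a minimal Python sketch that sorts names by the
Swedish alphabet, where å, ä and ö follow z. The names SWEDISH_ALPHABET
and swedish_key are my own, purely for illustration; the sketch ignores
the V/W rules and everything else that iso14651_t1 and sv_SE define.

```python
# Sketch only: approximates Swedish alphabetical order (å, ä, ö after z).
# It does not reproduce glibc's sv_SE LC_COLLATE rules.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(text):
    """Return a sort key ranking letters by position in the Swedish alphabet."""
    # Characters outside the alphabet sort after all letters, by code point.
    return [_RANK.get(ch, len(_RANK) + ord(ch)) for ch in text.lower()]


names = ["Öberg", "Zetterlund", "Ahlberg", "Åberg", "Äng"]
print(sorted(names, key=swedish_key))
# ['Ahlberg', 'Zetterlund', 'Åberg', 'Äng', 'Öberg']
```

An English collation would typically place Åberg and Äng next to Ahlberg
and Öberg among the O names instead.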

So, it turns out that only the following categories would adopt the
proposed en_EU (or, in fact, en_DK, since the two are identical for
these categories):

LC_TIME=en_EU.UTF-8
LC_NAME=en_EU.UTF-8
LC_PAPER=en_EU.UTF-8
LC_MESSAGES=en_EU.UTF-8
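
As a rough sketch of what this mix looks like in practice, the Python
helper below builds an environment with the per-category overrides
listed above. The helper name mixed_environment is hypothetical, and
en_DK is used as a stand-in for the proposed en_EU (an assumption on my
part, since en_EU is not in glibc). LC_NUMERIC and LC_ADDRESS are left
to follow LANG here.

```python
import os

# Categories taken from sv_SE, per the list above.
SV_CATEGORIES = ("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_TELEPHONE")
# Categories taken from the English locale. en_DK stands in for the
# proposed en_EU (assumption), since en_EU is not part of glibc.
EN_CATEGORIES = ("LC_TIME", "LC_NAME", "LC_PAPER", "LC_MESSAGES")


def mixed_environment(base=None, english="en_DK.UTF-8", swedish="sv_SE.UTF-8"):
    """Return a copy of the environment with per-category locale overrides."""
    env = dict(os.environ if base is None else base)
    # LC_ALL overrides every LC_* variable, so it must not be set.
    env.pop("LC_ALL", None)
    env["LANG"] = english
    for name in SV_CATEGORIES:
        env[name] = swedish
    for name in EN_CATEGORIES:
        env[name] = english
    return env


# Usage (assumes both locales are installed on the system):
#   import subprocess
#   subprocess.run(["locale"], env=mixed_environment())
```

Nine variables for one user-visible choice is the kind of complexity I
am worried about.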

But, finally, there are two categories that cause trouble:

LC_NUMERIC: When writing decimal numbers in English in Sweden, it is
common to adopt the English decimal point (.) instead of the decimal
comma (,). However, the en_US / en_GB use of a comma as a thousands
separator is usually *not* adopted, probably in part because it could
then be confused with the Swedish decimal separator. Hence, a narrow
no-break space "<U202F>" (which sv_SE uses) is arguably the right
choice. None of the other locales discussed here (including en_DK and
the proposed en_EU) matches this definition of LC_NUMERIC.

Could the proposed en_EU perhaps be modified to use the more
English-centric decimal_point = "." together with
thousands_sep = "<U202F>"? Maybe there is a need to investigate more
deeply how people write numbers alongside English text across different
EU countries?
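
To make the LC_NUMERIC convention concrete, here is a small Python
sketch that formats a number with "." as the decimal point and U+202F
between digit groups. It does not go through the locale machinery at
all; format_en_se is a hypothetical helper name, and grouping in threes
is my assumption.

```python
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def format_en_se(value, decimals=2):
    """Format a number with '.' as decimal point and U+202F between groups."""
    # Python's ',' format option groups digits in threes with commas;
    # swap the commas for the narrow no-break space afterwards.
    return f"{value:,.{decimals}f}".replace(",", NNBSP)


print(format_en_se(1234567.891))
```

This prints 1 234 567.89, where the gaps are U+202F rather than ordinary
spaces.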

LC_ADDRESS: This category unfortunately combines country-related fields
(country_name, country_ab2, and country_ab3) with language-related
fields (lang_name, lang_ab, lang_term, lang_lib). I don't see how one
can handle this category fully correctly across the different EU
countries without separate en_* locales.
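
The LC_ADDRESS problem can be sketched in Python as follows. The field
names are the ones listed above (written out as country_ab2 and
country_ab3, which I believe are the full keyword names); the values are
illustrative only and not copied from the glibc locale sources, and
combine_address is a hypothetical helper.

```python
# Field names as listed above; values are illustrative, not taken from
# the glibc locale sources.
COUNTRY_FIELDS = ("country_name", "country_ab2", "country_ab3")
LANGUAGE_FIELDS = ("lang_name", "lang_ab", "lang_term", "lang_lib")

sv_SE = {"country_name": "Sverige", "country_ab2": "SE", "country_ab3": "SWE",
         "lang_name": "svenska", "lang_ab": "sv", "lang_term": "swe",
         "lang_lib": "swe"}
en_GB = {"country_name": "United Kingdom", "country_ab2": "GB",
         "country_ab3": "GBR", "lang_name": "English", "lang_ab": "en",
         "lang_term": "eng", "lang_lib": "eng"}


def combine_address(country_source, language_source):
    """Take country fields from one locale and language fields from another."""
    combined = {key: country_source[key] for key in COUNTRY_FIELDS}
    combined.update({key: language_source[key] for key in LANGUAGE_FIELDS})
    return combined


en_SE = combine_address(sv_SE, en_GB)
print(en_SE["country_ab2"], en_SE["lang_ab"])  # SE en
```

Setting LC_ADDRESS to either existing locale gets one half wrong, and
even this field-wise merge would leave country_name in Swedish, so the
combination has to be written out as a locale of its own.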

My conclusion is that even if it may be possible (perhaps with some
compromise) to deal with the remaining issues, the resulting mix of
settings will be complex for end users. Hence, it may be more
user-friendly to have separate en_DE, en_SE, etc. locales alongside the
already existing en_DK. It would also help avoid issues caused by, e.g.,
the en_SE designation already being in some use (e.g., in CLDR).

I have made my en_SE available here:

   https://github.com/httk/locale-en_SE/blob/main/en_SE

I would be happy to hear any thoughts or suggestions related to taking 
the steps to contribute this into glibc.

Best regards,
Rickard


Thread overview: 7+ messages
2024-07-03 16:20 Adding an en_DE locale Thomas Näveke
2024-07-05 12:06 ` Carlos O'Donell
2024-07-05 15:21   ` Florian Weimer
2024-08-21 12:01     ` Rickard Armiento [this message]
2024-08-21 13:25       ` Adding an en_SE locale Carlos O'Donell
2024-08-21 21:11       ` AW: Adding an en_SE locale (was: Adding an en_DE locale) yingxiang.yao
2024-08-22 11:01         ` Adding an en_SE locale Rickard Armiento
