* Adding an en_DE locale
From: Thomas Näveke @ 2024-07-03 16:20 UTC (permalink / raw)
To: libc-locales
Hello,
I was wondering if I was welcome to submit a patch for an "en_DE"
locale, i.e. a locale that uses English language but German units,
dates, etc. It would be useful for English speakers working in Germany
or with Germans, as well as for Germans who prefer the display language
to be English while retaining their date format and units. There is
precedent for such a locale, as it exists on Windows (checked on
Win10). It is also defined in the Unicode CLDR
(https://github.com/unicode-org/cldr/blob/main/common/main/en_DE.xml). A
similar locale exists for Danish with en_DK.
My idea is to copy the de_DE locale and adjust the following fields with
the entries from en_US:
abday "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
day "Sunday";/
"Monday";/
"Tuesday";/
"Wednesday";/
"Thursday";/
"Friday";/
"Saturday"
abmon "Jan";"Feb";/
"Mar";"Apr";/
"May";"Jun";/
"Jul";"Aug";/
"Sep";"Oct";/
"Nov";"Dec"
mon "January";/
"February";/
"March";/
"April";/
"May";/
"June";/
"July";/
"August";/
"September";/
"October";/
"November";/
"December"
yesexpr "^[+1yY]"
noexpr "^[-0nN]"
yesstr "yes"
nostr "no"
name_fmt "%d%t%g%t%m%t%f"
name_miss "Miss."
name_mr "Mr."
name_mrs "Mrs."
name_ms "Ms."
lang_name "English"
lang_ab "en"
lang_term "eng"
lang_lib "eng"
Thank you for your time,
Thomas Näveke
* Re: Adding an en_DE locale
From: Carlos O'Donell @ 2024-07-05 12:06 UTC (permalink / raw)
To: Thomas Näveke, libc-locales
On 7/3/24 12:20 PM, Thomas Näveke wrote:
> I was wondering if I was welcome to submit a patch for an "en_DE"
> locale, i.e. a locale that uses English language but German units,
> dates, etc. It would be useful for English speakers working in
> Germany or with Germans, as well as for Germans who prefer the
> display language to be English while retaining their date format and
> units. There is precedent for such a locale, as it exists on Windows
> (checked on Win10). It is also defined in the Unicode CLDR
> (https://github.com/unicode-org/cldr/blob/main/common/main/en_DE.xml).
> A similar locale exists for Danish with en_DK. My idea is to copy the
> de_DE locale and adjust the following fields with the entries from
> en_US:
Thomas,
Yes, if Windows 10 and Unicode CLDR have such locales, then I think we should consider
that they are useful and in use by users.
Rather than literal copies of en_US, you can try to use 'copy "en_US"' within the
relevant LC_* blocks to reference the data from en_US.
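As a rough sketch of the 'copy' approach (not a reviewed locale): the fields
listed in the original mail live in four categories. The yes/no entries belong
to LC_MESSAGES and the name_* entries to LC_NAME, so those two categories could
plausibly be copied wholesale from en_US, whereas LC_TIME and LC_ADDRESS mix
language-dependent and country-dependent fields and would still need
hand-written entries. The Python helper below only generates the text of such
copy blocks; the function names and the choice of categories are illustrative
assumptions.

```python
def copy_block(category, source):
    """Return locale-source text for a category that copies everything from `source`."""
    return f'{category}\ncopy "{source}"\nEND {category}\n'


def copy_blocks(plan):
    """Join one copy block per (category, source locale) pair, in the given order."""
    return "\n".join(copy_block(category, source) for category, source in plan.items())


# Hypothetical plan for en_DE: language-only categories from en_US,
# a country-only category from de_DE. LC_TIME and LC_ADDRESS are left out
# because they would need hand-written entries.
EN_DE_PLAN = {
    "LC_MESSAGES": "en_US",
    "LC_NAME": "en_US",
    "LC_MONETARY": "de_DE",
}

if __name__ == "__main__":
    print(copy_blocks(EN_DE_PLAN))
```

The output is only a fragment: a complete locale source also defines
LC_IDENTIFICATION and the remaining categories.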
Please have a look at the contribution checklist here:
https://sourceware.org/glibc/wiki/Contribution%20checklist
Thank you!
--
Cheers,
Carlos.
* Re: Adding an en_DE locale
From: Florian Weimer @ 2024-07-05 15:21 UTC (permalink / raw)
To: Carlos O'Donell via Libc-locales
Cc: Thomas Näveke, Carlos O'Donell
* Carlos O'Donell via Libc-locales:
> On 7/3/24 12:20 PM, Thomas Näveke wrote:
>> I was wondering if I was welcome to submit a patch for an "en_DE"
>> locale, i.e. a locale that uses English language but German units,
>> dates, etc. It would be useful for English speakers working in
>> Germany or with Germans, as well as for Germans who prefer the
>> display language to be English while retaining their date format and
>> units. There is precedent for such a locale, as it exists on Windows
>> (checked on Win10). It is also defined in the Unicode CLDR
>> (https://github.com/unicode-org/cldr/blob/main/common/main/en_DE.xml).
>> A similar locale exists for Danish with en_DK. My idea is to copy the
>> de_DE locale and adjust the following fields with the entries from
>> en_US:
>
> Thomas,
>
> Yes, if Windows 10 and Unicode CLDR have such locales, then I think we
> should consider that they are useful and in use by users.
On the other hand, glibc supports on the fly composition of locales, so
you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
parts you want from there (or vice versa). Or you can compile your own
locale using localedef. Most distributions ship the locale sources, so
that you can compose something quickly using those copy directives.
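To illustrate the composition described above, here is a small Python sketch
of how the effective locale per category follows from the environment,
assuming the usual POSIX precedence (LC_ALL first, then the individual LC_*
variable, then LANG, with "C" as the fallback). It does not call into glibc;
effective_locales is a made-up helper, and the LANGUAGE variable that gettext
consults for message catalogs is not modeled.

```python
# POSIX categories plus the glibc extensions.
CATEGORIES = (
    "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE", "LC_MONETARY",
    "LC_MESSAGES", "LC_PAPER", "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE",
    "LC_MEASUREMENT", "LC_IDENTIFICATION",
)


def effective_locales(env):
    """Map each category to the locale name selected by the variables in `env`."""
    result = {}
    for category in CATEGORIES:
        # Empty strings count as unset, hence `or` rather than a key lookup.
        result[category] = (
            env.get("LC_ALL") or env.get(category) or env.get("LANG") or "C"
        )
    return result


if __name__ == "__main__":
    # English messages with a few German overrides (an illustrative choice).
    env = {
        "LANG": "en_US.utf8",
        "LC_TIME": "de_DE.utf8",
        "LC_MONETARY": "de_DE.utf8",
        "LC_MEASUREMENT": "de_DE.utf8",
    }
    for category, name in effective_locales(env).items():
        print(f"{category}={name}")
```

One limitation is visible here: each category is taken as a whole, so
LC_TIME=de_DE.utf8 brings the German day and month names along with the
German date format.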
Thanks,
Florian
* Adding an en_SE locale (was: Adding an en_DE locale)
From: Rickard Armiento @ 2024-08-21 12:01 UTC (permalink / raw)
To: libc-locales; +Cc: fweimer, carlos, locales, yingxiang.yao
On 7/3/24 12:20 PM, Thomas Näveke wrote:
>>> I was wondering if I was welcome to submit a patch for an "en_DE"
>>> locale, i.e. a locale that uses English language but German units,
>>> dates, etc.
On 7/5/24 12:06 PM, Carlos O'Donell wrote:
>> Yes, if Windows 10 and Unicode CLDR have such locales, then I think
>> we should consider that they are useful and in use by users.
On 7/5/24 15:21 PM, Florian Weimer wrote:
> On the other hand, glibc supports on the fly composition of locales,
> so you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
> parts you want from there (or vice versa).
I started looking into how to contribute locales to glibc with a similar
motivation: to see if an en_SE locale for English as used in Sweden
could be accepted. As with en_DE, en_SE is present in Unicode CLDR, and
a web search shows many examples of confusion caused by uses of this
locale designation despite it not being present in glibc.
I then found that Yingxiang Yao (included in cc) sent a patch in January
to libc-alpha for an en_EU locale, which is closely related to the
question of how to handle these locale issues for users of English in
the EU region.
https://sourceware.org/pipermail/libc-alpha/2024-January/thread.html#153906
(It is also noteworthy that glibc already includes an en_DK locale for
English as used in Denmark, which is quite close to the proposed en_EU.)
At first look, it seems that en_EU could be the general solution to the
issue of otherwise having each EU country define its own en_*. With an
en_EU locale, users can set LANG to it and override the relevant
country-specific conventions using the LC_* settings (as suggested by
Florian Weimer above).
However, looking more closely at this, it gets a bit messy. For English
as used in Sweden, all of the following categories probably should be
overridden as sv_SE rather than en_EU (or en_US, en_GB):
LC_COLLATE=sv_SE.UTF-8
# sv_SE modifies iso14651_t1 for the Swedish umlaut characters and there
# may be special rules for V and W. Arguably, even when using English in
# Sweden, one would normally sort names, etc., according to this Swedish
# sort order, not US or UK English.
LC_CTYPE=sv_SE.UTF-8
# The Swedish umlaut characters need correct classification when
# dealing with, e.g., Swedish names (e.g., personal and geographical)
# alongside English text.
LC_MONETARY=sv_SE.UTF-8
# Sweden uses its own currency (SEK), and not the Euro.
LC_TELEPHONE=sv_SE.UTF-8
# The standard way of writing phone numbers differs across the EU.
# Most importantly, 'int_select' and 'int_prefix' need
# country-specific values.
So, it turns out only the following categories would adopt the proposed
en_EU (or, in fact, one can use en_DK, since for these they are equal):
LC_TIME=en_EU.UTF-8
LC_NAME=en_EU.UTF-8
LC_PAPER=en_EU.UTF-8
LC_MESSAGES=en_EU.UTF-8
But, finally, there are two categories that cause trouble:
LC_NUMERIC: when writing decimal numbers in English in Sweden, it is
common to adopt the English use of decimal point (.) instead of the
decimal comma (,). However, the en_US / en_GB use of a comma as a
thousands separator is usually *not* adopted, probably in part because
it could then be confused with the Swedish decimal separator. Hence, a
thin (narrow no-break) space "<U202F>" (which sv_SE uses) is arguably
the right choice.
None of the other locales discussed here (including en_DK and the
proposed en_EU) matches this definition of LC_NUMERIC.
Could perhaps the proposed en_EU be modified to instead use the more
English-centric decimal_point="." but with thousands_sep="<U202F>"?
Maybe there is a need to investigate more deeply how people write
numbers alongside English text across different EU countries?
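A minimal Python sketch of the number format proposed above, i.e. a decimal
point combined with <U202F> (U+202F NARROW NO-BREAK SPACE) as the thousands
separator. It does not use the system locale database; format_number and its
parameters are illustrative names only, and grouping in threes is assumed.

```python
NNBSP = "\u202f"  # U+202F, written <U202F> in glibc locale sources


def format_number(value, decimals=2, decimal_point=".", thousands_sep=NNBSP):
    """Format `value` with the given separators, grouping digits in threes."""
    # Let Python round and group with "," first, then swap in the separators.
    text = f"{value:,.{decimals}f}"
    integer, _, fraction = text.partition(".")
    integer = integer.replace(",", thousands_sep)
    if fraction:
        return integer + decimal_point + fraction
    return integer


if __name__ == "__main__":
    amount = 1234567.891
    print(format_number(amount))                     # proposed mix: point + U+202F
    print(format_number(amount, decimal_point=","))  # decimal comma + U+202F
    print(format_number(amount, thousands_sep=","))  # en_US-style separators
```

Keeping both separators as parameters makes it easy to compare the candidate
conventions side by side on the same value.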
LC_ADDRESS: This setting unfortunately combines country-related fields
(country_name, country_ab2, and country_ab3) and language-related
fields (lang_name, lang_ab, lang_term, lang_lib). I don't see how one
can handle this setting fully correctly across the different EU
countries without separate en_* locales.
My conclusion is that even if it may be possible (perhaps with some
compromise) to deal with the remaining issues, the resulting mix of
settings will be complex for end users. Hence, it may be more
user-friendly to have separate en_DE, en_SE, en_*, etc. alongside the
already existing en_DK. It would also help avoid issues caused by, e.g.,
the en_SE designation already being in some use (e.g., in CLDR).
I have made my en_SE available here:
https://github.com/httk/locale-en_SE/blob/main/en_SE
I would be happy to hear any thoughts or suggestions related to taking
the steps to contribute this into glibc.
Best regards,
Rickard
* Re: Adding an en_SE locale
From: Carlos O'Donell @ 2024-08-21 13:25 UTC (permalink / raw)
To: Rickard Armiento, libc-locales; +Cc: fweimer, locales, yingxiang.yao
On 8/21/24 8:01 AM, Rickard Armiento wrote:
> My conclusion is that even if it may be possible (perhaps with some
> compromise) to deal with the remaining issues, the resulting mix of
> settings will be complex for end users. Hence, it may be more
> user-friendly to have separate en_DE, en_SE, en_*, etc. alongside the
> already existing en_DK. It would also help avoid issues caused by,
> e.g., the en_SE designation already being in some use (e.g., in
> CLDR).
>
> I have made my en_SE available here:
>
> https://github.com/httk/locale-en_SE/blob/main/en_SE
>
> I would be happy to hear any thoughts or suggestions related to
> taking the steps to contribute this into glibc.
At a high level, harmonization between what CLDR offers and what glibc offers is
important, since it sets users up for a smoother experience when using applications
that use system localization vs. libicu-based localization.
The fact that cldr/common/main/en_SE.xml exists is sufficient for us to review
adding en_SE to glibc.
There are 1067 XML files in cldr/common/main/, while only 347 xx_YY locales
in glibc/localedata/locales/, so we have a long way to go for harmonization
and the topics are complicated to audit.
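For a rough idea of which CLDR locales lack a glibc counterpart, the two
directory listings can be compared by name. The sketch below assumes CLDR file
names of the form xx_YY.xml and glibc locale names of the form xx_YY
(optionally with an @modifier); script variants such as sr_Cyrl_RS and region
codes such as en_150 are simply ignored, so it will not reproduce the counts
above exactly. The function names and the sample listings are made up for
illustration.

```python
import re

# Language code, underscore, two-letter territory: "en_SE", "de_DE", ...
XX_YY = re.compile(r"[a-z]{2,3}_[A-Z]{2}")


def cldr_locales(filenames):
    """Locale names from CLDR file names such as 'en_SE.xml'."""
    stems = (name[:-4] for name in filenames if name.endswith(".xml"))
    return {stem for stem in stems if XX_YY.fullmatch(stem)}


def glibc_locales(filenames):
    """Locale names from glibc source names such as 'en_DK' or 'sr_RS@latin'."""
    bases = (name.split("@", 1)[0] for name in filenames)
    return {base for base in bases if XX_YY.fullmatch(base)}


def missing_in_glibc(cldr_files, glibc_files):
    """CLDR xx_YY locales with no glibc locale of the same name, sorted."""
    return sorted(cldr_locales(cldr_files) - glibc_locales(glibc_files))


if __name__ == "__main__":
    # Made-up sample listings; in practice these would come from os.listdir()
    # on checkouts of cldr/common/main/ and glibc/localedata/locales/.
    cldr = ["en.xml", "en_150.xml", "en_DE.xml", "en_DK.xml", "en_SE.xml",
            "sr_Cyrl_RS.xml"]
    glibc = ["de_DE", "en_DK", "en_US", "i18n", "sr_RS@latin", "sv_SE"]
    print(missing_in_glibc(cldr, glibc))
```

A name-level comparison like this only finds missing locales; it says nothing
about whether the contents of locales present on both sides agree.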
--
Cheers,
Carlos.
* AW: Adding an en_SE locale (was: Adding an en_DE locale)
From: yingxiang.yao @ 2024-08-21 21:11 UTC (permalink / raw)
To: 'Rickard Armiento'; +Cc: fweimer, carlos, locales, libc-locales
Hello Rickard,
I understand your confusion; I was also not sure how to do it properly.
I removed things like currency etc. just to avoid the problem that
Sweden and some other EU countries do not use the Euro. For the
thousands separator, I think a thin space would be appropriate, since
this is also what the English style guide of the EU Commission states.
The guide also states that a comma is used as the decimal separator for
technical reasons.
Regards,
Yingxiang Yao
* Adding an en_SE locale
From: Rickard Armiento @ 2024-08-22 11:01 UTC (permalink / raw)
To: libc-locales; +Cc: carlos, yingxiang.yao, locales
On 8/21/24 21:11 PM, Yingxiang Yao wrote:
> I removed things like currency etc. just to avoid the problem that
> Sweden and some other EU countries do not use the Euro.
To me, most of the choices in your proposed en_EU look very reasonable;
I was just surprised how much I still had to override for a reasonable
en_SE. And I fully agree that setting the currency to the Euro makes the
most sense for en_EU.
However, on this topic: perhaps the en_EU LC_CTYPE needs more
consideration? Instead of just copying en_GB, I would argue it would be
better to mimic how LC_CTYPE is set up in sv_SE: copy "i18n" and then
merge definitions of special characters in official use throughout the
languages in the EU region, so that things like 'convert string to
uppercase' generally work correctly throughout the EU. A quick search
came up with this post on the various special characters (with a nice map):
https://jakubmarian.com/special-characters-diacritics-used-in-european-languages/
Then LC_COLLATE should probably implement precisely EOR / EN 13710
https://en.wikipedia.org/wiki/European_ordering_rules
There is actually one existing glibc locale that references EN 13710,
and that is fi_FI. That locale seems to use EN 13710 + an adaptation to
Finnish. Maybe one can reverse that definition into a pure
implementation of EN 13710?
> For the thousands separator, I think a thin space would be
> appropriate, since this is also what the English style guide of the
> EU Commission states. The guide also states that a comma is used as
> the decimal separator for technical reasons.
In this link:
https://commission.europa.eu/system/files/2023-11/styleguide_english_dgt_en.pdf
I found the following quote:
===========
> Decimal separator. In English, the integral part of a number is
> separated from its fractional part by a point, not a comma as in
> other European languages. For technical reasons, however, the EU
> Publications Office will replace points with commas in English
> documents that are to appear in the Official Journal of the European
> Union
===========
Is this the statement you refer to? My interpretation of that statement
is a bit different. To me it appears to say:
1. The right decimal separator in English is a point.
2. However, for technical reasons, specifically the Official Journal of
the European Union will not adhere to (1) for English documents.
So, I don't read this as a mandate to generally use decimal commas
instead of periods when not subject to such limitations.
From a more practical perspective: If a user specifies "LANG=en_EU" and
opens a spreadsheet application, what number format would they expect to
use? Decimal points, or commas?
Best regards,
Rickard
Thread overview: 7+ messages
2024-07-03 16:20 Adding an en_DE locale Thomas Näveke
2024-07-05 12:06 ` Carlos O'Donell
2024-07-05 15:21 ` Florian Weimer
2024-08-21 12:01 ` Adding an en_SE locale (was: Adding an en_DE locale) Rickard Armiento
2024-08-21 13:25 ` Adding an en_SE locale Carlos O'Donell
2024-08-21 21:11 ` AW: Adding an en_SE locale (was: Adding an en_DE locale) yingxiang.yao
2024-08-22 11:01 ` Adding an en_SE locale Rickard Armiento