* Adding an en_DE locale
From: Thomas Näveke @ 2024-07-03 16:20 UTC
To: libc-locales

Hello,

I was wondering whether I would be welcome to submit a patch for an
"en_DE" locale, i.e. a locale that uses the English language but German
units, dates, etc. It would be useful for English speakers working in
Germany or with Germans, as well as for Germans who prefer the display
language to be English while retaining their date format and units.

There is precedent for such a locale: it exists on Windows (checked on
Win10) and is also defined in the Unicode CLDR
(https://github.com/unicode-org/cldr/blob/main/common/main/en_DE.xml).
A similar locale exists for Denmark with en_DK.

My idea is to copy the de_DE locale and replace the following fields
with the entries from en_US:

abday     "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
day       "Sunday";/
          "Monday";/
          "Tuesday";/
          "Wednesday";/
          "Thursday";/
          "Friday";/
          "Saturday"
abmon     "Jan";"Feb";/
          "Mar";"Apr";/
          "May";"Jun";/
          "Jul";"Aug";/
          "Sep";"Oct";/
          "Nov";"Dec"
mon       "January";/
          "February";/
          "March";/
          "April";/
          "May";/
          "June";/
          "July";/
          "August";/
          "September";/
          "October";/
          "November";/
          "December"
yesexpr   "^[+1yY]"
noexpr    "^[-0nN]"
yesstr    "yes"
nostr     "no"
name_fmt  "%d%t%g%t%m%t%f"
name_miss "Miss."
name_mr   "Mr."
name_mrs  "Mrs."
name_ms   "Ms."
lang_name "English"
lang_ab   "en"
lang_term "eng"
lang_lib  "eng"

Thank you for your time,
Thomas Näveke
* Re: Adding an en_DE locale
From: Carlos O'Donell @ 2024-07-05 12:06 UTC
To: Thomas Näveke, libc-locales

On 7/3/24 12:20 PM, Thomas Näveke wrote:
> I was wondering if I was welcome to submit a patch for an "en_DE"
> locale, i.e. a locale that uses English language but German units,
> dates, etc. It would be useful for English speakers working in
> Germany or with Germans, as well as for Germans who prefer the
> display language to be English while retaining their date format and
> units. There is precedence for such a locale, as it exists on Windows
> (Checked on Win10). It is also defined in the Unicode CLDR
> (https://github.com/unicode-org/cldr/blob/main/common/main/en_DE.xml).
> A similar locale exists for Danish with en_DK. My idea is to copy the
> de_DE locale and adjust the following fields with the entries from
> en_US:

Thomas,

Yes, if Windows 10 and Unicode CLDR have such locales, then I think we
should consider that they are useful and in use by users.

Rather than literal copies of en_US, you can try to use 'copy "en_US"'
within the relevant LC_* blocks to reference the data from en_US.

Please have a look at the contribution checklist here:
https://sourceware.org/glibc/wiki/Contribution%20checklist

Thank you!

--
Cheers,
Carlos.
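Carlos's suggestion to reference en_US data through `copy` can be sketched as a small generator for a locale-source skeleton. This is only a minimal sketch: the helper name `render_copy_blocks` and the split of categories between en_US and de_DE are illustrative assumptions, not the layout of any submitted patch, and a real locale file needs more than these blocks (for example LC_IDENTIFICATION).

```python
# Hypothetical helper (not part of glibc): emit locale-source category
# blocks that reference another locale with the `copy` directive instead
# of duplicating its data.


def render_copy_blocks(copies: dict[str, str]) -> str:
    """Return locale-source text with one `copy` block per category."""
    blocks = []
    for category, source in copies.items():
        blocks.append(f'{category}\ncopy "{source}"\nEND {category}')
    return "\n\n".join(blocks) + "\n"


# Illustrative split only; which categories an en_DE locale should take
# from which source is an assumption made for this example.
EN_DE_COPIES = {
    "LC_MESSAGES": "en_US",     # yesexpr, noexpr, yesstr, nostr
    "LC_NAME": "en_US",         # name_fmt, name_mr, name_mrs, ...
    "LC_NUMERIC": "de_DE",
    "LC_MONETARY": "de_DE",
    "LC_PAPER": "de_DE",
    "LC_TELEPHONE": "de_DE",
    "LC_MEASUREMENT": "de_DE",
}

if __name__ == "__main__":
    print(render_copy_blocks(EN_DE_COPIES))
```

LC_TIME and LC_ADDRESS are left out on purpose: in the proposal they combine English names with German conventions, so a plain copy from either locale would not match what Thomas describes.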
* Re: Adding an en_DE locale
From: Florian Weimer @ 2024-07-05 15:21 UTC
To: Carlos O'Donell via Libc-locales
Cc: Thomas Näveke, Carlos O'Donell

* Carlos O'Donell via Libc-locales:

> On 7/3/24 12:20 PM, Thomas Näveke wrote:
>> I was wondering if I was welcome to submit a patch for an "en_DE"
>> locale, i.e. a locale that uses English language but German units,
>> dates, etc. It would be useful for English speakers working in
>> Germany or with Germans, as well as for Germans who prefer the
>> display language to be English while retaining their date format and
>> units. There is precedence for such a locale, as it exists on Windows
>> (Checked on Win10). It is also defined in the Unicode CLDR
>> (https://github.com/unicode-org/cldr/blob/main/common/main/en_DE.xml).
>> A similar locale exists for Danish with en_DK. My idea is to copy the
>> de_DE locale and adjust the following fields with the entries from
>> en_US:
>
> Thomas,
>
> Yes, if Windows 10 and Unicode CLDR have such locales, then I think we
> should consider that they are useful and in use by users.

On the other hand, glibc supports on the fly composition of locales,
so you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
parts you want from there (or vice versa).

Or you can compile your own locale using localedef. Most
distributions ship the locale sources, so that you can compose
something quickly using those copy directives.

Thanks,
Florian
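The on-the-fly composition Florian mentions relies on the usual precedence of the locale environment variables: LC_ALL overrides everything, then the category's own LC_* variable applies, then LANG. The sketch below models only that lookup in plain Python; the function names and the particular choice of categories taken from de_DE are assumptions made for illustration.

```python
# Sketch of per-category locale resolution from environment variables.
# It models the lookup order only; it does not call setlocale().

CATEGORIES = (
    "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE", "LC_MONETARY",
    "LC_MESSAGES", "LC_PAPER", "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE",
    "LC_MEASUREMENT", "LC_IDENTIFICATION",
)


def effective_locale(env: dict[str, str], category: str) -> str:
    """Return the locale name that applies to one category."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:              # unset or empty variables are skipped
            return value
    return "C"                 # fallback when nothing is set


def resolve_all(env: dict[str, str]) -> dict[str, str]:
    """Resolve every category in CATEGORIES for the given environment."""
    return {category: effective_locale(env, category) for category in CATEGORIES}


# Illustrative mix (an assumption): English base, German conventions for
# a few categories.
MIXED_ENV = {
    "LANG": "en_US.utf8",
    "LC_NUMERIC": "de_DE.utf8",
    "LC_MONETARY": "de_DE.utf8",
    "LC_PAPER": "de_DE.utf8",
    "LC_MEASUREMENT": "de_DE.utf8",
    "LC_TELEPHONE": "de_DE.utf8",
}

if __name__ == "__main__":
    for category, name in resolve_all(MIXED_ENV).items():
        print(f"{category}={name}")
```

Note that this approach cannot split a single category: LC_TIME either comes entirely from en_US (English names, US date format) or entirely from de_DE, which is the gap a dedicated en_DE locale would fill.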
* Adding an en_SE locale (was: Adding an en_DE locale)
From: Rickard Armiento @ 2024-08-21 12:01 UTC
To: libc-locales
Cc: fweimer, carlos, locales, yingxiang.yao

On 7/3/24 12:20 PM, Thomas Näveke wrote:
>>> I was wondering if I was welcome to submit a patch for an "en_DE"
>>> locale, i.e. a locale that uses English language but German units,
>>> dates, etc.

On 7/5/24 12:06 PM, Carlos O'Donell wrote:
>> Yes, if Windows 10 and Unicode CLDR have such locales, then I think
>> we should consider that they are useful and in use by users.

On 7/5/24 15:21 PM, Florian Weimer wrote:
> On the other hand, glibc supports on the fly composition of locales,
> so you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
> parts you want from there (or vice versa).

I started looking into how to contribute locales to glibc with a
similar motivation: to see if an en_SE locale for English as used in
Sweden could be accepted. As with en_DE, en_SE is present in Unicode
CLDR, and a web search shows many examples of confusion caused by uses
of this locale designation even though it is not present in glibc.

I then found that Yingxiang Yao (included in cc) sent a patch in
January to libc-alpha for an en_EU locale, which is highly relevant to
the question of how to handle these locale issues for users of English
in the EU region.

https://sourceware.org/pipermail/libc-alpha/2024-January/thread.html#153906

(It is also noteworthy that glibc already includes an en_DK locale for
English as used in Denmark, which is quite close to the proposed
en_EU.)

At first look, it seems en_EU could be the general solution that
avoids having each EU country define its own en_*. With an en_EU
locale, users can set LANG to it and override the relevant
country-specific conventions using the LC_* settings (as suggested by
Florian Weimer above).

However, looking more closely at this, it gets a bit messy. For
English as used in Sweden, all of the following categories probably
should be overridden with sv_SE rather than en_EU (or en_US, en_GB):

LC_COLLATE=sv_SE.UTF-8
# sv_SE modifies iso14651_t1 for the Swedish umlaut characters and there
# may be special rules for V and W. Arguably, even when using English in
# Sweden, one would normally sort names, etc., according to this Swedish
# sort order, not US or UK English.

LC_CTYPE=sv_SE.UTF-8
# The Swedish umlaut characters need correct classifications when
# dealing with, e.g., Swedish names (e.g., personal and geographical)
# alongside English text.

LC_MONETARY=sv_SE.UTF-8
# Sweden uses its own currency (SEK), and not the Euro.

LC_TELEPHONE=sv_SE.UTF-8
# The standard way of writing phone numbers differs across the EU.
# Most importantly, 'int_select' and 'int_prefix' need
# country-specific values.

So, it turns out only the following categories would adopt the
proposed en_EU (or, in fact, one can use en_DK, since for these they
are equal):

LC_TIME=en_EU.UTF-8
LC_NAME=en_EU.UTF-8
LC_PAPER=en_EU.UTF-8
LC_MESSAGES=en_EU.UTF-8

But, finally, there are two categories that cause trouble:

LC_NUMERIC: when writing decimal numbers in English in Sweden, it is
common to adopt the English use of the decimal point (.) instead of the
decimal comma (,). However, the en_US / en_GB use of a comma as a
thousands separator is usually *not* adopted, probably in part because
it could then be confused with the Swedish decimal separator. Hence, a
thin space "<U202F>" (which sv_SE uses) is arguably the right choice.
None of the other locales discussed here (including en_DK and the
proposed en_EU) matches this definition of LC_NUMERIC. Could perhaps
the proposed en_EU be modified to instead use the more English-centric
decimal_point = "." but with thousands_sep = "<U202F>"? Maybe there is
a need to investigate more deeply how people write numbers alongside
English text across different EU countries?

LC_ADDRESS: This category unfortunately combines country-related fields
(country_name, country_ab2, and country_ab3) and language-related
fields (lang_name, lang_ab, lang_term, lang_lib). I don't see how one
can handle this category fully correctly across the different EU
countries without separate en_* locales.

My conclusion is that even if it may be possible (perhaps with some
compromise) to deal with the remaining issues, the resulting mix of
settings will be complex for end users. Hence, maybe it is more
user-friendly to have separate en_DE, en_SE, en_*, etc. alongside the
already existing en_DK. It would also help avoid issues caused by,
e.g., the en_SE designation already being in some use (e.g., in CLDR).

I have made my en_SE available here:

https://github.com/httk/locale-en_SE/blob/main/en_SE

I would be happy to hear any thoughts or suggestions related to taking
the steps to contribute this into glibc.

Best regards,
Rickard
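The LC_NUMERIC convention Rickard describes (decimal point, with U+202F between groups of three digits) can be illustrated without touching the system locale. The following is a minimal sketch; the function name `format_en_se` is hypothetical, and the code only imitates the proposed output rather than reading any glibc locale data. In Unicode, U+202F is named NARROW NO-BREAK SPACE.

```python
# Sketch: format a number with "." as decimal point and U+202F as the
# thousands separator, as proposed for English text in Sweden.

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def format_en_se(value: float, decimals: int = 2) -> str:
    """Group digits in threes with U+202F and keep a decimal point."""
    # Python's "," format option groups with commas: 1,234,567.89
    grouped = f"{value:,.{decimals}f}"
    # Swap the comma group separator for the narrow no-break space.
    return grouped.replace(",", NNBSP)


if __name__ == "__main__":
    print(format_en_se(1234567.891))
```

Because the group separator is a space rather than a comma, the output cannot be misread as containing a Swedish decimal comma, which is the ambiguity the message points out.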
* Re: Adding an en_SE locale
From: Carlos O'Donell @ 2024-08-21 13:25 UTC
To: Rickard Armiento, libc-locales
Cc: fweimer, locales, yingxiang.yao

On 8/21/24 8:01 AM, Rickard Armiento wrote:
> My conclusion is that even if it may be possible (perhaps with some
> compromise) to deal with the remaining issues, the end mix of
> settings will be complex for end users. Hence, maybe it is more
> user-friendly with separate en_DE, en_SE, en_*, etc. alongside the
> already existing en_DK. It would also help avoiding issues caused by,
> e.g., the en_SE designation already being in some use (e.g., in
> CLDR).
>
> I have made my en_SE available here:
>
> https://github.com/httk/locale-en_SE/blob/main/en_SE
>
> I would be happy to hear any thoughts or suggestions related to
> taking the steps to contribute this into glibc.

At a high level the harmonization between what CLDR offers, and glibc
offers is important since it sets users up for a smoother experience
when using applications that use system localization vs. libicu-based
localization.

The fact that cldr/common/main/en_SE.xml exists, is sufficient for us
to review adding en_SE to glibc.

There are 1067 XML files in cldr/common/main/, while only 347 xx_YY
locales in glibc/localedata/locales/, so we have a long way to go for
harmonization and the topics are complicated to audit.

--
Cheers,
Carlos.
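A first pass at the audit Carlos describes could simply compare file names in the two directories. The sketch below is an assumption about how one might do that, not an existing glibc or CLDR tool: the function name `missing_in_glibc` is hypothetical, it only considers plain language_TERRITORY names, and it treats a glibc file such as `sr_RS@latin` as covering `sr_RS`.

```python
import re

# Plain xx_YY (or xxx_YY) names only; script variants are ignored here.
_LANG_TERRITORY = re.compile(r"^[a-z]{2,3}_[A-Z]{2}$")


def missing_in_glibc(cldr_files: list[str], glibc_files: list[str]) -> list[str]:
    """Return xx_YY locale names present in CLDR but absent from glibc."""
    cldr = set()
    for name in cldr_files:
        stem = name.removesuffix(".xml")
        if _LANG_TERRITORY.match(stem):
            cldr.add(stem)
    # Strip glibc modifiers such as "@latin" before comparing.
    glibc = {name.split("@", 1)[0] for name in glibc_files}
    return sorted(cldr - glibc)


if __name__ == "__main__":
    cldr_sample = ["en.xml", "en_DE.xml", "en_DK.xml", "en_SE.xml"]
    glibc_sample = ["en_DK", "de_DE", "sv_SE"]
    print(missing_in_glibc(cldr_sample, glibc_sample))
```

In practice the two lists would come from listing cldr/common/main/ and glibc/localedata/locales/; a name match says nothing about whether the locale contents agree, which is the harder part of the audit.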
* Re: Adding an en_SE locale (was: Adding an en_DE locale)
From: yingxiang.yao @ 2024-08-21 21:11 UTC
To: 'Rickard Armiento'
Cc: fweimer, carlos, locales, libc-locales

Hello Rickard,

I understand your confusion and I was also not sure how to do it
properly. I removed things like currency etc. just to avoid the case
like that Sweden and other EU countries are not using Euro.

For the thousand separators, I think a thin space would be
appropriate, since this is also stated in style guide of English
language of EU commission. Also in this guide, they state that comma
is used as decimal separator due to technical reasons.

Regards,
Yingxiang Yao

-----Original Message-----
From: Rickard Armiento <misc-dev@armiento.net>
Sent: Wednesday, 21 August 2024 14:01
To: libc-locales@sourceware.org
Cc: fweimer@redhat.com; carlos@redhat.com; locales@tfan.eu; yingxiang.yao@kasumi321.de
Subject: Adding an en_SE locale (was: Adding an en_DE locale)

On 7/3/24 12:20 PM, Thomas Näveke wrote:
>>> I was wondering if I was welcome to submit a patch for an "en_DE"
>>> locale, i.e. a locale that uses English language but German units,
>>> dates, etc.

On 7/5/24 12:06 PM, Carlos O'Donell wrote:
>> Yes, if Windows 10 and Unicode CLDR have such locales, then I think
>> we should consider that they are useful and in use by users.

On 7/5/24 15:21 PM, Florian Weimer wrote:
> On the other hand, glibc supports on the fly composition of locales,
> so you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
> parts you want from there (or vice versa).
* Adding an en_SE locale
From: Rickard Armiento @ 2024-08-22 11:01 UTC
To: libc-locales
Cc: carlos, yingxiang.yao, locales

On 8/21/24 21:11 PM, Yingxiang Yao wrote:
> I removed things like currency etc. just to avoid the case like that
> Sweden and other EU countries are not using Euro.

To me, most of the choices in your proposed en_EU look very
reasonable; I was just surprised how much I still had to override for
a reasonable en_SE. And I perfectly agree that setting the currency to
Euro makes the most sense for en_EU.

However, on this topic: perhaps the en_EU LC_CTYPE needs more
consideration? Instead of just copying en_GB, I would argue it would
be better to mimic how LC_CTYPE is set up in sv_SE: copy "i18n" and
then merge definitions of special characters in official use
throughout the languages in the EU region, so that things like
'convert string to uppercase' generally work correctly throughout the
EU. A quick search came up with this post on the various special
characters (with a nice map):

https://jakubmarian.com/special-characters-diacritics-used-in-european-languages/

Then LC_COLLATE should probably implement precisely EOR / EN 13710:

https://en.wikipedia.org/wiki/European_ordering_rules

There is actually one existing glibc locale that references EN 13710,
and that is fi_FI. That locale seems to use EN 13710 plus an adaptation
to Finnish. Maybe one can reverse that definition into a pure
implementation of EN 13710?

> For the thousand separators, I think a thin space would be
> appropriate, since this is also stated in style guide of English
> language of EU commission. Also in this guide, they state that comma
> is used as decimal separator due to technical reasons.

In this link:

https://commission.europa.eu/system/files/2023-11/styleguide_english_dgt_en.pdf

I found the following quote:

===========
> Decimal separator. In English, the integral part of a number is
> separated from its fractional part by a point, not a comma as in
> other European languages. For technical reasons, however, the EU
> Publications Office will replace points with commas in English
> documents that are to appear in the Official Journal of the European
> Union
===========

Is this the statement you refer to? My interpretation of that
statement is a bit different. To me it appears to say:

1. The right decimal separator in English is a point.
2. However, for technical reasons, specifically the Official Journal
   of the European Union will not adhere to (1) for English documents.

So, I don't read this as a mandate to generally use decimal commas
instead of periods when not subject to such limitations.

From a more practical perspective: if a user specifies "LANG=en_EU"
and opens a spreadsheet application, what number format would they
expect to use? Decimal points, or commas?

Best regards,
Rickard
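The practical stakes of the decimal point versus decimal comma question can be shown with a small parsing sketch: the same typed text yields different values depending on which convention the locale supplies. The function `parse_number` is a hypothetical illustration, not how any spreadsheet or glibc routine is implemented.

```python
# Sketch: interpret the same text under two separator conventions.


def parse_number(text: str, decimal_point: str, thousands_sep: str) -> float:
    """Parse text using the given decimal point and thousands separator."""
    # Drop group separators first, then normalise the decimal point.
    plain = text.replace(thousands_sep, "").replace(decimal_point, ".")
    return float(plain)


if __name__ == "__main__":
    typed = "1,234"
    # en_US-style convention: "," groups digits, "." is the decimal point.
    print(parse_number(typed, decimal_point=".", thousands_sep=","))
    # Decimal-comma convention with U+202F as the group separator.
    print(parse_number(typed, decimal_point=",", thousands_sep="\u202f"))
```

Under the first convention the input is read as one thousand two hundred thirty-four, under the second as a value slightly above one, so whichever separators en_EU picks will determine how such input is interpreted.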
Thread overview: 7+ messages

2024-07-03 16:20 Adding an en_DE locale Thomas Näveke
2024-07-05 12:06 ` Carlos O'Donell
2024-07-05 15:21   ` Florian Weimer
2024-08-21 12:01     ` Adding an en_SE locale (was: Adding an en_DE locale) Rickard Armiento
2024-08-21 13:25       ` Adding an en_SE locale Carlos O'Donell
2024-08-21 21:11       ` Re: Adding an en_SE locale (was: Adding an en_DE locale) yingxiang.yao
2024-08-22 11:01         ` Adding an en_SE locale Rickard Armiento