From: Rickard Armiento
To: libc-locales@sourceware.org
Cc: fweimer@redhat.com, carlos@redhat.com, locales@tfan.eu, yingxiang.yao@kasumi321.de
Subject: Adding an en_SE locale (was: Adding an en_DE locale)
Date: Wed, 21 Aug 2024 14:01:04 +0200
In-Reply-To: <87le2fvp7s.fsf@oldenburg.str.redhat.com>
References: <87le2fvp7s.fsf@oldenburg.str.redhat.com>

On 7/3/24 12:20 PM, Thomas Näveke wrote:
>>> I was wondering if I was welcome to submit a patch for an "en_DE"
>>> locale, i.e. a locale that uses English language but German units,
>>> dates, etc.

On 7/5/24 12:06 PM, Carlos O'Donell wrote:
>> Yes, if Windows 10 and Unicode CLDR have such locales, then I think
>> we should consider that they are useful and in use by users.

On 7/5/24 15:21 PM, Florian Weimer wrote:
> On the other hand, glibc supports on the fly composition of locales,
> so you could use LANG=en_US.utf8 with the de_DE.utf8 overrides for the
> parts you want from there (or vice versa).

I started looking into how to contribute locales to glibc with a
similar motivation: to see if an en_SE locale for English as used in
Sweden could be accepted.
As with en_DE, en_SE is present in Unicode CLDR, and a web search
shows many examples of confusion caused by use of this locale
designation even though it is not present in glibc.

I then found that Yingxiang Yao (included in cc) sent a patch in
January to libc-alpha for an en_EU locale, which is closely related to
the question of how to handle these locale issues for users of English
in the EU region.

https://sourceware.org/pipermail/libc-alpha/2024-January/thread.html#153906

(It is also noteworthy that glibc already includes an en_DK locale for
English as used in Denmark, which is quite close to the proposed
en_EU.)

At first look, it seems that en_EU could be the general solution to
the issue of otherwise having each EU country define its own en_*.
With an en_EU locale, users can set LANG to it and override the
relevant country-specific conventions using the LC_* settings (as
suggested by Florian Weimer above).

However, looking more closely at this, it gets a bit messy. For
English as used in Sweden, all of the following categories probably
should be overridden as sv_SE rather than en_EU (or en_US, en_GB):

LC_COLLATE=sv_SE.UTF-8
# sv_SE modifies iso14651_t1 for the Swedish umlaut characters and there
# may be special rules for V and W. Arguably, even when using English in
# Sweden, one would normally sort names, etc., according to this Swedish
# sort order, not US or UK English.

LC_CTYPE=sv_SE.UTF-8
# The Swedish umlaut characters need correct classifications when
# dealing with, e.g., Swedish names (e.g., personal and geographical)
# alongside English text.

LC_MONETARY=sv_SE.UTF-8
# Sweden uses its own currency (SEK), and not the Euro.

LC_TELEPHONE=sv_SE.UTF-8
# The standard way of writing phone numbers differs across the EU.
# Most importantly, 'int_select' and 'int_prefix' need
# country-specific values.
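
To make the composition above concrete, here is a minimal sketch (not
part of any patch) that builds the corresponding environment variables
in Python. It assumes en_DK.UTF-8 as the base locale, since en_EU is
only proposed and not available in glibc; the function names
compose_locale_env and merged_environment are hypothetical helpers
introduced for illustration only.

```python
import os

# Categories that, per the list above, should follow Swedish conventions.
SWEDISH_OVERRIDES = ("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_TELEPHONE")


def compose_locale_env(base_locale, override_locale, categories=SWEDISH_OVERRIDES):
    """Return environment variables for a base locale with per-category overrides."""
    env = {"LANG": base_locale}
    for category in categories:
        env[category] = override_locale
    return env


def merged_environment(locale_env, environ=None):
    """Copy an environment and apply the locale settings on top of it."""
    merged = dict(os.environ if environ is None else environ)
    # LC_ALL, if set, overrides every individual LC_* variable, so drop it.
    merged.pop("LC_ALL", None)
    merged.update(locale_env)
    return merged


# Assumption: en_DK.UTF-8 stands in for the proposed en_EU.
env = compose_locale_env("en_DK.UTF-8", "sv_SE.UTF-8")
for name, value in sorted(env.items()):
    print(f"{name}={value}")
```

The dictionary returned by merged_environment(env) could be passed as
the env argument of subprocess.run to start a program under this mix of
settings. Whether the result behaves as intended depends on both
locales being installed on the system.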

So, it turns out that only the following categories would adopt the
proposed en_EU (or, in fact, one can use en_DK, since the two are equal
for these categories):

LC_TIME=en_EU.UTF-8
LC_NAME=en_EU.UTF-8
LC_PAPER=en_EU.UTF-8
LC_MESSAGES=en_EU.UTF-8

But, finally, there are two categories that cause trouble:

LC_NUMERIC: when writing decimal numbers in English in Sweden, it is
common to adopt the English use of a decimal point (.) instead of the
decimal comma (,). However, the en_US / en_GB use of a comma as a
thousands separator is usually *not* adopted, probably in part because
it could then be confused with the Swedish decimal separator. Hence, a
thin space (which sv_SE uses) is arguably the right choice. None of the
other locales discussed here (including en_DK and the proposed en_EU)
matches this definition of LC_NUMERIC. Could the proposed en_EU perhaps
be modified to use the more English-centric decimal_point="." but with
thousands_sep set to a thin space? Maybe there is a need to investigate
more deeply how people write numbers alongside English text across
different EU countries?

LC_ADDRESS: This setting unfortunately combines country-related fields
(country_name, ab2, and ab3) and language-related fields (lang_name,
lang_ab, lang_term, lang_lib). I don't see how one can handle this
setting fully correctly across the different EU countries without
separate en_* locales.

My conclusion is that even if it may be possible (perhaps with some
compromise) to deal with the remaining issues, the resulting mix of
settings will be complex for end users. Hence, it may be more
user-friendly to have separate en_DE, en_SE, and other en_* locales
alongside the already existing en_DK. It would also help avoid issues
caused by, e.g., the en_SE designation already being in some use (e.g.,
in CLDR).

I have made my en_SE available here:

https://github.com/httk/locale-en_SE/blob/main/en_SE

I would be happy to hear any thoughts or suggestions related to taking
the steps to contribute this into glibc.

Best regards,
Rickard