public inbox for libc-alpha@sourceware.org
From: Carlos O'Donell <carlos@redhat.com>
To: libc-alpha@sourceware.org, Sebastian Rasmussen <sebras@gmail.com>,
	Mike FABIAN <mfabian@redhat.com>
Subject: Re: [PATCH] Update sv_SE to treat 'W' as a distinct character (Bug 25036)
Date: Tue, 6 Apr 2021 10:23:40 -0400
Message-ID: <9cf1e9c6-dc26-4005-9d4d-d48170cd745e@redhat.com>
In-Reply-To: <20210319014318.2565491-1-carlos@redhat.com>

On 3/18/21 9:43 PM, Carlos O'Donell wrote:
> From: Sebastian Rasmussen <sebras@gmail.com>
> 
> The 13th edition of Svenska Akademiens ordlista lists 'W' as a
> distinct letter that sorts after 'V'. We adjust the sv_SE locale
> (and tests) to match this updated and "reformed" language change.
> This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
> the letter 'W'.
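
To make the ordering change concrete, here is a simplified Python model of the old and the reformed behavior. It is a sketch, not glibc's implementation: it handles only ASCII letters, it assumes lowercase sorts before uppercase at the tertiary level, and the `sort_key` helper and its `reformed` flag are hypothetical names. Under the old rule, 'w' takes the primary weight of 'v' plus a secondary "variant" mark; under the reformed rule, 'w' keeps its own primary weight, which sorts after 'v'.

```python
# Hypothetical, simplified model of the sv_SE change for 'W'.
# Real glibc collation uses the full ISO 14651 table (iso14651_t1);
# this only illustrates the three comparison levels for ASCII letters.

def sort_key(word: str, reformed: bool) -> tuple:
    primary, secondary, tertiary = [], [], []
    for ch in word:
        base = ch.lower()
        variant = 0
        if not reformed and base == "w":
            # Old rule (&v<<<V<<w<<<W): w is a secondary variant of v.
            base = "v"
            variant = 1
        primary.append(base)          # level 1: base letter
        secondary.append(variant)     # level 2: variant mark
        tertiary.append(1 if ch.isupper() else 0)  # level 3: case
    # Tuples of lists compare level by level, then element by element,
    # and a shorter list that is a prefix of a longer one sorts first.
    return (primary, secondary, tertiary)
```

Sorting the words used in the patch's sv_SE.*.in test files with `reformed=False` yields v, V, w, W, va, Va (the old order); with `reformed=True` it yields v, V, va, Va, w, W (the new order).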

I will be committing this patch shortly to resolve this issue.

I haven't seen any objections, and the general consensus is to
harmonize with CLDR, which has already made these changes.

General feedback from native speakers is that this is the correct
way forward for the sv_SE locale.
 
> No regressions on x86_64, and locale sorting tests all pass.
> 
> Co-authored-by: Carlos O'Donell <carlos@redhat.com>
> ---
>  localedata/locales/sv_SE       | 26 +++++++++-----------------
>  localedata/sv_SE.ISO-8859-1.in |  4 ++--
>  localedata/sv_SE.UTF-8.in      |  4 ++--
>  3 files changed, 13 insertions(+), 21 deletions(-)
> 
> diff --git a/localedata/locales/sv_SE b/localedata/locales/sv_SE
> index b0901726db..f54c73226d 100644
> --- a/localedata/locales/sv_SE
> +++ b/localedata/locales/sv_SE
> @@ -61,22 +61,25 @@ LC_COLLATE
>  copy "iso14651_t1"
>  
>  % CLDR collation rules for Swedish:
> -% (see: https://unicode.org/cldr/trac/browser/trunk/common/collation/sv.xml)
> +% (https://github.com/unicode-org/cldr/blob/master/common/collation/sv.xml)
>  %
> -% <collation type="standard">
> +% We use the new "reformed" rules from the 13th edition of Svenska Akademiens
> +% ordlista where 'W' is considered a distinct character sorting after 'V'.
> +% This matches CLDR 1.5.0 released in 2007.
> +%
> +% <defaultCollation>reformed</defaultCollation>
> +% <collation type="reformed">
>  %   <cr><![CDATA[
>  %     &D<<đ<<<Đ<<ð<<<Ð
>  %     &t<<<þ/h
>  %     &T<<<Þ/H
> -%     &v<<<V<<w<<<W
>  %     &Y<<ü<<<Ü<<ű<<<Ű
>  %     &[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<<ę<<<Ę<ö<<<Ö<<ø<<<Ø<<ő<<<Ő<<œ<<<Œ<<ô<<<Ô
>  %   ]]></cr>
>  % </collation>
>  %
> -% And CLDR also lists the following
> -% index characters:
> -% (see: https://unicode.org/cldr/trac/browser/trunk/common/main/sv.xml)
> +% And CLDR also lists the following index characters:
> +% (https://github.com/unicode-org/cldr/blob/master/common/main/sv.xml)
>  %
>  % <exemplarCharacters type="index">[A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö]</exemplarCharacters>
>  %
> @@ -103,17 +106,6 @@ reorder-after <AFTER-Z>
>  <U00DE> "<S0074><S0068>";"<BASE><BASE>";"<COMPATCAP><COMPATCAP>";IGNORE % Þ
>  <U00FE> "<S0074><S0068>";"<BASE><BASE>";"<COMPAT><COMPAT>";IGNORE % þ
>  
> -% The letter w is normally not present in the Swedish alphabet. It
> -% exists in some names in Swedish and foreign words, but is accounted
> -% for as a variant of 'v'.  Words and names with 'w' are in Swedish
> -% ordered alphabetically among the words and names with 'v'. If two
> -% words or names are only to be distinguished by 'v' or % 'w', 'v' is
> -% placed before 'w'.
> -
> -% &v<<<V<<w<<<W
> -<U0057> <S0076>;"<BASE><VRNT1>";"<CAP><MIN>";IGNORE % W
> -<U0077> <S0076>;"<BASE><VRNT1>";"<MIN><MIN>";IGNORE % w
> -
>  % &Y<<ü<<<Ü<<ű<<<Ű
>  <U00DC> <S0079>;"<BASE><TREMA>";"<CAP><MIN>";IGNORE % Ü
>  <U00FC> <S0079>;"<BASE><TREMA>";"<MIN><MIN>";IGNORE % ü
> diff --git a/localedata/sv_SE.ISO-8859-1.in b/localedata/sv_SE.ISO-8859-1.in
> index 967c761370..94552ea80a 100644
> --- a/localedata/sv_SE.ISO-8859-1.in
> +++ b/localedata/sv_SE.ISO-8859-1.in
> @@ -42,10 +42,10 @@ u
>  U
>  v
>  V
> -w
> -W
>  va
>  Va
> +w
> +W
>  x
>  X
>  y
> diff --git a/localedata/sv_SE.UTF-8.in b/localedata/sv_SE.UTF-8.in
> index 6db46e6271..80a093e709 100644
> --- a/localedata/sv_SE.UTF-8.in
> +++ b/localedata/sv_SE.UTF-8.in
> @@ -65,10 +65,10 @@ U
>  Ů
>  v
>  V
> -w
> -W
>  va
>  Va
> +w
> +W
>  x
>  X
>  y
> 
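
For readers unfamiliar with the CLDR tailoring syntax quoted in the locale comments, the following Python sketch tokenizes the simple case: `&X` resets to an anchor, and `<`, `<<`, `<<<` mark primary, secondary, and tertiary differences. It is a hypothetical helper (`parse_rule` is not part of glibc or CLDR tooling) that covers only single-character operands; `[before N]`, expansions such as `þ/h`, quoting, and `=` are deliberately not handled.

```python
# Hypothetical tokenizer for a small subset of CLDR/ICU tailoring rules.
# Supported: "&X" (reset), "<" (primary), "<<" (secondary), "<<<" (tertiary),
# each followed by a single character. Anything else raises ValueError.

STRENGTHS = {"<": "primary", "<<": "secondary", "<<<": "tertiary"}

def parse_rule(rule: str) -> list:
    tokens = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch.isspace():
            i += 1
        elif ch == "&":
            if i + 1 >= len(rule):
                raise ValueError("reset without an anchor character")
            tokens.append(("reset", rule[i + 1]))
            i += 2
        elif ch == "<":
            j = i
            while j < len(rule) and rule[j] == "<":
                j += 1
            op = rule[i:j]
            if op not in STRENGTHS or j >= len(rule):
                raise ValueError(f"unsupported relation {op!r} at {i}")
            tokens.append((STRENGTHS[op], rule[j]))
            i = j + 1
        else:
            raise ValueError(f"unexpected character {ch!r} at {i}")
    return tokens
```

Applied to the line the patch removes, `parse_rule("&v<<<V<<w<<<W")` gives a reset to 'v', then 'V' (tertiary), 'w' (secondary), 'W' (tertiary): 'w' differs from 'v' only at the secondary level, which is why it used to interleave with 'v'. Dropping that rule leaves 'w' with its own primary weight from the root table.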


-- 
Cheers,
Carlos.

