* [PATCH] Update sv_SE to treat 'W' as a distinct character (Bug 25036)
@ 2021-03-19  1:43 Carlos O'Donell
  2021-04-06 14:23 ` Carlos O'Donell
  0 siblings, 1 reply; 3+ messages in thread
From: Carlos O'Donell @ 2021-03-19  1:43 UTC (permalink / raw)
  To: libc-alpha, Sebastian Rasmussen, Mike FABIAN

From: Sebastian Rasmussen <sebras@gmail.com>

The 13th edition of Svenska Akademiens ordlista lists 'W' as a
distinct letter that sorts after 'V'. We adjust the sv_SE locale
(and its tests) to match these "reformed" collation rules.
This harmonizes glibc with CLDR 1.5.0 (2007) for sv_SE sorting of
the letter 'W'.

No regressions on x86_64, and locale sorting tests all pass.

Co-authored-by: Carlos O'Donell <carlos@redhat.com>
---
 localedata/locales/sv_SE       | 26 +++++++++-----------------
 localedata/sv_SE.ISO-8859-1.in |  4 ++--
 localedata/sv_SE.UTF-8.in      |  4 ++--
 3 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/localedata/locales/sv_SE b/localedata/locales/sv_SE
index b0901726db..f54c73226d 100644
--- a/localedata/locales/sv_SE
+++ b/localedata/locales/sv_SE
@@ -61,22 +61,25 @@ LC_COLLATE
 copy "iso14651_t1"
 
 % CLDR collation rules for Swedish:
-% (see: https://unicode.org/cldr/trac/browser/trunk/common/collation/sv.xml)
+% (https://github.com/unicode-org/cldr/blob/master/common/collation/sv.xml)
 %
-% <collation type="standard">
+% We use the new "reformed" rules from the 13th edition of Svenska Akademiens
+% ordlista where 'W' is considered a distinct character sorting after 'V'.
+% This matches CLDR 1.5.0 released in 2007.
+%
+% <defaultCollation>reformed</defaultCollation>
+% <collation type="reformed">
 %   <cr><![CDATA[
 %     &D<<đ<<<Đ<<ð<<<Ð
 %     &t<<<þ/h
 %     &T<<<Þ/H
-%     &v<<<V<<w<<<W
 %     &Y<<ü<<<Ü<<ű<<<Ű
 %     &[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<<ę<<<Ę<ö<<<Ö<<ø<<<Ø<<ő<<<Ő<<œ<<<Œ<<ô<<<Ô
 %   ]]></cr>
 % </collation>
 %
-% And CLDR also lists the following
-% index characters:
-% (see: https://unicode.org/cldr/trac/browser/trunk/common/main/sv.xml)
+% And CLDR also lists the following index characters:
+% (https://github.com/unicode-org/cldr/blob/master/common/main/sv.xml)
 %
 % <exemplarCharacters type="index">[A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö]</exemplarCharacters>
 %
@@ -103,17 +106,6 @@ reorder-after <AFTER-Z>
 <U00DE> "<S0074><S0068>";"<BASE><BASE>";"<COMPATCAP><COMPATCAP>";IGNORE % Þ
 <U00FE> "<S0074><S0068>";"<BASE><BASE>";"<COMPAT><COMPAT>";IGNORE % þ
 
-% The letter w is normally not present in the Swedish alphabet. It
-% exists in some names in Swedish and foreign words, but is accounted
-% for as a variant of 'v'.  Words and names with 'w' are in Swedish
-% ordered alphabetically among the words and names with 'v'. If two
-% words or names are only to be distinguished by 'v' or % 'w', 'v' is
-% placed before 'w'.
-
-% &v<<<V<<w<<<W
-<U0057> <S0076>;"<BASE><VRNT1>";"<CAP><MIN>";IGNORE % W
-<U0077> <S0076>;"<BASE><VRNT1>";"<MIN><MIN>";IGNORE % w
-
 % &Y<<ü<<<Ü<<ű<<<Ű
 <U00DC> <S0079>;"<BASE><TREMA>";"<CAP><MIN>";IGNORE % Ü
 <U00FC> <S0079>;"<BASE><TREMA>";"<MIN><MIN>";IGNORE % ü
diff --git a/localedata/sv_SE.ISO-8859-1.in b/localedata/sv_SE.ISO-8859-1.in
index 967c761370..94552ea80a 100644
--- a/localedata/sv_SE.ISO-8859-1.in
+++ b/localedata/sv_SE.ISO-8859-1.in
@@ -42,10 +42,10 @@ u
 U
 v
 V
-w
-W
 va
 Va
+w
+W
 x
 X
 y
diff --git a/localedata/sv_SE.UTF-8.in b/localedata/sv_SE.UTF-8.in
index 6db46e6271..80a093e709 100644
--- a/localedata/sv_SE.UTF-8.in
+++ b/localedata/sv_SE.UTF-8.in
@@ -65,10 +65,10 @@ U
 Ů
 v
 V
-w
-W
 va
 Va
+w
+W
 x
 X
 y
-- 
2.26.2
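
The reordering in the two test files above can be illustrated with a
simplified model of multi-level collation. This is only a sketch, not the
glibc implementation: the function name `sort_key` and the flag
`w_is_distinct` are hypothetical, the weights are plain ASCII code points,
and only unaccented letters are handled. Under the old rule 'w' shares the
primary weight of 'v' and differs only at the secondary level; under the
new rule 'w' has its own primary weight after 'v'.

```python
def sort_key(word, w_is_distinct):
    """Build a simplified three-level collation key for an ASCII word.

    Hypothetical helper for illustration only; real locales use the
    weights defined in the locale source, not code points.
    """
    primary, secondary, tertiary = [], [], []
    for ch in word:
        low = ch.lower()
        if low == "w" and not w_is_distinct:
            # Old rule: 'w' is a variant of 'v' (same primary weight,
            # distinguished only at the secondary level).
            primary.append(ord("v"))
            secondary.append(1)
        else:
            # New rule (and all other letters): own primary weight.
            primary.append(ord(low))
            secondary.append(0)
        # Tertiary level: lowercase sorts before uppercase.
        tertiary.append(1 if ch.isupper() else 0)
    return (primary, secondary, tertiary)


words = ["w", "va", "V", "W", "Va", "v"]
old_order = sorted(words, key=lambda s: sort_key(s, w_is_distinct=False))
new_order = sorted(words, key=lambda s: sort_key(s, w_is_distinct=True))
print(old_order)  # ['v', 'V', 'w', 'W', 'va', 'Va']
print(new_order)  # ['v', 'V', 'va', 'Va', 'w', 'W']
```

The two printed lists correspond to the removed and added line order in
the sv_SE test input files.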



* Re: [PATCH] Update sv_SE to treat 'W' as a distinct character (Bug 25036)
  2021-03-19  1:43 [PATCH] Update sv_SE to treat 'W' as a distinct character (Bug 25036) Carlos O'Donell
@ 2021-04-06 14:23 ` Carlos O'Donell
  2021-04-06 16:55   ` Carlos O'Donell
  0 siblings, 1 reply; 3+ messages in thread
From: Carlos O'Donell @ 2021-04-06 14:23 UTC (permalink / raw)
  To: libc-alpha, Sebastian Rasmussen, Mike FABIAN

On 3/18/21 9:43 PM, Carlos O'Donell wrote:
> From: Sebastian Rasmussen <sebras@gmail.com>
> 
> The 13th edition of Svenska Akademiens ordlista lists 'W' as a
> distinct letter that sorts after 'V'. We adjust the sv_SE locale
> (and tests) to match this updated and "reformed" language change.
> This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
> the letter 'W'.

I will be committing this patch shortly to resolve this issue.

I haven't seen any objections, and the general consensus is to
harmonize with CLDR, which has already made these changes.

General feedback from native speakers is that this is the correct
way forward for the sv_SE locale.
 


-- 
Cheers,
Carlos.



* Re: [PATCH] Update sv_SE to treat 'W' as a distinct character (Bug 25036)
  2021-04-06 14:23 ` Carlos O'Donell
@ 2021-04-06 16:55   ` Carlos O'Donell
  0 siblings, 0 replies; 3+ messages in thread
From: Carlos O'Donell @ 2021-04-06 16:55 UTC (permalink / raw)
  To: libc-alpha, Sebastian Rasmussen, Mike FABIAN

On 4/6/21 10:23 AM, Carlos O'Donell wrote:
> On 3/18/21 9:43 PM, Carlos O'Donell wrote:
>> From: Sebastian Rasmussen <sebras@gmail.com>
>>
>> The 13th edition of Svenska Akademiens ordlista lists 'W' as a
>> distinct letter that sorts after 'V'. We adjust the sv_SE locale
>> (and tests) to match this updated and "reformed" language change.
>> This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
>> the letter 'W'.
> 
> I will be committing this patch shortly to resolve this issue.
> 
> I haven't seen any objections and the general consensus is to
> harmonize with CLDR which has already made these changes.
> 
> General feedback from native speakers is that this is the correct
> way forward for the sv_SE locale.

Pushed, and so will be fixed in 2.34.
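
For anyone wanting to check whether an installed C library carries the
change, a rough sketch follows. It assumes a locale named `sv_SE.UTF-8`
is installed, which may not be the case, and the helper name
`swedish_sort` is hypothetical. With the patched locale the result should
match the new test order (v, V, va, Va, w, W); older or non-glibc systems
may give a different order.

```python
import locale


def swedish_sort(words, locale_name="sv_SE.UTF-8"):
    """Sort words with the system's Swedish collation rules.

    Returns None when the requested locale is not installed.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares according
        # to the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the caller's locale


result = swedish_sort(["w", "va", "V", "W", "Va", "v"])
print(result)
```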

-- 
Cheers,
Carlos.

