de_DE has been using the wrong group separator for over 18 years

public inbox for libc-alpha@sourceware.org

* de_DE has been using the wrong group separator for over 18 years
@ 2018-04-17 22:25 kdex
  2018-04-18  7:14 ` Florian Weimer
  0 siblings, 1 reply; 11+ messages in thread
From: kdex @ 2018-04-17 22:25 UTC (permalink / raw)
  To: libc-alpha

To give some context: I have previously posted the following on libc-locales 
and was asked to bring this to the attention of senior developers on this 
list who speak German.

I have noticed that the locale `de_DE` has erroneously been using a full stop 
(U+002E) for the thousands (group) separator in `mon_thousands_sep` and 
`thousands_sep` ever since 2000. The usage of a full stop to group thousands 
has (to my knowledge) never been standardized.

As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have 
been a thin space (U+2009).

In fact, DIN 1333 even explicitly forbids the usage of U+002E to group 
thousands, and DIN EN ISO 80000 explicitly excludes all other characters than 
a thin space.
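
To make the difference concrete, here is a minimal Python sketch that groups the digits of an integer with either separator. The helper `group_digits` and the two constants are hypothetical names used only for illustration; this is not how glibc applies `thousands_sep`, and it ignores the locale's `grouping` field by assuming fixed groups of three.

```python
def group_digits(value: int, sep: str, size: int = 3) -> str:
    """Insert `sep` between groups of `size` digits, counting from the right."""
    sign = "-" if value < 0 else ""
    digits = str(abs(value))
    groups = []
    while digits:
        groups.append(digits[-size:])   # take the rightmost group
        digits = digits[:-size]         # and drop it from the remainder
    return sign + sep.join(reversed(groups))


FULL_STOP = "\u002e"    # the separator de_DE uses today, per this message
THIN_SPACE = "\u2009"   # the separator proposed in this message

print(group_digits(1234567, FULL_STOP))         # 1.234.567
print(repr(group_digits(1234567, THIN_SPACE)))  # repr() makes the U+2009 visible
```

The two results differ only in the separator code point, which is exactly the kind of difference that code comparing or re-parsing formatted numbers would notice.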

Has anyone noticed this before? I fear that this change might break a lot of 
code that relies on the separator being wrong. Yet, this really should be 
fixed…

What's the best way to deal with this?

For further information, please also refer to the relevant section on 
Wikipedia at [1] (German).

[1] https://de.wikipedia.org/wiki/Zifferngruppierung#Zur_Problematik_von_Punkt_und_Komma_f%C3%BCr_Tausender-_und_Dezimaltrennzeichen


* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-17 22:25 de_DE has been using the wrong group separator for over 18 years kdex
@ 2018-04-18  7:14 ` Florian Weimer
  2018-04-18  8:31   ` kdex
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-18  7:14 UTC (permalink / raw)
  To: kdex, libc-alpha

On 04/18/2018 12:24 AM, kdex wrote:
> To give some context: I have previously posted the following on libc-locales
> and was asked to bring this to the attention of senior developers on this
> list who speak German.
> 
> I have noticed that the locale `de_DE` has erroneously been using a full stop
> (U+002E) for the thousands (group) separator in `mon_thousands_sep` and
> `thousands_sep` ever since 2000. The usage of a full stop to group thousands
> has (to my knowledge) never been standardized.
> 
> As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> been a thin space (U+2009).
> 
> In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> thousands, and DIN EN ISO 80000 explicitly excludes all other characters than
> a thin space.

These standards are simply not universally used.  They aren't exactly 
wrong, either, because some typesetters actually use a (thin) space. 
It's just that adoption is poor.

U+002E is perfectly acceptable and widely used, especially if U+2009 is 
not available (and U+0020 risks introducing a line break).  Here's a 
recent example:

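»Die Finanzkontrolle Schwarzarbeit überprüfte im Jahr 2017 mehr als 
52.000 Arbeitgeber und leitete fast 108.000 Strafverfahren ein. Die 
Anzahl der eingeleiteten Ermittlungsverfahren wegen der Nichtgewährung 
des gesetzlichen Mindestlohns nach dem Mindestlohngesetz stieg auf 2.522 
Verfahren (2016: 1.651; 2015: 705).«

(In English, keeping the original number formatting: "In 2017 the 
Finanzkontrolle Schwarzarbeit inspected more than 52.000 employers and 
initiated almost 108.000 criminal proceedings. The number of 
investigations opened for failure to pay the statutory minimum wage 
under the Minimum Wage Act rose to 2.522 (2016: 1.651; 2015: 705).")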

<https://www.bundesfinanzministerium.de/Content/DE/Pressemitteilungen/Finanzpolitik/2018/04/2018-04-17-ZJPK.html>

(Also look at the date at the top of the page; it doesn't follow DIN ISO 
8601, either.)

I don't think the locales need to change.  Using characters from the 
ASCII range for printing numbers has its advantages.

Thanks,
Florian

* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-18  7:14 ` Florian Weimer
@ 2018-04-18  8:31   ` kdex
  2018-04-18  9:35     ` Florian Weimer
  2018-04-18 21:05     ` Rafal Luzynski
  0 siblings, 2 replies; 11+ messages in thread
From: kdex @ 2018-04-18  8:31 UTC (permalink / raw)
  To: libc-alpha

On Wednesday, April 18, 2018 9:14:45 AM CEST Florian Weimer wrote:
> On 04/18/2018 12:24 AM, kdex wrote:
> > To give some context: I have previously posted the following on
> > libc-locales and was asked to bring this to the attention of senior
> > developers on this list who speak German.
> > 
> > I have noticed that the locale `de_DE` has erroneously been using a full
> > stop (U+002E) for the thousands (group) separator in `mon_thousands_sep`
> > and `thousands_sep` ever since 2000. The usage of a full stop to group
> > thousands has (to my knowledge) never been standardized.
> > 
> > As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> > been a thin space (U+2009).
> > 
> > In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> > thousands, and DIN EN ISO 80000 explicitly excludes all other characters
> > than a thin space.
> 
> These standards are simply not universally used.  They aren't exactly
> wrong, either, because some typesetters actually use a (thin) space.
> It's just that adoption is poor.
> 
> U+002E is perfectly acceptable and widely used, especially if U+2009 is
> not available (and U+0020 risks introducing a line break).  Here's a
> recent example:
> 
> »Die Finanzkontrolle Schwarzarbeit überprüfte im Jahr 2017 mehr als
> 52.000 Arbeitgeber und leitete fast 108.000 Strafverfahren ein. Die
> Anzahl der eingeleiteten Ermittlungsverfahren wegen der Nichtgewährung
> des gesetzlichen Mindestlohns nach dem Mindestlohngesetz stieg auf 2.522
> Verfahren (2016: 1.651; 2015: 705).«
> 
> <https://www.bundesfinanzministerium.de/Content/DE/Pressemitteilungen/Finanzpolitik/2018/04/2018-04-17-ZJPK.html>
While the Federal Ministry of Finance may be an interesting (or even ironic) 
source to point out, it is in no way normative, and their website is mostly 
subject to their team of web developers.

Note that according to DIN 5008, amounts of money should, for security 
purposes, indeed be grouped with periods, so leaving `mon_thousands_sep` as-is 
would still allow for standards-compliance. Amounts of money are also covered 
by the three norms I've brought up before:

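[…] Aus diesem Grund sehen Normen die Verwendung eines Leerzeichens als 
Tausendertrennzeichen vor (DIN 1333, DIN 5008 und ISO 80000). Dabei wird ein 
schmales Leerzeichen empfohlen, falls dieses technisch verfügbar ist. Eine 
Ausnahme bilden Geldbeträge, die aus Sicherheitsgründen mit dem Leerzeichen, 
das mindestens die Breite einer der Ziffern hat, oder einem Trennzeichen (wie 
dem Punkt) getrennt werden können. [2]

(In English: "[…] For this reason, standards prescribe a space as the 
thousands separator (DIN 1333, DIN 5008 and ISO 80000). A narrow space is 
recommended where it is technically available. Amounts of money are an 
exception: for security reasons they may be separated with a space that is 
at least as wide as a digit, or with a separator character (such as the 
full stop).")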

> 
> (Also look at the date at the top of the page—it doesn't follow DIN ISO
> 8601, either.)
That's unfortunate; but the paragraph about normativity above would apply 
here, too.

Duden, the standard German reference for these matters (generally considered 
relatively normative among Germans), adheres to the ISO norms as well [1], 
which should speak for itself.

It's simple enough to find instances of German articles about finances that 
try to use spaces (admittedly the wrong ones) as separators as well, see [3].
> 
> I don't think the locales need to change.  Using characters from the
> ASCII range for printing numbers has its advantages.
I don't think this premise is correct: In de_DE, amounts of money include 
`currency_symbol` (U+20AC), which is not in the ASCII range. ps_AF uses U+066C 
for `thousands_sep`, fa_IR uses U+002C and es_MX even uses U+2009 (the very 
same character that this thread is about). None of these are in the ASCII 
range; so why should we treat de_DE like a special case? It's much easier to 
be standards-compliant here and get used to the fact that numbers do generally 
contain non-ASCII characters that a parser could just skip over.
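
As a rough illustration of what "skipping over" separators could look like, here is a minimal Python sketch. The function `parse_grouped_int` and the separator set are hypothetical and chosen only for this example. Treating U+002E as a group separator is an assumption that only holds where the decimal point is a comma, as in de_DE, and the sketch handles non-negative integers only.

```python
# A few candidate group separators: full stop, space, thin space,
# narrow no-break space, Arabic thousands separator.
GROUP_SEPARATORS = frozenset({"\u002e", "\u0020", "\u2009", "\u202f", "\u066c"})


def parse_grouped_int(text: str, separators=GROUP_SEPARATORS) -> int:
    """Parse a non-negative integer, ignoring any group separators."""
    digits = "".join(ch for ch in text.strip() if ch not in separators)
    if not digits.isdecimal():
        # Anything left that is not a decimal digit (e.g. a comma) is rejected.
        raise ValueError(f"not a grouped integer: {text!r}")
    return int(digits)


print(parse_grouped_int("1.234.567"))            # 1234567
print(parse_grouped_int("1\u2009234\u2009567"))  # 1234567
```

A parser written this way does not care which of the listed characters the locale emits, which is the property argued for above.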
> 
> Thanks,
> Florian
[1] https://www.duden.de/sprachwissen/rechtschreibregeln/zahlen-und-ziffern
[2] https://de.wikipedia.org/wiki/Zifferngruppierung#Zur_Problematik_von_Punkt_und_Komma_f%C3%BCr_Tausender-_und_Dezimaltrennzeichen
[3] https://www.finanzen.ch/nachrichten/aktien/Aktien-Frankfurt-Eroeffnung-Dax-schiebt-sich-wieder-ueber-12-600-Punkte-1021483710


* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-18  8:31   ` kdex
@ 2018-04-18  9:35     ` Florian Weimer
  2018-04-18 21:10       ` kdex
  2018-04-18 21:05     ` Rafal Luzynski
  1 sibling, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-18  9:35 UTC (permalink / raw)
  To: kdex, libc-alpha

On 04/18/2018 10:30 AM, kdex wrote:

> While the Federal Ministry of Finance may be an interesting (or even ironic)
> source to point out, it is in no way normative, and their website is mostly
> subject to their team of web developers.

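»Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite 
und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess 
ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«

(In English: "Around 32.000 experts from industry and research, from 
consumer groups and the public sector contribute their expertise to the 
standardization process, which DIN steers as a privately organized 
project manager.")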

<https://www.din.de/de/ueber-normen-und-standards/basiswissen>

And as that web page explains, DIN norms aren't normative, either.  Our 
users expect that the locales follow actual practice, not what some 
document says that they have never seen and nobody has read.  (For 
example, I can't easily tell whether the DIN-proposed keyboard layout 
for German provides a convenient way to enter the relevant space character.)

Wikipedia itself prefers ».« for numbers on (culturally) German pages:

<https://de.wikipedia.org/wiki/Wikipedia:Schreibweise_von_Zahlen#Zifferngruppierung>

You cited a Swiss web page (finanzen.ch), but the Swiss have slightly 
different typographical traditions which do not apply to de_DE.

As I said, some (culturally German) typesetters use spaces (of various 
widths), but their use is somewhat rare.

Thanks,
Florian

* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-18  8:31   ` kdex
  2018-04-18  9:35     ` Florian Weimer
@ 2018-04-18 21:05     ` Rafal Luzynski
  2018-04-19  8:08       ` Florian Weimer
  1 sibling, 1 reply; 11+ messages in thread
From: Rafal Luzynski @ 2018-04-18 21:05 UTC (permalink / raw)
  To: libc-alpha, kdex

18.04.2018 10:30 kdex <kdex@kdex.de> wrote:
>
> On Wednesday, April 18, 2018 9:14:45 AM CEST Florian Weimer wrote:
> > On 04/18/2018 12:24 AM, kdex wrote:
> > > To give some context: I have previously posted the following on
> > > libc-locales and was asked to bring this to the attention of senior
> > > developers on this list who speak German.
> > >
> > > I have noticed that the locale `de_DE` has erroneously been using a full
> > > stop (U+002E) for the thousands (group) separator in `mon_thousands_sep`
> > > and `thousands_sep` ever since 2000. The usage of a full stop to group
> > > thousands has (to my knowledge) never been standardized.
> > >
> > > As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> > > been a thin space (U+2009).
> > >
> > > In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> > > thousands, and DIN EN ISO 80000 explicitly excludes all other characters
> > > than a thin space.
> >
> > These standards are simply not universally used. They aren't exactly
> > wrong, either, because some typesetters actually use a (thin) space.
> > It's just that adoption is poor.

Florian, this is ambiguous: do you mean "not universally used, except for
financial institutions" or "not universally used, even by financial
institutions"?  Note that there is mon_thousands_sep (in LC_MONETARY)
and thousands_sep (in LC_NUMERIC) so it is possible to set different
thousands separators to format amounts of money and to format other numbers.
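
A minimal Python sketch of that split, assuming a hypothetical `Separators` record that mirrors the two locale fields. The values below are illustrative only and are not the contents of any glibc locale file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separators:
    thousands_sep: str      # would come from LC_NUMERIC
    mon_thousands_sep: str  # would come from LC_MONETARY


def group(value: int, sep: str) -> str:
    # Python's "," format option groups by three; swap in the wanted separator.
    return f"{value:,}".replace(",", sep)


def format_number(value: int, seps: Separators) -> str:
    return group(value, seps.thousands_sep)


def format_money(value: int, seps: Separators, currency: str = "EUR") -> str:
    return f"{group(value, seps.mon_thousands_sep)} {currency}"


# Hypothetical configuration: thin space for plain numbers, full stop for money.
EXAMPLE = Separators(thousands_sep="\u2009", mon_thousands_sep=".")

print(repr(format_number(108000, EXAMPLE)))  # repr() makes the U+2009 visible
print(format_money(2522, EXAMPLE))           # 2.522 EUR
```

Keeping the two fields separate is what would allow the monetary separator to stay as it is even if the numeric one were changed.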

> [...]
> > I don't think the locales need to change. Using characters from the
> > ASCII range for printing numbers has its advantages.
> I don't think this premise is correct: In de_DE, amounts of money include
> `currency_symbol` (U+20AC), which is not in the ASCII range. [...]

This is a euro sign (€) and it is displayed as is in the locales
implementing Unicode (e.g., de_DE.UTF-8).  In de_DE.ISO-8859-15@euro
it is converted to the 0xa4 character, which is again a euro sign in
ISO 8859-15.  In de_DE.ISO-8859-1 it is converted to the string "EUR".
So in every charset it is displayed correctly.  My point is that it
is safe to use sophisticated Unicode characters (like narrow space etc.)
in the locale data source code and assume that localedef handles it
smartly in every charset.
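
The effect described above can be imitated in a few lines of Python. This is only a sketch of the observable behaviour, not of how localedef is implemented; the function name `render_currency_symbol` and its `fallback` parameter are hypothetical.

```python
def render_currency_symbol(symbol: str, charset: str, fallback: str = "EUR") -> bytes:
    """Encode `symbol` in `charset`, or fall back to an ASCII replacement."""
    try:
        return symbol.encode(charset)
    except UnicodeEncodeError:
        # The charset has no code point for the symbol, so use the replacement.
        return fallback.encode(charset)


EURO = "\u20ac"

print(render_currency_symbol(EURO, "utf-8"))        # b'\xe2\x82\xac'
print(render_currency_symbol(EURO, "iso-8859-15"))  # b'\xa4'
print(render_currency_symbol(EURO, "iso-8859-1"))   # b'EUR'
```

The same pattern would apply to a narrow space used as a group separator: where the target charset cannot represent it, some replacement has to be chosen.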

Of course I don't know what grouping separator is correct for Germany,
I'm only pointing out possible technical solutions.

Regards,

Rafal

* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-18  9:35     ` Florian Weimer
@ 2018-04-18 21:10       ` kdex
  2018-04-19  8:55         ` Florian Weimer
  0 siblings, 1 reply; 11+ messages in thread
From: kdex @ 2018-04-18 21:10 UTC (permalink / raw)
  To: libc-alpha

On Wednesday, April 18, 2018 11:35:31 AM CEST Florian Weimer wrote:
> On 04/18/2018 10:30 AM, kdex wrote:
> > While the Federal Ministry of Finance may be an interesting (or even
> > ironic) source to point out, it is in no way normative, and their website
> > is mostly subject to their team of web developers.
> 
> »Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
> und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
> ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«
> 
> <https://www.din.de/de/ueber-normen-und-standards/basiswissen>
> 
> And as that web page explains, DIN norms aren't normative, either.  Our
> users expect that the locales follow actual practice, not what some
> document says that they have never seen and nobody has read.  (For
> example, I can't easily tell whether the DIN-proposed keyboard layout
> for German provides a convenient way to enter the relevant space character.)
By the same argument, you could easily pitch abandoning all norms; most people 
will likely not have seen or read any norm in their lives; the majority just 
replicates what others do, or whatever Duden states.

So yes, it is true that there is no requirement to follow DIN norms, nor is 
there a requirement to follow Duden's word spellings; though I don't see how a 
seemingly arbitrary group separator with no normative grounds is any better. 
The purpose of a norm is to have a common system that everyone can follow. 
Doesn't deviating from the norm defy the very purpose of having norms in the 
first place?

I would be surprised if *everyone* using the es_MX locale would expect U+2009 
to be used as a group separator; in fact, I do imagine a lot of users being 
just as creative as the Germans when it comes to typing out grouping 
separators by hand.

Hence, the point is less that locale users need the ability to have U+2009 
mapped on their keyboards somewhere, but rather that users should be able to 
input regular numbers and rely on their software to use their system locale to 
figure out how their numbers should be displayed according to the current 
locale.
> 
> Wikipedia itself prefers ».« for numbers on (culturally) German pages:
> 
> <https://de.wikipedia.org/wiki/Wikipedia:Schreibweise_von_Zahlen#Zifferngruppierung>
> 
> You cited a Swiss web page (finanzen.ch), but the Swiss have slightly
> different typographical traditions which do not apply to de_DE.
I think it's best if we consider most URLs exchanged in this thread to be 
anecdotal evidence; this doesn't get us much further (but I do acknowledge 
your point).

Ideally, we should adhere to "official" guidelines. And as has been stated 
before; Duden is de-jure non-normative, but de-facto, it very much is.
> 
> As I said, some (culturally German) typesetters use spaces (of various
> widths), but their use is somewhat rare.
> 
> Thanks,
> Florian


* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-18 21:05     ` Rafal Luzynski
@ 2018-04-19  8:08       ` Florian Weimer
  0 siblings, 0 replies; 11+ messages in thread
From: Florian Weimer @ 2018-04-19  8:08 UTC (permalink / raw)
  To: Rafal Luzynski, libc-alpha, kdex

On 04/18/2018 11:05 PM, Rafal Luzynski wrote:
>>> These standards are simply not universally used. They aren't exactly
>>> wrong, either, because some typesetters actually use a (thin) space.
>>> It's just that adoption is poor.

> Florian, this is ambiguous: do you mean "not universally used, except for
> financial institutions" or "not universally used, even by financial
> institutions"?  Note that there is mon_thousands_sep (in LC_MONETARY)
> and thousands_sep (in LC_NUMERIC) so it is possible to set different
> thousands separators to format amounts of money and to format other numbers.

As far as I can tell, the dot as the thousands separator is used 
everywhere in Germany, not just by financial institutions or for 
currency amounts.  On the other hand, you can also find books which use 
a (thin) space as the separator (current works, outside cited parts), 
and some people use it in their personal correspondence as well.

Thanks,
Florian

* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-18 21:10       ` kdex
@ 2018-04-19  8:55         ` Florian Weimer
  2018-04-19 19:55           ` Carlos O'Donell
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-19  8:55 UTC (permalink / raw)
  To: libc-alpha

On 04/18/2018 11:10 PM, kdex wrote:
> On Wednesday, April 18, 2018 11:35:31 AM CEST Florian Weimer wrote:
>> On 04/18/2018 10:30 AM, kdex wrote:
>>> While the Federal Ministry of Finance may be an interesting (or even
>>> ironic) source to point out, it is in no way normative, and their website
>>> is mostly subject to their team of web developers.
>>
>> »Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
>> und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
>> ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«
>>
>> <https://www.din.de/de/ueber-normen-und-standards/basiswissen>
>>
>> And as that web page explains, DIN norms aren't normative, either.  Our
>> users expect that the locales follow actual practice, not what some
>> document says that they have never seen and nobody has read.  (For
>> example, I can't easily tell whether the DIN-proposed keyboard layout
>> for German provides a convenient way to enter the relevant space character.)

> By the same argument, you could easily pitch abandoning all norms; most people
> will likely not have seen or read any norm in their lives; the majority just
> replicates what others do, or whatever Duden states.

Yes, but when it comes to natural languages, it is our job (as glibc 
maintainers updating locale definitions) to document the existing 
practice, and not to try to change it.  For whatever reason, DIN 
standards are a poor guidance for maintaining locales, and so are the 
reference materials Duden publishes.

> So yes, it is true that there is no requirement to follow DIN norms, nor is
> there a requirement to follow Duden's word spellings; though I don't see how a
> seemingly arbitrary group separator with no normative grounds is any better.

I can't access the DIN process documents.  We would have to review why 
they rejected the dot separator when it was widely used, in some cases 
already when the standards were first created.  There has to be some 
rationale for the discrepancy.

Based on the publicly available information, the choice to make the dot 
or the space normative in this context appears to be totally arbitrary.

> The purpose of a norm is to have a common system that everyone can follow.
> Doesn't deviating from the norm defy the very purpose of having norms in the
> first place?

For norms in the area of natural language, the norms should document 
existing practice.  Everything else does not make sense and leads to 
poor adoption.  If there is no consensus, you can document multiple 
options (see prototype and non-prototype function definitions in C90, 
for an example in the area of programming languages, where you would 
expect that more rigidness would be appropriate, not less).

> Hence, the point is less that locale users need the ability to have U+2009
> mapped on their keyboards somewhere, but rather that users should be able to
> input regular numbers and rely on their software to use their system locale to
> figure out how their numbers should be displayed according to the current
> locale.

But that's not how people enter numbers in their word processor.

U+2009 also has the wrong line breaking property in the basic Unicode 
line breaking algorithm <https://unicode.org/reports/tr14/>, so it makes 
it quite hard for word processors to do the right thing even if the user 
manages to enter this character.

> Ideally, we should adhere to "official" guidelines. And as has been stated
> before; Duden is de-jure non-normative, but de-facto, it very much is.

That's a historical accident because a government body once referred to 
it as »maßgeblich in allen Zweifelsfällen« (“authoritative if there is 
any doubt”), but that referred to orthography and was before there was 
an official, government-issued list of spellings.  When the official 
word list was finally published in 1996, Duden lost any claims to 
authority (and apparently removed it from their marketing materials).

Regarding number formatting (mainly in typesetting), my 1996 edition of 
the short Duden volume suggests that it is at least partially 
descriptive (»hat sich eingebürgert«, “has become established”).  In this light, the omission of 
the dot as the separator (which was common at the time) looks like a 
mistake.  Despite a decade or more of widespread use of word processors, 
it only has guidelines for typewriting, and does not address the matter 
of breaking spaces in numbers that arise in word processors.

Thanks,
Florian

* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-19  8:55         ` Florian Weimer
@ 2018-04-19 19:55           ` Carlos O'Donell
  2018-04-19 20:40             ` Florian Weimer
  0 siblings, 1 reply; 11+ messages in thread
From: Carlos O'Donell @ 2018-04-19 19:55 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 04/19/2018 03:55 AM, Florian Weimer wrote:
> On 04/18/2018 11:10 PM, kdex wrote:
>> Hence, the point is less that locale users need the ability to have
>> U+2009 mapped on their keyboards somewhere, but rather that users
>> should be able to input regular numbers and rely on their software
>> to use their system locale to figure out how their numbers should
>> be displayed according to the current locale.
> 
> But that's not how people enter numbers in their word processor.
> 
> U+2009 also has the wrong line breaking property in the basic Unicode
> line breaking algorithm <https://unicode.org/reports/tr14/>, so it
> makes it quite hard for word processors to do the right thing even if
> the user manages to enter this character.

We use U+202F now.

There is some history here with regard to U+2009.

I reviewed the es_MX case for thousands_sep becoming U+2009, and I wrote
to the Mexican government, and reviewed the relevant standards and
cultural use cases, but I did *not* consider the impact on the ability for
users to type or the line-breaking aspects of the change (nor do I think
the standard covered these problems).

It seemed like U+2009 was the right choice, but this resulted in swbz#20756:
https://sourceware.org/bugzilla/show_bug.cgi?id=20756

And Mike Fabian fixed this after my review and we now use Narrow non-breaking
space (U+202F).

So you could use U+202F, but it seems like this is just wishful thinking on
the part of standards. Just like how Canada claims to follow ISO 8601 everywhere
and then doesn't.

I agree with the sentiment that these should be "trailing standards" that
define existing practice.

Where existing practice matches government standards, you are assured that
there has been consensus and can adjust the locales to match.

-- 
Cheers,
Carlos.

* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-19 19:55           ` Carlos O'Donell
@ 2018-04-19 20:40             ` Florian Weimer
  2018-04-20  1:38               ` Carlos O'Donell
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-19 20:40 UTC (permalink / raw)
  To: Carlos O'Donell, libc-alpha

On 04/19/2018 09:55 PM, Carlos O'Donell wrote:
> On 04/19/2018 03:55 AM, Florian Weimer wrote:
>> On 04/18/2018 11:10 PM, kdex wrote:
>>> Hence, the point is less that locale users need the ability to have
>>> U+2009 mapped on their keyboards somewhere, but rather that users
>>> should be able to input regular numbers and rely on their software
>>> to use their system locale to figure out how their numbers should
>>> be displayed according to the current locale.
>>
>> But that's not how people enter numbers in their word processor.
>>
>> U+2009 also has the wrong line breaking property in the basic Unicode
>> line breaking algorithm <https://unicode.org/reports/tr14/>, so it
>> makes it quite hard for word processors to do the right thing even if
>> the user manages to enter this character.
> 
> We use U+202F now.

Ahh, I knew that I was missing something.  Nice to know that Unicode has 
a narrow non-breaking space.  That seems to be completely appropriate 
if you want to use a narrow space in this context.

> There is some history here with regard to U+2009.
> 
> I reviewed the es_MX case for thousands_sep becoming U+2009, and I wrote
> to the Mexican government, and reviewed the relevant standards and
> cultural use cases, but I did *not* consider the impact on the ability for
> users to type or the line-breaking aspects of the change (nor do I think
> the standard covered these problems).

The matter of entering the character is not so important to glibc's use 
case, I think.  But it should matter for a standard with *word 
processor* usage guidelines.  If it requires using an 
impossible-to-enter character for conformance, then it looks like 
something went wrong during the standardization process.

Thanks,
Florian

* Re: de_DE has been using the wrong group separator for over 18 years
  2018-04-19 20:40             ` Florian Weimer
@ 2018-04-20  1:38               ` Carlos O'Donell
  0 siblings, 0 replies; 11+ messages in thread
From: Carlos O'Donell @ 2018-04-20  1:38 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 04/19/2018 03:40 PM, Florian Weimer wrote:
> On 04/19/2018 09:55 PM, Carlos O'Donell wrote:
>> On 04/19/2018 03:55 AM, Florian Weimer wrote:
>>> On 04/18/2018 11:10 PM, kdex wrote:
>>>> Hence, the point is less that locale users need the ability to
>>>> have U+2009 mapped on their keyboards somewhere, but rather
>>>> that users should be able to input regular numbers and rely on
>>>> their software to use their system locale to figure out how
>>>> their numbers should be displayed according to the current
>>>> locale.
>>> 
>>> But that's not how people enter numbers in their word processor.
>>> 
>>> U+2009 also has the wrong line breaking property in the basic
>>> Unicode line breaking algorithm
>>> <https://unicode.org/reports/tr14/>, so it makes it quite hard
>>> for word processors to do the right thing even if the user
>>> manages to enter this character.
>> 
>> We use U+202F now.
> 
> Ahh, I knew that I was missing something.  Nice to know that Unicode
> has a narrow non-breaking space.  That seems to be completely
> appropriate if you want to use a narrow space in this context.

Exactly. So our locale progression has been:

U+0020 -> U+2009 -> U+202F

As we became more knowledgeable with Unicode, but that still doesn't
mean it should be chosen if it's not common practice.
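
For reference, the three characters in that progression can be inspected with Python's `unicodedata` module. The standard library does not expose the UAX #14 Line_Break class, so this sketch uses the `<noBreak>` compatibility decomposition tag as a stand-in; that is an assumption of the example, not the line breaking algorithm itself. The helper names are hypothetical.

```python
import unicodedata

CANDIDATES = ["\u0020", "\u2009", "\u202f"]


def is_no_break(ch: str) -> bool:
    # Characters such as U+00A0 and U+202F decompose to "<noBreak> 0020".
    return unicodedata.decomposition(ch).startswith("<noBreak>")


def describe(ch: str) -> tuple:
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), is_no_break(ch))


for ch in CANDIDATES:
    print(describe(ch))
# ('U+0020', 'SPACE', False)
# ('U+2009', 'THIN SPACE', False)
# ('U+202F', 'NARROW NO-BREAK SPACE', True)
```

Only the last of the three is marked as non-breaking, which matches the reason given for moving to U+202F.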

>> There is some history here with regard to U+2009.
>> 
>> I reviewed the es_MX case for thousands_sep becoming U+2009, and I
>> wrote to the Mexican government, and reviewed the relevant
>> standards and cultural use cases, but I did *not* consider the
>> impact on the ability for users to type or the line-breaking
>> aspects of the change (nor do I think the standard covered these
>> problems).
> 
> The matter of entering the character is not so important to glibc's
> use case, I think.  But it should matter for a standard with *word
> processor* usage guidelines.  If it requires using an
> impossible-to-enter character for conformance, then it looks like
> something went wrong during the standardization process.

Agreed.

I would expect a word processor to automatically format my numbers
given the locale. However, while writing by hand it can be a bit hard
to detect exactly when to do that, but not impossible.

In LibreOffice it's a slightly long "ctrl+shift+U 202f enter", but
it's not impossible, nor overly difficult, and you can script it with
a quick auto-correct e.g. :nbs: => U+202F, or some other kind of
automation.
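
As a sketch of the "some other kind of automation" idea, the following Python snippet expands a typed shortcut into U+202F. It does not use any LibreOffice API; the function `autocorrect` and the default shortcut are hypothetical and simply mirror the `:nbs:` example above.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def autocorrect(text: str, shortcut: str = ":nbs:", replacement: str = NNBSP) -> str:
    """Replace every occurrence of the typed shortcut with the replacement."""
    return text.replace(shortcut, replacement)


typed = "mehr als 52:nbs:000 Arbeitgeber"
print(repr(autocorrect(typed)))  # repr() makes the U+202F visible
```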

-- 
Cheers,
Carlos.
