* de_DE has been using the wrong group separator for over 18 years
@ 2018-04-17 22:25 kdex
2018-04-18 7:14 ` Florian Weimer
0 siblings, 1 reply; 11+ messages in thread
From: kdex @ 2018-04-17 22:25 UTC (permalink / raw)
To: libc-alpha
To give some context: I have previously posted the following on libc-locales
and was asked to bring this to the attention of senior developers on this
list who speak German.
I have noticed that the locale `de_DE` has erroneously been using a full stop
(U+002E) for the thousands (group) separator in `mon_thousands_sep` and
`thousands_sep` ever since 2000. The usage of a full stop to group thousands
(to my knowledge) never been standardized.
As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
been a thin space (U+2009).
In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
thousands, and DIN EN ISO 80000 explicitly excludes all other characters than
a thin space.
Has anyone noticed this before? I fear that this change might break a lot of
code that relies on the separator being wrong. Yet, this really should be
fixed…
What's the best way to deal with this?
For further information, please also refer to the relevant section on
Wikipedia at [1] (German).
[1] https://de.wikipedia.org/wiki/Zifferngruppierung#Zur_Problematik_von_Punkt_und_Komma_f%C3%BCr_Tausender-_und_Dezimaltrennzeichen
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-17 22:25 de_DE has been using the wrong group separator for over 18 years kdex
@ 2018-04-18 7:14 ` Florian Weimer
2018-04-18 8:31 ` kdex
0 siblings, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-18 7:14 UTC (permalink / raw)
To: kdex, libc-alpha
On 04/18/2018 12:24 AM, kdex wrote:
> To give some context: I have previously posted the following on libc-locales
> and was asked to bring this to the attention of senior developers on this
> list who speak German.
>
> I have noticed that the locale `de_DE` has erroneously been using a full stop
> (U+002E) for the thousands (group) separator in `mon_thousands_sep` and
> `thousands_sep` ever since 2000. The usage of a full stop to group thousands
> (to my knowledge) never been standardized.
>
> As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> been a thin space (U+2009).
>
> In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> thousands, and DIN EN ISO 80000 explicitly excludes all other characters than
> a thin space.
These standards are simply not universally used. They aren't exactly
wrong, either, because some typesetters actually use a (thin) space.
It's just that adoption is poor.
U+002E is perfectly acceptable and widely used, especially if U+2009 is
not available (and U+0020 risks introducing a line break). Here's a
recent example:
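»Die Finanzkontrolle Schwarzarbeit überprüfte im Jahr 2017 mehr als
52.000 Arbeitgeber und leitete fast 108.000 Strafverfahren ein. Die
Anzahl der eingeleiteten Ermittlungsverfahren wegen der Nichtgewährung
des gesetzlichen Mindestlohns nach dem Mindestlohngesetz stieg auf 2.522
Verfahren (2016: 1.651; 2015: 705).«
[English, number formatting kept as in the original: "In 2017, the
Financial Control of Undeclared Work unit inspected more than 52.000
employers and initiated almost 108.000 criminal proceedings. The number
of investigations opened for failure to pay the statutory minimum wage
under the Minimum Wage Act rose to 2.522 (2016: 1.651; 2015: 705)."]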
<https://www.bundesfinanzministerium.de/Content/DE/Pressemitteilungen/Finanzpolitik/2018/04/2018-04-17-ZJPK.html>
(Also look at the date at the top of the page; it doesn't follow DIN ISO
8601, either.)
I don't think the locales need to change. Using characters from the
ASCII range for printing numbers has its advantages.
Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-18 7:14 ` Florian Weimer
@ 2018-04-18 8:31 ` kdex
2018-04-18 9:35 ` Florian Weimer
2018-04-18 21:05 ` Rafal Luzynski
0 siblings, 2 replies; 11+ messages in thread
From: kdex @ 2018-04-18 8:31 UTC (permalink / raw)
To: libc-alpha
On Wednesday, April 18, 2018 9:14:45 AM CEST Florian Weimer wrote:
> On 04/18/2018 12:24 AM, kdex wrote:
> > To give some context: I have previously posted the following on
> > libc-locales and was asked to bring this to the attention of senior
> > developers on this list who speak German.
> >
> > I have noticed that the locale `de_DE` has erroneously been using a full
> > stop (U+002E) for the thousands (group) separator in `mon_thousands_sep`
> > and `thousands_sep` ever since 2000. The usage of a full stop to group
> > thousands has (to my knowledge) never been standardized.
> >
> > As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> > been a thin space (U+2009).
> >
> > In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> > thousands, and DIN EN ISO 80000 explicitly excludes all other characters
> > than a thin space.
>
> These standards are simply not universally used. They aren't exactly
> wrong, either, because some typesetters actually use a (thin) space.
> It's just that adoption is poor.
>
> U+002E is perfectly acceptable and widely used, especially if U+2009 is
> not available (and U+0020 risks introducing a line break). Here's a
> recent example:
>
> »Die Finanzkontrolle Schwarzarbeit überprüfte im Jahr 2017 mehr als
> 52.000 Arbeitgeber und leitete fast 108.000 Strafverfahren ein. Die
> Anzahl der eingeleiteten Ermittlungsverfahren wegen der Nichtgewährung
> des gesetzlichen Mindestlohns nach dem Mindestlohngesetz stieg auf 2.522
> Verfahren (2016: 1.651; 2015: 705).«
>
> <https://www.bundesfinanzministerium.de/Content/DE/Pressemitteilungen/Finanzpolitik/2018/04/2018-04-17-ZJPK.html>
While the Federal Ministry of Finance may be an interesting (or even ironic)
source to point out, it is in no way normative, and their website is mostly
subject to their team of web developers.
Note that according to DIN 5008, amounts of money should, for security
purposes, indeed be grouped with periods, so leaving `mon_thousands_sep` as-is
would still allow for standards-compliance. Amounts of money are also covered
by the three norms I've brought up before:
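[…] Aus diesem Grund sehen Normen die Verwendung eines Leerzeichens als
Tausendertrennzeichen vor (DIN 1333, DIN 5008 und ISO 80000). Dabei wird ein
schmales Leerzeichen empfohlen, falls dieses technisch verfügbar ist. Eine
Ausnahme bilden Geldbeträge, die aus Sicherheitsgründen mit dem Leerzeichen,
das mindestens die Breite einer der Ziffern hat, oder einem Trennzeichen (wie
dem Punkt) getrennt werden können. [2]
[English: "For this reason, standards prescribe a space as the thousands
separator (DIN 1333, DIN 5008 and ISO 80000). A narrow space is recommended
where it is technically available. Amounts of money are an exception: for
security reasons they may be grouped with a space at least as wide as a
digit, or with a separator character (such as the full stop)."]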
>
> (Also look at the date at the top of the page—it doesn't follow DIN ISO
> 8601, either.)
That's unfortunate; but the paragraph about normativity above would apply
here, too.
Duden, the German reference work on these matters (generally considered
relatively normative among Germans), adheres to the ISO norms as well [1],
which should speak for itself.
It's simple enough to find instances of German articles about finances that
try to use spaces (admittedly the wrong ones) as separators as well, see [3].
>
> I don't think the locales need to change. Using characters from the
> ASCII range for printing numbers has its advantages.
I don't think this premise is correct: In de_DE, amounts of money include
`currency_symbol` (U+20AC), which is not in the ASCII range. ps_AF uses U+066C
for `thousands_sep`, fa_IR uses U+002C and es_MX even uses U+2009 (the very
same character that this thread is about). None of these are in the ASCII
range; so why should we treat de_DE like a special case? It's much easier to
be standards-compliant here and get used to the fact that numbers do generally
contain non-ASCII characters that a parser could just skip over.
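As an illustration of the "parser that skips over separators" idea, here is a
minimal Python sketch. The function name `parse_grouped_int` and the set of
accepted separators are assumptions made for this example; it is not how glibc
parses numbers, and it only handles integers (in de_DE a full stop inside a
number is a group separator, since the decimal point is a comma).

```python
# Hypothetical set of group separators a lenient integer parser might accept.
GROUP_SEPARATORS = {
    "\u002E",  # FULL STOP (the de_DE thousands_sep under discussion)
    "\u0020",  # SPACE
    "\u2009",  # THIN SPACE
    "\u202F",  # NARROW NO-BREAK SPACE
    "\u066C",  # ARABIC THOUSANDS SEPARATOR
}


def parse_grouped_int(text: str) -> int:
    """Parse an integer, dropping any known group separator first."""
    digits = "".join(ch for ch in text if ch not in GROUP_SEPARATORS)
    return int(digits)


print(parse_grouped_int("108.000"))
print(parse_grouped_int("108\u2009000"))
```

Both calls yield the same integer, so code written this way would not care
which of the listed separators a locale emits.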
>
> Thanks,
> Florian
[1] https://www.duden.de/sprachwissen/rechtschreibregeln/zahlen-und-ziffern
[2] https://de.wikipedia.org/wiki/Zifferngruppierung#Zur_Problematik_von_Punkt_und_Komma_f%C3%BCr_Tausender-_und_Dezimaltrennzeichen
[3] https://www.finanzen.ch/nachrichten/aktien/Aktien-Frankfurt-Eroeffnung-Dax-schiebt-sich-wieder-ueber-12-600-Punkte-1021483710
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-18 8:31 ` kdex
@ 2018-04-18 9:35 ` Florian Weimer
2018-04-18 21:10 ` kdex
2018-04-18 21:05 ` Rafal Luzynski
1 sibling, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-18 9:35 UTC (permalink / raw)
To: kdex, libc-alpha
On 04/18/2018 10:30 AM, kdex wrote:
> While the Federal Ministry of Finance may be an interesting (or even ironic)
> source to point out, it is in no way normative, and their website is mostly
> subject to their team of web developers.
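»Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«
[English: "Around 32.000 experts from industry and research, from consumer
groups and from the public sector contribute their expertise to the
standardization process, which DIN steers as a privately organized project
manager."]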
<https://www.din.de/de/ueber-normen-und-standards/basiswissen>
And as that web page explains, DIN norms aren't normative, either. Our
users expect that the locales follow actual practice, not what some
document says that they have never seen and nobody has read. (For
example, I can't easily tell whether the DIN-proposed keyboard layout
for German provides a convenient way to enter the relevant space character.)
Wikipedia itself prefers ».« for numbers on (culturally) German pages:
<https://de.wikipedia.org/wiki/Wikipedia:Schreibweise_von_Zahlen#Zifferngruppierung>
You cited a Swiss web page (finanzen.ch), but the Swiss have slightly
different typographical traditions which do not apply to de_DE.
As I said, some (culturally German) typesetters use spaces (of various
widths), but their use is somewhat rare.
Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-18 8:31 ` kdex
2018-04-18 9:35 ` Florian Weimer
@ 2018-04-18 21:05 ` Rafal Luzynski
2018-04-19 8:08 ` Florian Weimer
1 sibling, 1 reply; 11+ messages in thread
From: Rafal Luzynski @ 2018-04-18 21:05 UTC (permalink / raw)
To: libc-alpha, kdex
18.04.2018 10:30 kdex <kdex@kdex.de> wrote:
>
> On Wednesday, April 18, 2018 9:14:45 AM CEST Florian Weimer wrote:
> > On 04/18/2018 12:24 AM, kdex wrote:
> > > To give some context: I have previously posted the following on
> > > libc-locales and was asked to bring this to the attention of senior
> > > developers on this list who speak German.
> > >
> > > I have noticed that the locale `de_DE` has erroneously been using a full
> > > stop (U+002E) for the thousands (group) separator in `mon_thousands_sep`
> > > and `thousands_sep` ever since 2000. The usage of a full stop to group
> > > thousands has (to my knowledge) never been standardized.
> > >
> > > As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> > > been a thin space (U+2009).
> > >
> > > In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> > > thousands, and DIN EN ISO 80000 explicitly excludes all other characters
> > > than a thin space.
> >
> > These standards are simply not universally used. They aren't exactly
> > wrong, either, because some typesetters actually use a (thin) space.
> > It's just that adoption is poor.
Florian, this is ambiguous: do you mean "not universally used, except for
financial institutions" or "not universally used, even by financial
institutions"? Note that there is mon_thousands_sep (in LC_MONETARY)
and thousands_sep (in LC_NUMERIC) so it is possible to set different
thousands separators to format amounts of money and to format other numbers.
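The split between the two fields can be sketched in Python as follows. The
helper `group_digits` and the two constants are hypothetical names for this
example, and the separator values are assumptions chosen to show the
mechanism, not a proposal for the actual de_DE data.

```python
def group_digits(value: int, sep: str) -> str:
    """Group the decimal digits of value in threes, using sep between groups."""
    return f"{value:,}".replace(",", sep)


# Assumed values, mirroring LC_NUMERIC thousands_sep and
# LC_MONETARY mon_thousands_sep being set independently.
THOUSANDS_SEP = "\u202F"   # assumption: narrow no-break space for plain numbers
MON_THOUSANDS_SEP = "."    # assumption: full stop kept for amounts of money

plain = group_digits(108000, THOUSANDS_SEP)
money = group_digits(108000, MON_THOUSANDS_SEP) + " EUR"
print(repr(plain), repr(money))
```

Keeping the two separators in separate variables is the whole point: changing
one does not affect how the other category is formatted.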
> [...]
> > I don't think the locales need to change. Using characters from the
> > ASCII range for printing numbers has its advantages.
> I don't think this premise is correct: In de_DE, amounts of money include
> `currency_symbol` (U+20AC), which is not in the ASCII range. [...]
This is a euro sign (€) and it is displayed as it is in the locales
implementing Unicode (e.g., de_DE.UTF-8). In de_DE.ISO-8859-15@euro
it is converted to the 0xa4 character, which is again a euro sign in
ISO 8859-15. In de_DE.ISO-8859-1 it is converted to the "EUR" string.
So in every charset it is displayed correctly. My point is that it
is safe to use sophisticated Unicode characters (like narrow space etc.)
in the locale data source code and assume that localedef handles it
smartly in every charset.
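The per-charset behaviour described above can be imitated with Python's
codecs. This is only a sketch: `encode_currency` and its "EUR" fallback are
hand-written stand-ins for this example and do not reproduce how localedef
actually performs the conversion.

```python
EURO = "\u20AC"  # EURO SIGN


def encode_currency(symbol: str, charset: str, fallback: str = "EUR") -> bytes:
    """Encode symbol in charset; use an ASCII fallback if it has no mapping."""
    try:
        return symbol.encode(charset)
    except UnicodeEncodeError:
        # The target charset has no euro sign (e.g. ISO 8859-1).
        return fallback.encode(charset)


for charset in ("utf-8", "iso-8859-15", "iso-8859-1"):
    print(charset, encode_currency(EURO, charset))
```

ISO 8859-15 maps the euro sign to the single byte 0xa4, while ISO 8859-1 has
no euro sign at all, so the sketch falls back to the three ASCII letters.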
Of course I don't know what grouping separator is correct for Germany,
I'm only pointing out possible technical solutions.
Regards,
Rafal
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-18 9:35 ` Florian Weimer
@ 2018-04-18 21:10 ` kdex
2018-04-19 8:55 ` Florian Weimer
0 siblings, 1 reply; 11+ messages in thread
From: kdex @ 2018-04-18 21:10 UTC (permalink / raw)
To: libc-alpha
On Wednesday, April 18, 2018 11:35:31 AM CEST Florian Weimer wrote:
> On 04/18/2018 10:30 AM, kdex wrote:
> > While the Federal Ministry of Finance may be an interesting (or even
> > ironic) source to point out, it is in no way normative, and their website
> > is mostly subject to their team of web developers.
>
> »Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
> und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
> ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«
>
> <https://www.din.de/de/ueber-normen-und-standards/basiswissen>
>
> And as that web page explains, DIN norms aren't normative, either. Our
> users expect that the locales follow actual practice, not what some
> document says that they have never seen and nobody has read. (For
> example, I can't easily tell whether the DIN-proposed keyboard layout
> for German provides a convenient way to enter the relevant space character.)
By the same argument, you could easily pitch abandoning all norms; most people
will likely not have seen or read any norm in their lives; the majority just
replicates what others do, or whatever Duden states.
So yes, it is true that there is no requirement to follow DIN norms, nor is
there a requirement to follow Duden's word spellings; though I don't see how a
seemingly arbitrary group separator with no normative grounds is any better.
The purpose of a norm is to have a common system that everyone can follow.
Doesn't deviating from the norm defy the very purpose of having norms in the
first place?
I would be surprised if *everyone* using the es_MX locale would expect U+2009
to be used as a group separator; in fact, I do imagine a lot of users being
just as creative as the Germans when it comes to typing out grouping
separators by hand.
Hence, the point is less that locale users need the ability to have U+2009
mapped on their keyboards somewhere, but rather that users should be able to
input regular numbers and rely on their software to use their system locale to
figure out how their numbers should be displayed according to the current
locale.
>
> Wikipedia itself prefers ».« for numbers on (culturally) German pages:
>
> <https://de.wikipedia.org/wiki/Wikipedia:Schreibweise_von_Zahlen#Zifferngruppierung>
>
> You cited a Swiss web page (finanzen.ch), but the Swiss have slightly
> different typographical traditions which do not apply to de_DE.
I think it's best if we consider most URLs exchanged in this thread to be
anecdotal evidence; this doesn't get us much further (but I do acknowledge
your point).
Ideally, we should adhere to "official" guidelines. And as has been stated
before, Duden is de jure non-normative, but de facto it very much is.
>
> As I said, some (culturally German) typesetters use spaces (of various
> widths), but their use is somewhat rare.
>
> Thanks,
> Florian
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-18 21:05 ` Rafal Luzynski
@ 2018-04-19 8:08 ` Florian Weimer
0 siblings, 0 replies; 11+ messages in thread
From: Florian Weimer @ 2018-04-19 8:08 UTC (permalink / raw)
To: Rafal Luzynski, libc-alpha, kdex
On 04/18/2018 11:05 PM, Rafal Luzynski wrote:
>>> These standards are simply not universally used. They aren't exactly
>>> wrong, either, because some typesetters actually use a (thin) space.
>>> It's just that adoption is poor.
> Florian, this is ambiguous: do you mean "not universally used, except for
> financial institutions" or "not universally used, even by financial
> institutions"? Note that there is mon_thousands_sep (in LC_MONETARY)
> and thousands_sep (in LC_NUMERIC) so it is possible to set different
> thousands separators to format amounts of money and to format other numbers.
As far as I can tell, the dot as the thousands separator is used
everywhere in Germany, not just by financial institutions or for
currency amounts. On the other hand, you can also find books which use
a (thin) space as the separator (current works, outside cited parts),
and some people use it in their personal correspondence as well.
Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-18 21:10 ` kdex
@ 2018-04-19 8:55 ` Florian Weimer
2018-04-19 19:55 ` Carlos O'Donell
0 siblings, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-19 8:55 UTC (permalink / raw)
To: libc-alpha
On 04/18/2018 11:10 PM, kdex wrote:
> On Wednesday, April 18, 2018 11:35:31 AM CEST Florian Weimer wrote:
>> On 04/18/2018 10:30 AM, kdex wrote:
>>> While the Federal Ministry of Finance may be an interesting (or even
>>> ironic) source to point out, it is in no way normative, and their website
>>> is mostly subject to their team of web developers.
>>
>> »Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
>> und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
>> ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«
>>
>> <https://www.din.de/de/ueber-normen-und-standards/basiswissen>
>>
>> And as that web page explains, DIN norms aren't normative, either. Our
>> users expect that the locales follow actual practice, not what some
>> document says that they have never seen and nobody has read. (For
>> example, I can't easily tell whether the DIN-proposed keyboard layout
>> for German provides a convenient way to enter the relevant space character.)
> By the same argument, you could easily pitch abandoning all norms; most people
> will likely not have seen or read any norm in their lives; the majority just
> replicates what others do, or whatever Duden states.
Yes, but when it comes to natural languages, it is our job (as glibc
maintainers updating locale definitions) to document the existing
practice, and not to try to change it. For whatever reason, DIN
standards are a poor guidance for maintaining locales, and so are the
reference materials Duden publishes.
> So yes, it is true that there is no requirement to follow DIN norms, nor is
> there a requirement to follow Duden's word spellings; though I don't see how a
> seemingly arbitrary group separator with no normative grounds is any better.
I can't access the DIN process documents. We would have to review why
they rejected the dot separator when it was in wide use around the time
the standards were first created. There has to be some rationale for the
discrepancy.
Based on the publicly available information, the choice to make the dot
or the space normative in this context appears to be totally arbitrary.
> The purpose of a norm is to have a common system that everyone can follow.
> Doesn't deviating from the norm defy the very purpose of having norms in the
> first place?
For norms in the area of natural language, the norms should document
existing practice. Everything else does not make sense and leads to
poor adoption. If there is no consensus, you can document multiple
options (see prototype and non-prototype function definitions in C90,
for an example in the area of programming languages, where you would
expect that more rigidness would be appropriate, not less).
> Hence, the point is less that locale users need the ability to have U+2009
> mapped on their keyboards somewhere, but rather that users should be able to
> input regular numbers and rely on their software to use their system locale to
> figure out how their numbers should be displayed according to the current
> locale.
But that's not how people enter numbers in their word processor.
U+2009 also has the wrong line breaking property in the basic Unicode
line breaking algorithm <https://unicode.org/reports/tr14/>, so it makes
it quite hard for word processors to do the right thing even if the user
manages to enter this character.
> Ideally, we should adhere to "official" guidelines. And as has been stated
> before; Duden is de-jure non-normative, but de-facto, it very much is.
That's a historical accident because a government body once referred to
it as »maßgeblich in allen Zweifelsfällen« (“authoritative if there is
any doubt”), but that referred to orthography and was before there was
an official, government-issued list of spellings. When the official
word list was finally published in 1996, Duden lost any claims to
authority (and apparently removed it from their marketing materials).
Regarding number formatting (mainly in typesetting), my 1996 edition of
the short Duden volume suggests that it is at least partially
descriptive (»hat sich eingebürgert«). In this light, the omission of
the dot as the separator (which was common at the time) looks like a
mistake. Despite a decade or more of widespread use of word processors,
it only has guidelines for typewriting, and does not address the matter
of breaking spaces in numbers that arise in word processors.
Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-19 8:55 ` Florian Weimer
@ 2018-04-19 19:55 ` Carlos O'Donell
2018-04-19 20:40 ` Florian Weimer
0 siblings, 1 reply; 11+ messages in thread
From: Carlos O'Donell @ 2018-04-19 19:55 UTC (permalink / raw)
To: Florian Weimer, libc-alpha
On 04/19/2018 03:55 AM, Florian Weimer wrote:
> On 04/18/2018 11:10 PM, kdex wrote:
>> Hence, the point is less that locale users need the ability to have
>> U+2009 mapped on their keyboards somewhere, but rather that users
>> should be able to input regular numbers and rely on their software
>> to use their system locale to figure out how their numbers should
>> be displayed according to the current locale.
>
> But that's not how people enter numbers in their word processor.
>
> U+2009 also has the wrong line breaking property in the basic Unicode
> line breaking algorithm <https://unicode.org/reports/tr14/>, so it
> makes it quite hard for word processors to do the right thing even if
> the user manages to enter this character.
We use U+202F now.
There is some history here with regard to U+2009.
I reviewed the es_MX case for thousands_sep becoming U+2009, and I wrote
to the Mexican government, and reviewed the relevant standards and
cultural use cases, but I did *not* consider the impact on the ability for
users to type or the line-breaking aspects of the change (nor do I think
the standard covered these problems).
It seemed like U+2009 was the right choice, but this resulted in swbz#20756:
https://sourceware.org/bugzilla/show_bug.cgi?id=20756
And Mike Fabian fixed this after my review and we now use Narrow non-breaking
space (U+202F).
So you could use U+202F, but it seems like this is just wishful thinking on
the part of standards. Just like how Canada claims to follow ISO 8601 everywhere
and then doesn't.
I agree with the sentiment that these should be "trailing standards" that
define existing practice.
Where existing practice matches government standards, you are assured that
there has been consensus and can adjust the locales to match.
--
Cheers,
Carlos.
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-19 19:55 ` Carlos O'Donell
@ 2018-04-19 20:40 ` Florian Weimer
2018-04-20 1:38 ` Carlos O'Donell
0 siblings, 1 reply; 11+ messages in thread
From: Florian Weimer @ 2018-04-19 20:40 UTC (permalink / raw)
To: Carlos O'Donell, libc-alpha
On 04/19/2018 09:55 PM, Carlos O'Donell wrote:
> On 04/19/2018 03:55 AM, Florian Weimer wrote:
>> On 04/18/2018 11:10 PM, kdex wrote:
>>> Hence, the point is less that locale users need the ability to have
>>> U+2009 mapped on their keyboards somewhere, but rather that users
>>> should be able to input regular numbers and rely on their software
>>> to use their system locale to figure out how their numbers should
>>> be displayed according to the current locale.
>>
>> But that's not how people enter numbers in their word processor.
>>
>> U+2009 also has the wrong line breaking property in the basic Unicode
>> line breaking algorithm <https://unicode.org/reports/tr14/>, so it
>> makes it quite hard for word processors to do the right thing even if
>> the user manages to enter this character.
>
> We use U+202F now.
Ahh, I knew that I was missing something. Nice to know that Unicode has
a narrow non-breaking space. That seems to be completely appropriate
if you want to use a narrow space in this context.
> There is some history here with regard to U+2009.
>
> I reviewed the es_MX case for thousands_sep becoming U+2009, and I wrote
> to the Mexican government, and reviewed the relevant standards and
> cultural use cases, but I did *not* consider the impact on the ability for
> users to type or the line-breaking aspects of the change (nor do I think
> the standard covered these problems).
The matter of entering the character is not so important to glibc's use
case, I think. But it should matter for a standard with *word
processor* usage guidelines. If it requires using an
impossible-to-enter character for conformance, then it looks like
something went wrong during the standardization process.
Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
2018-04-19 20:40 ` Florian Weimer
@ 2018-04-20 1:38 ` Carlos O'Donell
0 siblings, 0 replies; 11+ messages in thread
From: Carlos O'Donell @ 2018-04-20 1:38 UTC (permalink / raw)
To: Florian Weimer, libc-alpha
On 04/19/2018 03:40 PM, Florian Weimer wrote:
> On 04/19/2018 09:55 PM, Carlos O'Donell wrote:
>> On 04/19/2018 03:55 AM, Florian Weimer wrote:
>>> On 04/18/2018 11:10 PM, kdex wrote:
>>>> Hence, the point is less that locale users need the ability to
>>>> have U+2009 mapped on their keyboards somewhere, but rather
>>>> that users should be able to input regular numbers and rely on
>>>> their software to use their system locale to figure out how
>>>> their numbers should be displayed according to the current
>>>> locale.
>>>
>>> But that's not how people enter numbers in their word processor.
>>>
>>> U+2009 also has the wrong line breaking property in the basic
>>> Unicode line breaking algorithm
>>> <https://unicode.org/reports/tr14/>, so it makes it quite hard
>>> for word processors to do the right thing even if the user
>>> manages to enter this character.
>>
>> We use U+202F now.
>
> Ahh, I knew that I was missing something. Nice to know that Unicode
> has a narrow non-breaking space. That seems to be completely
> appropriate if you want to use a narrow space in this context.
Exactly. So our locale progression has been:
U+0020 -> U+2009 -> U+202F
That happened as we became more knowledgeable about Unicode, but it still
doesn't mean U+202F should be chosen if it's not common practice.
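The difference between the three code points can be inspected with Python's
`unicodedata` module. A caveat: the standard library does not expose the
UAX #14 Line_Break property itself, so this sketch uses the `<noBreak>`
compatibility decomposition tag as a rough proxy. The helper name
`is_no_break_space` is an assumption for this example.

```python
import unicodedata


def is_no_break_space(ch: str) -> bool:
    """Rough proxy: a space separator whose decomposition is tagged <noBreak>."""
    return (
        unicodedata.category(ch) == "Zs"
        and unicodedata.decomposition(ch).startswith("<noBreak>")
    )


# The progression mentioned above: U+0020 -> U+2009 -> U+202F.
for cp in (0x0020, 0x2009, 0x202F):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), is_no_break_space(ch))
```

Only the last of the three is flagged as non-breaking, which matches the
reason given for moving away from U+2009.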
>> There is some history here with regard to U+2009.
>>
>> I reviewed the es_MX case for thousands_sep becoming U+2009, and I
>> wrote to the Mexican government, and reviewed the relevant
>> standards and cultural use cases, but I did *not* consider the
>> impact on the ability for users to type or the line-breaking
>> aspects of the change (nor do I think the standard covered these
>> problems).
>
> The matter of entering the character is not so important to glibc's
> use case, I think. But it should matter for a standard with *word
> processor* usage guidelines. If it requires using an
> impossible-to-enter character for conformance, then it looks like
> something went wrong during the standardization process.
Agreed.
I would expect a word processor to automatically format my numbers
given the locale. However, while writing by hand it can be a bit hard
to detect exactly when to do that, but not impossible.
In LibreOffice it's a slightly long "ctrl+shift+U 202f enter", but
it's not impossible, nor overly difficult, and you can script it with
a quick auto-correct e.g. :nbs: => U+202F, or some other kind of
automation.
--
Cheers,
Carlos.