* de_DE has been using the wrong group separator for over 18 years
From: kdex @ 2018-04-17 22:25 UTC
To: libc-alpha

To give some context: I have previously posted the following on libc-locales
and was asked to bring this to the attention of senior developers on this
list who speak German.

I have noticed that the locale `de_DE` has erroneously been using a full stop
(U+002E) for the thousands (group) separator in `mon_thousands_sep` and
`thousands_sep` ever since 2000. The usage of a full stop to group thousands
has (to my knowledge) never been standardized.

As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
been a thin space (U+2009).

In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
thousands, and DIN EN ISO 80000 explicitly excludes all characters other than
a thin space.

Has anyone noticed this before? I fear that this change might break a lot of
code that relies on the separator being wrong. Yet, this really should be
fixed…

What's the best way to deal with this?

For further information, please also refer to the relevant section on
Wikipedia at [1] (German).

[1] https://de.wikipedia.org/wiki/Zifferngruppierung#Zur_Problematik_von_Punkt_und_Komma_f%C3%BCr_Tausender-_und_Dezimaltrennzeichen
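To check what a given system actually reports for the two fields named above, a sketch along these lines may help. It assumes a locale called `de_DE.UTF-8` is installed; that name and the helper `separators_for` are illustrative assumptions, and the helper returns None rather than failing when the locale is missing.

```python
import locale


def separators_for(locale_name):
    """Return (thousands_sep, mon_thousands_sep) for locale_name,
    or None if that locale is not installed on this system.

    Hypothetical helper for illustration only.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
    except locale.Error:
        return None
    try:
        conv = locale.localeconv()
        return conv["thousands_sep"], conv["mon_thousands_sep"]
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the old setting


result = separators_for("de_DE.UTF-8")  # assumed locale name
if result is None:
    print("de_DE.UTF-8 is not installed here")
else:
    for field, sep in zip(("thousands_sep", "mon_thousands_sep"), result):
        # Show code points, since thin spaces are hard to tell apart by eye.
        print(field, [f"U+{ord(ch):04X}" for ch in sep])
```

Printing code points rather than the characters themselves makes it possible to distinguish U+002E, U+2009 and U+202F in the output.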
* Re: de_DE has been using the wrong group separator for over 18 years
From: Florian Weimer @ 2018-04-18 7:14 UTC
To: kdex, libc-alpha

On 04/18/2018 12:24 AM, kdex wrote:
> To give some context: I have previously posted the following on libc-locales
> and was asked to bring this to the attention of senior developers on this
> list who speak German.
>
> I have noticed that the locale `de_DE` has erroneously been using a full stop
> (U+002E) for the thousands (group) separator in `mon_thousands_sep` and
> `thousands_sep` ever since 2000. The usage of a full stop to group thousands
> has (to my knowledge) never been standardized.
>
> As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> been a thin space (U+2009).
>
> In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> thousands, and DIN EN ISO 80000 explicitly excludes all characters other than
> a thin space.

These standards are simply not universally used. They aren't exactly
wrong, either, because some typesetters actually use a (thin) space.
It's just that adoption is poor.

U+002E is perfectly acceptable and widely used, especially if U+2009 is
not available (and U+0020 risks introducing a line break). Here's a
recent example:

»Die Finanzkontrolle Schwarzarbeit überprüfte im Jahr 2017 mehr als
52.000 Arbeitgeber und leitete fast 108.000 Strafverfahren ein. Die
Anzahl der eingeleiteten Ermittlungsverfahren wegen der Nichtgewährung
des gesetzlichen Mindestlohns nach dem Mindestlohngesetz stieg auf 2.522
Verfahren (2016: 1.651; 2015: 705).«

<https://www.bundesfinanzministerium.de/Content/DE/Pressemitteilungen/Finanzpolitik/2018/04/2018-04-17-ZJPK.html>

[English: "In 2017, the Finanzkontrolle Schwarzarbeit inspected more than
52,000 employers and initiated almost 108,000 criminal proceedings. The
number of investigations opened for failure to pay the statutory minimum
wage under the Mindestlohngesetz rose to 2,522 (2016: 1,651; 2015: 705)."]

(Also look at the date at the top of the page -- it doesn't follow DIN ISO
8601, either.)

I don't think the locales need to change. Using characters from the
ASCII range for printing numbers has its advantages.

Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
From: kdex @ 2018-04-18 8:31 UTC
To: libc-alpha

On Wednesday, April 18, 2018 9:14:45 AM CEST Florian Weimer wrote:
> On 04/18/2018 12:24 AM, kdex wrote:
> > To give some context: I have previously posted the following on
> > libc-locales and was asked to bring this to the attention of senior
> > developers on this list who speak German.
> >
> > I have noticed that the locale `de_DE` has erroneously been using a full
> > stop (U+002E) for the thousands (group) separator in `mon_thousands_sep`
> > and `thousands_sep` ever since 2000. The usage of a full stop to group
> > thousands has (to my knowledge) never been standardized.
> >
> > As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> > been a thin space (U+2009).
> >
> > In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> > thousands, and DIN EN ISO 80000 explicitly excludes all characters other
> > than a thin space.
>
> These standards are simply not universally used. They aren't exactly
> wrong, either, because some typesetters actually use a (thin) space.
> It's just that adoption is poor.
>
> U+002E is perfectly acceptable and widely used, especially if U+2009 is
> not available (and U+0020 risks introducing a line break). Here's a
> recent example:
>
> »Die Finanzkontrolle Schwarzarbeit überprüfte im Jahr 2017 mehr als
> 52.000 Arbeitgeber und leitete fast 108.000 Strafverfahren ein. Die
> Anzahl der eingeleiteten Ermittlungsverfahren wegen der Nichtgewährung
> des gesetzlichen Mindestlohns nach dem Mindestlohngesetz stieg auf 2.522
> Verfahren (2016: 1.651; 2015: 705).«
>
> <https://www.bundesfinanzministerium.de/Content/DE/Pressemitteilungen/Finanzpolitik/2018/04/2018-04-17-ZJPK.html>

While the Federal Ministry of Finance may be an interesting (or even ironic)
source to point out, it is in no way normative, and their website is mostly
subject to their team of web developers.

Note that according to DIN 5008, amounts of money should, for security
purposes, indeed be grouped with periods, so leaving `mon_thousands_sep` as-is
would still allow for standards-compliance. Amounts of money are also covered
by the three norms I've brought up before:

	[…] Aus diesem Grund sehen Normen die Verwendung eines Leerzeichens
	als Tausendertrennzeichen vor (DIN 1333, DIN 5008 und ISO 80000).
	Dabei wird ein schmales Leerzeichen empfohlen, falls dieses technisch
	verfügbar ist. Eine Ausnahme bilden Geldbeträge, die aus
	Sicherheitsgründen mit dem Leerzeichen, das mindestens die Breite
	einer der Ziffern hat, oder einem Trennzeichen (wie dem Punkt)
	getrennt werden können. [2]

	[English: "[…] For this reason, standards provide for a space as the
	thousands separator (DIN 1333, DIN 5008 and ISO 80000). A narrow space
	is recommended where it is technically available. Amounts of money are
	an exception: for security reasons they may be grouped with a space at
	least as wide as one digit, or with a separator character (such as the
	full stop)."]

>
> (Also look at the date at the top of the page -- it doesn't follow DIN ISO
> 8601, either.)

That's unfortunate; but the paragraph about normativity above would apply
here, too. Duden, the German approach to these matters (generally considered
relatively normative among Germans), adheres to the ISO norms as well [1],
which should speak for itself.

It's simple enough to find instances of German articles about finances that
try to use spaces (admittedly the wrong ones) as separators as well, see [3].

>
> I don't think the locales need to change. Using characters from the
> ASCII range for printing numbers has its advantages.

I don't think this premise is correct: In de_DE, amounts of money include
`currency_symbol` (U+20AC), which is not in the ASCII range.

ps_AF uses U+066C for `thousands_sep`, fa_IR uses U+002C and es_MX even uses
U+2009 (the very same character that this thread is about). None of these are
in the ASCII range; so why should we treat de_DE like a special case?

It's much easier to be standards-compliant here and get used to the fact that
numbers do generally contain non-ASCII characters that a parser could just
skip over.

>
> Thanks,
> Florian

[1] https://www.duden.de/sprachwissen/rechtschreibregeln/zahlen-und-ziffern
[2] https://de.wikipedia.org/wiki/Zifferngruppierung#Zur_Problematik_von_Punkt_und_Komma_f%C3%BCr_Tausender-_und_Dezimaltrennzeichen
[3] https://www.finanzen.ch/nachrichten/aktien/Aktien-Frankfurt-Eroeffnung-Dax-schiebt-sich-wieder-ueber-12-600-Punkte-1021483710
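The remark that a parser could just skip over group separators can be sketched roughly as follows. This is only an illustration, not anything glibc does: the function name `parse_grouped_int` and the default set of extra separators are assumptions, the sketch handles integers only, and skipping "." is safe only when the input is known not to use it as a decimal point.

```python
import unicodedata


def parse_grouped_int(text, extra_separators=".\u066c"):
    """Parse a non-negative integer, ignoring group separators.

    Hypothetical helper: any Unicode space separator (category Zs, which
    covers U+0020, U+2009 and U+202F) and any character listed in
    extra_separators is skipped; anything else is rejected.
    """
    digits = []
    for ch in text:
        if "0" <= ch <= "9":
            digits.append(ch)
        elif unicodedata.category(ch) == "Zs" or ch in extra_separators:
            continue  # group separator: skip it
        else:
            raise ValueError(f"unexpected character U+{ord(ch):04X}")
    if not digits:
        raise ValueError("no digits found")
    return int("".join(digits))


for sample in ("108.000", "108\u2009000", "108\u202f000"):
    print(repr(sample), "->", parse_grouped_int(sample))
```

All three spellings of the same number parse to the same value, which is the behaviour the paragraph above argues for.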
* Re: de_DE has been using the wrong group separator for over 18 years
From: Florian Weimer @ 2018-04-18 9:35 UTC
To: kdex, libc-alpha

On 04/18/2018 10:30 AM, kdex wrote:
> While the Federal Ministry of Finance may be an interesting (or even ironic)
> source to point out, it is in no way normative, and their website is mostly
> subject to their team of web developers.

»Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«

<https://www.din.de/de/ueber-normen-und-standards/basiswissen>

[English: "Around 32,000 experts from industry and research, from consumer
groups and the public sector contribute their expertise to the
standardization process, which DIN steers as a privately organized project
manager."]

And as that web page explains, DIN norms aren't normative, either. Our
users expect that the locales follow actual practice, not what some
document says that they have never seen and nobody has read. (For
example, I can't easily tell whether the DIN-proposed keyboard layout
for German provides a convenient way to enter the relevant space character.)

Wikipedia itself prefers ».« for numbers on (culturally) German pages:

<https://de.wikipedia.org/wiki/Wikipedia:Schreibweise_von_Zahlen#Zifferngruppierung>

You cited a Swiss web page (finanzen.ch), but the Swiss have slightly
different typographical traditions which do not apply to de_DE.

As I said, some (culturally German) typesetters use spaces (of various
widths), but their use is somewhat rare.

Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
From: kdex @ 2018-04-18 21:10 UTC
To: libc-alpha

On Wednesday, April 18, 2018 11:35:31 AM CEST Florian Weimer wrote:
> On 04/18/2018 10:30 AM, kdex wrote:
> > While the Federal Ministry of Finance may be an interesting (or even
> > ironic) source to point out, it is in no way normative, and their website
> > is mostly subject to their team of web developers.
>
> »Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
> und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
> ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«
>
> <https://www.din.de/de/ueber-normen-und-standards/basiswissen>
>
> And as that web page explains, DIN norms aren't normative, either. Our
> users expect that the locales follow actual practice, not what some
> document says that they have never seen and nobody has read. (For
> example, I can't easily tell whether the DIN-proposed keyboard layout
> for German provides a convenient way to enter the relevant space character.)

By the same argument, you could easily pitch abandoning all norms; most people
will likely not have seen or read any norm in their lives; the majority just
replicates what others do, or whatever Duden states.

So yes, it is true that there is no requirement to follow DIN norms, nor is
there a requirement to follow Duden's word spellings; though I don't see how a
seemingly arbitrary group separator with no normative grounds is any better.

The purpose of a norm is to have a common system that everyone can follow.
Doesn't deviating from the norm defy the very purpose of having norms in the
first place?

I would be surprised if *everyone* using the es_MX locale would expect U+2009
to be used as a group separator; in fact, I do imagine a lot of users being
just as creative as the Germans when it comes to typing out grouping
separators by hand.

Hence, the point is less that locale users need the ability to have U+2009
mapped on their keyboards somewhere, but rather that users should be able to
input regular numbers and rely on their software to use their system locale to
figure out how their numbers should be displayed according to the current
locale.

>
> Wikipedia itself prefers ».« for numbers on (culturally) German pages:
>
> <https://de.wikipedia.org/wiki/Wikipedia:Schreibweise_von_Zahlen#Zifferngruppierung>
>
> You cited a Swiss web page (finanzen.ch), but the Swiss have slightly
> different typographical traditions which do not apply to de_DE.

I think it's best if we consider most URLs exchanged in this thread to be
anecdotal evidence; this doesn't get us much further (but I do acknowledge
your point).

Ideally, we should adhere to "official" guidelines. And as has been stated
before: Duden is de jure non-normative, but de facto, it very much is.

>
> As I said, some (culturally German) typesetters use spaces (of various
> widths), but their use is somewhat rare.
>
> Thanks,
> Florian
* Re: de_DE has been using the wrong group separator for over 18 years
From: Florian Weimer @ 2018-04-19 8:55 UTC
To: libc-alpha

On 04/18/2018 11:10 PM, kdex wrote:
> On Wednesday, April 18, 2018 11:35:31 AM CEST Florian Weimer wrote:
>> On 04/18/2018 10:30 AM, kdex wrote:
>>> While the Federal Ministry of Finance may be an interesting (or even
>>> ironic) source to point out, it is in no way normative, and their website
>>> is mostly subject to their team of web developers.
>>
>> »Rund 32.000 Experten aus Wirtschaft und Forschung, von Verbraucherseite
>> und der öffentlichen Hand bringen ihr Fachwissen in den Normungsprozess
>> ein, den DIN als privatwirtschaftlich organisierter Projektmanager steuert.«
>>
>> <https://www.din.de/de/ueber-normen-und-standards/basiswissen>
>>
>> And as that web page explains, DIN norms aren't normative, either. Our
>> users expect that the locales follow actual practice, not what some
>> document says that they have never seen and nobody has read. (For
>> example, I can't easily tell whether the DIN-proposed keyboard layout
>> for German provides a convenient way to enter the relevant space character.)

> By the same argument, you could easily pitch abandoning all norms; most people
> will likely not have seen or read any norm in their lives; the majority just
> replicates what others do, or whatever Duden states.

Yes, but when it comes to natural languages, it is our job (as glibc
maintainers updating locale definitions) to document the existing
practice, and not to try to change it. For whatever reason, DIN
standards are a poor guide for maintaining locales, and so are the
reference materials Duden publishes.

> So yes, it is true that there is no requirement to follow DIN norms, nor is
> there a requirement to follow Duden's word spellings; though I don't see how a
> seemingly arbitrary group separator with no normative grounds is any better.

I can't access the DIN process documents. We would have to review why
they rejected the dot separator when it was widely used, sometime when
the standards were first created. There has to be some rationale for
the discrepancy. Based on the publicly available information, the
choice to make the dot or the space normative in this context appears
to be totally arbitrary.

> The purpose of a norm is to have a common system that everyone can follow.
> Doesn't deviating from the norm defy the very purpose of having norms in the
> first place?

For norms in the area of natural language, the norms should document
existing practice. Everything else does not make sense and leads to
poor adoption. If there is no consensus, you can document multiple
options (see prototype and non-prototype function definitions in C90,
for an example in the area of programming languages, where you would
expect that more rigidity would be appropriate, not less).

> Hence, the point is less that locale users need the ability to have U+2009
> mapped on their keyboards somewhere, but rather that users should be able to
> input regular numbers and rely on their software to use their system locale to
> figure out how their numbers should be displayed according to the current
> locale.

But that's not how people enter numbers in their word processor.

U+2009 also has the wrong line breaking property in the basic Unicode
line breaking algorithm <https://unicode.org/reports/tr14/>, so it makes
it quite hard for word processors to do the right thing even if the user
manages to enter this character.

> Ideally, we should adhere to "official" guidelines. And as has been stated
> before: Duden is de jure non-normative, but de facto, it very much is.

That's a historical accident because a government body once referred to
it as »maßgeblich in allen Zweifelsfällen« (“authoritative if there is
any doubt”), but that referred to orthography and was before there was
an official, government-issued list of spellings. When the official
word list was finally published in 1996, Duden lost any claims to
authority (and apparently removed it from their marketing materials).

Regarding number formatting (mainly in typesetting), my 1996 edition of
the short Duden volume suggests that it is at least partially
descriptive (»hat sich eingebürgert«, “has become established”). In
this light, the omission of the dot as the separator (which was common
at the time) looks like a mistake. Despite a decade or more of
widespread use of word processors, it only has guidelines for
typewriting, and does not address the matter of breaking spaces in
numbers that arise in word processors.

Thanks,
Florian
* Re: de_DE has been using the wrong group separator for over 18 years
From: Carlos O'Donell @ 2018-04-19 19:55 UTC
To: Florian Weimer, libc-alpha

On 04/19/2018 03:55 AM, Florian Weimer wrote:
> On 04/18/2018 11:10 PM, kdex wrote:
>> Hence, the point is less that locale users need the ability to have
>> U+2009 mapped on their keyboards somewhere, but rather that users
>> should be able to input regular numbers and rely on their software
>> to use their system locale to figure out how their numbers should
>> be displayed according to the current locale.
>
> But that's not how people enter numbers in their word processor.
>
> U+2009 also has the wrong line breaking property in the basic Unicode
> line breaking algorithm <https://unicode.org/reports/tr14/>, so it
> makes it quite hard for word processors to do the right thing even if
> the user manages to enter this character.

We use U+202F now.

There is some history here with regard to U+2009.

I reviewed the es_MX case for thousands_sep becoming U+2009, and I wrote
to the Mexican government, and reviewed the relevant standards and
cultural use cases, but I did *not* consider the impact on the ability for
users to type or the line-breaking aspects of the change (nor do I think
the standard covered these problems).

It seemed like U+2009 was the right choice, but this resulted in swbz#20756:
https://sourceware.org/bugzilla/show_bug.cgi?id=20756

And Mike Fabian fixed this after my review and we now use Narrow non-breaking
space (U+202F).

So you could use U+202F, but it seems like this is just wishful thinking on
the part of standards. Just like how Canada claims to follow ISO 8601
everywhere and then doesn't.

I agree with the sentiment that these should be "trailing standards" that
define existing practice. Where existing practice matches government
standards, you are assured that there has been consensus and can adjust the
locales to match.

-- 
Cheers,
Carlos.
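The difference between the candidate spaces discussed here can be partly inspected from Python's standard library, as in the sketch below. One caveat, stated as an assumption of this sketch: `unicodedata` does not expose the UAX #14 Line_Break property itself, so the `<noBreak>` compatibility tag in the decomposition is used only as a hint. In the Unicode line-break data, U+2009 is in class BA (break opportunity after) and U+202F is in class GL (glue). The helper name `describe` is hypothetical.

```python
import unicodedata


def describe(code_point):
    """Return (name, general category, decomposition) for a code point.

    Hypothetical helper for illustration only.
    """
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )


for cp in (0x0020, 0x2009, 0x202F):
    name, category, decomposition = describe(cp)
    # An empty decomposition is shown as "-" for readability.
    print(f"U+{cp:04X}  {name:<24} {category}  {decomposition or '-'}")
```

All three characters share the general category Zs; only U+202F carries the `<noBreak>` tag, which matches the progression away from U+2009 described above.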
* Re: de_DE has been using the wrong group separator for over 18 years
From: Florian Weimer @ 2018-04-19 20:40 UTC
To: Carlos O'Donell, libc-alpha

On 04/19/2018 09:55 PM, Carlos O'Donell wrote:
> On 04/19/2018 03:55 AM, Florian Weimer wrote:
>> On 04/18/2018 11:10 PM, kdex wrote:
>>> Hence, the point is less that locale users need the ability to have
>>> U+2009 mapped on their keyboards somewhere, but rather that users
>>> should be able to input regular numbers and rely on their software
>>> to use their system locale to figure out how their numbers should
>>> be displayed according to the current locale.
>>
>> But that's not how people enter numbers in their word processor.
>>
>> U+2009 also has the wrong line breaking property in the basic Unicode
>> line breaking algorithm <https://unicode.org/reports/tr14/>, so it
>> makes it quite hard for word processors to do the right thing even if
>> the user manages to enter this character.
>
> We use U+202F now.

Ahh, I knew that I was missing something. Nice to know that Unicode has
a narrow non-breaking space. That seems to be completely appropriate
if you want to use a narrow space in this context.

> There is some history here with regard to U+2009.
>
> I reviewed the es_MX case for thousands_sep becoming U+2009, and I wrote
> to the Mexican government, and reviewed the relevant standards and
> cultural use cases, but I did *not* consider the impact on the ability for
> users to type or the line-breaking aspects of the change (nor do I think
> the standard covered these problems).

The matter of entering the character is not so important to glibc's use
case, I think. But it should matter for a standard with *word
processor* usage guidelines. If it requires using an
impossible-to-enter character for conformance, then it looks like
something went wrong during the standardization process.

Thanks,
Florian
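For readers who want to see what software-side grouping with a narrow no-break space amounts to, here is a minimal sketch. It does not consult any locale; the function name `group_digits` and the choice of U+202F as the default separator are assumptions for illustration.

```python
def group_digits(value, separator="\u202f"):
    """Format an integer in groups of three digits.

    Hypothetical helper: Python's "," format option groups with commas,
    which are then swapped for the requested separator.
    """
    return format(value, ",").replace(",", separator)


# The figures are taken from the press release quoted earlier in the thread.
for number in (108000, 2522, 705):
    print(group_digits(number, "."), "|", group_digits(number))
```

The left column uses the full stop that de_DE currently reports, the right column the narrow no-break space, so the two conventions can be compared side by side.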
* Re: de_DE has been using the wrong group separator for over 18 years
From: Carlos O'Donell @ 2018-04-20 1:38 UTC
To: Florian Weimer, libc-alpha

On 04/19/2018 03:40 PM, Florian Weimer wrote:
> On 04/19/2018 09:55 PM, Carlos O'Donell wrote:
>> On 04/19/2018 03:55 AM, Florian Weimer wrote:
>>> On 04/18/2018 11:10 PM, kdex wrote:
>>>> Hence, the point is less that locale users need the ability to
>>>> have U+2009 mapped on their keyboards somewhere, but rather
>>>> that users should be able to input regular numbers and rely on
>>>> their software to use their system locale to figure out how
>>>> their numbers should be displayed according to the current
>>>> locale.
>>>
>>> But that's not how people enter numbers in their word processor.
>>>
>>> U+2009 also has the wrong line breaking property in the basic
>>> Unicode line breaking algorithm
>>> <https://unicode.org/reports/tr14/>, so it makes it quite hard
>>> for word processors to do the right thing even if the user
>>> manages to enter this character.
>>
>> We use U+202F now.
>
> Ahh, I knew that I was missing something. Nice to know that Unicode
> has a narrow non-breaking space. That seems to be completely
> appropriate if you want to use a narrow space in this context.

Exactly. So our locale progression has been:

U+0020 -> U+2009 -> U+202F

as we became more knowledgeable about Unicode, but that still doesn't mean
it should be chosen if it's not common practice.

>> There is some history here with regard to U+2009.
>>
>> I reviewed the es_MX case for thousands_sep becoming U+2009, and I
>> wrote to the Mexican government, and reviewed the relevant
>> standards and cultural use cases, but I did *not* consider the
>> impact on the ability for users to type or the line-breaking
>> aspects of the change (nor do I think the standard covered these
>> problems).
>
> The matter of entering the character is not so important to glibc's
> use case, I think. But it should matter for a standard with *word
> processor* usage guidelines. If it requires using an
> impossible-to-enter character for conformance, then it looks like
> something went wrong during the standardization process.

Agreed.

I would expect a word processor to automatically format my numbers given
the locale. While writing by hand it can be a bit hard to detect exactly
when to do that, but it is not impossible.

In LibreOffice it's a slightly long "ctrl+shift+U 202f enter", but it's not
impossible, nor overly difficult, and you can script it with a quick
auto-correct, e.g. :nbs: => U+202F, or some other kind of automation.

-- 
Cheers,
Carlos.
* Re: de_DE has been using the wrong group separator for over 18 years
From: Rafal Luzynski @ 2018-04-18 21:05 UTC
To: libc-alpha, kdex

18.04.2018 10:30 kdex <kdex@kdex.de> wrote:
>
> On Wednesday, April 18, 2018 9:14:45 AM CEST Florian Weimer wrote:
> > On 04/18/2018 12:24 AM, kdex wrote:
> > > To give some context: I have previously posted the following on
> > > libc-locales and was asked to bring this to the attention of senior
> > > developers on this list who speak German.
> > >
> > > I have noticed that the locale `de_DE` has erroneously been using a full
> > > stop (U+002E) for the thousands (group) separator in `mon_thousands_sep`
> > > and `thousands_sep` ever since 2000. The usage of a full stop to group
> > > thousands has (to my knowledge) never been standardized.
> > >
> > > As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> > > been a thin space (U+2009).
> > >
> > > In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> > > thousands, and DIN EN ISO 80000 explicitly excludes all characters other
> > > than a thin space.
> >
> > These standards are simply not universally used. They aren't exactly
> > wrong, either, because some typesetters actually use a (thin) space.
> > It's just that adoption is poor.

Florian, this is ambiguous: do you mean "not universally used, except for
financial institutions" or "not universally used, even by financial
institutions"? Note that there is mon_thousands_sep (in LC_MONETARY)
and thousands_sep (in LC_NUMERIC) so it is possible to set different
thousands separators to format amounts of money and to format other numbers.

> [...]
> > I don't think the locales need to change. Using characters from the
> > ASCII range for printing numbers has its advantages.
> I don't think this premise is correct: In de_DE, amounts of money include
> `currency_symbol` (U+20AC), which is not in the ASCII range. [...]

This is a euro sign (€) and it is displayed as it is in the locales
implementing Unicode (e.g., de_DE.UTF-8). In de_DE.ISO-8859-15@euro
it is converted to the 0xa4 character, which is again a euro sign in
ISO 8859-15. In de_DE.ISO-8859-1 it is converted to the string "EUR". So
in every charset it is displayed correctly.

My point is that it is safe to use sophisticated Unicode characters
(like narrow space etc.) in the locale data source code and assume
that localedef handles it smartly in every charset.

Of course I don't know what grouping separator is correct for Germany,
I'm only pointing out possible technical solutions.

Regards,

Rafal
* Re: de_DE has been using the wrong group separator for over 18 years
From: Florian Weimer @ 2018-04-19 8:08 UTC
To: Rafal Luzynski, libc-alpha, kdex

On 04/18/2018 11:05 PM, Rafal Luzynski wrote:
>>> These standards are simply not universally used. They aren't exactly
>>> wrong, either, because some typesetters actually use a (thin) space.
>>> It's just that adoption is poor.

> Florian, this is ambiguous: do you mean "not universally used, except for
> financial institutions" or "not universally used, even by financial
> institutions"? Note that there is mon_thousands_sep (in LC_MONETARY)
> and thousands_sep (in LC_NUMERIC) so it is possible to set different
> thousands separators to format amounts of money and to format other numbers.

As far as I can tell, the dot as the thousands separator is used
everywhere in Germany, not just by financial institutions or for
currency amounts.

On the other hand, you can also find books which use a (thin) space as
the separator (current works, outside cited parts), and some people use
it in their personal correspondence as well.

Thanks,
Florian
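The charset behaviour Rafal describes earlier in this sub-thread for the euro sign can be imitated with Python's codecs. This sketch only mirrors the outcome (0xa4 in ISO 8859-15, a textual fallback in ISO 8859-1); it is not how localedef performs the conversion, and the helper name `encode_currency_symbol` and the explicit fallback argument are assumptions for illustration.

```python
def encode_currency_symbol(symbol, charset, fallback):
    """Encode symbol in charset, using fallback when it is not representable.

    Hypothetical helper for illustration only.
    """
    try:
        return symbol.encode(charset)
    except UnicodeEncodeError:
        # The charset has no slot for the symbol: use the textual fallback.
        return fallback.encode(charset)


for charset in ("utf-8", "iso-8859-15", "iso-8859-1"):
    print(charset, encode_currency_symbol("\u20ac", charset, "EUR"))
```

A narrow no-break space used as a group separator would need the same kind of per-charset fallback, since U+202F is not representable in the ISO 8859 charsets either.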