public inbox for libc-locales@sourceware.org
* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
@ 2018-10-18 12:54 ` rguenth at gcc dot gnu.org
  2018-10-18 12:55 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2018-10-18 12:54 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87642

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming
@ 2018-10-18 12:54 rguenth at gcc dot gnu.org
  2018-10-18 12:54 ` [Bug localedata/23792] " rguenth at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2018-10-18 12:54 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

            Bug ID: 23792
           Summary: fr_FR.UTF8 thousands separator is not C++ conforming
           Product: glibc
           Version: 2.27
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: rguenth at gcc dot gnu.org
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Apparently C++ requires the thousands separator to be a single character.  Testcase:

#include <iostream>
#include <locale>

using namespace std;

int
main()
{
  locale::global(locale(""));
  cout.imbue(locale());
  cout << 1000 << endl;
}

> LANG=fr_FR.UTF8 ./a.out 
1�000
> LANG=fr_FR.UTF8 ./a.out | hexdump -c
0000000   1 342   0   0   0  \n                                        
0000006
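
The 342 in the hexdump above is octal for 0xE2. That is consistent with the first byte of the UTF-8 encoding of U+202F NARROW NO-BREAK SPACE (E2 80 AF) being emitted on its own. A minimal sketch of that truncation follows; `first_byte_of` and `kNarrowNbsp` are hypothetical names used for illustration only, not code from libstdc++ or glibc:

```cpp
#include <string>

// UTF-8 encoding of U+202F NARROW NO-BREAK SPACE: three bytes.
const std::string kNarrowNbsp = "\xE2\x80\xAF";

// Hypothetical helper: keep only the first byte, which is what a
// single-char interface ends up with if a multi-byte separator is
// truncated rather than replaced.
unsigned char first_byte_of(const std::string& sep)
{
  return sep.empty() ? 0 : static_cast<unsigned char>(sep[0]);
}
```

Calling first_byte_of(kNarrowNbsp) gives 0xE2, an incomplete UTF-8 sequence that a terminal cannot render as a character.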


* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
  2018-10-18 12:54 ` [Bug localedata/23792] " rguenth at gcc dot gnu.org
@ 2018-10-18 12:55 ` rguenth at gcc dot gnu.org
  2018-10-18 20:01 ` jwakely.gcc at gmail dot com
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2018-10-18 12:55 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It worked with glibc 2.22 where it was ' '.


* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
  2018-10-18 12:54 ` [Bug localedata/23792] " rguenth at gcc dot gnu.org
  2018-10-18 12:55 ` rguenth at gcc dot gnu.org
@ 2018-10-18 20:01 ` jwakely.gcc at gmail dot com
  2018-10-18 20:31 ` fweimer at redhat dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: jwakely.gcc at gmail dot com @ 2018-10-18 20:01 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

Jonathan Wakely <jwakely.gcc at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jwakely.gcc at gmail dot com

--- Comment #2 from Jonathan Wakely <jwakely.gcc at gmail dot com> ---
The C++ facets for thousands separators only return a single char:
https://en.cppreference.com/w/cpp/locale/numpunct/thousands_sep
https://en.cppreference.com/w/cpp/locale/moneypunct/thousands_sep
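
To make the constraint concrete, here is a small sketch that checks the return type of std::numpunct<char>::thousands_sep() at compile time and queries it for a given locale. `numeric_sep` is a hypothetical helper name, not part of any library:

```cpp
#include <locale>
#include <type_traits>
#include <utility>

// numpunct<char>::thousands_sep() returns exactly one char, so a
// multi-byte UTF-8 separator cannot be represented through it.
static_assert(
  std::is_same<
    decltype(std::declval<const std::numpunct<char>&>().thousands_sep()),
    char>::value,
  "thousands_sep() returns a single char");

// Hypothetical helper: query the separator of any locale.
// For the classic "C" locale the standard default is ','.
char numeric_sep(const std::locale& loc)
{
  return std::use_facet<std::numpunct<char>>(loc).thousands_sep();
}
```

The same single-char limit applies to std::moneypunct<char>::thousands_sep().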


* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2018-10-18 20:01 ` jwakely.gcc at gmail dot com
@ 2018-10-18 20:31 ` fweimer at redhat dot com
  2018-10-19 11:52 ` jwakely.gcc at gmail dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: fweimer at redhat dot com @ 2018-10-18 20:31 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #3 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Jonathan Wakely from comment #2)
> The C++ facets for thousands separators only return a single char:
> https://en.cppreference.com/w/cpp/locale/numpunct/thousands_sep
> https://en.cppreference.com/w/cpp/locale/moneypunct/thousands_sep

Can you use a space if the relevant locale data consists of more than one byte?
It will not cover all cases, of course.
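
From the application side, a similar "fall back to a plain space" idea can be sketched with a custom facet. This is only an illustration of the suggestion, not what libstdc++ does internally; `space_sep` and `format_grouped` are hypothetical names, and grouping by three digits is an assumption:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: always group by three digits, separated by a
// plain ASCII space.
struct space_sep : std::numpunct<char>
{
  char do_thousands_sep() const override { return ' '; }
  std::string do_grouping() const override { return "\3"; }
};

// Format n with the facet above layered on top of the classic locale.
std::string format_grouped(long n)
{
  std::ostringstream os;
  // The locale takes ownership of the facet and deletes it.
  os.imbue(std::locale(std::locale::classic(), new space_sep));
  os << n;
  return os.str();
}
```

With this facet, format_grouped(1000) yields "1 000" regardless of what the system locale data contains.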


* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2018-10-18 20:31 ` fweimer at redhat dot com
@ 2018-10-19 11:52 ` jwakely.gcc at gmail dot com
  2018-10-19 18:46 ` carlos at redhat dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: jwakely.gcc at gmail dot com @ 2018-10-19 11:52 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

--- Comment #4 from Jonathan Wakely <jwakely.gcc at gmail dot com> ---
Current GCC trunk replaces some known UTF-8 sequences with single byte
equivalents and for other cases uses iconv to "ASCII//TRANSLIT" and back again,
hoping to get a single byte. If that fails it just disables digit grouping.

https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/config/locale/gnu/numeric_members.cc?r1=265286&r2=265285&pathrev=265286
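
The strategy described above can be sketched roughly as follows. This is not the libstdc++ code from the linked revision: `narrow_separator` is a hypothetical name, the two table entries are illustrative assumptions, and the iconv step is left out entirely.

```cpp
#include <string>

// Sketch of the fallback: on success, store a single-byte separator in
// `out` and return true; return false to signal that digit grouping
// should be disabled.
bool narrow_separator(const std::string& sep, char& out)
{
  if (sep.size() == 1) {            // already a single byte
    out = sep[0];
    return true;
  }
  if (sep == "\xE2\x80\xAF" ||      // U+202F NARROW NO-BREAK SPACE
      sep == "\xC2\xA0") {          // U+00A0 NO-BREAK SPACE
    out = ' ';
    return true;
  }
  // The real implementation is described as trying iconv to
  // "ASCII//TRANSLIT" at this point; that step is omitted here.
  return false;
}
```

Under these assumptions, the French U+202F separator maps to ' ', which matches a plain space showing up in iostream output.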


* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2018-10-19 11:52 ` jwakely.gcc at gmail dot com
@ 2018-10-19 18:46 ` carlos at redhat dot com
  2024-01-03 16:31 ` maiku.fabian at gmail dot com
  2024-01-03 17:18 ` maiku.fabian at gmail dot com
  7 siblings, 0 replies; 9+ messages in thread
From: carlos at redhat dot com @ 2018-10-19 18:46 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #5 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Jonathan Wakely from comment #4)
> Current GCC trunk replaces some known UTF-8 sequences with single byte
> equivalents and for other cases uses iconv to "ASCII//TRANSLIT" and back
> again, hoping to get a single byte. If that fails it just disables digit
> grouping.
> 
> https://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/config/locale/gnu/numeric_members.cc?r1=265286&r2=265285&pathrev=265286

So have we decided that this is *not* a glibc defect?

The UTF-8 locales are going to use whatever symbol best suits the language for
the various parameters, and in many cases it may be a multi-byte sequence.


* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2018-10-19 18:46 ` carlos at redhat dot com
@ 2024-01-03 16:31 ` maiku.fabian at gmail dot com
  2024-01-03 17:18 ` maiku.fabian at gmail dot com
  7 siblings, 0 replies; 9+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-03 16:31 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

--- Comment #6 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I cannot reproduce a problem here:

mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ cat test.cpp 
#include <iostream>
#include <locale>

using namespace std;

int
main()
{
  locale::global(locale(""));
  cout.imbue(locale());
  cout << 1000 << endl;
}
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ g++ test.cpp 
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=de_DE.UTF-8 ./a.out
1.000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=fr_FR.UTF-8 ./a.out
1 000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=fr_FR.UTF-8 ./a.out | od -t x1
0000000 31 20 30 30 30 0a
0000006
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$

Surprisingly, this prints a regular space (0x20) in the French locale, even
though the French locale uses U+202F NARROW NO-BREAK SPACE for

thousands_sep             " "

/usr/bin/printf, however, does print U+202F NARROW NO-BREAK SPACE as the
thousands separator in the French locale:

mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=de_DE.UTF-8 /usr/bin/printf "%'d\n" 1000
1.000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=fr_FR.UTF-8 /usr/bin/printf "%'d\n" 1000
1 000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$

So I don’t think there is a bug in glibc here.
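
The difference between the two outputs is easier to see at the C level, where the separator is exposed as a string rather than a single char. A minimal sketch, assuming only the standard localeconv() interface; `c_thousands_sep` is a hypothetical helper name:

```cpp
#include <clocale>
#include <string>

// Return the C-level thousands separator for the current LC_NUMERIC
// locale. It is a char* string, so it can carry a multi-byte sequence
// such as the three UTF-8 bytes of U+202F.
std::string c_thousands_sep()
{
  const std::lconv* lc = std::localeconv();
  return lc->thousands_sep ? lc->thousands_sep : "";
}
```

After a call such as setlocale(LC_ALL, ""), the length of the returned string shows whether the active locale uses a multi-byte separator; in the "C" locale it is empty.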


* [Bug localedata/23792] fr_FR.UTF8 thousands separator is not C++ conforming
  2018-10-18 12:54 [Bug localedata/23792] New: fr_FR.UTF8 thousands separator is not C++ conforming rguenth at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-01-03 16:31 ` maiku.fabian at gmail dot com
@ 2024-01-03 17:18 ` maiku.fabian at gmail dot com
  7 siblings, 0 replies; 9+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-03 17:18 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23792

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
           Assignee|unassigned at sourceware dot org   |maiku.fabian at gmail dot com
             Status|NEW                         |RESOLVED

--- Comment #7 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Closing as WORKSFORME.

