public inbox for libc-locales@sourceware.org
* [Bug localedata/31205] New: Inconsistent (mon_)grouping formats
@ 2024-01-02 13:28 oscar.gustafsson at gmail dot com
  2024-01-02 13:29 ` [Bug localedata/31205] " oscar.gustafsson at gmail dot com
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: oscar.gustafsson at gmail dot com @ 2024-01-02 13:28 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

            Bug ID: 31205
           Summary: Inconsistent (mon_)grouping formats
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: oscar.gustafsson at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

I was trying to look into using number grouping for a project and realized that
the formats used are not consistent. For reference, here is the documentation:

https://sourceware.org/glibc/manual/html_node/General-Numeric.html

These are the two issues I've found:

* Many locales have the same digit repeated, e.g., en_US
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/en_US;h=5cc518dff2fc1309e5cddd86950d6e9898a2d7e1;hb=refs/heads/master#l75
As far as I can tell, a single 3 there should be enough, as is the
case for, e.g., en_HK
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/en_HK;h=5f797e076099c4972d3c74fe92e5a6607c3bae95;hb=refs/heads/master#l84

* Some locales have 0;0 as grouping, e.g. el_GR
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/el_GR;h=285e1e009276476f2aa2d2745177944c7b34a78b;hb=HEAD
Not sure what this is supposed to mean, but, e.g., POSIX has -1 to indicate
"no grouping":
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/POSIX;h=7ec7f1c5774ab1fb011c08e2e17d435923e48fe2;hb=refs/heads/master#l262 

Note that the manual says "The last member is either 0, in which case the
previous member is used over and over again for all the remaining groups...",
where the 0 is the string terminator. Here, however, the string consists of
three string termination characters, so there is no previous member.
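To make the quoted rule concrete, here is a minimal C sketch of how a grouping string in the localeconv() byte format might be applied to an integer digit string. group_digits() and group_size() are hypothetical helpers, not glibc code; the semantics follow the manual page linked above (each byte is a group size counted from the decimal point, a final 0 repeats the previous size, CHAR_MAX stops grouping). It shows that "\3" and "\3\3" give the same output, and that a string whose first byte is NUL, which is what a 0;0 entry would amount to if stored byte by byte, means no grouping at all.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: decode one byte of a localeconv()-style grouping
   string into a group size, or -1 for "no (further) grouping". */
static int group_size(char c)
{
    return (c <= 0 || c == CHAR_MAX) ? -1 : c;
}

/* Hypothetical helper: insert SEP into the integer digit string DIGITS
   following GROUPING.  Returns 0 on success, -1 if OUT is too small. */
static int group_digits(const char *digits, const char *grouping, char sep,
                        char *out, size_t outlen)
{
    size_t n = strlen(digits);
    if (outlen < 2 * n + 1)          /* worst case: a separator per digit */
        return -1;

    const char *g = grouping;
    int size = group_size(*g);       /* size of the group being filled */
    int in_group = 0;                /* digits already placed in it */
    size_t t = 0;

    /* Walk the digits from the right, writing the result reversed. */
    for (size_t i = 0; i < n; i++) {
        if (size > 0 && in_group == size) {
            out[t++] = sep;
            in_group = 0;
            if (g[1] != '\0') {      /* another size is listed: advance */
                g++;
                size = group_size(*g);
            }                        /* else: the terminator repeats it */
        }
        out[t++] = digits[n - 1 - i];
        in_group++;
    }
    out[t] = '\0';

    /* Flip the reversed result into reading order. */
    for (size_t a = 0, b = t; a + 1 < b; a++, b--) {
        char c = out[a];
        out[a] = out[b - 1];
        out[b - 1] = c;
    }
    return 0;
}
```

For example, with "\3\2" (three digits, then groups of two) the digits 12345678 come out as 1,23,45,678.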

To some extent this also applies to mon_grouping, at least for the first issue.

I guess the impact of this issue depends on the situation. The first issue just
wastes a few bytes (and causes confusion), but the second may lead to
unexpected results, at least in code that uses the raw localedata without
being aware of it.

If people agree that this should be made consistent and fixed (it is not
obvious what to replace 0;0 with; probably -1?), I'd be happy to provide a
patch. (I would be even happier to do that using standard git access; I can
provide credentials showing that I know how to use it, etc.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
@ 2024-01-02 13:29 ` oscar.gustafsson at gmail dot com
  2024-01-02 16:04 ` maiku.fabian at gmail dot com
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: oscar.gustafsson at gmail dot com @ 2024-01-02 13:29 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #1 from Oscar Gustafsson <oscar.gustafsson at gmail dot com> ---
Direct link for the 0;0 case:
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/el_GR;h=285e1e009276476f2aa2d2745177944c7b34a78b;hb=HEAD#l92

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
  2024-01-02 13:29 ` [Bug localedata/31205] " oscar.gustafsson at gmail dot com
@ 2024-01-02 16:04 ` maiku.fabian at gmail dot com
  2024-01-02 16:11 ` maiku.fabian at gmail dot com
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-02 16:04 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

--- Comment #2 from Mike FABIAN <maiku.fabian at gmail dot com> ---
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html says:

7.3.4 LC_NUMERIC
...

grouping
    Define the size of each group of digits in formatted non-monetary
quantities. The operand is a sequence of integers separated by semicolons. Each
integer specifies the number of digits in each group, with the initial integer
defining the size of the group immediately preceding the decimal delimiter, and
the following integers defining the preceding groups. If the last integer is
not -1, then the size of the previous group (if any) shall be repeatedly used
for the remainder of the digits. If the last integer is -1, then no further
grouping shall be performed.
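The localedef operand described in the quoted text is a semicolon-separated list, whereas localeconv() hands programs one byte per group size. A minimal C sketch of that conversion, under stated assumptions: parse_grouping() is hypothetical, -1 is mapped to CHAR_MAX (the "no further grouping" marker the glibc manual describes for localeconv()), and every other integer is stored as a single byte. glibc's localedef may store values differently. The sketch does show why 0;0 is odd: its first byte is NUL, so the resulting C string is empty.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a localedef grouping operand such as
   "3;2" or "-1" into a localeconv()-style byte string.  -1 becomes
   CHAR_MAX; a 0 becomes a NUL byte, which ends the C string early.
   Returns the number of bytes written (excluding the terminator), or
   -1 for a malformed operand or a too-small buffer. */
static int parse_grouping(const char *operand, char *out, size_t outlen)
{
    size_t k = 0;
    const char *p = operand;

    while (*p != '\0') {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p || k + 1 >= outlen)
            return -1;                   /* not a number, or no room */
        if (v == -1)
            out[k++] = CHAR_MAX;         /* no further grouping */
        else if (v >= 0 && v < CHAR_MAX)
            out[k++] = (char) v;         /* one byte per group size */
        else
            return -1;                   /* out of range */
        p = end;
        if (*p == ';')
            p++;                         /* skip the separator */
        else if (*p != '\0')
            return -1;                   /* unexpected character */
    }
    out[k] = '\0';
    return (int) k;
}
```

For instance, parse_grouping("0;0", ...) reports two bytes written, yet strlen() of the result is 0.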

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
  2024-01-02 13:29 ` [Bug localedata/31205] " oscar.gustafsson at gmail dot com
  2024-01-02 16:04 ` maiku.fabian at gmail dot com
@ 2024-01-02 16:11 ` maiku.fabian at gmail dot com
  2024-01-02 16:14 ` maiku.fabian at gmail dot com
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-02 16:11 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #3 from Mike FABIAN <maiku.fabian at gmail dot com> ---
So in the el_GR locale, one could use

grouping -1

instead of

grouping 0;0

But it does not seem to matter; both behave the same:


mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ grep -E "grouping.*(0;0|-1)" *
C:mon_grouping        -1
C:grouping        -1
POSIX:mon_grouping        -1
POSIX:grouping        -1
aa_DJ:grouping               0;0
ar_SA:mon_grouping      -1
ar_SA:grouping  -1
bs_BA:grouping                  0;0
el_CY:grouping                  0;0
el_GR:grouping                  0;0
eo:grouping      0;0
es_CU:grouping             0;0
gl_ES:grouping             0;0
i18n:mon_grouping        -1
i18n:grouping        -1
mg_MG:grouping                  0;0
pap_AW:grouping                  0;0
pap_CW:grouping                  0;0
pt_PT:grouping                  0;0
rw_RW:grouping                  -1
sl_SI:grouping                  0;0
sr_RS:grouping                  0;0
ti_ER:grouping              0;0
wo_SN:grouping                  0;0
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=rw_RW.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12345678,900000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=el_GR.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12345678,900000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (2 preceding siblings ...)
  2024-01-02 16:11 ` maiku.fabian at gmail dot com
@ 2024-01-02 16:14 ` maiku.fabian at gmail dot com
  2024-01-02 16:37 ` oscar.gustafsson at gmail dot com
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-02 16:14 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #4 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Also 

grouping 3

and

grouping 3;3

behave the same:

mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ grep grouping en_US en_PH
en_US:mon_grouping        3;3
en_US:grouping        3;3
en_PH:mon_grouping          3
en_PH:grouping               3
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=en_US.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12,345,678.900000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$ LC_ALL=en_PH.UTF-8 /usr/bin/printf "%'f\n" 12345678.9
12,345,678.900000
mfabian@hathi:/local/mfabian/src/glibc/localedata/locales (master $%)
$

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (3 preceding siblings ...)
  2024-01-02 16:14 ` maiku.fabian at gmail dot com
@ 2024-01-02 16:37 ` oscar.gustafsson at gmail dot com
  2024-01-18 15:11 ` maiku.fabian at gmail dot com
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: oscar.gustafsson at gmail dot com @ 2024-01-02 16:37 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #5 from Oscar Gustafsson <oscar.gustafsson at gmail dot com> ---
Thanks for the reply.

Yes, they behave the same, but for consistency reasons I believe that one of
them should be selected. 

Two reasons:

* When trying to understand how to specify these strings, the mix of formats
(and redundant information) is rather confusing.

* There are other tools relying on these files and it would be better if there
are fewer corner cases to handle/optimizations to be done.

I have since learnt that localeconv translates -1 into "". Hence, one may
suspect that 0;0 works because it translates into three(?) string termination
characters. While this clearly works, one can hardly argue that it makes sense.

For the 3;3 case, it may make sense for user code to check whether the grouping
has a single element and, if so, take a fast path, which 3;3 will never trigger.
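To make the fast-path argument concrete, here is a minimal C sketch of a hypothetical helper, uniform_group_size(), that user code might call on a localeconv()-style grouping string. A check of grouping[1] == '\0' alone would reject "\3\3", so this version needs an extra loop to accept redundant repeats of the first size; with normalized data the single-byte check would be enough.

```c
#include <limits.h>

/* Hypothetical fast-path test: if GROUPING (localeconv() byte format)
   means "every group has the same size", return that size, else -1. */
static int uniform_group_size(const char *grouping)
{
    char first = grouping[0];
    if (first <= 0 || first == CHAR_MAX)
        return -1;                 /* no grouping at all */
    for (const char *g = grouping + 1; *g != '\0'; g++)
        if (*g != first)
            return -1;             /* mixed sizes or a CHAR_MAX stop */
    return first;                  /* e.g. 3 for both "\3" and "\3\3" */
}
```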

Or put another way: what is the benefit of having inconsistent data that may
lead to redundant storage and additional computations?

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (4 preceding siblings ...)
  2024-01-02 16:37 ` oscar.gustafsson at gmail dot com
@ 2024-01-18 15:11 ` maiku.fabian at gmail dot com
  2024-01-18 15:12 ` maiku.fabian at gmail dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-18 15:11 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-01-18
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

--- Comment #6 from Mike FABIAN <maiku.fabian at gmail dot com> ---
OK, then I’ll change 0;0 ➡️ -1 and 3;3 ➡️ 3.
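The normalization discussed in this thread, 0;0 becoming -1 and redundant repeats such as 3;3 collapsing to a single 3 (comment #4 shows that 3 and 3;3 behave identically), can be sketched as a small C helper that rewrites a localedef operand. normalize_grouping() is hypothetical and not part of glibc or localedef; treating a non-positive first entry as "no grouping" is an assumption based on the el_GR and rw_RW printf output in comment #3. A trailing -1 is kept as-is, because under the POSIX wording 3;3;-1 (two groups, then none) is not the same as 3;-1.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical normalizer for a localedef grouping operand, e.g.
   "3;3" -> "3", "3;2;2" -> "3;2", "0;0" -> "-1".
   Returns 0 on success, -1 on malformed input or a too-small buffer. */
static int normalize_grouping(const char *operand, char *out, size_t outlen)
{
    long v[32];                      /* parsed group sizes */
    size_t n = 0;
    const char *p = operand;

    /* Parse the ';'-separated integers. */
    while (*p != '\0') {
        char *end;
        if (n == sizeof v / sizeof v[0])
            return -1;               /* too many entries */
        v[n] = strtol(p, &end, 10);
        if (end == p)
            return -1;               /* not a number */
        n++;
        if (*end == ';')
            p = end + 1;
        else if (*end == '\0')
            p = end;
        else
            return -1;               /* unexpected character */
    }

    /* Empty, or a non-positive first entry (-1, or 0 as in 0;0):
       assumed to mean no grouping at all. */
    if (n == 0 || v[0] <= 0) {
        int w = snprintf(out, outlen, "-1");
        return (w >= 0 && (size_t) w < outlen) ? 0 : -1;
    }

    /* Unless the list ends in -1, trailing sizes equal to their
       predecessor are redundant: POSIX repeats the last size anyway. */
    if (v[n - 1] != -1)
        while (n > 1 && v[n - 1] == v[n - 2])
            n--;

    /* Re-emit the remaining sizes as "a;b;c". */
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + len, outlen - len, "%s%ld", i ? ";" : "", v[i]);
        if (w < 0 || (size_t) w >= outlen - len)
            return -1;               /* output buffer too small */
        len += (size_t) w;
    }
    return 0;
}
```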

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (5 preceding siblings ...)
  2024-01-18 15:11 ` maiku.fabian at gmail dot com
@ 2024-01-18 15:12 ` maiku.fabian at gmail dot com
  2024-01-18 16:11 ` maiku.fabian at gmail dot com
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-18 15:12 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org   |maiku.fabian at gmail dot com

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (6 preceding siblings ...)
  2024-01-18 15:12 ` maiku.fabian at gmail dot com
@ 2024-01-18 16:11 ` maiku.fabian at gmail dot com
  2024-01-19 14:21 ` maiku.fabian at gmail dot com
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-18 16:11 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #7 from Mike FABIAN <maiku.fabian at gmail dot com> ---
This test case needs to be adapted: 

https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/tst-grouping_iterator.c;h=79cc9f4e7a168fb732af29afd25f194d310384fb;hb=HEAD

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (7 preceding siblings ...)
  2024-01-18 16:11 ` maiku.fabian at gmail dot com
@ 2024-01-19 14:21 ` maiku.fabian at gmail dot com
  2024-01-22 14:22 ` maiku.fabian at gmail dot com
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-19 14:21 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #8 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Oscar Gustafsson from comment #5)

> * There are other tools relying on these files and it would be better if
> there are fewer corner cases to handle/optimizations to be done.

These other tools nevertheless need to be able to parse '3;3' and '0;0', as
these forms remain valid.

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (8 preceding siblings ...)
  2024-01-19 14:21 ` maiku.fabian at gmail dot com
@ 2024-01-22 14:22 ` maiku.fabian at gmail dot com
  2024-01-25 10:41 ` cvs-commit at gcc dot gnu.org
  2024-01-25 10:50 ` maiku.fabian at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-22 14:22 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #9 from Mike FABIAN <maiku.fabian at gmail dot com> ---
https://patchwork.sourceware.org/project/glibc/patch/20240122142005.993598-1-mfabian@redhat.com/

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (9 preceding siblings ...)
  2024-01-22 14:22 ` maiku.fabian at gmail dot com
@ 2024-01-25 10:41 ` cvs-commit at gcc dot gnu.org
  2024-01-25 10:50 ` maiku.fabian at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-25 10:41 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

--- Comment #10 from Sourceware Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Mike Fabian <mfabian@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5176a830e70140cb3390c62b7d41f75dbbf33c7c

commit 5176a830e70140cb3390c62b7d41f75dbbf33c7c
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Thu Jan 18 16:52:03 2024 +0100

    localedata: Use consistent values for grouping and mon_grouping

    Resolves: BZ # 31205

    Adapt test cases in test-grouping_iterator.c

* [Bug localedata/31205] Inconsistent (mon_)grouping formats
  2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
                   ` (10 preceding siblings ...)
  2024-01-25 10:41 ` cvs-commit at gcc dot gnu.org
@ 2024-01-25 10:50 ` maiku.fabian at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: maiku.fabian at gmail dot com @ 2024-01-25 10:50 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=31205

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |2.39
             Status|ASSIGNED                    |RESOLVED

--- Comment #11 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Fixed in glibc master.

end of thread

Thread overview: 13+ messages
2024-01-02 13:28 [Bug localedata/31205] New: Inconsistent (mon_)grouping formats oscar.gustafsson at gmail dot com
2024-01-02 13:29 ` [Bug localedata/31205] " oscar.gustafsson at gmail dot com
2024-01-02 16:04 ` maiku.fabian at gmail dot com
2024-01-02 16:11 ` maiku.fabian at gmail dot com
2024-01-02 16:14 ` maiku.fabian at gmail dot com
2024-01-02 16:37 ` oscar.gustafsson at gmail dot com
2024-01-18 15:11 ` maiku.fabian at gmail dot com
2024-01-18 15:12 ` maiku.fabian at gmail dot com
2024-01-18 16:11 ` maiku.fabian at gmail dot com
2024-01-19 14:21 ` maiku.fabian at gmail dot com
2024-01-22 14:22 ` maiku.fabian at gmail dot com
2024-01-25 10:41 ` cvs-commit at gcc dot gnu.org
2024-01-25 10:50 ` maiku.fabian at gmail dot com
