public inbox for glibc-bugs@sourceware.org
* [Bug locale/31030] New: ERA segments are nul-separated rather than semicolon-separated
@ 2023-11-03 17:17 bugzilla at tecnocode dot co.uk
  2023-12-05 14:15 ` [Bug locale/31030] " schwab@linux-m68k.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: bugzilla at tecnocode dot co.uk @ 2023-11-03 17:17 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31030

            Bug ID: 31030
           Summary: ERA segments are nul-separated rather than
                    semicolon-separated
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: bugzilla at tecnocode dot co.uk
  Target Milestone: ---

nl_langinfo(3) says “Era description segments are separated by semicolons”, but
it appears that `nl_langinfo (ERA)` returns them separated by nul bytes
instead, and the only way to parse them is to additionally call the
(undocumented) `nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES)` to find out how many
segments there are meant to be.

This can be demonstrated with the following example program (compile with
`gcc -o era era.c -Wall`):
```
#include <langinfo.h>
#include <locale.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  setlocale (LC_ALL, "");
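  const char *era = nl_langinfo (ERA);
  /* Undocumented item: the count comes back as an integer stored in the
     returned pointer value, hence the cast. */
  int n_entries = (int) (intptr_t) nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES);

  /* %s stops at the first nul byte, so at most one segment is printed. */
  printf ("n_entries: %d, era: %s\n", n_entries, era);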

  return 0;
}
```

If run with `LANG=th_TH.UTF-8 ./era` it prints
```
n_entries: 1, era: +:1:-543/01/01:+*:พ.ศ.:%EC %Ey
```
which is all good.

However, if run with `LANG=ja_JP.UTF-8 ./era`, it prints:
```
n_entries: 11, era: +:2:2020/01/01:+*:令和:%EC%Ey年
```

There clearly aren’t 11 segments in the `era` string printed there, only one.
Looking at the ja_JP locale definition, there are correctly 11 segments defined
in it:
https://github.com/bminor/glibc/blob/master/localedata/locales/ja_JP#L14949-L14977

If I read past 10 nul terminators in the `era` string, I can retrieve all 11
segments. So the locale definition does work. It just doesn’t match up to what
nl_langinfo(3) says, and requires using the undocumented
`_NL_TIME_ERA_NUM_ENTRIES` to read all segments.
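
For illustration, reading past the nul terminators as described above could
look roughly like the sketch below. `era_segment` is a hypothetical helper
name, not part of glibc, and the layout it assumes (each segment followed by
exactly one nul byte, `n_entries` segments in total) is only what I observed
here; it is not documented anywhere I could find.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to segment `index` (0-based) in a
   buffer that holds `n_entries` consecutive nul-terminated strings, or NULL
   if `index` is out of range.  Assumes one nul byte after every segment. */
const char *
era_segment (const char *era, int n_entries, int index)
{
  assert (era != NULL);

  if (index < 0 || index >= n_entries)
    return NULL;

  /* Skip `index` segments; each skip steps over the text and its nul. */
  for (int i = 0; i < index; i++)
    era += strlen (era) + 1;

  return era;
}
```

With the values from the example program, calling
`era_segment (era, n_entries, i)` for each `i` from 0 to `n_entries - 1` would
return each segment in turn. The count has to come from
`_NL_TIME_ERA_NUM_ENTRIES`, because nothing in the buffer itself marks where
the last segment ends.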

Is there a reason `era` is using nul separators? Could it be switched to using
semicolons please? :)


Thread overview: 8+ messages
2023-11-03 17:17 [Bug locale/31030] New: ERA segments are nul-separated rather than semicolon-separated bugzilla at tecnocode dot co.uk
2023-12-05 14:15 ` [Bug locale/31030] " schwab@linux-m68k.org
2024-01-15 13:23 ` smcv at collabora dot com
2024-01-15 13:31 ` smcv at collabora dot com
2024-01-15 13:34 ` schwab@linux-m68k.org
2024-01-15 13:35 ` schwab@linux-m68k.org
2024-01-15 13:50 ` bugzilla at tecnocode dot co.uk
2024-01-16 21:58 ` aurelien at aurel32 dot net
