From: "bugzilla at tecnocode dot co.uk" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug locale/31030] New: ERA segments are nul-separated rather than semicolon-separated
Date: Fri, 03 Nov 2023 17:17:47 +0000
Message-ID: <bug-31030-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31030

            Bug ID: 31030
           Summary: ERA segments are nul-separated rather than
                    semicolon-separated
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: bugzilla at tecnocode dot co.uk
  Target Milestone: ---

nl_langinfo(3) says “Era description segments are separated by semicolons”, but
it appears that `nl_langinfo (ERA)` returns them separated by nul bytes
instead, and the only way to parse them is to additionally call the
(undocumented) `nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES)` to find out how many
segments there are meant to be.

This can be demonstrated with the following example program (compile with
`gcc -o era era.c -Wall`):
```
#include <langinfo.h>
#include <locale.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
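  /* Pick up the locale from the environment (LANG, LC_*).  */
  setlocale (LC_ALL, "");
  const char *era = nl_langinfo (ERA);
  /* The segment count comes back encoded in the pointer value, hence the cast.  */
  int n_entries = (int) (intptr_t) nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES);

  /* %s stops at the first nul byte, so at most one segment is printed.  */
  printf ("n_entries: %d, era: %s\n", n_entries, era);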

  return 0;
}
```

If run with `LANG=th_TH.UTF-8 ./era` it prints
```
n_entries: 1, era: +:1:-543/01/01:+*:พ.ศ.:%EC %Ey
```
which is all good.

However, if run with `LANG=ja_JP.UTF-8 ./era`, it prints:
```
n_entries: 11, era: +:2:2020/01/01:+*:令和:%EC%Ey年
```

Only one segment appears in the `era` output there, not 11, because `%s` stops
at the first nul byte.
Looking at the ja_JP locale definition, there are correctly 11 segments defined
in it:
https://github.com/bminor/glibc/blob/master/localedata/locales/ja_JP#L14949-L14977

If I read past 10 nul terminators in the `era` string, I can retrieve all 11
segments. So the locale definition does work. It just doesn’t match up to what
nl_langinfo(3) says, and requires using the undocumented
`_NL_TIME_ERA_NUM_ENTRIES` to read all segments.
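
For reference, here is a minimal sketch of the workaround described above:
stepping over each nul terminator to reach the next segment, using the count
from `_NL_TIME_ERA_NUM_ENTRIES`. The helper names (`next_segment`,
`join_segments`, `print_era_segments`) are made up for illustration and are
not part of glibc. The sketch assumes the current nul-separated layout.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the segment that follows SEG in a
   nul-separated buffer.  The caller must already know one exists.  */
const char *
next_segment (const char *seg)
{
  return seg + strlen (seg) + 1;
}

/* Hypothetical helper: copy N nul-separated segments from BUF into OUT,
   joined by semicolons.  Returns the length written (excluding the
   terminator), or -1 if OUT is too small.  */
int
join_segments (const char *buf, int n, char *out, size_t out_size)
{
  size_t used = 0;

  if (out_size == 0)
    return -1;

  for (int i = 0; i < n; i++)
    {
      size_t len = strlen (buf);
      /* One extra byte for the ';' before every segment but the first.  */
      size_t need = len + (i > 0 ? 1 : 0);

      /* Always leave room for the final nul terminator.  */
      if (used + need + 1 > out_size)
        return -1;
      if (i > 0)
        out[used++] = ';';
      memcpy (out + used, buf, len);
      used += len;
      buf = next_segment (buf);
    }

  out[used] = '\0';
  return (int) used;
}

/* Print every ERA segment of the current locale, one per line.
   Relies on the undocumented, glibc-specific _NL_TIME_ERA_NUM_ENTRIES.  */
void
print_era_segments (void)
{
  const char *seg = nl_langinfo (ERA);
  int n = (int) (intptr_t) nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES);

  for (int i = 0; i < n; i++)
    {
      printf ("segment %d: %s\n", i, seg);
      seg = next_segment (seg);
    }
}
```

Calling `print_era_segments ()` after `setlocale (LC_ALL, "")` should list each
segment separately, and `join_segments` produces the semicolon-separated form
that nl_langinfo(3) describes.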

Is there a reason `era` is using nul separators? Could it be switched to using
semicolons please? :)

