public inbox for glibc-bugs@sourceware.org
From: "bugzilla at tecnocode dot co.uk" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug locale/31030] New: ERA segments are nul-separated rather than semicolon-separated
Date: Fri, 03 Nov 2023 17:17:47 +0000
Message-ID: <bug-31030-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31030

            Bug ID: 31030
           Summary: ERA segments are nul-separated rather than semicolon-separated
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: bugzilla at tecnocode dot co.uk
  Target Milestone: ---

nl_langinfo(3) says “Era description segments are separated by semicolons”, but it appears that `nl_langinfo (ERA)` returns them separated by nul bytes instead, and the only way to parse them is to additionally call the (undocumented) `nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES)` to find out how many segments there are meant to be.

This can be demonstrated with the following example program (compile with `gcc -o era era.c -Wall`):
```
#include <langinfo.h>
#include <locale.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  setlocale (LC_ALL, "");

  const char *era = nl_langinfo (ERA);
  /* The segment count is returned disguised as a pointer value.  */
  int n_entries = (int) (intptr_t) nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES);

  /* %s stops at the first nul byte, so at most one segment is printed.  */
  printf ("n_entries: %d, era: %s\n", n_entries, era);

  return 0;
}
```

If run with `LANG=th_TH.UTF-8 ./era` it prints
```
n_entries: 1, era: +:1:-543/01/01:+*:พ.ศ.:%EC %Ey
```
which is all good.

However, if run with `LANG=ja_JP.UTF-8 ./era`, it prints:
```
n_entries: 11, era: +:2:2020/01/01:+*:令和:%EC%Ey年
```

There clearly aren’t 11 segments in the ‘era’ description there, only one.

Looking at the ja_JP locale definition, there are correctly 11 segments defined in it:
https://github.com/bminor/glibc/blob/master/localedata/locales/ja_JP#L14949-L14977

If I read past 10 nul terminators in the `era` string, I can retrieve all 11 segments.
So the locale definition does work. It just doesn’t match up to what nl_langinfo(3) says, and requires using the undocumented `_NL_TIME_ERA_NUM_ENTRIES` to read all segments.

Is there a reason `era` is using nul separators? Could it be switched to using semicolons please? :)
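
For reference, the workaround described in the report (use the segment count and read past the nul terminators) could be sketched roughly as follows. This is a minimal sketch, not a documented glibc interface: `era_segment` and `print_era_segments` are hypothetical helper names, and the code assumes the segments sit back to back with exactly one nul byte between them, as observed in the report.

```c
#include <assert.h>
#include <langinfo.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return segment `index` (0-based) from a buffer that
   holds `n_entries` nul-terminated strings laid out back to back, or NULL
   if `index` is out of range.  The layout is an assumption taken from the
   behaviour observed in the report.  */
const char *
era_segment (const char *era, size_t n_entries, size_t index)
{
  assert (era != NULL);

  if (index >= n_entries)
    return NULL;

  for (size_t i = 0; i < index; i++)
    era += strlen (era) + 1;   /* skip this segment and its nul terminator */

  return era;
}

/* Hypothetical helper: print every ERA segment of the current LC_TIME
   locale and return how many were printed.  */
size_t
print_era_segments (void)
{
  const char *era = nl_langinfo (ERA);
  /* glibc-specific and undocumented: the count comes back disguised as a
     pointer value, exactly as in the report's example program.  */
  size_t n_entries
    = (size_t) (uintptr_t) nl_langinfo (_NL_TIME_ERA_NUM_ENTRIES);

  for (size_t i = 0; i < n_entries; i++)
    printf ("segment %zu: %s\n", i, era_segment (era, n_entries, i));

  return n_entries;
}
```

To try it, call `setlocale (LC_ALL, "")` (from `<locale.h>`) and then `print_era_segments ()`. Going by the observations above, running with `LANG=ja_JP.UTF-8` would be expected to list all 11 segments rather than only the first.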
Thread overview: 8+ messages
2023-11-03 17:17 bugzilla at tecnocode dot co.uk [this message]
2023-12-05 14:15 ` [Bug locale/31030] " schwab@linux-m68k.org
2024-01-15 13:23 ` smcv at collabora dot com
2024-01-15 13:31 ` smcv at collabora dot com
2024-01-15 13:34 ` schwab@linux-m68k.org
2024-01-15 13:35 ` schwab@linux-m68k.org
2024-01-15 13:50 ` bugzilla at tecnocode dot co.uk
2024-01-16 21:58 ` aurelien at aurel32 dot net