* [PATCH 0/2] Make C/POSIX and C.UTF-8 consistent. @ 2022-01-31 5:34 Carlos O'Donell 2022-01-31 5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell 2022-01-31 5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell 0 siblings, 2 replies; 15+ messages in thread From: Carlos O'Donell @ 2022-01-31 5:34 UTC (permalink / raw) To: libc-alpha, fweimer, michael.hudson We had a recent report from Michael Hudson-Doyle that he had seen a problem with C.UTF-8 when running tests for rrdtool. The report prompted Florian to place this on the glibc 2.35 blocker for review. Upon review I decided to harmonize C.UTF-8 closer to C/POSIX and I worked with Florian to fix the discrepancies between the C.UTF-8 locale and the builtin C/POSIX locale. The work uncovered a problem in the parsing of LC_MONETARY by localedef which needed fixing in order to make C/POSIX and C.UTF-8 consistent. The first commit fixes the mon_decimal_point handling in localedef parsing, while the second commit fixes C.UTF-8 and adds a new test to check for consistency between C/POSIX and C.UTF-8. The test is based on work that Florian Weimer did to help me identify the inconsistencies between the locales. Carlos O'Donell (2): localedef: Fix handling of empty mon_decimal_point localedata: Adjust C.UTF-8 to align with C/POSIX. locale/programs/ld-monetary.c | 4 +- localedata/Makefile | 30 +- localedata/locales/C | 22 +- localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++ 4 files changed, 579 insertions(+), 16 deletions(-) create mode 100644 localedata/tst-c-utf8-consistency.c -- 2.31.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
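Editorial illustration (not part of the series): the consistency check described in the cover letter amounts to asking two locales for the same langinfo item and comparing the results. The sketch below is a minimal stand-alone version of that idea using only the public POSIX interfaces newlocale and nl_langinfo_l. The helper name same_langinfo is hypothetical, and the first result is copied because POSIX allows a later nl_langinfo_l call to overwrite the returned string.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return true if ITEM has the same string value
   in locales A and B.  Returns false if memory cannot be allocated.  */
bool
same_langinfo (nl_item item, locale_t a, locale_t b)
{
  assert (a != (locale_t) 0 && b != (locale_t) 0);

  /* Copy the first result; a second nl_langinfo_l call is permitted
     to invalidate or overwrite the string returned by the first.  */
  char *first = strdup (nl_langinfo_l (item, a));
  if (first == NULL)
    return false;

  bool same = strcmp (first, nl_langinfo_l (item, b)) == 0;
  free (first);
  return same;
}
```

A caller would create the two locale objects with newlocale (LC_ALL_MASK, "C", (locale_t) 0) and newlocale (LC_ALL_MASK, "C.UTF-8", (locale_t) 0), check both for failure (C.UTF-8 may not be installed), and then call same_langinfo (D_FMT, c, c_utf8) for each item of interest.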
* [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-01-31 5:34 [PATCH 0/2] Make C/POSIX and C.UTF-8 consistent Carlos O'Donell @ 2022-01-31 5:34 ` Carlos O'Donell 2022-01-31 15:26 ` Florian Weimer 2022-02-01 11:47 ` Florian Weimer 2022-01-31 5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell 1 sibling, 2 replies; 15+ messages in thread From: Carlos O'Donell @ 2022-01-31 5:34 UTC (permalink / raw) To: libc-alpha, fweimer, michael.hudson The handling of mon_decimal_point is incorrect when it comes to handling the empty "" value. The existing parser in monetary_read() will correctly handle setting the non-wide-character value and the wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are set in the locale definition. However, in monetary_finish() we have conflicting TEST_ELEM() which sets a default value (if the locale definition doesn't include one), and subsequent code which looks for mon_decimal_point to be NULL to issue a specific error message and set the defaults. The latter is unused because TEST_ELEM() always sets a default. The simplest solution is to remove the TEST_ELEM() check, and allow the existing check to look to see if mon_decimal_point is NULL and set an appropriate default. The final fix is to move the setting of mon_decimal_point_wc so it occurs only when mon_decimal_point is being set to a default, keeping both values consistent. There is no way to tell the difference between mon_decimal_point_wc having been set to the empty string and not having been defined at all, for that distinction we must use mon_decimal_point being NULL or "", and so we must logically set the default together with mon_decimal_point. 
Lastly, there are more fixes similar to this that could be made to ld-monetary.c, but we avoid that in order to fix just the code required for mon_decimal_point, which impacts the ability for C.UTF-8 to set mon_decimal_point to "", since without this fix we end up with an inconsistent setting of mon_decimal_point set to "", but mon_decimal_point_wc set to "." which is incorrect. Tested on x86_64 and i686 without regression. --- locale/programs/ld-monetary.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c index 277b9ff042..3b0412b405 100644 --- a/locale/programs/ld-monetary.c +++ b/locale/programs/ld-monetary.c @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY"); TEST_ELEM (int_curr_symbol, ""); TEST_ELEM (currency_symbol, ""); - TEST_ELEM (mon_decimal_point, "."); TEST_ELEM (mon_thousands_sep, ""); TEST_ELEM (positive_sign, ""); TEST_ELEM (negative_sign, ""); @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), record_error (0, 0, _("%s: field `%s' not defined"), "LC_MONETARY", "mon_decimal_point"); monetary->mon_decimal_point = "."; + monetary->mon_decimal_point_wc = L'.'; } else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing) { @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), %s: value for field `%s' must not be an empty string"), "LC_MONETARY", "mon_decimal_point"); } - if (monetary->mon_decimal_point_wc == L'\0') - monetary->mon_decimal_point_wc = L'.'; if (monetary->mon_grouping_len == 0) { -- 2.31.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
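Editorial illustration (not part of the patch): the behaviour the commit message describes can be modelled in a few lines. The sketch below is a simplified stand-in, not the glibc code: struct mon_fields and finish_mon_decimal_point are hypothetical names, the wide field is modelled as wchar_t, and the error reporting done by the real monetary_finish() is left out. It shows only the rule the patch establishes: both defaults are applied together when the field is undefined (NULL), and an explicitly empty value is left untouched in both forms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced model of the two fields touched by the patch.  */
struct mon_fields
{
  const char *mon_decimal_point;   /* NULL means "not defined".  */
  wchar_t mon_decimal_point_wc;    /* L'\0' cannot tell "empty" from
                                      "not defined" on its own.  */
};

/* Apply the defaulting rule from the patch: default both values only
   when the narrow field was never defined.  An explicit "" stays "",
   and the wide value stays L'\0' to match it.  */
void
finish_mon_decimal_point (struct mon_fields *m)
{
  assert (m != NULL);
  if (m->mon_decimal_point == NULL)
    {
      m->mon_decimal_point = ".";
      m->mon_decimal_point_wc = L'.';
    }
}
```

Under the old ordering described above, the equivalent of the second case (an explicit "") would have ended with the wide value forced to L'.', which is the inconsistency the patch removes.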
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-01-31 5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell @ 2022-01-31 15:26 ` Florian Weimer 2022-01-31 16:09 ` Andreas Schwab 2022-02-01 11:47 ` Florian Weimer 1 sibling, 1 reply; 15+ messages in thread From: Florian Weimer @ 2022-01-31 15:26 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-alpha, michael.hudson * Carlos O'Donell: > diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c > index 277b9ff042..3b0412b405 100644 > --- a/locale/programs/ld-monetary.c > +++ b/locale/programs/ld-monetary.c > @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY"); > > TEST_ELEM (int_curr_symbol, ""); > TEST_ELEM (currency_symbol, ""); > - TEST_ELEM (mon_decimal_point, "."); > TEST_ELEM (mon_thousands_sep, ""); > TEST_ELEM (positive_sign, ""); > TEST_ELEM (negative_sign, ""); > @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), > record_error (0, 0, _("%s: field `%s' not defined"), > "LC_MONETARY", "mon_decimal_point"); > monetary->mon_decimal_point = "."; > + monetary->mon_decimal_point_wc = L'.'; > } > else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing) > { > @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), > %s: value for field `%s' must not be an empty string"), > "LC_MONETARY", "mon_decimal_point"); > } > - if (monetary->mon_decimal_point_wc == L'\0') > - monetary->mon_decimal_point_wc = L'.'; > > if (monetary->mon_grouping_len == 0) > { There's an existing comment /* The decimal point must not be empty. This is not said explicitly in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be != "". */ that says that empty strings/null characters are invalid. The comment was clearly copied from locale/programs/ld-numeric.c. 
*However* we have got this code in stdio-common/printf_fp.c: decimal = _nl_lookup (loc, LC_MONETARY, MON_DECIMAL_POINT); if (*decimal == '\0') decimal = _nl_lookup (loc, LC_NUMERIC, DECIMAL_POINT); decimalwc = _nl_lookup_word (loc, LC_MONETARY, _NL_MONETARY_DECIMAL_POINT_WC); if (decimalwc == L'\0') decimalwc = _nl_lookup_word (loc, LC_NUMERIC, _NL_NUMERIC_DECIMAL_POINT_WC); So we use LC_NUMERIC as the fallback, and our strfmon implementation is okay with it. But our localeconv implementation lacks this fallback, which looks like a bug because the built-in C locale uses an empty string/a null character. Still I think simplifying the locale data is the right direction here. Thanks, Florian ^ permalink raw reply [flat|nested] 15+ messages in thread
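Editorial illustration (not part of the thread): an application that reads localeconv() directly can apply the same LC_NUMERIC fallback that the printf_fp.c excerpt above uses internally. The sketch below relies only on the standard struct lconv members; the helper name effective_mon_decimal_point is hypothetical. ISO C specifies that in the "C" locale decimal_point is "." and mon_decimal_point is "", so the fallback branch is the one taken there.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return the decimal point to use for monetary
   output.  Prefer the LC_MONETARY value; if it is empty, fall back to
   the LC_NUMERIC decimal point, which ISO C requires to be non-empty.  */
const char *
effective_mon_decimal_point (const struct lconv *lc)
{
  assert (lc != NULL);
  if (lc->mon_decimal_point != NULL && lc->mon_decimal_point[0] != '\0')
    return lc->mon_decimal_point;
  return lc->decimal_point;
}
```

Typical use is effective_mon_decimal_point (localeconv ()) after setlocale has selected the locale of interest.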
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-01-31 15:26 ` Florian Weimer @ 2022-01-31 16:09 ` Andreas Schwab 2022-01-31 16:20 ` Florian Weimer 0 siblings, 1 reply; 15+ messages in thread From: Andreas Schwab @ 2022-01-31 16:09 UTC (permalink / raw) To: Florian Weimer via Libc-alpha; +Cc: Carlos O'Donell, Florian Weimer On Jan 31 2022, Florian Weimer via Libc-alpha wrote: > There's an existing comment > > /* The decimal point must not be empty. This is not said explicitly > in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be > != "". */ > > that says that empty strings/null characters are invalid. This is only about decimal_point, mon_decimal_point can be empty. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-01-31 16:09 ` Andreas Schwab @ 2022-01-31 16:20 ` Florian Weimer 2022-01-31 16:30 ` Andreas Schwab 0 siblings, 1 reply; 15+ messages in thread From: Florian Weimer @ 2022-01-31 16:20 UTC (permalink / raw) To: Andreas Schwab; +Cc: Florian Weimer via Libc-alpha, Carlos O'Donell * Andreas Schwab: > On Jan 31 2022, Florian Weimer via Libc-alpha wrote: > >> There's an existing comment >> >> /* The decimal point must not be empty. This is not said explicitly >> in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be >> != "". */ >> >> that says that empty strings/null characters are invalid. > > This is only about decimal_point, mon_decimal_point can be empty. Hmm, I'll take your word for it. So the comment should definitely go, and Carlos' change is the right way to do it? Thanks, Florian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-01-31 16:20 ` Florian Weimer @ 2022-01-31 16:30 ` Andreas Schwab 2022-01-31 16:37 ` Florian Weimer 0 siblings, 1 reply; 15+ messages in thread From: Andreas Schwab @ 2022-01-31 16:30 UTC (permalink / raw) To: Florian Weimer; +Cc: Florian Weimer via Libc-alpha, Carlos O'Donell On Jan 31 2022, Florian Weimer wrote: > * Andreas Schwab: > >> On Jan 31 2022, Florian Weimer via Libc-alpha wrote: >> >>> There's an existing comment >>> >>> /* The decimal point must not be empty. This is not said explicitly >>> in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be >>> != "". */ >>> >>> that says that empty strings/null characters are invalid. >> >> This is only about decimal_point, mon_decimal_point can be empty. > > Hmm, I'll take your word for it. See 7.11.2.1, paragraph 3 and 10. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-01-31 16:30 ` Andreas Schwab @ 2022-01-31 16:37 ` Florian Weimer 0 siblings, 0 replies; 15+ messages in thread From: Florian Weimer @ 2022-01-31 16:37 UTC (permalink / raw) To: Andreas Schwab; +Cc: Florian Weimer via Libc-alpha * Andreas Schwab: > On Jan 31 2022, Florian Weimer wrote: > >> * Andreas Schwab: >> >>> On Jan 31 2022, Florian Weimer via Libc-alpha wrote: >>> >>>> There's an existing comment >>>> >>>> /* The decimal point must not be empty. This is not said explicitly >>>> in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be >>>> != "". */ >>>> >>>> that says that empty strings/null characters are invalid. >>> >>> This is only about decimal_point, mon_decimal_point can be empty. >> >> Hmm, I'll take your word for it. > > See 7.11.2.1, paragraph 3 and 10. That is fairly conclusive indeed (numbers match C11). Are you okay with Carlos' patch with a comment update? Thanks, Florian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-01-31 5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell 2022-01-31 15:26 ` Florian Weimer @ 2022-02-01 11:47 ` Florian Weimer 2022-02-01 16:00 ` Carlos O'Donell 1 sibling, 1 reply; 15+ messages in thread From: Florian Weimer @ 2022-02-01 11:47 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-alpha, michael.hudson * Carlos O'Donell: > The handling of mon_decimal_point is incorrect when it comes to > handling the empty "" value. The existing parser in monetary_read() > will correctly handle setting the non-wide-character value and the > wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are > set in the locale definition. However, in monetary_finish() we have > conflicting TEST_ELEM() which sets a default value (if the locale > definition doesn't include one), and subsequent code which looks for > mon_decimal_point to be NULL to issue a specific error message and set > the defaults. The latter is unused because TEST_ELEM() always sets a > default. The simplest solution is to remove the TEST_ELEM() check, > and allow the existing check to look to see if mon_decimal_point is > NULL and set an appropriate default. The final fix is to move the > setting of mon_decimal_point_wc so it occurs only when > mon_decimal_point is being set to a default, keeping both values > consistent. There is no way to tell the difference between > mon_decimal_point_wc having been set to the empty string and not > having been defined at all, for that distinction we must use > mon_decimal_point being NULL or "", and so we must logically set > the default together with mon_decimal_point. 
> > Lastly, there are more fixes similar to this that could be made to > ld-monetary.c, but we avoid that in order to fix just the code > required for mon_decimal_point, which impacts the ability for C.UTF-8 > to set mon_decimal_point to "", since without this fix we end up with > an inconsistent setting of mon_decimal_point set to "", but > mon_decimal_point_wc set to "." which is incorrect. > > Tested on x86_64 and i686 without regression. > --- > locale/programs/ld-monetary.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c > index 277b9ff042..3b0412b405 100644 > --- a/locale/programs/ld-monetary.c > +++ b/locale/programs/ld-monetary.c > @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY"); > > TEST_ELEM (int_curr_symbol, ""); > TEST_ELEM (currency_symbol, ""); > - TEST_ELEM (mon_decimal_point, "."); > TEST_ELEM (mon_thousands_sep, ""); > TEST_ELEM (positive_sign, ""); > TEST_ELEM (negative_sign, ""); > @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), > record_error (0, 0, _("%s: field `%s' not defined"), > "LC_MONETARY", "mon_decimal_point"); > monetary->mon_decimal_point = "."; > + monetary->mon_decimal_point_wc = L'.'; > } > else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing) > { > @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), > %s: value for field `%s' must not be an empty string"), > "LC_MONETARY", "mon_decimal_point"); > } > - if (monetary->mon_decimal_point_wc == L'\0') > - monetary->mon_decimal_point_wc = L'.'; > > if (monetary->mon_grouping_len == 0) > { I have verified that this does not change the localedef output for the existing locales created by install-locale-files. I think we need further cleanups in the comments and checks (which were copied from LC_NUMERIC, but should not apply to LC_MONETARY). 
But I think we can release with this version. Reviewed-by: Florian Weimer <fweimer@redhat.com> Thanks, Florian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-02-01 11:47 ` Florian Weimer @ 2022-02-01 16:00 ` Carlos O'Donell 2022-02-01 16:14 ` Carlos O'Donell 0 siblings, 1 reply; 15+ messages in thread From: Carlos O'Donell @ 2022-02-01 16:00 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha, michael.hudson On 2/1/22 06:47, Florian Weimer wrote: > * Carlos O'Donell: > >> The handling of mon_decimal_point is incorrect when it comes to >> handling the empty "" value. The existing parser in monetary_read() >> will correctly handle setting the non-wide-character value and the >> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are >> set in the locale definition. However, in monetary_finish() we have >> conflicting TEST_ELEM() which sets a default value (if the locale >> definition doesn't include one), and subsequent code which looks for >> mon_decimal_point to be NULL to issue a specific error message and set >> the defaults. The latter is unused because TEST_ELEM() always sets a >> default. The simplest solution is to remove the TEST_ELEM() check, >> and allow the existing check to look to see if mon_decimal_point is >> NULL and set an appropriate default. The final fix is to move the >> setting of mon_decimal_point_wc so it occurs only when >> mon_decimal_point is being set to a default, keeping both values >> consistent. There is no way to tell the difference between >> mon_decimal_point_wc having been set to the empty string and not >> having been defined at all, for that distinction we must use >> mon_decimal_point being NULL or "", and so we must logically set >> the default together with mon_decimal_point. 
>> >> Lastly, there are more fixes similar to this that could be made to >> ld-monetary.c, but we avoid that in order to fix just the code >> required for mon_decimal_point, which impacts the ability for C.UTF-8 >> to set mon_decimal_point to "", since without this fix we end up with >> an inconsistent setting of mon_decimal_point set to "", but >> mon_decimal_point_wc set to "." which is incorrect. >> >> Tested on x86_64 and i686 without regression. >> --- >> locale/programs/ld-monetary.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c >> index 277b9ff042..3b0412b405 100644 >> --- a/locale/programs/ld-monetary.c >> +++ b/locale/programs/ld-monetary.c >> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY"); >> >> TEST_ELEM (int_curr_symbol, ""); >> TEST_ELEM (currency_symbol, ""); >> - TEST_ELEM (mon_decimal_point, "."); >> TEST_ELEM (mon_thousands_sep, ""); >> TEST_ELEM (positive_sign, ""); >> TEST_ELEM (negative_sign, ""); >> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), >> record_error (0, 0, _("%s: field `%s' not defined"), >> "LC_MONETARY", "mon_decimal_point"); >> monetary->mon_decimal_point = "."; >> + monetary->mon_decimal_point_wc = L'.'; >> } >> else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing) >> { >> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), >> %s: value for field `%s' must not be an empty string"), >> "LC_MONETARY", "mon_decimal_point"); >> } >> - if (monetary->mon_decimal_point_wc == L'\0') >> - monetary->mon_decimal_point_wc = L'.'; >> >> if (monetary->mon_grouping_len == 0) >> { > > I have verified that this does not change the localedef output for the > existing locales created by install-locale-files. 
> > I think we need further cleanups in the comments and checks (which were > copied from LC_NUMERIC, but should not apply to LC_MONETARY). But I > think we can release with this version. I filed this bug to track that: Bug 28845 - ld-monetary.c should be updated to match ISO C and other standards. https://sourceware.org/bugzilla/show_bug.cgi?id=28845 Thanks for the review! > Reviewed-by: Florian Weimer <fweimer@redhat.com> > > Thanks, > Florian > -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point 2022-02-01 16:00 ` Carlos O'Donell @ 2022-02-01 16:14 ` Carlos O'Donell 0 siblings, 0 replies; 15+ messages in thread From: Carlos O'Donell @ 2022-02-01 16:14 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha, michael.hudson On 2/1/22 11:00, Carlos O'Donell wrote: > On 2/1/22 06:47, Florian Weimer wrote: >> * Carlos O'Donell: >> >>> The handling of mon_decimal_point is incorrect when it comes to >>> handling the empty "" value. The existing parser in monetary_read() >>> will correctly handle setting the non-wide-character value and the >>> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are >>> set in the locale definition. However, in monetary_finish() we have >>> conflicting TEST_ELEM() which sets a default value (if the locale >>> definition doesn't include one), and subsequent code which looks for >>> mon_decimal_point to be NULL to issue a specific error message and set >>> the defaults. The latter is unused because TEST_ELEM() always sets a >>> default. The simplest solution is to remove the TEST_ELEM() check, >>> and allow the existing check to look to see if mon_decimal_point is >>> NULL and set an appropriate default. The final fix is to move the >>> setting of mon_decimal_point_wc so it occurs only when >>> mon_decimal_point is being set to a default, keeping both values >>> consistent. There is no way to tell the difference between >>> mon_decimal_point_wc having been set to the empty string and not >>> having been defined at all, for that distinction we must use >>> mon_decimal_point being NULL or "", and so we must logically set >>> the default together with mon_decimal_point. 
>>> >>> Lastly, there are more fixes similar to this that could be made to >>> ld-monetary.c, but we avoid that in order to fix just the code >>> required for mon_decimal_point, which impacts the ability for C.UTF-8 >>> to set mon_decimal_point to "", since without this fix we end up with >>> an inconsistent setting of mon_decimal_point set to "", but >>> mon_decimal_point_wc set to "." which is incorrect. >>> >>> Tested on x86_64 and i686 without regression. >>> --- >>> locale/programs/ld-monetary.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c >>> index 277b9ff042..3b0412b405 100644 >>> --- a/locale/programs/ld-monetary.c >>> +++ b/locale/programs/ld-monetary.c >>> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY"); >>> >>> TEST_ELEM (int_curr_symbol, ""); >>> TEST_ELEM (currency_symbol, ""); >>> - TEST_ELEM (mon_decimal_point, "."); >>> TEST_ELEM (mon_thousands_sep, ""); >>> TEST_ELEM (positive_sign, ""); >>> TEST_ELEM (negative_sign, ""); >>> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), >>> record_error (0, 0, _("%s: field `%s' not defined"), >>> "LC_MONETARY", "mon_decimal_point"); >>> monetary->mon_decimal_point = "."; >>> + monetary->mon_decimal_point_wc = L'.'; >>> } >>> else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing) >>> { >>> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"), >>> %s: value for field `%s' must not be an empty string"), >>> "LC_MONETARY", "mon_decimal_point"); >>> } >>> - if (monetary->mon_decimal_point_wc == L'\0') >>> - monetary->mon_decimal_point_wc = L'.'; >>> >>> if (monetary->mon_grouping_len == 0) >>> { >> >> I have verified that this does not change the localedef output for the >> existing locales created by install-locale-files. 
>> >> I think we need further cleanups in the comments and checks (which were >> copied from LC_NUMERIC, but should not apply to LC_MONETARY). But I >> think we can release with this version. > > I filed this bug to track that: > Bug 28845 - ld-monetary.c should be updated to match ISO C and other standards. > https://sourceware.org/bugzilla/show_bug.cgi?id=28845 And I filed one more bug to track the original bug, which I'll close after push: https://sourceware.org/bugzilla/show_bug.cgi?id=28847 -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX. 2022-01-31 5:34 [PATCH 0/2] Make C/POSIX and C.UTF-8 consistent Carlos O'Donell 2022-01-31 5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell @ 2022-01-31 5:34 ` Carlos O'Donell 2022-01-31 8:47 ` Andreas Schwab 2022-02-01 12:05 ` Florian Weimer 1 sibling, 2 replies; 15+ messages in thread From: Carlos O'Donell @ 2022-01-31 5:34 UTC (permalink / raw) To: libc-alpha, fweimer, michael.hudson We have had one downstream report from Canonical [1] that an rrdtool test was broken by the differences in LC_TIME that we had in the non-builtin C locale (C.UTF-8). If one application has an issue there are going to be others, and so with this commit we review and fix all the issues that cause the builtin C locale to be different from C.UTF-8, which includes: * mon_decimal_point should be empty e.g. "" - Depends on mon_decimal_point_wc fix. * negative_sign should be empty e.g. "" * week should be aligned with ISO 30112 default e.g. 7;19971130;4 * d_fmt corrected with escaped slashes e.g. "%m//%d//%y" * yesstr and nostr should be empty e.g. "" * country_ab2 and country_ab3 should be empty e.g. "" We bump LC_IDENTIFICATION version and adjust the date to indicate the change in the locale. A new tst-c-utf8-consistency test is added to ensure consistency between C/POSIX and C.UTF-8. Tested on x86_64 and i686 without regression. 
[1] https://sourceware.org/pipermail/libc-alpha/2022-January/135703.html Co-authored-by: Florian Weimer <fweimer@redhat.com> --- localedata/Makefile | 30 +- localedata/locales/C | 22 +- localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++ 3 files changed, 578 insertions(+), 13 deletions(-) create mode 100644 localedata/tst-c-utf8-consistency.c diff --git a/localedata/Makefile b/localedata/Makefile index 79db713925..9ae2e5c161 100644 --- a/localedata/Makefile +++ b/localedata/Makefile @@ -155,11 +155,31 @@ locale_test_suite := tst_iswalnum tst_iswalpha tst_iswcntrl \ tst_wcsxfrm tst_wctob tst_wctomb tst_wctrans \ tst_wctype tst_wcwidth -tests = $(locale_test_suite) tst-digits tst-setlocale bug-iconv-trans \ - tst-leaks tst-mbswcs1 tst-mbswcs2 tst-mbswcs3 tst-mbswcs4 tst-mbswcs5 \ - tst-mbswcs6 tst-xlocale1 tst-xlocale2 bug-usesetlocale \ - tst-strfmon1 tst-sscanf bug-setlocale1 tst-setlocale2 tst-setlocale3 \ - tst-wctype tst-iconv-math-trans +tests = \ + $(locale_test_suite) \ + bug-iconv-trans \ + bug-setlocale1 \ + bug-usesetlocale \ + tst-c-utf8-consistency \ + tst-digits \ + tst-iconv-math-trans \ + tst-leaks \ + tst-mbswcs1 \ + tst-mbswcs2 \ + tst-mbswcs3 \ + tst-mbswcs4 \ + tst-mbswcs5 \ + tst-mbswcs6 \ + tst-setlocale \ + tst-setlocale2 \ + tst-setlocale3 \ + tst-sscanf \ + tst-strfmon1 \ + tst-wctype \ + tst-xlocale1 \ + tst-xlocale2 \ + # tests + tests-static = bug-setlocale1-static tests += $(tests-static) ifeq (yes,$(build-shared)) diff --git a/localedata/locales/C b/localedata/locales/C index ca801c79cf..fb647ccc4b 100644 --- a/localedata/locales/C +++ b/localedata/locales/C @@ -12,8 +12,8 @@ tel "" fax "" language "" territory "" -revision "2.0" -date "2020-06-28" +revision "2.1" +date "2022-01-30" category "i18n:2012";LC_IDENTIFICATION category "i18n:2012";LC_CTYPE category "i18n:2012";LC_COLLATE @@ -68,11 +68,11 @@ LC_MONETARY % glibc/locale/C-monetary.c.). int_curr_symbol "" currency_symbol "" -mon_decimal_point "." 
+mon_decimal_point "" mon_thousands_sep "" mon_grouping -1 positive_sign "" -negative_sign "-" +negative_sign "" int_frac_digits -1 frac_digits -1 p_cs_precedes -1 @@ -121,7 +121,9 @@ mon "January";"February";"March";"April";"May";"June";"July";/ % % ISO 8601 conforming applications should use the values 7, 19971201 (a % Monday), and 4 (Thursday), respectively. -week 7;19971201;4 +% +% This field is consciously aligned with ISO 30112 and the C/POSIX locale. +week 7;19971130;4 first_weekday 1 first_workday 2 @@ -129,7 +131,7 @@ first_workday 2 d_t_fmt "%a %b %e %H:%M:%S %Y" % Appropriate date representation (%x) -d_fmt "%m/%d/%y" +d_fmt "%m//%d//%y" % Appropriate time representation (%X) t_fmt "%H:%M:%S" @@ -150,8 +152,8 @@ LC_MESSAGES % yesexpr "^[yY]" noexpr "^[nN]" -yesstr "Yes" -nostr "No" +yesstr "" +nostr "" END LC_MESSAGES LC_PAPER @@ -175,6 +177,10 @@ LC_ADDRESS % the LC_ADDRESS category. % (also used in the built in C/POSIX locale in glibc/locale/C-address.c) postal_fmt "%a%N%f%N%d%N%b%N%s %h %e %r%N%C-%z %T%N%c%N" +% The abbreviated 2 char and 3 char should be set to empty strings to +% match the C/POSIX locale. +country_ab2 "" +country_ab3 "" END LC_ADDRESS LC_TELEPHONE diff --git a/localedata/tst-c-utf8-consistency.c b/localedata/tst-c-utf8-consistency.c new file mode 100644 index 0000000000..50feed3090 --- /dev/null +++ b/localedata/tst-c-utf8-consistency.c @@ -0,0 +1,539 @@ +/* Test that C/POSIX and C.UTF-8 are consistent. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <langinfo.h>
+#include <locale.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <support/check.h>
+
+/* Initialized by do_test using newlocale.  */
+static locale_t c_utf8;
+
+/* Set to true for second pass.  */
+static bool use_nl_langinfo_l;
+
+static void
+switch_to_c (void)
+{
+  if (setlocale (LC_ALL, "C") == NULL)
+    FAIL_EXIT1 ("setlocale (LC_ALL, \"C\")");
+}
+
+static void
+switch_to_c_utf8 (void)
+{
+  if (setlocale (LC_ALL, "C.UTF-8") == NULL)
+    FAIL_EXIT1 ("setlocale (LC_ALL, \"C.UTF-8\")");
+}
+
+static char *
+str (nl_item item)
+{
+  if (!use_nl_langinfo_l)
+    switch_to_c ();
+  return nl_langinfo (item);
+}
+
+static char *
+str_utf8 (nl_item item)
+{
+  if (use_nl_langinfo_l)
+    return nl_langinfo_l (item, c_utf8);
+  else
+    {
+      switch_to_c_utf8 ();
+      return nl_langinfo (item);
+    }
+}
+
+static wchar_t *
+wstr (nl_item item)
+{
+  return (wchar_t *) str (item);
+}
+
+static wchar_t *
+wstr_utf8 (nl_item item)
+{
+  return (wchar_t *) str_utf8 (item);
+}
+
+static int
+byte (nl_item item)
+{
+  return (signed char) *str (item);
+}
+
+static int
+byte_utf8 (nl_item item)
+{
+  return (signed char) *str_utf8 (item);
+}
+
+static int
+word (nl_item item)
+{
+  union
+  {
+    char *ptr;
+    int word;
+  } u;
+  u.ptr = str (item);
+  return u.word;
+}
+
+static int
+word_utf8 (nl_item item)
+{
+  union
+  {
+    char *ptr;
+    int word;
+  } u;
+  u.ptr = str_utf8 (item);
+  return u.word;
+}
+
+static void
+one_pass (void)
+{
+  /* LC_TIME.  */
+  TEST_COMPARE_STRING (str (ABDAY_1), str_utf8 (ABDAY_1));
+  TEST_COMPARE_STRING (str (ABDAY_2), str_utf8 (ABDAY_2));
+  TEST_COMPARE_STRING (str (ABDAY_3), str_utf8 (ABDAY_3));
+  TEST_COMPARE_STRING (str (ABDAY_4), str_utf8 (ABDAY_4));
+  TEST_COMPARE_STRING (str (ABDAY_5), str_utf8 (ABDAY_5));
+  TEST_COMPARE_STRING (str (ABDAY_6), str_utf8 (ABDAY_6));
+  TEST_COMPARE_STRING (str (ABDAY_7), str_utf8 (ABDAY_7));
+
+  TEST_COMPARE_STRING (str (DAY_1), str_utf8 (DAY_1));
+  TEST_COMPARE_STRING (str (DAY_2), str_utf8 (DAY_2));
+  TEST_COMPARE_STRING (str (DAY_3), str_utf8 (DAY_3));
+  TEST_COMPARE_STRING (str (DAY_4), str_utf8 (DAY_4));
+  TEST_COMPARE_STRING (str (DAY_5), str_utf8 (DAY_5));
+  TEST_COMPARE_STRING (str (DAY_6), str_utf8 (DAY_6));
+  TEST_COMPARE_STRING (str (DAY_7), str_utf8 (DAY_7));
+
+  TEST_COMPARE_STRING (str (ABMON_1), str_utf8 (ABMON_1));
+  TEST_COMPARE_STRING (str (ABMON_2), str_utf8 (ABMON_2));
+  TEST_COMPARE_STRING (str (ABMON_3), str_utf8 (ABMON_3));
+  TEST_COMPARE_STRING (str (ABMON_4), str_utf8 (ABMON_4));
+  TEST_COMPARE_STRING (str (ABMON_5), str_utf8 (ABMON_5));
+  TEST_COMPARE_STRING (str (ABMON_6), str_utf8 (ABMON_6));
+  TEST_COMPARE_STRING (str (ABMON_7), str_utf8 (ABMON_7));
+  TEST_COMPARE_STRING (str (ABMON_8), str_utf8 (ABMON_8));
+  TEST_COMPARE_STRING (str (ABMON_9), str_utf8 (ABMON_9));
+  TEST_COMPARE_STRING (str (ABMON_10), str_utf8 (ABMON_10));
+  TEST_COMPARE_STRING (str (ABMON_11), str_utf8 (ABMON_11));
+  TEST_COMPARE_STRING (str (ABMON_12), str_utf8 (ABMON_12));
+
+  TEST_COMPARE_STRING (str (MON_1), str_utf8 (MON_1));
+  TEST_COMPARE_STRING (str (MON_2), str_utf8 (MON_2));
+  TEST_COMPARE_STRING (str (MON_3), str_utf8 (MON_3));
+  TEST_COMPARE_STRING (str (MON_4), str_utf8 (MON_4));
+  TEST_COMPARE_STRING (str (MON_5), str_utf8 (MON_5));
+  TEST_COMPARE_STRING (str (MON_6), str_utf8 (MON_6));
+  TEST_COMPARE_STRING (str (MON_7), str_utf8 (MON_7));
+  TEST_COMPARE_STRING (str (MON_8), str_utf8 (MON_8));
+  TEST_COMPARE_STRING (str (MON_9), str_utf8 (MON_9));
+  TEST_COMPARE_STRING (str (MON_10), str_utf8 (MON_10));
+  TEST_COMPARE_STRING (str (MON_11), str_utf8 (MON_11));
+  TEST_COMPARE_STRING (str (MON_12), str_utf8 (MON_12));
+
+  TEST_COMPARE_STRING (str (AM_STR), str_utf8 (AM_STR));
+  TEST_COMPARE_STRING (str (PM_STR), str_utf8 (PM_STR));
+
+  TEST_COMPARE_STRING (str (D_T_FMT), str_utf8 (D_T_FMT));
+  TEST_COMPARE_STRING (str (D_FMT), str_utf8 (D_FMT));
+  TEST_COMPARE_STRING (str (T_FMT), str_utf8 (T_FMT));
+  TEST_COMPARE_STRING (str (T_FMT_AMPM),
+                       str_utf8 (T_FMT_AMPM));
+
+  TEST_COMPARE_STRING (str (ERA), str_utf8 (ERA));
+  TEST_COMPARE_STRING (str (ERA_YEAR), str_utf8 (ERA_YEAR));
+  TEST_COMPARE_STRING (str (ERA_D_FMT), str_utf8 (ERA_D_FMT));
+  TEST_COMPARE_STRING (str (ALT_DIGITS), str_utf8 (ALT_DIGITS));
+  TEST_COMPARE_STRING (str (ERA_D_T_FMT), str_utf8 (ERA_D_T_FMT));
+  TEST_COMPARE_STRING (str (ERA_T_FMT), str_utf8 (ERA_T_FMT));
+  TEST_COMPARE (word (_NL_TIME_ERA_NUM_ENTRIES),
+                word_utf8 (_NL_TIME_ERA_NUM_ENTRIES));
+  /* No array elements, so nothing to compare for _NL_TIME_ERA_ENTRIES.  */
+  TEST_COMPARE (word (_NL_TIME_ERA_NUM_ENTRIES), 0);
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_1), wstr_utf8 (_NL_WABDAY_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_2), wstr_utf8 (_NL_WABDAY_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_3), wstr_utf8 (_NL_WABDAY_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_4), wstr_utf8 (_NL_WABDAY_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_5), wstr_utf8 (_NL_WABDAY_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_6), wstr_utf8 (_NL_WABDAY_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_7), wstr_utf8 (_NL_WABDAY_7));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_1), wstr_utf8 (_NL_WDAY_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_2), wstr_utf8 (_NL_WDAY_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_3), wstr_utf8 (_NL_WDAY_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_4), wstr_utf8 (_NL_WDAY_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_5), wstr_utf8 (_NL_WDAY_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_6), wstr_utf8 (_NL_WDAY_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_7), wstr_utf8 (_NL_WDAY_7));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_1), wstr_utf8 (_NL_WABMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_2), wstr_utf8 (_NL_WABMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_3), wstr_utf8 (_NL_WABMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_4), wstr_utf8 (_NL_WABMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_5), wstr_utf8 (_NL_WABMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_6), wstr_utf8 (_NL_WABMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_7), wstr_utf8 (_NL_WABMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_8), wstr_utf8 (_NL_WABMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_9), wstr_utf8 (_NL_WABMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_10), wstr_utf8 (_NL_WABMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_11), wstr_utf8 (_NL_WABMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_12), wstr_utf8 (_NL_WABMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_1), wstr_utf8 (_NL_WMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_2), wstr_utf8 (_NL_WMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_3), wstr_utf8 (_NL_WMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_4), wstr_utf8 (_NL_WMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_5), wstr_utf8 (_NL_WMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_6), wstr_utf8 (_NL_WMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_7), wstr_utf8 (_NL_WMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_8), wstr_utf8 (_NL_WMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_9), wstr_utf8 (_NL_WMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_10), wstr_utf8 (_NL_WMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_11), wstr_utf8 (_NL_WMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_12), wstr_utf8 (_NL_WMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WAM_STR), wstr_utf8 (_NL_WAM_STR));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WPM_STR), wstr_utf8 (_NL_WPM_STR));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WD_T_FMT), wstr_utf8 (_NL_WD_T_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WD_FMT), wstr_utf8 (_NL_WD_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WT_FMT), wstr_utf8 (_NL_WT_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WT_FMT_AMPM),
+                            wstr_utf8 (_NL_WT_FMT_AMPM));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_YEAR), wstr_utf8 (_NL_WERA_YEAR));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_D_FMT), wstr_utf8 (_NL_WERA_D_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALT_DIGITS),
+                            wstr_utf8 (_NL_WALT_DIGITS));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_D_T_FMT),
+                            wstr_utf8 (_NL_WERA_D_T_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_T_FMT), wstr_utf8 (_NL_WERA_T_FMT));
+
+  /* This is somewhat inconsistent, but see locale/categories.def.  */
+  TEST_COMPARE (byte (_NL_TIME_WEEK_NDAYS), byte_utf8 (_NL_TIME_WEEK_NDAYS));
+  TEST_COMPARE (word (_NL_TIME_WEEK_1STDAY),
+                word_utf8 (_NL_TIME_WEEK_1STDAY));
+  TEST_COMPARE (byte (_NL_TIME_WEEK_1STWEEK),
+                byte_utf8 (_NL_TIME_WEEK_1STWEEK));
+  TEST_COMPARE (byte (_NL_TIME_FIRST_WEEKDAY),
+                byte_utf8 (_NL_TIME_FIRST_WEEKDAY));
+  TEST_COMPARE (byte (_NL_TIME_FIRST_WORKDAY),
+                byte_utf8 (_NL_TIME_FIRST_WORKDAY));
+  TEST_COMPARE (byte (_NL_TIME_CAL_DIRECTION),
+                byte_utf8 (_NL_TIME_CAL_DIRECTION));
+  TEST_COMPARE_STRING (str (_NL_TIME_TIMEZONE), str_utf8 (_NL_TIME_TIMEZONE));
+
+  TEST_COMPARE_STRING (str (_DATE_FMT), str_utf8 (_DATE_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_W_DATE_FMT), wstr_utf8 (_NL_W_DATE_FMT));
+
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_TIME_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_TIME_CODESET), "UTF-8");
+
+  TEST_COMPARE_STRING (str (ALTMON_1), str_utf8 (ALTMON_1));
+  TEST_COMPARE_STRING (str (ALTMON_2), str_utf8 (ALTMON_2));
+  TEST_COMPARE_STRING (str (ALTMON_3), str_utf8 (ALTMON_3));
+  TEST_COMPARE_STRING (str (ALTMON_4), str_utf8 (ALTMON_4));
+  TEST_COMPARE_STRING (str (ALTMON_5), str_utf8 (ALTMON_5));
+  TEST_COMPARE_STRING (str (ALTMON_6), str_utf8 (ALTMON_6));
+  TEST_COMPARE_STRING (str (ALTMON_7), str_utf8 (ALTMON_7));
+  TEST_COMPARE_STRING (str (ALTMON_8), str_utf8 (ALTMON_8));
+  TEST_COMPARE_STRING (str (ALTMON_9), str_utf8 (ALTMON_9));
+  TEST_COMPARE_STRING (str (ALTMON_10), str_utf8 (ALTMON_10));
+  TEST_COMPARE_STRING (str (ALTMON_11), str_utf8 (ALTMON_11));
+  TEST_COMPARE_STRING (str (ALTMON_12), str_utf8 (ALTMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_1), wstr_utf8 (_NL_WALTMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_2), wstr_utf8 (_NL_WALTMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_3), wstr_utf8 (_NL_WALTMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_4), wstr_utf8 (_NL_WALTMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_5), wstr_utf8 (_NL_WALTMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_6), wstr_utf8 (_NL_WALTMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_7), wstr_utf8 (_NL_WALTMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_8), wstr_utf8 (_NL_WALTMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_9), wstr_utf8 (_NL_WALTMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_10), wstr_utf8 (_NL_WALTMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_11), wstr_utf8 (_NL_WALTMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_12), wstr_utf8 (_NL_WALTMON_12));
+
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_1), str_utf8 (_NL_ABALTMON_1));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_2), str_utf8 (_NL_ABALTMON_2));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_3), str_utf8 (_NL_ABALTMON_3));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_4), str_utf8 (_NL_ABALTMON_4));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_5), str_utf8 (_NL_ABALTMON_5));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_6), str_utf8 (_NL_ABALTMON_6));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_7), str_utf8 (_NL_ABALTMON_7));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_8), str_utf8 (_NL_ABALTMON_8));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_9), str_utf8 (_NL_ABALTMON_9));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_10), str_utf8 (_NL_ABALTMON_10));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_11), str_utf8 (_NL_ABALTMON_11));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_12), str_utf8 (_NL_ABALTMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_1),
+                            wstr_utf8 (_NL_WABALTMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_2),
+                            wstr_utf8 (_NL_WABALTMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_3),
+                            wstr_utf8 (_NL_WABALTMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_4),
+                            wstr_utf8 (_NL_WABALTMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_5),
+                            wstr_utf8 (_NL_WABALTMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_6),
+                            wstr_utf8 (_NL_WABALTMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_7),
+                            wstr_utf8 (_NL_WABALTMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_8),
+                            wstr_utf8 (_NL_WABALTMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_9),
+                            wstr_utf8 (_NL_WABALTMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_10),
+                            wstr_utf8 (_NL_WABALTMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_11),
+                            wstr_utf8 (_NL_WABALTMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_12),
+                            wstr_utf8 (_NL_WABALTMON_12));
+
+  /* LC_COLLATE.  Mostly untested, only expected differences.  */
+  TEST_COMPARE_STRING (str (_NL_COLLATE_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_COLLATE_CODESET), "UTF-8");
+
+  /* LC_CTYPE.  Mostly untested, only expected differences.  */
+  TEST_COMPARE_STRING (str (CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (CODESET), "UTF-8");
+
+  /* LC_MONETARY.  */
+  TEST_COMPARE_STRING (str (INT_CURR_SYMBOL), str_utf8 (INT_CURR_SYMBOL));
+  TEST_COMPARE_STRING (str (CURRENCY_SYMBOL), str_utf8 (CURRENCY_SYMBOL));
+  TEST_COMPARE_STRING (str (MON_DECIMAL_POINT), str_utf8 (MON_DECIMAL_POINT));
+  TEST_COMPARE_STRING (str (MON_THOUSANDS_SEP), str_utf8 (MON_THOUSANDS_SEP));
+  TEST_COMPARE_STRING (str (MON_GROUPING), str_utf8 (MON_GROUPING));
+  TEST_COMPARE_STRING (str (POSITIVE_SIGN), str_utf8 (POSITIVE_SIGN));
+  TEST_COMPARE_STRING (str (NEGATIVE_SIGN), str_utf8 (NEGATIVE_SIGN));
+  TEST_COMPARE (byte (INT_FRAC_DIGITS), byte_utf8 (INT_FRAC_DIGITS));
+  TEST_COMPARE (byte (FRAC_DIGITS), byte_utf8 (FRAC_DIGITS));
+  TEST_COMPARE (byte (P_CS_PRECEDES), byte_utf8 (P_CS_PRECEDES));
+  TEST_COMPARE (byte (P_SEP_BY_SPACE), byte_utf8 (P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (N_CS_PRECEDES), byte_utf8 (N_CS_PRECEDES));
+  TEST_COMPARE (byte (N_SEP_BY_SPACE), byte_utf8 (N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (P_SIGN_POSN), byte_utf8 (P_SIGN_POSN));
+  TEST_COMPARE (byte (N_SIGN_POSN), byte_utf8 (N_SIGN_POSN));
+  TEST_COMPARE_STRING (str (CRNCYSTR), str_utf8 (CRNCYSTR));
+  TEST_COMPARE (byte (INT_P_CS_PRECEDES), byte_utf8 (INT_P_CS_PRECEDES));
+  TEST_COMPARE (byte (INT_P_SEP_BY_SPACE), byte_utf8 (INT_P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (INT_N_CS_PRECEDES), byte_utf8 (INT_N_CS_PRECEDES));
+  TEST_COMPARE (byte (INT_N_SEP_BY_SPACE), byte_utf8 (INT_N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (INT_P_SIGN_POSN), byte_utf8 (INT_P_SIGN_POSN));
+  TEST_COMPARE (byte (INT_N_SIGN_POSN), byte_utf8 (INT_N_SIGN_POSN));
+  TEST_COMPARE_STRING (str (_NL_MONETARY_DUO_INT_CURR_SYMBOL),
+                       str_utf8 (_NL_MONETARY_DUO_INT_CURR_SYMBOL));
+  TEST_COMPARE_STRING (str (_NL_MONETARY_DUO_CURRENCY_SYMBOL),
+                       str_utf8 (_NL_MONETARY_DUO_CURRENCY_SYMBOL));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_FRAC_DIGITS),
+                byte_utf8 (_NL_MONETARY_DUO_INT_FRAC_DIGITS));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_FRAC_DIGITS),
+                byte_utf8 (_NL_MONETARY_DUO_FRAC_DIGITS));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_P_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_P_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_P_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_N_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_N_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_N_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_P_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_P_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_N_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_N_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_SIGN_POSN));
+  TEST_COMPARE (word (_NL_MONETARY_UNO_VALID_FROM),
+                word_utf8 (_NL_MONETARY_UNO_VALID_FROM));
+  TEST_COMPARE (word (_NL_MONETARY_UNO_VALID_TO),
+                word_utf8 (_NL_MONETARY_UNO_VALID_TO));
+  TEST_COMPARE (word (_NL_MONETARY_DUO_VALID_FROM),
+                word_utf8 (_NL_MONETARY_DUO_VALID_FROM));
+  TEST_COMPARE (word (_NL_MONETARY_DUO_VALID_TO),
+                word_utf8 (_NL_MONETARY_DUO_VALID_TO));
+  /* _NL_MONETARY_CONVERSION_RATE cannot be tested (word array).  */
+  TEST_COMPARE (word (_NL_MONETARY_DECIMAL_POINT_WC),
+                word_utf8 (_NL_MONETARY_DECIMAL_POINT_WC));
+  TEST_COMPARE (word (_NL_MONETARY_THOUSANDS_SEP_WC),
+                word_utf8 (_NL_MONETARY_THOUSANDS_SEP_WC));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_MONETARY_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_MONETARY_CODESET), "UTF-8");
+
+  /* LC_NUMERIC.  */
+
+  TEST_COMPARE_STRING (str (DECIMAL_POINT), str_utf8 (DECIMAL_POINT));
+  TEST_COMPARE_STRING (str (RADIXCHAR), str_utf8 (RADIXCHAR));
+  TEST_COMPARE_STRING (str (THOUSANDS_SEP), str_utf8 (THOUSANDS_SEP));
+  TEST_COMPARE_STRING (str (THOUSEP), str_utf8 (THOUSEP));
+  TEST_COMPARE_STRING (str (GROUPING), str_utf8 (GROUPING));
+  TEST_COMPARE (word (_NL_NUMERIC_DECIMAL_POINT_WC),
+                word_utf8 (_NL_NUMERIC_DECIMAL_POINT_WC));
+  TEST_COMPARE (word (_NL_NUMERIC_THOUSANDS_SEP_WC),
+                word_utf8 (_NL_NUMERIC_THOUSANDS_SEP_WC));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_NUMERIC_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_NUMERIC_CODESET), "UTF-8");
+
+  /* LC_MESSAGES.  */
+
+  TEST_COMPARE_STRING (str (YESEXPR), str_utf8 (YESEXPR));
+  TEST_COMPARE_STRING (str (NOEXPR), str_utf8 (NOEXPR));
+  TEST_COMPARE_STRING (str (YESSTR), str_utf8 (YESSTR));
+  TEST_COMPARE_STRING (str (NOSTR), str_utf8 (NOSTR));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_MESSAGES_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_MESSAGES_CODESET), "UTF-8");
+
+  /* LC_PAPER.  */
+
+  TEST_COMPARE (word (_NL_PAPER_HEIGHT), word_utf8 (_NL_PAPER_HEIGHT));
+  TEST_COMPARE (word (_NL_PAPER_WIDTH), word_utf8 (_NL_PAPER_WIDTH));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_PAPER_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_PAPER_CODESET), "UTF-8");
+
+  /* LC_NAME.  */
+
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_FMT),
+                       str_utf8 (_NL_NAME_NAME_FMT));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_GEN),
+                       str_utf8 (_NL_NAME_NAME_GEN));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MR),
+                       str_utf8 (_NL_NAME_NAME_MR));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MRS),
+                       str_utf8 (_NL_NAME_NAME_MRS));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MISS),
+                       str_utf8 (_NL_NAME_NAME_MISS));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MS),
+                       str_utf8 (_NL_NAME_NAME_MS));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_NAME_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_NAME_CODESET), "UTF-8");
+
+  /* LC_ADDRESS.  */
+
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_POSTAL_FMT),
+                       str_utf8 (_NL_ADDRESS_POSTAL_FMT));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_NAME),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_NAME));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_POST),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_POST));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_AB2),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_AB2));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_AB3),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_AB3));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_CAR),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_CAR));
+  TEST_COMPARE (word (_NL_ADDRESS_COUNTRY_NUM),
+                word_utf8 (_NL_ADDRESS_COUNTRY_NUM));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_ISBN),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_ISBN));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_NAME),
+                       str_utf8 (_NL_ADDRESS_LANG_NAME));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_AB),
+                       str_utf8 (_NL_ADDRESS_LANG_AB));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_TERM),
+                       str_utf8 (_NL_ADDRESS_LANG_TERM));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_LIB),
+                       str_utf8 (_NL_ADDRESS_LANG_LIB));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_ADDRESS_CODESET), "UTF-8");
+
+  /* LC_TELEPHONE.  */
+
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_TEL_INT_FMT),
+                       str_utf8 (_NL_TELEPHONE_TEL_INT_FMT));
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_TEL_DOM_FMT),
+                       str_utf8 (_NL_TELEPHONE_TEL_DOM_FMT));
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_INT_SELECT),
+                       str_utf8 (_NL_TELEPHONE_INT_SELECT));
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_INT_PREFIX),
+                       str_utf8 (_NL_TELEPHONE_INT_PREFIX));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_TELEPHONE_CODESET), "UTF-8");
+
+  /* LC_MEASUREMENT.  */
+
+  TEST_COMPARE (byte (_NL_MEASUREMENT_MEASUREMENT),
+                byte_utf8 (_NL_MEASUREMENT_MEASUREMENT));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_MEASUREMENT_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_MEASUREMENT_CODESET), "UTF-8");
+
+  /* LC_IDENTIFICATION is skipped since C.UTF-8 is distinct from C.  */
+
+  /* _NL_IDENTIFICATION_CATEGORY cannot be tested because it is a
+     string array.  */
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_IDENTIFICATION_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_IDENTIFICATION_CODESET), "UTF-8");
+}
+
+static int
+do_test (void)
+{
+  puts ("info: using setlocale and nl_langinfo");
+  one_pass ();
+
+  puts ("info: using nl_langinfo_l");
+
+  c_utf8 = newlocale (LC_ALL_MASK, "C.UTF-8", (locale_t) 0);
+  TEST_VERIFY_EXIT (c_utf8 != (locale_t) 0);
+
+  switch_to_c ();
+  use_nl_langinfo_l = true;
+  one_pass ();
+
+  freelocale (c_utf8);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.31.1
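A note on the word () and word_utf8 () helpers in the test: they read an integer-valued item out of the char * that nl_langinfo returns by going through a union, not by dereferencing the pointer. The sketch below isolates that step so it can be tried without any locale data. It is only an illustration: pack_word and unpack_word are hypothetical names that do not appear in the patch, and packing an integer into a pointer slot is an assumption about how such an item arrives, not a description of glibc internals.

```c
#include <assert.h>
#include <stddef.h>

/* Same union shape as the one inside word () in the test.  */
union item_value
{
  char *ptr;
  int word;
};

/* Hypothetical helper, for illustration only: place an integer in the
   pointer slot, standing in for an integer-valued locale item that is
   handed back through a char * return type.  */
char *
pack_word (int value)
{
  union item_value u;
  u.ptr = NULL;   /* Clear the bytes that the int member does not cover.  */
  u.word = value;
  return u.ptr;   /* Only copied around, never dereferenced.  */
}

/* Recover the integer.  This mirrors word () minus the nl_langinfo call.  */
int
unpack_word (char *p)
{
  union item_value u;
  u.ptr = p;
  return u.word;
}
```

Because both directions use the same union layout, the round trip does not depend on byte order or pointer width, which is presumably why the test prefers a union over a cast.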
* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-01-31  5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell
@ 2022-01-31  8:47   ` Andreas Schwab
  2022-01-31 16:07     ` Carlos O'Donell
  2022-02-01 12:05   ` Florian Weimer
  1 sibling, 1 reply; 15+ messages in thread
From: Andreas Schwab @ 2022-01-31  8:47 UTC
To: Carlos O'Donell via Libc-alpha
Cc: fweimer, michael.hudson, Carlos O'Donell

On Jan 31 2022, Carlos O'Donell via Libc-alpha wrote:

> +%
> +% This field is consciously aligned with ISO 30112 and the C/POSIX locale.
> +week 7;19971130;4

The copy of ISO 30112 that I could find says 7;19971201;4.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-01-31  8:47   ` Andreas Schwab
@ 2022-01-31 16:07     ` Carlos O'Donell
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-01-31 16:07 UTC
To: Andreas Schwab, Carlos O'Donell via Libc-alpha
Cc: fweimer, michael.hudson

On 1/31/22 03:47, Andreas Schwab wrote:
> On Jan 31 2022, Carlos O'Donell via Libc-alpha wrote:
>
>> +%
>> +% This field is consciously aligned with ISO 30112 and the C/POSIX locale.
>> +week 7;19971130;4
>
> The copy of ISO 30112 that I could find says 7;19971201;4.
>

First and foremost we try to align with the builtin C/POSIX locale.

Aligning with C/POSIX ensures that application developers can seamlessly
change from "C" to "C.UTF-8" without breaking existing tests.

e.g. locale/C-time.c

135     { .string = "\7" },
136     { .word = 19971130 },
137     { .string = "\4" },

I'm using ISO 30112 WD12 (2018-02-12), which I have access to as part of
my SC22 involvement. This version would go on to become the final
published standard in 2020-09, though I don't yet have a copy of the
published version.

Section 4.7 "LC_TIME" under week:
~~~
If the keyword is not specified the values are taken as 7, 19971130 (a Sunday),
and 7 (Saturday), respectively. ISO 8601 conforming applications should use the
values 7, 19971201 (a Monday), and 4 (Thursday), respectively.
~~~

This matches the ISO 30112 WD10 [2014] that was used when creating the
defaults in ld-time.c:

482   /* Set up defaults based on ISO 30112 WD10 [2014].  */
483   if (time->week_ndays == 0)
484     time->week_ndays = 7;
485
486   if (time->week_1stday == 0)
487     time->week_1stday = 19971130;
488
489   if (time->week_1stweek == 0)
490     time->week_1stweek = 7;

This also matches the legacy withdrawn ISO/IEC 14652:2002.

I could change the comment to read:
% This field is consciously aligned with the builtin C/POSIX locale.

Would a new comment resolve your review?

-- 
Cheers,
Carlos.
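To make the two candidate values concrete, here is a minimal sketch that splits a week triple such as 7;19971130;4 into its fields and computes which weekday a YYYYMMDD reference date falls on. The names week_info, parse_week and weekday_of are hypothetical and exist only for this illustration; localedef has its own parser, and the weekday routine is Sakamoto's well-known method, not anything taken from glibc.

```c
#include <assert.h>
#include <stdio.h>

/* The three fields of the week keyword, in source order.  */
struct week_info
{
  int ndays;       /* Number of days in a week.  */
  int first_day;   /* Reference date as YYYYMMDD.  */
  int first_week;  /* Third field, kept as written in the source.  */
};

/* Parse "N;YYYYMMDD;M".  Returns 0 on success, -1 on malformed input.
   On failure *OUT may be partially written.  */
int
parse_week (const char *s, struct week_info *out)
{
  char trailing;
  /* Exactly three conversions must succeed; a fourth means extra
     characters followed the triple.  */
  if (sscanf (s, "%d;%d;%d%c", &out->ndays, &out->first_day,
              &out->first_week, &trailing) != 3)
    return -1;
  return 0;
}

/* Day of week for a Gregorian YYYYMMDD date: 0 is Sunday, 6 is Saturday.  */
int
weekday_of (int yyyymmdd)
{
  static const int t[] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };
  int y = yyyymmdd / 10000;
  int m = yyyymmdd / 100 % 100;
  int d = yyyymmdd % 100;
  assert (m >= 1 && m <= 12);
  if (m < 3)
    y -= 1;
  return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}
```

With these helpers, weekday_of (19971130) yields 0 (Sunday) and weekday_of (19971201) yields 1 (Monday), which agrees with the parenthetical weekdays in the quoted standard text.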
* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-01-31  5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell
  2022-01-31  8:47   ` Andreas Schwab
@ 2022-02-01 12:05   ` Florian Weimer
  2022-02-01 16:13     ` Carlos O'Donell
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2022-02-01 12:05 UTC
To: Carlos O'Donell
Cc: libc-alpha, michael.hudson

* Carlos O'Donell:

> We have had one downstream report from Canonical [1] that
> an rrdtool test was broken by the differences in LC_TIME
> that we had in the non-builtin C locale (C.UTF-8). If one
> application has an issue there are going to be others, and
> so with this commit we review and fix all the issues that
> cause the builtin C locale to be different from C.UTF-8,
> which includes:
> * mon_decimal_point should be empty e.g. ""
>  - Depends on mon_decimal_point_wc fix.
> * negative_sign should be empty e.g. ""
> * week should be aligned with ISO 30112 default e.g. 7;19971130;4
> * d_fmt corrected with escaped slashes e.g. "%m//%d//%y"
> * yesstr and nostr should be empty e.g. ""
> * country_ab2 and country_ab3 should be empty e.g. ""
>
> We bump LC_IDENTIFICATION version and adjust the date to
> indicate the change in the locale.
>
> A new tst-c-utf8-consistency test is added to ensure
> consistency between C/POSIX and C.UTF-8.
>
> Tested on x86_64 and i686 without regression.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2022-January/135703.html
>
> Co-authored-by: Florian Weimer <fweimer@redhat.com>
> ---
>  localedata/Makefile                 |  30 +-
>  localedata/locales/C                |  22 +-
>  localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++
>  3 files changed, 578 insertions(+), 13 deletions(-)
>  create mode 100644 localedata/tst-c-utf8-consistency.c

This looks broadly okay to me.  Dropping the ISO standard reference
seems prudent if we can't check its contents.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian
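On the d_fmt item in the quoted list: the doubled slashes only make sense if the locale source declares / as its escape character, which is an assumption here since the relevant header lines of localedata/locales/C are not quoted in this thread. Under that assumption, "%m//%d//%y" denotes the format "%m/%d/%y". The sketch below shows the collapse and what the resulting format produces; unescape and format_date are hypothetical helpers for illustration, and unescape deliberately ignores the other escape forms that a real locale source parser has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy SRC to DST, turning each "<esc>X" pair into a plain "X".
   At most SIZE bytes are written, including the terminator.
   Returns 0 on success, -1 if DST is too small or SRC ends with a
   lone escape character.  Simplified: every escaped character is
   taken literally.  */
int
unescape (const char *src, char esc, char *dst, size_t size)
{
  size_t n = 0;
  for (; *src != '\0'; src++)
    {
      if (*src == esc)
        {
          src++;
          if (*src == '\0')
            return -1;
        }
      if (n + 1 >= size)
        return -1;
      dst[n++] = *src;
    }
  if (size == 0)
    return -1;
  dst[n] = '\0';
  return 0;
}

/* Format a calendar date with strftime.  Only the fields needed by
   %m, %d and %y are filled in.  Returns what strftime returns.  */
size_t
format_date (const char *fmt, int year, int mon, int mday,
             char *buf, size_t size)
{
  struct tm tm;
  memset (&tm, 0, sizeof tm);
  tm.tm_year = year - 1900;
  tm.tm_mon = mon - 1;
  tm.tm_mday = mday;
  return strftime (buf, size, fmt, &tm);
}
```

For example, unescaping "%m//%d//%y" with '/' as the escape character gives "%m/%d/%y", and formatting 1997-11-30 with that string gives "11/30/97".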
* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-02-01 12:05   ` Florian Weimer
@ 2022-02-01 16:13     ` Carlos O'Donell
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-02-01 16:13 UTC
To: Florian Weimer
Cc: libc-alpha, michael.hudson

On 2/1/22 07:05, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> We have had one downstream report from Canonical [1] that
>> an rrdtool test was broken by the differences in LC_TIME
>> that we had in the non-builtin C locale (C.UTF-8). If one
>> application has an issue there are going to be others, and
>> so with this commit we review and fix all the issues that
>> cause the builtin C locale to be different from C.UTF-8,
>> which includes:
>> * mon_decimal_point should be empty e.g. ""
>>  - Depends on mon_decimal_point_wc fix.
>> * negative_sign should be empty e.g. ""
>> * week should be aligned with ISO 30112 default e.g. 7;19971130;4
>> * d_fmt corrected with escaped slashes e.g. "%m//%d//%y"
>> * yesstr and nostr should be empty e.g. ""
>> * country_ab2 and country_ab3 should be empty e.g. ""
>>
>> We bump LC_IDENTIFICATION version and adjust the date to
>> indicate the change in the locale.
>>
>> A new tst-c-utf8-consistency test is added to ensure
>> consistency between C/POSIX and C.UTF-8.
>>
>> Tested on x86_64 and i686 without regression.
>>
>> [1] https://sourceware.org/pipermail/libc-alpha/2022-January/135703.html
>>
>> Co-authored-by: Florian Weimer <fweimer@redhat.com>
>> ---
>>  localedata/Makefile                 |  30 +-
>>  localedata/locales/C                |  22 +-
>>  localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++
>>  3 files changed, 578 insertions(+), 13 deletions(-)
>>  create mode 100644 localedata/tst-c-utf8-consistency.c
>
> This looks broadly okay to me.  Dropping the ISO standard reference
> seems prudent if we can't check its contents.
>
> Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks. I'll drop the ISO references from the commit message.

-- 
Cheers,
Carlos.