[PATCH 0/2] Make C/POSIX and C.UTF-8 consistent.

public inbox for libc-alpha@sourceware.org

* [PATCH 0/2] Make C/POSIX and C.UTF-8 consistent.
@ 2022-01-31  5:34 Carlos O'Donell
  2022-01-31  5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell
  2022-01-31  5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell
  0 siblings, 2 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-01-31  5:34 UTC (permalink / raw)
  To: libc-alpha, fweimer, michael.hudson

We had a recent report from Michael Hudson-Doyle that he had seen a
problem with C.UTF-8 when running tests for rrdtool.  The report
prompted Florian to mark this as a glibc 2.35 blocker for review.
Upon review I decided to harmonize C.UTF-8 with C/POSIX, and I
worked with Florian to fix the discrepancies between the C.UTF-8
locale and the builtin C/POSIX locale.  The work uncovered a problem
in the parsing of LC_MONETARY by localedef, which needed fixing in
order to make C/POSIX and C.UTF-8 consistent.  The first commit fixes
the mon_decimal_point handling in localedef parsing, while the second
commit fixes C.UTF-8 and adds a new test to check for consistency
between C/POSIX and C.UTF-8.  The test is based on work that Florian
Weimer did to help me identify the inconsistencies between the locales.

Carlos O'Donell (2):
  localedef: Fix handling of empty mon_decimal_point
  localedata: Adjust C.UTF-8 to align with C/POSIX.

 locale/programs/ld-monetary.c       |   4 +-
 localedata/Makefile                 |  30 +-
 localedata/locales/C                |  22 +-
 localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++
 4 files changed, 579 insertions(+), 16 deletions(-)
 create mode 100644 localedata/tst-c-utf8-consistency.c

-- 
2.31.1


* [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-01-31  5:34 [PATCH 0/2] Make C/POSIX and C.UTF-8 consistent Carlos O'Donell
@ 2022-01-31  5:34 ` Carlos O'Donell
  2022-01-31 15:26   ` Florian Weimer
  2022-02-01 11:47   ` Florian Weimer
  2022-01-31  5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell
  1 sibling, 2 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-01-31  5:34 UTC (permalink / raw)
  To: libc-alpha, fweimer, michael.hudson

The handling of mon_decimal_point is incorrect for the empty ""
value.  The existing parser in monetary_read() correctly sets both the
non-wide-character value and the wide-character value, e.g.
STR_ELEM_WC(mon_decimal_point), if they are set in the locale
definition.  However, monetary_finish() has a conflicting TEST_ELEM()
which sets a default value (if the locale definition doesn't include
one), followed by code which checks for mon_decimal_point being NULL
in order to issue a specific error message and set the defaults.  The
latter is never reached because TEST_ELEM() always sets a default.
The simplest solution is to remove the TEST_ELEM() check and let the
existing check test whether mon_decimal_point is NULL and set an
appropriate default.  The final fix is to move the setting of
mon_decimal_point_wc so it occurs only when mon_decimal_point is being
set to a default, keeping both values consistent.  There is no way to
tell the difference between mon_decimal_point_wc having been set to
the empty string and not having been defined at all; for that
distinction we must use mon_decimal_point being NULL or "", and so we
must logically set the default together with mon_decimal_point.

Lastly, there are more fixes similar to this that could be made to
ld-monetary.c, but we avoid them in order to fix just the code
required for mon_decimal_point.  That code affects the ability of
C.UTF-8 to set mon_decimal_point to "": without this fix we end up
with an inconsistent setting, mon_decimal_point set to "" but
mon_decimal_point_wc set to L'.', which is incorrect.
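
As a rough illustration of the defaulting described above, the
post-patch logic can be modelled as follows.  This is a sketch only:
struct mon_fields and finish_mon_decimal_point are hypothetical
stand-ins, not the real types or functions in ld-monetary.c, and the
error reporting is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for the two LC_MONETARY fields that matter
   here; this is not the real structure used by localedef.  */
struct mon_fields
{
  const char *mon_decimal_point;  /* NULL: not defined in the source.  */
  wchar_t mon_decimal_point_wc;   /* L'\0': undefined, or defined as "".  */
};

/* Apply both defaults together, and only when the narrow value was
   never defined.  An explicit "" is left untouched, so the wide value
   stays L'\0' and the two fields remain consistent.  */
static void
finish_mon_decimal_point (struct mon_fields *m)
{
  assert (m != NULL);
  if (m->mon_decimal_point == NULL)
    {
      m->mon_decimal_point = ".";
      m->mon_decimal_point_wc = L'.';
    }
}
```

With this shape, a locale that defines mon_decimal_point as "" keeps a
null wide character, while a locale that omits the field gets "." and
L'.' as a pair.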

Tested on x86_64 and i686 without regression.
---
 locale/programs/ld-monetary.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
index 277b9ff042..3b0412b405 100644
--- a/locale/programs/ld-monetary.c
+++ b/locale/programs/ld-monetary.c
@@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");

   TEST_ELEM (int_curr_symbol, "");
   TEST_ELEM (currency_symbol, "");
-  TEST_ELEM (mon_decimal_point, ".");
   TEST_ELEM (mon_thousands_sep, "");
   TEST_ELEM (positive_sign, "");
   TEST_ELEM (negative_sign, "");
@@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
 	record_error (0, 0, _("%s: field `%s' not defined"),
 		      "LC_MONETARY", "mon_decimal_point");
       monetary->mon_decimal_point = ".";
+      monetary->mon_decimal_point_wc = L'.';
     }
   else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
     {
@@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
 %s: value for field `%s' must not be an empty string"),
 		    "LC_MONETARY", "mon_decimal_point");
     }
-  if (monetary->mon_decimal_point_wc == L'\0')
-    monetary->mon_decimal_point_wc = L'.';

   if (monetary->mon_grouping_len == 0)
     {
-- 
2.31.1


* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-01-31  5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell
@ 2022-01-31 15:26   ` Florian Weimer
  2022-01-31 16:09     ` Andreas Schwab
  2022-02-01 11:47   ` Florian Weimer
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2022-01-31 15:26 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha, michael.hudson

* Carlos O'Donell:

> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
> index 277b9ff042..3b0412b405 100644
> --- a/locale/programs/ld-monetary.c
> +++ b/locale/programs/ld-monetary.c
> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>  
>    TEST_ELEM (int_curr_symbol, "");
>    TEST_ELEM (currency_symbol, "");
> -  TEST_ELEM (mon_decimal_point, ".");
>    TEST_ELEM (mon_thousands_sep, "");
>    TEST_ELEM (positive_sign, "");
>    TEST_ELEM (negative_sign, "");
> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  	record_error (0, 0, _("%s: field `%s' not defined"),
>  		      "LC_MONETARY", "mon_decimal_point");
>        monetary->mon_decimal_point = ".";
> +      monetary->mon_decimal_point_wc = L'.';
>      }
>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>      {
> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  %s: value for field `%s' must not be an empty string"),
>  		    "LC_MONETARY", "mon_decimal_point");
>      }
> -  if (monetary->mon_decimal_point_wc == L'\0')
> -    monetary->mon_decimal_point_wc = L'.';
>  
>    if (monetary->mon_grouping_len == 0)
>      {

There's an existing comment

  /* The decimal point must not be empty.  This is not said explicitly
     in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
     != "".  */

that says that empty strings/null characters are invalid.
The comment was clearly copied from locale/programs/ld-numeric.c.

*However* we have got this code in stdio-common/printf_fp.c:

      decimal = _nl_lookup (loc, LC_MONETARY, MON_DECIMAL_POINT);
      if (*decimal == '\0')
	decimal = _nl_lookup (loc, LC_NUMERIC, DECIMAL_POINT);
      decimalwc = _nl_lookup_word (loc, LC_MONETARY,
				    _NL_MONETARY_DECIMAL_POINT_WC);
      if (decimalwc == L'\0')
	decimalwc = _nl_lookup_word (loc, LC_NUMERIC,
				      _NL_NUMERIC_DECIMAL_POINT_WC);

So we use LC_NUMERIC as the fallback, and our strfmon implementation is
okay with it.  But our localeconv implementation lacks this fallback,
which looks like a bug because the built-in C locale uses an empty
string/a null character.
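
To illustrate the missing fallback: a caller of localeconv can apply
the same rule as the printf_fp.c code by hand.  A minimal sketch,
assuming a hypothetical helper named effective_mon_decimal_point (this
is not a glibc interface):

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of glibc: return the monetary decimal
   point, falling back to the LC_NUMERIC decimal point when the
   monetary one is the empty string.  */
static const char *
effective_mon_decimal_point (const struct lconv *lc)
{
  assert (lc != NULL);
  if (lc->mon_decimal_point[0] == '\0')
    return lc->decimal_point;
  return lc->mon_decimal_point;
}
```

In the "C" locale, localeconv reports an empty mon_decimal_point and
"." for decimal_point, so the helper would return "." there.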

Still I think simplifying the locale data is the right direction here.

Thanks,
Florian



* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-01-31 15:26   ` Florian Weimer
@ 2022-01-31 16:09     ` Andreas Schwab
  2022-01-31 16:20       ` Florian Weimer
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Schwab @ 2022-01-31 16:09 UTC (permalink / raw)
  To: Florian Weimer via Libc-alpha; +Cc: Carlos O'Donell, Florian Weimer

On Jan 31 2022, Florian Weimer via Libc-alpha wrote:

> There's an existing comment
>
>   /* The decimal point must not be empty.  This is not said explicitly
>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>      != "".  */
>
> that says that empty strings/null characters are invalid.

This is only about decimal_point, mon_decimal_point can be empty.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-01-31 16:09     ` Andreas Schwab
@ 2022-01-31 16:20       ` Florian Weimer
  2022-01-31 16:30         ` Andreas Schwab
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2022-01-31 16:20 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer via Libc-alpha, Carlos O'Donell

* Andreas Schwab:

> On Jan 31 2022, Florian Weimer via Libc-alpha wrote:
>
>> There's an existing comment
>>
>>   /* The decimal point must not be empty.  This is not said explicitly
>>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>>      != "".  */
>>
>> that says that empty strings/null characters are invalid.
>
> This is only about decimal_point, mon_decimal_point can be empty.

Hmm, I'll take your word for it.

So the comment should definitely go, and Carlos' change is the right
way to do it?

Thanks,
Florian



* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-01-31 16:20       ` Florian Weimer
@ 2022-01-31 16:30         ` Andreas Schwab
  2022-01-31 16:37           ` Florian Weimer
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Schwab @ 2022-01-31 16:30 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Florian Weimer via Libc-alpha, Carlos O'Donell

On Jan 31 2022, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Jan 31 2022, Florian Weimer via Libc-alpha wrote:
>>
>>> There's an existing comment
>>>
>>>   /* The decimal point must not be empty.  This is not said explicitly
>>>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>>>      != "".  */
>>>
>>> that says that empty strings/null characters are invalid.
>>
>> This is only about decimal_point, mon_decimal_point can be empty.
>
> Hmm, I'll take your word for it.

See 7.11.2.1, paragraphs 3 and 10.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-01-31 16:30         ` Andreas Schwab
@ 2022-01-31 16:37           ` Florian Weimer
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Weimer @ 2022-01-31 16:37 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer via Libc-alpha

* Andreas Schwab:

> On Jan 31 2022, Florian Weimer wrote:
>
>> * Andreas Schwab:
>>
>>> On Jan 31 2022, Florian Weimer via Libc-alpha wrote:
>>>
>>>> There's an existing comment
>>>>
>>>>   /* The decimal point must not be empty.  This is not said explicitly
>>>>      in POSIX but ANSI C (ISO/IEC 9899) says in 4.4.2.1 it has to be
>>>>      != "".  */
>>>>
>>>> that says that empty strings/null characters are invalid.
>>>
>>> This is only about decimal_point, mon_decimal_point can be empty.
>>
>> Hmm, I'll take your word for it.
>
> See 7.11.2.1, paragraph 3 and 10.

That is fairly conclusive indeed (numbers match C11).

Are you okay with Carlos' patch with a comment update?

Thanks,
Florian



* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-01-31  5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell
  2022-01-31 15:26   ` Florian Weimer
@ 2022-02-01 11:47   ` Florian Weimer
  2022-02-01 16:00     ` Carlos O'Donell
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2022-02-01 11:47 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha, michael.hudson

* Carlos O'Donell:

> The handling of mon_decimal_point is incorrect when it comes to
> handling the empty "" value.  The existing parser in monetary_read()
> will correctly handle setting the non-wide-character value and the
> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are
> set in the locale definition.  However, in monetary_finish() we have
> conflicting TEST_ELEM() which sets a default value (if the locale
> definition doesn't include one), and subsequent code which looks for
> mon_decimal_point to be NULL to issue a specific error message and set
> the defaults. The latter is unused because TEST_ELEM() always sets a
> default.  The simplest solution is to remove the TEST_ELEM() check,
> and allow the existing check to look to see if mon_decimal_point is
> NULL and set an appropriate default.  The final fix is to move the
> setting of mon_decimal_point_wc so it occurs only when
> mon_decimal_point is being set to a default, keeping both values
> consistent. There is no way to tell the difference between
> mon_decimal_point_wc having been set to the empty string and not
> having been defined at all, for that distinction we must use
> mon_decimal_point being NULL or "", and so we must logically set
> the default together with mon_decimal_point.
>
> Lastly, there are more fixes similar to this that could be made to
> ld-monetary.c, but we avoid that in order to fix just the code
> required for mon_decimal_point, which impacts the ability for C.UTF-8
> to set mon_decimal_point to "", since without this fix we end up with
> an inconsistent setting of mon_decimal_point set to "", but
> mon_decimal_point_wc set to "." which is incorrect.
>
> Tested on x86_64 and i686 without regression.
> ---
>  locale/programs/ld-monetary.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
> index 277b9ff042..3b0412b405 100644
> --- a/locale/programs/ld-monetary.c
> +++ b/locale/programs/ld-monetary.c
> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>  
>    TEST_ELEM (int_curr_symbol, "");
>    TEST_ELEM (currency_symbol, "");
> -  TEST_ELEM (mon_decimal_point, ".");
>    TEST_ELEM (mon_thousands_sep, "");
>    TEST_ELEM (positive_sign, "");
>    TEST_ELEM (negative_sign, "");
> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  	record_error (0, 0, _("%s: field `%s' not defined"),
>  		      "LC_MONETARY", "mon_decimal_point");
>        monetary->mon_decimal_point = ".";
> +      monetary->mon_decimal_point_wc = L'.';
>      }
>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>      {
> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>  %s: value for field `%s' must not be an empty string"),
>  		    "LC_MONETARY", "mon_decimal_point");
>      }
> -  if (monetary->mon_decimal_point_wc == L'\0')
> -    monetary->mon_decimal_point_wc = L'.';
>  
>    if (monetary->mon_grouping_len == 0)
>      {

I have verified that this does not change the localedef output for the
existing locales created by install-locale-files.

I think we need further cleanups in the comments and checks (which were
copied from LC_NUMERIC, but should not apply to LC_MONETARY).  But I
think we can release with this version.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian



* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-02-01 11:47   ` Florian Weimer
@ 2022-02-01 16:00     ` Carlos O'Donell
  2022-02-01 16:14       ` Carlos O'Donell
  0 siblings, 1 reply; 15+ messages in thread
From: Carlos O'Donell @ 2022-02-01 16:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, michael.hudson

On 2/1/22 06:47, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> The handling of mon_decimal_point is incorrect when it comes to
>> handling the empty "" value.  The existing parser in monetary_read()
>> will correctly handle setting the non-wide-character value and the
>> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are
>> set in the locale definition.  However, in monetary_finish() we have
>> conflicting TEST_ELEM() which sets a default value (if the locale
>> definition doesn't include one), and subsequent code which looks for
>> mon_decimal_point to be NULL to issue a specific error message and set
>> the defaults. The latter is unused because TEST_ELEM() always sets a
>> default.  The simplest solution is to remove the TEST_ELEM() check,
>> and allow the existing check to look to see if mon_decimal_point is
>> NULL and set an appropriate default.  The final fix is to move the
>> setting of mon_decimal_point_wc so it occurs only when
>> mon_decimal_point is being set to a default, keeping both values
>> consistent. There is no way to tell the difference between
>> mon_decimal_point_wc having been set to the empty string and not
>> having been defined at all, for that distinction we must use
>> mon_decimal_point being NULL or "", and so we must logically set
>> the default together with mon_decimal_point.
>>
>> Lastly, there are more fixes similar to this that could be made to
>> ld-monetary.c, but we avoid that in order to fix just the code
>> required for mon_decimal_point, which impacts the ability for C.UTF-8
>> to set mon_decimal_point to "", since without this fix we end up with
>> an inconsistent setting of mon_decimal_point set to "", but
>> mon_decimal_point_wc set to "." which is incorrect.
>>
>> Tested on x86_64 and i686 without regression.
>> ---
>>  locale/programs/ld-monetary.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
>> index 277b9ff042..3b0412b405 100644
>> --- a/locale/programs/ld-monetary.c
>> +++ b/locale/programs/ld-monetary.c
>> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>>  
>>    TEST_ELEM (int_curr_symbol, "");
>>    TEST_ELEM (currency_symbol, "");
>> -  TEST_ELEM (mon_decimal_point, ".");
>>    TEST_ELEM (mon_thousands_sep, "");
>>    TEST_ELEM (positive_sign, "");
>>    TEST_ELEM (negative_sign, "");
>> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>  	record_error (0, 0, _("%s: field `%s' not defined"),
>>  		      "LC_MONETARY", "mon_decimal_point");
>>        monetary->mon_decimal_point = ".";
>> +      monetary->mon_decimal_point_wc = L'.';
>>      }
>>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>>      {
>> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>  %s: value for field `%s' must not be an empty string"),
>>  		    "LC_MONETARY", "mon_decimal_point");
>>      }
>> -  if (monetary->mon_decimal_point_wc == L'\0')
>> -    monetary->mon_decimal_point_wc = L'.';
>>  
>>    if (monetary->mon_grouping_len == 0)
>>      {
> 
> I have verified that this does not change the localedef output for the
> existing locales created by install-locale-files.
> 
> I think we need further cleanups in the comments and checks (which were
> copied from LC_NUMERIC, but should not apply to LC_MONETARY).  But I
> think we can release with this version.

I filed this bug to track that:
Bug 28845 - ld-monetary.c should be updated to match ISO C and other standards.
https://sourceware.org/bugzilla/show_bug.cgi?id=28845

Thanks for the review!


> Reviewed-by: Florian Weimer <fweimer@redhat.com>
> 
> Thanks,
> Florian
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point
  2022-02-01 16:00     ` Carlos O'Donell
@ 2022-02-01 16:14       ` Carlos O'Donell
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-02-01 16:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, michael.hudson

On 2/1/22 11:00, Carlos O'Donell wrote:
> On 2/1/22 06:47, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>> The handling of mon_decimal_point is incorrect when it comes to
>>> handling the empty "" value.  The existing parser in monetary_read()
>>> will correctly handle setting the non-wide-character value and the
>>> wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are
>>> set in the locale definition.  However, in monetary_finish() we have
>>> conflicting TEST_ELEM() which sets a default value (if the locale
>>> definition doesn't include one), and subsequent code which looks for
>>> mon_decimal_point to be NULL to issue a specific error message and set
>>> the defaults. The latter is unused because TEST_ELEM() always sets a
>>> default.  The simplest solution is to remove the TEST_ELEM() check,
>>> and allow the existing check to look to see if mon_decimal_point is
>>> NULL and set an appropriate default.  The final fix is to move the
>>> setting of mon_decimal_point_wc so it occurs only when
>>> mon_decimal_point is being set to a default, keeping both values
>>> consistent. There is no way to tell the difference between
>>> mon_decimal_point_wc having been set to the empty string and not
>>> having been defined at all, for that distinction we must use
>>> mon_decimal_point being NULL or "", and so we must logically set
>>> the default together with mon_decimal_point.
>>>
>>> Lastly, there are more fixes similar to this that could be made to
>>> ld-monetary.c, but we avoid that in order to fix just the code
>>> required for mon_decimal_point, which impacts the ability for C.UTF-8
>>> to set mon_decimal_point to "", since without this fix we end up with
>>> an inconsistent setting of mon_decimal_point set to "", but
>>> mon_decimal_point_wc set to "." which is incorrect.
>>>
>>> Tested on x86_64 and i686 without regression.
>>> ---
>>>  locale/programs/ld-monetary.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/locale/programs/ld-monetary.c b/locale/programs/ld-monetary.c
>>> index 277b9ff042..3b0412b405 100644
>>> --- a/locale/programs/ld-monetary.c
>>> +++ b/locale/programs/ld-monetary.c
>>> @@ -207,7 +207,6 @@ No definition for %s category found"), "LC_MONETARY");
>>>  
>>>    TEST_ELEM (int_curr_symbol, "");
>>>    TEST_ELEM (currency_symbol, "");
>>> -  TEST_ELEM (mon_decimal_point, ".");
>>>    TEST_ELEM (mon_thousands_sep, "");
>>>    TEST_ELEM (positive_sign, "");
>>>    TEST_ELEM (negative_sign, "");
>>> @@ -257,6 +256,7 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>>  	record_error (0, 0, _("%s: field `%s' not defined"),
>>>  		      "LC_MONETARY", "mon_decimal_point");
>>>        monetary->mon_decimal_point = ".";
>>> +      monetary->mon_decimal_point_wc = L'.';
>>>      }
>>>    else if (monetary->mon_decimal_point[0] == '\0' && ! be_quiet && ! nothing)
>>>      {
>>> @@ -264,8 +264,6 @@ not correspond to a valid name in ISO 4217 [--no-warnings=intcurrsym]"),
>>>  %s: value for field `%s' must not be an empty string"),
>>>  		    "LC_MONETARY", "mon_decimal_point");
>>>      }
>>> -  if (monetary->mon_decimal_point_wc == L'\0')
>>> -    monetary->mon_decimal_point_wc = L'.';
>>>  
>>>    if (monetary->mon_grouping_len == 0)
>>>      {
>>
>> I have verified that this does not change the localedef output for the
>> existing locales created by install-locale-files.
>>
>> I think we need further cleanups in the comments and checks (which were
>> copied from LC_NUMERIC, but should not apply to LC_MONETARY).  But I
>> think we can release with this version.
> 
> I filed this bug to track that:
> Bug 28845 - ld-monetary.c should be updated to match ISO C and other standards.
> https://sourceware.org/bugzilla/show_bug.cgi?id=28845

And I filed one more bug to track the original bug, which I'll close after push:
https://sourceware.org/bugzilla/show_bug.cgi?id=28847


-- 
Cheers,
Carlos.



* [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-01-31  5:34 [PATCH 0/2] Make C/POSIX and C.UTF-8 consistent Carlos O'Donell
  2022-01-31  5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell
@ 2022-01-31  5:34 ` Carlos O'Donell
  2022-01-31  8:47   ` Andreas Schwab
  2022-02-01 12:05   ` Florian Weimer
  1 sibling, 2 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-01-31  5:34 UTC (permalink / raw)
  To: libc-alpha, fweimer, michael.hudson

We have had one downstream report from Canonical [1] that
an rrdtool test was broken by the differences in LC_TIME
that we had in the non-builtin C locale (C.UTF-8). If one
application has an issue there are going to be others, and
so with this commit we review and fix all the issues that
cause the builtin C locale to be different from C.UTF-8,
which includes:
* mon_decimal_point should be empty e.g. ""
 - Depends on mon_decimal_point_wc fix.
* negative_sign should be empty e.g. ""
* week should be aligned with ISO 30112 default e.g. 7;19971130;4
* d_fmt corrected with escaped slashes e.g. "%m//%d//%y"
* yesstr and nostr should be empty e.g. ""
* country_ab2 and country_ab3 should be empty e.g. ""
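
The kind of per-item comparison behind the list above can be sketched
with nl_langinfo_l.  This is only an illustration, not the test added
by this patch: same_langinfo is a hypothetical helper, and it assumes
the POSIX.1-2008 locale-object interfaces are available.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return true if the string-valued ITEM is
   identical in the locale objects A and B.  The first result is
   copied because POSIX allows a later nl_langinfo_l call to overwrite
   the previously returned string.  An allocation failure is reported
   as "not the same".  */
static bool
same_langinfo (nl_item item, locale_t a, locale_t b)
{
  assert (a != (locale_t) 0 && b != (locale_t) 0);
  char *first = strdup (nl_langinfo_l (item, a));
  if (first == NULL)
    return false;
  bool same = strcmp (first, nl_langinfo_l (item, b)) == 0;
  free (first);
  return same;
}
```

A caller would create the two locale objects with newlocale
(LC_ALL_MASK, "C", (locale_t) 0) and newlocale (LC_ALL_MASK,
"C.UTF-8", (locale_t) 0), check both for failure (C.UTF-8 may not be
installed), and then compare items such as D_FMT.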

We bump LC_IDENTIFICATION version and adjust the date to
indicate the change in the locale.

A new tst-c-utf8-consistency test is added to ensure
consistency between C/POSIX and C.UTF-8.

Tested on x86_64 and i686 without regression.

[1] https://sourceware.org/pipermail/libc-alpha/2022-January/135703.html

Co-authored-by: Florian Weimer <fweimer@redhat.com>
---
 localedata/Makefile                 |  30 +-
 localedata/locales/C                |  22 +-
 localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++
 3 files changed, 578 insertions(+), 13 deletions(-)
 create mode 100644 localedata/tst-c-utf8-consistency.c

diff --git a/localedata/Makefile b/localedata/Makefile
index 79db713925..9ae2e5c161 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -155,11 +155,31 @@ locale_test_suite := tst_iswalnum tst_iswalpha tst_iswcntrl            \
 		     tst_wcsxfrm tst_wctob tst_wctomb tst_wctrans      \
 		     tst_wctype tst_wcwidth
 
-tests = $(locale_test_suite) tst-digits tst-setlocale bug-iconv-trans \
-	tst-leaks tst-mbswcs1 tst-mbswcs2 tst-mbswcs3 tst-mbswcs4 tst-mbswcs5 \
-	tst-mbswcs6 tst-xlocale1 tst-xlocale2 bug-usesetlocale \
-	tst-strfmon1 tst-sscanf bug-setlocale1 tst-setlocale2 tst-setlocale3 \
-	tst-wctype tst-iconv-math-trans
+tests = \
+  $(locale_test_suite) \
+  bug-iconv-trans \
+  bug-setlocale1 \
+  bug-usesetlocale \
+  tst-c-utf8-consistency \
+  tst-digits \
+  tst-iconv-math-trans \
+  tst-leaks \
+  tst-mbswcs1 \
+  tst-mbswcs2 \
+  tst-mbswcs3 \
+  tst-mbswcs4 \
+  tst-mbswcs5 \
+  tst-mbswcs6 \
+  tst-setlocale \
+  tst-setlocale2 \
+  tst-setlocale3 \
+  tst-sscanf \
+  tst-strfmon1 \
+  tst-wctype \
+  tst-xlocale1 \
+  tst-xlocale2 \
+  # tests
+
 tests-static = bug-setlocale1-static
 tests += $(tests-static)
 ifeq (yes,$(build-shared))
diff --git a/localedata/locales/C b/localedata/locales/C
index ca801c79cf..fb647ccc4b 100644
--- a/localedata/locales/C
+++ b/localedata/locales/C
@@ -12,8 +12,8 @@ tel        ""
 fax        ""
 language   ""
 territory  ""
-revision   "2.0"
-date       "2020-06-28"
+revision   "2.1"
+date       "2022-01-30"
 category  "i18n:2012";LC_IDENTIFICATION
 category  "i18n:2012";LC_CTYPE
 category  "i18n:2012";LC_COLLATE
@@ -68,11 +68,11 @@ LC_MONETARY
 % glibc/locale/C-monetary.c.).
 int_curr_symbol     ""
 currency_symbol     ""
-mon_decimal_point   "."
+mon_decimal_point   ""
 mon_thousands_sep   ""
 mon_grouping        -1
 positive_sign       ""
-negative_sign       "-"
+negative_sign       ""
 int_frac_digits     -1
 frac_digits         -1
 p_cs_precedes       -1
@@ -121,7 +121,9 @@ mon         "January";"February";"March";"April";"May";"June";"July";/
 %
 % ISO 8601 conforming applications should use the values 7, 19971201 (a
 % Monday), and 4 (Thursday), respectively.
-week    7;19971201;4
+%
+% This field is consciously aligned with ISO 30112 and the C/POSIX locale.
+week    7;19971130;4
 first_weekday	1
 first_workday	2
 
@@ -129,7 +131,7 @@ first_workday	2
 d_t_fmt "%a %b %e %H:%M:%S %Y"
 
 % Appropriate date representation (%x)
-d_fmt   "%m/%d/%y"
+d_fmt   "%m//%d//%y"
 
 % Appropriate time representation (%X)
 t_fmt   "%H:%M:%S"
@@ -150,8 +152,8 @@ LC_MESSAGES
 %
 yesexpr "^[yY]"
 noexpr  "^[nN]"
-yesstr  "Yes"
-nostr   "No"
+yesstr  ""
+nostr   ""
 END LC_MESSAGES
 
 LC_PAPER
@@ -175,6 +177,10 @@ LC_ADDRESS
 % the LC_ADDRESS category.
 % (also used in the built in C/POSIX locale in glibc/locale/C-address.c)
 postal_fmt    "%a%N%f%N%d%N%b%N%s %h %e %r%N%C-%z %T%N%c%N"
+% The abbreviated 2 char and 3 char should be set to empty strings to
+% match the C/POSIX locale.
+country_ab2   ""
+country_ab3   ""
 END LC_ADDRESS
 
 LC_TELEPHONE
diff --git a/localedata/tst-c-utf8-consistency.c b/localedata/tst-c-utf8-consistency.c
new file mode 100644
index 0000000000..50feed3090
--- /dev/null
+++ b/localedata/tst-c-utf8-consistency.c
@@ -0,0 +1,539 @@
+/* Test that C/POSIX and C.UTF-8 are consistent.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <langinfo.h>
+#include <locale.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <support/check.h>
+
+/* Initialized by do_test using newlocale.  */
+static locale_t c_utf8;
+
+/* Set to true for second pass.  */
+static bool use_nl_langinfo_l;
+
+static void
+switch_to_c (void)
+{
+  if (setlocale (LC_ALL, "C") == NULL)
+    FAIL_EXIT1 ("setlocale (LC_ALL, \"C\")");
+}
+
+static void
+switch_to_c_utf8 (void)
+{
+  if (setlocale (LC_ALL, "C.UTF-8") == NULL)
+    FAIL_EXIT1 ("setlocale (LC_ALL, \"C.UTF-8\")");
+}
+
+static char *
+str (nl_item item)
+{
+  if (!use_nl_langinfo_l)
+    switch_to_c  ();
+  return nl_langinfo (item);
+}
+
+static char *
+str_utf8 (nl_item item)
+{
+  if (use_nl_langinfo_l)
+    return nl_langinfo_l (item, c_utf8);
+  else
+    {
+      switch_to_c_utf8  ();
+      return nl_langinfo (item);
+    }
+}
+
+static wchar_t *
+wstr (nl_item item)
+{
+  return (wchar_t *) str (item);
+}
+
+static wchar_t *
+wstr_utf8 (nl_item item)
+{
+  return (wchar_t *) str_utf8 (item);
+}
+
+static int
+byte (nl_item item)
+{
+  return (signed char) *str (item);
+}
+
+static int
+byte_utf8 (nl_item item)
+{
+  return (signed char) *str_utf8 (item);
+}
+
+static int
+word (nl_item item)
+{
+  union
+  {
+    char *ptr;
+    int word;
+  } u;
+  u.ptr = str (item);
+  return u.word;
+}
+
+static int
+word_utf8 (nl_item item)
+{
+  union
+  {
+    char *ptr;
+    int word;
+  } u;
+  u.ptr = str_utf8 (item);
+  return u.word;
+}
+
+static void
+one_pass (void)
+{
+  /* LC_TIME.  */
+  TEST_COMPARE_STRING (str (ABDAY_1), str_utf8 (ABDAY_1));
+  TEST_COMPARE_STRING (str (ABDAY_2), str_utf8 (ABDAY_2));
+  TEST_COMPARE_STRING (str (ABDAY_3), str_utf8 (ABDAY_3));
+  TEST_COMPARE_STRING (str (ABDAY_4), str_utf8 (ABDAY_4));
+  TEST_COMPARE_STRING (str (ABDAY_5), str_utf8 (ABDAY_5));
+  TEST_COMPARE_STRING (str (ABDAY_6), str_utf8 (ABDAY_6));
+  TEST_COMPARE_STRING (str (ABDAY_7), str_utf8 (ABDAY_7));
+
+  TEST_COMPARE_STRING (str (DAY_1), str_utf8 (DAY_1));
+  TEST_COMPARE_STRING (str (DAY_2), str_utf8 (DAY_2));
+  TEST_COMPARE_STRING (str (DAY_3), str_utf8 (DAY_3));
+  TEST_COMPARE_STRING (str (DAY_4), str_utf8 (DAY_4));
+  TEST_COMPARE_STRING (str (DAY_5), str_utf8 (DAY_5));
+  TEST_COMPARE_STRING (str (DAY_6), str_utf8 (DAY_6));
+  TEST_COMPARE_STRING (str (DAY_7), str_utf8 (DAY_7));
+
+  TEST_COMPARE_STRING (str (ABMON_1), str_utf8 (ABMON_1));
+  TEST_COMPARE_STRING (str (ABMON_2), str_utf8 (ABMON_2));
+  TEST_COMPARE_STRING (str (ABMON_3), str_utf8 (ABMON_3));
+  TEST_COMPARE_STRING (str (ABMON_4), str_utf8 (ABMON_4));
+  TEST_COMPARE_STRING (str (ABMON_5), str_utf8 (ABMON_5));
+  TEST_COMPARE_STRING (str (ABMON_6), str_utf8 (ABMON_6));
+  TEST_COMPARE_STRING (str (ABMON_7), str_utf8 (ABMON_7));
+  TEST_COMPARE_STRING (str (ABMON_8), str_utf8 (ABMON_8));
+  TEST_COMPARE_STRING (str (ABMON_9), str_utf8 (ABMON_9));
+  TEST_COMPARE_STRING (str (ABMON_10), str_utf8 (ABMON_10));
+  TEST_COMPARE_STRING (str (ABMON_11), str_utf8 (ABMON_11));
+  TEST_COMPARE_STRING (str (ABMON_12), str_utf8 (ABMON_12));
+
+  TEST_COMPARE_STRING (str (MON_1), str_utf8 (MON_1));
+  TEST_COMPARE_STRING (str (MON_2), str_utf8 (MON_2));
+  TEST_COMPARE_STRING (str (MON_3), str_utf8 (MON_3));
+  TEST_COMPARE_STRING (str (MON_4), str_utf8 (MON_4));
+  TEST_COMPARE_STRING (str (MON_5), str_utf8 (MON_5));
+  TEST_COMPARE_STRING (str (MON_6), str_utf8 (MON_6));
+  TEST_COMPARE_STRING (str (MON_7), str_utf8 (MON_7));
+  TEST_COMPARE_STRING (str (MON_8), str_utf8 (MON_8));
+  TEST_COMPARE_STRING (str (MON_9), str_utf8 (MON_9));
+  TEST_COMPARE_STRING (str (MON_10), str_utf8 (MON_10));
+  TEST_COMPARE_STRING (str (MON_11), str_utf8 (MON_11));
+  TEST_COMPARE_STRING (str (MON_12), str_utf8 (MON_12));
+
+  TEST_COMPARE_STRING (str (AM_STR), str_utf8 (AM_STR));
+  TEST_COMPARE_STRING (str (PM_STR), str_utf8 (PM_STR));
+
+  TEST_COMPARE_STRING (str (D_T_FMT), str_utf8 (D_T_FMT));
+  TEST_COMPARE_STRING (str (D_FMT), str_utf8 (D_FMT));
+  TEST_COMPARE_STRING (str (T_FMT), str_utf8 (T_FMT));
+  TEST_COMPARE_STRING (str (T_FMT_AMPM),
+                       str_utf8 (T_FMT_AMPM));
+
+  TEST_COMPARE_STRING (str (ERA), str_utf8 (ERA));
+  TEST_COMPARE_STRING (str (ERA_YEAR), str_utf8 (ERA_YEAR));
+  TEST_COMPARE_STRING (str (ERA_D_FMT), str_utf8 (ERA_D_FMT));
+  TEST_COMPARE_STRING (str (ALT_DIGITS), str_utf8 (ALT_DIGITS));
+  TEST_COMPARE_STRING (str (ERA_D_T_FMT), str_utf8 (ERA_D_T_FMT));
+  TEST_COMPARE_STRING (str (ERA_T_FMT), str_utf8 (ERA_T_FMT));
+  TEST_COMPARE (word (_NL_TIME_ERA_NUM_ENTRIES),
+                word_utf8 (_NL_TIME_ERA_NUM_ENTRIES));
+  /* No array elements, so nothing to compare for _NL_TIME_ERA_ENTRIES.  */
+  TEST_COMPARE (word (_NL_TIME_ERA_NUM_ENTRIES), 0);
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_1), wstr_utf8 (_NL_WABDAY_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_2), wstr_utf8 (_NL_WABDAY_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_3), wstr_utf8 (_NL_WABDAY_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_4), wstr_utf8 (_NL_WABDAY_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_5), wstr_utf8 (_NL_WABDAY_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_6), wstr_utf8 (_NL_WABDAY_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABDAY_7), wstr_utf8 (_NL_WABDAY_7));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_1), wstr_utf8 (_NL_WDAY_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_2), wstr_utf8 (_NL_WDAY_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_3), wstr_utf8 (_NL_WDAY_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_4), wstr_utf8 (_NL_WDAY_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_5), wstr_utf8 (_NL_WDAY_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_6), wstr_utf8 (_NL_WDAY_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WDAY_7), wstr_utf8 (_NL_WDAY_7));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_1), wstr_utf8 (_NL_WABMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_2), wstr_utf8 (_NL_WABMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_3), wstr_utf8 (_NL_WABMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_4), wstr_utf8 (_NL_WABMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_5), wstr_utf8 (_NL_WABMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_6), wstr_utf8 (_NL_WABMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_7), wstr_utf8 (_NL_WABMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_8), wstr_utf8 (_NL_WABMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_9), wstr_utf8 (_NL_WABMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_10), wstr_utf8 (_NL_WABMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_11), wstr_utf8 (_NL_WABMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABMON_12), wstr_utf8 (_NL_WABMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_1), wstr_utf8 (_NL_WMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_2), wstr_utf8 (_NL_WMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_3), wstr_utf8 (_NL_WMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_4), wstr_utf8 (_NL_WMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_5), wstr_utf8 (_NL_WMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_6), wstr_utf8 (_NL_WMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_7), wstr_utf8 (_NL_WMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_8), wstr_utf8 (_NL_WMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_9), wstr_utf8 (_NL_WMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_10), wstr_utf8 (_NL_WMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_11), wstr_utf8 (_NL_WMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WMON_12), wstr_utf8 (_NL_WMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WAM_STR), wstr_utf8 (_NL_WAM_STR));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WPM_STR), wstr_utf8 (_NL_WPM_STR));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WD_T_FMT), wstr_utf8 (_NL_WD_T_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WD_FMT), wstr_utf8 (_NL_WD_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WT_FMT), wstr_utf8 (_NL_WT_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WT_FMT_AMPM),
+                            wstr_utf8 (_NL_WT_FMT_AMPM));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_YEAR), wstr_utf8 (_NL_WERA_YEAR));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_D_FMT), wstr_utf8 (_NL_WERA_D_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALT_DIGITS),
+                            wstr_utf8 (_NL_WALT_DIGITS));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_D_T_FMT),
+                            wstr_utf8 (_NL_WERA_D_T_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WERA_T_FMT), wstr_utf8 (_NL_WERA_T_FMT));
+
+  /* This is somewhat inconsistent, but see locale/categories.def.  */
+  TEST_COMPARE (byte (_NL_TIME_WEEK_NDAYS), byte_utf8 (_NL_TIME_WEEK_NDAYS));
+  TEST_COMPARE (word (_NL_TIME_WEEK_1STDAY),
+                word_utf8 (_NL_TIME_WEEK_1STDAY));
+  TEST_COMPARE (byte (_NL_TIME_WEEK_1STWEEK),
+                byte_utf8 (_NL_TIME_WEEK_1STWEEK));
+  TEST_COMPARE (byte (_NL_TIME_FIRST_WEEKDAY),
+                byte_utf8 (_NL_TIME_FIRST_WEEKDAY));
+  TEST_COMPARE (byte (_NL_TIME_FIRST_WORKDAY),
+                byte_utf8 (_NL_TIME_FIRST_WORKDAY));
+  TEST_COMPARE (byte (_NL_TIME_CAL_DIRECTION),
+                byte_utf8 (_NL_TIME_CAL_DIRECTION));
+  TEST_COMPARE_STRING (str (_NL_TIME_TIMEZONE), str_utf8 (_NL_TIME_TIMEZONE));
+
+  TEST_COMPARE_STRING (str (_DATE_FMT), str_utf8 (_DATE_FMT));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_W_DATE_FMT), wstr_utf8 (_NL_W_DATE_FMT));
+
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_TIME_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_TIME_CODESET), "UTF-8");
+
+  TEST_COMPARE_STRING (str (ALTMON_1), str_utf8 (ALTMON_1));
+  TEST_COMPARE_STRING (str (ALTMON_2), str_utf8 (ALTMON_2));
+  TEST_COMPARE_STRING (str (ALTMON_3), str_utf8 (ALTMON_3));
+  TEST_COMPARE_STRING (str (ALTMON_4), str_utf8 (ALTMON_4));
+  TEST_COMPARE_STRING (str (ALTMON_5), str_utf8 (ALTMON_5));
+  TEST_COMPARE_STRING (str (ALTMON_6), str_utf8 (ALTMON_6));
+  TEST_COMPARE_STRING (str (ALTMON_7), str_utf8 (ALTMON_7));
+  TEST_COMPARE_STRING (str (ALTMON_8), str_utf8 (ALTMON_8));
+  TEST_COMPARE_STRING (str (ALTMON_9), str_utf8 (ALTMON_9));
+  TEST_COMPARE_STRING (str (ALTMON_10), str_utf8 (ALTMON_10));
+  TEST_COMPARE_STRING (str (ALTMON_11), str_utf8 (ALTMON_11));
+  TEST_COMPARE_STRING (str (ALTMON_12), str_utf8 (ALTMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_1), wstr_utf8 (_NL_WALTMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_2), wstr_utf8 (_NL_WALTMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_3), wstr_utf8 (_NL_WALTMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_4), wstr_utf8 (_NL_WALTMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_5), wstr_utf8 (_NL_WALTMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_6), wstr_utf8 (_NL_WALTMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_7), wstr_utf8 (_NL_WALTMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_8), wstr_utf8 (_NL_WALTMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_9), wstr_utf8 (_NL_WALTMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_10), wstr_utf8 (_NL_WALTMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_11), wstr_utf8 (_NL_WALTMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WALTMON_12), wstr_utf8 (_NL_WALTMON_12));
+
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_1), str_utf8 (_NL_ABALTMON_1));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_2), str_utf8 (_NL_ABALTMON_2));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_3), str_utf8 (_NL_ABALTMON_3));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_4), str_utf8 (_NL_ABALTMON_4));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_5), str_utf8 (_NL_ABALTMON_5));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_6), str_utf8 (_NL_ABALTMON_6));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_7), str_utf8 (_NL_ABALTMON_7));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_8), str_utf8 (_NL_ABALTMON_8));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_9), str_utf8 (_NL_ABALTMON_9));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_10), str_utf8 (_NL_ABALTMON_10));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_11), str_utf8 (_NL_ABALTMON_11));
+  TEST_COMPARE_STRING (str (_NL_ABALTMON_12), str_utf8 (_NL_ABALTMON_12));
+
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_1),
+                            wstr_utf8 (_NL_WABALTMON_1));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_2),
+                            wstr_utf8 (_NL_WABALTMON_2));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_3),
+                            wstr_utf8 (_NL_WABALTMON_3));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_4),
+                            wstr_utf8 (_NL_WABALTMON_4));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_5),
+                            wstr_utf8 (_NL_WABALTMON_5));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_6),
+                            wstr_utf8 (_NL_WABALTMON_6));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_7),
+                            wstr_utf8 (_NL_WABALTMON_7));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_8),
+                            wstr_utf8 (_NL_WABALTMON_8));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_9),
+                            wstr_utf8 (_NL_WABALTMON_9));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_10),
+                            wstr_utf8 (_NL_WABALTMON_10));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_11),
+                            wstr_utf8 (_NL_WABALTMON_11));
+  TEST_COMPARE_STRING_WIDE (wstr (_NL_WABALTMON_12),
+                            wstr_utf8 (_NL_WABALTMON_12));
+
+  /* LC_COLLATE.  Mostly untested, only expected differences.  */
+  TEST_COMPARE_STRING (str (_NL_COLLATE_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_COLLATE_CODESET), "UTF-8");
+
+  /* LC_CTYPE.  Mostly untested, only expected differences.  */
+  TEST_COMPARE_STRING (str (CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (CODESET), "UTF-8");
+
+  /* LC_MONETARY.  */
+  TEST_COMPARE_STRING (str (INT_CURR_SYMBOL), str_utf8 (INT_CURR_SYMBOL));
+  TEST_COMPARE_STRING (str (CURRENCY_SYMBOL), str_utf8 (CURRENCY_SYMBOL));
+  TEST_COMPARE_STRING (str (MON_DECIMAL_POINT), str_utf8 (MON_DECIMAL_POINT));
+  TEST_COMPARE_STRING (str (MON_THOUSANDS_SEP), str_utf8 (MON_THOUSANDS_SEP));
+  TEST_COMPARE_STRING (str (MON_GROUPING), str_utf8 (MON_GROUPING));
+  TEST_COMPARE_STRING (str (POSITIVE_SIGN), str_utf8 (POSITIVE_SIGN));
+  TEST_COMPARE_STRING (str (NEGATIVE_SIGN), str_utf8 (NEGATIVE_SIGN));
+  TEST_COMPARE (byte (INT_FRAC_DIGITS), byte_utf8 (INT_FRAC_DIGITS));
+  TEST_COMPARE (byte (FRAC_DIGITS), byte_utf8 (FRAC_DIGITS));
+  TEST_COMPARE (byte (P_CS_PRECEDES), byte_utf8 (P_CS_PRECEDES));
+  TEST_COMPARE (byte (P_SEP_BY_SPACE), byte_utf8 (P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (N_CS_PRECEDES), byte_utf8 (N_CS_PRECEDES));
+  TEST_COMPARE (byte (N_SEP_BY_SPACE), byte_utf8 (N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (P_SIGN_POSN), byte_utf8 (P_SIGN_POSN));
+  TEST_COMPARE (byte (N_SIGN_POSN), byte_utf8 (N_SIGN_POSN));
+  TEST_COMPARE_STRING (str (CRNCYSTR), str_utf8 (CRNCYSTR));
+  TEST_COMPARE (byte (INT_P_CS_PRECEDES), byte_utf8 (INT_P_CS_PRECEDES));
+  TEST_COMPARE (byte (INT_P_SEP_BY_SPACE), byte_utf8 (INT_P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (INT_N_CS_PRECEDES), byte_utf8 (INT_N_CS_PRECEDES));
+  TEST_COMPARE (byte (INT_N_SEP_BY_SPACE), byte_utf8 (INT_N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (INT_P_SIGN_POSN), byte_utf8 (INT_P_SIGN_POSN));
+  TEST_COMPARE (byte (INT_N_SIGN_POSN), byte_utf8 (INT_N_SIGN_POSN));
+  TEST_COMPARE_STRING (str (_NL_MONETARY_DUO_INT_CURR_SYMBOL),
+                       str_utf8 (_NL_MONETARY_DUO_INT_CURR_SYMBOL));
+  TEST_COMPARE_STRING (str (_NL_MONETARY_DUO_CURRENCY_SYMBOL),
+                       str_utf8 (_NL_MONETARY_DUO_CURRENCY_SYMBOL));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_FRAC_DIGITS),
+                byte_utf8 (_NL_MONETARY_DUO_INT_FRAC_DIGITS));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_FRAC_DIGITS),
+                byte_utf8 (_NL_MONETARY_DUO_FRAC_DIGITS));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_P_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_P_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_P_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_N_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_N_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_N_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_CS_PRECEDES),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_CS_PRECEDES));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_SEP_BY_SPACE),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_SEP_BY_SPACE));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_P_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_P_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_N_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_N_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_P_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_P_SIGN_POSN));
+  TEST_COMPARE (byte (_NL_MONETARY_DUO_INT_N_SIGN_POSN),
+                byte_utf8 (_NL_MONETARY_DUO_INT_N_SIGN_POSN));
+  TEST_COMPARE (word (_NL_MONETARY_UNO_VALID_FROM),
+                word_utf8 (_NL_MONETARY_UNO_VALID_FROM));
+  TEST_COMPARE (word (_NL_MONETARY_UNO_VALID_TO),
+                word_utf8 (_NL_MONETARY_UNO_VALID_TO));
+  TEST_COMPARE (word (_NL_MONETARY_DUO_VALID_FROM),
+                word_utf8 (_NL_MONETARY_DUO_VALID_FROM));
+  TEST_COMPARE (word (_NL_MONETARY_DUO_VALID_TO),
+                word_utf8 (_NL_MONETARY_DUO_VALID_TO));
+  /* _NL_MONETARY_CONVERSION_RATE cannot be tested (word array).  */
+  TEST_COMPARE (word (_NL_MONETARY_DECIMAL_POINT_WC),
+                word_utf8 (_NL_MONETARY_DECIMAL_POINT_WC));
+  TEST_COMPARE (word (_NL_MONETARY_THOUSANDS_SEP_WC),
+                word_utf8 (_NL_MONETARY_THOUSANDS_SEP_WC));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_MONETARY_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_MONETARY_CODESET), "UTF-8");
+
+  /* LC_NUMERIC.  */
+
+  TEST_COMPARE_STRING (str (DECIMAL_POINT), str_utf8 (DECIMAL_POINT));
+  TEST_COMPARE_STRING (str (RADIXCHAR), str_utf8 (RADIXCHAR));
+  TEST_COMPARE_STRING (str (THOUSANDS_SEP), str_utf8 (THOUSANDS_SEP));
+  TEST_COMPARE_STRING (str (THOUSEP), str_utf8 (THOUSEP));
+  TEST_COMPARE_STRING (str (GROUPING), str_utf8 (GROUPING));
+  TEST_COMPARE (word (_NL_NUMERIC_DECIMAL_POINT_WC),
+                word_utf8 (_NL_NUMERIC_DECIMAL_POINT_WC));
+  TEST_COMPARE (word (_NL_NUMERIC_THOUSANDS_SEP_WC),
+                word_utf8 (_NL_NUMERIC_THOUSANDS_SEP_WC));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_NUMERIC_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_NUMERIC_CODESET), "UTF-8");
+
+  /* LC_MESSAGES.  */
+
+  TEST_COMPARE_STRING (str (YESEXPR), str_utf8 (YESEXPR));
+  TEST_COMPARE_STRING (str (NOEXPR), str_utf8 (NOEXPR));
+  TEST_COMPARE_STRING (str (YESSTR), str_utf8 (YESSTR));
+  TEST_COMPARE_STRING (str (NOSTR), str_utf8 (NOSTR));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_MESSAGES_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_MESSAGES_CODESET), "UTF-8");
+
+  /* LC_PAPER.  */
+
+  TEST_COMPARE (word (_NL_PAPER_HEIGHT), word_utf8 (_NL_PAPER_HEIGHT));
+  TEST_COMPARE (word (_NL_PAPER_WIDTH), word_utf8 (_NL_PAPER_WIDTH));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_PAPER_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_PAPER_CODESET), "UTF-8");
+
+  /* LC_NAME.  */
+
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_FMT),
+                       str_utf8 (_NL_NAME_NAME_FMT));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_GEN),
+                       str_utf8 (_NL_NAME_NAME_GEN));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MR),
+                       str_utf8 (_NL_NAME_NAME_MR));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MRS),
+                       str_utf8 (_NL_NAME_NAME_MRS));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MISS),
+                       str_utf8 (_NL_NAME_NAME_MISS));
+  TEST_COMPARE_STRING (str (_NL_NAME_NAME_MS),
+                       str_utf8 (_NL_NAME_NAME_MS));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_NAME_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_NAME_CODESET), "UTF-8");
+
+  /* LC_ADDRESS.  */
+
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_POSTAL_FMT),
+                       str_utf8 (_NL_ADDRESS_POSTAL_FMT));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_NAME),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_NAME));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_POST),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_POST));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_AB2),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_AB2));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_AB3),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_AB3));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_CAR),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_CAR));
+  TEST_COMPARE (word (_NL_ADDRESS_COUNTRY_NUM),
+                word_utf8 (_NL_ADDRESS_COUNTRY_NUM));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_COUNTRY_ISBN),
+                       str_utf8 (_NL_ADDRESS_COUNTRY_ISBN));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_NAME),
+                       str_utf8 (_NL_ADDRESS_LANG_NAME));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_AB),
+                       str_utf8 (_NL_ADDRESS_LANG_AB));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_TERM),
+                       str_utf8 (_NL_ADDRESS_LANG_TERM));
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_LANG_LIB),
+                       str_utf8 (_NL_ADDRESS_LANG_LIB));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_ADDRESS_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_ADDRESS_CODESET), "UTF-8");
+
+  /* LC_TELEPHONE.  */
+
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_TEL_INT_FMT),
+                       str_utf8 (_NL_TELEPHONE_TEL_INT_FMT));
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_TEL_DOM_FMT),
+                       str_utf8 (_NL_TELEPHONE_TEL_DOM_FMT));
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_INT_SELECT),
+                       str_utf8 (_NL_TELEPHONE_INT_SELECT));
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_INT_PREFIX),
+                       str_utf8 (_NL_TELEPHONE_INT_PREFIX));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_TELEPHONE_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_TELEPHONE_CODESET), "UTF-8");
+
+  /* LC_MEASUREMENT.  */
+
+  TEST_COMPARE (byte (_NL_MEASUREMENT_MEASUREMENT),
+                byte_utf8 (_NL_MEASUREMENT_MEASUREMENT));
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_MEASUREMENT_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_MEASUREMENT_CODESET), "UTF-8");
+
+  /* LC_IDENTIFICATION is skipped since C.UTF-8 is distinct from C.  */
+
+  /* _NL_IDENTIFICATION_CATEGORY cannot be tested because it is a
+     string array.  */
+  /* Expected difference.  */
+  TEST_COMPARE_STRING (str (_NL_IDENTIFICATION_CODESET), "ANSI_X3.4-1968");
+  TEST_COMPARE_STRING (str_utf8 (_NL_IDENTIFICATION_CODESET), "UTF-8");
+}
+
+static int
+do_test (void)
+{
+  puts ("info: using setlocale and nl_langinfo");
+  one_pass ();
+
+  puts ("info: using nl_langinfo_l");
+
+  c_utf8 = newlocale (LC_ALL_MASK, "C.UTF-8", (locale_t) 0);
+  TEST_VERIFY_EXIT (c_utf8 != (locale_t) 0);
+
+  switch_to_c ();
+  use_nl_langinfo_l = true;
+  one_pass ();
+
+  freelocale (c_utf8);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.31.1



* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-01-31  5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell
@ 2022-01-31  8:47   ` Andreas Schwab
  2022-01-31 16:07     ` Carlos O'Donell
  2022-02-01 12:05   ` Florian Weimer
  1 sibling, 1 reply; 15+ messages in thread
From: Andreas Schwab @ 2022-01-31  8:47 UTC (permalink / raw)
  To: Carlos O'Donell via Libc-alpha
  Cc: fweimer, michael.hudson, Carlos O'Donell

On Jan 31 2022, Carlos O'Donell via Libc-alpha wrote:

> +%
> +% This field is consciously aligned with ISO 30112 and the C/POSIX locale.
> +week    7;19971130;4

The copy of ISO 30112 that I could find says 7;19971201;4.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-01-31  8:47   ` Andreas Schwab
@ 2022-01-31 16:07     ` Carlos O'Donell
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-01-31 16:07 UTC (permalink / raw)
  To: Andreas Schwab, Carlos O'Donell via Libc-alpha
  Cc: fweimer, michael.hudson

On 1/31/22 03:47, Andreas Schwab wrote:
> On Jan 31 2022, Carlos O'Donell via Libc-alpha wrote:
> 
>> +%
>> +% This field is consciously aligned with ISO 30112 and the C/POSIX locale.
>> +week    7;19971130;4
> 
> The copy of ISO 30112 that I could find says 7;19971201;4.
> 

First and foremost, we try to align with the builtin C/POSIX locale.

Aligning with C/POSIX ensures that application developers can seamlessly
change from "C" to "C.UTF-8" without breaking existing tests.

e.g. locale/C-time.c
    { .string = "\7" },
    { .word = 19971130 },
    { .string = "\4" },
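
The values in that excerpt can be read back at run time. The following is
a minimal sketch, assuming glibc: the _NL_TIME_WEEK_* items are glibc
extensions in <langinfo.h>, and the union mirrors the word () helper in
tst-c-utf8-consistency.c. The helper names item_byte, item_word and
print_week are hypothetical and exist only for this illustration.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: read a locale item stored as a single byte.  */
static int
item_byte (nl_item item)
{
  return (signed char) *nl_langinfo (item);
}

/* Hypothetical helper: read a locale item that glibc stores as a word
   in place of the returned pointer (same trick as the test's word ()).  */
static int
item_word (nl_item item)
{
  union
  {
    char *ptr;
    int word;
  } u;
  u.ptr = nl_langinfo (item);
  return u.word;
}

/* Print the three parts of the "week" keyword for LOCALE.
   Returns 0 on success, -1 if the locale cannot be selected.  */
static int
print_week (const char *locale)
{
  if (setlocale (LC_ALL, locale) == NULL)
    return -1;
  printf ("%s: week %d;%d;%d\n", locale,
          item_byte (_NL_TIME_WEEK_NDAYS),
          item_word (_NL_TIME_WEEK_1STDAY),
          item_byte (_NL_TIME_WEEK_1STWEEK));
  return 0;
}
```

Calling print_week ("C") and then print_week ("C.UTF-8") should show the
same triple once the two locales are aligned; the second call returns -1
on systems where C.UTF-8 is not available.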

I'm using ISO 30112 WD12 (2018-02-12), which I have access to as part
of my SC22 involvement. This draft went on to become the final
standard, published in 2020-09, though I don't yet have a copy of the
published version.

Section 4.7 "LC_TIME" under week:
~~~
If the keyword is not specified the values are taken as 7,
19971130 (a Sunday), and 7 (Saturday), respectively.
ISO 8601 conforming applications should use the values
7, 19971201 (a Monday), and 4 (Thursday),
respectively. 
~~~

This matches the ISO 30112 WD10 [2014] that was used when creating
the defaults in ld-time.c:

  /* Set up defaults based on ISO 30112 WD10 [2014].  */
  if (time->week_ndays == 0)
    time->week_ndays = 7;

  if (time->week_1stday == 0)
    time->week_1stday = 19971130;

  if (time->week_1stweek == 0)
    time->week_1stweek = 7;

This also matches the legacy withdrawn ISO/IEC 14652:2002.
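
The weekdays quoted from the draft (19971130 a Sunday, 19971201 a Monday)
can be double-checked with standard C only. This is a small sketch; the
function name weekday_of is hypothetical, and it relies on mktime filling
in tm_wday from the calendar fields.

```c
#include <stdio.h>
#include <time.h>

/* Return the weekday (0 = Sunday ... 6 = Saturday) of a date written
   as YYYYMMDD, or -1 if mktime cannot represent it.  */
static int
weekday_of (int yyyymmdd)
{
  struct tm tm = { 0 };
  tm.tm_year = yyyymmdd / 10000 - 1900;
  tm.tm_mon = yyyymmdd / 100 % 100 - 1;
  tm.tm_mday = yyyymmdd % 100;
  tm.tm_hour = 12;      /* Noon keeps clear of DST transitions.  */
  tm.tm_isdst = -1;     /* Let mktime decide about DST.  */
  if (mktime (&tm) == (time_t) -1)
    return -1;
  return tm.tm_wday;
}
```

weekday_of (19971130) yields 0 (Sunday) and weekday_of (19971201) yields
1 (Monday), so the two reference dates differ only in which day is
treated as the start of the week.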

I could change the comment to read:

% This field is consciously aligned with the builtin C/POSIX locale.

Would the new comment address your review?

-- 
Cheers,
Carlos.


* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-01-31  5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell
  2022-01-31  8:47   ` Andreas Schwab
@ 2022-02-01 12:05   ` Florian Weimer
  2022-02-01 16:13     ` Carlos O'Donell
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2022-02-01 12:05 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha, michael.hudson

* Carlos O'Donell:

> We have had one downstream report from Canonical [1] that
> an rrdtool test was broken by the differences in LC_TIME
> that we had in the non-builtin C locale (C.UTF-8). If one
> application has an issue there are going to be others, and
> so with this commit we review and fix all the issues that
> cause the builtin C locale to be different from C.UTF-8,
> which includes:
> * mon_decimal_point should be empty e.g. ""
>  - Depends on mon_decimal_point_wc fix.
> * negative_sign should be empty e.g. ""
> * week should be aligned with ISO 30112 default e.g. 7;19971130;4
> * d_fmt corrected with escaped slashes e.g. "%m//%d//%y"
> * yesstr and nostr should be empty e.g. ""
> * country_ab2 and country_ab3 should be empty e.g. ""
>
> We bump LC_IDENTIFICATION version and adjust the date to
> indicate the change in the locale.
>
> A new tst-c-utf8-consistency test is added to ensure
> consistency between C/POSIX and C.UTF-8.
>
> Tested on x86_64 and i686 without regression.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2022-January/135703.html
>
> Co-authored-by: Florian Weimer <fweimer@redhat.com>
> ---
>  localedata/Makefile                 |  30 +-
>  localedata/locales/C                |  22 +-
>  localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++
>  3 files changed, 578 insertions(+), 13 deletions(-)
>  create mode 100644 localedata/tst-c-utf8-consistency.c
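
Two of the items in the quoted list (empty mon_decimal_point and empty
negative_sign) are visible through the standard localeconv interface.
The sketch below is an illustration only and is not part of the patch;
the names monetary_fields_empty and check_locale are hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if mon_decimal_point and negative_sign are both empty in
   the currently selected locale, 0 otherwise.  */
static int
monetary_fields_empty (void)
{
  const struct lconv *lc = localeconv ();
  return strcmp (lc->mon_decimal_point, "") == 0
         && strcmp (lc->negative_sign, "") == 0;
}

/* Select the locale NAME and run the check above.
   Returns -1 if the locale cannot be selected.  */
static int
check_locale (const char *name)
{
  if (setlocale (LC_ALL, name) == NULL)
    return -1;
  return monetary_fields_empty ();
}
```

ISO C requires both fields to be empty in the "C" locale, so
check_locale ("C") returns 1. check_locale ("C.UTF-8") is expected to
return 1 as well once the locales are consistent, or -1 where C.UTF-8 is
not installed.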

This looks broadly okay to me.  Dropping the ISO standard reference
seems prudent if we can't check its contents.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian



* Re: [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX.
  2022-02-01 12:05   ` Florian Weimer
@ 2022-02-01 16:13     ` Carlos O'Donell
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-02-01 16:13 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, michael.hudson

On 2/1/22 07:05, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> We have had one downstream report from Canonical [1] that
>> an rrdtool test was broken by the differences in LC_TIME
>> that we had in the non-builtin C locale (C.UTF-8). If one
>> application has an issue there are going to be others, and
>> so with this commit we review and fix all the issues that
>> cause the builtin C locale to be different from C.UTF-8,
>> which includes:
>> * mon_decimal_point should be empty e.g. ""
>>  - Depends on mon_decimal_point_wc fix.
>> * negative_sign should be empty e.g. ""
>> * week should be aligned with ISO 30112 default e.g. 7;19971130;4
>> * d_fmt corrected with escaped slashes e.g. "%m//%d//%y"
>> * yesstr and nostr should be empty e.g. ""
>> * country_ab2 and country_ab3 should be empty e.g. ""
>>
>> We bump LC_IDENTIFICATION version and adjust the date to
>> indicate the change in the locale.
>>
>> A new tst-c-utf8-consistency test is added to ensure
>> consistency between C/POSIX and C.UTF-8.
>>
>> Tested on x86_64 and i686 without regression.
>>
>> [1] https://sourceware.org/pipermail/libc-alpha/2022-January/135703.html
>>
>> Co-authored-by: Florian Weimer <fweimer@redhat.com>
>> ---
>>  localedata/Makefile                 |  30 +-
>>  localedata/locales/C                |  22 +-
>>  localedata/tst-c-utf8-consistency.c | 539 ++++++++++++++++++++++++++++
>>  3 files changed, 578 insertions(+), 13 deletions(-)
>>  create mode 100644 localedata/tst-c-utf8-consistency.c
> 
> This looks broadly okay to me.  Dropping the ISO standard reference
> seems prudent if we can't check its contents.
> 
> Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks. I'll drop the ISO references from the commit message.

-- 
Cheers,
Carlos.



Thread overview: 15+ messages
2022-01-31  5:34 [PATCH 0/2] Make C/POSIX and C.UTF-8 consistent Carlos O'Donell
2022-01-31  5:34 ` [PATCH 1/2] localedef: Fix handling of empty mon_decimal_point Carlos O'Donell
2022-01-31 15:26   ` Florian Weimer
2022-01-31 16:09     ` Andreas Schwab
2022-01-31 16:20       ` Florian Weimer
2022-01-31 16:30         ` Andreas Schwab
2022-01-31 16:37           ` Florian Weimer
2022-02-01 11:47   ` Florian Weimer
2022-02-01 16:00     ` Carlos O'Donell
2022-02-01 16:14       ` Carlos O'Donell
2022-01-31  5:34 ` [PATCH 2/2] localedata: Adjust C.UTF-8 to align with C/POSIX Carlos O'Donell
2022-01-31  8:47   ` Andreas Schwab
2022-01-31 16:07     ` Carlos O'Donell
2022-02-01 12:05   ` Florian Weimer
2022-02-01 16:13     ` Carlos O'Donell
