public inbox for libc-alpha@sourceware.org
From: Carlos O'Donell <carlos@redhat.com>
To: Michael Hudson-Doyle <michael.hudson@canonical.com>
Cc: libc-alpha@sourceware.org, Florian Weimer <fweimer@redhat.com>
Subject: Re: [PATCH v12 2/2] Add generic C.UTF-8 locale (Bug 17318)
Date: Fri, 28 Jan 2022 11:42:52 -0500
Message-ID: <a2ff9a8e-60cf-b791-80a3-6ef145c608ad@redhat.com>
In-Reply-To: <CAJ8wqtftKJL2veMSeQRri+5tqmSMFY5VSunAkmu0dupToG7REQ@mail.gmail.com>

On 1/25/22 21:44, Michael Hudson-Doyle wrote:
> On Tue, 7 Sept 2021 at 03:45, Carlos O'Donell via Libc-alpha <
> libc-alpha@sourceware.org> wrote:
> 
>> diff --git a/localedata/locales/C b/localedata/locales/C
>> new file mode 100644
>> index 0000000000..ca801c79cf
>> --- /dev/null
>> +++ b/localedata/locales/C
> 
> 
> [...]
> 
> 
>>
>>
>> +LC_TIME
>> +% This is the POSIX Locale definition for the LC_TIME category with the
>> +% exception that time is per ISO 8601 and 24-hour.
>> +%
>> +% Abbreviated weekday names (%a)
>> +abday       "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
>> +
>> +% Full weekday names (%A)
>> +day         "Sunday";"Monday";"Tuesday";"Wednesday";"Thursday";/
>> +            "Friday";"Saturday"
>> +
>> +% Abbreviated month names (%b)
>> +abmon       "Jan";"Feb";"Mar";"Apr";"May";"Jun";"Jul";"Aug";"Sep";/
>> +            "Oct";"Nov";"Dec"
>> +
>> +% Full month names (%B)
>> +mon         "January";"February";"March";"April";"May";"June";"July";/
>> +            "August";"September";"October";"November";"December"
>> +
>> +% Week description, consists of three fields:
>> +% 1. Number of days in a week.
>> +% 2. Gregorian date that is a first weekday (19971130 for Sunday, 19971201 for Monday).
>> +% 3. The weekday number to be contained in the first week of the year.
>> +%
>> +% ISO 8601 conforming applications should use the values 7, 19971201 (a
>> +% Monday), and 4 (Thursday), respectively.
>> +week    7;19971201;4
>>
> 
> It's obviously a bit late, but this is a difference from the Debian/Ubuntu

It is never too late! Thank you for raising this.

Given that you've had problems with one application, other applications will have problems too.

I think we should probably keep C == C.UTF-8 and not change any of the existing LC_TIME properties.

> C.UTF-8 locale, which has:
> 
> week    7;19971130;4

This is the default value from ISO 30112.

This data matches the internal C/POSIX locale.

e.g.

    { .string = "\7" },

7 days in the week.

    { .word = 19971130 },

The week starts on Sunday. This matches the ISO 30112 default when week is not specified.

    { .string = "\4" },

And Thursday needs to be included in the week for it to be considered a "first week."

    { .string = "\1" },
    { .string = "\2" },

ld-time.c also follows the defaults from ISO 30112:

    /* Set up defaults based on ISO 30112 WD10 [2014].  */
    if (time->week_ndays == 0)
      time->week_ndays = 7;

    if (time->week_1stday == 0)
      time->week_1stday = 19971130;

    if (time->week_1stweek == 0)
      time->week_1stweek = 7;

> (confusingly, this is preceded by this comment:
> 
> % ISO 8601 conforming applications should use the values 7, 19971130 (a
> % Monday), and 4 (Thursday), respectively.
> 
> but 19971130 is a Sunday).

As you note, the above comment is wrong: 19971130 is a Sunday.

The verbatim comment from the ISO 30112 standard is:
~~~
ISO 8601 conforming applications should use the values 7, 19971201 (a
Monday), and 4 (Thursday), respectively.
~~~

Note the corrected YYYYMMDD value, i.e. 19971201.

In our upstream C.UTF-8 locale we are consciously aligning with ISO 8601 in more cases.

    % Week description, consists of three fields:
    % 1. Number of days in a week.
    % 2. Gregorian date that is a first weekday (19971130 for Sunday, 19971201 for Monday).
    % 3. The weekday number to be contained in the first week of the year.
    %
    % ISO 8601 conforming applications should use the values 7, 19971201 (a
    % Monday), and 4 (Thursday), respectively.
    week    7;19971201;4
    first_weekday   1
    first_workday   2

So there is a difference between C and C.UTF-8: they have a different first weekday.
 
> The locale(5) page from the man-pages project also says:
> 
> "For compatibility reasons, all glibc locales should set the value of the
> second week list item to 19971130 (Sunday) and base the abday and day lists
> appropriately,".

This is to align with ISO 30112, which is an older standard.

> I found this because it breaks a test of rrdtool (which is probably buggy!
> It sets LC_TIME but needs to clear LC_ALL for that to take any effect) and
> I just wanted to check that this was truly the intended value before (even
> if only just) the release.

In this case for C.UTF-8 we have aligned week with ISO 8601.

There are other parts of C.UTF-8's LC_TIME which are not aligned with ISO 8601.

However, this choice is perhaps inconsistent with the intent of C.UTF-8, so I think this is
actually a bug. Florian also found a real bug in d_fmt (it needs double slashes).

I'm going to post a patch to fix this and make it consistent with C.

-- 
Cheers,
Carlos.


Thread overview: 12+ messages
2021-09-06 15:43 [PATCH v12 0/2] C.UTF-8 Carlos O'Donell
2021-09-06 15:43 ` [PATCH v12 1/2] Add 'codepoint_collation' support for LC_COLLATE Carlos O'Donell
2021-09-06 17:20   ` Matheus Castanho
2021-09-06 17:28     ` Florian Weimer
2021-09-07  1:28       ` Carlos O'Donell
2021-09-07  1:57     ` Carlos O'Donell
2021-09-20 12:49       ` Matheus Castanho
2021-09-20 12:54         ` Carlos O'Donell
2021-09-06 15:43 ` [PATCH v12 2/2] Add generic C.UTF-8 locale (Bug 17318) Carlos O'Donell
2022-01-26  2:44   ` Michael Hudson-Doyle
2022-01-28 16:42     ` Carlos O'Donell [this message]
2022-01-30 23:58       ` Michael Hudson-Doyle
