public inbox for glibc-bugs@sourceware.org
* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
@ 2021-12-21 14:18 ` dwmw2 at infradead dot org
  2022-01-06 22:03 ` carlos at redhat dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: dwmw2 at infradead dot org @ 2021-12-21 14:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

--- Comment #21 from David Woodhouse <dwmw2 at infradead dot org> ---
Any progress on this?

The legacy mm/dd and dd/mm forms started becoming obsolescent the moment it
became possible to conduct written communication between the US and Europe in
less time than it takes a steam ship to physically cross the Atlantic Ocean.

As a European, I am constantly bombarded with dates in the US-local form. From
crappy software, crappy web sites, and crappy people. 

So when I see "1/2/2021" I have *absolutely no idea* whether that means January
2nd or February 1st. Even if it's the "right" historical format for my locale,
it's still ambiguous. And that's why it's utterly wrong to be using that format
— even the "correct" variant of it — in the 21st century.

Well-behaved programs should only *ever* use the YYYY-MM-DD form for numeric
dates, and *never* the non-inclusive, ambiguous, parochial local forms mm/dd or
dd/mm.
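
The ambiguity described above can be sketched in a few lines of C. This is
only an illustration: the names `struct ymd`, `read_slash_date` and
`is_ambiguous` are invented for this example and are not part of glibc, and
the validity check is deliberately coarse.

```c
#include <stdio.h>

/* A calendar date as plain integers (hypothetical helper type). */
struct ymd { int year, month, day; };

/* Read "a/b/yyyy". When month_first is nonzero the text is taken as
   mm/dd/yyyy (US reading), otherwise as dd/mm/yyyy (European reading).
   Returns 1 if the fields form a plausible date, 0 otherwise. */
int read_slash_date(const char *text, int month_first, struct ymd *out)
{
    int a, b, year;
    char trailing;

    /* Exactly three numbers and nothing after them. */
    if (sscanf(text, "%d/%d/%d%c", &a, &b, &year, &trailing) != 3)
        return 0;

    out->year = year;
    out->month = month_first ? a : b;
    out->day = month_first ? b : a;

    /* Coarse range check only; month lengths are ignored in this sketch. */
    return out->month >= 1 && out->month <= 12
        && out->day >= 1 && out->day <= 31;
}

/* Returns 1 when both readings are plausible but name different days. */
int is_ambiguous(const char *text)
{
    struct ymd us, eu;

    return read_slash_date(text, 1, &us)
        && read_slash_date(text, 0, &eu)
        && (us.month != eu.month || us.day != eu.day);
}
```

With these helpers, "1/2/2021" is reported as ambiguous, while "13/2/2021"
is not, because only the day-first reading is plausible. A YYYY-MM-DD string
has a single reading and needs no such check.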

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
  2021-12-21 14:18 ` [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME dwmw2 at infradead dot org
@ 2022-01-06 22:03 ` carlos at redhat dot com
  2022-01-06 22:04 ` carlos at redhat dot com
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: carlos at redhat dot com @ 2022-01-06 22:03 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.mozilla.or
                   |                            |g/show_bug.cgi?id=1509096

--- Comment #22 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to David Woodhouse from comment #21)
> Any progress on this?

I am not aware of any progress on this issue.

Let me summarize the current state over the 14+ years this bug has been open.

I feel that ISO 8601 is a red herring here, and the vast majority of users are
simply talking about d_fmt and the default becoming YYYY-MM-DD for their
particular use case.

There are three positions that can be taken on default value of d_fmt:

(a) A locale represents the information for an application to use to present
such information to someone local in that locale. Thus d_fmt should be used
when exchanging information locally (whatever that means). An application may
choose, in the context of its own use of the data, to present the data in a
more interoperable, unambiguous format, a 21st century way, e.g. YYYY-MM-DD
for d_fmt.

(b) A locale represents the default way information should be presented, and in
the 21st century, this means d_fmt should be YYYY-MM-DD.

(c) The framework should allow for easy customization of the pattern used for
d_fmt, without losing the language-specific localization in the rest of
LC_TIME.
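
As a rough illustration of how these positions look from the application
side, here is a minimal C sketch. The function names are invented for this
example. It takes one possible reading of (a), where the application bypasses
d_fmt and hard-codes the pattern, and contrasts it with (b) and (c), where
the application keeps using %x and the locale data decides the result.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Adopt LC_TIME from the environment; without this call %x keeps the
   "C" locale behaviour. Returns 1 on success. */
int use_environment_locale(void)
{
    return setlocale(LC_TIME, "") != NULL;
}

/* Build a struct tm holding just a calendar date. */
struct tm make_date(int year, int month, int day)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    tm.tm_mon = month - 1;      /* and months from 0 */
    tm.tm_mday = day;
    return tm;
}

/* One reading of (a): ignore d_fmt and always emit YYYY-MM-DD. */
size_t format_date_explicit(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%Y-%m-%d", tm);
}

/* (b) and (c): defer to the d_fmt of the active LC_TIME locale. */
size_t format_date_locale(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%x", tm);
}
```

In the "C" locale, %x is defined as %m/%d/%y, so the two functions produce
"2022-01-07" and "01/07/22" for 7 January 2022. Under any other locale the
second result depends on the installed locale data, which is exactly the
value this bug is about.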

Several early responders to this issue in 2007 took the position of (a), and
that no further work was required in glibc. Users were rightly upset because
the solution of making your own locale is not well supported by glibc as a
process in general. Also users only really wanted d_fmt to be YYYY-MM-DD, and
everything else the same.

Some users over time have taken the position (b).

Recent glibc developers have started to come around to (c), and it follows on
from the fact that ICU and other libraries do allow a certain amount of dynamic
customization of date format patterns, as seen in the solution for Thunderbird
here: 
"Implement pref overrides for date and time formats:
intl.date_time.pattern_override.date_* and
intl.date_time.pattern_override.time_* (was: TB 60 on Linux: Setting date
locale to LC_TIME=en_DK.utf8 no longer outputs yyyy-MM-dd format)"
https://bugzilla.mozilla.org/show_bug.cgi?id=1426907

To date we have several suggestions for a way forward:

(1) Add @ variants to all locales.
- Duplicate all locales, rename, and change d_fmt to %Y-%m-%d (with clever
include usage to reduce duplication).

(2) Add @ variants to a few key locales.
- Solution (1), but with less overall work, and it gives some users a solution.

(3) Implement a language:territory layering.
- Provide a new format for specifying LC_TIME that allows a user to layer
language and territory information. LC_TIME would need to be split into
language-specific entries, and territory-specific entries (language neutral),
and allow layering.

None of (1), (2) or (3) have been implemented to date.

I see maybe another way forward:

(4) New pattern with override selection.

Following the Mozilla and ECMAScript recommendations, it might be more feasible
to define variants of d_fmt, and allow users to pick a variant, e.g.

In environment:
LC_TIME=en_US,d_fmt={iso8601}

In locale sources:
d_fmt            "%m//%d//%Y"
% Variant {iso8601} pattern for d_fmt.
d_fmt_iso8601    "%Y-%m-%d"

Where the locale sources provide tested canonical patterns for the variants.
Then users can pick the variants and we can easily add variants. At the
implementation level we would need to change a pointer after parsing the env
var and we would also break the sharing of some of the pages of the mmap'd
binary locale.
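
To make the environment syntax in (4) concrete, the following C sketch splits
such a value into a locale name and a variant name. Everything here is
hypothetical: glibc does not accept `LC_TIME=en_US,d_fmt={iso8601}` today,
and `split_lc_time` and `variant_keyword` are names made up for this
illustration, not proposed glibc interfaces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: split "en_US,d_fmt={iso8601}" into "en_US" and "iso8601".
   A plain "en_US" yields an empty variant. Returns 1 on success, 0 on
   malformed input or if a buffer is too small. */
int split_lc_time(const char *value, char *locale, size_t locale_size,
                  char *variant, size_t variant_size)
{
    static const char key[] = ",d_fmt={";
    const char *sep = strstr(value, key);
    size_t name_len = sep ? (size_t)(sep - value) : strlen(value);

    if (variant_size == 0 || name_len == 0 || name_len >= locale_size)
        return 0;
    memcpy(locale, value, name_len);
    locale[name_len] = '\0';

    variant[0] = '\0';
    if (sep == NULL)
        return 1;                       /* no override requested */

    const char *start = sep + strlen(key);
    const char *close = strchr(start, '}');
    if (close == NULL || close[1] != '\0')
        return 0;                       /* unterminated or trailing text */

    size_t var_len = (size_t)(close - start);
    if (var_len == 0 || var_len >= variant_size)
        return 0;
    memcpy(variant, start, var_len);
    variant[var_len] = '\0';
    return 1;
}

/* Map a variant to the hypothetical locale-source keyword,
   e.g. "iso8601" -> "d_fmt_iso8601", "" -> "d_fmt". */
int variant_keyword(const char *variant, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "d_fmt%s%s",
                     *variant ? "_" : "", variant);

    return n > 0 && (size_t)n < out_size;
}
```

A real implementation would then have to look the keyword up in the compiled
locale and redirect the d_fmt pointer, which is the part noted above as
breaking page sharing of the mmap'd locale data; this sketch stops at
parsing.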

Notes:
https://bugzilla.mozilla.org/show_bug.cgi?id=1426907
https://bugzilla.mozilla.org/show_bug.cgi?id=1509096
https://unicode-org.atlassian.net/browse/CLDR-5769
https://unicode-org.atlassian.net/browse/CLDR-12027
https://github.com/tc39/proposal-intl-datetime-style


* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
  2021-12-21 14:18 ` [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME dwmw2 at infradead dot org
  2022-01-06 22:03 ` carlos at redhat dot com
@ 2022-01-06 22:04 ` carlos at redhat dot com
  2022-01-06 22:04 ` carlos at redhat dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: carlos at redhat dot com @ 2022-01-06 22:04 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.mozilla.or
                   |                            |g/show_bug.cgi?id=1426907


* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2022-01-06 22:04 ` carlos at redhat dot com
@ 2022-01-06 22:04 ` carlos at redhat dot com
  2022-01-06 22:04 ` carlos at redhat dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: carlos at redhat dot com @ 2022-01-06 22:04 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://unicode-org.atlassi
                   |                            |an.net/browse/CLDR-5769


* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2022-01-06 22:04 ` carlos at redhat dot com
@ 2022-01-06 22:04 ` carlos at redhat dot com
  2022-01-07  9:48 ` nicolas.mailhot at laposte dot net
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: carlos at redhat dot com @ 2022-01-06 22:04 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://unicode-org.atlassi
                   |                            |an.net/browse/CLDR-12027


* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2022-01-06 22:04 ` carlos at redhat dot com
@ 2022-01-07  9:48 ` nicolas.mailhot at laposte dot net
  2022-01-07 10:58 ` dwmw2 at infradead dot org
  2022-01-07 11:29 ` nicolas.mailhot at laposte dot net
  7 siblings, 0 replies; 8+ messages in thread
From: nicolas.mailhot at laposte dot net @ 2022-01-07  9:48 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

--- Comment #23 from Nicolas Mailhot <nicolas.mailhot at laposte dot net> ---
Really the whole thing has been blown out of proportion by people who did not
want to work on this, piling strawmen over strawmen.

People asked to have the choice to use a new standard way of presenting time
and dates, a way that has been standardised by ISO, IETF, W3C etc, a way that
has been available by default in other OSes for about 20 years, a way that
works better for humans and for software (because it sorts properly).
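
The sorting property is easy to check with a short C sketch. The function
names are made up for this example. It sorts date strings as plain text, with
no date parsing at all.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings: byte-wise comparison. */
int compare_strings(const void *a, const void *b)
{
    const char *const *lhs = a;
    const char *const *rhs = b;

    return strcmp(*lhs, *rhs);
}

/* Sort date strings purely as text. */
void sort_as_text(const char **dates, size_t count)
{
    qsort(dates, count, sizeof *dates, compare_strings);
}
```

Given "2022-01-07", "2021-12-21" and "2022-01-06", the text sort yields
chronological order, because the most significant field comes first and all
fields have fixed width. The same dates written dd/mm/yyyy sort as
"06/01/2022", "07/01/2022", "21/12/2021", which puts the earliest date last.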

In other words, exactly what happened before when people switched to 24h time,
but to read some comments, implementing 24h time would have required switching
all locales to some fake C unilocale, translating hour names into “neutral”
English.

All people ask for is switching numeric representations to a common format
(yyyy-mm-dd, standard ISO weeks…), keeping human day names as-is.

The only remotely controversial part is ISO weeks starting on Monday, but they
are mostly used by businesses, and businesses are the first to clamour for
something that does not break when you cross a border (and besides, most
locales interested in ISO 8601 dates are already using ISO weeks).

And if locale structure as it exists today is badly suited to this need maybe
locale structure needs to evolve to keep up with human needs?

Unfortunately, free software continues to be unfriendly to non-US users. I
don’t know why it’s the case, but it’s the sole system that still has problems
selecting A4 paper or distinguishing between input system and input language.
These are all things MS solved in Office and Windows circa 1995.

Makes you proud of the people that managed the switch to UTF-8 given how
controversial those changes seem to be there (they are not controversial in
other OSes).


* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2022-01-07  9:48 ` nicolas.mailhot at laposte dot net
@ 2022-01-07 10:58 ` dwmw2 at infradead dot org
  2022-01-07 11:29 ` nicolas.mailhot at laposte dot net
  7 siblings, 0 replies; 8+ messages in thread
From: dwmw2 at infradead dot org @ 2022-01-07 10:58 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

--- Comment #24 from David Woodhouse <dwmw2 at infradead dot org> ---
Thanks, Carlos.

(In reply to Carlos O'Donell from comment #22) 
> I feel that ISO 8601 is a red herring here, and the vast majority of users
> are simply talking about d_fmt and the default becoming YYYY-MM-DD for their
> particular use case.

Agreed.

> There are three positions that can be taken on default value of d_fmt:

It seems to me that these positions are retroactively (re)defining the very
semantics of d_fmt, and the purposes for which e.g. strftime("%x") should be
used. Yet applications have been using it in its current form for decades, and
it has been received wisdom that "well-behaved" applications will use the
locale format for the date.

We can change precisely what date format the applications get when they use 
d_fmt — that is precisely the *point* of it, after all. But I'd suggest that it
doesn't make much sense to talk about when/whether applications should use it
at all. That ship has sailed.

> (a) A locale represents the information for an application to use to present
> such information to someone local in that locale. Thus d_fmt should be used
> when exchanging information locally (whatever that means). An application
> may choose, in the context of its own use of the data, to present the data
> in a more interoperable, unambiguous format, a 21st century way, e.g.
> YYYY-MM-DD for d_fmt.

It took me a while to parse the difference between (a) and (b) here, and the
distinction between "for local only viewing" in (a) vs. "default" in (b).

I think the final sentence of (a) should say that an application may explicitly
use e.g. YYYY-MM-DD (%Y-%m-%d) *instead* of using d_fmt (%x).

Of course, in today's interconnected world the only application that can know
its output will only be consumed locally is Snapchat. Anything that outputs
text which can be shared via email/screenshots/files/databases would need to
eschew the legacy d_fmt and explicitly use %Y-%m-%d instead.

On top of which, even if an application *could* know that it's displaying only
to a local user, the archaic dd/mm/yy form of en_GB is *still* the wrong thing
to display. The world has changed, with computers now being considered "broken"
if they cannot instantly communicate with others on a different continent, and
the legacy dd/mm form is a poor choice even for *local* communication because
users are inundated with dates in 'wrong' mm/dd form that makes it ambiguous.

Applications should just use %Y-%m-%d everywhere, unconditionally.

So if we choose (a) then we should change the strftime(3) man page to make it
clear that the %x format is deprecated and should never be used, much like the
warning we already have on %D. And we should patch all the applications in the
world, something like this...

-    strftime(buf, sizeof(buf), _("The date is %x"), tm);
+    strftime(buf, sizeof(buf), _("The date is %Y-%m-%d"), tm);


> (b) A locale represents the default way information should be presented, and
> in the 21st century, this means d_fmt should be YYYY-MM-DD.

This is my understanding of the current position. Well-behaved applications
*should* use d_fmt, and it *should* do something appropriate based on which
country, and which millennium, the user is living in.

Instead of deprecating '%x' and declaring that decent applications will change
to manually use %Y-%m-%d, which is the logical conclusion of choice (a), choice
(b) would simply use the existing flexibility to make existing applications do
the right thing seamlessly. It seems like the better choice to me.

> (c) The framework should allow for easy customization of the pattern used
> for d_fmt, without losing the language-specific localization in the rest of
> LC_TIME.

I don't even know that this needs to be selectable at run time. A build time
choice would be perfectly sufficient, wouldn't it?

Let's switch viewpoints and look at the user/application experience rather than
the implementation side.

Let's also take a slightly less contentious example which really is just a bug.
Poland *officially* adopted the YYYY-MM-DD format in 2002, yet 'LC_TIME=pl_PL
date +%x' still seems to output the legacy 07.01.2022 format. Shouldn't that
one just be fixed in the next glibc release? Should we file a separate bug for
it?
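
The same check can be done programmatically by asking the C library for the
raw d_fmt pattern instead of formatting a date. The sketch below uses the
POSIX `nl_langinfo(D_FMT)` call. The wrapper names `locale_d_fmt` and
`locale_uses_iso_dates` are invented for this example, and the result for
pl_PL depends on the locale data installed on the system.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Return the d_fmt pattern of the named LC_TIME locale, or NULL if that
   locale cannot be selected (for example because it is not installed).
   Side effect: changes the process-wide LC_TIME setting. The returned
   string may be invalidated by later setlocale() calls. */
const char *locale_d_fmt(const char *locale_name)
{
    if (setlocale(LC_TIME, locale_name) == NULL)
        return NULL;
    return nl_langinfo(D_FMT);
}

/* Returns 1 if the locale's d_fmt is YYYY-MM-DD, 0 if it is something
   else, and -1 if the locale is unavailable. %F is accepted because
   strftime defines it as equivalent to %Y-%m-%d. */
int locale_uses_iso_dates(const char *locale_name)
{
    const char *pattern = locale_d_fmt(locale_name);

    if (pattern == NULL)
        return -1;
    return strcmp(pattern, "%Y-%m-%d") == 0 || strcmp(pattern, "%F") == 0;
}
```

For the "C" locale the pattern is "%m/%d/%y", so `locale_uses_iso_dates("C")`
returns 0. Calling it with "pl_PL.UTF-8" on a system where that locale is
installed would show which form the shipped locale data currently uses.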

A *distribution* might then want to revert that change, perhaps if shipping the
new glibc as an update to an existing system (to avoid breaking some
admittedly-already-broken screenscraping scripts). So making it a build-time
option would be useful.

But the next major release of the distribution would probably just use the
"new" post-2002 d_fmt for pl_PL.

For the individual user, the experience would just be that in a new version,
the behaviour gets updated. And if they *really* object to the fixed behaviour,
they still have the (runtime) option of building their own locale to regress it
for themselves. It's not trivial, but it doesn't *need* to be.

And if we can do it for pl_PL (which we absolutely should), then why shouldn't
we do the same for *all* locales? Because this far into the 21st century,
%Y-%m-%d is the only sane way to represent numeric dates.

> I see maybe another way forward:
> 
> (4) New pattern with override selection.
> 
> Following the Mozilla and ECMAScript recommendations, it might be more
> feasible to define variants of d_fmt, and allow users to pick a variant, e.g.
> 
> In environment:
> LC_TIME=en_US,d_fmt={iso8601}
> 
> In locale sources:
> d_fmt            "%m//%d//%Y"
> % Variant {iso8601} pattern for d_fmt.
> d_fmt_iso8601    "%Y-%m-%d"

If you do proceed with runtime options, please ensure that the *default* is
YYYY-MM-DD and that the suffix is required to go back to the legacy form. Or
define *both* 'iso8601' and 'legacy' suffixes and allow the system default to
be configured at build time (and *that* should default to iso8601).


* [Bug localedata/4628] Provide rump locales with ISO 8601 variants for use with LC_TIME
       [not found] <bug-4628-131@http.sourceware.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2022-01-07 10:58 ` dwmw2 at infradead dot org
@ 2022-01-07 11:29 ` nicolas.mailhot at laposte dot net
  7 siblings, 0 replies; 8+ messages in thread
From: nicolas.mailhot at laposte dot net @ 2022-01-07 11:29 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=4628

--- Comment #25 from Nicolas Mailhot <nicolas.mailhot at laposte dot net> ---
From a change management POV, there is no need to redefine defaults.

However, there *is* a need to have format selection happen at the locale name
level. That’s the simplest way both to enable adopters (who will set the new
locale and then lobby their distro to select it at install time) and to protect
stragglers (who will do the reverse on systems where a broken app/script cannot
accommodate the new format).

The UTF-8 migration shows locale-name-level changes are manageable. And it does
not matter if the old locale identifier corresponds to the old or the new
format.

IMHO something finer grained is bound to cause problems because there won’t be
any simple escape hatch for people who have to live with broken scripts or apps.

