public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [swbz 29035] mktime vs non-DST
@ 2022-08-17 21:18 DJ Delorie
  2022-08-17 21:50 ` Carlos O'Donell
  2022-08-17 23:10 ` Paul Eggert
  0 siblings, 2 replies; 15+ messages in thread
From: DJ Delorie @ 2022-08-17 21:18 UTC (permalink / raw)
  To: libc-alpha


https://sourceware.org/bugzilla/show_bug.cgi?id=29035

TL;DR - requesting a partial reversion of 86aece3 to become
bug-compatible with older releases.

Long version:

In investigating this, I did a deep-dive on how tm_isdst works in
mktime().  It seems to be less of a hint and more of an override, in
that, if you set tm_isdst=1 you're going to get a result that seems an
hour off if you're in the middle of a standard time period.  Same for
tm_isdst=0.  Setting tm_isdst=-1 is the only way to let mktime use the
dst-in-effect for the time period specified.  Note: I'm not
considering the time duplication that happens at period boundaries
(i.e. the "fall back" that causes an hour of clock time to repeat each
fall).

So if you set tm_isdst=1 in a call to mktime(), it figures out the
local DST offset and applies it regardless of timezone rules.

In the BZ case, however, the zoneinfo in effect does not have a DST
defined (or, as we'll see later, hasn't had DST in a long time).  If
there's no DST, and you set tm_isdst=1, what happens?

Well, prior to 2.29, mktime just overrode tm_isdst and returned a
suitable time according to the current zoneinfo, as if you had passed
tm_isdst=0 or -1 instead.

As of 2.29, we have commit 86aece3bfbd44538ba4fdc947872c81d4c5e6e61
by Paul which includes:

    (__mktime_internal): Set errno to EOVERFLOW if the spring-forward
    gap code fails.

   /* We have a match.  Check whether tm.tm_isdst has the requested
      value, if any.  */
   if (isdst_differ (isdst, tm.tm_isdst))
     {
       . . .
+      __set_errno (EOVERFLOW);
+      return -1;
     }

With this change, tm_isdst becomes a hard requirement, and if the
current zone doesn't have a DST defined, you get a failure, where we
used to succeed (but with a non-DST result).

The relevent standards are pretty quiet on this topic; what little
they say can be interpreted either way - tm_isdst is a requirement, or
tm_isdst is a hint to be corrected by mktime() like other fields.

This breaks the logic down into three categories:

1. You're in a transition period where clock time repeats, and you
   need tm_isdst to decide which to return.

2. You're not in a transition period, and you might as well set
   tm_isdst=-1 unless you want an off-by-an-hour result.

3. Your zone doesn't have dst and setting tm_isdst=1 is meaningless.

I can't see an obvious way to detect case 1 from 2, so this seems to
be a useless set of categories.  A better breakdown would be:

1. You set tm_isdst=-1 by default.  Most of the time, this works.

2. If the time is ambiguous due to a transition, case 1 returns EAGAIN
   and you try again with tm_isdst=0 or 1.

3. If you set tm_isdst=0 or 1 outside of a transition, it returns
   EINVAL if it's incorrect for that time.

But that would be a BIG world-breaking change.  One can dream :-)

Meanwhile, I would like us to consider reverting the commit mentioned
above (not the whole commit, just the two lines I included).  This
will have the effect of making the current code bug-compatible with
older code, in that, setting tm_isdst=1 in a no-dst zone returns a
non-dst (but otherwise valid) time, and updates tm_idst to 0.
Returning EOVERFLOW in these new cases is not useful.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-17 21:18 [swbz 29035] mktime vs non-DST DJ Delorie
@ 2022-08-17 21:50 ` Carlos O'Donell
  2022-08-17 23:10 ` Paul Eggert
  1 sibling, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-08-17 21:50 UTC (permalink / raw)
  To: DJ Delorie, libc-alpha, Paul Eggert

On 8/17/22 17:18, DJ Delorie via Libc-alpha wrote:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=29035
> 
> TL;DR - requesting a partial reversion of 86aece3 to become
> bug-compatible with older releases.
> 
> Long version:
> 
> In investigating this, I did a deep-dive on how tm_isdst works in
> mktime().  It seems to be less of a hint and more of an override, in
> that, if you set tm_isdst=1 you're going to get a result that seems an
> hour off if you're in the middle of a standard time period.  Same for
> tm_isdst=0.  Setting tm_isdst=-1 is the only way to let mktime use the
> dst-in-effect for the time period specified.  Note: I'm not
> considering the time duplication that happens at period boundaries
> (i.e. the "fall back" that causes an hour of clock time to repeat each
> fall).
> 
> So if you set tm_isdst=1 in a call to mktime(), it figures out the
> local DST offset and applies it regardless of timezone rules.
> 
> In the BZ case, however, the zoneinfo in effect does not have a DST
> defined (or, as we'll see later, hasn't had DST in a long time).  If
> there's no DST, and you set tm_isdst=1, what happens?
> 
> Well, prior to 2.29, mktime just overrode tm_isdst and returned a
> suitable time according to the current zoneinfo, as if you had passed
> tm_isdst=0 or -1 instead.

We should continue to do that until the end of time.

No matter what the standards say, at this point the behaviour of mktime() when
passed tm_isdst=0 or tm_isdst=1 is a contract with our users.

> As of 2.29, we have commit 86aece3bfbd44538ba4fdc947872c81d4c5e6e61
> by Paul which includes:
> 
>     (__mktime_internal): Set errno to EOVERFLOW if the spring-forward
>     gap code fails.
> 
>    /* We have a match.  Check whether tm.tm_isdst has the requested
>       value, if any.  */
>    if (isdst_differ (isdst, tm.tm_isdst))
>      {
>        . . .
> +      __set_errno (EOVERFLOW);
> +      return -1;
>      }
> 
> With this change, tm_isdst becomes a hard requirement, and if the
> current zone doesn't have a DST defined, you get a failure, where we
> used to succeed (but with a non-DST result).

We should do this.

> The relevent standards are pretty quiet on this topic; what little
> they say can be interpreted either way - tm_isdst is a requirement, or
> tm_isdst is a hint to be corrected by mktime() like other fields.

We should do and keep doing whatever the old code did IMO and document that.

> This breaks the logic down into three categories:
> 
> 1. You're in a transition period where clock time repeats, and you
>    need tm_isdst to decide which to return.
> 
> 2. You're not in a transition period, and you might as well set
>    tm_isdst=-1 unless you want an off-by-an-hour result.
> 
> 3. Your zone doesn't have dst and setting tm_isdst=1 is meaningless.

Consider the application programmer point of view.

They want to always take a specific action like (1), so they just set tm_isdst=1
to ensure we always pick one side of the transition.

You would never set -1.

You would always set, say 1, regardless of the zone.

You expect to always get an answer, never an error, and get a reasonable result.
 
> I can't see an obvious way to detect case 1 from 2, so this seems to
> be a useless set of categories.  A better breakdown would be:
> 
> 1. You set tm_isdst=-1 by default.  Most of the time, this works.

Right.

> 2. If the time is ambiguous due to a transition, case 1 returns EAGAIN
>    and you try again with tm_isdst=0 or 1.

Right.

> 3. If you set tm_isdst=0 or 1 outside of a transition, it returns
>    EINVAL if it's incorrect for that time.

Right.

I agree with you here, but let me introduce what I believe users want:

1. You set tm_isdst=-1 by default and it always works. The algorithm picks
   one of the times in the transition zone for you by default.

2. You set tm_isdst=0/1 and it always works, but the value selects one side
   of the transition if you're in one, otherwise it behaves like -1.

> But that would be a BIG world-breaking change.  One can dream :-)

Lets leave -1 aside for now.

Setting to 0 or 1 should just not fail IMO.

We should work to restore the existing old behaviour, and make sure that
between 2.29 and the future that we don't regress.
 
> Meanwhile, I would like us to consider reverting the commit mentioned
> above (not the whole commit, just the two lines I included).  This
> will have the effect of making the current code bug-compatible with
> older code, in that, setting tm_isdst=1 in a no-dst zone returns a
> non-dst (but otherwise valid) time, and updates tm_idst to 0.
> Returning EOVERFLOW in these new cases is not useful.
 
Agreed. Lets do that. And add tests please.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-17 21:18 [swbz 29035] mktime vs non-DST DJ Delorie
  2022-08-17 21:50 ` Carlos O'Donell
@ 2022-08-17 23:10 ` Paul Eggert
  2022-08-18  1:39   ` DJ Delorie
  2022-09-08 20:25   ` DJ Delorie
  1 sibling, 2 replies; 15+ messages in thread
From: Paul Eggert @ 2022-08-17 23:10 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1268 bytes --]

On 8/17/22 14:18, DJ Delorie wrote:
> if you set tm_isdst=1 in a call to mktime(), it figures out the
> local DST offset and applies it regardless of timezone rules.

Actually, mktime doesn't and never did that in general, because in 
general there's no such thing as the "local DST offset": within a single 
TZ setting the difference between standard time and DST need not be 
constant. For example, if we are currently in standard time, it's 
possible that the next spring-forward will add 30 minutes, while the 
previous fall-backward subtracted an hour. In such a situation the 
"local DST offset" could reasonably be thought of as 30 minutes, or 60 
minutes, or even something else if there were other transitions of 
varying sizes at other times.

Although I'm not seeing how BZ#29035 led to your diagnosis (Andreas 
didn't reproduce that bug), we did run into a similar problem in Gnulib, 
and I installed into Gnulib a patch to fix that. So I suggest syncing 
with Gnulib. This is a bit fancier than what you suggested, but I expect 
it'll fix BZ#29035 while it's also fixing the bug reported against Gnulib.

Proposed glibc patches attached. Their effect is to sync glibc from 
Gnulib. If these patches don't fix the problem you diagnosed please let 
me know.

[-- Attachment #2: 0001-Assume-HAVE_TZSET-in-time-mktime.c.patch --]
[-- Type: text/x-patch, Size: 709 bytes --]

From 674d4588995d24f8bc5167f73d8661ef5c82c2de Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 17 Aug 2022 15:35:47 -0700
Subject: [PATCH 1/2] Assume HAVE_TZSET in time/mktime.c

This patch does not affect glibc.  It affects only Gnulib-specific
code and is for coordination with Gnulib.
---
 time/mktime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/time/mktime.c b/time/mktime.c
index 494c89bf54..aa12e28e16 100644
--- a/time/mktime.c
+++ b/time/mktime.c
@@ -94,7 +94,7 @@ my_tzset (void)
   const char *tz = getenv ("TZ");
   if (tz != NULL && strchr (tz, '/') != NULL)
     _putenv ("TZ=");
-# elif HAVE_TZSET
+# else
   tzset ();
 # endif
 }
-- 
2.34.1


[-- Attachment #3: 0002-mktime-improve-heuristic-for-ca-1986-Indiana-DST.patch --]
[-- Type: text/x-patch, Size: 3198 bytes --]

From bfe919ce3bba22178429c5b39db98c26fd83272e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 17 Aug 2022 15:38:54 -0700
Subject: [PATCH 2/2] mktime: improve heuristic for ca-1986 Indiana DST

This patch syncs mktime.c from Gnulib, fixing a
problem reported by Mark Krenz <https://bugs.gnu.org/48085>,
and it should fix BZ#29035 too.
* time/mktime.c (__mktime_internal): Be more generous about
accepting arguments with the wrong value of tm_isdst, by falling
back to a one-hour DST difference if we find no nearby DST that is
unusual.  This fixes a problem where "1986-04-28 00:00 EDT" was
rejected when TZ="America/Indianapolis" because the nearest DST
timestamp occurred in 1970, a temporal distance too great for the
old heuristic.  This also also narrows the search a bit, which
is a minor performance win.
---
 time/mktime.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/time/mktime.c b/time/mktime.c
index aa12e28e16..7dc9d67ef9 100644
--- a/time/mktime.c
+++ b/time/mktime.c
@@ -429,8 +429,13 @@ __mktime_internal (struct tm *tp,
 	 time with the right value, and use its UTC offset.
 
 	 Heuristic: probe the adjacent timestamps in both directions,
-	 looking for the desired isdst.  This should work for all real
-	 time zone histories in the tz database.  */
+	 looking for the desired isdst.  If none is found within a
+	 reasonable duration bound, assume a one-hour DST difference.
+	 This should work for all real time zone histories in the tz
+	 database.  */
+
+      /* +1 if we wanted standard time but got DST, -1 if the reverse.  */
+      int dst_difference = (isdst == 0) - (tm.tm_isdst == 0);
 
       /* Distance between probes when looking for a DST boundary.  In
 	 tzdata2003a, the shortest period of DST is 601200 seconds
@@ -441,12 +446,14 @@ __mktime_internal (struct tm *tp,
 	 periods when probing.  */
       int stride = 601200;
 
-      /* The longest period of DST in tzdata2003a is 536454000 seconds
-	 (e.g., America/Jujuy starting 1946-10-01 01:00).  The longest
-	 period of non-DST is much longer, but it makes no real sense
-	 to search for more than a year of non-DST, so use the DST
-	 max.  */
-      int duration_max = 536454000;
+      /* In TZDB 2021e, the longest period of DST (or of non-DST), in
+	 which the DST (or adjacent DST) difference is not one hour,
+	 is 457243209 seconds: e.g., America/Cambridge_Bay with leap
+	 seconds, starting 1965-10-31 00:00 in a switch from
+	 double-daylight time (-05) to standard time (-07), and
+	 continuing to 1980-04-27 02:00 in a switch from standard time
+	 (-07) to daylight time (-06).  */
+      int duration_max = 457243209;
 
       /* Search in both directions, so the maximum distance is half
 	 the duration; add the stride to avoid off-by-1 problems.  */
@@ -483,6 +490,11 @@ __mktime_internal (struct tm *tp,
 	      }
 	  }
 
+      /* No unusual DST offset was found nearby.  Assume one-hour DST.  */
+      t += 60 * 60 * dst_difference;
+      if (mktime_min <= t && t <= mktime_max && convert_time (convert, t, &tm))
+	goto offset_found;
+
       __set_errno (EOVERFLOW);
       return -1;
     }
-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-17 23:10 ` Paul Eggert
@ 2022-08-18  1:39   ` DJ Delorie
  2022-08-18  2:37     ` Carlos O'Donell
  2022-08-18  3:02     ` Paul Eggert
  2022-09-08 20:25   ` DJ Delorie
  1 sibling, 2 replies; 15+ messages in thread
From: DJ Delorie @ 2022-08-18  1:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

Paul Eggert <eggert@cs.ucla.edu> writes:
> Although I'm not seeing how BZ#29035 led to your diagnosis

We had a customer bz with a similar problem.  I wrote a script to test
every transition in every zoneinfo file, plus a january/june check, for
every tm_isdst.  The patterns tend to jump out at you after that.

> This is a bit fancier than what you suggested, but I expect it'll fix
> BZ#29035 while it's also fixing the bug reported against Gnulib.

It does stop mktime from returning -1 in the problem case, but it still
returns a different value than pre-2.29.  Older code returned the
standard time result, your patches return a DST result, even if the zone
has no DST.  I think this makes it more consistent, but I don't know
what the consequences of "different than before" will be here.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18  1:39   ` DJ Delorie
@ 2022-08-18  2:37     ` Carlos O'Donell
  2022-08-18  3:16       ` Paul Eggert
  2022-08-18 21:17       ` DJ Delorie
  2022-08-18  3:02     ` Paul Eggert
  1 sibling, 2 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-08-18  2:37 UTC (permalink / raw)
  To: DJ Delorie, Paul Eggert; +Cc: libc-alpha

On 8/17/22 21:39, DJ Delorie via Libc-alpha wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
>> Although I'm not seeing how BZ#29035 led to your diagnosis
> 
> We had a customer bz with a similar problem.  I wrote a script to test
> every transition in every zoneinfo file, plus a january/june check, for
> every tm_isdst.  The patterns tend to jump out at you after that.
> 
>> This is a bit fancier than what you suggested, but I expect it'll fix
>> BZ#29035 while it's also fixing the bug reported against Gnulib.
> 
> It does stop mktime from returning -1 in the problem case, but it still
> returns a different value than pre-2.29.  Older code returned the
> standard time result, your patches return a DST result, even if the zone
> has no DST.  I think this makes it more consistent, but I don't know
> what the consequences of "different than before" will be here.
 
Different than before is problematic because of existing code expectations.

The mktime() interface has been around for a long time and users do not expect
and have noticed the semantic changes we are seeing here today.

I would lean towards making the code do exactly what it did before in these
corner cases (unless we don't consider them corner cases).

May you please share some examples you have from your analysis and share how
they are different before and now? How far back does the old behaviour go?

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18  1:39   ` DJ Delorie
  2022-08-18  2:37     ` Carlos O'Donell
@ 2022-08-18  3:02     ` Paul Eggert
  1 sibling, 0 replies; 15+ messages in thread
From: Paul Eggert @ 2022-08-18  3:02 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]

On 8/17/22 18:39, DJ Delorie wrote:

> It does stop mktime from returning -1 in the problem case, but it still
> returns a different value than pre-2.29.  Older code returned the
> standard time result, your patches return a DST result, even if the zone
> has no DST.

Actually, the old (glibc 2.28) code could return a standard time result 
even in zones with DST, and this led to results that weren't 
self-consistent.

For example, suppose we use the attached program to ask mktime "What 
time is it in Fiji on July 1 at 00:00 DST for the two years 2029 and 
2030?"  Although Fiji no longer observes DST (it stopped last year), the 
old glibc code does this:

   2029-07-31 23:00:00 +1300 +13
   2030-08-01 00:00:00 +1300 +13

whereas Gnulib (and glibc with the proposed patches) does this:

   2029-07-31 23:00:00 +1300 +13
   2030-07-31 23:00:00 +1300 +13

The old code is inconsistent with itself, because it gets confused by 
the DST that Fiji stopped observing in in 2021. The new code is 
consistent with itself, as well as with similar questions asked about 
New York, Los Angeles, etc. when you ask for a DST during a day that DST 
is not observed.

There is no perfection in this area because the mktime API is busted 
(though POSIX may improve it if they listen to me :-). That being said, 
the Gnulib mktime behavior should be better than the old glibc behavior 
for the great majority of users.

[-- Attachment #2: uu.c --]
[-- Type: text/x-csrc, Size: 497 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void
check_year (int year)
{
  struct tm tm;
  tm.tm_year = year - 1900;
  tm.tm_mon = 7;
  tm.tm_mday = 1;
  tm.tm_hour = tm.tm_min = tm.tm_sec = 0;
  tm.tm_isdst = 1;
  time_t t = mktime (&tm);
  if (t != (time_t) -1)
    {
      char buf[1000];
      strftime (buf, sizeof buf, "%Y-%m-%d %H:%M:%S %z %Z", &tm);
      puts (buf);
    }
}

int
main (void)
{
  setenv ("TZ", "Pacific/Apia", 1);
  check_year (2029);
  check_year (2030);
}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18  2:37     ` Carlos O'Donell
@ 2022-08-18  3:16       ` Paul Eggert
  2022-08-18  4:05         ` Carlos O'Donell
  2022-08-18 21:17       ` DJ Delorie
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Eggert @ 2022-08-18  3:16 UTC (permalink / raw)
  To: Carlos O'Donell, DJ Delorie; +Cc: libc-alpha

On 8/17/22 19:37, Carlos O'Donell wrote:
> The mktime() interface has been around for a long time and users do not expect
> and have noticed the semantic changes we are seeing here today.

The semantic changes that bother users boil down to the following: glibc 
mktime returns -1 when users specify impossible is_dst flags, whereas 
mktime used to succeed. That problem is fixed in Gnulib.

The only disagreement between Gnulib and old (2.28) glibc in this area, 
is in relatively obscure cases where we have not seen any bug reports, 
and in these cases old glibc mostly gets it wrong (for example, see my 
previous email), whereas Gnulib mostly gets it right. Let's stick with 
the Gnulib behavior here.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18  3:16       ` Paul Eggert
@ 2022-08-18  4:05         ` Carlos O'Donell
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2022-08-18  4:05 UTC (permalink / raw)
  To: Paul Eggert, DJ Delorie; +Cc: libc-alpha

On 8/17/22 23:16, Paul Eggert wrote:
> On 8/17/22 19:37, Carlos O'Donell wrote:
>> The mktime() interface has been around for a long time and users do
>> not expect and have noticed the semantic changes we are seeing here
>> today.
> 
> The semantic changes that bother users boil down to the following:
> glibc mktime returns -1 when users specify impossible is_dst flags,
> whereas mktime used to succeed. That problem is fixed in Gnulib.

Right.

> The only disagreement between Gnulib and old (2.28) glibc in this
> area, is in relatively obscure cases where we have not seen any bug
> reports, and in these cases old glibc mostly gets it wrong (for
> example, see my previous email), whereas Gnulib mostly gets it right.
> Let's stick with the Gnulib behavior here.
 
What I'd like to see here as an outcome is a broader battery of tests in
glibc that cover the mktime() behaviour we want to support. We'll work on that
and review the recommended gnulib changes.

We have also seen a somewhat odd use of mktime() to detect DST transitions that
DJ and I need to audit and verify against the changes you have implemented in
gnulib.

Thanks for the quick comments. I think the next step is with us to do more
testing.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18  2:37     ` Carlos O'Donell
  2022-08-18  3:16       ` Paul Eggert
@ 2022-08-18 21:17       ` DJ Delorie
  2022-08-18 21:57         ` Paul Eggert
  1 sibling, 1 reply; 15+ messages in thread
From: DJ Delorie @ 2022-08-18 21:17 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: eggert, libc-alpha

"Carlos O'Donell" <carlos@redhat.com> writes:
> May you please share some examples you have from your analysis and share how
> they are different before and now? How far back does the old behaviour go?

Here are some snippets from the test program I wrote to deduce what
mktime was doing.  The program uses "zdump -v" to get all the
transitions in a zone, and converts that in to a table of time_t's and
struct tm's for each transition boundary.  The table is limited to
1979..2021.

Once the table is built, every transition edge, before and after, is
run through mktime() to verify it returns the given time_t with the
given tm_isdst.  I convert the time_t back to gmtime and localtime,
but do not yet verify the results.  Next, for transitions that involve
a step forward in time (i.e. where there's never an ambiguous wall
clock time), I test mktime with tm_isdst = {-1, 0, 1} and see what
each returns.  This is deemed "broken" if it results in a time_t
different than indicated.

The two timezones I'm using here are America/NewYork (typical
winter/summer STD/DST transitions) and Asia/Tokyo (which hasn't had
DST since 1952) as extremes.

FYI glibc 2.12 reports the same results as 2.28

glibc 2.28 (RHEL 8) (partial listing):

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
"America/New_York", 1615705199, = Sun, 2021-Mar-14  6:59:59 = Sun, 2021-Mar-14  1:59:59 = 0 -18000
1615705199 -> Sun Mar 14 06:59:59 2021 gmt, or Sun Mar 14 01:59:59 2021 local
  - UNK:1615705199 returned  Sun, 2021-Mar-14  1:59:59 STD
  - STD:1615705199 returned  Sun, 2021-Mar-14  1:59:59 STD
broken isdst=1: 1615701599 instead of 1615705199 (-3600)
  - DST:1615701599 returned  Sun, 2021-Mar-14  0:59:59 STD

"America/New_York", 1615705200, = Sun, 2021-Mar-14  7:00:00 = Sun, 2021-Mar-14  3:00:00 = 1 -14400
1615705200 -> Sun Mar 14 07:00:00 2021 gmt, or Sun Mar 14 03:00:00 2021 local
  - UNK:1615705200 returned  Sun, 2021-Mar-14  3:00:00 DST
broken isdst=0: 1615708800 instead of 1615705200 (+3600)
  - STD:1615708800 returned  Sun, 2021-Mar-14  4:00:00 DST
  - DST:1615705200 returned  Sun, 2021-Mar-14  3:00:00 DST

"America/New_York", 1636264799, = Sun, 2021-Nov-07  5:59:59 = Sun, 2021-Nov-07  1:59:59 = 1 -14400
1636264799 -> Sun Nov  7 05:59:59 2021 gmt, or Sun Nov  7 01:59:59 2021 local

"America/New_York", 1636264800, = Sun, 2021-Nov-07  6:00:00 = Sun, 2021-Nov-07  1:00:00 = 0 -18000
1636264800 -> Sun Nov  7 06:00:00 2021 gmt, or Sun Nov  7 01:00:00 2021 local

"America/New_York", 1641038400, = Sat, 2022-Jan-01 12:00:00 = Sat, 2022-Jan-01  7:00:00 = -1 -18000
1641038400 -> Sat Jan  1 12:00:00 2022 gmt, or Sat Jan  1 07:00:00 2022 local
  - UNK:1641038400 returned  Sat, 2022-Jan-01  7:00:00 STD
  - STD:1641038400 returned  Sat, 2022-Jan-01  7:00:00 STD
broken isdst=1: 1641034800 instead of 1641038400 (-3600)
  - DST:1641034800 returned  Sat, 2022-Jan-01  6:00:00 STD

"America/New_York", 1656676800, = Fri, 2022-Jul-01 12:00:00 = Fri, 2022-Jul-01  8:00:00 = -1 -14400
1656676800 -> Fri Jul  1 12:00:00 2022 gmt, or Fri Jul  1 08:00:00 2022 local
  - UNK:1656676800 returned  Fri, 2022-Jul-01  8:00:00 DST
broken isdst=0: 1656680400 instead of 1656676800 (+3600)
  - STD:1656680400 returned  Fri, 2022-Jul-01  9:00:00 DST
  - DST:1656676800 returned  Fri, 2022-Jul-01  8:00:00 DST

"Asia/Tokyo", 1641038400, = Sat, 2022-Jan-01 12:00:00 = Sat, 2022-Jan-01 21:00:00 = -1 32400
1641038400 -> Sat Jan  1 12:00:00 2022 gmt, or Sat Jan  1 21:00:00 2022 local
  - UNK:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
  - STD:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
  - DST:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD

"Asia/Tokyo", 1656676800, = Fri, 2022-Jul-01 12:00:00 = Fri, 2022-Jul-01 21:00:00 = -1 32400
1656676800 -> Fri Jul  1 12:00:00 2022 gmt, or Fri Jul  1 21:00:00 2022 local
  - UNK:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
  - STD:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
  - DST:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Note that mktime is "broken" at a transition if the tm_isdst passed
doesn't match the tm_isdst that localtime would have returned, in
that, the time is off by an hour.  Whether this is desirable behavior
or not is up to you ;-)

The last four tests (showing tm_isdst passed, returned time_t, and
localtime for that time_t) are testing mktime in January and July, far
from transitions, in every zone (Tokyo has no transitions, so these
tests are all you see from it).  NewYork has the same symptoms as at
transitions - a mismatched tm_isdst results in a wall clock time
that's off by an hour.  For the Asia/Tokyo case, mktime ignores
tm_isdst (apparently) and always returns standard time.

The tests for America/NewYork are identical across all the glibc patch
variants, so I'll skip those from now on.

As of glibc 2.29, the Asia/Tokyo results look like this:

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
"Asia/Tokyo", 1641038400, = Sat, 2022-Jan-01 12:00:00 = Sat, 2022-Jan-01 21:00:00 = -1 32400
1641038400 -> Sat Jan  1 12:00:00 2022 gmt, or Sat Jan  1 21:00:00 2022 local
  - UNK:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
  - STD:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
broken isdst=1: -1 instead of 1641038400 (-1641038401)
  - DST:-1 returned  Thu, 1970-Jan-01  8:59:59 STD

"Asia/Tokyo", 1656676800, = Fri, 2022-Jul-01 12:00:00 = Fri, 2022-Jul-01 21:00:00 = -1 32400
1656676800 -> Fri Jul  1 12:00:00 2022 gmt, or Fri Jul  1 21:00:00 2022 local
  - UNK:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
  - STD:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
broken isdst=1: -1 instead of 1656676800 (-1656676801)
  - DST:-1 returned  Thu, 1970-Jan-01  8:59:59 STD
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Note that a mismatched tm_isdst now returns -1/EOVERFLOW.

If you take out the two lines I noted in my initial email, you get this instead:

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
"Asia/Tokyo", 1641038400, = Sat, 2022-Jan-01 12:00:00 = Sat, 2022-Jan-01 21:00:00 = -1 32400
1641038400 -> Sat Jan  1 12:00:00 2022 gmt, or Sat Jan  1 21:00:00 2022 local
  - UNK:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
  - STD:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
  - DST:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD

"Asia/Tokyo", 1656676800, = Fri, 2022-Jul-01 12:00:00 = Fri, 2022-Jul-01 21:00:00 = -1 32400
1656676800 -> Fri Jul  1 12:00:00 2022 gmt, or Fri Jul  1 21:00:00 2022 local
  - UNK:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
  - STD:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
  - DST:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Note that this matches the 2.28 behavior, but not the America/NewYork
behavior.

If you add Paul's patches to sync to gnulib, you get this:

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
"Asia/Tokyo", 1641038400, = Sat, 2022-Jan-01 12:00:00 = Sat, 2022-Jan-01 21:00:00 = -1 32400
1641038400 -> Sat Jan  1 12:00:00 2022 gmt, or Sat Jan  1 21:00:00 2022 local
  - UNK:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
  - STD:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
broken isdst=1: 1641034800 instead of 1641038400 (-3600)
  - DST:1641034800 returned  Sat, 2022-Jan-01 20:00:00 STD

"Asia/Tokyo", 1656676800, = Fri, 2022-Jul-01 12:00:00 = Fri, 2022-Jul-01 21:00:00 = -1 32400
1656676800 -> Fri Jul  1 12:00:00 2022 gmt, or Fri Jul  1 21:00:00 2022 local
  - UNK:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
  - STD:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
broken isdst=1: 1656673200 instead of 1656676800 (-3600)
  - DST:1656673200 returned  Fri, 2022-Jul-01 20:00:00 STD
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

As I noted before, this is consistent with America/NewYork, but still
differs from 2.28.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18 21:17       ` DJ Delorie
@ 2022-08-18 21:57         ` Paul Eggert
  2022-08-18 22:40           ` DJ Delorie
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Eggert @ 2022-08-18 21:57 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, Carlos O'Donell

A couple of things about those examples.

First, mktime is not "broken" if it returns a time_t that differs from 
the time_t that you passed to localtime before fiddling with localtime's 
resulting struct tm by changing its is_dst flag. That's normal and 
expected mktime behavior, and many (perhaps most) mktime implementations 
do that, and this is illustrated by the New York examples.

Second, and more important, the reason users are complaining is that 
they want to answer questions like "What time is it 30 days after time 
T?" by calling localtime on T, adding 30 to the resulting tm_mday, and 
then calling mktime on the modified struct tm. This process fails in 
recent glibc if the modified struct tm happens to have the wrong 
tm_isdst flag because DST began or end sometime within the 30-day period.

With this in mind, DJ's Tokyo examples (which is the only place where 
Gnulib's behavior disagrees with old glibc) illustrate why the 
differences between old glibc and Gnulib (a) largely don't matter, and 
(b) when they do matter, Gnulib is better. Here's why:

(a) In Tokyo, adding 30 to tm_mday (or whatever) to a contemporary 
timestamp cannot yield a timestamp with the wrong tm_isdst because Tokyo 
hasn't observed DST since 1951, so DJ's examples of Tokyo "broken dst" 
(which is not really broken) largely don't matter.

(b) In the *extremely* unusual case where someone takes a circa-1951 
Tokyo timestamp with tm_isdst=1 and adds 25000 to tm_mday to get a 
contemporaneous timestamp, Gnulib mktime will treat this consistently 
with how it treats New York (and with how it treats adding only 365 to 
Tokyo tm_mday), whereas old glibc will treat it inconsistently.

Of course this use case is implausible, but the use case of adding 10 
years to a circa-2021 timestamp in Fiji is more plausible, and there 
again old glibc is inconsistent whereas Gnulib is consistent.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18 21:57         ` Paul Eggert
@ 2022-08-18 22:40           ` DJ Delorie
  2022-08-18 22:58             ` Paul Eggert
  0 siblings, 1 reply; 15+ messages in thread
From: DJ Delorie @ 2022-08-18 22:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha, carlos

Paul Eggert <eggert@cs.ucla.edu> writes:
> First, mktime is not "broken" if

Yeah, leftovers from the first iteration, and it just means that the
results don't match what the test expected.  We get to argue over what a
valid test is later ;-)

> illustrate why the differences between old glibc and Gnulib (a)
> largely don't matter,

Sadly, when someone has an application that's been expecting those
results for ages, and suddenly their code breaks, they don't care if the
new version is "better".  It's just "different".  You've given a few
examples of how mktime() might be abused, but we have no idea what kind
of abuses someone else's code might be relying on.

So while I agree that gnulib's code is more consistent, and thus better,
I'm still going to argue the other side until we've covered all the
options - for all those users who will still be affected by the "right"
change.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18 22:40           ` DJ Delorie
@ 2022-08-18 22:58             ` Paul Eggert
  2022-08-19 18:15               ` DJ Delorie
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Eggert @ 2022-08-18 22:58 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, carlos

On 8/18/22 15:40, DJ Delorie wrote:
> So while I agree that gnulib's code is more consistent, and thus better,
> I'm still going to argue the other side until we've covered all the
> options

Not sure what is meant by "all the options".

Perhaps see what FreeBSD etc. do with your examples? That might be helpful.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-18 22:58             ` Paul Eggert
@ 2022-08-19 18:15               ` DJ Delorie
  2022-08-19 22:04                 ` Paul Eggert
  0 siblings, 1 reply; 15+ messages in thread
From: DJ Delorie @ 2022-08-19 18:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha, carlos

Paul Eggert <eggert@cs.ucla.edu> writes:
> Perhaps see what FreeBSD etc. do with your examples? That might be helpful.

FreeBSD does this, which matches your patched version:

"America/New_York", 1615705199, = Sun, 2021-Mar-14  6:59:59 = Sun, 2021-Mar-14  1:59:59 = 0 -18000
1615705199 -> Sun Mar 14 06:59:59 2021 gmt, or Sun Mar 14 01:59:59 2021 local
  - UNK:1615705199 returned  Sun, 2021-Mar-14  1:59:59 STD
  - STD:1615705199 returned  Sun, 2021-Mar-14  1:59:59 STD
broken isdst=1: 1615701599 instead of 1615705199 (-3600)
  - DST:1615701599 returned  Sun, 2021-Mar-14  0:59:59 STD

"America/New_York", 1615705200, = Sun, 2021-Mar-14  7:00:00 = Sun, 2021-Mar-14  3:00:00 = 1 -14400
1615705200 -> Sun Mar 14 07:00:00 2021 gmt, or Sun Mar 14 03:00:00 2021 local
  - UNK:1615705200 returned  Sun, 2021-Mar-14  3:00:00 DST
broken isdst=0: 1615708800 instead of 1615705200 (+3600)
  - STD:1615708800 returned  Sun, 2021-Mar-14  4:00:00 DST
  - DST:1615705200 returned  Sun, 2021-Mar-14  3:00:00 DST

"America/New_York", 1636264799, = Sun, 2021-Nov-07  5:59:59 = Sun, 2021-Nov-07  1:59:59 = 1 -14400
1636264799 -> Sun Nov  7 05:59:59 2021 gmt, or Sun Nov  7 01:59:59 2021 local

"America/New_York", 1636264800, = Sun, 2021-Nov-07  6:00:00 = Sun, 2021-Nov-07  1:00:00 = 0 -18000
1636264800 -> Sun Nov  7 06:00:00 2021 gmt, or Sun Nov  7 01:00:00 2021 local

"America/New_York", 1641038400, = Sat, 2022-Jan-01 12:00:00 = Sat, 2022-Jan-01  7:00:00 = -1 -18000
1641038400 -> Sat Jan  1 12:00:00 2022 gmt, or Sat Jan  1 07:00:00 2022 local
  - UNK:1641038400 returned  Sat, 2022-Jan-01  7:00:00 STD
  - STD:1641038400 returned  Sat, 2022-Jan-01  7:00:00 STD
broken isdst=1: 1641034800 instead of 1641038400 (-3600)
  - DST:1641034800 returned  Sat, 2022-Jan-01  6:00:00 STD

"America/New_York", 1656676800, = Fri, 2022-Jul-01 12:00:00 = Fri, 2022-Jul-01  8:00:00 = -1 -14400
1656676800 -> Fri Jul  1 12:00:00 2022 gmt, or Fri Jul  1 08:00:00 2022 local
  - UNK:1656676800 returned  Fri, 2022-Jul-01  8:00:00 DST
broken isdst=0: 1656680400 instead of 1656676800 (+3600)
  - STD:1656680400 returned  Fri, 2022-Jul-01  9:00:00 DST
  - DST:1656676800 returned  Fri, 2022-Jul-01  8:00:00 DST

"Asia/Tokyo", 1641038400, = Sat, 2022-Jan-01 12:00:00 = Sat, 2022-Jan-01 21:00:00 = -1 32400
1641038400 -> Sat Jan  1 12:00:00 2022 gmt, or Sat Jan  1 21:00:00 2022 local
  - UNK:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
  - STD:1641038400 returned  Sat, 2022-Jan-01 21:00:00 STD
broken isdst=1: 1641034800 instead of 1641038400 (-3600)
  - DST:1641034800 returned  Sat, 2022-Jan-01 20:00:00 STD

"Asia/Tokyo", 1656676800, = Fri, 2022-Jul-01 12:00:00 = Fri, 2022-Jul-01 21:00:00 = -1 32400
1656676800 -> Fri Jul  1 12:00:00 2022 gmt, or Fri Jul  1 21:00:00 2022 local
  - UNK:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
  - STD:1656676800 returned  Fri, 2022-Jul-01 21:00:00 STD
broken isdst=1: 1656673200 instead of 1656676800 (-3600)
  - DST:1656673200 returned  Fri, 2022-Jul-01 20:00:00 STD


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-19 18:15               ` DJ Delorie
@ 2022-08-19 22:04                 ` Paul Eggert
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Eggert @ 2022-08-19 22:04 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, carlos

On 8/19/22 11:15, DJ Delorie wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
>> Perhaps see what FreeBSD etc. do with your examples? That might be helpful.
> 
> FreeBSD does this, which matches your patched version:

Not too surprising, as FreeBSD uses code derived from tzcode, which has 
done things that way since the 1980s (long before I became the tzcode 
maintainer). I wrote the Gnulib patch with this longstanding behavior at 
the back of my mind.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [swbz 29035] mktime vs non-DST
  2022-08-17 23:10 ` Paul Eggert
  2022-08-18  1:39   ` DJ Delorie
@ 2022-09-08 20:25   ` DJ Delorie
  1 sibling, 0 replies; 15+ messages in thread
From: DJ Delorie @ 2022-09-08 20:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha


> We have also seen a somewhat odd use of mktime() to detect DST transitions that
> DJ and I need to audit and verify against the changes you have implemented in
> gnulib.

FYI we've finished this, with no problems :-)

Paul Eggert <eggert@cs.ucla.edu> writes:

> From 674d4588995d24f8bc5167f73d8661ef5c82c2de Mon Sep 17 00:00:00 2001

> -# elif HAVE_TZSET
> +# else

Ok.

> From bfe919ce3bba22178429c5b39db98c26fd83272e Mon Sep 17 00:00:00 2001

>  	 Heuristic: probe the adjacent timestamps in both directions,
> -	 looking for the desired isdst.  This should work for all real
> -	 time zone histories in the tz database.  */
> +	 looking for the desired isdst.  If none is found within a
> +	 reasonable duration bound, assume a one-hour DST difference.
> +	 This should work for all real time zone histories in the tz
> +	 database.  */
> +
> +      /* +1 if we wanted standard time but got DST, -1 if the reverse.  */
> +      int dst_difference = (isdst == 0) - (tm.tm_isdst == 0);

Ok.

> -      /* The longest period of DST in tzdata2003a is 536454000 seconds
> -	 (e.g., America/Jujuy starting 1946-10-01 01:00).  The longest
> -	 period of non-DST is much longer, but it makes no real sense
> -	 to search for more than a year of non-DST, so use the DST
> -	 max.  */
> -      int duration_max = 536454000;
> +      /* In TZDB 2021e, the longest period of DST (or of non-DST), in
> +	 which the DST (or adjacent DST) difference is not one hour,
> +	 is 457243209 seconds: e.g., America/Cambridge_Bay with leap
> +	 seconds, starting 1965-10-31 00:00 in a switch from
> +	 double-daylight time (-05) to standard time (-07), and
> +	 continuing to 1980-04-27 02:00 in a switch from standard time
> +	 (-07) to daylight time (-06).  */
> +      int duration_max = 457243209;

Ok.

> +      /* No unusual DST offset was found nearby.  Assume one-hour DST.  */
> +      t += 60 * 60 * dst_difference;
> +      if (mktime_min <= t && t <= mktime_max && convert_time (convert, t, &tm))
> +	goto offset_found;
> +
>        __set_errno (EOVERFLOW);
>        return -1;
>      }

Ok.

LGTM
Reviewed-by: DJ Delorie <dj@redhat.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-09-08 20:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-17 21:18 [swbz 29035] mktime vs non-DST DJ Delorie
2022-08-17 21:50 ` Carlos O'Donell
2022-08-17 23:10 ` Paul Eggert
2022-08-18  1:39   ` DJ Delorie
2022-08-18  2:37     ` Carlos O'Donell
2022-08-18  3:16       ` Paul Eggert
2022-08-18  4:05         ` Carlos O'Donell
2022-08-18 21:17       ` DJ Delorie
2022-08-18 21:57         ` Paul Eggert
2022-08-18 22:40           ` DJ Delorie
2022-08-18 22:58             ` Paul Eggert
2022-08-19 18:15               ` DJ Delorie
2022-08-19 22:04                 ` Paul Eggert
2022-08-18  3:02     ` Paul Eggert
2022-09-08 20:25   ` DJ Delorie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).