public inbox for newlib@sourceware.org
* Timezones in <+-nn> format are not handled
@ 2022-03-15 14:23 Andreas Merkle
  2022-03-15 15:47 ` Joshua Westerheide
  0 siblings, 1 reply; 4+ messages in thread
From: Andreas Merkle @ 2022-03-15 14:23 UTC (permalink / raw)
  To: newlib

POSIX timezone strings have the format ABRVnn[ABRV[nn]][,...]; e.g. 
GMT0BST,... is the London TZ descriptor with the two abbreviations GMT 
and BST. ABRV means abbreviation. Such abbreviations are not defined for 
every timezone around the world.
According to https://data.iana.org/time-zones/theory.html, if there is no 
common English abbreviation, numeric offsets like -05 and +0530, as 
generated by zic's %z notation, are used instead.
In a TZ string these numeric abbreviations are enclosed in <...>. For 
example, the TZ string for Sao Paulo is <-03>3 (instead of a purely 
alphabetic form such as the valid SAOPAUL03).
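
To make the two abbreviation forms concrete, here is a minimal sketch in C of how the leading abbreviation of such a string could be recognized. It is not the newlib code: the names parse_tz_abbr, parse_tz_abbr_examples, ABBR_MIN and ABBR_MAX are hypothetical, and the length limits (3 to 10 characters) are assumptions taken from the buffer sizes discussed in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define ABBR_MIN 3   /* POSIX minimum abbreviation length */
#define ABBR_MAX 10  /* assumption: fits an 11-byte name buffer */

/* Hypothetical helper: copy the abbreviation at the start of a POSIX TZ
   string into out (at least ABBR_MAX + 1 bytes).  Returns the number of
   input characters consumed, including any < > quotes, or 0 if there is
   no valid abbreviation. */
size_t
parse_tz_abbr (const char *s, char *out)
{
  size_t len = 0;
  int quoted = (*s == '<');
  const char *p = quoted ? s + 1 : s;

  if (quoted)
    /* Quoted form: alphanumerics plus '+' and '-'. */
    while (isalnum ((unsigned char) p[len]) || p[len] == '+' || p[len] == '-')
      ++len;
  else
    /* Unquoted form: alphabetic characters only. */
    while (isalpha ((unsigned char) p[len]))
      ++len;

  if (len < ABBR_MIN || len > ABBR_MAX)
    return 0;
  if (quoted && p[len] != '>')
    return 0;  /* missing close quote */

  memcpy (out, p, len);
  out[len] = '\0';
  return quoted ? len + 2 : len;
}

/* A few worked cases. */
void
parse_tz_abbr_examples (void)
{
  char abbr[ABBR_MAX + 1];

  assert (parse_tz_abbr ("<-03>3", abbr) == 5 && strcmp (abbr, "-03") == 0);
  assert (parse_tz_abbr ("GMT0BST", abbr) == 3 && strcmp (abbr, "GMT") == 0);
  assert (parse_tz_abbr ("<-03", abbr) == 0);  /* no closing '>' */
  assert (parse_tz_abbr ("AB1", abbr) == 0);   /* fewer than 3 characters */
}
```

After a successful call, the offset would be parsed starting at s plus the returned count, e.g. at "3" for <-03>3.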

A pull request for the newlib-xtensa version by earlephilhower fixes 
this: https://github.com/earlephilhower/newlib-xtensa/pull/14
I would like to provide the corresponding patch here, if that is OK.

Andi



* Re: Timezones in <+-nn> format are not handled
  2022-03-15 14:23 Timezones in <+-nn> format are not handled Andreas Merkle
@ 2022-03-15 15:47 ` Joshua Westerheide
  2022-03-15 23:05   ` Andreas Merkle
  0 siblings, 1 reply; 4+ messages in thread
From: Joshua Westerheide @ 2022-03-15 15:47 UTC (permalink / raw)
  To: newlib

[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]

Hi Andi,

I reported the exact same issue a few weeks ago. The maintainers are 
already working on it and a patch has been posted to the mailing list. A 
few changes have been requested and it still needs testing.

I'll attach the patch and previous discussion.


Greetings
jdoubleu


[-- Attachment #2: [PATCH 2/2] newlib/libc/time/tzset_r.c(_tzset_unlocked_r): POSIX angle bracket <> support.eml --]
[-- Type: message/rfc822, Size: 8272 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 507 bytes --]


define POSIX specified minimum TZ abbr size 3 TZNAME_MIN
use limits.h TZNAME_MAX, _POSIX_TZNAME_MAX, unistd.h sysconf(_SC_TZNAME_MAX)
issue error if no symbols defined (document fallback value in case required)
allow POSIX angle bracket < > quoted signed alphanumeric tz abbr e.g. <MESZ+0330>
allow POSIX unquoted alphabetic tz abbr e.g. MESZ
apply same changes for DST tz abbr
---
 newlib/libc/time/tzset_r.c | 74 ++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 10 deletions(-)


[-- Attachment #2.1.2: 0002-newlib-libc-time-tzset_r.c-_tzset_unlocked_r-POSIX-a.patch --]
[-- Type: text/x-patch; name="0002-newlib-libc-time-tzset_r.c-_tzset_unlocked_r-POSIX-a.patch", Size: 3449 bytes --]

diff --git a/newlib/libc/time/tzset_r.c b/newlib/libc/time/tzset_r.c
index 9e0cf834bd6b..6a5fd578c0be 100644
--- a/newlib/libc/time/tzset_r.c
+++ b/newlib/libc/time/tzset_r.c
@@ -1,14 +1,30 @@
 #include <_ansi.h>
+#include <limits.h>	/* {,_POSIX_}TZNAME_MAX */
 #include <reent.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <unistd.h>	/* sysconf(_SC_TZNAME_MAX) */
 #include <sys/types.h>
 #include <sys/time.h>
 #include "local.h"
 
 #define sscanf siscanf	/* avoid to pull in FP functions. */
 
+#define TZNAME_MIN	3	/* POSIX specified minimum TZ abbr size */
+/* TZNAME_MAX - POSIX specified maximum TZ abbr size */
+/* define TZNAME_MAX if undefined and available */
+#if	!defined(TZNAME_MAX)
+#if	 defined(_POSIX_TZNAME_MAX)
+#define TZNAME_MAX	_POSIX_TZNAME_MAX	/* use POSIX value */
+#elif	 defined(_SC_TZNAME_MAX)
+#define TZNAME_MAX	sysconf(_SC_TZNAME_MAX)	/* use sysconf value */
+#else
+#error	"None of TZNAME_MAX, _POSIX_TZNAME_MAX, _SC_TZNAME_MAX are defined"
+#define TZNAME_MAX	9			/* could use fallback value */
+#endif	/* defined _POSIX_TZNAME_MAX || _SC_TZNAME_MAX */
+#endif	/* !defined(TZNAME_MAX) */
+
 static char __tzname_std[11];
 static char __tzname_dst[11];
 static char *prev_tzenv = NULL;
@@ -45,8 +61,25 @@ _tzset_unlocked_r (struct _reent *reent_ptr)
   if (*tzenv == ':')
     ++tzenv;  
 
-  if (sscanf (tzenv, "%10[^0-9,+-]%n", __tzname_std, &n) <= 0)
-    return;
+  /* allow POSIX angle bracket < > quoted signed alphanumeric tz abbr e.g. <MESZ+0330> */
+  if (*tzenv == '<')
+    {
+      ++tzenv;
+
+      /* quit if no items, too few or too many chars, or no close quote '>' */
+      if (sscanf (tzenv, "%10[-+0-9A-Za-z]%n", __tzname_std, &n) <= 0
+		|| n < TZNAME_MIN || TZNAME_MAX < n || '>' != tzenv[n])
+        return;
+
+      ++tzenv;	/* bump for close quote '>' */
+    }
+  else
+    {
+      /* allow POSIX unquoted alphabetic tz abbr e.g. MESZ */
+      if (sscanf (tzenv, "%10[A-Za-z]%n", __tzname_std, &n) <= 0
+				|| n < TZNAME_MIN || TZNAME_MAX < n)
+        return;
+    }
  
   tzenv += n;
 
@@ -68,17 +101,38 @@ _tzset_unlocked_r (struct _reent *reent_ptr)
   tz->__tzrule[0].offset = sign * (ss + SECSPERMIN * mm + SECSPERHOUR * hh);
   _tzname[0] = __tzname_std;
   tzenv += n;
-  
-  if (sscanf (tzenv, "%10[^0-9,+-]%n", __tzname_dst, &n) <= 0)
-    { /* No dst */
-      _tzname[1] = _tzname[0];
-      _timezone = tz->__tzrule[0].offset;
-      _daylight = 0;
-      return;
+
+  /* allow POSIX angle bracket < > quoted signed alphanumeric tz abbr e.g. <MESZ+0330> */
+  if (*tzenv == '<')
+    {
+      ++tzenv;
+
+      /* quit if no items, too few or too many chars, or no close quote '>' */
+      if (sscanf (tzenv, "%10[-+0-9A-Za-z]%n", __tzname_dst, &n) <= 0
+		|| n < TZNAME_MIN || TZNAME_MAX < n || '>' != tzenv[n])
+	{ /* No dst */
+	  _tzname[1] = _tzname[0];
+	  _timezone = tz->__tzrule[0].offset;
+	  _daylight = 0;
+	  return;
+	}
+
+      ++tzenv;	/* bump for close quote '>' */
     }
   else
-    _tzname[1] = __tzname_dst;
+    {
+      /* allow POSIX unquoted alphabetic tz abbr e.g. MESZ */
+      if (sscanf (tzenv, "%10[A-Za-z]%n", __tzname_dst, &n) <= 0
+				|| n < TZNAME_MIN || TZNAME_MAX < n)
+	{ /* No dst */
+	  _tzname[1] = _tzname[0];
+	  _timezone = tz->__tzrule[0].offset;
+	  _daylight = 0;
+	  return;
+	}
+    }
 
+  _tzname[1] = __tzname_dst;
   tzenv += n;
 
   /* otherwise we have a dst name, look for the offset */

[-- Attachment #3: Support non-POSIX TZ strings.eml --]
[-- Type: message/rfc822, Size: 8594 bytes --]

From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: newlib@sourceware.org
Subject: Re: Support non-POSIX TZ strings
Date: Mon, 14 Feb 2022 10:10:23 -0700
Message-ID: <758cfb47-ac13-fb88-877e-63a1d4327429@SystematicSw.ab.ca>

On 2022-02-14 06:21, jdoubleu wrote:
> Hello,
> 
> I stumbled upon an issue with some TZ strings not being handled as expected by newlib's tzset() function.
> The tzset function expects the string stored in the TZ environment variable to follow the POSIX format as described here: https://sourceware.org/newlib/libc.html#tzset (or https://www.gnu.org/software/libc/manual/html_node/TZ-Variable.html).
> 
> However, the glibc implementation extends the format and additionally allows '<[+|-]hh[:mm[:ss]]>' in the format (compare https://www.man7.org/linux/man-pages/man3/tzset.3.html). It seems like the timezone database (zoneinfo) provided by the IANA (https://www.iana.org/time-zones) adopted that format; or at least the zic compiler generates these strings in the zoneinfo files for most systems.
> 
> That leads to the timezone for "America/Argentina/Buenos_Aires" being "<-03>3", as can be seen in this dump https://raw.githubusercontent.com/nayarsystems/posix_tz_db/master/zones.csv or on a Linux system: `tail -n 1 /usr/share/zoneinfo/America/Argentina/Buenos_Aires`.
> 
> Some more background information can be found here https://github.com/esp8266/Arduino/issues/8423 and here https://github.com/esp8266/Arduino/issues/7690.
> 
> One way to approach this is for the user to just replace the incompatible part of the string with a valid timezone identifier, as proposed by https://github.com/esp8266/Arduino/pull/7699.
> Since the timezone identifier (e.g. `PST`, `PDT`, `CET`, ...) is not really used elsewhere by newlib, this should not be a problem, as far as I can imagine.
> 
> On the other hand, some ports implemented proper parsing: https://github.com/earlephilhower/newlib-xtensa/pull/14.
> 
> Now my question is whether the extended format should be supported by newlib. Is this desired behaviour, and would you accept code contributions for that matter?

Not sure what point you are trying to make and your terminology is 
non-standard, but we should start with the actual POSIX spec under TZ:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03 


and the current implementation:

https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c

which does not handle "<" ">" quoted POSIX +/-numeric time zone 
*abbreviations*, now common in the TZ database.
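
As an illustration of why this fails, the current code reads the name with the sscanf format "%10[^0-9,+-]%n". The sketch below applies that same format to two TZ strings using standard sscanf (newlib maps it to siscanf); the helper names scan_std_name_old and scan_examples are hypothetical. For "<-03>3" the scan set accepts '<' and then stops at '-', so the stored name is just "<", and the -03 inside the brackets is presumably what the following offset parsing then sees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: apply the scan set used by the current tzset_r.c to
   the start of a TZ string.  Stores the scanned name (buffer of at least
   11 bytes) and returns the number of characters consumed, or 0 if
   nothing matched. */
int
scan_std_name_old (const char *tzenv, char *name)
{
  int n = 0;

  if (sscanf (tzenv, "%10[^0-9,+-]%n", name, &n) <= 0)
    return 0;
  return n;
}

void
scan_examples (void)
{
  char name[11];

  /* Alphabetic name: scanning stops at the first digit of the offset. */
  assert (scan_std_name_old ("GMT0BST", name) == 3
          && strcmp (name, "GMT") == 0);
  /* Quoted name: '<' is accepted, then '-' ends the scan after one char. */
  assert (scan_std_name_old ("<-03>3", name) == 1
          && strcmp (name, "<") == 0);
}
```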

The BSD or TZcode implementations could probably be adapted to update 
newlib tzset to avoid reinvention e.g.

	https://github.com/eggert/tz/blob/main/newtzset.3
	https://github.com/eggert/tz/blob/main/localtime.c#L1081
thru
	https://github.com/eggert/tz/blob/main/localtime.c#L1400

[The original (American) English language time zone abbreviations were 
often made up by the (American) TZ database maintainers and mailing list 
users, and were never used or published in the locale itself (e.g. Germany 
uses German language time zone abbreviations like MEZ/MESZ, not MET, and 
similarly for other European countries; see CLDR time zone abbreviations); 
they were used only by (American) English speakers and mailing list users.

These made up (American) English language time zone abbreviations were 
tracked down and replaced by the current TZ database maintainers after 
the POSIX spec was expanded, but none are considered canonical, and CLDR 
locale time zone abbreviations, as supported by ICU, are preferred (see 
announcements on the home page https://unicode.org/).

ICU4X (https://github.com/unicode-org/icu4x) is being developed to 
support "resource constrained" environments, but as the language 
bindings include Rust, Objective C, C++, whether that will be usable 
with embedded libraries such as newlib, musl, uclibc, dietlibc, 
picolibc, might be ascertained by starting a discussion as encouraged on 
the project site.]

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]

[-- Attachment #4: Support non-POSIX TZ strings.eml --]
[-- Type: message/rfc822, Size: 6702 bytes --]

From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: newlib@sourceware.org
Subject: Re: Support non-POSIX TZ strings
Date: Tue, 15 Feb 2022 15:02:40 -0700
Message-ID: <fcf46fc8-4f09-90cc-5303-7ad2f7b7ae69@SystematicSw.ab.ca>

On 2022-02-14 14:33, Jeff Johnston wrote:
> On Mon, Feb 14, 2022 at 3:46 PM Brian Inglis <
> Brian.Inglis@systematicsw.ab.ca> wrote:
> 
>> On 2022-02-14 12:58, jdoubleu wrote:
>>> On 22-02-14 10:10-0700, Brian Inglis wrote:
>>
>>>> [..] but we should start with the actual POSIX spec under TZ
>>
>>> Yes, that is exactly what I meant: Newlib supporting the <> (angle
>>> brackets) syntax.
>>> I didn't know that it was actually part of POSIX spec, since so many
>>> libs actually don't implement it.
>>
>> Most should have by now if maintained: we should be a laggard! ;^>
>>
>>>> The BSD or TZcode implementations could probably be adapted [..]
>>
>>> It looks like the TZcode implementation by Paul Eggert uses a different
>>> approach to parsing the strings, than the current implementation in
>>> newlib
>>> (
>> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c).
>>
>>> I'm not sure, if you want to copy the code over or use changes by e.g.
>>> Earle F. Philhower from
>>> https://github.com/earlephilhower/newlib-xtensa/pull/14.
>>> Because of the above question, I'm not sure how to continue on this. I
>>> would like to contribute myself and submit an implementation, but I'll
>>> wait for feedback by other maintainers, first.
>>
>> Upstream sources like BSDs or TZcode official reference implementations
>> are normally preferred because they are feature complete, regularly
>> maintained, feature test and standards compliant, vulnerabilities
>> checked, issues reported, and promptly fixed.
>>
>> I checked the BSDs and they seem to have adopted or adapted the TZcode
>> official reference implementation, so I am not sure from where the
>> newlib code may have been adopted, or whether it is original: the
>> maintainer Jeff Johnston may remember.

> Unfortunately, I do not remember the exact details from back then.  With no
> license header, it means it was written by Cygnus/Red Hat.

>> I also wonder if the GMT defaults should be updated to UTC.

I submitted a newlib patch which builds okay, but I cannot test it, as I 
don't have a newlib platform to run on, and Cygwin uses its own TZ DB 
code base.
It should accept abbreviations of up to 10 characters for STD and DST, 
matching the POSIX spec, including anything within < > quoted content.
If someone needing this could build, test, and send feedback, I'd 
appreciate it.
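
For anyone willing to test, a check along the following lines might be a starting point. It is only a sketch: tz_matches and tz_examples are hypothetical names, the TZ strings are examples, and the expected values assume the usual POSIX semantics (tzname[0] holds the standard abbreviation without the quotes, timezone is seconds west of UTC).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* for setenv, tzset, tzname, timezone */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical test helper: set TZ, call tzset(), and compare the result
   with the expected standard abbreviation and offset (seconds west of
   UTC).  Returns 1 on match, 0 otherwise. */
int
tz_matches (const char *tz, const char *std_abbr, long west_secs)
{
  if (setenv ("TZ", tz, 1) != 0)
    return 0;
  tzset ();
  return strcmp (tzname[0], std_abbr) == 0 && timezone == west_secs;
}

void
tz_examples (void)
{
  /* Quoted numeric abbreviation, 3 hours west of UTC. */
  assert (tz_matches ("<-03>3", "-03", 10800L));
  /* Quoted abbreviation with minutes, 5:30 east of UTC. */
  assert (tz_matches ("<+0530>-5:30", "+0530", -19800L));
  /* Classic alphabetic abbreviations with a DST rule. */
  assert (tz_matches ("GMT0BST,M3.5.0/1,M10.5.0", "GMT", 0L));
}
```

Calling tz_examples() from a test program on the target should abort on the first TZ string that is not parsed as expected.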

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada



* Re: Re: Timezones in <+-nn> format are not handled
  2022-03-15 15:47 ` Joshua Westerheide
@ 2022-03-15 23:05   ` Andreas Merkle
  2022-03-16 20:40     ` Brian Inglis
  0 siblings, 1 reply; 4+ messages in thread
From: Andreas Merkle @ 2022-03-15 23:05 UTC (permalink / raw)
  To: newlib

Hi Joshua,

thanks for the update regarding the issue.
I was not aware that it is already in progress.

Best wishes
Andi




* Re: Timezones in <+-nn> format are not handled
  2022-03-15 23:05   ` Andreas Merkle
@ 2022-03-16 20:40     ` Brian Inglis
  0 siblings, 0 replies; 4+ messages in thread
From: Brian Inglis @ 2022-03-16 20:40 UTC (permalink / raw)
  To: newlib

On 2022-03-15 17:05, Andreas Merkle wrote:
>> I've reported the exact same issue a few weeks ago. The maintainers are
>> already working on it and a patch has been posted to the mailing list. A
>> few changes have been requested and it still needs testing.
>> I'll attach the patch and previous discussion.

 > thanks for the update regarding the issue.
 > I was not aware that it is already in progress.

I'll post an updated version soon when I have enough time to get it into 
suitable shape to be accepted for commit.
I don't have a newlib platform for testing in a build, so would 
appreciate testers who can check the integration.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

