* Support non-POSIX TZ strings
@ 2022-02-14 13:21 jdoubleu
2022-02-14 17:10 ` Brian Inglis
0 siblings, 1 reply; 7+ messages in thread
From: jdoubleu @ 2022-02-14 13:21 UTC (permalink / raw)
To: newlib
Hello,
I stumbled upon an issue with some TZ strings not handled as expected by newlib's tzset() function.
The tzset function expects the string stored in the TZ environment variable to follow the POSIX format as described here: https://sourceware.org/newlib/libc.html#tzset (or https://www.gnu.org/software/libc/manual/html_node/TZ-Variable.html).
However, the glibc implementation extends the format and additionally allows ‘<[+|-]hh[:mm[:ss]]>’ in the format (compare https://www.man7.org/linux/man-pages/man3/tzset.3.html). It seems like the timezone database (zoneinfo) provided by the IANA (https://www.iana.org/time-zones) adopted that format; or at least the zic compiler generates these strings in the zoneinfo files for most systems.
That leads to the timezone for "America/Argentina/Buenos_Aires" being "<-03>3", as can be seen in this dump https://raw.githubusercontent.com/nayarsystems/posix_tz_db/master/zones.csv or on a Linux system: `tail -n 1 /usr/share/zoneinfo/America/Argentina/Buenos_Aires`.
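To make the quoted form concrete, here is a minimal C sketch of how the leading abbreviation of such a string could be read. It is only an illustration: `parse_tz_abbrev` is a hypothetical helper, not code from newlib, glibc, or TZcode, and it assumes the POSIX rules that a quoted name may contain alphanumerics plus `+` and `-`, and that a name has at least three characters.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not newlib code: copies the leading abbreviation of a
   POSIX TZ string into `out` (without any < > quotes) and returns a pointer
   to the text that follows it (the offset), or NULL on malformed input. */
static const char *parse_tz_abbrev(const char *tz, char *out, size_t outsz)
{
    size_t n = 0;

    if (*tz == '<') {
        /* Quoted form: alphanumerics plus '+' and '-' up to the closing '>'. */
        for (++tz; *tz != '\0' && *tz != '>'; ++tz) {
            if (!isalnum((unsigned char)*tz) && *tz != '+' && *tz != '-')
                return NULL;
            if (n + 1 >= outsz)
                return NULL;            /* no room for char + terminator */
            out[n++] = *tz;
        }
        if (*tz != '>')
            return NULL;                /* unterminated quote */
        ++tz;                           /* the quote is not part of the name */
    } else {
        /* Unquoted form: alphabetic characters only. */
        for (; isalpha((unsigned char)*tz); ++tz) {
            if (n + 1 >= outsz)
                return NULL;
            out[n++] = *tz;
        }
    }
    if (n < 3)                          /* assumed POSIX minimum length */
        return NULL;
    out[n] = '\0';
    return tz;
}
```

For "<-03>3" this would store `-03` as the abbreviation and leave `3` as the remaining offset text.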
Some more background information can be found here https://github.com/esp8266/Arduino/issues/8423 and here https://github.com/esp8266/Arduino/issues/7690.
One way to approach this is for the user to just replace the incompatible part of the string with a valid timezone identifier, as proposed by https://github.com/esp8266/Arduino/pull/7699.
Since the timezone identifier (e.g. `PST`, `PDT`, `CET`, …) is not really used elsewhere by newlib, this should not be a problem, as far as I can imagine.
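A minimal sketch of that workaround in C, assuming any alphabetic placeholder name is acceptable to the parser. `replace_quoted_abbrevs` and the placeholder `UNK` are hypothetical and are not taken from the linked pull request.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical workaround sketch: rewrite every <...> quoted abbreviation in
   `tz` as the placeholder name `repl` (assumed alphabetic, 3+ characters) so
   that a parser without quote support can still read the offsets and rules.
   Returns 0 on success, -1 if `out` is too small or a quote is unterminated. */
static int replace_quoted_abbrevs(const char *tz, const char *repl,
                                  char *out, size_t outsz)
{
    size_t n = 0;
    size_t repl_len = strlen(repl);

    while (*tz != '\0') {
        if (*tz == '<') {
            const char *end = strchr(tz, '>');
            if (end == NULL)
                return -1;              /* unterminated quote */
            if (n + repl_len >= outsz)
                return -1;              /* keep room for the terminator */
            memcpy(out + n, repl, repl_len);
            n += repl_len;
            tz = end + 1;               /* continue after the closing '>' */
        } else {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = *tz++;
        }
    }
    if (n >= outsz)
        return -1;                      /* only reachable when outsz == 0 */
    out[n] = '\0';
    return 0;
}
```

With the placeholder `UNK`, "<-03>3" would become "UNK3". The real abbreviation is lost, which only matters to code that reads `tzname` or formats `%Z`.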
On the other hand, some ports implemented proper parsing: https://github.com/earlephilhower/newlib-xtensa/pull/14.
Now my question is whether the extended format should be supported by newlib. Is this desired behaviour, and would you accept code contributions for that matter?
Kind Regards
—————————————
jdoubleu
* Re: Support non-POSIX TZ strings
2022-02-14 13:21 Support non-POSIX TZ strings jdoubleu
@ 2022-02-14 17:10 ` Brian Inglis
2022-02-14 19:58 ` jdoubleu
0 siblings, 1 reply; 7+ messages in thread
From: Brian Inglis @ 2022-02-14 17:10 UTC (permalink / raw)
To: newlib
On 2022-02-14 06:21, jdoubleu wrote:
> Hello,
>
> I stumbled upon an issue with some TZ strings not handled as expected by newlib's tzset() function.
> The tzset function expects the string stored in the TZ environment variable to follow the POSIX format as described here: https://sourceware.org/newlib/libc.html#tzset (or https://www.gnu.org/software/libc/manual/html_node/TZ-Variable.html).
>
> However, the glibc implementation extends the format and additionally allows ‘<[+|-]hh[:mm[:ss]]>’ in the format (compare https://www.man7.org/linux/man-pages/man3/tzset.3.html). It seems like the timezone database (zoneinfo) provided by the IANA (https://www.iana.org/time-zones) adopted that format; or at least the zic compiler generates these strings in the zoneinfo files for most systems.
>
> That leads to the timezone for "America/Argentina/Buenos_Aires" being "<-03>3", as can be seen in this dump https://raw.githubusercontent.com/nayarsystems/posix_tz_db/master/zones.csv or on a Linux system: `tail -n 1 /usr/share/zoneinfo/America/Argentina/Buenos_Aires`.
>
> Some more background information can be found here https://github.com/esp8266/Arduino/issues/8423 and here https://github.com/esp8266/Arduino/issues/7690.
>
> One way to approach this is for the user to just replace the incompatible part of the string with a valid timezone identifier, as proposed by https://github.com/esp8266/Arduino/pull/7699.
> Since the timezone identifier (e.g. `PST`, `PDT`, `CET`, …) is not really used elsewhere by newlib, this should not be a problem, as far as I can imagine.
>
> On the other hand, some ports implemented proper parsing: https://github.com/earlephilhower/newlib-xtensa/pull/14.
>
> Now my question is whether the extended format should be supported by newlib. Is this desired behaviour, and would you accept code contributions for that matter?
Not sure what point you are trying to make and your terminology is
non-standard, but we should start with the actual POSIX spec under TZ:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
and the current implementation:
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c
which does not handle "<" ">" quoted POSIX +/-numeric time zone
*abbreviations*, now common in the TZ database.
The BSD or TZcode implementations could probably be adapted to update
newlib tzset to avoid reinvention e.g.
https://github.com/eggert/tz/blob/main/newtzset.3
https://github.com/eggert/tz/blob/main/localtime.c#L1081
thru
https://github.com/eggert/tz/blob/main/localtime.c#L1400
[The original (American) English language time zone abbreviations were
often made up by the (American) TZ database maintainers and mailing list
users, and never used or published in the locale (e.g. Germany used
German language time zone abbreviations like MEZ/MESZ not MET, similarly
for other European countries, see CLDR time zone abbreviations), only by
(American) English and mailing list users.
These made up (American) English language time zone abbreviations were
tracked down and replaced by the current TZ database maintainers after
the POSIX spec was expanded, but none are considered canonical, and CLDR
locale time zone abbreviations, as supported by ICU, are preferred (see
announcements on the home page https://unicode.org/).
ICU4X (https://github.com/unicode-org/icu4x) is being developed to
support "resource constrained" environments, but as the language
bindings include Rust, Objective C, C++, whether that will be usable
with embedded libraries such as newlib, musl, uclibc, dietlibc,
picolibc, might be ascertained by starting a discussion as encouraged on
the project site.]
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
* Re: Support non-POSIX TZ strings
2022-02-14 17:10 ` Brian Inglis
@ 2022-02-14 19:58 ` jdoubleu
2022-02-14 20:45 ` Brian Inglis
2022-02-15 22:36 ` Brian Inglis
0 siblings, 2 replies; 7+ messages in thread
From: jdoubleu @ 2022-02-14 19:58 UTC (permalink / raw)
To: newlib
Thanks for the quick response!
> [..] but we should start with the actual POSIX spec under TZ
Yes, that is exactly what I meant: Newlib supporting the <> (angle
brackets) syntax.
I didn't know that it was actually part of POSIX spec, since so many
libs actually don't implement it.
> The BSD or TZcode implementations could probably be adapted [..]
It looks like the TZcode implementation by Paul Eggert uses a different
approach to parsing the strings, than the current implementation in
newlib
(https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c).
I'm not sure, if you want to copy the code over or use changes by e.g.
Earle F. Philhower from
https://github.com/earlephilhower/newlib-xtensa/pull/14.
Because of the above question, I'm not sure how to continue on this. I
would like to contribute myself and submit an implementation, but I'll
wait for feedback by other maintainers, first.
Cheers
------------
jdoubleu
* Re: Support non-POSIX TZ strings
2022-02-14 19:58 ` jdoubleu
@ 2022-02-14 20:45 ` Brian Inglis
2022-02-14 21:33 ` Jeff Johnston
2022-02-15 22:36 ` Brian Inglis
1 sibling, 1 reply; 7+ messages in thread
From: Brian Inglis @ 2022-02-14 20:45 UTC (permalink / raw)
To: newlib
On 2022-02-14 12:58, jdoubleu wrote:
> On 22-02-14 10:10-0700, Brian Inglis wrote:
>> [..] but we should start with the actual POSIX spec under TZ
> Yes, that is exactly what I meant: Newlib supporting the <> (angle
> brackets) syntax.
> I didn't know that it was actually part of POSIX spec, since so many
> libs actually don't implement it.
Most should have by now if maintained: we should be a laggard! ;^>
>> The BSD or TZcode implementations could probably be adapted [..]
> It looks like the TZcode implementation by Paul Eggert uses a different
> approach to parsing the strings, than the current implementation in
> newlib
> (https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c).
> I'm not sure, if you want to copy the code over or use changes by e.g.
> Earle F. Philhower from
> https://github.com/earlephilhower/newlib-xtensa/pull/14.
> Because of the above question, I'm not sure how to continue on this. I
> would like to contribute myself and submit an implementation, but I'll
> wait for feedback by other maintainers, first.
Upstream sources like the BSDs or the TZcode official reference
implementation are normally preferred because they are feature complete,
regularly maintained, tested for features and standards compliance,
checked for vulnerabilities, and have issues reported and promptly fixed.
I checked the BSDs and they seem to have adopted or adapted the TZcode
official reference implementation, so I am not sure where newlib's
version may have been adopted from, or whether it is original: the
maintainer Jeff Johnston may remember.
I also wonder if the GMT defaults should be updated to UTC.
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
* Re: Support non-POSIX TZ strings
2022-02-14 20:45 ` Brian Inglis
@ 2022-02-14 21:33 ` Jeff Johnston
2022-02-15 22:02 ` Brian Inglis
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Johnston @ 2022-02-14 21:33 UTC (permalink / raw)
To: Newlib
On Mon, Feb 14, 2022 at 3:46 PM Brian Inglis <
Brian.Inglis@systematicsw.ab.ca> wrote:
> On 2022-02-14 12:58, jdoubleu wrote:
> > On 22-02-14 10:10-0700, Brian Inglis wrote:
>
> >> [..] but we should start with the actual POSIX spec under TZ
>
> > Yes, that is exactly what I meant: Newlib supporting the <> (angle
> > brackets) syntax.
> > I didn't know that it was actually part of POSIX spec, since so many
> > libs actually don't implement it.
>
> Most should have by now if maintained: we should be a laggard! ;^>
>
> >> The BSD or TZcode implementations could probably be adapted [..]
>
> > It looks like the TZcode implementation by Paul Eggert uses a different
> > approach to parsing the strings, than the current implementation in
> > newlib
> > (
> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c).
>
> > I'm not sure, if you want to copy the code over or use changes by e.g.
> > Earle F. Philhower from
> > https://github.com/earlephilhower/newlib-xtensa/pull/14.
> > Because of the above question, I'm not sure how to continue on this. I
> > would like to contribute myself and submit an implementation, but I'll
> > wait for feedback by other maintainers, first.
>
> Upstream sources like BSDs or TZcode official reference implementations
> are normally preferred because they are feature complete, regularly
> maintained, feature test and standards compliant, vulnerabilities
> checked, issues reported, and promptly fixed.
>
> I checked the BSDs and they seem to have adopted or adapted the TZcode
> official reference implementation, so I am not sure from where it may
> have been adopted, or whether it is original: the maintainer Jeff
> Johnston may remember.
>
Unfortunately, I do not remember the exact details from back then. With no
license header, it means it was written by Cygnus/Red Hat.
> I also wonder if the GMT defaults should be updated to UTC.
>
> --
> Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
>
> This email may be disturbing to some readers as it contains
> too much technical detail. Reader discretion is advised.
> [Data in binary units and prefixes, physical quantities in SI.]
>
>
* Re: Support non-POSIX TZ strings
2022-02-14 21:33 ` Jeff Johnston
@ 2022-02-15 22:02 ` Brian Inglis
0 siblings, 0 replies; 7+ messages in thread
From: Brian Inglis @ 2022-02-15 22:02 UTC (permalink / raw)
To: newlib
On 2022-02-14 14:33, Jeff Johnston wrote:
> On Mon, Feb 14, 2022 at 3:46 PM Brian Inglis <
> Brian.Inglis@systematicsw.ab.ca> wrote:
>
>> On 2022-02-14 12:58, jdoubleu wrote:
>>> On 22-02-14 10:10-0700, Brian Inglis wrote:
>>
>>>> [..] but we should start with the actual POSIX spec under TZ
>>
>>> Yes, that is exactly what I meant: Newlib supporting the <> (angle
>>> brackets) syntax.
>>> I didn't know that it was actually part of POSIX spec, since so many
>>> libs actually don't implement it.
>>
>> Most should have by now if maintained: we should be a laggard! ;^>
>>
>>>> The BSD or TZcode implementations could probably be adapted [..]
>>
>>> It looks like the TZcode implementation by Paul Eggert uses a different
>>> approach to parsing the strings, than the current implementation in
>>> newlib
>>> (
>> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c).
>>
>>> I'm not sure, if you want to copy the code over or use changes by e.g.
>>> Earle F. Philhower from
>>> https://github.com/earlephilhower/newlib-xtensa/pull/14.
>>> Because of the above question, I'm not sure how to continue on this. I
>>> would like to contribute myself and submit an implementation, but I'll
>>> wait for feedback by other maintainers, first.
>>
>> Upstream sources like BSDs or TZcode official reference implementations
>> are normally preferred because they are feature complete, regularly
>> maintained, feature test and standards compliant, vulnerabilities
>> checked, issues reported, and promptly fixed.
>>
>> I checked the BSDs and they seem to have adopted or adapted the TZcode
>> official reference implementation, so I am not sure from where it may
>> have been adopted, or whether it is original: the maintainer Jeff
>> Johnston may remember.
> Unfortunately, I do not remember the exact details from back then. With no
> license header, it means it was written by Cygnus/Red Hat.
>> I also wonder if the GMT defaults should be updated to UTC.
Submitted a newlib patch which builds okay, but I cannot test it, as I don't
have a newlib platform to run on, and Cygwin uses its own TZ DB code base.
It should accept up to 10 character abbreviations for STD and DST
matching POSIX specs including anything within < > quoted content.
If someone needing this could build, test, and send feedback, I'd
appreciate it.
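For anyone willing to test, a small probe along these lines might be a starting point. It is a sketch rather than part of the submitted patch: `probe_std_abbrev` is a hypothetical helper, and it assumes the target provides the POSIX `setenv()`, `tzset()`, and `tzname` interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical test helper: sets TZ, calls tzset(), and copies the standard
   time abbreviation the C library derived into `std`.
   Returns 0 on success, -1 if setenv() fails or `std` is too small. */
static int probe_std_abbrev(const char *tz, char *std, size_t stdsz)
{
    if (setenv("TZ", tz, 1) != 0)
        return -1;
    tzset();
    if (tzname[0] == NULL || strlen(tzname[0]) >= stdsz)
        return -1;
    strcpy(std, tzname[0]);             /* length checked above */
    return 0;
}
```

On a libc that implements the quoted form, probing "<-03>3" should yield `-03`, with no angle brackets in the result.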
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
* Re: Support non-POSIX TZ strings
2022-02-14 19:58 ` jdoubleu
2022-02-14 20:45 ` Brian Inglis
@ 2022-02-15 22:36 ` Brian Inglis
1 sibling, 0 replies; 7+ messages in thread
From: Brian Inglis @ 2022-02-15 22:36 UTC (permalink / raw)
To: newlib
On 2022-02-14 12:58, jdoubleu wrote:
> Thanks for the quick response!
>
> > [..] but we should start with the actual POSIX spec under TZ
> Yes, that is exactly what I meant: Newlib supporting the <> (angle
> brackets) syntax.
> I didn't know that it was actually part of POSIX spec, since so many
> libs actually don't implement it.
>
> > The BSD or TZcode implementations could probably be adapted [..]
> It looks like the TZcode implementation by Paul Eggert uses a different
> approach to parsing the strings, than the current implementation in
> newlib
> (https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c).
> I'm not sure, if you want to copy the code over or use changes by e.g.
> Earle F. Philhower from
> https://github.com/earlephilhower/newlib-xtensa/pull/14.
That patch includes the angle bracket quotes < > in the STD and DST
abbreviations, but the abbreviations are only contained *within* the
quotes, which should *NOT* be included in the abbreviations.
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]