From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <brian.inglis@systematicsw.ab.ca>
Received: from omta002.cacentral1.a.cloudfilter.net
 (omta002.cacentral1.a.cloudfilter.net [3.97.99.33])
 by sourceware.org (Postfix) with ESMTPS id 0E2973858C2C
 for <newlib@sourceware.org>; Wed, 13 Apr 2022 22:19:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0E2973858C2C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=SystematicSw.ab.ca
Authentication-Results: sourceware.org;
 spf=none smtp.mailfrom=systematicsw.ab.ca
Received: from shw-obgw-4004a.ext.cloudfilter.net ([10.228.9.227])
 by cmsmtp with ESMTP
 id eg5mnLxrbgTZYelKinbkpc; Wed, 13 Apr 2022 22:19:28 +0000
Received: from [10.0.0.5] ([184.64.124.72]) by cmsmtp with ESMTP
 id elKhnLc2ld7RfelKhnE3I5; Wed, 13 Apr 2022 22:19:28 +0000
X-Authority-Analysis: v=2.4 cv=XrLphHJ9 c=1 sm=1 tr=0 ts=62574c70
 a=oHm12aVswOWz6TMtn9zYKg==:117 a=oHm12aVswOWz6TMtn9zYKg==:17
 a=IkcTkHD0fZMA:10 a=GcyzOjIWAAAA:8 a=CCpqsmhAAAAA:8 a=uZvujYp8AAAA:8
 a=kBL5Q00NxL560xvGdhkA:9 a=QEXdDO2ut3YA:10 a=JWtY7oSOxG8A:10
 a=O-xZXXFxz3YA:10 a=7ipR1gm3pxYA:10 a=AomDDAB-fV4A:10
 a=hQL3dl6oAZ8NdCsdz28n:22 a=ul9cdbp4aOFLsgKbc677:22 a=SLzB8X_8jTLwj6mN0q5r:22
Message-ID: <a2bd00ae-897a-01f9-2fc9-33df0b0ee08b@SystematicSw.ab.ca>
Date: Wed, 13 Apr 2022 16:19:27 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.8.0
Reply-To: newlib@sourceware.org
Subject: Re: [PATCH] add tests for tzset(3)
Content-Language: en-CA
To: newlib@sourceware.org
References: <822e81a0-ed9f-200e-3318-0495456ad67e@SystematicSw.ab.ca>
 <20220407233425.2012-1-Brian.Inglis@SystematicSW.ab.ca>
 <9603e3aa-bd7d-6740-c710-27ace1d80397@SystematicSw.ab.ca>
 <CAOox84t_n--Fs4FQDqJbAdHE_xR6T6+uuLKJ=7uzfvegddP_DA@mail.gmail.com>
From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
Organization: Systematic Software
In-Reply-To: <CAOox84t_n--Fs4FQDqJbAdHE_xR6T6+uuLKJ=7uzfvegddP_DA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-CMAE-Envelope: MS4xfHrM6ytgCkWIYdRrg2jfimcKfBIvJB8Vxu1JE4z0ZZ/zewM+WJHA2wNIN0XjlkGuQWBUYX868x5uIm7i1+TkZpM9yvLyW+BGNKOLshBD2lKVt84/aTFi
 lj/4UHkhZqUYVyhSXqwhPUApDv6xE3FMMiBbMZx+p756NCcuUVIdYymdlsVUlj9E8Y60EWY5ZpVPXBjMMptWipPLO6KuJVPf324=
X-Spam-Status: No, score=-1164.6 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, NICE_REPLY_A, SPF_HELO_NONE, SPF_NONE,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: newlib@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Newlib mailing list <newlib.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/newlib>,
 <mailto:newlib-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/newlib/>
List-Post: <mailto:newlib@sourceware.org>
List-Help: <mailto:newlib-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/newlib>,
 <mailto:newlib-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 13 Apr 2022 22:19:31 -0000

On 2022-04-13 14:33, Jeff Johnston wrote:
> Looking at the glibc tzset code I have locally (not latest/greatest, but
> does support angle brackets):
> 
> If there any parse failures, UTC is defaulted.

We currently leave the time zone info unchanged.

> Extraneous characters inside brackets or less than 3 characters is a
> parse failure.
✔ Check	✔ Check

> Glibc parses the tz name string char by char and allocates space for
> the name strings so there is no max size.

The suggestion was that glibc ignores the remaining characters, but you 
imply that glibc in fact uses the equivalent of the scanf "%m[...]" 
(malloc) modifier, and I think using that would be against the newlib 
philosophy to keep things limited and under control to support small 
targets.  Larger targets like Cygwin (do our own thing including 
zoneinfo), and perhaps RTEMS, can supply their own enhancements.

> the name strings so there is no max size.  I am fine if you want to 
> mandate a maximum, but if you do, then too many chars should be 
> treated as a failure.  If you aren't certain of the limit, make the 
> limit higher than you expect.
Current limits are 3-10 allowing for e.g. <MESZ+03:30> which is the most 
ever likely to be used. It might be reasonable to bump it up to say 15.

> If people run into max limit with reasonable timezone format strings, then
> we can up the limit.

The conditions are more or less what is implemented, but we could do 
with a couple more tweaks to improve things, like check for more or 
extraneous chars within the bracket quotes, and that no characters 
remain unconsumed at the end of the parse.

Context below are latest posts under thread "add test for tzset(3)".


> On Wed, Apr 13, 2022 at 1:53 PM Brian Inglis wrote:
>> On 2022-04-12 12:33, Brian Inglis wrote:
>>> On 2022-04-12 05:19, jdoubleu wrote:
>>>> On 4/11/2022 7:27 PM, Dimitar Dimitrov wrote:
>>>>> On Mon, Apr 11, 2022 at 01:17:16PM +0200, jdoubleu wrote:
>>>>>> looks like I'm running the testsuite against glibc and not newlib
>>>>>> (for target x86_64-pc-linux-gnu). I'm not even sure whether
>>>>>> there's a backend for linux.
>>>>>> I'm currently trying to run only the tzset code against the test
>>>>>> vectors (like Brian Inglis did[1]).
>>>>>> At least it show, that the newlib implementation differs from glibc.
>>>>>> Maybe the test case is flawed and it should indeed fail.
>>>
>>>>>>> 6:20:12 is the timezone string of the previous test case, whose tzset
>>>>>>> call was successful. Looking at the current code, this is expected
>>>>>>> behaviour.
>>>
>>>>>> Okay. Looks like the condition[2] fails. The question is, which
>>>>>> part of it does?
>>>>> I believe it is the ('>' != tzenv[n]) check that fails, because the
>>>>> maximum parsed string limit of 10 has been reached and n=10.
>>>
>>>>>> I've appended a patch, which prints all variables when the condition
>>>>>> fails.
>>>>>> Could you please apply the patch and then recompile and re-run the
>>>>>> tests
>>>>>> again?
>>>
>>>>> Here is the result for arm-none-eabi:
>>>>> parsing name: tzenv="+0123456789ABCDEF>3:33:33", res=1, n=10,
>>>>> tzenv[n] = 9
>>>>> Assertion failed! Expected 1647906533 to equal 1647916532. winter
>>>>> time, timezone = "<+0123456789ABCDEF>3:33:33"
>>>>> I assume you no longer need assembly output from compiler?
>>>
>>>>>> I've previously noticed something with the sscanf format[3].
>>>>>>> Perhaps TZ should be reset to UTC before the bail out?
>>>
>>>>>> I don't think the implementation should fall back to UTC
>>>>>> whenever parsing failed. It apparently doesn't in glibc. I'm not
>>>>>> sure if the behavior is specified somewhere.
>>>
>>>>> The tzset manual page says that UTC is used if TZ cannot be parsed:
>>>>> https://man7.org/linux/man-pages/man3/tzset.3.html
>>>
>>>>>> Maybe resetting it before each test case is a good idea, though.
>>>>>> That makes it clearer, why the test case failed.
>>>>>>> With that chunk removed, as shown below:
>>>>>>>       {"3:33:33",               IN_SECONDS(3, 33, 33),
>>>>>>> NO_TIME},                 // truncates the name (17 + 1)
>>>>>>> I still get:
>>>>>>>     Assertion failed! Expected 1647906533 to equal 1647916532.
>>>>>>> winter time, timezone = "3:33:33"
>>>
>>>>>> My bad; "3:33:33" isn't a valid timezone string. It has to be
>>>>>> prefixed by a
>>>>>> name e.g. "MESZ" or "<+00>", as you tried. That explains why it is
>> also
>>>>>> failing.
>>>
>>>>>> [1]: https://sourceware.org/pipermail/newlib/2022/019529.html
>>>>>> [2]:
>>>>>>
>> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c;h=9cb30b188f989f65ec9eb6417f5d74020f8c72e9;hb=HEAD#l57
>>>>>>
>>>>>> [3]: https://sourceware.org/pipermail/newlib/2022/019535.html
>>>
>>>>> I believe it is the ('>' != tzenv[n]) check that fails, because the
>>>>> maximum parsed string limit of 10 has been reached and n=10.
>>>
>>>> Okay, so the newlib implementation actually fails, when the name is
>>>> too long.
>>>> The POSIX standard leaves it up to the implementation how to handle
>>>> names[1]:
>>>
>>>>> The interpretation of these fields is unspecified if either field is
>>>>> less than three bytes (except for the case when dst is missing), more
>>>>> than {TZNAME_MAX} bytes, or if they contain characters other than
>>>>> those specified.
>>>
>>>> I'm not sure where to go with this. In both cases, the current
>>>> implementation needs a fix. Looping in Brian Inglis, the writer of the
>>>> current implementation and Jeff Johnston, a maintainer.
>>>> As I see it, there are two solutions:
>>>> 1. Treat long names as an error case. The failed condition[2] needs to
>>>> apply UTC as current time, like Dimitar Dimitrov previously suggested.
>>>> The test needs to be changed to check that it actually fails.
>>>> 2. Mimic glibc's behavior and ignore the remaining characters from the
>>>> name. In that case, the condition[2] and following code needs to be
>>>> updated as well.
>>>> I am happy to provide a patch, when we agreed on one solution.
>>>
>>>>> I assume you no longer need assembly output from compiler?
>>>
>>>> No, thanks. We already found the issue. It would only be helpful, if
>>>> different configurations generate different code (through macros,
>>>> optimization, etc.).
>>>> [1]:
>> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
>>>> [2]:
>> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/time/tzset_r.c;hb=HEAD#l57
>>
>>> Does anyone know what glibc or the BSDs do on short < 3 char
>>> abbreviations, or the BSDs do on > LIMIT abbreviations?
>>> What do they do on errors - nothing or revert to UTC?
>>>
>>> Should tzset_r allow 1...2 char abbreviations or treat them as errors?
>>>
>>> Should tzset_r allow > 10 char abbreviations, ignore and skip extra
>>> chars, or treat them as errors?
>>
>> Should tzset_r accept characters within quotes not in the allowed
>> character set, ignore and skip them, or reject such abbreviations?
>>
>>> If it ignores and skips extra or accepts too few characters, then
>>> presumably it should process the remaining fields if present?
>>>
>>> POSIX tzset_r says only that it overrides the default time zone, so is
>>> there a spec for the default time zone, or is it implementation defined
>>> (e.g. /etc/timezone /etc/localtime)?
>>>
>>> Should tzset_r revert to UTC on errors, or just not change the TZ, and
>>> leave it up to the application to handle the error?

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]