From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from rock.gnat.com (rock.gnat.com [IPv6:2620:20:4000:0:a9e:1ff:fe9b:1d1]) by sourceware.org (Postfix) with ESMTPS id A56C33858D3C; Wed, 22 Feb 2023 14:38:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A56C33858D3C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id 6869C1160FC; Wed, 22 Feb 2023 09:38:08 -0500 (EST) X-Virus-Scanned: Debian amavisd-new at gnat.com Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 7Oi3yjz1Yxda; Wed, 22 Feb 2023 09:38:08 -0500 (EST) Received: from free.home (tron.gnat.com [IPv6:2620:20:4000:0:46a8:42ff:fe0e:e294]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by rock.gnat.com (Postfix) with ESMTPS id 0346F1160F5; Wed, 22 Feb 2023 09:38:07 -0500 (EST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 31MERTSL291663 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Wed, 22 Feb 2023 11:27:29 -0300 From: Alexandre Oliva To: Jakub Jelinek Cc: Jonathan Wakely , gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org Subject: Re: [PATCH] [PR77760] [libstdc++] encode __time_get_state in tm Organization: Free thinker, does not speak for AdaCore References: Errors-To: aoliva@lxoliva.fsfla.org Date: Wed, 22 Feb 2023 11:27:29 -0300 In-Reply-To: (Jakub Jelinek's message of "Fri, 17 Feb 2023 13:56:32 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Feb 17, 2023, Jakub Jelinek wrote: > My worry is that people often invoke the time_get APIs on uninitialized > struct tm. Yeah. I thought you meant get(), but it looks like you meant do_get() as well. I seem to have overread the permissions to overwrite tm members, to update its current contents, and to write unspecified values in its members. When I started down this path, I thought we might be able to hold the state bits only in fields that were to be updated, but the century field and allowing for 16-bit ints made that a bit challenging. So, back to the drawing board, the other possibility that occurs to me is to use a thread-local state-keeper chained stack, set up by get, and that enabled state to be recovered and updated by do_get if relevant incoming arguments match those recorded at the top of the stack (e.g., at least that we're looking at the same tm object). Another approach is to encode state in tm in-place and progressively, and compute derived fields when (i) we've just set one of the input fields, and (ii) any other input fields are in range. E.g., if we parse %p, we make sure tm_hour will be in the 00..11 range for am or 12..23 for pm, using the value previously-held in tm_hour, modulo 12, only if it was in the 00..23 range. If it wasn't in range, we record 00 or 12. When we parse %I, if we find that tm_hour is in the 12..23 range, we take that as meaning a "pm" was parsed for %p before, and add 12 to the parsed value. For century, we'd encode it immediately in tm_year (preserving the previous modulo-100 value), and when parsing %Y we'd use the century data, both at the risk of using garbage in case of uninitialized data. That appears to be in line with the standard-specified behavior. Maybe this is what the specification expects implementations to do? > For the encoding in get, the questions are: > 1) if it is ok if we clear some normally unused bits of certain tm_* > fields even if they wouldn't be otherwise touched; perhaps nothing > will care, but it is still a user visible change Rereading the relevant passages in standards and drafts, ISTM that this unconditional zeroing would be unexpected and not permitted by the standard, indeed. > 2) when those tm_* fields we want to encode the state into are unitialized, > don't we effectively invoke UB by using the uninitialized member > (admittedly we just overwrite some bits in it and leave others as is, > but won't that e.g. cause -Wuninitialized warnings)? ISTM that https://eel.is/c++draft/locale.time.get#virtuals-15 allows do_get to read values of members, and even to expect them to have been zero-initialized. That, and the allowance for unspecified values to be stored in tm in case of error, seem to me to hint at using out-of-range values of struct tm. However, at the end of do_get, we *should* have specified values in place in the just-read member, and get rules out modifying other fields. Could get() build a temporary, automatic tm object, zero-initialize it, set a bit in each field that indicates it is not set, use it throughout multiple do_get calls to hold parsed values and internal state, and then copy to the user-supplied tm object only fields that are marked as set (the bit is clear) and in range? > More importantly, in the do_get the questions are: > 3) while do_get is protected, are we guaranteed that it will never be called > except from the public members of time_get? That's IMHO the main element of risk in using bits from unrelated fields to hold state. Storing the expected values only implies encoding state in tm where it would ultimately end up (i.e., am/pm in tm_hour / 12, century in tm_year / 100 + 19), using a zero (or an equal-to-sign) bit to flag "set", and leaving it for get() to compute derived fields (if that's even standard-compliant; it's not clear that it is, desirable as it seems) > 4) I think we even shouldn't __state._M_finalize_state(__tm); in do_get > if it is nested, then you wouldn't need to tweak src/c++98/locale_facets.cc; > generally state should be finalized once everything is parsed together, > not many times in between (of course if user code will parse stuff > separately that will not work, say get with "%I" in one call, then get > with "%h" in another one etc., but that is user's decision). Parsing "%h%p%I%p%h" should get into tm_hour the value that time_put read to format with the same format string, so there's an argument for "finalizing" tm_hour based on more than the state fields we hold atm, perhaps even for progressive in-place encoding, as I have suggested above. > Just curious, why doesn't the pmf hack work on arm-vxworks7? At first, I thought we were running into this just because we have to define __clang__ because of some vxworks system headers aimed at clang. But even as I tried to drop the #ifndef, the test still failed; I suspected it had to do with ARM's encoding of ptrmemfunc_vbit_in_delta, but I did not confirm that this was the case. > Sure, clang++ doesn't support it and if one derives a class from time_get > and overrides do_get and/or get things will not work properly either. Yeah, I reasoned along these lines to try an alternate solution, even if the alternate solution was to be used only when the test did not pass. -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about