public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: Alexandre Oliva <oliva@adacore.com>,
	Jonathan Wakely <jwakely@redhat.com>
Cc: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Subject: Re: [PATCH] [PR77760] [libstdc++] encode __time_get_state in tm
Date: Fri, 17 Feb 2023 13:56:32 +0100	[thread overview]
Message-ID: <Y+95gEhz3zQj3ngU@tucnak> (raw)
In-Reply-To: <orh6vkravy.fsf@lxoliva.fsfla.org>

On Fri, Feb 17, 2023 at 04:47:45AM -0300, Alexandre Oliva wrote:
> 
> On platforms that fail the ptrtomemfn-cast-to-pfn hack, such as
> arm-*-vxworks*, time_get fails with %I and %p because the state is not
> preserved across do_get calls.
> 
> This patch introduces an alternate hack, that encodes the state in
> unused bits of struct tm before calling do_get, extracts them in
> do_get, does the processing, and encodes it back, so that get extracts
> it.
> 
> The finalizer is adjusted for idempotence, because both do_get and get
> may call it.

As I said in the PR, I'm worried about this approach, but Jonathan
is the expert on the standard...
My worry is that people often invoke the time_get APIs on uninitialized
struct tm.
https://eel.is/c++draft/locale.time.get#general-1 says that corresponding
members of the struct tm are set, in the past we used to set mostly just
a single tm_* for ultimate format specifiers, with the addition of state
we try to set some further ones as well in the spirit of what strptime
does in glibc.  But still, say if the format specifiers only mention
hours/minutes/seconds, we don't try to update year/day/month related stuff
and vice versa.  Or say minute format specifier will only update tm_min
and nothing else.
For the encoding in get, the questions are:
1) if it is ok if we clear some normally unused bits of certain tm_*
   fields even if they wouldn't be otherwise touched; perhaps nothing
   will care, but it is still a user visible change
2) when those tm_* fields we want to encode the state into are unitialized,
   don't we effectively invoke UB by using the uninitialized member
   (admittedly we just overwrite some bits in it and leave others as is,
   but won't that e.g. cause -Wuninitialized warnings)?
More importantly, in the do_get the questions are:
3) while do_get is protected, are we guaranteed that it will never be called
   except from the public members of time_get?  I mean, if we are called
   from the libstdc++ time_get::get, then there is a state and it is already
   encoded in struct tm, so those bits are initialized and all is fine;
   but if it is called from elsewhere (either a class derived from
   time_get which just calls do_get alone on uninitialized struct tm,
   or say get in a derived class which calls do_get but doesn't encode the
   state, or even mixing of older GCC compiled get with newer GCC compiled
   do_get - get can be inlined into user code, while do_get is called
   through vtable and so could come up from another shared library), those
   struct tm bits could be effectively random and we'd finalize the state
   on the random stuff
4) I think we even shouldn't __state._M_finalize_state(__tm); in do_get
   if it is nested, then you wouldn't need to tweak src/c++98/locale_facets.cc;
   generally state should be finalized once everything is parsed together,
   not many times in between (of course if user code will parse stuff
   separately that will not work, say get with "%I" in one call, then get
   with "%h" in another one etc., but that is user's decision).  That
   would be another ABI issue, if the _M_finalize_state is called in
   do_get and the containing get calls it as well, then depending on which
   _M_finalize_state is used at runtime it would either cope well with the
   double finalization or not; now, if users never called time_get stuff
   on uninitialized struct tm, we could just add another bit to the state,
   whether the state has been actually encoded there, and call
   _M_finalize_state in do_get only if it hasn't been.

Just curious, why doesn't the pmf hack work on arm-vxworks7?
Sure, clang++ doesn't support it and if one derives a class from time_get
and overrides do_get and/or get things will not work properly either.

	Jakub


  reply	other threads:[~2023-02-17 12:56 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-17  7:47 Alexandre Oliva
2023-02-17 12:56 ` Jakub Jelinek [this message]
2023-02-22 14:27   ` Alexandre Oliva
2023-02-23 17:55     ` Alexandre Oliva
2023-02-23 18:21       ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+95gEhz3zQj3ngU@tucnak \
    --to=jakub@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jwakely@redhat.com \
    --cc=libstdc++@gcc.gnu.org \
    --cc=oliva@adacore.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).