public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
@ 2021-12-15 19:43 ` jakub at gcc dot gnu.org
  2022-01-10 14:40 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-12-15 19:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Can we have any state passed around?
_M_extract_via_format can have some extra reference argument to some state like
glibc has passed through it, but wouldn't that work only for the recursive
calls?
E.g. if one uses get with "%r" format or some other one that recurses and
contains everything to recompute the fields, we could recompute them.
But if get itself contains a format string, then each format specifier in it
seems to be required to be processed separately by
https://eel.is/c++draft/locale.time.get#members-8.4
- a virtual method which has standard mandated arguments is called for each
format specifier, so it is unclear how to carry some state around the parsing
of the whole format string.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
  2021-12-15 19:43 ` [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday jakub at gcc dot gnu.org
@ 2022-01-10 14:40 ` cvs-commit at gcc dot gnu.org
  2023-02-09  8:50 ` aoliva at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-01-10 14:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a8d3c98746098e2784be7144c1ccc9fcc34a0888

commit r12-6410-ga8d3c98746098e2784be7144c1ccc9fcc34a0888
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Mon Jan 10 15:38:47 2022 +0100

    libstdc++: Add %j, %U, %w, %W time_get support, fix %y, %Y, %C, %p
[PR77760]

    glibc strptime passes around some state, what fields in struct tm have been
    set and what needs to be finalized through possibly recursive calls, and
    at the end performs various finalizations, like applying %p so that it
    works for both %I %p and %p %I orders, or applying century so that both
    %C %y and %y %C works, or computation of missing fields from others
    (e.g. from %Y and %j one can compute tm_mon, tm_mday and tm_wday,
    from %Y %U %w, %Y %W %w, %Y %U %a, or %Y %W %w one can compute
    tm_mon, tm_mday, tm_yday or e.g. from %Y %m %d one can compute tm_wday
    and tm_yday.

    As the finalization is quite large and doesn't need to be a template
    (doesn't depend on any iterators or char types), I've put it into
libstdc++,
    and left some padding in the state struct, so that perhaps in the future we
    can track some more state without changing ABI.

    Unfortunately, there is an ugly problem that the standard mandates that
    get method calls the do_get virtual method and I don't see how we can
    cary on any state in between those calls (even if we did an ABI change
    for the facets, the methods are const, so that I think multiple threads
    could use the same time_get objects and we couldn't store state in there).

    There is a hack for that for GCC (seems to work with ICC too, doesn't work
    with clang++) if the do_get method isn't overriden we can pass the state
    around.

    For both do_get_year and per IRC discussions also for %y, the behavior is
    if 1-2 digits are parsed, the year is treated according to POSIX 2008 %y
    rules (0-68 is 2000-2068, 69-99 is 1969-1999), if 3-4 digits are parsed,
    it is treated as %Y.

    2022-01-10  Jakub Jelinek  <jakub@redhat.com>

            PR libstdc++/77760
            * include/bits/locale_facets_nonio.h (__time_get_state): New
struct.
            (time_get::_M_extract_via_format): Declare new method with
            __time_get_state& as an extra argument.
            * include/bits/locale_facets_nonio.tcc (_M_extract_via_format): Add
            __state argument, set various fields in it while parsing.  Handle
%j,
            %U, %w and %W, fix up handling of %y, %Y and %C, don't adjust
tm_hour
            for %p immediately.  Add a wrapper around the method without the
            __state argument for backwards compatibility.
            (_M_extract_num): Remove all __len == 4 special cases.
            (time_get::do_get_time, time_get::do_get_date, time_get::do_get):
Zero
            initialize __state, pass it to _M_extract_via_format and finalize
it
            at the end.
            (do_get_year): For 1-2 digit parsed years, map 0-68 to 2000-2068,
            69-99 to 1969-1999.  For 3-4 digit parsed years use that as year.
            (get): If do_get isn't overloaded from the locale_facets_nonio.tcc
            version, don't call do_get but call _M_extract_via_format instead
to
            pass around state.
            * config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Export
_M_extract_via_format
            with extra __time_get_state and
__time_get_state::_M_finalize_state.
            * src/c++98/locale_facets.cc (is_leap, day_of_the_week,
            day_of_the_year): New functions in anon namespace.
            (mon_yday): New var in anon namespace.
            (__time_get_state::_M_finalize_state): Define.
            * testsuite/22_locale/time_get/get/char/4.cc: New test.
            * testsuite/22_locale/time_get/get/wchar_t/4.cc: New test.
            * testsuite/22_locale/time_get/get_year/char/1.cc (test01): Parse
197
            as year 197AD instead of error.
            * testsuite/22_locale/time_get/get_year/char/5.cc (test01): Parse 1
as
            year 2001 instead of error.
            * testsuite/22_locale/time_get/get_year/char/6.cc: New test.
            * testsuite/22_locale/time_get/get_year/wchar_t/1.cc (test01):
Parse
            197 as year 197AD instead of error.
            * testsuite/22_locale/time_get/get_year/wchar_t/5.cc (test01):
Parse
            1 as year 2001 instead of error.
            * testsuite/22_locale/time_get/get_year/wchar_t/6.cc: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
  2021-12-15 19:43 ` [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday jakub at gcc dot gnu.org
  2022-01-10 14:40 ` cvs-commit at gcc dot gnu.org
@ 2023-02-09  8:50 ` aoliva at gcc dot gnu.org
  2023-02-09  9:08 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: aoliva at gcc dot gnu.org @ 2023-02-09  8:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

Alexandre Oliva <aoliva at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aoliva at gcc dot gnu.org

--- Comment #3 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
I'm looking at a case in which __clang__ is defined, despite compiling with
GCC, and "%I...%p" parsing fails because the hack to pass state around doesn't
work when __clang__ is defined.

This got me thinking that there are more than enough bits in struct tm to
encode all of __time_get_state in it, even with redundancy, so that overwritten
fields could get recovered.  I'm thinking that, before calling do_get, get
would transfer its state onto struct tm in such a recoverable way, and, after
calling it, it would restore state from struct tm and remove the extra
out-of-range encodings.  Our do_get, in turn, would extract state from struct
tm, do its current job, and then pack the state back into struct tm.

AFAICT this could be put in in an ABI-compatible way, dropping the hack and
enabling libstdc++-specific do_get specializations to deal with such extended
states, should we document them.

Has anything along these lines already been considered and ruled out?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2023-02-09  8:50 ` aoliva at gcc dot gnu.org
@ 2023-02-09  9:08 ` jakub at gcc dot gnu.org
  2023-02-09 23:40 ` aoliva at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-09  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Alexandre Oliva from comment #3)
> I'm looking at a case in which __clang__ is defined, despite compiling with
> GCC, and "%I...%p" parsing fails because the hack to pass state around
> doesn't work when __clang__ is defined.

That would be user's fault if __clang__ is defined even when it is not clang.

> This got me thinking that there are more than enough bits in struct tm to
> encode all of __time_get_state in it, even with redundancy, so that

But can we rely on that?  I'm afraid not all users are clearing struct tm
before
calling the time_get methods, some are calling it with some members already
initialized from something else (manually etc.) and others not initialized at
all, and depending on
which format specifiers are used expect only some of the struct tm members to
be updated, not all of them.  Say fill in manually hour/minute/second and then
parse just date, or vice versa.
Which is why I've done the tracking of the state on the side rather than
directly in it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2023-02-09  9:08 ` jakub at gcc dot gnu.org
@ 2023-02-09 23:40 ` aoliva at gcc dot gnu.org
  2023-02-10  3:05 ` redi at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: aoliva at gcc dot gnu.org @ 2023-02-09 23:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

--- Comment #5 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
I'm not entirely sure what the point of testing for __clang__ is, really.  Is
libstdc++ used with, or supposed to be used (say, as a system library) with
__clang__?  If so, wouldn't it be useful if it actually worked as expected? 
Currently, the code compiles just fine, though it fails to parse %I%p if
compiled with the actual __clang__ (or when __clang__ is defined on other
compilers for whatever reasons, but that's besides the point).  My thinking is
that either libstdc++ headers are to work with clang, in which case there's an
error in that bit, or they aren't, in which case the preprocessor test is
superfluous and, in this oddball case, harmful.

As for tm bits, my suggestion was to overwrite tm fields internally, not to
expose that externally.  They'd be used as scratch bits.  As in, member
functions in the public interface would not use incoming tm bits as
__time_get_state, but rather a zeroed-out __time_get_state structure, as today,
but when calling the internal implementation primitive do_get, they'd *blindly*
*overwrite* some of the tm bits with those from __time_get_state, and when
do-get returns, they'd pull them back into __time_get_state and out of tm. 
They'd be used as scratch bits, which AFAICT is permissible.  do_get, being
protected and thus more of an internal implementation bit, could make use of
those scratch bits.  do_get overriders could tweak them, for better or worse,
but since this use would be nonstandard, we could probably get away with
assuming any such uses to be libstdc++-specific.  It would probably not be wise
for users to rely on this internal extension, though, since one can hope the
standard will eventually  make room for an implementation of time_get that is
both standard-compliant and compatible with reasonable strptime expectations.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2023-02-09 23:40 ` aoliva at gcc dot gnu.org
@ 2023-02-10  3:05 ` redi at gcc dot gnu.org
  2023-02-10  8:33 ` jakub at gcc dot gnu.org
  2023-02-17  7:54 ` aoliva at gcc dot gnu.org
  7 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2023-02-10  3:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Alexandre Oliva from comment #5)
> I'm not entirely sure what the point of testing for __clang__ is, really. 
> Is libstdc++ used with, or supposed to be used (say, as a system library)
> with __clang__?

Yes, routinely.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2023-02-10  3:05 ` redi at gcc dot gnu.org
@ 2023-02-10  8:33 ` jakub at gcc dot gnu.org
  2023-02-17  7:54 ` aoliva at gcc dot gnu.org
  7 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-10  8:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Alexandre Oliva from comment #5)
> As for tm bits, my suggestion was to overwrite tm fields internally, not to
> expose that externally.  They'd be used as scratch bits.  As in, member
> functions in the public interface would not use incoming tm bits as
> __time_get_state, but rather a zeroed-out __time_get_state structure, as
> today, but when calling the internal implementation primitive do_get, they'd
> *blindly* *overwrite* some of the tm bits with those from __time_get_state,
> and when do-get returns, they'd pull them back into __time_get_state and out
> of tm.  They'd be used as scratch bits, which AFAICT is permissible. 
> do_get, being protected and thus more of an internal implementation bit,
> could make use of those scratch bits.  do_get overriders could tweak them,
> for better or worse, but since this use would be nonstandard, we could
> probably get away with assuming any such uses to be libstdc++-specific.  It
> would probably not be wise for users to rely on this internal extension,
> though, since one can hope the standard will eventually  make room for an
> implementation of time_get that is both standard-compliant and compatible
> with reasonable strptime expectations.

If all users would initialize struct tm to zeros before calling the APIs, then
I can understand that it would work (but still libstdc++ would need to know
which of the calls are nested and which are the outermost, so that they'd
finalize the state before leaving the outermost method back to canonical form).
 But if some fields can be uninitialized, I really don't understand how you
could use them as scratch bits, they could have random values upon entering the
outermost method.

I don't know if C++ says anything about the prior content of struct tm (but
given the recursive processing of some format specifiers it is hard to imagine
they'd e.g. clear the whole struct), for strptime POSIX manpage says:
"It is unspecified whether multiple calls to strptime() using the same tm
structure will update the current contents of the structure or overwrite all
contents of the structure. Conforming applications should make a single call to
strptime() with a format and all data needed to completely specify the date and
time being converted."
and the Linux manpage:
"In principle, this function does not initialize tm but stores only the values
specified. This means that tm should be initialized before the call. Details
differ a bit between different UNIX systems. The glibc implementation does not
touch those fields which are not explicitly specified, except that it
recomputes the tm_wday and tm_yday field if any of the year, month, or day
elements changed."
Now, if even the POSIX manpage shows in an example a case where struct tm isn't
zeroed out before calling it (but has format specifiers to initialize
everything).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday
       [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2023-02-10  8:33 ` jakub at gcc dot gnu.org
@ 2023-02-17  7:54 ` aoliva at gcc dot gnu.org
  7 siblings, 0 replies; 8+ messages in thread
From: aoliva at gcc dot gnu.org @ 2023-02-17  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77760

--- Comment #8 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612198.html has a
simple-minded implementation, that should make it clear what I mean by scratch:
get() pays no regard to the incoming bits in tm, it initializes them with a
zeroed-out state.

Now, I realize that do_get, if called by a derived class with an uninitialized
tm, might do weird things, because it would take some of those bits as state. 
Is this something of concern?  As in, how internal and reserved for the
implementation is the intermediate state of tm between get and do_get?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-02-17  7:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-77760-4@http.gcc.gnu.org/bugzilla/>
2021-12-15 19:43 ` [Bug libstdc++/77760] get_time needs to set tm_wday amd tm_yday jakub at gcc dot gnu.org
2022-01-10 14:40 ` cvs-commit at gcc dot gnu.org
2023-02-09  8:50 ` aoliva at gcc dot gnu.org
2023-02-09  9:08 ` jakub at gcc dot gnu.org
2023-02-09 23:40 ` aoliva at gcc dot gnu.org
2023-02-10  3:05 ` redi at gcc dot gnu.org
2023-02-10  8:33 ` jakub at gcc dot gnu.org
2023-02-17  7:54 ` aoliva at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).