public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Fix %Z parsing in strptime [BZ #16088]
@ 2023-06-22 15:00 Stanley Lancaster
  2023-06-22 15:21 ` Andreas Schwab
  0 siblings, 1 reply; 6+ messages in thread
From: Stanley Lancaster @ 2023-06-22 15:00 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stanley Lancaster

%Z parsing terminated on space, should've terminated on next character in format string

Author: Stanley Lancast <lancasterharp@gmail.com>
---
 time/strptime_l.c   | 5 +++--
 time/tst-strptime.c | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..5f96795406 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	  /* Read timezone but perform no conversion.  */
 	  while (ISSPACE (*rp))
 	    rp++;
-	  while (!ISSPACE (*rp) && *rp != '\0')
-	    rp++;
+
+	  while (*rp != *(fmt) && *rp != '\0')
+	  	rp++; 
 	  break;
 	case 'z':
 	  /* We recognize four formats:
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..6eb1af5c15 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -37,6 +37,7 @@ static const struct
   { "C", "2000-01-01", "%Y-%m-%d", 6, 0, 0, 1 },
   { "C", "03/03/00", "%D", 5, 62, 2, 3 },
   { "C", "9/9/99", "%x", 4, 251, 8, 9 },
+  { "C", "CST-2000-01-01", "%Z-%Y-%m-%d", 6, 0, 0, 1}, /* Ensure %Z terminates properly */
   { "C", "19990502123412", "%Y%m%d%H%M%S", 0, 121, 4, 2 },
   { "C", "2001 20 Mon", "%Y %U %a", 1, 140, 4, 21 },
   { "C", "2001 21 Mon", "%Y %W %a", 1, 140, 4, 21 },
-- 
2.31.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix %Z parsing in strptime [BZ #16088]
  2023-06-22 15:00 [PATCH] Fix %Z parsing in strptime [BZ #16088] Stanley Lancaster
@ 2023-06-22 15:21 ` Andreas Schwab
       [not found]   ` <CAF+15GzeA1T9qPAtVgBgEWpG_Mra_ePRHu7QUBQNvc8F9=nACw@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Schwab @ 2023-06-22 15:21 UTC (permalink / raw)
  To: Stanley Lancaster via Libc-alpha; +Cc: Stanley Lancaster

On Jun 22 2023, Stanley Lancaster via Libc-alpha wrote:

> %Z parsing terminated on space, should've terminated on next character in format string
>
> Author: Stanley Lancast <lancasterharp@gmail.com>
> ---
>  time/strptime_l.c   | 5 +++--
>  time/tst-strptime.c | 1 +
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/time/strptime_l.c b/time/strptime_l.c
> index 85c3249fcc..5f96795406 100644
> --- a/time/strptime_l.c
> +++ b/time/strptime_l.c
> @@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
>  	  /* Read timezone but perform no conversion.  */
>  	  while (ISSPACE (*rp))
>  	    rp++;
> -	  while (!ISSPACE (*rp) && *rp != '\0')
> -	    rp++;
> +
> +	  while (*rp != *(fmt) && *rp != '\0')
> +	  	rp++; 

The next character could be '%'.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Fwd: [PATCH] Fix %Z parsing in strptime [BZ #16088]
       [not found]   ` <CAF+15GzeA1T9qPAtVgBgEWpG_Mra_ePRHu7QUBQNvc8F9=nACw@mail.gmail.com>
@ 2023-06-22 17:05     ` Stanley Lancaster
  2023-06-22 20:53       ` Paul Eggert
  0 siblings, 1 reply; 6+ messages in thread
From: Stanley Lancaster @ 2023-06-22 17:05 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2127 bytes --]

---------- Forwarded message ---------
From: Stanley Lancaster <lancasterharp@gmail.com>
Date: Thu, Jun 22, 2023 at 10:51 AM
Subject: Re: [PATCH] Fix %Z parsing in strptime [BZ #16088]
To: Andreas Schwab <schwab@suse.de>


Yes. I thought about this possibility. %Z does not provide enough
information about what a "time zone name" looks like. The parser cannot
know when a timezone name ends without the user telling them. If we have
something like "%Z%B" as the format string, and "CSTJUN" as the input, the
parser cannot have any idea if CSTJUN is the timezone name, or if CSTJ is
the timezone name, or if CSTJU is the timezone name, and so on and so
forth...

There has to be some type of failure mode here to allow the programmer to
understand that they need to provide some sort of terminator for %Z. I'm
open to suggestions, if a different failure mode, besides simply parsing
the input incorrectly would be appropriate.

On Thu, Jun 22, 2023 at 10:21 AM Andreas Schwab <schwab@suse.de> wrote:

> On Jun 22 2023, Stanley Lancaster via Libc-alpha wrote:
>
> > %Z parsing terminated on space, should've terminated on next character
> in format string
> >
> > Author: Stanley Lancast <lancasterharp@gmail.com>
> > ---
> >  time/strptime_l.c   | 5 +++--
> >  time/tst-strptime.c | 1 +
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/time/strptime_l.c b/time/strptime_l.c
> > index 85c3249fcc..5f96795406 100644
> > --- a/time/strptime_l.c
> > +++ b/time/strptime_l.c
> > @@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char
> *fmt, struct tm *tmp,
> >         /* Read timezone but perform no conversion.  */
> >         while (ISSPACE (*rp))
> >           rp++;
> > -       while (!ISSPACE (*rp) && *rp != '\0')
> > -         rp++;
> > +
> > +       while (*rp != *(fmt) && *rp != '\0')
> > +             rp++;
>
> The next character could be '%'.
>
> --
> Andreas Schwab, SUSE Labs, schwab@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Fwd: [PATCH] Fix %Z parsing in strptime [BZ #16088]
  2023-06-22 17:05     ` Fwd: " Stanley Lancaster
@ 2023-06-22 20:53       ` Paul Eggert
  2023-06-23  2:31         ` Stanley Lancaster
  2023-06-23  6:48         ` Florian Weimer
  0 siblings, 2 replies; 6+ messages in thread
From: Paul Eggert @ 2023-06-22 20:53 UTC (permalink / raw)
  To: Stanley Lancaster; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 770 bytes --]

On 2023-06-22 10:05, Stanley Lancaster via Libc-alpha wrote:
> Yes. I thought about this possibility. %Z does not provide enough
> information about what a "time zone name" looks like.

tzdb and C-locale POSIX time zone names can contain only ASCII 
alphanumerics, "+", and "-", and must contain at least three characters. 
So you could parse only instances of [-+a-zA-Z0-9]{3,}. Although not 
perfect, that would be better than parsing until the next space.

By the way, it was misleading for the glibc manual's strptime section to 
document %Z as "The timezone name". It's not a timezone name - it's a 
time zone abbreviation. The correct term is used in the strftime 
section. I installed the attached documentation patch to fix this issue 
where I found it in glibc.

[-- Attachment #2: 0001-Call-CST-a-time-zone-abbreviation-not-a-name.patch --]
[-- Type: text/x-patch, Size: 9055 bytes --]

From 21fbc0a19366f89638a30eef2b53c6d4baafdb88 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 22 Jun 2023 13:44:50 -0700
Subject: [PATCH] Call "CST" a time zone abbreviation, not a name

In documentation, call strings like "CST" time zone abbreviations, not
time zone names.  This terminology is more precise, and is what tzdb uses.
A string like "CST" is ambiguous and does not fully name a time zone.
---
 manual/conf.texi        |  6 +++---
 manual/time.texi        | 18 +++++++++---------
 posix/bits/posix1_lim.h |  2 +-
 time/mktime.c           |  2 +-
 time/time.h             |  2 +-
 time/tzfile.c           |  4 ++--
 time/tzset.c            |  6 +++---
 timezone/tst-bz28707.c  |  2 +-
 8 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/manual/conf.texi b/manual/conf.texi
index ba9847aaa4..158285dbf0 100644
--- a/manual/conf.texi
+++ b/manual/conf.texi
@@ -85,10 +85,10 @@ If defined, the unvarying maximum number of streams that a single
 process can have open simultaneously.  @xref{Opening Streams}.
 @end deftypevr
 
-@cindex limits, time zone name length
+@cindex limits, time zone abbreviation length
 @deftypevr Macro int TZNAME_MAX
 @standards{POSIX.1, limits.h}
-If defined, the unvarying maximum length of a time zone name.
+If defined, the unvarying maximum length of a time zone abbreviation.
 @xref{Time Zone Functions}.
 @end deftypevr
 
@@ -1044,7 +1044,7 @@ simultaneously.  Its value is @code{8}.
 @item _POSIX_TZNAME_MAX
 @standards{POSIX.1, limits.h}
 The value of this macro is the most restrictive limit permitted by POSIX
-for the maximum length of a time zone name.  Its value is @code{3}.
+for the maximum length of a time zone abbreviation.  Its value is @code{3}.
 
 @item _POSIX2_RE_DUP_MAX
 @standards{POSIX.2, limits.h}
diff --git a/manual/time.texi b/manual/time.texi
index 3aabdc4953..d661d55d40 100644
--- a/manual/time.texi
+++ b/manual/time.texi
@@ -1025,7 +1025,7 @@ The @code{tm_gmtoff} field is derived from BSD and is a GNU library
 extension; it is not visible in a strict @w{ISO C} environment.
 
 @item const char *tm_zone
-This field is the name for the time zone that was used to compute this
+This field is the abbreviation for the time zone that was used to compute this
 broken-down time value.  Like @code{tm_gmtoff}, this field is a BSD and
 GNU extension, and is not visible in a strict @w{ISO C} environment.
 @end table
@@ -2205,7 +2205,7 @@ The full alternative year representation.
 The offset from GMT in @w{ISO 8601}/RFC822 format.
 
 @item %Z
-The timezone name.
+The time zone abbreviation.
 
 @emph{Note:} Currently, this is not fully implemented.  The format is
 recognized, input is consumed but no field in @var{tm} is set.
@@ -2366,7 +2366,7 @@ current time of the timezone matched, not of the current timezone of the
 runtime environment.
 
 @emph{Note}: This is not implemented (currently).  The problem is that
-timezone names are not unique.  If a fixed timezone is assumed for a
+time zone abbreviations are not unique.  If a fixed time zone is assumed for a
 given string (say @code{EST} meaning US East Coast time), then uses for
 countries other than the USA will fail.  So far we have found no good
 solution to this.
@@ -2522,10 +2522,10 @@ summer time) in the local time zone:
 @r{@var{std} @var{offset}}
 @end smallexample
 
-The @var{std} string specifies the name of the time zone.  It must be
+The @var{std} string specifies the time zone abbreviation.  It must be
 three or more characters long and must not contain a leading colon,
 embedded digits, commas, nor plus and minus signs.  There is no space
-character separating the time zone name from the @var{offset}, so these
+character separating the time zone abbreviation from the @var{offset}, so these
 restrictions are necessary to parse the specification correctly.
 
 The @var{offset} specifies the time value you must add to the local time
@@ -2549,7 +2549,7 @@ The second format is used when there is Daylight Saving Time:
 @end smallexample
 
 The initial @var{std} and @var{offset} specify the standard time zone, as
-described above.  The @var{dst} string and @var{offset} specify the name
+described above.  The @var{dst} string and @var{offset} are the abbreviation
 and offset for the corresponding Daylight Saving Time zone; if the
 @var{offset} is omitted, it defaults to one hour ahead of standard time.
 
@@ -2678,10 +2678,10 @@ community of volunteers and put in the public domain.
 @deftypevar {char *} tzname [2]
 @standards{POSIX.1, time.h}
 The array @code{tzname} contains two strings, which are the standard
-names of the pair of time zones (standard and Daylight
-Saving) that the user has selected.  @code{tzname[0]} is the name of
+abbreviations of the pair of time zones (standard and Daylight
+Saving) that the user has selected.  @code{tzname[0]} abbreviates
 the standard time zone (for example, @code{"EST"}), and @code{tzname[1]}
-is the name for the time zone when Daylight Saving Time is in use (for
+abbreviates the time zone when Daylight Saving Time is in use (for
 example, @code{"EDT"}).  These correspond to the @var{std} and @var{dst}
 strings (respectively) from the @code{TZ} environment variable.  If
 Daylight Saving Time is never used, @code{tzname[1]} is the empty string.
diff --git a/posix/bits/posix1_lim.h b/posix/bits/posix1_lim.h
index c2ec164228..e83d6254d5 100644
--- a/posix/bits/posix1_lim.h
+++ b/posix/bits/posix1_lim.h
@@ -134,7 +134,7 @@
 /* Maximum number of characters in a tty name.  */
 #define	_POSIX_TTY_NAME_MAX	9
 
-/* Maximum length of a timezone name (element of `tzname').  */
+/* Maximum length of a time zone abbreviation (element of 'tzname').  */
 #ifdef __USE_XOPEN2K
 # define _POSIX_TZNAME_MAX	6
 #else
diff --git a/time/mktime.c b/time/mktime.c
index 94a4320e6c..a73474e7ee 100644
--- a/time/mktime.c
+++ b/time/mktime.c
@@ -536,7 +536,7 @@ __time64_t
 __mktime64 (struct tm *tp)
 {
   /* POSIX.1 8.1.1 requires that whenever mktime() is called, the
-     time zone names contained in the external variable 'tzname' shall
+     time zone abbreviations contained in the external variable 'tzname' shall
      be set as if the tzset() function had been called.  */
   __tzset ();
 
diff --git a/time/time.h b/time/time.h
index 368f4dc588..859d966822 100644
--- a/time/time.h
+++ b/time/time.h
@@ -215,7 +215,7 @@ extern char *__REDIRECT_NTH (ctime_r, (const time_t *__restrict __timer,
 
 
 /* Defined in localtime.c.  */
-extern char *__tzname[2];	/* Current timezone names.  */
+extern char *__tzname[2];	/* Current time zone abbreviations.  */
 extern int __daylight;		/* If daylight-saving time is ever in use.  */
 extern long int __timezone;	/* Seconds west of UTC.  */
 
diff --git a/time/tzfile.c b/time/tzfile.c
index a267eb652e..8a923d0ccc 100644
--- a/time/tzfile.c
+++ b/time/tzfile.c
@@ -408,7 +408,7 @@ __tzfile_read (const char *file, size_t extra, char **extrap)
 
   fclose (f);
 
-  /* First "register" all timezone names.  */
+  /* First "register" all time zone abbreviations.  */
   for (i = 0; i < num_types; ++i)
     if (__tzstring (&zone_names[types[i].idx]) == NULL)
       goto ret_free_transitions;
@@ -565,7 +565,7 @@ __tzfile_default (const char *std, const char *dst,
   types[1].offset = dstoff;
   types[1].isdst = 1;
 
-  /* Reset the zone names to point to the user's names.  */
+  /* Reset time zone abbreviations to point to the user's abbreviations.  */
   __tzname[0] = (char *) std;
   __tzname[1] = (char *) dst;
 
diff --git a/time/tzset.c b/time/tzset.c
index 8c740a4e4d..78c18f8147 100644
--- a/time/tzset.c
+++ b/time/tzset.c
@@ -145,7 +145,7 @@ compute_offset (unsigned int ss, unsigned int mm, unsigned int hh)
   return ss + mm * 60 + hh * 60 * 60;
 }
 
-/* Parses the time zone name at *TZP, and writes a pointer to an
+/* Parses the time zone abbreviation at *TZP, and writes a pointer to an
    interned string to tz_rules[WHICHRULE].name.  On success, advances
    *TZP, and returns true.  Returns false otherwise.  */
 static bool
@@ -324,10 +324,10 @@ __tzset_parse_tz (const char *tz)
   memset (tz_rules, '\0', sizeof tz_rules);
   tz_rules[0].name = tz_rules[1].name = "";
 
-  /* Get the standard timezone name.  */
+  /* Get the standard time zone abbreviations.  */
   if (parse_tzname (&tz, 0) && parse_offset (&tz, 0))
     {
-      /* Get the DST timezone name (if any).  */
+      /* Get the DST time zone abbreviation (if any).  */
       if (*tz != '\0')
 	{
 	  if (parse_tzname (&tz, 1))
diff --git a/timezone/tst-bz28707.c b/timezone/tst-bz28707.c
index e29546c331..6027c40679 100644
--- a/timezone/tst-bz28707.c
+++ b/timezone/tst-bz28707.c
@@ -37,7 +37,7 @@ do_test (void)
   tzset ();
 
   return
-    /* Sanity-check that we got the right timezone-name for DST.  For
+    /* Sanity-check that we got the right abbreviation for DST.  For
        normal time, we're likely to get "-00" (the "unspecified" marker),
        even though the POSIX timezone string says "-04".  Let's not test
        that.  */
-- 
2.39.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Fwd: [PATCH] Fix %Z parsing in strptime [BZ #16088]
  2023-06-22 20:53       ` Paul Eggert
@ 2023-06-23  2:31         ` Stanley Lancaster
  2023-06-23  6:48         ` Florian Weimer
  1 sibling, 0 replies; 6+ messages in thread
From: Stanley Lancaster @ 2023-06-23  2:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 957 bytes --]

I'll rework the patch to parse the string in this manner

Thanks so much for your suggestions


On Thu, Jun 22, 2023, 3:53 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 2023-06-22 10:05, Stanley Lancaster via Libc-alpha wrote:
> > Yes. I thought about this possibility. %Z does not provide enough
> > information about what a "time zone name" looks like.
>
> tzdb and C-locale POSIX time zone names can contain only ASCII
> alphanumerics, "+", and "-", and must contain at least three characters.
> So you could parse only instances of [-+a-zA-Z0-9]{3,}. Although not
> perfect, that would be better than parsing until the next space.
>
> By the way, it was misleading for the glibc manual's strptime section to
> document %Z as "The timezone name". It's not a timezone name - it's a
> time zone abbreviation. The correct term is used in the strftime
> section. I installed the attached documentation patch to fix this issue
> where I found it in glibc.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Fwd: [PATCH] Fix %Z parsing in strptime [BZ #16088]
  2023-06-22 20:53       ` Paul Eggert
  2023-06-23  2:31         ` Stanley Lancaster
@ 2023-06-23  6:48         ` Florian Weimer
  1 sibling, 0 replies; 6+ messages in thread
From: Florian Weimer @ 2023-06-23  6:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Stanley Lancaster, libc-alpha

* Paul Eggert:

> From 21fbc0a19366f89638a30eef2b53c6d4baafdb88 Mon Sep 17 00:00:00 2001
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 22 Jun 2023 13:44:50 -0700
> Subject: [PATCH] Call "CST" a time zone abbreviation, not a name
>
> In documentation, call strings like "CST" time zone abbreviations, not
> time zone names.  This terminology is more precise, and is what tzdb uses.
> A string like "CST" is ambiguous and does not fully name a time zone.

This patch is okay, thanks.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Florian


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-06-23  6:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-22 15:00 [PATCH] Fix %Z parsing in strptime [BZ #16088] Stanley Lancaster
2023-06-22 15:21 ` Andreas Schwab
     [not found]   ` <CAF+15GzeA1T9qPAtVgBgEWpG_Mra_ePRHu7QUBQNvc8F9=nACw@mail.gmail.com>
2023-06-22 17:05     ` Fwd: " Stanley Lancaster
2023-06-22 20:53       ` Paul Eggert
2023-06-23  2:31         ` Stanley Lancaster
2023-06-23  6:48         ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).