public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Fix %Z parsing in strptime [BZ #16088]
@ 2023-07-14 14:52 Stanley Lancaster
  2023-07-14 16:11 ` Paul Eggert
  0 siblings, 1 reply; 7+ messages in thread
From: Stanley Lancaster @ 2023-07-14 14:52 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stanley Lancaster

---
 time/strptime_l.c   | 5 ++++-
 time/tst-strptime.c | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..5954015c4e 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -770,9 +770,12 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	  break;
 	case 'Z':
 	  /* Read timezone but perform no conversion.  */
+	  /* we recognize the format [-+a-zA-Z0-9]{3,} */
 	  while (ISSPACE (*rp))
 	    rp++;
-	  while (!ISSPACE (*rp) && *rp != '\0')
+	  
+	  const char* stop_rp = rp + 3;
+	  while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
 	    rp++;
 	  break;
 	case 'z':
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..31d6945ef1 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -38,6 +38,7 @@ static const struct
   { "C", "03/03/00", "%D", 5, 62, 2, 3 },
   { "C", "9/9/99", "%x", 4, 251, 8, 9 },
   { "C", "19990502123412", "%Y%m%d%H%M%S", 0, 121, 4, 2 },
+  { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
   { "C", "2001 20 Mon", "%Y %U %a", 1, 140, 4, 21 },
   { "C", "2001 21 Mon", "%Y %W %a", 1, 140, 4, 21 },
   { "C", "2001 21 Mon", "%2000Y %W %a", 1, 140, 4, 21 },
-- 
2.39.3


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix %Z parsing in strptime [BZ #16088]
  2023-07-14 14:52 Fix %Z parsing in strptime [BZ #16088] Stanley Lancaster
@ 2023-07-14 16:11 ` Paul Eggert
  2023-07-18 17:17   ` Stanley Lancaster
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Eggert @ 2023-07-14 16:11 UTC (permalink / raw)
  To: Stanley Lancaster; +Cc: libc-alpha

On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote:

>   	  /* Read timezone but perform no conversion.  */
> +	  /* we recognize the format [-+a-zA-Z0-9]{3,} */

Use GNU style in comment with active voice sentences and two spaces 
after sentence end, e.g., "/* Read time zone but perform no conversion. 
Recognize the format [-+a-zA-Z0-9]{3,}.  */".


> +	  const char* stop_rp = rp + 3;

Again, GNU style: "char *stop_rp" not "char* stop_rp".

More important, this has undefined behavior if rp + 3 overflows. 
Instead, count the number of bytes after the loop finishes, and make 
sure it's 3 or more.


> +	  while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')

Omit "&& *rp != '\0'"; it's redundant. Reindent to 80 columns.


> +  { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },

I don't see how this test passes. "CST0502123412" is treated as a time 
zone abbreviation, so the only info is the year. Did you run the tests? 
If so, why did this test pass? If not, please run.

Since the patch does not fix BZ#16088, it needs a commit message that 
describes what the patch does and why it's a win even though it doesn't 
fix. In particular, the patch does not set tm_zone, and there's a reason 
for that, and this should be explained.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix %Z parsing in strptime [BZ #16088]
  2023-07-14 16:11 ` Paul Eggert
@ 2023-07-18 17:17   ` Stanley Lancaster
  2023-07-18 17:22     ` Stanley Lancaster
  2023-07-19  2:13     ` Paul Eggert
  0 siblings, 2 replies; 7+ messages in thread
From: Stanley Lancaster @ 2023-07-18 17:17 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 3104 bytes --]

---
 time/strptime_l.c   | 16 +++++++++++-----
 time/tst-strptime.c |  2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..2382defc92 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char *fmt,
struct tm *tmp,
   break;
  case 'Z':
   /* Read timezone but perform no conversion.  */
-  while (ISSPACE (*rp))
-    rp++;
-  while (!ISSPACE (*rp) && *rp != '\0')
-    rp++;
-  break;
+ {
+ while (ISSPACE (*rp))
+     rp++;
+ /* Read time zone but perform no conversion. Recognize the format
[-+a-zA-Z0-9]{3,}.  */
+ const char* start_rp = rp;
+ while ((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp
>= '0' && *rp <= '9'))
+ rp++;
+ if (start_rp+3 < rp)
+ return NULL;
+ }
  case 'z':
   /* We recognize four formats:
      1. Two digits specify hours.
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..40145cb109 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -48,6 +48,8 @@ static const struct
     6, 0, 0, 1 },
   { "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p",
     6, 0, 0, 1 },
+  { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S
%p %Z/",
+    6, 0, 0, 1},
   { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
   { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
   /* Most of the languages do not need the declension of the month names
-- 
2.39.3

On Fri, Jul 14, 2023 at 11:11 AM Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote:
>
> >         /* Read timezone but perform no conversion.  */
> > +       /* we recognize the format [-+a-zA-Z0-9]{3,} */
>
> Use GNU style in comment with active voice sentences and two spaces
> after sentence end, e.g., "/* Read time zone but perform no conversion.
> Recognize the format [-+a-zA-Z0-9]{3,}.  */".
>
>
> > +       const char* stop_rp = rp + 3;
>
> Again, GNU style: "char *stop_rp" not "char* stop_rp".
>
> More important, this has undefined behavior if rp + 3 overflows.
> Instead, count the number of bytes after the loop finishes, and make
> sure it's 3 or more.
>
>
> > +       while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z')
> || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
>
> Omit "&& *rp != '\0'"; it's redundant. Reindent to 80 columns.
>
>
> > +  { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
>
> I don't see how this test passes. "CST0502123412" is treated as a time
> zone abbreviation, so the only info is the year. Did you run the tests?
> If so, why did this test pass? If not, please run.
>
> Since the patch does not fix BZ#16088, it needs a commit message that
> describes what the patch does and why it's a win even though it doesn't
> fix. In particular, the patch does not set tm_zone, and there's a reason
> for that, and this should be explained.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix %Z parsing in strptime [BZ #16088]
  2023-07-18 17:17   ` Stanley Lancaster
@ 2023-07-18 17:22     ` Stanley Lancaster
  2023-07-19  2:13     ` Paul Eggert
  1 sibling, 0 replies; 7+ messages in thread
From: Stanley Lancaster @ 2023-07-18 17:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 3816 bytes --]

I made several of the changes Paul suggested. And ran all the tests. This
patch does not fill out the tm_zone information because the manpage for the
api lists this a format character where "the  corresponding fields are
parsed, but no field in tm is changed." I feel that if this is the
behaviour developers have expected & written around in the past, that
behaviour should stay. However, if everyone here feels that a full
implementation should be written, I can do so.

On Tue, Jul 18, 2023 at 12:17 PM Stanley Lancaster <lancasterharp@gmail.com>
wrote:

> ---
>  time/strptime_l.c   | 16 +++++++++++-----
>  time/tst-strptime.c |  2 ++
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/time/strptime_l.c b/time/strptime_l.c
> index 85c3249fcc..2382defc92 100644
> --- a/time/strptime_l.c
> +++ b/time/strptime_l.c
> @@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char
> *fmt, struct tm *tmp,
>    break;
>   case 'Z':
>    /* Read timezone but perform no conversion.  */
> -  while (ISSPACE (*rp))
> -    rp++;
> -  while (!ISSPACE (*rp) && *rp != '\0')
> -    rp++;
> -  break;
> + {
> + while (ISSPACE (*rp))
> +     rp++;
> + /* Read time zone but perform no conversion. Recognize the format
> [-+a-zA-Z0-9]{3,}.  */
> + const char* start_rp = rp;
> + while ((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp
> >= '0' && *rp <= '9'))
> + rp++;
> + if (start_rp+3 < rp)
> + return NULL;
> + }
>   case 'z':
>    /* We recognize four formats:
>       1. Two digits specify hours.
> diff --git a/time/tst-strptime.c b/time/tst-strptime.c
> index 3dae9e0594..40145cb109 100644
> --- a/time/tst-strptime.c
> +++ b/time/tst-strptime.c
> @@ -48,6 +48,8 @@ static const struct
>      6, 0, 0, 1 },
>    { "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p",
>      6, 0, 0, 1 },
> +  { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S
> %p %Z/",
> +    6, 0, 0, 1},
>    { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
>    { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
>    /* Most of the languages do not need the declension of the month names
> --
> 2.39.3
>
> On Fri, Jul 14, 2023 at 11:11 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
>> On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote:
>>
>> >         /* Read timezone but perform no conversion.  */
>> > +       /* we recognize the format [-+a-zA-Z0-9]{3,} */
>>
>> Use GNU style in comment with active voice sentences and two spaces
>> after sentence end, e.g., "/* Read time zone but perform no conversion.
>> Recognize the format [-+a-zA-Z0-9]{3,}.  */".
>>
>>
>> > +       const char* stop_rp = rp + 3;
>>
>> Again, GNU style: "char *stop_rp" not "char* stop_rp".
>>
>> More important, this has undefined behavior if rp + 3 overflows.
>> Instead, count the number of bytes after the loop finishes, and make
>> sure it's 3 or more.
>>
>>
>> > +       while (((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <=
>> 'z') || (*rp >= '0' && *rp <= '9')) && (rp < stop_rp) && *rp != '\0')
>>
>> Omit "&& *rp != '\0'"; it's redundant. Reindent to 80 columns.
>>
>>
>> > +  { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 },
>>
>> I don't see how this test passes. "CST0502123412" is treated as a time
>> zone abbreviation, so the only info is the year. Did you run the tests?
>> If so, why did this test pass? If not, please run.
>>
>> Since the patch does not fix BZ#16088, it needs a commit message that
>> describes what the patch does and why it's a win even though it doesn't
>> fix. In particular, the patch does not set tm_zone, and there's a reason
>> for that, and this should be explained.
>>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix %Z parsing in strptime [BZ #16088]
  2023-07-18 17:17   ` Stanley Lancaster
  2023-07-18 17:22     ` Stanley Lancaster
@ 2023-07-19  2:13     ` Paul Eggert
  2023-07-24 13:53       ` [PATCH] Update to %Z fix Stanley Lancaster
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Eggert @ 2023-07-19  2:13 UTC (permalink / raw)
  To: Stanley Lancaster; +Cc: libc-alpha

Unfortunately that patch was corrupted by Gmail. Please use git 
send-email to send patches.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] Update to %Z fix
  2023-07-19  2:13     ` Paul Eggert
@ 2023-07-24 13:53       ` Stanley Lancaster
  2023-07-24 16:48         ` Paul Eggert
  0 siblings, 1 reply; 7+ messages in thread
From: Stanley Lancaster @ 2023-07-24 13:53 UTC (permalink / raw)
  To: libc-alpha; +Cc: Stanley Lancaster

---
 time/strptime_l.c   | 16 +++++++++++-----
 time/tst-strptime.c |  2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/time/strptime_l.c b/time/strptime_l.c
index 85c3249fcc..2382defc92 100644
--- a/time/strptime_l.c
+++ b/time/strptime_l.c
@@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp,
 	  break;
 	case 'Z':
 	  /* Read timezone but perform no conversion.  */
-	  while (ISSPACE (*rp))
-	    rp++;
-	  while (!ISSPACE (*rp) && *rp != '\0')
-	    rp++;
-	  break;
+		{
+			while (ISSPACE (*rp))
+	    	rp++;
+			/* Read timezone but perform no conversion.  */
+			/* we recognize the format [-+a-zA-Z0-9]{3,} */
+			const char* start_rp = rp;
+			while ((*rp >= 'A' && *rp <= 'Z') || (*rp >= 'a' && *rp <= 'z') || (*rp >= '0' && *rp <= '9'))
+				rp++;
+			if (start_rp+3 < rp)
+				return NULL;
+		}
 	case 'z':
 	  /* We recognize four formats:
 	     1. Two digits specify hours.
diff --git a/time/tst-strptime.c b/time/tst-strptime.c
index 3dae9e0594..40145cb109 100644
--- a/time/tst-strptime.c
+++ b/time/tst-strptime.c
@@ -48,6 +48,8 @@ static const struct
     6, 0, 0, 1 },
   { "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p",
     6, 0, 0, 1 },
+  { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S %p %Z/", 
+    6, 0, 0, 1},
   { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 },
   { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 },
   /* Most of the languages do not need the declension of the month names
-- 
2.39.3


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Update to %Z fix
  2023-07-24 13:53       ` [PATCH] Update to %Z fix Stanley Lancaster
@ 2023-07-24 16:48         ` Paul Eggert
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Eggert @ 2023-07-24 16:48 UTC (permalink / raw)
  To: Stanley Lancaster, libc-alpha

The indenting is so messed up on that one that I found it hard to read.

To help keep track of patch versions, please use the same subject line 
except say [PATCH v3], [PATCH v4] etc.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-07-24 16:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-14 14:52 Fix %Z parsing in strptime [BZ #16088] Stanley Lancaster
2023-07-14 16:11 ` Paul Eggert
2023-07-18 17:17   ` Stanley Lancaster
2023-07-18 17:22     ` Stanley Lancaster
2023-07-19  2:13     ` Paul Eggert
2023-07-24 13:53       ` [PATCH] Update to %Z fix Stanley Lancaster
2023-07-24 16:48         ` Paul Eggert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).