From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oo1-xc2a.google.com (mail-oo1-xc2a.google.com [IPv6:2607:f8b0:4864:20::c2a]) by sourceware.org (Postfix) with ESMTPS id 731AC38582AC for ; Tue, 18 Jul 2023 17:23:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 731AC38582AC Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-oo1-xc2a.google.com with SMTP id 006d021491bc7-5636426c1b3so3764446eaf.1 for ; Tue, 18 Jul 2023 10:23:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689700984; x=1692292984; h=cc:to:subject:message-id:date:from:references:mime-version:from:to :cc:subject:date:message-id:reply-to; bh=BV8qlynwGgOwOt4/iqSRSHvZlWDYxTm0smS4fQoEaEw=; b=K5cGo/uCG/UZWqD0sB4lCKGv8FnaN2RLyJw4RXR4EaOyHRAqL7SwWyKbk2HgIyNmZI 9+DXFYPl6P8DD0xzqiDmoRSvfnG9tcLZMPwKuUNsLKwvVPiGtmBNqPs9VYC6qcm+YEGk byVlMBDDu/wZNaUQXky4HAS6/QOBGQ4nSXW6ZxFUGYIJiLo8yChKrKSTjzFpyP/LdC2A kldyaQ8XcrMnoPZIb0sD0toWg2j2BxHY1dF2yaibKB7c0z/nWaRPfxBi4ByD+3fZYWoc v0oEqFy+h2hXk3Mj0DROgRVz3sfSX4WLcbTpcVcvDDwDyVqhwBN1yX3M3TJwryzHeU+l o94Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689700984; x=1692292984; h=cc:to:subject:message-id:date:from:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=BV8qlynwGgOwOt4/iqSRSHvZlWDYxTm0smS4fQoEaEw=; b=bs1xQagWiYcJTO2gfpPWG4yczkQ8XSUIadyz0p+qxXD3++HfmG9OIovRxS+yLANRzE tDFO7h9ycPpjrIxQt15gUwrVKwLtKnCRKSEKPFgNyW3+Xly2IMQWA7/fgKCbTMqTxumo ZrDDc3+zTZWERjw2+E91jg33StRNCTtDqY2MBEZo1WDTesl9AnvIkjpnaUfzouKJWTNS uO2kIKbHCzslcjyqF7mD0atIQEzeghDmU98hx+auUy0pJn52YdCiiWE8qtQ7Yn04szhc HM88y5acewfMTI31wRTwielgmGk3GyF3wAMjA8xKBiqmTgaT/hpkNB1+wb/vvV+jwp+r XfIQ== X-Gm-Message-State: ABy/qLbVhK3tBquzlOqw8k41HWMILseMjedVdZ8zPVt/njM+db5ege4f lGtcnpKWmzFha/tK2fjd0sWHHru0oizxSHBX7w0= X-Google-Smtp-Source: APBJJlHrVp3deppneTQUEiNsuNF1/TwEkFALuBgmDQNWElkP9+HSw6PnlZTylpMgFyUx1SclhaJ7ZumYThmQu74HIGg= X-Received: by 2002:a4a:3416:0:b0:566:ffa7:fd00 with SMTP id b22-20020a4a3416000000b00566ffa7fd00mr7449725ooa.6.1689700983812; Tue, 18 Jul 2023 10:23:03 -0700 (PDT) MIME-Version: 1.0 References: <20230714145224.1456633-1-lancasterharp@gmail.com> From: Stanley Lancaster Date: Tue, 18 Jul 2023 12:22:52 -0500 Message-ID: Subject: Re: Fix %Z parsing in strptime [BZ #16088] To: Paul Eggert Cc: libc-alpha@sourceware.org Content-Type: multipart/alternative; boundary="000000000000eab5480600c62a43" X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --000000000000eab5480600c62a43 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I made several of the changes Paul suggested. And ran all the tests. This patch does not fill out the tm_zone information because the manpage for the api lists this a format character where "the corresponding fields are parsed, but no field in tm is changed." I feel that if this is the behaviour developers have expected & written around in the past, that behaviour should stay. However, if everyone here feels that a full implementation should be written, I can do so. On Tue, Jul 18, 2023 at 12:17=E2=80=AFPM Stanley Lancaster wrote: > --- > time/strptime_l.c | 16 +++++++++++----- > time/tst-strptime.c | 2 ++ > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/time/strptime_l.c b/time/strptime_l.c > index 85c3249fcc..2382defc92 100644 > --- a/time/strptime_l.c > +++ b/time/strptime_l.c > @@ -770,11 +770,17 @@ __strptime_internal (const char *rp, const char > *fmt, struct tm *tmp, > break; > case 'Z': > /* Read timezone but perform no conversion. */ > - while (ISSPACE (*rp)) > - rp++; > - while (!ISSPACE (*rp) && *rp !=3D '\0') > - rp++; > - break; > + { > + while (ISSPACE (*rp)) > + rp++; > + /* Read time zone but perform no conversion. Recognize the format > [-+a-zA-Z0-9]{3,}. */ > + const char* start_rp =3D rp; > + while ((*rp >=3D 'A' && *rp <=3D 'Z') || (*rp >=3D 'a' && *rp <=3D 'z')= || (*rp > >=3D '0' && *rp <=3D '9')) > + rp++; > + if (start_rp+3 < rp) > + return NULL; > + } > case 'z': > /* We recognize four formats: > 1. Two digits specify hours. > diff --git a/time/tst-strptime.c b/time/tst-strptime.c > index 3dae9e0594..40145cb109 100644 > --- a/time/tst-strptime.c > +++ b/time/tst-strptime.c > @@ -48,6 +48,8 @@ static const struct > 6, 0, 0, 1 }, > { "en_US.ISO-8859-1", "2000-01-01 08:12:21 PM", "%Y-%m-%d %I:%M:%S %p", > 6, 0, 0, 1 }, > + { "en_US.ISO-8859-1", "2000-01-01 08:12:21 AM CST/", "%Y-%m-%d %I:%M:%S > %p %Z/", > + 6, 0, 0, 1}, > { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 }, > { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 }, > /* Most of the languages do not need the declension of the month names > -- > 2.39.3 > > On Fri, Jul 14, 2023 at 11:11=E2=80=AFAM Paul Eggert = wrote: > >> On 2023-07-14 07:52, Stanley Lancaster via Libc-alpha wrote: >> >> > /* Read timezone but perform no conversion. */ >> > + /* we recognize the format [-+a-zA-Z0-9]{3,} */ >> >> Use GNU style in comment with active voice sentences and two spaces >> after sentence end, e.g., "/* Read time zone but perform no conversion. >> Recognize the format [-+a-zA-Z0-9]{3,}. */". >> >> >> > + const char* stop_rp =3D rp + 3; >> >> Again, GNU style: "char *stop_rp" not "char* stop_rp". >> >> More important, this has undefined behavior if rp + 3 overflows. >> Instead, count the number of bytes after the loop finishes, and make >> sure it's 3 or more. >> >> >> > + while (((*rp >=3D 'A' && *rp <=3D 'Z') || (*rp >=3D 'a' && *rp= <=3D >> 'z') || (*rp >=3D '0' && *rp <=3D '9')) && (rp < stop_rp) && *rp !=3D '\= 0') >> >> Omit "&& *rp !=3D '\0'"; it's redundant. Reindent to 80 columns. >> >> >> > + { "C", "1999CST0502123412", "%Y%Z%m%d%H%M%S", 0, 121, 4, 2 }, >> >> I don't see how this test passes. "CST0502123412" is treated as a time >> zone abbreviation, so the only info is the year. Did you run the tests? >> If so, why did this test pass? If not, please run. >> >> Since the patch does not fix BZ#16088, it needs a commit message that >> describes what the patch does and why it's a win even though it doesn't >> fix. In particular, the patch does not set tm_zone, and there's a reason >> for that, and this should be explained. >> > --000000000000eab5480600c62a43--