From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 20798 invoked by alias); 28 Dec 2018 09:54:07 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 20788 invoked by uid 89); 28 Dec 2018 09:54:07 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-23.2 required=5.0 tests=BAYES_50,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy=discuss, becoming, besides, May X-HELO: shared-ano163.rev.nazwa.pl X-Spam-Score: 0 Date: Fri, 28 Dec 2018 11:18:00 -0000 From: Rafal Luzynski To: TAMUKI Shoichi , libc-alpha@sourceware.org Message-ID: <1171971086.574322.1545990690594@poczta.nazwa.pl> In-Reply-To: <201812070655.AA04129@tamuki.linet.gr.jp> References: <201812070655.AA04129@tamuki.linet.gr.jp> Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758] MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-SW-Source: 2018-12/txt/msg01001.txt.bz2 Hello Tamuki Shoichi, I am sorry for this delayed reply. Two general remarks: 1. Whatever you contribute here you will also have to port and contribute to gnulib. [1] The reason is that "date" command line utility uses gnulib's fprintftime which implements a similar functionality but does not use strftime. If you don't contribute to gnulib then you will not see these new features in "date". 2. As we are short of time, for easier review I suggest splitting this patch into two patches implementing two features: * setting the default width of "%Ey" to 2; * passing the additional flags from "%EY" to "%Ey". While setting the width of "%Ey" is clear and obvious to me and I understand it is urgent, I perceive passing the flags correctly as an additional problem fixed by the same patch. Also the implementation of this additional feature is not clear to me. Please ignore my suggestion to split if other maintainers review your patch and accept as it is. Could you please reword the first line of your commit message to make it shorter than 72 characters so that if you do "git log --oneline" in a standard 80 columns terminal the title does not wrap? Also, it could start with the name of the function or the module you are correcting. My suggestion: strftime: Improve the default width of "%Ey" [BZ #23758]. or maybe better: strftime: Set the default width of "%Ey" to 2 [BZ #23758]. 7.12.2018 07:55 TAMUKI Shoichi wrote: > > The Japanese era name is scheduled to be changed on May 1, 2019. > Prior to this, change the alternate representation for year in I am not sure but I think that it should be called "alternative" rather than "alternate". > strftime to pad the number with zero to keep it constant width, so > that prevent the trouble we saw in the past from becoming obvious > again from the year after the era name changes onward. > > Since only one Japanese era name is used by each emperor's reign as > lately, it is rare that the year ends in one digit or lasts more than > three digits. In addition, the default width of month, day, hour, > minute, and second is 2, so adjust the default width of year the same > as them, and then the whole display balance is improved. Therefore, > it would be reasonable to change the width padding with zero of %Ey > default to 2. I was thinking whether we want to set the default width of "%Ey" to 2 but indeed, using Gregorian calendar we have the default width "%y" set to 2 and we use zero padding so for the consistency it is fine to set "%Ey" the same. My reasoning is different than yours but the result is the same so I think it is OK. Now what about "%EY"? I was not looking at this before so this piece of code is new to me. As far as I can see, "%EY", other than "%Ey" and "%Y", is not a formatting of the year number but is just replaced by a subformat (so it is similar to "%c" or "%D" etc.) Therefore there is no question what is the default width of the year number in "%EY" and whether it should be the same as "%Y" (which has no default padding). The subformat substituted for "%EY" may or may not contain "%Ey". > Currently in glibc, besides ja_JP locale, the locales using the > conversion specifier above are lo_LA (Laos) and th_TH (Thailand). In > these locales, they use the Buddhist era. The Buddhist era is a value > obtained by adding 543 to the Christian era, so they are not affected > by the change of the conversion specifier %Ey. > > Furthermore, for the output string of the conversion specifier %EY, > an optional flag is given to the conversion specifier so that it can > be also used the current non-padding format, and the padding format > can be controlled. To achieve this, when an optional flag is given to > the conversion specifier %EY, the %Ey included in the combined > conversion specifier is interpreted as if decorated with the > appropriate flag. This change is reasonable but, as I said before, it is not related with the era change so for my review this should belong to the second patch. Again, this is only if nobody else reviews and if you agree to split. > [...] > Remarks: > At last, I have fulfilled all legal requirements, so you can review > my patch without worrying about the copyright issues. I am sorry for not noticing this remark before. I was not looking carefully at your patch because I did not know you had signed the copyright assignment. On the other hand, I did not know you had signed the assignment because I was not looking at your patch. :-/ > In this patch, all items pointed out so far have been fixed. > > My concern is the documentation. Please review especially wording, > expression, etc. (I am not an English native speaker.) I am not an English native speaker either so I hesitate to proofread. Please take my feedback about the documentation as incomplete and not fully reliable. I provide it only because nobody else did it so far. On the other hand, I think it is OK if we have to fix the documentation in the freeze period. > NEWS | 7 +++ > manual/time.texi | 15 ++++++ > time/Makefile | 2 +- > time/strftime_l.c | 28 ++++++----- > time/tst-strftime2.c | 133 > +++++++++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 173 insertions(+), 12 deletions(-) > create mode 100644 time/tst-strftime2.c > > diff --git a/NEWS b/NEWS > index 8483dcf4928..40e33e228f8 100644 > --- a/NEWS > +++ b/NEWS > @@ -41,6 +41,13 @@ Major new features: > incosistent mutex state after fork call in multithread environment. > In both popen and system there is no direct access to user-defined > mutexes. > > +* Improve the width of alternate representation for year in strftime. As I said before, I think it should be s/alternate/alternative. Also, please note that the term "alternative" may be ambiguous here because this may apply to both "%Ey" (alternative era, decimal digits) and "%Oy" (alternative numbering system but nothing about an alternative era). > + For %Ey conversion specifier, the default action is now to pad the > + number with zero to keep minimum 2 digits as well as %y. "to keep the minimum width of 2 digits, same as %y" (or: "... similar as %y") > Also, the > + optional flag (either _ or -) can be used for %EY, so that the > + internal %Ey is interpreted as if decorated with the appropriate > + flag. > + > [...] This would belong to the second patch. Also, I'm not sure if it needs to be mentioned. Just those flag will work as expected. It could be perceived as a bug that they did not work but now it is being fixed. (Again, I am not absolutely sure you should remove it, just asking in public). > [...] > diff --git a/manual/time.texi b/manual/time.texi > index 4d154452eba..c1d5770f9e0 100644 > --- a/manual/time.texi > +++ b/manual/time.texi > @@ -1393,6 +1393,9 @@ The preferred calendar time representation for the > current locale. > The century of the year. This is equivalent to the greatest integer not > greater than the year divided by 100. > > +If the @code{E} modifier is specified (@code{%EC}), the locale's > +alternate representation for year (the era name) is used instead. "%C" is a format specifier for a century so I believe that the word "year" is just a copy/paste typo here. > [...] > +If the @code{E} modifier is specified (@code{%Ey}), the locale's > +alternate representation for year (the era year) is used instead. The > +default action is to pad the number with zero to keep minimum 2 digits > +as well as @code{%y}. Again: s/alternate/alternative/ and s/ as well as/, same as/ or /s/ as well as/, similar to/ or any other better wording. > + > @item %Y > The year as a decimal number, using the Gregorian calendar. Years > before the year @code{1} are numbered @code{0}, @code{-1}, and so on. > > +If the @code{E} modifier is specified (@code{%EY}), the locale's > +alternate representation for year (generally the combination of > +@code{%EC} and @code{%Ey}) is used instead. If I understand correctly, the alternative representation for "%EY" is just a subformat which may contain "%EC" and "%Ey" or may contain anything the user puts there. What about just removing the sentence in brackets and not mentioning what is inside? My reasoning that it may be not true and it does not actually matter what is inside as long as it works as the user wants it to work. > In this case, the > +optional flag (either @code{_} or @code{-}) can be used, so that the > +internal @code{%Ey} is interpreted as if decorated with the > +appropriate flag. Again I'm not sure we should put here the obvious things which work as expected. This may be mentioned here if this bug is perceived as severe by Japanese people but I think we don't mention all bugs fixed in this section of NEWS. There is another section which lists all fixed bugs but on the other hand the bug 23758 is about the default width of %Ey rather than not working flags. I am not sure what to do here. File a separate bug report? Remove any mention of the flags now working as too obvious? Leave as is? > [...] > diff --git a/time/Makefile b/time/Makefile > index 743bd99f182..9f94e07e3bc 100644 > --- a/time/Makefile > +++ b/time/Makefile > @@ -43,7 +43,7 @@ tests := test_time clocktest tst-posixtz tst-strptime > tst_wcsftime \ > tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \ > tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \ > tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \ > - tst-tzname tst-y2039 bug-mktime4 > + tst-tzname tst-y2039 bug-mktime4 tst-strftime2 > > include ../Rules > I think you will have to modify LOCALES in this Makefile to include the locales you are using. Please also see below when I discuss the test case. Have you checked if your test case actually runs? > diff --git a/time/strftime_l.c b/time/strftime_l.c > index c71f9f47a95..8797341ea53 100644 > --- a/time/strftime_l.c > +++ b/time/strftime_l.c > @@ -434,7 +434,7 @@ static CHAR_T const month_name[][10] = > #endif > > static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *, > - const struct tm *, bool * > + const struct tm *, int *, bool * It's not clear to me why this new argument has to be a pointer: "int *" rather than just "int". Do you use the value of yr_spec in the caller after __strftime_internal is executed? I need an explanation and also if you split your patch this should belong to the second part. However, if other maintainers have reviewed and accept this part you will not have to do anything. > ut_argument_spec > LOCALE_PARAM) __THROW; > > @@ -456,8 +456,9 @@ my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T > *format, > tmcopy = *tp; > tp = &tmcopy; > #endif > + int yr_spec = 0; /* Override padding for %Ey. */ Same here, do you need yr_spec here as a variable? > bool tzset_called = false; > - return __strftime_internal (s, maxsize, format, tp, &tzset_called > + return __strftime_internal (s, maxsize, format, tp, &yr_spec, > &tzset_called > ut_argument LOCALE_ARG); > } > #ifdef _LIBC > @@ -466,7 +467,7 @@ libc_hidden_def (my_strftime) > > static size_t > __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format, > - const struct tm *tp, bool *tzset_called > + const struct tm *tp, int *yr_spec, bool *tzset_called Same here. > ut_argument_spec LOCALE_PARAM) > { > #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL > @@ -820,7 +821,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const > CHAR_T *format, > if (modifier == L_('O')) > goto bad_format; > #ifdef _NL_CURRENT > - if (! (modifier == 'E' > + if (! (modifier == L_('E') That's an unrelated change, probably does not actually change anything but it's good to accept this for the consistency. If the patch is split, it should belong to the first part. > [...] > @@ -838,11 +839,12 @@ __strftime_internal (CHAR_T *s, size_t maxsize, > const CHAR_T *format, > { > CHAR_T *old_start = p; > size_t len = __strftime_internal (NULL, (size_t) -1, subfmt, > - tp, tzset_called ut_argument > - LOCALE_ARG); > + tp, yr_spec, tzset_called > + ut_argument LOCALE_ARG); > add (len, __strftime_internal (p, maxsize - i, subfmt, > - tp, tzset_called ut_argument > - LOCALE_ARG)); > + tp, yr_spec, tzset_called > + ut_argument LOCALE_ARG)); > + *yr_spec = 0; Whatever was the value of yr_spec after two calls of __strftime_internal it is now lost and the new value is always 0. It's not clear to me. Why do you need a variable if you don't use its value and always set to 0? What if a caller initializes yr_spec to anything different than 0? > [...] > @@ -917,7 +919,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const > CHAR_T *format, > #ifdef _NL_CURRENT > if (! (modifier == L_('E') > && (*(subfmt = > - (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT))) > + (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT))) Again an unrelated change but good, let's accept it. This should be in the first part, if the patch is split. (I'm sorry if my email client breaks formatting here, please refer to the original patch in case of doubt). > [...] > @@ -1262,7 +1264,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, > const CHAR_T *format, > DO_NUMBER (1, tp->tm_wday); > > case L_('Y'): > - if (modifier == 'E') > + if (modifier == L_('E')) Again unrelated but good, let's accept it. This should belong to the first part, if the patch is split. > [...] > @@ -1273,6 +1275,8 @@ __strftime_internal (CHAR_T *s, size_t maxsize, > const CHAR_T *format, > # else > subfmt = era->era_format; > # endif > + if (pad != 0) > + *yr_spec = pad; I guess here is some reason why yr_spec is a pointer but I have not analyzed this part yet. If split, this should belong to the second part. > [...] > @@ -1294,7 +1298,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize, > const CHAR_T *format, > if (era) > { > int delta = tp->tm_year - era->start_date[0]; > - DO_NUMBER (1, (era->offset > + if (*yr_spec != 0) > + pad = *yr_spec; Again this is something which I don't understand but maybe I should just read more. If split, this should belong to the second part. > + DO_NUMBER (2, (era->offset > + delta * era->absolute_direction)); If I understand correctly, this is the core part of setting the default width of "%Ey" to 2. Very good, very desired and very reasonable, especially if you scroll few lines down and see that "%y" has the value 2 here as well. > [...] > diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c > new file mode 100644 > index 00000000000..6f2cf4ddd81 > --- /dev/null > +++ b/time/tst-strftime2.c > @@ -0,0 +1,133 @@ > +/* Verify the behavior of strftime on alternate representation for year. > + > + Copyright (C) 2018 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#include > +#include > +#include > +#include > + > +static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", > "th_TH.UTF-8" }; These locales should be added to LOCALES in Makefile to make sure they are generated before the test is run. As far as I can see only "ja_JP.UTF-8" is generated now but please list them all. We should not rely on other scripts because they may remove "ja_JP.UTF-8" ever in future. > +#define nlocales (sizeof (locales) / sizeof (locales[0])) I think we already have a macro to calculate the number of elements in the array but I don't remember its name at the moment. > +static const char *formats[] = { "%EY", "%_EY", "%-EY" }; So this test case tests the behavior of additional flags passed to "%EY", is that correct? Therefore if the patch is split then it should belong to the second part. It does not test the width of "%Ey" and whether "%Ey" works correctly (displays the correct year number) at all, does it? Do we have such a test already? Do we need it? Is it difficult for you to write it? Are we able to prepare a test for the future Japanese era already? > [...] > + { > + if (setlocale (LC_ALL, locales[i]) == NULL) > + { > + printf ("locale %s does not exist, skipping...\n", locales[i]); > + continue; I'm afraid that this test skips most of the locales. > [...] I don't provide the feedback for the rest of this test case because I don't feel competent enough. Other maintainers are welcome to provide their feedback. Regards, Rafal [1] http://git.savannah.gnu.org/gitweb/?p=gnulib.git