Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Rafal Luzynski <digitalfreak@lingonborough.com>
To: TAMUKI Shoichi <tamuki@linet.gr.jp>, libc-alpha@sourceware.org
Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Fri, 28 Dec 2018 11:18:00 -0000	[thread overview]
Message-ID: <1171971086.574322.1545990690594@poczta.nazwa.pl> (raw)
In-Reply-To: <201812070655.AA04129@tamuki.linet.gr.jp>

Hello Tamuki Shoichi,

I am sorry for this delayed reply.

Two general remarks:

1. Whatever you contribute here you will also have to port and contribute
to gnulib. [1] The reason is that "date" command line utility uses gnulib's
fprintftime which implements a similar functionality but does not use
strftime.  If you don't contribute to gnulib then you will not see these
new features in "date".

2. As we are short of time, for easier review I suggest splitting this patch
into two patches implementing two features:

* setting the default width of "%Ey" to 2;
* passing the additional flags from "%EY" to "%Ey".

While setting the width of "%Ey" is clear and obvious to me and I understand
it is urgent, I perceive passing the flags correctly as an additional
problem
fixed by the same patch.  Also the implementation of this additional feature
is not clear to me.

Please ignore my suggestion to split if other maintainers review your patch
and accept as it is.

Could you please reword the first line of your commit message to make it
shorter than 72 characters so that if you do "git log --oneline" in a
standard
80 columns terminal the title does not wrap?  Also, it could start with the
name of the function or the module you are correcting.  My suggestion:

strftime: Improve the default width of "%Ey" [BZ #23758].

or maybe better:

strftime: Set the default width of "%Ey" to 2 [BZ #23758].

7.12.2018 07:55 TAMUKI Shoichi <tamuki@linet.gr.jp> wrote:
> 
> The Japanese era name is scheduled to be changed on May 1, 2019.
> Prior to this, change the alternate representation for year in

I am not sure but I think that it should be called "alternative"
rather than "alternate".

> strftime to pad the number with zero to keep it constant width, so
> that prevent the trouble we saw in the past from becoming obvious
> again from the year after the era name changes onward.
> 
> Since only one Japanese era name is used by each emperor's reign as
> lately, it is rare that the year ends in one digit or lasts more than
> three digits.  In addition, the default width of month, day, hour,
> minute, and second is 2, so adjust the default width of year the same
> as them, and then the whole display balance is improved.  Therefore,
> it would be reasonable to change the width padding with zero of %Ey
> default to 2.

I was thinking whether we want to set the default width of "%Ey" to 2
but indeed, using Gregorian calendar we have the default width "%y"
set to 2 and we use zero padding so for the consistency it is fine to
set "%Ey" the same.  My reasoning is different than yours but the result
is the same so I think it is OK.  Now what about "%EY"?  I was not looking
at this before so this piece of code is new to me.  As far as I can see,
"%EY", other than "%Ey" and "%Y", is not a formatting of the year number
but is just replaced by a subformat (so it is similar to "%c" or "%D" etc.)
Therefore there is no question what is the default width of the year
number in "%EY" and whether it should be the same as "%Y" (which has
no default padding).  The subformat substituted for "%EY" may or may not
contain "%Ey".

> Currently in glibc, besides ja_JP locale, the locales using the
> conversion specifier above are lo_LA (Laos) and th_TH (Thailand).  In
> these locales, they use the Buddhist era.  The Buddhist era is a value
> obtained by adding 543 to the Christian era, so they are not affected
> by the change of the conversion specifier %Ey.
> 
> Furthermore, for the output string of the conversion specifier %EY,
> an optional flag is given to the conversion specifier so that it can
> be also used the current non-padding format, and the padding format
> can be controlled.  To achieve this, when an optional flag is given to
> the conversion specifier %EY, the %Ey included in the combined
> conversion specifier is interpreted as if decorated with the
> appropriate flag.

This change is reasonable but, as I said before, it is not related with
the era change so for my review this should belong to the second patch.
Again, this is only if nobody else reviews and if you agree to split.

> [...]
>  Remarks:
>  At last, I have fulfilled all legal requirements, so you can review
>  my patch without worrying about the copyright issues.

I am sorry for not noticing this remark before.  I was not looking carefully
at your patch because I did not know you had signed the copyright
assignment.
On the other hand, I did not know you had signed the assignment because I
was
not looking at your patch. :-/

>  In this patch, all items pointed out so far have been fixed.
> 
>  My concern is the documentation.  Please review especially wording,
>  expression, etc.  (I am not an English native speaker.)

I am not an English native speaker either so I hesitate to proofread.
Please take my feedback about the documentation as incomplete and not
fully reliable.  I provide it only because nobody else did it so far.
On the other hand, I think it is OK if we have to fix the documentation
in the freeze period.

>  NEWS                 |   7 +++
>  manual/time.texi     |  15 ++++++
>  time/Makefile        |   2 +-
>  time/strftime_l.c    |  28 ++++++-----
>  time/tst-strftime2.c | 133
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 173 insertions(+), 12 deletions(-)
>  create mode 100644 time/tst-strftime2.c
> 
> diff --git a/NEWS b/NEWS
> index 8483dcf4928..40e33e228f8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -41,6 +41,13 @@ Major new features:
>    incosistent mutex state after fork call in multithread environment.
>    In both popen and system there is no direct access to user-defined
> mutexes.
>  
> +* Improve the width of alternate representation for year in strftime.

As I said before, I think it should be s/alternate/alternative.  Also,
please note that the term "alternative" may be ambiguous here because
this may apply to both "%Ey" (alternative era, decimal digits) and "%Oy"
(alternative numbering system but nothing about an alternative era).

> +  For %Ey conversion specifier, the default action is now to pad the
> +  number with zero to keep minimum 2 digits as well as %y.

"to keep the minimum width of 2 digits, same as %y"

(or: "... similar as %y")

>  Also, the
> +  optional flag (either _ or -) can be used for %EY, so that the
> +  internal %Ey is interpreted as if decorated with the appropriate
> +  flag.
> +
> [...]

This would belong to the second patch.  Also, I'm not sure if it needs
to be mentioned.  Just those flag will work as expected.  It could be
perceived as a bug that they did not work but now it is being fixed.
(Again, I am not absolutely sure you should remove it, just asking in
public).

> [...]
> diff --git a/manual/time.texi b/manual/time.texi
> index 4d154452eba..c1d5770f9e0 100644
> --- a/manual/time.texi
> +++ b/manual/time.texi
> @@ -1393,6 +1393,9 @@ The preferred calendar time representation for the
> current locale.
>  The century of the year.  This is equivalent to the greatest integer not
>  greater than the year divided by 100.
>  
> +If the @code{E} modifier is specified (@code{%EC}), the locale's
> +alternate representation for year (the era name) is used instead.

"%C" is a format specifier for a century so I believe that the word
"year" is just a copy/paste typo here.

>  [...]
> +If the @code{E} modifier is specified (@code{%Ey}), the locale's
> +alternate representation for year (the era year) is used instead.  The
> +default action is to pad the number with zero to keep minimum 2 digits
> +as well as @code{%y}.

Again: s/alternate/alternative/ and s/ as well as/, same as/ or
/s/ as well as/, similar to/ or any other better wording.

> +
>  @item %Y
>  The year as a decimal number, using the Gregorian calendar.  Years
>  before the year @code{1} are numbered @code{0}, @code{-1}, and so on.
>  
> +If the @code{E} modifier is specified (@code{%EY}), the locale's
> +alternate representation for year (generally the combination of
> +@code{%EC} and @code{%Ey}) is used instead.

If I understand correctly, the alternative representation for "%EY"
is just a subformat which may contain "%EC" and "%Ey" or may contain
anything the user puts there.  What about just removing the sentence
in brackets and not mentioning what is inside?  My reasoning that it
may be not true and it does not actually matter what is inside as long
as it works as the user wants it to work.

>  In this case, the
> +optional flag (either @code{_} or @code{-}) can be used, so that the
> +internal @code{%Ey} is interpreted as if decorated with the
> +appropriate flag.

Again I'm not sure we should put here the obvious things which work
as expected.  This may be mentioned here if this bug is perceived
as severe by Japanese people but I think we don't mention all bugs
fixed in this section of NEWS.  There is another section which lists
all fixed bugs but on the other hand the bug 23758 is about the
default width of %Ey rather than not working flags.

I am not sure what to do here.  File a separate bug report?  Remove
any mention of the flags now working as too obvious?  Leave as is?

> [...]
> diff --git a/time/Makefile b/time/Makefile
> index 743bd99f182..9f94e07e3bc 100644
> --- a/time/Makefile
> +++ b/time/Makefile
> @@ -43,7 +43,7 @@ tests	:= test_time clocktest tst-posixtz tst-strptime
> tst_wcsftime \
>  	   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
>  	   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
>  	   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
> -	   tst-tzname tst-y2039 bug-mktime4
> +	   tst-tzname tst-y2039 bug-mktime4 tst-strftime2
>  
>  include ../Rules
>  

I think you will have to modify LOCALES in this Makefile to include
the locales you are using.  Please also see below when I discuss the
test case.  Have you checked if your test case actually runs?

> diff --git a/time/strftime_l.c b/time/strftime_l.c
> index c71f9f47a95..8797341ea53 100644
> --- a/time/strftime_l.c
> +++ b/time/strftime_l.c
> @@ -434,7 +434,7 @@ static CHAR_T const month_name[][10] =
>  #endif
>  
>  static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
> -				   const struct tm *, bool *
> +				   const struct tm *, int *, bool *

It's not clear to me why this new argument has to be a pointer:
"int *" rather than just "int".  Do you use the value of yr_spec in the
caller after __strftime_internal is executed?

I need an explanation and also if you split your patch this should belong
to the second part.  However, if other maintainers have reviewed and
accept this part you will not have to do anything.

>  				   ut_argument_spec
>  				   LOCALE_PARAM) __THROW;
>  
> @@ -456,8 +456,9 @@ my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T
> *format,
>    tmcopy = *tp;
>    tp = &tmcopy;
>  #endif
> +  int yr_spec = 0;		/* Override padding for %Ey.  */

Same here, do you need yr_spec here as a variable?

>    bool tzset_called = false;
> -  return __strftime_internal (s, maxsize, format, tp, &tzset_called
> +  return __strftime_internal (s, maxsize, format, tp, &yr_spec,
> &tzset_called
>  			      ut_argument LOCALE_ARG);
>  }
>  #ifdef _LIBC
> @@ -466,7 +467,7 @@ libc_hidden_def (my_strftime)
>  
>  static size_t
>  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
> -		     const struct tm *tp, bool *tzset_called
> +		     const struct tm *tp, int *yr_spec, bool *tzset_called

Same here.

>  		     ut_argument_spec LOCALE_PARAM)
>  {
>  #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
> @@ -820,7 +821,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const
> CHAR_T *format,
>  	  if (modifier == L_('O'))
>  	    goto bad_format;
>  #ifdef _NL_CURRENT
> -	  if (! (modifier == 'E'
> +	  if (! (modifier == L_('E')

That's an unrelated change, probably does not actually change anything
but it's good to accept this for the consistency.  If the patch is split,
it should belong to the first part.

> [...]
> @@ -838,11 +839,12 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  	  {
>  	    CHAR_T *old_start = p;
>  	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
> -					      tp, tzset_called ut_argument
> -					      LOCALE_ARG);
> +					      tp, yr_spec, tzset_called
> +					      ut_argument LOCALE_ARG);
>  	    add (len, __strftime_internal (p, maxsize - i, subfmt,
> -					   tp, tzset_called ut_argument
> -					   LOCALE_ARG));
> +					   tp, yr_spec, tzset_called
> +					   ut_argument LOCALE_ARG));
> +	    *yr_spec = 0;

Whatever was the value of yr_spec after two calls of __strftime_internal
it is now lost and the new value is always 0.  It's not clear to me.
Why do you need a variable if you don't use its value and always set to 0?
What if a caller initializes yr_spec to anything different than 0?

> [...]
> @@ -917,7 +919,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const
> CHAR_T *format,
>  #ifdef _NL_CURRENT
>  	  if (! (modifier == L_('E')
>  		 && (*(subfmt =
> -		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
> +		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))

Again an unrelated change but good, let's accept it.  This should
be in the first part, if the patch is split.

(I'm sorry if my email client breaks formatting here, please refer
to the original patch in case of doubt).

> [...]
> @@ -1262,7 +1264,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  	  DO_NUMBER (1, tp->tm_wday);
>  
>  	case L_('Y'):
> -	  if (modifier == 'E')
> +	  if (modifier == L_('E'))

Again unrelated but good, let's accept it.  This should belong to the
first part, if the patch is split.

> [...]
> @@ -1273,6 +1275,8 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  # else
>  		  subfmt = era->era_format;
>  # endif
> +		  if (pad != 0)
> +		    *yr_spec = pad;

I guess here is some reason why yr_spec is a pointer but I have not
analyzed this part yet.  If split, this should belong to the second part.

> [...]
> @@ -1294,7 +1298,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  	      if (era)
>  		{
>  		  int delta = tp->tm_year - era->start_date[0];
> -		  DO_NUMBER (1, (era->offset
> +		  if (*yr_spec != 0)
> +		    pad = *yr_spec;

Again this is something which I don't understand but maybe I should
just read more.  If split, this should belong to the second part.

> +		  DO_NUMBER (2, (era->offset
>  				 + delta * era->absolute_direction));

If I understand correctly, this is the core part of setting the default
width of "%Ey" to 2.  Very good, very desired and very reasonable,
especially
if you scroll few lines down and see that "%y" has the value 2 here as well.

> [...]
> diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
> new file mode 100644
> index 00000000000..6f2cf4ddd81
> --- /dev/null
> +++ b/time/tst-strftime2.c
> @@ -0,0 +1,133 @@
> +/* Verify the behavior of strftime on alternate representation for year.
> +
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <locale.h>
> +#include <time.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8",
> "th_TH.UTF-8" };

These locales should be added to LOCALES in Makefile to make sure
they are generated before the test is run.  As far as I can see only
"ja_JP.UTF-8" is generated now but please list them all.  We should not
rely on other scripts because they may remove "ja_JP.UTF-8" ever in
future.

> +#define nlocales (sizeof (locales) / sizeof (locales[0]))

I think we already have a macro to calculate the number of elements
in the array but I don't remember its name at the moment.

> +static const char *formats[] = { "%EY", "%_EY", "%-EY" };

So this test case tests the behavior of additional flags passed to "%EY",
is that correct?  Therefore if the patch is split then it should belong
to the second part.  It does not test the width of "%Ey" and whether
"%Ey" works correctly (displays the correct year number) at all, does it?
Do we have such a test already?  Do we need it?  Is it difficult for you
to write it?  Are we able to prepare a test for the future Japanese era
already?

> [...]
> +    {
> +      if (setlocale (LC_ALL, locales[i]) == NULL)
> +	{
> +	  printf ("locale %s does not exist, skipping...\n", locales[i]);
> +	  continue;

I'm afraid that this test skips most of the locales.

> [...]

I don't provide the feedback for the rest of this test case because
I don't feel competent enough.

Other maintainers are welcome to provide their feedback.

Regards,

Rafal

[1] http://git.savannah.gnu.org/gitweb/?p=gnulib.git

next prev parent reply	other threads:[~2018-12-28  9:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-07  7:01 TAMUKI Shoichi
2018-12-14 12:44 ` TAMUKI Shoichi
2018-12-21 10:13   ` Rafal Luzynski
2018-12-22 11:16     ` TAMUKI Shoichi
2019-01-03 14:34       ` Carlos O'Donell
2018-12-28 11:18 ` Rafal Luzynski [this message]
2019-01-06  6:31   ` TAMUKI Shoichi
2019-01-07 16:23     ` Joseph Myers
2019-01-08  9:08       ` TAMUKI Shoichi
2019-01-03 14:43 ` Carlos O'Donell
2019-01-03 22:53   ` Rafal Luzynski
2019-01-12 13:38   ` TAMUKI Shoichi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1171971086.574322.1545990690594@poczta.nazwa.pl \
    --to=digitalfreak@lingonborough.com \
    --cc=libc-alpha@sourceware.org \
    --cc=tamuki@linet.gr.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).