Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: TAMUKI Shoichi <tamuki@linet.gr.jp>
To: Rafal Luzynski <digitalfreak@lingonborough.com>,
	libc-alpha@sourceware.org
Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Sun, 06 Jan 2019 06:31:00 -0000	[thread overview]
Message-ID: <201901060628.AA04155@tamuki.linet.gr.jp> (raw)
In-Reply-To: <1171971086.574322.1545990690594@poczta.nazwa.pl>

Hello Rafal,

Sorry to be late replying.  Thank you for the detailed review on this
patch.

From: Rafal Luzynski <digitalfreak@lingonborough.com>
Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Fri, 28 Dec 2018 10:51:30 +0100 (CET)

> Two general remarks:
> 
> 1. Whatever you contribute here you will also have to port and contribute
> to gnulib. [1] The reason is that "date" command line utility uses gnulib's
> fprintftime which implements a similar functionality but does not use
> strftime.  If you don't contribute to gnulib then you will not see these
> new features in "date".

I am also aware of this.  I am going to contribute to Gnulib after
committing with consensus of these fixes in Glibc.

> 2. As we are short of time, for easier review I suggest splitting this patch
> into two patches implementing two features:
> 
> * setting the default width of "%Ey" to 2;
> * passing the additional flags from "%EY" to "%Ey".
> 
> it is urgent, I perceive passing the flags correctly as an additional problem
> fixed by the same patch.  Also the implementation of this additional feature
> is not clear to me.
> 
> Please ignore my suggestion to split if other maintainers review your patch
> and accept as it is.

Certainly, the patch's outlook is better and it becomes easier to
understand, so I will try to split the patch by the next update.
Thank you for your suggestion.

> Could you please reword the first line of your commit message to make it
> shorter than 72 characters so that if you do "git log --oneline" in a standard
> 80 columns terminal the title does not wrap?  Also, it could start with the
> name of the function or the module you are correcting.  My suggestion:
> 
> strftime: Improve the default width of "%Ey" [BZ #23758].
> 
> or maybe better:
> 
> strftime: Set the default width of "%Ey" to 2 [BZ #23758].

It was careless that the title exceeded 80 characters.  I will fix
them while referring to the above.

> > The Japanese era name is scheduled to be changed on May 1, 2019.
> > Prior to this, change the alternate representation for year in
> 
> I am not sure but I think that it should be called "alternative"
> rather than "alternate".

There are three words of "alternate" in the current time.texi, and it
matched this.  Based on this proposal, I will fix these to
"alternative".

> > Since only one Japanese era name is used by each emperor's reign as
> > lately, it is rare that the year ends in one digit or lasts more than
> > three digits.  In addition, the default width of month, day, hour,
> > minute, and second is 2, so adjust the default width of year the same
> > as them, and then the whole display balance is improved.  Therefore,
> > it would be reasonable to change the width padding with zero of %Ey
> > default to 2.
> 
> I was thinking whether we want to set the default width of "%Ey" to 2
> but indeed, using Gregorian calendar we have the default width "%y"
> set to 2 and we use zero padding so for the consistency it is fine to
> set "%Ey" the same.  My reasoning is different than yours but the result
> is the same so I think it is OK.  Now what about "%EY"?  I was not looking
> at this before so this piece of code is new to me.  As far as I can see,
> "%EY", other than "%Ey" and "%Y", is not a formatting of the year number
> but is just replaced by a subformat (so it is similar to "%c" or "%D" etc.)
> Therefore there is no question what is the default width of the year
> number in "%EY" and whether it should be the same as "%Y" (which has
> no default padding).  The subformat substituted for "%EY" may or may not
> contain "%Ey".

The reason for setting the default width of "%y" to 2 is because "%y"
is 0 or more and less than 100, so that makes sense.  On the other
hand, although the range of "%Ey" is indeterminate, it is  practically
reasonable to fit in 2 digits.  As you said, I think that it can be
thought of as "%y" the same.

For "%EY", it is replaced with a subformat containing "%Ey" and it can
be regarded as a full format of "%Ey" including the era name "%EC".
Currently, padding of "%Ey" can be controlled freely, but "%EY"
depends on "%Ey" default padding included in the replacing subformats.
This patch is made to be able to control this.

> > Furthermore, for the output string of the conversion specifier %EY,
> > an optional flag is given to the conversion specifier so that it can
> > be also used the current non-padding format, and the padding format
> > can be controlled.  To achieve this, when an optional flag is given to
> > the conversion specifier %EY, the %Ey included in the combined
> > conversion specifier is interpreted as if decorated with the
> > appropriate flag.
> 
> This change is reasonable but, as I said before, it is not related with
> the era change so for my review this should belong to the second patch.
> Again, this is only if nobody else reviews and if you agree to split.

Yes, it belongs to the second patch because this fix is to control
padding of "%EY".

> >  Remarks:
> >  At last, I have fulfilled all legal requirements, so you can review
> >  my patch without worrying about the copyright issues.
> 
> I am sorry for not noticing this remark before.  I was not looking carefully
> at your patch because I did not know you had signed the copyright assignment.
> On the other hand, I did not know you had signed the assignment because I was
> not looking at your patch. :-/

No problem.  Sorry, it is hard to find.

> >  My concern is the documentation.  Please review especially wording,
> >  expression, etc.  (I am not an English native speaker.)
> 
> I am not an English native speaker either so I hesitate to proofread.
> Please take my feedback about the documentation as incomplete and not
> fully reliable.  I provide it only because nobody else did it so far.
> On the other hand, I think it is OK if we have to fix the documentation
> in the freeze period.

Your feedback is very helpful.  Thank you very much.

> > +* Improve the width of alternate representation for year in strftime.
> 
> As I said before, I think it should be s/alternate/alternative.  Also,
> please note that the term "alternative" may be ambiguous here because
> this may apply to both "%Ey" (alternative era, decimal digits) and "%Oy"
> (alternative numbering system but nothing about an alternative era).
> 
> > +  For %Ey conversion specifier, the default action is now to pad the
> > +  number with zero to keep minimum 2 digits as well as %y.
> 
> "to keep the minimum width of 2 digits, same as %y"
> 
> (or: "... similar as %y")

I will fix them as above.

> > +  optional flag (either _ or -) can be used for %EY, so that the
> > +  internal %Ey is interpreted as if decorated with the appropriate
> > +  flag.
> 
> This would belong to the second patch.  Also, I'm not sure if it needs
> to be mentioned.  Just those flag will work as expected.  It could be
> perceived as a bug that they did not work but now it is being fixed.
> (Again, I am not absolutely sure you should remove it, just asking in
> public).

Yes, it belongs to the second patch.

> > +If the @code{E} modifier is specified (@code{%EC}), the locale's
> > +alternate representation for year (the era name) is used instead.
> 
> "%C" is a format specifier for a century so I believe that the word
> "year" is just a copy/paste typo here.

"%EC" is a format specifier that replaces an era name instead of a
century, and the era name is a component of the locale's alternative
representation of the full format year, so it was written as above.

> > +If the @code{E} modifier is specified (@code{%Ey}), the locale's
> > +alternate representation for year (the era year) is used instead.  The
> > +default action is to pad the number with zero to keep minimum 2 digits
> > +as well as @code{%y}.
> 
> Again: s/alternate/alternative/ and s/ as well as/, same as/ or
> /s/ as well as/, similar to/ or any other better wording.

I will fix them as above.

> > +If the @code{E} modifier is specified (@code{%EY}), the locale's
> > +alternate representation for year (generally the combination of
> > +@code{%EC} and @code{%Ey}) is used instead.
> 
> If I understand correctly, the alternative representation for "%EY"
> is just a subformat which may contain "%EC" and "%Ey" or may contain
> anything the user puts there.  What about just removing the sentence
> in brackets and not mentioning what is inside?  My reasoning that it
> may be not true and it does not actually matter what is inside as long
> as it works as the user wants it to work.

That's right.  "%EY" is the full format of "%Ey" with "%EC" and "%Ey"
as a component.  Users can also define formats that do not include
both.  Especially in the Japanese locale, we use that situation in the
first year when the era changes (i.e. 1926, 1989, 2019, etc.).

> > +optional flag (either @code{_} or @code{-}) can be used, so that the
> > +internal @code{%Ey} is interpreted as if decorated with the
> > +appropriate flag.
> 
> Again I'm not sure we should put here the obvious things which work
> as expected.  This may be mentioned here if this bug is perceived
> as severe by Japanese people but I think we don't mention all bugs
> fixed in this section of NEWS.  There is another section which lists
> all fixed bugs but on the other hand the bug 23758 is about the
> default width of %Ey rather than not working flags.
> 
> I am not sure what to do here.  File a separate bug report?  Remove
> any mention of the flags now working as too obvious?  Leave as is?

As mentioned above, currently, padding of "%Ey" can be controlled
freely, but "%EY" depends on "%Ey" default padding included in the
replacing subformats.  This patch is made to be able to control this.

> > -	   tst-tzname tst-y2039 bug-mktime4
> > +	   tst-tzname tst-y2039 bug-mktime4 tst-strftime2
> 
> I think you will have to modify LOCALES in this Makefile to include
> the locales you are using.  Please also see below when I discuss the
> test case.  Have you checked if your test case actually runs?

I was running the test case directly in the local environment to check
the behavior.  Certainly, we need to add the locales to be used for
LOCALES of Makefile.  Thank you for pointing out.

> > -				   const struct tm *, bool *
> > +				   const struct tm *, int *, bool *
> 
> It's not clear to me why this new argument has to be a pointer:
> "int *" rather than just "int".  Do you use the value of yr_spec in the
> caller after __strftime_internal is executed?
> 
> I need an explanation and also if you split your patch this should belong
> to the second part.  However, if other maintainers have reviewed and
> accept this part you will not have to do anything.
> 
> > +  int yr_spec = 0;		/* Override padding for %Ey.  */
> 
> Same here, do you need yr_spec here as a variable?
> 
> > -  return __strftime_internal (s, maxsize, format, tp, &tzset_called
> > +  return __strftime_internal (s, maxsize, format, tp, &yr_spec, &tzset_called
> 
> > -		     const struct tm *tp, bool *tzset_called
> > +		     const struct tm *tp, int *yr_spec, bool *tzset_called
> 
> Same here.

Since the value of yr_spec in the caller is used after
__strftime_internal is executed, this new argument has to be a
pointer.

> > -	  if (! (modifier == 'E'
> > +	  if (! (modifier == L_('E')
> 
> That's an unrelated change, probably does not actually change anything
> but it's good to accept this for the consistency.  If the patch is split,
> it should belong to the first part.

For clarity, I will include unrelated changes in another patch.

> >  	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
> > -					      tp, tzset_called ut_argument
> > -					      LOCALE_ARG);
> > +					      tp, yr_spec, tzset_called
> > +					      ut_argument LOCALE_ARG);
> >  	    add (len, __strftime_internal (p, maxsize - i, subfmt,
> > -					   tp, tzset_called ut_argument
> > -					   LOCALE_ARG));
> > +					   tp, yr_spec, tzset_called
> > +					   ut_argument LOCALE_ARG));
> > +	    *yr_spec = 0;
> 
> Whatever was the value of yr_spec after two calls of __strftime_internal
> it is now lost and the new value is always 0.  It's not clear to me.
> Why do you need a variable if you don't use its value and always set to 0?
> What if a caller initializes yr_spec to anything different than 0?

yr_spec is a variable introduced to pass information such as the flags
of "%Ey" included in the subformat to replace "%EY".  The reason for
setting yr_spec to 0 in the above is to finish the passing of
information such as flags and restore the subsequent behavior to the
default.

> > -		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
> > +		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
> 
> Again an unrelated change but good, let's accept it.  This should
> be in the first part, if the patch is split.
> 
> > -	  if (modifier == 'E')
> > +	  if (modifier == L_('E'))
> 
> Again unrelated but good, let's accept it.  This should belong to the
> first part, if the patch is split.

Likewise, I will include them in another patch.

> > +		  if (pad != 0)
> > +		    *yr_spec = pad;
> 
> I guess here is some reason why yr_spec is a pointer but I have not
> analyzed this part yet.  If split, this should belong to the second part.
> 
> > -		  DO_NUMBER (1, (era->offset
> > +		  if (*yr_spec != 0)
> > +		    pad = *yr_spec;
> 
> Again this is something which I don't understand but maybe I should
> just read more.  If split, this should belong to the second part.

According to the value of "pad", DO_NUMBER macro will prepare the
appropriate padding.  Yes, it belongs to the second patch.

> > +		  DO_NUMBER (2, (era->offset
> 
> If I understand correctly, this is the core part of setting the default
> width of "%Ey" to 2.  Very good, very desired and very reasonable, especially
> if you scroll few lines down and see that "%y" has the value 2 here as well.

Yes, that's right.

> > +static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
> 
> These locales should be added to LOCALES in Makefile to make sure
> they are generated before the test is run.  As far as I can see only
> "ja_JP.UTF-8" is generated now but please list them all.  We should not
> rely on other scripts because they may remove "ja_JP.UTF-8" ever in
> future.

As you pointed out, I will add "ja_JP.UTF-8", "lo_LA.UTF-8", and
"th_TH.UTF-8" to LOCALES in Makefile.

> > +#define nlocales (sizeof (locales) / sizeof (locales[0]))
> 
> I think we already have a macro to calculate the number of elements
> in the array but I don't remember its name at the moment.

As a result of searching the Glibc source tree as follows, it seems
that there are many cases that we prepare the current situation on our
own.  I could not find a generic macro.

tamuki@seadog:~/glibc$ grep -nr "^#define .*sizeof .* / sizeof .*[0].*"

> > +static const char *formats[] = { "%EY", "%_EY", "%-EY" };
> 
> So this test case tests the behavior of additional flags passed to "%EY",
> is that correct?  Therefore if the patch is split then it should belong
> to the second part.  It does not test the width of "%Ey" and whether
> "%Ey" works correctly (displays the correct year number) at all, does it?
> Do we have such a test already?  Do we need it?  Is it difficult for you
> to write it?  Are we able to prepare a test for the future Japanese era
> already?

In this test case, the test for the width of "%Ey" and whether "%Ey"
works correctly (displays the correct year number) are confirmed by
"%Ey" included in the subformat which replaces "%EY".

> > +	  printf ("locale %s does not exist, skipping...\n", locales[i]);
> 
> I'm afraid that this test skips most of the locales.

The locales to be tested are "ja_JP.UTF-8", "lo_LA.UTF-8", and
"th_TH.UTF-8" which use an alternative era.  As long as I add these to
LOCALES in Makefile, I think that the test will be executed.

> I don't provide the feedback for the rest of this test case because
> I don't feel competent enough.

It tests the width of "%Ey" and whether "%Ey" works correctly
(displays the correct year number) while changing the flags of "%Ey"
and the year, month, and day near the boundary of the era.

Regards,
TAMUKI Shoichi

next prev parent reply	other threads:[~2019-01-06  6:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-07  7:01 TAMUKI Shoichi
2018-12-14 12:44 ` TAMUKI Shoichi
2018-12-21 10:13   ` Rafal Luzynski
2018-12-22 11:16     ` TAMUKI Shoichi
2019-01-03 14:34       ` Carlos O'Donell
2018-12-28 11:18 ` Rafal Luzynski
2019-01-06  6:31   ` TAMUKI Shoichi [this message]
2019-01-07 16:23     ` Joseph Myers
2019-01-08  9:08       ` TAMUKI Shoichi
2019-01-03 14:43 ` Carlos O'Donell
2019-01-03 22:53   ` Rafal Luzynski
2019-01-12 13:38   ` TAMUKI Shoichi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201901060628.AA04155@tamuki.linet.gr.jp \
    --to=tamuki@linet.gr.jp \
    --cc=digitalfreak@lingonborough.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).