From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 112951 invoked by alias); 11 Jan 2018 05:04:43 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 97826 invoked by uid 89); 11 Jan 2018 05:03:18 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.8 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy= X-HELO: mail-qk0-f171.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-language:content-transfer-encoding; bh=bKiexF9qXBkiJyfzRISs3uFv5JVSqZSL6Tq+jFbc37k=; b=ITK0ZPDQFHnkzMFHOjKE/peLAs2sX3THCSoUOAW7JV2v9nGSecMewojHr6H6PiLXta mpdkff34EdK4VcNtQ75iRF909AP3zBGARsn4r/nEhd0NdBsCFwgnD7OnoocacWgj+ZzM PNek/5CO+MI2W2daNVe4hBNqinvaUWIvYZfQwYiJyILc5aqkEKEG9CR5NjHQtdEyhGHn 7jIyNRyiW3b/g8TRRA1kd9unCBjBMH0hoOB1L5xkp0/54bG0oHxmm8zueWiK5tubnNbK dU6br8UEUTV8Zm3OjLc1Sfp+7W8WVGHNCOAJSgSOcmE7I0fB1sB4tM/zfH+J9GRjuUuG ynuw== X-Gm-Message-State: AKwxytel2odEZ7K7ZS+JczstIaQ8C0ZeX9dlxknzxP6peIzj3WMUhlzs 08HFLsFKiQYQqd+Rp2+JIUOoKBwUpb4= X-Google-Smtp-Source: ACJfBotycZNVSR7Giwm8AGwBBsyrq6Wo8yn0agcXTR31FVYDZWMX2fBruUp0KAgYfrjUfFWjMdosRg== X-Received: by 10.55.234.22 with SMTP id t22mr29541944qkj.339.1515646989450; Wed, 10 Jan 2018 21:03:09 -0800 (PST) Subject: Re: [PATCH v11 1/5] Implement alternative month names (bug 10871). To: Rafal Luzynski , libc-alpha@sourceware.org References: <382846926.675155.1515455688091@poczta.nazwa.pl> <309950992.675174.1515455961934@poczta.nazwa.pl> From: Carlos O'Donell Message-ID: Date: Thu, 11 Jan 2018 05:05:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: <309950992.675174.1515455961934@poczta.nazwa.pl> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-SW-Source: 2018-01/txt/msg00364.txt.bz2 On 01/08/2018 03:59 PM, Rafal Luzynski wrote: > Some languages (Slavic, Baltic, etc.) require a genitive case of the > month name when formatting a full date (with the day number) while > they require a nominative case when referring to the month standalone. > This requirement cannot be fulfilled without providing two forms for > each month name. From now it is precised that nl_langinfo(MON_1) > series (up to MON_12) and strftime("%B") generate the month names in > the grammatical form used when the month forms part of a complete date. > If the grammatical form used when the month is named by itself is needed, > the new values nl_langinfo(ALTMON_1) (up to ALTMON_12) and > strftime("%OB") are supported. This new feature is optional so the > languages which do not need it or do not yet provide the updated > locales simply do not use it and their behaviour is unchanged. High level: Overall the idea you have proposed makes perfect sense. I'm excited to see someone pushing a new interface design like this to solve real language problems with the interfaces. I look forward to the POSIX standardization. My obvious worry remains that POSIX does not standardize %OB/%Ob, but all we can do is lead by example and solve user problems. Thank you for this work. Design: By the very definition of the %O* variants, if the locale does not provide ALTMON_* then it must revert back to the non-%O* variant e.g. %B. I see this implemented in the parser which is good. I also see that strftime is relaxed in what it accepts and that is also good and required. One oddity I noticed in patch 3 is the missing ABALTMON_* definitions for the _GNU_SOURCE case, is that on purpose and why? It's not needed, but it seems like a missing useful feature that completes the API. Implementation: My only request here is additional tests that verify, for locales that don't have ALTMON or ABALTMON specified, thus verifying the fallback works. With that added I think you can commit this change squashed together with the pl_PL and ru_RU changes and the test cases fixed up to pass. Reviewed-by: Carlos O'Donell > [BZ #10871] > * locale/C-time.c: Add alternative month names, define them as the > same as mon explicitly. > * locale/categories.def: alt_mon and wide-alt_mon added. > * locale/langinfo.h [__USE_GNU]: New public symbols ALTMON_1, > ALTMON_2, ALTMON_3, ALTMON_4, ALTMON_5, ALTMON_6, ALTMON_7, > ALTMON_8, ALTMON_9, ALTMON_10, ALTMON_11, ALTMON_12, > _NL_WALTMON_1, _NL_WALTMON_2, _NL_WALTMON_3, _NL_WALTMON_4, > _NL_WALTMON_5, _NL_WALTMON_6, _NL_WALTMON_7, _NL_WALTMON_8, > _NL_WALTMON_9, _NL_WALTMON_10, _NL_WALTMON_11, _NL_WALTMON_12. > * locale/programs/ld-time.c: Alternative month names support > added, they are a copy of mon if not specified explicitly. > * locale/programs/locfile-kw.gperf: alt_mon defined. > * locale/programs/locfile-token.h: tok_alt_mon defined. > * localedata/tst-langinfo.c: Add tests for the new constants > ALTMON_1 .. ALTMON_12. > * time/Makefile (LOCALES): Add pl_PL.UTF-8 for tests. > * time/strftime_l.c: %OB format for alternative month names > added. > * time/strptime_l.c: Alternative month names also recognized. > * time/tst-strptime.c: Add tests to parse different forms of > month names including the new %OB format specifier. > --- > locale/C-time.c | 28 ++++++++++++++++++++-- > locale/categories.def | 2 ++ > locale/langinfo.h | 50 ++++++++++++++++++++++++++++++++++++++-- > locale/programs/ld-time.c | 21 +++++++++++++++++ > locale/programs/locfile-kw.gperf | 1 + > locale/programs/locfile-token.h | 1 + > localedata/tst-langinfo.c | 12 ++++++++++ > time/Makefile | 2 +- > time/strftime_l.c | 11 +++++++-- > time/strptime_l.c | 32 +++++++++++++++++++++---- > time/tst-strptime.c | 6 +++++ > 11 files changed, 155 insertions(+), 11 deletions(-) > > diff --git a/locale/C-time.c b/locale/C-time.c > index 1f1ee01..73bc700 100644 > --- a/locale/C-time.c > +++ b/locale/C-time.c > @@ -30,7 +30,7 @@ const struct __locale_data _nl_C_LC_TIME attribute_hidden = > { NULL, }, /* no cached data */ > UNDELETABLE, > 0, > - 111, > + 135, OK. > { > { .string = "Sun" }, > { .string = "Mon" }, > @@ -142,6 +142,30 @@ const struct __locale_data _nl_C_LC_TIME attribute_hidden = > { .string = "" }, > { .string = "%a %b %e %H:%M:%S %Z %Y" }, > { .wstr = (const uint32_t *) L"%a %b %e %H:%M:%S %Z %Y" }, > - { .string = _nl_C_codeset } > + { .string = _nl_C_codeset }, > + { .string = "January" }, > + { .string = "February" }, > + { .string = "March" }, > + { .string = "April" }, > + { .string = "May" }, > + { .string = "June" }, > + { .string = "July" }, > + { .string = "August" }, > + { .string = "September" }, > + { .string = "October" }, > + { .string = "November" }, > + { .string = "December" }, > + { .wstr = (const uint32_t *) L"January" }, > + { .wstr = (const uint32_t *) L"February" }, > + { .wstr = (const uint32_t *) L"March" }, > + { .wstr = (const uint32_t *) L"April" }, > + { .wstr = (const uint32_t *) L"May" }, > + { .wstr = (const uint32_t *) L"June" }, > + { .wstr = (const uint32_t *) L"July" }, > + { .wstr = (const uint32_t *) L"August" }, > + { .wstr = (const uint32_t *) L"September" }, > + { .wstr = (const uint32_t *) L"October" }, > + { .wstr = (const uint32_t *) L"November" }, > + { .wstr = (const uint32_t *) L"December" } OK. > } > }; > diff --git a/locale/categories.def b/locale/categories.def > index 47947f7..3cbb4e6 100644 > --- a/locale/categories.def > +++ b/locale/categories.def > @@ -249,6 +249,8 @@ DEFINE_CATEGORY > DEFINE_ELEMENT (_DATE_FMT, "date_fmt", opt, string) > DEFINE_ELEMENT (_NL_W_DATE_FMT, "wide-date_fmt", opt, > wstring) > DEFINE_ELEMENT (_NL_TIME_CODESET, "time-codeset", std, string) > + DEFINE_ELEMENT (ALTMON_1, "alt_mon", opt, stringarray, 12, 12) > + DEFINE_ELEMENT (_NL_WALTMON_1, "wide-alt_mon", opt, wstringarray, 12, 12) OK. > ), NO_POSTLOAD) > > > diff --git a/locale/langinfo.h b/locale/langinfo.h > index 28a0a73..0fbd838 100644 > --- a/locale/langinfo.h > +++ b/locale/langinfo.h > @@ -100,7 +100,8 @@ enum > ABMON_12, > #define ABMON_12 ABMON_12 > > - /* Long month names. */ > + /* Long month names, in the grammatical form used when the month > + forms part of a complete date. */ OK. > MON_1, /* January */ > #define MON_1 MON_1 > MON_2, > @@ -189,7 +190,8 @@ enum > _NL_WABMON_11, > _NL_WABMON_12, > > - /* Long month names. */ > + /* Long month names, in the grammatical form used when the month > + forms part of a complete date. */ OK. > _NL_WMON_1, /* January */ > _NL_WMON_2, > _NL_WMON_3, > @@ -231,6 +233,50 @@ enum > > _NL_TIME_CODESET, > > + /* Long month names, in the grammatical form used when the month > + is named by itself. */ > + __ALTMON_1, /* January */ > + __ALTMON_2, > + __ALTMON_3, > + __ALTMON_4, > + __ALTMON_5, > + __ALTMON_6, > + __ALTMON_7, > + __ALTMON_8, > + __ALTMON_9, > + __ALTMON_10, > + __ALTMON_11, > + __ALTMON_12, OK. > +#ifdef __USE_GNU > +# define ALTMON_1 __ALTMON_1 > +# define ALTMON_2 __ALTMON_2 > +# define ALTMON_3 __ALTMON_3 > +# define ALTMON_4 __ALTMON_4 > +# define ALTMON_5 __ALTMON_5 > +# define ALTMON_6 __ALTMON_6 > +# define ALTMON_7 __ALTMON_7 > +# define ALTMON_8 __ALTMON_8 > +# define ALTMON_9 __ALTMON_9 > +# define ALTMON_10 __ALTMON_10 > +# define ALTMON_11 __ALTMON_11 > +# define ALTMON_12 __ALTMON_12 OK. Requires _GNU_SOURCE, which is good because this is an extension not yet defined by POSIX. > +#endif > + > + /* Long month names, in the grammatical form used when the month > + is named by itself. */ > + _NL_WALTMON_1, /* January */ > + _NL_WALTMON_2, > + _NL_WALTMON_3, > + _NL_WALTMON_4, > + _NL_WALTMON_5, > + _NL_WALTMON_6, > + _NL_WALTMON_7, > + _NL_WALTMON_8, > + _NL_WALTMON_9, > + _NL_WALTMON_10, > + _NL_WALTMON_11, > + _NL_WALTMON_12, OK. > + > _NL_NUM_LC_TIME, /* Number of indices in LC_TIME category. */ > > /* LC_COLLATE category: text sorting. > diff --git a/locale/programs/ld-time.c b/locale/programs/ld-time.c > index 67d055a..4186448 100644 > --- a/locale/programs/ld-time.c > +++ b/locale/programs/ld-time.c > @@ -91,6 +91,9 @@ struct locale_time_t > const char *date_fmt; > const uint32_t *wdate_fmt; > int alt_digits_defined; > + const char *alt_mon[12]; > + const uint32_t *walt_mon[12]; > + int alt_mon_defined; OK. > unsigned char week_ndays; > uint32_t week_1stday; > unsigned char week_1stweek; > @@ -639,6 +642,15 @@ time_output (struct localedef_t *locale, const struct > charmap_t *charmap, > add_locale_string (&file, time->date_fmt); > add_locale_wstring (&file, time->wdate_fmt); > add_locale_string (&file, charmap->code_set_name); > + > + /* The alt'mons. */ > + for (n = 0; n < 12; ++n) > + add_locale_string (&file, time->alt_mon[n] ?: ""); > + > + /* The wide character alt'mons. */ > + for (n = 0; n < 12; ++n) > + add_locale_wstring (&file, time->walt_mon[n] ?: empty_wstr); OK. > + > write_locale_data (output_path, LC_TIME, "LC_TIME", &file); > } > > @@ -782,6 +794,7 @@ time_read (struct linereader *ldfile, struct localedef_t > *result, > STRARR_ELEM (mon, 12, 12); > STRARR_ELEM (am_pm, 2, 2); > STRARR_ELEM (alt_digits, 0, 100); > + STRARR_ELEM (alt_mon, 12, 12); OK. > > case tok_era: > /* Ignore the rest of the line if we don't need the input of > @@ -934,6 +947,14 @@ time_read (struct linereader *ldfile, struct localedef_t > *result, > lr_error (ldfile, _("\ > %1$s: definition does not end with `END %1$s'"), "LC_TIME"); > lr_ignore_rest (ldfile, now->tok == tok_lc_time); > + > + /* If alt_mon was not specified, make it a copy of mon. */ > + if (!ignore_content && !time->alt_mon_defined) > + { > + memcpy (time->alt_mon, time->mon, sizeof (time->mon)); > + memcpy (time->walt_mon, time->wmon, sizeof (time->wmon)); > + time->alt_mon_defined = 1; > + } OK. Good, fall back to %B. > return; > > default: > diff --git a/locale/programs/locfile-kw.gperf b/locale/programs/locfile-kw.gperf > index c74c1f2..dad7f21 100644 > --- a/locale/programs/locfile-kw.gperf > +++ b/locale/programs/locfile-kw.gperf > @@ -148,6 +148,7 @@ first_workday, tok_first_workday, 0 > cal_direction, tok_cal_direction, 0 > timezone, tok_timezone, 0 > date_fmt, tok_date_fmt, 0 > +alt_mon, tok_alt_mon, 0 OK. > LC_MESSAGES, tok_lc_messages, 0 > yesexpr, tok_yesexpr, 0 > noexpr, tok_noexpr, 0 > diff --git a/locale/programs/locfile-token.h b/locale/programs/locfile-token.h > index f02325d..d49da5e 100644 > --- a/locale/programs/locfile-token.h > +++ b/locale/programs/locfile-token.h > @@ -186,6 +186,7 @@ enum token_t > tok_cal_direction, > tok_timezone, > tok_date_fmt, > + tok_alt_mon, OK. > tok_lc_messages, > tok_yesexpr, > tok_noexpr, > diff --git a/localedata/tst-langinfo.c b/localedata/tst-langinfo.c > index 8c3667c..0d33e75 100644 > --- a/localedata/tst-langinfo.c > +++ b/localedata/tst-langinfo.c > @@ -50,6 +50,18 @@ struct map > VAL (ABMON_8), > VAL (ABMON_9), > VAL (ALT_DIGITS), > + VAL (ALTMON_1), > + VAL (ALTMON_10), > + VAL (ALTMON_11), > + VAL (ALTMON_12), > + VAL (ALTMON_2), > + VAL (ALTMON_3), > + VAL (ALTMON_4), > + VAL (ALTMON_5), > + VAL (ALTMON_6), > + VAL (ALTMON_7), > + VAL (ALTMON_8), > + VAL (ALTMON_9), OK. > VAL (AM_STR), > VAL (CRNCYSTR), > VAL (CURRENCY_SYMBOL), > diff --git a/time/Makefile b/time/Makefile > index 264eed9..91adcd0 100644 > --- a/time/Makefile > +++ b/time/Makefile > @@ -48,7 +48,7 @@ tests := test_time clocktest tst-posixtz tst-strptime > tst_wcsftime \ > include ../Rules > > ifeq ($(run-built-tests),yes) > -LOCALES := de_DE.ISO-8859-1 en_US.ISO-8859-1 ja_JP.EUC-JP > +LOCALES := de_DE.ISO-8859-1 en_US.ISO-8859-1 ja_JP.EUC-JP pl_PL.UTF-8 OK. > include ../gen-locales.mk > > $(objpfx)tst-ftime_l.out: $(gen-locales) > diff --git a/time/strftime_l.c b/time/strftime_l.c > index 18651ff..ac5d28f 100644 > --- a/time/strftime_l.c > +++ b/time/strftime_l.c > @@ -492,6 +492,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T > *format, > # define f_month \ > ((const CHAR_T *) (tp->tm_mon < 0 || tp->tm_mon > 11 \ > ? "?" : _NL_CURRENT (LC_TIME, NLW(MON_1) + tp->tm_mon))) > +# define f_altmonth \ > + ((const CHAR_T *) (tp->tm_mon < 0 || tp->tm_mon > 11 \ > + ? "?" : _NL_CURRENT (LC_TIME, NLW(ALTMON_1) + tp->tm_mon))) OK. > # define ampm \ > ((const CHAR_T *) _NL_CURRENT (LC_TIME, tp->tm_hour > 11 \ > ? NLW(PM_STR) : NLW(AM_STR))) > @@ -507,6 +510,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T > *format, > ? "?" : month_name[tp->tm_mon]) > # define a_wkday f_wkday > # define a_month f_month > +# define f_altmonth f_month OK. > # define ampm (L_("AMPM") + 2 * (tp->tm_hour > 11)) > > size_t aw_len = 3; > @@ -785,7 +789,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T > *format, > #endif > > case L_('B'): > - if (modifier != 0) > + if (modifier == L_('E')) > goto bad_format; OK. Accept %O but not %E. > if (change_case) > { > @@ -793,7 +797,10 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const > CHAR_T *format, > to_lowcase = 0; > } > #if defined _NL_CURRENT || !HAVE_STRFTIME > - cpy (STRLEN (f_month), f_month); > + if (modifier == L_('O')) > + cpy (STRLEN (f_altmonth), f_altmonth); OK. > + else > + cpy (STRLEN (f_month), f_month); > break; > #else > goto underlying_strftime; > diff --git a/time/strptime_l.c b/time/strptime_l.c > index 7d4758e..39cf38d 100644 > --- a/time/strptime_l.c > +++ b/time/strptime_l.c > @@ -124,6 +124,8 @@ extern const struct __locale_data _nl_C_LC_TIME > attribute_hidden; > (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ABDAY_1)].string) > # define month_name (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (MON_1)].string) > # define ab_month_name (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ABMON_1)].string) > +# define alt_month_name \ > + (&_nl_C_LC_TIME.values[_NL_ITEM_INDEX (ALTMON_1)].string) OK. > # define HERE_D_T_FMT (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (D_T_FMT)].string) > # define HERE_D_FMT (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (D_FMT)].string) > # define HERE_AM_STR (_nl_C_LC_TIME.values[_NL_ITEM_INDEX (AM_STR)].string) > @@ -319,10 +321,9 @@ __strptime_internal (const char *rp, const char *fmt, > struct tm *tmp, > while (*fmt >= '0' && *fmt <= '9') > ++fmt; > > -#ifndef _NL_CURRENT > - /* We need this for handling the `E' modifier. */ > + /* In some cases, modifiers are handled by adjusting state and > + then restarting the switch statement below. */ OK. > start_over: > -#endif > > /* Make back up of current processing pointer. */ > rp_backup = rp; > @@ -423,13 +424,32 @@ __strptime_internal (const char *rp, const char *fmt, > struct tm *tmp, > ab_month_name[cnt])) > decided_longest = loc; > } > +#ifdef _LIBC > + /* Now check the alt month. */ > + trp = rp; > + if (match_string (_NL_CURRENT (LC_TIME, ALTMON_1 + cnt), trp) > + && trp > rp_longest) > + { > + rp_longest = trp; > + cnt_longest = cnt; > + if (s.decided == not > + && strcmp (_NL_CURRENT (LC_TIME, ALTMON_1 + cnt), > + alt_month_name[cnt])) > + decided_longest = loc; > + } > +#endif OK. Look for alternative month name. > } > #endif > if (s.decided != loc > && (((trp = rp, match_string (month_name[cnt], trp)) > && trp > rp_longest) > || ((trp = rp, match_string (ab_month_name[cnt], trp)) > - && trp > rp_longest))) > + && trp > rp_longest) > +#ifdef _LIBC > + || ((trp = rp, match_string (alt_month_name[cnt], trp)) > + && trp > rp_longest) OK. > +#endif > + )) > { > rp_longest = trp; > cnt_longest = cnt; > @@ -1015,6 +1035,10 @@ __strptime_internal (const char *rp, const char *fmt, > struct tm *tmp, > case 'O': > switch (*fmt++) > { > + case 'B': > + /* Match month name. Reprocess as plain 'B'. */ > + fmt--; > + goto start_over; OK. > case 'd': > case 'e': > /* Match day of month using alternate numeric symbols. */ > diff --git a/time/tst-strptime.c b/time/tst-strptime.c > index 34ad797..bbc1390 100644 > --- a/time/tst-strptime.c > +++ b/time/tst-strptime.c > @@ -51,6 +51,12 @@ static const struct > 6, 0, 0, 1 }, > { "ja_JP.EUC-JP", "2001 20 \xb7\xee", "%Y %U %a", 1, 140, 4, 21 }, > { "ja_JP.EUC-JP", "2001 21 \xb7\xee", "%Y %W %a", 1, 140, 4, 21 }, > + { "pl_PL.UTF-8", "21 lis 2017", "%d %b %Y", 2, 324, 10, 21 }, > + { "pl_PL.UTF-8", "22 LIS 2017", "%d %B %Y", 3, 325, 10, 22 }, > + /* TODO: Use the genitive case here as soon as it is added to localedata. */ > + { "pl_PL.UTF-8", "23 listopad 2017", "%d %B %Y", 4, 326, 10, 23 }, > + /* The nominative case is incorrect here but it is parseable. */ > + { "pl_PL.UTF-8", "24 listopad 2017", "%d %OB %Y", 5, 327, 10, 24 }, Here we need examples of %OB for langauges that do *not* have genitive names in order to test the conversion of %OB for locales that would have this definition missing (tests that %OB does fall back to %B). > }; > > > -- Cheers, Carlos.