public inbox for libc-alpha@sourceware.org
* [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard
@ 2016-04-13 16:39 Mike Frysinger
  2016-04-13 18:50 ` Chris Leonard
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mike Frysinger @ 2016-04-13 16:39 UTC (permalink / raw)
  To: libc-alpha


[-- Attachment #1.1: Type: text/plain, Size: 1505 bytes --]

The ISO 14652 standard defines the valid values for the category
keyword as only two options:
	posix:1993
	i18n:2002

The vast majority of locales had changed the "i18n" string to the
name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
date (presumably thinking it should be the date of submission).

Convert all of them to "i18n:2002" for consistency.
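To make the "name:year" shape of these values concrete, here is a minimal sketch that splits a category value at the colon. The helper name split_category_value and its buffer-based interface are assumptions made for illustration; nothing like it is part of localedef or of the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of glibc: split a category value of the
   form "name:year" into its two halves.  Returns false if the value has
   no colon, has an empty name, has a name too long for the buffer, or
   has a year that strtol cannot consume completely.  */
static bool
split_category_value (const char *value, char *name, size_t name_size,
                      long *year)
{
  assert (value != NULL && name != NULL && year != NULL);

  const char *colon = strchr (value, ':');
  if (colon == NULL || colon == value)
    return false;

  /* Copy the part before the colon, e.g. "i18n" or "ak_GH".  */
  size_t len = (size_t) (colon - value);
  if (len >= name_size)
    return false;
  memcpy (name, value, len);
  name[len] = '\0';

  /* Parse the part after the colon as a decimal year.  */
  char *end;
  *year = strtol (colon + 1, &end, 10);
  return end != colon + 1 && *end == '\0';
}
```

Under this sketch "ak_GH:2013" splits cleanly into "ak_GH" and 2013, which shows why such values went unnoticed: they are well formed, but the name half is a locale name rather than "posix" or "i18n".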

Compressed + attached due to size.  Example change:
--- a/localedata/locales/ak_GH
+++ b/localedata/locales/ak_GH
@@ -37,19 +37,19 @@ language     "Akan"
 territory    "Ghana"
 revision     "1.0"
 date         "2013-08-24"
-%
-category  "ak_GH:2013";LC_IDENTIFICATION
-category  "ak_GH:2013";LC_CTYPE
-category  "ak_GH:2013";LC_COLLATE
-category  "ak_GH:2013";LC_TIME
-category  "ak_GH:2013";LC_NUMERIC
-category  "ak_GH:2013";LC_MONETARY
-category  "ak_GH:2013";LC_PAPER
-category  "ak_GH:2013";LC_MEASUREMENT
-category  "ak_GH:2013";LC_MESSAGES
-category  "ak_GH:2013";LC_NAME
-category  "ak_GH:2013";LC_ADDRESS
-category  "ak_GH:2013";LC_TELEPHONE
+
+category "i18n:2002";LC_IDENTIFICATION
+category "i18n:2002";LC_CTYPE
+category "i18n:2002";LC_COLLATE
+category "i18n:2002";LC_TIME
+category "i18n:2002";LC_NUMERIC
+category "i18n:2002";LC_MONETARY
+category "i18n:2002";LC_PAPER
+category "i18n:2002";LC_MEASUREMENT
+category "i18n:2002";LC_MESSAGES
+category "i18n:2002";LC_NAME
+category "i18n:2002";LC_ADDRESS
+category "i18n:2002";LC_TELEPHONE
 END LC_IDENTIFICATION
 
 LC_CTYPE

[-- Attachment #1.2: 0001-localedate-LC_IDENTIFICATION.category-set-to-ISO-146.patch.xz --]
[-- Type: application/x-xz, Size: 21452 bytes --]

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard
  2016-04-13 16:39 [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard Mike Frysinger
@ 2016-04-13 18:50 ` Chris Leonard
  2016-04-13 18:57 ` Carlos O'Donell
  2016-04-14 21:18 ` [PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard Mike Frysinger
  2 siblings, 0 replies; 15+ messages in thread
From: Chris Leonard @ 2016-04-13 18:50 UTC (permalink / raw)
  To: libc-alpha

 +1 from me FWIW

cjl

On Wed, Apr 13, 2016 at 12:39 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> The ISO 14652 standard defines the valid values for the category
> keyword as only two options:
>         posix:1993
>         i18n:2002
>
> The vast majority of locales had changed the "i18n" string to the
> name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> date (presumably thinking it should be the date of submission).
>
> Convert all of them to "i18n:2002" for consistency.
>
> Compressed + attached due to size.  Example change:
> --- a/localedata/locales/ak_GH
> +++ b/localedata/locales/ak_GH
> @@ -37,19 +37,19 @@ language     "Akan"
>  territory    "Ghana"
>  revision     "1.0"
>  date         "2013-08-24"
> -%
> -category  "ak_GH:2013";LC_IDENTIFICATION
> -category  "ak_GH:2013";LC_CTYPE
> -category  "ak_GH:2013";LC_COLLATE
> -category  "ak_GH:2013";LC_TIME
> -category  "ak_GH:2013";LC_NUMERIC
> -category  "ak_GH:2013";LC_MONETARY
> -category  "ak_GH:2013";LC_PAPER
> -category  "ak_GH:2013";LC_MEASUREMENT
> -category  "ak_GH:2013";LC_MESSAGES
> -category  "ak_GH:2013";LC_NAME
> -category  "ak_GH:2013";LC_ADDRESS
> -category  "ak_GH:2013";LC_TELEPHONE
> +
> +category "i18n:2002";LC_IDENTIFICATION
> +category "i18n:2002";LC_CTYPE
> +category "i18n:2002";LC_COLLATE
> +category "i18n:2002";LC_TIME
> +category "i18n:2002";LC_NUMERIC
> +category "i18n:2002";LC_MONETARY
> +category "i18n:2002";LC_PAPER
> +category "i18n:2002";LC_MEASUREMENT
> +category "i18n:2002";LC_MESSAGES
> +category "i18n:2002";LC_NAME
> +category "i18n:2002";LC_ADDRESS
> +category "i18n:2002";LC_TELEPHONE
>  END LC_IDENTIFICATION
>
>  LC_CTYPE


* Re: [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard
  2016-04-13 16:39 [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard Mike Frysinger
  2016-04-13 18:50 ` Chris Leonard
@ 2016-04-13 18:57 ` Carlos O'Donell
  2016-04-13 20:05   ` Mike Frysinger
  2016-04-13 22:45   ` [PATCH] localedef: check LC_IDENTIFICATION.category values Mike Frysinger
  2016-04-14 21:18 ` [PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard Mike Frysinger
  2 siblings, 2 replies; 15+ messages in thread
From: Carlos O'Donell @ 2016-04-13 18:57 UTC (permalink / raw)
  To: libc-alpha

On 04/13/2016 12:39 PM, Mike Frysinger wrote:
> The ISO 14652 standard defines the valid values for the category
> keyword as only two options:
> 	posix:1993
> 	i18n:2002
> 
> The vast majority of locales had changed the "i18n" string to the
> name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> date (presumably thinking it should be the date of submission).
> 
> Convert all of them to "i18n:2002" for consistency.

Any chance you can tighten the parser to reject anything but the
two valid category keywords?

I think this change is correct, but I'd rather see a patch that
enforces policy *and* changes the locale source to match.

-- 
Cheers,
Carlos.


* Re: [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard
  2016-04-13 18:57 ` Carlos O'Donell
@ 2016-04-13 20:05   ` Mike Frysinger
  2016-04-13 22:45   ` [PATCH] localedef: check LC_IDENTIFICATION.category values Mike Frysinger
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Frysinger @ 2016-04-13 20:05 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha, keld

[-- Attachment #1: Type: text/plain, Size: 1530 bytes --]

On 13 Apr 2016 14:57, Carlos O'Donell wrote:
> On 04/13/2016 12:39 PM, Mike Frysinger wrote:
> > The ISO 14652 standard defines the valid values for the category
> > keyword as only two options:
> > 	posix:1993
> > 	i18n:2002
> > 
> > The vast majority of locales had changed the "i18n" string to the
> > name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> > date (presumably thinking it should be the date of submission).
> > 
> > Convert all of them to "i18n:2002" for consistency.
> 
> Any chance you can tighten the parser to reject anything but the
> two valid category keywords?

i figured someone would ask for that eventually :).  it's not clear to
me how many valid values there are because the ISO 14652 standard is
difficult to obtain.  i've only been able to find 1999 and 2002 copies,
but i'm pretty sure there's other revisions as well.  maybe we start
off only accepting these two values and worry about the rest later ?

the other aspect is that, while we might validate some sanity on the
category fields in general, the code (afaict) is not structured for
actually handling the differences.  for example, if the locale says
posix:1993 or i18n:1999 (which the older ISO 14652 1999 standard
allows), we don't change the parsing behavior to reject features
that are new to i18n:2002.

i guess one thing at a time: let's update localedef to only accept
these two values and reject all others.  i'll look at that before
merging this patch in case it's easy to do.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* [PATCH] localedef: check LC_IDENTIFICATION.category values
  2016-04-13 18:57 ` Carlos O'Donell
  2016-04-13 20:05   ` Mike Frysinger
@ 2016-04-13 22:45   ` Mike Frysinger
  2016-04-14  8:59     ` keld
  2016-04-14 21:21     ` [PATCH v2] " Mike Frysinger
  1 sibling, 2 replies; 15+ messages in thread
From: Mike Frysinger @ 2016-04-13 22:45 UTC (permalink / raw)
  To: libc-alpha; +Cc: carlos, keld

Currently localedef accepts any value for the category keyword.  This has
allowed bad values to propagate to the vast majority of locales (~90%).
Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.

2016-04-13  Mike Frysinger  <vapier@gentoo.org>

	* locale/programs/ld-identification.c (identification_finish): Check
	that the values in identification->category are only posix:1993 or
	i18n:2002.
---
 locale/programs/ld-identification.c | 42 ++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
index 1e8fa84..eccb388 100644
--- a/locale/programs/ld-identification.c
+++ b/locale/programs/ld-identification.c
@@ -164,14 +164,42 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
   TEST_ELEM (date);
 
   for (num = 0; num < __LC_LAST; ++num)
-    if (num != LC_ALL && identification->category[num] == NULL)
-      {
-	if (verbose && ! nothing)
-	  WITH_CUR_LOCALE (error (0, 0, _("\
+    {
+      /* We don't accept/parse this category, so skip it early.  */
+      if (num == LC_ALL)
+	continue;
+
+      if (identification->category[num] == NULL)
+	{
+	  if (verbose && ! nothing)
+	    WITH_CUR_LOCALE (error (0, 0, _("\
 %s: no identification for category `%s'"),
-				  "LC_IDENTIFICATION", category_name[num]));
-	identification->category[num] = "";
-      }
+				    "LC_IDENTIFICATION", category_name[num]));
+	  identification->category[num] = "";
+	}
+      else
+	{
+	  /* Only list the standards we care about.  */
+	  static const char * const standards[] =
+	    {
+	      "posix:1993",
+	      "i18n:2002",
+	    };
+	  size_t i;
+	  bool matched = false;
+
+	  for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
+	    if (strcmp (identification->category[num], standards[i]) == 0)
+	      matched = true;
+
+	  if (matched != true)
+	    WITH_CUR_LOCALE (error (0, 0, _("\
+%s: unknown standard `%s' for category `%s'"),
+				    "LC_IDENTIFICATION",
+				    identification->category[num],
+				    category_name[num]));
+	}
+    }
 }
 
 
-- 
2.7.4
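
For readers who want to try the matching logic outside of localedef, the check added above can be sketched as a standalone function. The names known_standards and is_known_standard are assumptions for illustration; the patch itself keeps the table and loop inline in identification_finish and reports mismatches through error ().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The two values this version of the patch accepts.  */
static const char *const known_standards[] =
  {
    "posix:1993",
    "i18n:2002",
  };

/* Hypothetical standalone variant of the loop in the patch: return true
   if VALUE exactly matches one of the accepted standards.  Unlike the
   patch, this returns as soon as a match is found.  */
static bool
is_known_standard (const char *value)
{
  assert (value != NULL);

  for (size_t i = 0;
       i < sizeof (known_standards) / sizeof (known_standards[0]); ++i)
    if (strcmp (value, known_standards[i]) == 0)
      return true;

  return false;
}
```

The comparison is exact, so a locale-named value such as "ak_GH:2013" is rejected. The empty string is rejected here too, whereas the patch only runs the check on categories that were actually given a value, so the "" placeholder assigned to missing ones is never tested.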


* Re: [PATCH] localedef: check LC_IDENTIFICATION.category values
  2016-04-13 22:45   ` [PATCH] localedef: check LC_IDENTIFICATION.category values Mike Frysinger
@ 2016-04-14  8:59     ` keld
  2016-04-14  9:26       ` keld
  2016-04-14 21:21     ` [PATCH v2] " Mike Frysinger
  1 sibling, 1 reply; 15+ messages in thread
From: keld @ 2016-04-14  8:59 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: libc-alpha, carlos

Please also allow ISO 30112 categories.

best regards
keld

On Wed, Apr 13, 2016 at 06:45:32PM -0400, Mike Frysinger wrote:
> Currently localedef accepts any value for the category keyword.  This has
> allowed bad values to propagate to the vast majority of locales (~90%).
> Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.
> 
> 2016-04-13  Mike Frysinger  <vapier@gentoo.org>
> 
> 	* locale/programs/ld-identification.c (identification_finish): Check
> 	that the values in identification->category are only posix:1993 or
> 	i18n:2002.
> ---
>  locale/programs/ld-identification.c | 42 ++++++++++++++++++++++++++++++-------
>  1 file changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
> index 1e8fa84..eccb388 100644
> --- a/locale/programs/ld-identification.c
> +++ b/locale/programs/ld-identification.c
> @@ -164,14 +164,42 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
>    TEST_ELEM (date);
>  
>    for (num = 0; num < __LC_LAST; ++num)
> -    if (num != LC_ALL && identification->category[num] == NULL)
> -      {
> -	if (verbose && ! nothing)
> -	  WITH_CUR_LOCALE (error (0, 0, _("\
> +    {
> +      /* We don't accept/parse this category, so skip it early.  */
> +      if (num == LC_ALL)
> +	continue;
> +
> +      if (identification->category[num] == NULL)
> +	{
> +	  if (verbose && ! nothing)
> +	    WITH_CUR_LOCALE (error (0, 0, _("\
>  %s: no identification for category `%s'"),
> -				  "LC_IDENTIFICATION", category_name[num]));
> -	identification->category[num] = "";
> -      }
> +				    "LC_IDENTIFICATION", category_name[num]));
> +	  identification->category[num] = "";
> +	}
> +      else
> +	{
> +	  /* Only list the standards we care about.  */
> +	  static const char * const standards[] =
> +	    {
> +	      "posix:1993",
> +	      "i18n:2002",
> +	    };
> +	  size_t i;
> +	  bool matched = false;
> +
> +	  for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
> +	    if (strcmp (identification->category[num], standards[i]) == 0)
> +	      matched = true;
> +
> +	  if (matched != true)
> +	    WITH_CUR_LOCALE (error (0, 0, _("\
> +%s: unknown standard `%s' for category `%s'"),
> +				    "LC_IDENTIFICATION",
> +				    identification->category[num],
> +				    category_name[num]));
> +	}
> +    }
>  }
>  
>  
> -- 
> 2.7.4


* Re: [PATCH] localedef: check LC_IDENTIFICATION.category values
  2016-04-14  8:59     ` keld
@ 2016-04-14  9:26       ` keld
  2016-04-14 13:50         ` Mike Frysinger
  0 siblings, 1 reply; 15+ messages in thread
From: keld @ 2016-04-14  9:26 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: libc-alpha, carlos

Actually the standards 14652/30112 were set up so you could declare
what version of the locale category was used for the data.
POSIX is different from 14652 and again different from 30112.
30112 is the one that most closely corresponds to glibc implementations.


I also think that POSIX allows for more categories than the ones that the
9945 standard defines, and in that way 14652 and 30112 are compatible 
with POSIX. I would advise that this still be allowed, but then declared
in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
"non-standard" to indicate that.

I would advise using the values for the locale versions
given in 30112. The values defined in 30112 are:
i18n:2004
i18n:2012
posix:1993

Best regards
Keld


On Thu, Apr 14, 2016 at 10:59:19AM +0200, keld@keldix.com wrote:
> Please also allow ISO 30112 categories.
> 
> best regards
> keld
> 
> On Wed, Apr 13, 2016 at 06:45:32PM -0400, Mike Frysinger wrote:
> > Currently localedef accepts any value for the category keyword.  This has
> > allowed bad values to propagate to the vast majority of locales (~90%).
> > Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.
> > 
> > 2016-04-13  Mike Frysinger  <vapier@gentoo.org>
> > 
> > 	* locale/programs/ld-identification.c (identification_finish): Check
> > 	that the values in identification->category are only posix:1993 or
> > 	i18n:2002.
> > ---
> >  locale/programs/ld-identification.c | 42 ++++++++++++++++++++++++++++++-------
> >  1 file changed, 35 insertions(+), 7 deletions(-)
> > 
> > diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
> > index 1e8fa84..eccb388 100644
> > --- a/locale/programs/ld-identification.c
> > +++ b/locale/programs/ld-identification.c
> > @@ -164,14 +164,42 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
> >    TEST_ELEM (date);
> >  
> >    for (num = 0; num < __LC_LAST; ++num)
> > -    if (num != LC_ALL && identification->category[num] == NULL)
> > -      {
> > -	if (verbose && ! nothing)
> > -	  WITH_CUR_LOCALE (error (0, 0, _("\
> > +    {
> > +      /* We don't accept/parse this category, so skip it early.  */
> > +      if (num == LC_ALL)
> > +	continue;
> > +
> > +      if (identification->category[num] == NULL)
> > +	{
> > +	  if (verbose && ! nothing)
> > +	    WITH_CUR_LOCALE (error (0, 0, _("\
> >  %s: no identification for category `%s'"),
> > -				  "LC_IDENTIFICATION", category_name[num]));
> > -	identification->category[num] = "";
> > -      }
> > +				    "LC_IDENTIFICATION", category_name[num]));
> > +	  identification->category[num] = "";
> > +	}
> > +      else
> > +	{
> > +	  /* Only list the standards we care about.  */
> > +	  static const char * const standards[] =
> > +	    {
> > +	      "posix:1993",
> > +	      "i18n:2002",
> > +	    };
> > +	  size_t i;
> > +	  bool matched = false;
> > +
> > +	  for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
> > +	    if (strcmp (identification->category[num], standards[i]) == 0)
> > +	      matched = true;
> > +
> > +	  if (matched != true)
> > +	    WITH_CUR_LOCALE (error (0, 0, _("\
> > +%s: unknown standard `%s' for category `%s'"),
> > +				    "LC_IDENTIFICATION",
> > +				    identification->category[num],
> > +				    category_name[num]));
> > +	}
> > +    }
> >  }
> >  
> >  
> > -- 
> > 2.7.4


* Re: [PATCH] localedef: check LC_IDENTIFICATION.category values
  2016-04-14  9:26       ` keld
@ 2016-04-14 13:50         ` Mike Frysinger
  2016-04-14 15:04           ` keld
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2016-04-14 13:50 UTC (permalink / raw)
  To: keld; +Cc: libc-alpha, carlos

[-- Attachment #1: Type: text/plain, Size: 2607 bytes --]

On 14 Apr 2016 11:26, keld@keldix.com wrote:
> Actually the standards 14652/30112 were set up so you could declare
> what version of the locale category was used for the data.
> POSIX is different from 14652 and again different from 30112.
> 30112 is the one that most closely corresponds to glibc implementations.

in general, for standards that are stuck behind ISO's dumb paywall (they
want to charge CHF198 for the pleasure of downloading what should be in
the public), you'll have to tell me what values to plug in, and/or what
it says.

although i have found this link:
	http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
is that the same ?

if it is, i would highlight that the examples provided in the spec do
not seem to line up with the spec itself ;).  the Danish example that
is embedded in the file tries to use "i18n:2000", and it doesn't use
double quotes like it says it should be.

> I also think that POSIX allows for more categories than the ones that the
> 9945 standard defines, and in that way 14652 and 30112 are compatible 

looks like ISO 9945 is just the combined POSIX standard (2003 edition).
the public 2004 edition [1] and 2013 edition [2] do not define the cat
LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
even if those allow for defining of arbitrary categories, that's kind
of orthogonal to glibc's localedef needs isn't it ?  the utility has
been rejecting all unknown categories for basically ever at this point.
[1] http://pubs.opengroup.org/onlinepubs/009695399/
[2] http://pubs.opengroup.org/onlinepubs/9699919799/

if you try to do:
LC_FOO
...
END LC_FOO
localedef will reject it as a syntax error.

if you try to do:
LC_IDENTIFICATION
...
category "en_US:2000";LC_FOO
...
END LC_IDENTIFICATION
localedef will reject it as a syntax error (ignoring the standard part).

are you referring to something else ?

> with POSIX. I would advise that this still be allowed, but then declared
> in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
> "non-standard" to indicate that.

why do we need to support that ?  we're talking about what localedef
will accept, and localedef is entirely a glibc-specific utility.  the
binary format it produces is internal glibc ABI.  seems like accepting
other random values isn't useful to us.

> I would advise using the values for the locale versions
> given in 30112. The values defined in 30112 are:
> i18n:2004
> i18n:2012
> posix:1993

OK.  shall i update all the locale files then to use i18n:2012 ?
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH] localedef: check LC_IDENTIFICATION.category values
  2016-04-14 13:50         ` Mike Frysinger
@ 2016-04-14 15:04           ` keld
  2016-04-14 17:49             ` Mike Frysinger
  0 siblings, 1 reply; 15+ messages in thread
From: keld @ 2016-04-14 15:04 UTC (permalink / raw)
  To: libc-alpha, carlos

On Thu, Apr 14, 2016 at 09:50:33AM -0400, Mike Frysinger wrote:
> On 14 Apr 2016 11:26, keld@keldix.com wrote:
> > Actually the standards 14652/30112 were set up so you could declare
> > what version of the locale category was used for the data.
> > POSIX is different from 14652 and again different from 30112.
> > 30112 is the one that most closely corresponds to glibc implementations.
> 
> in general, for standards that are stuck behind ISO's dumb paywall (they
> want to charge CHF198 for the pleasure of downloading what should be in
> the public), you'll have to tell me what values to plug in, and/or what
> it says.

I agree.

> although i have found this link:
> 	http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
> is that the same ?

It is a new Working Draft for the revision of 30112, so it contains all of
the approved TR 30112 from 2014, plus some. But it is not a standard,
it is a work in progress. That is why we are allowed to make it publicly available.

> if it is, i would highlight that the examples provided in the spec do
> not seem to line up with the spec itself ;).  the Danish example that
> is embedded in the file tries to use "i18n:2000", and it doesn't use
> double quotes like it says it should be.

There are errors everywhere. This is a draft, and not supposed to be error-free.
Anyway, the same inconsistency was probably in the approved TR.
I will see to it that this is corrected. Probably it should be marked with
the new standard's identifying value.

> > I also think that POSIX allows for more categories than the ones that the
> > 9945 standard defines, and in that way 14652 and 30112 are compatible 
> 
> looks like ISO 9945 is just the combined POSIX standard (2003 edition).
> the public 2004 edition [1] and 2013 edition [2] do not define the cat
> LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
> even if those allow for defining of arbitrary categories, that's kind
> of orthogonal to glibc's localedef needs isn't it ?  the utility has
> been rejecting all unknown categories for basically ever at this point.
> [1] http://pubs.opengroup.org/onlinepubs/009695399/
> [2] http://pubs.opengroup.org/onlinepubs/9699919799/

Well, yes, LC_IDENTIFICATION is a novelty of 14652. 
But 9945 - POSIX does allow implementation defined categories AFAIK.
There is one new category in 30112, namely LC_KEYBOARD. I am not sure whether
glibc supports LC_XLITERATE either, or whether the functionality is present only in
LC_CTYPE.

> 
> if you try to do:
> LC_FOO
> ...
> END LC_FOO
> localedef will reject it as a syntax error.
> 
> if you try to do:
> LC_IDENTIFICATION
> ...
> category "en_US:2000";LC_FOO
> ...
> END LC_IDENTIFICATION
> localedef will reject it as a syntax error (ignoring the standard part).
> 
> are you referring to something else ?

No. I would like your last example to not error, it could issue a warning,
or at least that LC_KEYBOARD be accepted. 
In that way one could use localedef to test new functionality.

> > with POSIX. I would advise that this still be allowed, but then declared
> > in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
> > "non-standard" to indicate that.
> 
> why do we need to support that ?  we're talking about what localedef
> will accept, and localedef is entirely a glibc-specific utility.  the
> binary format it produces is internal glibc ABI.  seems like accepting
> other random values isn't useful to us.

Localedef is specified in POSIX, 
http://pubs.opengroup.org/onlinepubs/009696699/utilities/localedef.html

> > I would advise using the values for the locale versions
> > given in 30112. The values defined in 30112 are:
> > i18n:2004
> > i18n:2012
> > posix:1993
> 
> OK.  shall i update all the locale files then to use i18n:2012 ?

Yes, I think that this is the most appropriate.

Best regards
Keld



* Re: [PATCH] localedef: check LC_IDENTIFICATION.category values
  2016-04-14 15:04           ` keld
@ 2016-04-14 17:49             ` Mike Frysinger
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Frysinger @ 2016-04-14 17:49 UTC (permalink / raw)
  To: keld; +Cc: libc-alpha, carlos

[-- Attachment #1: Type: text/plain, Size: 4026 bytes --]

On 14 Apr 2016 17:04, keld@keldix.com wrote:
> On Thu, Apr 14, 2016 at 09:50:33AM -0400, Mike Frysinger wrote:
> > On 14 Apr 2016 11:26, keld@keldix.com wrote:
> > > I also think that POSIX allows for more categories than the ones that the
> > > 9945 standard defines, and in that way 14652 and 30112 are compatible 
> > 
> > looks like ISO 9945 is just the combined POSIX standard (2003 edition).
> > the public 2004 edition [1] and 2013 edition [2] do not define the cat
> > LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
> > even if those allow for defining of arbitrary categories, that's kind
> > of orthogonal to glibc's localedef needs isn't it ?  the utility has
> > been rejecting all unknown categories for basically ever at this point.
> > [1] http://pubs.opengroup.org/onlinepubs/009695399/
> > [2] http://pubs.opengroup.org/onlinepubs/9699919799/
> 
> Well, yes, LC_IDENTIFICATION is a novelty of 14652. 
> But 9945 - POSIX does allow implementation defined categories AFAIK.

sure -- see below

> There is one new category in 30112, namely LC_KEYBOARD. I am not sure whether
> glibc supports LC_XLITERATE either, or whether the functionality is present only in
> LC_CTYPE.

we don't support LC_KEYBOARD or LC_XLITERATE today.  i think any new
categories would need to be proposed including why glibc should carry
them at all.  i haven't read the standard, so i can't speak to either.

> > if you try to do:
> > LC_FOO
> > ...
> > END LC_FOO
> > localedef will reject it as a syntax error.
> > 
> > if you try to do:
> > LC_IDENTIFICATION
> > ...
> > category "en_US:2000";LC_FOO
> > ...
> > END LC_IDENTIFICATION
> > localedef will reject it as a syntax error (ignoring the standard part).
> > 
> > are you referring to something else ?
> 
> No. I would like your last example to not error, it could issue a warning,
> or at least that LC_KEYBOARD be accepted. 
> In that way one could use localedef to test new functionality.

we can have it warn.  localedef has precedent w/not warning about many
things or being fatal by default, but adding -v makes it more strict.
this seems to fall into that bucket.

i'm not keen on -v/--verbose being a hidden alias to also "exit non-zero
in many more cases", but that's a diff topic :).

> > > with POSIX. I would advise that this still be allowed, but then declared
> > > in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
> > > "non-standard" to indicate that.
> > 
> > why do we need to support that ?  we're talking about what localedef
> > will accept, and localedef is entirely a glibc-specific utility.  the
> > binary format it produces is internal glibc ABI.  seems like accepting
> > other random values isn't useful to us.
> 
> Localedef is specified in POSIX, 
> http://pubs.opengroup.org/onlinepubs/009696699/utilities/localedef.html

on the frontend sure.  i was thinking of its output format which is not
specified by POSIX but is an internal glibc ABI detail.  it even says:
	The localedef utility shall convert source definitions for locale
	categories into a format usable by the functions and utilities ...
i.e. it doesn't specify that output format.

back to the frontend, what POSIX specifically says is:
	In addition, the input may contain source for implementation-defined
	categories.

so glibc's localedef is free to support as many or as few categories as
it sees fit.  that includes outright rejecting unknown ones.

also, if we want to speak strictly about POSIX, it also says:
	-u  code_set_name
	Specify the name of a codeset used as the target mapping of character
	symbols and collating element symbols whose encoding values are defined
	in terms of the ISO/IEC 10646-1:2000 standard position constant values.

pretty sure that says we aren't even permitted to support a newer standard
there.  whether it matters in practice i'm not sure (haven't done a diff on
the diff versions/standards).
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* [PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard
  2016-04-13 16:39 [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard Mike Frysinger
  2016-04-13 18:50 ` Chris Leonard
  2016-04-13 18:57 ` Carlos O'Donell
@ 2016-04-14 21:18 ` Mike Frysinger
  2016-04-15  4:44   ` Carlos O'Donell
  2 siblings, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2016-04-14 21:18 UTC (permalink / raw)
  To: libc-alpha


[-- Attachment #1.1: Type: text/plain, Size: 1586 bytes --]

The ISO 30112 standard defines the valid values for the category
keyword as only a few options:
	posix:1993
	i18n:2004
	i18n:2012

The vast majority of locales had changed the "i18n" string to the
name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
date (presumably thinking it should be the date of submission).

Convert all of them to "i18n:2012" for consistency.  A follow up
change will update localedef to actually check/validate the field.

Compressed for size.  Sample change:
--- a/localedata/locales/kk_KZ
+++ b/localedata/locales/kk_KZ
@@ -35,19 +35,19 @@ language   "Kazakh"
 territory  "Kazakhstan"
 revision   "1.0"
 date       "2003-06-06"
-%
-category  "kk_KZ:2000";LC_IDENTIFICATION
-category  "kk_KZ:2000";LC_CTYPE
-category  "kk_KZ:2000";LC_COLLATE
-category  "kk_KZ:2000";LC_TIME
-category  "kk_KZ:2000";LC_NUMERIC
-category  "kk_KZ:2000";LC_MONETARY
-category  "kk_KZ:2000";LC_MESSAGES
-category  "kk_KZ:2000";LC_PAPER
-category  "kk_KZ:2000";LC_NAME
-category  "kk_KZ:2000";LC_ADDRESS
-category  "kk_KZ:2000";LC_TELEPHONE
-category  "kk_KZ:2000";LC_MEASUREMENT
+
+category "i18n:2012";LC_IDENTIFICATION
+category "i18n:2012";LC_CTYPE
+category "i18n:2012";LC_COLLATE
+category "i18n:2012";LC_TIME
+category "i18n:2012";LC_NUMERIC
+category "i18n:2012";LC_MONETARY
+category "i18n:2012";LC_MESSAGES
+category "i18n:2012";LC_PAPER
+category "i18n:2012";LC_NAME
+category "i18n:2012";LC_ADDRESS
+category "i18n:2012";LC_TELEPHONE
+category "i18n:2012";LC_MEASUREMENT
 END LC_IDENTIFICATION
 
 LC_COLLATE

[-- Attachment #1.2: 0001-localedata-LC_IDENTIFICATION.category-set-to-ISO-301.patch.xz --]
[-- Type: application/x-xz, Size: 21312 bytes --]

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* [PATCH v2] localedef: check LC_IDENTIFICATION.category values
  2016-04-13 22:45   ` [PATCH] localedef: check LC_IDENTIFICATION.category values Mike Frysinger
  2016-04-14  8:59     ` keld
@ 2016-04-14 21:21     ` Mike Frysinger
  2016-04-15  4:44       ` Carlos O'Donell
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2016-04-14 21:21 UTC (permalink / raw)
  To: libc-alpha; +Cc: keld

Currently localedef accepts any value for the category keyword.  This has
allowed bad values to propagate to the vast majority of locales (~90%).
Add some logic to only accept a few standards.

2016-04-13  Mike Frysinger  <vapier@gentoo.org>

	* locale/programs/ld-identification.c (identification_finish): Check
	that the values in identification->category are only known.
---
v2:
	- tweak list of accepted standards

 locale/programs/ld-identification.c | 43 +++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
index 1e8fa84..9234304 100644
--- a/locale/programs/ld-identification.c
+++ b/locale/programs/ld-identification.c
@@ -164,14 +164,43 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
   TEST_ELEM (date);
 
   for (num = 0; num < __LC_LAST; ++num)
-    if (num != LC_ALL && identification->category[num] == NULL)
-      {
-	if (verbose && ! nothing)
-	  WITH_CUR_LOCALE (error (0, 0, _("\
+    {
+      /* We don't accept/parse this category, so skip it early.  */
+      if (num == LC_ALL)
+	continue;
+
+      if (identification->category[num] == NULL)
+	{
+	  if (verbose && ! nothing)
+	    WITH_CUR_LOCALE (error (0, 0, _("\
 %s: no identification for category `%s'"),
-				  "LC_IDENTIFICATION", category_name[num]));
-	identification->category[num] = "";
-      }
+				    "LC_IDENTIFICATION", category_name[num]));
+	  identification->category[num] = "";
+	}
+      else
+	{
+	  /* Only list the standards we care about.  */
+	  static const char * const standards[] =
+	    {
+	      "posix:1993",
+	      "i18n:2004",
+	      "i18n:2012",
+	    };
+	  size_t i;
+	  bool matched = false;
+
+	  for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
+	    if (strcmp (identification->category[num], standards[i]) == 0)
+	      matched = true;
+
+	  if (matched != true)
+	    WITH_CUR_LOCALE (error (0, 0, _("\
+%s: unknown standard `%s' for category `%s'"),
+				    "LC_IDENTIFICATION",
+				    identification->category[num],
+				    category_name[num]));
+	}
+    }
 }
 
 
-- 
2.7.4
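
For anyone who wants to try the check outside of localedef, here is a
minimal standalone sketch of the matching logic in the patch above.  The
helper name is_known_standard is hypothetical and not part of glibc; the
list of accepted strings is copied from the patch.  Unlike the patch, the
sketch returns on the first match instead of setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Accepted category values, copied from the patch above.  */
static const char *const standards[] =
  {
    "posix:1993",
    "i18n:2004",
    "i18n:2012",
  };

/* Hypothetical helper (not a glibc function): return true if VALUE
   exactly matches one of the accepted standards, false otherwise.  */
static bool
is_known_standard (const char *value)
{
  for (size_t i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
    if (strcmp (value, standards[i]) == 0)
      return true;

  return false;
}
```

Under this sketch, is_known_standard ("i18n:2012") is true, while a
locale-named value such as "kk_KZ:2000" is rejected; that is the case
localedef would report as an unknown standard.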


* Re: [PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard
  2016-04-14 21:18 ` [PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard Mike Frysinger
@ 2016-04-15  4:44   ` Carlos O'Donell
  0 siblings, 0 replies; 15+ messages in thread
From: Carlos O'Donell @ 2016-04-15  4:44 UTC (permalink / raw)
  To: libc-alpha

On 04/14/2016 05:18 PM, Mike Frysinger wrote:
> The ISO 30112 standard defines the valid values for the category
> keyword as only a few options:
> 	posix:1993
> 	i18n:2004
> 	i18n:2012
> 
> The vast majority of locales had changed the "i18n" string to the
> name of its own locale (e.g. "ak_GH:2013") as well as tweaking the
> date (presumably thinking it should be the date of submission).
> 
> Convert all of them to "i18n:2012" for consistency.  A follow up
> change will update localedef to actually check/validate the field.

This looks good to me.

-- 
Cheers,
Carlos.


* Re: [PATCH v2] localedef: check LC_IDENTIFICATION.category values
  2016-04-14 21:21     ` [PATCH v2] " Mike Frysinger
@ 2016-04-15  4:44       ` Carlos O'Donell
  2016-04-15 16:46         ` Mike Frysinger
  0 siblings, 1 reply; 15+ messages in thread
From: Carlos O'Donell @ 2016-04-15  4:44 UTC (permalink / raw)
  To: Mike Frysinger, libc-alpha; +Cc: keld

On 04/14/2016 05:21 PM, Mike Frysinger wrote:
> Currently localedef accepts any value for the category keyword.  This has
> allowed bad values to propagate to the vast majority of locales (~90%).
> Add some logic to only accept a few standards.
> 
> 2016-04-13  Mike Frysinger  <vapier@gentoo.org>
> 
> 	* locale/programs/ld-identification.c (identification_finish): Check
> 	that the values in identification->category are only known.
> ---
> v2:
> 	- tweak list of accepted standards

OK if you expand the comment "Only list the standards we care about." to
list the standards we reviewed when making this list, so that future
developers can review it.

-- 
Cheers,
Carlos.


* Re: [PATCH v2] localedef: check LC_IDENTIFICATION.category values
  2016-04-15  4:44       ` Carlos O'Donell
@ 2016-04-15 16:46         ` Mike Frysinger
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Frysinger @ 2016-04-15 16:46 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha, keld

On 15 Apr 2016 00:44, Carlos O'Donell wrote:
> On 04/14/2016 05:21 PM, Mike Frysinger wrote:
> > Currently localedef accepts any value for the category keyword.  This has
> > allowed bad values to propagate to the vast majority of locales (~90%).
> > Add some logic to only accept a few standards.
> > 
> > 2016-04-13  Mike Frysinger  <vapier@gentoo.org>
> > 
> > 	* locale/programs/ld-identification.c (identification_finish): Check
> > 	that the values in identification->category are only known.
> > ---
> > v2:
> > 	- tweak list of accepted standards
> 
> OK if you expand the comment "Only list the standards we care about." to
> list the standards we reviewed when making this list, so that future
> developers can review it.

ok, i've written:
          /* Only list the standards we care about.  This is based on the
             ISO 30112 WD10 [2014] standard which supersedes all previous
             revisions of the ISO 14652 standard.  */
-mike



Thread overview: 15+ messages
2016-04-13 16:39 [PATCH] localedate: LC_IDENTIFICATION.category: set to ISO 14652 2002 standard Mike Frysinger
2016-04-13 18:50 ` Chris Leonard
2016-04-13 18:57 ` Carlos O'Donell
2016-04-13 20:05   ` Mike Frysinger
2016-04-13 22:45   ` [PATCH] localedef: check LC_IDENTIFICATION.category values Mike Frysinger
2016-04-14  8:59     ` keld
2016-04-14  9:26       ` keld
2016-04-14 13:50         ` Mike Frysinger
2016-04-14 15:04           ` keld
2016-04-14 17:49             ` Mike Frysinger
2016-04-14 21:21     ` [PATCH v2] " Mike Frysinger
2016-04-15  4:44       ` Carlos O'Donell
2016-04-15 16:46         ` Mike Frysinger
2016-04-14 21:18 ` [PATCH v2] localedata: LC_IDENTIFICATION.category: set to ISO 30112 2014 standard Mike Frysinger
2016-04-15  4:44   ` Carlos O'Donell
