public inbox for libc-alpha@sourceware.org
* [PATCH] intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621)
@ 2023-09-10 19:17 Bruno Haible
  2023-09-11  7:43 ` Florian Weimer
  2023-12-12  9:08 ` Florian Weimer
  0 siblings, 2 replies; 4+ messages in thread
From: Bruno Haible @ 2023-09-10 19:17 UTC (permalink / raw)
  To: libc-alpha; +Cc: Bruno Haible

The previous commit was incomplete: gettext() still returns a translation
if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
prohibits the translation also in this case.

* gettext-runtime/intl/dcigettext.c (DCIGETTEXT): Treat C.<encoding> locale
like the C locale.
---
 intl/dcigettext.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/intl/dcigettext.c b/intl/dcigettext.c
index 27063886d2..fb69bbf94b 100644
--- a/intl/dcigettext.c
+++ b/intl/dcigettext.c
@@ -691,9 +691,10 @@ DCIGETTEXT (const char *domainname, const char *msgid1, const char *msgid2,
 	    continue;
 	}
 
-      /* If the current locale value is C (or POSIX) we don't load a
-	 domain.  Return the MSGID.  */
-      if (strcmp (single_locale, "C") == 0
+      /* If the current locale value is "C" or "C.<encoding>" or "POSIX",
+	 we don't load a domain.  Return the MSGID.  */
+      if ((single_locale[0] == 'C'
+	   && (single_locale[1] == '\0' || single_locale[1] == '.'))
 	  || strcmp (single_locale, "POSIX") == 0)
 	break;
 
-- 
2.34.1



* Re: [PATCH] intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621)
  2023-09-10 19:17 [PATCH] intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621) Bruno Haible
@ 2023-09-11  7:43 ` Florian Weimer
  2023-09-12 14:44   ` Bruno Haible
  2023-12-12  9:08 ` Florian Weimer
  1 sibling, 1 reply; 4+ messages in thread
From: Florian Weimer @ 2023-09-11  7:43 UTC (permalink / raw)
  To: Bruno Haible; +Cc: libc-alpha

* Bruno Haible:

> The previous commit was incomplete: gettext() still returns a translation
> if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
> prohibits the translation also in this case.
>
> * gettext-runtime/intl/dcigettext.c (DCIGETTEXT): Treat C.<encoding> locale
> like the C locale.
> ---
>  intl/dcigettext.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/intl/dcigettext.c b/intl/dcigettext.c
> index 27063886d2..fb69bbf94b 100644
> --- a/intl/dcigettext.c
> +++ b/intl/dcigettext.c
> @@ -691,9 +691,10 @@ DCIGETTEXT (const char *domainname, const char *msgid1, const char *msgid2,
>  	    continue;
>  	}
>  
> -      /* If the current locale value is C (or POSIX) we don't load a
> -	 domain.  Return the MSGID.  */
> -      if (strcmp (single_locale, "C") == 0
> +      /* If the current locale value is "C" or "C.<encoding>" or "POSIX",
> +	 we don't load a domain.  Return the MSGID.  */
> +      if ((single_locale[0] == 'C'
> +	   && (single_locale[1] == '\0' || single_locale[1] == '.'))
>  	  || strcmp (single_locale, "POSIX") == 0)
>  	break;

I wasn't sure if this is a bug.  The implementation does not fall back to
translation, it just uses C as a message catalog.  Do you consider this
a problem?

Thanks,
Florian



* Re: [PATCH] intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621)
  2023-09-11  7:43 ` Florian Weimer
@ 2023-09-12 14:44   ` Bruno Haible
  0 siblings, 0 replies; 4+ messages in thread
From: Bruno Haible @ 2023-09-12 14:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

Florian Weimer wrote:
> > The previous commit was incomplete: gettext() still returns a translation
> > if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
> > prohibits the translation also in this case.
> 
> I wasn't sure if this is a bug.  The implementation does not fall back to
> translation, it just uses C as a message catalog.  Do you consider this
> a problem?

Yes, I consider this a bug, for two reasons:

* The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8 states
  "It shall be the C locale but with UTF-8 encodings."
  and
  "These will be the same as C... LC_MESSAGES"

  The C locale has the property that gettext() returns the msgid in all cases,
  regardless of what files are on disk and regardless of the values of any
  environment variables.

  If the C.UTF-8 locale has the property that gettext() returns the msgid only
  if there is no translation catalog at
  /usr/share/locale/C/LC_MESSAGES/<domain>.mo, it is *not* the same as
  "the C locale but with UTF-8 encodings".

* We have this rule, that gettext() returns the msgid when the locale is the
  "C" locale, because
     - the POSIX standard specifies the precise output of some programs (e.g.
       'diff') in the C locale, and
     - we wanted, from the beginning in 1995, that gettext() can be used in
       the source code of these programs, without an explicit check for the
       locale.
  It is possible that, in the long run, POSIX adopts the C.UTF-8 locale,
  since several platforms already have it: glibc, musl libc, FreeBSD, NetBSD,
  OpenBSD, Cygwin, Android.
  When this happens, we want that the maintainers of 'diff' etc. can continue
  to use gettext(), without introducing an explicit check for the locale.
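
The rule described above can be sketched as a standalone predicate. This is
only an illustration of the condition the patch adds inside DCIGETTEXT; the
function name locale_suppresses_translation is hypothetical and does not
exist in glibc or gettext.

```c
#include <string.h>

/* Hypothetical helper (not part of glibc or gettext): returns nonzero
   when the locale name is "C", "C.<encoding>" (for example "C.UTF-8"),
   or "POSIX".  For these names the patched lookup loads no domain and
   gettext() returns the msgid unchanged.  */
int
locale_suppresses_translation (const char *single_locale)
{
  /* Checking the second character separately accepts "C" and "C.xyz"
     but rejects names that merely start with 'C', such as "Cx".
     An empty string is safe: the first comparison fails before the
     second character is read.  */
  return ((single_locale[0] == 'C'
           && (single_locale[1] == '\0' || single_locale[1] == '.'))
          || strcmp (single_locale, "POSIX") == 0);
}
```

Under this sketch, "C", "C.UTF-8" and "POSIX" all select the untranslated
msgid, while a name such as "de_DE.UTF-8" continues on to the normal catalog
lookup.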

Bruno





* Re: [PATCH] intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621)
  2023-09-10 19:17 [PATCH] intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621) Bruno Haible
  2023-09-11  7:43 ` Florian Weimer
@ 2023-12-12  9:08 ` Florian Weimer
  1 sibling, 0 replies; 4+ messages in thread
From: Florian Weimer @ 2023-12-12  9:08 UTC (permalink / raw)
  To: Bruno Haible; +Cc: libc-alpha

* Bruno Haible:

> The previous commit was incomplete: gettext() still returns a translation
> if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
> prohibits the translation also in this case.
>
> * gettext-runtime/intl/dcigettext.c (DCIGETTEXT): Treat C.<encoding> locale
> like the C locale.
> ---
>  intl/dcigettext.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/intl/dcigettext.c b/intl/dcigettext.c
> index 27063886d2..fb69bbf94b 100644
> --- a/intl/dcigettext.c
> +++ b/intl/dcigettext.c
> @@ -691,9 +691,10 @@ DCIGETTEXT (const char *domainname, const char *msgid1, const char *msgid2,
>  	    continue;
>  	}
>  
> -      /* If the current locale value is C (or POSIX) we don't load a
> -	 domain.  Return the MSGID.  */
> -      if (strcmp (single_locale, "C") == 0
> +      /* If the current locale value is "C" or "C.<encoding>" or "POSIX",
> +	 we don't load a domain.  Return the MSGID.  */
> +      if ((single_locale[0] == 'C'
> +	   && (single_locale[1] == '\0' || single_locale[1] == '.'))
>  	  || strcmp (single_locale, "POSIX") == 0)
>  	break;

These arguments:

> * The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8 states
>   "It shall be the C locale but with UTF-8 encodings."
>   and
>   "These will be the same as C... LC_MESSAGES"
> 
>   The C locale has the property that gettext() returns the msgid in all cases,
>   regardless of what files are on disk and regardless of the values of any
>   environment variables.
> 
>   If the C.UTF-8 locale has the property that gettext() returns the msgid only
>   if there is no translation catalog at
>   /usr/share/locale/C/LC_MESSAGES/<domain>.mo, it is *not* the same as
>   "the C locale but with UTF-8 encodings".
> 
> * We have this rule, that gettext() returns the msgid when the locale is the
>   "C" locale, because
>      - the POSIX standard specifies the precise output of some programs (e.g.
>        'diff') in the C locale, and
>      - we wanted, from the beginning in 1995, that gettext() can be used in
>        the source code of these programs, without an explicit check for the
>        locale.
>   It is possible that, in the long run, POSIX adopts the C.UTF-8 locale,
>   since several platforms already have it: glibc, musl libc, FreeBSD, NetBSD,
>   OpenBSD, Cygwin, Android.
>   When this happens, we want that the maintainers of 'diff' etc. can continue
>   to use gettext(), without introducing an explicit check for the locale.

<https://inbox.sourceware.org/libc-alpha/10272749.85pcf5A44T@nimes/>

convinced me, so:

Reviewed-by: Florian Weimer <fweimer@redhat.com>

I'll push it for you.

Thanks,
Florian

