public inbox for libc-help@sourceware.org
* why does `LANG` not influence the locale in GNU/Linux
@ 2022-01-27  0:35 Godmar Back
  2022-01-27  5:28 ` Carlos O'Donell
  0 siblings, 1 reply; 4+ messages in thread
From: Godmar Back @ 2022-01-27  0:35 UTC (permalink / raw)
  To: William Tambe via Libc-help

If I am writing code that uses fgetwc to read from standard input, I
need to call setlocale(LC_CTYPE, "en_US.utf8"); to ensure that fgetwc
will treat stdin as a UTF-8 encoded stream.  This property appears to
hold (on Ubuntu 20, with default GNU libc), independent of the setting
of LANG.  LANG is set to en_US.utf8.
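
A minimal sketch of the pattern described above, assuming a glibc system on which a locale named en_US.utf8 is installed. The helper names count_wide_chars and use_en_us_utf8_ctype are hypothetical and only illustrate the order of calls: select the LC_CTYPE category first, then read with fgetwc.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters that fgetwc can decode
   from fp. How the bytes are decoded depends on the LC_CTYPE category in
   effect for the stream. Returns -1 if the stream's error indicator is
   set when reading stops. */
long count_wide_chars(FILE *fp)
{
    long n = 0;

    while (fgetwc(fp) != WEOF)
        n++;
    return ferror(fp) ? -1 : n;
}

/* Hypothetical helper: select the UTF-8 locale named in the message.
   Returns 0 on success, -1 if setlocale rejects the name (for example
   because that locale is not installed). */
int use_en_us_utf8_ctype(void)
{
    return setlocale(LC_CTYPE, "en_US.utf8") != NULL ? 0 : -1;
}
```

A caller would run use_en_us_utf8_ctype() once, before the first read from stdin, and then call count_wide_chars(stdin).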

By contrast, Python 3 changes its behavior with regard to the encoding
it assumes sys.stdin to be in based on the LANG variable - if set to
en_US.utf8, it'll decode as UTF-8, if set to C or unset, it'll treat
the input as 8-bit ASCII encoding (I believe).

My question is why GNU libc chose (?) not to use LANG, and what
standards, if any, apply here.
Is my characterization of the behavior correct?

 - Godmar

* Re: why does `LANG` not influence the locale in GNU/Linux
  2022-01-27  0:35 why does `LANG` not influence the locale in GNU/Linux Godmar Back
@ 2022-01-27  5:28 ` Carlos O'Donell
  2022-01-27  5:40   ` Godmar Back
  0 siblings, 1 reply; 4+ messages in thread
From: Carlos O'Donell @ 2022-01-27  5:28 UTC (permalink / raw)
  To: Godmar Back; +Cc: libc-help

On 1/26/22 19:35, Godmar Back via Libc-help wrote:
> If I am writing code that uses fgetwc to read from standard input, I
> need to call setlocale(LC_CTYPE, "en_US.utf8"); to ensure that fgetwc
> will treat stdin as a UTF-8 encoded stream.  This property appears to
> hold (on Ubuntu 20, with default GNU libc), independent of the setting
> of LANG.  LANG is set to en_US.utf8.
> 
> By contrast, Python 3 changes its behavior with regard to the encoding
> it assumes sys.stdin to be in based on the LANG variable - if set to
> en_US.utf8, it'll decode as UTF-8, if set to C or unset, it'll treat
> the input as 8-bit ASCII encoding (I believe).
> 
> My question is why GNU libc (chose?) to not use LANG and what
> standards, if any, apply here.
> Is my characterization of the behavior correct?

The ISO C standard and the POSIX standard have specified the behaviour.

ISO C says that at program startup the equivalent of:
setlocale (LC_ALL, "C"); /* (See 7.11.1.1.4) */
is executed. Therefore C programs start in the C locale, not the locale
as specified by LANG.
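
The startup state can be observed directly. The sketch below uses a hypothetical helper, current_locale_name, which only queries the global locale. Passing NULL as the second argument to setlocale returns the current setting without changing it. Called before any other setlocale call, it is expected to return "C" on glibc whatever LANG contains.

```c
#include <locale.h>

/* Hypothetical helper: report the current global locale without
   modifying it. A NULL second argument turns setlocale into a query. */
const char *current_locale_name(void)
{
    return setlocale(LC_ALL, NULL);
}
```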

POSIX defines several environment variables, and they have precedence
rules for deciding which one should be used by the application [1].
However, this only defines how each category is resolved; it doesn't
*apply* the result to the running program. Internationalized programs that
wish to initialize locale-specific operation must call setlocale [2]:

setlocale (LC_CALL, "");

This sets up the global locale according to the environment variables,
in the precedence order that POSIX defines.
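
The precedence referred to here is, per category: LC_ALL first, then the category's own variable (such as LC_CTYPE), then LANG, skipping any variable that is unset or empty. The sketch below mirrors that lookup by hand, purely as an illustration. The helper name env_locale_for is hypothetical, and a real program would not do this itself but would call setlocale (LC_ALL, "") and let the C library resolve the variables.

```c
#include <stdlib.h>

/* Hypothetical helper: return the locale name that the POSIX precedence
   rules select for one category, given the name of that category's
   environment variable (for example "LC_CTYPE").
   Order: LC_ALL, then the category variable, then LANG. A variable that
   is unset or set to the empty string is skipped. Returns NULL if none
   applies; the default is then implementation-defined. */
const char *env_locale_for(const char *category_var)
{
    const char *names[] = { "LC_ALL", category_var, "LANG" };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;
}
```

With only LANG=en_US.utf8 set, env_locale_for("LC_CTYPE") yields "en_US.utf8"; setting LC_ALL=C overrides both LANG and LC_CTYPE.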

Does that answer your question?

-- 
Cheers,
Carlos.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html


* Re: why does `LANG` not influence the locale in GNU/Linux
  2022-01-27  5:28 ` Carlos O'Donell
@ 2022-01-27  5:40   ` Godmar Back
  2022-01-27  5:53     ` Carlos O'Donell
  0 siblings, 1 reply; 4+ messages in thread
From: Godmar Back @ 2022-01-27  5:40 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help

On Thu, Jan 27, 2022 at 12:29 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 1/26/22 19:35, Godmar Back via Libc-help wrote:
> > If I am writing code that uses fgetwc to read from standard input, I
> > need to call setlocale(LC_CTYPE, "en_US.utf8"); to ensure that fgetwc
> > will treat stdin as a UTF-8 encoded stream.  This property appears to
> > hold (on Ubuntu 20, with default GNU libc), independent of the setting
> > of LANG.  LANG is set to en_US.utf8.
> >
> > By contrast, Python 3 changes its behavior with regard to the encoding
> > it assumes sys.stdin to be in based on the LANG variable - if set to
> > en_US.utf8, it'll decode as UTF-8, if set to C or unset, it'll treat
> > the input as 8-bit ASCII encoding (I believe).
> >
> > My question is why GNU libc (chose?) to not use LANG and what
> > standards, if any, apply here.
> > Is my characterization of the behavior correct?
>
> The ISO C standard and the POSIX standard have specified the behaviour.
>
> ISO C says that at program startup the equivalent of:
> setlocale (LC_ALL, "C"); /* (See 7.11.1.1.4) */
> is executed. Therefore C programs start in the C locale, not the locale
> as specified by LANG.
>
> POSIX defines several environment variables, and they have precedence
> rules for deciding which one should be used by the application [1].
> However, this just defines the categories, but it doesn't *apply* them to the
> running program. Internationalized programs that wish to initialize locale
> specific operation must call setlocale [2]:
>
> setlocale (LC_CALL, "");

There's an extraneous `C` here, I'm reading this as  `setlocale (LC_ALL, "")`.

>
> Which will set up the global locale according to the environment variables
> in the precedence as setup by POSIX.
>
> Does that answer your question?

Yes it does, thank you.

It's slightly ironic if you read, for instance, PEP-538 [*], which claims
that Python has a need "to use the configured locale encoding by
default for consistency with other locale-aware components in the same
process or subprocesses" when the default behavior in a Unix
environment is to not be locale-aware and to stay in the C locale.

[*] https://www.python.org/dev/peps/pep-0538/


* Re: why does `LANG` not influence the locale in GNU/Linux
  2022-01-27  5:40   ` Godmar Back
@ 2022-01-27  5:53     ` Carlos O'Donell
  0 siblings, 0 replies; 4+ messages in thread
From: Carlos O'Donell @ 2022-01-27  5:53 UTC (permalink / raw)
  To: Godmar Back; +Cc: libc-help

On 1/27/22 00:40, Godmar Back wrote:
> It's slightly ironic if you read, for instance PEP-538 [*] that claims
> that Python has a need "to use the configured locale encoding by
> default for consistency with other locale-aware components in the same
> process or subprocesses" when the default behavior in a Unix
> environment is to not be locale-aware and revert back to the C locale.

Nick Coghlan is the author of PEP-538 and, not coincidentally, opened bug 17318,
which we just resolved for glibc 2.35 by adding a C.UTF-8 locale that
harmonizes the downstream distribution implementations (and that we minimized).

The creation of C.UTF-8 was partly driven by the need of GNOME and Python for
an "always on" UTF-8 encoded locale that could be used as a fallback, rather
than the ASCII C/POSIX locale [1][2].
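
As an illustration of what such a fallback could look like in a C program, here is a minimal sketch. The helper name ensure_utf8_ctype and the fallback policy are assumptions made for this example; this is not how Python or GNOME implement it. It assumes glibc, where nl_langinfo (CODESET) reports "UTF-8" for UTF-8 locales, and it relies on C.UTF-8 being available, which upstream glibc provides as of 2.35 (some distributions shipped their own variant earlier).

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: make LC_CTYPE use a UTF-8 codeset.
   First honor the environment; if the resulting codeset is not UTF-8
   (for example because LANG is unset), fall back to "C.UTF-8".
   Returns 1 if LC_CTYPE now uses UTF-8, 0 otherwise. */
int ensure_utf8_ctype(void)
{
    if (setlocale(LC_CTYPE, "") != NULL &&
        strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
        return 1;

    if (setlocale(LC_CTYPE, "C.UTF-8") != NULL &&
        strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
        return 1;

    return 0;
}
```

A return value of 0 means neither the environment nor the fallback produced a UTF-8 locale, and the program should decide for itself how to treat its input.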

I feel like this is less about locale awareness and more about the encoding of
information using UTF-8 by default.
 
> [*] https://www.python.org/dev/peps/pep-0538/

-- 
Cheers,
Carlos.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1313818
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=17318

