* why does `LANG` not influence the locale in GNU/Linux
From: Godmar Back @ 2022-01-27 0:35 UTC
To: William Tambe via Libc-help
If I am writing code that uses fgetwc to read from standard input, I
need to call setlocale(LC_CTYPE, "en_US.utf8"); to ensure that fgetwc
will treat stdin as a UTF-8 encoded stream. This property appears to
hold (on Ubuntu 20, with default GNU libc), independent of the setting
of LANG. LANG is set to en_US.utf8.
By contrast, Python 3 chooses the encoding it assumes for sys.stdin based
on the LANG variable: if LANG is set to en_US.utf8, it decodes the input
as UTF-8; if it is set to C or unset, it treats the input as ASCII
(I believe).
My question is why GNU libc does not use LANG (was this a choice?), and
what standards, if any, apply here.
Is my characterization of the behavior correct?
- Godmar
* Re: why does `LANG` not influence the locale in GNU/Linux
From: Carlos O'Donell @ 2022-01-27 5:28 UTC
To: Godmar Back; +Cc: libc-help
On 1/26/22 19:35, Godmar Back via Libc-help wrote:
> If I am writing code that uses fgetwc to read from standard input, I
> need to call setlocale(LC_CTYPE, "en_US.utf8"); to ensure that fgetwc
> will treat stdin as a UTF-8 encoded stream. This property appears to
> hold (on Ubuntu 20, with default GNU libc), independent of the setting
> of LANG. LANG is set to en_US.utf8.
>
> By contrast, Python 3 changes its behavior with regard to the encoding
> it assumes sys.stdin to be in based on the LANG variable - if set to
> en_US.utf8, it'll decode as UTF-8, if set to C or unset, it'll treat
> the input as 8-bit ASCII encoding (I believe).
>
> My question is why GNU libc (chose?) to not use LANG and what
> standards, if any, apply here.
> Is my characterization of the behavior correct?
The ISO C standard and the POSIX standard have specified the behaviour.
ISO C says that at program startup the equivalent of:
setlocale (LC_ALL, "C"); /* See ISO C 7.11.1.1, paragraph 4. */
is executed. Therefore C programs start in the C locale, not the locale
as specified by LANG.
POSIX defines several environment variables, along with precedence rules
for deciding which one applies to each locale category [1]. However, this
only defines the variables and how they map to categories; it doesn't
*apply* them to the running program. Internationalized programs that wish
to initialize locale-specific operation must call setlocale [2]:
setlocale (LC_ALL, "");
This sets up the global locale from the environment variables, following
the precedence defined by POSIX.
Does that answer your question?
--
Cheers,
Carlos.
[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html
* Re: why does `LANG` not influence the locale in GNU/Linux
From: Godmar Back @ 2022-01-27 5:40 UTC
To: Carlos O'Donell; +Cc: libc-help
On Thu, Jan 27, 2022 at 12:29 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> The ISO C standard and the POSIX standard have specified the behaviour.
>
> ISO C says that at program startup the equivalent of:
> setlocale (LC_ALL, "C"); /* (See 7.11.1.1.4) */
> is executed. Therefore C programs start in the C locale, not the locale
> as specified by LANG.
>
> POSIX defines several environment variables, and they have precedence
> rules for deciding which one should be used by the application [1].
> However, this just defines the categories, but it doesn't *apply* them to the
> running program. Internationalized programs that wish to initialize locale
> specific operation must call setlocale [2]:
>
> setlocale (LC_CALL, "");
There's an extraneous `C` here; I'm reading this as `setlocale (LC_ALL, "")`.
>
> Which will set up the global locale according to the environment variables
> in the precedence as setup by POSIX.
>
> Does that answer your question?
Yes it does, thank you.
It's slightly ironic, then, to read PEP-538 [*], for instance, which claims
that Python needs "to use the configured locale encoding by
default for consistency with other locale-aware components in the same
process or subprocesses", when the default behavior in a Unix
environment is not to be locale-aware and to fall back to the C locale.
[*] https://www.python.org/dev/peps/pep-0538/
* Re: why does `LANG` not influence the locale in GNU/Linux
From: Carlos O'Donell @ 2022-01-27 5:53 UTC
To: Godmar Back; +Cc: libc-help
On 1/27/22 00:40, Godmar Back wrote:
> It's slightly ironic if you read, for instance PEP-538 [*] that claims
> that Python has a need "to use the configured locale encoding by
> default for consistency with other locale-aware components in the same
> process or subprocesses" when the default behavior in a Unix
> environment is to not be locale-aware and revert back to the C locale.
Nick Coghlan is the author of PEP-538 and, not coincidentally, also opened
bug 17318, which we just resolved for glibc 2.35 by adding a C.UTF-8 locale
that harmonizes the downstream distribution implementations and keeps the
locale minimal.
The creation of C.UTF-8 was partly driven by the need, in GNOME and Python,
for an "always on" UTF-8 locale that could be used as a fallback, rather
than the ASCII-only C/POSIX locale [1][2].
I feel this is less about locale awareness and more about encoding
information in UTF-8 by default.
> [*] https://www.python.org/dev/peps/pep-0538/
--
Cheers,
Carlos.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1313818
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=17318